The jury’s still out at present, but I am very grateful for the kind offers of help of various sorts! I now have text versions of episodes of:
- “Slave of the Clock” and “The Secret of Angel Smith” (Jay Over)
- “The Sentinels” (Malcolm Shaw)
- “Fran of the Floods” (Alan Davidson)
- “Concrete Surfer” (Pat Mills)
So I now have enough material to test whether the program can identify “Slave of the Clock” as being by Jay Over rather than by any of the other writers. If anyone is able to send in more texts, the following would be useful:
- Some texts by female writers such as Anne Digby, Alison Christie or Benita Brown
- More texts by the writers named above, so that I can give the program a wider base of texts for each writer (rather than just increasing the number of individual authors)
How far have I got? Not that far yet, I’m afraid to say. I have downloaded a copy of the program I chose (JGAAP) and got it running (not bad in itself, as this is not a commercial piece of software with the latest user-friendly features). I’ve loaded up the known authors and the test text (“Slave of the Clock”). However, the checks that the program offers are very academic, and hard for me to understand as it’s not an area I’ve ever studied. (Binned naming times, analysed by Mahalanobis distance? What the what??) Frankly, I am stabbing at options like a monkey and seeing what I get.
I can already see, however, that some of the checks the program offers look promising, so I am optimistic that we may get something useful out of this experiment. The more successful tests break the texts down into smaller elements: individual words, small groups of words, the first word of each sentence, or tags marking which parts of speech are used. The idea is that this gives the program patterns against which to match the ‘test’ text, and this does seem to be bearing fruit so far.
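For anyone curious about what these checks actually involve, here is a rough Python sketch of the general idea. It is not JGAAP’s own code, just an illustration under my own assumptions: each text is broken into words, pairs of adjacent words and sentence-opening words; each author’s known texts are pooled into a single frequency profile; and the test text is assigned to whichever author’s profile is closest by cosine distance (a simple stand-in for the fancier measures the program offers). Part-of-speech tagging is left out because it needs an external tagger. The function names and the toy example texts are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_initial_words(text):
    """First word of each sentence, splitting crudely on . ! and ?"""
    firsts = []
    for sentence in re.split(r"[.!?]+", text):
        words = tokenize(sentence)
        if words:
            firsts.append(words[0])
    return firsts


def features(text):
    """Relative frequencies of three kinds of feature:
    single words (w:), adjacent word pairs (b:) and sentence openers (s:)."""
    words = tokenize(text)
    counts = Counter()
    counts.update("w:" + w for w in words)
    counts.update("b:" + a + " " + b for a, b in zip(words, words[1:]))
    counts.update("s:" + w for w in sentence_initial_words(text))
    total = sum(counts.values()) or 1
    return {k: v / total for k, v in counts.items()}


def cosine_distance(p, q):
    """0.0 means identical profiles; 1.0 means nothing in common."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def attribute(test_text, known):
    """known maps author name -> list of texts.
    Returns the author names ranked from closest to furthest."""
    profiles = {author: features(" ".join(texts)) for author, texts in known.items()}
    test = features(test_text)
    return sorted(profiles, key=lambda a: cosine_distance(test, profiles[a]))


# Toy example with invented placeholder texts (not real story extracts).
known = {
    "Writer A": ["The clock struck midnight. The girl ran home."],
    "Writer B": ["Waves crashed on the beach. Surfing was her life."],
}
print(attribute("The clock struck one. The girl ran away.", known))
```

The closer the test text’s mix of words, word pairs and sentence openers is to an author’s pooled texts, the higher that author ranks. With only one short text per author this is easily fooled, which is why a wider base of texts for each writer matters more than adding more authors.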
So, an interim progress report – nothing very definite yet, but some positive hints. I will keep working through the options that the program offers, to see if I can narrow the analytical checks down to a subset that consistently identifies the author as Jay Over. I will then run another series of tests with a new Jay Over file. I’ll type up an episode of “The Lonely Ballerina” to do that, unless anyone else has kindly done it before me 🙂 (scans from an episode are shown below, just in case!). That will be a good test of whether the chosen checks really do the job I hope they will…