I don’t know yet, but I’m going to give it a go.
And I’ll need a little help from others, please.
I have been thinking about the problem of unknown writers and how we can try to identify them. In writing story posts here, Mistyfan and I sometimes raise the question of whether a given writer might also have written some other story, based on things like similar plot lines. But there is a whole area of research into using computers in the Humanities, and a specific technique designed to help attribute authorship of unattributed texts: it’s called Stylometry. I want to try one of the pieces of software that does this – JGAAP – to see if it can help us think about who might have written what, at least in some cases. (Edited to add: JGAAP was written by the chap who did the analysis that strongly suggested that J K Rowling was the author of “The Cuckoo’s Calling”.)
The way it works is that I need to feed the program a number of texts from Known Authors, because it then compares the unknown writing with those known samples. (All it can ever do is say ‘this piece looks most likely to have been written by Author A out of the list of A – Z that you have given me’ – it’s just matching a sample to a known finite list, so it has limitations.) That means I need some text files (as many as possible) which are typed-up versions of stories where we already know the authors, such as the below:
- Jay Over, Slave of the Clock / The Secret of Angel Smith / The Lonely Ballerina from Tammy 1982 and 1983
  - I can do the first two but haven’t got any copies of The Lonely Ballerina
- Alison Christie – see list on the interview post
- Pat Mills, various stories including Moonchild in Misty and Concrete Surfer in Jinty
  - I am in the middle of typing up the episode of Concrete Surfer included in the post about this story
- Alan Davidson, Fran of the Floods / The Valley of Shining Mist / Gwen’s Stolen Glory
- Malcolm Shaw, The Robot Who Cried
Can anyone help by typing up one or more episodes from the stories mentioned, and sending them to me? I’m working out a standard format to use, because it’s going to be important to be consistent about things like how to indicate thought balloons or the text boxes at the beginning of each episode. We can work that out further together, of course. Very many thanks in advance!
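To make the "closest match out of a finite list" idea concrete, here is a minimal sketch in Python. It is not JGAAP and does not use JGAAP's methods or settings: the function names, the choice of character trigrams, the cosine similarity measure and the placeholder texts are all my own assumptions, picked only to illustrate the principle.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_authors(known_texts, unknown_text, n=3):
    """Rank the listed authors by similarity to the unknown text, best match first.

    known_texts maps an author name to a list of that author's typed-up texts.
    The result can only ever contain authors from that list.
    """
    unknown_profile = char_ngrams(unknown_text, n)
    scores = []
    for author, texts in known_texts.items():
        profile = Counter()
        for text in texts:
            profile.update(char_ngrams(text, n))
        scores.append((author, cosine_similarity(profile, unknown_profile)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder strings, not real story text.
    known = {
        "Author A": ["aaaa bbbb aaaa"],
        "Author B": ["xyz xyz xyz"],
    }
    for author, score in rank_authors(known, "aaaa aaaa"):
        print(author, round(score, 3))
```

Note that the top-ranked name is always one of the Known Authors, even when the unknown text resembles none of them. That is the limitation mentioned above, and it is why a larger and more varied set of typed-up texts matters.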
Once I have enough example files to start running them through the program, this is what I am intending to try (any comments or suggestions will be received with interest).
- Can I get the program to work at all?
- If I load a credited Jay Over text as a Known Author, and a Pat Mills story likewise as a Known Author, will an episode of “Slave of the Clock” be successfully identified as a Jay Over story?
- What if I then compare a credited “Pam of Pond Hill” story – will the program identify this as a Jay Over story, or will the comedy style mean it is not as recognisable to the program?
- What if I then compare an uncredited “Pam” story with a credited “Pam” story? We think all the Pam stories were written by Jay Over but could this program show us any other views?
- What if I then add in more Known Authors and re-run the tests above – will the results still come out the same?
- And then excitingly I could try some further tests, like:
  - If I compare an episode of “Prisoner of the Bell” to “Slave of the Clock”, does the former look like the known Jay Over texts?
  - If I compare an episode of “E. T. Estate” by Jake Adams to the uncredited story “The Human Zoo”, what does the program indicate about any plausible attribution?
  - We think Benita Brown probably wrote “Spirit of the Lake” – is there any textual / stylistic similarity we can find between this and “Tomorrow Town” that we know she wrote?
Of course, no stylistic attribution program is going to replace a statement from a creator or a source from the time. But we know these are thin on the ground and getting thinner, and people’s memories and records are getting more fragmentary as time goes by, so this seems worth trying. I don’t expect anything to happen very quickly on this, because it means quite a bit of typing to get a good body of texts. If anyone is able to help on the typing front I will be very grateful, and hopefully I will then be able to show some results sooner rather than later.
Apologies, I had meant to say something about the format of the text. I have a sample document which hopefully can be viewed via this link. In case that doesn’t work, this is what I mean for it to look like:
But I can add in extra detail such as the description that the text appeared in a word balloon, if I have a scan of the pages in question.