Can a computer program help us identify unknown writers? 4

Right now I am sorry to say that I haven’t had great success with the computer program that I was hoping would help us to identify unknown writers. I’m by no means declaring it to be impossible or unrealistic, but I think I will need to ask for help from the experts who wrote the program and/or who do more of this sort of analysis on a day-to-day basis.

My initial trials were to see if I could test a Jay Over script known to be by him against another one known to be by him, so as to see if the program could pick out a ‘known good’ example. It did do that pretty well, but it may be that I calibrated the program options too closely against Jay Over. I haven’t got to the stage of being able to say that this series of tests, done in this way, gives you a good chance of identifying this text by a known author. (Unless that known author is Jay Over, she says slightly bitterly.) And if I can’t do this reasonably reliably, there is no point (as yet, at least) in moving on to trying out unknown author texts.

In my last post about this computer program, I ran a series of 10 tests against a Jay Over text, and the program reliably picked out Jay Over as the most likely author of that text out of a supplied set of 4 test authors. It was much less reliable in picking out a test Malcolm Shaw text out of the same set of test authors: only 5 of the 10 tests suggested that Malcolm Shaw was the best fit. I have now tried the same 10 tests with an Alan Davidson text (“Jackie’s Two Lives”), and with a Pat Mills text (“Girl In A Bubble”). This means that all four of the test authors have been tested against a text that is known to be by them.

  • Unfortunately, in the test using an Alan Davidson text the program was even worse at picking him out as the ‘best fit’ result: it only did so in 2 of the 10 tests, and in 4 of the tests it placed him last, that is, as the least likely to have written that test text.
  • In the test using a Pat Mills text, the program was rather better at picking him out as the ‘best fit’ result, though still not great: it did so in 4 out of 10 tests, and in 3 of the remaining tests he was listed second; and he was only listed as ‘least likely/worst fit’ in one of the tests.

The obvious next step was to try with a larger group of authors. I tried the test texts of Jay Over (“Slave of the Clock”) and of Malcolm Shaw (“Bella” and “Four Faces of Eve”) against a larger group of 6 authors (Primrose Cumming, Anne Digby, Polly Harris, Louise Jordan, Jay Over, Malcolm Shaw).

  • With the Jay Over text, only 7 of the 10 tests chose him as the ‘best fit’, so the attribution of him as the author is showing as less definite in this set of tests.
  • With the Malcolm Shaw texts, only 1 and 3 tests (for “Bella” and for “Eve” respectively) identified him as the ‘best fit’, which is not enough for us to have identified him as the author if we hadn’t already known him to be so. (He also came last, or second to last, in 4 of the first set of tests, and the same in the second set of tests.)
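One rough way to put these counts in context is to ask how often a program that simply guessed an author at random would score at least as many ‘best fit’ hits. The sketch below does that arithmetic with a binomial model. It rests on an assumption that almost certainly does not hold exactly: that the 10 tests are independent of each other, when in fact they are all run on the same text. The function name and the layout of the results list are my own, for illustration only, and the Jay Over result against the 4-author set is left out because I only described it as ‘reliable’ rather than giving a count.

```python
from math import comb


def chance_of_at_least(hits, trials, n_authors):
    """Probability of `hits` or more correct picks in `trials` tests,
    if each test picked one of `n_authors` uniformly at random.

    Assumes the tests are independent, which is a simplification.
    """
    p = 1 / n_authors
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# (label, 'best fit' hits, number of tests, number of candidate authors)
# Counts are the ones reported in this post and the previous one.
results = [
    ("Malcolm Shaw, 4 authors", 5, 10, 4),
    ("Alan Davidson, 4 authors", 2, 10, 4),
    ("Pat Mills, 4 authors", 4, 10, 4),
    ("Jay Over, 6 authors", 7, 10, 6),
    ("Malcolm Shaw 'Bella', 6 authors", 1, 10, 6),
    ("Malcolm Shaw 'Eve', 6 authors", 3, 10, 6),
]

for label, hits, trials, n_authors in results:
    prob = chance_of_at_least(hits, trials, n_authors)
    print(f"{label}: {hits}/{trials} hits, "
          f"chance of doing this well by guessing = {prob:.3f}")
```

A small probability suggests the program is doing better than guessing for that text; a large one suggests the result is indistinguishable from guessing. Because the independence assumption is shaky, the figures are best read as a rough guide rather than a proper significance test.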

I should also try with more texts by each author. However, I think that right now I will take a break from this, in favour of trying to contact the creators of the program. I hope they may be able to give me better leads on the right direction to take this in. Do we need to have much longer texts for each author, for instance? (We have generally been typing up just one episode for each author. I thought it might be too much of an imposition to ask people to do any more than that, especially as it seemed sensible to try to get a reasonably-sized group of authors represented.) Are there some tests I have overlooked, or some analytical methods that are more likely to be applicable to this situation? Hopefully I will be able to come back with some extra info that means I can take this further, but probably not on any very immediate timescale.

In the meantime, I leave you with the following list of texts that people have kindly helped out with. You may find (as I have) that just looking at the texts themselves is quite interesting and revealing. I am more than happy to send on any of the texts if they would be of interest to others. There are also various scans of single episodes sent on by Mistyfan in particular, to whom many thanks are due.

  • Alison Christie, “Stefa’s Heart of Stone” (typed by Marckie)
  • Primrose Cumming, “Bella” (typed by Lorrbot)
  • Alan Davidson, three texts
    • “Fran of the Floods” (typed by Marckie)
    • “Jackie’s Two Lives” (typed by me)
    • “Kerry In the Clouds” (typed by me, in progress)
  • Anne Digby, “Tennis Star Tina” (typed by Lorrbot)
  • Gerry Finley-Day, “Slaves of War Orphan Farm” (typed by Mistyfan)
  • Polly Harris, two texts
    • “Monkey Tricks” (typed by Mistyfan)
    • “Midsummer Tresses” (typed by Mistyfan)
  • Louise Jordan, “The Hardest Ride” (typed by Mistyfan)
  • Jay Over, two texts
    • “Slave of the Clock” (typed by me)
    • “The Secret of Angel Smith” (typed by me)
  • Malcolm Shaw, five texts
    • “Lucky” (typed by Lorrbot)
    • “The Sentinels” episode 1 (typed by Lorrbot)
    • “The Sentinels” episode 2 (typed by Lorrbot)
    • “Bella” (typed by Lorrbot)
    • “Four Faces of Eve” (typed by Lorrbot)
  • Pat Mills, two texts
    • “Concrete Surfer” (typed by me)
    • “Girl In A Bubble” (typed by me)
  • John Wagner, “Eva’s Evil Eye” (typed by Mistyfan)

 

 


7 thoughts on “Can a computer program help us identify unknown writers? 4”

  1. A pity that so far you didn’t get the results you were hoping for. Perhaps, like you say, it would work better with more texts. It could be made into a long-term project, in which we continue to type texts until there is one of every serial that ever appeared in Jinty.
    By the way: I typed ‘Fran of the floods’! 😉

    1. I’d wait and see what the feedback and advice from experts is first, and what the next post on this says.

      1. I do like the idea of having an example of each of the stories in Jinty! But we should wait a bit and see whether the experts suggest, for instance, that we should have a whole story by a known author written up, or some such.

    2. So you did! Apologies, I will fix it tonight. I wanted to credit (!) the typists partly to say thanks but also in case there is any difference between the way different people have done the typing, in case it turns out to be important in some unexpected way.
