Therefore, the rather rare phrase PEDžE IS* (though frequently used in the Gospel of Thomas since we have to do there with a collection of Jesus’ sayings) is used even in both instances of speaking, instead of the form PEDžAF (+ pronominal/nominal object) + NQI + subject that is more common in dialogues or other literary texts. Here in the first instance one would expect something like PEDžAU NIS* NQI NMAThÉTÉS, and in the second instance PEDžAF NAU NQI IS*, or since Jesus answers the disciples, even AFOUÓŠB[2] NQI IS* PEDžAF NAU DžE. It seems a cautious and perhaps unsure modern Coptologist was at work here.

To actually understand what's going on, first, PEDžE. Translated as "(pronoun) said", PEDžE belongs to a funny little class of words Layton (2000:297-314) refers to as verboids. Semantically, they are like verbs, and they can even take some of the verbal affixes, but they differ from actual verbs in a few important respects. In the case of PEDžE, these are as follows:

- 1. PEDžE cannot be negated or converted, i.e. it cannot take the relative, circumstantial, preterit or focalizing prefix (Layton 2000:321-322).
- 2. PEDžE only expresses the past tense.
- 3. PEDžE can only be conjugated suffixally.
- 4. PEDžE can appear in two forms:
  - independently (PEDžE), in which case it must be immediately followed by the subject noun or pronoun (Layton's 'prenominal state').
  - suffixed (conventionally written as PEDžA=), where the suffix marks the subject of the action of speaking (Layton's 'prepersonal state'). In this case, if the 3rd person subject is also expressed by a noun, the noun is preceded by the preposition NQI.

In GJW, PEDžE (i.e. the prenominal state) appears twice: first on line 2 (PEDžE MMAThÉTÉS NIS* DžE ... = "The apostles said to Jesus: ...") and then of course on line 4 (PEDžE IS* NAU TAHIME = "Jesus said to them 'My wife...'"). The objection Dr. Robinson raises is that this is unlikely: PEDžE in its prenominal state is rare, and seeing it twice in such a short text is rarer still, especially when there are other (presumably more frequent) constructions that could have been used. We have very little reason to doubt Dr. Robinson's intuition and experience. But what we also have is a way to actually check whether she's right. Distribution and probability, that's all we're dealing with here, and that is familiar NLP territory where a corpus and some math are all you need. The questions to be asked can be reformulated as follows:

1. Is the prenominal state of PEDžE indeed rare?

2. What is the probability of one prenominal PEDžE following another?

3. What about the frequency of other constructions?

First, the data: I decided to use the gospels for both theoretical (we are looking at Jesus' words after all) and practical reasons (a Coptic translation of the canonical gospels is readily available from a number of sources online). So I cobbled together a little Perl script to retrieve the text of the canonical ones from The Unbound Bible website. I only used the version they refer to as "Coptic: Sahidic NT" which, according to their information, ultimately comes from Sahidica. To the canonical gospels I added the Gospel of Thomas (GThom), which gets mentioned a lot in this context and which I retrieved from metalog. After some minor cleanup, I ended up with 4 UTF-8-encoded plain text files (plus one for GThom) which I then fed into AntConc. One of the cool features of AntConc is the ability to define custom fonts, which is particularly handy for Coptic. For best results, use New Athena Unicode and, for replicability, my original settings.

And now the procedure: for the first question, let's go with something simple. The null hypothesis is that the distribution of the prenominal state on one hand and all the forms of the prepersonal state on the other is roughly equal. In other words, there is no particular reason why an author or a translator should prefer one to the other. So when searching for all possible forms of PEDžE (PEDž.* in regex terms) - of which there are only a handful - we would expect to find PEDžE about 50% of the time. The table below sums up the actual findings for the corpus consisting of the four canonical gospels:

| | Count | % of PEDž.* | Observed relative frequency (per 1000) |
|---|---|---|---|
| Total wordcount | 59100 | | |
| Total PEDž.* | 776 | | |
| PEDžE | 140 | 18,04% | 2,37 |
| PEDžAF | 501 | 64,56% | 8,48 |
| PEDžAS | 20 | 2,58% | 0,34 |
| PEDžAU | 113 | 14,56% | 1,91 |
| PEDžÉTN | 2 | 0,26% | 0,03 |
| Total | 776 | 100,00% | 13,13 |
| Prenominal state | 140 | 18,04% | 2,37 |
| Prepersonal state | 636 | 81,96% | 10,76 |
| Total | 776 | 100,00% | 13,13 |
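The percentages and per-1000 figures follow directly from the raw counts. Here is a minimal Python sketch of that arithmetic, using the counts from the table above; the function names `share` and `per_thousand` are my own labels, not anything AntConc provides.

```python
# Counts of each form of PEDžE in the four canonical gospels (from the table).
counts = {"PEDžE": 140, "PEDžAF": 501, "PEDžAS": 20, "PEDžAU": 113, "PEDžÉTN": 2}
total_words = 59100  # total word count of the canonical-gospel corpus


def share(count, total):
    """Percentage of all PEDž.* tokens taken up by one form."""
    return 100 * count / total


def per_thousand(count, words):
    """Observed relative frequency per 1000 words of running text."""
    return 1000 * count / words


total_forms = sum(counts.values())
for form, n in counts.items():
    print(f"{form:8} {n:4} {share(n, total_forms):6.2f}% {per_thousand(n, total_words):6.2f}")

# Prenominal vs. prepersonal: PEDžE alone against all suffixed forms.
prenominal = counts["PEDžE"]
prepersonal = total_forms - prenominal
print(f"prenominal  {share(prenominal, total_forms):.2f}%")
print(f"prepersonal {share(prepersonal, total_forms):.2f}%")
```

Note that the output uses a decimal point where the tables use a decimal comma.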

Though the definition of 'rare' may vary, these figures clearly show that in the canonical gospels, the prepersonal state PEDžA= is preferred to the prenominal state. This becomes even clearer when one looks at the synoptic gospels only:

| Gospel | State | Count | % of PEDž.* | Observed relative frequency (per 1000) |
|---|---|---|---|---|
| Matthew | Prenominal state | 2 | 1,20% | 0,12 |
| Matthew | Prepersonal state | 165 | 98,80% | 9,72 |
| Mark | Prenominal state | 12 | 10,62% | 1,19 |
| Mark | Prepersonal state | 101 | 89,38% | 10,02 |
| Luke | Prenominal state | 55 | 20,37% | 3,19 |
| Luke | Prepersonal state | 215 | 79,63% | 12,47 |

With John, the prenominal form makes up almost a third of all forms of PEDžE. Moreover, more than half of all instances of the prenominal form in the four canonical gospels can be found in the Gospel of John:

| John | Count | % of PEDž.* | Observed relative frequency (per 1000) |
|---|---|---|---|
| Prenominal state | 72 | 31,72% | 4,86 |
| Prepersonal state | 155 | 68,28% | 10,47 |

Gospel of Thomas, however, is another story altogether:

| Gospel of Thomas | Count | % of PEDž.* | Observed relative frequency (per 1000) |
|---|---|---|---|
| Prenominal state | 101 | 71,13% | 25,63 |
| Prepersonal state | 41 | 28,87% | 10,41 |

Here Dr. Robinson's intuition is proven correct once again: GThom clearly prefers the prenominal state of PEDžE. Moreover, the relative frequency (per 1000 words) of this state is much higher than even the relative frequency of all forms of the verboid in any of the canonical gospels or all of them combined.
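I never actually put a number on how badly the 50/50 null hypothesis fares. One way to do that, which is my addition here and not part of the original procedure, is a one-sided exact binomial test: if both states were equally likely, how probable is a split at least as lopsided as the observed one? The sketch below uses only the counts reported above and the standard library; `binom_lower_tail` is a helper name I made up.

```python
from math import comb


def binom_lower_tail(k, n):
    """Exact P(X <= k) for X ~ Binomial(n, 0.5), summed with integer arithmetic."""
    favourable = sum(comb(n, i) for i in range(k + 1))
    return favourable / 2 ** n


# Canonical gospels: at most 140 prenominal forms out of 776.
p_canon = binom_lower_tail(140, 776)

# GThom: at least 101 prenominal forms out of 142. With p = 0.5 the
# distribution is symmetric, so this equals P(X <= 41), i.e. the tail
# for the 41 prepersonal forms.
p_gthom = binom_lower_tail(41, 142)

print(f"canonical gospels: {p_canon:.3g}")
print(f"Gospel of Thomas:  {p_gthom:.3g}")
```

Both tail probabilities come out vanishingly small, so neither corpus looks like a coin toss between the two states. They just lean in opposite directions.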

So much for the first question, now on to the second one. The problem is a trivial one: calculate the probability of one prenominal PEDžE following another prenominal PEDžE. In other words, given the probabilities of two complementary events (i.e. the probability of either state occurring), we need to calculate the probability of one of those events occurring twice in a row. Let P(N) be the probability of the prenominal state being selected as determined above - i.e. in a situation where the author has already decided to use a form of PEDžE, P(N) expresses the probability of this form being the prenominal state. Assuming the two choices are independent of each other, the probability P(M) of event M (the prenominal state occurring twice in a row) is calculated as follows:

Canonical gospels: P(M) = P(N) * P(N) = 0,18 * 0,18 = 0,032

Gospel of Thomas: P(M) = P(N) * P(N) = 0,71 * 0,71 = 0,504
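The arithmetic above can be reproduced in a few lines of Python. This is only a sketch and it treats the two choices as independent, which is a simplifying assumption. The function name `twice_in_a_row` is mine. It also shows what happens when the unrounded proportions (140/776 and 101/142) are used in place of 0,18 and 0,71.

```python
def twice_in_a_row(p):
    """Probability of the same outcome on two consecutive, independent choices."""
    return p * p


# Rounded proportions, as used in the calculation above.
p_canon = twice_in_a_row(0.18)
p_gthom = twice_in_a_row(0.71)

# Unrounded proportions taken straight from the counts.
exact_canon = twice_in_a_row(140 / 776)
exact_gthom = twice_in_a_row(101 / 142)

print(f"canonical gospels: {p_canon:.3f} (unrounded input: {exact_canon:.3f})")
print(f"Gospel of Thomas:  {p_gthom:.3f} (unrounded input: {exact_gthom:.3f})")
```

The unrounded inputs shift the third decimal place slightly (0.033 and 0.506), which changes nothing about the conclusion.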

The probability of PEDžE occurring twice in a row is therefore 3,2% for the canonical gospels and 50,4% for the Gospel of Thomas. If GJW is a narrative similar to the canonical gospels rather than a sayings gospel like GThom, then one would be fully justified in raising a brow over the two prenominal PEDžE in a row - doubly so when one takes into account the relative frequency of that state. For the canonical gospels, it's 2,37 (from 0,12 for Matthew to 4,86 for John); for GThom, the figure is 25,63. Compare that to the figures for GJW (calculated assuming a total word count of 31 words):

| GJW | Count | % of PEDž.* | Observed relative frequency (per 1000) |
|---|---|---|---|
| Total wordcount | 31 | | |
| Total PEDž.* | 2 | | |
| PEDžE | 2 | 100,00% | 64,52 |

And finally, question no. 3. Here the issue is a little more complicated (it involves variations in word order and information structure) and as such, the answer is hard to arrive at without a decently tagged corpus. What I can do, however, is throw a few regular expressions around to see which structures are used to refer to Jesus speaking (in absolute numbers):

| Jesus speaks | Canon | GThom |
|---|---|---|
| PEDžE IÉSOUS / IS* | 53 | 85 |
| PEDžAF * NQI IÉSOUS / IS* | 20 | 0 |
| AFOUÓŠB NQI IÉSOUS / IS* | 18 | 0 |

Interestingly enough, the second structure is used four times in GThom, but only for other people speaking (Peter, Matthew and twice Thomas). Based on this, it would not be that unreasonable to expect PEDžE with Jesus as the subject to occur in any text similar to the canonical gospels, let alone to GThom.
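For the curious, this is roughly what such regular expressions look like. The Python sketch below runs on the transliteration used in this post, not on the Coptic text I actually searched. The sample lines are made-up token strings for illustration, not real verses, and the three-word limit on the gap before NQI is an arbitrary assumption of mine.

```python
import re

# Jesus by name or by nomen sacrum (the asterisk is literal, hence the escape).
JESUS = r"(?:IÉSOUS|IS\*)(?!\S)"

PATTERNS = {
    "PEDžE + Jesus": re.compile(r"(?<!\S)PEDžE " + JESUS),
    # Allow up to three intervening words (e.g. NAU) between the verboid and NQI.
    "PEDžAF ... NQI + Jesus": re.compile(r"(?<!\S)PEDžAF(?: \S+){0,3} NQI " + JESUS),
    "AFOUÓŠB NQI + Jesus": re.compile(r"(?<!\S)AFOUÓŠB NQI " + JESUS),
}

# Invented sample, one clause per line; NOT taken from any gospel.
SAMPLE = "\n".join([
    "PEDžE IS* NAU DžE",
    "PEDžAF NAU NQI IS* DžE",
    "AFOUÓŠB NQI IÉSOUS PEDžAF NAU",
    "PEDžAF NQI PETROS",
    "PEDžE IÉSOUS NAF",
])


def count_structures(text):
    """Count the matches of each pattern in the text (absolute numbers)."""
    return {label: len(pattern.findall(text)) for label, pattern in PATTERNS.items()}


for label, n in count_structures(SAMPLE).items():
    print(f"{label}: {n}")
```

Since the gap is matched with literal spaces and non-whitespace runs, a match can never spill over a line break into the next clause.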

When it's the apostles' turn to speak to Jesus, the picture is even more complicated: the apostles can be referred to as MMATÉTÉS ("the apostles") or NEFMATÉTÉS ("his apostles"), Jesus can be referred to by his name, by the *nomen sacrum* IS* (I counted those together) or by NAF ("to him"). A few more quick regular expressions and voilà:

| Apostles speak | Canon | GThom |
|---|---|---|
| PEDžE MMATÉTÉS | 0 | 4 |
| PEDžE NEFMATÉTÉS | 0 | 0 |
| PEDžAU NIÉSOUS / NIS* NQI MMATÉTÉS | 0 | 0 |
| PEDžAU NAF NQI MMATÉTÉS | 1 | 0 |
| PEDžAU NAF NQI NEFMATÉTÉS | 2 | 5 |

These are by no means all possible constructions, just the ones that seem to be relevant for this discussion. So the first is the one that occurs in GJW and, no surprise there, it crops up in GThom as well, but not in the canonical gospels. The second one is just a check - it struck me that NEFMATÉTÉS crops up roughly twice as much as MMATÉTÉS in both the canonical gospels and GThom - but as you can see, it really doesn't matter since this structure cannot be found in either. The third construction is the one Dr. Robinson would expect instead of the first one. As it turns out, this would be an unreasonable expectation, as it doesn't appear in either the canonical gospels or GThom. Two of its variations do - in both cases the target of speaking (=Jesus) is expressed by means of NAF and in one of them, the subject is NEFMATÉTÉS which is not surprising considering the relative frequency of the two forms of this noun.

Of course, all these figures mean very little. GJW is a small fragment, the corpus of material I used is limited in both size and scope and chances are some of my math is wrong (check for yourself), just to give a few objections that might legitimately be raised. Nevertheless, with a decently sized and properly tagged corpus of Coptic, this is an example of what Coptologists could do to check whether their intuition regarding the distribution of certain morphological or syntactic forms is correct, not to mention all the other cool stuff.

[1] I borrowed the table from Wikipedia; the asterisk marks a *nomen sacrum*.

[2] At least I think that's what Dr. Robinson meant.

Those verboids... are they like Latin *inquit* and *ait*, older English *quoth*, and Old High German *quad*? Those are verbs that occur only in one or two tenses, only in the third person (and I think only singular), and only in a few (but common) contexts.

Honestly, I don't know enough of their etymology to comment. Plus there are a bunch of them and they don't all behave identically.

