Coptic Treebank 2.2 – moving us to better parsing!
With the data release of Universal Dependencies 2.2, an update to the Coptic Treebank is now online! Thanks to work by Mitchell Abrams and Liz Davidson, we’ve been able to add the first three chapters of 1 Corinthians and make numerous corrections. Another three chapters of 1 Corinthians and a portion of the Martyrdom of Victor the General are coming soon. You can see how we’ve been annotating, along with the documentation of our guidelines, here:

Thanks to the new data, automatic parsing has become somewhat more reliable, allowing us to add automatic parses to the most recent release. The results are better than before, but note that we still expect only around 90% accuracy. To illustrate where the computer can’t do what humans can, here are two examples of a verb governing a subordinate verb in a clause marked by Ϫⲉ ‘that’. The subordinate verb usually has one of two labels:
- ccomp if it’s a complement clause (I said that…)
- advcl if it’s an adverbial clause, such as a causal clause (Ϫⲉ meaning ‘because’).
One of these examples was annotated by a human who got it right; the other contains a parser error. See if you can spot which is which!
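To make the ccomp/advcl distinction a little more concrete, here is a minimal Python sketch (not part of the Coptic Treebank’s own tooling) that reads CoNLL-U text, the format Universal Dependencies treebanks are distributed in, and tallies which relation each clause introduced by a given subordinator receives. The function names `to_conllu` and `label_counts` are hypothetical, the two toy sentences are English glosses with the subordinator’s lemma set to ϫⲉ purely for illustration, and the sketch assumes that ϫⲉ is attached as the `mark` dependent of the subordinate verb, as is usual for subordinators in UD.

```python
from collections import Counter

def to_conllu(sentences):
    """Toy helper (hypothetical): join 10-column token tuples into CoNLL-U text."""
    blocks = ["\n".join("\t".join(tok) for tok in toks) for toks in sentences]
    return "\n\n".join(blocks) + "\n"

def label_counts(conllu_text, marker):
    """Count the relations (e.g. ccomp, advcl) of clauses whose 'mark' dependent has lemma `marker`.

    Assumption: the subordinator attaches as 'mark' to the head of its clause.
    """
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep ordinary tokens only; skip multiword ranges ("1-2") and empty nodes ("1.1").
        rows = [r for r in rows if r[0].isdigit()]
        by_id = {r[0]: r for r in rows}
        for r in rows:
            lemma, head, deprel = r[2], r[6], r[7]
            if deprel == "mark" and lemma == marker and head in by_id:
                # Record the clause head's relation, dropping any subtype after ':'.
                counts[by_id[head][7].split(":")[0]] += 1
    return counts

# Toy data (illustrative only): the same subordinator heading a complement
# clause ("I said that he came") and a causal clause ("I wept because he came").
_ = "_"
SAMPLE = to_conllu([
    [("1", "I", "I", "PRON", _, _, "2", "nsubj", _, _),
     ("2", "said", "say", "VERB", _, _, "0", "root", _, _),
     ("3", "that", "ϫⲉ", "SCONJ", _, _, "5", "mark", _, _),
     ("4", "he", "he", "PRON", _, _, "5", "nsubj", _, _),
     ("5", "came", "come", "VERB", _, _, "2", "ccomp", _, _)],
    [("1", "I", "I", "PRON", _, _, "2", "nsubj", _, _),
     ("2", "wept", "weep", "VERB", _, _, "0", "root", _, _),
     ("3", "because", "ϫⲉ", "SCONJ", _, _, "5", "mark", _, _),
     ("4", "he", "he", "PRON", _, _, "5", "nsubj", _, _),
     ("5", "came", "come", "VERB", _, _, "2", "advcl", _, _)],
])

print(label_counts(SAMPLE, "ϫⲉ"))
```

Running the same tally over gold annotations and over parser output for the same text is one rough way to see how often the two labels get swapped, though it cannot tell you which label is right for any individual clause; that still takes a human reading the sentence.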