New Corpora Release 4.3.0

Image: The opening lines of Pistis Sophia

It is our pleasure to announce release 4.3.0 of Coptic Scriptorium corpora, which currently cover over 1,175,000 tokens of searchable, linguistically analyzed Coptic data from dozens of ancient Coptic works. New in this release:

Corrections and additional annotations:

  • Pilot work adding partial Arabic translations (work by Philippe Zaher)
  • Improvements and error corrections to a variety of works (including Because of You Too O Prince of Evil, Dormition of John, Book of Ruth and Homilies of Proclus)

The newly released material encompasses over 57,000 tokens of semi-automatically annotated data. We would like to give special thanks to the Marcion Project for making much of the underlying digitized text available, and to the annotators whose hard work has made this release possible. As with all releases, raw machine-readable data for all corpora, including morphological and syntactic analysis as well as named entity recognition and entity linking, is available in a variety of popular formats on our GitHub repository:

https://github.com/copticscriptorium/corpora

We hope this release will be useful and look forward to the next one!
