New Corpora Release 4.2.0

Automatic linguistic analysis and Entity Linking from I Samuel 25

It is our pleasure to announce the latest data release from Coptic Scriptorium, version 4.2.0. This release contains both new Coptic material and additions to older datasets, and it expands our entity annotations and named-entity linking to all of our data, including the semi-automatically annotated Old Testament. This also means automatic updates to all of our interfaces, such as the recently added example usage functionality in the Coptic Dictionary Online, which is linked to the corpora.

The new material, including more digitized data courtesy of the Marcion project, as well as manually digitized and corrected OCR data from out-of-print editions, includes:

With this new release, the semi-automatically annotated data (excluding automatically processed Bible materials) in the project covers close to 300,000 words of Sahidic Coptic annotated for entities.

This release represents a tremendous amount of work over the past few months by the Coptic Scriptorium team. We would also like to thank individual contributors (whom you can always find in the ‘annotation’ metadata for each document), and specifically So Miyagawa for help with Coptic OCR models, as well as the Marcion and CoptOT projects for sharing their data with us, and the National Endowment for the Humanities for supporting us. We are continuing to work on more data, links to other resources, and new kinds of annotations and tools. Please let us know if you have any feedback!
