Month: September 2019

Fall 2019 Corpora Release 3.0.0

Coptic Scriptorium is happy to announce our latest data release, including a variety of new sources thanks to our collaborators (digitized data courtesy of the Marcion and PAThs projects!). New in this release are: Saints’ lives: Life of Cyrus; Life of Onnophrius; Lives of Longinus and Lucius; Martyrdom of Victor the General (part 2). Miscellaneous: Dormition of John Homilies […]

New release of Natural Language Processing Tools

Amir Zeldes and Luke Gessler have spent much of the past summer improving Coptic Scriptorium’s Natural Language Processing tools, and are now happy to announce the release of Coptic-NLP V3.0.0. You can read more about what we’ve been doing and the impact on performance in our three-part blog post (part 1, part 2, part […]

Dealing with Heterogeneous Low Resource Data – Part III

(This post is part of a series on our summer 2019 work improving processing for non-standardized Coptic resources.) In this post, we present some of our work on integrating more ambitious automatic normalization tools that allow us to deal with heterogeneous spelling in Coptic, and give some first numbers on the accuracy improvements from this summer’s work. […]

Dealing with Heterogeneous Low Resource Data – Part II

(This post is part of a series on our summer 2019 work improving processing for non-standardized Coptic resources.) The first step in processing heterogeneous data in Coptic is deciding what to spell together. As we described in part I, this is a problem because there are no spaces in original Coptic manuscripts, and editorial standards for how to […]