Recent Posts

Digital Coptic 3 – program online!

The program for the third edition of Digital Coptic is now online. Check out the workshop website for the list of projects, talks and presenters. Please join us for the workshop on July 12 and 13 – participants will receive a Zoom link and password for interactive presentations and discussion, and the workshop will also […]

A bird’s eye view of Coptic entities

Coptic Scriptorium recently annotated its Treebank for entities and will soon use automated tools to annotate all corpora. Entity recognition provides a window into what a text discusses, allowing readers to discover information about people and places of interest found throughout a large number of texts that they could not possibly read exhaustively. The Coptic […]

Entities in the Coptic Treebank

With the release of Version 2.6 of Universal Dependencies, our focus has shifted to handling Named and Non-Named Entity Recognition (NER/NNER) in Coptic data. As a result of intensive work by the Coptic Scriptorium team in the past few months, the development branch of the Treebank now contains complete entity spans and types for the entire data in […]

Universal Dependencies 2.6 released!

Check out the new Universal Dependencies (UD) release V2.6! This is the twelfth release of the annotated treebanks at http://universaldependencies.org/. The project now covers syntactically annotated corpora in 92 languages, including Coptic. The size of the Coptic Treebank is now around 43,000 words, and growing. For the latest version of the Coptic data, see our development branch here: https://github.com/UniversalDependencies/UD_Coptic-Scriptorium/tree/dev. […]

Winter 2020 Corpora Release 3.1.0

It is our pleasure to announce a new data release, with a variety of new sources from our collaborators (including more digitized data courtesy of the Marcion and PAThs projects and other scholars). New in this release are Saints’ lives and martyrologies: Martyrdom of Victor the General (parts 3-8; this work is now complete), Life of Aphou, Life of Paul of Tamma […]

The Coptic Dictionary Online wins the 2019 DH Award for Best Tool

We are very happy to announce that the Coptic Dictionary Online (CDO) has won the 2019 Digital Humanities Award in the category Best Tool or Suite of Tools! The dictionary interface, shown below, gives users access to searches by Coptic word forms, definitions in three languages (English, French and German), pattern and part of speech searches, and more. We have also […]

Fall 2019 Corpora Release 3.0.0

Coptic Scriptorium is happy to announce our latest data release, including a variety of new sources thanks to our collaborators (digitized data courtesy of the Marcion and PAThs projects!). New in this release are Saints’ lives: Life of Cyrus, Life of Onnophrius, Lives of Longinus and Lucius, Martyrdom of Victor the General (part 2). Miscellaneous: Dormition of John Homilies […]

New release of Natural Language Processing Tools

Amir Zeldes and Luke Gessler have spent much of the past summer improving Coptic Scriptorium’s Natural Language Processing tools, and are now happy to announce the release of Coptic-NLP V3.0.0. You can read more about what we’ve been doing and the impact on performance in our three-part blog post (part 1, part 2, part […]

Dealing with Heterogeneous Low Resource Data – Part III

(This post is part of a series on our 2019 summer’s work improving processing for non-standardized Coptic resources) In this post, we present some of our work on integrating more ambitious automatic normalization tools that allow us to deal with heterogeneous spelling in Coptic, and give some first numbers on improvements in accuracy through this summer’s work. […]

Dealing with Heterogeneous Low Resource Data – Part II

(This post is part of a series on our 2019 summer’s work improving processing for non-standardized Coptic resources) The first step in processing heterogeneous data in Coptic is deciding what to spell together. As we described in part I, this is a problem because there are no spaces in original Coptic manuscripts, and editorial standards for how to […]

Dealing with Heterogeneous Low Resource Data – Part I

Image from Budge (1914), Coptic Martyrdoms in the Dialect of Upper Egypt (scan made available by archive.org). (This post is part of a series on our 2019 summer’s work improving processing for non-standardized Coptic resources) A major challenge for Coptic Scriptorium as we expand to cover texts from other genres, with different authors, styles and […]

On the Road Summer 2019

Coptic Scriptorium is busy this summer conference season. I had the privilege of teaching one of the Sunoikisis Digital Classicist summer sessions earlier in July. I also presented some research on girls and girlhood using the Coptic Scriptorium Corpora and the Online Coptic Dictionary at the annual UCLA-St. Shenouda Society Coptic Studies Conference. This year was […]