Winter 2021 Corpora Release 4.1.0

We are pleased to announce the latest release of data from Coptic Scriptorium, version 4.1.0. The release adds new Coptic texts and annotations, most notably named and non-named entity annotation for our New Testament corpus. In total, we released approximately 40,000 tokens of manually edited text in 17 documents from […]

Comprehensive Coptic Lexicon v1.2

The “Thesaurus Linguae Aegyptiae” project (“Strukturen und Transformationen des Wortschatzes der ägyptischen Sprache”, BBAW), the “Database and Dictionary of Greek Loanwords in Coptic” (DDGLC, Freie Universität Berlin), and “Coptic Scriptorium: Digital Research in Coptic Language and Literature” are pleased to announce the latest release of the “Comprehensive Coptic Lexicon”: Version 1.2. The raw data can […]

Summer 2020 Corpora Release 4.0.0

It is our great pleasure to announce the latest release of data from Coptic Scriptorium, version 4.0.0. This release contains both new Coptic material and extensive additions to our suite of tools and annotations, focusing on the addition of support for entity annotation and named-entity linking across our new and […]

Digital Coptic 3 – program online!

The program for the third edition of Digital Coptic is now online. Check out the workshop website for the list of projects, talks and presenters. Please join us for the workshop on July 12 and 13 – participants will receive a Zoom link and password for interactive presentations and discussion, and the workshop will also […]

A bird’s eye view of Coptic entities

Coptic Scriptorium recently annotated its Treebank for entities and will soon use automated tools to annotate all corpora. Entity recognition provides a window into what a text discusses, allowing readers to discover information about people and places of interest found throughout a large number of texts that they could not possibly read exhaustively. The Coptic […]

Entities in the Coptic Treebank

With the release of Version 2.6 of Universal Dependencies, our focus has shifted to handling Named and Non-Named Entity Recognition (NER/NNER) in Coptic data. As a result of intensive work by the Coptic Scriptorium team in the past few months, the development branch of the Treebank now contains complete entity spans and types for all of the data in […]

Universal Dependencies 2.6 released!

Check out the new Universal Dependencies (UD) release V2.6! This is the twelfth release of the annotated treebanks. The project now covers syntactically annotated corpora in 92 languages, including Coptic. The size of the Coptic Treebank is now around 43,000 words, and growing. For the latest version of the Coptic data, see our development branch here: […]

Winter 2020 Corpora Release 3.1.0

It is our pleasure to announce a new data release, with a variety of new sources from our collaborators (including more digitized data courtesy of the Marcion and PAThs projects and other scholars). New in this release are: Saints’ lives and martyrologies: Martyrdom of Victor the General (parts 3-8; this work is now complete), Life of Aphou, Life of Paul of Tamma […]

The Coptic Dictionary Online wins the 2019 DH Award for Best Tool

We are very happy to announce that the Coptic Dictionary Online (CDO) has won the 2019 Digital Humanities Award in the category Best Tool or Suite of Tools! The dictionary interface, shown below, gives users access to searches by Coptic word forms, definitions in three languages (English, French and German), pattern and part of speech searches, and more. We have also […]

Fall 2019 Corpora Release 3.0.0

Coptic Scriptorium is happy to announce our latest data release, including a variety of new sources thanks to our collaborators (digitized data courtesy of the Marcion and PAThs projects!). New in this release are: Saints’ lives: Life of Cyrus, Life of Onnophrius, Lives of Longinus and Lucius, Martyrdom of Victor the General (part 2); Miscellaneous: Dormition of John, Homilies […]

New release of Natural Language Processing Tools

Amir Zeldes and Luke Gessler have spent much of the past summer improving Coptic Scriptorium’s Natural Language Processing tools, and are now happy to announce the release of Coptic-NLP V3.0.0. You can read more about what we’ve been doing and the impact on performance in our three-part blog post (part 1, part 2, part […]

Dealing with Heterogeneous Low Resource Data – Part III

(This post is part of a series on our work in summer 2019 improving processing for non-standardized Coptic resources) In this post, we present some of our work on integrating more ambitious automatic normalization tools that allow us to deal with heterogeneous spelling in Coptic, and give some first numbers on improvements in accuracy through this summer’s work. […]