On May 14, 2013, a number of scholars working on digital humanities projects in Coptic gathered to discuss our work, standards across the field, and possible collaborations. The University of the Pacific and the Institute for German Language and Linguistics jointly hosted the event.
Overall the meeting was very fruitful, with many attendees expressing a desire to meet regularly.
Anyone interested in being involved in future discussions should contact Caroline T. Schroeder at carrie [at] carrieschroeder [dot] com.
Attendees gave presentations about their own projects. A program, with slides and URLs for project websites, is now online.
In addition to presentations, we had ample time for discussing collaborations, standards, and other issues of relevance for us. I’ve excerpted some of the key discussion points here:
Next steps for the group
Colleagues attending SBL in Baltimore in November will try to meet
We discussed the possibility of a formal event once a year. Perhaps alternating North America & Europe?
A virtual conference option for people unable to attend physically should be explored.
Additional outreach to other scholars working in Coptic (and Egyptian generally)
Copyrights, publications and intellectual property
We had an extensive discussion of using texts in digital projects when published editions and translations are under copyright. Copyright law varies from country to country, and scholars were encouraged to consult with counsel and experts at their institutions to determine the proper approach for individual projects.
We also discussed ways the scholarly community can ensure the openness of the data and research we produce:
- sign non-exclusive use contracts with publishers so that our work can be published digitally
- use open-source licenses that allow sharing and modifications with attribution.
Some relevant papyrological symbols do not exist in the current Coptic Unicode character set, especially an oblique abbreviation stroke. Some projects (including the IFAO?) use Unicode private use area encodings for characters that are not in the official Unicode set. Other scholars sometimes substitute existing official Unicode characters that resemble the character they need, but this could cause confusion down the line when the digital texts are searched or processed in other ways.
It is desirable to stick to official Unicode to ensure compatibility and interoperability
Agreement on a shared set of private use encodings is better than disagreement (with every project following a different path), and the codes defined by IFAO might be a start
It is possible to annotate a character in TEI to mark it (see the TEI Guidelines, chapter 5, especially 5.3, at http://www.tei-c.org/Vault/P5/2.1.0/doc/tei-p5-doc/zh-TW/html/WD.html)
Additionally, establishing documentation and guidelines about Coptic Unicode was generally regarded as essential. C. Schroeder will look into establishing a wiki about Coptic Unicode, possibly on the Digital Classics Wiki.
Anyone interested in pursuing an application for additional characters should contact Stephen Emmel to coordinate.
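To illustrate the interoperability concern, here is a minimal Python sketch that flags private use area characters in a transcription so that they can be documented or annotated. It relies only on the standard `unicodedata` module, where private use code points have the general category `Co`. The function name `find_private_use` and the sample string are assumptions made for illustration, and U+E000 is only a stand-in: it is not an encoding defined by IFAO or any other project.

```python
import unicodedata


def find_private_use(text):
    """Return (index, code point label) for each private use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = private use
    ]


# Two letters from the official Coptic block (U+2C80-U+2CFF), followed by a
# hypothetical private use code point standing in for an abbreviation stroke.
sample = "\u2C80\u2C82\uE000"
print(find_private_use(sample))  # [(2, 'U+E000')]
```

A check like this cannot catch the other practice mentioned above, substituting a look-alike official character, since those characters are valid Unicode; that case can only be handled through shared documentation.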
Several scholars are working on training Tesseract for Coptic optical character recognition. Layouts and fonts in the printed materials are challenges. Another option for an OCR tool might be OCRopus. A section of the wiki page should address OCR.
Standard URNs and URIs
The group discussed establishing standards for URNs (uniform resource names). URNs are distinguished from URLs (uniform resource locators, which identify the online location of the thing).
There was general agreement on the desirability of standard URNs for Coptic texts. We need to distinguish the abstract text from the versioned digital object.
CMCL has a large database of abstract texts
The CMCL codex sigla for original codex designations are the current scholarly standard. There was general agreement that they should be adapted for digital references as well.
Trismegistos numbers for the current library/repository cataloguing seem desirable, although Trismegistos sometimes seems to issue separate identifiers to multiple records per object; J. Garcès will investigate
Agreement on three basic levels of identification:
1. abstract text: using the CMCL Clavis Coptica (which T. Orlandi & the CMCL will make freely available), plus author information
2. physical object: Trismegistos desirable for current library/repository identification (depending on results of our inquiry); codex sigla from CMCL also should be used as standard identification for the original repository/object’s origins (CMCL will make sigla list available)
3. digital object: identifiers need to be established by each project, either through its institution or its funding source; regulatory agencies monitor digital URNs, and projects should consider requesting them when applying for funding.
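The three levels above could be modeled roughly as follows. This is only a sketch under stated assumptions: the class name `TextReference`, its field names, the `x-coptic-example` namespace, and every identifier value are hypothetical placeholders, not a scheme the group agreed on and not real Clavis Coptica, CMCL, or Trismegistos identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextReference:
    """One reference spanning the three levels of identification (hypothetical)."""

    clavis_id: str                         # level 1: abstract text
    codex_siglum: Optional[str] = None     # level 2: original codex
    repository_id: Optional[str] = None    # level 2: current library/repository
    digital_version: Optional[str] = None  # level 3: versioned digital object

    def to_urn(self, namespace: str = "x-coptic-example") -> str:
        """Join the levels into a URN-like string; '-' marks a missing level."""
        levels = [
            self.clavis_id,
            self.codex_siglum,
            self.repository_id,
            self.digital_version,
        ]
        parts = [level if level is not None else "-" for level in levels]
        return ":".join(["urn", namespace, *parts])


# Placeholder values only, chosen to show the shape of the result.
ref = TextReference("cc0000", codex_siglum="XX.AA",
                    repository_id="TM0000", digital_version="v1")
print(ref.to_urn())  # urn:x-coptic-example:cc0000:XX.AA:TM0000:v1
```

Keeping the abstract text as the only required field reflects the distinction drawn above between the abstract text and any particular physical or digital object.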
A few issues discussed but not resolved included:
- author codes
- distinguishing original codex designations vs. modern library cataloguing of split-up physical objects (an example would be dismembered White Monastery manuscripts in which folios from one codex are now housed across various libraries);
- objects with more than one text (including palimpsests)