We’ve updated our tokenizer, which breaks Coptic bound groups into their constituent morphemes, and our normalizer, which standardizes spelling and orthography to facilitate further automatic annotation.
Version 2.0.1 of the tokenizer includes more patterns to handle a broader variety of bound groups. It also adds a parameter (-l) for bound groups that are split by line breaks, as you might find in a transcription of a manuscript. With this option, a bound group that runs across two lines is still tagged as a single bound group, and the line breaks inside it are tagged as well.
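To make the line-break behavior more concrete, here is a minimal sketch of the general idea: rejoin a bound group that was split across two lines, remember where the break fell, and emit the break as a tag inside the group. This is not the tokenizer's actual code. The input convention (a line-final "-" marking a group that continues on the next line), the function names `join_bound_groups` and `to_tagged`, and the `<bound_group>`/`<lb/>` markup are all hypothetical, and the placeholder Latin strings stand in for Coptic text.

```python
from typing import List, Tuple

# Hypothetical convention (assumption): a word that ends its line with "-"
# is a bound group continuing on the next line.

def join_bound_groups(lines: List[str]) -> List[Tuple[str, List[int]]]:
    """Return (bound_group, break_offsets) pairs, where break_offsets are the
    character positions inside the group at which a line break occurred."""
    groups: List[Tuple[str, List[int]]] = []
    pending = ""              # partial group carried over from the previous line
    breaks: List[int] = []    # line-break offsets within the pending group
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            if i == len(words) - 1 and word.endswith("-"):
                # Group continues on the next line: drop the marker, record the break.
                pending += word[:-1]
                breaks.append(len(pending))
            else:
                groups.append((pending + word, breaks))
                pending, breaks = "", []
    if pending:  # input ended in the middle of a group
        groups.append((pending, breaks))
    return groups


def to_tagged(groups: List[Tuple[str, List[int]]]) -> List[str]:
    """Wrap each group in a hypothetical <bound_group> tag, inserting <lb/>
    at every recorded line-break offset."""
    out = []
    for group, breaks in groups:
        pieces, prev = [], 0
        for b in breaks:
            pieces.extend([group[prev:b], "<lb/>"])
            prev = b
        pieces.append(group[prev:])
        out.append("<bound_group>" + "".join(pieces) + "</bound_group>")
    return out


if __name__ == "__main__":
    for tagged in to_tagged(join_bound_groups(["abc de-", "fgh ij"])):
        print(tagged)
    # <bound_group>abc</bound_group>
    # <bound_group>de<lb/>fgh</bound_group>
    # <bound_group>ij</bound_group>
```

Keeping the break offsets separate from the joined string means the group can be segmented into morphemes as a whole, while the original line layout can still be restored afterwards.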
Version 2.0 of the normalizer adds vocabulary and provides a parameter (-s) for normalizing the orthography specific to the Sahidica New Testament texts.