Coptic Scriptorium
Coptic SCRIPTORIUM (Sahidic Corpus Research: Internet Platform for Interdisciplinary multilayer Methods) is a collaborative, digital project created by Caroline T. Schroeder (University of the Pacific) and Amir Zeldes (Georgetown University). The team is constantly growing.
Coptic SCRIPTORIUM provides a platform for interdisciplinary and computational research on texts in the Coptic language, particularly the Sahidic dialect. As an open-source, open-access initiative, the SCRIPTORIUM technologies and corpus function as a collaborative environment for digital research by any scholars working in Coptic. We hope SCRIPTORIUM will serve as a model for future digital humanities projects utilizing historical corpora or corpora in languages outside of the Indo-European and Semitic language families. It provides:
- tools to process Coptic texts
- a searchable, richly-annotated corpus of texts using the ANNIS search and visualization architecture
- visualizations of Coptic texts
- a collaborative platform for scholars to use and contribute to the project
- research results generated from the tools and corpus
Acephalous Work 22 by Shenoute
Abraham Our Father by Shenoute
Letters of Besa
Apophthegmata Patrum
Bible
Note: This corpus is derived from the Sahidica New Testament, which was released by Warren Wells and made available for free electronic distribution for academic use only. It is not licensed CC-BY; see the Sahidica licensing information.
Tools
Some of the tools below use a Sahidic Coptic lexicon based on data kindly provided by Prof. Tito Orlandi and the CMCL project. When using the part-of-speech tagging models or the tokenization script and its lexicon, please make sure to refer back to the CMCL project.
Part-of-Speech Tagging
- Scripts and models
  - Tokenization script and lexicon (assumes normalized Coptic, see tokenization guidelines)
  - TreeTagger - an open source part-of-speech tagger (additional Windows interface WinTreeTagger)
  - Coptic TreeTagger training models - for the fine- and coarse-grained tagsets (see tagging guidelines below)
- Documentation
  - Diplomatic Transcription Guidelines (version 1.1.0)
  - Tokenization Guidelines (see sections 3 & 4 of the Transcription Guidelines)
  - Part-of-Speech Tagging Guidelines (version 1.1.0)
Additional Annotation Tools
- Normalizer (normalizes orthography, removes diacritics)
- Language of origin tagger (to annotate loan words from Greek, Latin, Hebrew/Greco-Hebrew, Aramaic)
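As a rough illustration of the diacritic-removal step mentioned for the Normalizer, the sketch below strips Unicode combining marks (such as the supralinear stroke) from a string. It is not the project's Normalizer, which also normalizes orthography; the function name `strip_diacritics` is hypothetical, and the approach assumes the input is already Coptic Unicode.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (Unicode category Mn) from text.

    Hypothetical helper for illustration only; it does not perform
    the orthographic normalization that the Normalizer tool does.
    """
    # Decompose so that any precomposed characters expose their marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every nonspacing mark, e.g. U+0304 COMBINING MACRON.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever remains into canonical composed form.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    # U+2C99 (mi) + U+0304 (macron) + U+2C9B (ni): the macron is removed.
    print(strip_diacritics("\u2c99\u0304\u2c9b"))
```

Filtering on the `Mn` category removes all nonspacing marks at once, so it may be too aggressive for texts where some marks should be preserved.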
Converters
- Coptic encoding converter (converts legacy font-based encodings, such as those used by the Coptic and Laser Coptic fonts, into standards-compliant Coptic Unicode characters)
- Simple recoding script in Perl (supports CMCL, Laser Coptic and UTF-8 encoding conversion)
- Converter for ASCII encoding / UTF-8 of Dirk Van Damme and Gregor Wurst
- Download both converters
- SaltNPepper - a metamodel-based Java framework for multi-format conversion
- Excel-Plugin for importing and exporting EXMARaLDA XML, SGML, PAULA XML and subsets of TEI XML
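The encoding converters listed above map characters from legacy font encodings to Coptic Unicode. The general idea can be sketched as a character-by-character table lookup. The three-entry table below is purely hypothetical and does not reproduce the actual Laser Coptic, CMCL, or any other real mapping; `LEGACY_TO_UNICODE` and `convert_legacy` are illustrative names, not part of the project's scripts.

```python
# HYPOTHETICAL mapping for illustration only: a real converter needs the
# complete, verified table for the specific legacy font encoding.
LEGACY_TO_UNICODE = {
    "a": "\u2c81",  # COPTIC SMALL LETTER ALFA
    "b": "\u2c83",  # COPTIC SMALL LETTER VIDA
    "g": "\u2c85",  # COPTIC SMALL LETTER GAMMA
}


def convert_legacy(text: str, table: dict = LEGACY_TO_UNICODE) -> str:
    """Replace each legacy character with its Unicode counterpart.

    Characters absent from the table (spaces, punctuation, unmapped
    letters) are passed through unchanged.
    """
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    print(convert_legacy("ab g"))
```

A one-to-one table like this is only a starting point: legacy fonts often encode diacritics or ligatures as separate keystrokes, which would need multi-character rules.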