Coptic Scriptorium

Coptic SCRIPTORIUM (Sahidic Corpus Research: Internet Platform for Interdisciplinary multilayer Methods) is a collaborative, digital project created by Caroline T. Schroeder (University of the Pacific) and Amir Zeldes (Georgetown University). The team is constantly growing.
Coptic SCRIPTORIUM provides a platform for interdisciplinary and computational research on texts in the Coptic language, particularly the Sahidic dialect. As an open-source, open-access initiative, the SCRIPTORIUM technologies and corpus function as a collaborative environment for digital research by any scholar working on Coptic. It provides:
tools to process Coptic texts
a searchable, richly-annotated corpus of texts using the ANNIS search and visualization architecture
visualizations of Coptic texts
a collaborative platform for scholars to use and contribute to the project
research results generated from the tools and corpus
We hope SCRIPTORIUM will serve as a model for future digital humanities projects utilizing historical corpora or corpora in languages outside of the Indo-European and Semitic language families.

Acephalous Work 22 by Shenoute
Abraham Our Father by Shenoute
Letters of Besa
Apophthegmata Patrum
Bible
Note: This corpus is derived from the Sahidica New Testament, which was released by Warren Wells and made available for free electronic distribution for academic use only. It is not licensed CC-BY; see the Sahidica licensing information.
Tools
Some of the tools below use a Sahidic Coptic lexicon based on data kindly provided by Prof. Tito Orlandi and the CMCL project. When using the part-of-speech tagging models or the tokenization script and its lexicon, please make sure to refer back to the CMCL project.
Part-of-Speech Tagging
Scripts and models
Tokenization script and lexicon (assumes normalized Coptic, see tokenization guidelines)
TreeTagger - an open source part-of-speech tagger (additional Windows interface WinTreeTagger)
Coptic TreeTagger training models - for the fine-grained and coarse-grained tagsets (see tagging guidelines below)
Documentation
Diplomatic Transcription Guidelines (version 1.1.0)
Tokenization Guidelines (see sections 3 & 4 of the Transcription Guidelines)
Part-of-Speech Tagging Guidelines (version 1.1.0)
Additional Annotation Tools
Normalizer (normalizes orthography, removes diacritics)
Language of origin tagger (to annotate loan words from Greek, Latin, Hebrew/Greco-Hebrew, Aramaic)
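The diacritic-removal step mentioned for the Normalizer can be illustrated with a minimal Python sketch. This is not the project's Normalizer and does not reproduce its orthographic rules; it only assumes that the diacritics in question are encoded as Unicode combining marks (for example the supralinear stroke, U+0304). The function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks from text (illustrative sketch only).

    Assumption: diacritics are encoded as Unicode nonspacing marks
    (general category "Mn"), e.g. the supralinear stroke U+0304.
    """
    # Decompose so that any precomposed characters expose their marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every nonspacing combining mark, keep everything else.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever remains into canonical composed form.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    # Small letters ni + combining macron, tau, eie (written with escapes
    # so the combining mark is visible in the source).
    sample = "\u2c9b\u0304\u2ca7\u2c89"
    print(strip_diacritics(sample))
```

Note that this approach strips all nonspacing marks, so it would also remove accents from any Greek or Latin text mixed into the input; a real normalizer would likely be more selective.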
Converters
Coptic encoding converter (converts legacy font-based character encodings, such as those used by the Coptic and Laser Coptic fonts, into standards-compliant Coptic Unicode characters)
Simple recoding script in Perl (supports CMCL, Laser Coptic and UTF-8 encoding conversion)
Converter for ASCII encoding / UTF-8 of Dirk Van Damme and Gregor Wurst
Download both converters
SaltNPepper - a metamodel-based Java framework for multi-format conversion
Excel-Plugin for importing and exporting EXMARaLDA XML, SGML, PAULA XML and subsets of TEI XML
