Electronic Khotanese Archive
The integrated digital resource
The Electronic Khotanese Archive (EKhA) is the digital core of the KhoTxt project and represents the first comprehensive, open-access research infrastructure dedicated to the entire Khotanese textual corpus. Conceived as both a scholarly tool and a long-term digital resource, the archive brings together texts, manuscripts, linguistic data, and contextual information within a single, integrated environment.
EKhA is designed to overcome the traditional fragmentation of Khotanese studies, where texts, editions, and manuscript data are scattered across publications, collections, and institutions. By providing unified access to the corpus, the archive enables new forms of research into linguistic variation, textual transmission, and the cultural history of medieval Central Asia.
KWIC Index produced by Ronald E. Emmerick in Cambridge
Scope and content
The archive aims to include every extant Khotanese manuscript unit, regardless of genre, material, or state of preservation. This includes Buddhist doctrinal texts, narrative and non-canonical literature, secular and administrative documents, medical treatises, bilingual and practical texts, as well as inscriptions and documents written on paper, wood, and other materials.
Each manuscript unit is treated as a distinct object of study and is accompanied by detailed metadata concerning findspot, script type, orthography, writing material, format, and, where possible, dating. This granular structure allows the archive to reflect the diversity of the corpus and to support fine-grained comparative analysis across texts, genres, and periods.
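As an illustration of how such a per-unit metadata record might be modelled, the following minimal Python sketch defines one record per manuscript unit. The class name, field names, and sample values are hypothetical placeholders chosen for this example; they do not reproduce the archive's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManuscriptUnit:
    """Hypothetical metadata record for one manuscript unit."""
    unit_id: str
    findspot: str
    script_type: str
    orthography: str
    material: str
    format: str
    dating: Optional[str] = None  # stays None when no date can be proposed

    def is_dated(self) -> bool:
        """Return True when a dating has been recorded for this unit."""
        return self.dating is not None


# Placeholder values only; these are not real catalogue entries.
dated = ManuscriptUnit("unit-001", "findspot-A", "script-X", "orthography-1",
                       "paper", "format-1", dating="period-1")
undated = ManuscriptUnit("unit-002", "findspot-B", "script-Y", "orthography-2",
                         "wood", "format-2")
```

Making the dating optional mirrors the "where possible" qualification above: an undated unit remains a complete record rather than an incomplete one.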
Digital editions and text encoding
At the heart of EKhA are digital scholarly editions of Khotanese texts. All texts are encoded in a machine-readable format following the guidelines of the Text Encoding Initiative (TEI), ensuring long-term interoperability and methodological transparency. The editions distinguish between secure and uncertain readings, and record restorations and emendations.
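A rough sketch of how this distinction can be read back out of TEI markup is given below. It assumes a word-level encoding that uses the standard TEI elements unclear, supplied, and choice with sic and corr; whether EKhA uses exactly these elements is an assumption, and the tokens (aaa, bbb, and so on) are placeholders, not Khotanese words.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Placeholder fragment: four word tokens, one per editorial status.
SAMPLE = f"""<ab xmlns="{TEI_NS}">
  <w>aaa</w>
  <w><unclear>bbb</unclear></w>
  <w>cc<supplied reason="lost">c</supplied></w>
  <w><choice><sic>ddx</sic><corr>ddd</corr></choice></w>
</ab>"""


def q(name):
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{name}"


def classify(word):
    """Return (text, status) for a single <w> element."""
    corr = word.find(".//" + q("corr"))
    if corr is not None:
        # For emendations, report the editor's corrected form.
        return corr.text, "emended"
    text = "".join(word.itertext())
    if word.find(".//" + q("unclear")) is not None:
        return text, "uncertain"
    if word.find(".//" + q("supplied")) is not None:
        return text, "restored"
    return text, "secure"


def readings(xml_text):
    """List every word in the fragment with its editorial status."""
    root = ET.fromstring(xml_text)
    return [classify(w) for w in root.iter(q("w"))]
```

Calling readings(SAMPLE) yields one (text, status) pair per word, so that uncertain, restored, and emended forms can be filtered out or weighted differently in later analysis.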
Linguistic and sociolinguistic annotation
A defining feature of EKhA is its rich linguistic and extra-linguistic annotation. Texts are meant to be systematically tagged for features relevant to historical and sociolinguistic analysis, including spelling variants, phonological developments, morphological forms, syntactic patterns, and lexical choices.
Crucially, the archive is designed to capture variation across multiple dimensions: diachronic, diatopic, diastratic, diaphasic, and diamesic. This makes it possible to investigate how linguistic features cluster according to time, region, genre, social context, or writing practice, and to reassess traditional classifications of the Khotanese language on an empirical basis.
Manuscript catalogue and advanced queries
The Electronic Khotanese Archive is closely integrated with a comprehensive manuscript catalogue, implemented as a structured database. This catalogue allows users to search and filter manuscripts by a wide range of criteria, including location, collection, script type, material, format, genre, and linguistic features.
Advanced query functions enable researchers to extract and compare datasets across the corpus, for example to track the distribution of specific forms, orthographic conventions, or genre-specific patterns. These tools support both qualitative philological analysis and quantitative, corpus-based approaches.
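The kind of query described here can be sketched in a few lines of Python. The record layout, the function names filter_units and form_distribution, and all values are hypothetical and serve only to illustrate filtering by catalogue criteria and counting a form per genre; the archive's actual query interface may look quite different.

```python
from collections import Counter

# Placeholder catalogue records; not real catalogue data.
RECORDS = [
    {"unit": "unit-001", "genre": "doctrinal", "material": "paper",
     "forms": ["form-a", "form-a", "form-b"]},
    {"unit": "unit-002", "genre": "administrative", "material": "wood",
     "forms": ["form-b"]},
    {"unit": "unit-003", "genre": "administrative", "material": "paper",
     "forms": ["form-a", "form-b"]},
]


def filter_units(records, **criteria):
    """Keep the records whose fields match every given criterion."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


def form_distribution(records, form, by="genre"):
    """Count occurrences of one form, grouped by a metadata field."""
    counts = Counter()
    for record in records:
        counts[record[by]] += record["forms"].count(form)
    return dict(counts)


paper_units = [r["unit"] for r in filter_units(RECORDS, material="paper")]
form_a_by_genre = form_distribution(RECORDS, "form-a")
```

The same grouping function serves other dimensions by changing the by argument, for instance by="material", which is one simple way to move between qualitative inspection of individual units and quantitative comparison across the corpus.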
Interoperability and digital ecosystem
EKhA is conceived as part of a broader digital ecosystem of resources dedicated to the languages and cultures of the Silk Road. It is designed to interoperate with other major digital projects and databases in related fields, facilitating comparative research and cross-linguistic analysis.
The archive follows Open Access and Open Data principles. All materials produced within the project are made freely available, with clear documentation and licensing, ensuring transparency, reusability, and long-term sustainability.





