The Project
Khotanese Language and Texts in Context: A multimodal approach to the classification and periodisation of the Khotanese language and its texts within the cultural and historical framework of medieval Central Asia
KhoTxt – Khotanese Language and Texts in Context is a multidisciplinary research project dedicated to the study of the Khotanese language and its written tradition within the broader linguistic, historical, and cultural landscape of medieval Central Asia.
The project addresses a fundamental gap in scholarship. The Khotanese corpus is large and important, comprising religious, literary, medical, and administrative texts, yet the internal classification and periodisation of the language remain among the most debated and unresolved issues in Khotanese and Central Asian linguistic studies. The difficulty stems from the language's high degree of internal variation, which results from centuries of linguistic change, intense cultural contact, and diverse scribal practices.
KhoTxt aims to reconstruct the diasystemic nature of Khotanese by integrating linguistic, philological, material, and digital methodologies. Rather than viewing the language as a single, linear system, the project investigates how different varieties of Khotanese coexisted and interacted across time, space, social strata, communicative contexts, and writing media. In doing so, KhoTxt seeks to redefine the classification and periodisation of Khotanese and to reposition it as a central source for understanding the cultural history of the Silk Road.
Or. 9614/5 © British Library Board
Language variation and change
A central focus of the project is the systematic analysis of linguistic variation in Khotanese. KhoTxt applies historical sociolinguistic models to a historical language known exclusively through written sources, addressing the challenge of how to study linguistic variation without access to spoken data.
Rather than relying solely on traditional diachronic divisions, the project investigates variation across multiple dimensions:
- Diachronic variation, tracing change over the six centuries of attestation;
- Diatopic variation, identifying possible regional differences within and beyond Khotan;
- Diastratic variation, linked to social groups such as scribes, monks, officials, specialists, and non-specialists;
- Diaphasic variation, reflecting degrees of formality and communicative purpose;
- Diamesic variation, connected to writing supports, scripts, and document formats.
By correlating linguistic features with historical and cultural evidence, KhoTxt reassesses traditional labels such as Old and Late Khotanese. Particular attention is given to the relationship between written language and spoken varieties, including the possible existence of diglossic situations in which a prestigious literary language coexisted with evolving vernacular forms. This approach allows the project to move beyond purely chronological classifications and to reconstruct the complex sociolinguistic context in which Khotanese texts were produced and transmitted.
Pelliot chinois 2801 © Bibliothèque nationale de France
Texts and genres
A central objective of the project is to analyse how literary and non-literary genres shape linguistic variation. Previous research has focused predominantly on Buddhist doctrinal texts, which often reflect conservative, standardised forms of the language. KhoTxt expands the scope to include a wide range of genres, each associated with different communicative practices and audiences.
These include narrative literature, non-canonical poetry, verse letters, panegyrics, administrative and legal documents, medical treatises, and bilingual or practical texts used by diplomats and merchants. Each genre is examined as a linguistic environment with its own norms, conventions, and degrees of formality.
Through comparative genre analysis, the project seeks to identify linguistic features that may approximate colloquial or sectorial varieties of Khotanese. In particular, secular documents and medical texts often preserve specialised jargon and a language that is at times less formal, offering rare insights into everyday communication and professional discourse.
Ch. 00217 © British Library Board
Manuscripts, script, and dating
The study of Khotanese texts is inseparable from the study of the manuscripts that preserve them. KhoTxt adopts an integrated approach to the dating and localisation of manuscripts, combining linguistic evidence with palaeography, orthography, codicology, and archaeometric analysis. Manuscripts are written in various forms of a Central Asian descendant of the Indian Brahmi script, which continued to be adapted throughout the history of Khotanese. Furthermore, at least four distinct orthographic systems can be distinguished, and these are sometimes employed side by side even within a single manuscript.
The project examines how changes in script, spelling conventions, and manuscript formats correlate with linguistic variation and historical developments. Different writing surfaces (wooden tablets, paper pustaka books and scrolls, etc.) are analysed not only as material objects but also as indicators of scribal traditions, institutional practices, and regional habits.
In addition, the project plans to use non-destructive scientific techniques, such as near-infrared reflectography and X-ray fluorescence analysis, to study inks and paper. Such analyses can help identify shared material features that may point to specific production centres or networks of manuscript transmission. By triangulating material, linguistic, and historical data, KhoTxt aims to establish a more robust framework for the relative chronology and provenance of Khotanese manuscripts.
Pelliot chinois 5538, detail © Bibliothèque nationale de France
Digital archive
The digital humanities component of KhoTxt is embodied in the Electronic Khotanese Archive (EKhA), the first comprehensive, open-access digital repository of the entire Khotanese corpus. The archive integrates electronic text editions, linguistic annotation, manuscript metadata, and bibliographic resources within a unified digital environment.
Texts are encoded following the TEI guidelines and enriched with fine-grained linguistic and sociolinguistic markup. This enables advanced queries across different dimensions of variation, supporting research in historical linguistics, philology, literary studies, and Central Asian history.
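To give a rough idea of what a query across dimensions of variation could look like, the following minimal sketch parses a small TEI fragment and collects the word forms found in documents that carry a given classification tag. It is an illustration only and does not reflect the actual EKhA schema: the use of the TEI `ana` attribute for genre and writing-support tags, the tag values themselves, the placeholder lemmas and forms, and the function name `forms_by_tag` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: tag values, lemmas, and forms are placeholders,
# not real Khotanese data and not the EKhA encoding scheme.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="document" ana="#genre.administrative #support.wood">
        <w lemma="LEMMA_A">form1</w>
        <w lemma="LEMMA_B">form2</w>
      </div>
      <div type="document" ana="#genre.medical #support.paper">
        <w lemma="LEMMA_A">form3</w>
      </div>
    </body>
  </text>
</TEI>"""


def forms_by_tag(xml_text, tag):
    """Return (lemma, form) pairs from every <div> whose @ana lists `tag`."""
    root = ET.fromstring(xml_text)
    hits = []
    for div in root.iterfind(".//tei:div", TEI_NS):
        # @ana holds a whitespace-separated list of pointers.
        tags = div.get("ana", "").split()
        if tag in tags:
            for w in div.iterfind(".//tei:w", TEI_NS):
                hits.append((w.get("lemma"), w.text))
    return hits


if __name__ == "__main__":
    # Compare the attested forms of one lemma across two genres.
    for tag in ("#genre.administrative", "#genre.medical"):
        forms = [f for lemma, f in forms_by_tag(SAMPLE, tag) if lemma == "LEMMA_A"]
        print(tag, forms)
```

In a sketch of this kind, each dimension of variation (genre, writing support, and so on) is simply one more tag on the document, so the same function can be reused to compare forms along any of them.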
Beyond serving as a research tool, the EKhA is conceived as a long-term infrastructure that promotes openness, collaboration, and methodological innovation. By combining traditional philological scholarship with cutting-edge digital technologies, KhoTxt aims to transform the study of Khotanese.





