Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan
Abstract
A deep learning framework is developed to analyze the evolution of the grammatical gender system from Latin to the Romance languages, examining both lexical and contextual factors in a low-resource historical setting.
The diachronic evolution from Latin to the Romance languages involved a restructuring of the grammatical gender system from a tripartite configuration (masculine, feminine, neuter) to a bipartite one (masculine, feminine) in most Romance languages. In this work, we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels. First, we show that conventional tokenization strategies are insufficiently robust for this low-resource historical setting, and that our proposed tokenizer improves performance over these baselines. At the lexical level, we evaluate the contribution of morphological features to gender prediction. At the contextual level, we quantify the contributions of different part-of-speech categories to grammatical gender prediction. Together, these analyses characterize the distribution of gender information between the lemma and its sentential context. We make our codebase, datasets, and results publicly available at https://github.com/ahan-2000/Lost-in-Translation-.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages (2026)
- MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation (2026)
- Is She Even Relevant? When BERT Ignores Explicit Gender Cues (2026)
- Learning the Cue or Learning the Word? Analyzing Generalization in Metaphor Detection for Verbs (2026)
- Refining Word-Based Grammatical Error Annotation for L2 Korean (2026)
- SCRIPT: A Subcharacter Compositional Representation Injection Module for Korean Pre-Trained Language Models (2026)
- Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus (2026)