Swiss Tamil Project
Mapping Heritage Language Structure Through Sociolinguistic Cues: A Case Study of Swiss Tamil
Project by: Université de Lausanne, Switzerland
Speech Data Collection
This project aims to create a detailed corpus of spoken Tamil from three distinct communities. The team will gather over 200 hours of speech data from Tamil speakers in:
  • South India
  • Northern Sri Lanka
  • Switzerland
Corpus Processing
Transcription
Automatic speech-to-text
Translation
Automatic translation of the transcripts into another language
Validation
Human-validated results
Glossing
Interlinear glossing at the morphological level
The collected speech data will be transcribed, translated, and glossed using AI models integrated within the MATra Lab platform, and the output will then be validated by humans.
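To make the processing stages concrete, here is a minimal sketch of how one utterance might be tracked as it moves through transcription, translation, glossing, and validation. The class names, field names, and the order of the stages are assumptions made for illustration; they do not describe the actual MATra Lab data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlossedWord:
    """One word, split into morphemes, with one gloss per morpheme."""
    morphemes: List[str]
    glosses: List[str]

    def __post_init__(self) -> None:
        # Interlinear glossing aligns each morpheme with exactly one gloss.
        if len(self.morphemes) != len(self.glosses):
            raise ValueError("each morpheme needs exactly one gloss")


@dataclass
class Utterance:
    """Hypothetical corpus record; all field names are illustrative."""
    audio_id: str
    variety: str                          # e.g. "Switzerland"
    transcript: Optional[str] = None      # filled in by speech-to-text
    translation: Optional[str] = None     # filled in by translation
    glossed: List[GlossedWord] = field(default_factory=list)
    validated: bool = False               # set after human review

    def stage(self) -> str:
        """Return the last stage this record has completed (assumed order)."""
        if self.transcript is None:
            return "collected"
        if self.translation is None:
            return "transcribed"
        if not self.glossed:
            return "translated"
        if not self.validated:
            return "glossed"
        return "validated"
```

A record starts at "collected" and advances as each field is filled in. For example, a glossed word with the morphemes "viitt" and "il" would carry the glosses "house" and "LOC".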
AI Model Improvement
Dataset
High-quality human-validated dataset of different varieties of Tamil.
Fine-tuning
Fine-tune existing ASR and translation models using the new dataset.
Performance
Evaluate and iterate to continuously improve model performance.
A key objective of this project is to enhance the performance of Tamil ASR and translation systems. The dataset created will be used to fine-tune existing models, leading to more accurate and robust language processing tools.
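The project does not state which metrics it will use to evaluate the models. One widely used measure for ASR is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system output into the human-validated reference, divided by the number of reference words. The sketch below assumes whitespace-separated words and is meant only as an illustration of the metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Comparing WER on the same held-out, human-validated utterances before and after fine-tuning gives a simple way to check whether a model has improved for a given variety.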
Sociolinguistic Insights
Language Variation
Document unique features of different varieties of Tamil.
Code-Switching
Analyze the use of other languages in Tamil speech.
Language Contact
Investigate the influence of local languages.
This project will explore the rich sociolinguistic landscape of Swiss Tamil, focusing on language variation, code-switching, and the influence of local languages. The findings will offer valuable insights into language evolution and adaptation in diaspora communities.
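As a rough illustration of how code-switching might be located in transcripts, the sketch below labels each token by writing system and counts the points where the script changes. It assumes that Tamil words are transcribed in Tamil script (Unicode block U+0B80 to U+0BFF) and that words from other languages are written in Latin script. That is an assumption about transcription conventions, not something the project specifies, and script is only a proxy: romanized Tamil or loanwords written in Tamil script would be mislabeled.

```python
import unicodedata
from typing import List


def token_script(token: str) -> str:
    """Label a token as 'tamil', 'latin', 'mixed', or 'other'."""
    has_tamil = any("\u0b80" <= ch <= "\u0bff" for ch in token)
    has_latin = any(
        unicodedata.name(ch, "").startswith("LATIN") for ch in token
    )
    if has_tamil and has_latin:
        return "mixed"  # e.g. a Latin-script stem with a Tamil suffix
    if has_tamil:
        return "tamil"
    if has_latin:
        return "latin"
    return "other"      # digits, punctuation, other scripts


def count_switch_points(tokens: List[str]) -> int:
    """Count changes between Tamil-script and Latin-script tokens."""
    # Only unambiguous tokens are compared; 'mixed' and 'other' are skipped.
    scripts = [s for s in map(token_script, tokens) if s in ("tamil", "latin")]
    return sum(1 for a, b in zip(scripts, scripts[1:]) if a != b)
```

For a made-up utterance such as `["நான்", "Bahnhof", "போனேன்"]`, the tokens are labeled Tamil, Latin, Tamil, which gives two switch points. Tokens labeled "mixed" may be worth reviewing separately, since they can indicate switching inside a word.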
© 2022-25 UnReaL-TecE LLP. All rights reserved.