Automatic Extraction of Verb Paradigms in Regional Languages: the case of the Linguistic Crescent varieties - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Conference paper, Year: 2020

Automatic Extraction of Verb Paradigms in Regional Languages: the case of the Linguistic Crescent varieties

Abstract

An important and costly step in language documentation is the transcription, whether total or partial, of speech data collected in the field. Several projects rely on speech transcription systems for this step (Adda et al. 2016; Michaud et al. 2018); this approach requires adapting the systems so that they can transcribe, at least phonetically, the speech collected during fieldwork. However, some of the collected data come with either an approximate transcription (e.g. read speech) or more or less precise information about their content. Verb conjugations are one example: the linguist proposes a verb, and the informant must give all its possible inflected forms, most often in a fixed order of tenses and persons. This paper explores whether a transcription system developed for a given language (here French) can be used, without precise adaptation of its acoustic models, to produce both the segmentation and the transcription of verbal paradigms in a closely related language (here several Romance varieties spoken in central France), and under which conditions the system's output does or does not require post-processing.
Main file
sltu-article-publie-2020.pdf (533.3 KB)
Origin: Explicit agreement for this deposit

Dates and versions

halshs-02508210, version 1 (18-05-2020)

Identifiers

  • HAL Id: halshs-02508210, version 1

Cite

Elena Knyazeva, Gilles Adda, Philippe Boula de Mareüil, Maximilien Guérin, Nicolas Quint. Automatic Extraction of Verb Paradigms in Regional Languages: the case of the Linguistic Crescent varieties. SLTU (Spoken Language Technologies for Under-resourced languages), European Language Resources Association (ELRA), Jan 2020, Marseille, France. pp.245-249. ⟨halshs-02508210⟩
421 views
81 downloads
