Transcription of the FrOMi corpus

Project management
Duration
01.2024 - 12.2027
Keywords
Corpus, Didactics, Migration, Learning, Teaching
Funding
University of Fribourg Research Pool (reference no. PO2490)
Description

This project – a collaboration between S. Madikeri (Department of Computational Linguistics, University of Zurich) and N. Dherbey Chapuis – aims to reduce the error rate of automatic transcriptions of the audio recordings to approximately 20 percent; at present, transcriptions generated by AI tools (e.g. Whisper, Microsoft) have an error rate of nearly 50 percent. In a second step, the transcripts will be corrected by hand.
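The project description does not state how the error rate is measured. A common choice for speech transcription is the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the automatic transcript into the reference transcript, divided by the number of words in the reference. The sketch below assumes that metric; the function name and the example sentences are purely illustrative and are not taken from the FrOMi corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: the "error rate" is a word error rate (WER); the project
    description itself does not name the metric.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution ("dort" -> "dors") and one
# deletion (the second "le") over six reference words.
wer = word_error_rate("le chat dort sur le canapé", "le chat dors sur canapé")
print(f"WER: {wer:.1%}")
```

Under this definition, an error rate of nearly 50 percent means that roughly every second word of the reference has to be corrected, whereas 20 percent corresponds to about one word in five.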

The longitudinal FrOMi corpus comprises the spontaneous oral productions of 10 schoolchildren who spoke no French when they started school, as well as those of 3 pupils who spoke French as a second language (FLS) and had mastered French to differing degrees by the start of their school careers. Over the course of four school years, the children were recorded every three months during the morning lessons (3.25 hours). The corpus contains 445 hours of recordings as well as the corresponding transcriptions in Standard French.

The FrOMi corpus provides the basis for analysing the longitudinal development of French as a second language over a period of 36 months in a group of very young learners (aged five to nine) who speak first languages that have rarely been an object of study (e.g. Amharic, Tigrinya and Kurdish).

In the current research period (2024–2028), the FrOMi data are being analysed in four separate projects:

  • 2024–2027: Transcription of the FrOMi corpus (Sept. 2024–Dec. 2027); funded by the University of Fribourg Research Pool (reference no. PO2490), CHF 30,000, N. Dherbey Chapuis. Collaboration between N. Dherbey Chapuis and S. Madikeri.
  • 2024–2027: Development of pedagogical resources for allophone pupils in classes 1H and 2H (Sept. 2024–Dec. 2027); funded by the Federal Office of Culture FOC (as per Languages Ordinance Art. 10; reference no. 95123), CHF 193,528, N. Dherbey Chapuis. Collaborative research with teachers at schools in the Schoenberg neighbourhood of Fribourg (co-funded by FOC, SEnOF [Canton of Fribourg office for French-language compulsory education], EDK [Swiss Conference of Cantonal Ministers of Education]).
  • Funding application submitted: Impact of using linguistic chunks in spoken language on the development of grammar skills in French as a second language. Collaboration between N. Dherbey Chapuis and Professor A. Thomas.
  • 2023–2028: PhD thesis by N. Félix under the supervision of Professor R. Berthele.

Purpose – Expected results

The aim is to transcribe the FrOMi corpus and encode the spontaneous oral productions in order to analyse the linguistic development of multilingual children who speak French as a second language. The results are expected to improve understanding of how children develop language skills in immersive school settings. The results of the analysis can also be used to develop teaching materials and pedagogical tools that are better aligned with the needs of these learners.