Published by De Gruyter Mouton, May 23, 2023

Brazilian Portuguese-Russian (BraPoRus) corpus: automatic transcription and acoustic quality of elderly speech during the COVID-19 pandemic

  • Irina A. Sekerina, Anna Smirnova Henriques, Aleksandra S. Skorobogatova, Natalia Tyulina, Tatiana V. Kachkovskaia, Svetlana Ruseishvili and Sandra Madureira
From the journal Linguistics Vanguard

Abstract

This article presents the Brazilian Portuguese-Russian (BraPoRus) corpus, whose goal is to collect, analyze, and preserve for posterity the spoken heritage Russian still used today in Brazil by approximately 1,500 elderly bilingual heritage Russian–Brazilian Portuguese speakers. Their unique 100-year-old variety of moribund Russian is disappearing because it has not been passed to their descendants born in Brazil. During the COVID-19 pandemic, we remotely collected 170 h of speech samples in heritage Russian from 26 participants (mean age = 75.7 years) in naturalistic settings using Zoom or a phone call. To assess the quality of the collected data, we focus on two methodological challenges: automatic transcription and the acoustic quality of remote recordings. First, we find that among commercially available transcription programs, Sonix far outperforms Google Transcribe and Vocalmatic on the measure of word error rate (WER). Second, we establish that the acoustic quality of the remote recordings was adequate for intonational and speech rate analysis. Moreover, this remote method of collecting and analyzing speech samples works successfully with elderly bilingual participants who speak a heritage language different from their dominant societal language, and it can become a new norm when face-to-face communication with elderly participants is not possible.
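For readers unfamiliar with the WER measure mentioned above, the following is a minimal sketch of how it is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a manual reference transcript and an automatic one, divided by the number of words in the reference. This is not the authors' script from the supplementary material; the function name `word_error_rate`, the whitespace tokenization, and the absence of any text normalization (case, punctuation) are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: tokenizes on whitespace and applies no
    normalization (an assumption, not the article's procedure).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (dynamic programming,
    # keeping only one previous row in memory).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the automatic transcript is much longer than the manual one; lower values indicate a closer match.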


Corresponding author: Irina A. Sekerina, Department of Psychology, College of Staten Island, 2800 Victory Blvd., 4S-108, Staten Island, NY, 10314, USA, E-mail:

Acknowledgments

We thank all of the participants for taking part in this study, and the priests of the Russian Orthodox churches in São Paulo and Rio de Janeiro for their help in contacting people. We especially thank Vera Gers Dimitrov, who represents Associação Cultural Grupo Volga, for mediating the contacts with the Russian community of Vila Zelina in São Paulo. We are also very grateful to two anonymous reviewers whose comments and suggestions have made this article much better.

  1. Research funding: The second author (A. S. H.) was supported by a PNPD/CAPES postdoctoral fellowship (Programa Nacional de Pós-Doutorado da Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; process number 88882.315378/2019-01). The third author (A. S. S.) was supported by an undergraduate fellowship program from FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo; process number 2022/01119-0).

Supplementary material

There is supplementary material associated with this article. SM 1: fragment of manual transcription and the corresponding automatic transcription by Sonix; SM 2: the Python script used for pre-processing samples; SM 3: an audio clip of the excerpt for Participant 11 (manual and Sonix transcripts are in SM 1); SM 4: audio clips associated with Figure 1.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/lingvan-2021-0149).

Received: 2021-12-16
Accepted: 2022-12-13
Published Online: 2023-05-23

© 2023 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 29.3.2024 from https://www.degruyter.com/document/doi/10.1515/lingvan-2021-0149/html