Welcome to my website!

Hiking in Iceland

Here you will find my blog, some information about me and my research, and my publications. Please bear with me: like Montréal in the summer, this site will be under construction for a while.

Use the links in the sidebar to learn more about me (for specialists and non-specialists alike), to view my CV, and to read a few themed posts from my blog. The "tags" icon at the bottom of the sidebar brings together all the topics I cover.

Below are my most recent blog posts. They may interest you if you work with acoustic data or Twitter data, or if you are interested in scripting in Praat or R.

It's 'gif' and 'gif': The English lexicon goes both ways

This post about the pronunciation of “gif” is written for a general audience and uses International Phonetic Alphabet symbols rather than the terms “hard g” and “soft g”. Here’s all you need to know:

  • Brackets [] indicate a phonetic transcription.
  • [g] = the first sound of “good”. This is technically a velar stop.
  • [dʒ] = the first sound of “Jerry”. This is technically a post-alveolar affricate. It’s different from [g] both in where and how it’s produced.

Who Signed the Pinker Letter?

I hate feeling compelled to do this, but here we are. tl;dr: I ran some quick ’n’ dirty stats on the signatories of “the Pinker Letter”, and it’s not “just graduate students.” But also: why are we doing this?

Automating IPA transcription grading in R

Large class sizes in introductory phonetics courses can discourage you from assigning longer IPA transcriptions. This R script automates the grading. It:

  • generates a list of acceptable model transcriptions from your input (e.g., where variation is permitted);
  • imports student transcriptions from a .txt file;
  • finds the closest model transcription(s) based on Levenshtein distance;
  • calculates a grade from that minimal distance and the larger of the two character counts (model and student transcription);
  • writes a .txt feedback file to each student’s folder, which can then be uploaded to your online learning environment.

(Thanks to my colleague François Lareau, who shared the original idea of using Levenshtein distance as well as his Python script. My main addition is generating the list of possible variants.)
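To make the grading steps above more concrete, here is a minimal Python sketch of three of them: expanding a model transcription into its acceptable variants, computing the Levenshtein distance, and turning the smallest distance into a grade. It is not the post’s R script or François Lareau’s Python script. Two details are my own assumptions: the brace notation for variants ({t|ɾ} meaning “t or ɾ”), and the scoring rule 1 - distance / (length of the longer transcription), which is one plausible reading of “the minimal difference and the maximal number of characters”. The function names are hypothetical.

```python
import itertools
import re


def expand_variants(model: str) -> list[str]:
    """Expand a model transcription into every acceptable variant.

    Hypothetical notation (an assumption, not the post's input format):
    alternatives go in braces separated by "|", so "ˈwɔ{t|ɾ}ɚ" expands to
    ["ˈwɔtɚ", "ˈwɔɾɚ"].
    """
    # With a capturing group, re.split keeps the group contents:
    # even indices are literal text, odd indices are "a|b" alternatives.
    parts = re.split(r"\{([^{}]*)\}", model)
    choices = [[p] if i % 2 == 0 else p.split("|") for i, p in enumerate(parts)]
    # The Cartesian product of all choices yields every combination.
    return ["".join(combo) for combo in itertools.product(*choices)]


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def grade(student: str, model: str) -> tuple[float, str]:
    """Score a student transcription against its closest acceptable variant.

    Assumed scoring rule: 1 - distance / (length of the longer string).
    It stays between 0 and 1 because the distance never exceeds that length.
    """
    student = student.strip()
    variants = expand_variants(model)
    best = min(variants, key=lambda v: levenshtein(student, v))
    longest = max(len(student), len(best)) or 1  # avoid dividing by zero
    return 1 - levenshtein(student, best) / longest, best
```

For example, `grade("ˈwɔdɚ", "ˈwɔ{t|ɾ}ɚ")` returns a score of 0.8 together with the closest variant, ˈwɔtɚ (one substitution out of five characters). Python counts code points, so a base letter followed by a combining diacritic (such as a nasalization tilde) counts as two characters; grouping such sequences first would change the scores.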

Working with lots of big JSON Lines files in R

If you, like me, work with publicly available Twitter databases, you might find yourself with a large number of huge JSON Lines files, such as those from the COVID-19-TweetIDs database. This database contains the IDs of all tweets since late January 2020 mentioning any of several COVID-19-related buzzwords, typically divided into 23 files per day. A Python script included with the database turns each file of IDs into a .jsonl file of hydrated tweets, some of which can exceed 1 GB. Even on a decent computer, processing these files and transforming them into manageable data frames in R can be pretty taxing, mostly because the data are not uniform and because of the way the jsonlite package handles .jsonl files.
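The post itself works in R with jsonlite. As an illustration of the general idea of streaming a file instead of loading it whole, here is a minimal Python sketch (not the post’s method) that reads a .jsonl file one line at a time, keeps only a few fields, fills missing ones with None so every row has the same columns, and skips corrupt lines. The field names (id_str, created_at, lang, user.screen_name) and the helper names are assumptions chosen for illustration; check them against your own hydrated files.

```python
import csv
import json
from typing import Any, Iterable, Iterator

# Hypothetical field selection; dotted paths reach into nested objects.
# Check these names against your own files before relying on them.
FIELDS = ("id_str", "created_at", "lang", "user.screen_name")


def get_path(obj: Any, path: str) -> Any:
    """Follow a dotted path, returning None if any key is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def iter_records(lines: Iterable[str], fields=FIELDS) -> Iterator[dict]:
    """Yield one flat dict per JSON line, keeping only the requested fields."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines instead of failing
        # Missing fields become None, so every row has the same columns.
        yield {f: get_path(obj, f) for f in fields}


def jsonl_to_csv(src: str, dst: str, fields=FIELDS) -> int:
    """Stream src (.jsonl) into dst (.csv) line by line; return the row count."""
    n = 0
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=list(fields))
        writer.writeheader()
        for row in iter_records(fin, fields):
            writer.writerow(row)  # None is written as an empty cell
            n += 1
    return n
```

For example, `jsonl_to_csv("tweets.jsonl", "tweets.csv")` (hypothetical file names) produces a flat CSV that R can read with `read.csv()`. Because only one line is held in memory at a time, memory use does not grow with the size of the input file.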
