Welcome to my website!
Here you can find my blog as well as information about me, my research and my publications. Please bear with me as I set up my site — like Montreal in the summer, we'll be under construction for a while.
In the sidebar, you can find my About page (for both specialists and non-specialists), my CV page and certain blog themes. The "tags" icon at the bottom of the sidebar gathers all blog topics in a single place.
Below, you'll find some of my more recent blog posts. These may be of interest to you especially if you work with any of the following: acoustic data, Praat scripting, R scripting and Twitter data.
This post about the pronunciation of “gif” is written for a general audience and uses International Phonetic Alphabet symbols rather than “hard” and “soft g”. Here’s all you need to know:
- Brackets [ ] indicate a phonetic transcription.
- [g] = the first sound of “good”. This is technically a velar stop.
- [dʒ] = the first sound of “Jerry”. This is technically a post-alveolar affricate. It’s different from [g] both in where and how it’s produced.
I hate feeling compelled to do this, but here we are. tl;dr: I ran some quick ‘n dirty stats on the signatories of “the Pinker Letter”, and it’s not “just graduate students.” But also: why are we doing this?
Large class sizes in introductory phonetics courses can be discouraging if you want to assign longer IPA transcriptions. This R script automates the grading. It generates a list of acceptable model transcriptions from your input (e.g., where variation is permitted), imports student transcriptions from a .txt file, and finds the closest model transcription(s) based on Levenshtein distance. It then calculates a grade from the minimal distance and the maximal number of characters between the two (model and student transcription), and writes a .txt feedback file to each student’s folder, which can then be uploaded to your online learning environment. (Thanks to my colleague François Lareau for sharing the original idea of using the Levenshtein distance, as well as his Python script. My largest personal addition here is generating the list of possible variants.)
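To make the idea concrete, here is a minimal Python sketch of the general technique. It is not the R script itself: the function names, the slot-based way of listing variants, and the scoring formula (1 minus the distance divided by the longer of the two strings) are all assumptions made for illustration. It also compares Unicode code points one by one, so a symbol written with a combining diacritic counts as two characters.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def expand_variants(slots):
    """Build every model transcription from a list of slots.

    Each slot lists the acceptable alternatives at that position
    (hypothetical input format, chosen for this sketch).
    """
    return ["".join(choice) for choice in product(*slots)]


def grade(student: str, models):
    """Return (closest model, distance, score between 0 and 1).

    Assumed scoring rule: 1 - distance / length of the longer string.
    """
    best = min(models, key=lambda m: levenshtein(student, m))
    dist = levenshtein(student, best)
    longest = max(len(student), len(best))
    score = 1.0 if longest == 0 else 1 - dist / longest
    return best, dist, score


# Both pronunciations of "gif" are accepted as model transcriptions here.
models = expand_variants([["dʒ", "g"], ["ɪ"], ["f"]])
closest, distance, score = grade("gif", models)
print(closest, distance, round(score, 2))
```

In this example the student typed a plain “i” instead of [ɪ], so the closest model is one substitution away and the score drops accordingly.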
If you, like me, work with publicly available Twitter databases, you might find yourself with a large number of huge JSON Lines files, like the COVID-19-TweetIDs database. In this database, you’ll find the IDs of all tweets since late January 2020 mentioning any of several COVID-19-related buzzwords, typically divided into 23 files for each day. A Python script included in the database turns each file of IDs into a .jsonl file of hydrated tweets, some of which can exceed 1 GB. Even with a decent computer, processing and transforming these files into manageable data frames in R can be pretty taxing, mostly because of the non-uniformity of the data and the way the jsonlite package handles .jsonl files.
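One general way to tame files like these, sketched below in Python rather than R, is to read them one line at a time and keep only the handful of fields you need, so the non-uniform parts of each record never reach your data frame. This is an illustration under assumptions, not the approach taken in the post: the function name is hypothetical, and the field names are placeholders for whichever keys your hydrated tweets actually contain.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_records(lines: Iterable[str], fields: List[str]) -> Iterator[Dict]:
    """Yield one flat dict per valid JSON object, keeping only `fields`.

    Blank lines, lines that fail to parse, and non-object values are skipped.
    Missing fields are filled with None so every row has the same columns.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(obj, dict):
            continue
        yield {field: obj.get(field) for field in fields}


# Placeholder field names; adjust to the keys present in your own files.
FIELDS = ["id_str", "created_at", "full_text"]

# A tiny in-memory sample standing in for a large .jsonl file.
sample = [
    '{"id_str": "1", "created_at": "2020-01-28", "full_text": "hello", "user": {"x": 1}}',
    "",
    '{"id_str": "2", "full_text": "no date here"}',
]
rows = list(iter_records(sample, FIELDS))
print(len(rows))
```

For a real file, you would pass an open file handle instead of the sample list (for example `with open(path, encoding="utf-8") as fh:`), which keeps memory use low because only one line is held at a time.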