The diversity of electroglottographic signals is impressive. States of the glottis can change rapidly. There is diversity across phonological categories, among speaking styles and across the lifespan, and of course there are differences between speakers.
The gallery presented here aims to provide a basis for a classification of phonation types. The idea is to identify some types, and relate them to the various classifications proposed in the literature. Emphasis is laid on quantified criteria, which allow for the automatic detection of these types. This is expected to facilitate discussion of phonation types among phoneticians.
Present-day digital tools make it possible to include the data and tools in this repository, along with reflections on states of the glottis. The term 'gallery' may be too static to describe what we are building here: perhaps it should be called a 'playground' instead. For all examples in this gallery, the signals can be downloaded from the gallery folder: for instance, the audio and electroglottographic signals for Figure 1 below are both available there. You can also download the figures as vector drawings (in PDF format) from the images folder, and re-create them by running the scripts.
On a technical note: the scripts that create the figures require `export_fig`, a toolbox for neatly exporting figures from MATLAB to standard image and document formats. This is a great toolbox, but users who prefer not to try it out can revert to the simpler `print` command: in the last lines of the script that produces a figure, comment out the `export_fig` command and uncomment the `print` command.
As of 2019, the work is in its initial stage, with glottalization as a first area of investigation. (Data and analyses by Minh-Châu Nguyên, under the supervision of Alexis Michaud, Lise Buchman and Didier Demolin. Pages maintained by Alexis Michaud. Comments, feedback and collaborations are most welcome.)
Glottalized signals are often pooled together into one phonation type, variously referred to as 'creaky voice', 'vocal fry', or 'glottalized voicing'. For instance, in their typology of phonation mechanisms, Roubeau, Henrich & Castellengo (2009) acknowledge the diversity of the modes of vibration associated with the lowest fundamental frequencies ("a periodic glottal cycle, with a very low frequency, or nonperiodic-impulsions, or multiple cycles (doubles and triples)", p. 431), but nonetheless group them under a single heading: "phonation mechanism zero" (M0), as distinguished from the two phonation mechanisms mostly used in operatic singing, M1 (corresponding roughly to "chest voice") and M2 ("head voice").
The approach chosen here consists of testing to what extent phonetic subtypes can be reliably characterized and distinguished on the basis of electroglottographic signals.
The term “creaky voice” (or “creak”, used here interchangeably) refers to a number of different kinds of voice production. (Keating et al. 2015)
Sifting through examples from the Muong data suggests that there is a vast continuum between pressed voice, glottal constriction and single-pulsed creak.
The first example is a token with strongly pressed voice, which some would describe as having glottal constriction (a brief span of single-pulsed creak).
Figure 1.
The audio signal shows hints of creak: in the second half, pulses are longer and of much smaller amplitude than in the first half.
Analysis of the EGG signal with `peakdet` yields the results shown in the figure below. (The data can be loaded into MATLAB from the `1.mat` file.)
Figure 2. The x axis represents glottal cycles, which constitute data points in the results file.
Fundamental frequency (shown as green dots in the figure) is low. The electroglottographic signal looks quasi-periodic (no noticeable jumps in duration from one cycle to the next), but measurements of f0 bring out slight irregularities (jitter) as f0 reaches its lowest point, at glottal cycles 15 to 20. Those cycles are also where open quotient values (which are very low throughout this token) are harder to estimate, as evidenced by the gap between the estimates in orange and in blue. The values in orange are calculated by simply detecting the local minimum in the derivative of the EGG signal (dEGG); those in blue take into account the shape of the signal (multiple peaks are detected, and a barycentre is calculated). In this token, the opening peaks in the dEGG signal are so inconspicuous that it is surprising that the orange line is not discontinuous.
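The two kinds of open-quotient estimate described above can be illustrated with a short Python sketch. This is a simplified illustration, not the `peakdet` implementation. It assumes that glottis-closure instants are already known as sample indices of the positive dEGG peaks. The opening instant is taken either as the single lowest dEGG sample in the cycle (cf. the orange values) or as an amplitude-weighted barycentre of the samples lying below half of that minimum (a hypothetical stand-in for the multiple-peak barycentre behind the blue values). Oq is the duration from opening to the next closure divided by the cycle duration. The function name `cycle_measures` and the 50% threshold are assumptions.

```python
import numpy as np


def cycle_measures(degg, fs, closings):
    """Per-cycle f0 and two open-quotient (Oq) estimates from a dEGG signal.

    degg:     derivative of the EGG signal (1-D sequence)
    fs:       sampling rate in Hz
    closings: ascending sample indices of glottis-closure (positive dEGG) peaks
    """
    degg = np.asarray(degg, dtype=float)
    results = []
    for start, end in zip(closings[:-1], closings[1:]):
        segment = degg[start:end]      # one glottal cycle, closure to closure
        period = (end - start) / fs    # cycle duration in seconds
        lowest = segment.min()
        if lowest >= 0:
            # No negative (opening) peak at all: Oq cannot be estimated.
            oq_min = oq_bary = float("nan")
        else:
            # Estimate 1: opening instant = lowest dEGG sample in the cycle.
            t_open_min = start + int(np.argmin(segment))
            # Estimate 2 (hypothetical): amplitude-weighted barycentre of all
            # samples lying below half of the cycle's minimum.
            idx = np.nonzero(segment <= 0.5 * lowest)[0]
            weights = -segment[idx]
            t_open_bary = start + float(np.sum(idx * weights) / np.sum(weights))
            # Oq = open phase (opening to next closure) / cycle duration.
            oq_min = (end - t_open_min) / (end - start)
            oq_bary = (end - t_open_bary) / (end - start)
        results.append({"f0": 1.0 / period, "oq_min": oq_min, "oq_bary": oq_bary})
    return results


# Synthetic dEGG at 1000 Hz: closing peaks at samples 0, 10, 20; opening peaks at 6 and 16.
degg = np.zeros(21)
degg[[0, 10, 20]] = 1.0
degg[[6, 16]] = -0.5
for cycle in cycle_measures(degg, 1000, [0, 10, 20]):
    print(cycle)
```

With a single sharp opening peak the two estimates coincide. They diverge when the opening region is spread over several small peaks, which is the situation observed at cycles 15 to 20.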
Overall, this signal exemplifies voicing on the verge of aperiodicity. Phonation enters into phonation mechanism zero (bona fide creak), compromising periodicity without losing it altogether.
A similar example from a female speaker (F13) is shown below. It is a token of the same word, /paj⁴/.
Figure 3.
As in the token by speaker M1 (above), f0 is still essentially continuous. The open quotient can be assessed with precision: it goes down to values of about 30%, which is extremely low (bearing in mind that, as a general rule, Oq is higher for female speakers than for male speakers). The span with glottal constriction (pressed voice) is longer than in the preceding example, and there is slightly less irregularity.
Figure 4. The x axis represents glottal cycles, which constitute data points in the results file.
Single-pulsed creak can be seen as an extreme form of pressed voice (as voicing goes into aperiodicity).
Figure 5.
Example 2, shown in Figure 5 (the same word as in example 1, by another speaker), is similar to example 1 in important respects: (i) cycles are long and consist of a single pulse (as opposed to the double-pulsed or multiple-pulsed patterns which will be described below); (ii) the auditory impression is one of constriction, rather than relaxed phonation; and (iii) cycle length increases again after reaching the lowest values (i.e. glottalization does not interrupt voicing).
Figure 6. The x axis represents glottal cycles, which constitute data points in the results file.
In this example, periodicity is lost. This is an important argument for speaking of creak rather than pressed voice, as the latter suggests vocal fold compression, not loss of periodicity. (The inverse of cycle duration is nonetheless still referred to as 'f0' for convenience.) Five or six cycles of 'jittery' f0 are followed by rock-bottom values (below 40 Hz): extremely low-frequency phonation.
Moreover, the opening peaks in the dEGG signal, which in example 1 were still (just barely) clear enough to allow for confident evaluation of the glottal open quotient, are so inconspicuous in example 2 that they tend to become undetectable: hence the disagreement between the values shown in blue and in orange. In view of the shape of the EGG signal, it is not unreasonable to consider that these cycles have an extremely short open phase, and that the lowest Oq values (those in orange, setting aside two outliers at the 9th and 15th cycles) provide good estimates: for the longest cycles, Oq is on the order of just 20% (i.e. rock-bottom values, as for f0).
Overall, example 2 can be described as more extreme than example 1: a clear lapse into creaky voice, with the lowest possible f0 and Oq, but still with a single pulse per cycle. Phonation is almost arrested by the strong constriction, and only continues 'pulse by pulse' as puffs of air find their way through the closed sphincter. Example 2 is an extreme example, the like of which is not often encountered. Milder forms of pressed voice / glottal constriction are more common, as are cases of multiply-pulsed voice.
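The convention adopted for example 2 can be made explicit in a few lines. Each cycle's 'f0' is the inverse of its duration, even once periodicity is lost, and the 40 Hz landmark separates rock-bottom values from the rest. This is a minimal sketch that assumes closure instants (in seconds) are already available. The function name `f0_from_closures` and the closure instants are hypothetical and do not come from the data files.

```python
import numpy as np


def f0_from_closures(closure_times):
    """'f0' of each cycle, in Hz, as the inverse of the interval (in s)
    between successive glottis-closure instants."""
    periods = np.diff(np.asarray(closure_times, dtype=float))
    return 1.0 / periods


# Hypothetical closure instants (s): cycles lengthen towards the end.
closures = [0.000, 0.010, 0.021, 0.034, 0.060, 0.090]
f0 = f0_from_closures(closures)
rock_bottom = f0 < 40.0  # True for the two longest cycles (about 38 Hz and 33 Hz)
```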
Quoting from Keating, Garellek & Kreiman (2015: 2):
A very common form of creak involves a special kind of F0 irregularity: alternating longer and shorter pulses. (...) In the case of double pulsing (or period doubling), there are two simultaneous periodicities; higher multiples are also possible. There are thus multiple F0s, usually one quite low and another about (though not exactly) an octave higher, but the resulting percept is usually of an indeterminate pitch, plus roughness.
Contrasting with types 1 (pressed voice / glottal constriction) and 2 (single-pulse creak), a third type is double-pulse creak, shown in the example in Figures 7 and 8.
Figures 7 and 8. In Figure 8, the x axis represents glottal cycles, which constitute data points in the results file.
This type can be characterized by detecting the peaks in the dEGG signal that correspond to glottis-closure instants: knowing the duration of each glottal cycle is enough to notice the double pulses. But evidence on the open quotient is also of interest in cases like this one, where opening peaks in the dEGG signal can be detected and the open quotient calculated with some confidence: the Oq values reveal that the longer glottal cycles have lower Oq than the shorter ones. This offers additional insight into the strong differences between the main pulse and the secondary pulse.
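Detecting double pulsing from cycle durations alone can be sketched as a search for long-short-long (or short-long-short) alternations. This is a hypothetical heuristic, not a procedure documented for this gallery. The duration ratio of 1.3 is an arbitrary assumption. A second helper compares the mean Oq of the longer and shorter cycles, as discussed above.

```python
import numpy as np


def alternation_flags(periods, ratio=1.3):
    """Flag each cycle that is longer (or shorter) than BOTH neighbours by at
    least `ratio`: the signature of alternating pulses.
    `ratio` is a hypothetical threshold."""
    p = np.asarray(periods, dtype=float)
    flags = np.zeros(len(p), dtype=bool)
    for i in range(1, len(p) - 1):
        before, after = p[i] / p[i - 1], p[i] / p[i + 1]
        longer = before >= ratio and after >= ratio
        shorter = before <= 1 / ratio and after <= 1 / ratio
        flags[i] = longer or shorter
    return flags


def mean_oq_long_vs_short(periods, oq):
    """Mean Oq of cycles longer than the median duration vs. the others."""
    p = np.asarray(periods, dtype=float)
    q = np.asarray(oq, dtype=float)
    is_long = p > np.median(p)
    return q[is_long].mean(), q[~is_long].mean()
```

For instance, `alternation_flags([10, 5, 10, 5, 10, 10, 10])` flags the second to fourth cycles, where durations alternate, but not the steady stretch at the end.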
(coming soon)
This type corresponds to aperiodic voice as characterized by Keating, Garellek & Kreiman (2015: 2). Quoting:
Another variant of F0 irregularity is when it is taken to the extreme – vocal fold vibration is so irregular that there is no periodicity and thus no perceived pitch. See Fig. 5. Like multiply pulsed voice, aperiodic voice lacks the prototypical property of low F0; instead, the property of irregular F0 is enhanced, and the voice is therefore noisy.
This corresponds to aperiodicity as characterized by Laura Redi and Stefanie Shattuck-Hufnagel (2001: 414): “irregularity in duration of glottal pulses from period to period.”
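Period-to-period irregularity of this kind is commonly quantified as local jitter: the mean absolute difference between consecutive periods, divided by the mean period (the definition used, for example, by Praat's 'jitter (local)'). The sketch below implements that standard definition for illustration. It is not necessarily the measure used to produce the figures in this gallery.

```python
import numpy as np


def local_jitter(periods):
    """Local jitter: mean |T(i+1) - T(i)| divided by mean T (dimensionless)."""
    p = np.asarray(periods, dtype=float)
    if len(p) < 2:
        raise ValueError("at least two periods are needed")
    return float(np.mean(np.abs(np.diff(p))) / np.mean(p))


steady = local_jitter([10.0, 10.0, 10.0, 10.0])  # 0.0: perfectly periodic
jumpy = local_jitter([10.0, 12.0, 10.0, 12.0])   # 2/11, about 0.18
```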
Figures 9 and 10: aperiodic creak. Muong speaker F12. Syllable /rɔ⁴/ ‘banana flower’. In the figure showing the results of EGG analysis, the x axis represents glottal cycles, which constitute data points in the results file.
A difference from multiply pulsed voice is that, in aperiodic creak, the glottal open quotient cannot be estimated: opening peaks do not stand out clearly (except in two of the forty-five cycles: the first and the fifth), as shown in the figure below.
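One way to make 'opening peaks do not stand out clearly' operational is to compare, cycle by cycle, the size of the negative (opening) dEGG peak with that of the positive (closing) peak, and to leave Oq undefined below some ratio. The sketch assumes this criterion and an arbitrary 10% ratio. Neither is taken from the analyses reported here, and the function names are hypothetical.

```python
import numpy as np


def opening_peak_detectable(cycle_degg, min_ratio=0.1):
    """True if the opening (most negative) dEGG peak of a cycle is at least
    `min_ratio` times the closing (most positive) peak; min_ratio is hypothetical."""
    seg = np.asarray(cycle_degg, dtype=float)
    closing = seg.max()
    opening = -seg.min()
    return bool(closing > 0 and opening >= min_ratio * closing)


def count_detectable(cycles, min_ratio=0.1):
    """Number of cycles (each a 1-D dEGG segment) whose opening peak is detectable."""
    return sum(opening_peak_detectable(c, min_ratio) for c in cycles)


clear = [5.0, 0.0, 0.0, -1.0, 0.0]   # opening peak is 20% of the closing peak
faint = [5.0, 0.0, -0.2, 0.0, 0.0]   # opening peak is 4% of the closing peak
```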
In terms of fundamental frequency, only periods 24 to 28 (in Figure 10) show a pattern that resembles multiply pulsed voice (specifically, double-pulsed voice).
example | label | materials | f0 range | periodicity | pulses per cycle | Oq | phonation mechanism |
---|---|---|---|---|---|---|---|
1 | pressed voice / glottal constriction | /paj⁴/, speakers M1 and F13 | very low | almost quasi-periodic | 1 | very low | mechanism M1 |
2 | single-pulsed creak | /paj⁴/, speaker M11 | rock-bottom | aperiodic | 1 | rock-bottom | mechanism M0 |
3 | multiply pulsed creak | /paj⁴/, speaker F13 | low | aperiodic, 'saw-like' | 2 or 3 | low, 'saw-like' | mechanism M0 |
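To illustrate how the qualitative criteria in the table might be turned into automatic decisions, here is a rule-based sketch. Only the 40 Hz landmark comes from the discussion of example 2. The jitter limit, the order of the tests and the function name are hypothetical, and the rules would need calibrating on labelled tokens before any real use. Aperiodic creak has no row in the table, so tokens of that type would fall under "unclassified" here.

```python
def classify_token(f0_values, pulses_per_cycle, jitter,
                   jitter_limit=0.05, rock_bottom_hz=40.0):
    """Assign a token to one of the types in the table above.

    f0_values:        per-cycle 'f0' in Hz (inverse cycle durations)
    pulses_per_cycle: number of pulses detected in each cycle
    jitter:           local jitter of the token (dimensionless)
    jitter_limit:     hypothetical boundary between quasi-periodic and aperiodic
    """
    if max(pulses_per_cycle) >= 2:
        return "multiply pulsed creak"
    if jitter < jitter_limit:
        return "pressed voice / glottal constriction"
    if min(f0_values) < rock_bottom_hz:
        return "single-pulsed creak"
    return "unclassified"
```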
- Keating, Patricia, Marc Garellek & Jody Kreiman. 2015. Acoustic properties of different kinds of creaky voice. Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow.
- Marasek, Krzysztof. 1997. Electroglottographic description of voice quality. AIMS Working Papers Stuttgart, Vol. 3. See website from 1997, still available as of 2019.
- Redi, Laura & Stefanie Shattuck-Hufnagel. 2001. Variation in the realization of glottalization in normal speakers. Journal of Phonetics 29(4). 407–429.
- Roubeau, Bernard, Nathalie Henrich & Michèle Castellengo. 2009. Laryngeal vibratory mechanisms: the notion of vocal register revisited. Journal of Voice 23(4). 425–438.