Corpus of European Portuguese Fricatives

Introduction

A speech corpus has been designed to explore the fricatives of standard European Portuguese. The phonetic and phonological evidence underlying the design of the corpus is described in the sections that follow. The complete corpus is described in Jesus (2001). We used the methodology of previous fricative studies, which began with the EC SCIENCE “Fricative” Project conducted by Shadle et al. (Shadle 1992; Shadle and Carter 1993). That study focused on characterizing fricatives in general. Here, the methodology has been adapted to focus on Portuguese fricatives in particular, and thus uses the real words and phonology of Portuguese.

Design

A rich variety of phonetic contexts using both real Portuguese words and nonsense words was selected to study the most relevant phoneme variants, and describe the spectral and articulatory characteristics of Portuguese fricative consonants. The corpora also included sustained fricatives, which are better controlled (no phenomena such as coarticulation or devoicing occur during the production of sustained fricatives) and easier to analyse than those in words.

Fricatives that were produced more naturally, but still with contextual and stress control, were studied using a corpus of nonsense words. To produce examples that would be phonotactically possible words in Portuguese, the nonsense words all followed these generally accepted (for European Portuguese) language-specific phonological rules (Mateus and Andrade 2000, p. 11):

any of the vowels /i, e, ɛ, ɐ, a, ɔ, o, u, ĩ, ẽ, ɐ̃, õ, ũ/ can occur in the tonic syllable;
any of the vowels /i, ɨ, e, ɛ, ɐ, a, ɔ, o, u, ĩ, ẽ, ɐ̃, õ, ũ/ can occur before the tonic syllable;
only vowels /i, ɨ, ɐ, u/ can occur after the tonic syllable;
the vowel /i/ does not appear in final position;
the fricatives /f, v, s, z, ʃ, ʒ/ can all occur in initial and medial positions;
/ʃ/ is the only fricative that can occur in word-final position.
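
The rules listed above can be sketched as a small checker. This is only an illustration, not the procedure used to build the corpus: the function names and the representation of a word as one vowel per syllable plus the index of the tonic syllable are assumptions, and the rule about final /i/ is applied literally as stated.

```python
# Vowel sets copied from the rules above.
TONIC = {"i", "e", "ɛ", "ɐ", "a", "ɔ", "o", "u", "ĩ", "ẽ", "ɐ̃", "õ", "ũ"}
PRETONIC = TONIC | {"ɨ"}
POSTTONIC = {"i", "ɨ", "ɐ", "u"}
FRICATIVES = {"f", "v", "s", "z", "ʃ", "ʒ"}


def vowels_allowed(vowels, tonic_index):
    """Check a list with one vowel per syllable (hypothetical representation).

    tonic_index is the position of the stressed syllable in the list.
    """
    for i, vowel in enumerate(vowels):
        if i < tonic_index:
            allowed = PRETONIC
        elif i == tonic_index:
            allowed = TONIC
        else:
            allowed = POSTTONIC
        if vowel not in allowed:
            return False
    # Assumption: "/i/ does not appear in final position" is read literally,
    # so the last vowel of the word may never be /i/.
    return not (vowels and vowels[-1] == "i")


def fricative_position_allowed(fricative, position):
    """position is one of "initial", "medial" or "final"."""
    if fricative not in FRICATIVES:
        return False
    return position != "final" or fricative == "ʃ"
```

For instance, vowels_allowed(["ɐ", "ɨ"], 0) is accepted, whereas vowels_allowed(["ɨ", "a"], 0) is rejected because /ɨ/ cannot be tonic.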

In addition to these constraints and to facilitate comparisons, the corpora were designed to be compatible where possible with the fricative corpora recorded from English, American, French and German subjects (Shadle 1992; Shadle and Carter 1993).

Corpus 1: Sustained Fricatives

Corpus 1a consisted of a set of VCV sequences, where V belongs to the reduced set of Portuguese vowels /ɨ, ɐ, u/, and C is one of the Portuguese fricative consonants /f, v, s, z, ʃ, ʒ/ sustained for 5s. As shown by Shadle et al. (1996), the vowel context, even for sustained examples, influences the articulatory and spectral characteristics of fricatives. Since the vocalic contexts of Corpus 1a overlap with those of Corpus 3 (set of Portuguese words), it is possible to make a comparative study between the fricatives produced within these two experimental conditions.

A separate set of Portuguese fricative consonants, sustained for 3s, at medium, soft and loud effort levels, was also recorded (and is called Corpus 1b). Ideally we would like the articulation to be held constant, and only the mean flow velocity at the constriction during its production to be varied. We attempt to elicit this by asking for a variation in effort level.

Corpus 2: Nonsense Words

Corpus 2 consisted of /pV1CV2/ sequences, where V1 and V2 were each one of the vowels /ɨ, ɐ, u/ and C was one of the fricatives. The set comprised all possible vowel and fricative combinations, each repeated about 12 times in one breath. The phoneme /p/ is an easily identifiable marker for segmentation and spectral analysis, and has been used in Rothenberg mask recordings by Shadle et al. (Shadle 1992; Shadle and Carter 1993) to measure the subglottal pressure and to check where the zero is in the recorded time signal (no flow velocity).
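
As an illustration of the size of such a set, the sketch below enumerates every /pV1CV2/ sequence. It assumes that "all possible" means the full Cartesian product of the three vowels and six fricatives, which gives 3 x 6 x 3 = 54 items; the list actually recorded is given in Jesus (2001). The function name is hypothetical.

```python
from itertools import product

VOWELS = ["ɨ", "ɐ", "u"]
FRICATIVES = ["f", "v", "s", "z", "ʃ", "ʒ"]


def corpus2_sequences():
    """Return every /pV1CV2/ string, assuming a full Cartesian product."""
    return ["p" + v1 + c + v2 for v1, c, v2 in product(VOWELS, FRICATIVES, VOWELS)]


sequences = corpus2_sequences()
print(len(sequences))  # 54
```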

The stress was placed according to language-specific phonological rules, and subjects were instructed to keep it the same through all the repetitions. The subjects were not always able either to produce the indicated stress pattern or to produce a different pattern consistently, so there were some instances with equal stress in both syllables, and with deleted vowels.

Corpus 3: Real Words

Corpus 3 consisted of 154 words, each said within the frame sentence Diga ..., por favor /ˈdigɐ... puɾ ˈfɐvoɾ/, which was used to record the words in the corpus in a balanced phonetic context and with a neutral prosody. The words were presented in a randomised order.

The 154 words consist of 8 words forming nearly minimal pairs with the pattern /FV1FV2/; 54 words with the pattern /FV_/ (fricative in initial position); 69 words with the pattern /_V1FV2_/ (fricative in medial position); and 23 words with the pattern /_VF/ (fricative in final position).

The vowels in words with sequences /FV1FV2/, /FV/, /V1FV2/ and /VF/ have been divided into three groups according to their location in the vowel triangle: /i, ɨ, e/ - group 1; /ɛ, ɐ, a/ - group 2; /ɔ, o, u/ - group 3. Examples with nearly all Portuguese non-nasal vowels preceding each of the fricatives, followed by one vowel from each of the vowel groups, were used.
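
The grouping above amounts to a lookup table. A minimal sketch follows, with hypothetical names, that maps each non-nasal vowel to its group and labels the vowel context of a fricative by the groups of the preceding and following vowels.

```python
# Groups as defined in the text (position in the vowel triangle).
VOWEL_GROUPS = {
    1: ("i", "ɨ", "e"),
    2: ("ɛ", "ɐ", "a"),
    3: ("ɔ", "o", "u"),
}
GROUP_OF = {vowel: group for group, vowels in VOWEL_GROUPS.items() for vowel in vowels}


def context_groups(v1, v2):
    """Return the (preceding, following) vowel groups for a /V1FV2/ context."""
    return GROUP_OF[v1], GROUP_OF[v2]
```

For example, the context in chefe /ˈʃɛfɨ/ is context_groups("ɛ", "ɨ"), that is, a group 2 vowel before the fricative and a group 1 vowel after it.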

The vowel /ɨ/ is generally deleted in final position, as shown by Andrade (1994), and so the resulting allophone is not expected to influence the preceding fricative. Therefore words such as chefe /ˈʃɛfɨ/, ave /ˈavɨ/ and asse /ˈasɨ/, were used to ‘simulate’ final position contexts. As mentioned by Mateus and Andrade (2000), phonologically, only /ʃ/ can occur in final position, but phonetically any fricative can be found in final word position as a consequence of deletion of unstressed vowels.

Corpus 4: Real Words in Connected Speech

Corpus 4 consisted of a set of sentences including 60 words from Corpus 3. Ten of the sentences are meaningful; two include word boundaries within some of the phonetic sequences in Corpus 3, but are semantically nonsense.

Recording Method

The subjects used in this study were two male (LMTJ and CFGA) and two female (ACC and ISSS) adult Portuguese native speakers, with no reported history of hearing or speech disorders. Subject LMTJ, age 26, is from the city of Aveiro (at the centre of Portugal), and CFGA, age 26, is from Braga (in north Portugal). Speaker ACC, age 33, is from Sintra (a city very close to Lisbon), and ISSS, age 21, is from Lisbon. At the time of the recordings all subjects had been studying in England for a period of two to three years.

Recordings were made in a sound treated room using a Bruel & Kjaer 4165 1/2 inch microphone located 1m in front of the subject's mouth, connected to a Bruel & Kjaer 2639 pre-amplifier. The signal was amplified and filtered by a Bruel & Kjaer 2636 measurement amplifier, with high-pass cut-on frequency of 22Hz and low-pass cut-off frequency of 22kHz. A laryngograph signal (Lx) was also collected using a laryngograph processor. The acoustic speech signal and Lx were recorded with a Sony TCD-D7 DAT system at 16 bits, with a sampling frequency of 48kHz, and digitally transferred to a computer for post-processing. A 94dB, 1000Hz calibration tone produced by a Bruel & Kjaer 4620 calibrator was also recorded on the same tape on which speech was recorded.
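
A recorded calibration tone allows digital sample values to be related to sound pressure. The sketch below shows one way this could be done. It assumes that the 94dB level is a sound pressure level re 20 µPa (about 1 Pa RMS) and that the recording gain was not changed between the tone and the speech; neither is stated above. The function names are hypothetical.

```python
import math

P_REF = 20e-6        # reference pressure in Pa (assumed: dB SPL re 20 µPa)
CAL_LEVEL_DB = 94.0  # level of the calibration tone


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def pascals_per_unit(tone_samples, level_db=CAL_LEVEL_DB):
    """Scale factor that converts sample values to pascals."""
    tone_pressure_rms = P_REF * 10 ** (level_db / 20)
    return tone_pressure_rms / rms(tone_samples)


def level_db_spl(samples, scale):
    """Level of a segment in dB re 20 µPa, given the calibration scale."""
    return 20 * math.log10(rms(samples) * scale / P_REF)
```

The scale would be computed once per session from the samples of the calibration tone and then passed to level_db_spl for any speech segment of that session.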

Segmentation and Annotation Method

The data on the DAT tape were digitally transferred to .wav computer files, which contain the acoustic speech signal in the right channel and the laryngograph signal on the left channel, recorded at 16 bit, with a sampling frequency of 48kHz.
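
A minimal sketch of how the two channels of such a file could be separated, using only the Python standard library. The function name is hypothetical, and the code assumes the usual WAV layout in which left and right samples are interleaved, left first.

```python
import array
import sys
import wave


def read_speech_and_lx(path):
    """Return (sample_rate, speech, lx) from a 16-bit stereo .wav file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo file")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data are little-endian
    lx = samples[0::2]      # left channel: laryngograph signal
    speech = samples[1::2]  # right channel: acoustic speech signal
    return rate, speech, lx
```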

The time waveforms of all the corpus words were manually analysed to detect the start of the vowel-fricative transition, the start of the fricative, the end of the fricative, and the start of the fricative-vowel transition. During the vowel-fricative transition, there is a decrease in amplitude, voicing ceases (for unvoiced fricatives) and frication noise starts. During the fricative-vowel transition, there is an increase in amplitude, voicing starts (for unvoiced fricatives) and frication noise ceases (Docherty 1992, pp. 118-119). These events do not occur simultaneously or always in the same order, making the segmentation a somewhat subjective process. However, it is important to segment consistently, because the results of the analysis methods depend on where the boundaries are placed (Docherty 1992, pp. 103-110). The amplitude and voicing changes appear in both acoustic and Lx signals, which aids the segmentation process. As an example of the conventions adopted, an unvoiced fricative was taken to correspond only to a steady-state noise segment, so the FV transition also includes some frication noise.

The laryngograph signal was also used in the decision process to determine the VF and FV boundaries. For unreduced vowels there was always significant voicing, and for the duration of most fricatives the laryngograph signal changed drastically. Therefore, the amplitude of the laryngograph signal was an important cue in determining the boundaries between the different phones. When it was not clear from the acoustic signal where the fricative started and ended (especially for voiced fricatives), the laryngograph signal was used as an additional cue, because its amplitude diminishes during the VF transition and increases during the FV transition.

The annotation files generated for Corpus 3, which have been used by various analysis programs, consist of eight sample numbers referring to the following locations within the corpus word:

start of first vowel-fricative transition;
start of first fricative;
end of first fricative (or start of first fricative-vowel transition);
end of first fricative-vowel transition;
start of second vowel-fricative transition;
start of second fricative;
end of second fricative (or start of second fricative-vowel transition);
end of second fricative-vowel transition.

For corpus words with only one fricative (e.g. fala /ˈfalɐ/), values 5 through 8 are set to zero.

In examples such as este /ˈeʃtɨ/, where we have a vowel-fricative-plosive segment, the fourth annotated value corresponds to the end of the first fricative-plosive transition. When a word ends in a fricative, the fourth annotated value either has the same sample value as the third or corresponds to a marker in the “silence” that follows the fricative.
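
The eight-value layout described above could be read as follows. This is a sketch only: it assumes that the eight sample numbers are stored as whitespace-separated integers, which is not specified here, and the class and function names are hypothetical. Durations follow from the 48kHz sampling frequency.

```python
from typing import NamedTuple, Optional, Tuple

SAMPLE_RATE = 48000  # Hz, as stated for the recordings


class FricativeMarks(NamedTuple):
    vf_start: int    # start of vowel-fricative transition
    fric_start: int  # start of fricative
    fric_end: int    # end of fricative
    fv_end: int      # end of fricative-vowel transition

    def fricative_duration_ms(self) -> float:
        return 1000.0 * (self.fric_end - self.fric_start) / SAMPLE_RATE


def parse_corpus3_marks(text: str) -> Tuple[FricativeMarks, Optional[FricativeMarks]]:
    """Parse eight sample numbers (assumed whitespace-separated integers)."""
    values = [int(token) for token in text.split()]
    if len(values) != 8:
        raise ValueError("expected exactly eight sample numbers")
    first = FricativeMarks(*values[:4])
    # Values 5 through 8 are zero when the word has only one fricative.
    second = FricativeMarks(*values[4:]) if any(values[4:]) else None
    return first, second
```

With the made-up input "4800 9600 14400 19200 0 0 0 0", the second set is None and the first fricative lasts 100 ms.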

The four speakers produced, on average, more than 12 repetitions of each nonsense word in Corpus 2. However, the first and last, and any atypical tokens were eliminated, thus resulting in ensembles of nine tokens each.

We have also created a set of files containing a phonetic transcription, according to the International Phonetic Alphabet (IPA 1999), of all recorded speech material.

Bilingual Corpus of Fricatives

Introduction

The main aim of this study was to compare the Portuguese results to previously published results for English fricatives. Corpora developed at the University of Southampton for American English did not include such a rich variety of real words as used in the Portuguese study. Therefore we designed a new British English corpus, which included some of the sentences used in an EU study by Shadle et al. (Shadle 1992; Shadle and Carter 1993), and collected, in separate recording sessions, both the Portuguese and English data, as produced by a male bilingual speaker, PS, and a female bilingual speaker, RS. It was then possible to compare the various acoustic characteristics previously examined for the fricatives of four European Portuguese speakers with a similar set of English fricatives (Jesus 2001). We also wanted to eliminate one of the main production variation factors: the across-speaker differences.

Corpora Design and Recording

The Portuguese corpora had a very similar design to the corpora previously described. The English corpora were designed to provide valid data for cross-language comparisons with the Portuguese corpora. They also included sustained fricatives (Corpus 1a and 1b), a set of nonsense words (Corpus 2), words (Corpus 3) and sentences (Corpus 4). Previously used English corpora (Shadle 1992; Shadle and Carter 1993) were augmented to match the Portuguese corpora.

Each speaker was recorded in two separate sessions (Portuguese and English sessions), where the subjects counted and talked in the language of the current session, and the order of corpora recording was one of decreasing naturalness: we started by recording the sentence corpus (Corpus 4), followed by the real word corpus (Corpus 3), nonsense word corpus (Corpus 2), and finally the sustained fricative corpora (Corpora 1a and 1b). Technical aspects of the recording method were the same as previously described for speakers LMTJ, CFGA, ACC and ISSS.

The subjects used in this study were two adult bilingual siblings, with no reported history of hearing or speech disorders. Subject PS was a 22-year-old male and Speaker RS was an 18-year-old female. The siblings' mother is a European Portuguese speaker and the father a British English speaker; they reside in Cascais, Portugal. They have interacted with each parent since infancy in that parent's mother tongue: in Portuguese with their mother and in English with their father.

Corpora Documentation and Distribution

Complete listings of all recorded material can be found in the thesis by Jesus (2001), and in further documentation that resulted from the segmentation and annotation of all data: corpus.ps (PostScript file), corpus.pdf (Acrobat 7.0 file), bilingual.ps (PostScript file) and bilingual.pdf (Acrobat 7.0 file). The file corpus.ps contains a listing of all recorded material for four European Portuguese speakers (LMTJ, CFGA, ACC and ISSS). The file bilingual.ps contains a listing of all recorded material for two bilingual European Portuguese (PSp and RSp) and British English (PSe and RSe) speakers.

The corpora can be stored on one DVD that contains the following data:

The .wav files contain the acoustic speech signal in the right channel and the laryngograph signal on the left channel, recorded at 16 bits, with a sampling frequency of 48kHz. The directories with Corpus 4 material contain both separate files for each word, and .wav files containing the whole sentence.
The files with the extension .seg are annotation files. In Corpus 1a and 1b they only have two sample values referring to the start of the fricative and the end of the fricative. In Corpus 2 there are 4 sample values referring to:
1. start of the vowel-fricative transition;
2. start of the fricative;
3. end of the fricative;
4. end of the fricative-vowel transition.
In Corpus 3 and Corpus 4 the annotation files have at least 8 sample values referring to:
1. start of first vowel-fricative transition;
2. start of first fricative;
3. end of first fricative;
4. end of first fricative-vowel transition;
5. start of second vowel-fricative transition;
6. start of second fricative;
7. end of second fricative;
8. end of second fricative-vowel transition;
where the fricative is one of /f, v, s, z, ʃ, ʒ/. The second set of 4 values is always zero, except for the words forming nearly minimal pairs. When there are more than 8 sample values the word has a uvular fricative /χ, ʁ/ and/or a voiceless tapped alveolar fricative /ɾ̥/. The sets of sample values appear in the order in which the fricatives are located in the word, and correspond to:
- start of the vowel-fricative transition;
- start of the fricative;
- end of the fricative;
- end of the fricative-vowel transition;
where the fricative is one of /χ, ʁ, ɾ̥/.
The .ipa files contain a phonetic transcription of the speech segments contained in the .wav file with the same name. Each phonetic symbol is separated by - and the symbols used are based on the TSIPA IPA (International Phonetic Alphabet) font for TeX described in the file tsipadoc.ps or tsipadoc.pdf.
The file silence.wav contains the acoustic signal recorded in silence at the end of each session, which can be used to calculate the spectrum of the room noise.
The file tone.wav corresponds to the calibration tone.
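
The .seg layouts listed above differ only in the number of values, so they can be handled by one small grouping routine. The sketch below works on a list of integers that has already been read from a file, since the on-disk format is not specified here. The function name is hypothetical, and no attempt is made to decide which fricative each set belongs to.

```python
def group_seg_values(values):
    """Group the sample values of a .seg file into per-fricative sets.

    Two values (Corpus 1a and 1b) give one (start, end) pair. Otherwise the
    values are split into sets of four, and sets that are all zero (unused
    slots) are dropped.
    """
    values = list(values)
    if len(values) == 2:
        return [tuple(values)]
    if not values or len(values) % 4 != 0:
        raise ValueError("expected 2 values or a multiple of 4 values")
    sets = [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]
    return [s for s in sets if any(s)]
```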

Last updated 30/7/2015

lmtj@ua.pt
Luis Miguel Teixeira de Jesus
