UCLA: SPAPL CVs

Consonant-Vowel (CV) Tokens

A database of 1,728 isolated consonant-vowel (CV) tokens is available through this website. Eight tokens for each of 18 consonants in 3 vowel contexts were digitally recorded from 4 talkers (2 male and 2 female), giving 8 x 18 x 3 x 4 = 1,728 tokens.

Directory names and filenames identify the CV and the talker. The three vowel contexts are labeled in ASCII as:

"a" for the vowel in "bought"
"ee" for the vowel in "beat"
"oo" for the vowel in "boot"

The 18 consonant contexts are labeled in ASCII as:

"b" for the first consonant in "bad"
"d" for the first consonant in "do"
"g" for the first consonant in "go"
"p" for the first consonant in "pat"
"t" for the first consonant in "to"
"k" for the first consonant in "cat"
"m" for the first consonant in "mop"
"n" for the first consonant in "not"
"s" for the first consonant in "sat"
"Z" for the first consonant in "Zoo"
"sh" for the first consonant in "shoe"
"SH" for the fourth consonant in "composure"
"f" for the first consonant in "father"
"V" for the first consonant in "violet"
"th" for the first consonant in "thin"
"TH" for the first consonant in "these"
"j" for the first consonant in "judge"
"ch" for the first consonant in "chew"
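The labels above can be collected into lookup lists. The following is a minimal sketch, not part of the database itself; the names VOWELS, CONSONANTS, and CV_NAMES are hypothetical. It pairs every consonant label with every vowel label. Note that the labels are case-sensitive: "sh" and "SH", and "th" and "TH", denote different consonants.

```python
from itertools import product

# Vowel and consonant labels exactly as listed above (case-sensitive).
VOWELS = ["a", "ee", "oo"]
CONSONANTS = [
    "b", "d", "g", "p", "t", "k", "m", "n", "s", "Z",
    "sh", "SH", "f", "V", "th", "TH", "j", "ch",
]

# Every consonant label followed by every vowel label, e.g. "ba", "SHee".
CV_NAMES = [consonant + vowel for consonant, vowel in product(CONSONANTS, VOWELS)]

# Case matters: "shee" and "SHee" are distinct entries.
assert "shee" in CV_NAMES and "SHee" in CV_NAMES
```

Because some labels differ only in case, it may be worth unpacking the data on a case-sensitive filesystem.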

The CV directory includes 54 subdirectories, one for each combination of the 18 consonants and 3 vowels. Each subdirectory is named by concatenating the consonant and vowel labels (ba, da, SHee, etc.). Each subdirectory contains 8 exemplars from each of the 4 talkers, 32 files in all. These files are named CV.n.SU, where CV matches the name of the subdirectory, n is the token number, and SU identifies the subject. Subjects "peg" and "jb" are female, and "bh" and "bps" are male.
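The CV.n.SU naming scheme can be unpacked with a few lines of code. This is a sketch under stated assumptions: the names Token, SUBJECT_SEX, and parse_token_name are hypothetical, and the filename "SHee.3.peg" is an illustrative example rather than a confirmed file in the database. The range of token numbers is not stated here, so the function does not check it.

```python
from typing import NamedTuple

# Subject identifiers and talker sex, as stated in the description above.
SUBJECT_SEX = {"peg": "female", "jb": "female", "bh": "male", "bps": "male"}


class Token(NamedTuple):
    cv: str        # CV label, same as the subdirectory name
    number: int    # token number (the "n" field)
    subject: str   # subject identifier (the "SU" field)
    sex: str       # talker sex looked up from SUBJECT_SEX


def parse_token_name(filename: str) -> Token:
    """Split a name of the form CV.n.SU into its three fields."""
    parts = filename.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected CV.n.SU, got {filename!r}")
    cv, number, subject = parts
    if subject not in SUBJECT_SEX:
        raise ValueError(f"unknown subject {subject!r} in {filename!r}")
    return Token(cv, int(number), subject, SUBJECT_SEX[subject])


# Illustrative filename only; not confirmed to exist in the database.
token = parse_token_name("SHee.3.peg")
```

Here token.cv is "SHee", token.number is 3, and token.sex is "female".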

The data files contain raw big-endian short integer data, with no header. The sampling rate is 16 kHz.
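Because the files have no header, a reader has to supply the format itself. The sketch below decodes the samples and optionally rewrites them as a WAV file using only the Python standard library. It assumes that "short integer" means signed 16-bit samples and that each file holds a single (mono) channel; neither is stated explicitly above. The function names are hypothetical.

```python
import struct
import wave
from typing import List

SAMPLE_RATE_HZ = 16000  # 16 kHz, as stated above


def decode_samples(raw: bytes) -> List[int]:
    """Decode headerless big-endian 16-bit samples (assumed signed)."""
    if len(raw) % 2 != 0:
        raise ValueError("byte count is not a multiple of 2")
    count = len(raw) // 2
    # ">" selects big-endian byte order, "h" a 2-byte signed integer.
    return list(struct.unpack(f">{count}h", raw))


def read_raw_file(path: str) -> List[int]:
    """Read one data file and return its samples."""
    with open(path, "rb") as f:
        return decode_samples(f.read())


def duration_seconds(samples: List[int]) -> float:
    """Length of the token in seconds at the 16 kHz sampling rate."""
    return len(samples) / SAMPLE_RATE_HZ


def write_wav(samples: List[int], wav_path: str) -> None:
    """Write the samples as a mono 16-bit WAV file (mono is an assumption)."""
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE_HZ)
        # WAV stores samples little-endian, so repack with "<".
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

For example, write_wav(read_raw_file(path), "out.wav") would convert one token for playback in ordinary audio tools, where path points to any downloaded data file.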

To download the 32 exemplars for one of the 54 CVs, choose its gzipped tar file from the following list:

CV files
