|
COPYRIGHT NOTICE
This page includes
the toolkits and data that we used in our papers. Please cite our
corresponding papers when using any of the following materials.
|
Shareware: Code, Databases, and Useful Links |
|
Glottaltopography is a method for analyzing high-speed laryngeal videos. The method is described in this paper: Gang Chen, Jody Kreiman, Abeer Alwan, "The glottaltopogram: a method of analyzing high-speed images of the vocal folds", Computer Speech and Language, 2014, in press. Briefly, the "glottaltopogram" is based on principal component analysis of the light-intensity time sequences of pixels from consecutive video images. It reveals the overall synchronization of the vibrational patterns of the vocal folds over the entire laryngeal area, and it is effective in visualizing both pathological and normal vocal fold vibratory patterns. The GTG toolkit is available for download here.
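As a rough illustration of the idea (not the GTG toolkit itself), the principal component analysis step could be sketched in Python as follows. The function name `pixel_pca`, the array layout, and the choice to treat each pixel as an observation and each frame as a variable are assumptions made for this sketch; the preprocessing and visualization described in the paper may differ.

```python
import numpy as np

def pixel_pca(frames, n_components=3):
    """PCA of per-pixel light-intensity time sequences (illustrative sketch).

    frames: array of shape (T, H, W), one grayscale image per time step.
    Returns (score_maps, components, explained_ratio):
      score_maps      shape (n_components, H, W), one image per component
      components      shape (n_components, T), temporal patterns
      explained_ratio fraction of total variance per component
    """
    frames = np.asarray(frames, dtype=float)
    t, h, w = frames.shape
    # Assumption: one row per pixel (observation), one column per frame (variable).
    x = frames.reshape(t, h * w).T
    # Center each variable, i.e. subtract the mean over pixels at every time step.
    x = x - x.mean(axis=0, keepdims=True)
    # PCA via singular value decomposition of the centered data matrix.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    var = s ** 2
    total = var.sum()
    if total > 0:
        explained = var[:n_components] / total
    else:
        explained = np.zeros(n_components)
    # Project every pixel onto each component and fold back into image shape.
    scores = u[:, :n_components] * s[:n_components]
    score_maps = scores.T.reshape(n_components, h, w)
    return score_maps, vt[:n_components], explained
```

Each score map can then be displayed as an image: pixels whose intensity varies in a similar way over time receive similar scores, which is one way to view how synchronized the vibration is across the imaged area.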
|
Harmfreq_MOLRT: a statistical model, likelihood ratio test (LRT)-based speech/non-speech detection algorithm |
Harmfreq_MOLRT is a statistical-model-based speech/non-speech detection algorithm that uses a likelihood ratio test (LRT). The likelihood ratios (LRs) for voiced and unvoiced frames are computed differently: for voiced frames, the LR is calculated using only the harmonic DFT coefficients; for unvoiced frames, it is calculated using all DFT coefficients. It is an improved version of the multiple-observation (MO) LRT voice activity detector (VAD) proposed by Ramirez et al. [Matlab code of Harmfreq_MOLRT VAD]
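The difference between the two LR computations can be illustrated with a small Python sketch. This is not the released Matlab code: it assumes the common complex-Gaussian statistical model, in which the per-bin log-likelihood ratio is γξ/(1+ξ) − log(1+ξ), with γ the a posteriori SNR and ξ the a priori SNR, and it uses the simple estimate ξ = max(γ − 1, 0). The function names, the harmonic-bin selection rule, and the assumption that a pitch estimate `f0` is already available are all hypothetical. Noise-power estimation and the combination of multiple observations (frames) are not covered.

```python
import numpy as np

def log_lr_per_bin(power, noise_power):
    """Per-bin log-likelihood ratio under an assumed Gaussian model."""
    gamma = power / noise_power            # a posteriori SNR
    xi = np.maximum(gamma - 1.0, 0.0)      # assumed a priori SNR estimate
    return gamma * xi / (1.0 + xi) - np.log1p(xi)

def harmonic_bins(f0, fs, nfft):
    """Indices of the DFT bins nearest to the harmonics of f0 (hypothetical rule)."""
    n_harm = int((fs / 2) // f0)
    bins = np.round(np.arange(1, n_harm + 1) * f0 * nfft / fs).astype(int)
    return np.unique(bins[bins <= nfft // 2])

def frame_log_lr(power, noise_power, fs, nfft, f0=None):
    """Mean log-LR of one frame.

    power, noise_power: one-sided power spectra of length nfft // 2 + 1.
    If f0 is given, the frame is treated as voiced and only harmonic bins
    are used; otherwise all bins are used.
    """
    llr = log_lr_per_bin(np.asarray(power, float), np.asarray(noise_power, float))
    if f0 is not None:
        llr = llr[harmonic_bins(f0, fs, nfft)]
    return float(llr.mean())
```

In this sketch, restricting a voiced frame to its harmonic bins keeps low-SNR bins between the harmonics from diluting the frame-level ratio, so the voiced frame stands out more clearly against the noise-only case.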
|
MBSC: a Multi-Band Summary Correlogram (MBSC)-based pitch detection algorithm for noisy speech |
MBSC is a Multi-Band Summary Correlogram (MBSC)-based pitch detection algorithm for noisy speech. The package contains the Matlab code used to generate the pitch detection results reported in L. N. Tan and A. Alwan, "Multi-Band Summary Correlogram-based Pitch Detection for Noisy Speech", Speech Communication, in press. A fast version of the code is also provided in the package. [Matlab code of MBSC pitch detector (to be updated soon)]
|
|
|
|
|
|
Variable Frame Rate (VFR) analysis is a feature extraction method for noise-robust automatic speech recognition (ASR). It builds on speech perception research showing that dynamic spectro-temporal information is important and, hence, that not all equal-duration speech segments are equally important perceptually. For example, formant transitions at the onset of a vowel can carry more discriminative information than the steady-state part of the vowel ... (details) (download)
|
VoiceSauce is an application, implemented in Matlab, which provides automated voice measurements over time from audio recordings. Inputs are standard wave (*.wav) files and the measures currently computed are: F0, Formants F1-F4, H1(*), H2(*), H4(*), A1(*), A2(*), A3(*), H1(*)-H2(*), H2(*)-H4(*), H1(*)-A1(*), H1(*)-A2(*), H1(*)-A3(*), Energy, and Cepstral Peak Prominence ... (details)
|
XVocal is the UNIX version of Dr. Shinji Maeda's Vocal Tract Articulatory Synthesizer, VTCALCS (originally developed for the PC platform). In 1995, Edmond Chi Hin Chui of our laboratory ported the PC version to UNIX. With Dr. Maeda's permission, XVocal is now freely available for research purposes only. Please check the user manual for detailed instructions on how to use the program ... (details)
|
|
|
|
Speechdemo is a Matlab-based graphical tool for speech analysis by Qifeng Zhu. It supports simultaneous analysis of signals in two channels. The user can view the signal in time and frequency using a variety of analysis tools such as the Discrete Fourier Transform (DFT), Linear Predictive Coding (LPC), Mel-Frequency Cepstral Coefficients (MFCC), and others ... (details)
|
|
|
|
- The Child Subglottal Resonances Database. Released through the LDC, 2022, ISBN: 1-58563-985-0
- UCLA Speaker Variability Database. Released through the LDC, 2021, ISBN: 1-58563-977-X
- UCLA High-Speed Laryngeal Audio and Video Database. Released through the LDC, 2017, ISBN: 1-58563-803-X
- The Subglottal Resonances Database. Released through the LDC, 2015, ISBN: 1-58563-711-4
|
|
|
|
A database designed to sample speaking variability within individual speakers and across a large number of speakers is available through this website. It will also be available from the Linguistic Data Consortium (LDC) as of October 2021. (download, paper, Readme)
|
|
|
|
An extensive database of 1,728 isolated Consonants and Vowels (CV) is available through this website. (details)
|
|
|
The speech group at Microsoft Research (Redmond, Washington, US) and IPAM and Electrical Engineering at UCLA (Los Angeles, CA, US) have recently jointly developed a database for manually labeled vocal-tract-resonance (or formant) trajectories, for research in speech processing including analysis, synthesis, and recognition. (details)
|
|
|
|
A narrated videotape showing 3D tongue and vocal tract reconstructions from MRI data for consonants and vowels as produced by 2 talkers. Sample 3D models can be seen at: http://www.ee.ucla.edu/~spapl/projects/mri.html. This videotape is an effective teaching aid and was produced by Shrikanth Narayanan and Abeer Alwan ... (details)
For a free copy of the videotape, please email Prof. Alwan at: alwan@icsl.ucla.edu
|
The label files in this package contain the time-stamps of silence (sil) and short pause (sp) found in Aurora-2 test sets. These time-stamps are obtained through a manual visual inspection of the spectrograms of clean test files.
|
|
This database includes raw audio, 0 dB babble-noise-corrupted audio, and 5 dB babble-noise-corrupted audio files.
|
|
|
|
UCSC Speech Links
• Alexander Graham Bell's Path to the Telephone
• F0 Estimation Resources (from the PhD dissertation of Arturo Camacho, SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music, 2007):
• AC-P: This algorithm (Boersma, 1993) computes the autocorrelation of the signal and divides it by the autocorrelation of the window used to analyze the signal. It uses postprocessing to reduce discontinuities in the pitch trace. It is available with the Praat System at <http://www.fon.hum.uva.nl/praat>. The name of the function is ac.
• AC-S: This algorithm uses the autocorrelation of the cubed signal. It is available with the Speech Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the function is fxac.
• ANAL: This algorithm (Secrest and Doddington, 1983) uses autocorrelation to estimate the pitch, and dynamic programming to remove discontinuities in the pitch trace. It is available with the Speech Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the function is fxanal.
• CATE: This algorithm uses a quasi autocorrelation function of the speech excitation signal to estimate the pitch. We implemented it based on its original description (Di Martino, 1999). The dynamic programming component used to remove discontinuities in the pitch trace was not implemented.
• CC: This algorithm uses cross-correlation to estimate the pitch and post-processing to remove discontinuities in the pitch trace. It is available with the Praat System at <http://www.fon.hum.uva.nl/praat>. The name of the function is cc.
• CEP: This algorithm (Noll, 1967) uses the cepstrum of the signal and is available with the Speech Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the function is fxcep.
• ESRPD: This algorithm (Bagshaw, 1993; Medan, 1991) uses a normalized cross-correlation to estimate the pitch, and post-processing to remove discontinuities in the pitch trace. It is available with the Festival Speech Synthesis System at <http://www.cstr.ed.ac.uk/projects/festival>. The name of the function is pda.
• RAPT: This algorithm (Secrest and Doddington, 1983) uses a normalized cross-correlation to estimate the pitch, and dynamic programming to remove discontinuities in the pitch trace. It is available with the Speech Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the function is fxrapt.
• SHS: This algorithm (Hermes, 1988) uses subharmonic summation. It is available with the Praat System at <http://www.fon.hum.uva.nl/praat>. The name of the function is shs.
• SHR: This algorithm (Sun, 2000) uses the subharmonic-to-harmonic ratio. It is available at Matlab Central <http://www.mathworks.com/matlabcentral>, under the title “Pitch Determination Algorithm”. The name of the function is shrp.
• TEMPO: This algorithm (Kawahara et al., 1999) uses the instantaneous frequency of the outputs of a filterbank. It is available with the STRAIGHT System at its author's web page <http://www.wakayama-u.ac.jp/~kawahara>. The name of the function is exstraightsource.
• YIN: This algorithm (de Cheveigné and Kawahara, 2002) uses a modified version of the average squared difference function. It is available from its author's web page at <http://www.ircam.fr/pcm/cheveign/sw/yin.zip>. The name of the function is yin.
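To make the autocorrelation idea in the AC-P entry above more concrete, here is a minimal Python sketch for a single frame. It is not Praat's implementation: the function names, the Hann window, the search range defaults, and the plain "pick the highest peak" rule are assumptions for illustration, and the postprocessing that reduces discontinuities in the pitch trace is omitted.

```python
import numpy as np

def autocorr(x):
    """Linear autocorrelation via FFT, normalized so that lag 0 equals 1."""
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()   # zero-pad to avoid circular wrap-around
    spec = np.fft.rfft(x, nfft)
    r = np.fft.irfft(spec * np.conj(spec), nfft)[:n]
    return r / r[0]                        # assumes the frame is not all zeros

def estimate_f0(frame, fs, fmin=75.0, fmax=500.0):
    """Estimate F0 (Hz) of one frame from the window-corrected autocorrelation."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    window = np.hanning(len(frame))
    r_sig = autocorr(frame * window)       # autocorrelation of the windowed signal
    r_win = autocorr(window)               # autocorrelation of the window itself
    lag_min = int(fs / fmax)
    # Stay within half the frame, where the window autocorrelation is well above zero.
    lag_max = min(int(fs / fmin), len(frame) // 2)
    lags = np.arange(lag_min, lag_max + 1)
    r = r_sig[lags] / r_win[lags]          # divide out the effect of the window
    best_lag = lags[np.argmax(r)]
    return fs / best_lag
```

For example, `estimate_f0(frame, 16000, fmin=120.0, fmax=400.0)` searches lags that correspond to 120 to 400 Hz. A periodic signal produces comparable peaks at every multiple of its period, so with a wide search range this simple peak picking can land on a subharmonic; a practical tracker needs additional logic to choose among candidates.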
|
|
|
|
|
|
|