Paper Title
Effect of Number of Mixture Components of GMM and Feature Vector Dimensions in Non-Intrusive Speech Quality Evaluation
Abstract
A meaningful objective model for non-intrusive speech quality estimation can be established by utilizing
the speech production model and the auditory perception phenomena of the human auditory system. To supplement the
subjective mean opinion score (MOS), the objective MOS estimator is built on the principles of human auditory
perception and speech production. In this work, Lyon’s auditory features, mel-frequency cepstral
coefficients (MFCC), and features corresponding to the vocal tract resonances, such as line spectral frequencies (LSF), are
concatenated to form the feature vector. The size of the feature vectors is reduced using principal component analysis (PCA),
and the reduced-size feature vectors are used to compute the objective MOS using a GMM-based probabilistic approach. The effect of
the number of mixture components in the GMM and of the dimension of the reduced-size feature vectors, built from different
combinations of Lyon’s auditory features, MFCC and LSF, is studied to determine which configuration
yields the objective MOS with the highest correlation with the subjective MOS. These feature vectors also
include the first and second differences of the MFCC and LSF features. The Gaussian mixture model (GMM) is trained on
these reduced-size feature vectors with the expectation-maximization (EM) algorithm to obtain its parameters
for different speech databases. The correlation between the subjective MOS and the
objective MOS is compared across the reduced-size feature vectors and numbers of GMM mixture components. The
results are also compared with ITU-T Recommendation P.563, the standard for non-intrusive speech quality estimation.
Index Terms— Speech quality, Gaussian mixture model, Auditory features, mel-frequency cepstral coefficients, Line
spectral frequencies.
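
The evaluation criterion described in the abstract, the correlation between subjective and objective MOS for each configuration, can be illustrated with a minimal sketch. It assumes the Pearson correlation coefficient, which the abstract does not name explicitly. The function name, the configuration labels and all scores below are hypothetical placeholders chosen for illustration; they are not results from this work.

```python
import math


def pearson_correlation(subjective, objective):
    """Pearson correlation between two equal-length lists of MOS values."""
    if len(subjective) != len(objective) or len(subjective) < 2:
        raise ValueError("need two equal-length sequences with at least 2 scores")
    n = len(subjective)
    mean_s = sum(subjective) / n
    mean_o = sum(objective) / n
    # Sum of cross-products and sums of squares of the mean-centred scores.
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(subjective, objective))
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    var_o = sum((o - mean_o) ** 2 for o in objective)
    if var_s == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_s * var_o)


# Placeholder data (assumption): one subjective MOS per test condition and
# the objective MOS produced by two hypothetical configurations, e.g.
# different numbers of GMM components or reduced feature dimensions.
subjective_mos = [1.8, 2.5, 3.1, 3.9, 4.4]
objective_mos_by_config = {
    "config_a": [2.0, 2.4, 3.3, 3.7, 4.5],
    "config_b": [3.0, 2.1, 3.9, 2.8, 3.5],
}

# Score every configuration and keep the one that tracks subjective MOS best.
correlations = {
    name: pearson_correlation(subjective_mos, scores)
    for name, scores in objective_mos_by_config.items()
}
best_config = max(correlations, key=correlations.get)
```

Under this sketch, the configuration with the largest coefficient would be preferred; the same function could score the P.563 output for comparison.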