Paper Title
Effect of Number of Mixture Components of GMM and Feature Vector Dimensions in Non-Intrusive Speech Quality Evaluation

A meaningful objective model for non-intrusive speech quality estimation can be built from a model of speech production together with the perceptual behavior of the human auditory system. In this work, an objective mean opinion score (MOS), intended to supplement the subjective MOS, is estimated on these two principles. Lyon’s auditory features, mel-frequency cepstral coefficients (MFCC), and line spectral frequencies (LSF), which correspond to the vocal tract resonances, are concatenated to form the feature vector; the first and second differences of the MFCC and LSF features are also included. The dimension of the feature vectors is reduced using principal component analysis (PCA), and the reduced feature vectors are used to compute the objective MOS with a probabilistic approach based on a Gaussian mixture model (GMM). The GMM parameters are obtained by training on the reduced feature vectors with the expectation-maximization (EM) algorithm for different speech databases. The effect of the number of GMM mixture components and of the dimension of the reduced feature vectors, built from different combinations of Lyon’s auditory features, MFCC, and LSF, is studied to determine which configuration yields the highest correlation between the objective MOS and the subjective MOS. Performance is compared across the different reduced feature vectors and numbers of mixture components in terms of this correlation. The results are also compared with ITU-T Recommendation P.563, the standard for non-intrusive speech quality estimation.

Index Terms: Speech quality, Gaussian mixture model, Auditory features, Mel-frequency cepstral coefficients, Line spectral frequencies.
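The experimental loop described in the abstract (reduce the feature vectors with PCA, train a GMM with EM, then score each configuration by its correlation with subjective MOS) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how the GMM yields the objective MOS, so the sketch assumes a joint GMM over [reduced features, MOS] and takes the conditional mean of MOS given the features. The synthetic data stands in for the Lyon, MFCC, and LSF features, and the names `gmm_conditional_mean` and `evaluate` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def gmm_conditional_mean(gmm, X):
    """E[y | x] under a joint GMM over z = [x, y], with y as the last coordinate.

    ASSUMPTION: this estimator is one common way to map features to a score;
    the abstract does not confirm that it is the one used in the paper.
    """
    d = X.shape[1]
    log_resp = np.empty((X.shape[0], gmm.n_components))
    cond_means = np.empty_like(log_resp)
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :d], gmm.means_[k, d]
        cov = gmm.covariances_[k]
        cov_xx, cov_yx = cov[:d, :d], cov[d, :d]
        diff = X - mu_x
        sol = np.linalg.solve(cov_xx, diff.T).T  # rows hold cov_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(cov_xx)
        maha = np.sum(diff * sol, axis=1)
        # Log of weight_k * N(x; mu_x, cov_xx), the marginal density of x.
        log_resp[:, k] = np.log(gmm.weights_[k]) - 0.5 * (
            maha + logdet + d * np.log(2.0 * np.pi)
        )
        # Per-component regression of y on x.
        cond_means[:, k] = mu_y + sol @ cov_yx
    # Normalize responsibilities in a numerically stable way.
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)
    return np.sum(resp * cond_means, axis=1)


def evaluate(features, mos, n_dims, n_mix, seed=0):
    """Correlation between objective and subjective MOS for one configuration."""
    n_train = len(mos) // 2
    pca = PCA(n_components=n_dims).fit(features[:n_train])
    x_train = pca.transform(features[:n_train])
    x_test = pca.transform(features[n_train:])
    joint = np.column_stack([x_train, mos[:n_train]])
    # GaussianMixture.fit runs the EM algorithm.
    gmm = GaussianMixture(
        n_components=n_mix, covariance_type="full", reg_covar=1e-3, random_state=seed
    ).fit(joint)
    objective_mos = gmm_conditional_mean(gmm, x_test)
    return np.corrcoef(objective_mos, mos[n_train:])[0, 1]


# Synthetic stand-in data (hypothetical): 400 utterances, 30-dimensional features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 3))
features = latent @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(400, 30))
mos = np.clip(3.0 + 0.8 * latent[:, 0] + 0.1 * rng.normal(size=400), 1.0, 5.0)

# Sweep the PCA dimension and the number of mixture components.
results = {
    (n_dims, n_mix): evaluate(features, mos, n_dims, n_mix)
    for n_dims in (3, 5, 8)
    for n_mix in (1, 2, 4)
}
for (n_dims, n_mix), corr in sorted(results.items()):
    print(f"dims={n_dims} mixtures={n_mix} correlation={corr:.3f}")
```

The sweep values (3, 5, 8 dimensions and 1, 2, 4 components) are arbitrary placeholders chosen to keep the example fast; they are not the settings studied in the paper.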