Publication detail

Robust and complex approach of pathological speech signal analysis

Original title

Robust and complex approach of pathological speech signal analysis

Language

en

Original abstract

This paper surveys state-of-the-art approaches to pathological speech signal analysis, with a special focus on parametrization techniques. It describes 92 speech features, some of which are already widely used in this field and some of which have not yet been tried (they come from other areas of speech signal processing, such as speech recognition or coding). As an original contribution, this work introduces 36 completely new pathological voice measures based on modulation spectra, inferior colliculus coefficients, bicepstrum, sample and approximate entropy, and empirical mode decomposition. The significance of these features was tested on 3 pathological voice databases (English, Spanish and Czech) with respect to classification accuracy, sensitivity and specificity. To the best of our knowledge, the introduced approach, based on complex feature extraction and robust testing, outperformed all works previously published in this field. The results (accuracy, sensitivity and specificity equal to 100.0%) are debatable in the case of the Massachusetts Eye and Ear Infirmary (MEEI) database because of its limitation related to the length of sustained vowels; however, in the case of the Príncipe de Asturias (PdA) Hospital in Alcalá de Henares of Madrid database, we improved classification accuracy (82.1%) and specificity (83.8%) when considering a single-classifier approach. Hopefully, large improvements may be achieved in the case of the Czech Parkinsonian Speech Database (PARCZ); these are discussed in this work as well. All the features introduced in this work were identified by the Mann–Whitney U test as significant (p<0.05) when processing at least one of the mentioned databases.
The largest discriminative power among the proposed features belongs to the cepstral peak prominence extracted from the first intrinsic mode function (p=6.9443E−32), which means that, among all newly designed features, those that quantify hoarseness or breathiness in particular are good candidates for pathological speech identification. The paper also mentions some ideas for future work in the field of pathological speech signal analysis that can be valuable especially from the clinical point of view.

BibTeX


@article{BUT115860,
  author="Jiří {Mekyska} and Eva {Janoušová} and Pedro {Gomez-Vilda} and Zdeněk {Smékal} and Irena {Rektorová} and Ilona {Eliášová} and Milena {Košťálová} and Martina {Mračková} and Jesus {Alonso-Hernandez} and Marcos {Faúndez Zanuy} and Karmele {Lopez-de-Ipina}",
  title="Robust and complex approach of pathological speech signal analysis",
  journal="Neurocomputing",
  annote="This paper surveys state-of-the-art approaches to pathological speech signal analysis, with a special focus on parametrization techniques. It describes 92 speech features, some of which are already widely used in this field and some of which have not yet been tried (they come from other areas of speech signal processing, such as speech recognition or coding). As an original contribution, this work introduces 36 completely new pathological voice measures based on modulation spectra, inferior colliculus coefficients, bicepstrum, sample and approximate entropy, and empirical mode decomposition. The significance of these features was tested on 3 pathological voice databases (English, Spanish and Czech) with respect to classification accuracy, sensitivity and specificity. To the best of our knowledge, the introduced approach, based on complex feature extraction and robust testing, outperformed all works previously published in this field. The results (accuracy, sensitivity and specificity equal to 100.0%) are debatable in the case of the Massachusetts Eye and Ear Infirmary (MEEI) database because of its limitation related to the length of sustained vowels; however, in the case of the Príncipe de Asturias (PdA) Hospital in Alcalá de Henares of Madrid database, we improved classification accuracy (82.1%) and specificity (83.8%) when considering a single-classifier approach. Hopefully, large improvements may be achieved in the case of the Czech Parkinsonian Speech Database (PARCZ); these are discussed in this work as well. All the features introduced in this work were identified by the Mann–Whitney U test as significant (p<0.05) when processing at least one of the mentioned databases.
The largest discriminative power among the proposed features belongs to the cepstral peak prominence extracted from the first intrinsic mode function (p=6.9443E−32), which means that, among all newly designed features, those that quantify hoarseness or breathiness in particular are good candidates for pathological speech identification. The paper also mentions some ideas for future work in the field of pathological speech signal analysis that can be valuable especially from the clinical point of view.",
  chapter="115860",
  doi="10.1016/j.neucom.2015.02.085",
  howpublished="online",
  number="1",
  volume="167",
  year="2015",
  month="november",
  pages="94--111",
  type="journal article"
}