Design of an MFCC-Based Speech Coder for Server-Based Speech Recognition in Network Environments

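ㆍ Authors: Lee Gil-Ho (이길호), Yoon Jae-Sam (윤재삼), Oh Yoo-Rhee (오유리), Kim Hong-Kook (김홍국)
ㆍ Journal: Malsori (말소리)
ㆍ Volume/issue: 2005 | Vol. 54, No. 1 | pp. 27-43 (17 pages)
ㆍ Publisher: The Korean Society of Phonetic Sciences and Speech Technology (대한음성학회)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other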

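The full text of this paper is provided free of charge through an article-linking arrangement with the Korea Institute of Science and Technology Information (KISTI).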

Abstract

Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that operate on the speech reconstructed by the coders. The main cause of this degradation is that the spectral envelope parameters used in speech coding are optimized for speech quality rather than for recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to provide better speech recognition performance than linear prediction coefficients (LPCs), the typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCCs instead of LPCs to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCCs, however, is developing an efficient MFCC quantization scheme at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, we propose a safety-net scheme to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed coder has speech quality comparable to that of 8 kbps G.729, and speech recognition performance with the proposed coder is better than with G.729.
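
The abstract names two quantization ideas without giving their details: predictive quantization that exploits interframe correlation, and a safety-net mode for robustness to channel errors. The following is a minimal sketch of how the two are commonly combined, not the authors' design. The first-order predictor, the coefficient `alpha`, the 2-D toy codebooks, the mode-selection rule, and all function names are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, target):
    """Return (index, codeword) of the codeword closest to target."""
    idx = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], target))
    return idx, codebook[idx]


def encode_frame(x, prev_recon, pred_cb, safe_cb, alpha=0.5):
    """Quantize one feature vector with both modes and keep the better one.

    Predictive mode quantizes the residual x - alpha * prev_recon.
    Safety-net mode quantizes x directly, with no dependence on the past.
    Returns ((mode, index), reconstruction). alpha is an assumed value.
    """
    predicted = [alpha * v for v in prev_recon]
    residual = [a - b for a, b in zip(x, predicted)]
    i_pred, c_pred = nearest(pred_cb, residual)
    recon_pred = [p + c for p, c in zip(predicted, c_pred)]
    i_safe, c_safe = nearest(safe_cb, x)
    recon_safe = list(c_safe)
    if sq_dist(x, recon_pred) <= sq_dist(x, recon_safe):
        return ("pred", i_pred), recon_pred
    return ("safe", i_safe), recon_safe


def decode_frame(code, prev_recon, pred_cb, safe_cb, alpha=0.5):
    """Rebuild one vector; a safety-net frame ignores the decoder state."""
    mode, idx = code
    if mode == "safe":
        return list(safe_cb[idx])
    return [alpha * p + c for p, c in zip(prev_recon, pred_cb[idx])]


# Toy 2-D data (hypothetical; real MFCC vectors and codebooks are larger).
pred_cb = [[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]]   # residual codebook
safe_cb = [[1.0, 1.0], [4.0, -2.0], [-3.0, 3.0]]   # memoryless codebook
frames = [[1.0, 1.0], [1.0, 0.0], [4.0, -2.0]]

state = [0.0, 0.0]
codes = []
for x in frames:
    code, state = encode_frame(x, state, pred_cb, safe_cb)
    codes.append(code)
print(codes)
```

In this sketch a frame coded in safety-net mode is decoded without reference to earlier frames, so a decoder whose state was corrupted by a channel error resynchronizes at the next such frame. A real coder would also spend one bit per frame to signal the mode.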

Keywords

CELP speech coder, MFCC, Predictive VQ, Safety-net VQ, Speech recognition