Rate-Distortion-Based Temporal Decomposition of Speech Signals

ㆍ Author: 이기승
ㆍ Journal: 한국음향학회지 = The Journal of the Acoustical Society of Korea
ㆍ Volume/Issue: 2002 | Vol. 21, No. 3 | pp. 315-322 (8 pages)
ㆍ Publisher: 한국음향학회 (The Acoustical Society of Korea)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other

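The full text of this paper is provided free of charge through an article-linking arrangement with the Korea Institute of Science and Technology Information (KISTI).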

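Abstract (translated from Korean)

This paper proposes a new temporal decomposition method for speech signals that considers bit rate and distortion jointly. The interpolation functions required for temporal decomposition are obtained from training speech data. Because an interpolation function is uniquely determined by the length between two targets, it is represented without any additional information. The target samples are selected so that the bit rate is minimized while the maximum spectral error stays below a threshold. The proposed method was applied to the coding of LSP coefficients, which are widely used as the spectral parameters of speech coders. Simulation results showed that, on average, a spectral distortion of 1.4 dB is obtained at a bit rate of 8 bits/frame.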

Abstract (English)

In this paper, a new temporal decomposition method is proposed that takes into consideration not only spectral distortion but also bit rate. The interpolation functions, which are among the parameters required for temporal decomposition, are obtained from a training speech corpus. Since the interval between two targets uniquely defines the interpolation function, the interpolation function can be represented without additional information. The locations of the targets are determined by minimizing the bit rate while keeping the maximum spectral distortion below a given threshold. The proposed method has been applied to compressing LSP coefficients, which are widely used as spectral parameters. Simulation results show that an average spectral distortion of about 1.4 dB can be achieved at an average bit rate of about 8 bits/frame.
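The target-selection idea in the abstract (minimize the bit rate subject to a cap on the maximum distortion) can be illustrated with a small dynamic-programming sketch. This is not the paper's algorithm, only a toy under stated assumptions: every target is assumed to cost the same number of bits, Euclidean distance stands in for spectral distortion, and a linear ramp (`linear_weights`) stands in for the trained interpolation functions. All function and parameter names are hypothetical.

```python
import math

def linear_weights(length):
    """Assumed stand-in for a trained interpolation function.
    Returns one weight per frame offset 0..length; like the trained
    functions, it depends only on the interval length."""
    return [k / length for k in range(length + 1)]

def segment_max_error(frames, i, j, weights_for=linear_weights):
    """Largest reconstruction error of the frames strictly between
    targets i and j when they are interpolated from frames[i] and frames[j]."""
    w = weights_for(j - i)
    worst = 0.0
    for k in range(i + 1, j):
        approx = [(1.0 - w[k - i]) * a + w[k - i] * b
                  for a, b in zip(frames[i], frames[j])]
        # Euclidean distance is an assumed proxy for spectral distortion.
        worst = max(worst, math.dist(frames[k], approx))
    return worst

def select_targets(frames, threshold, max_gap=10, bits_per_target=1.0):
    """Choose target indices (first and last frame included) with minimum
    total cost such that every segment's maximum error is <= threshold."""
    n = len(frames)
    cost = [math.inf] * n   # cost[j]: cheapest encoding ending with target j
    prev = [-1] * n         # prev[j]: previous target on that cheapest path
    cost[0] = bits_per_target
    for j in range(1, n):
        for i in range(max(0, j - max_gap), j):
            if cost[i] == math.inf:
                continue
            if segment_max_error(frames, i, j) <= threshold:
                candidate = cost[i] + bits_per_target
                if candidate < cost[j]:
                    cost[j], prev[j] = candidate, i
    # Walk back from the last frame to recover the chosen targets.
    targets = []
    j = n - 1
    while j != -1:
        targets.append(j)
        j = prev[j]
    return targets[::-1], cost[n - 1]

if __name__ == "__main__":
    # One-dimensional toy "spectral" track: rises for three frames, then falls.
    frames = [[0.0], [1.0], [2.0], [3.0], [2.0], [1.0], [0.0]]
    print(select_targets(frames, threshold=0.01))
    print(select_targets(frames, threshold=10.0))
```

Because the interpolation weights depend only on the gap between two targets, the decoder in this sketch needs only the target frames and their positions, which mirrors the abstract's point that the interpolation function requires no side information. Tightening `threshold` forces more targets (higher rate), while loosening it allows fewer targets (more distortion).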

Keywords

Temporal decomposition of speech signals, LSP coefficient compression, Rate-distortion theory
