A Method of Intonation Modeling for a Corpus-Based Korean Speech Synthesizer
(Korean title: 코퍼스 기반 한국어 합성기의 억양 구현 방안)

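ㆍ Authors: Kim Jin-Young (김진영), Park Sang-Eon (박상언), Eom Ki-Wan (엄기완), Choi Seung-Ho (최승호)
ㆍ Journal: 음성과학 (Speech Sciences)
ㆍ Volume/Issue: 2000 | Vol. 7, No. 2 | pp. 193-208 (16 pages)
ㆍ Publisher: 한국음성과학회 (The Korean Association of Speech Sciences)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other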

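The full text of this paper is provided free of charge through an article-linking arrangement with the Korea Institute of Science and Technology Information (KISTI).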


Abstract (English)

This paper describes a multi-step method of intonation modeling for a corpus-based Korean speech synthesizer. We selected 1,833 sentences covering a variety of syntactic structures and built a corresponding speech corpus read by a female announcer. Pitch was detected from laryngograph signals, prosodic boundaries were marked manually on the recorded speech, and the text was part-of-speech tagged and syntactically analyzed. The detected pitch contour was separated into three frequency bands (low, mid, and high), which correspond to the baseline, the word tone, and the syllable tone, respectively. Each component was predicted using the CART method and the Viterbi search algorithm with a word-tone dictionary. Of the collected sentences, 1,500 were used for training and 333 for testing. For word tone modeling, we compared two methods: predicting the word tone (the mid-frequency component) directly, and predicting the ratio of the word tone to the baseline and then multiplying that ratio by the baseline. The two methods gave similar mean errors, 12.37 Hz for the former and 12.41 Hz for the latter. For syllable tone modeling, the mean error rate was less than 8.3% relative to the announcer's mean pitch of 193.56 Hz, which is relatively good performance.
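
The three-band separation described in the abstract can be illustrated with a minimal sketch. The abstract does not say which filters the authors used, so this example swaps in a plain centered moving average as the low-pass step. The function names, the window sizes, and the synthetic contour are all assumptions made for illustration and do not come from the paper. The last helper only converts the reported relative error into hertz using the mean pitch quoted in the abstract.

```python
import math


def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def split_bands(f0, wide_window=41, narrow_window=9):
    """Split an F0 contour (Hz per frame) into low, mid, and high bands.

    The window sizes are illustrative assumptions, not values from the paper.
    """
    wide = moving_average(f0, wide_window)       # slow trend only
    narrow = moving_average(f0, narrow_window)   # trend plus word-scale movement
    low = wide                                   # stands in for the baseline
    mid = [n - w for n, w in zip(narrow, wide)]  # stands in for the word tone
    high = [x - n for x, n in zip(f0, narrow)]   # stands in for the syllable tone
    return low, mid, high


def relative_error_to_hz(rate, mean_pitch_hz):
    """Convert an error rate relative to the mean pitch into hertz."""
    return rate * mean_pitch_hz


# Synthetic contour (hypothetical data): declining trend plus two oscillations.
MEAN_PITCH_HZ = 193.56  # announcer's mean pitch, as reported in the abstract
example_f0 = [
    MEAN_PITCH_HZ - 0.1 * t
    + 10.0 * math.sin(2 * math.pi * t / 40)
    + 4.0 * math.sin(2 * math.pi * t / 6)
    for t in range(200)
]

low, mid, high = split_bands(example_f0)
rebuilt = [l + m + h for l, m, h in zip(low, mid, high)]
print(max(abs(a - b) for a, b in zip(example_f0, rebuilt)))
print(relative_error_to_hz(0.083, MEAN_PITCH_HZ))
```

Because the three bands are defined as successive differences, they sum back to the original contour by construction, so each layer can be predicted separately and the predictions added to obtain a pitch contour. Under the figures in the abstract, an error rate of 8.3% relative to 193.56 Hz corresponds to roughly 16 Hz.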

Keywords

speech synthesis, intonation modeling
