Syllable-Based Korean Part-of-Speech Tagging Without a Morphological Analyzer

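ㆍ Author: Shim, Kwang-Seob (심광섭)
ㆍ Journal: 인지과학 (Korean Journal of Cognitive Science)
ㆍ Volume/Issue: 2011, Vol. 22, No. 3, pp. 327-345 (19 pages)
ㆍ Publisher: 한국인지과학회 (Korean Society for Cognitive Science)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other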

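The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (KISTI).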

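Abstract

This paper proposes a syllable-based method for Korean part-of-speech (POS) tagging that does not use a morphological analyzer. In previous work, a Korean POS tagger selects, from the candidates produced by a morphological analyzer, the morpheme/POS sequence that best fits the context. The method proposed here, by contrast, not only determines the POS sequence but also generates the morphemes themselves. After training on 398,632 eojeols and evaluating on 33,467 eojeols, the method achieved an eojeol-level accuracy of 96.31%.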

English abstract

In this paper, a new approach to Korean POS (Part-of-Speech) tagging is proposed. In previous work, a Korean POS tagger was regarded as a post-processor of a morphological analyzer: the tagger selected the most likely morpheme/POS sequence from the output of morphological analysis. In the proposed approach, however, the POS tagger generates the most likely sequence of morpheme/POS pairs directly from the given sentence. A POS-tagged corpus of 398,632 eojeols is used for training, and a test set of 33,467 eojeols is used for evaluation. The proposed approach achieves an eojeol-level POS tagging accuracy of 96.31%.
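The reported figure is an eojeol-level accuracy. As a rough illustration of how such a figure can be computed, the sketch below assumes the common convention that an eojeol counts as correct only when its entire morpheme/POS sequence matches the gold annotation. The function name `eojeol_accuracy`, the data layout, and the Sejong-style tags in the toy example are assumptions made for illustration; the record does not describe the paper's evaluation script.

```python
from typing import List, Tuple

# One analysis per eojeol: a list of (morpheme, POS tag) pairs.
Analysis = List[Tuple[str, str]]


def eojeol_accuracy(gold: List[Analysis], predicted: List[Analysis]) -> float:
    """Fraction of eojeols whose full morpheme/POS sequence matches the gold one.

    Assumption: an eojeol is correct only if every morpheme and every tag
    in its analysis is identical to the gold analysis.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same eojeols")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Hypothetical two-eojeol sentence "나는 간다" (not taken from the paper's data).
gold = [
    [("나", "NP"), ("는", "JX")],
    [("가", "VV"), ("ㄴ다", "EF")],
]
predicted = [
    [("나", "NP"), ("는", "JX")],
    [("가", "VV"), ("ㄴ다", "EC")],  # one wrong tag makes the whole eojeol wrong
]

accuracy = eojeol_accuracy(gold, predicted)
```

In this toy case the second eojeol has a single mistagged morpheme, so only one of the two eojeols is counted as correct. This is stricter than scoring each morpheme separately.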

Keywords

Syllable-based Part-of-Speech Tagging, Part-of-Speech Tagging, Part-of-Speech Tagger, Morphological Analysis, Morphological Analyzer
