Word Sense Disambiguation Using a Korean Word Space Model

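ㆍ Authors: 박용민 (Park Yong-Min), 이재성 (Lee Jae-Sung)
ㆍ Journal: 한국콘텐츠학회논문지 (The Journal of the Korea Contents Association)
ㆍ Volume/issue: 2012 | Vol. 12, No. 6 | pp. 41-47 (7 pages)
ㆍ Publisher: 한국콘텐츠학회 (The Korea Contents Association)
ㆍ File info: Periodical | PDF text
ㆍ Subject area: Other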

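The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (한국과학기술정보연구원).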

Abstract

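Methods for Korean word sense disambiguation have mostly relied on small sense-tagged corpora or dictionary information, computing entropy, conditional probability, mutual information, and similar statistics and applying them to disambiguation. This paper proposes a word space model method that extracts Korean word vectors from a large-scale sense-tagged corpus and resolves word sense ambiguity by computing the similarity between these vectors. Trained on the Sejong morpho-semantic analysis corpus and evaluated on 200 randomly selected sentences (583 word types), the method achieved 94% accuracy, considerably higher than previous methods.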

Abstract (other language)

Various Korean word sense disambiguation methods have been proposed that use small-scale sense-tagged corpora and dictionary definitions to calculate entropy information, conditional probability, mutual information, etc. This paper proposes a method using a Korean Word Space model, which builds word vectors from a large-scale sense-tagged corpus and disambiguates word senses by calculating the similarity between the word vectors. An experiment with the Sejong morph sense-tagged corpus showed 94% precision for 200 sentences (583 word types), which is much superior to other known methods.
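
The general idea in the abstract (build a vector per sense-tagged word, then choose the sense whose vector is most similar to the context) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `word__id` tag format, the co-occurrence window, the use of raw counts with cosine similarity, and the four-sentence corpus around the homograph 배 ("boat" vs. "pear"). The paper's actual vector construction and similarity measure may differ.

```python
import math
from collections import Counter, defaultdict

SENSE_SEP = "__"  # hypothetical tag format: surface form + "__" + sense id


def surface(token):
    """Strip the (hypothetical) sense id, e.g. '배__1' -> '배'."""
    return token.split(SENSE_SEP)[0]


def build_sense_vectors(tagged_sentences, window=2):
    """Count, for each sense-tagged token, the surface forms seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in tagged_sentences:
        for i, token in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[token][surface(sentence[j])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(word, context_words, vectors):
    """Return the sense of `word` whose vector is closest to the context, or None."""
    context = Counter(context_words)
    candidates = sorted(s for s in vectors if surface(s) == word)
    if not candidates:
        return None
    return max(candidates, key=lambda s: cosine(vectors[s], context))


# Toy sense-tagged corpus (invented): sense 1 of 배 is "boat", sense 2 is "pear".
CORPUS = [
    ["배__1", "타다__1", "바다__1"],
    ["배__1", "항구__1", "떠나다__1"],
    ["배__2", "먹다__1", "과일__1"],
    ["배__2", "사과__1", "달다__1"],
]
VECTORS = build_sense_vectors(CORPUS)

print(disambiguate("배", ["바다", "타다"], VECTORS))
print(disambiguate("배", ["사과", "먹다"], VECTORS))
```

Context words are matched on surface forms because an unseen sentence carries no sense tags; a fuller system would likely weight the counts and use morphological analysis before lookup.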

Keywords

Word Sense Disambiguation, Word Space Model, Word Vector, Homograph