Recognition of Page Titles and Infobox Attributes for Answer Extraction from Wikipedia Infoboxes in Question Answering

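ㆍ Authors: Heo Jeong (허정), Ryu Pum Mo (류법모), Kim Hyun Ki (김현기), Park Sang Kyu (박상규), Ock Cheol Young (옥철영)
ㆍ Journal: 정보과학회논문지 (Journal of KIISE): Software and Applications
ㆍ Volume/Issue: 2013 | Vol. 40, No. 9 | pp. 544-557 (14 pages)
ㆍ Publisher: 한국정보과학회 (Korean Institute of Information Scientists and Engineers)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other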

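The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (한국과학기술정보연구원).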

Abstract

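This paper proposes page title recognition and Infobox attribute restriction methods for question analysis in Wikipedia Infobox question answering. Wikipedia is a semi-structured knowledge source that contains page titles, body text, Infoboxes, and other information. The Infobox in particular presents key information related to the page title in a structured, table-like form. For Wikipedia Infobox question answering, it is therefore very important to recognize the Wikipedia page title and the Infobox attribute information contained in a question. To recognize page titles and Infobox attributes, this paper proposes a noun-based variable-length sliding window method and a method using lexico-semantic patterns. It also proposes a syllable-based variable-length sliding window method to improve page title recognition, and a restriction method based on answer types for Infobox attribute restriction. As evaluation data, 398 questions targeting Wikipedia Infoboxes were constructed by hand. In the experiments, the precision of recognizing the page title and Infobox attribute pair in a question was 60.05%. This means that about 60% of questions targeting Wikipedia Infoboxes can be answered without page or passage retrieval and without extracting answers from text.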

English abstract

For question analysis in Wikipedia Infobox Q&A, this paper proposes a method for recognizing the title of a Wikipedia page and restricting the Infobox attributes. Wikipedia is a semi-structured knowledge source that incorporates a variety of information, such as titles, contents, and Infoboxes. The Infobox is especially significant because it describes title-related information in a structured fashion using tables. Therefore, to perform Wikipedia Infobox Q&A successfully, it is essential to recognize the titles and Infobox attributes included in the queries. This paper proposes a noun-based variable-length sliding window method and a lexico-semantic pattern method for the respective recognition tasks. To further improve title recognition, we additionally use a syllable-based variable-length sliding window method. To restrict the space of Infobox attributes, we apply a method based on answer types. For evaluation, 398 Infobox-related questions were manually constructed. Experiments showed that the precision of recognizing titles and Infobox attributes in the questions was 60.05%. This suggests that approximately 60% of the Infobox-related questions could be answered without having to search for and extract answers from the contents.
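
As a rough illustration of two ideas named in the abstract, and not the authors' implementation, the sketch below shows a noun-based variable-length sliding window for title recognition and an answer-type filter for attribute restriction. The function names, the `max_len` limit, the answer-type labels, and the toy title set and Infobox are all assumptions made for this example. It also assumes the nouns have already been extracted from the question by some morphological analyzer. The lexico-semantic patterns and the syllable-based window are not covered.

```python
from typing import Dict, List, Optional, Set, Tuple


def recognize_title(
    nouns: List[str], titles: Set[str], max_len: int = 4
) -> Optional[Tuple[int, int, str]]:
    """Return (start, end, title) for the longest noun window that is a known title.

    Windows of decreasing size slide over the noun sequence, so a longer
    match such as "South Korea" is preferred over the shorter "Korea".
    """
    for size in range(min(max_len, len(nouns)), 0, -1):  # longest windows first
        for start in range(len(nouns) - size + 1):
            candidate = " ".join(nouns[start:start + size])
            if candidate in titles:
                return start, start + size, candidate
    return None


def restrict_attributes(
    infobox: Dict[str, str], attr_types: Dict[str, str], answer_type: str
) -> List[str]:
    """Keep only Infobox attributes whose (assumed) value type matches the answer type."""
    return [attr for attr in infobox if attr_types.get(attr) == answer_type]


# Hypothetical toy data for the question "What is the capital of South Korea?"
titles = {"Korea", "South Korea"}
nouns = ["capital", "South", "Korea"]  # assumed output of a noun extractor
infobox = {"capital": "Seoul", "currency": "South Korean won"}
attr_types = {"capital": "LOCATION", "currency": "CURRENCY"}  # assumed type labels

match = recognize_title(nouns, titles)
candidates = restrict_attributes(infobox, attr_types, answer_type="LOCATION")
print(match, candidates)
```

Once a (title, attribute) pair is fixed in this way, the answer can be read directly from the Infobox entry, which is the case the abstract's 60% figure refers to.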

Keywords

Wikipedia question answering; question analysis; Wikipedia Infobox; restriction of Infobox attributes
