An Active Co-Training Algorithm for Biomedical Named-Entity Recognition

ㆍ Authors: Munkhdalai, Tsendsuren; Li, Meijing; Yun, Unil; Namsrai, Oyun-Erdene; Ryu, Keun Ho
ㆍ Publication: Journal of Information Processing Systems
ㆍ Volume/Issue: 2012 | Vol. 8, No. 4 | pp. 575-588 (14 pages)
ㆍ Publisher: Korea Information Processing Society
ㆍ File information: Periodical | ENG | PDF text
ㆍ Subject area: Other

The full text of this paper is provided free of charge through a link with the Korea Institute of Science and Technology Information (KISTI).

Abstract

Exploiting unlabeled text data together with a relatively small labeled corpus has become an active and challenging research topic in text mining, driven by the recent growth in the volume of biomedical literature. Biomedical named-entity recognition is an essential prerequisite for effective text mining of that literature. This paper proposes an Active Co-Training (ACT) algorithm for biomedical named-entity recognition. ACT is a semi-supervised learning method in which two classifiers, each based on a different feature set, iteratively learn from informative examples queried from the unlabeled data. We design a new classification problem to measure the informativeness of an unlabeled example: based on a joint view of the feature sets, each example is classified as informative or non-informative to both classifiers. To form the training data for this classification problem, we adopt a query-by-committee method: the two classifiers are treated as one committee, which is applied to the labeled data to assign an informativeness label to each example. ACT outperforms the traditional co-training algorithm both in F-measure and in the number of training iterations needed to build a good classification model. The proposed method tends to exploit a large amount of unlabeled data efficiently by selecting a small number of examples that carry not only useful information but also a comprehensive pattern.
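
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the base classifiers, the committee rule, or how queried examples are labeled, so the sketch assumes a toy nearest-centroid classifier, marks a labeled example as informative when the two view classifiers disagree on it, and pseudo-labels each selected example with the prediction of the classifier that has the larger margin. All function and parameter names (`fit`, `predict`, `informativeness_labels`, `active_co_training`, `rounds`, `batch`) are hypothetical.

```python
def fit(X, y):
    """Toy nearest-centroid classifier (an assumption): {label: mean vector}."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(model, x):
    """Return (label, margin); margin is the gap between the two nearest centroids."""
    d = sorted((sum((a - b) ** 2 for a, b in zip(x, c)), lab)
               for lab, c in model.items())
    margin = d[1][0] - d[0][0] if len(d) > 1 else 0.0
    return d[0][1], margin


def informativeness_labels(clf_a, clf_b, L_a, L_b):
    """Committee rule (an assumption): 1 if the two views disagree, else 0."""
    return [int(predict(clf_a, xa)[0] != predict(clf_b, xb)[0])
            for xa, xb in zip(L_a, L_b)]


def active_co_training(L_a, L_b, y, U_a, U_b, rounds=3, batch=5):
    """L_a/L_b: labeled examples in view A/B; U_a/U_b: unlabeled examples."""
    L_a, L_b, y = list(L_a), list(L_b), list(y)
    pool = list(range(len(U_a)))
    for _ in range(rounds):
        clf_a, clf_b = fit(L_a, y), fit(L_b, y)
        info = informativeness_labels(clf_a, clf_b, L_a, L_b)
        if len(set(info)) < 2 or not pool:
            break  # nothing to learn about informativeness, or pool exhausted
        # Informativeness classifier trained on the joint (concatenated) view.
        selector = fit([list(a) + list(b) for a, b in zip(L_a, L_b)], info)
        chosen = [i for i in pool
                  if predict(selector, list(U_a[i]) + list(U_b[i]))[0] == 1][:batch]
        if not chosen:
            break
        for i in chosen:
            lab_a, m_a = predict(clf_a, U_a[i])
            lab_b, m_b = predict(clf_b, U_b[i])
            L_a.append(U_a[i])
            L_b.append(U_b[i])
            # Pseudo-label from the classifier with the larger margin (assumption).
            y.append(lab_a if m_a >= m_b else lab_b)
        pool = [i for i in pool if i not in chosen]
    return fit(L_a, y), fit(L_b, y), len(y)
```

Each example is passed as two numeric feature vectors, one per view, and the function returns the two retrained classifiers together with the final size of the labeled set. A practical system would replace the nearest-centroid model with a sequence labeler and the disagreement rule with whatever committee criterion the paper defines.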

Keywords

Biomedical Named-Entity Recognition, Co-Training, Semi-Supervised Learning, Feature Processing, Text Mining