- Naive Bayes 문서 분류기를 위한 점진적 학습 모델 연구
- ㆍ 저자명
- 김제욱,김한준,이상구
- ㆍ 간행물명
- 정보기술과 데이타베이스저널
- ㆍ 권/호정보
- 2001년|8권 1호|pp.95-104 (10 pages)
- ㆍ 발행정보
- 한국데이타베이스학회
- ㆍ 파일정보
- 정기간행물| PDF텍스트
- ㆍ 주제분야
- 기타
In the text classification domain, labeling the training documents is an expensive process because it requires human expertise and is a tedious, time-consuming task. Therefore, it is important to reduce the manual labeling of training documents while improving the text classifier. Selective sampling, a form of active learning, reduces the number of training documents that needs to be labeled by examining the unlabeled documents and selecting the most informative ones for manual labeling. We apply this methodology to Naive Bayes, a text classifier renowned as a successful method in text classification. One of the most important issues in selective sampling is to determine the criterion when selecting the training documents from the large pool of unlabeled documents. In this paper, we propose two measures that would determine this criterion : the Mean Absolute Deviation (MAD) and the entropy measure. The experimental results, using Renters 21578 corpus, show that this proposed learning method improves Naive Bayes text classifier more than the existing ones.