- Region Segmentation and String Extraction in Document Images Using Connection-Characteristic Functions
- ㆍ Authors
- 김석태,이대원,박찬용,남궁재찬
- ㆍ Journal
- 한국통신학회논문지
- ㆍ Volume/Issue
- 1997 | Vol. 22, No. 11 | pp. 2531-2542 (12 pages)
- ㆍ Publisher
- 한국통신학회
- ㆍ File information
- Periodical | PDF text
- ㆍ Subject area
- Other
This paper describes a method for region segmentation and string extraction in documents that mix text, graphics, and picture images, using the structural characteristics of connected components. Non-text regions are segmented with connection-characteristic functions derived from the structural characteristics of connected components. String extraction proceeds in four steps. First, we form basic unit regions whose vertical and horizontal lengths are 1/4 of the average length of the connected components. Second, we merge basic unit regions whose values are smaller than a given connection-intensity threshold into word blocks. Third, we create initial strings by linking word blocks with similar block angles. Finally, we generate the complete strings by merging the remaining word blocks, whose angles are undetermined, into the initial strings when their height and position are similar. The method can extract strings that are not horizontal and strings with varying character sizes. Computer experiments on documents of different styles demonstrate the feasibility of the method.
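
The first string-extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how components are labeled or how "average length" is measured, so the code assumes 8-connectivity, takes the average bounding-box height and width separately, and uses hypothetical function names (`connected_components`, `basic_unit_size`).

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of foreground components.

    Assumption: pixels with value 1 are foreground and 8-connectivity is used.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def basic_unit_size(boxes):
    """Return (unit_height, unit_width) as 1/4 of the average box height and width.

    Assumption: "average length" means the mean bounding-box extent per axis.
    """
    avg_h = sum(b[2] - b[0] + 1 for b in boxes) / len(boxes)
    avg_w = sum(b[3] - b[1] + 1 for b in boxes) / len(boxes)
    return avg_h / 4, avg_w / 4


# Toy binary image with two separate 4x4 components.
image = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
]
boxes = connected_components(image)
unit_h, unit_w = basic_unit_size(boxes)
```

The later steps (merging unit regions by connection intensity and linking word blocks by angle) depend on definitions that the abstract does not give, so they are left out of this sketch.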