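- Gene Sequence Clustering for Functional Domain Prediction
- ㆍ Authors
- 한상일, 이성근, 허보경, 변윤섭, 황규석 (Han. Sang-Il, Lee. Sung-Gun, Hou. Bo-Kyeng, Byun. Yoon-Sup, Hwang. Kyu-Suk)
- ㆍ Journal
- Journal of Control, Automation and Systems Engineering (제어·자동화·시스템공학 논문지)
- ㆍ Volume/Issue
- 2006 | Vol. 12, No. 10 | pp. 1044-1049 (6 pages)
- ㆍ Publisher
- Institute of Control, Robotics and Systems (제어로봇시스템학회)
- ㆍ File information
- Periodical | PDF text
- ㆍ Subject area
- Other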
Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment tools rely on pairwise alignment and the Smith-Waterman algorithm to generate an alignment hierarchy. As a result, the runtime of existing multiple alignment methods increases exponentially as the number of sequences grows. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences all at once, without pairwise alignment. Among the common subsequences found, however, cross-matching subsequences that trigger inexact matching may be produced, so this paper proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST and CDD (Conserved Domain Database) searches were combined with the clustering tool. Our clustering and annotation tool consists of four steps: constructing the suffix tree, overlapping common subsequences, clustering gene sequences, and annotating gene clusters by BLAST and CDD search. The system was successfully evaluated on 36 gene sequences from the pentose phosphate pathway: it produced 10 clusters, found representative common subsequences, and finally identified functional domains by searching the CDD.
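
The idea of finding common subsequences across all sequences at once, and then clustering by what is shared, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain, uncompressed generalized suffix trie (quadratic memory, single-threaded) as a stand-in for the parallel suffix tree, treats "common subsequence" as an exact contiguous substring, and omits the cross-matching masking and the BLAST/CDD annotation steps. The function names and the `min_len` and `min_seqs` parameters are hypothetical.

```python
from collections import defaultdict


class TrieNode:
    """One node of an uncompressed generalized suffix trie."""
    __slots__ = ("children", "seq_ids")

    def __init__(self):
        self.children = {}
        self.seq_ids = set()  # indices of sequences containing this path


def build_suffix_trie(sequences):
    """Insert every suffix of every sequence into a single trie."""
    root = TrieNode()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for ch in seq[start:]:
                node = node.children.setdefault(ch, TrieNode())
                node.seq_ids.add(seq_id)
    return root


def common_substrings(root, min_len, min_seqs=2):
    """Return {substring: frozenset(seq_ids)} for maximal shared substrings."""
    found = {}
    stack = [(child, ch) for ch, child in root.children.items()]
    while stack:
        node, text = stack.pop()
        if len(node.seq_ids) < min_seqs:
            continue  # deeper paths are shared by even fewer sequences
        # Right-maximal: extending by any character loses at least one sequence.
        right_maximal = all(c.seq_ids != node.seq_ids
                            for c in node.children.values())
        if len(text) >= min_len and right_maximal:
            found[text] = frozenset(node.seq_ids)
        stack.extend((c, text + ch) for ch, c in node.children.items())
    # Drop hits contained in a longer hit shared by the same sequences.
    return {s: ids for s, ids in found.items()
            if not any(s != t and s in t and ids == found[t] for t in found)}


def cluster_by_shared_substrings(n_seqs, hits):
    """Group sequence indices that share a hit (union-find, assumed rule)."""
    parent = list(range(n_seqs))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ids in hits.values():
        first, *rest = sorted(ids)
        for other in rest:
            parent[find(other)] = find(first)

    groups = defaultdict(list)
    for i in range(n_seqs):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

With the four toy sequences `"ATGGCCTTA"`, `"CCGGCCTTG"`, `"TTTAAACGT"`, and `"GGTAAACGA"` and `min_len=5`, the sketch reports `GGCCTT` for sequences 0 and 1 and `TAAACG` for sequences 2 and 3, giving the clusters `[[0, 1], [2, 3]]`. No pairwise alignment is performed: one pass over the trie finds every shared substring. A compressed suffix tree would reduce the memory cost that this naive trie incurs.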