A Comparative Study of Clustering Indices Based on the K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm

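ㆍ Authors: Shim, Yo-Sung; Chung, Ji-Won; Choi, In-Chan (심요성, 정지원, 최인찬)
ㆍ Journal: 경영정보학연구
ㆍ Volume/Issue: 2006 | Vol. 16, No. 1 | pp. 127-144 (18 pages)
ㆍ Publisher: 한국경영정보학회 (The Korea Society of Management Information Systems)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other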

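The full text of this paper is provided free of charge through a linking arrangement with the Korea Institute of Science and Technology Information (KISTI).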


Abstract

The K-means algorithm is widely used in the initial stage of data analysis in the data mining process, partly because of its low time complexity and the simplicity of its implementation. Cluster validity indices are used together with the algorithm to determine the number of clusters and to evaluate the clustering results for a dataset. In this paper, we compare the performance of sixteen indices, selected from forty indices in the literature on the basis of their applicability to nonhierarchical clustering algorithms. The datasets used in the experiment are generated from multivariate normal distributions. In particular, four error types are considered in the comparison: standardization, outlier generation, error perturbation, and noise dimension addition. The experiment also analyzes how varying the number of points, attributes, and clusters affects performance. The simulation results show that the Calinski and Harabasz index performs best across all datasets and that the Davies and Bouldin index becomes a strong competitor as the number of points in the dataset increases.
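To make the role of a validity index concrete, the following is a minimal sketch of choosing the number of clusters with the Calinski and Harabasz index and the Davies and Bouldin index. It is not the authors' experimental setup: the three cluster centers, the 50 points per cluster, the spherical covariance with `sigma=0.5`, the candidate range of 2 to 6 clusters, and the deterministic farthest-point initialization (swapped in for the usual random initialization so the run is repeatable) are all assumptions made for illustration.

```python
import math
import random


def make_clusters(centers, n_per_cluster, sigma, rng):
    """Draw points from spherical normal distributions (assumed covariance sigma^2 * I)."""
    return [
        tuple(rng.gauss(c, sigma) for c in center)
        for center in centers
        for _ in range(n_per_cluster)
    ]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; farthest-point initialization is an illustrative choice."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean_point(members)
    return labels


def group(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return list(clusters.values())


def calinski_harabasz(points, labels):
    """(between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)); higher is better."""
    clusters = group(points, labels)
    n, k = len(points), len(clusters)
    overall = mean_point(points)
    between = within = 0.0
    for c in clusters:
        m = mean_point(c)
        between += len(c) * sq_dist(m, overall)
        within += sum(sq_dist(p, m) for p in c)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Mean over clusters of the largest (s_i + s_j) / d_ij ratio; lower is better."""
    clusters = group(points, labels)
    cents = [mean_point(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    return sum(
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ) / k


# Hypothetical test data: three well-separated 2-D clusters.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = make_clusters(centers, n_per_cluster=50, sigma=0.5, rng=rng)

scores = {}
for k in range(2, 7):
    labels = kmeans(points, k)
    scores[k] = (calinski_harabasz(points, labels), davies_bouldin(points, labels))

best_ch = max(scores, key=lambda k: scores[k][0])  # largest Calinski-Harabasz value
best_db = min(scores, key=lambda k: scores[k][1])  # smallest Davies-Bouldin value
print("k chosen by Calinski-Harabasz:", best_ch)
print("k chosen by Davies-Bouldin:", best_db)
```

The two indices pull in opposite directions: the Calinski and Harabasz value is maximized and the Davies and Bouldin value is minimized, so the selection step differs for each. On data this cleanly separated both are expected to recover the generating number of clusters; the error types studied in the paper (outliers, perturbation, noise dimensions) are exactly the conditions under which such agreement may break down.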

Keywords

Data Mining, Cluster Analysis, Nonhierarchical Clustering, K-means, Cluster Validity Index
