멀티미디어 데이터베이스에서 최근접 질의의 성능평가에 관한 분석모델

서지반출

기타언어초록

다차원 색인 트리를 이용하는 최근접 질의는 멀티미디어 데이타베이스템에서 자주 사용되는 중요한 질의 형식으로서 그 성능 분석은 질의 성능 개선을 위해 중요하다. 지금까지 다차원 트리에서의 질의 성능 분석 모델에 관한 대부분의 연구는 R 트리와 같은 특정 트리에서의 범위 질의에 대한 분석만을 주로 다루고 있었으나 최근 다차원 색인 트리에서 최근접 질의에 대한 성능 모델 1 이 발표되었다. 그러나 이 모델은 1-최근접 질의만을 다루고 있다. 본 논문에서는 다차원 색인 트리에서 일반적인 k-최근접 질의의 성능 분석 모델을 제시한다. 이 모델은 다차원 색인 트리의 종류나 k-최근접 질의 처리 알고리즘의 종류에 관계 없이 적용되는 모델이다. 모델의 기본 개념으로서 지역 평균 볼륨과 가변 밀도 함수의 개념을 소개한다. 본 모델의 이점은 다음과 같다: 임의의 데이타 분포를 가진 데이타 집합에 대해서도 적용할 수 있고, 1-최근접 질의뿐 아니라 k-최근접 질의에서도 잘 적용되며, 색인 트리에 저장된 데이타로 곧바로 분석하므로 시간이 많이 소요되는 시뮬레이션 없이 빠르게 분석할 수 있다. 본 모델의 정확성을 평가하기 위해서 여러 가지 분포의 데이타 집합에 관하여 실험하였다. 실험 결과는 저차원 또는 중차원 데이타 집합에 대하여 데이타의 분포에 관계없이 정확한 결과를 보여주고 있다.Abstract The k-nearest neighbor query in multidimensional index tree is one of the most frequently used query types in multimedia databases. It is important to analyze the performance of the k-nearest neighbor query for its performance improvement. Until now, most of the analytic models are restricted to a particular type of the index tree, for example, the R-Tree and they concentrate on the analysis of the range query. Recently, a cost model 1 was reported for nearest neighbor queries. However, the model considered only 1-nearest neighbor queries rather than k-nearest neighbor queries. In this paper, we present an analytic model for the performance of the k-nearest neighbor query in multidimensional index trees. This model is independent of kinds of multi-dimensional index trees and k-nearest neighbor algorithms. As a basis of the model, we introduce the concept of the regional average volume and the varying density function. The advantages of our model are in particular as follows: It is applicable to any type of datasets with arbitrary distributions (uniform and non-uniform ones), works for the k- as well as 1-nearest neighbor query, and is a dynamic analysis method which enables a rapid analysis without requiring a time-consuming simulation of data. To estimate the accuracy of our model, we conducted a various range of experiments on the datasets with various distributions. The results show that our analytic model is accurate for the data sets with non-uniform distributions as well as uniform distributions in low and mid dimensions.

다운URL