A Subsequence Matching Algorithm Supporting Moving Average Transform of Arbitrary Order in Time-Series Databases


Other-language abstract

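In this paper, we propose a subsequence matching algorithm that supports the moving average transform of arbitrary order in time-series databases. By reducing the effect of noise in time-series data, the moving average transform helps reveal the overall trend of the data and has been widely used in fields such as econometrics. The choice of moving average order varies, because the degree of noise reduction and the period over which trends are examined depend on the application and on the characteristics of the time-series data being analyzed. The proposed matching algorithm extends an existing subsequence matching algorithm to support the moving average transform of arbitrary order. If the existing subsequence matching algorithm were applied without extension, one index would have to be built for each moving average order. Supporting arbitrary moving average orders would therefore impose a severe burden on storage space and on the insertion and deletion of data sequences. We propose a method that uses only the index built for a single moving average order $k$ to search for an order $m$ ($\leq k$) for which no index has been built. We prove that the proposed search method causes no false dismissal, that is, it never fails to find a subsequence that should be returned in the query result. The proposed algorithm can also use indexes built for more than one moving average order, which improves search performance. Experiments measuring average search performance show that the proposed algorithm outperforms sequential scan by a factor of up to about 2.7. Because its search performance improves as the selectivity of the search result decreases, we judge it to be well suited to typical database applications. The proposed search technique can be applied to a variety of applications that require the moving average transform, such as retrieving stock price data with similar trends, forecasting sales of a particular product, and forecasting weather through temperature data analysis.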

Other-language abstract

In this paper, we propose a subsequence matching algorithm that supports the moving average transform of arbitrary order in time-series databases. The moving average transform reduces the effect of noise and has been used in many areas such as econometrics, since it is useful in finding the overall trends in time-series data. The moving average order to be used varies, since users want to control the degree of noise reduction and the frequency of analysis depending on the application and the characteristics of the data sequences. The proposed matching algorithm supports the moving average transform of arbitrary order by extending the existing subsequence matching algorithm. If we applied the existing subsequence matching algorithm without any extension, we would have to generate one index for each moving average order. Thus, supporting an arbitrary moving average order would cause serious overhead on storage space and on the insertion and deletion of data sequences. The proposed algorithm uses only one index, built for a preselected moving average order $k$, and performs subsequence matching for an arbitrary order $m$ ($\leq k$). We prove that the proposed algorithm causes no false dismissal, i.e., it does not miss any part of the final search result. The proposed algorithm can also use more than one index to improve search performance. We have evaluated the performance of the proposed algorithm through experiments. The results show that the proposed algorithm outperforms the sequential scan algorithm by up to 2.7 times on average. Since the proposed subsequence matching algorithm works better with smaller selectivities, it is suitable for practical applications. The proposed algorithm can be applied in a variety of areas that use the moving average transform, including finding stock items with similar price trends, estimating sales of a product, and forecasting weather through temperature data analysis.
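
To make the terms in the abstract concrete, the following is a minimal Python sketch of the moving average transform and of the sequential scan baseline that the experiments compare against. It is not the proposed index-based algorithm, whose details the abstract does not give. The function names (`moving_average`, `euclidean`, `sequential_scan`), the use of Euclidean distance, and the tolerance parameter `eps` are assumptions made for illustration only.

```python
from math import sqrt


def moving_average(seq, order):
    """Order-`order` moving average: the mean of every window of
    `order` consecutive values, computed with a running sum."""
    if order < 1 or order > len(seq):
        raise ValueError("order must be between 1 and len(seq)")
    window_sum = sum(seq[:order])
    out = [window_sum / order]
    for i in range(order, len(seq)):
        # Slide the window one step: add the new value, drop the oldest.
        window_sum += seq[i] - seq[i - order]
        out.append(window_sum / order)
    return out


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequential_scan(data, query, order, eps):
    """Return every offset i such that the moving average of
    data[i:i+len(query)] is within eps of the moving average of query.
    This scans all offsets; no index is used."""
    n = len(query)
    if len(data) < n:
        return []
    q_ma = moving_average(query, order)
    # The moving average of the window data[i:i+n] equals the slice
    # data_ma[i:i+width] of the moving average of the whole sequence.
    data_ma = moving_average(data, order)
    width = n - order + 1
    return [
        i
        for i in range(len(data) - n + 1)
        if euclidean(data_ma[i:i + width], q_ma) <= eps
    ]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    query = [3, 4, 5]
    print(sequential_scan(data, query, order=2, eps=1.5))
```

Because the scan transforms the data for whatever order is passed in, it handles any order without an index, which is why it serves as the comparison baseline. Its cost grows with the length of the data sequence regardless of how few subsequences match.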
