An Evolutionary Learning Method for Hypernetwork Models Based on Sequential Bayesian Sampling for High-Dimensional Data Classification

Abstract

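This study presents a learning algorithm for hypernetwork models that uses an evolutionary computation technique based on sequential Bayesian sampling to classify high-dimensional data. In the proposed method, learning proceeds so as to maximize the posterior distribution of the model's conditional probability. To this end, the prior distribution is defined in terms of problem-related prior knowledge and model complexity, the measured classification performance of the model is used as the likelihood, and the model's fitness is defined from the prior and the likelihood. As a result, a hypernetwork model can learn high-dimensional data efficiently, and its training time and classification performance can improve. In addition, the composition of hyperedges and the size of the model, which previously had to be supplied as parameters at training time, can be determined adaptively during learning. To validate the proposed learning method, this paper applies the model to a classification problem on a gene expression dataset of approximately 25,000 genes. The experimental results confirm that the proposed method achieves better classification performance than the existing hypernetwork learning method as well as other models. We also analyze, through a variety of experiments, how the prior knowledge used as the prior distribution affects model learning.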

Abstract (other language)

We propose an evolutionary method for learning hypernetworks from high-dimensional data, based on sequential Bayesian sampling. The method trains a hypernetwork to maximize the conditional posterior distribution by defining model fitness as the posterior. For efficient learning, the prior distribution is defined as a function of both prior knowledge about the problem and model complexity. The likelihood is estimated by measuring the model's discriminative capability on the given data. As a result, hypernetworks can learn very high-dimensional data efficiently, with improved classification performance and learning time. Moreover, important parameters in learning hypernetworks, such as hyperedge degree and hyperedge composition, are determined adaptively by the fitness. For evaluation, we apply the proposed method to classifying two types of prostate cancer from an expression dataset of approximately 25,000 genes. Experimental results show that a hypernetwork trained with the proposed method outperforms conventional hypernetworks as well as other classification models. Through a range of experiments, we also show how the information used as the prior distribution influences model learning.
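
The fitness described above combines a prior (prior knowledge plus model complexity) with a likelihood (measured classification performance). The following Python sketch illustrates one way such a fitness could be computed as an unnormalized log-posterior. It is not the paper's actual formulation: the function names, the per-gene `relevance` scores, the linear `size_penalty`, and the representation of a hyperedge as a tuple of gene indices are all assumptions made for illustration.

```python
import math

def log_prior(hyperedges, relevance, size_penalty=0.05):
    """Hypothetical log-prior: prior knowledge plus a model-complexity penalty."""
    genes = [g for edge in hyperedges for g in edge]
    if not genes:
        return float("-inf")  # an empty model is given zero prior probability
    # Prior-knowledge term (assumed): mean log relevance of the genes used.
    knowledge = sum(math.log(relevance[g]) for g in genes) / len(genes)
    # Complexity term (assumed): more hyperedge vertices lower the prior.
    complexity = -size_penalty * len(genes)
    return knowledge + complexity

def log_likelihood(accuracy, eps=1e-9):
    """Use measured classification accuracy as the likelihood (assumed form)."""
    return math.log(max(accuracy, eps))

def fitness(hyperedges, accuracy, relevance, size_penalty=0.05):
    """Unnormalized log-posterior: log-likelihood + log-prior."""
    return log_likelihood(accuracy) + log_prior(hyperedges, relevance, size_penalty)

# Illustrative relevance scores in (0, 1]; these are made-up values, not data.
relevance = {0: 0.9, 1: 0.8, 2: 0.1, 3: 0.1}

small_relevant = [(0, 1)]                       # one hyperedge, relevant genes
large_irrelevant = [(0, 1), (2, 3), (2, 3, 0)]  # larger, mostly irrelevant genes

f_small = fitness(small_relevant, 0.90, relevance)
f_large = fitness(large_irrelevant, 0.90, relevance)
print(f_small > f_large)
```

At equal accuracy, the smaller model built from genes with higher assumed relevance receives the higher fitness, which is the kind of pressure that would let hyperedge composition and model size be selected during evolution rather than fixed beforehand.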

Keywords

hypernetwork, high-dimensional data classification, hypergraph, sequential Bayesian sampling