임무수행을 위한 개선된 강화학습 방법

임무수행을 위한 개선된 강화학습 방법

ㆍ 저자명: 권우영,이상훈,서일홍
ㆍ 간행물명: 전기학회논문지. The transactions of the Korean Institute of Electrical Engineers. D / D, 시스템 및 제어부문
ㆍ 권/호정보: 2003년|52권 9호|pp.533-539 (7 pages)
ㆍ 발행정보: 대한전기학회
ㆍ 파일정보: 정기간행물|
PDF텍스트
ㆍ 주제분야: 기타

이 논문은 한국과학기술정보연구원과 논문 연계를 통해 무료로 제공되는 원문입니다.

서지반출

기타언어초록

Reinforcement learning (RL) has been widely used as a learning mechanism of an artificial life system. However, RL usually suffers from slow convergence to the optimum state-action sequence or a sequence of stimulus-response (SR) behaviors, and may not correctly work in non-Markov processes. In this paper, first, to cope with slow-convergence problem, if some state-action pairs are considered as disturbance for optimum sequence, then they no to be eliminated in long-term memory (LTM), where such disturbances are found by a shortest path-finding algorithm. This process is shown to let the system get an enhanced learning speed. Second, to partly solve a non-Markov problem, if a stimulus is frequently met in a searching-process, then the stimulus will be classified as a sequential percept for a non-Markov hidden state. And thus, a correct behavior for a non-Markov hidden state can be learned as in a Markov environment. To show the validity of our proposed learning technologies, several simulation result j will be illustrated.

키워드

reinforcement learning delayed reward markov process batch process

다운URL