Model-Based Reinforcement Learning with Discriminative Loss

Guang Jin (진 광); Yohwan Noh (노요환); DoHoon Lee (이도훈)

Journal of KIISE (정보과학회논문지)

Korean Title	차별적 손실을 이용한 모델기반 강화학습
English Title	Model-Based Reinforcement Learning with Discriminative Loss
Author	Guang Jin (진 광), Yohwan Noh (노요환), DoHoon Lee (이도훈)
Citation	Vol. 47, No. 6, pp. 547-552 (June 2020)
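Korean Abstract (translated)	Reinforcement learning has shown good results on a variety of difficult problems. However, sample efficiency remains a major obstacle to applying it to real-world problems. This paper proposes a model-based reinforcement learning framework that uses a discriminative loss function. The method trains the model so that it can distinguish between different actions. We found that the features extracted by an encoder pre-trained in this framework agree with the features extracted by the policy gradient method. In the Atari game environment, the proposed method showed higher sample efficiency than existing model-based reinforcement learning methods, and in the early stage of training in particular it was more efficient than the baseline as well.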
English Abstract	Reinforcement learning is a framework for training an agent to make a good sequence of decisions by interacting with a complex environment. Although reinforcement learning has shown promising results in many tasks, sample efficiency remains a major challenge for its real-world application. We propose a novel model-based reinforcement learning framework that incorporates a discriminative loss function, under which models are trained to discriminate one action from another. The encoder pre-trained in this framework exhibits a feature alignment property: the features it extracts are consistent with those extracted by the policy gradient method. The proposed method showed better sample efficiency than conventional model-based reinforcement learning approaches in the Atari game environment. In the early stage of training, the proposed method surpassed the baseline by a large margin.
Keyword	reinforcement learning, model-based RL, discriminative loss function, action discrimination