ViStoryNet: 비디오 스토리 재현을 위한 연속 이벤트 임베딩 및 BiLSTM 기반 신경망

허민오; 김경민; 장병탁; Min-Oh Heo; Kyung-Min Kim; Byoung-Tak Zhang

연구문헌

국내 논문지

홈 > 연구문헌 > 국내 논문지 > 한국정보과학회 논문지 > 정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices)

정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices)

Current Result Document :

한글제목(Korean Title)	ViStoryNet: 비디오 스토리 재현을 위한 연속 이벤트 임베딩 및 BiLSTM 기반 신경망
영문제목(English Title)	ViStoryNet: Neural Networks with Successive Event Order Embedding and BiLSTMs for Video Story Regeneration
저자(Author)	허민오 김경민 장병탁 Min-Oh Heo Kyung-Min Kim Byoung-Tak Zhang
원문수록처(Citation)	VOL 24 NO. 03 PP. 0138 ~ 0144 (2018. 03)
한글내용 (Korean Abstract)	본 고에서는 비디오로부터 coherent story를 학습하여 비디오 스토리를 재현할 수 있는 스토리 학습/재현 프레임워크를 제안한다. 이를 위해 연속 이벤트 순서를 감독학습 정보로 사용함으로써 각 에피소드들이 은닉 공간 상에서 궤적 형태를 가지도록 유도하여, 순서정보와 의미정보를 함께 다룰 수 있는 복합된 표현 공간을 구축하고자 한다. 이를 위해 유아용 비디오 시리즈를 학습데이터로 활용하였다. 이는 이야기 구성의 특성, 내러티브 순서, 복잡도 면에서 여러 장점이 있다. 여기에 연속 이벤트 임베딩을 반영한 인코더-디코더 구조를 구축하고, 은닉 공간 상의 시퀀스의 모델링에 양방향 LSTM을 학습시키되 여러 스텝의 서열 데이터 생성을 고려하였다. ‘뽀롱뽀롱 뽀로로’ 시리즈 비디오로부터 추출된 약 200 개의 에피소드를 이용하여 실험결과를 보였다. 실험을 통해 에피소드들이 은닉공간에서 궤적 형태를 갖는 것과 일부 큐가 주어졌을 때 스토리를 재현하는 문제에 적용할 수 있음을 보였다.
영문내용 (English Abstract)	A video is a vivid medium similar to human’s visual-linguistic experiences, since it can inculcate a sequence of situations, actions or dialogues that can be told as a story. In this study, we propose story learning/regeneration frameworks from videos with successive event order supervision for contextual coherence. The supervision induces each episode to have a form of trajectory in the latent space, which constructs a composite representation of ordering and semantics. In this study, we incorporated the use of kids videos as a training data. Some of the advantages associated with the kids videos include omnibus style, simple/explicit storyline in short, chronological narrative order, and relatively limited number of characters and spatial environments. We build the encoder-decoder structure with successive event order embedding, and train bi-directional LSTMs as sequence models considering multi-step sequence prediction. Using a series of approximately 200 episodes of kids videos named ‘Pororo the Little Penguin’, we give empirical results for story regeneration tasks and SEOE. In addition, each episode shows a trajectory-like shape on the latent space of the model, which gives the geometric information for the sequence models.
키워드(Keyword)	비디오 스토리 학습 비디오 스토리 재현 연속 이벤트 임베딩 유아용 비디오 데이터집합 video story learning video story regeneration successive event order embedding kids video dataset
파일첨부	PDF 다운로드