정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices)
Korean Title |
ViStoryNet: 비디오 스토리 재현을 위한 연속 이벤트 임베딩 및 BiLSTM 기반 신경망 |
English Title |
ViStoryNet: Neural Networks with Successive Event Order Embedding and BiLSTMs for Video Story Regeneration |
Author |
허민오
김경민
장병탁
Min-Oh Heo
Kyung-Min Kim
Byoung-Tak Zhang
|
Citation |
Vol. 24, No. 3, pp. 138-144 (March 2018) |
Korean Abstract (translated) |
This paper proposes a story learning/regeneration framework that learns a coherent story from videos and can regenerate video stories. To this end, successive event order is used as supervision so that each episode is induced to form a trajectory in the latent space, with the aim of building a composite representation space that handles ordering information and semantic information together. A kids video series was used as training data, which offers several advantages in terms of story composition, narrative order, and complexity. We build an encoder-decoder structure that incorporates successive event embedding, and train bidirectional LSTMs to model sequences in the latent space while taking multi-step sequence generation into account. We present experimental results using approximately 200 episodes extracted from the 'Pororo the Little Penguin' (뽀롱뽀롱 뽀로로) video series. The experiments show that episodes form trajectories in the latent space and that the approach can be applied to regenerating a story when partial cues are given.
|
English Abstract |
A video is a vivid medium that resembles human visual-linguistic experience, since it can convey a sequence of situations, actions, or dialogues that can be told as a story. In this study, we propose a story learning/regeneration framework for videos that uses successive event order as supervision for contextual coherence. The supervision induces each episode to form a trajectory in the latent space, which yields a composite representation of ordering and semantics. We use kids videos as training data. Their advantages include an omnibus style, short and simple/explicit storylines, chronological narrative order, and a relatively limited number of characters and spatial environments. We build an encoder-decoder structure with successive event order embedding (SEOE) and train bi-directional LSTMs as sequence models, considering multi-step sequence prediction. Using approximately 200 episodes of the kids video series 'Pororo the Little Penguin', we give empirical results for story regeneration tasks and SEOE. In addition, each episode shows a trajectory-like shape in the latent space of the model, which provides geometric information for the sequence models.
|
Keyword |
비디오 스토리 학습
비디오 스토리 재현
연속 이벤트 임베딩
유아용 비디오 데이터집합
video story learning
video story regeneration
successive event order embedding
kids video dataset
|