
Journal of KIISE (정보과학회논문지)


Korean Title: 영상 기반 대화를 위한 모듈 신경망 학습
English Title: Neural Module Network Learning for Visual Dialog
Authors: 조영수 (Yeongsu Cho), 김인철 (Incheol Kim)
Citation: Vol. 46, No. 12, pp. 1304-1313 (Dec. 2019)
Korean Abstract (translated)
In this paper, we propose a new neural module network model for visual dialog. Visual dialog poses several difficult challenges. The first is the visual grounding problem: determining which objects in the given input image should be associated with the entities mentioned in a natural-language question. The second is the visual co-reference resolution problem: determining which entity from a past question or answer a noun phrase or pronoun in a new question refers to, and ultimately which object in the input image it denotes. To address these problems, this paper proposes a new visual dialog model that uses question-customized neural module networks and a reference pool. The proposed model includes not only a new Compare module for answering comparison questions effectively, but also a new Find module whose performance is improved by a dual attention mechanism, and a Refer module that resolves visual co-references using the reference pool. To evaluate the proposed model, we conducted various experiments on the large benchmark datasets VisDial v0.9 and VisDial v1.0. Through these experiments, we confirmed that the proposed model outperforms existing state-of-the-art visual dialog models.
English Abstract
In this paper, we propose a novel neural module network (NMN) model for visual dialog. Visual dialog currently poses several challenges. The first is visual grounding, which concerns how to associate the entities mentioned in the natural-language question with the visual objects in the given image. The other is visual co-reference resolution, which involves determining which words, typically noun phrases and pronouns, co-refer to the same visual object in a given image. To address these issues, we suggest a new visual dialog model using both question-customized neural module networks and a reference pool. The proposed model includes not only a new Compare module to answer questions that require comparing properties of two visual objects, but also a novel Find module improved by a dual attention mechanism, and a Refer module that resolves visual co-references with the reference pool. To evaluate the performance of the proposed model, we conduct various experiments on two large benchmark datasets, VisDial v0.9 and VisDial v1.0. The results show that the proposed model outperforms the state-of-the-art models for visual dialog.
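The abstract describes composing question-specific modules (Find, Refer, Compare) over image object features, with a reference pool caching earlier attention results for later co-reference resolution. The following is a minimal toy sketch of that composition idea only; it is not the authors' implementation, and all names, dimensions, and the single-attention Find (the paper's dual attention is omitted) are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def find(obj_feats, query):
    """Find module (simplified): attention over detected objects given a query embedding."""
    return softmax(obj_feats @ query)

def refer(pool, mention):
    """Refer module (simplified): reuse a cached attention map from the reference pool."""
    return pool[mention]

def compare(att_a, att_b, obj_feats, w):
    """Compare module (simplified): score the difference of two attended representations."""
    v_a = att_a @ obj_feats  # attention-weighted object feature
    v_b = att_b @ obj_feats
    return float(w @ (v_a - v_b))

rng = np.random.default_rng(0)
obj_feats = rng.normal(size=(5, 8))  # 5 detected objects, 8-dim features (hypothetical)
q = rng.normal(size=8)               # query embedding for a mention, e.g. "the dog"
w = rng.normal(size=8)               # hypothetical comparison weight vector

pool = {}                            # reference pool: mention -> attention map
att = find(obj_feats, q)
pool["the dog"] = att                # cache for later dialog rounds
att_again = refer(pool, "the dog")   # a later pronoun ("it") resolves to the same map
score = compare(att, att_again, obj_feats, w)
```

The point of the sketch is the control flow: a question is answered by chaining modules, and Refer avoids re-grounding a mention by looking it up in the pool populated by an earlier Find.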
Keywords: visual dialog, neural module network, visual grounding, visual co-reference resolution, deep neural network