KIPS Transactions on Software and Data Engineering
Korean Title |
KG_VCR: 지식 그래프를 이용하는 영상 기반 상식 추론 모델 |
English Title |
KG_VCR: A Visual Commonsense Reasoning Model Using Knowledge Graph |
Author |
JaeYun Lee
Incheol Kim
이재윤
김인철
|
Citation |
Vol. 9, No. 3, pp. 91-100 (2020. 03) |
Korean Abstract |
Unlike existing visual question answering (VQA) problems, the new visual commonsense reasoning (VCR) problems require additional deep commonsense reasoning, such as identifying the relationships between objects in an image and presenting the rationale for an answer. This paper proposes KG_VCR, a new deep neural network model for visual commonsense reasoning problems. The KG_VCR model uses not only the relations between objects and the contextual information extracted from the input data (image, natural language question, response list, etc.), but also commonsense embeddings obtained from ConceptNet, an external knowledge base. In particular, the proposed model adopts a graph convolutional network (GCN) module to effectively embed the related knowledge graph retrieved from ConceptNet. Through various experiments on the VCR benchmark dataset, this paper demonstrates that the proposed KG_VCR model achieves higher performance than the existing state-of-the-art VQA model and the R2C VCR model.
|
English Abstract |
Unlike existing Visual Question Answering (VQA) problems, the new Visual Commonsense Reasoning (VCR) problems require deep commonsense reasoning to answer questions, such as recognizing the specific relationship between two objects in an image and presenting the rationale for the answer. In this paper, we propose a novel deep neural network model, KG_VCR, for VCR problems. In addition to making use of the visual relations and contextual information between objects extracted from the input data (images, natural language questions, and response lists), KG_VCR also utilizes commonsense knowledge embeddings extracted from an external knowledge base, ConceptNet. Specifically, the proposed model employs a Graph Convolutional Neural Network (GCN) module to obtain commonsense knowledge embeddings from the retrieved ConceptNet knowledge graph. Through a series of experiments on the VCR benchmark dataset, we show that the proposed KG_VCR model outperforms both the state-of-the-art (SOTA) VQA model and the R2C VCR model.
|
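The abstract says a GCN module embeds the knowledge graph retrieved from ConceptNet but does not give the formulation. As a rough illustration only, the sketch below applies one graph-convolution step using the widely used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by mean pooling into a single graph vector. The propagation rule, the pooling step, the three-node example graph, and all feature and weight values are assumptions made for this example, not details taken from the paper.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    a_norm = normalize_adjacency(adj)
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def mean_pool(node_embeddings):
    """Average node embeddings into one graph-level vector (assumed readout)."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


# Hypothetical 3-node subgraph (e.g. "dog" - "pet" - "animal"), made up here.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy initial node features
weights = [[0.5, -0.5], [0.25, 0.75]]            # toy layer weights

node_emb = gcn_layer(adj, features, weights)
graph_emb = mean_pool(node_emb)
print(graph_emb)
```

In a trained model the weights would be learned and the node features would typically come from pretrained word or concept embeddings; the fixed values here only make the data flow visible.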
Keyword |
영상 기반 상식 추론
심층 신경망
그래프 합성곱 신경망
지식 그래프 임베딩
Visual Commonsense Reasoning
Deep Neural Network
Graph Convolutional Network
Knowledge Graph Embedding
|