정보과학회논문지 (Journal of KIISE)
Korean Title |
품사 정보를 활용한 이미지 캡션 생성 (Image Caption Generation Using Part-of-Speech Information) |
English Title |
Boosting Image Caption Generation with Parts of Speech |
Authors |
강필구
임유빈
김형주
Philgoo Kang
Yubin Lim
Hyoungjoo Kim
|
Citation |
Vol. 48, No. 3, pp. 317-324 (March 2021) |
Korean Abstract |
As reliance on smart devices and AI in daily life grows, image caption generation, which can be applied in fields such as assistance for the visually impaired and human-computer interaction, is becoming increasingly important. In this paper, we propose a new technique that extracts linguistic part-of-speech (POS) information, such as nouns and verbs, from an image and uses it to improve caption generation. The proposed model trains multiple CNN encoders, one per part of speech, to extract POS-specific feature vectors, then feeds the extracted POS vectors into an LSTM to generate captions. The model is evaluated on the Flickr30k and MS-COCO datasets, and two human surveys are conducted to verify the practical validity of the results.
|
English Abstract |
As smart devices and AI become integrated into daily life, the ability to generate image captions is becoming increasingly important in fields such as assistance for visually impaired individuals and human-computer interaction. In this paper, we propose a novel approach that extracts part-of-speech (POS) information, such as nouns and verbs, from an image to improve image caption generation. The proposed model uses multiple CNN encoders, each trained to identify features related to a specific POS, and feeds the extracted features into an LSTM decoder to generate captions. We conducted experiments on the Flickr30k and MS-COCO datasets using several text metrics, along with additional human surveys, to validate the practical effectiveness of the proposed model.
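The data flow described in the abstract (one encoder per part of speech, with the resulting feature vectors fed to an LSTM decoder) can be illustrated with a toy sketch. The Python below is not the authors' implementation. It makes several assumptions that the abstract does not state: each CNN encoder is replaced by a single linear map with ReLU, the POS feature vectors are concatenated and supplied to the LSTM at every step together with the previous token, biases are omitted, decoding is greedy, and all names, sizes, and the vocabulary (`ToyPosEncoder`, `ToyLstmDecoder`, `VOCAB`, `IMG_DIM`, and so on) are hypothetical. The weights are random and untrained, so the generated words are meaningless; only the wiring is of interest.

```python
import math
import random

POS_TAGS = ("noun", "verb")  # the abstract names nouns and verbs as examples
VOCAB = ("<start>", "<end>", "a", "dog", "runs")  # hypothetical toy vocabulary
IMG_DIM, FEAT_DIM, HID_DIM = 8, 4, 6  # hypothetical sizes


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyPosEncoder:
    """Stand-in for one POS-specific CNN encoder (assumption: linear map + ReLU)."""

    def __init__(self, rng):
        self.weights = rand_matrix(FEAT_DIM, IMG_DIM, rng)

    def __call__(self, image):
        return [max(0.0, v) for v in matvec(self.weights, image)]


class ToyLstmDecoder:
    """Single LSTM cell (biases omitted) with a linear output layer over VOCAB."""

    def __init__(self, input_dim, rng):
        # One weight matrix per gate, applied to [input ; previous hidden state].
        self.gates = {g: rand_matrix(HID_DIM, input_dim + HID_DIM, rng) for g in "ifog"}
        self.out = rand_matrix(len(VOCAB), HID_DIM, rng)

    def step(self, x, h, c):
        xh = x + h  # list concatenation
        i = [sigmoid(v) for v in matvec(self.gates["i"], xh)]    # input gate
        f = [sigmoid(v) for v in matvec(self.gates["f"], xh)]    # forget gate
        o = [sigmoid(v) for v in matvec(self.gates["o"], xh)]    # output gate
        g = [math.tanh(v) for v in matvec(self.gates["g"], xh)]  # candidate cell
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def scores(self, h):
        return matvec(self.out, h)


def generate_caption(image, encoders, decoder, max_len=5):
    # Encode the image once per POS and concatenate the feature vectors.
    pos_features = [v for tag in POS_TAGS for v in encoders[tag](image)]
    h, c = [0.0] * HID_DIM, [0.0] * HID_DIM
    token = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        one_hot = [1.0 if k == token else 0.0 for k in range(len(VOCAB))]
        h, c = decoder.step(pos_features + one_hot, h, c)
        scores = decoder.scores(h)
        token = max(range(len(VOCAB)), key=scores.__getitem__)  # greedy choice
        if VOCAB[token] == "<end>":
            break
        caption.append(VOCAB[token])
    return caption


rng = random.Random(0)
encoders = {tag: ToyPosEncoder(rng) for tag in POS_TAGS}
decoder = ToyLstmDecoder(len(POS_TAGS) * FEAT_DIM + len(VOCAB), rng)
image = [rng.random() for _ in range(IMG_DIM)]  # stand-in for image pixels
caption = generate_caption(image, encoders, decoder)
print(caption)
```

In a real system each encoder would be a trained convolutional network and the decoder would be trained on captioned images. How the paper actually combines the POS features with the LSTM input is not specified in the abstract, so the concatenation used here is only one plausible choice.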
|
Keywords |
image caption generation
encoder-decoder architecture
parts of speech
computer vision
|