Journal of KIISE

Korean Title (English translation): Image Caption Generation Using Part-of-Speech Information
English Title: Boosting Image Caption Generation with Parts of Speech
Authors: Philgoo Kang, Yubin Lim, Hyoungjoo Kim
Citation: Vol. 48, No. 3, pp. 317-324 (March 2021)
Korean Abstract (English translation)
As reliance on smart devices and AI in daily life grows, image caption generation, which can be applied in many fields such as assisting visually impaired people and human-computer interaction, is becoming increasingly important. In this paper, we propose a new technique that extracts part-of-speech (POS) information, such as nouns and verbs, from images and uses it to improve caption generation. The proposed model trains multiple CNN encoders, one per part of speech, to extract POS-specific feature vectors, and then feeds these vectors into an LSTM to generate captions. We evaluate the proposed model on the Flickr30k and MS-COCO datasets and conduct two human surveys to verify the practical effectiveness of the results.
English Abstract
With smart devices and AI increasingly integrated into daily life, the ability to generate image captions is becoming important in fields such as guidance for visually impaired individuals and human-computer interaction. In this paper, we propose a novel approach that enhances image caption generation using parts of speech (POS), such as nouns and verbs, extracted from images. The proposed model uses multiple CNN encoders, each trained to identify features related to a specific part of speech, and feeds these features into an LSTM decoder to generate image captions. We evaluated the model on the Flickr30k and MS-COCO datasets using several text metrics, and conducted additional human surveys to validate its practical effectiveness.
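The abstract describes the architecture only at a high level. What follows is a minimal, hypothetical sketch of that data flow in plain Python, not the authors' implementation. Each POS-specific CNN encoder is replaced by one random linear layer over a pre-extracted image feature vector (the names `PosEncoder`, `PosCaptioner` and `fuse` are illustrative). The noun and verb vectors are concatenated, and the fused vector is fed to the LSTM once to condition its state before greedy word-by-word decoding. The fusion by concatenation, the conditioning scheme, all dimensions and the toy vocabulary are assumptions, not details taken from the paper.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class PosEncoder:
    """Hypothetical stand-in for one POS-specific CNN encoder (e.g. nouns or verbs).

    Maps a pre-extracted image feature vector to a POS feature vector with one
    random linear layer plus tanh; a real encoder would be a trained CNN."""

    def __init__(self, in_dim, out_dim, rng):
        self.W = rand_matrix(out_dim, in_dim, rng)

    def __call__(self, image_feat):
        return [math.tanh(v) for v in matvec(self.W, image_feat)]

class LSTMCell:
    """Single LSTM cell; gate pre-activations are stacked as [input, forget, cell, output]."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid = hid_dim
        self.W = rand_matrix(4 * hid_dim, in_dim + hid_dim, rng)
        self.b = [0.0] * (4 * hid_dim)

    def __call__(self, x, h, c):
        z = [v + b for v, b in zip(matvec(self.W, x + h), self.b)]
        H = self.hid
        i = [sigmoid(v) for v in z[:H]]
        f = [sigmoid(v) for v in z[H:2 * H]]
        g = [math.tanh(v) for v in z[2 * H:3 * H]]
        o = [sigmoid(v) for v in z[3 * H:]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

class PosCaptioner:
    """Hypothetical POS-feature captioner: several POS encoders feed one LSTM decoder."""

    def __init__(self, vocab, feat_dim=16, pos_dim=4, hid_dim=8,
                 pos_tags=("noun", "verb"), seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.pos_tags = pos_tags
        self.encoders = {tag: PosEncoder(feat_dim, pos_dim, rng) for tag in pos_tags}
        in_dim = pos_dim * len(pos_tags)                    # size of the fused POS vector
        self.embed = rand_matrix(len(vocab), in_dim, rng)   # word embeddings, same size
        self.lstm = LSTMCell(in_dim, hid_dim, rng)
        self.W_out = rand_matrix(len(vocab), hid_dim, rng)  # hidden state -> word scores

    def fuse(self, image_feat):
        """Concatenate the per-POS feature vectors in a fixed tag order (assumption)."""
        fused = []
        for tag in self.pos_tags:
            fused += self.encoders[tag](image_feat)
        return fused

    def generate(self, image_feat, max_len=10):
        h = [0.0] * self.lstm.hid
        c = [0.0] * self.lstm.hid
        # Feed the fused POS features first to condition the decoder state (assumption).
        h, c = self.lstm(self.fuse(image_feat), h, c)
        words, token = [], self.vocab.index("<start>")
        for _ in range(max_len):
            h, c = self.lstm(self.embed[token], h, c)
            scores = matvec(self.W_out, h)
            token = max(range(len(scores)), key=scores.__getitem__)  # greedy choice
            if self.vocab[token] == "<end>":
                break
            words.append(self.vocab[token])
        return words

vocab = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]
model = PosCaptioner(vocab)
rng = random.Random(1)
image_feat = [rng.uniform(0.0, 1.0) for _ in range(16)]  # placeholder image features
caption = model.generate(image_feat)
print(caption)  # arbitrary tokens: the weights are untrained
```

Because every weight is random and untrained, the printed tokens carry no meaning; the sketch only shows how separate POS feature vectors could be combined into a single decoder input. Keeping one encoder per part of speech means each can be trained or swapped independently, which is the intuition the abstract describes.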
Keywords: image caption generation, encoder-decoder architecture, parts of speech, computer vision