• Àüü
  • ÀüÀÚ/Àü±â
  • Åë½Å
  • ÄÄÇ»ÅÍ
´Ý±â

»çÀÌÆ®¸Ê

Loading..

Please wait....

±¹³» ³í¹®Áö

Ȩ Ȩ > ¿¬±¸¹®Çå > ±¹³» ³í¹®Áö > Çѱ¹Á¤º¸°úÇÐȸ ³í¹®Áö > Á¤º¸°úÇÐȸ³í¹®Áö (Journal of KIISE)

Á¤º¸°úÇÐȸ³í¹®Áö (Journal of KIISE)

Current Result Document :

ÇѱÛÁ¦¸ñ(Korean Title) ´Éµ¿ ½Ã°¢À» ÀÌ¿ëÇÑ À̹ÌÁö-ÅؽºÆ® ´ÙÁß ¸ð´Þ ü°è ÇнÀ
¿µ¹®Á¦¸ñ(English Title) Active Vision from Image-Text Multimodal System Learning
ÀúÀÚ(Author) ±èÁøÈ­   À庴Ź   Jin-Hwa Kim   Byoung-Tak Zhang  
¿ø¹®¼ö·Ïó(Citation) VOL 43 NO. 07 PP. 0795 ~ 0800 (2016. 07)
Çѱ۳»¿ë
(Korean Abstract)
À̹ÌÁö ºÐ·ù ¹®Á¦´Â Àΰ£ ¼öÁØÀÇ ¼º´ÉÀ» º¸ÀÌÁö¸¸ ÀϹÝÀûÀÎ ÀÎ½Ä ¹®Á¦´Â ¾î·Á¿î Á¡µéÀÌ ³²¾ÆÀÖ´Ù. ½Ç³» ȯ°æÀº ´Ù¾çÇÑ Á¤º¸¸¦ ´ã°í ÀÖ¾î Á¤º¸ ó¸®ÀÇ ¾çÀ» È¿À²ÀûÀ¸·Î ÁÙÀÏ Çʿ伺ÀÌ ÀÖ´Ù. Á¤º¸ÀÇ ¾çÀ» È¿À²ÀûÀ¸·Î ÁÙÀÏ ¼ö ÀÖµµ·Ï ´ë»ó °´Ã¼ÀÇ À§Ä¡ ÃøÁ¤À» À§ÇÑ º¯ºÐ Ãß·Ð, º¯ºÐ º£ÀÌÁö¾È µîÀÇ ¹æ¹ýÀÌ ¼Ò°³µÇ¾úÁö¸¸, ¸ðµç °æ¿ì¿¡ ´ëÇÑ ÁÖº¯(marginal) È®·ü ºÐÆ÷¸¦ ±¸Çϱ⠾î·Æ±â ¶§¹®¿¡ Çö½ÇÀûÀ¸·Î °è»êÇϱ⠾î·Æ´Ù. º» ¿¬±¸¿¡¼­´Â °ø°£ º¯Çü ³×Æ®¿öÅ©(Spatial Transformer Networks)À» ÀÀ¿ëÇÏ¿© ´Éµ¿ ½Ã°¢À» ÀÌ¿ëÇÑ À̹ÌÁö-ÅؽºÆ® ÅëÇÕ ÀÎÁö ü°è¸¦ Á¦¾ÈÇÑ´Ù. ÀÌ Ã¼°è´Â ÁÖ¾îÁø ÅؽºÆ® Á¤º¸¸¦ ¹ÙÅÁÀ¸·Î À̹ÌÁöÀÇ ÀϺθ¦ È¿À²ÀûÀ¸·Î »ùÇøµ Çϵµ·Ï ÇнÀÇÑ´Ù. À̸¦ ÅëÇØ ÀüÅëÀûÀÎ ¹æ¹ýÀ¸·Î ÇØ°áÇϱ⠾î·Á¿î ¹®Á¦¸¦ »ó´çÇÑ °ÝÂ÷·Î ¼º´ÉÀ» Çâ»ó ½Ãų ¼ö ÀÖ´Ù´Â °ÍÀ» º¸ÀδÙ. Á¦¾ÈÇÏ´Â ¸ðµ¨À» ÅëÇØ »ùÇøµ µÈ À̹ÌÁö¸¦ Á¤¼ºÀûÀ¸·Î ºÐ¼®ÇÏ¿© ÀÌ ¸ðµ¨ÀÌ °¡Áö´Â Ư¼ºµµ ÇÔ²² »ìÆ캻´Ù.
¿µ¹®³»¿ë
(English Abstract)
In image classification, recent CNNs compete with human performance. However, there are limitations in more general recognition. Herein we deal with indoor images that contain too much information to be directly processed and require information reduction before recognition. To reduce the amount of data processing, typically variational inference or variational Bayesian methods are suggested for object detection. However, these methods suffer from the difficulty of marginalizing over the given space. In this study, we propose an image-text integrated recognition system using active vision based on Spatial Transformer Networks. The system attempts to efficiently sample a partial region of a given image for a given language information. Our experimental results demonstrate a significant improvement over traditional approaches. We also discuss the results of qualitative analysis of sampled images, model characteristics, and its limitations.
Å°¿öµå(Keyword) ½Ã°¢ ÁÖÀÇ   ´Éµ¿ ½Ã°¢   °´Ã¼ ÀνĠ  µö ·¯´×   visual Attention   active vision   object recognition   deep learning  
ÆÄÀÏ÷ºÎ PDF ´Ù¿î·Îµå