Journal of KIISE (정보과학회논문지)

Korean Title: 비격식 문서 분류 성능 개선을 위한 LDA 단어 분포 기반의 자질 확장
English Title: Feature Expansion based on LDA Word Distribution for Performance Improvement of Informal Document Classification
Authors: Hokyung Lee (이호경), Seon Yang (양선), Youngjoong Ko (고영중)
Citation: Vol. 43, No. 9, pp. 1008-1014 (September 2016)
Abstract
Texts from Twitter, Facebook, and online customer reviews are informal documents: they are written freely, unlike edited text such as newspaper articles, which counts as formal. Finding consistent rules or patterns is harder in informal documents than in formal ones, so informal document analysis needs additional approaches to improve performance. In this study, we classify Twitter data, a representative type of informal document, into ten categories, and we revise and expand the features using the LDA (Latent Dirichlet allocation) word distribution. Based on the top-ranked word features of each topic, the other word features are decomposed and merged, and the set of useful features is expanded iteratively. Document classification with the resulting features improved the micro-averaged F1-score by 7.11 percentage points (7.11%p) compared with the results before feature expansion.
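
As a rough illustration of two ideas named in the abstract, the sketch below ranks the top words of each topic in a topic-word distribution and computes a micro-averaged F1-score. It is only a minimal sketch under stated assumptions: the function names `top_words_per_topic` and `micro_f1`, the toy vocabulary, the probabilities, and the labels are all hypothetical and are not taken from the paper. The abstract does not specify how word features are decomposed and merged, so that step is not implemented here.

```python
def top_words_per_topic(topic_word, vocab, k):
    """Return the k highest-probability words for each topic.

    topic_word: list of rows, one per topic, each a list of word
    probabilities aligned with vocab (a hypothetical LDA output).
    """
    ranked = []
    for row in topic_word:
        # Sort vocabulary indices by descending probability in this topic.
        order = sorted(range(len(vocab)), key=lambda i: row[i], reverse=True)
        ranked.append([vocab[i] for i in order[:k]])
    return ranked


def micro_f1(gold, pred):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(gold) | set(pred):
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1
            elif p == label and g != label:
                fp += 1
            elif p != label and g == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
vocab = ["goal", "match", "vote", "party"]
topic_word = [
    [0.50, 0.30, 0.10, 0.10],  # a sports-like topic
    [0.05, 0.05, 0.60, 0.30],  # a politics-like topic
]
top_words = top_words_per_topic(topic_word, vocab, k=2)

gold = ["sports", "sports", "politics", "politics"]
pred = ["sports", "politics", "politics", "politics"]
score = micro_f1(gold, pred)
```

When every document has exactly one label, the micro-averaged F1-score equals accuracy, so a gain reported in percentage points (%p) is an absolute difference between two such scores rather than a relative change.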
Keywords: informal document, Twitter, document classification, LDA word distribution, feature expansion