
Database Research Society Journal (SIGDB)

Title: Performance Comparison of the Spoken Language Detection Model with Embedding Replacement
Authors: Hyeonjong Kim (김현종), Juhong Namgung (남궁주홍), Yang-Sae Moon (문양세), Hyung-Jin Choi (최형진), Myeong-Seon Gil (길명선)
Citation: Vol. 36, No. 2, pp. 45-55 (August 2020)
Abstract
Deep learning-based abuse detection models are limited in accuracy by the frequent typos and spacing errors in Korean spoken-language text. In particular, morphological analysis of spoken language for generating training data frequently extracts morphemes that make it difficult to grasp the meaning of words, and this is the biggest cause of degraded accuracy in abuse detection models. In this paper, to overcome this problem of Korean spoken language, we design and implement detection models based on different embeddings and compare their abuse detection accuracy. We use four embedding models for detection (Word2Vec, fastText, SKT-KoBERT, and KoELECTRA) and compare and evaluate the performance of each embedding-based abuse detection model through experiments. In the experiments on the character unit used, both Word2Vec and fastText achieved more than 90% accuracy; in the experiments on ambiguity determination, SKT-KoBERT showed significantly higher performance than fastText. Finally, in the experiments on pre-training methods, SKT-KoBERT also outperformed KoELECTRA. These experimental results suggest that more effective embedding techniques can be applied to various spoken language-based deep learning services.
Keywords: Spoken language, Text-based deep learning model, Text embedding, Abuse detection