The KIPS Transactions: Part B (정보처리학회 논문지 B)

Korean Title (translated): A Study on a Spam Document Classification Method Using Word Repetition Features
English Title: A Study on Spam Document Classification Method using Characteristics of Keyword Repetition
Authors: Seongjin Lee (이성진), Jongbum Baik (백종범), Chung-Seok Han (한정석), Soowon Lee (이수원)
Citation: Vol. 18-B, No. 5, pp. 315-324 (October 2011)
Çѱ۳»¿ë
(Korean Abstract)
ÀÎÅÍ³Ý È¯°æ¿¡¼­ ½ºÆÔÀÇ ¹ü¶÷Àº °³ÀÎ Á¤º¸ÀÇ À¯Ãâ, Çǽ̿¡ ÀÇÇÑ ±ÝÀüÀû ¼ÕÇØ, ¹«ºÐº°ÇÑ À¯ÇØ ÄÜÅÙÃ÷ÀÇ À¯Åë µî ½É°¢ÇÑ »çȸ ¹®Á¦¸¦ ¾ß±âÇÏ°í ÀÖ´Ù. ¶ÇÇÑ »çȸÀû ÅëÁ¦¸¦ ÇÊ¿ä·Î ÇÏ´Â À¯ÇØ Á¤º¸¸¦ ¹«Â÷º°ÀûÀ¸·Î À¯Åë½ÃÅ°´Â ½ºÆÔÀÇ ÇüÅÂ¿Í ±â¼úÀÌ °¥¼ö·Ï ´Ù¾çÇØÁö°í ÀÖ´Ù. Bag-of-Words ¸ðµ¨À» ÀÌ¿ëÇÑ ÇнÀ ±â¹Ý ½ºÆÔ ºÐ·ù ¹æ¹ýÀº ÇöÀç±îÁöÀÇ ¿¬±¸ Áß¿¡¼­ °¡Àå ÀϹÝÀûÀ¸·Î »ç¿ëµÇ´Â ¹æ¹ýÀÌ´Ù. ±×·¯³ª ÀÌ ¹æ¹ýÀº ºÐ·ù ¸ðµ¨ ÇнÀ °úÁ¤¿¡¼­ »ç¿ëµÈ Å°¿öµåÀÇ ÃâÇö Á¤º¸¸¸À¸·Î ½ºÆÔ ¹®¼­¸¦ ºÐ·ùÇϱ⠶§¹®¿¡ ÃÖ±Ù ÈçÈ÷ ¹ß°ßÇÒ ¼ö ÀÖ´Â ½ºÆÔ Â÷´Ü ȸÇÇ ¹æ¹ý¿¡ ´ëÇÑ ´ëó ´É·ÂÀÌ ºÎÁ·ÇÏ´Ù.
º» ³í¹®¿¡¼­´Â ÀÌ·¯ÇÑ ¹®Á¦¸¦ ÇØ°áÇϱâ À§ÇØ ¹®¼­¿¡¼­ µîÀåÇÏ´Â ¹Ýº¹ ´Ü¾îÀÇ Æ¯Â¡À» ÀÌ¿ëÇÑ ½ºÆÔ ¹®¼­ ŽÁö ¹æ¹ýÀ» Á¦¾ÈÇÑ´Ù. ÃÖ±Ù ´ëºÎºÐÀÇ ½ºÆÔ ¹®¼­¿¡¼­´Â ³ëÃâÇÏ°íÀÚ ÇÏ´Â ½ºÆÔ ¹®±¸¸¦ ¹Ýº¹ÇÏ´Â °æÇâÀÌ ÀÖÀ¸¸ç, ÀÌ´Â ½ºÆÔ ¹®¼­¸¦ ÆǺ°ÇÏ´Â ±âÁØÀ¸·Î »ç¿ëµÉ ¼ö ÀÖ´Ù. º» ³í¹®¿¡¼­´Â ´Ü¾î ¹Ýº¹ÀÇ Æ¯Â¡À» Ç¥ÇöÇÒ ¼ö ÀÖ´Â 6°³ÀÇ º¯¼ö¸¦ Á¤ÀÇÇÏ°í À̸¦ ºÐ·ù ¸ðµ¨ »ý¼ºÀ» À§ÇÑ ¼Ó¼ºÀ¸·Î »ç¿ëÇÑ´Ù. º» ³í¹®¿¡¼­ Á¦¾ÈÇÏ´Â ½ºÆÔ Å½Áö ¹æ¹ýÀÇ ¼º´É Æò°¡¸¦ À§ÇØ ºí·Î±× Æ÷½ºÆ® µ¥ÀÌÅÍ¿Í À̸ÞÀÏ µ¥ÀÌÅ͸¦ ÀÌ¿ëÇÏ¿© ±âÁ¸ ¹æ¹ýµé°úÀÇ ºñ±³ ½ÇÇèÀ» ÁøÇàÇÏ¿´°í, °á°ú ºÐ¼®À» ÅëÇØ Á¦¾È ¹æ¹ýÀÌ ¿ì¼öÇÔÀ» È®ÀÎÇÏ¿´´Ù.
English Abstract
In the Web environment, the flood of spam causes serious social problems such as personal information leaks, monetary loss from phishing, and the distribution of harmful content. Moreover, the types and techniques of spam distribution that must be controlled are becoming more varied over time. Learning-based spam classification using the Bag-of-Words model is the most widely used method to date. However, because it classifies spam documents using only the occurrence information of keywords seen during classification model training, this method is vulnerable to the anti-spam evasion techniques that recent spam commonly employs.
In this paper, we propose a spam document detection method that uses the characteristics of repeated words occurring in spam documents to counter such evasion techniques. Most recent spam documents tend to repeat the key phrases they are designed to spread, and this tendency can be used as a criterion for classifying spam documents. We define six variables that represent the characteristics of word repetition and use them as the feature set for constructing a classification model. The effectiveness of the proposed method is evaluated in experiments with blog posts and e-mail data. The experimental results show that the proposed method outperforms other approaches.
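The idea of turning word repetition into classifier features can be illustrated with a minimal sketch. The abstract does not list the six variables the authors define, so the four statistics below (`token_count`, `unique_ratio`, `max_term_share`, `repeated_term_ratio`) and the function name `repetition_features` are hypothetical stand-ins chosen for illustration, not the paper's feature set.

```python
import re
from collections import Counter


def repetition_features(text: str) -> dict:
    """Compute illustrative word-repetition statistics for one document.

    These four features are assumptions for demonstration only; they are
    not the six variables defined in the paper.
    """
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens)
    if n == 0:
        return {
            "token_count": 0,
            "unique_ratio": 0.0,
            "max_term_share": 0.0,
            "repeated_term_ratio": 0.0,
        }
    counts = Counter(tokens)
    # Distinct terms that occur more than once in the document.
    repeated_terms = sum(1 for c in counts.values() if c > 1)
    return {
        "token_count": n,
        # Share of tokens that are distinct; low values suggest repetition.
        "unique_ratio": len(counts) / n,
        # Share of the document taken up by its most frequent term.
        "max_term_share": max(counts.values()) / n,
        # Share of distinct terms that are repeated at least once.
        "repeated_term_ratio": repeated_terms / len(counts),
    }


# Made-up example documents, not data from the paper.
spam_like = "cheap pills cheap pills cheap pills buy now"
ordinary = "the meeting moved to friday afternoon"

print(repetition_features(spam_like))
print(repetition_features(ordinary))
```

Unlike Bag-of-Words keyword occurrence, statistics of this kind do not depend on which particular words were seen during training, which is one plausible reason repetition-based features could hold up when spammers vary their vocabulary. Feature vectors like these could then be passed to any standard classifier.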
Keywords: Spam Filtering, Spam, Spamdexing, Term Spamming, Word Repetition