KIISE Transactions on Computing Practices
Korean Title |
크라우드소싱 기반 문장재구성 방법을 통한 의견 스팸 데이터셋 구축 및 평가 (Construction and Evaluation of an Opinion Spam Dataset Using a Crowdsourcing-Based Paraphrasing Method) |
English Title |
A Crowdsourcing-Based Paraphrased Opinion Spam Dataset and Its Implication on Detection Performance |
Author |
이성운
김성순
박동현
강재우
Seongwoon Lee
Seongsoon Kim
Donghyeon Park
Jaewoo Kang
|
Citation |
Vol. 22, No. 7, pp. 338-343 (July 2016) |
Korean Abstract |
As the Web has become a primary means of information exchange, online reviews have grown in importance. At the same time, opinion spam, which hinders users from making sound decisions, has emerged as an issue and is being actively studied. However, the scarcity and limitations of the gold-standard datasets needed for analysis and training have slowed progress in this area. In this paper, we introduce the Paraphrased Opinion Spam (POS) dataset, a new type of dataset that imitates truthful reviews. Noting that real spammers tend to refer to genuine reviews when writing spam, we had reviews written by real reviewers paraphrased, producing a spam dataset that carries the factual information and experiences contained in the original text. Experiments showed that the newly generated POS dataset is linguistically similar to real reviews and is therefore harder for spam classification models to classify than existing datasets. We also confirmed that the classification accuracy for spam reviews increases in proportion to the amount of training data, which indicates that data volume is an important factor in the performance of spam classification models.
|
English Abstract |
Today, opinion reviews on the Web are widely used as a means of information exchange. As the importance of opinion reviews continues to grow, so does the problem of opinion spam. Although many studies on detecting spam reviews have been conducted, the limitations of existing gold-standard datasets hinder research. We therefore introduce a new dataset called "Paraphrased Opinion Spam (POS)", which contains a new type of review spam that imitates truthful reviews. We observed that spammers refer to existing truthful reviews when fabricating spam reviews. To create a dataset of such seemingly truthful spam, we asked task participants to paraphrase truthful reviews into new deceptive reviews. The experimental results show that our POS dataset is more difficult to classify than existing spam datasets because its reviews are linguistically closer to truthful reviews. Training volume was also found to be an important factor in classification model performance.
|
Keyword |
paraphrasing
opinion spam
crowdsourcing
resources generation
resources evaluation
|