µ¥ÀÌÅͺ£À̽º ¿¬±¸È¸Áö(SIGDB)

Korean Title (translated): FLASHer: A Clustering-Based Deduplication Method for a Large Volume of News Articles with Similar Topics
English Title: FLASHer: A Novel Clustering-Based Scheme for Deduplicating a Large Amount of News Articles with Similar Topics
Authors: 이주영, 차수진, 서영균 (Joo-Young Lee, Sujin Cha, Young-Kyoon Suh)
Citation: Vol. 35, No. 02, pp. 54-65 (August 2019)
Korean Abstract (translated)
In today's era of big data, thousands of news articles are produced every day by a wide range of media outlets, and many kinds of applications are being built on this large volume of article data. Nevertheless, one readily finds that most articles written about the same event contain the same content. Such duplication can not only lead users toward a uniform viewpoint but also degrade the storage and processing-time performance of the application systems that use article data. This paper introduces FLASHer, a technique that performs efficient and scalable deduplication on given news data. FLASHer first preprocesses the news document data and then clusters related documents together. Next, it computes the cosine similarity between documents and uses it to remove duplicates from the data. In our experiments, FLASHer removed approximately 4.5% of the documents as duplicates while using as little as about 8% of the memory required by existing baseline algorithms, which consume far more memory. With the proposed algorithm, users can view a variety of non-duplicated news articles and focus on developing applications that use this high-quality data.
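For reference, the cosine similarity used in the comparison step is the standard measure below. The representation is an assumption, since the abstract does not say how documents are vectorized or weighted: each document d_i is taken to be a term-weight vector with weight w_{t,i} for term t in vocabulary V, C_k denotes one cluster produced by the clustering step, and tau is a hypothetical duplicate threshold, not a value reported by the authors.

```latex
\cos(d_i, d_j) = \frac{\sum_{t \in V} w_{t,i}\, w_{t,j}}
                      {\sqrt{\sum_{t \in V} w_{t,i}^{2}}\;\sqrt{\sum_{t \in V} w_{t,j}^{2}}}

% Assumed duplicate rule: compare only documents that share a cluster
d_j \text{ is a duplicate of } d_i \iff d_i, d_j \in C_k \ \text{and}\ \cos(d_i, d_j) \ge \tau

% Worked example: w_i = (1, 1, 0), w_j = (1, 1, 1)
\cos(d_i, d_j) = \frac{1 + 1 + 0}{\sqrt{2}\,\sqrt{3}} = \frac{2}{\sqrt{6}} \approx 0.816
```

Under an illustrative threshold of tau = 0.8, the two documents in the worked example would be treated as duplicates; with tau = 0.9 they would both be kept.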
English Abstract
In the era of big data, thousands of news articles are produced every day by many different media outlets, and a variety of applications are being developed on top of these articles. It is therefore not surprising that most articles covering the same event contain the same content. Such duplication not only exposes readers to a uniform viewpoint but also degrades the performance of application systems in terms of storage and processing time. In this regard, we introduce a novel scheme, termed FLASHer, that performs efficient and scalable deduplication over a large volume of news document data. FLASHer first preprocesses the given documents and clusters related documents together. It then calculates the cosine similarity between documents and eliminates duplicates based on that similarity. In our empirical evaluation, FLASHer removed approximately 4.5% of the documents as redundant while using as little as about 8% of the memory consumed by existing baseline algorithms. Using the proposed algorithm, users (or developers) can view a variety of non-duplicated news articles and focus on building applications based on such high-quality data.
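A minimal Python sketch of the preprocess, cluster, then compare pipeline described in the abstract follows. It is not the authors' implementation: the stopword list, the raw term-frequency vectors, the single-pass leader clustering (swapped in because the abstract does not name the clustering algorithm), and both thresholds are hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; FLASHer's actual preprocessing is not specified.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "was", "for"}


def preprocess(text: str) -> Counter:
    """Lowercase, keep alphanumeric tokens, drop stopwords, and count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(vectors, threshold=0.3):
    """Single-pass leader clustering (a stand-in, not FLASHer's clustering step).

    Each document joins the first cluster whose leader (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def deduplicate(docs, cluster_threshold=0.3, dup_threshold=0.8):
    """Return sorted indices of documents kept after within-cluster deduplication."""
    vectors = [preprocess(d) for d in docs]
    kept = []
    for members in cluster(vectors, cluster_threshold):
        kept_in_cluster = []
        for i in members:
            # Pairwise comparisons are confined to documents in the same cluster.
            if all(cosine(vectors[i], vectors[j]) < dup_threshold for j in kept_in_cluster):
                kept_in_cluster.append(i)
        kept.extend(kept_in_cluster)
    return sorted(kept)


if __name__ == "__main__":
    docs = [
        "The central bank raised interest rates on Monday.",
        "On Monday the central bank raised interest rates.",
        "Local team wins the championship final.",
    ]
    print(deduplicate(docs))  # [0, 2]: the reworded second headline is dropped
```

Because cosine similarity is only computed between members of the same cluster, the number of pairwise comparisons depends on cluster sizes rather than on the square of the corpus size; this is one plausible reason a cluster-first design can reduce cost, though the sketch makes no claim about how FLASHer achieves its reported memory savings.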
Keywords: Large Volume, News Articles, Documents, Deduplication, Information Retrieval, Natural Language Processing, Large-Volume News