
KIPS Transactions on Software and Data Engineering


Korean Title: Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계
English Title: A Design on Informal Big Data Topic Extraction System Based on Spark Framework
Author: Kiejin Park (박기진)
Citation: Vol. 5, No. 11, pp. 521-526 (Nov. 2016)
Korean Abstract (English translation)
Because the informal text data handled online is both large in volume and unstructured, the storage and analysis methods of the traditional relational data model are not sufficient on their own. Moreover, it is difficult to analyze user reactions in real time from the large volume of dynamically generated social data. In this paper, to grasp the meaning of large-scale informal data (documents) quickly and easily, we designed and implemented a system that extracts topics automatically according to the weight of the words in a document, without any prior training on the data set. The input words used for topic modeling in the proposed system are derived with an N-gram algorithm, so that multi-word units can also be handled as a single term. For storage and computation on large-scale informal data, we configured a cluster based on Hadoop and Spark, a distributed in-memory processing framework, and ran the topic model computation on it. In the performance experiment, TB-scale social comment data was read in, preprocessing was applied to the entire data set, and topics were extracted for specific items. Loading the large data set directly into the cluster's memory rather than onto disk before processing confirmed the strong performance of the topic extraction.
English Abstract
Online informal text data is massive in volume and unstructured in nature, so traditional relational data model technologies are of limited use for storing and analyzing it. Moreover, analyzing users' reactions in real time from massive, dynamically generated social data is hard to accomplish. In this paper, to capture the semantics of massive, informal online documents easily and without prior training (unsupervised learning), we design and implement an automatic topic extraction system that works from the weight of the words that make up a document. The input to the proposed system is first generated with an N-gram algorithm, which groups multiple words so that the meaning of sentences is captured more precisely, and Hadoop and Spark (an in-memory distributed computing framework) are adopted to run the topic model. In the experiments, TB-level input data is preprocessed and the proposed topic extraction steps are applied. We conclude that the proposed system extracts meaningful topics in a timely manner because intermediate results are read directly from main memory rather than from an HDD.
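The N-gram step mentioned in the abstract can be illustrated with a minimal, single-machine sketch. The abstract does not specify the tokenizer, the value of N, or the topic model used, so everything below is an assumption made for illustration: whitespace tokenization, lowercasing, unigrams plus bigrams, and relative frequency as the word weight. The function names `word_ngrams` and `ngram_weights` are hypothetical and do not come from the paper.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_weights(documents, max_n=2):
    """Relative frequency of every 1..max_n-gram across a list of documents.

    Assumption: whitespace tokenization and lowercasing stand in for the
    paper's preprocessing, which the abstract does not describe.
    """
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for n in range(1, max_n + 1):
            counts.update(word_ngrams(tokens, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in counts.items()}


if __name__ == "__main__":
    print(word_ngrams("spark topic model".split(), 2))
    # prints ['spark topic', 'topic model']
    weights = ngram_weights(["big data topic", "big data spark"])
    # Show the three highest-weighted terms.
    print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
```

In a cluster setting such as the one the abstract describes, the same grouping would be applied per partition and the resulting terms fed to a topic model; this sketch only shows how multi-word terms like "big data" become single input units.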
Keywords: Topic Model, N-gram, Spark, Hadoop, Machine Learning