Database Research Society Journal (SIGDB)
Korean Title |
대규모 웹 문서의 실시간 자연어 처리를 위한 데이터 수집·저장 시스템 설계 및 구현 |
English Title |
Design and Implementation of Data Collection and Storage System for Real-Time Natural Language Processing of Large-Scale Web Documents |
Author |
ÇöÀϼº
À±À翬
ÃÖº´¼
ÀÌÀÍÈÆ
ÀÌ»ó±¸
Richeg Xuan
Jaeyeun Yoon
Byeongseo Choe
Ig-hoon Lee
Sang-goo Lee
|
Citation |
VOL 34 NO. 02 PP. 0059 ~ 0073 (2018. 08) |
Çѱ۳»¿ë (Korean Abstract) |
ºòµ¥ÀÌÅÍ ½Ã´ë¿¡ ºòµ¥ÀÌÅÍ ½Ã½ºÅÛ ±¸Ãà ¹× È°¿ëÀ» À§ÇØ µ¥ÀÌÅ͸¦ ¼öÁýÇÏ°í ÀúÀå ¹× Ã³¸®ÇÏ´Â ÀÏÀº °¡Àå ±âº»ÀûÀ̸鼵µ ÇÙ½ÉÀûÀÎ ÀÏÀÌ´Ù. ÀÎÅÍ³Ý ÅؽºÆ® µ¥ÀÌÅÍ´Â ´ëÇ¥ÀûÀÎ ºòµ¥ÀÌÅÍÀÌ°í, ´ë¿ë·®ÀÇ ÅؽºÆ® µ¥ÀÌÅÍ ¼öÁý ¹× ó¸®¿Í ÀÚ¿¬¾î 󸮿¡ ´ëÇÑ ¼ö¿ä´Â Áö¼ÓÀûÀ¸·Î Áõ°¡ÇÏ°í ÀÖ´Ù. º» ³í¹®¿¡¼´Â ´ë±Ô¸ð À¥ ¹®¼ÀÇ ÅؽºÆ® µ¥ÀÌÅ͸¦ ¼öÁýÇÏ°í ÀúÀåÇÏ´Â ½Ã½ºÅÛÀ» ¼³°èÇÏ°í ±¸ÇöÇÑ´Ù. µ¥ÀÌÅÍ ¼öÁý ºÎºÐ¿¡¼´Â API°¡ Á¦°øµÇÁö ¾Ê´Â ´Ù¾çÇÑ À¥ »çÀÌÆ®·ÎºÎÅÍ ÅؽºÆ® µ¥ÀÌÅ͸¦ ¼öÁýÇÒ ¼ö ÀÖ´Â ¼³°è¸¦ Á¦¾ÈÇÑ´Ù. ¶ÇÇÑ µ¥ÀÌÅ͸¦ ºü¸£°í È¿À²ÀûÀ¸·Î ¼öÁýÇϱâ À§ÇÑ º´·ÄÈ ¹æ¹ýÀ» Á¦¾ÈÇÑ´Ù. ÀúÀå ½Ã½ºÅÛÀº ´Ù¾çÇÑ ÀÚ¿¬¾î ó¸® ¸ðµâ¿¡ Àû¿ëÇÒ ¼ö ÀÖ°í ½Ç½Ã°£ ÀÚ¿¬¾î 󸮸¦ Áö¿øÇϱâ À§ÇØ Àθ޸𸮠µ¥ÀÌÅͺ£À̽º °ü¸® ½Ã½ºÅÛÀ» »ç¿ëÇÔÀ¸·Î½á ½ÇÇà ¼Óµµ¸¦ Çâ»ó½ÃÄ×´Ù. º» ³í¹®ÀÇ ½ÇÇè¿¡¼´Â ½ÇÁ¦·Î À¥ ¹®¼ÀÇ ´ë±Ô¸ð ÅؽºÆ® µ¥ÀÌÅ͸¦ ¼öÁýÇÏ°í ó¸®ÇÏ´Â ½ÇÇèÀ» ÅëÇØ ½Ã½ºÅÛÀÇ À¯È¿¼ºÀ» È®ÀÎÇÏ¿´´Ù.
|
English Abstract |
In the big data era, collecting and processing data is the most fundamental and essential task in building and using big data systems. Internet text data is one of the most representative kinds of big data, and demand for collecting such data and applying natural language processing to it is steadily increasing. In this paper, we propose a system for collecting and storing text data from large-scale web documents. The proposed data collection system can collect data from various websites that provide no API. In addition, it collects massive text data quickly and efficiently through various parallelization methods. The proposed storage system can be applied to various natural language processing modules, and it improves execution speed by using an in-memory DBMS to support real-time natural language processing. The validity of the proposed system is verified by experiments that collect actual large-scale web documents.
|
Keyword |
Big data
Natural language processing
Data crawling
Real-time processing
빅데이터
자연어 처리
데이터 크롤링
실시간 처리
|