Database Research Society Journal (SIGDB)

Korean Title (translated): A Platform-independent Automation Framework for Constructing Large-scale Open Data Lakes
English Title: A Platform-independent Framework for Automatic Constructing Large-scale Open Data Lakes
Author: Dasol Kim, Yang-Sae Moon
Citation: Vol. 38, No. 3, pp. 83-95 (Dec. 2022)
Korean Abstract (translated)
As governments and institutions around the world have become more active in releasing data, research using open data has grown substantially in fields such as machine learning and data analysis. Open data is mainly public data released by governments, and a repository that manages it is called an open data lake. To raise the utilization of open data, this paper proposes a framework that supports constructing an open data lake for a specific domain by linking data distributed across multiple portals. To this end, we analyze how existing open data portals manage data and identify three problems that lower utilization: pre-processing complexity, platform dependency, and scale limitation. To address pre-processing complexity and scale limitation, we implement a three-step automated processing logic with an automated expansion function; to address platform dependency, we implement detailed processing logic specific to each platform. We also design and implement metadata management functions for open data lakes. Experiments on real data portals confirm that the proposed framework solves all of the identified problems and is an integrated framework that supports not only the construction but also the efficient management of data lakes. The result of this paper can be regarded as the first integrated framework to offer a practical solution that raises open data utilization and addresses the shortage of research data.
English Abstract
With the recent increase in data disclosure, research using open data in fields such as machine learning and data analysis is also increasing rapidly. Open data is public data released mainly by governments, and a repository that manages it is called an open data lake. In this paper, we propose a new framework for constructing an open data lake for a specific domain by federating data distributed across multiple portals, with the aim of increasing open data utilization. We analyze the data management methods of existing data portals and identify three problems that reduce usability: pre-processing complexity, platform dependency, and scale limitation. To solve the pre-processing complexity and scale limitation problems, we present a three-step automatic processing logic that incorporates automated expansion. To solve the platform dependency problem, we also propose detailed processing logic tailored to each platform. We then design and implement metadata management functions for an open data lake. Through experiments, we confirm that the proposed framework is an integrated solution that solves all of the identified problems and supports efficient construction and management of a data lake. We present the first integrated framework that supports the construction and management of open data lakes.
Keyword: Data lakes, Data collection, Data analysis, Open data, Open data platforms