
정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices)


Korean Title: 시퀀스 유사도에 기반한 유전체 데이터베이스 압축 및 영향 분석
English Title: The Analysis of Genome Database Compaction based on Sequence Similarity
Authors: 권선영 (Sunyoung Kwon), 이병한 (Byunghan Lee), 박승현 (Seunghyun Park), 조정희 (Jeonghee Jo), 윤성로 (Sungroh Yoon)
Citation: Vol. 23, No. 4, pp. 250-255 (April 2017)
Korean Abstract
유전체 데이터의 급증 및 정밀의료 등 응용 분야 확대에 따라 유전체 데이터베이스의 효율적 관리에 대한 중요성이 커지고 있다. 전통적인 압축 기법을 통해 유전체 데이터를 압축할 경우, 압축효과는 크지만, 압축된 상태에서 데이터베이스를 비교하거나 검색하는 등의 작업이 용이하지 않게 된다. 유전체 데이터 분석에 소요되는 시간은 데이터베이스에 존재하는 시퀀스 수에 비례하며, 중복되거나 유사한 시퀀스가 다수 존재한다는 점에 착안하여, 본 논문에서는 유전체 데이터베이스 상에 존재하는 유사 시퀀스를 제거함으로써 전체 데이터베이스 크기를 줄이는 기법을 제안한다. 실험을 통해 시퀀스 유사도 1% 기준으로도 전체의 약 84% 시퀀스가 제거되며, 약 10배 빠른 분류분석이 가능함을 보인다. 또한 큰 폭의 압축효과에도 불구하고, 범주 다양성 및 분류 분석 등에 미치는 변화가 미미함을 확인함으로써, 시퀀스 유사도 기반의 제안 압축 기법이 유전체 데이터베이스 압축에 효과적인 방법임을 제시한다.
English Abstract
Given the explosion of genomic data and the expansion of applications such as precision medicine, the importance of efficient genome-database management continues to grow. Traditional compression techniques can be effective in reducing the size of a database, but they make operations such as comparison and search on the compressed database difficult. Observing that many genome databases contain numerous duplicated or similar sequences, and that the runtime of genome analyses is normally proportional to the number of sequences in a database, we propose a technique that compresses a genome database by eliminating similar entries from it. Through our experiments, we show that approximately 84% of sequences can be removed with a 1% similarity threshold, which speeds up downstream classification tasks by a factor of approximately 10. We also confirm that our compression method does not significantly affect the accuracy of taxonomy diversity assessments or classification.
Keywords: 시퀀스 (sequence), 유전체 (genome), 데이터베이스 (database), 압축 (compression), 유사도 (similarity), cd-hit-est
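The abstract describes compacting a database by dropping entries that are similar to ones already kept, and the keywords name cd-hit-est, but this record does not spell out the procedure. The following is only a minimal Python sketch of that general idea, under stated assumptions: sequences are visited longest first, and a sequence is kept as a representative only if it is not similar enough to any representative chosen so far. The names `similarity`, `compact_database`, and `min_similarity` are hypothetical, and `difflib.SequenceMatcher` is a stand-in similarity measure, not the one used by cd-hit-est or by the authors. Reading the "1% similarity threshold" as "sequences differing by at most about 1%" (the default `min_similarity=0.99`) is also an assumption.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the measure used by cd-hit-est."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def compact_database(sequences: dict[str, str],
                     min_similarity: float = 0.99) -> dict[str, str]:
    """Keep one representative per group of similar sequences (greedy).

    `sequences` maps a sequence ID to its nucleotide string.
    The 0.99 default is an assumed reading of the abstract's threshold.
    """
    representatives: dict[str, str] = {}
    # Visit longest sequences first so each group keeps its longest member.
    ordered = sorted(sequences.items(), key=lambda kv: len(kv[1]), reverse=True)
    for seq_id, seq in ordered:
        is_redundant = any(
            similarity(seq, rep) >= min_similarity
            for rep in representatives.values()
        )
        if not is_redundant:
            representatives[seq_id] = seq
    return representatives


# Toy database (made-up sequences, for illustration only).
toy_db = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGT",  # exact duplicate of s1
    "s3": "ACGTACGTACGTACGTACGA",  # differs from s1 in the last base
    "s4": "TTTTGGGGCCCCAAAATTTT",  # unrelated to s1
}
compacted = compact_database(toy_db, min_similarity=0.9)
print(sorted(compacted))
```

This pairwise loop costs time quadratic in the number of sequences, so it only illustrates the selection rule; it is not a substitute for a dedicated clustering tool on a real genome database.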