Journal of KIISE B: Software and Applications
Korean Title (English translation) |
An Algorithm for Discriminating Protein-coding Genes and Pseudogenes for Improved Microbial Genome Annotation |
English Title |
An Algorithm for Identifying Protein-coding Sequences and Pseudogenes to Improve Microbial Genome Annotation |
Author |
Dong Su Yu (유동수)
Haeyoung Jeong (정해영)
Byung Kwon Kim (김병권)
Ju Yeon Song (송주연)
Dae-Hee Lee (이대희)
Eun Bae Kong (공은배)
Jihyun F. Kim (김지현)
|
Citation |
Vol. 39, No. 2, pp. 75-83 (Feb. 2012) |
Korean Abstract (English translation) |
As the number of microbial genomes sequenced with next-generation sequencing technologies grows rapidly, automated annotation systems are drawing increasing attention for their ability to process large volumes of genome information. When annotating a sequenced microbial genome, using two or more gene prediction programs is effective for improving the sensitivity of the annotation results, but the number of incorrectly predicted genes increases, which leads to low specificity and accuracy. Because many annotation systems do not distinguish protein-coding genes (coding sequences) from pseudogenes among the predicted genes, experts in practice correct the annotations by hand to improve their quality. This paper introduces the GeneCuraid algorithm, which identifies correct genes among those predicted by two or more programs and predicts pseudogenes, and can thereby contribute to improving the quality of microbial genome annotation. When the GeneCuraid algorithm was tested on the genome sequence of Escherichia coli K-12 MG1655, it showed 98.09% sensitivity, 24.33% specificity, and 91.90% accuracy, which were higher than the results of an annotation system composed of the CRITICA, GLIMMER, GeneMarkS, and AutoFACT programs. The GeneCuraid algorithm is therefore expected to further improve the quality of genome annotation by identifying correct protein-coding genes and predicting pseudogenes, and to reduce the considerable time and cost spent on determining correct protein-coding genes and pseudogenes by hand.
|
English Abstract |
As more and more microbial genomes are sequenced with next-generation sequencing technologies, automated genome annotation systems have become increasingly important for processing the vast amount of genome sequence information. Using multiple gene prediction programs yields higher sensitivity, but it lowers specificity and accuracy because the number of false positives increases. Furthermore, because many automated genome annotation systems do not distinguish pseudogenes from functional genes, manual curation is needed to ensure high-quality annotation, which is time-consuming and not always feasible. We developed GeneCuraid, which aids high-confidence curation of protein-coding sequences and pseudogenes from genes predicted by automatic annotation tools. When the genome sequence of Escherichia coli K-12 MG1655 was used as a test data set, the algorithm improved the specificity and accuracy of the annotation results to 24.33% and 91.90%, respectively, while maintaining a sensitivity as high as 98.09%. We therefore expect the GeneCuraid algorithm to yield high-quality genome annotation and to reduce the time and cost of manually determining correct protein-coding genes and pseudogenes.
|
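The abstract reports sensitivity, specificity, and accuracy without restating how they are computed. The sketch below uses the standard confusion-matrix definitions. The `ConfusionCounts` type, the function names, and the example counts are all hypothetical illustrations, not values or code from the paper. How the authors define a true negative for gene prediction is not stated in this record, so the counts are simply taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for gene-prediction evaluation counts."""
    tp: int  # predicted genes that match a reference gene
    fp: int  # predicted genes with no reference counterpart
    tn: int  # candidates correctly rejected (definition assumed, not from the paper)
    fn: int  # reference genes that were missed


def _ratio(numerator: int, denominator: int) -> float:
    # Guard against an empty class so the caller gets a clear error.
    if denominator == 0:
        raise ValueError("metric undefined: denominator is zero")
    return numerator / denominator


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of real genes that were found: TP / (TP + FN)
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of non-genes that were rejected: TN / (TN + FP)
    return _ratio(c.tn, c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all decisions that were correct
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


# Made-up counts for illustration only; they are NOT the paper's data.
example = ConfusionCounts(tp=90, fp=70, tn=30, fn=10)
print(f"sensitivity={sensitivity(example):.2%}")
print(f"specificity={specificity(example):.2%}")
print(f"accuracy={accuracy(example):.2%}")
```

With counts like these, adding more gene predictors tends to raise `tp` and `fp` together, which is the trade-off the abstract describes: sensitivity goes up while specificity goes down.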
Keywords |
microbial genome sequence
annotation
pseudogene
overlapping genes
protein-coding gene
prokaryotic genome
microbial genome annotation
coding sequence
|