Journal of the Korean Society for Internet Information (한국인터넷정보학회 논문지)

Korean Title: 공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘
English Title: Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification
Authors: Sung-Sam Hong (홍성삼), Dong-Wook Kim (김동욱), Myung-Mook Han (한명묵)
Citation: Vol. 20, No. 1, pp. 1-10 (February 2019)
Korean Abstract (English translation)
In big data, text mining extracts many features from a large amount of data, so the computational complexity of clustering and classification is high and the reliability of the analysis results can decrease. In particular, the term-document matrix obtained through text mining represents the features between terms and documents, but it takes the form of a sparse matrix. In this paper, we designed a feature extraction method for detection models that uses an improved GA (Genetic Algorithm) in text mining. TF-IDF is used to reflect the relationship between documents and terms in feature extraction. Through an iterative process, a predetermined number of features is selected. We also used a sparsity score to improve the performance of the detection model. If the spam mail set is highly sparse, the performance of the detection model drops and an optimized detection model is difficult to find. We used s(F) in the fitness function to find a detection model with low sparsity and a high TF-IDF score. We also verified the performance of the proposed algorithm by applying it to text classification experiments. As a result, the proposed algorithm showed good performance (speed and accuracy) in attack mail classification.
English Abstract
Big-data text mining extracts many features from large volumes of data, so clustering and classification can incur high computational complexity and the analysis results can become less reliable. In particular, the term-document matrix obtained through text mining represents term-document features but is sparse. We designed an improved genetic algorithm (GA) that extracts features in text mining for use in a detection model. Term frequency-inverse document frequency (TF-IDF) is used to reflect document-term relationships during feature extraction, and an iterative process selects a predetermined number of features. We also used a sparsity score to improve the performance of the detection model: when a spam mail data set is highly sparse, the detection model performs poorly and an optimized detection model is difficult to find. By placing s(F) in the numerator of the fitness function, we find a model that has low sparsity and a high TF-IDF score. We verified the proposed algorithm by applying it to text classification experiments. The results show that the algorithm performs well, in both speed and accuracy, in attack mail classification.
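The abstract names the ingredients (a TF-IDF term-document matrix, a sparsity score s(F), and a GA that selects a predetermined number of features) but does not give the formulas. The following is a minimal Python sketch of how those pieces could fit together, not the authors' implementation. Everything specific in it is an assumption made for illustration: s(F) is taken to be the fraction of non-zero cells in the selected columns (so a higher value means lower sparsity), the fitness is the mean TF-IDF weight multiplied by s(F), and the GA operators (truncation selection, set-union crossover, single-gene mutation) and all function names are hypothetical.

```python
import math
import random


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[d][t] is the TF-IDF weight of term t in doc d."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [0] * len(vocab)
    for toks in tokenized:
        for term in set(toks):
            df[index[term]] += 1
    matrix = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for term in toks:
            row[index[term]] += 1.0 / len(toks)  # relative term frequency
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        row = [tf * (math.log((1 + n_docs) / (1 + df[j])) + 1.0)
               for j, tf in enumerate(row)]
        matrix.append(row)
    return vocab, matrix


def density(matrix, features):
    """Assumed s(F): fraction of non-zero cells in the selected columns."""
    cells = len(matrix) * len(features)
    nonzero = sum(1 for row in matrix for j in features if row[j] > 0.0)
    return nonzero / cells


def fitness(matrix, features):
    """Assumed fitness: mean TF-IDF weight of the selected columns times s(F)."""
    total = sum(row[j] for row in matrix for j in features)
    mean_weight = total / (len(matrix) * len(features))
    return mean_weight * density(matrix, features)


def select_features(matrix, k, pop_size=20, generations=30,
                    mutation_rate=0.2, seed=0):
    """Evolve sets of k column indices and return the fittest set found."""
    rng = random.Random(seed)
    n_terms = len(matrix[0])
    population = [tuple(sorted(rng.sample(range(n_terms), k)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda f: fitness(matrix, f), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection; keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, k))  # crossover: draw k genes from both
            if rng.random() < mutation_rate:
                outside = [j for j in range(n_terms) if j not in child]
                if outside:
                    # Mutation: swap one selected term for an unselected one.
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(outside))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=lambda f: fitness(matrix, f))


# Toy corpus (made up for illustration): two spam-like and two ordinary messages.
docs = [
    "win free prize now",
    "free prize claim now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
vocab, matrix = tfidf_matrix(docs)
best = select_features(matrix, k=3)
print([vocab[j] for j in best])
```

Multiplying by the density term penalizes feature sets whose columns are mostly zero, which is one simple way to favor "low sparsity and a high TF-IDF score" in a single number. The paper's actual fitness function may weight or combine these terms differently.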
Keywords: Security, Unstructured Data, Intelligent Data Analysis, Feature Selection, Attack Mail