A Pre-processing Study to Solve the Problem of Rare Class Classification of Network Traffic Data

류경준; 신동일; 신동규; 박정찬; 김진국; Ryu Kyung Joon; Shin DongIl; Shin DongKyoo; Park JeongChan; Kim JinGoog

KIPS Transactions on Software and Data Engineering (정보처리학회 논문지 소프트웨어 및 데이터 공학)

Korean Title	네트워크 트래픽 데이터의 희소 클래스 분류 문제 해결을 위한 전처리 연구
English Title	A Pre-processing Study to Solve the Problem of Rare Class Classification of Network Traffic Data
Author	Ryu Kyung Joon; Shin DongIl; Shin DongKyoo; Park JeongChan; Kim JinGoog
Citation	Vol. 9, No. 12, pp. 411-418 (December 2020)
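Korean Abstract (English translation)	Intrusion detection systems (IDS) for information security are generally classified into signature-based and anomaly-based systems. Of these, anomaly-based IDS, which analyze the traffic data generated on a network with machine learning, have been studied actively. This paper studies pre-processing methods to address the performance degradation caused by the rare class problem in the data used to learn attack types. The data were reconstructed on the basis of rare classes and semi rare classes, and experiments tested whether the classification performance of machine learning improved. For the three reconstructed data sets, hybrid feature selection that applies the Wrapper and Filter methods in succession is performed, and the data are then normalized with a Quantile Scaler to complete pre-processing. The prepared data were used to train a DNN (Deep Neural Network) model, and classification performance was evaluated on the basis of TP (True Positive) and FN (False Negative). Classification performance improved on all three data sets.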
English Abstract	In the field of information security, intrusion detection systems (IDS) are normally classified into two categories: signature-based IDS and anomaly-based IDS. Many studies of anomaly-based IDS have used machine learning algorithms to analyze network traffic data generated in cyberspace. In this paper, we studied pre-processing methods to overcome the performance degradation caused by rare classes. We tested the classification performance of a machine learning algorithm after reconstructing the data set based on rare classes and semi rare classes. After the data are reconstructed into three different sets, wrapper and filter feature selection methods are applied in succession. Each data set is then normalized by a quantile scaler. A deep neural network model is used for learning and validation. The evaluation results are compared by true positive values and false negative values. Classification performance improved on all three data sets.
Keyword	Machine Learning; Rare Class; Semi Rare Class; Pre-processing; Feature Selection
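
The abstract names a quantile scaler for normalization and uses TP and FN as the evaluation criteria. As a rough illustration only, the Python sketch below shows a rank-based quantile mapping and a per-class TP/FN count using the standard library. It is not the authors' implementation: the function names, the sample feature values, and the class labels ("normal", "attack_a", "rare_attack") are all hypothetical, and the simple empirical-CDF mapping stands in for whatever scaler the paper actually used.

```python
from bisect import bisect_right
from collections import Counter


def quantile_scale(values):
    """Map each value to its empirical quantile in (0, 1].

    Simplified stand-in for a quantile scaler (an assumption, not the
    paper's code): the output depends only on rank, so extreme
    magnitudes in traffic features cannot dominate the scale.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    # bisect_right counts the samples <= v, so the ratio is the
    # empirical CDF evaluated at v; tied values get the same output.
    return [bisect_right(ordered, v) / n for v in values]


def per_class_tp_fn(y_true, y_pred):
    """Return {label: (tp, fn)} for every label that occurs in y_true.

    For a given class, TP counts samples of that class predicted
    correctly and FN counts samples of that class predicted as
    anything else.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: (tp[label], support[label] - tp[label]) for label in support}


# Hypothetical feature column (for example, bytes per connection).
scaled = quantile_scale([10, 1000, 20, 20])

# Hypothetical labels; "rare_attack" plays the role of a rare class.
y_true = ["normal", "normal", "attack_a", "rare_attack", "rare_attack"]
y_pred = ["normal", "attack_a", "attack_a", "normal", "rare_attack"]
counts = per_class_tp_fn(y_true, y_pred)
```

Reporting TP and FN per class, rather than a single overall accuracy, keeps misses on a rare class visible even when the majority classes are classified well.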