Audio Pre-processing Method for Improved Harmonics Signal Recognition in CNN-based Sound Classification Model

Bon-Cheul Koo (구본철); Moon-Ki Back (백문기); Kyu-Chul Lee (이규철)

Journal: 데이터베이스 연구회지 (SIGDB)

Korean Title	CNN기반 소리 분류 모델의 고조파 신호 인식 개선을 위한 오디오 전처리 방법
English Title	Audio Pre-processing Method for Improved Harmonics Signal Recognition in CNN-based Sound Classification Model
Author	Bon-Cheul Koo (구본철), Moon-Ki Back (백문기), Kyu-Chul Lee (이규철)
Citation	Vol. 36, No. 1, pp. 18-38 (April 2020)
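Korean Abstract (translated)	CNNs have recently been applied with success to sound classification and speech recognition, which has raised interest in input representations that make model training more effective. Visualizing a sound signal through one of the various time-frequency representations, of which the STFT spectrogram is the typical example, makes it possible to observe how the spectrum of the original signal changes over time. However, the periodicity that may be present in some input sound signals can never exactly match the output stride of a frame-based convolution technique such as the STFT, which operates with a given frame size and stride. The alignment between the two therefore drifts as time passes, and phase information about the local periodicity of the original signal is lost. In this study, we ran experiments that restore the phase information lost during frame-based time-frequency pre-processing, a process that is effective for learning features useful for sound classification, and use the restored information as training data for a CNN-based classification model. The results showed an increase in classification accuracy of more than 10% for a specific group of sounds.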
English Abstract	Recently, CNNs have been successfully applied to sound classification and speech recognition, and interest in input representations for more effective model training has increased. Visual representations of an audio signal through various time-frequency representations, such as the typical STFT spectrogram, help in observing the spectral changes of the original signal over time. However, the periodicity that may exist in some input audio signals and the output stride of a frame-based convolutional method such as the STFT, with its given frame size and stride, cannot be exactly the same. Therefore, the alignment between the two shifts over time, resulting in a loss of phase information about the local periodicity of the original signal. In this paper, we conducted an experiment that restores the phase information lost in frame-based time-frequency pre-processing, which is effective for learning features useful for sound classification tasks, and uses it as training data for a CNN-based classification model. As a result, we confirmed an increase in classification accuracy of more than 10% for a specific sound group.
Keywords	signal processing; environmental sound classification; convolutional neural network (CNN); deep learning
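
The misalignment described in the abstract can be illustrated with a small numerical sketch. This is not the authors' method or code; it only shows, under simplifying assumptions, how the phase of a periodic signal drifts from frame to frame when the frame stride is not a multiple of the signal period. The helper names `frame_phases` and `wrapped_steps`, the pure cosine test tone, the Hann window, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def frame_phases(period, hop, frame_size=256, n_frames=6):
    """Return the phase (radians) of the tone's FFT bin in each frame.

    Illustrative helper, not taken from the paper. `period` (in samples)
    must divide `frame_size` so the tone falls exactly on one FFT bin.
    """
    if frame_size % period != 0:
        raise ValueError("period must divide frame_size")
    n_samples = frame_size + hop * (n_frames - 1)
    signal = np.cos(2 * np.pi * np.arange(n_samples) / period)
    window = np.hanning(frame_size)
    tone_bin = frame_size // period
    phases = []
    for k in range(n_frames):
        # Frame k starts at sample k * hop, as in a frame-based transform.
        frame = signal[k * hop : k * hop + frame_size] * window
        phases.append(np.angle(np.fft.rfft(frame)[tone_bin]))
    return np.array(phases)


def wrapped_steps(phases):
    """Frame-to-frame phase change, wrapped into (-pi, pi]."""
    return np.angle(np.exp(1j * np.diff(phases)))


# Stride is a multiple of the period: every frame sees the same phase.
aligned = wrapped_steps(frame_phases(period=32, hop=64))
# Stride is not a multiple of the period: the phase drifts every frame.
misaligned = wrapped_steps(frame_phases(period=32, hop=60))
```

With a period of 32 samples, a stride of 64 leaves the frame-to-frame phase step at zero, while a stride of 60 advances the phase by 2π·60/32 per frame, which wraps to about −π/4. A magnitude-only spectrogram looks the same in both cases, so this drift is the kind of information that is discarded when only magnitudes are kept.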