Deep Neural Networks and End-to-End Learning for Audio Compression

Daniela N. Rim; Inseon Jang; Heeyoul Choi

Journal of KIISE (정보과학회논문지)

Korean Title	음향신호 압축을 위한 심층망 구성과 종단간 학습
English Title	Deep Neural Networks and End-to-End Learning for Audio Compression
Author	Daniela N. Rim, Inseon Jang, Heeyoul Choi
Citation	Vol. 48, No. 8, pp. 940-946 (Aug. 2021)
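Korean Abstract (translated)	Recent achievements with single deep learning models have made it possible to handle highly structured data with one unified model. However, training a single deep learning model to compress audio signals has been difficult because it internally requires a discrete representation of the signal. In this paper, we present a single-model deep network and training method that combine recurrent neural networks (RNNs) within the training strategy of a variational autoencoder whose latent space holds discrete representations. The proposed method uses a reparameterization trick for the Bernoulli distribution to enable backpropagation through the discrete representation; as a result, the encoder and decoder can be separated, which is essential for practical audio compression. To the best of our knowledge, the proposed model is the first implementation of single-model training with RNNs for audio compression, and it achieves an SDR (signal-to-distortion ratio) of 20.53 dB.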
English Abstract	Recent advances in end-to-end deep learning have encouraged the exploration of tasks dealing with highly structured data using unified deep network models. Designing such models for compressing audio signals has been a challenge because compression needs discrete representations, which are not easy to train with end-to-end backpropagation. In this paper, we present an end-to-end deep learning approach that combines recurrent neural networks (RNNs) within the training strategy of variational autoencoders (VAEs) with a binary representation of the latent space. We apply a reparameterization trick for the Bernoulli distribution to the discrete representations, which allows smooth backpropagation. In addition, our approach enables the separation of the encoder and decoder, which is necessary for compression tasks. To the best of our knowledge, this is the first implementation of end-to-end learning for a single audio compression model with RNNs, and our model achieves a signal-to-distortion ratio (SDR) of 20.53 dB.
Keyword	audio compression, end-to-end learning, discrete latent space, variational autoencoder
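
The abstract names two technical ingredients without detail: a reparameterization trick for Bernoulli latent variables and the SDR metric. The sketch below illustrates both under stated assumptions and is not the authors' implementation. It assumes the Binary Concrete (Gumbel-sigmoid) relaxation as the reparameterization, which is one common choice that the abstract does not confirm, and it assumes the plain energy-ratio definition of SDR. The function names `relaxed_bernoulli_sample`, `binarize`, and `sdr_db` are hypothetical.

```python
import numpy as np


def relaxed_bernoulli_sample(logits, temperature, rng):
    """Draw a relaxed (continuous) Bernoulli sample in (0, 1).

    Assumption: Binary Concrete / Gumbel-sigmoid relaxation. The noise is
    sampled independently of `logits`, so the output is a smooth function
    of `logits`, which is what lets gradients pass through in an autodiff
    framework.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Uniform noise, clipped away from 0 and 1 to keep the logs finite.
    u = rng.uniform(low=1e-6, high=1.0 - 1e-6, size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))


def binarize(relaxed):
    """Hard 0/1 code, as an encoder would emit for the bitstream."""
    return (np.asarray(relaxed) > 0.5).astype(np.uint8)


def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: the plain energy-ratio definition; the paper may use a
    different variant.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    distortion = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(distortion ** 2))


rng = np.random.default_rng(0)
soft_code = relaxed_bernoulli_sample([-4.0, 0.0, 4.0], temperature=0.5, rng=rng)
hard_code = binarize(soft_code)
```

Thresholding the relaxed sample at 0.5 yields a 1 with probability `sigmoid(logit)`, so the hard code follows the intended Bernoulli distribution while the relaxed value stays differentiable with respect to the logits. Lower temperatures push the relaxed samples closer to 0 or 1.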