Pitch Classification Based on Bidirectional LSTM with Probabilistic Attention for Speech Segregation from Speech-Music Mixtures

Han-Gyu Kim (김한규); Gil-Jin Jang (장길진); Jeong-Sik Park (박정식); Yung-Hwan Oh (오영환); Ho-Jin Choi (최호진)

KIISE Transactions on Computing Practices (정보과학회 컴퓨팅의 실제 논문지)

Korean Title	음성-음악 혼재 데이터에서의 음성분리를 위한 확률적 어텐션을 사용한 양방향 LSTM 기반 피치 분류
English Title	Pitch Classification Based on Bidirectional LSTM with Probabilistic Attention for Speech Segregation from Speech-Music Mixtures
Author	Han-Gyu Kim (김한규), Gil-Jin Jang (장길진), Jeong-Sik Park (박정식), Yung-Hwan Oh (오영환), Ho-Jin Choi (최호진)
Citation	Vol. 25, No. 4, pp. 223-230 (April 2019)
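Korean Abstract (translated)	In single-channel speech segregation based on sub-band masking, the speech pitch is estimated and a filter that passes only the frequency energy consistent with the estimated pitch is used to separate speech from background noise. Because speech and music have similar harmonic structures, when music is the interfering noise the estimated pitch contains both speech pitch and music pitch, which degrades speech segregation performance. Effective speech segregation from speech-music mixtures therefore requires classifying each pitch as speech or music. This study proposes a speech/music pitch classification method using a bidirectional LSTM, together with a probabilistic attention layer structure that improves the performance of the bidirectional LSTM. To obtain natural-sounding segregated speech from the pitch classification results, it also proposes a method for generating a segregation mask from which music energy has been removed. In experiments, the bidirectional LSTM with probabilistic attention achieved better speech segregation performance than the other methods.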
English Abstract	Speech segregation based on sub-band masking extracts speech from an audio mixture by estimating the speech pitch and retaining only the signal components compatible with the estimated pitch. Because speech and music exhibit similar harmonic structures, the estimated pitch contains both speech pitch and music pitch when the input is a speech-music mixture, which degrades segregation performance. To overcome this limitation, we propose pitch classification using a bidirectional LSTM. We also propose a probabilistic attention layer to improve the bidirectional LSTM. In addition, we propose removing musical energy during segregation mask generation so that the pitch classification results yield natural-sounding segregated speech. The experimental results show that the proposed pitch classification using a bidirectional LSTM with probabilistic attention outperforms the other speech segregation methods.
Keyword	speech segregation; pitch classification; bidirectional LSTM; probabilistic attention
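
The sub-band masking idea described in the abstract (pass only the frequency energy consistent with an estimated pitch, and only for pitches classified as speech) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the binary mask, the fixed tolerance of 20 Hz, and the assumption that each frame comes with a list of pitch candidates already labeled speech or music are all hypothetical choices made for illustration. The classifier itself (the bidirectional LSTM with probabilistic attention) is not reproduced here, since the record does not describe its details.

```python
def harmonic_mask(f0_hz, bin_freqs_hz, tolerance_hz=20.0):
    """Binary mask over frequency bins for one frame.

    A bin is kept (1) when it lies within tolerance_hz of a positive
    integer multiple of f0_hz; every other bin is zeroed. A non-positive
    f0_hz (no pitch) yields an all-zero mask.
    tolerance_hz is an illustrative assumption, not a value from the paper.
    """
    if f0_hz <= 0:
        return [0] * len(bin_freqs_hz)
    mask = []
    for freq in bin_freqs_hz:
        # Index of the harmonic closest to this bin (at least the fundamental).
        harmonic = max(1, round(freq / f0_hz))
        mask.append(1 if abs(freq - harmonic * f0_hz) <= tolerance_hz else 0)
    return mask


def segregate_frame(magnitudes, bin_freqs_hz, labeled_pitches):
    """Keep only energy at harmonics of pitches labeled as speech.

    labeled_pitches is a list of (f0_hz, is_speech) pairs; the labels are
    assumed to come from some upstream pitch classifier (hypothetical here).
    Harmonics of music-labeled pitches are simply not added to the mask.
    """
    combined = [0] * len(bin_freqs_hz)
    for f0_hz, is_speech in labeled_pitches:
        if not is_speech:
            continue
        for i, keep in enumerate(harmonic_mask(f0_hz, bin_freqs_hz)):
            combined[i] = combined[i] or keep
    return [m * k for m, k in zip(magnitudes, combined)]


# Example: one frame with a 100 Hz speech pitch and a 150 Hz music pitch.
bins = [0, 50, 100, 150, 200, 250, 300]
frame = [1.0] * len(bins)
speech_only = segregate_frame(frame, bins, [(100.0, True), (150.0, False)])
```

In this toy frame, only the bins at 100, 200, and 300 Hz survive, while the 150 Hz bin that belongs to the music-labeled pitch is suppressed. A real system would operate on a filterbank or STFT representation and would likely use a soft mask rather than a binary one.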