Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

Shingchern D. You; Chien-Hung Liu; Jia-Wei Lin

TIIS (Korean Society for Internet Information)

Title	Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks
Authors	Shingchern D. You; Chien-Hung Liu; Jia-Wei Lin
Citation	Vol. 15, No. 2, pp. 729-748 (February 2021)
Abstract	Vocal detection is one of the fundamental steps in music information retrieval. Typically, the detection process consists of a feature extraction step followed by a classification step. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to improve detection accuracy further by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model still performs better than an end-to-end model. The recommended model takes a spectrogram as its input plane and uses an 18-layer convolutional neural network (CNN) as the classifier. With this arrangement, the proposed model improves the accuracy on the Jamendo dataset from 91.8%, as reported in the existing literature, to 94.1%. Because the accuracy on this dataset already exceeds 90%, the improvement of 2.3 percentage points is difficult to achieve and therefore valuable. If even higher accuracy is required, ensemble learning may be used. The recommended setting is a majority vote among seven instances of the proposed model. Doing so increases the accuracy by about 1.1% on the Jamendo dataset.
Keywords	Vocal Detection; CNN; Ensemble Learning
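
The abstract recommends a majority vote among seven models but does not spell out the voting procedure. The following is a minimal sketch of one plausible reading, assuming each model emits a binary label per audio frame (1 = vocal, 0 = non-vocal). The function name `majority_vote` and the sample votes are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-frame binary labels (1 = vocal, 0 = non-vocal) from several models."""
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_models = len(predictions)
    n_frames = len(predictions[0])
    if any(len(p) != n_frames for p in predictions):
        raise ValueError("all models must label the same number of frames")
    # A frame is labeled vocal only when more than half of the models say so.
    return [
        1 if 2 * sum(p[i] for p in predictions) > n_models else 0
        for i in range(n_frames)
    ]


# Hypothetical outputs of seven models over four frames (illustrative data only).
votes = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
print(majority_vote(votes))  # [1, 0, 1, 0]
```

With an odd number of voters such as seven, a binary vote can never tie, which may be one practical reason to prefer an odd ensemble size.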