• Àüü
  • ÀüÀÚ/Àü±â
  • Åë½Å
  • ÄÄÇ»ÅÍ
´Ý±â

»çÀÌÆ®¸Ê

Loading..

Please wait....

¿µ¹® ³í¹®Áö

Ȩ Ȩ > ¿¬±¸¹®Çå > ¿µ¹® ³í¹®Áö > TIIS (Çѱ¹ÀÎÅͳÝÁ¤º¸ÇÐȸ)

TIIS (Çѱ¹ÀÎÅͳÝÁ¤º¸ÇÐȸ)

Current Result Document :

ÇѱÛÁ¦¸ñ(Korean Title) Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks
¿µ¹®Á¦¸ñ(English Title) Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks
ÀúÀÚ(Author) Shingchern D. You   Chien-Hung Liu   Jia-Wei Lin  
¿ø¹®¼ö·Ïó(Citation) VOL 15 NO. 02 PP. 0729 ~ 0748 (2021. 02)
Çѱ۳»¿ë
(Korean Abstract)
¿µ¹®³»¿ë
(English Abstract)
Vocal detection is one of the fundamental steps in musical information retrieval. Typically, the detection process consists of feature extraction and classification steps. Recently, neural networks are shown to outperform traditional classifiers. In this paper, we report our study on how to improve detection accuracy further by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model is still better than an end-to-end model. The recommended model uses a spectrogram as the input plane and the classifier is an 18-layer convolutional neural network (CNN). With this arrangement, when compared with existing literature, the proposed model improves the accuracy from 91.8% to 94.1% in Jamendo dataset. As the dataset has an accuracy of more than 90%, the improvement of 2.3% is difficult and valuable. If even higher accuracy is required, the ensemble learning may be used. The recommend setting is a majority vote with seven proposed models. Doing so, the accuracy increases by about 1.1% in Jamendo dataset.
Å°¿öµå(Keyword) Vocal Detection   CNN   Ensemble Learning  
ÆÄÀÏ÷ºÎ PDF ´Ù¿î·Îµå