DCNN-LSTM Based Audio Classification Combining Multiple Feature Engineering and Data Augmentation Techniques

  • Short-Time Fourier Transform
  • Data Augmentation
  • Spectral Feature Extraction
  • Ensemble Classification

Presented at the 4th International Conference on Intelligent Computing & Optimization (2021). Available online in Lecture Notes in Networks and Systems, Springer.

Much of what we know depends on the brain's ability to process sensory data, and hearing is a crucial sense for learning. Sound underpins a wide range of activities, such as exchanging information and interacting with others, and audio signals are what allow sound to be captured and processed electronically. Because of their many essential applications, audio signals and their classification are of considerable value; even so, classifying audio signals accurately and efficiently remains a difficult task. In this study, we propose a new method for audio classification that combines the strengths of a Deep Convolutional Neural Network (DCNN) and a Long Short-Term Memory (LSTM) network with a tailored combination of feature engineering. Data augmentation and spectral feature extraction are integrated before the resulting features are fed to the model, and the experiments show improved accuracy. To validate the efficacy of the proposed model, a comparative analysis has been made against recently published reference works.
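
To make the pipeline concrete, the sketch below shows one way such a system could be assembled: waveform-level augmentation (additive noise and a small time shift), STFT-based log-mel spectral features, and a compact DCNN-LSTM classifier. It is a minimal illustration only; the library choices (librosa, TensorFlow/Keras), layer sizes, and augmentation parameters are assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch (not the authors' exact pipeline): augmentation + spectral
# features + DCNN-LSTM classifier. All hyperparameters are illustrative.
import numpy as np
import librosa
import tensorflow as tf

SR = 22050       # sampling rate (assumed)
N_MELS = 64      # number of mel bands (assumed)

def augment(y, rng):
    """Simple waveform-level augmentation: additive noise and time shift."""
    y = y + 0.005 * rng.standard_normal(len(y))        # light Gaussian noise
    y = np.roll(y, rng.integers(-SR // 10, SR // 10))  # shift by up to 0.1 s
    return y

def extract_features(y):
    """Log-scaled mel spectrogram derived from the STFT."""
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_fft=1024,
                                         hop_length=512, n_mels=N_MELS)
    return librosa.power_to_db(mel, ref=np.max)        # shape: (n_mels, frames)

def build_dcnn_lstm(input_shape, n_classes):
    """Convolutional front end followed by an LSTM over the time axis."""
    inputs = tf.keras.Input(shape=input_shape)          # (n_mels, frames, 1)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    x = tf.keras.layers.MaxPooling2D((2, 2))(x)
    x = tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = tf.keras.layers.MaxPooling2D((2, 2))(x)
    # Collapse (freq, time, channels) into a (time, features) sequence.
    x = tf.keras.layers.Permute((2, 1, 3))(x)
    x = tf.keras.layers.Reshape((x.shape[1], -1))(x)
    x = tf.keras.layers.LSTM(64)(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal(SR * 4).astype(np.float32)  # stand-in 4 s clip
    feat = extract_features(augment(y, rng))[..., np.newaxis]
    model = build_dcnn_lstm(feat.shape, n_classes=10)
    model.summary()
```

In this arrangement the convolutional layers learn local time-frequency patterns from the spectrogram, while the LSTM summarizes how those patterns evolve over time, which is the general motivation for combining the two architectures.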

