Automatic Spatial Audio Scene Classification in Binaural Recordings of Music
Sławomir Zieliński, Hyunkook Lee
Abstract: The aim of the study was to develop a method for the automatic classification of three spatial audio scenes that differ in the horizontal distribution of foreground and background audio content around a listener in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural room impulse responses (BRIRs), representing the room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method makes no assumptions about the number or characteristics of the audio sources. The least absolute shrinkage and selection operator (LASSO) was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits satisfactory classification accuracy when it is trained and tested on different stimuli synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. This study demonstrates that, in addition to the binaural cues, the Mel-frequency cepstral coefficients constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes.
|Journal series||Applied Sciences-Basel, ISSN 2076-3417, (N/A 70 pts)|
|Publication size in sheets||1.05|
|Keywords in English||binaural audio, machine-listening, machine-learning, spatial audio scene classification|
|License||Journal (articles only); published final; ; with publication|
|Score||= 30.0, 04-06-2019, manual; = 70.0, 04-03-2020, ArticleFromJournal|
|Publication indicators||: 2018 = 0.985; : 2018 = 2.217 (2) - 2018 = 2.287 (5)|