Konstantinos Drossos, Maximos Kaliakatsos-Papakostas, Andreas Floros, and Tuomas Virtanen, “On the Impact of the Semantic Content of Sound Events in Emotion Elicitation,” Journal of the Audio Engineering Society, Vol. 64, No. 7/8, pp. 525–532, 2016
Sound events have been shown to have an impact on the emotions of the listener. Recent works in the field of emotion recognition from sound events show, on the one hand, the possibility of automatically retrieving emotional information from sound events and, on the other hand, the need for a deeper understanding of how the semantic content of sound events affects the listener’s affective state. In this work we present what is, to the best of the authors’ knowledge, the first investigation of the relation between the semantic similarity of sound events and the elicited emotion. To that end we use two emotionally annotated sound datasets and the Wu-Palmer semantic similarity measure computed over WordNet. The results indicate that the semantic content appears to play a limited role in shaping the listener’s affective state. However, when the semantic content is matched to specific areas of the Arousal-Valence space, or when the spatial position of the source is also taken into account, the effect of the semantic content becomes more pronounced, especially for sounds with medium to low valence and medium to high arousal, or when the sound source is at lateral positions relative to the listener’s head, respectively.
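For context, below is a minimal sketch of how such a pairwise semantic similarity could be computed with NLTK’s WordNet interface, assuming the Wu-Palmer measure between noun senses of two sound-event labels. The function name and the example labels are illustrative assumptions, not entries from the datasets used in the paper.

```python
# Minimal sketch: Wu-Palmer semantic similarity between two sound-event labels
# via WordNet (NLTK). Not the paper's exact pipeline; labels are illustrative.
from nltk.corpus import wordnet as wn

def wup_similarity(label_a, label_b):
    """Return the maximum Wu-Palmer similarity over all noun synset pairs."""
    synsets_a = wn.synsets(label_a, pos=wn.NOUN)
    synsets_b = wn.synsets(label_b, pos=wn.NOUN)
    scores = [s_a.wup_similarity(s_b)
              for s_a in synsets_a for s_b in synsets_b]
    scores = [s for s in scores if s is not None]  # some pairs have no score
    return max(scores) if scores else 0.0

# Example usage with two hypothetical sound-event labels
print(wup_similarity("thunder", "rain"))
print(wup_similarity("thunder", "applause"))
```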
Stylianos Ioannis Mimilakis, Konstantinos Drossos, Tuomas Virtanen, and Gerald Schuller, “Deep Neural Networks for Dynamic Range Compression in Mastering Applications,” in Proceedings of the 140th Audio Engineering Society (AES) Convention, Jun. 4–7, Paris, France, 2016
The process of audio mastering often, if not always, includes various audio signal processing techniques such as frequency equalization and dynamic range compression. The parameters of these techniques are controlled by a mastering engineer, according to the genre and style of the audio content, in order to process the original audio material. This operation relies on musically and perceptually pleasing facets of the acoustic characteristics conveyed by the audio material under the mastering process. Modeling such dynamic operations, which involve adaptation to the audio content, becomes vital in automated applications since it significantly affects the overall performance. In this work we present a system capable of modeling such behavior, focusing on automatic dynamic range compression. It predicts, via a trained deep neural network, frequency coefficients that realize the dynamic range compression and applies them to the unmastered audio signal that serves as input. Both the dynamic range compression and the prediction of the corresponding frequency coefficients take place in the time-frequency domain, using magnitude spectra acquired from a critical band filter bank, similar to the human peripheral auditory system. Results from listening tests conducted with professional music producers and audio mastering engineers demonstrate, on average, performance equivalent to professionally mastered audio content. Improvements were also observed in comparison to relevant commercial software.
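A minimal sketch of the general idea is shown below: a feed-forward network maps a frame of critical-band magnitudes to per-band coefficients, which are then applied multiplicatively to the unmastered spectrum. The layer sizes, the number of bands, and all identifiers are assumptions for illustration, not the architecture evaluated in the paper.

```python
# Sketch only: predict per-band coefficients from critical-band magnitude
# frames and apply them to the unmastered spectrum. All sizes and names are
# illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

N_BANDS = 26  # assumed number of critical bands (e.g., a Bark-like scale)

class CoefficientPredictor(nn.Module):
    def __init__(self, n_bands=N_BANDS, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bands, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bands), nn.Softplus(),  # non-negative gains
        )

    def forward(self, band_magnitudes):
        # band_magnitudes: (batch, n_bands) magnitude frames
        return self.net(band_magnitudes)

model = CoefficientPredictor()
unmastered = torch.rand(8, N_BANDS)      # dummy magnitude frames
coefficients = model(unmastered)         # predicted per-band coefficients
compressed = unmastered * coefficients   # apply them to the input spectrum
```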
Konstantinos Drossos, Maximos Kaliakatsos-Papakostas, and Andreas Floros, “Affective Audio Synthesis for Sound Experience Enhancement,” in Experimental Multimedia Systems for Interactivity and Strategic Innovation, I. Deliyannis, P. Kostagiolas (Eds.), IGI Global, 2016
With the advances of technology, multimedia have become a recurring and prominent component in almost all forms of communication. Although their content spans various categories, two prominent channels are used to convey information: the audio and the visual. The former can carry a wide range of content, from low-level characteristics (e.g., the spatial location of the source and the type of sound-producing mechanism) to high-level, contextual information (e.g., emotion). Additionally, recently published results demonstrate the possibility of automated synthesis of sounds, e.g., music and sound events. Based on the above, in this chapter the authors propose the integration of emotion recognition from sound with automated synthesis techniques. Such a task will enhance, on the one hand, the process of computer-driven creation of sound content by adding an anthropocentric factor (i.e., emotion) and, on the other hand, the experience of the multimedia user by offering an extra constituent that intensifies immersion and the overall user experience.