
Publications

Keyword: audio captioning (2)

2021
Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach [Conference]

J. Berg and K. Drossos, "Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach," in Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, pp. 140-144, Barcelona, Spain, 2021

Automated audio captioning (AAC) is the task of automatically creating textual descriptions (i.e. captions) for the contents of a general audio signal. Most AAC methods use existing datasets for optimization and/or evaluation. Given the limited information held by AAC datasets, it is very likely that AAC methods learn only the information contained in the utilized datasets. In this paper we present a first approach for continuously adapting an AAC method to new information, using a continual learning method. In our scenario, a pre-optimized AAC method is applied to unseen general audio signals and can update its parameters to adapt to the new information, given a new reference caption. We evaluate our method using a freely available, pre-optimized AAC method and two freely available AAC datasets. We compare our proposed method against three scenarios: two of training on one dataset and evaluating on the other, and a third of training on one dataset and fine-tuning on the other. Obtained results show that our method achieves a good balance between distilling new knowledge and not forgetting the previous knowledge.
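The core of the Learning Without Forgetting approach is a combined objective: a task loss on the new data plus a distillation term that keeps the updated model's outputs close to those of the frozen pre-optimized model. The following is a minimal NumPy sketch of that objective for a classification-style output head; the function names, the temperature value, and the weighting factor are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def softmax(x, T=1.0):
    """Temperature-scaled softmax, computed stably."""
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def lwf_loss(new_logits, old_logits, targets, T=2.0, lam=1.0):
    """Learning-Without-Forgetting-style objective (sketch):
    cross-entropy on the new task plus a distillation term that
    penalizes drift from the frozen old model's soft outputs."""
    n = new_logits.shape[0]
    probs = softmax(new_logits)
    # standard cross-entropy against the new reference labels
    ce = -np.log(probs[np.arange(n), targets] + 1e-12).mean()
    # distillation: cross-entropy between temperature-softened
    # old-model and new-model output distributions
    p_old = softmax(old_logits, T)
    p_new = softmax(new_logits, T)
    distill = -(p_old * np.log(p_new + 1e-12)).sum(axis=-1).mean()
    return ce + lam * distill
```

The weighting factor `lam` trades off adapting to the new data (the cross-entropy term) against preserving the old model's behavior (the distillation term).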

Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning [Conference]

B. Weck, X. Favory, K. Drossos, and X. Serra, "Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning," in Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop, pp. 60-64, Barcelona, Spain, 2021

Automated audio captioning (AAC) is the task of automatically generating textual descriptions for general audio signals. A captioning system has to identify various information from the input signal and express it in natural language. Existing works mainly focus on investigating new methods and try to improve their performance measured on existing datasets. Since AAC has attracted attention only recently, very few works study the performance of existing pre-trained audio and natural language processing resources. In this paper, we evaluate the performance of off-the-shelf models with a Transformer-based captioning approach. We utilize the freely available Clotho dataset to compare four different pre-trained machine listening models, four word embedding models, and their combinations in many different settings. Our evaluation suggests that YAMNet combined with BERT embeddings produces the best captions. Moreover, in general, fine-tuning pre-trained word embeddings can lead to better performance. Finally, we show that sequences of audio embeddings can be processed using a Transformer encoder to produce higher-quality captions.
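Processing a sequence of audio embeddings with a Transformer encoder comes down to self-attention over the time axis: each frame embedding attends to all others before being passed to the language model. Below is a minimal single-head self-attention sketch in NumPy; the shapes, weight initialization, and frame count are toy assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of audio embeddings
    X of shape [time, dim] -- the core operation of a Transformer
    encoder block."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # pairwise frame similarities
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                      # context-mixed embeddings

rng = np.random.default_rng(0)
n_frames, dim = 5, 8  # e.g. 5 frame-level embeddings (toy dimensionality)
X = rng.standard_normal((n_frames, dim))
Wq, Wk, Wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

A full encoder would stack several such blocks with multiple heads, residual connections, and layer normalization; the sketch shows only why the output at each time step depends on the whole embedding sequence.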

