CoarseSoundNet
The released model corresponds to the paper 'CoarseSoundNet: Building a reliable model for ecological soundscape analysis' (link will follow upon acceptance). The model tags audio files with one or more of the three coarse soundscape components Anthropophony, Biophony, and Geophony. It also predicts a silence class, but we recommend ignoring this class and assuming silence only when none of the three components is predicted.
Installation
To use the model, you need to install autrainer, for example via pip:
pip install autrainer
This model has been trained and tested with autrainer version 0.6.0.
For more information about autrainer, please refer to: https://autrainer.github.io/autrainer/index.html
Usage
The model can be applied to all WAV files in a folder (<data-root>), with the results stored in another folder (<output-root>):
autrainer inference hf:HearTheSpecies/CoarseSoundNet -r <data-root> <output-root> -w 10 -s 10 -sr 48000
Here, -w is the window size in seconds, -s is the step size in seconds, and -sr is the sampling rate.
For other inference settings and all available parameters, please consult the autrainer documentation; however, we recommend the settings given above.
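With a 10 s window and a 10 s step, a recording is split into non-overlapping 10 s segments. The following sketch illustrates how the -w and -s parameters segment a file; it is a simplification for illustration, not autrainer's actual windowing code:

```python
# Illustration of how a recording is segmented given window size (-w) and
# step size (-s). Simplified sketch; not autrainer's actual implementation.

def segment_starts(duration_s: float, window_s: float = 10.0, step_s: float = 10.0):
    """Return the start times (in seconds) of all full windows in a recording."""
    starts = []
    t = 0.0
    while t + window_s <= duration_s:
        starts.append(t)
        t += step_s
    return starts

# A 60-second file with the recommended settings yields six non-overlapping windows:
print(segment_starts(60.0))  # [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
```

With -w equal to -s, windows do not overlap; a smaller step would produce overlapping windows and correspondingly more predictions per file.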
Caution! The sampling rate must be kept as given above for the files to be processed correctly.
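If your recordings were captured at a different rate, you can resample them to 48 kHz before inference. A minimal sketch using scipy; the library choice and function names here are our own illustration, not part of the model's tooling:

```python
# Resample a mono signal to the 48 kHz rate expected by the model.
# Sketch using scipy; adapt the file I/O to your own pipeline.
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 48000

def resample_to_48k(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Polyphase resampling from orig_sr to 48 kHz."""
    if orig_sr == TARGET_SR:
        return audio
    g = gcd(TARGET_SR, orig_sr)
    return resample_poly(audio, TARGET_SR // g, orig_sr // g)

# Example: one second of audio at 44.1 kHz becomes one second at 48 kHz.
one_second = np.zeros(44100, dtype=np.float32)
print(len(resample_to_48k(one_second, 44100)))  # 48000
```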
Training
Pretraining
TODO
Dataset
TODO
Features
The audio recordings were resampled to 48 kHz. We then used the CLAP feature extractor ('laion/clap-htsat-unfused') from Wu et al. (2023). For more information, please check our GitHub repository (will be made public soon).
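As a sketch of this feature-extraction step, the CLAP feature extractor can be used through the transformers library. In practice you would load it from the checkpoint with ClapFeatureExtractor.from_pretrained('laion/clap-htsat-unfused'); the example below instantiates it with default arguments to stay offline, and those defaults may not match the checkpoint configuration exactly. Our full preprocessing pipeline may differ; see the repository once public.

```python
# Sketch of CLAP feature extraction with the Hugging Face transformers library.
# In practice, load the extractor from the checkpoint:
#   ClapFeatureExtractor.from_pretrained("laion/clap-htsat-unfused")
# Default arguments are used here to keep the example offline; they may not
# match the checkpoint configuration exactly.
import numpy as np
from transformers import ClapFeatureExtractor

extractor = ClapFeatureExtractor()  # defaults to a 48 kHz sampling rate

# Ten seconds of (silent) audio at 48 kHz, matching the recommended window size.
audio = np.zeros(10 * 48000, dtype=np.float32)
features = extractor(audio, sampling_rate=48000, return_tensors="np")
print(features["input_features"].shape)
```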
Training process
The model was trained for 30 epochs and evaluated on our validation set at the end of each epoch.
We release the checkpoint that achieved the best performance on this validation set.
All training hyperparameters can be found in conf/config.yaml within the model folder.
Evaluation
TODO
Acknowledgments
TODO
Please acknowledge the work that produced the original model. We would also appreciate an acknowledgment of autrainer.