Cross-species F0 estimation: dataset and study of baseline algorithms
Information on current collaborations: https://docs.google.com/document/d/179dD1d6lmWhQ9e2E1AUoJyLZ1d_5c-aMNwAPkbmIcBU
metadata.py
Stores a dictionary of datasets and their characteristics (SR, NFFT, and the path to the sound files). Convenient for iterating over the whole dataset:
```python
from glob import glob
import pandas as pd
import soundfile as sf
from tqdm import tqdm
from metadata import species

for specie in species:
    wavpath, FS, nfft = species[specie].values()
    # iterate over files (one per vocalisation)
    for fn in tqdm(glob(wavpath), desc=specie):
        sig, fs = sf.read(fn)  # read the sound file
        # read annotations (one column Time in seconds, one column Freq in Hz)
        annot = pd.read_csv(f'{fn[:-4]}.csv')
```
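For reference, a minimal sketch of what one entry of the `species` dictionary might look like, assuming its values are ordered as unpacked above (the species name, paths, and values are illustrative, not actual contents of metadata.py):

```python
# Hypothetical entry; the real paths, sample rates, and NFFT sizes live in metadata.py.
species = {
    'humpback_whale': {
        'wavpath': 'data/humpback_whale/*.wav',  # one file per vocalisation
        'FS': 44100,                             # sample rate in Hz
        'nfft': 2048,                            # FFT size used for spectrograms
    },
}
```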
print_annot.py
For each vocalisation, prints a spectrogram with overlaid annotations as a .png file stored in the annot_pngs folder.
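A minimal sketch of the plotting logic, assuming matplotlib (the function name is an assumption; `nfft` comes from metadata.py):

```python
import matplotlib.pyplot as plt
import pandas as pd
import soundfile as sf

def print_annot(fn, nfft):
    """Save a spectrogram with the annotated f0 overlaid (illustrative sketch)."""
    sig, fs = sf.read(fn)
    annot = pd.read_csv(f'{fn[:-4]}.csv')
    plt.specgram(sig, NFFT=nfft, Fs=fs, noverlap=nfft // 2)
    plt.plot(annot.Time, annot.Freq, 'r.', label='annotated f0')
    plt.xlabel('Time (s)')
    plt.ylabel('Frequency (Hz)')
    plt.legend()
    plt.savefig(f'annot_pngs/{fn.split("/")[-1][:-4]}.png')
    plt.close()
```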
run_all.py
Runs all baseline algorithms over the dataset:
- praat (praat-parselmouth implementation)
- pyin (librosa implementation)
- crepe (torchcrepe implementation)
- crepe fine-tuned (torchcrepe implementation)
- crepe fine-tuned over all species BUT the one it is evaluated on
- crepe (original TensorFlow implementation, https://arxiv.org/abs/1802.06182)
- basic pitch (https://arxiv.org/abs/2203.09893)
- pesto (https://arxiv.org/abs/2309.02265)
This script stores predictions along with resampled annotations in {basename}_preds.csv files.
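As an illustration, minimal calls for two of these baselines, reusing `sig`, `fs`, and `nfft` from the metadata example above (the parameter values are assumptions, not the exact settings of run_all.py):

```python
import librosa
import parselmouth

# pyin baseline: frequency bounds here are illustrative
f0_pyin, voiced_flag, voiced_prob = librosa.pyin(
    sig, fmin=50, fmax=fs / 2 - 1, sr=fs, frame_length=nfft)

# praat baseline via parselmouth; unvoiced frames come out as 0 Hz
snd = parselmouth.Sound(sig, sampling_frequency=fs)
f0_praat = snd.to_pitch().selected_array['frequency']
```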
eval_all.py
Evaluates each algorithm over the dataset using the {basename}_preds.csv files, with a threshold of 50 cents for accuracies.
For each algorithm and species, this outputs ROC-optimal thresholds, recall, false alarm rate, pitch accuracy, and chroma accuracy.
/!\ These metrics are measured per vocalisation before being averaged.
Scores are stored in scores/{specie}_scores.csv files.
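A minimal sketch of how such per-vocalisation metrics can be computed with mir_eval (the file name and column names are assumptions, not necessarily those used by eval_all.py):

```python
import mir_eval.melody as mel
import pandas as pd

# Hypothetical columns: "time", "annot" (resampled ground truth), "pyin" (one baseline)
df = pd.read_csv('example_preds.csv')
t = df.time.to_numpy()
ref_v, ref_c, est_v, est_c = mel.to_cent_voicing(
    t, df.annot.fillna(0).to_numpy(),  # 0 Hz marks unvoiced frames
    t, df.pyin.fillna(0).to_numpy())
pitch_acc = mel.raw_pitch_accuracy(ref_v, ref_c, est_v, est_c, cent_tolerance=50)
chroma_acc = mel.raw_chroma_accuracy(ref_v, ref_c, est_v, est_c, cent_tolerance=50)
```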
get_noisy_labels.py
Detects potential misannotations. It measures:
- the SNR as the median of the annotated f0's energy relative to the median energy for its time bin.
- the presence of a sub-harmonic as the SNR (see above) of half the annotated f0
- the presence of a 2/3 sub-harmonic as the SNR (see above) of 2/3 the annotated f0
These values are then thresholded, and "noisy" vocalisation spectrograms are copied into the noisy_pngs folder for browsing and checking the results. One can then test whether a spectrogram exists in noisy_pngs to discard noisy labels.
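A rough sketch of the first measure, assuming scipy (the function and variable names are illustrative); the sub-harmonic measures reuse the same computation with `freqs / 2` and `freqs * 2 / 3`:

```python
import numpy as np
from scipy.signal import spectrogram

def annot_snr(sig, fs, nfft, times, freqs):
    """SNR of the annotated f0: median energy at the annotated points
    relative to the median energy of their time bins (illustrative sketch)."""
    f, t, S = spectrogram(sig, fs=fs, nperseg=nfft, noverlap=nfft // 2)
    # nearest spectrogram bins for each annotated (time, frequency) point
    ti = np.clip(np.searchsorted(t, times), 0, len(t) - 1)
    fi = np.clip(np.searchsorted(f, freqs), 0, len(f) - 1)
    return np.median(S[fi, ti] / np.median(S[:, ti], axis=0))
```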
train_crepe.py
Fine-tunes the crepe model using the whole dataset.
- Loads 1024-sample windows and their corresponding f0 targets, storing them in a large train_set.pkl file (skipped if the data hasn't changed).
- Applies gradient descent with the BCE loss, following the crepe paper (the task is treated as a binary classification for each spectral bin).
- Stores the fine-tuned model in model_all.pth.
- Can train on all species but one, given as an argument (see the sketch below).
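A minimal sketch of such a training loop, assuming torchcrepe exposes the Crepe module and a hypothetical `loader` yields batches of 1024-sample windows with 360-bin target activations (the real script also handles checkpoint loading and the leave-one-species-out split):

```python
import torch
import torchcrepe

model = torchcrepe.Crepe('full')  # in practice, initialised from the pretrained weights
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate is illustrative
criterion = torch.nn.BCELoss()

for frames, targets in loader:  # frames: (B, 1024) windows, targets: (B, 360) bin activations
    optimizer.zero_grad()
    probs = model(frames)             # sigmoid activation per pitch bin
    loss = criterion(probs, targets)  # BCE over bins, as in the crepe paper
    loss.backward()
    optimizer.step()
```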