
f0_estimation

Cross-species F0 estimation: a dataset and a study of baseline algorithms.

    Information on current collaborations: https://docs.google.com/document/d/179dD1d6lmWhQ9e2E1AUoJyLZ1d_5c-aMNwAPkbmIcBU

    metadata.py

    Stores a dictionary of datasets and their characteristics (sample rate, NFFT, and path to the sound files). Convenient for iterating over the whole dataset:

    import pandas as pd
    import soundfile as sf
    from glob import glob
    from tqdm import tqdm

    from metadata import species

    for specie in species:
        wavpath, FS, nfft = species[specie].values()
        # iterate over files (one per vocalisation)
        for fn in tqdm(glob(wavpath), desc=specie):
            sig, fs = sf.read(fn)  # read soundfile
            annot = pd.read_csv(f'{fn[:-4]}.csv')  # annotations: one column Time (in seconds), one column Freq (in Hertz)

    print_annot.py

    For each vocalisation, renders a spectrogram with overlaid annotations as a .png file stored in the annot_pngs folder.
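    A minimal sketch of what such a script might look like: compute a spectrogram, overlay the annotated f0 track, and save the figure as a .png. The function name and arguments are assumptions, not the actual code of print_annot.py.

    ```python
    # Hypothetical sketch of plotting a spectrogram with overlaid annotations.
    import matplotlib
    matplotlib.use('Agg')  # render without a display
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy import signal

    def plot_annotated_spectrogram(sig, fs, annot_time, annot_freq, out_png, nfft=1024):
        """Save a spectrogram with the annotated f0 overlaid as red dots."""
        f, t, S = signal.spectrogram(sig, fs, nperseg=nfft, noverlap=nfft // 2)
        plt.figure(figsize=(8, 4))
        plt.pcolormesh(t, f, 10 * np.log10(S + 1e-10), shading='auto')
        plt.plot(annot_time, annot_freq, 'r.', label='annotated f0')
        plt.xlabel('Time (s)')
        plt.ylabel('Frequency (Hz)')
        plt.legend()
        plt.tight_layout()
        plt.savefig(out_png)
        plt.close()

    # usage with a synthetic 440 Hz tone
    fs = 16000
    tvec = np.arange(fs) / fs
    sig = np.sin(2 * np.pi * 440 * tvec)
    plot_annotated_spectrogram(sig, fs, [0.25, 0.5, 0.75], [440, 440, 440], 'example.png')
    ```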

    run_all.py

    Runs all baseline algorithms over the dataset.

    This script stores predictions along with resampled annotations in {basename}_preds.csv files.
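    Since each algorithm outputs f0 values on its own time grid, the annotations have to be resampled onto it before they can be compared. A minimal sketch of that step (the variable names annot_time, annot_freq, and pred_time are assumptions):

    ```python
    # Resample annotations onto an algorithm's prediction time grid
    # via linear interpolation.
    import numpy as np

    annot_time = np.array([0.0, 0.1, 0.2, 0.3])      # annotation times (s)
    annot_freq = np.array([400., 420., 440., 460.])  # annotated f0 (Hz)
    pred_time = np.array([0.05, 0.15, 0.25])         # algorithm's output times (s)

    resampled = np.interp(pred_time, annot_time, annot_freq)
    print(resampled)  # [410. 430. 450.]
    ```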

    eval_all.py

    Evaluates each algorithm over the dataset using the {basename}_preds.csv files, with a threshold of 50 cents for accuracies. For each algorithm and species, this outputs ROC-optimal thresholds, recall, false alarm rate, pitch accuracy, and chroma accuracy. /!\ These metrics are measured per vocalisation before being averaged. Scores are stored in scores/{specie}_scores.csv files.
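    The 50-cent threshold works on a logarithmic pitch scale: a prediction counts as correct if it is within 50 cents (half a semitone) of the annotation. Chroma accuracy additionally forgives octave errors by folding the distance. A sketch of both metrics, assuming voiced frames only (function names are illustrative, not the actual code of eval_all.py):

    ```python
    import numpy as np

    def pitch_accuracy(pred, true, thresh_cents=50):
        """Fraction of frames where the prediction is within thresh_cents of the annotation."""
        cents = 1200 * np.abs(np.log2(pred / true))
        return np.mean(cents < thresh_cents)

    def chroma_accuracy(pred, true, thresh_cents=50):
        """Like pitch accuracy, but octave errors are forgiven
        (distance folded into [0, 600] cents)."""
        cents = 1200 * np.abs(np.log2(pred / true))
        folded = np.abs((cents + 600) % 1200 - 600)
        return np.mean(folded < thresh_cents)

    true = np.array([440., 440., 440.])
    pred = np.array([445., 880., 300.])  # close, octave error, gross error
    print(pitch_accuracy(pred, true))    # ~0.33 (only 445 Hz is within 50 cents)
    print(chroma_accuracy(pred, true))   # ~0.67 (the octave error also counts)
    ```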

    get_noisy_labels.py

    Detects potential misannotations. It measures:

    • the SNR, as the median of the annotated f0's energy relative to the median energy of its time bin;
    • the presence of a sub-harmonic, as the SNR (see above) of half the annotated f0;
    • the presence of a 2/3 sub-harmonic, as the SNR (see above) of 2/3 of the annotated f0.

    These values are then thresholded, and "noisy" vocalisation spectrograms are copied into the noisy_pngs folder for browsing and checking the results. One can then check whether a spectrogram exists in noisy_pngs to discard noisy labels.
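    A minimal sketch of the SNR measure, assuming a power spectrogram and annotations already mapped to time-frame indices (all names here are assumptions, not the actual code of get_noisy_labels.py):

    ```python
    # Energy at the annotated f0 bin relative to the median energy of its
    # time frame, summarised by the median over the vocalisation.
    import numpy as np

    def f0_snr(spec, freqs, annot_frames, annot_freqs):
        """spec: power spectrogram (freq x time); freqs: bin centre frequencies (Hz)."""
        ratios = []
        for t_idx, f0 in zip(annot_frames, annot_freqs):
            f_idx = np.argmin(np.abs(freqs - f0))  # nearest frequency bin
            frame = spec[:, t_idx]
            ratios.append(frame[f_idx] / np.median(frame))
        return np.median(ratios)

    # a toy spectrogram with a strong 440 Hz line over a flat background
    freqs = np.linspace(0, 8000, 513)
    spec = np.ones((513, 10))
    spec[np.argmin(np.abs(freqs - 440)), :] = 100.0
    snr = f0_snr(spec, freqs, [2, 5, 8], [440, 440, 440])
    print(snr)  # 100.0: the annotated bin is 100x the median frame energy
    ```

    Checking half the annotated f0 (or 2/3 of it) with the same function then quantifies sub-harmonic presence.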

    train_crepe.py

    Fine-tunes the CREPE model using the whole dataset.

    • Loads 1024-sample windows and their corresponding f0, stored in a large train_set.pkl file (skipped if the data hasn't changed).
    • Applies gradient descent using the BCE loss, following the CREPE paper (the task is treated as a binary classification for each spectral bin).
    • The fine-tuned model is stored in model_all.pth.
    • Trains on all species but one, given as an argument.
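    The BCE objective from the CREPE paper treats each of the 360 pitch bins as an independent binary classifier, with a soft target that is a Gaussian (in cents) around the annotated pitch bin. A minimal sketch of that loss, with illustrative shapes and names (not the actual code of train_crepe.py):

    ```python
    # BCE over pitch bins with a Gaussian-blurred target, as in the CREPE paper.
    import torch
    import torch.nn as nn

    n_bins = 360
    cents_per_bin = 20  # CREPE covers roughly six octaves at 20-cent resolution

    def gaussian_target(true_bin, std_cents=25):
        """Soft target: Gaussian around the annotated pitch bin, measured in cents."""
        bins = torch.arange(n_bins, dtype=torch.float32)
        d_cents = (bins - true_bin) * cents_per_bin
        return torch.exp(-d_cents ** 2 / (2 * std_cents ** 2))

    criterion = nn.BCELoss()
    pred = torch.sigmoid(torch.randn(1, n_bins))  # stand-in for the model output
    target = gaussian_target(180).unsqueeze(0)    # annotation at bin 180
    loss = criterion(pred, target)
    loss.backward  # in training, loss.backward() then an optimizer step would follow
    ```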