RAVEN2YOLO
This GitHub repository was created to simplify learning YOLOv5 and to adapt it for bioacoustics. It supports a paper analyzing Humpback Whale (Megaptera novaeangliae) vocalizations through the automatic detection and classification of 28 units.
See: Publication
This repository includes essential scripts for adapting YOLOv5 to bioacoustics research. Key scripts provided are:
- get_spectrogram.py : Extracts spectrograms from multiple recordings.
- labelme2yolo.py : Converts LabelMe annotations to YOLO format.
- yolo2labelme.py : Converts YOLO detections to LabelMe format.
- get_train_annot.py : Converts Raven format dataframe annotations to YOLO format.
- get_train_val.py : Separates training and validation datasets in a balanced manner.
- get_time_freq_detection.py : Compiles detections from a trained model into a dataframe and/or into Raven annotation format (.txt).
To use the scripts with Raven annotation software (Raven Lite), export annotations in the recommended format and follow these steps:
- Go into the folder that contains the scripts
- Run get_train_annot.py
- Launch training following the instructions below
To use the scripts without Raven annotation software, you can follow these steps:
- Run get_spectrogram.py
- Install Labelme (pip install labelme) and annotate the spectrograms
- Run labelme2yolo.py
- Run get_train_val.py
- Launch training
The get_time_freq_detection.py script compiles detections into a NetCDF (.nc) file, detailing minimum, mid-range, and maximum frequency, duration, detection position, and model confidence.
Additional scripts may be added over time to automate other processes.
For proper citation when using this methodology, please refer to the provided CITATION.cff file.
Install
git clone https://gitlab.lis-lab.fr/stephane.chavin/raven2yolo.git
pip install -r requirements.txt
Spectrogram Extraction Script
Description
This script extracts spectrograms from .wav files. It allows you to specify various parameters such as duration, window size, hop ratio, high-pass and low-pass filters, overlap, resampling frequency, and CPU usage to optimize the process.
Usage
To run the script, use the following command:
python get_spectrogram.py <path> <directory> [options]
Arguments
Positional Arguments
- path: Path to the folder or file that contains the recordings.
- directory: Directory where the extracted spectrograms will be stored.
Optional Arguments
--duration (int): Duration in seconds for each spectrogram. Default is 8.
--window (int): Window size for the Fourier Transform. Default is 1024.
--hop (float): Ratio of hop in window (e.g., 50% corresponds to 0.5). Default is 0.5.
--high (int): High Pass Filter value in Hz. Default is 10.
--low (int): Low Pass Filter value in Hz. Default is None.
--overlap (int): Overlap in seconds between two spectrograms. Default is 0.
--rf (int): Resampling Frequency of the signal. If not provided, the original frequency sampling of the recording will be used. Default is None.
--cpu (int): Number of CPUs used to parallelize processing; provide 2 or more to speed it up. Default is 1.
--test (flag): If provided, sets the test flag to 1, otherwise it is None.
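As a rough illustration of what this script computes, here is a minimal spectrogram-extraction sketch built on scipy.signal.stft. The function name and the exact STFT settings (Hann window, magnitude only) are assumptions for illustration, not the repository's actual implementation:

```python
import numpy as np
from scipy.signal import stft

def extract_spectrogram(signal, sr, window=1024, hop_ratio=0.5):
    """Compute a magnitude spectrogram; window and hop_ratio mirror the
    script's --window and --hop options (illustrative re-implementation)."""
    hop = int(window * hop_ratio)
    freqs, times, Z = stft(signal, fs=sr, nperseg=window,
                           noverlap=window - hop)
    return freqs, times, np.abs(Z)

# 8 s of a 440 Hz tone at 22050 Hz, i.e. one --duration 8 chunk
sr = 22050
t = np.linspace(0, 8, 8 * sr, endpoint=False)
freqs, times, mag = extract_spectrogram(np.sin(2 * np.pi * 440 * t), sr)
peak_hz = freqs[mag.mean(axis=1).argmax()]  # strongest frequency bin
```

With window = 1024, the frequency resolution is sr/1024 (about 21.5 Hz here), so the peak bin lands within one bin of 440 Hz.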
LabelMe to YOLO Annotation Converter
Description
This script converts annotations from the LabelMe format to a YOLO compatible format. It allows you to specify the path to the LabelMe annotations and the directory where the converted YOLO annotations will be stored.
Usage
To run the script, use the following command:
python labelme2yolo.py <path_to_data> <directory>
Arguments
Positional Arguments
- path_to_data: Path to the folder that contains the LabelMe annotations.
- directory: Directory where the YOLO annotations will be stored.
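For reference, the core coordinate conversion can be sketched as follows (a hypothetical re-implementation, not the script itself): a LabelMe rectangle is stored as two corner points in pixels, while a YOLO line stores a class index plus a normalized center, width, and height:

```python
def labelme_rect_to_yolo(points, img_w, img_h, class_id):
    """Convert a LabelMe rectangle (two corner points, in pixels) to a
    YOLO line: 'class x_center y_center width height', all normalized."""
    (x1, y1), (x2, y2) = points
    x_min, x_max = sorted((x1, x2))
    y_min, y_max = sorted((y1, y2))
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# a 200x100 px box on a 640x480 image, class index 0
line = labelme_rect_to_yolo([[100, 50], [300, 150]], 640, 480, 0)
```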
YOLO to JSON/LabelMe Annotation Converter
Description
This script converts annotations from the YOLO format (stored in .txt files) to LabelMe JSON files. It allows you to specify the path to the folder containing the .txt files, the path to the folder containing the images, and optionally the directory where the JSON files will be stored.
Usage
To run the script, use the following command:
python yolo2labelme.py <path_to_txt> <path_to_img> [options]
Arguments
Positional Arguments
- path_to_txt: Path to the folder containing the .txt files.
- path_to_img: Path to the folder containing the .jpg images.
Optional Arguments
-d, --directory (str): Directory where the modified JSON files will be stored. If not provided, the directory will be the same as path_to_txt.
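The reverse mapping can be sketched as follows (hypothetical names; the real script also writes complete LabelMe JSON files with image metadata):

```python
def yolo_line_to_labelme(line, img_w, img_h, class_names):
    """Convert one YOLO annotation line back to a LabelMe 'shape' dict
    (a rectangle described by two corner points in pixels)."""
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return {
        "label": class_names[int(cls)],
        "points": [[x_min, y_min], [x_max, y_max]],
        "shape_type": "rectangle",
    }

shape = yolo_line_to_labelme("0 0.3125 0.208333 0.3125 0.208333",
                             640, 480, ["unit_A"])
```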
Data Splitting and Storage Script
Description
This script splits data into training, validation, and optionally test sets based on a specified ratio. It allows you to specify the path to the folder containing the .txt files, the directory where the spectrogram and .txt files will be stored, and an optional test flag to include a test split.
Usage
To run the script, use the following command:
python get_train_val.py <path_to_data> <directory> [options]
Arguments
Positional Arguments
- path_to_data: Path to the folder that contains the .txt files (should end with labels/).
- directory: Directory where the spectrogram and .txt files will be stored (should be different from <path_to_data>).
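The balanced separation can be sketched like this (an illustrative re-implementation assuming one list of label files per class; the actual script's grouping logic may differ):

```python
import random

def balanced_split(files_by_class, ratio=0.7, seed=0):
    """Split label files into train/val sets so each class keeps roughly
    the same proportion in both sets (hypothetical sketch)."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val = [], []
    for cls, files in files_by_class.items():
        files = files[:]          # do not mutate the caller's lists
        rng.shuffle(files)
        cut = int(len(files) * ratio)
        train += files[:cut]
        val += files[cut:]
    return train, val

files = {"unit_A": [f"a{i}.txt" for i in range(10)],
         "unit_B": [f"b{i}.txt" for i in range(10)]}
train, val = balanced_split(files)
```

Because the split is applied per class, a 70/30 ratio holds for each class individually, not just for the dataset as a whole.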
CSV to Spectrogram and Annotation Converter
Description
This script creates a .txt and a .jpg file for each annotation from a CSV file. It takes the path to the CSV file containing the annotations, the path to the folder containing the recordings, and the directory where the spectrograms and .txt files will be stored. The script includes options for setting the duration and overlap of the spectrograms, frequency resampling, window size, hop ratio, CPU usage, and an optional test flag to include a test split.
Usage
To run the script, use the following command:
python get_train_annot.py <filename_path> <path_to_data> <directory> [options]
Arguments
Positional Arguments
- filename_path: Path to the folder or file containing the annotations. If a file, it must be in Raven format with an added Path column giving the path to the corresponding .wav files.
- path_to_data: Path of the folder that contains the recordings.
- directory: Directory where the spectrograms and .txt files will be stored.
Optional Arguments
--duration (int): Duration in seconds for each spectrogram. Default is 8.
--overlap (int): Overlap in seconds between two spectrograms. Default is 2.
--rf (int): Frequency resampling. Default is None.
--window (int): Window size for the Fourier Transform. Default is 1024.
--hop (float): Ratio of hop in window (e.g., 50% = 0.5). Default is 0.5.
--cpu (int): Number of CPUs to speed up the process. Default is 1.
--test (flag): If provided, splits the data into train/test/validation sets, with the test and validation sets each receiving (1 - ratio)/2 of the data. If not provided, only train and validation splits are created.
--minimum: If True, vmin is set to stft.mean(); otherwise to stft.min().
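The time/frequency-to-box mapping that such a conversion needs can be sketched as follows. This is a hypothetical function, assuming the spectrogram spans 0 Hz to sr/2 with the image y origin at the top; the script's exact layout may differ:

```python
def raven_to_yolo(begin_t, end_t, low_f, high_f,
                  chunk_start, duration=8, sr=22050):
    """Map one Raven selection (times in s, frequencies in Hz) onto a
    normalized YOLO box for the spectrogram chunk starting at chunk_start.
    x runs left to right in time; y runs top to bottom in frequency."""
    nyquist = sr / 2
    xc = ((begin_t + end_t) / 2 - chunk_start) / duration
    w = (end_t - begin_t) / duration
    # image y axis points down, so high frequencies sit near y = 0
    yc = 1 - ((low_f + high_f) / 2) / nyquist
    h = (high_f - low_f) / nyquist
    return xc, yc, w, h

# a 2-4 s, 1000-3000 Hz selection in the chunk starting at 0 s
xc, yc, w, h = raven_to_yolo(2.0, 4.0, 1000, 3000, chunk_start=0.0)
```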
Detection Collector and DataFrame Generator
Description
This script collects detections from .txt files and returns a complete dataframe. It takes the path to the folder containing the .txt files, the directory where the dataframe will be stored, and the path to the YOLOv5 custom_data.yaml file. Additionally, it allows specifying the sampling rate and the duration of the spectrograms.
Usage
To run the script, use the following command:
python get_time_freq_detection.py <path_to_data> <directory> <names> [options]
Arguments
Positional Arguments
- path_to_data: Path to the folder that contains the .txt files.
- directory: Directory where the dataframe will be stored.
- names: Path to the YOLOv5 custom_data.yaml file.
Optional Arguments
-s, --sr (int): Sampling rate of the spectrogram. This argument is required.
--duration (int): Duration of the spectrogram in seconds. Default is 8.
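Recovering time and frequency values from one normalized YOLO detection can be sketched as follows (a hypothetical function, assuming the spectrogram spans 0 Hz to sr/2 with the y origin at the top; not the script's actual code):

```python
def detection_to_time_freq(xc, yc, w, h, chunk_start=0.0,
                           duration=8, sr=22050):
    """Turn one normalized YOLO detection box into start time, duration,
    and minimum / mid-range / maximum frequency (illustrative sketch)."""
    nyquist = sr / 2
    start = chunk_start + (xc - w / 2) * duration
    dur = w * duration
    # y axis points down: smaller y means higher frequency
    f_max = (1 - (yc - h / 2)) * nyquist
    f_min = (1 - (yc + h / 2)) * nyquist
    f_mid = (f_min + f_max) / 2
    return {"start_s": start, "duration_s": dur,
            "min_hz": f_min, "mid_hz": f_mid, "max_hz": f_max}

# a box covering 2-4 s and 1000-3000 Hz at sr = 22050, duration = 8 s
det = detection_to_time_freq(0.375, 1 - 2000 / 11025, 0.25, 2000 / 11025)
```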
Training a YOLOv5 model
For this project, we adapted the YOLOv5 DataLoader to run detection directly on a folder that contains .wav files. For more information about YOLOv5, see: https://github.com/ultralytics/yolov5
- Jocher Glenn (2020), "YOLOv5 by Ultralytics", doi: 10.5281/zenodo.3908559, license: AGPL-3.0
python yolov5/train.py --imgsz <IMG_SIZE> --batch <BATCH_SIZE> --epochs <NB_EPOCHS> --data <custom_data.yaml> --weights yolov5/weights/yolov5l.pt --hyp <custom_hyp.yaml> --cache
Detection
- Detect on audio files
python detect.py --weights yolov5/runs/train/<EXP_NB>/weights/best.pt --imgsz <imgsz> --conf <conf> --source <PATH_TO_FOLDER_THAT_CONTAIN_WAV> --save-txt --sound --rf <RF> --sampleDur <SampleDur> --minimum <True/False> --window <window> --hop <hop> --low <low> --high <high> --cmap <cmap> --save-conf
- Detect on audio files without saving the detection images
python detect.py --weights yolov5/runs/train/<EXP_NB>/weights/best.pt --imgsz <imgsz> --conf <conf> --source <PATH_TO_FOLDER_THAT_CONTAIN_WAV> --save-txt --sound --rf <RF> --sampleDur <SampleDur> --minimum <True/False> --window <window> --hop <hop> --low <low> --high <high> --cmap <cmap> --save-conf --nosave
Arguments
--imgsz: Inference size height and width. Default is 640.
--sampleDur: Duration in seconds of each spectrogram for detection. Default is 8.
--sr: Sample rate in Hz for each spectrogram for detection. Default is 22050.
--window: Window size for each spectrogram for detection. Default is 1024.
--hop: Hop length for each spectrogram for detection. Default is 50% of window = 512.
--sound: Treat the source as audio files (flag, store_true). Default is False.
--low: Low-pass filter value in Hz.
--high: High-pass filter value in Hz.
--cmap: Colormap for the spectrograms.
--minimum: If True, vmin is set to stft.mean(); otherwise to stft.min().
Contact
If you have any questions, please contact me at the following e-mail address: stephane.chavin@univ-tln.fr
Contributors
H. Glotin managed the data storage and the human resources for the realisation of this software within the framework of the AI Chair ADSIL ANR-20-CHIA-0014 (Agence Nationale de la Recherche and DGA AID), SYLVANIA ANR-21-CE04-0019, and the BIODIVERSA EUROPAM 2023-2026 projects.