BirdNET v2.4 ONNX Backbone
Backbone-only ONNX exports of the BirdNET v2.4 bird sound classifier. The classification head has been removed, leaving only the audio frontend and the feature-extraction layers.
Two variants are provided, matching the originals from justinchuby/BirdNET-onnx: model_backbone.onnx and birdnet_backbone.onnx. Both models output a single tensor named embedding with shape (1, 1024).
Embeddings are numerically verified (rtol=1e-03, atol=1e-03) against the reference TF SavedModel published on Zenodo (BirdNET_v2.4_protobuf).
Quick start
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
# Download backbone
path = hf_hub_download(
repo_id="biodiversica/BirdNET-onnx-backbone",
filename="model_backbone.onnx",
)
sess = ort.InferenceSession(path)
# 3 s of audio at 48 kHz
audio = np.zeros((1, 144000), dtype=np.float32)
(embedding,) = sess.run(["embedding"], {"INPUT": audio})
print(embedding.shape) # (1, 1024)
For birdnet_backbone.onnx the input key is "input" (lowercase):
path = hf_hub_download(
repo_id="biodiversica/BirdNET-onnx-backbone",
filename="birdnet_backbone.onnx",
)
sess = ort.InferenceSession(path)
(embedding,) = sess.run(["embedding"], {"input": audio})
print(embedding.shape) # (1, 1024)
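Recordings longer than 3 s have to be cut into 144000-sample windows before they are passed to the backbone. The following is a minimal sketch, assuming non-overlapping windows and zero-padding of the final window. `frame_audio` is a hypothetical helper, not part of this repository, and BirdNET's own tooling may use a different overlap or padding strategy.

```python
import numpy as np

SAMPLE_RATE = 48_000       # Hz, as in the quick start
WINDOW_SAMPLES = 144_000   # 3 s at 48 kHz


def frame_audio(waveform, window=WINDOW_SAMPLES):
    """Split a mono waveform into non-overlapping windows (hypothetical helper).

    The last window is zero-padded. Returns a float32 array of shape
    (n_windows, window).
    """
    waveform = np.asarray(waveform, dtype=np.float32).ravel()
    n_windows = max(1, -(-len(waveform) // window))  # ceiling division
    padded = np.zeros(n_windows * window, dtype=np.float32)
    padded[: len(waveform)] = waveform
    return padded.reshape(n_windows, window)


# 7 s of synthetic audio -> 3 windows (3 s, 3 s, 1 s plus zero padding)
frames = frame_audio(np.random.default_rng(0).standard_normal(7 * SAMPLE_RATE))
print(frames.shape)  # (3, 144000)
```

Each window can then be passed to the session one at a time, for example as `frames[i:i+1]`, which has the same (1, 144000) shape as the `audio` array in the quick start. Whether the exported graphs accept a batch dimension larger than 1 is not stated here, so the sketch does not assume it.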
Extraction procedure
The extraction and testing procedure can be reproduced using extract_backbone.py. The script will:
- Download model.onnx and birdnet.onnx from justinchuby/BirdNET-onnx.
- Download the BirdNET v2.4 TF SavedModel from Zenodo (BirdNET_v2.4_protobuf).
- Extract the backbone subgraph (everything up to and including the model/GLOBAL_AVG_POOL/Mean_reduced_0 node), renaming the output to embedding.
- Save model_backbone.onnx and birdnet_backbone.onnx.
- Run a numerical comparison between ONNX and TF SavedModel embeddings on a fixed random waveform (seed 42, 3 s at 48 kHz).
Expected output:
=== Downloading models ===
Downloaded model.onnx -> ...
Downloaded birdnet.onnx -> ...
Downloading BirdNET protobuf from Zenodo...
Extracted audio-model -> ...
=== Extracting backbones ===
Backbone saved -> model_backbone.onnx
inputs : ['INPUT']
outputs: ['embedding']
Backbone saved -> birdnet_backbone.onnx
inputs : ['input']
outputs: ['embedding']
=== Comparing embeddings against Zenodo TF SavedModel ===
PB embedding shape: (1, 1024)
model_backbone.onnx:
ONNX embedding shape: (1, 1024)
|diff| mean=1.230468e-06 max=9.298325e-06
Embeddings match PB reference with rtol=1e-03, atol=1e-03 PASSED
birdnet_backbone.onnx:
ONNX embedding shape: (1, 1024)
|diff| mean=6.440870e-05 max=5.004406e-04
Embeddings match PB reference with rtol=1e-03, atol=1e-03 PASSED
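The PASSED lines report an elementwise tolerance check. The sketch below shows one way such a check can be written, assuming it is equivalent to `numpy.allclose` with the tolerances shown above; the actual code in extract_backbone.py may differ. `compare_embeddings` is a hypothetical name, and the arrays are synthetic stand-ins rather than real model output.

```python
import numpy as np


def compare_embeddings(reference, candidate, rtol=1e-3, atol=1e-3):
    """Return (mean |diff|, max |diff|, passed) for two embedding arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    diff = np.abs(reference - candidate)
    # allclose tests |candidate - reference| <= atol + rtol * |reference| elementwise
    passed = bool(np.allclose(candidate, reference, rtol=rtol, atol=atol))
    return float(diff.mean()), float(diff.max()), passed


# Synthetic stand-ins for the TF and ONNX embeddings (not real model output)
rng = np.random.default_rng(42)
ref = rng.standard_normal((1, 1024)).astype(np.float32)
close = ref + 1e-5   # small offset, well inside the tolerance
far = ref + 1e-1     # large offset, outside the tolerance

mean_diff, max_diff, ok = compare_embeddings(ref, close)
print(f"|diff| mean={mean_diff:.6e} max={max_diff:.6e} passed={ok}")
print("far passed:", compare_embeddings(ref, far)[2])
```

With atol=1e-03, the maximum differences reported above (about 9.3e-06 and 5.0e-04) both fall inside the absolute tolerance alone, before the relative term is taken into account.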
How extraction works
The _extract function in extract_backbone.py performs a backward breadth-first search (BFS) from the
model/GLOBAL_AVG_POOL/Mean_reduced_0 node (the global average pool), collecting
every node that contributes to that node's output and discarding everything downstream (the
classification dense layer). It then rebuilds a minimal ONNX graph containing only the
retained nodes and their initializers, with the output tensor renamed to embedding.
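The backward traversal can be illustrated on a toy graph without any ONNX dependency. This is a sketch of the general technique, not the code of `_extract`: nodes are plain dicts standing in for ONNX nodes, and all node and tensor names below are made up for illustration.

```python
from collections import deque


def backward_reachable(nodes, target_output):
    """Return the nodes needed to compute target_output, in original order.

    nodes: list of dicts with "name", "inputs" and "outputs" keys
    (a simplified, hypothetical stand-in for ONNX node entries).
    """
    # Map every tensor name to the node that produces it
    producer = {out: node for node in nodes for out in node["outputs"]}

    keep = set()
    seen_tensors = {target_output}
    queue = deque([target_output])
    while queue:
        tensor = queue.popleft()
        node = producer.get(tensor)
        if node is None:
            continue  # graph input or initializer: nothing upstream
        keep.add(node["name"])
        for inp in node["inputs"]:
            if inp not in seen_tensors:
                seen_tensors.add(inp)
                queue.append(inp)
    # Preserve the original (topological) ordering of the retained nodes
    return [n for n in nodes if n["name"] in keep]


toy_graph = [
    {"name": "frontend", "inputs": ["INPUT"], "outputs": ["spec"]},
    {"name": "conv", "inputs": ["spec", "conv_w"], "outputs": ["features"]},
    {"name": "pool", "inputs": ["features"], "outputs": ["pooled"]},
    {"name": "dense", "inputs": ["pooled", "dense_w"], "outputs": ["logits"]},
]

backbone = backward_reachable(toy_graph, "pooled")
print([n["name"] for n in backbone])  # ['frontend', 'conv', 'pool']
```

The `dense` node is dropped because nothing upstream of `pooled` depends on it, which mirrors how the classification layer is discarded. Tensors with no producer (`INPUT`, `conv_w`) correspond to graph inputs and initializers, which a real extraction would carry over into the rebuilt graph.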
Credits
- Original ONNX conversion: justinchuby/BirdNET-onnx
- BirdNET Team