BirdNET v2.4 ONNX Backbone

Backbone-only ONNX exports of the BirdNET v2.4 bird sound classifier. The classification head has been removed, leaving only the frontend and the feature extractor.

Two variants are provided, matching the originals from justinchuby/BirdNET-onnx: model_backbone.onnx and birdnet_backbone.onnx. Both models output a single tensor named embedding with shape (1, 1024).

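Embeddings from both variants match the reference TF SavedModel published on Zenodo (BirdNET_v2.4_protobuf) within rtol=1e-03, atol=1e-03 on a fixed random waveform; the measured differences are listed under Extraction procedure.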

Quick start

import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download backbone
path = hf_hub_download(
    repo_id="biodiversica/BirdNET-onnx-backbone",
    filename="model_backbone.onnx",
)

sess = ort.InferenceSession(path)

# 3 s of audio at 48 kHz
audio = np.zeros((1, 144000), dtype=np.float32)
(embedding,) = sess.run(["embedding"], {"INPUT": audio})
print(embedding.shape)  # (1, 1024)

For birdnet_backbone.onnx the input key is "input" (lowercase):

path = hf_hub_download(
    repo_id="biodiversica/BirdNET-onnx-backbone",
    filename="birdnet_backbone.onnx",
)
sess = ort.InferenceSession(path)
(embedding,) = sess.run(["embedding"], {"input": audio})
print(embedding.shape)  # (1, 1024)
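
Recordings longer than 3 s have to be split into windows before they reach the backbone. The following is a minimal sketch, not part of this repository: frame_audio and embed_windows are hypothetical helpers, and the non-overlapping windows, the zero-padding of the last window, and the one-window-per-call loop are assumptions made here (the card only documents a (1, 144000) input and a (1, 1024) output). The input name is read from the session with get_inputs(), so the same helper works for both variants.

```python
import numpy as np

SAMPLE_RATE = 48_000
WINDOW = 3 * SAMPLE_RATE  # 144000 samples, as in the quick start


def frame_audio(waveform, window=WINDOW):
    """Split a 1-D waveform into non-overlapping windows.

    Assumption: the last window is zero-padded to full length.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    n_windows = max(1, -(-len(waveform) // window))  # ceiling division
    padded = np.zeros(n_windows * window, dtype=np.float32)
    padded[: len(waveform)] = waveform
    return padded.reshape(n_windows, window)


def embed_windows(sess, waveform):
    """Run the backbone once per window and stack the embeddings.

    `sess` is an onnxruntime.InferenceSession for either backbone file.
    """
    # "INPUT" for model_backbone.onnx, "input" for birdnet_backbone.onnx
    input_name = sess.get_inputs()[0].name
    rows = [
        sess.run(["embedding"], {input_name: frame[np.newaxis, :]})[0]
        for frame in frame_audio(waveform)
    ]
    return np.concatenate(rows, axis=0)  # shape (n_windows, 1024)
```

With a session created as in the quick start, embed_windows(sess, waveform) returns one 1024-dimensional row per 3 s window of a mono 48 kHz waveform.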

Extraction procedure

The extraction and testing procedure can be reproduced using extract_backbone.py. The script will:

1. Download model.onnx and birdnet.onnx from justinchuby/BirdNET-onnx.
2. Download the BirdNET v2.4 TF SavedModel from Zenodo (BirdNET_v2.4_protobuf).
3. Extract the backbone subgraph (everything up to and including the model/GLOBAL_AVG_POOL/Mean_reduced_0 node), renaming the output to embedding.
4. Save model_backbone.onnx and birdnet_backbone.onnx.
5. Run a numerical comparison between ONNX and TF SavedModel embeddings on a fixed random waveform (seed 42, 3 s at 48 kHz).

Expected output:

=== Downloading models ===
Downloaded model.onnx -> ...
Downloaded birdnet.onnx -> ...
Downloading BirdNET protobuf from Zenodo...
Extracted audio-model -> ...

=== Extracting backbones ===
Backbone saved -> model_backbone.onnx
  inputs : ['INPUT']
  outputs: ['embedding']
Backbone saved -> birdnet_backbone.onnx
  inputs : ['input']
  outputs: ['embedding']

=== Comparing embeddings against Zenodo TF SavedModel ===
PB embedding shape: (1, 1024)

model_backbone.onnx:
  ONNX embedding shape: (1, 1024)
  |diff| mean=1.230468e-06  max=9.298325e-06
  Embeddings match PB reference with rtol=1e-03, atol=1e-03  PASSED

birdnet_backbone.onnx:
  ONNX embedding shape: (1, 1024)
  |diff| mean=6.440870e-05  max=5.004406e-04
  Embeddings match PB reference with rtol=1e-03, atol=1e-03  PASSED
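
The |diff| statistics and the PASSED check above can be approximated with a few lines of NumPy. This is a sketch rather than the code in extract_backbone.py: compare_embeddings is a hypothetical helper, and drawing the waveform from a standard normal distribution with np.random.default_rng is an assumption, since the card only states the seed (42) and the length (3 s at 48 kHz). The embeddings in the demo are synthetic stand-ins, not model outputs.

```python
import numpy as np


def compare_embeddings(reference, candidate, rtol=1e-3, atol=1e-3):
    """Return mean/max absolute difference and an allclose verdict."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    diff = np.abs(candidate - reference)
    return {
        "mean": float(diff.mean()),
        "max": float(diff.max()),
        # Passes when |candidate - reference| <= atol + rtol * |reference|
        "passed": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
    }


# Fixed random test waveform: seed 42, 3 s at 48 kHz.
# Assumption: the distribution and generator are not specified in the card.
rng = np.random.default_rng(42)
audio = rng.standard_normal((1, 144000)).astype(np.float32)

# Synthetic stand-ins for the TF SavedModel and ONNX embeddings.
reference = rng.standard_normal((1, 1024))
candidate = reference + 1e-5
stats = compare_embeddings(reference, candidate)
print(f"|diff| mean={stats['mean']:.6e}  max={stats['max']:.6e}  passed={stats['passed']}")
```

In a real check, reference would come from the TF SavedModel and candidate from the ONNX session, both run on the same audio array.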

How extraction works

The _extract function in extract_backbone.py performs a backward breadth-first search (BFS) from the model/GLOBAL_AVG_POOL/Mean_reduced_0 output node (the global average pool). It collects every node that contributes to that output and discards everything downstream, namely the classification dense layer. The function then rebuilds a minimal ONNX graph containing only the retained nodes and their initializers, with the output tensor renamed to embedding.
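
The backward traversal can be illustrated on a toy graph. The sketch below does not use the onnx package and is not the code from extract_backbone.py: the Node tuple, the backward_reachable function, and the four-node graph are all hypothetical, chosen only to show how walking from the target tensor toward the inputs keeps the upstream nodes and drops the classifier.

```python
from collections import deque
from typing import NamedTuple


class Node(NamedTuple):
    """Toy stand-in for a graph node: a name plus input/output tensor names."""
    name: str
    inputs: tuple
    outputs: tuple


def backward_reachable(nodes, target_tensor):
    """Return the nodes that contribute to target_tensor, in original order."""
    # Map each tensor name to the node that produces it.
    producer = {out: node for node in nodes for out in node.outputs}
    keep = set()
    seen = {target_tensor}
    queue = deque([target_tensor])
    while queue:
        tensor = queue.popleft()
        node = producer.get(tensor)
        if node is None:
            continue  # graph input or initializer: nothing upstream
        keep.add(node.name)
        for inp in node.inputs:
            if inp not in seen:
                seen.add(inp)
                queue.append(inp)
    # Filtering the original list preserves its topological order.
    return [node for node in nodes if node.name in keep]


# Hypothetical four-node graph: frontend -> conv -> pool -> dense.
toy_graph = [
    Node("frontend", ("audio",), ("spec",)),
    Node("conv", ("spec", "conv_w"), ("features",)),
    Node("pool", ("features",), ("pooled",)),
    Node("dense", ("pooled", "dense_w"), ("logits",)),
]

backbone = backward_reachable(toy_graph, "pooled")
print([node.name for node in backbone])
```

Starting from pooled, the traversal never visits dense, because dense consumes the target tensor rather than producing anything upstream of it. For real models, the onnx package also ships onnx.utils.extract_model, which extracts a subgraph between named input and output tensors; whether it would be a drop-in alternative for these two files has not been checked here.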

Credits

Original ONNX conversion: justinchuby/BirdNET-onnx
BirdNET Team
