Intent Classifier (ruBERT-tiny2)

A fine-tuned cointegrated/rubert-tiny2 model that classifies Russian chatbot messages into three intents.

Use Case

RAG (Retrieval-Augmented Generation) chatbots need to classify user messages before processing:

  • rag - the user wants to search documents / the knowledge base
  • chat - greetings, small talk, questions about the bot itself
  • followup - a clarification of the previous answer
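A pipeline that consumes these labels typically branches on them. A minimal routing sketch — the handler names and their bodies below are invented for illustration; your application supplies the real implementations:

```python
# Hypothetical handlers -- placeholders for real RAG / chat logic.
def search_documents(text):
    return f"[rag] searching knowledge base for: {text}"

def small_talk(text):
    return f"[chat] replying conversationally to: {text}"

def clarify_previous(text):
    return f"[followup] re-answering with previous context: {text}"

HANDLERS = {"rag": search_documents, "chat": small_talk, "followup": clarify_previous}

def route(label, text):
    """Dispatch a classified message to the handler for its intent label."""
    return HANDLERS[label](text)

print(route("rag", "какие условия возврата?"))
```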

This model replaces an LLM API call per message (300-2000 ms, ~$0.001/request) with local inference (3.7 ms, $0).
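Taking the figures above at face value, the speedup and savings are easy to quantify. A back-of-the-envelope check (the 100k messages/month volume is an arbitrary example, not a claim about any deployment):

```python
llm_latency_ms = (300, 2000)   # LLM API round-trip range quoted above
local_latency_ms = 3.7         # local ONNX inference, quoted above
cost_per_llm_req = 0.001       # ~$ per LLM API request, quoted above

speedup_low = llm_latency_ms[0] / local_latency_ms    # ~81x
speedup_high = llm_latency_ms[1] / local_latency_ms   # ~540x
monthly_savings = 100_000 * cost_per_llm_req          # at an example 100k req/month

print(f"{speedup_low:.0f}x-{speedup_high:.0f}x faster, ~${monthly_savings:.0f}/month saved at 100k req/mo")
```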

Results

Class     Precision  Recall  F1
rag       0.94       0.98    0.96
chat      0.87       0.90    0.88
followup  0.86       0.73    0.79

Overall: 0.90

Quick Start (ONNX)

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained("Gleckus/intent-classifier-rubert-tiny2")
LABELS = ["rag", "chat", "followup"]

def classify(text):
    inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True, max_length=128)
    logits = session.run(None, {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]})[0][0]
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = exp / exp.sum()
    return LABELS[int(np.argmax(probs))], float(probs.max())

label, conf = classify("какие условия возврата?")
print(f"{label} ({conf:.1%})")  # rag (95.2%)

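The post-processing inside classify is an ordinary softmax over the three logits. Isolated as a pure-NumPy helper — with the standard max-subtraction trick to avoid overflow; nothing here is specific to this model, and the example logits are made up:

```python
import numpy as np

def softmax(logits):
    """Convert raw logits to probabilities; subtracting the max avoids overflow in exp()."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Made-up logits: the first class dominates, so its probability should be near 1.
probs = softmax(np.array([4.0, 1.0, -2.0]))
print(probs.round(3), probs.sum())
```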
Training

  • Base model: cointegrated/rubert-tiny2 (29M params)
  • Dataset: 2,877 synthetic examples (template-based + augmented)
  • Training: 5 epochs, batch size 32, learning rate 2e-5, on a Google Colab T4 GPU
  • Export: ONNX format, ~111MB
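The card says the training data is template-based and augmented. A minimal sketch of what template expansion can look like — the templates and slot fillers below are invented for illustration; the actual generation scripts are not published here:

```python
# Hypothetical templates per intent; {topic} is a fillable slot.
TEMPLATES = {
    "rag": ["какие условия {topic}?", "где найти информацию о {topic}?"],
    "chat": ["привет, как дела?", "ты кто такой?"],
    "followup": ["а можно подробнее про {topic}?", "что ты имел в виду?"],
}
TOPICS = ["возврата", "доставки", "оплаты"]

def expand(templates, topics):
    """Yield (text, label) pairs, filling the {topic} slot where present."""
    for label, patterns in templates.items():
        for pattern in patterns:
            if "{topic}" in pattern:
                for topic in topics:
                    yield pattern.format(topic=topic), label
            else:
                yield pattern, label

dataset = list(expand(TEMPLATES, TOPICS))
print(len(dataset), dataset[0])
```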
