# Intent Classifier (ruBERT-tiny2)

Fine-tuned `cointegrated/rubert-tiny2` for classifying Russian chatbot messages into 3 intents.
## Use Case

RAG (Retrieval-Augmented Generation) chatbots need to classify user messages before processing:

- `rag`: the user wants to search documents / the knowledge base
- `chat`: greetings, small talk, questions about the bot itself
- `followup`: a clarification of the previous answer

This model replaces LLM API calls (300-2000 ms, ~$0.001/request) with local inference (~3.7 ms, $0).
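To make the routing decision concrete, here is a minimal sketch of how a chatbot might dispatch on the predicted intent. The `route` function, the confidence threshold, and the handler names are illustrative assumptions, not part of this model:

```python
# Hypothetical routing layer: dispatch a message to a pipeline stage based on
# the (label, confidence) pair returned by the intent classifier.
# The threshold value and stage names below are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.5  # below this, fall back to the safest path

def route(label: str, confidence: float) -> str:
    """Map a predicted intent to a pipeline stage name."""
    if confidence < CONFIDENCE_THRESHOLD:
        return "rag_search"          # low confidence: default to retrieval
    return {
        "rag": "rag_search",         # query the knowledge base
        "chat": "small_talk",        # answer directly, no retrieval
        "followup": "rag_followup",  # reuse the previous retrieval context
    }[label]

print(route("rag", 0.95))       # rag_search
print(route("chat", 0.90))      # small_talk
print(route("followup", 0.40))  # rag_search (low-confidence fallback)
```

Falling back to retrieval on low confidence is one reasonable default for a RAG bot; a wrong `rag` route costs a search, while a wrong `chat` route loses the answer.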
## Results
| Class | Precision | Recall | F1 |
|---|---|---|---|
| rag | 0.94 | 0.98 | 0.96 |
| chat | 0.87 | 0.90 | 0.88 |
| followup | 0.86 | 0.73 | 0.79 |
| **Overall (weighted)** | | | 0.90 |
## Quick Start (ONNX)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained("Gleckus/intent-classifier-rubert-tiny2")
LABELS = ["rag", "chat", "followup"]

def classify(text):
    inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True, max_length=128)
    outputs = session.run(None, {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]})
    logits = outputs[0][0]
    # Numerically stable softmax: shift by the max logit before exponentiating.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return LABELS[np.argmax(probs)], float(probs.max())

label, conf = classify("какие условия возврата?")  # "what are the return conditions?"
print(f"{label} ({conf:.1%})")  # rag (95.2%)
```
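The probability computation above is a plain softmax over the three logits. As a standalone sketch (the logit values here are made up for illustration, not real model output):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Stable softmax: shift by the max logit so np.exp cannot overflow."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Made-up logits for the three intents ["rag", "chat", "followup"].
logits = np.array([4.2, 1.1, 0.3])
probs = softmax(logits)
print(probs.round(3))  # highest probability goes to index 0 ("rag")
print(probs.sum())     # probabilities sum to 1
```

Subtracting the maximum logit changes nothing mathematically (the shift cancels in the ratio) but prevents `np.exp` from overflowing on large logits.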
## Training

- Base model: `cointegrated/rubert-tiny2` (29M params)
- Dataset: 2,877 synthetic examples (template-based + augmented)
- Training: 5 epochs, batch size 32, lr 2e-5, Google Colab T4 GPU
- Export: ONNX format, ~111 MB
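As a sanity check on the schedule above, the epoch and batch numbers imply roughly this many optimizer steps (inferred arithmetic assuming a loader that keeps the final partial batch, not logged output):

```python
import math

num_examples = 2877  # dataset size from the training notes
batch_size = 32
epochs = 5

steps_per_epoch = math.ceil(num_examples / batch_size)  # final partial batch kept
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 90 450
```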
## Links

- GitHub Repository: full code, dataset, documentation
## Evaluation results

- F1 (weighted): 0.900 (self-reported)
- Accuracy: 0.900 (self-reported)