# Intent Classifier (ruBERT-tiny2)

Fine-tuned `cointegrated/rubert-tiny2` for classifying Russian chatbot messages into 3 intents.
## Use Case

RAG (Retrieval-Augmented Generation) chatbots need to classify user messages before processing:

- `rag`: the user wants to search documents / the knowledge base
- `chat`: greetings, small talk, questions about the bot itself
- `followup`: a clarification of the previous answer

This model replaces LLM API calls (300-2000 ms, ~$0.001/request) with local inference (~3.7 ms, $0).
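To make the routing decision concrete, here is a minimal sketch of how a chatbot might dispatch on the predicted intent. The `route` function, the confidence threshold, and the handler names are illustrative assumptions, not part of this model:

```python
# Hypothetical routing layer: dispatch a message to a pipeline stage based on
# the (label, confidence) pair returned by the intent classifier.
# The threshold value and stage names below are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.5  # below this, fall back to the safest path

def route(label: str, confidence: float) -> str:
    """Map a predicted intent to a pipeline stage name."""
    if confidence < CONFIDENCE_THRESHOLD:
        return "rag_search"          # low confidence: default to retrieval
    return {
        "rag": "rag_search",         # query the knowledge base
        "chat": "small_talk",        # answer directly, no retrieval
        "followup": "rag_followup",  # reuse the previous retrieval context
    }[label]

print(route("rag", 0.95))       # rag_search
print(route("chat", 0.90))      # small_talk
print(route("followup", 0.40))  # rag_search (low-confidence fallback)
```

Falling back to retrieval on low confidence is one reasonable default for a RAG bot; a wrong `rag` route costs a search, while a wrong `chat` route loses the answer.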
## Results
| Class | Precision | Recall | F1 |
|---|---|---|---|
| rag | 0.94 | 0.98 | 0.96 |
| chat | 0.87 | 0.90 | 0.88 |
| followup | 0.86 | 0.73 | 0.79 |
| **Overall (weighted)** | | | 0.90 |
## Quick Start (ONNX)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained("Gleckus/intent-classifier-rubert-tiny2")
LABELS = ["rag", "chat", "followup"]

def classify(text):
    inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True, max_length=128)
    outputs = session.run(None, {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]})
    logits = outputs[0][0]
    # Numerically stable softmax: shift by the max logit before exponentiating.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return LABELS[np.argmax(probs)], float(probs.max())

label, conf = classify("какие условия возврата?")  # "what are the return conditions?"
print(f"{label} ({conf:.1%})")  # rag (95.2%)
```
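The probability computation above is a plain softmax over the three logits. As a standalone sketch (the logit values here are made up for illustration, not real model output):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Stable softmax: shift by the max logit so np.exp cannot overflow."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Made-up logits for the three intents ["rag", "chat", "followup"].
logits = np.array([4.2, 1.1, 0.3])
probs = softmax(logits)
print(probs.round(3))  # highest probability goes to index 0 ("rag")
print(probs.sum())     # probabilities sum to 1
```

Subtracting the maximum logit changes nothing mathematically (the shift cancels in the ratio) but prevents `np.exp` from overflowing on large logits.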
## Training

- Base model: `cointegrated/rubert-tiny2` (29M params)
- Dataset: 2,877 synthetic examples (template-based + augmented)
- Training: 5 epochs, batch size 32, lr 2e-5, Google Colab T4 GPU
- Export: ONNX format, ~111 MB
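As a sanity check on the schedule above, the epoch and batch numbers imply roughly this many optimizer steps (inferred arithmetic assuming a loader that keeps the final partial batch, not logged output):

```python
import math

num_examples = 2877  # dataset size from the training notes
batch_size = 32
epochs = 5

steps_per_epoch = math.ceil(num_examples / batch_size)  # final partial batch kept
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 90 450
```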
## Links

- GitHub Repository: full code, dataset, documentation
## Evaluation results

- F1 (weighted): 0.900 (self-reported)
- Accuracy: 0.900 (self-reported)