---
license: mit
datasets:
- dleemiller/wiki-sim
- sentence-transformers/stsb
language:
- en
metrics:
- spearmanr
- pearsonr
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- cross-encoder
- modernbert
- sts
- stsb
- stsbenchmark-sts
model-index:
- name: CrossEncoder based on answerdotai/ModernBERT-base
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.9162245947821821
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.9121555789491528
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.9260833551026787
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.9236030687487745
      name: Spearman Cosine
---
# ModernBERT Cross-Encoder: Semantic Similarity (STS)
Cross-encoders are high-performing encoder models that process two texts jointly and output a similarity score between 0 and 1.
I've found the `cross-encoder/stsb-roberta-large` model to be very useful for building evaluators for LLM outputs.
They're simple to use, fast, and very accurate.
Like many people, I was excited about the architectural and training improvements in ModernBERT (`answerdotai/ModernBERT-base`).
So I've applied it to an STS-B cross-encoder, which is a very handy model. I've also added a pretraining stage on
`dleemiller/wiki-sim`, a much larger semi-synthetic dataset that targets this kind of objective.
Its inference efficiency, extended context length, and simplicity make it a really nice foundation for an evaluator model.
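As a sketch of the evaluator use case, the helper below flags LLM outputs whose similarity to a reference answer falls below a cutoff. The function name `flag_low_similarity`, the `0.5` threshold, and the injected `score_fn` are hypothetical choices for illustration, not part of the model or of `sentence-transformers`. The scorer is passed in so the sketch runs offline; in real use it would be the cross-encoder's `predict` method.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def flag_low_similarity(
    references: Sequence[str],
    outputs: Sequence[str],
    score_fn: Callable[[List[Pair]], Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff; choose it from your own labeled examples
) -> List[Tuple[int, float]]:
    """Return (index, score) for every output scoring below `threshold`.

    `score_fn` receives a list of (reference, output) pairs and must return one
    similarity score in [0, 1] per pair, e.g. CrossEncoder.predict.
    """
    if len(references) != len(outputs):
        raise ValueError("references and outputs must have the same length")
    pairs = list(zip(references, outputs))
    scores = score_fn(pairs)
    return [(i, float(s)) for i, s in enumerate(scores) if s < threshold]


if __name__ == "__main__":
    # Offline stand-in scorer (not the real model): 1.0 for identical texts, else 0.0.
    def exact_match(pairs: List[Pair]) -> List[float]:
        return [1.0 if a == b else 0.0 for a, b in pairs]

    refs = ["Mars is the Red Planet.", "Water boils at 100 C."]
    outs = ["Mars is the Red Planet.", "The sky is blue."]
    print(flag_low_similarity(refs, outs, exact_match))  # [(1, 0.0)]
```

With the real model, pass `model.predict` as `score_fn`, where `model = CrossEncoder("dleemiller/ModernCE-base-sts")`. The threshold here is only a placeholder and should be set from your own labeled pairs.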
---
## Features
- **High performing:** Achieves **Pearson: 0.9162** and **Spearman: 0.9122** on the STS-Benchmark test set.
- **Efficient architecture:** Based on the ModernBERT-base design (149M parameters), offering faster inference than larger cross-encoders such as `cross-encoder/stsb-roberta-large` (355M parameters).
- **Extended context length:** Processes sequences of up to 8192 tokens, which is useful for evaluating long LLM outputs.
- **Diversified training:** Pretrained on `dleemiller/wiki-sim` and fine-tuned on `sentence-transformers/stsb`.
---
## Performance
| Model | STS-B Test Pearson | STS-B Test Spearman | Context Length | Parameters | Speed |
|--------------------------------|--------------------|---------------------|----------------|------------|---------|
| `dleemiller/ModernCE-large-sts` | **0.9256** | **0.9215** | **8192** | 395M | **Medium** |
| `dleemiller/CrossGemma-sts-300m` | 0.9175 | 0.9135 | 2048 | 303M | **Medium** |
| `dleemiller/ModernCE-base-sts` | 0.9162 | 0.9122 | **8192** | 149M | **Fast** |
| `cross-encoder/stsb-roberta-large` | 0.9147 | - | 512 | 355M | Slow |
| `dleemiller/EttinX-sts-m` | 0.9143 | 0.9102 | **8192** | 149M | **Fast** |
| `dleemiller/NeoCE-sts` | 0.9124 | 0.9087 | 4096 | 250M | **Fast** |
| `dleemiller/EttinX-sts-s` | 0.9004 | 0.8926 | **8192** | 68M | **Very Fast** |
| `cross-encoder/stsb-distilroberta-base` | 0.8792 | - | 512 | 82M | Fast |
| `dleemiller/EttinX-sts-xs` | 0.8763 | 0.8689 | **8192** | 32M | **Very Fast** |
| `dleemiller/EttinX-sts-xxs` | 0.8414 | 0.8311 | **8192** | 17M | **Very Fast** |
| `dleemiller/sts-bert-hash-nano` | 0.7904 | 0.7743 | **8192** | 0.97M | **Very Fast** |
| `dleemiller/sts-bert-hash-pico` | 0.7595 | 0.7474 | **8192** | 0.45M | **Very Fast** |
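To make the size/accuracy/context trade-offs in the Performance table concrete, here is a small, hypothetical helper that picks the smallest listed model meeting a minimum STS-B test Pearson correlation and a minimum context length. The rows are copied from the table (parameters in millions). The function name and the selection rule (fewest parameters, ties broken by higher Pearson) are assumptions for illustration, and the table's qualitative Speed column is not modeled.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    pearson: float   # STS-B test Pearson correlation
    context: int     # maximum context length in tokens
    params_m: float  # parameter count, in millions


# Rows copied from the Performance table.
TABLE = [
    Row("dleemiller/ModernCE-large-sts", 0.9256, 8192, 395),
    Row("dleemiller/CrossGemma-sts-300m", 0.9175, 2048, 303),
    Row("dleemiller/ModernCE-base-sts", 0.9162, 8192, 149),
    Row("cross-encoder/stsb-roberta-large", 0.9147, 512, 355),
    Row("dleemiller/EttinX-sts-m", 0.9143, 8192, 149),
    Row("dleemiller/NeoCE-sts", 0.9124, 4096, 250),
    Row("dleemiller/EttinX-sts-s", 0.9004, 8192, 68),
    Row("cross-encoder/stsb-distilroberta-base", 0.8792, 512, 82),
    Row("dleemiller/EttinX-sts-xs", 0.8763, 8192, 32),
    Row("dleemiller/EttinX-sts-xxs", 0.8414, 8192, 17),
    Row("dleemiller/sts-bert-hash-nano", 0.7904, 8192, 0.97),
    Row("dleemiller/sts-bert-hash-pico", 0.7595, 8192, 0.45),
]


def smallest_model(min_pearson: float, min_context: int) -> Optional[str]:
    """Return the smallest model meeting both thresholds, or None if none qualify."""
    candidates = [r for r in TABLE if r.pearson >= min_pearson and r.context >= min_context]
    if not candidates:
        return None
    # Fewest parameters first; among equal sizes, prefer the higher Pearson score.
    return min(candidates, key=lambda r: (r.params_m, -r.pearson)).model


print(smallest_model(0.91, 8192))  # dleemiller/ModernCE-base-sts
print(smallest_model(0.90, 8192))  # dleemiller/EttinX-sts-s
```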
---
## Usage
To use ModernCE for semantic similarity tasks, load the model with the Hugging Face `sentence-transformers` library:
```python
from sentence_transformers import CrossEncoder

# Load the ModernCE model
model = CrossEncoder("dleemiller/ModernCE-base-sts")

# Predict similarity scores for sentence pairs
sentence_pairs = [
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
]
scores = model.predict(sentence_pairs)  # e.g. array([0.9184, 0.0123], dtype=float32)
print(scores)
```
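For retrieval-style use, such as scoring one query against several candidate passages, the pair scores can simply be sorted. The sketch below uses a hypothetical helper, `rank_passages`, and keeps the scorer injectable so it runs without downloading the model. Recent versions of `sentence-transformers` also provide a `CrossEncoder.rank` method for this purpose.

```python
from typing import Callable, List, Sequence, Tuple


def rank_passages(
    query: str,
    passages: Sequence[str],
    score_fn: Callable[[List[Tuple[str, str]]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return (passage, score), best first."""
    scores = score_fn([(query, p) for p in passages])
    scored = [(p, float(s)) for p, s in zip(passages, scores)]
    # sorted() is stable, so passages with equal scores keep their input order.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With the model loaded as in the usage example, `rank_passages(query, passages, model.predict)` returns the passages ordered from most to least similar to the query.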
### Output
The model returns one similarity score per sentence pair, in the range `[0, 1]`; higher scores indicate stronger semantic similarity.
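For intuition about the `[0, 1]` range: a single-output cross-encoder produces one raw logit per pair, and `sentence-transformers` typically applies a sigmoid to it by default for single-label models. The sketch below shows that mapping and, as an explicit assumption, a linear rescale to the 0-5 scale of the original STS-B annotations (the `sentence-transformers/stsb` dataset stores those scores divided by 5). Both helper names are hypothetical.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to a score in [0, 1] (numerically stable for large |logit|)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def to_stsb_scale(score: float) -> float:
    """Rescale a [0, 1] score to the 0-5 STS-B range (assumes a linear mapping)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    return 5.0 * score


print(sigmoid(0.0))  # 0.5
print(round(to_stsb_scale(0.9184), 3))  # 4.592
```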
---
## Training Details
### Pretraining
The model was pretrained on the `pair-score-sampled` subset of the [`dleemiller/wiki-sim`](https://huggingface.co/datasets/dleemiller/wiki-sim) dataset. This dataset provides diverse sentence pairs with semantic similarity scores, helping the model build a robust understanding of relationships between sentences.
- **Classifier Dropout:** A relatively high classifier dropout of 0.3 was used to reduce over-reliance on the teacher scores.
- **Objective:** Match the STS-B similarity scores produced by the teacher model, `cross-encoder/stsb-roberta-large`.
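The card does not state the exact pretraining loss. As a hedged illustration of distilling teacher scores, the sketch below computes binary cross-entropy between the student's logit and the teacher's `[0, 1]` score used as a soft label. This is the default loss in `sentence-transformers` cross-encoder training for single-label models, but whether it was used for this model is an assumption. The function name is hypothetical.

```python
import math
from typing import Sequence


def soft_bce_with_logits(student_logits: Sequence[float], teacher_scores: Sequence[float]) -> float:
    """Mean binary cross-entropy of student logits against teacher scores in [0, 1].

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    if len(student_logits) != len(teacher_scores) or len(student_logits) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for x, y in zip(student_logits, teacher_scores):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(student_logits)


# The loss is smallest when sigmoid(student_logit) equals the teacher score:
# sigmoid(log 4) = 0.8, so the first value printed is lower than the second.
print(soft_bce_with_logits([math.log(4)], [0.8]))
print(soft_bce_with_logits([0.0], [0.8]))
```

Soft labels let the student learn graded similarity from the teacher rather than only a binary match/no-match signal.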
### Fine-Tuning
Fine-tuning was performed on the [`sentence-transformers/stsb`](https://huggingface.co/datasets/sentence-transformers/stsb) dataset.
### Evaluation Results
After fine-tuning, the model achieved the following performance on the STS-B test set:
- **Pearson Correlation:** 0.9162
- **Spearman Correlation:** 0.9122
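Both reported metrics compare the model's predicted scores with the gold STS-B scores. Pearson measures linear correlation, and Spearman is the Pearson correlation computed on ranks, so it only rewards getting the ordering right. Below is a dependency-free sketch (tied values get average ranks, the usual convention). In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same quantities.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    gold = [0.0, 0.2, 0.5, 0.8, 1.0]       # hypothetical gold scores
    pred = [0.05, 0.30, 0.45, 0.70, 0.95]  # hypothetical model predictions
    print(f"Pearson:  {pearson(pred, gold):.4f}")
    print(f"Spearman: {spearman(pred, gold):.4f}")
```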
---
## Model Card
- **Architecture:** ModernBERT-base
- **Tokenizer:** The ModernBERT tokenizer, inherited from `answerdotai/ModernBERT-base` and designed for long-context inputs.
- **Pretraining Data:** `dleemiller/wiki-sim` (`pair-score-sampled` subset)
- **Fine-Tuning Data:** `sentence-transformers/stsb`
---
## Thank You
Thanks to the AnswerAI team for providing the ModernBERT models, and to the Sentence Transformers team for their leadership in transformer encoder models.
---
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{moderncestsb2025,
  author = {Miller, D. Lee},
  title = {ModernCE STS: An STS cross encoder model},
  year = {2025},
  publisher = {Hugging Face Hub},
  url = {https://huggingface.co/dleemiller/ModernCE-base-sts},
}
```
---
## License
This model is licensed under the [MIT License](LICENSE).