Fix missing generate method by inheriting from GenerationMixin

#2
by Sematre - opened
No description provided.

This PR resolves the AttributeError: 'Florence2LanguageForConditionalGeneration' object has no attribute 'generate' error by adding GenerationMixin to the class inheritance.

Changes:

  • Added an import of GenerationMixin from transformers.generation.utils
  • Updated Florence2LanguageForConditionalGeneration to inherit from both Florence2LanguagePreTrainedModel and GenerationMixin

This change accounts for transformers v4.50+, where PreTrainedModel no longer inherits from GenerationMixin by default (a removal that earlier versions announced with a deprecation warning).

Source: cherry-picked from Microsoft's official Florence-2-base repository (microsoft/Florence-2-base@2a2d45e).
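
For context, a minimal sketch of what the cherry-picked change amounts to (class bodies are omitted; only the added import and the extra base class are the point here):

```python
# Sketch only: shows the inheritance change this PR applies, not the full modeling file.
from transformers.generation.utils import GenerationMixin  # import added by this PR
from transformers.modeling_utils import PreTrainedModel


class Florence2LanguagePreTrainedModel(PreTrainedModel):
    """Base class as defined in the Florence-2 modeling code (body omitted)."""


# Before: the class inherited only from Florence2LanguagePreTrainedModel, so on
# transformers v4.50+ it no longer picked up .generate() from PreTrainedModel.
# After: inheriting from GenerationMixin as well restores .generate().
class Florence2LanguageForConditionalGeneration(Florence2LanguagePreTrainedModel, GenerationMixin):
    """Conditional-generation head (implementation omitted)."""
```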


Hi, were you able to get this running properly? I've been unable to get OCR working on macOS (Apple Silicon). Thank you for any advice

I was able to get it running in a Colab notebook. Feel free to take a look: https://colab.research.google.com/drive/1qJWddZQwktmFgTZHi9SrVzCT0EWCgEan?usp=sharing
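
Not taken from the notebook itself, but as a rough orientation, loading these custom models usually boils down to something like the sketch below; the model id, device handling, and everything else here are assumptions, so adapt them to your setup.

```python
import torch
from transformers import AutoModel

# Assumption: "ragavsachdeva/magiv3" stands in for whichever Magi v3 checkpoint
# this discussion refers to; trust_remote_code is needed because the modeling
# code (including the class patched in this PR) lives inside the model repo.
model = AutoModel.from_pretrained("ragavsachdeva/magiv3", trust_remote_code=True)

# Pick an accelerator if one is available (mps covers Apple Silicon).
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
model = model.to(device).eval()
```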

Hello. Is there any way for the model to output consistent character_cluster_labels across multiple pages? (Right now, each character cluster label is only tied to a single page, so it is impossible to track one character through several pages with predict_detections_and_associations.)
Thank you for any advice

Unfortunately not. Magi v3 uses Florence-2 as its base (an encoder-decoder transformer). The character clustering is done via the character_character_affinity_matrices, which come from the association heads. Internally, the get_character_character_affinity_matrices function iterates over decoder_hidden_states, which has one entry per image in the batch. For each image, it extracts the character embeddings from that image's decoder output and computes affinities only between characters within that same image.
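
Schematically, the per-image loop looks roughly like the sketch below (tensor shapes, argument names, and the affinity-head interface are illustrative, not the repo's exact signatures), which is why cluster labels never cross page boundaries:

```python
import torch

def character_affinities_per_image(decoder_hidden_states, character_indices, affinity_head):
    """Illustrative sketch of the per-image affinity computation described above.

    decoder_hidden_states: one (num_tokens, dim) tensor per image in the batch.
    character_indices: for each image, the token indices of its detected characters.
    affinity_head: association head mapping pairs of embeddings to a score.
    """
    affinity_matrices = []
    for hidden, char_idx in zip(decoder_hidden_states, character_indices):
        chars = hidden[char_idx]          # (num_chars, dim): characters on this page only
        n = chars.shape[0]
        # Score all within-page pairs; characters from other pages never enter
        # this computation, so identities stay local to each page.
        left = chars.unsqueeze(1).expand(n, n, -1)
        right = chars.unsqueeze(0).expand(n, n, -1)
        affinities = affinity_head(torch.cat([left, right], dim=-1)).squeeze(-1)
        affinity_matrices.append(affinities)  # (num_chars, num_chars)
    return affinity_matrices
```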

This is different from Magi v2's approach where:

  • You could call do_chapter_wide_prediction with multiple pages
  • It would use the crop embedder (ViT-MAE embeddings) to get embeddings for all characters across all pages
  • Then apply chapter-wide clustering using those embeddings
  • It would leverage constraints such as must-link within page clusters and cannot-link across non-matching page clusters (a rough sketch of this follows below)
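
To make that concrete, here is a made-up, illustrative sketch of constraint-based chapter-wide clustering (this is not v2's actual code; all names and the similarity threshold are placeholders): per-page clusters are treated as must-link groups, groups that share a page are never merged, and the remaining merges are driven by crop-embedding similarity.

```python
import numpy as np

def chapter_wide_clusters(crop_embeddings, page_ids, page_cluster_ids, sim_threshold=0.8):
    """Illustrative constrained clustering over character crops.

    crop_embeddings: (num_detections, dim) crop-embedder features (e.g. ViT-MAE).
    page_ids, page_cluster_ids: per-detection page index and within-page cluster label.
    Returns one chapter-wide cluster id per detection.
    """
    # Must-link: every (page, per-page cluster) group starts as a single unit.
    groups = {}
    for i, key in enumerate(zip(page_ids, page_cluster_ids)):
        groups.setdefault(key, []).append(i)
    keys = list(groups)
    centroids = np.stack([crop_embeddings[groups[k]].mean(axis=0) for k in keys])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

    labels = list(range(len(keys)))           # one chapter-wide label per group initially
    sims = centroids @ centroids.T
    order = np.column_stack(np.unravel_index(np.argsort(-sims, axis=None), sims.shape))
    for a, b in order:                        # greedily merge the most similar groups first
        if a >= b or sims[a, b] < sim_threshold or labels[a] == labels[b]:
            continue
        # Cannot-link: never merge two chapter clusters that already share a page.
        pages_a = {keys[i][0] for i, lab in enumerate(labels) if lab == labels[a]}
        pages_b = {keys[i][0] for i, lab in enumerate(labels) if lab == labels[b]}
        if pages_a & pages_b:
            continue
        old, new = labels[b], labels[a]
        labels = [new if lab == old else lab for lab in labels]

    detection_labels = np.empty(len(page_ids), dtype=int)
    for key, lab in zip(keys, labels):
        detection_labels[groups[key]] = lab
    return detection_labels
```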

One solution would be to use v3 for per-page clustering, then use v2's crop embedder to match clusters to the character bank (which implicitly gives cross-page identity through, e.g., shared reference images). If every detection of a character is a node in a graph, every edge weight is the crop-embedder similarity, and the per-page clustering already links the nodes within each page, then building a minimum spanning tree over that graph would connect the same character across all pages.
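
A small, hypothetical sketch of that graph idea (scipy's minimum_spanning_tree is just one way to realise it, and the 0.4 distance cut-off is an arbitrary example; in practice you would tune or learn how to split the tree):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree

def link_characters_across_pages(crop_embeddings, page_ids, page_cluster_ids, cut=0.4):
    """Nodes are detections, edge weights are crop-embedder distances, per-page
    clusters act as (near-)zero-cost edges, and the MST is cut at weak links so
    each remaining component is one character across all pages."""
    emb = crop_embeddings / np.linalg.norm(crop_embeddings, axis=1, keepdims=True)
    distance = 1.0 - emb @ emb.T                     # cosine distance: lower = more similar
    n = len(page_ids)

    # Per-page clustering is hard evidence: give those edges a tiny positive cost
    # (exact zeros would be dropped by the sparse graph representation).
    for i in range(n):
        for j in range(n):
            if i != j and page_ids[i] == page_ids[j] and page_cluster_ids[i] == page_cluster_ids[j]:
                distance[i, j] = 1e-6

    mst = minimum_spanning_tree(csr_matrix(distance)).toarray()
    mst[mst > cut] = 0.0                             # drop weak tree edges between different characters
    _, labels = connected_components(csr_matrix(mst), directed=False)
    return labels                                    # one label per detection, consistent across pages
```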

By the way, I improved the Magi v3 pipeline quite a bit compared to the previous notebook; the earlier version contained some bugs.

https://colab.research.google.com/drive/1zoGdtbcPtwCS5QRMcZ7U9PhCq37pDr01?usp=sharing

