Fix missing generate method by inheriting from GenerationMixin
This PR resolves the `AttributeError: 'Florence2LanguageForConditionalGeneration' object has no attribute 'generate'` error by adding `GenerationMixin` to the class inheritance.
Changes:
- Added the `from transformers.generation.utils import GenerationMixin` import
- Updated `Florence2LanguageForConditionalGeneration` to inherit from both `Florence2LanguagePreTrainedModel` and `GenerationMixin`
This change addresses the deprecation warning in transformers v4.50+ where PreTrainedModel no longer inherits from GenerationMixin by default.
Source: Cherry-picked from Microsoft's official Florence-2-base repository: microsoft/Florence-2-base@2a2d45e
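The change works because Python resolves missing methods through the class's bases (the MRO), so adding the mixin makes `generate` visible again. A minimal self-contained analogue with stand-in classes (the real change edits the Florence-2 modeling file, which isn't reproduced here):

```python
class PreTrainedModelStub:
    """Stand-in for Florence2LanguagePreTrainedModel: no generate() of its own."""
    pass

class GenerationMixinStub:
    """Stand-in for transformers' GenerationMixin, which provides generate()."""
    def generate(self, prompt):
        return f"generated from: {prompt}"

# Before the fix: only the pretrained base is inherited, so .generate is missing.
class ModelBefore(PreTrainedModelStub):
    pass

# After the fix: the mixin is added to the bases, so .generate resolves via the MRO.
class ModelAfter(PreTrainedModelStub, GenerationMixinStub):
    pass

assert not hasattr(ModelBefore(), "generate")
assert ModelAfter().generate("<OCR>") == "generated from: <OCR>"
```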
Hi, were you able to get this running properly? I've been unable to get OCR working on macOS (Apple Silicon). Thank you for any advice
I was able to get it running in a Colab notebook. Feel free to take a look: https://colab.research.google.com/drive/1qJWddZQwktmFgTZHi9SrVzCT0EWCgEan?usp=sharing
Hello. Is there any way for the model to output consistent character_cluster_labels across multiple pages? (Right now each character cluster label is tied to a single page, so it is impossible to track one character through several pages with `predict_detections_and_associations`.)
Thank you for any advice
Unfortunately not. Magi v3 uses Florence-2 as its base (an encoder-decoder transformer). The character clustering is done via the `character_character_affinity_matrices`, which come from the association heads. Internally it iterates over `decoder_hidden_states` in the `get_character_character_affinity_matrices` function, which has one entry per image in the batch. For each image, it extracts the character embeddings from that image's decoder output and computes affinities between characters within that same image.
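Schematically, the per-image loop looks like the sketch below. All names and shapes here are hypothetical, and cosine similarity stands in for the learned association heads; the point is only that each page yields its own affinity matrix, so identities never link across pages:

```python
import numpy as np

def per_image_affinities(decoder_hidden_states):
    """One affinity matrix per image: characters are only compared
    with other characters detected on the same page."""
    affinity_matrices = []
    for char_embeddings in decoder_hidden_states:  # one (n_chars, dim) array per image
        # L2-normalize so the dot product below is cosine similarity
        norms = np.linalg.norm(char_embeddings, axis=1, keepdims=True)
        unit = char_embeddings / np.clip(norms, 1e-8, None)
        affinity_matrices.append(unit @ unit.T)  # (n_chars, n_chars), within-page only
    return affinity_matrices

# Two pages with 3 and 2 detected characters give two separate matrices.
pages = [np.random.randn(3, 16), np.random.randn(2, 16)]
mats = per_image_affinities(pages)
```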
This is different from Magi v2's approach where:
- You could call `do_chapter_wide_prediction` with multiple pages
- It would use the crop embedder (ViT-MAE embeddings) to get embeddings for all characters across all pages
- Then apply chapter-wide clustering using those embeddings
- Leveraging constraints like must-link within page clusters and cannot-link across non-matching page clusters
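In constraint terms, each page's per-page clusters induce pairs like the following (a hypothetical helper for illustration; v2's actual implementation differs):

```python
from itertools import combinations

def page_constraints(page_clusters):
    """Derive clustering constraints from one page's per-page clusters.

    Detections in the same cluster must share an identity (must-link);
    detections in different clusters on the same page cannot (cannot-link)."""
    must_link, cannot_link = [], []
    for cluster in page_clusters:
        must_link += list(combinations(cluster, 2))
    for c1, c2 in combinations(page_clusters, 2):
        cannot_link += [(a, b) for a in c1 for b in c2]
    return must_link, cannot_link

# One page where detections {0, 1} are one character and {2} another:
ml, cl = page_constraints([[0, 1], [2]])
# ml == [(0, 1)], cl == [(0, 2), (1, 2)]
```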
One solution would be to use v3 for per-page clustering, then use v2's crop embedder to match clusters against a character bank (which implicitly gives cross-page identity through, e.g., shared reference images). Concretely: make every detection of a character a node in a graph, weight each edge by the crop-embedder similarity, and let the per-page clustering connect nodes within each page; a minimum spanning tree over that graph then links the same character together across all pages.
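A minimal sketch of that graph idea, assuming per-page cluster pairs and a cross-page similarity dict are already available (hypothetical names; a cost threshold is added so that, rather than one spanning tree over everything, dissimilar detections stay separate):

```python
def find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def link_identities(num_detections, same_page_pairs, cross_page_similarity, max_cost=0.5):
    """Kruskal-style spanning forest over the detection graph.

    Same-page cluster edges cost 0 (the per-page clustering already decided
    they match); cross-page edges cost 1 - similarity. Edges costlier than
    max_cost are skipped, so the connected components of the resulting
    forest are the cross-page character identities."""
    parent = list(range(num_detections))
    edges = [(0.0, a, b) for a, b in same_page_pairs]
    edges += [(1.0 - sim, a, b) for (a, b), sim in cross_page_similarity.items()]
    for cost, a, b in sorted(edges):
        if cost > max_cost:
            break  # remaining edges are too dissimilar to merge
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    return [find(parent, i) for i in range(num_detections)]

# Detections 0 and 1 form one per-page cluster on page A; 2 is on page B, 3 on page C.
labels = link_identities(
    4,
    same_page_pairs=[(0, 1)],
    cross_page_similarity={(0, 2): 0.9, (1, 3): 0.2, (2, 3): 0.1},
)
# Detections 0, 1, 2 end up as one character; 3 stays its own.
```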
By the way, I improved the Magi v3 pipeline quite a bit since the previous notebook, which contained some bugs.
https://colab.research.google.com/drive/1zoGdtbcPtwCS5QRMcZ7U9PhCq37pDr01?usp=sharing