Fix missing generate method by inheriting from GenerationMixin

#2
by Sematre - opened
No description provided.

This PR resolves the AttributeError: 'Florence2LanguageForConditionalGeneration' object has no attribute 'generate' error by adding GenerationMixin to the class inheritance.

Changes:

  • Added an import of GenerationMixin from transformers.generation.utils
  • Updated Florence2LanguageForConditionalGeneration to inherit from both Florence2LanguagePreTrainedModel and GenerationMixin

This change accounts for transformers v4.50+, where PreTrainedModel no longer inherits from GenerationMixin by default (a removal that earlier versions announced with a deprecation warning).

Source: cherry-picked from Microsoft's official Florence-2-base repository (microsoft/Florence-2-base@2a2d45e).
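
For context, a minimal sketch of what the cherry-picked change amounts to (class bodies are omitted; only the added import and the extra base class are the point here):

```python
# Sketch only: shows the inheritance change this PR applies, not the full modeling file.
from transformers.generation.utils import GenerationMixin  # import added by this PR
from transformers.modeling_utils import PreTrainedModel


class Florence2LanguagePreTrainedModel(PreTrainedModel):
    """Base class as defined in the Florence-2 modeling code (body omitted)."""


# Before: the class inherited only from Florence2LanguagePreTrainedModel, so on
# transformers v4.50+ it no longer picked up .generate() from PreTrainedModel.
# After: inheriting from GenerationMixin as well restores .generate().
class Florence2LanguageForConditionalGeneration(Florence2LanguagePreTrainedModel, GenerationMixin):
    """Conditional-generation head (implementation omitted)."""
```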


Hi, were you able to get this running properly? I've been unable to get OCR working on macOS (Apple Silicon). Thank you for any advice

I was able to get it running in a Colab notebook. Feel free to take a look: https://colab.research.google.com/drive/1qJWddZQwktmFgTZHi9SrVzCT0EWCgEan?usp=sharing
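
Not taken from the notebook itself, but as a rough orientation, loading these custom models usually boils down to something like the sketch below; the model id, device handling, and everything else here are assumptions, so adapt them to your setup.

```python
import torch
from transformers import AutoModel

# Assumption: "ragavsachdeva/magiv3" stands in for whichever Magi v3 checkpoint
# this discussion refers to; trust_remote_code is needed because the modeling
# code (including the class patched in this PR) lives inside the model repo.
model = AutoModel.from_pretrained("ragavsachdeva/magiv3", trust_remote_code=True)

# Pick an accelerator if one is available (mps covers Apple Silicon).
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
model = model.to(device).eval()
```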

Hello. Is there any way for the model to output consistent character_cluster_labels across multiple pages? (Right now, each character cluster label is only tied to a single page, so it is impossible to track one character through several pages with predict_detections_and_associations.)
Thank you for any advice

Unfortunately not. Magi v3 uses Florence-2 as its base (an encoder-decoder transformer). The character clustering is done via the character_character_affinity_matrices, which come from the association heads. Internally, the get_character_character_affinity_matrices function iterates over decoder_hidden_states, which has one entry per image in the batch. For each image, it extracts the character embeddings from that image's decoder output and computes affinities only between characters within that same image.
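
Schematically, the per-image loop looks roughly like the sketch below (tensor shapes, argument names, and the affinity-head interface are illustrative, not the repo's exact signatures), which is why cluster labels never cross page boundaries:

```python
import torch

def character_affinities_per_image(decoder_hidden_states, character_indices, affinity_head):
    """Illustrative sketch of the per-image affinity computation described above.

    decoder_hidden_states: one (num_tokens, dim) tensor per image in the batch.
    character_indices: for each image, the token indices of its detected characters.
    affinity_head: association head mapping pairs of embeddings to a score.
    """
    affinity_matrices = []
    for hidden, char_idx in zip(decoder_hidden_states, character_indices):
        chars = hidden[char_idx]          # (num_chars, dim): characters on this page only
        n = chars.shape[0]
        # Score all within-page pairs; characters from other pages never enter
        # this computation, so identities stay local to each page.
        left = chars.unsqueeze(1).expand(n, n, -1)
        right = chars.unsqueeze(0).expand(n, n, -1)
        affinities = affinity_head(torch.cat([left, right], dim=-1)).squeeze(-1)
        affinity_matrices.append(affinities)  # (num_chars, num_chars)
    return affinity_matrices
```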

This is different from Magi v2's approach where:

  • You could call do_chapter_wide_prediction with multiple pages
  • It would use the crop embedder (ViT-MAE embeddings) to get embeddings for all characters across all pages
  • Then apply chapter-wide clustering using those embeddings
  • It would leverage constraints such as must-link within page clusters and cannot-link across non-matching page clusters (a rough sketch of this follows below)
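
To make that concrete, here is a made-up, illustrative sketch of constraint-based chapter-wide clustering (this is not v2's actual code; all names and the similarity threshold are placeholders): per-page clusters are treated as must-link groups, groups that share a page are never merged, and the remaining merges are driven by crop-embedding similarity.

```python
import numpy as np

def chapter_wide_clusters(crop_embeddings, page_ids, page_cluster_ids, sim_threshold=0.8):
    """Illustrative constrained clustering over character crops.

    crop_embeddings: (num_detections, dim) crop-embedder features (e.g. ViT-MAE).
    page_ids, page_cluster_ids: per-detection page index and within-page cluster label.
    Returns one chapter-wide cluster id per detection.
    """
    # Must-link: every (page, per-page cluster) group starts as a single unit.
    groups = {}
    for i, key in enumerate(zip(page_ids, page_cluster_ids)):
        groups.setdefault(key, []).append(i)
    keys = list(groups)
    centroids = np.stack([crop_embeddings[groups[k]].mean(axis=0) for k in keys])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

    labels = list(range(len(keys)))           # one chapter-wide label per group initially
    sims = centroids @ centroids.T
    order = np.column_stack(np.unravel_index(np.argsort(-sims, axis=None), sims.shape))
    for a, b in order:                        # greedily merge the most similar groups first
        if a >= b or sims[a, b] < sim_threshold or labels[a] == labels[b]:
            continue
        # Cannot-link: never merge two chapter clusters that already share a page.
        pages_a = {keys[i][0] for i, lab in enumerate(labels) if lab == labels[a]}
        pages_b = {keys[i][0] for i, lab in enumerate(labels) if lab == labels[b]}
        if pages_a & pages_b:
            continue
        old, new = labels[b], labels[a]
        labels = [new if lab == old else lab for lab in labels]

    detection_labels = np.empty(len(page_ids), dtype=int)
    for key, lab in zip(keys, labels):
        detection_labels[groups[key]] = lab
    return detection_labels
```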

One solution would be to use v3 for per-page clustering, then use v2's crop embedder to match clusters to the character bank (which implicitly gives cross-page identity through, e.g., shared reference images). If every detection of a character is a node in a graph, every edge weight is the crop-embedder similarity, and the per-page clustering already links the nodes within each page, then building a minimum spanning tree over that graph would connect the same character across all pages.
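
A small, hypothetical sketch of that graph idea (scipy's minimum_spanning_tree is just one way to realise it, and the 0.4 distance cut-off is an arbitrary example; in practice you would tune or learn how to split the tree):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree

def link_characters_across_pages(crop_embeddings, page_ids, page_cluster_ids, cut=0.4):
    """Nodes are detections, edge weights are crop-embedder distances, per-page
    clusters act as (near-)zero-cost edges, and the MST is cut at weak links so
    each remaining component is one character across all pages."""
    emb = crop_embeddings / np.linalg.norm(crop_embeddings, axis=1, keepdims=True)
    distance = 1.0 - emb @ emb.T                     # cosine distance: lower = more similar
    n = len(page_ids)

    # Per-page clustering is hard evidence: give those edges a tiny positive cost
    # (exact zeros would be dropped by the sparse graph representation).
    for i in range(n):
        for j in range(n):
            if i != j and page_ids[i] == page_ids[j] and page_cluster_ids[i] == page_cluster_ids[j]:
                distance[i, j] = 1e-6

    mst = minimum_spanning_tree(csr_matrix(distance)).toarray()
    mst[mst > cut] = 0.0                             # drop weak tree edges between different characters
    _, labels = connected_components(csr_matrix(mst), directed=False)
    return labels                                    # one label per detection, consistent across pages
```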

By the way, I improved the Magi v3 pipeline quite a bit compared to the previous notebook; the earlier version contained some bugs.

https://colab.research.google.com/drive/1zoGdtbcPtwCS5QRMcZ7U9PhCq37pDr01?usp=sharing

