BBox coordinates in PaddleOCR-VL JSON don’t match PDF crop — how to correctly map/convert coordinates?

#8
by sogm1 - opened

Hi, thanks for releasing PaddleOCR-VL — the parsing quality is great.

When I parse a PDF with PaddleOCR-VL (Model A), the output JSON includes bounding boxes (bbox). However, when I crop the PDF using those bbox coordinates (via pdfplumber), the cropped regions do not match the actual positions of the objects (tables, figures, etc.).
What coordinate system does PaddleOCR-VL use for bbox in the JSON output?

This is how I call the PaddleOCR-VL pipeline:

    from paddleocr import PaddleOCRVL

    pipeline = PaddleOCRVL(
        pipeline_version="v1",
        device="gpu:0",
        use_layout_detection=True,
        use_doc_orientation_classify=True,
        use_doc_unwarping=True,
    )

Maybe use_doc_unwarping=True is causing this.

Is there an official or recommended way to convert PaddleOCR-VL bbox values to PDF page coordinates for accurate cropping when use_doc_unwarping is enabled?

I hope to hear back from you.
Thank you!

PaddlePaddle org

Hi @sogm1, when use_doc_unwarping is set to True, the image pixels are shifted, so the output coordinates no longer correspond to the original image. You need to set use_doc_unwarping to False.
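
With unwarping disabled, the remaining step is usually a unit conversion. The sketch below is not an official PaddleOCR-VL utility; it assumes the bbox is [x0, y0, x1, y1] in pixels of the rendered page image, with the origin at the top-left, which should be verified against your own output. The function name pixel_bbox_to_pdf_points and its parameters are hypothetical. It scales pixels to PDF points, the unit pdfplumber uses.

```python
from typing import Sequence, Tuple


def pixel_bbox_to_pdf_points(
    bbox_px: Sequence[float],
    image_size_px: Tuple[float, float],
    page_size_pt: Tuple[float, float],
) -> Tuple[float, float, float, float]:
    """Scale a pixel bbox to PDF points (top-left origin).

    Assumption: bbox_px is [x0, y0, x1, y1] measured on the image the
    page was rendered to, and that image covers the full page.
    """
    x0, y0, x1, y1 = bbox_px
    img_w, img_h = image_size_px
    page_w, page_h = page_size_pt

    # Points per pixel along each axis.
    sx = page_w / img_w
    sy = page_h / img_h

    # Scale, and make sure left <= right and top <= bottom.
    left, right = sorted((x0 * sx, x1 * sx))
    top, bottom = sorted((y0 * sy, y1 * sy))

    # Clamp to the page so a crop call does not reject the box.
    left = min(max(left, 0.0), page_w)
    right = min(max(right, 0.0), page_w)
    top = min(max(top, 0.0), page_h)
    bottom = min(max(bottom, 0.0), page_h)
    return (left, top, right, bottom)
```

With pdfplumber, the result can be passed to page.crop(...), using (page.width, page.height) as page_size_pt and the size of the image that was given to the pipeline as image_size_px. This assumes the page is not rotated and that its bounding box starts at (0, 0); otherwise an extra offset or rotation step is needed.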
