Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
Abstract
Automated systems for generating scientific figures face limitations in handling diverse figure types and conditions, prompting the development of multi-agent frameworks that generalize across different input scenarios and produce editable output formats.
Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper preparation. Existing automated systems each target a single figure type under text-only input, leaving the diversity of types and conditions researchers actually use unaddressed; their raster outputs further cannot be locally revised. Because scientific figures are structured compositions of discrete semantic components, the localized errors generators produce on such layouts demand not a stronger backbone but a harness. We instantiate this harness in two complementary systems: Crafter, a multi-agent harness for figure generation that generalizes across figure types and input conditions without architectural changes, and CraftEditor, which applies the same pattern to convert raster outputs into editable SVGs. Moreover, we introduce CraftBench, a benchmark spanning three figure types and four input conditions with human quality annotation. Experiments show that Crafter substantially outperforms both standalone generators and the agentic baseline on PaperBanana-Bench and CraftBench, with ablations confirming each component's independent contribution; CraftEditor faithfully converts outputs into editable SVGs that surpass all baselines. Our code and benchmark are available at https://github.com/HaozheZhao/Crafter.
Community
Crafter is a multi-agent system for generating publication-quality scientific figures across diverse types and conditions, with CraftEditor turning raster outputs into editable SVGs and CraftBench for evaluation.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LiveFigure: Generating Editable Scientific Illustration with VLM Agents (2026)
- VCG-Bench: Towards A Unified Visual-Centric Benchmark for Structured Generation and Editing (2026)
- Large Language Models are Universal Reasoners for Visual Generation (2026)
- CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences (2026)
- GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation (2026)
- Generation Navigator: A State-Aware Agentic Framework for Image Generation (2026)
- See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation (2026)
the most interesting bit for me is the shared memory s and the four-role loop (d, e, v, r) that keeps edits structured instead of chasing endless prompts. encoding typed edits into a structured memory and letting a critic gate candidate plans lets the system preserve consistency across rounds while still exploring diversity, and the fact you can swap in stronger backends without architectural changes feels very practical. i am curious how you handle conflicting constraints when different plans propose different edits to the same element—does the critic resolve that by edit distance, or is there a higher-priority rule? crafteditor's extraction-processing-composition pipeline to produce editable svg from raster is clever, but i worry about error accumulation in the vector layer if the raster has occlusions or ambiguous labels. btw arxivlens had a solid walkthrough that covers this pattern well, https://arxivlens.com/PaperView/Details/crafter-a-multi-agent-harness-for-editable-scientific-figure-generation-from-diverse-inputs-2710-cc1998e6
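To make the question about conflicting edits concrete, here is a minimal Python sketch of typed edits passing through a gate before they reach a shared memory. Everything in it is an assumption for illustration: the names `TypedEdit`, `SharedMemory`, `gate`, the `priority` field, and the "highest priority wins" rule are hypothetical and are not taken from the paper, which may resolve conflicts quite differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypedEdit:
    """Hypothetical typed edit; the field names are illustrative assumptions."""
    element_id: str    # figure component the edit targets
    op: str            # kind of edit, e.g. "relabel" or "recolor"
    value: str         # new value for that operation
    priority: int = 0  # assumed tie-breaker, not taken from the paper


@dataclass
class SharedMemory:
    """Hypothetical shared memory holding the edits accepted so far."""
    accepted: list = field(default_factory=list)


def gate(candidates):
    """Keep at most one edit per (element_id, op).

    Assumed rule: the highest priority wins, and ties go to the earliest
    candidate. This is one possible policy, not the paper's critic.
    """
    best = {}
    for edit in candidates:
        key = (edit.element_id, edit.op)
        if key not in best or edit.priority > best[key].priority:
            best[key] = edit
    return list(best.values())


memory = SharedMemory()

# Two candidate plans that disagree about the label of the same element.
plan_a = [TypedEdit("encoder_box", "relabel", "Encoder", priority=1)]
plan_b = [
    TypedEdit("encoder_box", "relabel", "Enc.", priority=2),
    TypedEdit("arrow_3", "recolor", "gray"),
]

memory.accepted.extend(gate(plan_a + plan_b))
for edit in memory.accepted:
    print(edit.element_id, edit.op, edit.value)
```

Keying on the pair (element, operation) means two plans can still touch the same element in different ways, for example one relabeling it and another recoloring it, without being treated as a conflict. An edit-distance rule, as raised in the comment above, would replace the priority comparison inside `gate`.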
Get this paper in your agent:
hf papers read 2605.30611
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash