OOM on 96GB H20 with FLUX.2-dev (BF16) - Is 96GB not enough?
First off, congrats on the FLUX.2 release. The performance and prompt adherence are truly next-level—it's a huge step forward for the community.
I'm currently trying to run the BF16 version on a single NVIDIA H20 (96GB VRAM), but I hit a CUDA out-of-memory error. Is 96GB of VRAM officially insufficient for FLUX.2, or is there a peak memory spike during loading that I should optimize for? Are there any recommended settings to fit it on a single 96GB card without OOM?
Here's how I load FLUX.2-dev (BF16):
Thanks again for the incredible work.
Hey, I ran into a similar problem and documented it here. I hope it helps: https://huggingface.co/black-forest-labs/FLUX.2-dev/discussions/35
I had this problem trying to use 2x 48 GB cards with NVLink. The issue is often that the text encoder is a full LLM in its own right, and loading it alongside the diffusion model blows out the VRAM.
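To put rough numbers on that, here is a back-of-the-envelope sketch of the weight memory. The 60 GB (BF16) and 30 GB (FP8) transformer sizes are the file sizes quoted in this thread; the text encoder size of roughly 24B parameters is an assumption on my part, not something confirmed here, and the estimate ignores the VAE, activations, and allocator overhead, so real usage will be higher.

```python
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight size in decimal GB for a model of the given size."""
    return params_billion * 1e9 * bytes_per_param / 1e9

VRAM_GB = 96.0  # single H20, as in the original question

# File sizes quoted in this thread for the diffusion transformer.
transformer_bf16_gb = 60.0
transformer_fp8_gb = 30.0

# ASSUMPTION: the text encoder is an LLM with about 24B parameters.
text_encoder_bf16_gb = weight_gb(24, 2)  # 2 bytes per parameter in BF16

# Everything resident on the GPU at once, both components in BF16.
total_bf16_gb = transformer_bf16_gb + text_encoder_bf16_gb
# FP8 transformer, text encoder still in BF16.
total_fp8_gb = transformer_fp8_gb + text_encoder_bf16_gb

fits_bf16 = total_bf16_gb <= VRAM_GB
fits_fp8 = total_fp8_gb <= VRAM_GB

print(f"BF16 transformer + BF16 text encoder: {total_bf16_gb:.0f} GB, fits: {fits_bf16}")
print(f"FP8 transformer + BF16 text encoder: {total_fp8_gb:.0f} GB, fits: {fits_fp8}")
```

Under that assumption the weights alone would exceed 96 GB when both components sit on the GPU in BF16, which would explain the OOM without any special loading spike. If the encoder is smaller or quantized, the conclusion changes, so treat this as an estimate only.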
I used the 60 GB flux2-dev.safetensors in ComfyUI with 96 GB of RAM and a 5070 Ti (16 GB of VRAM). I used the standard template "Flux.2 Dev Text to Image" and replaced the 30 GB FP8 version (flux2_dev_fp8mixed.safetensors) with the full 60 GB file. It barely works, but it works. RAM and VRAM are almost full, and a 1280 x 720 image takes 2 to 4 minutes. But the precise results, which mostly succeed on the first try, are worth it. The 30 GB FP8 file is faster and takes up much less RAM, and even that file delivers really awesome results.
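For anyone on Diffusers rather than ComfyUI, a similar "keep most of it in system RAM" setup can be sketched with model CPU offload. This is a minimal sketch, not a verified recipe for this model: the helper name `load_flux2_with_offload` is made up for illustration, and I'm assuming the FLUX.2 pipeline supports `enable_model_cpu_offload()` the way other Diffusers pipelines do. You also need enough system RAM to hold all the weights.

```python
def load_flux2_with_offload(model_id: str = "black-forest-labs/FLUX.2-dev"):
    """Load the pipeline in BF16 and enable component-level CPU offload.

    Hypothetical helper for illustration; not part of Diffusers.
    """
    # Imported lazily so this file can be imported on a machine
    # that does not have the GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline

    # Deliberately no device_map="cuda" here: the weights should start on
    # the CPU so the offload hooks can decide what lives on the GPU.
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # Moves one whole component (text encoder, transformer, VAE) to the GPU
    # at a time and returns it to the CPU afterwards, so peak VRAM is closer
    # to the largest single component than to the sum of all of them.
    pipe.enable_model_cpu_offload()
    return pipe
```

Usage would be `pipe = load_flux2_with_offload()` followed by `image = pipe(prompt="a cat").images[0]`. It is slower than keeping everything on the GPU, since components are shuttled between RAM and VRAM on each call.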