Has anyone managed to run ComfyUI + GGUF in ZeroGPU Pro hosting Space?
A ZeroGPU Space is more like a "CPU Space with occasional GPU availability" than a dedicated GPU Space, which makes it difficult to run ComfyUI itself. On top of that, ZeroGPU can only be used in Gradio Spaces, which makes it even harder. (ComfyUI's GUI is not built on Gradio.)
It might be usable if the workflow is converted to Python first…
You generally cannot run the actual ComfyUI web app (the long-running UI server) on ZeroGPU, even with Pro. The reason is structural: ZeroGPU is Gradio-only and “GPU-per-function-call”, while ComfyUI is a persistent server process that expects a GPU to exist continuously.
What ZeroGPU actually is
ZeroGPU is “serverless GPU” for Spaces:
- The Space only gets a GPU while a decorated function runs.
- After the function finishes, the GPU is released.
- You opt into this by decorating GPU work with `@spaces.GPU(...)` (see the minimal sketch after this list).
- There is an explicit duration budget per call (default 60s, configurable). (Hugging Face)
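For orientation, here is a minimal sketch of that call pattern in a Gradio Space; the function body is a placeholder, not a real workflow:

```python
import spaces  # preinstalled on ZeroGPU Spaces
import gradio as gr

@spaces.GPU(duration=120)  # a GPU exists only while this function runs
def generate(prompt: str) -> str:
    # Do the CUDA work here; when the function returns, the GPU is released.
    return f"ran on GPU for: {prompt}"  # placeholder body

gr.Interface(fn=generate, inputs="text", outputs="text").launch()
```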
ZeroGPU also has a hard compatibility constraint:
- “ZeroGPU Spaces are exclusively compatible with the Gradio SDK.” (Hugging Face)
- The ZeroGPU Explorers page repeats the same: “only works with the Gradio SDK.” (Hugging Face)
Pro affects quota and queue priority, not the runtime model:
- Pro users get “x7 more daily usage quota and highest priority in GPU queues.” (Hugging Face)
Why ComfyUI conflicts with ZeroGPU
When you "run ComfyUI," you are running a web server that stays up, maintains a job queue, holds models in VRAM, and runs inference over time. That deployment style usually maps to a Docker Space or a VM.
But:
- ZeroGPU is not available for Docker SDK Spaces. Hugging Face staff state this directly in the ZeroGPU Explorers discussions: “unfortunately ZeroGPU is not available for Docker SDK Spaces.” (Hugging Face)
- The official ComfyUI Space example on Hugging Face is a Docker-based setup (CUDA base image, apt installs, etc.), which is exactly the type of Space ZeroGPU does not support. (Hugging Face)
So the common failure pattern is:
- You try to host ComfyUI in Docker → ZeroGPU cannot be selected or does nothing → ComfyUI starts without a usable CUDA GPU → “No CUDA GPUs are available” or similar.
Where GGUF fits (and why it adds friction)
“ComfyUI + GGUF” usually means using ComfyUI-GGUF, a custom node pack that loads GGUF-quantized diffusion/DiT models (Flux, SD 3.5, etc.).
Key points from the project itself:
- It is “very much WIP.”
- It relies on custom ops support in ComfyUI and wants a “recent-enough” ComfyUI.
- Installation is `pip install --upgrade gguf`.
- You place `.gguf` models under `ComfyUI/models/unet` and swap in the "Unet Loader (GGUF)" node. (GitHub)
That is very doable in a Docker + dedicated GPU Space. It is much less reliable in a serverless, Gradio-only environment.
What does work on Hugging Face today
Option A: Full ComfyUI UI + GGUF (recommended if you need the real UI)
Use a Docker Space with a dedicated GPU upgrade (L4, A10G, A100, etc.), not ZeroGPU.
- Docker Spaces are explicitly meant for custom servers and non-Gradio apps. (Hugging Face)
- Hugging Face provides a ComfyUI Docker Space example you can fork and adapt. (Hugging Face)
- Then add ComfyUI-GGUF and models exactly as the repo describes. (GitHub)
Option B: “ComfyUI workflow” on ZeroGPU (not the ComfyUI UI)
If your goal is “run this workflow and expose a simple interface,” Hugging Face’s official approach is:
- Export the ComfyUI workflow to Python.
- Wrap it in a Gradio app.
- Run inference inside `@spaces.GPU(...)`.
This is documented step-by-step in Hugging Face’s blog post on running ComfyUI workflows on ZeroGPU. (Hugging Face)
Important nuance:
- This runs a workflow, not the ComfyUI web UI.
- You also typically move model initialization to global scope to avoid reloading the models on every call (see the sketch below). (Hugging Face)
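A minimal sketch of that shape, assuming you have already exported the workflow to a Python module; `exported_workflow`, `load_models`, and `run_workflow` are placeholder names for whatever your export produces, not APIs from the blog post:

```python
import spaces  # ZeroGPU helper; the decorator is what attaches the GPU
import gradio as gr
import torch

# Placeholder import: the module your ComfyUI workflow was exported to.
from exported_workflow import load_models, run_workflow

# Heavy model loading happens once, at global scope, so it is not repeated
# on every request. On ZeroGPU there is no GPU attached at this point.
models = load_models()

@spaces.GPU(duration=120)  # the GPU exists only for the duration of this call
def generate(prompt: str, steps: int = 20):
    with torch.inference_mode():
        # The exported workflow function performs the actual sampling/decoding.
        return run_workflow(models, prompt=prompt, steps=steps)

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Slider(1, 50, value=20, step=1, label="Steps")],
    outputs=gr.Image(label="Result"),
)
demo.launch()
```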
GGUF in this setup is “maybe, but expect work”:
- In principle you can include custom nodes by folding their requirements into the Space and importing them once. (Hugging Face)
- In practice, ComfyUI-GGUF’s “custom ops / recent ComfyUI” requirements can collide with ZeroGPU’s constrained runtime and the need to keep execution inside the decorated call. (GitHub)
Quick checklist to confirm what’s going wrong
- If your Space uses Docker (`sdk: docker`): ZeroGPU is not supported there. That alone explains it. (Hugging Face)
- If your Space is Gradio + ZeroGPU:
  - Any GPU work must be inside `@spaces.GPU`.
  - If you see "No CUDA GPUs are available," you are likely calling CUDA code outside the decorated function or starting a persistent server that never "enters" the GPU context. (Hugging Face)
- If your inference exceeds the call budget: raise `duration=...` on the decorator, but remember this is still "per call" serverless execution (see the short sketch below). (Hugging Face)
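A hedged fragment illustrating those last two checks; `run_workflow` is a placeholder name for your own function, and the 180-second value is only an example (the allowed maximum depends on your account and quota):

```python
import spaces
import torch

@spaces.GPU(duration=180)  # raise the per-call budget above the 60s default
def run_workflow(prompt: str):
    # A GPU is attached only while this function executes. CUDA work started
    # from a separate long-lived server process never "enters" this context,
    # which is the usual source of "No CUDA GPUs are available".
    device = torch.device("cuda")
    ...  # the actual sampling / decoding happens here, on `device`
```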
Links (in one place)
ZeroGPU docs (compatibility + how GPU is allocated): https://huggingface.co/docs/hub/en/spaces-zerogpu
ZeroGPU Explorers (usage + duration): https://huggingface.co/zero-gpu-explorers
ZeroGPU not available for Docker (HF staff reply): https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/42
HF blog: run ComfyUI workflows on ZeroGPU via Gradio: https://huggingface.co/blog/run-comfyui-workflows-on-spaces
ComfyUI Docker Space example: https://huggingface.co/spaces/SpacesExamples/ComfyUI/blob/main/Dockerfile
ComfyUI-GGUF repo: https://github.com/city96/ComfyUI-GGUF
Summary
- ZeroGPU is Gradio-only and GPU-per-function-call. (Hugging Face)
- Full ComfyUI UI is a persistent server, usually deployed as Docker. ZeroGPU does not support Docker. (Hugging Face)
- Best path for ComfyUI + GGUF is Docker Space + paid GPU. (Hugging Face)
- Best path for ZeroGPU is workflow-to-Gradio, not running the ComfyUI UI. (Hugging Face)