# 🚀 FramePack
FramePack is an open-source video diffusion framework developed by Lvmin Zhang (lllyasviel) at Stanford, designed to make high-quality video generation efficient and practical—even on consumer-grade GPUs.
## 🔍 Key Features
- Next-frame prediction: Generates video progressively, predicting one frame (or one section of frames) at a time using a novel neural network structure.
- Context packing: Compresses historical frames into a fixed-length context so GPU workload doesn't increase with video length.
- Low VRAM requirement: Capable of generating 60‑second, 30 fps videos on just 6 GB of VRAM—even on laptop GPUs (e.g. RTX 30 series).
- Anti-drifting sampling: Uses bidirectional sampling or reverse generation to combat drift and maintain visual coherence over long videos.
- Scalable performance: Supports large-batch fine-tuning (e.g. 13B models) and runs at ~1.5–2.5s per frame on RTX 4090, with slower but usable speeds on laptop GPUs.
- Cross-platform & easy to use: Works on both Windows and Linux; offers a one‑click Windows installer and a Gradio GUI (`demo_gradio.py`, `demo_gradio_f1.py`).
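The context-packing idea can be sketched numerically. The snippet below is illustrative only (the function name, token count, and compression base are assumptions, not FramePack's actual values): each frame further back in history is compressed by a larger factor, so the total context length converges to a fixed bound no matter how long the video grows.

```python
def packed_context_length(num_history_frames, tokens_per_frame=1536, base=2):
    """Total context tokens when the frame k steps back is compressed base**k-fold.

    Toy sketch of geometric context packing; the real model compresses
    frames in token space with learned patchifying kernels.
    """
    total = 0
    for k in range(num_history_frames):
        tokens = tokens_per_frame // base**k
        if tokens == 0:
            break  # frames this old contribute no tokens and are dropped
        total += tokens
    return total

# The context stops growing: a geometric series bounds it near 2x one frame.
print(packed_context_length(5))     # 2976 tokens
print(packed_context_length(1800))  # 3070 tokens -- same for any long history
```

Because the per-frame cost forms a geometric series, attention cost stays flat as the video lengthens, which is why VRAM use does not scale with clip duration.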
## ⚙️ Versions
- FramePack – the base model featuring bi-directional context packing.
- FramePack‑F1 – forward-only “F1” variant emphasizing dynamic motion, with anti‑drifting regularization added.
- FramePack‑P1 – next-gen “Planned Anti-Drifting” and “History Discretization” enhancements for ultra-long, high-coherence generation.
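The anti-drifting idea behind these variants can be illustrated with a toy scheduler. Everything below (function name, plan structure) is a hypothetical sketch of reverse-order sampling, not FramePack's actual code: sections are generated last-to-first, so every step is conditioned on the fixed first frame plus already-generated later sections, anchoring both ends of the clip and limiting error accumulation.

```python
def plan_inverted_sampling(num_sections):
    """Toy plan for inverted (reverse-order) anti-drifting sampling."""
    plan, generated = [], []
    for section in range(num_sections - 1, -1, -1):
        plan.append({
            "generate": section,
            # Every step sees the user's first frame plus all later sections,
            # so errors cannot silently compound from start to end.
            "conditioned_on": ["first_frame"] + sorted(generated),
        })
        generated.append(section)
    return plan

for step in plan_inverted_sampling(3):
    print(step)
```

With ordinary forward generation, each section only sees drifting history; here the final section is produced first and acts as a fixed target the earlier sections must connect to.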
## 📥 Installation & Usage
Windows:
- Download the one‑click package (CUDA 12.6 & PyTorch 2.6 included).
- Extract, run `update.bat`, then `run.bat` to launch.
Linux:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
python demo_gradio.py  # or demo_gradio_f1.py for F1
```
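Given the per-frame speeds quoted above, total generation time is easy to estimate. This is back-of-envelope arithmetic only; real timings depend on resolution, settings, and hardware:

```python
def generation_minutes(video_seconds, fps, seconds_per_frame):
    """Wall-clock minutes to generate a clip at a given per-frame speed."""
    return video_seconds * fps * seconds_per_frame / 60

# A 60-second, 30 fps clip (1800 frames) on an RTX 4090:
print(generation_minutes(60, 30, 1.5))  # 45.0 minutes at 1.5 s/frame
print(generation_minutes(60, 30, 2.5))  # 75.0 minutes at 2.5 s/frame
```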