🎵 ACE‑Step
ACE-Step ("A Step Towards Music Generation Foundation Model") is an open-source, Apache-2.0-licensed music foundation model co-developed by ACE Studio and StepFun. It generates coherent, original music from text and lyrics prompts, producing up to 4 minutes of audio in about 20 seconds on an A100 GPU.
🔍 Key Features
- 📈 Diffusion + DCAE + Linear Transformer: Combines diffusion-based generation, Sana's Deep Compression AutoEncoder (DCAE), and a lightweight linear transformer to balance quality and speed.
- ⚡ Blazing fast: Generates ~4 min of music in ~20 s on an A100 GPU, ≈15× faster than LLM-based alternatives.
- 🎼 High musical coherence: Retains melody, harmony, and rhythm details with strong lyric alignment.
- 🎤 Advanced controllability: Offers voice cloning, lyric editing, remixing, singing ↔ accompaniment, and targeted repainting.
- 🌍 Multilingual support: Handles 19 languages, including Chinese, English, Japanese, Korean, Spanish, and Russian.
- 🛠 Extensible with LoRA & ControlNet: Enables fine-tuning for specific voices/styles and downstream tasks.
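The "lightweight linear transformer" relies on linear attention. Linear attention never builds the N×N attention matrix, so its cost grows linearly with sequence length rather than quadratically. That matters for the long latent sequences a multi-minute song produces. Below is a minimal pure-Python sketch of the general technique. It assumes a ReLU feature map φ, the choice used in Sana's linear attention. It is not ACE-Step's code, and the helper names are hypothetical. A quadratic reference version with the same kernel is included so the two can be compared.

```python
from typing import List

Matrix = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def linear_attention(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """out_i = phi(q_i) @ S / (phi(q_i) . z), with S = sum_j phi(k_j)^T v_j and z = sum_j phi(k_j).

    Keys and values are aggregated once, so the cost is O(N * d * d_v) rather than O(N^2).
    """
    d, d_v = len(q[0]), len(v[0])
    s = [[0.0] * d_v for _ in range(d)]  # d x d_v summary of all key/value pairs
    z = [0.0] * d                        # running sum of phi(k_j), used for normalization
    for k_row, v_row in zip(k, v):
        phi_k = [relu(x) for x in k_row]
        for a in range(d):
            z[a] += phi_k[a]
            for b in range(d_v):
                s[a][b] += phi_k[a] * v_row[b]
    out = []
    for q_row in q:
        phi_q = [relu(x) for x in q_row]
        denom = sum(p * zz for p, zz in zip(phi_q, z)) + eps
        out.append([sum(phi_q[a] * s[a][b] for a in range(d)) / denom for b in range(d_v)])
    return out


def quadratic_kernel_attention(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """Same kernel computed the O(N^2) way: explicit weights w_ij = phi(q_i) . phi(k_j)."""
    out = []
    for q_row in q:
        phi_q = [relu(x) for x in q_row]
        weights = [sum(p * relu(x) for p, x in zip(phi_q, k_row)) for k_row in k]
        denom = sum(weights) + eps
        out.append([sum(w * v_row[b] for w, v_row in zip(weights, v)) / denom
                    for b in range(len(v[0]))])
    return out
```

Both functions return the same result up to floating-point rounding. The linear version reorders the sums so the key/value summary `s` is shared by every query.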
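LoRA extensibility rests on the standard low-rank update W' = W + (α/r)·B·A. The base weight W stays frozen, and only the small factors A (r × d_in) and B (d_out × r) are trained. The sketch below shows that formulation in plain Python. It is a generic illustration, not ACE-Step's training code. The matrix sizes and values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Nested-list matrix product: (m x n) @ (n x p) -> (m x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Fold a LoRA update into a frozen weight: W' = W + (alpha / r) * B @ A.

    w: d_out x d_in base weight (frozen during fine-tuning)
    a: r x d_in down-projection; b: d_out x r up-projection (the trainable factors)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # full d_out x d_in shape, but rank at most r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in A and B (full fine-tuning would train d_in * d_out)."""
    return r * (d_in + d_out)
```

For a hypothetical 1024 × 1024 projection with r = 8, `lora_param_count(1024, 1024, 8)` gives 16,384 trainable values, versus 1,048,576 for the full matrix (about 1.6%). This is why style- or voice-specific adapters stay small.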
💡 Why It Matters
ACE-Step targets the long-standing three-way trade-off in AI music generation between speed, musical coherence, and controllability. Because it is designed as a foundation model, it can serve as a base for higher-level tools without reinventing core audio synthesis.
⚙️ Installation & Usage
- Clone the repo:
  git clone https://github.com/ace-step/ACE-Step.git
  cd ACE-Step
- Set up an environment: Python 3.10+ in a venv or Conda environment.
- Install dependencies and models as described in the README.
- Run inference to generate music from text and lyrics prompts, either on a local GPU (~14 GB VRAM) or through ComfyUI workflows.
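The stated prerequisites (Python 3.10+ and roughly 14 GB of GPU memory for local inference) can be checked before installing. The sketch below is hedged in three ways:
- The thresholds are taken from this summary, not from an official ACE-Step check.
- `preflight` and `detect_vram_gb` are hypothetical helpers.
- The GPU probe assumes PyTorch is already installed. It returns None otherwise.

```python
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 10)  # "Python 3.10+" from the setup step
MIN_VRAM_GB = 14.0    # approximate local-inference figure quoted above, not a hard limit


def detect_vram_gb() -> Optional[float]:
    """Best-effort probe of GPU 0's total memory (GiB) via PyTorch; None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


def preflight(python_version: Tuple[int, ...], vram_gb: Optional[float]) -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} found; 3.10+ expected")
    if vram_gb is None:
        problems.append("no CUDA GPU detected via PyTorch")
    elif vram_gb < MIN_VRAM_GB:
        problems.append(f"{vram_gb:.1f} GB VRAM found; ~{MIN_VRAM_GB:.0f} GB suggested")
    return problems


if __name__ == "__main__":
    for message in preflight(sys.version_info[:2], detect_vram_gb()) or ["environment looks OK"]:
        print(message)
```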
🧩 ComfyUI Integration
- Natively supported in ComfyUI: import the workflow, load ace_step_v1_3.5b.safetensors, and generate audio directly.
- Community plugins such as billwuhao/ComfyUI_ACE-Step add multilingual lyric support and advanced editing nodes.
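Once a workflow runs in the ComfyUI interface, it can also be queued from a script. This minimal standard-library sketch makes the following assumptions:
- A ComfyUI server is listening on its default address, http://127.0.0.1:8188.
- The ACE-Step workflow has been exported in ComfyUI's API format.
- The file name is hypothetical.

It posts to ComfyUI's /prompt endpoint, which accepts a JSON body of the form {"prompt": <workflow>}.

```python
import json
import urllib.request
from typing import Any, Dict

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default listen address; adjust if needed


def build_prompt_request(workflow: Dict[str, Any], base_url: str = COMFY_URL) -> urllib.request.Request:
    """Wrap an API-format workflow in the JSON body that the /prompt endpoint expects."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(path: str, base_url: str = COMFY_URL) -> Dict[str, Any]:
    """Load an API-format workflow file, queue it, and return ComfyUI's JSON reply."""
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    with urllib.request.urlopen(build_prompt_request(workflow, base_url)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "ace_step_workflow_api.json" is a hypothetical name for your exported workflow.
    print(queue_workflow("ace_step_workflow_api.json"))
```

The reply includes a prompt_id that can be used to track the queued job. Building the request separately from sending it keeps the payload easy to inspect without a running server.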
📎 Links
- GitHub (code & weights): https://github.com/ace-step/ACE-Step
- Official site & demo: https://ace-step.github.io
- Paper (arXiv): ACE-Step: A Step Towards Music Generation Foundation Model, https://arxiv.org/abs/2506.00045
- ComfyUI Workflow: via ComfyUI native nodes and plugins
🎯 Use Cases
- Generate full songs (lyrics, melody, accompaniment) from prompts
- Edit lyrics or music sections with repainting & remixing
- Clone voice style or adjust instrumentation with fine-tuning
- Build text-to-music pipelines for content creation, songwriting, game audio, and more
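Repainting a section means regenerating one time span while leaving the rest of the track intact. One common way to do this in a diffusion or flow-matching sampler is a mask constraint. After each denoising step, frames outside the target span are reset to the source latent, re-noised to the current noise level. The sketch below shows that generic technique; ACE-Step's own implementation may differ. It also simplifies in three ways:
- The latent is one scalar per frame.
- The latent frame rate is a hypothetical parameter.
- Noise is interpolated in flow-matching form, with t = 0 clean and t = 1 pure noise.

```python
import math
from typing import List

Latent = List[float]  # simplification: one scalar per latent frame


def repaint_mask(num_frames: int, frames_per_second: float, start_s: float, end_s: float) -> List[float]:
    """1.0 for frames covering [start_s, end_s) that should be regenerated, 0.0 elsewhere."""
    first = max(0, math.floor(start_s * frames_per_second))
    last = min(num_frames, math.ceil(end_s * frames_per_second))
    return [1.0 if first <= i < last else 0.0 for i in range(num_frames)]


def renoise(x0: Latent, noise: Latent, t: float) -> Latent:
    """Interpolate clean latent and noise, assuming t=0 is clean and t=1 is pure noise."""
    return [(1.0 - t) * a + t * n for a, n in zip(x0, noise)]


def apply_mask_constraint(x_t: Latent, source: Latent, noise: Latent,
                          mask: List[float], t: float) -> Latent:
    """Keep generated frames inside the mask; reset all other frames to the
    source latent re-noised to the sampler's current level t."""
    reference = renoise(source, noise, t)
    return [m * g + (1.0 - m) * r for m, g, r in zip(mask, x_t, reference)]
```

Inside a sampler loop, this would run after every step, for example `x = apply_mask_constraint(model_step(x, t), source, noise, mask, t_next)`, where `model_step` is a hypothetical denoiser call. Reusing the same `noise` that seeded generation keeps the preserved frames consistent across steps.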
ACE-Step brings high fidelity, fast generation, and fine-grained control together in an open-source foundation model, making it a strong base for further innovation in AI-powered music creation.