# Local Anime Series Production Pipeline — Complete Setup Guide

> **Target hardware:** RTX 5070 Ti (16 GB VRAM) · Windows 11 · ComfyUI + Ollama + LM Studio already installed  
> **Goal:** Full episode pipeline — script → storyboard → character images → video → post-production — 100% local, zero cloud, zero paid models.

---

## 1. Recommended Stack Overview

| Stage | Tool | Purpose |
|-------|------|---------|
| Script Writing | Ollama + local LLM (already installed) | Generate episode scripts, scene breakdowns, narration |
| Character Design | ComfyUI + Animagine XL 3.1 or NoobAI XL | Create consistent character reference sheets at 1024×1024 |
| Storyboarding | ComfyUI + anime checkpoint + ControlNet | Generate scene-by-scene key frame images |
| Video Generation | ComfyUI + Wan 2.2 I2V A14B (GGUF Q4_K_S) | Animate storyboard frames into 3–10 s clips |
| Anime-style Motion | Wan 2.2 Animate (separate model) | Smooth line-art motion, consistent stylized output |
| Character Consistency | Stand-In LoRA (Kijai) + YOLO preprocessing | Lock character identity across all shots |
| Long Video Extension | FramePack F1 via ComfyUI-FramePackWrapper | Extend 5 s clips to 30–60 s with anti-drift technology |
| Speed Acceleration | LightX2V Lightning LoRA (4-step) | Reduce steps from 40 → 4–8 with minimal quality loss |
| Post-Production | ffmpeg + DaVinci Resolve (free tier) | Stitch clips, transitions, audio mix |

---

## 2. Hardware Reality Check — RTX 5070 Ti 16 GB

The 5070 Ti ([Hostrunway GPU guide](https://www.hostrunway.com/blog/ai-video-generation-2026-best-gpus-vram-guide-and-smart-setups-that-work/)) is rated at **1,406 AI TOPS (NVFP4)** with 896 GB/s memory bandwidth on 16 GB GDDR7. Here is what actually fits:

| Workload | VRAM Required | Fits on 5070 Ti? | Notes |
|----------|--------------|-----------------|-------|
| Animagine XL 3.1 (image gen) | ~6–8 GB | ✅ Comfortable | Full quality, SDXL base |
| Wan 2.2 I2V A14B at Q4_K_S GGUF | ~12 GB peak | ✅ Fits | Two models: ~8.2 GB high + 8.2 GB low, loaded sequentially |
| Wan 2.2 I2V A14B at FP16 | ~33 GB | ❌ OOM | Do not attempt without NVFP4 |
| Wan 2.2 TI2V 5B (compact) | ~8 GB | ✅ Easy | Good starting point |
| FramePack F1 | ~6 GB | ✅ Easy | Runs on 6 GB laptop GPUs |
| UMT5-XXL text encoder at Q5_K_M GGUF | ~4 GB | ✅ Easy | Use GGUF version only |

**NVFP4 acceleration:** The RTX 5070 Ti is a Blackwell-architecture card and supports ComfyUI's NVFP4 quantization, which can reduce VRAM usage by up to 60% and provide a ~2× speed boost vs FP8 ([ComfyUI NVFP4 blog post](https://blog.comfy.org/p/new-comfyui-optimizations-for-nvidia)). **Critical requirement:** you must be running PyTorch built with CUDA 13.0 (`cu130`). If your PyTorch build uses an older CUDA, NVFP4 models will run up to 2× *slower* — check this first.

**Recommended approach for Wan 2.2 14B:** Use GGUF Q4_K_S models from the [wangkanai/wan22-fp16-i2v-gguf](https://huggingface.co/wangkanai/wan22-fp16-i2v-gguf) repo. Both high and low noise experts are 8.2 GB each — they are loaded and unloaded sequentially during the two-stage sampling process, never simultaneously at full precision. Combined with the Q5_K_M UMT5 text encoder (~4 GB) from [city96/umt5-xxl-encoder-gguf](https://huggingface.co/city96/umt5-xxl-encoder-gguf), total VRAM pressure stays comfortably under 14 GB ([Latenode community guide](https://community.latenode.com/t/quick-video-ai-setup-for-budget-12gb-graphics-cards-wan-2-2-gguf-models-lightning-training-optimized-steps/33286)).

**Expected generation time on 5070 Ti:** A 5 s / 49-frame clip at 640p with 5 Lightning steps takes ~5 min on a 12 GB card ([Latenode community guide](https://community.latenode.com/t/quick-video-ai-setup-for-budget-12gb-graphics-cards-wan-2-2-gguf-models-lightning-training-optimized-steps/33286)). The 5070 Ti's extra 4 GB of VRAM and higher memory bandwidth should bring that down to roughly 3–4 min per clip.
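
The sequential-loading claim above can be sanity-checked with quick arithmetic (file sizes taken from the tables in this section; real peak usage adds activations and VAE decode overhead, so treat this as a lower bound, not a guarantee):

```python
# Resident-model VRAM budget for the Wan 2.2 Q4_K_S pipeline.
# Only one 14B expert is loaded at a time, so the peak is one expert
# plus the text encoder and VAE -- not both experts together.
components_gb = {
    "wan22 expert (Q4_K_S, one at a time)": 8.2,
    "umt5-xxl encoder (Q5_K_M)": 4.15,
    "wan2.2 VAE": 1.41,
}
peak_gb = sum(components_gb.values())
print(f"peak resident models: {peak_gb:.2f} GB")  # well under the card's 16 GB
```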

---

## 3. Step-by-Step Installation Guide

### 3a. ComfyUI — Clean Fresh Install

Since previous installs produced setup errors, **start completely fresh**. Remove or rename your old ComfyUI folder entirely. Do not try to patch it.

**Step 1: Download ComfyUI Portable for Windows**

Go to [github.com/comfyanonymous/ComfyUI](https://github.com/comfyanonymous/ComfyUI), scroll to Releases, and download the latest `ComfyUI_windows_portable_nvidia.7z` (or `.zip`). Extract to a drive with at least 200 GB free — **not** `C:\` — for example:

```
D:\ComfyUI\
```

The portable package bundles its own Python, pip, and CUDA libraries. You do not need to touch system Python.

**Step 2: First launch**

Double-click `run_nvidia_gpu.bat`. ComfyUI will start on `http://127.0.0.1:8188`. Confirm the browser opens and the default workflow loads with no errors.

**Step 3: Update ComfyUI immediately**

Run `update\update_comfyui.bat` (inside the portable folder). This pulls the latest ComfyUI code, which is required for NVFP4 support and Wan 2.2 native nodes.

**Step 4: Install ComfyUI Manager**

ComfyUI Manager is now bundled in current ComfyUI releases. If it is missing, install it manually:

```bat
cd D:\ComfyUI\ComfyUI\custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
```

Then from the portable root:
```bat
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-Manager\requirements.txt
```

Restart ComfyUI and refresh your browser. A **Manager** button appears in the top menu ([ComfyUI custom node install docs](https://docs.comfy.org/installation/install_custom_node)).

---

### 3b. Required Custom Nodes

Install all of the following via ComfyUI Manager → **Custom Nodes Manager** → search → Install. Restart ComfyUI after each batch.

| Node | Search Name in Manager | Purpose |
|------|----------------------|---------|
| ComfyUI-GGUF | `ComfyUI-GGUF` (by city96) | Loads `.gguf` model files into ComfyUI |
| ComfyUI-KJNodes | `ComfyUI-KJNodes` | Kijai's utility nodes — prerequisite for WanVideoWrapper |
| ComfyUI-WanVideoWrapper | `ComfyUI-WanVideoWrapper` | Full Wan 2.x video pipeline nodes (Stand-In, block swap, etc.) |
| ComfyUI-VideoHelperSuite | `ComfyUI-VideoHelperSuite` (VHS) | Video file I/O, frame encoding/decoding |
| ComfyUI-FramePackWrapper_Plus | `FramePack` | FramePack F1 long-video generation |
| ComfyUI-Impact-Pack | `ComfyUI-Impact-Pack` | YOLO face detection — needed for Stand-In V3 YOLO preprocessing |

For any node not found in Manager, install via Git URL (Manager → **Install via Git URL**).

**After installing each set of nodes:** Restart ComfyUI. If you see `import failed` errors in the console, navigate to that custom node's folder and run:

```bat
D:\ComfyUI\python_embeded\python.exe -m pip install -r requirements.txt
```

---

### 3c. Model Downloads — Exact Files and Folder Paths

Create all subfolders before downloading. They must exist exactly as shown:

```
D:\ComfyUI\ComfyUI\models\
├── checkpoints\
├── diffusion_models\
├── text_encoders\
├── vae\
├── loras\
└── clip_vision\
```

#### Anime Image Generation (Character Reference Sheets)

| File | Size | Destination | Source |
|------|------|-------------|--------|
| `animagine-xl-3.1.safetensors` | ~6.4 GB | `models\checkpoints\` | [cagliostrolab/animagine-xl-3.1 on HuggingFace](https://huggingface.co/cagliostrolab/animagine-xl-3.1) |
| `sdxl_vae.safetensors` | ~335 MB | `models\vae\` | Part of the Animagine XL release or use the SDXL base VAE |

Animagine XL 3.1 uses [Danbooru-style tags](https://huggingface.co/cagliostrolab/animagine-xl-3.1) as prompts (`1girl, long hair, blue eyes, school uniform, masterpiece, best quality`), not natural language. Use negative prompt: `lowres, bad anatomy, bad hands, text, error, cropped, worst quality, low quality, jpeg artifacts, watermark`.
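
For scripted batch generation, tag assembly can be automated. The helper below is a hypothetical convenience (not part of ComfyUI or the Animagine release) that joins Danbooru-style tags and appends the quality tags the model card recommends:

```python
# Hypothetical helper: build a comma-separated Danbooru tag prompt.
QUALITY_TAGS = ["masterpiece", "best quality"]

def build_prompt(subject_tags, quality_tags=QUALITY_TAGS):
    """Join tags into Animagine's expected format: subject tags first,
    quality tags last, duplicates dropped while preserving order."""
    seen, out = set(), []
    for tag in [*subject_tags, *quality_tags]:
        t = tag.strip().lower()
        if t and t not in seen:
            seen.add(t)
            out.append(t)
    return ", ".join(out)

print(build_prompt(["1girl", "long hair", "blue eyes", "school uniform"]))
# → 1girl, long hair, blue eyes, school uniform, masterpiece, best quality
```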

**NoobAI XL** is a strong alternative to Animagine XL 3.1, slightly better at complex poses. Download `NoobAI-XL-v1.1.safetensors` from [Hugging Face](https://huggingface.co/Laxhar/noobai-XL-1.1) into `models\checkpoints\`, and add `--bf16-unet` to the launch command in `run_nvidia_gpu.bat` for best results with NoobAI.

#### Wan 2.2 I2V A14B — GGUF Versions (Use These, Not FP16)

Source: [wangkanai/wan22-fp16-i2v-gguf](https://huggingface.co/wangkanai/wan22-fp16-i2v-gguf)

| File | Size | Destination |
|------|------|-------------|
| `wan22-i2v-a14b-high-q4-k-s.gguf` | 8.2 GB | `models\diffusion_models\` |
| `wan22-i2v-a14b-low-q4-k-s.gguf` | 8.2 GB | `models\diffusion_models\` |

These are the two MoE expert models. The "high" model handles early noisy timesteps (layout/composition), the "low" model handles late refinement timesteps (details). They are used sequentially, not simultaneously ([wangkanai HuggingFace repo](https://huggingface.co/wangkanai/wan22-fp16-i2v-gguf)).
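
The hand-off between experts can be sketched in a few lines. `plan_stages` below is illustrative only; in the actual workflow the switch point is set by the start/end step values on the two sampler nodes (4 and 4 in the Lightning configuration later in this guide):

```python
def plan_stages(total_steps: int, boundary: float = 0.5):
    """Assign each denoising step to one expert: 'high' handles the early
    noisy steps (layout/composition), 'low' the late steps (detail).
    The experts never run on the same step, which is why only one
    needs to be resident in VRAM at a time."""
    split = round(total_steps * boundary)
    return [("high", s) for s in range(split)] + \
           [("low", s) for s in range(split, total_steps)]

print(plan_stages(8))
```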

#### UMT5-XXL Text Encoder — GGUF Version

Source: [city96/umt5-xxl-encoder-gguf](https://huggingface.co/city96/umt5-xxl-encoder-gguf)

| File | Size | Destination |
|------|------|-------------|
| `umt5-xxl-encoder-Q5_K_M.gguf` | ~4.15 GB | `models\text_encoders\` |

**Do not use** the full FP16 or FP32 versions of UMT5-XXL (22 GB+) — they will OOM on 16 GB. The city96 GGUF repo recommends Q5_K_M or larger for best quality. If VRAM is tight, Q4_K_M works but may lose subtle prompt nuance.

The official ComfyUI Wan 2.2 repackaged repo also provides `umt5_xxl_fp8_e4m3fn_scaled.safetensors` at the [Comfy-Org/Wan_2.2_ComfyUI_Repackaged](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged) repo — this FP8 safetensors version is also viable (~11 GB) if you prefer it over GGUF ([ComfyUI official Wan 2.2 docs](https://docs.comfy.org/tutorials/video/wan/wan2_2)).

#### Wan 2.2 VAE

| File | Size | Destination | Source |
|------|------|-------------|--------|
| `wan2.2_vae.safetensors` | 1.41 GB | `models\vae\` | [Comfy-Org/Wan_2.2_ComfyUI_Repackaged](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/vae/wan2.2_vae.safetensors) |

#### LightX2V Lightning LoRAs — Speed Acceleration

Source: [Kijai/WanVideo_comfy — Wan22-Lightning folder](https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Wan22-Lightning)

Download the I2V versions (for image-to-video workflows):

| File | Size | Destination |
|------|------|-------------|
| `Wan2.2-Lightning_I2V-A14B-4steps-lora_HIGH_fp16.safetensors` | 614 MB | `models\loras\` |
| `Wan2.2-Lightning_I2V-A14B-4steps-lora_LOW_fp16.safetensors` | 614 MB | `models\loras\` |

If you also run T2V workflows:

| File | Size | Destination |
|------|------|-------------|
| `Wan2.2-Lightning_T2V-A14B-4steps-lora_HIGH_fp16.safetensors` | 614 MB | `models\loras\` |
| `Wan2.2-Lightning_T2V-A14B-4steps-lora_LOW_fp16.safetensors` | 614 MB | `models\loras\` |

These LoRAs cut generation from 40 steps to 4–8. Recommended settings: LoRA strength 0.6–0.8 on the HIGH model and 1.0 on the LOW model; CFG 2.0–3.5 (high) / 1.0 (low); 8 total steps (4 high + 4 low).

#### Stand-In LoRAs — Character Consistency

Source: [Kijai/WanVideo_comfy — LoRAs/Stand-In folder](https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Stand-In)

| File | Size | Destination |
|------|------|-------------|
| `Stand-In_wan2.2_T2V_A14B_HIGH_fp16.safetensors` | 315 MB | `models\loras\` |
| `Stand-In_wan2.2_T2V_A14B_LOW_fp16.safetensors` | 315 MB | `models\loras\` |

These are the Wan 2.2-specific Stand-In LoRA files. Use both simultaneously in the Stand-In workflow — one for the high-noise stage and one for the low-noise stage.

#### FramePack F1

FramePack models **auto-download** on first run via the ComfyUI-FramePackWrapper node. No manual download needed. Ensure you have at least 20 GB free on the drive where ComfyUI is installed. Alternatively, pre-download from [lllyasviel/FramePack on GitHub](https://github.com/lllyasviel/FramePack).

---

### 3d. Wan 2.2 — Anime-Specific: Wan 2.2 Animate

[Wan 2.2 Animate](https://aiartimind.com/wan-2-2-animate-the-ultimate-guide-to-features-installation-and-getting-started/) is a distinct sub-model within the Wan 2.2 family specifically built for:

- **Clean line art and smooth illustrated motion** — designed for stylized/anime aesthetics rather than photorealistic video
- **Holistic motion replication** — separates and injects skeleton-based body motion and facial expression signals independently via cross-attention, giving better control of anime character animation
- **Character swap/replacement** — the "Replacement" mode overlays a new character over reference video motion while preserving the original lighting via a Relighting LoRA
- **Long-clip temporal guidance** — segment chaining where subsequent clips condition on prior frames, critical for maintaining style across multi-clip episodes

**Download the Animate model:**

```bat
:: Requires a Python with pip on PATH (or use python_embeded\python.exe -m pip)
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-Animate-14B --local-dir D:\models\Wan2.2-Animate-14B
```

Then move the diffusion model files to `models\diffusion_models\` and the VAE to `models\vae\` per ComfyUI's expected structure. The Animate model integrates with the same ComfyUI-WanVideoWrapper node set.

Use this prompt template for anime motion with Wan 2.2 Animate ([FluxPro AI guide](https://fluxproweb.com/blog/detail/The-Ultimate-Guide-to-WAN-AI-Video-Generation-on-Flux-Pro-AI-From-Fast-Concepts-to-Cinematic-Realism-2bc0599b2853/)):

```
dynamic anime shot, smooth in-between frames, bold color lines, dramatic pose motion, [character description], [action], [setting]
```

---

### 3e. Load the Official Wan 2.2 Workflow

After models are in place ([ComfyUI official Wan 2.2 tutorial](https://docs.comfy.org/tutorials/video/wan/wan2_2)):

1. In ComfyUI, go to **Workflow → Browse Templates → Video**
2. Select **"Wan2.2 14B I2V"** for image-to-video
3. In the `Load Diffusion Model` (high noise) node: select `wan22-i2v-a14b-high-q4-k-s.gguf`
4. In the `Load Diffusion Model` (low noise) node: select `wan22-i2v-a14b-low-q4-k-s.gguf`
5. In the `Load CLIP` node: select `umt5-xxl-encoder-Q5_K_M.gguf`, type set to **WAN**
6. In the `Load VAE` node: select `wan2.2_vae.safetensors`
7. In the `Load Image` node: upload your storyboard frame
8. Run with `Ctrl+Enter`

---

## 4. The Full Episode Production Workflow

### Phase 1 — Script & Scene Breakdown

Use Ollama with a capable local LLM (Qwen 2.5 32B, Mistral, or LLaMA 3.1 70B if your system RAM allows it). Prompt structure:

```
You are an anime screenwriter. Write a 5-minute anime episode script for [series name].
Episode: [number and title]
Genre: [action/romance/isekai/etc]
Characters: [list character names and brief descriptions]

Format output as:
- Episode synopsis (3 sentences)
- Scene list (numbered 1–12, each with: scene title, location, characters present, action, camera angle suggestion, mood)
- Full script with dialogue
```

The scene list becomes your production queue. Each scene = one or more video clips. Target 10–15 scenes per episode for a manageable workflow.

**Scene metadata template** (save each as a JSON or text file):

```json
{
  "scene_id": "S01E01_003",
  "description": "Hana steps through the portal into the spirit world",
  "setting": "forest clearing, moonlight, mist",
  "characters": ["Hana"],
  "action": "slow walk forward, looking up in awe",
  "camera": "low angle, slight upward tilt",
  "mood": "wonder, slight fear",
  "duration_s": 5,
  "prompt_positive": "anime girl, long dark hair, school uniform, stepping through glowing portal, forest, moonlight, mist, looking up, awe expression, dynamic anime shot, masterpiece",
  "prompt_negative": "lowres, bad anatomy, extra limbs, watermark"
}
```
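
A small loader (a hypothetical helper, not part of any tool above) can validate each scene file before it enters the render queue, so a malformed scene is caught before an expensive generation rather than after:

```python
import json

# Keys every scene file must carry before it is worth rendering.
REQUIRED = {"scene_id", "description", "duration_s",
            "prompt_positive", "prompt_negative"}

def load_scene(raw: str) -> dict:
    """Parse one scene JSON and fail loudly on missing keys."""
    scene = json.loads(raw)
    missing = REQUIRED - scene.keys()
    if missing:
        raise ValueError(
            f"scene {scene.get('scene_id', '?')} missing {sorted(missing)}")
    return scene

scene = load_scene('{"scene_id": "S01E01_003", "description": "portal", '
                   '"duration_s": 5, "prompt_positive": "anime girl", '
                   '"prompt_negative": "lowres"}')
print(scene["scene_id"])
```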

---

### Phase 2 — Character Reference Sheets

Use ComfyUI with Animagine XL 3.1. For each character, generate a reference sheet:

**Fixed parameters per character (never change these):**
- Seed: pick one and lock it permanently per character
- Model: `animagine-xl-3.1.safetensors`
- Resolution: 1024×1024 (square) or 832×1216 (portrait)
- CFG: 7.0
- Steps: 28–35
- Sampler: DPM++ 2M Karras

**Prompt template:**

```
[character name] character reference sheet, front view, side view, back view, [hair color] [hair style], [eye color], [outfit description], white background, full body, anime style, masterpiece, best quality, character design sheet
```

Generate at minimum:
1. Front face close-up (reference for Stand-In YOLO cropping)
2. Full-body front view
3. Full-body side view (optional: 3/4 view)

Save all reference images to a dedicated folder: `D:\AnimeProject\[CharacterName]\references\`

---

### Phase 3 — Train Character LoRAs (Optional but Recommended for Multi-Scene Consistency)

Training a custom LoRA for each main character gives you identity control across all scenes, not just clips where that character's reference image can be used as a direct input. Use [AI Toolkit by ostris](https://github.com/ostris/ai-toolkit) ([LoRA training tutorial](https://www.youtube.com/watch?v=HlXmji1O_bI)):

```bash
git clone https://github.com/ostris/ai-toolkit
cd ai-toolkit
pip install -r requirements.txt
```

**Dataset preparation:**
- 10–20 images per character, diverse angles and expressions
- Consistent Danbooru-style captions for each image
- Include: front face, 3/4 face, profile, smiling, neutral, action pose
- Resolution: 1024×1024 (crop and resize from your reference sheet outputs)

**Training config** (save as `character_lora.yaml`):

```yaml
job: extension
config:
  name: hana_character_lora
  process:
    - type: 'sd_trainer'
      training_folder: "D:/lora_training/hana"
      device: cuda:0
      trigger_word: "hana_char"
      network:
        type: "lora"
        linear: 64
        linear_alpha: 64
      train:
        batch_size: 1
        steps: 1500
        gradient_accumulation_steps: 1
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true
        noise_scheduler: "flowmatch"
        lr: 1e-4
      model:
        # note: LoRA trainers generally load a full-precision checkpoint;
        # if the GGUF fails to load here, substitute the FP16 safetensors path
        name_or_path: "D:/ComfyUI/ComfyUI/models/diffusion_models/wan22-i2v-a14b-high-q4-k-s.gguf"
        arch: "wan"
```

Training takes approximately 1–2 hours on the RTX 5070 Ti. Output: a `.safetensors` LoRA file → place in `models\loras\`.

---

### Phase 4 — Storyboard Generation

For each scene from your scene list:

1. Open ComfyUI with Animagine XL 3.1
2. Use the scene's `prompt_positive` + `prompt_negative` from your metadata JSON
3. Apply the character's LoRA (if trained) at strength 0.7–0.9
4. Use the character's reference image as a ControlNet input (OpenPose or IP-Adapter) for pose guidance
5. Derive the seed deterministically from the scene ID — use a stable hash such as `zlib.crc32(b"S01E01_003") % 999999` (Python's built-in `hash()` is randomized per interpreter run and will not reproduce)
6. Generate 3–5 variations, pick the best

Output: one key frame image per scene → save as `D:\AnimeProject\storyboard\S01E01_003.png`
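
Step 5's seed derivation deserves care: Python's built-in `hash()` is salted per interpreter run, so it will not give the same seed twice. A stable alternative using CRC32 (a sketch; any deterministic hash works):

```python
import zlib

def scene_seed(scene_id: str) -> int:
    """Derive a reproducible seed from a scene ID. zlib.crc32 always
    returns the same value for the same bytes, on every machine and
    every run, unlike the salted built-in hash()."""
    return zlib.crc32(scene_id.encode("utf-8")) % 999_999

print(scene_seed("S01E01_003"))
```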

---

### Phase 5 — Video Generation with Wan 2.2

This is the core pipeline. Load your storyboard frame, animate it with Wan 2.2 I2V.

**Basic I2V workflow in ComfyUI (Kijai WanVideoWrapper nodes):**

```
LoadImage → WanVideoEncoder →→ WanVideoSampler (high noise, 4 steps, Lightning LoRA HIGH)
                                      ↓
UMT5Encoder (GGUF) ───────→  WanVideoSampler (low noise, 4 steps, Lightning LoRA LOW)
                                      ↓
                               WanVideoVAEDecode → VideoHelperSuite Save
```

**Resolution and frame count:**

| Output | Width | Height | Frames | Duration at 16fps |
|--------|-------|--------|--------|-------------------|
| 480p landscape | 832 | 480 | 49 | ~3 s |
| 640p landscape | 960 | 544 | 81 | ~5 s |
| 720p landscape | 1280 | 720 | 81 | ~5 s |

Start at 640p (960×544). 720p is possible on the 5070 Ti using NVFP4 or block swapping.

**Lightning LoRA settings** ([Wan 2.2 + LightX2V workflow guide](https://www.runcomfy.com/comfyui-workflows/wan-2-2-lightx2v-v2-comfyui-workflow-ultra-fast-text-to-video)):

```
HIGH noise sampler:
  - Steps: 4
  - CFG: 2.0–3.5
  - Sampler: euler
  - Lightning LoRA: Wan2.2-Lightning_I2V-A14B-4steps-lora_HIGH_fp16.safetensors (strength: 0.7)

LOW noise sampler:
  - Steps: 4
  - CFG: 1.0
  - Sampler: lcm
  - Lightning LoRA: Wan2.2-Lightning_I2V-A14B-4steps-lora_LOW_fp16.safetensors (strength: 1.0)
  - Add noise: DISABLED
  - Return with leftover noise: DISABLED
  - Start at step: 4, End at step: 8
```

**Enabling block swapping (if hitting OOM):**

Add a `WanVideoSetBlockSwap` node connected to your model loader ([WanVideoBlockSwap documentation](https://comfyai.run/documentation/WanVideoBlockSwap)):

```
blocks_to_swap: 0        # Start here
offload_img_emb: False   # Set True if still OOM
offload_txt_emb: False   # Set True if still OOM
use_non_blocking: True
```

If you get OOM, increase `blocks_to_swap` by 5 until stable. Maximum is 40. Each swap costs generation speed — keep it as low as possible. On the 5070 Ti with Q4_K_S GGUF, you may not need any block swapping at all at 640p.
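
That escalation rule can be written down explicitly. `try_render` here is a placeholder for "set `blocks_to_swap` in the node and queue a generation" — in practice you do this by hand in the UI; the code just encodes "start at 0, add 5 until stable":

```python
def find_block_swap(try_render, max_blocks: int = 40, step: int = 5):
    """Return the lowest blocks_to_swap value that renders without OOM.
    Lower is faster, since every swapped block round-trips to system RAM."""
    blocks = 0
    while blocks <= max_blocks:
        if try_render(blocks):   # True = generation completed without OOM
            return blocks
        blocks += step
    raise RuntimeError("OOM even with maximum block swapping")

# Simulated card that OOMs unless at least 10 blocks are swapped:
print(find_block_swap(lambda b: b >= 10))  # → 10
```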

**Stand-In V3 for Character Consistency** ([Stand-In V3 tutorial](https://www.youtube.com/watch?v=28yCZqFMsjY)):

The key insight from the optimized V3 workflow: **crop and isolate just the face from the reference image** before feeding it to Stand-In, regardless of whether your reference is a close-up or full-body shot. This prevents consistency degradation in wide shots and full-body scenes.

Workflow additions:
1. Add **Stand-In preprocessor (YOLO)** node — takes your character reference image, automatically detects and crops the face region
2. Feed the cropped face into Stand-In's reference input
3. Load both `Stand-In_wan2.2_T2V_A14B_HIGH_fp16.safetensors` and `Stand-In_wan2.2_T2V_A14B_LOW_fp16.safetensors`
4. Set Stand-In weight to 3.0 initially, tune down if the character looks too "locked"

Model download for Stand-In: [Kijai/WanVideo_comfy — LoRAs/Stand-In](https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Stand-In)

---

### Phase 6 — Extending Clips with FramePack F1

For scenes longer than 10 seconds, use FramePack F1 instead of generating one long Wan 2.2 clip (which degrades quality). FramePack uses [anti-drifting sampling technology](https://github.com/lllyasviel/FramePack/discussions/459) that prevents error accumulation across frames.

**FramePack F1 approach for long sequences** ([FramePack F1 ComfyUI tutorial](https://www.youtube.com/watch?v=maspxhFdyyo)):

1. Generate your first 5 s clip with Wan 2.2 I2V
2. Extract the **last frame** of that clip (VHS node → Last Frame output)
3. Feed that last frame as the **new reference image** into FramePack F1
4. FramePack F1 generates the next segment conditioned on this frame
5. Chain as many segments as needed
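
If you script the chaining outside ComfyUI, ffmpeg can handle the last-frame extraction as an alternative to the VHS node. `-sseof -0.05` seeks 50 ms before end-of-file, `-frames:v 1` keeps a single frame, and `-update 1` writes one image file:

```python
import subprocess

def last_frame_cmd(clip: str, out_png: str) -> list[str]:
    """Build the ffmpeg command that grabs the final frame of a clip."""
    return ["ffmpeg", "-y", "-sseof", "-0.05", "-i", clip,
            "-frames:v", "1", "-update", "1", out_png]

def extract_last_frame(clip: str, out_png: str) -> None:
    # check=True raises if ffmpeg exits non-zero (e.g. unreadable clip)
    subprocess.run(last_frame_cmd(clip, out_png), check=True)

# Example: extract_last_frame("S01E01_003.mp4", "S01E01_003_last.png")
```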

**FramePack F1 key nodes:**
- `FramePackSamplerF1` — the F1 sampler (forward-only, no bi-directional constraints, more dynamic motion)
- `FramePackEncode` — prompt-per-timestamp node (write different prompts for different time points, e.g., "at 0s: standing" / "at 5s: running")
- VAE Decode Tiling — enable for clips longer than 10 s

**FramePack parameters:**
- Steps: 25–30 (do not reduce for FramePack — it is more sensitive to step count than Wan 2.2)
- CFG: 10.0 (default, do not increase)
- Anti-drift sampler type: `fixed` (leave default)
- VRAM required: ~6 GB — the 5070 Ti handles this easily ([RunComfy FramePack guide](https://www.runcomfy.com/comfyui-workflows/framepack-wrapper-for-comfyui-long-video-generation-with-low-memory))

---

### Phase 7 — Post-Production

**Concatenate all clips with ffmpeg:**

```bat
:: Create a file list (run from an interactive cmd prompt; inside a .bat file, write %%f)
(for %f in (D:\AnimeProject\clips\*.mp4) do @echo file '%f') > D:\AnimeProject\filelist.txt

:: Concatenate without re-encoding
ffmpeg -f concat -safe 0 -i D:\AnimeProject\filelist.txt -c copy D:\AnimeProject\episode01_raw.mp4
```

**For clips with different resolutions or codecs, re-encode:**

```bash
ffmpeg -f concat -safe 0 -i filelist.txt -vf "scale=1280:720,fps=24" -c:v libx264 -crf 18 -preset slow episode01_raw.mp4
```
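
The `for` one-liner above lists clips in whatever order the filesystem returns them. For guaranteed scene order, generate the concat list with a short script that sorts filenames — this works because scene IDs like `S01E01_003` sort lexicographically:

```python
from pathlib import Path

def write_concat_list(clip_dir: str, list_path: str) -> int:
    """Write an ffmpeg concat-demuxer file list in sorted filename order.
    Returns the number of clips listed."""
    clips = sorted(Path(clip_dir).glob("*.mp4"))
    lines = [f"file '{c.as_posix()}'" for c in clips]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(clips)

# Example:
# write_concat_list(r"D:\AnimeProject\clips", r"D:\AnimeProject\filelist.txt")
```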

**DaVinci Resolve (free tier):**

1. Import `episode01_raw.mp4`
2. Add transitions, color grading, scene timing adjustments
3. Import or generate audio: background music (Suno, etc.) + voice narration
4. For voice: generate a narration script with Ollama → feed to a local TTS system (Kokoro TTS, StyleTTS2, or Piper)
5. Export at 1280×720 H.264 for delivery

---

## 5. Common Setup Errors and Fixes

### Error 1: CUDA Out of Memory (OOM) during Wan 2.2 generation

**Cause:** Using FP16 or FP8 safetensors models instead of GGUF, or attempting 720p without VRAM optimization.

**Fix:**
1. Confirm you are using `wan22-i2v-a14b-high-q4-k-s.gguf` and `wan22-i2v-a14b-low-q4-k-s.gguf` — NOT the FP16 safetensors files
2. Confirm text encoder is `umt5-xxl-encoder-Q5_K_M.gguf` — NOT the full 22 GB FP32 version
3. In the VAE decode step, enable **VAE tiling** (split decode by tiles) if decoding 720p
4. Add `WanVideoSetBlockSwap` node, start `blocks_to_swap` at 0, increase by 5 until stable
5. Enable `offload_img_emb: True` and `offload_txt_emb: True` if still OOM

([Latenode GGUF optimization guide](https://community.latenode.com/t/quick-video-ai-setup-for-budget-12gb-graphics-cards-wan-2-2-gguf-models-lightning-training-optimized-steps/33286))

---

### Error 2: "Model not found" or node can't find model files

**Cause:** ComfyUI expects models in specific subfolders. The most common mistake: putting diffusion model files in `models\unet\` instead of `models\diffusion_models\`.

**Fix:** Verify the exact paths:

```
models\diffusion_models\wan22-i2v-a14b-high-q4-k-s.gguf  ✅
models\unet\wan22-i2v-a14b-high-q4-k-s.gguf               ❌ Wrong folder
```

Create the folders manually if they do not exist — ComfyUI will not create them automatically. After adding files, click **Refresh** in the ComfyUI model dropdown; you should not need a full restart.
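
A quick preflight script (a convenience, not part of ComfyUI) can verify that the exact filenames from this guide sit in the folders ComfyUI scans:

```python
from pathlib import Path

# The files and subfolders this guide's download tables specify.
EXPECTED = {
    "diffusion_models": ["wan22-i2v-a14b-high-q4-k-s.gguf",
                         "wan22-i2v-a14b-low-q4-k-s.gguf"],
    "text_encoders": ["umt5-xxl-encoder-Q5_K_M.gguf"],
    "vae": ["wan2.2_vae.safetensors"],
}

def missing_models(models_root: str) -> list[str]:
    """Return relative paths of any expected model file that is absent."""
    root = Path(models_root)
    return [f"{sub}/{name}"
            for sub, names in EXPECTED.items()
            for name in names
            if not (root / sub / name).exists()]

# Example: print(missing_models(r"D:\ComfyUI\ComfyUI\models"))
```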

---

### Error 3: Custom nodes show "import failed" in console

**Cause:** Python dependencies for the custom node are not installed, or there is a version conflict between packages.

**Fix:**
1. Read the exact error in the console — it names the missing module
2. Navigate to the failing node's folder: `ComfyUI\custom_nodes\[NodeName]\`
3. Run: `D:\ComfyUI\python_embeded\python.exe -m pip install -r requirements.txt`
4. If you see a version conflict, try: `pip install [package] --upgrade`
5. Delete `__pycache__` folders inside the failing node directory and restart ComfyUI
6. If the conflict persists, uninstall the conflicting package and reinstall the node from scratch via Manager

---

### Error 4: CLIP/text encoder errors — "unexpected keyword argument" or dimension mismatch

**Cause:** Using the wrong text encoder type. Wan 2.2 uses UMT5-XXL, not CLIP-L or OpenCLIP.

**Fix:** In the `Load CLIP` (or `WanVideo T5 TextEncoder`) node, explicitly set the **type** dropdown to `WAN` (or `WAN2`). Make sure you are loading `umt5-xxl-encoder-Q5_K_M.gguf`, not any SDXL CLIP encoder. They are incompatible ([ComfyUI Wan 2.2 official docs](https://docs.comfy.org/tutorials/video/wan/wan2_2)).

---

### Error 5: Slow generation — 20+ minutes per clip

**Cause:** Not using Lightning LoRAs, too many steps, or block swapping enabled when unnecessary.

**Fix:**
1. Add LightX2V Lightning LoRAs for both high and low samplers (see model download table above)
2. Reduce total steps to 8 (4 high + 4 low)
3. If `blocks_to_swap > 0` and you have enough VRAM, set it back to 0 — every swap to CPU costs significant time
4. Check that PyTorch is using `cu130` CUDA for NVFP4 support: in ComfyUI console, look for CUDA version on startup

Expected times on 5070 Ti with Lightning LoRA at 8 steps, 640p, 49 frames: **~2–3 minutes** per clip.

---

### Error 6: Character changes appearance across clips (identity drift)

**Cause:** Without Stand-In or a character LoRA, Wan 2.2 reinterprets the character in each generation.

**Fix (tiered approach):**

1. **Minimal fix:** Use the same character reference image and seed in every clip
2. **Better fix:** Use Stand-In V3 with YOLO preprocessing — crop just the face from the reference before feeding Stand-In. The face region size in the reference image is the dominant factor; a full-body reference image without cropping degrades identity on wide shots ([Stand-In V3 optimized workflow](https://www.youtube.com/watch?v=28yCZqFMsjY))
3. **Best fix:** Combine a trained character LoRA (from Phase 3) + Stand-In + consistent reference image. Apply the character LoRA to both the high and low noise samplers at strength 0.6–0.8

---

### Error 7: ComfyUI freezes or crashes on startup after installing new nodes

**Fix:**
1. Open the console/terminal (not the browser) — the error is logged there
2. Identify the failing node from the `import failed: [NodeName]` message
3. Disable that node in Manager → **Disable** the node, restart ComfyUI
4. Fix that node's dependencies before re-enabling
5. Never install more than 3–4 new nodes at once — install, test, then continue

---

## 6. Key Resources & Workflows

### Official Documentation

| Resource | URL |
|----------|-----|
| Wan 2.2 ComfyUI official workflows | [docs.comfy.org/tutorials/video/wan/wan2_2](https://docs.comfy.org/tutorials/video/wan/wan2_2) |
| ComfyUI custom node installation | [docs.comfy.org/installation/install_custom_node](https://docs.comfy.org/installation/install_custom_node) |
| ComfyUI NVFP4 optimization guide | [blog.comfy.org/p/new-comfyui-optimizations-for-nvidia](https://blog.comfy.org/p/new-comfyui-optimizations-for-nvidia) |
| FramePack GitHub (lllyasviel) | [github.com/lllyasviel/FramePack](https://github.com/lllyasviel/FramePack) |
| Kijai WanVideo models repo | [huggingface.co/Kijai/WanVideo_comfy](https://huggingface.co/Kijai/WanVideo_comfy/tree/main) |

### Model Downloads

| Model | URL |
|-------|-----|
| Wan 2.2 I2V GGUF (wangkanai) | [huggingface.co/wangkanai/wan22-fp16-i2v-gguf](https://huggingface.co/wangkanai/wan22-fp16-i2v-gguf) |
| UMT5-XXL GGUF text encoder (city96) | [huggingface.co/city96/umt5-xxl-encoder-gguf](https://huggingface.co/city96/umt5-xxl-encoder-gguf) |
| Wan 2.2 VAE (Comfy-Org repack) | [huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged) |
| Stand-In LoRA files (Kijai) | [huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Stand-In](https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Stand-In) |
| Lightning LoRA (LightX2V via Kijai) | [huggingface.co/Kijai/WanVideo_comfy/tree/main/Wan22-Lightning](https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Wan22-Lightning) |
| Animagine XL 3.1 | [huggingface.co/cagliostrolab/animagine-xl-3.1](https://huggingface.co/cagliostrolab/animagine-xl-3.1) |

### Video Tutorials

| Tutorial | URL |
|----------|-----|
| Stand-In V3 character consistency (optimized workflow) | [youtube.com/watch?v=28yCZqFMsjY](https://www.youtube.com/watch?v=28yCZqFMsjY) |
| FramePack F1 long video guide | [youtube.com/watch?v=maspxhFdyyo](https://www.youtube.com/watch?v=maspxhFdyyo) |
| Character LoRA training for Wan 2.2 | [youtube.com/watch?v=HlXmji1O_bI](https://www.youtube.com/watch?v=HlXmji1O_bI) |
| Wan 2.2 Lightning LoRA speed tutorial | [youtube.com/watch?v=01e__jFQlTw](https://www.youtube.com/watch?v=01e__jFQlTw) |
| Wan 2.2 Animate character swap | [youtube.com/watch?v=woCP1Q_Htwo](https://www.youtube.com/watch?v=woCP1Q_Htwo) |
| BindWeave character consistency (alternative) | [youtube.com/watch?v=QsG0CRAxIHw](https://www.youtube.com/watch?v=QsG0CRAxIHw) |
| Wan 2.2 GGUF beginner-friendly workflow | [youtube.com/watch?v=9rDbnyycxU0](https://www.youtube.com/watch?v=9rDbnyycxU0) |

---

## 7. Recommended Production Order for Your First Episode

1. **Install ComfyUI clean** (Section 3a) → verify default workflow runs
2. **Install custom nodes** (Section 3b) → verify no import errors
3. **Download Wan 2.2 GGUF files** (Section 3c) → run the official Wan 2.2 5B TI2V template first (smaller model, faster to validate your setup)
4. **Generate one test clip** with the 5B TI2V model before touching the 14B model — confirm the pipeline works end-to-end
5. **Download 14B GGUF + UMT5 Q5_K_M** → run the I2V 14B template with one storyboard image
6. **Add Lightning LoRAs** → confirm generation time drops to ~3 min
7. **Add Stand-In** → confirm character identity locks across multiple clips
8. **Begin character reference sheet production** with Animagine XL
9. **Write scripts with Ollama**, generate storyboard frames, run the full pipeline scene by scene
10. **Post-produce** with ffmpeg + DaVinci Resolve
