# Usage Guide - WAN 2.2 Image-to-Video LoRA Demo

## Quick Start

### 1. Deploying to Hugging Face Spaces

To deploy this demo to Hugging Face Spaces:

```bash
# Install git-lfs if not already installed
git lfs install

# Create a new Space on huggingface.co,
# then clone your Space repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME

# Copy all files from this demo into the Space checkout
# (adjust the demo path for your setup)
cp -r /path/to/this-demo/* .

# Commit and push
git add .
git commit -m "Initial commit: WAN 2.2 Image-to-Video LoRA Demo"
git push
```

### 2. Running Locally

```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py
```

The app will be available at `http://localhost:7860`.

## Using the Demo

### Basic Usage

1. **Upload Image**: Click the image upload area and select an image file
2. **Enter Prompt**: Type a description of the motion you want (e.g., "A person walking forward, cinematic")
3. **Click Generate**: Wait for the video to be generated (first run will download the model)
4. **View Result**: The generated video will appear in the output area

### Advanced Settings

Expand the "Advanced Settings" accordion to access the controls below (a sketch of how they map to the generation call follows the list):

- **Inference Steps** (20-100): More steps = higher quality but slower generation
  - 20-30: Fast, lower quality
  - 50: Balanced (recommended)
  - 80-100: Slow, highest quality

- **Guidance Scale** (1.0-15.0): How closely to follow the prompt
  - 1.0-3.0: More creative, less faithful to prompt
  - 6.0: Balanced (recommended)
  - 10.0-15.0: Very faithful to prompt, less creative

- **Use LoRA**: Enable/disable LoRA fine-tuning

- **LoRA Type**:
  - **High-Noise**: Best for dynamic, action-heavy scenes
  - **Low-Noise**: Best for subtle, smooth motions
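
For reference, here is a minimal sketch of how these settings might map onto a diffusers image-to-video call. The model ID, default values, and output path are assumptions based on the descriptions above, not the exact code in `app.py`:

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Hypothetical values mirroring the recommended defaults above
num_inference_steps = 50  # 20-100: more steps = higher quality, slower
guidance_scale = 6.0      # 1.0-15.0: higher = more faithful to the prompt

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V",  # placeholder model ID
    torch_dtype=torch.bfloat16,
).to("cuda")

image = load_image("path/to/image.jpg")
video = pipe(
    image=image,
    prompt="A person walking forward, cinematic",
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
).frames[0]

export_to_video(video, "output.mp4", fps=8)
```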

## Example Prompts

### Good Prompts

- "A cat walking through a garden, sunny day, high quality"
- "Waves crashing on a beach, sunset lighting, cinematic"
- "A car driving down a highway, fast motion, 4k"
- "Smoke rising from a campfire, slow motion"

### Tips for Better Results

1. **Be Specific**: Include details about motion, lighting, and quality
2. **Use Keywords**: "cinematic", "high quality", "4k", "smooth"
3. **Describe Motion**: Clearly state what should move and how
4. **Consider Style**: Add style descriptors like "photorealistic" or "animated"

## Troubleshooting

### Out of Memory Error

If you encounter OOM errors:

1. The model requires significant VRAM (16GB+ recommended)
2. On Hugging Face Spaces, ensure you're using at least `gpu-medium` hardware
3. For local runs, try reducing the number of frames or using CPU offloading (see the sketch below)
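
For point 3, diffusers exposes offloading and tiling helpers that trade speed for peak VRAM. A minimal sketch, continuing from the pipeline loaded under "Advanced Settings":

```python
# Move model components to the GPU only while they are in use
# (do not also call pipe.to("cuda") when offloading is enabled)
pipe.enable_model_cpu_offload()

# Decode video latents tile by tile instead of all at once
pipe.vae.enable_tiling()

# Fewer frames also lowers peak memory (hypothetical value; the default is model-specific)
video = pipe(image=image, prompt=prompt, num_frames=25).frames[0]
```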

### Slow Generation

- First generation will be slower (model downloads)
- Reduce inference steps for faster results
- Ensure GPU is being used (check logs for "Loading model on cuda", or run the quick check below)
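
A quick way to confirm PyTorch can see the GPU, independent of this demo's code:

```python
import torch

print(torch.cuda.is_available())          # should print True on a working GPU setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA A10G"
```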

### Model Not Loading

If the model fails to load:

1. Check your internet connection (model is ~20GB)
2. Ensure sufficient disk space
3. For Hugging Face Spaces, check your Space's logs

## Customization

### Using Your Own LoRA Files

To use your own LoRA weights:

1. Upload LoRA `.safetensors` files to Hugging Face
2. Update the URLs in `app.py`:

```python
HIGH_NOISE_LORA_URL = "https://huggingface.co/YOUR_USERNAME/YOUR_REPO/resolve/main/your_lora.safetensors"
```

3. Uncomment and implement the LoRA loading code in the `generate_video` function (a possible implementation is sketched below)
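
One possible implementation uses diffusers' built-in LoRA loading. The repo ID, weight file name, and adapter label below are placeholders, and the exact hook point inside `generate_video` depends on your `app.py`:

```python
# Inside generate_video, after the pipeline is loaded:
pipe.load_lora_weights(
    "YOUR_USERNAME/YOUR_REPO",            # Hub repo holding the .safetensors file
    weight_name="your_lora.safetensors",  # file name within that repo
    adapter_name="high_noise",            # hypothetical adapter label
)
pipe.set_adapters(["high_noise"], adapter_weights=[1.0])  # optionally scale the LoRA's influence
```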

### Changing the Model

To use a different model:

1. Update `MODEL_ID` in `app.py`
2. Ensure the model is compatible with `CogVideoXImageToVideoPipeline` (a quick check is sketched below)
3. Adjust memory optimizations if needed
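
As a rough compatibility check, `from_pretrained` fails fast when a checkpoint does not match the pipeline's expected components. A sketch, with a placeholder `MODEL_ID`:

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline

MODEL_ID = "THUDM/CogVideoX-5b-I2V"  # placeholder: your replacement checkpoint

try:
    pipe = CogVideoXImageToVideoPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
except (EnvironmentError, ValueError) as err:
    raise SystemExit(f"{MODEL_ID} is not compatible with CogVideoXImageToVideoPipeline: {err}")
```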

## Performance Notes

- **GPU (A10G/T4)**: ~2-3 minutes per video
- **GPU (A100)**: ~1-2 minutes per video
- **CPU**: Not recommended (20+ minutes)

## API Access

For programmatic access, you can use the Gradio Client:

```python
from gradio_client import Client, handle_file

client = Client("YOUR_USERNAME/YOUR_SPACE_NAME")
result = client.predict(
    # Recent gradio_client versions require wrapping file inputs with handle_file()
    image=handle_file("path/to/image.jpg"),
    prompt="A cat walking",
    api_name="/predict",
)
print(result)  # typically the path to the generated video file
```

## Credits

- Model: CogVideoX by THUDM
- Framework: Hugging Face Diffusers
- Interface: Gradio

## License

Apache 2.0 - See LICENSE file for details