Update weights

Files changed (13)

.gitattributes +2 -0
README.md +97 -0
Z-Image-Turbo-Fun-Controlnet-Union.safetensors +3 -0
asset/canny.jpg +3 -0
asset/depth.jpg +3 -0
asset/hed.jpg +3 -0
asset/pose.jpg +3 -0
asset/pose2.jpg +3 -0
results/canny.png +3 -0
results/depth.png +3 -0
results/hed.png +3 -0
results/pose.png +3 -0
results/pose2.png +3 -0

.gitattributes CHANGED

@@ -1,3 +1,5 @@
+*.png filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,100 @@
 ---
 license: apache-2.0
 ---

+# Z-Image-Turbo-Fun-Controlnet-Union
+## Model Features
+- The ControlNet is applied to 6 blocks.
+- The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at a resolution of 1328 in BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
+- It supports multiple control conditions (Canny, HED, depth maps, pose estimation, and MLSD) and can be used like a standard ControlNet.
+- You can adjust `controlnet_conditioning_scale` and `control_guidance_end` for stronger control and better detail preservation. For better stability, we highly recommend using a detailed prompt. The optimal range for `controlnet_conditioning_scale` is 0.65 to 0.80.
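
As an illustration of the last point, here is a minimal sketch of a sanity check for the two parameters. The `ControlSettings` class, its defaults, and the assumption that `control_guidance_end` is a fraction of the denoising schedule between 0 and 1 are hypothetical and not taken from VideoX-Fun; only the 0.65 to 0.80 range comes from the list above.

```python
from dataclasses import dataclass
from typing import List

# Range quoted above for controlnet_conditioning_scale.
SCALE_MIN, SCALE_MAX = 0.65, 0.80


@dataclass(frozen=True)
class ControlSettings:
    """Hypothetical container for the two parameters named above."""

    # Assumed default, chosen only because it lies inside the quoted range.
    controlnet_conditioning_scale: float = 0.75
    # Assumption: a fraction of the denoising schedule, from 0.0 to 1.0.
    control_guidance_end: float = 1.0

    def notes(self) -> List[str]:
        """Return human-readable notes; an empty list means nothing to flag."""
        notes = []
        if not SCALE_MIN <= self.controlnet_conditioning_scale <= SCALE_MAX:
            notes.append(
                f"controlnet_conditioning_scale={self.controlnet_conditioning_scale} "
                f"is outside the quoted range {SCALE_MIN} to {SCALE_MAX}"
            )
        if not 0.0 <= self.control_guidance_end <= 1.0:
            notes.append(
                f"control_guidance_end={self.control_guidance_end} "
                "is outside the assumed 0.0 to 1.0 interval"
            )
        return notes


if __name__ == "__main__":
    for settings in (ControlSettings(), ControlSettings(1.2, 0.5)):
        print(settings, settings.notes() or "ok")
```

Values outside the quoted range are only flagged, not rejected, since the README describes the range as optimal rather than mandatory.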
+## TODO
+- [ ] Train on more data and for more steps.
+- [ ] Support inpaint mode.
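
To put the first TODO item in context, the training figures quoted under Model Features can be turned into a rough coverage estimate. This is a sketch that assumes the batch size of 64 is the global batch per optimizer step with no gradient accumulation, which the README does not state.

```python
# Figures quoted in the Model Features list.
TRAIN_STEPS = 10_000
BATCH_SIZE = 64          # assumed: global batch size per optimizer step
DATASET_SIZE = 1_000_000


def samples_seen(steps: int, batch_size: int) -> int:
    """Number of training samples consumed after `steps` optimizer steps."""
    return steps * batch_size


def epochs_covered(steps: int, batch_size: int, dataset_size: int) -> float:
    """Fraction of the dataset (in epochs) consumed during training."""
    return samples_seen(steps, batch_size) / dataset_size


if __name__ == "__main__":
    print(samples_seen(TRAIN_STEPS, BATCH_SIZE))                  # 640000
    print(epochs_covered(TRAIN_STEPS, BATCH_SIZE, DATASET_SIZE))  # 0.64
```

Under that assumption, the run consumed about 640,000 samples, which is less than one full pass over the 1 million images.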
+## Results
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Pose</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/pose2.jpg" width="100%" /></td>
+    <td><img src="results/pose2.png" width="100%" /></td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Pose</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/pose.jpg" width="100%" /></td>
+    <td><img src="results/pose.png" width="100%" /></td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Canny</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/canny.jpg" width="100%" /></td>
+    <td><img src="results/canny.png" width="100%" /></td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>HED</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/hed.jpg" width="100%" /></td>
+    <td><img src="results/hed.png" width="100%" /></td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+    <td>Depth</td>
+    <td>Output</td>
+  </tr>
+  <tr>
+    <td><img src="asset/depth.jpg" width="100%" /></td>
+    <td><img src="results/depth.png" width="100%" /></td>
+  </tr>
+</table>
+## Inference
+See the VideoX-Fun repository for more details.
+First, clone the VideoX-Fun repository and create the required directories:
+```sh
+# Clone the code
+git clone https://github.com/aigc-apps/VideoX-Fun.git
+# Enter VideoX-Fun's directory
+cd VideoX-Fun
+# Create model directories
+mkdir -p models/Diffusion_Transformer
+mkdir -p models/Personalized_Model
+```
+Then download the weights: place the Z-Image-Turbo base model in `models/Diffusion_Transformer` and the ControlNet weights in `models/Personalized_Model`.
+```
+📦 models/
+├── 📂 Diffusion_Transformer/
+│   └── 📂 Z-Image-Turbo/
+├── 📂 Personalized_Model/
+│   └── 📦 Z-Image-Turbo-Fun-Controlnet-Union.safetensors
+```
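
Before running inference, it may help to confirm that the layout above is in place. The following is a minimal sketch; the helper name `missing_paths` is hypothetical, and the check only tests that the two paths from the tree exist, not that the weights inside them are complete.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree above.
EXPECTED_DIR = Path("models/Diffusion_Transformer/Z-Image-Turbo")
EXPECTED_FILE = Path(
    "models/Personalized_Model/Z-Image-Turbo-Fun-Controlnet-Union.safetensors"
)


def missing_paths(repo_root: str = ".") -> List[str]:
    """Return the expected paths that are absent under `repo_root`."""
    root = Path(repo_root)
    missing = []
    if not (root / EXPECTED_DIR).is_dir():
        missing.append(EXPECTED_DIR.as_posix())
    if not (root / EXPECTED_FILE).is_file():
        missing.append(EXPECTED_FILE.as_posix())
    return missing


if __name__ == "__main__":
    # Run from the VideoX-Fun directory created by the clone step.
    problems = missing_paths(".")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Model layout looks complete.")
```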
+Then run the file `examples/z_image_fun/predict_t2i_control.py`.