How to use Hunyuan1.5 for video upscaling

by makisekurisu-jp

How should the Hunyuan1.5 super-resolution model be applied for video upscaling? In my current workflow, I input a 640×640 resolution video, but the resulting output is a 1920×1072 black screen.

workflow

I’m using the model from the link below because the official model provided by Comfy is too large for me.
https://huggingface.co/lightx2v/Hy1.5-Quantized-Models/blob/main/hy15_1080p_sr_cfg_distiled_fp8_e4m3_lightx2v.safetensors

Might be OK, not sure. I'll give it a test too. But the other non-Comfy models do produce black video, so it's likely this model is also for LightX2V's own codebase, and not Comfy.
Maybe the Comfy repository will get some fp8 versions too.

See here also https://huggingface.co/lightx2v/Hy1.5-Quantized-Models/discussions/2#6921bbee921df8e6e4f81d4f

My intended test is to apply the 1080p super-resolution model from hy1.5 video to upscale the original video and then compare its performance with FlashVSR’s current super-resolution upscaling. It is possible that the 1080p upscaling model from lightx2v was not adapted for ComfyUI and is only compatible with their proprietary code.

Yes exactly. The original models posted were not compatible with ComfyUI.
They have since added some that are, but the upscale models have probably not yet been made compatible.
https://huggingface.co/lightx2v/Hy1.5-Quantized-Models/tree/main (if you look at the file names, it says ComfyUI in the ones that are compatible)
And ComfyUI also made compatible ones here: https://huggingface.co/Comfy-Org/HunyuanVideo_1.5_repackaged/tree/main/split_files (but no fp8 for the upscale model... yet at least)

Comfy Org

but no fp8 for the upscale model... yet at least)

Added.

still black screen with both upsamplers hunyuanvideo1.5_720p_sr_distilled_fp8_scaled.safetensors and 1080_sr

Comfy Org

Oh that's weird... same script was used as with the other models, will have to double check what could be different here.

This is the video and workflow, maybe I’m the one doing it wrong
workflow: https://pastebin.com/dUQcM80D

I used Comfy’s official sample workflow, with the 480p GGUF Q8 quantized model, and the LoRA provided by the official Comfy repo. However, the results are terrible — what exactly is going wrong?

https://huggingface.co/jayn7/HunyuanVideo-1.5_T2V_480p-GGUF/blob/main/480p/hunyuanvideo1.5_480p_t2v-Q8_0.gguf

https://huggingface.co/Comfy-Org/HunyuanVideo_1.5_repackaged/blob/main/split_files/loras/hunyuanvideo1.5_t2v_480p_lightx2v_4step_lora_rank_32_bf16.safetensors

workflow

try with cfg 1

Thank you, after setting cfg to 1 the results finally became normal.

Below is the output using the GGUF Q8 model, 4-step LoRA, cfg=1, steps=4

GGUF Q8 model, 4-step LoRA, cfg=1, steps=6

GGUF Q8 model, cfg=6, steps=20
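
For context on why cfg 1 fixes this: the 4-step LightX2V LoRA is a guidance-distilled model, so it is meant to run without classifier-free guidance, and cfg values above 1 over-amplify its predictions. A minimal sketch of the standard CFG mix (illustrative only, not ComfyUI's actual sampler code):

```python
import torch

def cfg_combine(cond: torch.Tensor, uncond: torch.Tensor, cfg: float) -> torch.Tensor:
    # Standard classifier-free guidance mix of the two model predictions.
    # At cfg == 1.0 this reduces to `cond` alone, which is what a distilled
    # model expects; larger cfg exaggerates the (cond - uncond) term and
    # produces the broken results described above.
    return uncond + cfg * (cond - uncond)
```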

still black screen with both upsamplers hunyuanvideo1.5_720p_sr_distilled_fp8_scaled.safetensors and 1080_sr

Same issue.
The Diffusion Model and the Latent Upscale Model I loaded are both the 1080p models from this repo. Since my VRAM is only 12GB, directly using the latent from the first sampling inevitably causes OOM.
Therefore, I instead fed Hunyuan Video 1.5 Latent Upscale With Model a latent made with Load Video + VAE Encode, but the result was just a black screen.

workflow

The issue with the fp8 SR models was that the SR model doesn't actually use the t_embedder layer, but the weights still exist in the model... they're just all zeros, and my fp8 scaling script didn't account for that and stupidly made the scale_weight zeros too.

Uploaded fixed versions now, sorry for the confusion.
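
In other words, if a per-tensor scale is derived from a tensor's max absolute value and the tensor is all zeros, the scale itself comes out zero and everything dequantizes to zero, hence the black output. A sketch of the guard (a hypothetical conversion helper, not the actual script used for these models):

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def scale_to_fp8_e4m3(weight: torch.Tensor):
    # Per-tensor fp8 quantization: weight ~= fp8_weight * scale_weight.
    absmax = weight.abs().max().item()
    # Guard for unused, all-zero layers (like the SR model's t_embedder):
    # without it, scale_weight is 0 and dequantization yields all zeros.
    scale = absmax / FP8_E4M3_MAX if absmax > 0.0 else 1.0
    fp8_weight = (weight / scale).to(torch.float8_e4m3fn)
    scale_weight = torch.tensor(scale, dtype=torch.float32)
    return fp8_weight, scale_weight
```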

Thank you, and on behalf of the community, we are all grateful for your wonderful efforts and work!

Thank you, after setting cfg to 1 the results finally became normal.

You're welcome. If you use higher steps, lower the LoRA strength, e.g. LightX2V LoRA strength 0.5 with steps 6 or 8; lowering the shift will also have some effect, especially with fp16 models.

Works perfectly, gave it a try with the new fp8 SR models ;-)

(832x480 => 1280x720p upscale)

Did you connect the latent output from the first KSampler to the hy15 Latent Upscale With Model? Is it possible to connect it using Load Video + VAE Encode? How does the resulting video quality compare with FlashVSR?

Yes, the latent output from the first sampler goes to the Latent Upscale With Model node. Should be possible to use Load Video + VAE Encode, I think. Will give it a try.
FlashVSR is for sure a lot faster, but Hunyuan upscale might be a good alternative.

Speaking of speed, not sure if it's worth converting the lightx2v "tiny vae" to ComfyUI format as well (but maybe it would require more than just a model conversion, and not sure if it's worth the hassle)
https://huggingface.co/lightx2v/Autoencoders/blob/main/lighttaehy1_5.safetensors
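
A quick way to gauge the conversion effort is to compare the checkpoint's tensor names against the taesd-style keys ComfyUI already knows. A small sketch using the safetensors API (the file is the one linked above; whether the keys line up is exactly the open question):

```python
from safetensors import safe_open

# List tensor names and shapes in the lightx2v tiny VAE without loading
# the full weights, to see how far they are from ComfyUI's taesd naming.
with safe_open("lighttaehy1_5.safetensors", framework="pt", device="cpu") as f:
    for key in sorted(f.keys()):
        print(key, f.get_slice(key).get_shape())
```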

It would be great if there were a tiny VAE custom node that could directly replace Comfy’s native Load VAE node, because the hy1.5 VAE is really too large. So far, I haven’t found one. Kijai’s load VAE node can load a tiny VAE, but it can only connect to his WanVideo wrapper nodes and cannot replace Comfy’s original Load VAE node.
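
For what it's worth, ComfyUI's own VAELoader node is tiny: it reads a state dict and hands it to comfy.sd.VAE. A rough sketch of a drop-in custom node along those lines (hypothetical node name, and it assumes comfy.sd.VAE can recognize the tiny VAE's key layout, which is the unproven part):

```python
# Hypothetical custom node: loads a tiny VAE from models/vae and returns
# ComfyUI's native VAE type, so it plugs into any node expecting a VAE input.
import comfy.sd
import comfy.utils
import folder_paths

class TinyVAELoader:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"vae_name": (folder_paths.get_filename_list("vae"),)}}

    RETURN_TYPES = ("VAE",)
    FUNCTION = "load_vae"
    CATEGORY = "loaders"

    def load_vae(self, vae_name):
        path = folder_paths.get_full_path("vae", vae_name)
        sd = comfy.utils.load_torch_file(path)
        # Assumption: comfy.sd.VAE detects this checkpoint's layout; if not,
        # the state dict keys would need remapping first.
        return (comfy.sd.VAE(sd=sd),)

NODE_CLASS_MAPPINGS = {"TinyVAELoader": TinyVAELoader}
```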

Comfy Org

I actually submitted a PR for the tiny VAEs yesterday: https://github.com/comfyanonymous/ComfyUI/pull/10884

Ah very nice, hopefully the PR goes through. The current VAE is a bit slow, so it might be nice to try the tiny one.

Is it possible to connect it using Load Video + VAE Encode?

Tried that, works like a charm ;-)

(input video 480p => upscaled to 720p)
Workflow attached/embedded in the video if you want to play around with it (download the video and drop it into Comfy for the workflow)

Edit: oops, that video was quite large for the thread, I will make a wide 16:9 one instead ;-) hehe

Honestly, I would delete the hy1.5 video base model, because its video quality is inferior to Wan Video. However, its super-resolution upscale model does have genuine native support from Comfy’s official implementation, which makes it much more convenient to use than FlashVSR.

In some areas I feel it's better than Wan.
I am definitely keeping both ;-) both have their strengths.

As for FlashVSR, I use the one in WanVideoWrapper. It's more than good enough for my use (and extremely fast):
https://github.com/kijai/ComfyUI-WanVideoWrapper/blob/main/example_workflows/wanvideo_1_3B_FlashVSR_upscale_example.json

I also feel it's better than Wan (base Wan) with the correct sampler.

makisekurisu-jp changed discussion status to closed
