---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

# iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation

[![arXiv](https://img.shields.io/badge/arXiv-2511.20635-b31b1b.svg)](https://huggingface.co/papers/2511.20635)
[![Project Page](https://img.shields.io/badge/Project-Page-4b9e5f.svg)](https://kr1sjfu.github.io/iMontage-web/)
[![GitHub](https://img.shields.io/badge/GitHub-Code-keygen.svg?logo=github&style=flat-square)](https://github.com/Kr1sJFU/iMontage)

*iMontage Teaser*

iMontage is a unified framework that repurposes a powerful video model into an all-in-one image generator. The framework consumes and produces variable-length image sets, unifying a wide array of image generation and editing tasks. This approach allows the model to acquire broad image-manipulation capabilities without corrupting its invaluable original motion priors. iMontage excels across several mainstream many-in-many-out tasks, maintaining strong cross-image contextual consistency and generating scenes with extraordinary dynamics.

## πŸ“¦ Features

- ⚑ High-dynamic, high-consistency image generation from flexible inputs
- πŸŽ›οΈ Robust instruction following across heterogeneous tasks
- πŸŒ€ Video-like temporal coherence, even for non-video image sets
- πŸ† SOTA results across different tasks

## πŸš€ Sample Usage

For detailed installation instructions and more complex inference examples, please refer to the [GitHub repository](https://github.com/Kr1sJFU/iMontage). Here is a simple Python snippet demonstrating image generation with iMontage:

```python
from inference_solver import FlexARInferenceSolver
from PIL import Image

# ******************** Image Generation ********************
inference_solver = FlexARInferenceSolver(
    model_path="Kr1sJ/iMontage",  # Use this Hugging Face model
    precision="bf16",
    target_size=768,  # Ensure target_size is consistent with the checkpoint
)

q1 = f"Generate an image of 768x768 according to the following prompt:\n" \
     f"Image of a dog playing water, and a waterfall is in the background."

# generated: tuple of (generated response, list of generated images)
generated = inference_solver.generate(
    images=[],
    qas=[[q1, None]],
    max_gen_len=8192,
    temperature=1.0,
    logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
)

a1, new_image = generated[0], generated[1][0]

# Display or save the image
new_image.show()
# new_image.save("generated_dog.png")
```

The model supports various tasks as illustrated below (example input and output images are omitted here; see the [project page](https://kr1sjfu.github.io/iMontage-web/)):

| **Task Type**        | **Input** | **Prompt**                                                | **Output** |
|----------------------|-----------|-----------------------------------------------------------|------------|
| **image_editing**    |           | *Change the material of the lava to silver.*              |            |
| **cref**             |           | *Confucius from the first image, Moses from the second…*  |            |
| **conditioned_cref** |           | *depth*                                                   |            |
| **sref**             |           | *(empty)*                                                 |            |
| **multiview**        |           | *1. Shift left; 2. Look up; 3. Zoom out.*                 |            |
| **storyboard**       |           | *Vintage film: 1. Hepburn carrying the yellow bag…*       |            |
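For multi-image tasks such as `image_editing` or `cref`, the `generate` call shown above presumably accepts the source images through its `images` argument, with the instruction passed as the question text. The snippet below is a minimal, untested sketch under that assumption; the file path, prompt, and the exact way input images are referenced are illustrative only, so please consult the [GitHub repository](https://github.com/Kr1sJFU/iMontage) for the authoritative task formats.

```python
from inference_solver import FlexARInferenceSolver
from PIL import Image

# Reuse the solver configured as in the example above.
inference_solver = FlexARInferenceSolver(
    model_path="Kr1sJ/iMontage",
    precision="bf16",
    target_size=768,
)

# Hypothetical image-editing call: pass the source image(s) via `images`
# and the editing instruction as the question. The exact prompt format for
# referencing input images may differ; see the GitHub repository.
source = Image.open("lava_scene.png")  # illustrative path
q1 = "Change the material of the lava to silver."

generated = inference_solver.generate(
    images=[source],
    qas=[[q1, None]],
    max_gen_len=8192,
    temperature=1.0,
    logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
)

# generated[1] is a list of generated images; many-to-many tasks may return several.
for i, img in enumerate(generated[1]):
    img.save(f"edited_{i}.png")
```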
--- ## πŸ“ Citation If you find **iMontage** useful for your research or applications, please consider starring ⭐ the repo and citing our paper: ```bibtex @article{fu2025iMontage, title={iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation}, author={Zhoujie Fu and Xianfang Zeng and Jinghong Lan and Xinyao Liao and Cheng Chen and Junyi Chen and Jiacheng Wei and Wei Cheng and Shiyu Liu and Yunuo Chen and Gang Yu and Guosheng Lin}, journal={arXiv preprint arXiv:2511.20635}, year={2025}, } ```