Disty0/sote-diffusion-cascade-decoder_pre-alpha0

+---
+pipeline_tag: text-to-image
+license: other
+license_name: stable-cascade-nc-community
+license_link: LICENSE
+---
+# SoteDiffusion Cascade
+Anime finetune of Stable Cascade Decoder.
+Commercial use is not permitted under the StabilityAI license (see LICENSE).
+## Code Example
+```shell
+pip install diffusers
+```
+```python
+import torch
+from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline
+prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
+negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"
+# Load the prior and the decoder in half precision from the Hugging Face Hub.
+prior = StableCascadePriorPipeline.from_pretrained("Disty0/SoteDiffusion-Cascade_pre-alpha0", torch_dtype=torch.float16)
+decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/SoteDiffusion-Cascade_Decoder", torch_dtype=torch.float16)
+prior.enable_model_cpu_offload()
+prior_output = prior(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    negative_prompt=negative_prompt,
+    guidance_scale=6.0,
+    num_images_per_prompt=1,
+    num_inference_steps=30
+)
+decoder.enable_model_cpu_offload()
+decoder_output = decoder(
+    image_embeddings=prior_output.image_embeddings.to(torch.float16),
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    guidance_scale=1.0,
+    output_type="pil",
+    num_inference_steps=10
+).images[0]
+decoder_output.save("cascade.png")
+```
+## Dataset
+Used the same dataset as SoteDiffusion-Cascade_pre-alpha0.
+Selected images from the newest dataset that scored above 0.98 on both the aesthetic and the quality taggers.
+Trained on roughly 98K images.
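The selection rule above can be sketched as a simple filter. This is only an illustration, not the tooling actually used: the record fields `aesthetic_score` and `quality_score` and the sample records are hypothetical, and only the 0.98 threshold comes from the description.

```python
from dataclasses import dataclass

SCORE_THRESHOLD = 0.98  # threshold taken from the dataset description


@dataclass
class TaggedImage:
    # Hypothetical record layout; the real tagger output format is not documented here.
    path: str
    aesthetic_score: float
    quality_score: float


def select_best(images, threshold=SCORE_THRESHOLD):
    """Keep images that score strictly above the threshold on BOTH taggers."""
    return [
        img for img in images
        if img.aesthetic_score > threshold and img.quality_score > threshold
    ]


# Made-up sample records for illustration only.
samples = [
    TaggedImage("a.png", 0.99, 0.985),
    TaggedImage("b.png", 0.99, 0.90),
    TaggedImage("c.png", 0.97, 0.99),
]
selected = select_best(samples)
print([img.path for img in selected])
```

Requiring both scores to pass, rather than an average, means a single weak score is enough to exclude an image.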
+## Training:
+**GPU used for training**: 1x AMD RX 7900 XTX 24GB
+**Software used**: https://github.com/2kpr/StableCascade
+### Config:
+```
+experiment_id: sotediffusion-sc-b_3b
+model_version: 3B
+dtype: bfloat16
+use_fsdp: False
+batch_size: 64
+grad_accum_steps: 64
+updates: 3000
+backup_every: 128
+save_every: 32
+warmup_updates: 100
+lr: 4.0e-6
+optimizer_type: Adafactor
+adaptive_loss_weight: True
+stochastic_rounding: True
+image_size: 768
+multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
+shift: 4
+checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
+output_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
+webdataset_path: file:/mnt/DataSSD/AI/anime_image_dataset/best/newest_best-{0000..0001}.tar
+effnet_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors
+stage_a_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/stage_a.safetensors
+generator_checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/stage_b-generator-049152.safetensors
+```
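To get a feel for what `image_size: 768` combined with `multi_aspect_ratio` might mean in pixels, here is a rough sketch. The trainer's actual bucketing logic is not shown in this card, so the equal-area rule, the rounding down to multiples of 32, and reading each ratio as short side over long side are all assumptions made for illustration.

```python
from fractions import Fraction
from math import sqrt

IMAGE_SIZE = 768   # image_size from the config
MULTIPLE = 32      # assumed rounding step; not stated in the config

# multi_aspect_ratio from the config, kept exact as fractions.
RATIOS = [Fraction(s) for s in
          "1/1 1/2 1/3 2/3 3/4 1/5 2/5 3/5 4/5 1/6 5/6 9/16".split()]


def bucket_sides(ratio, image_size=IMAGE_SIZE, multiple=MULTIPLE):
    """Return (short, long) sides with roughly image_size**2 pixels of area."""
    area = image_size * image_size
    # Assumption: keep the area constant, then round each side down.
    short = int(sqrt(area * ratio) // multiple) * multiple
    long = int(sqrt(area / ratio) // multiple) * multiple
    return short, long


buckets = {ratio: bucket_sides(ratio) for ratio in RATIOS}
for ratio, (short, long) in buckets.items():
    print(f"{ratio}: {short} x {long}")
```

Under these assumptions the square bucket stays at 768 x 768, while elongated ratios trade one side for the other at a similar pixel count.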
+## Limitations and Bias
+### Bias
+- This model is intended for anime illustrations.
+  Realistic capabilities are not tested at all.
+### Limitations
+- Eyes in far shots come out poorly because of the heavy latent compression.
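A back-of-the-envelope calculation suggests why small details suffer. The sketch below assumes the limitation refers to the prior's compressed latent space and uses 42.67, the default `resolution_multiple` of diffusers' `StableCascadePriorPipeline`; that factor is not stated in this card, and the 20 px eye width is a hypothetical example.

```python
from math import ceil

# Assumption: default `resolution_multiple` of diffusers'
# StableCascadePriorPipeline; not stated in this model card.
RESOLUTION_MULTIPLE = 42.67


def latent_side(pixels, multiple=RESOLUTION_MULTIPLE):
    """Number of latent cells along one image side (rounded up)."""
    return ceil(pixels / multiple)


def latent_cells_covered(feature_pixels, multiple=RESOLUTION_MULTIPLE):
    """Approximate number of latent cells a feature spans along one axis."""
    return feature_pixels / multiple


side = latent_side(1024)
print(f"1024 px -> {side} x {side} latent grid")

# Hypothetical example: an eye about 20 px wide in a far shot.
eye_cells = latent_cells_covered(20)
print(f"20 px eye spans about {eye_cells:.2f} latent cells")
```

With less than one latent cell per eye, the decoder has to reconstruct the detail mostly from context, which is consistent with the limitation listed above.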

model_index.json CHANGED

@@ -1,7 +1,7 @@
 {
   "_class_name": "StableCascadeDecoderPipeline",
   "_diffusers_version": "0.27.0",
-  "_name_or_path": "stabilityai/stable-cascade",
+  "_name_or_path": "Disty0/SoteDiffusion-Cascade_Decoder",
   "decoder": [
     "diffusers",
     "StableCascadeUNet"