PokeFA-SDXL-LoRA — Stable Diffusion XL LoRA adapters for domain adaptation to Pokemon fanart

Welcome to my repo from Pikachu (generated with my LoRA checkpoints).

Overview

The adapters are trained on the PokeFA Pokémon fan-art dataset. PokeFA is a captioned Pokémon fan-art dataset of 15,888 images, filtered down from 30,000 raw collected images through NSFW filtering, OCR inpainting, aesthetic and relevance scoring and filtering, and hybrid captioning.

Two LoRA stages are trained:

SDXL Base UNet and text-encoder LoRA – Trained on all ~16k cleaned images to learn the global Pokémon fan-art distribution: composition, pose, color, Pokémon identity, and broad stylistic cues. This is where most of the domain shift (generic SDXL → Pokémon fan-art) happens.
SDXL Refiner UNet LoRA – Trained on the top ~30% highest-scoring images (ranked mostly by aesthetic score, with a small relevance nudge) and focused on low-noise timesteps to polish detail and micro-structure (line work, textures, small facial features, highlights). The refiner uses the same text_encoder_2 as the base, with the base TE2 LoRA merged, so both the SDXL base and the refiner see the same prompt embeddings during training and inference.
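
As a rough illustration of the refiner-stage data selection and timestep focus described above, here is a minimal sketch. It is not the training code from the GitHub repo: the score field names, the 0.8 aesthetic weight, and the 200-step low-noise cutoff are all hypothetical values chosen for the example.

```python
import random


def select_refiner_subset(records, top_fraction=0.30, aesthetic_weight=0.8):
    """Keep the top fraction of records by a combined score.

    `records` is a list of dicts with 'aesthetic' and 'relevance' scores.
    The weighting (mostly aesthetic, small relevance nudge) is an assumption.
    """
    def combined(record):
        return (aesthetic_weight * record["aesthetic"]
                + (1.0 - aesthetic_weight) * record["relevance"])

    ranked = sorted(records, key=combined, reverse=True)
    keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:keep]


def sample_low_noise_timestep(rng, max_timestep=200):
    """Sample a training timestep from the low-noise end of the schedule.

    The cutoff of 200 (out of the usual 1000 steps) is a hypothetical choice.
    """
    return rng.randrange(max_timestep)


# Toy usage with made-up scores.
records = [{"id": i, "aesthetic": i / 10, "relevance": 0.5} for i in range(10)]
subset = select_refiner_subset(records)
rng = random.Random(0)
timesteps = [sample_low_noise_timestep(rng) for _ in range(5)]
```

With ten toy records and a 30% cut, the subset holds the three highest-scoring entries, and every sampled timestep falls below the chosen cutoff.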

If you are interested in how these LoRAs were actually trained, check out the PokeFA-SDXL-LoRA GitHub repo for the full training codebase.

LoRA checkpoints

This repo ships LoRA weights only; they need to be paired with the original SDXL base and refiner checkpoints.

SDXL Base UNet LoRA
- LoRA adapters (rank 64) are attached to the Q/K/V and output projection layers (to_q, to_k, to_v, to_out.0) of the unet attention blocks.
Base text-encoder LoRA (TE1 + TE2)
- LoRA adapters (rank 32) are attached to the Q/K/V and output projection layers (q_proj, k_proj, v_proj, out_proj) of the text_encoder and text_encoder_2 attention blocks.
SDXL Refiner UNet LoRA
- LoRA adapters (rank 64) are attached to the Q/K/V and output projection layers (to_q, to_k, to_v, to_out.0) of the unet attention blocks.
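
To make the adapter layout above concrete, the following sketch records the ranks and target modules as plain data and shows how a dotted module path can be matched against them by suffix. The `LoraSpec` class and both helper functions are hypothetical names introduced for illustration, and the 640-wide projection in the usage lines is only an example size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSpec:
    rank: int
    target_modules: tuple


# Ranks and target names as listed in this section.
UNET_LORA = LoraSpec(rank=64, target_modules=("to_q", "to_k", "to_v", "to_out.0"))
TEXT_ENCODER_LORA = LoraSpec(
    rank=32, target_modules=("q_proj", "k_proj", "v_proj", "out_proj")
)


def is_targeted(module_name: str, spec: LoraSpec) -> bool:
    """True if the dotted module path ends with one of the target names."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in spec.target_modules
    )


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Extra parameters a rank-r LoRA adds to one linear layer.

    The adapter consists of a down-projection A (rank x in_features)
    and an up-projection B (out_features x rank).
    """
    return rank * (in_features + out_features)


# Example: an attention query projection is targeted, a feed-forward layer is not.
attn_path = "down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q"
ff_path = "down_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj"
print(is_targeted(attn_path, UNET_LORA), is_targeted(ff_path, UNET_LORA))
print(lora_param_count(640, 640, UNET_LORA.rank))
```

Because only attention projections are adapted, feed-forward and convolutional layers of the UNet stay at their original SDXL weights.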

Inference:
Run SDXL base + UNet & TE1 & TE2 LoRA for the first 70% of the diffusion process, then feed that latent into SDXL refiner + UNet LoRA for final detail polishing.
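
The base-to-refiner handoff described above could look roughly like the sketch below when written against the diffusers library. This is not the bundled poke_sdxl_lora_inference.py script. It assumes the LoRA files can be loaded with `load_lora_weights`, that the refiner checkpoint is stabilityai/stable-diffusion-xl-refiner-1.0, and that a CUDA GPU is available; the `base_lora` and `refiner_lora` paths are hypothetical placeholders. The `split_steps` helper only approximates how many steps each stage receives.

```python
def split_steps(num_inference_steps: int, handoff: float = 0.7):
    """Approximate (base_steps, refiner_steps) for a given handoff fraction."""
    base_steps = round(num_inference_steps * handoff)
    return base_steps, num_inference_steps - base_steps


def generate(prompt: str, base_lora: str, refiner_lora: str,
             num_inference_steps: int = 40, handoff: float = 0.7, seed: int = 0):
    # Heavy imports stay local so split_steps works without a GPU stack.
    import torch
    from diffusers import (StableDiffusionXLImg2ImgPipeline,
                           StableDiffusionXLPipeline)

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    # Hypothetical path: file holding the base UNet + TE1 + TE2 LoRA.
    base.load_lora_weights(base_lora)

    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",  # assumed refiner checkpoint
        text_encoder_2=base.text_encoder_2,  # reuse TE2 so both stages share embeddings
        vae=base.vae,
        torch_dtype=torch.float16,
    ).to("cuda")
    # Hypothetical path: file holding the refiner UNet LoRA.
    refiner.load_lora_weights(refiner_lora)

    generator = torch.Generator("cuda").manual_seed(seed)
    # Base handles the first `handoff` fraction and returns latents, not pixels.
    latents = base(
        prompt=prompt, num_inference_steps=num_inference_steps,
        denoising_end=handoff, output_type="latent", generator=generator,
    ).images
    # Refiner picks up at the same fraction and finishes the denoising.
    return refiner(
        prompt=prompt, image=latents, num_inference_steps=num_inference_steps,
        denoising_start=handoff, generator=generator,
    ).images[0]


base_steps, refiner_steps = split_steps(40, 0.7)
```

Using the same fraction for `denoising_end` and `denoising_start` is what makes the refiner continue exactly where the base stopped.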

Qualitative Comparison: SDXL vs PokeFA-SDXL-LoRA

What do the LoRA adapters actually buy us? Let's find out with a quick illustration:

Same prompt: Bulbasaur in a beautiful garden filled with flowers | bulbasaur, full-body view, cute
Identical inference settings (scheduler, denoising steps, CFG scale, LoRA scales, etc.) and the same random seed (same initial noise)

SDXL Base LoRA Comparison.

Left: vanilla SDXL base. Middle: SDXL base + UNet LoRA. Right: SDXL base + UNet LoRA + TE1 & TE2 LoRA.

Vanilla SDXL base doesn’t really “know” Bulbasaur as a Pokémon, producing a generic frog-like creature. Adding the UNet LoRA bends the image space toward Pokémon fan-art and gets much closer to Bulbasaur, but there are still noticeable inconsistencies. Once the TE1 & TE2 LoRA are enabled, the Bulbasaur identity improves significantly, but the background is still a bit messy.

SDXL Refiner LoRA Comparison.

Left: SDXL base + UNet LoRA + TE1 & TE2 LoRA (no refiner). Middle: SDXL base + UNet LoRA + TE1 & TE2 LoRA + vanilla refiner. Right: SDXL base + UNet LoRA + TE1 & TE2 LoRA + refiner LoRA.

Adding the vanilla refiner sharpens edges and cleans up the background, but still misses small, domain-specific details (in this case Bulbasaur’s mouth). The refiner LoRA polishes fine details in a way that’s consistent with Bulbasaur’s appearance and the general Pokémon fan-art style.

More Generation Examples

Pikachu playing on a sandy beach at sunset | pikachu, anime-style, cute

Squirtle, chibi-style, playing by a calm pond in a city park | squirtle, chibi-style, full-body view, daytime, city_park, pond, water, cute

Gengar standing under a starry night sky on a grassy hill | gengar, full-body view, night, stars, grassy_hill

Charmander with a backpack walking through a sun-dappled forest path | charmander, anime-style, full-body view, daytime, forest, relaxed

Note: All images shown here are fan-made generations produced by LoRA checkpoints in this repo. They are provided for illustrative, non-commercial purposes only and are not official Pokémon artwork.

Use this model

This repo contains a ready-to-run inference script: poke_sdxl_lora_inference.py. After downloading or cloning this model repo and editing the config block at the top of the script, run:

python poke_sdxl_lora_inference.py

Limitations

Uneven generation quality across different Pokémon
Generation quality tends to degrade for less popular Pokémon. Popular Pokémon have far more high-quality fan-art available, both in the fan-art community and in the training data, so they’re usually rendered more cleanly and consistently. Less well-known Pokémon may look off, have simplified designs, or show more style drift.
Not meant for official, pixel-perfect designs
This LoRA is not trained to exactly match the official Pokémon Company designs, art style, or exact Pokédex poses. It is domain-adapted to fan-art, so the output reflects the creativity, style, and tastes of the Pokémon fan-art community. As a result, appearance, anatomy, markings, and colors are not guaranteed to be 100% faithful to official designs.
Probabilistic outputs: multiple generations may be required
SDXL is a probabilistic diffusion model: even with the same prompt and inference settings, different random seeds can produce very different images. To get a result you’re happy with, you may need to:
- Try multiple torch.Generator seeds, or
- Generate a small batch and cherry-pick your favorite sample.
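
A seed sweep like the one suggested above can be sketched as follows. The `generate_candidates` function and its `make_generator` hook are hypothetical names introduced here; `pipe` is assumed to be any diffusers-style pipeline that accepts `prompt` and `generator` and returns an object with an `images` list.

```python
def generate_candidates(pipe, prompt, seeds, make_generator=None, **pipe_kwargs):
    """Run the pipeline once per seed and return {seed: image}."""
    if make_generator is None:
        # Default: one seeded torch.Generator per run.
        import torch

        def make_generator(seed):
            return torch.Generator("cpu").manual_seed(seed)

    results = {}
    for seed in seeds:
        output = pipe(prompt=prompt, generator=make_generator(seed), **pipe_kwargs)
        results[seed] = output.images[0]
    return results
```

Keeping the seed alongside each image makes it easy to regenerate a favorite sample later with the same settings.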
Struggles with complex compositional prompts
Like vanilla SDXL, this LoRA can struggle with prompts that require precise spatial relationships or multi-character interactions. For example, a prompt like “Sylveon standing on Umbreon’s back while Glaceon waves in the background” may produce incorrect positions, merged characters, or missing elements. The model is best suited for single-character or simple multi-character scenes rather than intricate, layout-sensitive compositions.

License

This model is released under the Creative Commons Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) license.
You may use, modify, and share the weights and outputs for non-commercial purposes only, as long as you provide proper attribution.

Pokémon IP notice:

Pokémon and all related names, characters, and imagery are trademarks and copyrighted works of The Pokémon Company, Nintendo, Game Freak, and Creatures Inc. This project is fan-made, non-commercial, and is not affiliated with, endorsed by, or sponsored by any of these entities.

Any use of this model that involves commercial exploitation of Pokémon IP (including, but not limited to, selling generated images, using them in paid products or services, or monetized media) may infringe third-party rights. You are solely responsible for ensuring that your use of the model and its outputs complies with applicable laws, platform policies, and the rights of The Pokémon Company and other IP holders.

This description is provided for informational purposes only and does not constitute legal advice.

Model tree for Kev0208/Pokemon-Fanart-SDXL-LoRA

Base model: stabilityai/stable-diffusion-xl-base-1.0