PokeFA-SDXL-LoRA — Stable Diffusion XL LoRA adapters for domain adaptation to Pokemon fanart
Welcome to my repo from Pikachu (generated with my LoRA checkpoints).
Overview
The adapters are trained on the PokeFA Pokémon fan-art dataset. PokeFA is a captioned Pokémon fan-art dataset of 15,888 images, filtered down from 30,000 collected raw images after NSFW filtering, OCR inpainting, aesthetic and relevance scoring and filtering, and hybrid captioning.
Two LoRA stages are trained:
- SDXL Base UNet and text-encoder LoRA – Trained on all ~16k cleaned images to learn the global Pokémon fan-art distribution: composition, pose, color, Pokémon identity, and broad stylistic cues. This is where most of the domain shift (generic SDXL → Pokémon fan-art) happens.
- SDXL Refiner UNet LoRA – Trained on the top ~30% highest-scoring images (ranked mostly by aesthetic score, with a relevance nudge), focused on low-noise timesteps to polish detail and micro-structure (line work, textures, small facial features, highlights). The refiner uses the same text_encoder_2 with the base TE2 LoRA merged, so both the SDXL base and refiner see the same prompt embeddings during training and inference.
If you are interested in how these LoRAs were actually trained, check out the PokeFA-SDXL-LoRA GitHub repo for the full training codebase.
LoRA checkpoints
This repo ships LoRA weights only; they need to be paired with the original SDXL base and refiner checkpoints.
SDXL Base UNet LoRA
- LoRA adapters (rank 64) are attached to the Q/K/V and output projection layers (to_q, to_k, to_v, to_out.0) of the unet attention blocks.
Base text-encoder LoRA (TE1 + TE2)
- LoRA adapters (rank 32) are attached to the Q/K/V and output projection layers (q_proj, k_proj, v_proj, out_proj) of the text_encoder and text_encoder_2 attention blocks.
SDXL Refiner UNet LoRA
- LoRA adapters (rank 64) are attached to the Q/K/V and output projection layers (to_q, to_k, to_v, to_out.0) of the unet attention blocks.
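For quick reference, the three adapter configurations in this section can be written down as plain data, along with the standard LoRA parameter-count arithmetic. This is only a sketch: the AdapterSpec and lora_params names are hypothetical, and the 1280-wide projection is an illustrative assumption, not a layer size measured from these checkpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSpec:
    """Hypothetical container mirroring the adapter settings listed above."""
    name: str
    rank: int
    target_modules: tuple


ADAPTERS = (
    AdapterSpec("base_unet", 64, ("to_q", "to_k", "to_v", "to_out.0")),
    AdapterSpec("base_text_encoders", 32, ("q_proj", "k_proj", "v_proj", "out_proj")),
    AdapterSpec("refiner_unet", 64, ("to_q", "to_k", "to_v", "to_out.0")),
)


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights LoRA adds to one linear layer (biases ignored).

    LoRA learns a low-rank update B @ A, with A of shape (rank, d_in)
    and B of shape (d_out, rank), so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# Illustrative assumption: a square 1280 x 1280 attention projection.
full = 1280 * 1280                    # 1,638,400 weights in the frozen layer
extra = lora_params(1280, 1280, 64)   # 163,840 trainable LoRA weights
print(f"rank-64 LoRA adds {extra:,} params ({extra / full:.0%} of the layer)")
```

Under that assumption, a rank-64 adapter trains about a tenth as many weights as the projection it wraps, which is why the repo can ship adapters instead of full checkpoints.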
Inference:
Run SDXL base + UNet & TE1 & TE2 LoRA for the first 70% of the diffusion process, then feed that latent into SDXL refiner + UNet LoRA for final detail polishing.
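The handoff described above can be sketched with the diffusers SDXL pipelines, which expose denoising_end and denoising_start for this kind of base-to-refiner split. Treat this as a minimal sketch, not the repo's actual script: split_kwargs and generate are hypothetical helpers, base_lora and refiner_lora are placeholder paths, the step count of 30 is arbitrary, and it assumes the LoRA files are saved in a format that load_lora_weights accepts. The shipped poke_sdxl_lora_inference.py remains the reference.

```python
def split_kwargs(split: float = 0.7, steps: int = 30):
    """Build call arguments for the two stages of a base -> refiner handoff."""
    if not 0.0 < split < 1.0:
        raise ValueError("split must be strictly between 0 and 1")
    # The base stops at `split` and returns latents instead of a decoded image.
    base = {"num_inference_steps": steps, "denoising_end": split, "output_type": "latent"}
    # The refiner resumes from the same point in the noise schedule.
    refiner = {"num_inference_steps": steps, "denoising_start": split}
    return base, refiner


def generate(prompt: str, base_lora: str, refiner_lora: str, seed: int = 0):
    """Run base + LoRA, then pass the latent to refiner + LoRA.

    base_lora / refiner_lora are placeholder paths (assumptions); point them
    at the LoRA weights shipped in this repo.
    """
    # Imported lazily so split_kwargs can be used without a GPU stack installed.
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    base.load_lora_weights(base_lora)  # assumed to hold the UNet, TE1 and TE2 adapters

    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,  # share TE2 (with its LoRA) with the base
        vae=base.vae,
        torch_dtype=torch.float16,
    ).to("cuda")
    refiner.load_lora_weights(refiner_lora)  # assumed to hold the refiner UNet adapter

    base_kw, refiner_kw = split_kwargs(0.7)
    generator = torch.Generator("cuda").manual_seed(seed)
    latents = base(prompt=prompt, generator=generator, **base_kw).images
    return refiner(prompt=prompt, image=latents, generator=generator, **refiner_kw).images[0]
```

Sharing text_encoder_2 between the two pipelines is one way to keep the prompt embeddings identical across stages, in line with how the adapters were trained.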
Qualitative Comparison: SDXL vs PokeFA-SDXL-LoRA
What do the LoRA adapters actually buy us? Let's find out with a quick illustration:
- Same prompt: Bulbasaur in a beautiful garden filled with flowers | bulbasaur, full-body view, cute
- Identical inference settings (scheduler, denoising steps, CFG scale, LoRA scales, etc.) and the same random seed (same initial noise)
SDXL Base LoRA Comparison.
Left: vanilla SDXL base. Middle: SDXL base + UNet LoRA. Right: SDXL base + UNet LoRA + TE1 & TE2 LoRA.
Vanilla SDXL base doesn’t really “know” Bulbasaur as a Pokémon, producing a generic frog-like creature. Adding the UNet LoRA bends the image space toward Pokémon fanart and gets much closer to Bulbasaur, but there are still noticeable inconsistencies. Once the TE1 & TE2 LoRAs are enabled, the Bulbasaur identity improves significantly, but the background is still a bit messy.
SDXL Refiner LoRA Comparison.
Left: SDXL base + UNet LoRA + TE1 & TE2 LoRA (no refiner). Middle: SDXL base + UNet LoRA + TE1 & TE2 LoRA + vanilla refiner. Right: SDXL base + UNet LoRA + TE1 & TE2 LoRA + refiner LoRA.
Adding the vanilla refiner sharpens edges and cleans up the background, but still misses small, domain-specific details (in this case Bulbasaur’s mouth). The refiner LoRA polishes fine details in a way that’s consistent with Bulbasaur’s appearance and the general Pokémon fanart style.
More Generation Examples
Pikachu playing on a sandy beach at sunset | pikachu, anime-style, cute
Squirtle, chibi-style, playing by a calm pond in a city park | squirtle, chibi-style, full-body view, daytime, city_park, pond, water, cute
Gengar standing under a starry night sky on a grassy hill | gengar, full-body view, night, stars, grassy_hill
Charmander with a backpack walking through a sun-dappled forest path | charmander, anime-style, full-body view, daytime, forest, relaxed
Note: All images shown here are fan-made generations produced by LoRA checkpoints in this repo. They are provided for illustrative, non-commercial purposes only and are not official Pokémon artwork.
Use this model
This repo contains a ready-to-run inference script: poke_sdxl_lora_inference.py. After downloading or cloning this model repo and editing the config block at the top of the script, run:
python poke_sdxl_lora_inference.py
Limitations
Uneven generation quality across different Pokémon
Generation quality tends to degrade for less popular Pokémon. Popular Pokémon have far more high-quality fanart available in both the fanart community and the training data, so they’re usually rendered more cleanly and consistently. Less well-known Pokémon may look off, have simplified designs, or show more style drift.
Not meant for official, pixel-perfect designs
This LoRA is not trained to exactly match the official Pokémon Company designs, art style, or exact Pokédex poses. It is domain-adapted to fanart, so the output reflects the creativity, style, and tastes of the Pokémon fanart community. As a result, appearance, anatomy, markings, or colors are not guaranteed to be 100% faithful to official designs.
Probabilistic outputs: multiple generations may be required
SDXL is a probabilistic diffusion model: even with the same prompt and inference settings, different random seeds can produce very different images. To get a result you’re happy with, you may need to:
- Try multiple torch.Generator seeds, or
- Generate a small batch and cherry-pick your favorite sample.
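A seed sweep along these lines can be sketched as follows. The helper names sweep_seeds and make_render are hypothetical, and pipe stands for whichever diffusers pipeline you have already loaded; the stand-in renderer at the bottom exists only so the sweep logic can be run without loading SDXL.

```python
def sweep_seeds(render, seeds):
    """Call render(seed) once per seed and collect the results keyed by seed."""
    return {seed: render(seed) for seed in seeds}


def make_render(pipe, prompt, **call_kwargs):
    """Wrap a diffusers pipeline so each call uses a freshly seeded generator."""
    import torch  # lazy import: only needed when a real pipeline is used

    def render(seed):
        # A new generator per seed makes every sample individually reproducible.
        generator = torch.Generator(device="cpu").manual_seed(seed)
        return pipe(prompt=prompt, generator=generator, **call_kwargs).images[0]

    return render


# Stand-in renderer so the sweep can be exercised without loading SDXL.
demo = sweep_seeds(lambda seed: f"image-for-seed-{seed}", [0, 1, 2, 3])
print(sorted(demo))
```

Keeping the results keyed by seed makes it easy to regenerate a favorite sample later with the same prompt and settings.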
Struggles with complex compositional prompts
Like vanilla SDXL, this LoRA can struggle with prompts that require precise spatial relationships or multi-character interactions. For example, a prompt like “Sylveon standing on Umbreon’s back while Glaceon waves in the background” may produce incorrect positions, merged characters, or missing elements. The model is best suited for single-character or simple multi-character scenes rather than intricate, layout-sensitive compositions.
License
license: cc-by-nc-4.0
This model is released under the Creative Commons Attribution–NonCommercial 4.0 International (CC BY-NC 4.0) license.
You may use, modify, and share the weights and outputs for non-commercial purposes only, as long as you provide proper attribution.
Pokémon IP notice:
Pokémon and all related names, characters, and imagery are trademarks and copyrighted works of The Pokémon Company, Nintendo, Game Freak, and Creatures Inc. This project is fan-made, non-commercial, and is not affiliated with, endorsed by, or sponsored by any of these entities.
Any use of this model that involves commercial exploitation of Pokémon IP (including, but not limited to, selling generated images, using them in paid products or services, or monetized media) may infringe third-party rights. You are solely responsible for ensuring that your use of the model and its outputs complies with applicable laws, platform policies, and the rights of The Pokémon Company and other IP holders.
This description is provided for informational purposes only and does not constitute legal advice.
Model tree for Kev0208/Pokemon-Fanart-SDXL-LoRA
Base model
stabilityai/stable-diffusion-xl-base-1.0