entropy-v1-fp8

Try this model on Entropy Studio

Entropy v1 FP8 (Gemma 3 27B IT)

Entropy v1 FP8 is a merged + FP8-quantized checkpoint based on google/gemma-3-27b-it, fine-tuned to rewrite AI-polished text into more human-sounding prose while preserving meaning.

This repo is intended for efficient vLLM inference without requiring a runtime LoRA adapter.

What It Does

Given an AI-sounding passage, the model rewrites it to be:

  • More human and textured (less generic "professional polish")
  • More varied in rhythm/word choice
  • Meaning-preserving (style change, not content change)

Prompt Trigger (Recommended)

This is the pattern used in our fine-tuning data. Place the passage on a new line immediately after the instruction.

Polish this AI passage to feel more human:
{passage}

Short variants that usually work similarly:

  • Rephrase this AI passage to feel more human:\n{passage}
  • Convert this AI passage into a more human-sounding version:\n{passage}
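
For programmatic use, a minimal helper along these lines assembles the chat message with the recommended trigger (the function name is illustrative, not part of any published API):

# Illustrative helper: wraps a passage in the recommended trigger so it can be
# sent as a single user chat message.
def build_humanize_messages(passage: str) -> list[dict]:
    prompt = f"Polish this AI passage to feel more human:\n{passage}"
    return [{"role": "user", "content": prompt}]

messages = build_humanize_messages(
    "This is a highly polished paragraph that sounds generic and overly smooth..."
)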

How To Run (vLLM)

1) Start an OpenAI-compatible server

vllm serve ysong21/entropy-v1-fp8 \
  --served-model-name entropy-v1-fp8 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype bfloat16 \
  --max-model-len 8192

Notes:

  • This checkpoint is already quantized (compressed-tensors FP8_DYNAMIC). You do not need to pass --quantization fp8.
  • FP8 execution is hardware-dependent; see "Quantization" below.
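
Before sending traffic, you can confirm the server is up via the OpenAI-compatible model listing. A quick sketch, assuming the default host/port above and the requests package:

# Lists the models the vLLM server exposes; expect ["entropy-v1-fp8"].
import requests

resp = requests.get("http://127.0.0.1:8000/v1/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])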

2) Send a request

curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-noop' \
  -d '{
    "model": "entropy-v1-fp8",
    "messages": [
      {
        "role": "user",
        "content": "Polish this AI passage to feel more human:\nThis is a highly polished paragraph that sounds generic and overly smooth..."
      }
    ],
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 512
  }'
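
The same request can be sent with the official openai Python client pointed at the local vLLM server (a sketch; any non-empty API key works when the server does not enforce one):

# Equivalent of the curl call above, using the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="sk-noop")

completion = client.chat.completions.create(
    model="entropy-v1-fp8",
    messages=[{
        "role": "user",
        "content": (
            "Polish this AI passage to feel more human:\n"
            "This is a highly polished paragraph that sounds generic and overly smooth..."
        ),
    }],
    temperature=0.7,
    top_p=0.95,
    max_tokens=512,
)
print(completion.choices[0].message.content)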

Validation Benchmark (70 Gutenberg Examples)

We evaluate by computing the conditional negative log-likelihood of the target (human) rewrite given the prompt, and report character-normalized bits_per_char:

  • Let NLL be the sum of per-token negative log-likelihoods (in nats) over the target rewrite, computed teacher-forced given the prompt.
  • Let C be the number of characters in the target rewrite.
  • bits_per_char = (NLL / C) / ln(2).

This char-normalization makes the score more comparable across models/tokenizers than token-based perplexity.

Lower is better.
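
A minimal sketch of this metric with transformers (teacher-forced scoring of the target given the prompt). It omits chat templating and batching, and the model class and quantized-loading details may need adjusting for Gemma 3 and the FP8 compressed-tensors format, so treat it as illustrative:

# bits_per_char: sum of target-token NLL (nats) / target character count,
# converted to bits. Prompt tokens are masked out of the loss.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ysong21/entropy-v1-fp8"  # or any causal LM to compare against
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
).eval()

def bits_per_char(prompt: str, target: str) -> float:
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full = tok(prompt + target, return_tensors="pt").input_ids.to(model.device)
    labels = full.clone()
    labels[:, :prompt_len] = -100                  # score only the target rewrite
    with torch.no_grad():
        loss = model(full, labels=labels).loss     # mean NLL per scored token (nats)
    n_target_tokens = int((labels != -100).sum())
    return (loss.item() * n_target_tokens / len(target)) / math.log(2)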

Results

Baseline for relative improvement: N8Programs/Unslopper-30B-A3B-bf16.

System                               bits_per_char (↓)   Relative improvement vs Unslopper (↑)
Entropy v1 FP8 (this repo)           0.35994             +4.07%
N8Programs/Unslopper-30B-A3B-bf16    0.37522             +0.00%
Base google/gemma-3-27b-it           0.99565             -165.35%

Interpretation:

  • Entropy v1 FP8 achieves the best bits_per_char on this 70-example Gutenberg validation set.

Quantization (Merged FP8_DYNAMIC)

This checkpoint is produced in two steps:

  1. Merge: a PEFT LoRA adapter is merged into the base Gemma 3 27B IT weights (no runtime LoRA).
  2. Quantize: we apply FP8_DYNAMIC (W8A8) quantization with llm-compressor:
     • Targets: all Linear layers in the language model
     • Weights: FP8, static per-channel scaling
     • Activations: FP8, dynamic per-token scaling
     • Ignored: lm_head and the Gemma 3 vision tower (left in BF16)

The model is saved in a vLLM-loadable compressed-tensors format.
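
A condensed sketch of the two steps with peft and llm-compressor. The model class, import paths, and the ignore patterns for the Gemma 3 vision modules are assumptions that depend on library versions and module naming; this is an outline, not the exact production script:

# Step 1: merge a LoRA adapter into the base Gemma 3 27B IT weights.
import torch
from peft import PeftModel
from transformers import AutoTokenizer, Gemma3ForConditionalGeneration

BASE_ID = "google/gemma-3-27b-it"
ADAPTER_DIR = "path/to/lora-adapter"   # placeholder

base = Gemma3ForConditionalGeneration.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, ADAPTER_DIR).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)

# Step 2: FP8_DYNAMIC (W8A8) quantization with llm-compressor.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",       # all Linear layers in the language model
    scheme="FP8_DYNAMIC",   # FP8 weights (per-channel) + dynamic per-token activations
    ignore=["lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],  # assumed names
)
oneshot(model=merged, recipe=recipe)

# llm-compressor saves the quantized model in compressed-tensors format.
merged.save_pretrained("entropy-v1-fp8")
tokenizer.save_pretrained("entropy-v1-fp8")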

Hardware notes (vLLM):

  • Hopper/Ada/Blackwell-class NVIDIA GPUs can execute FP8 efficiently.
  • Other GPUs may fall back to less optimized modes.

Throughput (vLLM)

Measured on a single NVIDIA RTX PRO 6000 Blackwell 96GB using vllm/vllm-openai:v0.11.2 with random prompts:

  • Input length: 512 tokens
  • Output length: 256 tokens
max_concurrency   output tok/s   total tok/s
1                 25.87          77.51
20                412.60         1236.20
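
Those figures come from vLLM's random-prompt benchmarking tooling. As a rough sanity check against a running server (not the benchmark behind the table), a small concurrent probe like this can estimate output tokens/s:

# Fires CONCURRENCY identical requests at once and reports aggregate output
# tokens per second from the usage stats. Illustrative only.
import asyncio
import time
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://127.0.0.1:8000/v1", api_key="sk-noop")
CONCURRENCY = 20
PROMPT = "Polish this AI passage to feel more human:\n" + "filler sentence. " * 60

async def one_request() -> int:
    resp = await client.chat.completions.create(
        model="entropy-v1-fp8",
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=256,
        temperature=0.7,
    )
    return resp.usage.completion_tokens

async def main() -> None:
    start = time.perf_counter()
    counts = await asyncio.gather(*(one_request() for _ in range(CONCURRENCY)))
    elapsed = time.perf_counter() - start
    print(f"output tok/s: {sum(counts) / elapsed:.1f}")

asyncio.run(main())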

Limitations / Misuse

  • Trained primarily on literary/public-domain style passages; performance may vary on technical/legal writing.
  • Like other "humanizer" models, it can be misused for deceptive purposes. Use responsibly and follow applicable policies and disclosure norms.

Citation

If you use this model in research, please cite:

@misc{entropy_v1_fp8,
  title  = {Entropy v1 FP8 (Gemma 3 27B IT)},
  author = {ysong21},
  year   = {2026},
  note   = {Merged + FP8_DYNAMIC quantized checkpoint for AI-to-human rewriting.}
}