entropy-v1-fp8

Try this model on Entropy Studio

Entropy v1 FP8 (Gemma 3 27B IT)

Entropy v1 FP8 is a merged + FP8-quantized checkpoint based on google/gemma-3-27b-it, fine-tuned to rewrite AI-polished text into more human-sounding prose while preserving meaning.

This repo is intended for efficient vLLM inference without requiring a runtime LoRA adapter.

What It Does

Given an AI-sounding passage, the model rewrites it to be:

  • More human and textured (less generic "professional polish")
  • More varied in rhythm/word choice
  • Meaning-preserving (style change, not content change)

Prompt Trigger (Recommended)

This is the pattern used in our fine-tuning data. Place the passage on a new line immediately after the instruction.

Polish this AI passage to feel more human:
{passage}

Short variants that usually work similarly:

  • Rephrase this AI passage to feel more human:\n{passage}
  • Convert this AI passage into a more human-sounding version:\n{passage}
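
For programmatic use, a minimal helper along these lines assembles the chat message with the recommended trigger (the function name is illustrative, not part of any published API):

# Illustrative helper: wraps a passage in the recommended trigger so it can be
# sent as a single user chat message.
def build_humanize_messages(passage: str) -> list[dict]:
    prompt = f"Polish this AI passage to feel more human:\n{passage}"
    return [{"role": "user", "content": prompt}]

messages = build_humanize_messages(
    "This is a highly polished paragraph that sounds generic and overly smooth..."
)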

How To Run (vLLM)

1) Start an OpenAI-compatible server

vllm serve ysong21/entropy-v1-fp8 \
  --served-model-name entropy-v1-fp8 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype bfloat16 \
  --max-model-len 8192

Notes:

  • This checkpoint is already quantized (compressed-tensors FP8_DYNAMIC). You do not need to pass --quantization fp8.
  • FP8 execution is hardware-dependent; see "Quantization" below.
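
Before sending traffic, you can confirm the server is up via the OpenAI-compatible model listing. A quick sketch, assuming the default host/port above and the requests package:

# Lists the models the vLLM server exposes; expect ["entropy-v1-fp8"].
import requests

resp = requests.get("http://127.0.0.1:8000/v1/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])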

2) Send a request

curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-noop' \
  -d '{
    "model": "entropy-v1-fp8",
    "messages": [
      {
        "role": "user",
        "content": "Polish this AI passage to feel more human:\nThis is a highly polished paragraph that sounds generic and overly smooth..."
      }
    ],
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 512
  }'
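
The same request can be sent with the official openai Python client pointed at the local vLLM server (a sketch; any non-empty API key works when the server does not enforce one):

# Equivalent of the curl call above, using the OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="sk-noop")

completion = client.chat.completions.create(
    model="entropy-v1-fp8",
    messages=[{
        "role": "user",
        "content": (
            "Polish this AI passage to feel more human:\n"
            "This is a highly polished paragraph that sounds generic and overly smooth..."
        ),
    }],
    temperature=0.7,
    top_p=0.95,
    max_tokens=512,
)
print(completion.choices[0].message.content)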

Validation Benchmark (70 Gutenberg Examples)

We evaluate by computing the conditional negative log-likelihood of the target (human) rewrite given the prompt, and report character-normalized bits_per_char:

  • Let NLL be the sum of per-token negative log-likelihoods (in nats) over the target rewrite, computed teacher-forced given the prompt.
  • Let C be the number of characters in the target rewrite.
  • bits_per_char = (NLL / C) / ln(2).

This char-normalization makes the score more comparable across models/tokenizers than token-based perplexity.

Lower is better.
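
A minimal sketch of this metric with transformers (teacher-forced scoring of the target given the prompt). It omits chat templating and batching, and the model class and quantized-loading details may need adjusting for Gemma 3 and the FP8 compressed-tensors format, so treat it as illustrative:

# bits_per_char: sum of target-token NLL (nats) / target character count,
# converted to bits. Prompt tokens are masked out of the loss.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ysong21/entropy-v1-fp8"  # or any causal LM to compare against
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
).eval()

def bits_per_char(prompt: str, target: str) -> float:
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full = tok(prompt + target, return_tensors="pt").input_ids.to(model.device)
    labels = full.clone()
    labels[:, :prompt_len] = -100                  # score only the target rewrite
    with torch.no_grad():
        loss = model(full, labels=labels).loss     # mean NLL per scored token (nats)
    n_target_tokens = int((labels != -100).sum())
    return (loss.item() * n_target_tokens / len(target)) / math.log(2)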

Results

Baseline for relative improvement: N8Programs/Unslopper-30B-A3B-bf16.

System                               bits_per_char (↓)   Relative improvement vs Unslopper (↑)
Entropy v1 FP8 (this repo)           0.35994             +4.07%
N8Programs/Unslopper-30B-A3B-bf16    0.37522             +0.00%
Base google/gemma-3-27b-it           0.99565             -165.35%

Interpretation:

  • Entropy v1 FP8 achieves the best bits_per_char on this 70-example Gutenberg validation set.

Quantization (Merged FP8_DYNAMIC)

This checkpoint is produced in two steps:

  1. Merge: a PEFT LoRA adapter is merged into the base Gemma 3 27B IT weights (no runtime LoRA).
  2. Quantize: we apply FP8_DYNAMIC (W8A8) quantization with llm-compressor:
     • Targets: all Linear layers in the language model
     • Weights: FP8, static per-channel scaling
     • Activations: FP8, dynamic per-token scaling
     • Ignored: lm_head and the Gemma 3 vision tower (left in BF16)

The model is saved in a vLLM-loadable compressed-tensors format.
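
A condensed sketch of the two steps with peft and llm-compressor. The model class, import paths, and the ignore patterns for the Gemma 3 vision modules are assumptions that depend on library versions and module naming; this is an outline, not the exact production script:

# Step 1: merge a LoRA adapter into the base Gemma 3 27B IT weights.
import torch
from peft import PeftModel
from transformers import AutoTokenizer, Gemma3ForConditionalGeneration

BASE_ID = "google/gemma-3-27b-it"
ADAPTER_DIR = "path/to/lora-adapter"   # placeholder

base = Gemma3ForConditionalGeneration.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, ADAPTER_DIR).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)

# Step 2: FP8_DYNAMIC (W8A8) quantization with llm-compressor.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",       # all Linear layers in the language model
    scheme="FP8_DYNAMIC",   # FP8 weights (per-channel) + dynamic per-token activations
    ignore=["lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],  # assumed names
)
oneshot(model=merged, recipe=recipe)

# llm-compressor saves the quantized model in compressed-tensors format.
merged.save_pretrained("entropy-v1-fp8")
tokenizer.save_pretrained("entropy-v1-fp8")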

Hardware notes (vLLM):

  • Hopper/Ada/Blackwell-class NVIDIA GPUs can execute FP8 efficiently.
  • Other GPUs may fall back to less optimized modes.

Throughput (vLLM)

Measured on a single NVIDIA RTX PRO 6000 Blackwell 96GB using vllm/vllm-openai:v0.11.2 with random prompts:

  • Input length: 512 tokens
  • Output length: 256 tokens
max_concurrency   output tok/s   total tok/s
1                 25.87          77.51
20                412.60         1236.20
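
Those figures come from vLLM's random-prompt benchmarking tooling. As a rough sanity check against a running server (not the benchmark behind the table), a small concurrent probe like this can estimate output tokens/s:

# Fires CONCURRENCY identical requests at once and reports aggregate output
# tokens per second from the usage stats. Illustrative only.
import asyncio
import time
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://127.0.0.1:8000/v1", api_key="sk-noop")
CONCURRENCY = 20
PROMPT = "Polish this AI passage to feel more human:\n" + "filler sentence. " * 60

async def one_request() -> int:
    resp = await client.chat.completions.create(
        model="entropy-v1-fp8",
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=256,
        temperature=0.7,
    )
    return resp.usage.completion_tokens

async def main() -> None:
    start = time.perf_counter()
    counts = await asyncio.gather(*(one_request() for _ in range(CONCURRENCY)))
    elapsed = time.perf_counter() - start
    print(f"output tok/s: {sum(counts) / elapsed:.1f}")

asyncio.run(main())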

Limitations / Misuse

  • Trained primarily on literary/public-domain style passages; performance may vary on technical/legal writing.
  • Like other "humanizer" models, it can be misused for deceptive purposes. Use responsibly and follow applicable policies and disclosure norms.

Citation

If you use this model in research, please cite:

@misc{entropy_v1_fp8,
  title  = {Entropy v1 FP8 (Gemma 3 27B IT)},
  author = {ysong21},
  year   = {2026},
  note   = {Merged + FP8_DYNAMIC quantized checkpoint for AI-to-human rewriting.}
}