Entropy v1 FP8 (Gemma 3 27B IT)
Entropy v1 FP8 is a merged + FP8-quantized checkpoint based on google/gemma-3-27b-it, fine-tuned to rewrite AI-polished text into more human-sounding prose while preserving meaning.
This repo is intended for efficient inference in vLLM without runtime LoRA.
What It Does
Given an AI-sounding passage, the model rewrites it to be:
- More human and textured (less generic "professional polish")
- More varied in rhythm/word choice
- Meaning-preserving (style change, not content change)
Prompt Trigger (Recommended)
This is the pattern used in our fine-tuning data. Keep the passage after a newline.
```text
Polish this AI passage to feel more human:
{passage}
```
Short variants that usually work similarly:
- `Rephrase this AI passage to feel more human:\n{passage}`
- `Convert this AI passage into a more human-sounding version:\n{passage}`
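For programmatic use, a minimal sketch of building the recommended prompt in Python (the `passage` value is just a placeholder):

```python
# Build the rewrite prompt exactly as in the fine-tuning data:
# the trigger sentence, a single newline, then the passage.
passage = "This is a highly polished paragraph that sounds generic and overly smooth..."
prompt = f"Polish this AI passage to feel more human:\n{passage}"
```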
How To Run (vLLM)
1) Start an OpenAI-compatible server
```bash
vllm serve ysong21/entropy-v1-fp8 \
  --served-model-name entropy-v1-fp8 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype bfloat16 \
  --max-model-len 8192
```
Notes:
- This checkpoint is already quantized (compressed-tensors FP8_DYNAMIC). You do not need to pass `--quantization fp8`.
- FP8 execution is hardware-dependent; see "Quantization" below.
2) Send a request
```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-noop' \
  -d '{
    "model": "entropy-v1-fp8",
    "messages": [
      {
        "role": "user",
        "content": "Polish this AI passage to feel more human:\nThis is a highly polished paragraph that sounds generic and overly smooth..."
      }
    ],
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 512
  }'
```
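The same request through the official `openai` Python client (`pip install openai`); the base URL, model name, and sampling parameters mirror the server command above, and the passage is a placeholder:

```python
from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="sk-noop")

passage = "This is a highly polished paragraph that sounds generic and overly smooth..."

resp = client.chat.completions.create(
    model="entropy-v1-fp8",
    messages=[
        {"role": "user", "content": f"Polish this AI passage to feel more human:\n{passage}"},
    ],
    temperature=0.7,
    top_p=0.95,
    max_tokens=512,
)
print(resp.choices[0].message.content)
```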
Validation Benchmark (70 Gutenberg Examples)
We evaluate by computing the conditional negative log-likelihood of the target (human) rewrite given the prompt, and report character-normalized bits_per_char:
- Let `NLL` be the sum of token NLL over the target rewrite (teacher-forced).
- Let `C` be the number of characters in the target rewrite.
- `bits_per_char = (NLL / C) / ln(2)`
This char-normalization makes the score more comparable across models/tokenizers than token-based perplexity.
Lower is better.
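A minimal sketch of the metric, assuming a Hugging Face causal LM and masking prompt tokens so only the target rewrite contributes to the NLL. Function and variable names are illustrative, not the exact evaluation script, and in practice the prompt is the chat-templated user message:

```python
import math
import torch

def bits_per_char(model, tokenizer, prompt: str, target: str) -> float:
    """Teacher-forced NLL of `target` given `prompt`, normalized per character."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(target, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)

    # Mask prompt positions so the loss is computed over target tokens only.
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100

    with torch.no_grad():
        out = model(input_ids=input_ids, labels=labels)

    # out.loss is the mean NLL (in nats) over target tokens; recover the sum.
    nll_nats = out.loss.item() * target_ids.shape[1]

    # bits_per_char = (NLL / C) / ln(2), with C = number of target characters.
    return (nll_nats / len(target)) / math.log(2)
```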
Results
Baseline for relative improvement: N8Programs/Unslopper-30B-A3B-bf16.
| System | bits_per_char (↓) | Relative improvement vs Unslopper (↑) |
|---|---|---|
| Entropy v1 FP8 (this repo) | 0.35994 | +4.07% |
| N8Programs/Unslopper-30B-A3B-bf16 | 0.37522 | +0.00% |
| Base google/gemma-3-27b-it | 0.99565 | -165.35% |
Interpretation:
- Entropy v1 FP8 achieves the best bits_per_char on this 70-example Gutenberg validation set.
Quantization (Merged FP8_DYNAMIC)
This checkpoint is produced in two steps:
- Merge: a PEFT LoRA adapter is merged into the base Gemma 3 27B IT weights (no runtime LoRA).
- Quantize: we apply FP8_DYNAMIC (W8A8) quantization with `llm-compressor`:
  - Targets: all `Linear` layers in the language model
  - Weights: FP8, static per-channel scaling
  - Activations: FP8, dynamic per-token scaling
  - Ignored: `lm_head` and the Gemma 3 vision tower (left in BF16)
The model is saved in a vLLM-loadable compressed-tensors format.
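A sketch of that two-step flow with `peft` and `llm-compressor`. The adapter path, model class, and ignore patterns are illustrative assumptions, and exact arguments vary between `llm-compressor` releases:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

BASE = "google/gemma-3-27b-it"
ADAPTER = "path/to/entropy-lora-adapter"  # hypothetical adapter location
OUT = "entropy-v1-fp8"

# 1) Merge the LoRA adapter into the base weights (no runtime LoRA).
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(BASE)

# 2) FP8_DYNAMIC (W8A8): FP8 weights with per-channel scales, FP8 activations
#    with dynamic per-token scales; keep lm_head and the vision tower in BF16.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head", "re:vision_tower.*"],
)
oneshot(model=model, recipe=recipe)

# 3) Save in a vLLM-loadable compressed-tensors format.
model.save_pretrained(OUT, save_compressed=True)
tokenizer.save_pretrained(OUT)
```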
Hardware notes (vLLM):
- Hopper/Ada/Blackwell-class NVIDIA GPUs can execute FP8 efficiently.
- Other GPUs may fall back to less optimized modes.
Throughput (vLLM)
Measured on a single NVIDIA RTX PRO 6000 Blackwell 96GB using vllm/vllm-openai:v0.11.2 with random prompts:
- Input length: 512 tokens
- Output length: 256 tokens
| max_concurrency | output tok/s | total tok/s |
|---|---|---|
| 1 | 25.87 | 77.51 |
| 20 | 412.60 | 1236.20 |
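For a rough sanity check of these numbers, a concurrent client probe like the sketch below can be pointed at the running server. This is not the harness behind the table (which used vLLM's own random-prompt benchmarking), and the prompt is only a crude approximation of 512 input tokens:

```python
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://127.0.0.1:8000/v1", api_key="sk-noop")
PROMPT = "word " * 512          # crude stand-in for a ~512-token random prompt
CONCURRENCY, NUM_REQUESTS = 20, 100

async def one_request(sem: asyncio.Semaphore) -> int:
    async with sem:
        resp = await client.chat.completions.create(
            model="entropy-v1-fp8",
            messages=[{"role": "user", "content": PROMPT}],
            max_tokens=256,
        )
        return resp.usage.completion_tokens

async def main() -> None:
    sem = asyncio.Semaphore(CONCURRENCY)
    start = time.perf_counter()
    tokens = await asyncio.gather(*(one_request(sem) for _ in range(NUM_REQUESTS)))
    elapsed = time.perf_counter() - start
    print(f"output tok/s: {sum(tokens) / elapsed:.2f}")

asyncio.run(main())
```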
Limitations / Misuse
- Trained primarily on literary/public-domain style passages; performance may vary on technical/legal writing.
- Like other "humanizer" models, it can be misused for deceptive purposes. Use responsibly and follow applicable policies and disclosure norms.
Citation
If you use this model in research, please cite:
```bibtex
@misc{entropy_v1_fp8,
  title  = {Entropy v1 FP8 (Gemma 3 27B IT)},
  author = {ysong21},
  year   = {2026},
  note   = {Merged + FP8_DYNAMIC quantized checkpoint for AI-to-human rewriting.}
}
```