Following integrations with TorchAO, Transformers, and vLLM, AutoRound-quantized models are now officially compatible with SGLang, bringing faster and more flexible deployment to your LLM workflows.
We've also enhanced the RTN mode (--iters 0), significantly cutting quantization costs for low-resource users (see the sketch below).
Star our repo and stay tuned for more exciting updates!
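For quick reference, here is a minimal sketch of the RTN-style flow through the Python API; the model name, bit settings, and output directory are placeholders, and `iters=0` mirrors the `--iters 0` flag mentioned above.

```python
# Minimal sketch of RTN-mode quantization (iters=0) via the AutoRound Python API.
# The model name, bits/group_size, and output directory are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# iters=0 skips the tuning iterations and falls back to the cheaper RTN-style path.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize()
autoround.save_quantized("./qwen2.5-7b-int4-rtn", format="auto_round")
```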
AutoRound keeps evolving its LLM quantization algorithms! After enhancing W2A16 quantization, we now offer a fast algorithm that generates mixed bits/data-type schemes (~2 minutes for 8B models), great for MXFP4 and W2A16. Learn more: https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme
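A rough sketch of what driving the scheme search could look like from Python follows; the `AutoScheme` name and its `avg_bits`/`options` arguments are assumptions inferred from the linked guide, so treat the exact signature as illustrative and check the step-by-step doc for the real interface.

```python
# Illustrative only: AutoScheme and its arguments below are assumptions inferred
# from the step-by-step guide; consult the linked doc for the exact interface.
from auto_round import AutoRound, AutoScheme

model_name = "Qwen/Qwen3-8B"  # placeholder 8B model

# Search for a mixed bits/data-type recipe targeting ~2 average bits, choosing
# per layer between the listed options (e.g., MXFP4 and W2A16).
scheme = AutoScheme(avg_bits=2, options=("MXFP4", "W2A16"))
ar = AutoRound(model=model_name, scheme=scheme)
ar.quantize_and_save("./qwen3-8b-mixed")
```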
AutoRound v0.7 is out! This release includes enhanced algorithms for W2A16, NVFP4, and MXFP4, along with support for FP8 models as input. Check out the full details here: https://github.com/intel/auto-round/releases/tag/v0.7.0
We're excited to announce that AutoRound now supports:
- GGUF format export: seamless compatibility with popular inference engines.
- Custom bit settings: tailor quantization to your needs for optimal performance.
A usage sketch follows the model list below.
Check out these newly released models:
- Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound
- Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
- Intel/Kimi-K2-Instruct-gguf-q2ks-mixed-AutoRound
Stay tuned! An even more advanced algorithm for some configurations is coming soon.
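As referenced above, here is a minimal sketch combining per-layer bit overrides with GGUF export; the model name, the overridden layer, and the output path are placeholders, and the `layer_config` argument and `gguf:q4_k_m` format string follow the conventions in the AutoRound README.

```python
# Minimal sketch: custom per-layer bits plus GGUF export with AutoRound.
# Model name, the overridden layer, and the output path are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3-8B"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Custom bit settings: keep a sensitive layer at higher precision.
layer_config = {"lm_head": {"bits": 8}}

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, layer_config=layer_config)
autoround.quantize()
# Export in GGUF format (q4_k_m shown here) for llama.cpp-style engines.
autoround.save_quantized("./qwen3-8b-gguf-q4km", format="gguf:q4_k_m")
```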
AutoRound (https://github.com/intel/auto-round) has been integrated into vLLM, allowing you to run AutoRound-formatted models directly in the upcoming release.
Besides, we strongly recommend using AutoRound to generate AWQ INT4 models, as AutoAWQ is no longer maintained and manually configuring new models is not trivial due to the need for custom layer mappings.
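For illustration, a minimal sketch of offline inference on an AutoRound-quantized checkpoint with vLLM; the model ID is a placeholder, and this assumes a vLLM release that includes the AutoRound integration.

```python
# Minimal sketch: offline inference on an AutoRound-formatted model with vLLM.
# The model ID is a placeholder; swap in any AutoRound-quantized checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="OPEA/Qwen2.5-7B-Instruct-int4-sym-inc")  # placeholder AutoRound model
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain weight-only quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

For the AWQ recommendation, AutoRound's export path can also emit AWQ-style INT4 checkpoints (the README lists an `auto_awq` export format), which vLLM's existing AWQ kernels can then serve.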
AutoRound (https://github.com/intel/auto-round) has been integrated into Transformers, allowing you to run AutoRound-formatted models directly in the upcoming release. Additionally, we are actively working on supporting the GGUF double-quant format (e.g., q4_k_s), so stay tuned!
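Below is a minimal sketch of loading an AutoRound-formatted checkpoint directly through Transformers, assuming a release that ships the integration; the model ID is a placeholder.

```python
# Minimal sketch: load an AutoRound-formatted model directly with Transformers.
# Assumes a Transformers release with the AutoRound integration and the auto-round
# package installed; the model ID below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/Qwen2.5-7B-Instruct-int4-sym-inc"  # placeholder AutoRound checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, AutoRound!", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))
```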
Check out the DeepSeek-R1 INT2 model (OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200 GB DeepSeek-R1 model shows only about a 2% drop in MMLU, though inference is quite slow due to a kernel issue.