
angt posted an update 5 days ago
I'm excited to share that https://installama.sh is up and running! πŸš€

On Linux / macOS / FreeBSD, getting started is easier than ever:
curl https://installama.sh | sh


And Windows just joined the party πŸ₯³
irm https://installama.sh | iex

Stay tuned for new backends on Windows!
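If the Windows installer honors the same MODEL environment variable as the Unix script (an assumption; the posts below only demonstrate it with sh), pre-selecting a model might look like:

# Assumption: the Windows installer reads MODEL like the Unix one does.
$env:MODEL = "unsloth/Qwen3-4B-GGUF:Q4_0"
irm https://installama.sh | iex
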
angt posted an update 10 days ago
πŸš€ installama.sh update: Vulkan & FreeBSD support added!

The fastest way to install and run llama.cpp has just been updated!

We are expanding hardware and OS support to make local AI even more accessible. This includes:

πŸŒ‹ Vulkan support for Linux on x86_64 and aarch64.
😈 FreeBSD support (CPU backend) on x86_64 and aarch64 too.
✨ Lots of small optimizations and improvements under the hood.

Give it a try right now:
curl angt.github.io/installama.sh | MODEL=unsloth/Qwen3-4B-GGUF:Q4_0 sh
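Once it's installed, here is a quick sanity check of the local server (a minimal sketch, assuming installama.sh launches llama-server on its default port, 8080):

# Assumes llama-server is listening on its default port (8080).
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hi in one word."}]}'
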
angt posted an update 19 days ago
One command line is all you need...

...to launch a local llama.cpp server on any Linux box or any Metal-powered Mac πŸš€

curl angt.github.io/installama.sh | MODEL=unsloth/gpt-oss-20b-GGUF sh


Learn more: https://github.com/angt/installama.sh
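To check that the server actually came up (a minimal sketch, again assuming llama-server's default port 8080):

# Assumes the default llama-server port; returns a status once the model is loaded.
curl http://127.0.0.1:8080/health
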
hlarcher posted an update 4 months ago
GH200 cooking time πŸ§‘β€πŸ³πŸ”₯!

We just updated GPU-fryer 🍳 to run on the Grace Hopper Superchip (GH200), fully optimized for ARM-based systems!
With this release, we switched to cuBLASLt to support FP8 benchmarks. You can monitor GPU throttling, TFLOPS outliers, and HBM memory health, and make sure you get the most out of your hardware setup.
Perfect for stress testing and tuning datacenter GPUs.

Check it out on GitHub πŸ‘‰ https://github.com/huggingface/gpu-fryer
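Building it from source might look like this (a minimal sketch, assuming a standard Cargo layout since GPU-fryer is written in Rust; see the README for actual run options):

git clone https://github.com/huggingface/gpu-fryer
cd gpu-fryer
cargo build --release
# Binary name and run options are assumptions; check the project README for real usage.
./target/release/gpu-fryer
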
hlarcher posted an update 11 months ago
We are introducing multi-backend support in Hugging Face Text Generation Inference!
With the new TGI architecture, we can now plug in new modeling backends to get the best performance for the selected model and available hardware. This first step will soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron, and TPU).

We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned πŸ€—!

Check out the details: https://huggingface.co/blog/tgi-multi-backend
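For reference, spinning up TGI with the default backend looks like this (a minimal sketch; the model id is just an example, swap in any supported model):

# Launch TGI via Docker; the OpenAI-style API is then served on port 8080.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Qwen/Qwen2.5-7B-Instruct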