GPUStack

@GPUStack_ai

Manage GPU clusters for running LLMs https://github.com/gpustack/gpustack

gpustack.ai

Joined June 2024

30 Posts 124 Followers 73 Following

GPUStack

@GPUStack_ai

Apr 29

🚀GPUStack supports all Qwen3 models — on day 0! ✅ Mac/Windows/Linux (Apple/NVIDIA/AMD GPU) ✅ Mixed clusters via llama-box (llama.cpp) ✅ Scalable Linux clusters via vLLM + Ray Run Qwen3 anywhere — open-source & production-ready. #Qwen3 #GPUStack

Qwen

@Alibaba_Qwen

Apr 28

Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general…

GPUStack

@GPUStack_ai

Feb 2

Thank you Roman for the invitation. We are really happy to join the Dev room and have a great time and experience with the community!

Roman V Shaposhnik 🇨🇾🇺🇸 🏳️‍🌈

@rhatr

Feb 2

Really happy to have @GPUStack_ai present here at AI Devroom at #FOSDEM2025 They are really leading the “how do you inference on clusters of GPU” approach with some clever, Deepseek-level hacks

GPUStack

@GPUStack_ai

Feb 2

We are live at #FOSDEM2025 🔥! The first 10 people to repost this with a photo of GPUStack installed on their device can claim a T-shirt at room UB2.252A in ULB! #LowlevelAIEngineeringandHacking Deadline: 6PM today. #GPUStack #fosdem25

GPUStack

@GPUStack_ai

Jan 30

Thanks for the great work. To use the update in GPUStack, just set your llama-box backend version to v0.0.112. GPUStack will automatically download and use the new version for you.

Georgi Gerganov

@ggerganov

Jan 29

PSA: make sure to update your brew llama.cpp packages to enjoy major performance improvement for your llama.vim and llama.vscode FIMs brew install llama.cpp github.com/ggerganov/llam…

github.com

metal : use residency sets by ggerganov · Pull Request #11427 · ggml-org/llama.cpp

fix #10119 Using residency sets makes the allocated memory stay wired and eliminates almost completely the overhead observed in #10119. For example, on M2 Ultra, using 7B Q8_0 model the requests ar...

GPUStack

@GPUStack_ai

Jan 30

We are heading to Brussels for @fosdem. If you are there, come to the Low-Level AI Engineering and Hacking Dev Room and find us on Sunday. #FOSDEM2025

GPUStack

@GPUStack_ai

Jan 27

🚀 Want to run DeepSeek-R1 across Mac, Windows, and Linux with Apple, AMD, and Nvidia GPUs? Try GPUStack v0.5! No blind forced distribution - we auto-calculate resource needs and pick the optimal deployment. Flexibility meets power! 💪 #DeepSeekR1 #AMD #MacOS #GPUs

GPUStack

@GPUStack_ai

Dec 9

🚀 GPUStack 0.4.0 is here! Now with support for image generation & audio models, inference engine version management, offline support, and more. Ready to power your AI workflows like never before! Learn more here 👇 gpustack.ai/gpustack-v0-4-… #AI #LLMs #flux1

GPUStack

@GPUStack_ai

Nov 21

Looking forward to open source!

DeepSeek

@deepseek_ai

Nov 20

🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! 🔍 o1-preview-level performance on AIME & MATH benchmarks. 💡 Transparent thought process in real-time. 🛠️ Open-source models & API coming soon! 🌐 Try it now at chat.deepseek.com #DeepSeek

GPUStack

@GPUStack_ai

Nov 20

Want to run GPUStack in Docker? 🚀 Learn how to set up NVIDIA Container Runtime and effortlessly deploy GPUStack with Docker in this tutorial👇 gpustack.ai/how-to-set-up-… #LLM #GPU #NVIDIA

GPUStack

@GPUStack_ai

Nov 13

A step-by-step guide on how to use llama.cpp to convert and quantize GGUF models and upload them to Hugging Face.👇 gpustack.ai/convert-and-up…

GPUStack

@GPUStack_ai

Nov 6, 2024

Unlock the power of a private ChatGPT and knowledge base with @AnythingLLM + GPUStack! 🎉 Learn how to build your own AI assistant here: gpustack.ai/building-your-…

GPUStack

@GPUStack_ai

Nov 1, 2024

🚀 GPUStack 0.3.2 is out! Support for new reranker models: gte-multilingual-reranker-base and jina-reranker-v2-base-multilingual. Learn more here👇 github.com/gpustack/gpust…

Release Notes. Enhancements: Added support for the --cache-dir parameter to configure model cache independently. See issue #504. Added support for new reranker models: gte-multilingual-reranker-base...

Release 0.3.2 · gpustack/gpustack

Source: github.com

GPUStack

@GPUStack_ai

Oct 30, 2024

Want to build a RAG-Powered Chatbot with Chat, Embed, and Rerank endpoints entirely on your MacBook or anywhere? Just try github.com/gpustack/gpust… backed by llama.cpp. Thanks a lot to @ggerganov and the llama.cpp community for the great work.

GPUStack

@GPUStack_ai

Oct 25, 2024

🚀 GPUStack 0.3.1 is released, introducing support for Rerank models and API and Windows ARM64 devices. Learn more here 👇 gpustack.ai/introducing-gp… #AI #LLM #GenAI #OpenAI

GPUStack

@GPUStack_ai

Oct 17, 2024

Run @MistralAI's Ministral in GPUStack using vLLM or llama.cpp backend.

Mistral AI

@MistralAI

Oct 16, 2024

mistral.ai/news/ministrau…

GPUStack

@GPUStack_ai

Oct 17, 2024

Nemotron, a 70B instruct model customized by @nvidia from @AIatMeta's Llama 3.1. Let's try it with GPUStack!

NVIDIA AI Developer

@NVIDIAAIDev

Oct 15, 2024

Our Llama-3.1-Nemotron-70B-Instruct model is a leading model on the 🏆 Arena Hard benchmark (85) from @lmarena_ai. Arena Hard uses a data pipeline to build high-quality benchmarks from live data in Chatbot Arena, and is known for its predictive ability of Chatbot Arena Elo…

GPUStack reposted

Sebastian Raschka

@rasbt

Oct 13, 2024

Previously, RAG systems were the standard method for retrieving information from documents. However, if you are not repeatedly querying the same document, it may be more convenient and effective to just use long-context LLMs. For example, Llama 3.1 8B and Llama 3.2 1B/3B now…