
GPUStack

@GPUStack_ai

Manage GPU clusters for running LLMs https://github.com/gpustack/gpustack

🚀 GPUStack supports all Qwen3 models — on day 0!
✅ Mac/Windows/Linux (Apple/NVIDIA/AMD GPU)
✅ Mixed clusters via llama-box (llama.cpp)
✅ Scalable Linux clusters via vLLM + Ray
Run Qwen3 anywhere — open-source & production-ready. #Qwen3 #GPUStack
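Once a Qwen3 model is deployed, GPUStack serves it through an OpenAI-compatible API. A minimal sketch of calling it from Python, assuming a server at localhost, a deployed model named `qwen3-8b`, and an API key generated in the UI (all three are placeholders, not fixed values):

```python
import json
import urllib.request

def chat_payload(model: str, prompt: str, temperature: float = 0.6) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def post_chat(base_url: str, api_key: str, payload: dict) -> dict:
    """POST the payload to the server's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = chat_payload("qwen3-8b", "Explain MoE routing in two sentences.")
# post_chat("http://localhost", "your-api-key", payload) would send it.
```

Because the API is OpenAI-compatible, the official `openai` client also works by pointing its `base_url` at the GPUStack server.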

Introducing Qwen3! We release Qwen3 with open weights, our latest large language models, including 2 MoE models and 6 dense models ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general…

Thank you, Roman, for the invitation. We were really happy to join the Dev Room and had a great time with the community!

Really happy to have @GPUStack_ai present here at AI Devroom at #FOSDEM2025 They are really leading the “how do you inference on clusters of GPU” approach with some clever, Deepseek-level hacks

We are live at #FOSDEM2025 🔥! The first 10 people to repost this with a photo of GPUStack installed on their device can claim a T-shirt at room UB2.252A in ULB! #LowlevelAIEngineeringandHacking Deadline: 6 PM today. #GPUStack #fosdem25

Thanks for the great work. To use the update in GPUStack, just set your llama-box backend version to v0.0.112; GPUStack will automatically download and use the new version for you.
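In GPUStack the backend version is pinned on the model deployment itself. A sketch of what that pinned configuration amounts to; the field names here are assumptions for illustration, not the exact API schema (in practice you set the version in the GPUStack UI when deploying the model):

```python
def model_spec(name: str, source: str, backend: str, backend_version: str) -> dict:
    """Illustrative model-deployment spec that pins a backend version.

    With a version pinned, GPUStack downloads that exact backend release
    instead of using its bundled default.
    """
    return {
        "name": name,
        "source": source,
        "backend": backend,
        "backend_version": backend_version,
    }

spec = model_spec("qwen2.5-7b", "huggingface", "llama-box", "v0.0.112")
```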

We are heading to Brussels for @fosdem. If you are there, come to the Low-Level AI Engineering and Hacking Dev Room and find us on Sunday. #FOSDEM2025

🚀 Want to run DeepSeek-R1 across Mac, Windows, and Linux with Apple, AMD, and NVIDIA GPUs? Try GPUStack v0.5! No blind, forced distribution: GPUStack auto-calculates resource needs and picks the optimal deployment. Flexibility meets power! 💪 #DeepSeekR1 #AMD #MacOS #GPUs
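The resource calculation boils down to estimating how much VRAM a model needs before placing it on a GPU. A back-of-the-envelope version of that estimate (illustrative only, not GPUStack's actual scheduler math):

```python
def estimate_vram_gb(n_params_b: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GiB: weight bytes plus a fudge factor
    for KV cache and runtime buffers."""
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1024**3, 1)

# A 7B model at 4-bit quantization vs. full fp16:
q4 = estimate_vram_gb(7, 4)
fp16 = estimate_vram_gb(7, 16)
```

A 4-bit 7B model fits on an 8 GB GPU while the fp16 version needs roughly 16 GB, which is why a placement engine has to know both the quantization and the available VRAM per device; a real scheduler also accounts for KV cache growth with context length.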

🚀 GPUStack 0.4.0 is here! Now with support for image generation & audio models, inference engine version management, offline support, and more. Ready to power your AI workflows like never before! Learn more here 👇 gpustack.ai/gpustack-v0-4-… #AI #LLMs #flux1

Looking forward to the open-source release!

🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! 🔍 o1-preview-level performance on AIME & MATH benchmarks. 💡 Transparent thought process in real-time. 🛠️ Open-source models & API coming soon! 🌐 Try it now at chat.deepseek.com #DeepSeek

Want to run GPUStack in Docker? 🚀 Learn how to set up NVIDIA Container Runtime and effortlessly deploy GPUStack with Docker in this tutorial👇 gpustack.ai/how-to-set-up-… #LLM #GPU #NVIDIA

A step-by-step guide on how to use llama.cpp to convert and quantize GGUF models and upload them to Hugging Face.👇 gpustack.ai/convert-and-up…
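The pipeline in that guide boils down to three commands: convert, quantize, upload. Sketched here as argument lists you could hand to `subprocess.run`; the tool names (`convert_hf_to_gguf.py`, `llama-quantize`, `huggingface-cli`) match current llama.cpp and Hugging Face tooling, but check your checkout, since they have been renamed across versions:

```python
def convert_cmd(model_dir: str, outfile: str) -> list[str]:
    # llama.cpp's HF-to-GGUF converter, emitting an f16 GGUF
    return ["python", "convert_hf_to_gguf.py", model_dir,
            "--outfile", outfile, "--outtype", "f16"]

def quantize_cmd(infile: str, outfile: str, qtype: str = "Q4_K_M") -> list[str]:
    # llama.cpp's quantization tool; Q4_K_M is a common quality/size tradeoff
    return ["./llama-quantize", infile, outfile, qtype]

def upload_cmd(repo: str, path: str) -> list[str]:
    # Hugging Face CLI upload to a model repo
    return ["huggingface-cli", "upload", repo, path]

convert = convert_cmd("./Qwen2.5-7B-Instruct", "model-f16.gguf")
quantize = quantize_cmd("model-f16.gguf", "model-Q4_K_M.gguf")
```

Each list would be run in order, e.g. `subprocess.run(convert, check=True)`, from a llama.cpp checkout.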

Unlock the power of a private ChatGPT and knowledge base with @AnythingLLM + GPUStack! 🎉 Learn how to build your own AI assistant here: gpustack.ai/building-your-…

🚀 GPUStack 0.3.2 is out! Support for new reranker models: gte-multilingual-reranker-base and jina-reranker-v2-base-multilingual. Learn more here👇 github.com/gpustack/gpust…

Want to build a RAG-Powered Chatbot with Chat, Embed, and Rerank endpoints entirely on your MacBook or anywhere? Just try github.com/gpustack/gpust… backed by llama.cpp. Thanks a lot to @ggerganov and the llama.cpp community for the great work.
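The retrieve-then-rerank flow behind such a chatbot can be sketched without any server at all. Here a toy bag-of-words `embed` stands in for the real embeddings endpoint, and in a full pipeline a rerank model would reorder the retrieved top-k hits:

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real setup would call the
    # embeddings endpoint of the serving stack instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]  # a rerank model would then reorder these top-k hits

docs = ["llama.cpp runs GGUF models on CPUs and GPUs",
        "GPUStack schedules models across a GPU cluster",
        "bananas are rich in potassium"]
top = retrieve("how does GPUStack schedule GPU models", docs)
```

The chat step then feeds `top` into the prompt as context, which is the whole of RAG in miniature.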

🚀 GPUStack 0.3.1 is released, introducing support for rerank models and a rerank API, as well as Windows ARM64 devices. Learn more here 👇 gpustack.ai/introducing-gp… #AI #LLM #GenAI #OpenAI
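A rerank API takes a query plus candidate documents and returns scored indices. A sketch of the request body and of applying the response, following the Jina-style schema (`model`, `query`, `documents`, `top_n`; response items with `index` and `relevance_score`) that rerank APIs commonly use; treat the exact field names as assumptions and check the GPUStack docs:

```python
def rerank_payload(model: str, query: str, documents: list[str],
                   top_n: int = 3) -> dict:
    """Build a Jina-style rerank request body."""
    return {"model": model, "query": query,
            "documents": documents, "top_n": top_n}

def apply_rerank(documents: list[str], results: list[dict],
                 top_n: int = 3) -> list[str]:
    """Order documents by the relevance scores returned by the API."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ranked[:top_n]]

docs = ["GPUStack manages GPU clusters for LLM serving.",
        "Bananas are yellow."]
body = rerank_payload("jina-reranker-v2-base-multilingual",
                      "what does GPUStack do?", docs)
# A mocked response, shaped like the real one:
fake_results = [{"index": 0, "relevance_score": 0.92},
                {"index": 1, "relevance_score": 0.05}]
best = apply_rerank(docs, fake_results, top_n=1)
```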

Run @MistralAI's Ministral in GPUStack using the vLLM or llama.cpp backend.

Nemotron, a 70B instruct model customized by @nvidia from @AIatMeta Llama 3.1. Let's try it with GPUStack!

Our Llama-3.1-Nemotron-70B-Instruct model is a leading model on the 🏆 Arena Hard benchmark (85) from @lmarena_ai. Arena Hard uses a data pipeline to build high-quality benchmarks from live data in Chatbot Arena, and is known for its predictive ability of Chatbot Arena Elo…

GPUStack reposted

Previously, RAG systems were the standard method for retrieving information from documents. However, if you are not repeatedly querying the same document, it may be more convenient and effective to just use long-context LLMs. For example, Llama 3.1 8B and Llama 3.2 1B/3B now…

vLLM v0.6.3 just released! @vllm_project Enjoy a no-code experience with GPUStack.

GPUStack now supports ModelScope @MaaSAI42. Try it out!

GPUStack supports the Qwen2.5 series and Qwen2-VL. Give it a try! @Alibaba_Qwen
