Dmitry Noranovich

@javaeeeee1

Software developer with physics background, teacher, entrepreneur

Dmitry Noranovich reposted

New on the Anthropic Engineering blog: tips on how to build more efficient agents that handle more tools while using fewer tokens. Code execution with the Model Context Protocol (MCP): anthropic.com/engineering/co…


Dmitry Noranovich reposted

Inference engines turn trained models into a live, callable API. FastAPI works for simple models... until you need streaming, batching, or speculative decoding. And when your “model” is really multiple models stitched together, every API hop adds latency. That's why we built…

Dmitry Noranovich reposted

REFRAG from Meta Superintelligence Labs is a SUPER EXCITING breakthrough that may spark the second summer of Vector Databases! ☀️🏖️ REFRAG illustrates how Database Systems are becoming even more integral to LLM inference 🧬 By making clever use of how context vectors are…

Want to learn how to use precomputed, compressed vector representations of the data from a vector database instead of using the full text of retrieved documents? Listen to a podcast about REFRAG. REFRAG significantly reduces the amount of information the large language model…

I am SUPER EXCITED to publish the 130th episode of the Weaviate Podcast featuring Xiaoqiang Lin (@xiaoqiang_98), the lead author of REFRAG from Meta Superintelligence Labs! 🎙️🎉 Traditional RAG systems use vectors to retrieve relevant context, but then throw away the vectors,…
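
A rough illustration of the idea in these posts, not Meta's implementation: instead of passing every token of the retrieved text to the decoder, a REFRAG-style system passes one compressed vector per chunk of k tokens. In this minimal sketch, mean pooling is a hypothetical stand-in for REFRAG's learned chunk encoder, and the function name `compress_chunks`, the chunk size `k=16`, and the embedding sizes are all made-up values chosen only to show how the input length shrinks.

```python
import numpy as np


def compress_chunks(token_embeddings: np.ndarray, k: int) -> np.ndarray:
    """Map a (num_tokens, dim) array to one vector per chunk of k tokens.

    Hypothetical stand-in: mean pooling replaces REFRAG's learned chunk
    encoder, purely to show how the decoder's input length shrinks.
    """
    if k <= 0:
        raise ValueError("chunk size k must be positive")
    num_tokens = token_embeddings.shape[0]
    # Slice consecutive k-token chunks; the last chunk may be shorter.
    chunks = [token_embeddings[i:i + k] for i in range(0, num_tokens, k)]
    # One vector per chunk; in a real system these could be precomputed
    # and stored alongside the documents instead of being thrown away.
    return np.stack([chunk.mean(axis=0) for chunk in chunks])


rng = np.random.default_rng(0)
# Made-up sizes: 1,000 tokens of retrieved context, 64-dim embeddings.
retrieved = rng.normal(size=(1000, 64))
compressed = compress_chunks(retrieved, k=16)
print(retrieved.shape[0], "->", compressed.shape[0])  # 1000 -> 63
```

With k=16 the decoder would see 63 chunk vectors instead of 1,000 token embeddings. REFRAG's actual encoder, projection into the decoder's embedding space, and training are not modeled here.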


Dmitry Noranovich reposted

You can now run Qwen3-VL locally! 💜 Run the 235B variant for SOTA vision/OCR on 128GB unified memory (dynamic 4-bit). Includes our chat template fixes. Qwen3-VL-2B runs at ~40 t/s on 4GB RAM. Fine-tune & RL via Unsloth free notebooks & export to GGUF. docs.unsloth.ai/models/qwen3-vl
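
A back-of-the-envelope check on the memory figures in this post: model weights alone take roughly params × bits / 8 bytes. This is a sketch assuming uniform quantization and nominal parameter counts taken from the model names; the helper `weight_memory_gb` is hypothetical. A dynamic 4-bit scheme keeps some layers at higher precision, and the KV cache and runtime overhead add more, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Lower bound on weight memory in decimal gigabytes (1e9 bytes).

    Ignores the KV cache, activations, runtime buffers, and any layers that
    a "dynamic" quantization scheme keeps at higher precision.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts read off the model names (an assumption).
for name, params in [("Qwen3-VL-235B", 235e9), ("Qwen3-VL-2B", 2e9)]:
    print(f"{name}: {weight_memory_gb(params, 4):.1f} GB at 4 bits, "
          f"{weight_memory_gb(params, 16):.1f} GB at 16 bits")
```

At 4 bits the 235B weights come to about 117.5 GB, which suggests why 128GB of unified memory is close to the minimum for that variant; at 16 bits they would need about 470 GB. The 2B model's weights are about 1 GB at 4 bits, consistent with running it in 4GB of RAM.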

Dmitry Noranovich reposted

You can now run Qwen3-VL locally with Unsloth AI. 👇 Fine-tune & RL via free notebooks.

You can now run Qwen3-VL locally! 💜 Run the 235B variant for SOTA vision/OCR on 128GB unified memory (dynamic 4-bit). Includes our chat template fixes. Qwen3-VL-2B runs at ~40 t/s on 4GB RAM. Fine-tune & RL via Unsloth free notebooks & export to GGUF. docs.unsloth.ai/models/qwen3-vl