Dmitry Noranovich

@javaeeeee1

Software developer with physics background, teacher, entrepreneur

Dmitry Noranovich reposted

Inference engines turn trained models into a live, callable API. FastAPI works for simple models... until you need streaming, batching, or speculative decoding. And when your “model” is really multiple models stitched together, every API hop adds latency. That's why we built…

Dmitry Noranovich reposted

REFRAG from Meta Superintelligence Labs is a SUPER EXCITING breakthrough that may spark the second summer of Vector Databases! ☀️🏖️ REFRAG illustrates how Database Systems are becoming even more integral to LLM inference 🧬 By making clever use of how context vectors are…

Want to learn how to use precomputed, compressed vector representations of the data from a vector database instead of using the full text of retrieved documents? Listen to a podcast about REFRAG. REFRAG significantly reduces the amount of information the large language model…

I am SUPER EXCITED to publish the 130th episode of the Weaviate Podcast featuring Xiaoqiang Lin (@xiaoqiang_98), the lead author of REFRAG from Meta Superintelligence Labs! 🎙️🎉 Traditional RAG systems use vectors to retrieve relevant context, but then throw away the vectors,…

Dmitry Noranovich reposted

You can now run Qwen3-VL locally! 💜 Run the 235B variant for SOTA vision/OCR on 128GB unified memory (dynamic 4-bit). Includes our chat template fixes. Qwen3-VL-2B runs at ~40 t/s on 4GB RAM. Fine-tune & RL via Unsloth free notebooks & export to GGUF. docs.unsloth.ai/models/qwen3-vl

Dmitry Noranovich reposted

You can now run Qwen3-VL locally with Unsloth AI. 👇 Fine-tune & RL via free notebooks.

Dmitry Noranovich reposted

Gave a smol 🤏 intro to Agents using smolagents last Monday! Sharing the slides in case you're curious. They serve as a gentle first step into the Agents Course we developed at @huggingface 🫶🫶

Dmitry Noranovich reposted

Ah sure, that's magazine.sebastianraschka.com/p/the-big-llm-… I am also working on one where I am explaining DeltaNet in more detail.


NVIDIA GeForce RTX 5090 vs 4080 for AI (2025): VRAM, Bandwidth, Tensor Cores bestgpusforai.com/gpu-comparison…
