
Nikhil G

@nikhil_r_ghosh

inferring all over the place @anyscalecompute @raydistributed

Nikhil G reposted

The quality of this year’s @raydistributed summit agenda and speaker lineup is awesome🔥. Personally looking forward to these: physical AI @DrJimFan Terminal-Bench @Mike_A_Merrill PrimeIntellect & EnvHub for RL on LLMs @willccbb @johannes_hage Apple on LLM inference w/ Ray…


Nikhil G reposted

SkyRL now supports Megatron!

Training massive MoE models demands more than just ZeRO-3/FSDP sharding. The Megatron backend for SkyRL unlocks high-throughput training with:

✅ 5D parallelism (tensor + pipeline + context + expert + data)
✅ Efficient training for 30B+ MoEs

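As a back-of-envelope check on how those five parallelism degrees compose: tensor, pipeline, and context parallelism multiply into the model-parallel group, and the remaining GPUs form the data-parallel dimension (expert parallelism then partitions MoE experts within that group). The function below is illustrative arithmetic only, not SkyRL's or Megatron's actual API.

```python
def data_parallel_size(world_size: int, tp: int, pp: int, cp: int) -> int:
    """Compute the data-parallel degree left over after tensor (tp),
    pipeline (pp), and context (cp) parallelism claim their GPUs.

    Illustrative sketch: expert parallelism is not a separate factor
    here because it typically subdivides the data-parallel group.
    """
    model_parallel = tp * pp * cp
    assert world_size % model_parallel == 0, "degrees must divide world size"
    return world_size // model_parallel

# e.g. 128 GPUs with tp=4, pp=2, cp=2 leaves 8 data-parallel replicas
print(data_parallel_size(128, tp=4, pp=2, cp=2))
```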

Nikhil G reposted

Prefix cache-aware routing is now available in Ray 2.49 🚀 Scaling input token-heavy workloads (like multi-turn convos & agent loops) requires maintaining prefix cache hit rate across 100s of vLLM engine replicas, and PrefixCacheAffinityRouter makes it easy. Here’s how it…

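The core idea behind prefix-affinity routing can be sketched in a few lines: hash a fixed-length prefix of the prompt so that requests sharing context (the same conversation history or agent scaffold) land on the same replica, where the KV cache for that prefix is already warm. This is a toy consistent-assignment sketch, not the actual `PrefixCacheAffinityRouter` implementation in Ray.

```python
import hashlib

def route_by_prefix(prompt: str, num_replicas: int, prefix_len: int = 256) -> int:
    """Pick a vLLM replica index by hashing the prompt's leading chars.

    Toy sketch: requests whose first `prefix_len` characters match are
    routed to the same replica, so its prefix cache keeps serving them.
    """
    prefix = prompt[:prefix_len]
    digest = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_replicas

# Two turns of the same conversation share a long common prefix,
# so they hit the same replica:
history = "system: you are a helpful agent... " * 20
print(route_by_prefix(history + "user: turn 1", 8))
print(route_by_prefix(history + "user: turn 2", 8))
```

A production router also has to balance load when one prefix gets hot; the real implementation layers load-awareness on top of affinity.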

Nikhil G reposted

Very excited to see the Tinker release! @pcmoritz and I had a chance to experiment with the API. It does a nice job of providing flexibility while abstracting away GPU handling. Here's a simple example showing how to generate synthetic data and fine-tune a text-to-SQL model.…

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!…

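The "write training loops in Python on your laptop; we'll run them on distributed GPUs" shape can be sketched with a toy stand-in: the loop below looks like local code, while a backend object owns the weights and optimizer step. Every name here (`FakeBackend`, `forward_backward`, `optim_step`) is hypothetical and not Tinker's real API; the model is a one-parameter linear fit so the whole thing runs locally.

```python
class FakeBackend:
    """Stands in for the remote GPU service that holds model state."""

    def __init__(self, lr: float = 0.1):
        self.weight = 0.0  # toy one-parameter "model"
        self.lr = lr

    def forward_backward(self, x: float, y: float):
        # toy objective: loss = (w*x - y)^2, with its gradient in w
        pred = self.weight * x
        loss = (pred - y) ** 2
        grad = 2.0 * (pred - y) * x
        return loss, grad

    def optim_step(self, grad: float) -> None:
        self.weight -= self.lr * grad

# The "local" training loop: plain Python, no GPU handling in sight.
backend = FakeBackend()
for step in range(50):
    loss, grad = backend.forward_backward(x=1.0, y=3.0)
    backend.optim_step(grad)

print(backend.weight)  # converges toward 3.0
```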
