
Ziyu Wan

@raywzy1

Researcher @Microsoft. Previously @GoogleDeepMind | @Stanford | @RealityLabs | @CityUHongKong | @MSFTResearch | Tencent AI Lab.

great post!

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…

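From the summary above, the core idea can be sketched in a few lines: the student samples its own rollouts (on-policy, as in RL), while the teacher scores every token of those rollouts (dense supervision, as in SFT) via a reverse KL. This is only an illustrative numpy sketch under my own naming and shape conventions, not the referenced implementation:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def on_policy_distillation_loss(student_logits, teacher_logits):
    """Mean reverse KL(student || teacher) over the positions of a rollout
    the *student* sampled itself (on-policy, like RL), with the teacher
    supplying a dense per-token target (reward-dense, like SFT).
    Both inputs: (T, V) next-token logits along the sampled rollout."""
    p_s = softmax(student_logits)   # (T, V) student next-token probs
    p_t = softmax(teacher_logits)   # (T, V) teacher next-token probs
    kl = (p_s * (np.log(p_s) - np.log(p_t))).sum(axis=-1)  # (T,) per token
    return kl.mean()
```

The loss is zero when student and teacher agree, and every token position contributes a gradient signal, unlike sparse end-of-episode RL rewards.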


Want to know how to evaluate your VLM's spatial intelligence? We will present VLM4D this afternoon! #ICCV2025

Can VLMs really think in 4D (3D space + time)? 🤔 When a model can’t tell “left” from “right,” something’s missing. That’s why we built VLM4D — a benchmark for spatiotemporal reasoning, debuting at #ICCV2025 📅 Oct 21 | 🕒 3–5 PM | 📍Exhibit Hall I #798 vlm4d.github.io



Ziyu Wan reposted

Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs


Ziyu Wan reposted

I’ll be at ICCV! 🙋‍♂️Message me if you’re interested in joining us at Snap Research (Personalization Team) — we’re hiring research interns year-round in 🖼️Image editing 🎨Personalized generation 🤖Agentics 🧠VLMs for generation We also have full-time Research Scientist roles open!


👍👍👍

🚀Excited to share our recent research:🚀 “Learning to Reason as Action Abstractions with Scalable Mid-Training RL” We theoretically study how mid-training shapes post-training RL. The findings lead to a scalable algorithm for learning action…



Ziyu Wan reposted

RenderFormer, from Microsoft Research, is the first model to show that a neural network can learn a complete graphics rendering pipeline. It’s designed to support full-featured 3D rendering using only machine learning—no traditional graphics computation required. Learn more:…


Ziyu Wan reposted

“Everyone knows” what an autoencoder is… but there's an important complementary picture missing from most introductory material. In short: we emphasize how autoencoders are implemented—but not always what they represent (and some of the implications of that representation).🧵


Ziyu Wan reposted

Instead of blending colors along rays and supervising the resulting images, we project the training images into the scene to supervise the radiance field. Each point along a ray is treated as a surface candidate, independently optimized to match that ray's reference color.

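The contrast described above — alpha-blend first and supervise once per ray, versus supervising each ray sample independently against the ray's reference color — can be sketched as follows. Function names, the squared-error choice, and the per-point weighting are my own illustration, not the authors' exact formulation:

```python
import numpy as np

def volume_render_weights(densities, deltas):
    # standard alpha-compositing weights along one ray
    alpha = 1.0 - np.exp(-densities * deltas)                      # (N,)
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))  # (N,)
    return trans * alpha

def blended_image_loss(colors, densities, deltas, ref_rgb):
    # usual NeRF-style objective: blend along the ray, then compare
    # the single composited color against the reference pixel
    w = volume_render_weights(densities, deltas)
    blended = (w[:, None] * colors).sum(axis=0)                    # (3,)
    return float(((blended - ref_rgb) ** 2).sum())

def per_point_loss(colors, densities, deltas, ref_rgb):
    # the alternative described in the tweet: each sample is a surface
    # candidate, independently pulled toward the ray's reference color,
    # here weighted by its chance of being the visible surface
    w = volume_render_weights(densities, deltas)
    return float((w[:, None] * (colors - ref_rgb[None, :]) ** 2).sum())
```

In the per-point variant, every sample receives its own color target instead of sharing one blended residual, which is the sense in which the training images are "projected into the scene."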

simple but effective way to inject 3D info into your world simulator😉

✅Videos are 2D projections of our dynamic 3D world. 🧠But can video diffusion models implicitly learn 3D information through training on raw video data, without explicit 3D supervision? ❌Our answer is NO !!!



Ziyu Wan reposted

📢 The submission portal for #3DV2026 is LIVE on OpenReview 👉 openreview.net/group?id=3DV/2… Ready to ride the wave? 🌊 Deadline Aug 18 (≈ 5 weeks)—bring your best and make a splash!


Ziyu Wan reposted

awesome work by @jiacheng_chen_ and @sanghyunwoo1219 on 3D-grounded visual compositing (and nice demos!)

Introducing BlenderFusion: Reassemble your visual elements—objects, camera, and background—to compose a new visual narrative. Play the interactive demo: blenderfusion.github.io



🤯!!!

Let's test #Veo2 by dropping fruits into water! 🍓🍏🍎🫐 Blueberries first



“Introduction to Algorithms” is truly a phenomenal textbook! (Fun fact: back in high school, my laptop and this 1,300-page giant were inseparable desk buddies 😂😂😂)


MIT’s “Introduction to Algorithms,” published #otd in 1990, is the world’s most cited CS text, with 67K citations & over a million copies sold. bit.ly/3y1yMPR @mitpress



Ziyu Wan reposted

I grew up editing action movies with two vhs players & having to warn police we were filming with painted toy guns. My first VFX was frame by frame in photoshop from minidv. Kids growing up with advanced versions of this tech are going to do absolutely incredible things.


Ziyu Wan reposted

We discovered that imposing a spatio-temporal weight space via LoRAs on DIT-based video models unlocks powerful customization! It captures dynamic concepts with precision and even enables composition of multiple videos together!🎥✨


Inverse (direct G-buffer estimation) and forward rendering (no light transport simulation) using video diffusion model! Congratulations on the great work🥳🥳🥳

🚀 Introducing DiffusionRenderer, a neural rendering engine powered by video diffusion models. 🎥 Estimates high-quality geometry and materials from videos, synthesizes photorealistic light transport, enables relighting and material editing with realistic shadows and reflections


