Jason Y. Zhang

@jasonyzhang2

3D @ Google. PhD @CMU_robotics.

Jason Y. Zhang reposted

📢 SceneComp @ ICCV 2025 🏝️ 🌎 Generative Scene Completion for Immersive Worlds 🛠️ Reconstruct what you know AND 🪄 Generate what you don’t! 🙌 Meet our speakers @angelaqdai, @holynski_, @jampani_varun, @ZGojcic @taiyasaki, Peter Kontschieder scenecomp.github.io #ICCV2025


Photographic proof I can run a 90 minute half marathon 😉 If you haven't tried text-based image editing yet, you're missing out!

This is a group pic from our half marathon in SF. I want: - Golden Gate Bridge? - Wish you finished within 1.5 hours? - No bananas in your hand? No problem, Gemini 2.5 Flash transformed the photo in seconds! #Gemini Let's go, GRC! @tkipf @RuiqiGao @jasonyzhang2


Jason Y. Zhang reposted

Excited to introduce Genie 3, our general purpose world model that creates interactive, playable environments from any text prompt. It can generate dynamic worlds at 720p and 24 FPS, with each frame created in response to user actions in *real-time*.

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵



Jason Y. Zhang reposted

Our team at Google DeepMind Foundational Research has an opening for a full-time Research Scientist! Areas of Interest are Multimodal, 3D and Spatial Reasoning, Self-improving Agents. Looking for candidates with strong publications at top ML and CV conferences. Email:…


Last year, my ring bearer was a Skild robot. Excited to see how far they've come!!

Modern AI is confined to the digital world. At Skild AI, we are building towards AGI for the real world, unconstrained by robot type or task — a single, omni-bodied brain. Today, we are sharing our journey, starting with early milestones, with more to come in the weeks ahead.…



Jason Y. Zhang reposted

Modern AI is confined to the digital world. At Skild AI, we are building towards AGI for the real world, unconstrained by robot type or task — a single, omni-bodied brain. Today, we are sharing our journey, starting with early milestones, with more to come in the weeks ahead.…


Jason Y. Zhang reposted

For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world model etc] Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: liruilong.cn/prope/

Jason Y. Zhang reposted

Bolt3D is accepted to @ICCVConference 🥳 see you in Hawaii!

⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test time optimization. 🧵👇 (1/9)



Jason Y. Zhang reposted

image ⇒ video ⇒ 3D/4D I'm super excited to build the next generation of models that understand and can imagine the world like we do at SpAItial with amazing people. Sounds fun? We are hiring! spaitial.ai

🚀🚀🚀Announcing our $13M funding round to build the next generation of AI: 𝐒𝐩𝐚𝐭𝐢𝐚𝐥 𝐅𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧 𝐌𝐨𝐝𝐞𝐥𝐬 that can generate entire 3D environments anchored in space & time. 🚀🚀🚀 Interested? Join our world-class team: 🌍 spaitial.ai #GenAI #3DAI



Jason Y. Zhang reposted

Video, meet audio. 🎥🤝🔊 With Veo 3, our new state-of-the-art generative video model, you can add soundtracks to clips you make. Create talking characters, include sound effects, and more while developing videos in a range of cinematic styles. 🧵


Jason Y. Zhang reposted

Veo3 is out! deepmind.google/models/veo/ This model is awesome! It now generates audio as well as video. I'm really impressed by the background audio and music, and the synchronization of sound effects to the video. Try it out using Flow! labs.google/flow/about


Jason Y. Zhang reposted

Reference-powered Veo lets you go for walks in the Himalayas with your dog!


Jason Y. Zhang reposted

Here's a nice "proof without words": The sum of the squares of several positive values can never be bigger than the square of their sum. This picture helps make sense of how ℓ₁ and ℓ₂ norms regularize and sparsify solutions (resp.). [1/n]
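
The inequality in the post above can be checked numerically. What follows is a minimal sketch, not taken from the original thread: the helper names `sum_of_squares`, `square_of_sum`, and `cross_term_gap` are hypothetical and chosen only for illustration. It relies on the standard expansion (x_1 + ... + x_n)^2 = x_1^2 + ... + x_n^2 + 2 * (sum of x_i * x_j over i < j), so for positive values the gap between the two sides is exactly the cross terms, which are non-negative.

```python
from itertools import combinations


def sum_of_squares(values):
    """Return x_1^2 + ... + x_n^2 (the squared l2 norm)."""
    return sum(v * v for v in values)


def square_of_sum(values):
    """Return (x_1 + ... + x_n)^2 (the squared l1 norm for positive values)."""
    total = sum(values)
    return total * total


def cross_term_gap(values):
    """Return 2 * sum of x_i * x_j over all pairs i < j.

    This is the amount by which the square of the sum exceeds
    the sum of the squares.
    """
    return 2 * sum(a * b for a, b in combinations(values, 2))


example = [3, 4]
print(sum_of_squares(example))   # 25
print(square_of_sum(example))    # 49
print(cross_term_gap(example))   # 24
```

With positive inputs the cross terms vanish only when there is a single value, so the two sides coincide exactly when all the mass sits on one coordinate. This is one way to read the picture's connection to sparsity.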

