mats_cgo's profile picture.

Mats

@mats_cgo

Mats reposted

🚀 Introducing InstantSfM: Fully Sparse and Parallel Structure-from-Motion. ✅ Python + GPU-optimized implementation, no C++ anymore! ✅ 40× faster than COLMAP with 5K images on single GPU! ✅ Scales beyond 100 images (more than VGGT/VGGSfM can consume)! ✅ Support metric scale.

UUUUUsher's tweet image. 🚀 Introducing InstantSfM: Fully Sparse and Parallel Structure-from-Motion.
✅ Python + GPU-optimized implementation, no C++ anymore!
✅ 40× faster than COLMAP with 5K images on single GPU!
✅ Scales beyond 100 images (more than VGGT/VGGSfM can consume)!
✅ Support metric scale.
UUUUUsher's tweet image. 🚀 Introducing InstantSfM: Fully Sparse and Parallel Structure-from-Motion.
✅ Python + GPU-optimized implementation, no C++ anymore!
✅ 40× faster than COLMAP with 5K images on single GPU!
✅ Scales beyond 100 images (more than VGGT/VGGSfM can consume)!
✅ Support metric scale.
UUUUUsher's tweet image. 🚀 Introducing InstantSfM: Fully Sparse and Parallel Structure-from-Motion.
✅ Python + GPU-optimized implementation, no C++ anymore!
✅ 40× faster than COLMAP with 5K images on single GPU!
✅ Scales beyond 100 images (more than VGGT/VGGSfM can consume)!
✅ Support metric scale.

Mats reposted

To solve AGI, we must first solve Geoguessr For that I built vlm-gym, a simple RL gym written in scratch, in JAX for Qwen3VL-4B (released yesterday) And added Geospot, a RL environment for geolocation and learned VLMs can learn how to geoguess. More:


Mats reposted

Today we're introducing Skills in claude dot ai, Claude Code, and the API. Skills let you package specialized knowledge into reusable capabilities that Claude loads on demand as agents tackle more complex tasks. Here's how they work and why they matter for the future of agents:

alexalbert__'s tweet image. Today we're introducing Skills in claude dot ai, Claude Code, and the API.

Skills let you package specialized knowledge into reusable capabilities that Claude loads on demand as agents tackle more complex tasks.

Here's how they work and why they matter for the future of agents:

Mats reposted

Google Maps v2.0? 🌍 Here's @PlayCanvas streaming a scene with 2 BILLION Gaussians! 💪 Coming sooooon!


Mats reposted

🚀 PaddleOCR-VL is here! Introducing PaddleOCR-VL (0.9B) — the ultra-compact Vision-Language model that reaches SOTA accuracy across text, tables, formulas, charts & handwriting. Breaking the limits of document parsing!🌍 Powered by: • NaViT dynamic vision encoder • ERNIE…

PaddlePaddle's tweet image. 🚀 PaddleOCR-VL is here! 

Introducing PaddleOCR-VL (0.9B) — the ultra-compact Vision-Language model that reaches SOTA accuracy across text, tables, formulas, charts & handwriting. Breaking the limits of document parsing!🌍

Powered by:
• NaViT dynamic vision encoder
• ERNIE…

Mats reposted

⚡️Generating 3DGS scenes in 5 seconds on a single GPU⚡️ #FlashWorld enables ⚡️*fast*⚡️ (10~100x faster than previous methods) and 🔥*high-quality*🔥 3D world generation, from a single image or text prompt. Code: github.com/imlixinyang/Fl… Page: imlixinyang.github.io/FlashWorld-Pro…


Mats reposted

InstantSfM: Fully Sparse and Parallel Structure-from-Motion TLDR: InstantSfM is a fully sparse and parallel Structure-from-Motion pipeline. It leverages GPU acceleration to achieve up to 40× speedup over traditional methods like COLMAP, while maintaining or improving…


Mats reposted

make nanochat multimodal for < $10! this evening, i trained nanochatVL: via a projection model (llava-style) between SigLIP ViT and @karpathy nanochat to extend its understanding to images it's a huge wip rn, but have a few promising results! now i can finally sleep

Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,…

karpathy's tweet image. Excited to release new repo: nanochat!
(it&apos;s among the most unhinged I&apos;ve written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,…


Mats reposted

The model + resources are now on HuggingFace and GitHub so researchers can keep building and experimenting. More details here: blog.google/technology/ai/…


Mats reposted

FlashWorld: High-quality 3D Scene Generation within Seconds Contributions: • We introduce a dual-mode pretraining strategy built on a video diffusion model to train a multi-view diffusion model. This model is capable of operating in both MV-oriented and 3D-oriented modes. •…


Mats reposted

the team at @Alibaba_Qwen literally cooks 🤠 they released a VLM cookbook that shows you how to do various tasks from OCR to object grounding using Qwen3-VL 👏

mervenoyann's tweet image. the team at @Alibaba_Qwen literally cooks 🤠

they released a VLM cookbook that shows you how to do various tasks from OCR to object grounding using Qwen3-VL 👏

Mats reposted

Spatial representations are central to world models🌍 SuperDec is an extremely compact 3D scene representation (replacing millions of Gaussians with just a few hundred primitives) ideal for abstract reasoning and planning in 3D ➡️super-dec.github.io ✨Oral @ICCVConference

Are photorealistic representations all we need? In SuperDec, we turn millions of points into compact and modular abstractions made of just a few superquadrics!🧩 Try our code and get a compact representation of your favorite scene!🚀 👾: github.com/elisabettafede…



Mats reposted

Excited about this one. Direct result of customer feedback. We saw hundreds of chat apps being created, and now you can actually hook it up to OpenAI. x.com/magicpatterns/…

You can now connect your Magic Patterns design to Open AI in our new integrations tab. Building a chatbot or chat app is one of the most popular use cases, so we made it real.



Mats reposted

🥁🥁🥁

zeddotdev's tweet image. 🥁🥁🥁

Mats reposted

This paper shows a 2-brain design so a voice model can think and speak with near 0 delay. It scores 92.8% on a math speech test at 0 latency, and 82.5 on a dialogue test. Waiting for a full chain of thought slows replies, and mixing thinking and speaking in one model causes…

rohanpaul_ai's tweet image. This paper shows a 2-brain design so a voice model can think and speak with near 0 delay. 

It scores 92.8% on a math speech test at 0 latency, and 82.5 on a dialogue test.

Waiting for a full chain of thought slows replies, and mixing thinking and speaking in one model causes…

Mats reposted

🇨🇳 There Are More Robots Working in China Than the Rest of the World Combined They recorded a world record of 2mn+ industrial robots working in factories.

rohanpaul_ai's tweet image. 🇨🇳 There Are More Robots Working in China Than the Rest of the World Combined

They recorded a world record of 2mn+ industrial robots working in factories.

🤖🇨🇳 China’s added 295,000 industrial robots in 2024 and a stock above 2mn. While US added about 34,000, Germany 27,000, and the UK 2,500. “If we lose this, we do not have a future at Ford,” says Jim Farley, CEO at Ford Robot intensity is also way higher in China, with 567…

rohanpaul_ai's tweet image. 🤖🇨🇳 China’s added 295,000 industrial robots in 2024 and a stock above 2mn. 

While US added about 34,000, Germany 27,000, and the UK 2,500.

“If we lose this, we do not have a future at Ford,” says Jim Farley, CEO at Ford

Robot intensity is also way higher in China, with 567…


Mats reposted

New LFM2 release 🥳 It's a Japanese PII extractor with only 350M parameters. It's extremely fast and on par with GPT-5 (!) in terms of quality. Check it out, it's available today on @huggingface!


Mats reposted

Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,…

karpathy's tweet image. Excited to release new repo: nanochat!
(it&apos;s among the most unhinged I&apos;ve written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,…

Mats reposted

Search every version of your Revenue DAX definitions across all PBIX files in sub-second time with DuckDB and the new pbix2vpax() scalar function

igocrite's tweet image. Search every version of your Revenue DAX definitions 
across all PBIX files 
in sub-second time 
with DuckDB and the new pbix2vpax() scalar function

Mats reposted

I made a RL policy that guesses where a picture was taken without GPS data It continuously learns, updating its weights with every use in realtime -- over the weekend it improved 13.9% with <100 images Best of all, it does this without ever storing any image data, link below

made an app that guesses where you are in the world with just a picture using image embeddings trained on street view data first time using swiftui, consumer apps in general, TestFlight below



Loading...

Something went wrong.


Something went wrong.