Robert Musser

@r_o_b_e_r_t_1

I like cool stuff.

Robert Musser reposted

OPEN-SOURCES ALL MODEL CHECKPOINTS AND TRAINING LOGS REFERENCED IN THE PAPER
HF: huggingface.co/collections/ji…
PAPER: arxiv.org/abs/2511.03276
MEGADLMS: github.com/JinjieNi/MegaD…


Robert Musser reposted

Diffusion Language Models are Super Data Learners… now on arXiv with MegaDLMs, the full large-scale training framework (6.1K H100s, 462B-param run, 47% MFU). Supports diffusion and autoregressive LMs, dense and MoE architectures, FP8/BF16/FP16 precision, and multi-axis…

gm8xx8's tweet image.

Robert Musser reposted

vLLM Sleep Mode 😴 → ⚡ Zero-reload model switching for multi-model serving. Benchmarks: 18–200× faster switches and 61–88% faster first inference vs cold starts. Explanation Blog by @EmbeddedLLM 👇
Why it’s fast: we keep the process alive, preserving the allocator, CUDA graphs,…

vllm_project's tweet image.

Robert Musser reposted

Also out: R-HORIZON. It composes interdependent chains across math, code, and agent tasks to test real long-horizon reasoning. Top models degrade rapidly as horizon grows: DeepSeek-R1 falls from 87.3% to 24.6% at 5 linked problems, R1-Qwen-7B drops from 93.6% to 0% at 16.…

gm8xx8's tweet image.

LongCat isn’t a one-off. It’s part of a deep, coordinated research push that’s been unfolding quietly inside Meituan, the same research platform behind Flash-Thinking for large-scale reasoning, CodePlot-CoT for visual math, M4V for multimodal diffusion, BNPO for stable…

gm8xx8's tweet image.


Robert Musser reposted

a periodic reminder: there 👏 is 👏 no 👏 such 👏 thing 👏 as 👏 private, end-to-end encryption 👏 with 👏 meta-ai-in-the-middle 👏

suchenzang's tweet image.

the illusion of security continues...

suchenzang's tweet image.


Robert Musser reposted

Managers have been vibe coding forever by @thekitze :D

lukasz_app's tweet image.

Robert Musser reposted

Demystifying Synthetic Data paper: arxiv.org/abs/2510.01631
Data Mixing & Phase Transitions paper: arxiv.org/abs/2505.18091


Robert Musser reposted

Meta just ran one of the largest synthetic-data studies (over 1000 LLMs, more than 100k GPU hours). Result: mixing synthetic and natural data only helps once you cross the right scale and ratio (~30%). Small models learn nothing; larger ones suddenly gain a sharp threshold…

gm8xx8's tweet image.

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
A CLEAN, FORMAL BREAKDOWN OF WHY YOUR 7B LLM LEARNS NOTHING FROM HIGH-QUALITY DATA
This paper reveals phase transitions in factual memorization…

gm8xx8's tweet image.


Robert Musser reposted
SwiftOnSecurity's tweet image.

Robert Musser reposted

I've been researching the Microsoft cloud for almost 7 years now. A few months ago that research resulted in the most impactful vulnerability I will probably ever find: a token validation flaw allowing me to get Global Admin in any Entra ID tenant. Blog: dirkjanm.io/obtaining-glob…


Robert Musser reposted

I just hope they infected "left-pad", "is-number", "is-odd", "is-even" packages

vctrstrm's tweet image.

Robert Musser reposted

It’s pretty fucking straightforward here! thebulwark.com/p/stock-tradin…

SonnyBunch's tweet image.

Robert Musser reposted

It’s genuinely insane that it hasn’t been shut down yet. Like, just open flouting of a law passed by Congress, signed by a president, and upheld 9-0 by the Supreme Court.

There has never been a better time to ban TikTok.



Robert Musser reposted

> be Google in 2017
> small team drops “Attention Is All You Need” on arXiv
> execs nod politely, go back to selling ads for socks
> let Transformer gather dust for 5 yrs like a vintage Beanie Baby
> be Noam Shazeer, OG wizard
> quits, builds AI-boyfriend app…

zephyr_z9's tweet image.

Robert Musser reposted

All #OrangeCon2025 talks are now online! Watch them on our YouTube channel: youtube.com/@OrangeCon


Robert Musser reposted

> be vibe coder
> 2025: “I'm gonna vibe-code the next unicorn, it's a billion-dollar vibe, bro”
> grab xxx.ai, .ai costs more than my rent
> subscribe to every tool in existence, $1000 bucks gone
> tweet demo GIF, caption “built in 3 hours, no cap”, await…

crystalsssup's tweet image.

Robert Musser reposted

There's a sick linenoise article by @iximeow in @phrack 71 called "Learning An ISA By Force Of Will", where ixi goes from unknown binary blob, to manual instruction decoding, to figuring out control flow, and gives a critique of the RE'd ISA. phrack.org/issues/71/3#ar…

How do you program an unknown CPU? The original specs are gone; no compilers exist, and the ISA is completely unrecognized. It happens more often than you think, behind very closed doors. It's almost always military hardware.

lauriewired's tweet image.


Robert Musser reposted

The Great Firewall of China (GFW) today experienced the largest internal document leak in its history. More than 500GB of source code, work logs, and internal communications have been exposed, revealing details about the development and operation of the GFW. The leak originated…

gfw_report's tweet image.

Robert Musser reposted

Exciting times. I'm publishing Dittobytes today after presenting it at @OrangeCon_nl ! Dittobytes is a true metamorphic cross-compiler aimed at evasion. Use Dittobytes to compile your malware. Each compilation produces unique, functional shellcode. github.com/tijme/dittobyt…

