
Tensor-Slayer

@TensorSlay

張量殺手

Pinned

Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement: areu01or00.github.io/Tensor-Slayer.…


Codex and CC helping me refactor codebase


Sonnet 4.5 has been nerfed.


Tensor-Slayer reposted

Today is Kimi's second birthday. Repost this tweet…


Tweet sent from Android-Use. Model used: Gemini Flash 2.0. Tokens used: 800


Experimental repo: github.com/areu01or00/And… Use models like 4o-mini or Flash 2.0 to control your Android devices. I had an old Android phone and wanted to do something fun.

LMAO THIS SHIT WORKS





Scary number of startups built over the last 2-3 years are one dev day away from getting obsolete.


Sonnet 4.5 about gpt-5-codex (medium)


Tensor-Slayer reposted

🎯 Milestone Unlocked: I’m excited to share that I’ve completed the “Scratch to Scale: Large-Scale Training in the Modern World” course by @TheZachMueller on Maven! Scratch to Scale has been one of the most practical and insightful courses I’ve taken — it goes far beyond…


Damn GoodFire team arrived and how


This is cool. Do read.

Agents for experimental research != agents for software development. This is a key lesson we've learned after several months refining agentic workflows! More takeaways on effectively using experimenter agents + a key tool we're open-sourcing to enable them: 🧵



Tensor-Slayer reposted

(0) Scaling AI often lets you bypass engineering solutions to a problem. A bitter lesson! (1) It doesn’t let you bypass designing a careful problem specification. There’s no free lunch. (2) But scale can raise the level of abstraction at which you can define your problem. DSPy.


Tensor-Slayer reposted

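Hardware-software co-design speeds up deployment: DeepSeek-V3.2-Exp teams up with TileLang to open a new cycle for domestic AI. Editor's note: Remember the article we shared last time, on how DeepSeek's optimizations were the starting point for coordination between domestic AI and semiconductors? This time, more of it is starting to land.…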


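DeepSeek's UE8M0 FP8 optimization: a strategic turning point for coordination between domestic AI and semiconductors. In the race to accelerate AI training and inference, floating-point numbers (Floating…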



Is this a high signal


Tensor-Slayer reposted

The most cited paper of the 21st century is on deep residual learning with residual connections. Who invented this? Timeline: ★ 1991: @HochreiterSepp solves vanishing gradient problem through recurrent residual connections (weight 1.0) ★ 1997 LSTM: plain recurrent residual…


Task: Scale test-time compute
Prompt: Write a poem about Balrogs saving Morgoth from Ungoliant as if it was written by JRR Tolkien
Result: Patched model being steered in desired direction, demonstrated below.
---------------------------------------------------------
No…

Hacking model architecture with @DSPyOSS + GEPA. This method, in contrast to techniques like MIPROv2 and GEPA, modifies the architecture/parameters of the model to steer the model in the right direction. For instance, see the classic "How many r's" question to a tiny model - Qwen3…





Me digging myself out of veRL dependency hell : Don’t be difficult


Tensor-Slayer reposted

Time to lock in and revamp for cohort 2 🫡

A few highlights:
- Less speakers (more code focused workshops)
- More TorchTitan (hands on with a framework for each implementation)
- Analyzing torch profile traces (see our results)
- My practical FP8 updates (how does one FP8 well)


^_^

oLLM: a lightweight Python library for LLM inference built on top of transformers 🔥 Run qwen3-next-80B, GPT-OSS, Llama3, on consumer hardware. Awesome work by Anuar!


