
Tensor-Slayer

@TensorSlay

張量殺手 (Tensor Slayer)

Pinned

Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement: areu01or00.github.io/Tensor-Slayer.…
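Since the pinned write-up is about patching model weights directly on disk, a minimal sketch of the core idea may help. This is not the write-up's actual tooling; the file path and tensor name below are hypothetical.

```python
# Minimal sketch of direct tensor manipulation: patch a model's weights
# on disk, no training loop involved. Path and tensor name are hypothetical.
from safetensors.torch import load_file, save_file

path = "model.safetensors"
tensors = load_file(path)

name = "model.layers.10.mlp.down_proj.weight"  # hypothetical target tensor
tensors[name] = tensors[name] * 1.05           # e.g. scale it by 5%

save_file(tensors, path)
```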


> …by US company
> base : Deepseek

Today, we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals, the model performs competitively with frontier closed and open models, while being ahead of any US open model (such as the best versions of…



grifter mucus bullying should be a competitive sport



Tensor-Slayer reposted
SchmidhuberAI: [image]

AI is compression and correlation



🤗

Use your favourite AI coding agent to create AI frames. What if you could connect everything (your PDFs, videos, notes, code, and research) into one seamless flow that actually makes sense? AI-Frames: Open Source Knowledge-to-Action Platform: timecapsule.bubblspace.com ✨ Annotate •…




python is good enough 90% of the time



Is he Chonky?

multimodal foundation agent



I just think fp16 is better than bf16
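For context on the take, a quick sketch of the actual tradeoff: fp16 spends its bits on mantissa (10 bits vs. bf16's 7), so it is more precise near 1.0, while bf16 keeps fp32's exponent range and survives large values. Neither is simply "better".

```python
# fp16 vs. bf16: precision vs. range.
import torch

x = torch.tensor(1.0 + 1/512)
print(x.to(torch.float16))    # 1.0020: exact, fp16 has 10 mantissa bits
print(x.to(torch.bfloat16))   # 1.0: rounded away, bf16 has only 7

big = torch.tensor(70000.0)
print(big.to(torch.float16))  # inf: fp16 max is ~65504
print(big.to(torch.bfloat16)) # 70144: bf16 keeps fp32's exponent range
```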

This post is unavailable.

Tensor-Slayer reposted
> its here

seconds0.substack.com/p/heres-whats-…

I am so close to shipping this



Tensor-Slayer reposted

To find layers most responsible for attention sinks, we set the V vector of the sink token to be 0 at particular layers, so that there is no update from the sink token at that layer.

Unexpected findings:

- Zeroing out layer 0 lowers attention to token 0 by half, but did not…

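The ablation described above is easy to reproduce at small scale. A minimal sketch, assuming GPT-2's fused QKV projection (`attn.c_attn`) as the hook point; this is not the authors' code, and the layer and measurement choices are illustrative:

```python
# Zero the sink token's V vector at one layer, then compare attention
# to token 0 at a later layer. Assumes GPT-2's c_attn emits [Q | K | V].
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
H = model.config.n_embd  # c_attn projects to 3*H channels

def zero_sink_v(layer_idx):
    def hook(module, inputs, output):
        out = output.clone()
        out[:, 0, 2 * H:] = 0.0  # last third is V; token 0 is the sink
        return out
    return model.transformer.h[layer_idx].attn.c_attn.register_forward_hook(hook)

ids = tok("The quick brown fox jumps over the lazy dog",
          return_tensors="pt").input_ids
with torch.no_grad():
    base = model(ids, output_attentions=True).attentions
    handle = zero_sink_v(0)  # ablate the sink's value update at layer 0
    ablated = model(ids, output_attentions=True).attentions
    handle.remove()

layer = 5  # measure downstream attention mass on token 0
print("attn to token 0, base:   ", base[layer][0, :, :, 0].mean().item())
print("attn to token 0, ablated:", ablated[layer][0, :, :, 0].mean().item())
```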

Tensor-Slayer reposted

Breaking: we release a fully synthetic generalist dataset for pretraining, SYNTH, and two new SOTA reasoning models exclusively trained on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range.


Bubbles burst when they're consumer-facing. The "AI bubble" is largely limited to the B2B supply chain starting from ASML > TSMC > NVDA > Frontier labs. There is no scapegoat, hence the bubble will keep bubbling.

Is it because Bill Gates keeps making contradictory statements, or is it just that journalists pick out the bits that suit them? It's exhausting. One day Bill Gates says that soon we'll only have to work two days a week, and the next day he says we're actually in a giant bubble…



Tensor-Slayer reposted

Our assembly lessons are trending on @github! We have nearly 10k stars.


FFmpeg makes extensive use of hand-written assembly code for huge (10-50x) speed increases and so we are providing assembly lessons to teach a new generation of assembly language programmers. Learn more here: github.com/FFmpeg/asm-les…



This solidifies previous work showing that factual associations, and by extension memorisation, are properties of MLP layers. There are multiple ways you could manipulate this, which can lead to fact distortion, poisoning, or making the model learn new facts. No training. For example:

LLMs memorize a lot of training data, but memorization is poorly understood. Where does it live inside models? How is it stored? How much is it involved in different tasks? @jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7)

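Picking up the "For example:" above: one such no-training manipulation is a rank-one edit to an MLP down-projection, in the spirit of ROME. This is a toy illustration on a random matrix, not the paper's method; the key and value directions here are hypothetical stand-ins.

```python
# Toy rank-one edit to an MLP down-projection: make W map a chosen
# "key" activation k (what the MLP sees for some fact) to a chosen
# "value" direction v (what we want written into the residual stream).
import torch

d_mlp, d_model = 3072, 768
W = torch.randn(d_model, d_mlp) / d_mlp**0.5  # stand-in down-projection

k = torch.randn(d_mlp)    # hypothetical key activation for the fact
v = torch.randn(d_model)  # hypothetical target output direction

# Rank-one update: W' = W + (v - W k) k^T / (k^T k), so that W' k = v.
delta = torch.outer(v - W @ k, k) / (k @ k)
W_edited = W + delta

print(torch.allclose(W_edited @ k, v, atol=1e-4))  # True: fact "rewritten"
```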


It's really weird too. Libraries like smolagents' CodeAgent have been thriving since December 2024.

So it seems Anthropic just rediscovered CodeAct 😄 arxiv.org/abs/2402.01030

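For anyone who hasn't seen CodeAct: the whole idea is that the model emits executable Python as its action instead of a JSON tool call. A minimal sketch, with `llm()` as a hypothetical stand-in for any chat-completion call:

```python
# Code-as-action in miniature: the model's output is Python, and the
# harness executes it. `llm()` is a hypothetical stand-in model call.
def llm(prompt: str) -> str:
    # Hypothetical model output: an executable action, not a JSON blob.
    return "result = sum(x * x for x in range(10))"

def run_code_action(task: str) -> dict:
    action = llm(f"Write Python to solve: {task}")
    namespace: dict = {}
    exec(action, {}, namespace)  # real systems sandbox this step
    return namespace

print(run_code_action("sum of squares below 10"))  # {'result': 285}
```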


Tensor-Slayer reposted

Kimi-K2-Thinking with the same pricing as Kimi-K2 *cough* (looking at you OpenAI and Google you greedy piggies) *cough*

BREAKING 🚨: @Kimi_Moonshot is preparing to announce "kimi-k2-thinking" and "kimi-k2-thinking-turbo" as these models appear on the API Playground.



HYPE

Kimi-K2 Reasoning is coming very soon
just got merged into vLLM

LETS FUCKING GOOOO
im so hyped im so hyped im so hyped

github.com/vllm-project/v…



Me wondering where did the pizza toppings go

Woah, that's freaking amazing: Agile and cooperative aerial manipulation of a cable-suspended load!



Tensor-Slayer reposted

Now in effect: Mergekit has been re-licensed under GNU LGPL v3, restoring clarity and flexibility for users and contributors. Read more about our decision in the blog. arcee.ai/blog/mergekit-…

