
weakly typed

@weakly_typed

learning {ML, PL, maths} // CS pre-grad // DMs open :)

Pinned

while this is an impressive demonstration of the capabilities of large language models to synthesise natural-language problem statements into formal / executable versions, we're still a long way off from 'true' system 2 mathematical reasoning (1/3)


weakly typed reposted

Exciting, mechanistic interpretability has a dedicated lecture in the syllabus of a Cambridge CS masters course! The field has come so far in the past few years ❤️

[@NeelNanda5's tweet image]

weakly typed reposted

The slowly-unfolding premise of the Good Place is that everyone is damned. They are damned because they participate in the modern world; they buy from sweatshops, they eat chocolate, they fly in airplanes while the poorest people in the world see their harvests fail thanks to…


weakly typed reposted

Take a break from arxiv/LW/AF. Sit in the woods with a random textbook and mull new ideas away from interp community lockstep. Diverge. Don’t compete with a saturated subtopic, maybe you’ll get to take weekends off. Premature overinvestment comes from monoculture.

[quoted tweet from @NeelNanda5]

So what should the community do?

I'd guess we're over-invested in fundamental SAE research, but shouldn't abandon it completely. And SAEs remain a valuable tool, esp for exploration and debugging.

I'm most keen on applied work, and making targeted fixes for fundamental issues.


weakly typed reposted

I've recently learned about Algebraic Positional Encoding from @bgavran3 and isn't this the coolest breakthrough in mathematical approaches to transformers in the last few years? arxiv.org/abs/2312.16045


weakly typed reposted

LLMs are dramatically worse at ARC tasks the bigger they get. However, humans have no such issues - ARC task difficulty is independent of size. Most ARC tasks contain around 512-2048 pixels, and o3 is the first model capable of operating on these text grids reliably.

[@mikb0b's tweet image]

weakly typed reposted

This is a really creative and well-executed paper on using "black-box interpretability" methods to understand and control model cognition. Especially impressed by the many applications explored. IMO this is an important direction; this paper sets the field on an excellent path!

LLMs have behaviors, beliefs, and reasoning hidden in their activations. What if we could decode them into natural language? We introduce LatentQA: a new way to interact with the inner workings of AI systems. 🧵

[@aypan_17's tweet image]


weakly typed reposted
[@voooooogel's tweet image]

weakly typed reposted

The tragic suicide of Sewell Setzer III shows our generation has become unwitting test subjects in a vast, unregulated AI experiment. That's why we're launching @youthandai with our Generation AI Survey in @TIME. A thread: (1/10)

American teenagers believe addressing the potential risks of AI should be a top priority for lawmakers, according to a new poll time.com/7098524/teenag…



weakly typed reposted

Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…


SHA-256: 218cebed21f2e8514df2ea1e4caca39750349cf30804995d5d577f08afc5855a


in slight defense of mathiness / mathematical notation in ML research papers: a thread (twessay?)

in slight defense of mathiness: there’s a flavour of research that looks like “finding the right abstractions through which to think about things” — either to make it easier to build tools to manipulate the things, or to inspire researchers to import ideas from other fields



weakly typed reposted

Who should I meet in Cambridge? (You?)


weakly typed reposted

On Reddit's statistics forum, the most common question is "What test should I use?" My answer, from 2011, is "There is only one test" allendowney.blogspot.com/2011/05/there-…

[@AllenDowney's tweet image]
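The linked post's point is that most classical tests are instances of one simulation recipe: pick a test statistic, simulate data under the null hypothesis, and count how often the simulated statistic is at least as extreme as the observed one. A minimal permutation-test sketch of that recipe (the function name and the difference-in-means statistic are illustrative choices, not taken from the post):

```python
import random

def permutation_test(group_a, group_b, n_iters=10_000, seed=0):
    """Estimate a p-value for the difference in means between two samples
    by simulating the null hypothesis: pool the data, shuffle, re-split."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_iters):
        rng.shuffle(pooled)  # under the null, group labels are exchangeable
        a, b = pooled[:n_a], pooled[n_a:]
        stat = abs(sum(a) / len(a) - sum(b) / len(b))
        if stat >= observed:
            count += 1
    return count / n_iters  # fraction of null worlds at least as extreme
```

Swapping in a different statistic (median difference, correlation, whatever the question demands) gives a different "test" from the same single recipe.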

weakly typed reposted

Mechanistic interpretability gives us rich explanations of models. But can we convert these explanations into formal proofs? Surprisingly, yes! Mech interp helps write short proofs of generalization bounds — and, shorter proofs provide more mechanistic understanding. 🧵

[@diagram_chaser's tweet image]

perhaps growing up is realising that 'growing up' was a comforting lie


on reading ml papers:

[@weakly_typed's tweet image]

maybe the most exciting interp result I’ve seen all year (if it ends up being true for interesting reasons): a meaningful step towards uncovering the type of the residual stream

Fundamentally, high-level concepts group into categorical variables (mammal, reptile, fish, bird) with a semantic hierarchy: poodle is a dog is a mammal is an animal. How do LLMs internally represent this structure? arxiv.org/abs/2406.01506



weakly typed reposted

fyi the real reason i've been ignoring you is:
- i want to reply
- i want to be able to give you the attention and focus you deserve
- i never feel like i have enough energy to properly do that

fuck, did i just cut off every single one of my autistic friends (all of my friends) who can't read jokes??

[@arithmoquine's tweet image]


mechinterp people: does anyone have a good (formal?) definition of 'feature' that doesn't assume the linear representation hypothesis? like, if I have some points in high-dim space, what makes them "the composition of several features" as opposed to "some random points"?


weakly typed reposted

very interesting that every frontier lab interp team is working on sparse autoencoders (SAEs) and ~ no one in academia is

