Neil Chowdhury

@ChowdhuryNeil

@TransluceAI, previously @OpenAI

Pinned

Ever wondered how likely your AI model is to misbehave? We developed the *propensity lower bound* (PRBO), a variational lower bound on the probability of a model exhibiting a target (misaligned) behavior.
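For intuition: a bound of this shape follows from importance sampling plus Jensen's inequality. This is a sketch of the general recipe, not necessarily the exact PRBO definition; here $q$ is an investigator-chosen proposal distribution over elicitation prompts $x$ (my notation):

$$\log p(\text{behavior}) \;=\; \log \mathbb{E}_{x \sim q}\!\left[ \frac{p(x)\, p(\text{behavior} \mid x)}{q(x)} \right] \;\ge\; \mathbb{E}_{x \sim q}\!\left[ \log p(x) - \log q(x) + \log p(\text{behavior} \mid x) \right]$$

Any $q$ with full support yields a valid lower bound, and the bound tightens as $q$ concentrates on prompts that actually elicit the behavior, which is exactly what an investigator agent can optimize.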

Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎


Neil Chowdhury reposted

Can LMs learn to faithfully describe their internal features and mechanisms? In our new paper led by Research Fellow @belindazli, we find that they can—and that models explain themselves better than other models do.


Super cool paper:

New paper! (with @OpenAI) We trained weight-sparse models (transformers with almost all of their weights set to zero) on code: we found that their circuits become naturally interpretable! Our models seem to learn extremely simple, disentangled internal mechanisms!



huh, I thought GPT-5.5 came before GPT-5.1?


It's been 1 year since this interview. The best-performing model (Sonnet 4.5 w/ parallel compute) gets 82% on SWE-bench Verified. Close to 90%, but not quite there yet!

.@DarioAmodei predicts we'll get to 90% on SWE-bench Verified in a year.



Neil Chowdhury reposted

A key challenge for interpretability agents is knowing when they’ve understood enough to stop experimenting. Our @NeurIPSConf paper introduces a self-reflective agent that measures the reliability of its own explanations and stops once its understanding of models has converged.
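Roughly, the stopping rule: keep experimenting while new experiments still change how well the explanation predicts held-out behavior, and stop once that score stabilizes. A minimal Python sketch, my illustration rather than the paper's algorithm; propose_explanation and predictive_accuracy are hypothetical placeholders:

# Sketch of convergence-based stopping for an interpretability agent.
# NOT the paper's method: propose_explanation and predictive_accuracy stand in
# for "refine the explanation with a new experiment" and "score how well it
# predicts held-out model behavior".

def propose_explanation(history):
    """Hypothetical: refine the current explanation using past experiments."""
    return f"explanation refined over {len(history)} experiments"

def predictive_accuracy(explanation):
    """Hypothetical: fraction of held-out probes the explanation predicts."""
    return 0.9  # stand-in; a real agent would run fresh probes on the model

def investigate(tol=0.01, max_steps=50):
    history, prev_score = [], 0.0
    for _ in range(max_steps):
        explanation = propose_explanation(history)
        history.append(explanation)
        score = predictive_accuracy(explanation)
        if abs(score - prev_score) < tol:  # reliability has converged: stop
            return explanation, score
        prev_score = score
    return explanation, score  # hit the experiment budget without converging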


Neil Chowdhury reposted

New eval! Code duels for LMs ⚔️

Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users.

Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals.
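The tournament structure is easy to picture. A toy sketch, my illustration only and not the actual CodeClash harness; lm_edit and goal_score are hypothetical placeholders:

# Toy sketch of a multi-round code tournament: each round, every LM edits its
# own codebase, then codebases are ranked on a high-level goal metric.
# NOT the actual CodeClash implementation.

def lm_edit(name, codebase, round_no):
    """Hypothetical: the LM revises its own codebase toward the goal."""
    return codebase + f"\n# {name}'s patch for round {round_no}"

def goal_score(codebase):
    """Hypothetical: run the codebase and measure the goal (e.g. revenue)."""
    return float(len(codebase))  # toy stand-in metric

def tournament(players, rounds=3):
    for r in range(1, rounds + 1):
        players = {n: lm_edit(n, code, r) for n, code in players.items()}
        ranking = sorted(players, key=lambda n: goal_score(players[n]), reverse=True)
        print(f"round {r} leader: {ranking[0]}")
    return ranking[0]  # winner after the final round

winner = tournament({"model_a": "# seed codebase", "model_b": "# seed codebase"})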


now, if only there were a way to benchmark SWE-1.5 and Composer 1 on the same set of tasks...


Neil Chowdhury reposted

Today, @rhythmrg, @lindensli and I are introducing @appliedcompute. We’re building Specific Intelligence for the enterprise. Achieving SOTA today means specialization in both human and machine talent. We’ve spent the last six months working with companies like @cognition,…


Generalists are useful, but it’s not enough to be smart. Advances come from specialists, whether human or machine. To have an edge, agents need specific expertise, within specific companies, built on models trained on specific data. We call this Specific Intelligence. It's…


Neil Chowdhury reposted

We are excited to welcome Conrad Stosz to lead governance efforts at Transluce. Conrad previously led the US Center for AI Standards and Innovation, defining policies for the federal government’s high-risk AI uses. He brings a wealth of policy & standards expertise to the team.


Neil Chowdhury reposted

We've raised $7M to help companies build AI agents that actually learn and work. @Osmosis_AI is a platform for companies to fine-tune models that outperform foundation models with reinforcement learning. Better, faster, and cheaper.


Claude Sonnet 4.5 behaves the most desirably across Petri evals, but is 2-10x more likely to express awareness it's being evaluated than competitive peers. This affects how much we can conclude about how "aligned" models are from these evals. Improving realism seems essential.


Last week we released Claude Sonnet 4.5. As part of our alignment testing, we used a new tool to run automated audits for behaviors like sycophancy and deception. Now we’re open-sourcing the tool to run those audits.


Neil Chowdhury reposted

On our evals for HAL, we found that agents figure out they're being evaluated even on capability evals. For example, here Claude 3.7 Sonnet *looks up the benchmark on HuggingFace* to find the answer to an AssistantBench question. There were many such cases across benchmarks and…


To make a model that *doesn't* instantly learn to distinguish between "fake-ass alignment test" and "normal task," the first thing to do seems like it would be to make all alignment evals very small variations on actual capability evals. Do people do this?



Neil Chowdhury reposted

AI is very quickly becoming a foundational and unavoidable piece of daily life. the dam has burst. the question we must ask and answer is which ways do we want the waves to flow. i would like to live in a world where we all understand this technology enough to be able to…


Docent has been really useful for understanding the outputs of my RL training runs -- glad it's finally open-source!

We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.



Neil Chowdhury reposted

METR is a non-profit research organization, and we are actively fundraising! We prioritise independence and trustworthiness, which shapes both our research process and our funding options. To date, we have not accepted funding from frontier AI labs.

