isha

@is_h_a

alignment research @MATSprogram

Berkeley, CA

isha-gpt.github.io

Joined January 2025

8 Posts · 58 Followers · 175 Following

isha

Oct 6

if you’ve had a hunch that your favorite model does a weird thing, you can now go audit it in 5 minutes with Petri! let's go @kaifronsdal - thoughtful open source tooling for alignment research is so important, more coming soon!!

Anthropic

Oct 6

Last week we released Claude Sonnet 4.5. As part of our alignment testing, we used a new tool to run automated audits for behaviors like sycophancy and deception. Now we’re open-sourcing the tool to run those audits.

isha reposted

Ken Liu

Aug 26

New paper! We explore a radical paradigm for AI evals: assessing LLMs on *unsolved* questions. Instead of contrived exams where progress ≠ value, we eval LLMs on organic, unsolved problems via reference-free LLM validation & community verification. LLMs solved ~10/500 so far:

isha reposted

Rylan Schaeffer

@RylanSchaeffer

Jul 3

New position paper! Machine Learning Conferences Should Establish a “Refutations and Critiques” Track Joint w/ @sanmikoyejo @JoshuaK92829 @yegordb @bremen79 @koustuvsinha @in4dmatics @JesseDodge @suchenzang @BrandoHablando @MGerstgrasser @is_h_a @ObbadElyas 1/6

isha

Mar 31

... and I just extended it to allow for multi-prompt, dual-model optimization, following the approach from the original @andyzou_jiaming GCG paper to make the jailbreak suffix universal and transferable. universal-nanoGCG available at github.com/isha-gpt/unive…!

GitHub - isha-gpt/universal-nanoGCG: An extension of nanoGCG which allows multi-prompt, dual model optimization (Source: github.com)

Egesor

@Egesor840024

bici

@bici207157

Gautham Elango

@gautham_elango

Ciprian Cîmpan

@ciprian_cimpan

Nathan Delisle

@delisl3

Advait

@advtydv

keshav

@kshenoy_

Stephen Price

@stephenprice100

Aritra

@lalalaepsilon

✦✦✦

@not_infinite___

lucy

@_backpropaganda

Maaz

@mmaaz_98

Kai Fronsdal

@kaifronsdal

abhayesian

@abhayesian

Christine Ye

@christinexye

Lukas Möller

@lukasmoellerd1

Jonas Ngnawé

@JNgnawe

Ortatne

@Ortatne37602

Raimundo Baravaglio

@proferay

yulong

@_yulonglin

Damaris

@RRLC1o3foKdv4Bb

🚀Henry is launching the Astra Research Program!

@sleight_henry

Vincent

@vvvincent_c

Alice Wilson

@Logic5q80

Tarun Davuluri

@Tarun_Davuluri

Sumeet Motwani

@sumeetrm

Taylor

@Brutui690595

vitor

@Negromont3

Basit Mustafa

@moltar81435

Mr. Jack Tung

@MrJackTung

Ibrahim Ahmad

@ahmad993196

Neil Rathi

@neil_rathi

Carlo Geat

@geatcarlo

Aryaman Arora

@aryaman2020

Paul Bogdan

@paulcbogdan

Susana

@Susana999Susana

anpaure

@anpaure

Elyas Obbad

@ObbadElyas

lily (xiaoqing)

@lilysun004

Joschka Braun

@BraunJoschka

Caleb Biddulph

@CalebBiddulph

Pranav Mehta

@pranavmehta97

daniel phillips (dp) 📈❤️‍🔥

@DpEpsilon

Simon Lermen

@SimonLermenAI

Clem

@clemhus

Rylan Schaeffer

@RylanSchaeffer

Justus Mattern

@MatternJustus

Dominique Paul

@DominiqueCAPaul

Alyssa Unell

@AlyssaUnell

Ken Liu

@kenziyuliu

Joshua Kazdan

@JoshuaK92829

Nathan Delisle

@delisl3

Lukas Möller

@lukasmoellerd1

Christine Ye

@christinexye

Maaz

@mmaaz_98

Advait

@advtydv

keshav

@kshenoy_

Sam Bowman

@sleepinyourhat

Kai Fronsdal

@kaifronsdal

abhayesian

@abhayesian

Chloe Loughridge

@ChloeLough333

Emil Ryd

@emilaryd

yulong

@_yulonglin

Gary Marcus

@GaryMarcus

Xander Davies

@alxndrdavies

daniel phillips (dp) 📈❤️‍🔥

@DpEpsilon

Caleb Biddulph

@CalebBiddulph

Elyas Obbad

@ObbadElyas

Neil Rathi

@neil_rathi

Joanne Jang

@joannejang

🚀Henry is launching the Astra Research Program!

@sleight_henry

Vincent

@vvvincent_c

Wojciech Zaremba

@woj_zaremba

Samuel Marks

@saprmarks

Adam Gleave

@ARGleave

Eric Wallace

@Eric_Wallace_

Rohin Shah

@rohinmshah

Ajeya Cotra

@ajeya_cotra

Jeremy Fox 🦊

@JeremyDanielFox

Shlomi Fruchter

@shlomifruchter

Elizabeth Barnes

@BethMayBarnes

METR

@METR_Evals

Apollo Research

@apolloaievals

Marius Hobbhahn

@MariusHobbhahn

Epoch AI

@EpochAIResearch

Trenton Bricken

@TrentonBricken

lily (xiaoqing)

@lilysun004

Joschka Braun

@BraunJoschka

MATS Research

@MATSprogram

Fabien Roger

@FabienDRoger

Andrea Michi

@andreamichi

Jack Rae

@jack_w_rae

Nate Rahn

@n8rahn

Stephen McAleer

@McaleerStephen

Constantin Venhoff

@cvenhoff00

Simon Lermen

@SimonLermenAI

Julian Minder

@jkminder

Jack Lindsey

@Jack_W_Lindsey

Goodfire

@GoodfireAI

Clem

@clemhus
