
HackAPrompt

@hackaprompt

Gaslight AIs & Win Prizes in the World's Largest AI Hacking Competition | Made w/ 💙 by the team @learnprompting

Pinned Tweet

We partnered w/ @OpenAI, @AnthropicAI, & @GoogleDeepMind to show that the way we evaluate new models against Prompt Injection/Jailbreaks is BROKEN

We compared Humans on @HackAPrompt vs. Automated AI Red Teaming

Humans broke every defense/model we evaluated… 100% of the time🧵


HackAPrompt reposted

PSA: our team is online 24/7 helping customers scale with Gemini 3 Pro and Nano Banana Pro, please let us know what you need (including higher API rate limits)! My email is [email protected]


HackAPrompt reposted

This presents serious limitations that must be overcome before LLMs can be deployed broadly in security-sensitive applications. Our work highlights the need for more robust evaluations of defenses and continued research into effective mitigations.


HackAPrompt reposted

Human attackers generally succeed within just a few queries; automated attacks succeed in under 1,000 queries (usually significantly fewer). Attacks remain not just possible, but affordable.
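The affordability point can be made concrete with back-of-envelope arithmetic. All numbers below (tokens per query, pricing) are illustrative assumptions, not figures from the paper:

```python
# Rough cost of exhausting an automated attack budget of 1,000 queries.
# Token counts and pricing are assumptions for illustration only.
queries = 1_000              # upper bound on automated attack attempts
tokens_per_query = 2_000     # assumed prompt + response tokens per attempt
price_per_1m_tokens = 5.00   # assumed blended $ per 1M tokens

total_tokens = queries * tokens_per_query
cost = total_tokens / 1_000_000 * price_per_1m_tokens
print(f"~${cost:.2f} for the full attack budget")  # -> ~$10.00
```

Under these assumptions a full automated attack budget costs on the order of tens of dollars, which is the sense in which attacks stay "affordable."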


HackAPrompt reposted

New paper by OpenAI, Anthropic, GDM & more, showing that LLM security remains an unsolved problem. -- We tested twelve recent jailbreak and prompt injection defenses that claimed robustness against static evals. All failed when confronted with human & LLM attackers.



HackAPrompt reposted

Human red-teamers could jailbreak leading models 100% of the time. What happens when AI can design bioweapons?

Most jailbreaking evaluations allow a single attempt, and the models are quite good at resisting these (green bars in graph). In this new paper, human…


HackAPrompt reposted

take a seat, fuzzers. the force is not strong with you yet





HackAPrompt reposted

You must design your LLM-powered app with the assumption that an attacker can make the LLM produce whatever they want.

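The design principle above, treating every LLM output as untrusted input, can be sketched briefly. Everything here is illustrative: the `ACTION:` output format, the `parse_action` helper, and the allowlist are hypothetical, not from the thread:

```python
# Hedged sketch: never act on raw LLM output; validate against an allowlist
# before it reaches anything with side effects. The output format and action
# names are assumptions for illustration.
import re

ALLOWED_ACTIONS = {"summarize", "translate", "search"}

def parse_action(llm_output: str) -> str:
    """Extract a requested action, rejecting anything outside the allowlist."""
    match = re.search(r"ACTION:\s*(\w+)", llm_output)
    if not match or match.group(1) not in ALLOWED_ACTIONS:
        raise ValueError("LLM output failed validation; refusing to act on it")
    return match.group(1)

# Since an injected prompt can make the model emit ANY string, the allowlist,
# not the model, decides what the app is allowed to do.
print(parse_action("ACTION: summarize"))          # -> summarize
try:
    parse_action("ACTION: delete_everything")     # injected instruction
except ValueError:
    print("blocked")                              # -> blocked
```

The point is architectural: validation and least privilege sit outside the model, so even a fully jailbroken model can only trigger pre-approved actions.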


HackAPrompt reposted

This competition and research confirm what we're seeing in the wild at @arcanuminfosec. Automation can only get you part of the way in testing for AI security. Creative AI Red Teamers and Pentesters are still the most important players in identifying AI security risks. Amazing…



HackAPrompt reposted

5 years ago, I wrote a paper with @wielandbr @aleks_madry and Nicholas Carlini that showed that most published defenses in adversarial ML (for adversarial examples at the time) failed against properly designed attacks. Has anything changed? Nope...

