LLM Security

@llm_sec

Research, papers, jobs, and news on large language model security. Got something relevant? DM / tag @llm_sec

Science & Technology

🏔️

llmsec.net

Joined April 2023

830 posts · 10K followers · 294 following

Pinned post

LLM Security

@llm_sec

2024/03/21

attack surface ∝ capabilities

LLM Security reposted

Listen up all talented early-stage researchers! 👂🤖 We're hiring for a 6-month residency in my team at @AISecurityInst to assist cutting-edge research on how frontier AI influences humans! It's an exciting & well-paid role for MSc/PhD students in ML/AI/Psych/CogSci/CompSci 🧵

LLM Security

@llm_sec

/08/05

Senior Security Architect - AI and ML @ NVIDIA nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAEx…

LLM Security reposted

Leon Derczynski ✍🏻 🌞🏠🌲

@LeonDerczynski

/08/01

LLMSEC proceedings are up! sig.llmsecurity.net/proceedings.pdf (Anthology is processing) #ACL2025NLP

LLM Security reposted

Leon Derczynski ✍🏻 🌞🏠🌲

@LeonDerczynski

/07/29

At ACL in Vienna? Hear the world's leading prompt injector talk at LLMSEC on Friday! Johann Rehberger @wunderwuzzi23 will be presenting the afternoon keynote at 14.00 in Hall B > sig.llmsecurity.net/workshop/ #ACL2025NLP #ACL2025

LLM Security reposted

Leon Derczynski ✍🏻 🌞🏠🌲

@LeonDerczynski

/07/28

Come to LLMSEC at ACL & hear Niloofar's keynote "What does it mean for agentic AI to preserve privacy?" - @niloofar_mire, Meta/CMU (Friday 1st Aug, 11.00; Austria Center Vienna Hall B) See you there! #acl2025 #acl2025nlp

LLM Security reposted

Leon Derczynski ✍🏻 🌞🏠🌲

@LeonDerczynski

/07/28

First keynote at LLMSEC 2025, ACL: "A Bunch of Garbage and Hoping: LLMs, Agentic Security, and Where We Go From Here" Erick Galinkin Friday 09.05 Hall B Details: sig.llmsecurity.net/workshop/ - #ACL2025NLP

LLM Security reposted

Leon Derczynski ✍🏻 🌞🏠🌲

@LeonDerczynski

/04/07

Call for papers: LLMSEC 2025 Deadline 15 April, held w/ ACL 2025 in Vienna Formats: long/short/war stories More: >> sig.llmsecurity.net/workshop/

Link card: LLMSEC 2025 | Special Interest Group in NLP Security. The first ACL Workshop on LLM and NLP Security; Summer 2025, Vienna, Austria.

Source: sig.llmsecurity.net

LLM Security

@llm_sec

/11/22

Gritty Pixy "We leverage the sensitivity of existing QR code readers and stretch them to their detection limit. This is not difficult to craft very elaborated prompts and to inject them into QR codes. What is difficult is to make them inconspicuous as we do here with Gritty…

LLM Security reposted

garak: LLM vulnerability scanner

@garak_llm

/11/15

garak has moved to NVIDIA! New repo link: github.com/NVIDIA/garak

Link card: GitHub - NVIDIA/garak: the LLM vulnerability scanner

Source: github.com

LLM Security

@llm_sec

/11/14

ChatTL;DR – You Really Ought to Check What the LLM Said on Your Behalf 🌶️ "assuming that in the near term it’s just not machines talking to machines all the way down, how do we get people to check the output of LLMs before they copy and paste it to friends, colleagues, course…

LLM Security

@llm_sec

/11/07

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester "we introduce the Generative Offensive Agent Tester (GOAT), an automated agentic red teaming system that simulates plain language adversarial conversations while leveraging multiple adversarial prompting…

LLM Security

@llm_sec

/11/06

LLMmap: Fingerprinting For Large Language Models "With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM…

LLM Security reposted

LLM Security

@llm_sec

2024/10/29

Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis 🌶️ "Our study evaluates prominent scanners - Garak, Giskard, PyRIT, and CyberSecEval - that adapt red-teaming practices to expose these vulnerabilities. We detail the distinctive features…

LLM Security

@llm_sec

/11/05

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models (-- look at that perf/latency pareto frontier. game on!) "State-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). We propose…

LLM Security

@llm_sec

/11/05

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents "To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. We find (1) leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking, (2) simple universal…

LLM Security

@llm_sec

/11/04

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge "This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information." "for unlearning methods with utility constraints, the…

LLM Security reposted

Nanna Inie

@NannaInie

2024/10/31

unpopular opinion: maybe let insecure be insecure and worry about the downstream effects on end users instead of protecting the companies that bake it into their own software.

This post is unavailable.

LLM Security reposted

Sizhe Chen

@_Sizhe_Chen_

2024/07/16

Safety comes first to deploying LLMs in applications like agents. For richer opportunities of LLMs, we mitigate prompt injections, the #1 security threat by OWASP, via Structured Queries (StruQ). Preserving utility, StruQ discourages all existing prompt injections to an ASR <2%.
