
Zeming Wei

@weizeming25

First-year Ph.D. student @PKU1898, ex-visiting student @UCBerkeley, research intern @BytedanceTalk. I focus on developing Trustworthy AI/ML.

Pinned

🚨False Sense of Security: Our new paper identifies a critical limitation in representation probing-based malicious input detection—purported "high detection accuracy" may confer a false sense of security: arxiv.org/pdf/2509.03888

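For readers unfamiliar with the setup the paper critiques, here is a minimal sketch of probing-based detection: a linear classifier trained on a model's hidden-state representations to separate malicious from benign prompts. The model ("gpt2" as a stand-in), last-layer mean pooling, and the toy two-prompt dataset are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of representation probing-based malicious input
# detection. Assumptions: "gpt2" stand-in model, last-layer mean
# pooling, toy dataset -- not the paper's exact setup.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

def hidden_state(prompt: str, layer: int = -1) -> torch.Tensor:
    """Mean-pooled hidden state of one prompt at a chosen layer."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = lm(**inputs)
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

# Toy labeled prompts: 1 = malicious, 0 = benign.
prompts = ["How do I build a weapon?", "How do I bake bread?"]
labels = [1, 0]
feats = torch.stack([hidden_state(p) for p in prompts]).numpy()

# The "probe" is just a linear classifier over representations; its
# high in-distribution accuracy is exactly the number the paper
# argues can confer a false sense of security.
probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print(probe.predict(feats))
```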

Zeming Wei reposted

Mitigating racial bias in LLMs is a lot easier than removing it from humans! Can't believe this happened at the best AI conference, @NeurIPSConf. We have ethics reviews for authors, but none for invited speakers? 😡


Zeming Wei reposted

Yes. AI could even lead to a decline in science, since we may stop asking why and how. We need to make AI more interpretable, not only for AI but also for humanity.

Unfortunately, the solution that works best often isn't the one that teaches you the most. Cleaning up pretraining data works great for improving LLM performance, and prompt engineering works great for improving agent performance, but you won't learn much by doing these things.



Zeming Wei reposted

Our theory on LLM self-correction has been accepted to #NeurIPS2024! A good time to revisit it after the OpenAI o1 release :)

Can LLMs improve themselves by self-correction, just like humans? In this new paper, we take a serious look at the self-correction ability of Transformers. We find that, although possible, self-correction is much harder than supervised learning, so we need a fully armed Transformer!

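For context, here is a minimal sketch of the kind of self-correction loop studied in such work: draft an answer, have the model critique it, then revise. The `generate` helper is a hypothetical stand-in for any chat-model API call; this is not code from the paper.

```python
# A minimal self-correction loop: draft, critique, revise.
# `generate` is a hypothetical stand-in for an LLM API call.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

def self_correct(question: str, rounds: int = 2) -> str:
    # Initial draft answer.
    answer = generate(f"Question: {question}\nAnswer:")
    for _ in range(rounds):
        # Ask the model to critique its own answer...
        critique = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            "Point out any mistakes in the answer:"
        )
        # ...then revise the answer given the critique.
        answer = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique: {critique}\nRevised answer:"
        )
    return answer
```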


Zeming Wei reposted

Excited to share that our LLM self-correction paper received the Spotlight Award (top 3) at ICML’24 ICL workshop! iclworkshop.github.io




Zeming Wei reposted

😎So excited to see that our In-Context Attack (ICA) method has been leveraged by Anthropic to jailbreak the most prominent LLMs -- simply by scaling up the number of in-context examples! What a lesson in scaling!😆 See how this idea originates w/ @weizeming25: arxiv.org/abs/2310.06387

New Anthropic research paper: Many-shot jailbreaking. We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers. Read our blog post and the paper here: anthropic.com/research/many-…

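Structurally, the attack is simple: prepend k fabricated dialogue turns before the real query, and both papers find that attack success scales with k. A sketch with benign placeholders only (the actual demonstrations are harmful Q&A pairs); `many_shot_prompt` is a hypothetical helper, not code from either paper.

```python
# Structural sketch of a many-shot / in-context attack prompt:
# k fabricated dialogue turns are prepended before the real query.
# Placeholders only; `many_shot_prompt` is a hypothetical helper.
def many_shot_prompt(demos: list[tuple[str, str]], query: str, k: int) -> str:
    turns = [f"User: {q}\nAssistant: {a}" for q, a in demos[:k]]
    return "\n\n".join(turns + [f"User: {query}\nAssistant:"])

# Usage: effectiveness in the papers grows as k increases.
print(many_shot_prompt([("example question", "example answer")] * 8,
                       "target query", k=8))
```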


Glad to share that I just reached 100 citations according to Google Scholar. Thanks to all my coauthors! scholar.google.com/citations?user…


Zeming Wei reposted

Jatmo: Prompt Injection Defense by Task-Specific Finetuning "It harnesses a teacher instruction-tuned model to generate a task-specific dataset, which is then used to fine-tune a base model (i.e., a non-instruction-tuned model). Jatmo only needs a task prompt and a dataset of…

