Harshit Joshi
@harshitj__
CS phd @StanfordNLP, @StanfordOVAL | prev: @MSFTResearch | LLM systems for knowledge access, discovery and curation
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
had fun using burst!! also my new name is Sanjay!
Do you hold back from posting because the audience never feels right? Small groups feel safe but limiting. Public platforms feel risky and performative. Our #CSCW2025 paper introduces Burst: a design that connects private and public spaces. We found that posters felt safer and…
Interested in building and benchmarking deep research systems? Excited to introduce DeepScholar-Bench, a live benchmark for generative research synthesis, from our team at Stanford and Berkeley! 🏆Live Leaderboard guestrin-lab.github.io/deepscholar-le… 📚 Paper: arxiv.org/abs/2508.20033 🛠️…
Introducing Generative Interfaces - a new paradigm beyond chatbots. We generate interfaces on the fly to better facilitate LLM interaction, so no more passive reading of long text blocks. Adaptive and Interactive: creates the form that best adapts to your goals and needs!
which model will first get to 10%? taking wagers
New paper! We explore a radical paradigm for AI evals: assessing LLMs on *unsolved* questions. Instead of contrived exams where progress ≠ value, we eval LLMs on organic, unsolved problems via reference-free LLM validation & community verification. LLMs solved ~10/500 so far:
Can AI solve open problems in math, physics, coding, medical sciences & beyond? We collected unsolved questions (UQ) & tested frontier LLMs. Some solutions passed expert validation…
it would be nice to have personal agents that can take care of mundane/complex works for us while interacting with other agents — but how much can we rely on them? how can I trust them to not share my personal details? GO CHECK OUT THIS CRAZY WORK!
Soon, AI agents will act for us—collaborating, negotiating, and sharing data. But can they truly protect our privacy? We simulate privacy-critical scenarios, using alternating search to evolve attacks and defenses, uncovering severe vulnerabilities and building protections.
nothing is constant in travels, except desi family kalesh
if you find this post interesting, please also check out AxBench, @aryaman2020 and I have a few more steering options to show. these are supervised methods, but automated with less than 50 LM generated examples to train.
New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”—neural activity patterns controlling traits like evil, sycophancy, or hallucination.
.@stanfordnlp papers at @aclmeeting in Vienna next week: • HumT DumT: Measuring and controlling human-like language in LLMs @chengmyra1 @sunnyyuych @jurafsky • Controllable and Reliable Knowledge-Intensive Task Agents with Declarative GenieWorksheets @harshitj__ @ShichengGLiu…
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
AI companions aren’t science fiction anymore 🤖💬❤️ Thousands are turning to AI chatbots for emotional connection – finding comfort, sharing secrets, and even falling in love. But as AI companionship grows, the line between real and artificial relationships blurs. 📰 “Can A.I.…
plis give them a place to stay so that they can do great work 🫡
anybody have a place in SF to rent/sublease till mid-September? for me and @ZhengxuanZenWu (or either of us individually) need to move ASAP