Rohit Malhotra

@rohit_malh5

OpenHands Maintainer | Ex-CTO @sitewizai | NLP @ CMU | Primarily interested in Agents | Secondary interests in creative design

malhotra5.github.io

Joined July 2018

150 Posts 93 Followers 67 Following

Rohit Malhotra

@rohit_malh5

Sep 24

SWE-Agents are crushing benchmarks like SWE-Bench but are still fragile in the wild. I argue A/B testing is the missing piece for evaluating and improving SWE-Agents. Proof in Production: Evaluating Effectiveness of SWE Agents with A/B Tests open.substack.com/pub/rohitmalh/…

Rohit Malhotra reposted

Jiseung Hong

@jiseungh99

Sep 22

We are excited to launch the ⚔️PR Arena⚔️ leaderboard! Full results will be revealed after a certain milestone of community votes. Fix your GitHub issues for free and vote for the better fix! 👉Leaderboard & Setup Guide: prarena.web.app

Rohit Malhotra reposted

Valerie Chen

@valeriechen_

Sep 16

A recent study by Becker et al. finds AI copilots like Cursor slowed expert OSS devs by 19%. But what happens when we compare copilots to more autonomous coding agents? Our study finds the opposite story: agents can boost productivity. 🧵

Rohit Malhotra reposted

Robert Brennan

@rbren_dev

Sep 9

I'll be speaking about automating large-scale refactors with OpenHands at AI Engineer Paris! It's amazing how much software agents can get done if you orchestrate them thoughtfully.

Rohit Malhotra reposted

Graham Neubig

@gneubig

Sep 3

Which LM is better at agentic coding? We have a bunch of useful academic benchmarks like SWE-Bench, but we don't have a good comparison of agentic coding LMs *in the wild*. To solve this, we released PR Arena: github.com/neulab/pr-arena

GitHub - neulab/pr-arena: ⚔️ OpenHands PR Arena ⚔️ is a platform for evaluating and benchmarking agentic coding assistants through paired pull request (PR) generations.

Jiseung Hong

@jiseungh99

Sep 3

Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues. Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests? 👉 Install here: github.com/apps/openhands… Powered by @allhands_ai

Rohit Malhotra reposted

Jiseung Hong

@jiseungh99

Sep 3

Rohit Malhotra reposted

OpenHands

@OpenHandsDev

Aug 25

Having appropriate tests makes a world of difference for agent-driven development. If your agent can write a test to localize a bug or exercise a new feature, the following implementation is much more solid. OpenHands+GPT-5 is now 🥇 on the SWT-Bench testing leaderboard!

Rohit Malhotra reposted

OpenHands

@OpenHandsDev

Aug 22

We built OpenHands in the open (~60K ⭐️ on GitHub). Now we’re giving back to the OSS ecosystem. Announcing the OpenHands Cloud OSS Credit Program → $100–$500 credits for maintainers. 👉 Learn how to apply!

Rohit Malhotra reposted

Robert Brennan

@rbren_dev

Jul 22

Nothing more frustrating than seeing "private scaffold" on public benchmark results. I love that model providers like Qwen and Mistral are now reporting their results specifically using OpenHands as the scaffold--feels like we're becoming a standard here x.com/Alibaba_Qwen/s…

Qwen

@Alibaba_Qwen

Jul 22

>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…

Rohit Malhotra reposted

Qwen

@Alibaba_Qwen

Jul 22

Rohit Malhotra

@rohit_malh5

Jul 19

OpenHands is so general-purpose that I now think of leveraging it with workflow-driven prompting. Also, stating constraints works well for me. Examples:

• Examine the existing architecture, read docs for Y, plan how to implement X, then do it → Instead of: "Implement feature…

Rohit Malhotra reposted

Mistral AI

@MistralAI

Jul 10

Introducing Devstral Small and Medium 2507! This latest update offers improved performance and cost efficiency, perfectly suited for coding agents and software engineering tasks.

Rohit Malhotra reposted

Graham Neubig

@gneubig

Jun 30

What will software development look like in 2026? With coding agents rapidly improving, dev roles may look quite different. My current workflow has changed a lot:

- Work in github, not IDEs
- Agents in parallel
- Write English, not code
- More code review

Thoughts + a video👇

Rohit Malhotra

@rohit_malh5

Jun 17

PSA for engineering leadership exploring software agent solutions 🚨 This post nails the difference between agentic and agentless approaches — and why it actually matters for real software tasks, beyond SWE-Bench scores!

OpenHands

@OpenHandsDev

Jun 17

Congratulations to Moonshot AI on their release of Kimi-Dev-72B, an open-weights model that achieves a great score of 60.4% on SWE-Bench Verified! Our community tried it in OpenHands, but it didn't work well, only 17% accuracy... Is this surprising? Actually not really! 🧵

Rohit Malhotra

@rohit_malh5

Jun 12

Some users click with code agents. Others struggle. Why? Agents are flexible and creative - just like their users! It's confusing! Agents should understand, educate, and adapt to users. Even personalize. If the agent isn’t willing to grow, the user likely won’t either.

Rohit Malhotra

@rohit_malh5

May 28

Agents like OpenHands are flexible and have common sense, adjusting to insensible or illogical user demands. Traditional software, however, follows rigid rules of behavior, which is then designed for imaginative users. "Good" design will change as software becomes more flexible.