Rohit Malhotra

@rohit_malh5

Openhands Maintainer | Ex-CTO @sitewizai | NLP @ CMU | Primarily interested in Agents | Secondary interests in creative design

Rohit Malhotra reposted

Have you heard of the inner and outer loops of development? The term was coined at Microsoft in the late 2010s, as development was moving from single workstations to cloud-based collaboration.

OpenHands was founded on the belief that highly autonomous SWE agents should be open, accessible, and free for everyone. Our $18.8M Series A helps push this vision. Highly secure, highly autonomous, model‑agnostic, async cloud‑based SWE agents platform. ALL shipped openly!

Big news: OpenHands has raised an $18.8M Series A led by @MadronaVentures to build the open standard for autonomous software development. Open source, model agnostic, and already used across thousands of repos. Read more: openhands.dev/blog/weve-just…


Rohit Malhotra reposted

New blog post out! 📜 We share our latest research efforts to build more effective, human-centered AI collaboration. Months ago, I was genuinely surprised by how quickly AI agents were improving, and with that came a deep fear of being replaced, of humans slowly losing agency as…


Rohit Malhotra reposted

This is honestly my new favorite use case for agents; it feels pretty magical. See an error pop up in your SaaS -> copy-paste the error into a GitHub workflow -> 8-10 minutes later you have a diagnosis and a patch (and for us, it's probably about 80% correct).

This is how we use agents to debug errors in our production web service:

1. Get @datadoghq logs and find out when the error started
2. Look through the commit history for any suspicious code changes
3. If something is found, report back to a human engineer with a patch
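The first two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `LogEntry`, `Commit`, `first_occurrence`, `suspicious_commits`, and the 24-hour lookback are hypothetical names and values, not the actual OpenHands workflow, and the log and commit records are assumed to have already been fetched from wherever they live (no Datadog or GitHub API is called here).

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    """Hypothetical, simplified log record (assumed already exported)."""
    timestamp: datetime
    level: str
    message: str


class Commit(NamedTuple):
    """Hypothetical, simplified commit record."""
    sha: str
    committed_at: datetime
    summary: str


def first_occurrence(logs: List[LogEntry], error_text: str) -> Optional[datetime]:
    """Step 1: earliest error-level log whose message contains error_text."""
    hits = [e.timestamp for e in logs
            if e.level == "error" and error_text in e.message]
    return min(hits) if hits else None


def suspicious_commits(commits: List[Commit],
                       error_start: datetime,
                       lookback: timedelta = timedelta(hours=24)) -> List[Commit]:
    """Step 2: commits that landed within `lookback` before the error began.

    The 24-hour default is an arbitrary assumption; results are newest first.
    """
    window_start = error_start - lookback
    candidates = [c for c in commits
                  if window_start <= c.committed_at <= error_start]
    return sorted(candidates, key=lambda c: c.committed_at, reverse=True)


# Made-up sample data, for illustration only.
logs = [
    LogEntry(datetime(2025, 1, 2, 9, 0), "info", "request ok"),
    LogEntry(datetime(2025, 1, 2, 14, 30), "error", "KeyError: 'user_id'"),
    LogEntry(datetime(2025, 1, 2, 15, 0), "error", "KeyError: 'user_id'"),
]
commits = [
    Commit("a1", datetime(2025, 1, 1, 8, 0), "Update docs"),
    Commit("b2", datetime(2025, 1, 2, 13, 45), "Refactor session handling"),
    Commit("c3", datetime(2025, 1, 2, 16, 0), "Fix typo"),
]

started = first_occurrence(logs, "KeyError")
if started is not None:
    for commit in suspicious_commits(commits, started):
        print(commit.sha, commit.committed_at.isoformat(), commit.summary)
```

In a real setup the shortlist would be handed to the agent (or a human engineer) as the starting point for step 3; judging whether a commit is actually at fault is the part the sketch leaves out.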



Rohit Malhotra reposted

The video for my talk "Lessons from the trenches in building usable coding agents" has been uploaded! youtube.com/watch?v=p7zebv… It's an overview of some of the problems we faced and research work we've done to fix them over the past 1.5 years, hope it's interesting!


Rohit Malhotra reposted

What do these tasks have in common?

- Fixing security vulnerabilities
- Data entry from messy unstructured forms
- Version upgrades

They're repetitive but important tasks that can be solved in a few lines of code by our new OpenHands Software Agent SDK.

Rohit Malhotra reposted

Strange that people are not giving credit to the CodeAct paper:

Rohit Malhotra reposted

Hoping your coding agents could understand you and adapt to your preferences? Meet TOM-SWE, our new framework for coding agents that don’t just write code, but model the user's mind persistently (ranging from general preferences to small details) arxiv: arxiv.org/abs/2510.21903


Rohit Malhotra reposted

We just built and released the largest dataset for supervised fine-tuning of agentic LMs, 1.27M trajectories (~36B tokens)! Up until now, large-scale SFT for agents has been rare - not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.…


SWE-Agents are crushing benchmarks like SWE-Bench but are still fragile in the wild. I argue A/B testing is the missing piece for evaluating and improving SWE-Agents. Proof in Production: Evaluating Effectiveness of SWE Agents with A/B Tests open.substack.com/pub/rohitmalh/…
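As a rough illustration of what such an A/B comparison could look like, here is a minimal sketch of a two-proportion z-test on a success metric. The metric (share of agent-opened pull requests that get merged), the function name `two_proportion_z_test`, and the counts are assumptions made up for this example; the linked post may use different metrics and methods.

```python
import math
from typing import Tuple


def two_proportion_z_test(success_a: int, total_a: int,
                          success_b: int, total_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in success rates B - A.

    Uses the pooled standard error and a normal approximation, which is only
    reasonable when both arms have a fairly large number of trials.
    """
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A (current agent) vs. variant B (candidate agent).
z, p = two_proportion_z_test(success_a=120, total_a=400,
                             success_b=156, total_b=400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in merge rate is unlikely to be noise, but it says nothing about why the variant performed better, so a test like this complements benchmark results instead of replacing them.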


Rohit Malhotra reposted

We are excited to launch the ⚔️PR Arena⚔️ leaderboard! Full results will be revealed after a certain milestone of community votes. Fix your GitHub issues for free and vote for the better fix! 👉Leaderboard & Setup Guide: prarena.web.app

Rohit Malhotra reposted

A recent study by Becker et al. finds AI copilots like Cursor slowed expert OSS devs by 19%. But what happens when we compare copilots to more autonomous coding agents? Our study finds the opposite story: agents can boost productivity. 🧵

Rohit Malhotra reposted

I'll be speaking about automating large-scale refactors with OpenHands at AI Engineer Paris! It's amazing how much software agents can get done if you orchestrate them thoughtfully.

Rohit Malhotra reposted

Which LM is better at agentic coding? We have a bunch of useful academic benchmarks like SWE-Bench, but we don't have a good comparison of agentic coding LMs *in the wild*. To solve this, we released PR Arena: github.com/neulab/pr-arena

Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues. Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests? 👉 Install here: github.com/apps/openhands… Powered by @allhands_ai



Rohit Malhotra reposted

Having appropriate tests makes a world of difference for agent-driven development. If your agent can write a test to localize a bug or exercise a new feature, the following implementation is much more solid. OpenHands+GPT-5 is now 🥇 on the SWT-Bench testing leaderboard!

Rohit Malhotra reposted

We built OpenHands in the open (~60K ⭐️ on GitHub). Now we’re giving back to the OSS ecosystem. Announcing the OpenHands Cloud OSS Credit Program → $100–$500 credits for maintainers. 👉 Learn how to apply!


Rohit Malhotra reposted

Nothing more frustrating than seeing "private scaffold" on public benchmark results. I love that model providers like Qwen and Mistral are now reporting their results specifically using OpenHands as the scaffold--feels like we're becoming a standard here x.com/Alibaba_Qwen/s…

>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
