
Ebtesam ✈️ AIES

@ebtesamdotpy

AI tools for SE research | CS PhD @GeorgeMasonU @INSPIREDLabGMU | Prev @MSFTResearch

Ebtesam ✈️ AIES reposted

crazy that they called it context window when attention span was right there


Ebtesam ✈️ AIES reposted

As we all know by now, reasoning models often generate longer responses, which raises compute costs. Now, this new paper (arxiv.org/abs/2504.05185) shows that this behavior comes from the RL training process, not from an actual need for long answers for better accuracy. The RL…

rasbt's tweet image.

Ebtesam ✈️ AIES reposted
trirpi's tweet image.

Ebtesam ✈️ AIES reposted

vibe coding, where 2 engineers can now create the tech debt of at least 50 engineers


Ebtesam ✈️ AIES reposted

For the confused, it's actually super easy:
- GPT 4.5 is the new Claude 3.6 (aka 3.5)
- Claude 3.7 is the new o3-mini-high
- Claude Code is the new Cursor
- Grok is the new Perplexity
- o1 pro is the 'smartest', except for o3, which backs Deep Research
Obviously. Keep up.


Ebtesam ✈️ AIES reposted

New post re: Devin (the AI SWE). We couldn't find many reviews of people using it for real tasks, so we went MKBHD mode and put Devin through its paces. We documented our findings here. Would love to know if others have had a different experience. answer.ai/posts/2025-01-…

HamelHusain's tweet image.

Ebtesam ✈️ AIES reposted

Long overdue, a paper finally exposes the Emperor's New “Threats to Validity” Clothes in empirical software engineering research. Even better, it provides suggestions for improving the state of practice.

CoolSWEng's tweet image.

Presenting our paper @ESEM_conf soon: Threats to Validity in Software Engineering – hypocritical paper section or essential analysis? Paper #OpenAccess dl.acm.org/doi/10.1145/36…



Ebtesam ✈️ AIES reposted

It's common to add personas in system prompts, assuming this can help LLMs. However, through analyzing 162 roles x 4 LLMs x 2410 questions, we show that adding a persona mostly has *no* statistically significant difference from the no-persona setting. If there is a difference, it…

🎙️ What if the way we prompt LLMs might actually hold it back? 🚨 Assigning personas like "helpful assistant" in system prompts might *not* be as helpful as we think! ✨ Check out our work accepted to Findings of @emnlpmeeting ✨ 📜 arxiv.org/abs/2311.10054 🧵 [1/7]

elisazmq_zheng's tweet image.


Ebtesam ✈️ AIES reposted

If you get frequent urges to go deep into a subject, do not ignore them.
Pick a weekend, stop everything else, and give in to the urge.
Fresh insights await at the other end.


Ebtesam ✈️ AIES reposted

Is hallucination in LLMs inevitable even with an idealized model architecture and perfect training data? This work argues YES and offers a formal proof. Let's dig in ⤵ 🧵1/n

UpolEhsan's tweet image.

Ebtesam ✈️ AIES reposted

Instead, evaluation processes should track the diverse notions of extrinsic utility found both in everyday usage of our technology today and in anticipating how people might use technology tomorrow.


Ebtesam ✈️ AIES reposted

Heck

Dr_Meming's tweet image.

Ebtesam ✈️ AIES reposted

🚨 Inclusive tech research alert! 🚨 Are you a tech user who identifies as BIPOC (bit.ly/BIPOC_defined)? Or a researcher/practitioner who uses data in your work? Share your experiences in our 20 min. survey → go.gmu.edu/EngagingTheMar… IRBNet #: 1945546-2 #data #tech #trust


Ebtesam ✈️ AIES reposted

Never name a manuscript draft "_FINAL"


Ebtesam ✈️ AIES reposted

Academic research: months of experiments and data analysis that ends up being a few sentences in a paper

Dr_Meming's tweet image.

Ebtesam ✈️ AIES reposted

I feel like "large language model" feels a bit reductive when GPT-2 is in the same class as GPT-4. Gigantic language models? Enormous language models? Big ass language models? Nimitz-class language models? Better suggestions needed.


Ebtesam ✈️ AIES reposted

Happy birthday to Python creator Guido van Rossum. The open-source language was named after the comedy troupe Monty Python: bit.ly/2B8R7h6. Image via Midjourney.

MIT_CSAIL's tweet image.

Ebtesam ✈️ AIES reposted

When I got started with programming, I debugged using printf() statements. Today, I debug with print() statements. The purpose of debugging is to correct your mental model of what your code does, and no tool can do that for you. The best any tool can do is provide visibility…

“The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.” — Brian Kernighan, co-creator of Unix
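
A minimal sketch of the print-style debugging described above, in Python; the function and values are hypothetical, chosen only to show how a well-placed print() can confirm (or correct) your mental model of what the code does:

```python
def moving_average(values, window):
    # Hypothetical function under suspicion: average each sliding window of `window` items.
    averages = []
    for i in range(len(values) - window + 1):
        chunk = values[i:i + window]
        # Judiciously placed print: does each chunk match what we expect?
        print(f"i={i} chunk={chunk} sum={sum(chunk)}")
        averages.append(sum(chunk) / window)
    return averages

print(moving_average([1, 2, 3, 4, 5], window=2))  # expect [1.5, 2.5, 3.5, 4.5]
```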



Ebtesam ✈️ AIES reposted

If there were only one scientific practice I could teach to every scientist, regardless of stage or field, I think it would be: look at the data. Spot-check it. Find a few data points and trace them through to see if they make sense. Look at the raw data. Don't just do analyses.
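
A minimal sketch of what that spot-checking can look like in practice, assuming a hypothetical CSV file and column names:

```python
import csv
import random

# Hypothetical file and column names, for illustration only.
with open("measurements.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Look at a handful of raw rows before running any aggregate analysis.
for row in random.sample(rows, k=min(5, len(rows))):
    print(row)
    # Trace one value through: is it in a plausible range?
    assert 0 <= float(row["duration_s"]) < 3600, f"suspicious duration: {row}"
```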

