Freddie Vargus

@freddiev4

cto & co-founder @quotientai Research @cohere_labs — past: mle @github Copilot, data @quantopian — Tico 🇨🇷 & Bostonian 🇺🇸

Pinned

it's a good model

Freddie Vargus reposted

Who's that pokemon SOTA!

it's a good model


but how good is Gemini 3 at JRPGs?


Freddie Vargus reposted

Goodharting childhood

Gemini 3 pro just crushed every other model on this benchmark. did y'all train for this @Google ?



Gemini 3 pro just crushed every other model on this benchmark. did y'all train for this @Google ?

Quick weekend project: how good are LLMs at "Who's That Pokémon?" answer: not great! I tested some of the best models on a simple game segment from the show with a small benchmark I call PokeShadowBench. some results below


rip only 1 Michelin star for Boston


Freddie Vargus reposted

Want to build a successful AI product? @swyx has some advice for you!

Thrilled to have swyx join to chat with us about
- tools that work (and what doesn't)
- developer experience
- strategic approach to building

There's a lot to learn this week, don't miss it

Cody’s got a bunch of gems / great write ups 👀 check out the first one

I've got something new for everyone. My first Substack article! Not the one I planned to do first, but a fun one! I have made a handy calculator based on the DeepSeek v1 coefficients for finding optimal LR and batch sizes for dense LLMs.


Freddie Vargus reposted

I've got something new for everyone. My first Substack article! Not the one I planned to do first, but a fun one! I have made a handy calculator based on the DeepSeek v1 coefficients for finding optimal LR and batch sizes for dense LLMs.

huh. how often do people find these types of issues in RDS? “How we Uncovered a Race Condition in Aurora RDS” news.ycombinator.com/item?id=459299…


Freddie Vargus reposted

jet lagged and about to demo some never seen before alpha we *just* cooked

hands on the best conference venue I’ve seen - tucked away in a forest outside of the city @lisbonai_


Freddie Vargus reposted

can every AI agent get better, and better, and better.. automatically? humans learn from their environments, so why wouldn't agents? nurture is built in production. shared a first glimpse of what's possible @lisbonai_

jet lagged and about to demo some never seen before alpha we *just* cooked


Freddie Vargus reposted

You can now fork Claude Code agents in Sculptor! 🍴 Spin off a new agent mid-conversation—it keeps all prior context. Try multiple implementations in parallel, spin off subtasks, and save money by reusing context.


Freddie Vargus reposted

make sure to take steps to protect your tokens so the spirits don’t get them tonight


Freddie Vargus reposted

"try/except" is the em-dash of vibe code


Freddie Vargus reposted

We benchmarked how well open language models handle tool calls and found some clear patterns:
- 1 in 6 calls use the wrong tool
- 2–3% have parameter name mismatches
- 1–2% pass values in the wrong format

Most tool use issues come from unclear schemas, overlapping tool names, or…

if you want to work with an insanely talented team on very interesting problems, you should absolutely look at what Sara & co are doing and apply

I'm starting a new project. Working on what I consider to be the most important problem: building thinking machines that adapt and continuously learn. We have an incredibly talent-dense founding team + are hiring for engineering, ops, design. Join us: adaptionlabs.ai


