
array

@arrayailabs

Model Behavior Design & Engineering http://arrayailabs.com

array reposted

Microsoft did it again! Building with AI agents almost never works on the first try. You spend days tweaking prompts, adding examples, hoping it gets better. Nothing systematic, just guesswork. This is exactly what Microsoft's Agent Lightning solves. It's an open-source…


array reposted

Now in private beta: Aardvark, an agent that finds and fixes security bugs using GPT-5. openai.com/index/introduc…


array reposted

After ~4 years building SOTA models & datasets, we're sharing everything we learned in ⚡The Smol Training Playbook

We cover the full LLM cycle: designing ablations, choosing an architecture, curating data, post-training, and building solid infrastructure.

We'll help you…


array reposted

New Anthropic research: Signs of introspection in LLMs. Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.


array reposted

Crawling isn't innate (unlike walking). Every baby must *invent* crawling, from scratch, using extremely little data, and no reference to imitate. Which is why different babies end up with different ways of crawling. Sometimes people tell me, "you say AI isn't intelligent until…

Adaptable Intelligence. Multiple possible paths to an objective.



array reposted

The @karpathy interview

0:00:00 – AGI is still a decade away
0:30:33 – LLM cognitive deficits
0:40:53 – RL is terrible
0:50:26 – How do humans learn?
1:07:13 – AGI will blend into 2% GDP growth
1:18:24 – ASI
1:33:38 – Evolution of intelligence & culture
1:43:43 – Why self…


array reposted

New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M-parameter neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: alexiajm.github.io/2025/09/29/tin… Code: github.com/SamsungSAILMon… Paper: arxiv.org/abs/2510.04871
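For intuition, here is a minimal PyTorch sketch of the general recursive-refinement idea the abstract describes: a tiny network applied repeatedly, updating a latent state and a candidate answer over a fixed number of steps. The layer choices, sizes, and names below are illustrative assumptions, not TRM's actual architecture; see the linked code for the real implementation.

```python
# Toy sketch of recursive refinement with a tiny reused network.
# Illustrative only -- not the TRM architecture from the paper.
import torch
import torch.nn as nn

class TinyRecursiveSolver(nn.Module):
    def __init__(self, dim=128, n_steps=6):
        super().__init__()
        self.n_steps = n_steps                   # number of refinement passes
        self.encode = nn.Linear(dim, dim)        # embed the problem once
        self.update = nn.GRUCell(2 * dim, dim)   # tiny core reused at every step
        self.readout = nn.Linear(dim, dim)       # map latent state to an answer guess

    def forward(self, x):
        z = torch.zeros_like(x)                  # latent "scratchpad" state
        y = torch.zeros_like(x)                  # current answer guess
        h = self.encode(x)
        for _ in range(self.n_steps):            # same small network, applied recursively
            z = self.update(torch.cat([h, y], dim=-1), z)
            y = self.readout(z)                  # refine the answer from the new state
        return y

solver = TinyRecursiveSolver()
print(solver(torch.randn(4, 128)).shape)         # torch.Size([4, 128])
```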


array reposted

nanochat d32, i.e. the depth 32 version that I specced for $1000, up from $100, has finished training after ~33 hours, and looks good. All the metrics go up quite a bit across pretraining, SFT and RL. CORE score of 0.31 is now well above GPT-2 at ~0.26. GSM8K went ~8% -> ~20%,…


✍️


New paper: You can make ChatGPT 2x as creative with one sentence. Ever notice how LLMs all sound the same? They know 100+ jokes but only ever tell one. Every blog intro: "In today's digital landscape..." We figured out why – and how to unlock the rest 🔓 Copy-paste prompt: 🧵



What's your take on model behavior?

We made ChatGPT pretty restrictive to make sure we were being careful with mental health issues. We realize this made it less useful/enjoyable to many users who had no mental health problems, but given the seriousness of the issue we wanted to get this right. Now that we have…



array reposted

Introducing Veo 3.1 and Veo 3.1 Fast, our latest state of the art video models with:
- richer native audio
- better cinematic styles
- reference to video
- transitions between frames
- video extensions


very cool!

Introducing NotebookLM for arXiv papers 🚀

Transform dense AI research into an engaging conversation

With context across thousands of related papers, it captures motivations, draws connections to SOTA, and explains key insights like a professor who's read the entire field



array reposted

Very excited to share @theworldlabs's latest research work, RTFM!! It's a real-time, persistent, and 3D-consistent generative World Model running on *a single* H100 GPU! Blog and live demo are available below! 🤩

Generative World Models will inevitably be computationally demanding, potentially scaling beyond even the requirements of today’s LLMs. But we believe they are a crucial research direction to explore in the future of rendering and spatial intelligence. worldlabs.ai/blog/rtfm



array reposted

that's actually really awesome huggingface.co/chat/


array reposted

A big part of our mission at Thinking Machines is to improve people’s scientific understanding of AI and work with the broader research community. Introducing Connectionism today to share some of our scientific insights.

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to…
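As a quick illustration of the kind of problem the post tackles (this snippet is not from the post itself): floating-point addition is not associative, so the same reduction computed in a different order, for example under different batching or kernel scheduling, can produce slightly different results.

```python
# Minimal illustration (not code from the blog post): floating-point addition is not
# associative, so the same sum computed in a different order can differ slightly.
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

forward = sum(xs)                 # one reduction order
backward = sum(reversed(xs))      # same numbers, opposite order
print(forward == backward)        # frequently False
print(abs(forward - backward))    # tiny, but nonzero, discrepancy
```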



array reposted

Research research research research research research research research research research research research research research research research research research research research research research research research research research research research research research research…


array reposted

JSON prompting for LLMs, clearly explained:
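A hypothetical example of the pattern, not taken from the linked explainer: spell out the exact JSON schema in the prompt, then parse and validate the model's reply as data rather than free-form text. The `parse_reply` helper and the schema below are made up for illustration.

```python
# Hypothetical sketch of JSON prompting: pin down the schema in the prompt,
# then treat the model's reply as data instead of free-form text.
import json

prompt = """Extract the following fields from the review below and reply with JSON only,
no prose, matching exactly this schema:
{"product": string, "sentiment": "positive" | "negative" | "neutral", "issues": [string]}

Review: "The keyboard feels great, but two keys stopped working after a week."
"""

def parse_reply(reply: str) -> dict:
    # json.loads fails loudly if the model drifted away from pure JSON
    data = json.loads(reply)
    assert {"product", "sentiment", "issues"} <= set(data), "missing required fields"
    return data

# Simulated model reply, since the actual LLM call depends on your client library.
example_reply = '{"product": "keyboard", "sentiment": "negative", "issues": ["two keys stopped working"]}'
print(parse_reply(example_reply)["issues"])   # ['two keys stopped working']
```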


array reposted

Couldn't resist. Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro…
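For context on the memory figure, here is a rough back-of-the-envelope calculation (my own estimate, not from the notebook) of what 270M parameters cost to hold in memory:

```python
# Back-of-the-envelope check of why a 270M-parameter model fits in roughly 1-1.5 GB of RAM.
n_params = 270_000_000

for name, bytes_per_param in [("fp32", 4), ("bf16", 2)]:
    gb = n_params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.2f} GB for weights alone")

# fp32: ~1.08 GB for weights alone
# bf16: ~0.54 GB for weights alone
# The reported ~1.49 GB presumably also covers activations, the KV cache,
# and framework overhead on top of the raw weights.
```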


Gemma 3 270M! Great to see another awesome, small open-weight LLM for local tinkering. Here's a side-by-side comparison with Qwen3. The biggest surprise is that it only has 4 attention heads!



array reposted

Come join us on Thursday for a gpt-oss Deep Dive! We'll take a look at the model architecture, algo gems and other technical details of gpt-oss, OpenAI's latest and first open-weight reasoning model. meetup.com/machine-learni…

