Jing Yuan

@joekina

靖源 Focus on N.O.W

Jing Yuan reposted

When you use an autonomous AI agent to automate your corporate expense reports and it makes a mistake, or worse, who is responsible? Does it matter if I created the agent myself, if it was provided by the company, or if it was required by the company? And what about when you use…


Jing Yuan reposted

Interesting experiment found that an AI agent built around the obsolete GPT-3.5 and GPT-4 models beat experienced human venture capital analysts in predicting which early-stage startups would survive based on early screening (at much lower costs as well). sciencedirect.com/science/articl…

emollick's tweet image.

Jing Yuan reposted

Lex, thank you for the discussion!! Was great to see you.

Here's my conversation with Michael Levin (@drmichaellevin) about the nature of intelligence in biological systems, including unconventional & alien intelligence, agency, memory, consciousness, and life in all its forms here on Earth and beyond. It's here on X in full and is up…



Jing Yuan reposted
fchollet's tweet image.

One point I made that didn’t come across:
- Scaling the current thing will keep leading to improvements. In particular, it won’t stall.
- But something important will continue to be missing.



Jing Yuan reposted

Demis Hassabis talks about the moment he decided to pursue research. At 12, he was the world’s 2nd-best chess player for his age. He went to a tournament and lost to a 30-year-old player, who was overly happy about beating a kid. Demis loved chess but realized all the brainpower in that…


Jing Yuan reposted

One point I made that didn’t come across:
- Scaling the current thing will keep leading to improvements. In particular, it won’t stall.
- But something important will continue to be missing.

here are the most important points from today's ilya sutskever podcast:
- superintelligence in 5-20 years
- current scaling will stall hard; we're back to real research
- superintelligence = super-fast continual learner, not finished oracle
- models generalize 100x worse than…



Jing Yuan reposted

From the makers of the popular AlphaGo documentary, The Thinking Game gives a much broader picture of the story of DeepMind and our mission to build AGI, drawing on interviews with myself and others going back many years. You can now freely watch it here: youtube.com/watch?v=d95J8y…

ShaneLegg's tweet card. The Thinking Game | Full documentary | Tribeca Film Festival official...


Jing Yuan reposted

New Harvard+MIT+Georgia Tech paper argues that truly understanding language means linking words to rich nonverbal brain systems that model reality. First, it explains that the brain's language regions mostly track patterns in words and grammar, similar to phone typing…

rohanpaul_ai's tweet image.

Jing Yuan reposted

As one of the authors of the original “jagged frontier” paper, I think this undersells how jagged AI is (& likely will be) at even the level of individual jobs: having a couple of critical tasks that AI can’t do creates deep bottlenecks, especially as the shape of the frontier is unknown.

My take on the jagged frontier debate:

tomaspueyo's tweet image.


Jing Yuan reposted

How Claudey is Opus 4.5? We previously described Claudiness as "good at agentic tasks while being weaker at multimodal and math". This pattern remains when comparing Opus 4.5 to other newly-released models, though the gap on agentic coding and tool-calling benchmarks is small.

EpochAIResearch's tweet image.

Jing Yuan reposted

This is insane… OpenAI, Anthropic & Google just got access to petabytes of proprietary data. The data is coming from the 17 National Laboratories, which have been hoarding experimental data for decades. We aren't just talking about better chatbots anymore. The US Government’s…

chatgpt21's tweet image.

Jing Yuan reposted

Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.

claudeai's tweet image.

Jing Yuan reposted

To my surprise, Opus 4.5 one-shot my hardest λ-calculus problem (tying with Gemini 3), and it did solve the stack underflow bug that an old checkpoint of Gemini 3 (NOT the deployed version) solved. So, in terms of first hour impression, that couldn't be more promising I guess...


Jing Yuan reposted

14MB RAM / 9MB disk (MB, *not* GB!) to index all of Windows 10, in 1 second. Index stays updated automatically. It's amazing what's possible with a modern computer if you actually care about engineering. voidtools.com

jeremyphoward's tweet image.

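Exactly! Marketing in this era doesn't need a lot of demos or benchmarks. Just have your good-looking employees with magnetic voices come out and say a few words 😆 That's also an ability AI can't replace 😂 youtu.be/56kq0VTkU4k?si…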

joekina's tweet card. Introducing Claude Opus 4.5
