はち
@CurveWeb
Works at an IT company. Likes dogs and coffee. HuggingFace → https://huggingface.co/HachiML Note → https://note.com/hatti8 Posts about LLMs, synthetic data, and agent systems
Aiming to reproduce OpenAI o1, I built a library that strengthens LLM reasoning. It integrates the MCTS algorithm with an LLM (fine-tuned on CoT data) so you can run search-based inference easily. The interface is kept as close to Transformers as possible, so it should be fairly easy to try. github.com/Hajime-Y/reaso…
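Since the repo link above is truncated, here is a minimal, self-contained sketch of what MCTS-guided LLM reasoning generally looks like; `generate_step` and `score_state` are hypothetical stand-ins for the library's actual model calls, not its API.

```python
# Minimal MCTS-over-reasoning-steps sketch, NOT the library's actual API.
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # partial chain-of-thought (a string)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb(node, c=1.4):
    # Upper-confidence bound balances exploration and exploitation.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def generate_step(state):
    # Stand-in for the CoT-tuned LLM proposing a next reasoning step.
    return state + f" step{random.randint(0, 9)};"

def score_state(state):
    # Stand-in for a value estimate (e.g. a reward model or rollout).
    return random.random()

def mcts(question, iterations=50, expansions=3):
    root = Node(question)
    for _ in range(iterations):
        node = root
        # Selection: descend via UCB until reaching a leaf.
        while node.children:
            node = max(node.children, key=ucb)
        # Expansion: sample a few candidate next steps from the LLM.
        for _ in range(expansions):
            node.children.append(Node(generate_step(node.state), parent=node))
        # Evaluation + backpropagation.
        leaf = random.choice(node.children)
        reward = score_state(leaf.state)
        while leaf:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state

print(mcts("Q: 12 * 7 = ?"))
```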
Unlocking the Power of Multi-Agent LLM for Reasoning Designing and optimizing multi-agent systems is important. This paper analyzes multi‑agent systems where one meta‑thinking agent plans and another reasoning agent executes, and identifies a lazy agent failure mode. They find…
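A hedged sketch of the planner/executor split the paper analyzes; the role prompts and the `call_llm` stub are my own illustration, not the paper's setup.

```python
# Two-agent loop: a meta-thinking agent plans, a reasoning agent executes.
def call_llm(system_prompt, user_prompt):
    # Placeholder for any chat-completion call (API or local model).
    return f"[LLM output for: {user_prompt[:40]}...]"

def solve(task, max_rounds=3):
    plan = call_llm(
        "You are a meta-thinking agent. Produce a short step-by-step plan.",
        task)
    answer = None
    for _ in range(max_rounds):
        answer = call_llm(
            "You are a reasoning agent. Execute the plan faithfully; "
            "do not skip steps.",                    # nudge against the
            f"Task: {task}\nPlan: {plan}\nAnswer:")  # 'lazy agent' failure
        critique = call_llm(
            "You are the meta-thinking agent. Revise the plan if the "
            "answer ignored it; otherwise reply DONE.",
            f"Plan: {plan}\nAnswer: {answer}")
        if critique.strip() == "DONE":
            break
        plan = critique
    return answer

print(solve("Prove that the sum of two even numbers is even."))
```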
The memory folding mechanism proposed in this paper is great. It makes sense that agents should spend time explicitly compressing their memory into a semantic / organized format to avoid context explosion. Worth mentioning though that memory compression / retention in agents…
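A sketch of one possible memory-folding step, under my own assumptions: when the raw event log exceeds a budget, an LLM (stubbed here) compresses it into an organized summary that replaces the raw entries, keeping context bounded.

```python
def summarize(text):
    # Placeholder for an LLM call that rewrites `text` as a compact,
    # semantically organized summary (facts, open tasks, decisions).
    return "SUMMARY(" + text[:60] + "...)"

class FoldingMemory:
    def __init__(self, budget_chars=500):
        self.folded = ""        # compressed long-term memory
        self.recent = []        # raw, uncompressed events
        self.budget = budget_chars

    def add(self, event):
        self.recent.append(event)
        if sum(len(e) for e in self.recent) > self.budget:
            # Fold: compress old summary + recent events together, so the
            # context handed to the agent never explodes.
            self.folded = summarize(self.folded + " " + " ".join(self.recent))
            self.recent = []

    def context(self):
        return self.folded + "\n" + "\n".join(self.recent)

mem = FoldingMemory(budget_chars=80)
for i in range(6):
    mem.add(f"step {i}: observed result {i * i}")
print(mem.context())
```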
Open-source, private, local alternative to Manus AI! AgenticSeek is an autonomous agent that browses the web, writes code, and plans tasks, all on your device. It runs entirely on your hardware, ensuring complete privacy and zero cloud dependency. Key Features: 🔒 Local &…
Cool idea from Meta. What if we augment CoT + RL's token-space thinking with a "latent space"? This research proposes "The Free Transformer", a way to let LLMs make global decisions within a latent space (via a VAE encoder) that could later simplify autoregressive sampling
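A very rough sketch of the idea as I read it (not Meta's architecture): an encoder produces a global latent z, VAE-style, which then conditions every step of autoregressive decoding. A GRU stands in for the transformer stack to keep it short, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LatentConditionedDecoder(nn.Module):
    def __init__(self, vocab=100, d=32, z_dim=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        # VAE-style encoder: sequence -> mean/log-variance of a global latent.
        self.enc = nn.GRU(d, d, batch_first=True)
        self.to_mu = nn.Linear(d, z_dim)
        self.to_logvar = nn.Linear(d, z_dim)
        # Decoder consumes tokens plus the broadcast latent.
        self.dec = nn.GRU(d + z_dim, d, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)                         # (B, T, d)
        _, h = self.enc(x)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Reparameterization trick: one global "decision" per sequence.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        zt = z.unsqueeze(1).expand(-1, x.size(1), -1)  # broadcast over time
        y, _ = self.dec(torch.cat([x, zt], dim=-1))
        return self.out(y), mu, logvar                 # logits + KL terms

model = LatentConditionedDecoder()
logits, mu, logvar = model(torch.randint(0, 100, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 100])
```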
Now in research preview: gpt-oss-safeguard Two open-weight reasoning models built for safety classification. openai.com/index/introduc…
Introducing Chronos-2: a foundation model that enables forecasting with an arbitrary number of dimensions in a zero-shot manner, outperforming existing time series foundation models by a substantial margin: amzn.to/4nkHQqp
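A hedged usage sketch based on the original chronos-forecasting package (`ChronosPipeline`); Chronos-2's checkpoint names and multivariate API may differ, so treat the model ID below as a placeholder.

```python
import torch
from chronos import ChronosPipeline  # pip install chronos-forecasting

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",   # stand-in checkpoint, not Chronos-2
    device_map="cpu",
)
context = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0])
# Zero-shot forecast: no fine-tuning on the target series.
samples = pipeline.predict(context, prediction_length=12)
median = samples.quantile(0.5, dim=1)   # (num_series, prediction_length)
print(median.shape)
```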
People are sleeping on Deep Agents. Start using them now. This is a fun paper showcasing how to put together advanced deep agents for enterprise use cases. Uses the best techniques: task decomposition, planning, specialized subagents, MCP for NL2SQL, file analysis, and more.
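A sketch of the deep-agent pattern named above (decomposition, planning, specialized subagents); the subagent roles, keyword routing, and `llm` stub are my own illustration, not the paper's implementation.

```python
def llm(prompt):
    return f"[model output for: {prompt[:50]}...]"   # placeholder call

SUBAGENTS = {
    "sql":      lambda task: llm("Translate to SQL and run: " + task),
    "files":    lambda task: llm("Analyze the referenced files: " + task),
    "research": lambda task: llm("Search and summarize: " + task),
}

def route(subtask):
    # A real system might let the planner pick the subagent, or use an
    # MCP server for the NL2SQL step; keyword routing keeps this short.
    for name in SUBAGENTS:
        if name in subtask.lower():
            return SUBAGENTS[name](subtask)
    return SUBAGENTS["research"](subtask)

def deep_agent(goal):
    # 1. Plan: decompose the goal into ordered subtasks.
    plan = llm(f"Decompose into numbered subtasks: {goal}").splitlines()
    # 2. Execute each subtask with the matching specialist.
    results = [route(step) for step in plan if step.strip()]
    # 3. Synthesize a final answer from the intermediate results.
    return llm("Combine into a final report: " + " | ".join(results))

print(deep_agent("Quarterly sales analysis from the sql warehouse"))
```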
"「Attention Is All You Need」の共著者(Sakana AI の CTO)、「トランスフォーマーにはうんざり」と語る" という記事から: ・AI研究が「トランスフォーマー」という一つの技術に偏りすぎている ・AI研究の現状は危険なほど視野が狭くなっている…
Just 2 days after Google, we have another big quantum computing breakthrough. IBM says one of its key quantum error correction algorithms can now run in real time on AMD FPGA (field-programmable gate array) chips; no exotic hardware is needed. This breakthrough could make…
Knowledge Flow shows LLMs can push past context limits by carrying a tiny, editable knowledge list across attempts. It hits 100% on AIME25 using text only, so test-time memory can unlock big gains. This approach achieved 100% accuracy on AIME 2025 using only open-source models,…
Can LLMs reason beyond context limits? 🤔 Introducing Knowledge Flow, a training-free method that helped gpt-oss-120b & Qwen3-235B achieve 100% on AIME-25 with no tools. How? Something like human deliberation, for LLMs. 📝 Blog: yufanzhuang.notion.site/knowledge-flow 💻 Code: github.com/EvanZhuang/kno…
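A minimal sketch of the knowledge-list loop described in the two posts above; the prompts and `llm` stub are assumptions, so see the linked blog and repo for the actual method.

```python
def llm(prompt):
    return f"[model output for: {prompt[:40]}...]"   # placeholder call

def knowledge_flow(problem, attempts=4):
    knowledge = []   # small, editable list carried across attempts
    answer = None
    for _ in range(attempts):
        notes = "\n".join(f"- {k}" for k in knowledge) or "- (none yet)"
        answer = llm(f"Problem: {problem}\nKnown facts:\n{notes}\nSolve:")
        # Distill what was learned into short facts; these survive even
        # though the full reasoning trace is discarded, which is how the
        # method sidesteps the context-length limit.
        update = llm(f"From this attempt, list <=3 reusable facts:\n{answer}")
        knowledge = (knowledge + update.splitlines())[-10:]  # keep it tiny
    return answer

print(knowledge_flow("AIME-style: find the last three digits of 7^2025"))
```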
Google's quantum advantage is here! thequantuminsider.com/2025/10/22/goo… They measured second-order OTOCs in dynamics with scrambling and observed "constructive interference". Put simply, they observed a physical phenomenon on a quantum computer. Of course, counterarguments have already appeared. nature.com/articles/d4158… Original paper: nature.com/articles/s4158…
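For context, the textbook out-of-time-order correlator (OTOC) being referenced is defined below; the experiment's "second-order" variant is related but not spelled out here.

```latex
\[
  F(t) = \langle W^\dagger(t)\, V^\dagger\, W(t)\, V \rangle,
  \qquad W(t) = e^{iHt} W e^{-iHt},
\]
\[
  C(t) = \big\langle [W(t), V]^\dagger [W(t), V] \big\rangle
       = 2\left(1 - \operatorname{Re} F(t)\right)
  \quad \text{(for unitary } W, V\text{)}.
\]
```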
🚀 Excited to share a major update to our “Mixture of Cognitive Reasoners” (MiCRo) paper! We ask: What benefits can we unlock by designing language models whose inner structure mirrors the brain’s functional specialization? More below 🧠👇 cognitive-reasoners.epfl.ch
ByteDance introduced a major advancement in long-context modeling with linearly scaling compute. 👏 Addresses a core challenge in AI—balancing efficiency and fidelity when processing extended sequences—by drawing inspiration from biological memory systems. On 128k tests, FLOPs…
🚨BREAKING: A new open-source multi-agent LLM trading framework in Python It's called TradingAgents. Here's what it does (and how to get it for FREE): 🧵
A compact but well-organized blog post introducing RAGAS with Langfuse. gao-ai.com/post/ragas-rag…
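A minimal RAGAS evaluation sketch using the classic ragas API (newer releases may rename fields); the Langfuse tracing covered in the blog is omitted here.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

data = Dataset.from_dict({
    "question":  ["What does RAGAS measure?"],
    "answer":    ["RAGAS scores RAG pipelines on faithfulness and relevance."],
    "contexts":  [["RAGAS is a framework for evaluating RAG pipelines."]],
    "ground_truth": ["RAGAS evaluates RAG pipeline quality."],
})
# Each metric is judged by an LLM, so an API key (e.g. OPENAI_API_KEY)
# must be configured in the environment before running this.
result = evaluate(data, metrics=[faithfulness, answer_relevancy,
                                 context_precision])
print(result)
```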
The standard way to improve reasoning in LLMs is to train on long chains of thought. But these traces are often brute-force and shallow. Introducing RLAD, where models instead learn _reasoning abstractions_: concise textual strategies that guide structured exploration. 1/N🧵
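A rough sketch of the two-stage idea named in the thread (propose a reasoning abstraction, then solve under it); the RL training is omitted and the `llm` stub is a placeholder, so this shows only the inference-time shape.

```python
def llm(prompt):
    return f"[model output for: {prompt[:40]}...]"   # placeholder call

def solve_with_abstraction(problem):
    # Stage 1: a concise textual strategy rather than a full brute-force
    # chain of thought.
    abstraction = llm(
        "State, in <=3 sentences, a general strategy (no arithmetic) "
        f"for problems like: {problem}")
    # Stage 2: structured exploration guided by that strategy.
    return llm(f"Strategy: {abstraction}\nNow solve: {problem}")

print(solve_with_abstraction("How many zeros does 100! end with?"))
```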
Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. openai.com/index/gdpval-v0
ShinkaEvolve is out! It's a framework for automatic code improvement using LLMs. Calling it Sakana AI's take on AlphaEvolve is the quick summary, but it's extremely well made, with clever touches in both performance and usability. I've already tried a few interesting uses myself and am a big fan of this software. More on that another time…!
We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency. Blog: sakana.ai/shinka-evolve/ Code: github.com/SakanaAI/Shink… Like AlphaEvolve and its variants, our framework leverages LLMs to…
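A toy evolutionary loop in the spirit of the framework described in the two posts above; the real system adds the sample-efficiency tricks this sketch ignores, and `mutate_with_llm` is a placeholder for an LLM rewriting the candidate program.

```python
import random

def mutate_with_llm(program):
    # Stand-in: a real system asks an LLM to propose an improved variant.
    return program + random.choice(["+1", "*2", "-3"])

def fitness(program):
    # Evaluate the candidate; here: how close the expression gets to 42.
    try:
        return -abs(eval(program) - 42)
    except Exception:
        return float("-inf")

def evolve(seed="7", generations=30, children=4):
    population = [seed]
    for _ in range(generations):
        parent = max(population, key=fitness)        # greedy selection
        population += [mutate_with_llm(parent) for _ in range(children)]
        population = sorted(population, key=fitness)[-5:]  # keep top 5
    return max(population, key=fitness)

best = evolve()
print(best, "=", eval(best))
```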
New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower…
New on the Anthropic Engineering blog: writing effective tools for LLM agents. AI agents are only as powerful as the tools we give them. So how do we make those tools more effective? We share our best tips for developers: anthropic.com/engineering/wr…
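An example tool definition in the Anthropic Messages API shape, applying the kind of advice such posts give (clear description, tight schema); the tips paraphrased in comments are generic good practice, not quotes from the post.

```python
search_orders_tool = {
    "name": "search_orders",
    # The description should say what the tool does, when to use it, and
    # what it returns, since this text is all the model sees.
    "description": (
        "Search customer orders by email and optional date range. "
        "Returns up to `limit` orders, newest first, as JSON."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Customer email."},
            "start_date": {"type": "string",
                           "description": "ISO date, inclusive. Optional."},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50,
                      "description": "Max results; default 10."},
        },
        "required": ["email"],   # keep required fields minimal
    },
}
```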