
Ilya Dyachenko

@flash_us

Machine Learning, Python enthusiast. SAP ABAP professional.

Ilya Dyachenko reposted

Holy shit... this might be the next big paradigm shift in AI. 🤯

Tencent + Tsinghua just dropped a paper called Continuous Autoregressive Language Models (CALM) and it basically kills the “next-token” paradigm every LLM is built on.

Instead of predicting one token at a time,…
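The thread is cut off above, but the core move described in the paper's abstract is to predict one continuous vector that stands in for a chunk of K tokens instead of one discrete token per step. Below is only a rough, hedged sketch of that contrast; every name, dimension, and the MSE loss are invented here for illustration (the actual CALM pairs a learned autoencoder with a likelihood-free generative head).

```python
import torch
import torch.nn as nn

# Toy contrast between next-token and next-vector prediction. All shapes and
# names are illustrative; the real CALM compresses a chunk of K tokens into a
# vector with a learned autoencoder and trains a likelihood-free head, not the
# plain MSE regression used here.
vocab_size, d_model, latent_dim, K = 32_000, 512, 128, 4

next_token_head = nn.Linear(d_model, vocab_size)    # classic LM: softmax over the vocab, one token per step
next_vector_head = nn.Linear(d_model, latent_dim)   # CALM-style: one continuous vector per K-token chunk

hidden = torch.randn(1, d_model)                    # stand-in for the LM's hidden state

token_logits = next_token_head(hidden)              # (1, vocab_size): sample a single token
chunk_vector = next_vector_head(hidden)             # (1, latent_dim): decodes back to K tokens at once

# Illustrative training signal: pull the prediction toward the (here random)
# autoencoder vector of the true next chunk.
target_vector = torch.randn(1, latent_dim)
loss = nn.functional.mse_loss(chunk_vector, target_vector)
loss.backward()

seq_len = 1024
print(f"steps for {seq_len} tokens: {seq_len} (token-by-token) vs {seq_len // K} (chunked)")
```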


Ilya Dyachenko reposted

This is Microsoft SandDance, originally a closed-source project that was later open-sourced. It lets you visually explore and understand data with smooth, animated transitions between multiple views.


Ilya Dyachenko reposted

Introducing Codemaps in @windsurf, powered by SWE-1.5 and Sonnet 4.5.

“Your code is your understanding of the problem you’re exploring. So it’s only when you have your code in your head that you really understand the problem.” — @paulg


Ilya Dyachenko reposted

Harvard professor literally dropped the best ML systems tutorial you’ll ever see


Ilya Dyachenko reposted

No excuse anymore not to train your own models! This is 200+ pages with full transparency. Let's go, open-source AI!

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably: huggingface.co/spaces/Hugging…



Ilya Dyachenko reposted

This could be the final nail in Jupyter's coffin. Deepnote is going open-source! Their kernel is way more powerful than Jupyter's, but still backwards compatible.

Notebooks are amazing:
• They are perfect for data exploration
• They are perfect for collaborating with AI…


Ilya Dyachenko reposted

You can now fine-tune DeepSeek-OCR with our free notebook! We fine-tuned DeepSeek-OCR, improving its language understanding by 89% and reducing the Character Error Rate from 149% to 60%.

Blog: docs.unsloth.ai/new/deepseek-o…
GitHub: github.com/unslothai/unsl…
Colab: colab.research.google.com/github/unsloth…
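For orientation, a minimal sketch of the workflow the notebook automates, using Unsloth's vision fine-tuning entry point; the repo id and LoRA settings below are placeholders chosen for illustration, so treat the linked Colab as the source of truth.

```python
from unsloth import FastVisionModel  # Unsloth's entry point for vision/OCR models

# Placeholder repo id; the linked notebook pins the exact DeepSeek-OCR checkpoint.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/DeepSeek-OCR",
    load_in_4bit=True,        # 4-bit loading so it fits on a free Colab GPU
)

# Attach LoRA adapters so only a small fraction of the weights is trained.
model = FastVisionModel.get_peft_model(
    model,
    r=16,           # LoRA rank (illustrative)
    lora_alpha=16,  # scaling matched to the rank
)

# The notebook then builds an image+transcription dataset and runs TRL's
# SFTTrainer; the 89% language-understanding gain and the CER drop quoted
# above come from that fine-tuning run.
```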


Ilya Dyachenko reposted

New Llama.cpp UI is a blessing for the local AI world 🌎
- Blazing fast, beautiful, and private (ofc)
- Use 150,000+ GGUF models in a super slick UI
- Drop in PDFs, images, or text documents
- Branch and edit conversations anytime
- Parallel chats and image processing
- Math and…

A detailed look into the new WebUI of llama.cpp
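The new UI ships with the llama.cpp server itself, but the same GGUF checkpoints can also be driven from Python. A small sketch with the llama-cpp-python bindings, where the repo id and filename pattern are placeholders picked for illustration:

```python
from llama_cpp import Llama  # Python bindings for llama.cpp

# Repo id and filename glob are illustrative; any GGUF repo on the Hub works.
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2.5-0.5B-Instruct-GGUF",
    filename="*q4_k_m.gguf",   # glob selects the 4-bit K-quant file
    n_ctx=4096,                # context window to allocate
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is a GGUF file?"}]
)
print(reply["choices"][0]["message"]["content"])
```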



Ilya Dyachenko reposted

AI coding just arrived in Jupyter notebooks - and @brganger (Jupyter co-founder) and I will show you how to use it. Coding by hand is becoming obsolete. The latest Jupyter AI - built by the Jupyter team and showcased at JupyterCon this week - brings AI assistance directly into…
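The session above is about the JupyterLab chat UI, but the same jupyter-ai package also ships IPython magics; a tiny sketch of that route, assuming you already have a provider key configured (model aliases vary by install):

```python
# Run inside a notebook cell: load the magics that come with jupyter-ai.
%load_ext jupyter_ai_magics

# Show which providers/models are usable with the API keys you have set.
%ai list

# In a separate cell, %%ai sends the cell body as a prompt, for example:
#   %%ai chatgpt
#   Explain the last traceback in two sentences.
```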


Ilya Dyachenko reposted

The first VS Code extension for Solana is here. Real-time security analysis + fuzz coverage visualization. Built by the auditors and educators behind School of Solana. Thread ↓


Ilya Dyachenko reposted

I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes, data collection etc., but anyway it doesn't matter. The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language…

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai, exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support.

🧠 Compresses visual contexts up to 20× while keeping…
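For reference, a minimal offline-inference sketch with vLLM's Python API; the image-placeholder prompt format is an assumption on my part (the model card has the exact template), and the generation settings are illustrative:

```python
from vllm import LLM, SamplingParams
from PIL import Image

# DeepSeek-OCR needs its custom model code; the prompt template below is an
# assumption, check the model card for the exact image placeholder and wording.
llm = LLM(model="deepseek-ai/DeepSeek-OCR", trust_remote_code=True)

image = Image.open("scanned_page.png")
prompt = "<image>\nFree OCR."

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=2048),
)
print(outputs[0].outputs[0].text)  # recognized text for the page
```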



Ilya Dyachenko reposted

ML Algorithms Cheatsheet


Ilya Dyachenko reposted

When I teach Principal Component Analysis (PCA), I start with the core idea: a linear, orthogonal transformation that maximizes variance and removes correlation. Then we jump into my interactive, hands-on demo — using a #Python dashboard built with @matplotlib to perform PCA…
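Not the dashboard from the demo, just a minimal self-contained illustration of the two claims in the first sentence: the principal axes maximize variance, and the transformed coordinates are uncorrelated.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Correlated 2-D toy data.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, 0.6 * x + 0.3 * rng.normal(size=500)])

# Linear, orthogonal transformation onto the directions of maximal variance.
pca = PCA(n_components=2)
scores = pca.fit_transform(data)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("correlation after PCA:", round(float(np.corrcoef(scores.T)[0, 1]), 6))

# Side-by-side look: correlated cloud vs. decorrelated PC scores.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.scatter(*data.T, s=5); ax1.set_title("original (correlated)")
ax2.scatter(*scores.T, s=5); ax2.set_title("PC scores (decorrelated)")
plt.show()
```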


Ilya Dyachenko reposted

Last week, China barred its major tech companies from buying Nvidia chips. This move received only modest attention in the media, but has implications beyond what’s widely appreciated. Specifically, it signals that China has progressed sufficiently in semiconductors to break away…


Ilya Dyachenko reposted

The METR paper that says that “the length of tasks AI can do is doubling every 7 months” radically undersells the scaling that we’re seeing at Replit.

It might be true if you’re measuring one long trajectory for a single model class.

But this is where an agent research lab’s…


Longer Autonomous Runs. Agent 3 is 10x more autonomous than V2, capable of handling much more complex builds by detecting and fixing errors on its own. You can track the progress of your build with Live Monitoring on your phone, freeing you up to focus on other creative work.



Ilya Dyachenko reposted

Congrats guys on another epic release! We're uploading Dynamic GGUFs, and one with 1M context length so you guys can run it locally! 🦥⭐️ huggingface.co/unsloth/Qwen3-…


Ilya Dyachenko reposted

Live now on OpenRouter! x.com/OpenRouterAI/s…

🟣New: Qwen3-Coder by @Alibaba_Qwen
- 480B params (35B active)
- Native 256K context length, extrapolates to 1M
- Outperforms Kimi, o3, DeepSeek, and more on SWE-Bench Verified (69.6%) 👀

Now live, starting at $1/M tokens 👇
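For anyone wanting to try it, OpenRouter exposes an OpenAI-compatible API, so the standard client is enough; the model slug below is my best guess for Qwen3-Coder, confirm the exact id on openrouter.ai/models.

```python
from openai import OpenAI

# OpenRouter is OpenAI-compatible at this base URL.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # from your OpenRouter account
)

resp = client.chat.completions.create(
    model="qwen/qwen3-coder",  # assumed slug, verify before use
    messages=[{"role": "user", "content": "Write a Python function that reverses a singly linked list."}],
)
print(resp.choices[0].message.content)
```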



Ilya Dyachenko reposted

>>> Qwen3-Coder is here! ✅

We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…


Ilya Dyachenko reposted

Higgsfield SOUL realism just broke the Internet today. This is 100% AI.

10 wild examples + how to try:

1. Bimbocore - Close-up selfie, bubble-gum backdrop


Ilya Dyachenko reposted

Sometimes the future seems like a dystopia. Drones are increasingly being used as new, mobile advertising spaces.

