Guilhermo
@GuilhermoAI
Insighter | AI Designer and Builder | AI Evangelist | AI Accelerationist | Tech Optimist | Doing my best for a better world.
🚨 Chinese AI company SenseTime just revealed SenseNova 5.5, an AI model that claims to beat GPT-4o across key metrics. Plus, big developments from Apple, YouTube, KLING, Neuralink, and Google DeepMind. Here's everything going on in AI right now:
MobileLLM: nice paper from @AIatMeta about running sub-billion LLMs on smartphones and other edge devices. TL;DR: more depth, not width; shared matrices for token->embedding and embedding->token; shared weights between multiple transformer blocks; Paper: arxiv.org/abs/2402.14905
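For a concrete sense of those two sharing tricks, here's a toy sketch (illustrative shapes and modules, not the paper's actual code): the unembedding layer reuses the embedding matrix, and each transformer block is run twice in a row so adjacent "layers" share one set of weights.

```python
import torch
import torch.nn as nn

vocab, d_model, n_unique = 32000, 512, 15

embed = nn.Embedding(vocab, d_model)
lm_head = nn.Linear(d_model, vocab, bias=False)
lm_head.weight = embed.weight  # token->embedding and embedding->token share one matrix

blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
    for _ in range(n_unique)
)

def forward(tokens: torch.Tensor) -> torch.Tensor:
    h = embed(tokens)
    for blk in blocks:
        h = blk(h)  # run the block once...
        h = blk(h)  # ...then immediately again: two "layers", one set of weights
    return lm_head(h)

logits = forward(torch.randint(0, vocab, (1, 16)))  # (1, 16, vocab)
```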
💥 If SDXL was trained with LLM as a text encoder, what would happen? 🧪 Kolors is the answer 🎨 - Kwai trained (from scratch!) an SDXL-arch model with the GLM-4 LLM as the text encoder, and it's fantastic! ▶️ Demo huggingface.co/spaces/gokaygo… 📁 Model huggingface.co/Kwai-Kolors/Ko…
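If you'd rather try it from Python than the demo Space, a minimal sketch using diffusers' Kolors support (pipeline class and repo id assumed from recent diffusers releases; check the model card):

```python
import torch
from diffusers import KolorsPipeline

pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolor fox in a bamboo forest", num_inference_steps=25).images[0]
image.save("kolors_sample.png")
```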
New guide to using CUDA on @modal_labs just dropped. It began its life as a document called "I am fucking done not understanding the CUDA stack", and after readelf-ing CUDA binaries, RTFMing the driver docs, & writing homebrew kernels, I'm excited to share it with the world!
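For orientation before the guide itself, a hedged sketch of running a GPU function on Modal (GPU type and image contents are illustrative, not taken from the guide):

```python
import modal

image = modal.Image.debian_slim().pip_install("torch")
app = modal.App("gpu-demo", image=image)

@app.function(gpu="A10G")
def gpu_check() -> str:
    import torch  # imported inside the container, where the GPU lives
    return torch.cuda.get_device_name(0)

@app.local_entrypoint()
def main():
    print(gpu_check.remote())  # executes remotely on the requested GPU
```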
Holy smokes. @modal_labs is now my preferred method to run GPU workloads using the NVIDIA CUDA toolkit. I was tinkering with a GPU implementation of Conway's game of life using convolutions, and had it running in no time after following @charles_irl's guide.
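The convolution trick is roughly this (a minimal PyTorch sketch in the spirit of the experiment described, not the author's actual code): count each cell's eight neighbors with a 3x3 kernel, then apply the birth/survival rules elementwise.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

def life_step(grid: torch.Tensor) -> torch.Tensor:
    """One Game of Life step; `grid` is an (H, W) float tensor of 0s and 1s."""
    kernel = torch.ones(1, 1, 3, 3, device=grid.device)
    kernel[0, 0, 1, 1] = 0.0  # a cell is not its own neighbor
    neighbors = F.conv2d(grid[None, None], kernel, padding=1)[0, 0]
    # Alive next step: exactly 3 neighbors, or alive now with exactly 2.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).float()

grid = (torch.rand(128, 128, device=device) < 0.5).float()
for _ in range(100):
    grid = life_step(grid)
```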
Here’s a proof of concept showing how data calculated in Python can be used in Blender. #Blender can process the received data with custom geometry nodes, custom shaders and full 3D interactivity. This minimal example uses @networkX in @ProjectJupyter. Learn more in this thread🧵
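The thread covers the Blender side; the Python side is roughly this (a sketch of the networkx computation whose output Blender's geometry nodes would consume; the transport between the two processes is not shown):

```python
import networkx as nx

G = nx.erdos_renyi_graph(30, 0.15, seed=42)
pos = nx.spring_layout(G, dim=3, seed=42)  # 3D coordinates per node

# Vertex and edge lists in the form a Blender mesh / geometry-nodes setup expects.
vertices = [tuple(map(float, pos[n])) for n in G.nodes]
edges = [tuple(e) for e in G.edges]
```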
Shoutout to the team that built artificialanalysis.ai . Really neat site that benchmarks the speed of different LLM API providers to help developers pick which models to use. This nicely complements the LMSYS Chatbot Arena, Hugging Face open LLM leaderboards and Stanford's HELM…
Another big change with AI Agents, Multi-Agent systems, and AI apps in general is how we are shifting from strongly typed apps to "fuzzy" apps. Ofc there are now great ways to get structured data out of LLMs, but it's funny how it allows for different kinds of apps
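One common way to pin structure onto a fuzzy model, sketched with OpenAI's JSON mode plus a Pydantic schema (model name and schema are illustrative assumptions, not from the tweet):

```python
from openai import OpenAI
from pydantic import BaseModel

class Task(BaseModel):
    title: str
    priority: int

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # constrain output to valid JSON
    messages=[
        {"role": "system", "content": 'Reply with JSON: {"title": str, "priority": int}'},
        {"role": "user", "content": "File the quarterly report by Friday."},
    ],
)
task = Task.model_validate_json(resp.choices[0].message.content)  # validate the fuzzy output
```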
8-bit and 4-bit Adam implementations are now available in torchao thanks to @gaunernst github.com/pytorch/ao/tre…. These reduce your optimizer state by 4x and 8x respectively relative to fp32. They were written in pure PyTorch and then torch.compiled to achieve performance competitive…
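Usage is a drop-in optimizer swap; a hedged sketch (the import path below is where these landed as a prototype and may move between torchao releases):

```python
import torch
from torchao.prototype.low_bit_optim import Adam8bit  # Adam4bit also available

model = torch.nn.Linear(1024, 1024).cuda()
opt = Adam8bit(model.parameters(), lr=1e-3)  # optimizer state kept in 8 bits

x = torch.randn(32, 1024, device="cuda")
loss = model(x).square().mean()
loss.backward()
opt.step()
opt.zero_grad()
```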
🤖🔍🧑‍💻 AI Agents find me candidates! Agents using RLHF find candidates, score them, and put together templates I can use! Showcasing the new crewai train feature. Longer than my usual, hope you like it 🤞 Retweets are super appreciated 🙏 lmk if this is one worth sharing the code for
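The train feature looks roughly like this (a hedged sketch of CrewAI's API; the agent and task details are placeholders, not the video's actual setup):

```python
from crewai import Agent, Task, Crew

sourcer = Agent(
    role="Candidate Sourcer",
    goal="Find, score, and draft outreach templates for candidates",
    backstory="A recruiting assistant for a senior ML engineer role.",
)
source_task = Task(
    description="Find five candidates, score each, and draft an outreach template.",
    expected_output="A ranked list of candidates with scores and templates.",
    agent=sourcer,
)
crew = Crew(agents=[sourcer], tasks=[source_task])

# train() replays the tasks and folds human feedback in between iterations.
crew.train(n_iterations=2, filename="trained_agents_data.pkl")
```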
📚 @jerryjliu0 on "The Future of Knowledge Assistants": 🧠 Exploring the development of knowledge assistants, covering document processing, tagging & extraction, knowledge search & QA (RAG), knowledge base sourcing, workflow automation, and more. 🔑 Key points: - 🧩 RAG…
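The RAG piece of that list, in its smallest form with LlamaIndex (the framework @jerryjliu0 created); the directory path and query are examples:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./docs").load_data()  # document processing
index = VectorStoreIndex.from_documents(documents)       # embed + index chunks
query_engine = index.as_query_engine()                   # knowledge search & QA
print(query_engine.query("What does the onboarding guide say about VPN access?"))
```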
Wrote quite a lengthy blog - "Reinforcement Learning from Human Feedback (RLHF) in Practice: A Deep Dive" 👨‍🔧 (🔗 link in 1st comment) Covering the following topics 📌 The fundamental building blocks and flow of RLHF with its 3-phase process: a) Supervised fine-tuning (SFT) >…
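As one concrete piece of that pipeline: the standard second phase of RLHF trains a reward model on human preference pairs, and its usual pairwise loss (a textbook detail, not quoted from the blog) maximizes the log-sigmoid of the reward margin:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # r_*: scalar rewards per preference pair, shape (batch,).
    # Loss falls as the model scores the human-preferred answer higher.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

loss = reward_model_loss(torch.randn(8), torch.randn(8))
```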
I trained GPT-2 (124M) with @aaron_defazio's Schedule-Free optimizer on @karpathy's nanoGPT: - Settings: AdamW with learning rate=0.0018 (same as x.com/Yuchenj_UW/sta…), warmup_steps=700; Schedule-Free AdamW with default learning rate=0.0025, warmup_steps=700 - Observations:…
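For anyone reproducing this, the Schedule-Free side is a drop-in swap via the schedulefree package (a hedged sketch with a stand-in model, not nanoGPT itself; note the required train/eval mode switching):

```python
import torch
import schedulefree

model = torch.nn.Linear(768, 768)
opt = schedulefree.AdamWScheduleFree(model.parameters(), lr=0.0025, warmup_steps=700)

opt.train()  # schedule-free optimizers track iterate averages and need explicit modes
for _ in range(10):
    loss = model(torch.randn(16, 768)).square().mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
opt.eval()  # switch before validation or checkpointing
```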
Schedule-free optimizers (x.com/aaron_defazio/…) are surreal. I've read the paper, looked into the math, and tried to understand what's happening. It all seems like an incremental improvement at best (like LaProp (arxiv.org/abs/2002.04839) or Adam-Atan2…
DeepSeek-v2-Coder is really impressive. This blog did great work checking 180+ LLMs on code-writing quality. Only 3 models (Anthropic Claude 3 Opus, DeepSeek-v2-Coder, GPT-4o) produced 100% compilable Java code, while no model reached 100% for Go. "following plot…
A new version of Claude Engineer is out! 🔥 📝 Whole new diff-based file editing with an improved search-and-edit function 🌈 Color-coded diffs. 👨‍🏫 The system prompt has been updated with more detailed instructions. 💬 Conversation history management has been improved.
GraphRAG Ollama: 100% Local Setup, Keeping your Data Private 🔍 Integrate @ollama & @LMStudioAI 📚 Convert data to knowledge graph 🖥️ Run locally for privacy ⚙️ Configure models easily 📈 Enhance AI capabilities Subscribe: youtube.com/@MervinPraison
The Top ML Papers of the Week (July 1 - July 7): - APIGen - CriticGPT - Agentless - LLM See, LLM Do - Scaling Synthetic Data Creation - Searching for Best Practices in RAG ...
The "Multi-token Prediction" paper (April-2024) from @AIatMeta and behind the Chameleon family of models is such an innovative idea. 👨🔧 Original Problem it solves Most LLMs have a simple training objective: predicting the next word. While this approach is simple and scalable,…