
Ray Strode

@halfartificial

@halfline alt account for ai related discourse

Ray Strode reposted

chinese bros back at it again
- train a Decoder-Only transformer < 3 hrs on a 3090
- fully studded with LoRA, DPO, SFT.
- well documented training
- vision, moe, and other goodies also avail.

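For the curious, the LoRA part of that recipe boils down to freezing the pretrained weights and learning a small low-rank update beside them. A minimal PyTorch sketch, not the repo's actual code (class name, rank, and scaling are illustrative):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen base linear layer plus a trainable low-rank update (illustrative sketch)."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)  # pretrained weights stay frozen
            self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no-op at start
            self.scale = alpha / rank

        def forward(self, x):
            # y = W x + scale * B A x; only A and B receive gradients
            return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

Fine-tuning then only touches the tiny A/B matrices, which is a big part of why it fits on a single 3090.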

Ray Strode reposted

Knowledge Graphs give LLMs the context they need to understand your code better.

This paper presents a novel approach to improve software repository question-answering by combining LLMs with knowledge graphs. The research demonstrates how knowledge graphs can enhance LLMs'…

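The gist, as I read it (my own toy illustration with networkx, not the paper's pipeline): model the repo as a graph of entities and relations, then serialize the relevant neighborhood into the prompt so the LLM answers from structure rather than raw text:

    import networkx as nx

    # Toy repo graph: functions as nodes, "calls" edges as relations
    g = nx.DiGraph()
    g.add_node("parse_config", kind="function", file="config.py")
    g.add_node("load_yaml", kind="function", file="config.py")
    g.add_edge("parse_config", "load_yaml", relation="calls")

    def context_for(entity: str, graph: nx.DiGraph) -> str:
        """Serialize an entity's outgoing edges as plain-text facts for the prompt."""
        return "\n".join(f"{entity} {d['relation']} {dst}"
                         for _, dst, d in graph.out_edges(entity, data=True))

    prompt = (f"Repo facts:\n{context_for('parse_config', g)}\n\n"
              "Q: what does parse_config depend on?")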

bye bye byte pair encoding. you've served us well, but your time has come.

META JUST KILLED TOKENIZATION !!!

A few hours ago they released "Byte Latent Transformer". A tokenizer-free architecture that dynamically encodes Bytes into Patches and achieves better inference efficiency and robustness!

(I was just talking about how we need dynamic…

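The core trick, as I understand the paper (toy sketch below, definitely not Meta's code): instead of a fixed tokenizer, group bytes into variable-length patches, cutting a new patch wherever next-byte entropy spikes. Here a bigram count model stands in for BLT's small byte-level LM:

    import math
    from collections import Counter, defaultdict

    def bigram_entropies(data: bytes):
        """Entropy of the next-byte distribution after each byte, from bigram counts."""
        follow = defaultdict(Counter)
        for a, b in zip(data, data[1:]):
            follow[a][b] += 1
        ents = {}
        for a, ctr in follow.items():
            total = sum(ctr.values())
            ents[a] = -sum(c / total * math.log2(c / total) for c in ctr.values())
        return ents

    def patch(data: bytes, threshold: float = 2.0):
        """Start a new patch wherever the model is uncertain about what comes next."""
        ents = bigram_entropies(data)
        patches, start = [], 0
        for i in range(1, len(data)):
            if ents.get(data[i - 1], 0.0) > threshold:
                patches.append(data[start:i])
                start = i
        patches.append(data[start:])
        return patches

    print(patch(b"the cat sat on the mat. the cat sat on the mat."))

Predictable stretches end up in long, cheap patches; surprising ones get more compute.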


Ray Strode reposted

canvas is now available to all chatgpt users, and can execute code! more importantly it can also still emojify your writing.


Ray Strode reposted

Open-sourced local LLM based RAG, chatting with your documents with open-source LLMs. ✨

It trended at number 1 on GitHub for quite some time.

And a clean & customizable RAG UI for chatting with your documents.

→ Open-source RAG UI for document QA
→ Supports local LLMs and…

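The whole local-RAG loop is small enough to sketch in a few lines (my own minimal version, not the project's code; assumes an Ollama server on localhost:11434 with a model like llama3 pulled):

    import requests
    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    chunks = ["Invoices live in /data/invoices.", "Backups run nightly at 2am."]
    chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)

    question = "When do backups run?"
    scores = util.cos_sim(embedder.encode(question, convert_to_tensor=True), chunk_vecs)[0]
    context = chunks[int(scores.argmax())]  # retrieve the best-matching chunk

    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama3",
        "prompt": f"Context: {context}\n\nQuestion: {question}\nAnswer:",
        "stream": False,
    })
    print(resp.json()["response"])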

Ray Strode reposted

As R&D staff @answerdotai, I work a lot on boosting productivity with AI. A common theme that always comes up is the combination of human+AI. This combination proved to be powerful in our new project ShellSage, which is an AI terminal buddy that learns and teaches with you. A 🧵


Ray Strode reposted

Welcome PaliGemma 2! 🤗

Google released PaliGemma 2, the best vision language model family yet, which comes in various sizes: 3B, 10B, 28B, based on Gemma 2 and SigLIP, with transformers support on day 0 🎁

Saying this model is amazing would be an understatement, keep reading ✨

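Rough usage sketch via transformers (the checkpoint id, prompt format, and image URL below are my assumptions; check the model cards, and note the image-token conventions have shifted between transformers versions):

    import requests
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model_id = "google/paligemma2-3b-pt-224"  # assumed id, verify on the Hub
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
    processor = AutoProcessor.from_pretrained(model_id)

    image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)  # placeholder URL
    inputs = processor(text="<image>caption en", images=image, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=40)
    print(processor.decode(out[0], skip_special_tokens=True))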

Ray Strode reposted

The First Globally Trained 10B Parameter Model is released. 👏👏

INTELLECT-1 is a groundbreaking 10B parameter LLM trained collaboratively across multiple continents using distributed computing, representing a 10x scale-up from previous research.

→ The model achieved…


Releasing INTELLECT-1: We're open-sourcing the first decentralized-trained 10B model:
- INTELLECT-1 base model & intermediate checkpoints
- Pre-training dataset
- Post-trained instruct models by @arcee_ai
- PRIME training framework
- Technical paper with all details
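PRIME builds on the DiLoCo idea: workers take many local optimizer steps and only exchange accumulated weight deltas over the slow links. A heavily simplified outer loop (my sketch, not the PRIME framework; train_locally is a hypothetical helper):

    import copy
    import torch

    def outer_round(global_model, workers, local_steps=500, outer_lr=0.7):
        """One communication round: local training, then averaged delta applied globally."""
        deltas = []
        for worker in workers:
            local = copy.deepcopy(global_model)
            worker.train_locally(local, steps=local_steps)  # hypothetical: many inner steps
            deltas.append([lp.data - gp.data
                           for lp, gp in zip(local.parameters(), global_model.parameters())])
        with torch.no_grad():
            for i, gp in enumerate(global_model.parameters()):
                avg = torch.stack([d[i] for d in deltas]).mean(dim=0)
                gp.add_(outer_lr * avg)  # outer step on the averaged pseudo-gradient

Communication drops from every step to every few hundred steps, which is what makes cross-continent training tolerable.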



Ray Strode reposted

ollama run qwq

🤯 an experimental 32B model by the Qwen team that is competitive with o1-mini and o1-preview in some cases.

ollama.com/library/qwq

Note: This is the pronunciation of QwQ: /kwju:/, similar to the word "quill"

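Once pulled, it is also reachable through Ollama's local REST API (default server on localhost:11434):

    import requests

    resp = requests.post("http://localhost:11434/api/chat", json={
        "model": "qwq",
        "messages": [{"role": "user", "content": "How many r's are in strawberry?"}],
        "stream": False,  # get one JSON object instead of a stream
    })
    print(resp.json()["message"]["content"])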

Ray Strode reposted

Adding rule-based guidance doubles RAG's performance in document retrieval and answer generation.

Basically, RAG gets a proper manual on how to use its knowledge.

It's like giving RAG a GPS instead of letting it wander around blindly.

🎯 Original Problem: Current…

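In practice the "manual" amounts to explicit rules injected ahead of the retrieved passages. A sketch of the shape of it (my reading, not the paper's exact prompts):

    RULES = [
        "Only answer from the retrieved passages; say 'unknown' otherwise.",
        "Prefer passages whose section title matches the question's topic.",
        "Cite the passage id for every claim.",
    ]

    def build_prompt(question: str, passages: list[str]) -> str:
        """Prepend the rule manual, then the numbered passages, then the question."""
        rules = "\n".join(f"- {r}" for r in RULES)
        docs = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
        return f"Rules:\n{rules}\n\nPassages:\n{docs}\n\nQuestion: {question}\nAnswer:"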

Ray Strode reposted

2:4 Sparsity + @AIatMeta Llama-3.1: At @neuralmagic, we've developed a recipe to produce very competitive sparse LLMs, and we are starting by open-sourcing the first one: Sparse-Llama-3.1-8B-2of4. We also show how to leverage it for blazingly fast inference in @vllm_project.

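What "2:4" means mechanically: in every group of 4 consecutive weights, at most 2 are nonzero, a pattern NVIDIA's sparse tensor cores can accelerate. A naive magnitude-pruning sketch (not Neural Magic's actual recipe, which recovers accuracy with retraining):

    import torch

    def prune_2of4(w: torch.Tensor) -> torch.Tensor:
        """Zero the 2 smallest-magnitude weights in every group of 4."""
        flat = w.reshape(-1, 4)
        _, idx = flat.abs().topk(2, dim=1, largest=False)
        mask = torch.ones_like(flat)
        mask.scatter_(1, idx, 0.0)
        return (flat * mask).reshape(w.shape)

    w = torch.randn(8, 8)
    print(prune_2of4(w))  # every group of 4 now has exactly 2 zeros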

Ray Strode reposted

Introducing the Model Context Protocol (MCP)

An open standard we've been working on at Anthropic that solves a core challenge with LLM apps - connecting them to your data.

No more building custom integrations for every data source. MCP provides one protocol to connect them all:

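At its simplest, an MCP server is a process that advertises tools over stdio. A rough sketch with the MCP Python SDK (the tool and data here are made up, and the SDK surface has changed across versions, so treat this as a shape, not gospel):

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("notes")

    @mcp.tool()
    def search_notes(query: str) -> str:
        """Search my local notes for a query string (hypothetical data source)."""
        notes = {"standup": "Standup is at 9:30 on Tuesdays."}
        return notes.get(query, "no match")

    if __name__ == "__main__":
        mcp.run()  # serve over stdio so an MCP client can connect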

Ray Strode reposted

Hunyuan-Large by Tencent is a 389B param MOE (52B active). It's the largest open-weights MOE. In some benchmarks it exceeds Llama 3.1 405B. With MLX's new 3-bit quant it just barely fits on a single 192GB M2 Ultra! And runs at a very decent >15 toks/sec:
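Rough mlx-lm usage for something like this (the 3-bit checkpoint name below is my guess; check the mlx-community org on Hugging Face for the real id):

    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Hunyuan-Large-3bit")  # assumed repo id
    print(generate(model, tokenizer,
                   prompt="Explain mixture-of-experts in one paragraph.",
                   max_tokens=200))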


Ray Strode reposted

Lightricks just dropped the fastest text-to-video generation model ever. It can generate videos faster than the time it takes to watch them! Code: github.com/Lightricks/LTX…

From AK
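If the diffusers integration works like its other video pipelines, usage looks roughly like this (pipeline class, checkpoint id, and parameters are my best guess; check the model card):

    import torch
    from diffusers import LTXPipeline
    from diffusers.utils import export_to_video

    pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video",
                                       torch_dtype=torch.bfloat16).to("cuda")
    video = pipe(prompt="a kayak drifting down a misty river at dawn",
                 num_frames=65).frames[0]
    export_to_video(video, "river.mp4", fps=24)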

llama.cpp/vllm in a container ready to go. cross platform support. models stored in OCI compatible container registries like quay.io

Been working on a new RamaLama project the last couple of months. Goal is to make running AI Models inside of containers super easy. Get your AI Models anywhere. This blog post announces the project today. Try it out. Love to hear your feedback @redhat @openshift @ibm developers.redhat.com/articles/2024/…
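Assuming `ramalama serve` exposes the usual llama.cpp-style OpenAI-compatible endpoint from inside the container (my assumption; port and path may differ, check the docs), talking to it is just:

    import requests

    resp = requests.post("http://localhost:8080/v1/chat/completions", json={
        "model": "granite",  # whichever model ramalama is serving
        "messages": [{"role": "user", "content": "What is an OCI registry?"}],
    })
    print(resp.json()["choices"][0]["message"]["content"])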



"real" and open source reinforcement learning recipe that uses verifiable rewards called tulu is released. Will be used on molmo soon.

I've spent the last two years scouring all available resources on RLHF specifically and post training broadly. Today, with the help of a totally cracked team, we bring you the fruits of that labor — Tülu 3, an entirely open frontier model post training recipe. We beat Llama 3.1…

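The "verifiable rewards" part, boiled down (my sketch, not the Tülu 3 code): the RL reward is a hard programmatic check against ground truth rather than a learned reward model, so it can't be gamed the way a preference model can:

    import re

    def math_reward(completion: str, gold_answer: str) -> float:
        """1.0 if the completion's final answer matches the gold answer, else 0.0."""
        match = re.search(r"answer is\s*(-?\d+(?:\.\d+)?)", completion.lower())
        return 1.0 if match and match.group(1) == gold_answer else 0.0

    print(math_reward("... so the answer is 42", "42"))  # 1.0
    print(math_reward("... probably 41?", "42"))         # 0.0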


as with many things, the devil is in the details, and that includes sparse retrievers like BM25, apparently. Of course, the wider point is to do your own benchmarks and don't rely on feature matrices when choosing an implementation.
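One of those details: BM25 scores depend entirely on how you tokenize. Even lowercasing changes the ranking, which a quick check with the rank_bm25 package shows:

    from rank_bm25 import BM25Okapi

    corpus = ["GNOME Shell startup", "gnome shell crash logs", "systemd boot analysis"]
    query = "gnome shell"

    # raw whitespace split vs. lowercased split give different scores
    for norm in (str.split, lambda s: s.lower().split()):
        bm25 = BM25Okapi([norm(doc) for doc in corpus])
        print(bm25.get_scores(norm(query)))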


chain of thought reasoning exhibited in normally discarded top-k results

Can LLMs reason effectively without prompting? Great paper by @GoogleDeepMind

By considering multiple paths during decoding, LLMs show improved reasoning without special prompts. It reveals LLMs' natural reasoning capabilities.

LLMs can reason better by exploring multiple…

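The decoding trick, roughly as I read the paper (sketch, not the authors' code): branch on the top-k candidates for the first token instead of only the argmax, then greedy-decode each branch; chains of thought often hide in the k > 1 paths:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in model for the sketch
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Q: I have 3 apples and eat one. How many are left? A:"
    ids = tok(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        first_logits = model(ids).logits[0, -1]

    for t in torch.topk(first_logits, k=5).indices:
        branch = torch.cat([ids, t.view(1, 1)], dim=1)  # force a different first token
        out = model.generate(branch, max_new_tokens=30, do_sample=False,
                             pad_token_id=tok.eos_token_id)
        print(repr(tok.decode(out[0, ids.shape[1]:])))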


fun little toy script i wrote that uses the granite model to complete a chunk of code github.com/halfline/os-de…

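The same trick in a few lines with transformers instead of the script (my sketch, not the script itself; the checkpoint is one of IBM's public Granite code models):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ibm-granite/granite-3b-code-base"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    chunk = "def is_prime(n):\n    if n < 2:\n        return False\n    for i in range("
    ids = tok(chunk, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=48, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))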
