
Valthor

@vlthr

Valthor reposted

How should we think about error handling in distributed systems? Let's see what you think: ✅ means you think we should crash the process/server, ❌ means you don't.

[MarcJBrooker's tweet image]

Valthor reposted

Turns out you can communicate across containers via 63 bits of available space in a shared lock you acquire on /proc/self/ns/time that all processes have access to. No networking required. The post has a demo of a chat app communicating across unprivileged containers.

[eatonphil's tweet image]
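A rough sketch of how such a channel might work, assuming the mechanism is POSIX record locks queried via F_GETLK (the linked post has the actual demo; the struct layout below is an assumption specific to Linux x86-64, and the helper names are mine): the sender parks a shared lock whose start offset *is* the message, and the receiver asks the kernel which lock would block a conflicting request.

```python
import fcntl, os, struct

# struct flock on Linux x86-64 (assumed layout, arch-specific):
#   short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid;
FLOCK = "hhqqi"

def send(fd, value):
    # Hold a shared (read) lock whose start offset encodes the 63-bit message
    # (l_start is a signed 64-bit off_t, so non-negative values give 63 bits).
    lk = struct.pack(FLOCK, fcntl.F_RDLCK, os.SEEK_SET, value, 1, 0)
    fcntl.fcntl(fd, fcntl.F_SETLK, lk)

def recv(fd):
    # F_GETLK asks: "would an exclusive lock over the whole file conflict?"
    # The kernel fills in the conflicting lock's offset -- i.e. the message.
    probe = struct.pack(FLOCK, fcntl.F_WRLCK, os.SEEK_SET, 0, 0, 0)
    l_type, _, l_start, _, _ = struct.unpack(
        FLOCK, fcntl.fcntl(fd, fcntl.F_GETLK, probe))
    return None if l_type == fcntl.F_UNLCK else l_start

# Unprivileged containers typically see the same inode here, which is what
# makes the lock usable as a shared channel.
fd = os.open("/proc/self/ns/time", os.O_RDONLY)
```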

Valthor reposted

If you want to join Lovable and work on technical problems with Valthor, apply here: lovable.dev/careers


Valthor reposted

Building products takes too long. We started Lovable to fix this. Today, we're officially launching the world's first AI Full Stack Engineer. Lovable reliably replicates the capabilities of a full stack engineer and handles everything from design to database operations by…


Valthor reposted

I've been exploring this idea of writing informal 'code sketches' as a way of structuring LLM code generation to efficiently capture all the design decisions *I* care about before handoff. Repo here: github.com/westoncb/code-… If there are two things LLMs are super good at, it's…


Valthor reposted

Ahh yes. Recreative programming. So. Much. Fun.

[fishnets88's tweet image]

Valthor reposted

JD has a gift for saying things that are obviously true in the least palatable way imaginable

NEW VANCE AUDIO: In an interview from 2020, JD Vance agrees with a podcast host who says having grandmothers help raise children is “the whole purpose of the postmenopausal female.” He also agrees when the host says grandparents helping raise children is a "weird, unadvertised…



Valthor reposted

Embedding features learned with sparse autoencoders can be used to make semantic edits to text ✨ (+ a reading/highlighting demo). I've built an interface to explore and visualize GPT-4-labelled features learned from a text embedding model's latent space. Here's a little video, more in 👇
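For intuition, here's a minimal sketch of what a semantic edit through a sparse autoencoder might look like, assuming a standard ReLU SAE; the weights and names are hypothetical stand-ins, not taken from the demo:

```python
import numpy as np

def edit_embedding(e, W_enc, b_enc, W_dec, b_dec, feature, strength):
    # Encode the text embedding into sparse, (hopefully) interpretable features.
    acts = np.maximum(e @ W_enc + b_enc, 0.0)
    # Dial one labelled feature up or down, leaving the rest untouched.
    acts[feature] = strength
    # Decode back into embedding space to get the semantically edited vector.
    return acts @ W_dec + b_dec
```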


Valthor reposted

Thought LLMs would be better at this. Given two 500-token summaries: does summary 1 mention anything summary 2 doesn't? Goal: find 4 diffs without hallucinating.
GPT-4 Turbo: 1/4, 4+ hallucinations
GPT-4o: 1/4, 4 hallucinations
Command R+: 1/4, 1 hallucination
Llama 3: 1/4, 1 hallucination
DeepSeek V2: 3/4, 3 hallucinations
Opus: 3/4, 1 hallucination
Gemini 1.5: 3/4, 0 hallucinations


Valthor reposted

since I'm deeply immersed in evals right now (and the process of building them) I got a kick out of this paper from @sh_reya @jdzamfi @bjo3rn @adityagp @IanArawjo

it addresses the challenge of time-efficiently coming up with evals that are aligned with practitioners

some…

[enginoid's tweet image]

Valthor reposted

When I first saw Tree of Thoughts, I asked myself: If language models can reason better by searching, why don't they do it themselves during Chain of Thought? Some possible answers (and a new paper): 🧵


Valthor reposted

Sharing RewardBench! We delve into the evaluation of reward models, often used to align LLMs to human preferences, across various tasks such as chat, reasoning, code, safety, etc. Hopefully, this work leads to a better understanding of the preference-tuning process :)

Excited to share something that we've needed since the early open RLHF days: RewardBench, the first benchmark for reward models. 1. We evaluated 30+ of the currently available RMs (w/ DPO too). 2. We created new datasets covering chat, safety, code, math, etc. We learned a lot.…

[natolambert's tweet image]


Valthor reposted

I think there's maybe really something here, look at this. I got GPT to respond to the prompt from this uml.edu/docs/sample-es…. Then I got a sequence of embeddings for each word in both the human- and GPT-authored essays. You can see how the human one moves around more.

I think any point in that sequence is probably not that informative, but the trajectory might be highly informative. Intuitively, natural writing isn’t done by always choosing the most likely next token, whereas robot writing is, and you might be able to see that in the trajectory.
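One plausible way to quantify "moves around more", sketched with a hypothetical per-word embed function standing in for the embedding model used above:

```python
import numpy as np

def trajectory_stats(words, embed):
    # embed(word) -> vector; a stand-in for the text embedding model above.
    E = np.stack([embed(w) for w in words])
    # Distance between each pair of consecutive word embeddings.
    steps = np.linalg.norm(np.diff(E, axis=0), axis=1)
    # Human text should show a longer, more variable path than greedy decoding.
    return {"path_length": steps.sum(), "step_std": steps.std()}
```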



Valthor reposted

I've updated my auto codebase analyzer; using Qwen-1.5-72b-chat + sglang for large-codebase analysis, it's really, really good. Now it more clearly highlights important features of the codebase and methods. github.com/cloneofsimo/au…

[cloneofsimo's tweet image]

Valthor reposted

Today, with @Tim_Dettmers, @huggingface, & @mobius_labs, we're releasing FSDP/QLoRA, a new project that lets you efficiently train very large (70b) models on a home computer with consumer gaming GPUs. 1/🧵 answer.ai/posts/2024-03-…


Valthor reposted

I am proud to release my newest coding dataset: 143k examples of tested Python code. Why train on Python code that doesn't work, when you can train on a large variety of tested Python code? huggingface.co/datasets/Vezor…
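The tweet doesn't describe the pipeline, but filtering for "tested" code presumably means something like executing each snippet and keeping only the ones that exit cleanly; a naive sketch under that assumption (the function name is mine, and a real pipeline would sandbox this):

```python
import os, subprocess, sys, tempfile

def runs_clean(code: str, timeout: float = 10.0) -> bool:
    # Execute the snippet (and any asserts/tests embedded in it) in a fresh
    # interpreter; keep it for the dataset only if it exits with status 0.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)
```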


Valthor reposted

Lots of requests for richer observability in DSPy. In March, @mikeldking & I are holding a DSPy <> @arizeai meetup in SF to show you how to do that w/ the @ArizePhoenix-DSPy integration. Video by @axiomofjoy. Good chance to show something cool with DSPy. What would you like to see?


Valthor reposted

Happy Valentine’s Day everyone! And happy birthday, Dad ❤️ (my dad is awesome). Launching a new AI startup out of Europe today: Lovable. Needless to say, we’re very excited about Lovable. We think it will be huge: We’re building software that builds software. See website for…


Valthor reposted

Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
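Not the setup from the figure, just an illustration of what one cell in such a convergence/divergence grid means: train a tiny model at each hyperparameter combination and record whether the loss stays finite.

```python
import numpy as np
import torch
import torch.nn as nn

def diverges(lr: float, width: int, steps: int = 200) -> bool:
    # Train a tiny MLP on fixed random data; report whether the loss blows up.
    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(10, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    x, y = torch.randn(256, 10), torch.randn(256, 1)
    for _ in range(steps):
        loss = ((net(x) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if not torch.isfinite(loss):
            return True   # loss went NaN/inf: this cell is "reddish"
    return False          # survived training: "bluish"

# One row per width, one column per learning rate -- the image is this, dense.
grid = [[diverges(lr, w) for lr in np.logspace(-4, 1, 30)] for w in (8, 32, 128)]
```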


Valthor reposted

🗣️ I've been thinking about data quality & the human factor in the process a lot lately, so I wrote a short post on the topic: lilianweng.github.io/posts/2024-02-… More: if you are into the topic, my team is hiring a Research Engineer for a new sub-team, Human-AI Interaction: openai.com/careers/resear…

