Dev

@datawithdev

Get the 5 Most Interesting Topics in Data Engineering and analytics at 👍 http://5bulletdata.com

Dev reposted

Cut our AWS bill from $52K to $18K per month. Took 3 weeks of detective work.

The audit:
- Started with AWS Cost Explorer
- Noticed NAT Gateway was $8K/month
- Data transfer was $12K/month
- RDS storage was $6K/month

What we found:
- Logs were being sent to S3 via NAT Gateway
-…
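The audit's figures are easy to sanity-check. This sketch uses only the numbers quoted in the post. It shows that the three line items flagged first add up to half of the original bill, and that the drop from $52K to $18K is a reduction of roughly 65%.

```python
# Monthly AWS figures quoted in the post, in USD.
BILL_BEFORE = 52_000
BILL_AFTER = 18_000

# Line items the audit flagged in Cost Explorer.
FLAGGED_ITEMS = {
    "NAT Gateway": 8_000,
    "Data transfer": 12_000,
    "RDS storage": 6_000,
}


def share_of_bill(items: dict[str, int], total: int) -> float:
    """Fraction of the total monthly bill covered by the given line items."""
    return sum(items.values()) / total


def monthly_savings(before: int, after: int) -> tuple[int, float]:
    """Absolute monthly savings and the savings as a fraction of the old bill."""
    saved = before - after
    return saved, saved / before


if __name__ == "__main__":
    flagged_total = sum(FLAGGED_ITEMS.values())
    print(f"Flagged items: ${flagged_total:,}/month "
          f"({share_of_bill(FLAGGED_ITEMS, BILL_BEFORE):.0%} of the bill)")
    saved, fraction = monthly_savings(BILL_BEFORE, BILL_AFTER)
    print(f"Saved ${saved:,}/month ({fraction:.1%})")
```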


Dev reposted

we are working on a Rust-based ETL server that can stream your Postgres database to S3/Iceberg (and other databases like BigQuery & ClickHouse)

100% open source, and designed so that it can be embedded in any Rust server

Dev reposted

I still remember back in grad school. My friend in NLP used to show off, bragging that he had LSTM all figured out. I envied him. Fortunately, my field was Computer Vision. I could survive just knowing my SVMs. In 2024, the inventor of LSTM himself is finally back with the…


Dev reposted

Vector databases explained for people who just want to understand.

You have 10,000 product descriptions. User searches for "comfortable outdoor furniture."

Traditional database:
- Searches for exact word matches
- Finds products containing "comfortable" OR "outdoor" OR…
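The contrast between exact word matching and similarity search can be shown in a few lines. A toy sketch, assuming a hypothetical three-product catalogue and hand-written 3-dimensional "embeddings" (real systems get vectors from an embedding model). Keyword search finds nothing for "comfortable outdoor furniture" because no description shares a word with it, while ranking by cosine similarity still surfaces the outdoor seating.

```python
import math

# Hypothetical catalogue: none of these descriptions contains the words
# "comfortable", "outdoor" or "furniture".
DOCS = {
    "patio_sofa": "weatherproof patio sofa with thick cushions",
    "garden_bench": "teak garden bench",
    "office_chair": "ergonomic office chair",
}

# Hand-made toy embeddings, NOT the output of a real model.
# Dimensions (assumed for illustration): [outdoor, comfort, office].
DOC_VECS = {
    "patio_sofa": [0.9, 0.8, 0.0],
    "garden_bench": [0.9, 0.2, 0.0],
    "office_chair": [0.0, 0.6, 0.9],
}
QUERY = "comfortable outdoor furniture"
QUERY_VEC = [0.8, 0.7, 0.0]  # toy embedding of QUERY


def keyword_search(query: str, docs: dict[str, str]) -> list[str]:
    """Traditional approach: return docs sharing at least one exact word with the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in docs.items() if terms & set(text.lower().split())]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_search(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force nearest neighbours: rank every doc by cosine similarity to the query."""
    ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print("keyword:", keyword_search(QUERY, DOCS))
    print("vector: ", vector_search(QUERY_VEC, DOC_VECS))
```

A real vector database replaces the brute-force sort with an approximate nearest-neighbour index so it scales past a handful of items, but the ranking idea is the same.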


Dev reposted

CTO asked to reduce S3 costs. We were spending $4,200 monthly on storage.

Implemented lifecycle policies:
- Move to Glacier after 90 days
- Delete after 2 years

Cost dropped to $980 per month. Saved $3,220 monthly. DevOps team got praised in all-hands meeting. Three months…
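The two policies in the post map onto a single S3 lifecycle rule. A minimal sketch, assuming the rule covers the whole bucket (empty prefix) and that "2 years" means 730 days; the rule ID is hypothetical. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)` accepts, but the sketch only builds and prints it, so it runs without AWS credentials.

```python
import json

# Assumption: "2 years" in the post is taken as 2 * 365 days.
GLACIER_AFTER_DAYS = 90
DELETE_AFTER_DAYS = 2 * 365


def glacier_then_delete_rule(rule_id: str, prefix: str = "") -> dict:
    """One lifecycle rule: transition objects to GLACIER, later expire (delete) them."""
    return {
        "ID": rule_id,                 # hypothetical rule name
        "Filter": {"Prefix": prefix},  # "" = every object in the bucket (assumption)
        "Status": "Enabled",
        "Transitions": [{"Days": GLACIER_AFTER_DAYS, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": DELETE_AFTER_DAYS},
    }


LIFECYCLE_CONFIGURATION = {"Rules": [glacier_then_delete_rule("archive-then-delete")]}

if __name__ == "__main__":
    print(json.dumps(LIFECYCLE_CONFIGURATION, indent=2))
    # Savings quoted in the post: $4,200 -> $980 per month.
    print(f"Monthly savings: ${4_200 - 980:,}")
```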


Dev reposted

Our EC2 bill was $31,000 monthly. 70% of instances ran below 15% CPU.

We implemented:
- AWS Compute Optimizer recommendations
- Downsized 85 instances
- Moved dev/staging to spot instances
- Scheduled non-prod environments to shut down at night

New monthly bill: $11,200

The…
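Finding the instances that run below 15% CPU, and the non-prod ones that can be switched off at night, is a simple filter once per-instance averages are available. A minimal sketch, assuming the averages have already been exported (for example from CloudWatch's `CPUUtilization` metric); the fleet, IDs and numbers are hypothetical. It is not a reimplementation of Compute Optimizer, which bases its recommendations on more than a single CPU average.

```python
from dataclasses import dataclass


@dataclass
class InstanceUsage:
    instance_id: str
    avg_cpu_percent: float  # average CPU over the review window (assumed pre-computed)
    environment: str        # "prod", "staging" or "dev"


def underutilized(instances: list[InstanceUsage], cpu_threshold: float = 15.0) -> list[str]:
    """Downsizing candidates: average CPU below the threshold used in the post."""
    return [i.instance_id for i in instances if i.avg_cpu_percent < cpu_threshold]


def off_hours_candidates(instances: list[InstanceUsage]) -> list[str]:
    """Non-prod instances that could be scheduled to shut down at night."""
    return [i.instance_id for i in instances if i.environment != "prod"]


# Hypothetical fleet; IDs and numbers are made up.
FLEET = [
    InstanceUsage("i-app-1", 8.0, "prod"),
    InstanceUsage("i-app-2", 42.0, "prod"),
    InstanceUsage("i-stg-1", 12.5, "staging"),
    InstanceUsage("i-dev-1", 3.0, "dev"),
]

if __name__ == "__main__":
    print("downsize:", underutilized(FLEET))
    print("schedule:", off_hours_candidates(FLEET))
    before, after = 31_000, 11_200  # monthly bills quoted in the post
    print(f"saved ${before - after:,}/month ({(before - after) / before:.1%})")
```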


Dev reposted

I learned this with a friend from Italy! From now on I only cook spaghetti like this🍝🍜


Dev reposted

Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right bottom) is the dominant paradigm in text. For audio I've…

BERT is just a Single Text Diffusion Step! (1/n) When I first read about language diffusion models, I was surprised to find that their training objective was just a generalization of masked language modeling (MLM), something we’ve been doing since BERT from 2018. The first…
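The shared idea is the corruption step: replace tokens with a mask and train a model to fill them back in. A toy sketch of both directions of masked (discrete) text diffusion, assuming a hypothetical `predict` function in place of a trained model. The demo's oracle simply returns the target sentence, so it illustrates the unmasking schedule, not a real model's output. With `steps=1` every position is filled in a single pass, the BERT-like case.

```python
import math
import random
from typing import Callable

MASK = "[MASK]"


def corrupt(tokens: list[str], t: float, rng: random.Random) -> list[str]:
    """Forward (noising) process: each token independently becomes MASK with probability t.

    t near 0.15 resembles BERT's masking rate; a masked diffusion model is trained
    across many values of t in (0, 1] rather than one fixed rate.
    """
    return [MASK if rng.random() < t else tok for tok in tokens]


def generate(length: int, steps: int,
             predict: Callable[[list[str]], list[str]],
             rng: random.Random) -> list[str]:
    """Reverse process: start fully masked and reveal a batch of positions per step.

    `predict` (hypothetical) stands in for a trained denoiser: given the partially
    masked sequence it proposes a token for every position in parallel. Positions
    are revealed in random order here; real samplers often pick the most confident ones.
    """
    seq = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)
    per_step = math.ceil(length / steps)
    for s in range(steps):
        proposal = predict(seq)  # all positions predicted at once
        for i in order[s * per_step:(s + 1) * per_step]:
            seq[i] = proposal[i]
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    target = "the cat sat on the mat".split()
    print(corrupt(target, 0.15, rng))

    def oracle(seq: list[str]) -> list[str]:
        # Toy stand-in for a model: always proposes the target sentence.
        return target

    print(generate(len(target), steps=3, predict=oracle, rng=rng))
```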



Dev reposted

Exploring @karpathy's nanochat through a Knowledge Graph

Love how this project combines simplicity with power. Want to navigate and understand the entire repo? Check it out here: deepgraph.co/karpathy/nanoc… Plus, you can connect it via MCP to any AI coding assistant and work…

Excited to release new repo: nanochat!
(it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,…

Dev reposted

At MIT, I learned about RNNs in my NLP class with Prof. Michael Collins. He built a model from my keystrokes to predict who I was. To me, it felt like a magic box. Years later, when I had to teach RNNs, I forced myself to go inside the box. ⬇️ Download: byhand.ai/rnn
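Inside the box, a vanilla RNN applies the same update at every time step: the new hidden state mixes the current input with the previous hidden state, h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h). A minimal pure-Python sketch of that recurrence, using plain lists so each multiply-add is visible. This is a generic Elman RNN; it is an assumption that it matches the formulation in the byhand.ai worksheet, and the weights in the usage block are arbitrary.

```python
import math


def rnn_step(x: list[float], h_prev: list[float],
             W_xh: list[list[float]], W_hh: list[list[float]],
             b_h: list[float]) -> list[float]:
    """One step: h_t[j] = tanh(sum_i W_xh[j][i]*x[i] + sum_k W_hh[j][k]*h_prev[k] + b_h[j])."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_xh[j], x))          # input contribution
            + sum(u * hk for u, hk in zip(W_hh[j], h_prev))   # memory of previous steps
            + b_h[j]
        )
        for j in range(len(b_h))
    ]


def rnn_forward(xs: list[list[float]], h0: list[float],
                W_xh: list[list[float]], W_hh: list[list[float]],
                b_h: list[float]) -> list[list[float]]:
    """Run the same step over a sequence, returning every hidden state."""
    h, states = h0, []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


if __name__ == "__main__":
    # Arbitrary 2-dim input, 2-dim hidden state.
    W_xh = [[0.5, -0.3], [0.8, 0.1]]
    W_hh = [[0.2, 0.0], [-0.4, 0.6]]
    b_h = [0.0, 0.1]
    for step, h in enumerate(rnn_forward([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0], W_xh, W_hh, b_h)):
        print(step, [round(v, 4) for v in h])
```

Because the same weights are reused at every step, the final hidden state depends on the whole input sequence, which is what lets an RNN build a summary of something like a stream of keystrokes.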


Dev reposted

"Foundations of Machine Learning"

A MUST while starting AI/ML. Absolutely beginner friendly.

To get:
1. Follow (so I can DM you)
2. Like & retweet
3. Reply "Send"

Dev reposted

10 lessons from @karpathy's "Intro to Large Language Models" talk. Recorded ~1 year ago, but still an amazing overview of LLMs.

1. An LLM is Just Two Files 📂
An LLM isn't some abstract cloud entity; at its core, it's just two files: a large parameters file (the model's…

[1hr Talk] Intro to Large Language Models (youtube.com)
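The "two files" framing makes the parameters file's size easy to estimate: parameter count times bits per parameter, divided by 8 to get bytes. A minimal sketch; the 70-billion-parameter model and the three precisions are assumptions chosen for illustration, and the estimate ignores any file headers or metadata.

```python
def parameters_file_size_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate parameters-file size: num_params * bits / 8 bytes, in GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Assumed example: a 70B-parameter model stored at three precisions.
    for bits in (32, 16, 4):
        print(f"{bits:>2}-bit: {parameters_file_size_gb(70_000_000_000, bits):.0f} GB")
```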


Dev reposted

Announcing a significant upgrade to Agentic Document Extraction! LandingAI's new DPT (Document Pre-trained Transformer) accurately extracts even from complex docs. For example, from large, complex tables, which is important for many finance and healthcare applications. And a…


Dev reposted

This is a really good book.

I like it because it covers both ends of the spectrum:

1. How LLMs work
2. How to build using LLMs

It's a really nice one-two punch: start with the theory and use that right away to implement something useful.

The second half of the book is what I…

Dev reposted

Fragmented data is a major roadblock to scaling AI. 🚩

@Snowflake, alongside our industry and ecosystem partners, is tackling it head-on with the Open Semantic Interchange (OSI), a new initiative to create a universal framework for semantic data.

We’ve teamed up with industry…

Dev reposted

openai ceo sam altman says he uses a spiral notebook, rips pages out of it, and uses a uniball micro 0.5mm pen


Dev reposted

I am once again begging you to put your database servers and application servers in the same region.
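Why region placement matters: every query pays a full network round trip, and requests that issue queries one after another multiply that cost. A back-of-the-envelope sketch; the query count, execution time and round-trip times are illustrative assumptions, not measurements.

```python
def db_time_per_request_ms(sequential_queries: int, rtt_ms: float, query_ms: float) -> float:
    """Time one request spends waiting on the database when its queries run sequentially.

    Each query pays one network round trip (rtt_ms) on top of its execution time.
    """
    return sequential_queries * (rtt_ms + query_ms)


if __name__ == "__main__":
    # Illustrative assumptions only.
    QUERIES = 20      # e.g. an N+1 pattern on a list page
    QUERY_MS = 1.0    # server-side execution time per query
    for label, rtt in (("same region", 0.5), ("cross region", 70.0)):
        print(f"{label:>12}: {db_time_per_request_ms(QUERIES, rtt, QUERY_MS):.0f} ms")
```

Under these assumptions the same page goes from tens of milliseconds to well over a second of database wait, without any change to the queries themselves.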


Dev reposted

The greatest explanation of PCA you will ever read

Dev reposted

A rare practical project from me. I live in a building with ~16 units and we all share 2 washers/2 dryers. Very annoying to go all the way down to the basement just to find out they're in use. So I'm putting together a low maintenance, non-invasive monitoring setup we can all use 🧵
