Dev
@datawithdev
Get the 5 Most Interesting Topics in Data Engineering and analytics at 👍 http://5bulletdata.com
Cut our AWS bill from $52K to $18K per month. Took 3 weeks of detective work. The audit: - Started with AWS Cost Explorer - Noticed NAT Gateway was $8K/month - Data transfer was $12K/month - RDS storage was $6K/month What we found: - Logs were being sent to S3 via NAT Gateway -…
we are working on a Rust-based ETL server that can stream your Postgres database to S3/Iceberg (and other databases like BigQuery & ClickHouse) 100% open source, and designed so that it can be embedded in any Rust server
I still remember back in grad school. My friend in NLP used to show off, bragging that he had LSTM all figured out. I envied him. Fortunately, my field was Computer Vision. I could survive just knowing my SVMs. In 2024, the inventor of LSTM himself is finally back with the…
Vector databases explained for people who just want to understand. You have 10,000 product descriptions. User searches for "comfortable outdoor furniture." Traditional database: - Searches for exact word matches - Finds products containing "comfortable" OR "outdoor" OR…
CTO asked to reduce S3 costs. We were spending $4,200 monthly on storage. Implemented lifecycle policies: - Move to Glacier after 90 days - Delete after 2 years Cost dropped to $980 per month. Saved $3,220 monthly. DevOps team got praised in all-hands meeting. Three months…
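A minimal sketch of the two lifecycle rules described in the post above, written as the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the empty prefix (meaning "all objects"), the bucket name in the comment, and treating "2 years" as 730 days are assumptions for illustration, not details from the post. The code only builds the configuration and checks the quoted savings; it does not call AWS.

```python
# Hypothetical helper: builds an S3 lifecycle configuration matching the post's
# two rules (Glacier after 90 days, delete after 2 years).
def build_lifecycle_config(glacier_after_days=90, expire_after_days=730):
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",   # illustrative name (assumption)
                "Status": "Enabled",
                "Filter": {"Prefix": ""},      # empty prefix: apply to every object
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_lifecycle_config()

# Savings arithmetic from the figures quoted in the post.
cost_before = 4200
cost_after = 980
monthly_savings = cost_before - cost_after

# Applying it would look roughly like this (not executed here; bucket name is made up):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Note that storage price is only one part of the bill: Glacier storage classes also charge for retrieval and have minimum storage durations, so a rule like this is worth checking against how often older objects are actually read.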
Our EC2 bill was $31,000 monthly. 70% of instances ran below 15% CPU. We implemented: - AWS Compute Optimizer recommendations - Downsized 85 instances - Moved dev/staging to spot instances - Scheduled non-prod environments to shut down at night New monthly bill: $11,200 The…
I learned this with a friend from Italy! From now on I only cook spaghetti like this🍝🍜
Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right bottom) is the dominant paradigm in text. For audio I've…
BERT is just a Single Text Diffusion Step! (1/n) When I first read about language diffusion models, I was surprised to find that their training objective was just a generalization of masked language modeling (MLM), something we’ve been doing since BERT from 2018. The first…
Exploring @karpathy Nanochat through a Knowledge Graph Love how this project combines simplicity with power. Want to navigate and understand the entire repo? Check it out here: deepgraph.co/karpathy/nanoc… Plus, you can connect it via MCP to any AI coding assistant and work…
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,…
At MIT, I learned about RNNs in my NLP class with Prof. Michael Collins. He built a model from my keystrokes to predict who I was. To me, it felt like a magic box. Years later, when I had to teach RNNs, I forced myself to go inside the box. ⬇️ Download: byhand.ai/rnn…
"Foundations of Machine Learning" A MUST while starting AI/ML. Absolutely Beginner friendly. To get: - 1. Follow (So I can DM you ) 2. Like & retweet 3. Reply " Send "
10 lessons from @karpathy's "Intro to Large Language Models" talk recorded ~1 year ago, but still an amazing overview of LLMs. 1. An LLM is Just Two Files 📂 An LLM isn't some abstract cloud entity; at its core, it's just two files: a large parameters file (the model's…
[1hr Talk] Intro to Large Language Models
Announcing a significant upgrade to Agentic Document Extraction! LandingAI's new DPT (Document Pre-trained Transformer) accurately extracts even from complex docs. For example, from large, complex tables, which is important for many finance and healthcare applications. And a…
This is a really good book. I like it because it covers both ends of the spectrum: 1. How LLMs work 2. How to build using LLMs It's a really nice one-two punch: start with the theory and use that right away to implement something useful. The second half of the book is what I…
Fragmented data is a major roadblock to scaling AI. 🚩 @Snowflake, alongside our industry and ecosystem partners, is tackling it head-on with the Open Semantic Interchange (OSI), a new initiative to create a universal framework for semantic data. We’ve teamed up with industry…
OpenAI CEO Sam Altman says he uses a spiral notebook, rips pages out of it, and uses a uniball micro 0.5mm pen
I am once again begging you to put your database servers and application servers in the same region.
The greatest explanation of PCA you will ever read
A rare practical project from me. I live in a building with ~16 units and we all share 2 washers/2 dryers. Very annoying to go all the way down to basement just to find out they're in use. So I'm putting together a low maintenance, non-invasive monitoring setup we can all use 🧵