
Joseph Machado
@startdataeng
I write about data engineering | SQL | Python | Distributed systems. Get my free data engineering course at http://startdataengineering.com/email-course/
Exercise project for anyone starting in data engineering startdataengineering.com/post/data-engi… #dataengineering #bigdata #ETL #ApacheAirflow #AWS #ApacheSpark

An orchestration tool that I've been impressed with is @dagsterio. Easy setup, powerful features and great docs. Use 👇🏽 to play around with a pipeline on dagster startdataengineering.com/post/data-engi… #data #dataengineering #Python #Database #DataAnalytics
Backfilling is an inevitable part of data projects. When designing your data pipelines, take some time to answer the following questions: 1. Do multiple backfill runs cause duplicate data? 2. Can multiple backfills be parallelized? #data #DataEngineering #datapipeline #datasets
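One common way to make a backfill safe to re-run is to replace a whole date partition inside a single transaction (delete, then insert). The sketch below is a minimal illustration under assumptions: it uses an in-memory SQLite database, and the `events` table and the `backfill_day` helper are hypothetical names, not something from the post.

```python
import sqlite3


def backfill_day(conn, day, rows):
    """Replace all rows for one day so re-running never duplicates data."""
    with conn:  # one transaction: the delete and insert commit together
        conn.execute("DELETE FROM events WHERE event_date = ?", (day,))
        conn.executemany(
            "INSERT INTO events (event_date, user_id, amount) VALUES (?, ?, ?)",
            [(day, user_id, amount) for user_id, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, amount REAL)")

source_rows = [(1, 9.5), (2, 3.0)]  # hypothetical source data for one day
backfill_day(conn, "2024-01-01", source_rows)
backfill_day(conn, "2024-01-01", source_rows)  # same backfill run a second time

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)
```

Because each run only touches its own day, runs for different days do not overwrite each other, which is usually what makes parallel backfills feasible.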
Starting as a DE? 90% of what you will need is SQL (OLAP), Python, & distributed systems basics. Don't overcomplicate! #data #dataengineering #SQL #Database #Python
Left anti-join is cool! Get all the data from the left table that has no matching data in the right table: select t1.* from t1 left join t2 on t1.id=t2.id where t2.id is null; #data #dataengineering #SQL #Database
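To try the anti-join pattern without a database server, here is a small sketch that runs it against in-memory SQLite from Python. The sample rows and the `name` column are made up for illustration; the `NOT EXISTS` query is an equivalent formulation added for comparison, not something from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (2);
""")

# Left anti-join: keep t1 rows whose join to t2 found no match.
anti_join = conn.execute("""
    SELECT t1.*
    FROM t1
    LEFT JOIN t2 ON t1.id = t2.id
    WHERE t2.id IS NULL
    ORDER BY t1.id
""").fetchall()

# Same result expressed with NOT EXISTS.
not_exists = conn.execute("""
    SELECT t1.*
    FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)
    ORDER BY t1.id
""").fetchall()

print(anti_join)
```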

Starting a data project is a lot of work! It can be overwhelming to define the problem, set up systems, and then code! Use this DE project as a blueprint to build your own: startdataengineering.com/post/data-engi… #data #dataengineering #Database #DataAnalytics #dataviz #Python #datapipeline

When the data to process is larger than memory, try to stream it with Python generators before jumping to distributed systems! #data #dataengineering #Python #pythonlearning #Generator E.g. stream a file (note the () and not []) and get the diff between date cols
![Tweet image: example of streaming a file with a Python generator and getting the diff between date cols](https://pbs.twimg.com/media/FQyi00EXoAMO4cg.jpg)
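Since the example lives in an image, here is a rough text sketch of the same idea. It is not a transcription of the screenshot: the CSV layout, the column names `ordered_on` and `shipped_on`, and the `days_to_ship` helper are all assumptions for illustration.

```python
import csv
import os
import tempfile
from datetime import date

# Hypothetical sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    f.write("order_id,ordered_on,shipped_on\n")
    f.write("1,2024-01-01,2024-01-04\n")
    f.write("2,2024-01-10,2024-01-11\n")
    path = f.name


def days_to_ship(csv_path):
    """Lazily yield the day difference between two date columns, row by row."""
    with open(csv_path, newline="") as fh:
        rows = csv.DictReader(fh)  # reads one row at a time
        # () builds a lazy generator; [] would build the full list in memory.
        diffs = (
            (date.fromisoformat(r["shipped_on"]) - date.fromisoformat(r["ordered_on"])).days
            for r in rows
        )
        yield from diffs


total_days = sum(days_to_ship(path))  # consumes the stream without storing it
os.remove(path)
print(total_days)
```

Only one row is held in memory at a time, so the same code works on a file far larger than RAM as long as the aggregation itself (here a running sum) is small.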
If you are interested in using "Change Data Capture" pattern for streaming ETL, check out startdataengineering.com/post/change-da… #ETL #changedatacapture #dataengineering #debezium #BigData

It can be overwhelming to start learning data engineering. I'd recommend starting with the basics of Python, SQL, and UNIX commands, building a simple data project, and updating your GitHub and LinkedIn. Landing a DE job is 60% learning and 40% marketing. See reply 👇🏽 for helpful links.
Preparing for SQL interviews? Do Leetcode SQL hard, sort by freq, and do the first 40 #data #dataengineering #Software #SQL
Learning data engineering? Build a pipeline locally. 1. Python to pull data from an API (e.g. Coincap) 2. Load data into a local Postgres container 3. Automate it with cron/task scheduler Start small, build, improve, & repeat. #data #dataengineering #pythonlearning #Python
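The three steps above can be sketched as a tiny extract/load script. This is an offline stand-in, not a working integration: `SAMPLE_RESPONSE` and its field names are assumptions rather than a documented API schema, SQLite stands in for the local Postgres container, and the `asset_price` table and function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical payload standing in for an API response; the field names
# are assumptions, not a documented schema.
SAMPLE_RESPONSE = (
    '{"data": [{"id": "bitcoin", "priceUsd": "100.5"},'
    ' {"id": "ethereum", "priceUsd": "10.25"}]}'
)


def extract(raw_json):
    """Step 1: parse the raw response into (asset_id, price) tuples."""
    return [(item["id"], float(item["priceUsd"])) for item in json.loads(raw_json)["data"]]


def load(conn, rows):
    """Step 2: load rows into the target table (SQLite stands in for Postgres)."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS asset_price (asset_id TEXT, price_usd REAL)"
        )
        conn.executemany("INSERT INTO asset_price VALUES (?, ?)", rows)


def run_pipeline(conn, raw_json):
    """Step 3: the single entry point a scheduler such as cron would call."""
    rows = extract(raw_json)
    load(conn, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = run_pipeline(conn, SAMPLE_RESPONSE)
print(loaded)
```

In a real run, the sample string would be replaced by an HTTP call to the API and the SQLite connection by a connection to the Postgres container; keeping `extract` and `load` separate makes each step easy to test and improve on its own.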
uv by @astral_sh is truly one of the best tools you can have in your toolkit as a DE. TIL: You can quickly start a Jupyter notebook with it. Docs: docs.astral.sh/uv/guides/inte…
Data engineers write the most complex pieces of code to upsert into tables. Here's THE command you need to know: MERGE INTO / INSERT ON CONFLICT #data #dataengineering #SQL
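A minimal sketch of the `INSERT ... ON CONFLICT` form, run against in-memory SQLite from Python. It assumes a SQLite build with upsert support (3.24 or newer); the `users` table and the sample emails are hypothetical. Postgres accepts the same `ON CONFLICT (...) DO UPDATE SET ... = excluded....` shape, while `MERGE INTO` syntax varies by engine and is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# Insert a row, or update the existing row when the primary key already exists.
UPSERT = """
    INSERT INTO users (id, email) VALUES (?, ?)
    ON CONFLICT (id) DO UPDATE SET email = excluded.email
"""

with conn:
    conn.execute(UPSERT, (1, "old@example.com"))  # no conflict: inserts
    conn.execute(UPSERT, (1, "new@example.com"))  # conflict on id: updates
    conn.execute(UPSERT, (2, "two@example.com"))  # no conflict: inserts

users = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
print(users)
```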