I just published "Optimizing storage for your data lake in 6 ways" link.medium.com/fWjB3u8MDHb
Data lakes are a cost-effective solution for storing large amounts of data in the cloud, for example on AWS S3 or Azure Data Lake Storage…
Great video on testing youtube.com/watch?v=RHO8Hh…
This is why testing is hard
Handling Schema Evolution in the Data Pipelines at KOHO by @shwetastha1O koho.dev/handling-schem… One possible way to automatically handle source schema changes when loading data into AWS Redshift
Written by Shweta Shrestha and Sahar Jazebi.
OpenAI is nothing without its people
"You can't unit test SQL." Well, that's wrong. And this should be obvious, because queries are conceptually so similar to dataframe transformations. The simple trick is to limit the scope of CTEs as if they were functions. Example without third-party tools: stackoverflow.com/a/754570
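A minimal sketch of that idea, using only Python's built-in sqlite3 module. This is not necessarily the approach in the linked answer. The table and column names (orders, order_date, amount) and the helper run_with_mock_orders are hypothetical. The CTE under test reads only from a named input, so a test can supply that input as a mock CTE, much like passing arguments to a function.

```python
import sqlite3
from contextlib import closing

# The CTE body is kept as its own fragment and reads only from the named
# input "orders", so it behaves like a function with one argument.
DAILY_TOTALS = """
    SELECT order_date, SUM(amount) AS total
    FROM orders
    GROUP BY order_date
"""


def run_with_mock_orders(rows):
    """Run DAILY_TOTALS against mock rows instead of a real orders table.

    Assumes rows is a non-empty list of (order_date, amount) tuples.
    """
    placeholders = ", ".join("(?, ?)" for _ in rows)
    sql = f"""
        WITH orders(order_date, amount) AS (VALUES {placeholders}),
             daily_totals AS ({DAILY_TOTALS})
        SELECT order_date, total FROM daily_totals ORDER BY order_date
    """
    # Flatten the row tuples into one parameter list for the placeholders.
    params = [value for row in rows for value in row]
    with closing(sqlite3.connect(":memory:")) as conn:
        return conn.execute(sql, params).fetchall()


totals = run_with_mock_orders(
    [("2024-01-01", 10), ("2024-01-01", 5), ("2024-01-02", 7)]
)
```

In a real pipeline the same DAILY_TOTALS fragment would be embedded in the production query, where orders refers to the actual table, so the tested logic and the deployed logic stay identical.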
15 Essential Steps To Build Reliable Data Pipelines towardsdatascience.com/15-essential-s…
When the data to process is larger than memory, try streaming with Python generators before jumping to distributed systems! #data #dataengineering #Python #pythonlearning #Generator E.g. stream a file (note () and not []) and get the diff between date cols
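The code originally attached to this post is not preserved here, so the following is only a sketch of the same idea. The CSV layout and the column names (id, start, end) are assumptions, as is the helper name date_diffs.

```python
import csv
import io
from datetime import date


def date_diffs(lines):
    """Lazily yield (id, days between start and end) for each CSV row."""
    reader = csv.DictReader(lines)
    # Parentheses make this a generator expression: rows are parsed one at a
    # time as the caller iterates. Square brackets would build the whole
    # list in memory first.
    return (
        (
            row["id"],
            (date.fromisoformat(row["end"]) - date.fromisoformat(row["start"])).days,
        )
        for row in reader
    )


# Small in-memory stand-in for a large file; with a real file, pass the
# open file handle instead.
sample = io.StringIO(
    "id,start,end\n"
    "a,2024-01-01,2024-01-10\n"
    "b,2024-02-01,2024-03-01\n"
)
diffs = list(date_diffs(sample))
```

Calling list() here is only for demonstration; on a file that does not fit in memory, iterate over the generator and aggregate or write out results as you go.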
Good summary of how to structure PySpark code for unit testing, including approaches to test data engineeringfordatascience.com/posts/pyspark_…
the best engineer I've ever met in my life told me something that's always stuck with me: "software is only correct given a certain time frame" the world changes, and your software needs to change along with it