Intentionally left empty

@DominikDataDev

Handling Schema Evolution in the Data Pipelines at KOHO by @shwetastha1O koho.dev/handling-schem… One possible way to automatically handle source schema changes when loading data into AWS Redshift

Written by Shweta Shrestha and Sahar Jazebi.


Open AI is nothing without its people


"You can't unit test SQL." Well, that's wrong. And this should be obvious, because queries are conceptually so similar to dataframe transformations. The simple trick is to limit the scope of CTEs as if they were functions. Example without 3rd-party tools: stackoverflow.com/a/754570
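
A minimal sketch of the CTE-as-function idea, not taken from the linked answer. It assumes SQLite through Python's standard library, and the `orders` table, `order_totals` CTE, and column names are made up for illustration. The transformation reads only from a named input relation, so a test can define that relation as a fixture CTE with literal rows:

```python
import sqlite3

# Hypothetical transformation under test: a single CTE that acts like a
# pure function of the `orders` relation (all names are illustrative).
TRANSFORM_SQL = """
, order_totals AS (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id, total FROM order_totals ORDER BY customer_id
"""

# Test fixture: `orders` is supplied as a CTE of literal rows instead of
# a real table, so no database setup is needed.
ORDERS_FIXTURE = """
orders(customer_id, amount) AS (
    VALUES (1, 10.0), (1, 5.0), (2, 7.5)
)
"""


def run_with_fixture(fixture_cte: str) -> list:
    """Run the transformation with its input defined by a fixture CTE."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("WITH " + fixture_cte + TRANSFORM_SQL).fetchall()
    finally:
        conn.close()


def test_order_totals():
    assert run_with_fixture(ORDERS_FIXTURE) == [(1, 15.0), (2, 7.5)]


test_order_totals()
```

In production the same transformation text would presumably read from the real `orders` table; only the fixture changes between test and production.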


10 Reasons Why Estimating Time For Data Projects is Hard annageller.com/blog/10-reason…


Intentionally left empty reposted

When the data to process is larger than memory, try streaming it with Python generators before jumping to distributed systems! #data #dataengineering #Python #pythonlearning #Generator E.g. stream a file (note () and not []) and get the diff between date cols
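
The tweet's image is not preserved here, so the following is only a sketch of what such streaming might look like, assuming a CSV input with hypothetical `id`, `start`, and `end` columns holding ISO dates. The generator expression in parentheses processes one row at a time, whereas square brackets would build the full list in memory:

```python
import csv
import io
from datetime import date


def date_diffs(lines):
    """Lazily yield (id, days between start and end) for each CSV row.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The column names are assumptions made for this example.
    """
    rows = csv.DictReader(lines)
    # Parentheses create a generator expression: rows are read and
    # transformed one at a time. Square brackets would materialise
    # every row in memory first.
    return (
        (
            row["id"],
            (date.fromisoformat(row["end"]) - date.fromisoformat(row["start"])).days,
        )
        for row in rows
    )


# Demo with an in-memory file; a handle from open(...) works the same way.
sample = io.StringIO(
    "id,start,end\n"
    "a,2023-01-01,2023-01-31\n"
    "b,2023-02-01,2023-02-03\n"
)
result = list(date_diffs(sample))
```

Calling `list(...)` here is only for the demo; with a large file one would iterate over the generator and write or aggregate each result as it arrives.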


Good summary of how to structure PySpark code for unit testing, including approaches to creating test data engineeringfordatascience.com/posts/pyspark_…


Intentionally left empty reposted

the best engineer I've ever met in my life told me something that's always stuck with me: "software is only correct given a certain time frame" the world changes, and your software needs to change along with it

