Intentionally left empty

@DominikDataDev

Joined July 2011

From pyenv to uv: Streamlining Python Management | Rob's Cogitations

A quick guide on transitioning from pyenv to Astral's uv, highlighting its speed, unified workflow, and tips for setting up Python environments efficiently.

Source: rob.cogit8.org

Intentionally left empty

@DominikDataDev

Mar 2, 2024

I just published "Optimizing storage for your data lake in 6 ways" link.medium.com/fWjB3u8MDHb

Optimizing storage for your data lake in 6 ways

Data lakes are a cost-effective solution for storing large amounts of data in the cloud, for example on AWS S3 or Azure Data Lake Storage…

Source: link.medium.com

Intentionally left empty

@DominikDataDev

Feb 13, 2024

View my verified achievement from @awscloud. credly.com/badges/fe18888…

Intentionally left empty

@DominikDataDev

Jan 23, 2024

Great video on testing youtube.com/watch?v=RHO8Hh…

This is why testing is hard

Source: youtube.com

Intentionally left empty

@DominikDataDev

Nov 29, 2023

Handling Schema Evolution in the Data Pipelines at KOHO by @shwetastha1O koho.dev/handling-schem… One possible way to automatically handle source schema changes when loading data into AWS Redshift

Handling Schema Evolution in the Data Pipelines at KOHO

Written by Shweta Shrestha and Sahar Jazebi.

Source: koho.dev

Intentionally left empty

@DominikDataDev

Nov 23, 2023

Open AI is nothing without its people

"You can't unit test SQL" Well, that's wrong. And this should be obvious because queries are conceptually so similar to dataframe transformations. The simple trick is to limit the scope of CTEs as if they were functions. Example without 3rd party tools: stackoverflow.com/a/754570
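
A minimal sketch of the idea, not the code from the linked answer: the CTE below is treated like a function whose only input is a relation named `orders`, and the test swaps that input for in-line fixture rows. The CTE name `daily_totals`, the `orders` columns, and the helper `run_daily_totals` are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical unit under test: one CTE that reads only from `orders`,
# so it can be exercised in isolation like a function.
DAILY_TOTALS = """
daily_totals AS (
    SELECT order_date, SUM(amount) AS total
    FROM orders
    GROUP BY order_date
)
"""


def run_daily_totals(fixture_rows):
    """Evaluate the CTE against in-line fixture rows (needs at least one row)."""
    # A fixture CTE named `orders` stands in for the real table.
    placeholders = ", ".join("(?, ?)" for _ in fixture_rows)
    query = f"""
        WITH orders(order_date, amount) AS (VALUES {placeholders}),
        {DAILY_TOTALS}
        SELECT order_date, total FROM daily_totals ORDER BY order_date
    """
    params = [value for row in fixture_rows for value in row]
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(query, params).fetchall()
    finally:
        conn.close()


def test_daily_totals_sums_per_day():
    rows = [("2024-01-01", 10), ("2024-01-01", 5), ("2024-01-02", 7)]
    assert run_daily_totals(rows) == [("2024-01-01", 15), ("2024-01-02", 7)]


test_daily_totals_sums_per_day()
```

Because the CTE depends on nothing but `orders`, the same text can be pasted into the production query unchanged, where `orders` resolves to the real table.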

Intentionally left empty

@DominikDataDev

Sep 19, 2023

15 Essential Steps To Build Reliable Data Pipelines towardsdatascience.com/15-essential-s…

Intentionally left empty

@DominikDataDev

Sep 19, 2023

10 Reasons Why Estimating Time For Data Projects is Hard annageller.com/blog/10-reason…

Intentionally left empty reposted

Joseph Machado

@startdataeng

Apr 20, 2022

When the data to process is larger than memory, try to stream it with Python generators before jumping to distributed systems! #data #dataengineering #Python #pythonlearning #Generator E.g. stream a file (note () and not []) and get the diff between date cols
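
The tweet's image is not preserved here, so the following is only a sketch of the technique it describes, not a reproduction. It assumes a CSV with hypothetical ISO-formatted columns `start_date` and `end_date`; the function name `date_diffs` is also made up for illustration.

```python
import csv
import io
from datetime import date


def date_diffs(lines):
    """Lazily yield the day difference between two date columns, row by row."""
    reader = csv.DictReader(lines)
    # Parentheses make this a generator expression: rows are parsed one at a
    # time on demand. Square brackets would build the whole list in memory.
    return (
        (date.fromisoformat(row["end_date"]) - date.fromisoformat(row["start_date"])).days
        for row in reader
    )


# An in-memory sample stands in for a large file; with a real file you would
# pass the open file object, which is itself read line by line.
sample = io.StringIO(
    "start_date,end_date\n"
    "2022-04-01,2022-04-20\n"
    "2022-01-01,2022-01-02\n"
)

for diff in date_diffs(sample):
    print(diff)
```

Only one row is held in memory at a time, so the same function works unchanged on a file far larger than RAM.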

Intentionally left empty

@DominikDataDev

Sep 18, 2023

Good summary on how to structure PySpark code for unit testing, including approaches to test data engineeringfordatascience.com/posts/pyspark_…

Intentionally left empty reposted

kache

@yacineMTB

Aug 17, 2023

the best engineer I've ever met in my life told me something that's always stuck with me: "software is only correct given a certain time frame" the world changes, and your software needs to change along with it