
Bartosz Konieczny
@waitingforcode
Freelance Data Engineer and instructor, enjoy solving data problems with #ApacheSpark #AWS #GCP #Azure 👨‍🏭
⏰ Final Reminder – Delta Lake Webinar Tomorrow! Wondering if data engineering design patterns can unlock new insights into Delta Lake? Or how Delta Lake can become a key part of your streaming data architecture? Join @newfront (@bufbuild) and @waitingforcode as they tackle…

Why don’t Iceberg or Delta Lake have secondary indexes? Because analytics workloads and OLTP workloads optimize for opposite I/O patterns. See my dive into data layout, pruning, and what “indexing” really means in open table formats: jack-vanlightly.com/blog/2025/10/8…
Are you wondering if general concepts like data engineering design patterns can help you learn about #DeltaLake? Or, if it's possible to leverage Delta Lake within your streaming data architecture? In this webinar, Scott Haines and Bartosz Konieczny will answer these two…

Releasing soon! Pre-order now: shroffpublishers.com/books/97893680… Data Engineering Design Patterns by Bartosz Konieczny @waitingforcode, with @OReillyMedia. It focuses on various aspects of data engineering, including data ingestion, data quality, idempotency, and more. #dataengineering

If you want to understand the consistency models of the table formats mentioned in the paper, I've written about them extensively and built formal models. * jack-vanlightly.com/analyses/2024/… * jack-vanlightly.com/analyses/2024/… * jack-vanlightly.com/analyses/2024/… * github.com/Vanlightly/tab…
Data Engineering patterns on the cloud by Bartosz Konieczny is on sale on Leanpub! Its suggested price is $39.00; get it for $24.65 with this coupon: leanpub.com/sh/ygsnqbRD @waitingforcode #CloudComputing #AmazonWebServices #GoogleCloudPlatform #MicrosoftAzure
Join @newfront and @waitingforcode and learn all about streaming Delta Lake tables with Apache Spark Structured Streaming! 🦀 🗓 March 21st 🕝 9:00AM PT / 12:00PM ET 💻 Join this webinar via LinkedIn, YouTube, or Zoom! Learn more: linkedin.com/events/streami… #deltalake #streaming

I have been busy the last few months writing a book for O'Reilly about how to build ML systems (batch, real-time, and LLMs), distilling much of what I have learnt from both working with customers as well as students. Why could the book interest you? * Data Scientists - transition…
I don't want to start a flame war here, but IMO it is a mistake to jump straight to distributed databases (and 90% of the content below is distributed databases) without first learning fundamentals on single node databases. Here's my 10 things to understand about databases:…
Ten things to understand about your database: 1) High level Architecture 2) How writes work? (Replication, data distribution, internal organisation etc) 3) How reads work? (Consistency guarantees, tuning options, etc) 4) CAP theorem, ex. CP or AP 5) Transactions and Concurrency…
Data Engineering patterns on the cloud by Bartosz Konieczny is on sale on Leanpub! Its suggested price is $39.00; get it for $26.10 with this coupon: leanpub.com/sh/1T4q5Z81 @waitingforcode #CloudComputing #AmazonWebServices #GoogleCloudPlatform #MicrosoftAzure
Chapter 4 of The Architecture of Serverless Data Systems: CockroachDB (serverless). jack-vanlightly.com/analyses/2023/…
The early release of Delta Lake: The Definitive Guide is here! 🎉 The latest edition includes the addition of Chapter 12: Performance Tuning. Download here ➡️ bit.ly/472DVY7 Authors @dennylee, Prashanth Babu, Tristen Wentling, & @newfront #opensource #deltalake #oss
Data Engineering patterns on the cloud: How to solve common data engineering problems with cloud services? leanpub.com/data-engineeri… by Bartosz Konieczny is the featured book on the Leanpub homepage! leanpub.com @waitingforcode #CloudComputing #AmazonWebServices
Last week I spent some time understanding the #PySpark applyInPandasWithState. This week I'm refactoring the code, hoping to still understand it 2 months later ;) 👉 waitingforcode.com/apache-spark-s…

In the previous release, #PySpark got an interesting streaming feature: arbitrary stateful processing. Its API differs from the Scala version but is better adapted to the Python world. More 👉 waitingforcode.com/apache-spark-s…

A list of articles I share again and again when developers ask me about Kafka 🧵
[ANNOUNCEMENT] Congrats to the Apache Spark community and all the contributors! The Apache Spark 3.5.0 release is here. Try it out! spark.apache.org/releases/spark…
It's not a rebranding but more of a regrouping 😉 All my additional #dataengineering content is now available at waitingforcode.com/better (planning to add some stream processing materials soon)

If Delta Lake implemented only commits, I could stop exploring this transactional part after the previous article. But like an RDBMS, #DeltaLake implements other ACID-related concepts, such as isolation levels 👉 waitingforcode.com/delta-lake/tab…
One of the great features of table file formats is the ability to handle write conflicts. It wouldn't be possible without commits, which are the topic of my #DeltaLake blog post. waitingforcode.com/delta-lake/tab…