vutrinh
@_vutrinh
My mom read my articles to support her son. Now, she can design a data architecture and write ETL scripts.
Parquet is not a columnar format. Instead, it’s a hybrid format combining the best of row and column formats. Parquet groups data into subsets of rows (horizontal partitioning). In each subset, the data for each column is stored close together (vertical partitioning). A Parquet file is…
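The hybrid layout described above can be sketched in a few lines of plain Python. This is only an illustration of the idea (split rows into groups, then store each column's values together inside a group), not how any Parquet library actually encodes data; the function name `to_hybrid_layout` and the sample rows are made up for the example.

```python
from typing import Any


def to_hybrid_layout(
    rows: list[dict[str, Any]], rows_per_group: int
) -> list[dict[str, list[Any]]]:
    """Toy model of a hybrid layout (hypothetical helper, not a Parquet API).

    Horizontal partition: cut the rows into groups of `rows_per_group`.
    Vertical partition: inside each group, keep each column's values together.
    """
    if rows_per_group <= 0:
        raise ValueError("rows_per_group must be positive")
    groups = []
    for start in range(0, len(rows), rows_per_group):
        subset = rows[start:start + rows_per_group]
        # One contiguous list of values per column within this group.
        columns = {name: [row[name] for row in subset] for name in subset[0]}
        groups.append(columns)
    return groups


# Sample data invented for illustration.
rows = [
    {"id": 1, "city": "Hanoi"},
    {"id": 2, "city": "Hue"},
    {"id": 3, "city": "Da Nang"},
]
layout = to_hybrid_layout(rows, rows_per_group=2)
for index, group in enumerate(layout):
    print(index, group)
```

With two rows per group, the three sample rows end up in two groups, and within each group the `id` values sit next to each other, as do the `city` values.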
🚀🚀 DuckDB is great. It lets us run analytical SQL on a local laptop with only minutes of setup. Here are some bullet points about its storage from my self-learning process via DuckDB’s materials and source code. ◉ Two modes: persistent and in-memory; the latter will…
Paper I would love to read but instead have to write? 🤔
Have you ever wondered how a Parquet dataset is written to disk? Parquet is a self-describing file format: the file contains all the information needed by the application that consumes it. Parquet organizes data in a hybrid format behind the scenes.
🚀🚀 How does Apache Spark execute applications for us? A few weeks ago, I wrote an article that gave an overview of Apache Spark. Let’s revisit how Spark handles processing, from user-defined logic to execution by the executors: ◉ Defining the Application: The user defines…
🤔 My humble observation Large-scale cloud OLAP has increasingly converged toward the lakehouse paradigm. Below are some insights from my research—feel free to discuss or share corrections if you find anything off! 📌 In this context: ➝ Internal tables refer to data loaded…
🚀🚀 How does @ApacheSpark plan the execution for us? (With the help of the Catalyst Optimizer) When we define DataFrame transformation logic, it must first go through an optimization process before execution. This involves four key phases: ◉ Analysis: Spark SQL starts by…
🚀🚀 What does the @ApacheIceberg reading process look like? ◉ The reader first visits the catalog to retrieve the table's current metadata file location. ◉ After fetching the metadata file, it collects the table’s schema and checks partition schemes to understand the data…
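The first two steps above can be sketched with plain dictionaries. This is a toy model under loud assumptions: `CATALOG`, `METADATA_FILES`, the path, the field names, and `start_read` are all hypothetical stand-ins, not the real Iceberg catalog API or metadata file layout, and the sketch stops where the post is cut off.

```python
# Hypothetical catalog: table name -> location of its current metadata file.
CATALOG = {"db.events": "warehouse/events/metadata/current.metadata.json"}

# Hypothetical metadata store: location -> simplified metadata contents.
METADATA_FILES = {
    "warehouse/events/metadata/current.metadata.json": {
        "schema": ["event_id", "event_time", "country"],
        "partition_spec": ["country"],
    }
}


def start_read(table_name: str) -> dict:
    """Follow the two steps described in the post (illustrative only)."""
    # Step 1: ask the catalog where the table's current metadata file lives.
    metadata_location = CATALOG[table_name]
    # Step 2: fetch that file, then collect the schema and partition scheme.
    metadata = METADATA_FILES[metadata_location]
    return {
        "metadata_location": metadata_location,
        "schema": metadata["schema"],
        "partition_spec": metadata["partition_spec"],
    }


read_context = start_read("db.events")
print(read_context)
```

The point of the indirection is that the catalog only holds a pointer; everything the reader needs to know about the table's shape comes from the metadata file that the pointer leads to.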
🎉 Wow. This is truly an epic masterpiece. This article from Vu Trinh (@_vutrinh), with its vivid illustrations, breaks down and explains the technical architecture of AutoMQ in a very clear and understandable way. If you're interested in the cloud-native technical architecture of…