Vinoth Chandar

@byte_array

Founder @Onehousehq, Creator of @apachehudi, Built the World's first #DataLakehouse, Distributed/Data Systems, LinkedIn, Uber, Confluent alum. (views are mine)

bytearray.io

Joined April 2009

1K Posts · 2K Followers · 234 Following

Pinned

Vinoth Chandar

@byte_array

May 20

🔥 Meet Quanton — the new query execution engine from Onehouse. 👍 Same Spark & SQL. 📉 At least half the cost. 📈 1.6x-3.6x better ETL price-performance 📊 2.2x-6.5x better Ingest price-performance 👉 Read the full blog here: onehouse.ai/blog/announcin… ⬇️ Download our free…

Meet Quanton! Our new query execution engine that powers Spark and SQL jobs on top of the Onehouse Compute Runtime. Learn how it provides 2-3x better price-performance for your ETL data pipelines.

Announcing Apache Spark™ and SQL on the Onehouse Compute Runtime with Quanton

Source: onehouse.ai

Vinoth Chandar reposted

Apache Hudi

@apachehudi

Nov 3

🚀 Launching @apachehudi notebooks — a local, self‑contained environment to learn Hudi end‑to‑end! Includes: • Spark, Hive, MinIO + Jupyter • 5 notebooks: CRUD on COW/MOR; Snapshot/RO/Incremental; Time Travel & CDC; SCD 2/4; Schema Evolution; SQL Procedures Try Hudi quickly…

Vinoth Chandar

@byte_array

Oct 24

A common pattern I am seeing that is draining productivity. AI: “Here’s how to do it.” OS or some infra software: “Error: nice try.” Engineer: stuck in a loop going back and forth 😅 AI is a force multiplier, but only if you also use it to learn what to do and how things work

Vinoth Chandar

@byte_array

Oct 17

No longer just “faster than”; now, @apachehudi is also “faster on” #apacheiceberg. Thanks to @apachextable

Shiyan Xu

@_xushiyan

Oct 17

[Blog] Struggling with Apache Iceberg performance when your data dimensions get too hot? 🔥🌡️ Frequent updates and deletes in Iceberg can lead to a "chilly meltdown," forcing a tough choice between fast writes and efficient reads. 🥶 But what if you didn't have to compromise? 🤔…

Vinoth Chandar reposted

Apache Hudi

@apachehudi

Oct 15

🚀 Big news for Hudi Community! We're back with the Apache Hudi Meetup | ASIA (Chinese), and this time we're hosted by the incredible team at @JD_Corporate (京东) ! Get ready to explore the "Next-Generation Lakehouse: The Intelligent Future Engine". We have a packed agenda…

Vinoth Chandar

@byte_array

Sep 29

💰🔥 Spark’s default autoscaler = higher latency scaling up, wasted $$ scaling down. Why? It’s based on task backlog, not actual resource usage. Costly flaw. Result: 🐢 Slow scale-ups (e.g. too few Kafka partitions during spikes) 🐢 Slow scale-downs (e.g. many tiny tasks →…

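The autoscaler tweet attributes slow scaling to decisions driven by task backlog rather than actual resource usage. Assuming "Spark's default autoscaler" refers to open-source Spark dynamic allocation, the Python sketch below models that behavior in simplified form: the executor target is derived only from pending and running task counts (divided by tasks per executor, i.e. `spark.executor.cores / spark.task.cpus`, scaled by `spark.dynamicAllocation.executorAllocationRatio`), and an executor is released only after staying idle for `spark.dynamicAllocation.executorIdleTimeout` (60s by default). The names `Executor`, `target_executors`, and `removable` are hypothetical, and the model deliberately omits Spark's ramp-up rounds, min/max executor bounds, and shuffle- or cache-related timeouts.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Executor:
    # Seconds since this executor last finished a task (0 while one is running).
    idle_seconds: float


def target_executors(pending_tasks: int, running_tasks: int,
                     executor_cores: int = 4, task_cpus: int = 1,
                     allocation_ratio: float = 1.0) -> int:
    """Simplified backlog-based executor target (hypothetical model).

    Only task counts enter the formula; per-task CPU or memory load does not.
    """
    tasks_per_executor = executor_cores // task_cpus
    total_tasks = pending_tasks + running_tasks
    return math.ceil(total_tasks * allocation_ratio / tasks_per_executor)


def removable(executors: Iterable[Executor],
              idle_timeout_s: float = 60.0) -> List[Executor]:
    """Executors become eligible for release only after a full idle timeout."""
    return [e for e in executors if e.idle_seconds >= idle_timeout_s]


# Scale-up: a Kafka source with 4 partitions yields 4 tasks per micro-batch,
# so the target stays at 1 executor however heavy each partition's spike is.
print(target_executors(pending_tasks=0, running_tasks=4))  # 1

# Scale-down: a steady stream of tiny tasks keeps every executor's idle time
# below the timeout, so none is released even if the fleet is underutilized.
fleet = [Executor(idle_seconds=2.0) for _ in range(10)]
print(len(removable(fleet)))  # 0
```

Per-task CPU and memory never appear in the target formula, which is the crux of the complaint: a spike concentrated in a few Kafka partitions (Spark's Kafka source maps topic partitions 1:1 to Spark partitions by default) cannot raise the target, while a trickle of tiny tasks can keep an underutilized fleet alive.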

Vinoth Chandar

@byte_array

Sep 26

📊 If you’re using Apache Spark on EMR (or anywhere really), you need better visibility into where your compute spend is going. At Onehouse, we kept seeing the same pattern across Hudi and non-Hudi users alike: 👉 Jobs were under-optimized 👉 Executors were sitting idle 👉…

Spark Analyzer Demo: Measure and evaluate your Apache Spark™ Applic...

Source: youtube.com

Vinoth Chandar

@byte_array

Sep 24

🗞️ OLD NEWS: but worth a shout-out. Keys are optional in Hudi ... One of Hudi’s core goals was to remove friction from building data lakes. That’s why — for a long time now — Hudi has quietly supported auto-generation of record keys. No need to think up a key field just to get…