#deltatable search results

when the son is stronger than the father 😅🤓 #deltatable #datafusion

The fastest way to serve #DeltaTable and #ApacheIceberg ? #PowerBI Direct Lake mode from #onelake , no debate there. What’s wild is it stays blazing fast even on the lowest #MicrosoftFabric tier (F2). 🚀 github.com/djouallah/fabr…

when you ingest a lot of small files into #DeltaTable make sure you alter the delta.logRetentionDuration to a more sensible duration otherwise you end up with a Log bigger than the Data itself :) to be clear only myself to blame here :)
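
One way to check whether a table has this problem is to compare the bytes under `_delta_log` with the bytes of Parquet data files. The sketch below assumes a Delta table on a local filesystem; the function name, the `my_table` placeholder, and the 7-day value are illustrative and not from the tweet.

```python
import os

# Delta SQL that shortens log retention. `my_table` and the 7-day interval
# are placeholders; pick a value that still covers your time-travel needs.
ALTER_RETENTION_SQL = (
    "ALTER TABLE my_table "
    "SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')"
)


def delta_log_vs_data_bytes(table_path):
    """Return (log_bytes, data_bytes) for a Delta table directory on local disk."""
    log_bytes = 0
    data_bytes = 0
    for root, _dirs, files in os.walk(table_path):
        # The first path component below the table root tells us whether
        # this directory belongs to the transaction log.
        rel = os.path.relpath(root, table_path)
        in_log = rel.split(os.sep)[0] == "_delta_log"
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            if in_log:
                log_bytes += size
            elif name.endswith(".parquet"):
                data_bytes += size
    return log_bytes, data_bytes
```

If the first number is close to or above the second, shortening the retention period (for example with the SQL string above, run through an engine that supports Delta table properties) is probably worth a look.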

#duckdb is adding checkpoint to their ducklake format, multiple table transactions as pip install , we live in an amazing time !!! then you can expose the data as a #deltatable self promotion 😅 pypi.org/project/duckla…

if you are serious about #deltatable, you need your own #parquet writer, with the support of partition, @daft_dataframe is the new winner of my "ETL Benchmarks" based on my own data @duckdb step up your game 😋😅😋

#duckdb ODBC driver now supports #deltatable in Direct Query mode, this table is hosted in Cloudflare R2 !!

i asked chatgpt to generate a #deltatable log from #ducklake have something working 😅
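
For a sense of what "generating a Delta log" involves, here is a minimal sketch that writes the first commit file for a set of existing Parquet files. It is not the author's code and does not read DuckLake metadata; `write_initial_commit` and its arguments are hypothetical names, and the caller is assumed to already know each file's relative path and size.

```python
import json
import os
import time
import uuid


def write_initial_commit(table_path, files, schema_fields, partition_columns=()):
    """Write version 0 of a Delta log describing already-written Parquet files.

    files: list of dicts with 'path' (relative to the table root), 'size' in
           bytes, and optionally 'partitionValues'.
    schema_fields: list of (column_name, delta_type) pairs, e.g. ("id", "long").
    """
    now_ms = int(time.time() * 1000)
    schema = {
        "type": "struct",
        "fields": [
            {"name": name, "type": dtype, "nullable": True, "metadata": {}}
            for name, dtype in schema_fields
        ],
    }
    actions = [
        {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
        {"metaData": {
            "id": str(uuid.uuid4()),
            "format": {"provider": "parquet", "options": {}},
            "schemaString": json.dumps(schema),
            "partitionColumns": list(partition_columns),
            "configuration": {},
            "createdTime": now_ms,
        }},
    ]
    for f in files:
        actions.append({"add": {
            "path": f["path"],
            "partitionValues": f.get("partitionValues", {}),
            "size": f["size"],
            "modificationTime": now_ms,
            "dataChange": True,
        }})

    log_dir = os.path.join(table_path, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    # Commit files are named with a zero-padded 20-digit version number.
    commit_path = os.path.join(log_dir, f"{0:020d}.json")
    # Mode "x" fails if the commit already exists, so an existing log is never overwritten.
    with open(commit_path, "x", encoding="utf-8") as fh:
        for action in actions:
            fh.write(json.dumps(action) + "\n")
    return commit_path
```

The sketch leaves out per-file statistics and later versions; a real exporter would also have to emit `remove` actions and checkpoints as the source table changes.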

a new release of #datafusion 34, still reading #Deltatable via arrow is suboptimal compared to reading Parquet Directly :( something to do with passing stats to get correct join orders. colab.research.google.com/drive/1sJD7w6l…

🎉 Exciting updates in @UnstructuredIO v0.10.5! 🎉 ✨ Features: introducing Delta Tables connector✨ Enhancements: added TIFF support & improved OCR mode for PDFs & images. Discover more: github.com/Unstructured-I… #DataAnalytics #DataScience #DeltaTable


The new `get_add_actions` API for delta-rs uniquely supports inspecting partition values, record counts, and stats of files in your #deltatable. Very useful for deep dives into the effects of re-partitioning or Z-Ordering. A lot of cool tools can be built on top of this feature!
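
To show the kind of information involved, the sketch below reads `add` actions straight from the JSON commit files with the standard library. It is not the delta-rs implementation: `read_add_actions` is a hypothetical helper, and it assumes a local table whose log still has every JSON commit from version 0 (it ignores checkpoints).

```python
import json
import os
import re

COMMIT_NAME = re.compile(r"^\d{20}\.json$")


def read_add_actions(table_path):
    """Replay JSON commits in order and return the currently active files."""
    log_dir = os.path.join(table_path, "_delta_log")
    # Zero-padded names sort lexicographically in version order.
    commits = sorted(n for n in os.listdir(log_dir) if COMMIT_NAME.match(n))
    active = {}
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "add" in action:
                    add = action["add"]
                    # `stats` is itself a JSON string and may be absent.
                    stats = json.loads(add["stats"]) if add.get("stats") else {}
                    active[add["path"]] = {
                        "path": add["path"],
                        "partition_values": add.get("partitionValues", {}),
                        "size_bytes": add.get("size"),
                        "num_records": stats.get("numRecords"),
                        "min_values": stats.get("minValues", {}),
                        "max_values": stats.get("maxValues", {}),
                    }
                elif "remove" in action:
                    active.pop(action["remove"]["path"], None)
    return [active[p] for p in sorted(active)]
```

Comparing the `min_values` and `max_values` ranges per file before and after a re-partition or Z-Order run gives a rough picture of how much file skipping a filter can achieve.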


V-Order has a 15% impact on average write times but provides up to 50% more compression #spark #deltatable


apparently this is a legit #deltatable , basically no log replay, just snapshots , spark don't like it, but #PowerBI direct lake read it just fine (the only reader i care about), just export snapshot from #ducklake and you have a proper metadata sync out of thin air !!!

New Post. …stindatacaughtinthetrace.blogspot.com/2023/12/Compar… Comparing the capability of #Deltatable Change Data Feed to #SQLServer Change Data Capture.


#DeltaTable in #AzureSynapse is much improved by the Apache Spark 3.3 updates. Having previously used Fetch XML for a custom app import this is much simpler.


When you choose the #deltatable format, a CSV file of more than 200 MB gets compressed to under 50 MB. Good space saving. #datalakehouse #benifits


#PowerBI will analyze your #deltatable and load only the delta , pun intended :) basically getting the same results using less work , it is like CDC to RAM 😁
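
The idea of loading "only the delta" can be pictured as a set difference between the files of two table versions. This is an illustration of the concept only, not Power BI's actual algorithm; `diff_snapshots` and its return keys are made-up names.

```python
def diff_snapshots(old_files, new_files):
    """Compare the data files of two table versions.

    Returns which files a cache would need to load, evict, or keep.
    """
    old, new = set(old_files), set(new_files)
    return {
        "load": sorted(new - old),   # files added since the cached version
        "evict": sorted(old - new),  # files removed or rewritten
        "keep": sorted(old & new),   # unchanged files, already in memory
    }
```

Because Parquet files in a Delta table are immutable, an unchanged path means unchanged content, which is why a comparison this simple can be enough.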


first time, I built a python package, it does export #ducklake metadata to #deltatable, test it with #PowerBI and it works, hopefully we get a better solution from #duckdb :) pypi.org/project/duckla…


🚀 #DuckDB is adding support for #ApacheIceberg V3 deletion vectors , which are compatible with #DeltaTable DV! This is a huge step forward. There’s now a real possibility that both ecosystems will converge on a common data format. github.com/duckdb/duckdb-…


This is very big deal, experimental write support for #Deltatable using delta RS kernel, the users here are Query Engines not end users github.com/delta-io/delta…

Release v0.5.0 · delta-io/delta-kernel-rs


@mim_djo, detecting changes downstream with #deltatable sounds like a solid way to keep things fresh. Incremental updates really save time—efficient workflows, indeed


3rd run is 1 min 27 s, using Delta table written by #Deltatable Rust, that's banana !!!

the magic of #Deltatable relative Paths, I moved my folder around and everything keeps working just fine

Finally !!! it works, writing data to #Fabric Onelake using #deltatable Rust cc @JoshCaplan1984 ♥️🪅🎉🥳

#Fabric DWH is faster querying native table vs lakehouse tables although both are #DeltaTable.

#Fabric DWH is still very fast with Delta Tables generated by third party tools, in this example hot run, TPCH_SF10 #Deltatable Rust, 7 seconds

I am unreasonably excited about #Glaredb storage decision #deltatable, I would have had the same reaction if it was iceberg too or hudi or any freaking standard. life is too short to build your own storage format.

Loaded TPCH_SF100 to #Fabric Lakehouse using pyarrow and #Deltatable Rust, rowgroup=8M and max file 80M for no particular reason 🤪😅 performance is slightly better than the data loaded using Spark :)

some observation on #DeltaTable generated by #Fabric DWH, 2097152 rows per file, sometimes all in one group, and sometimes split on 2 row groups, probably much more complex than that but anyway, it seems to like very big rowgroups !!!!

I was expecting a cage fight but it did not happen :) joking aside, I like this open project a lot, and hope will bring more interoperability between Open Table Format #OneTable #Iceberg #Deltatable opensourcedatasummit.com/open-data-foun…
