Sherin Jacob

@jcsherin

Programmer

Sherin Jacob reposted

New paper by Nancy Lynch summarizing her career's influence on the field of distributed computing. arxiv.org/pdf/2502.20468 If you don't know who she is, she's the L in FLP and DLS. @MarcJBrooker has a good summary article: brooker.co.za/blog/2014/05/1…

Sherin Jacob reposted

Been working on a tiny LLM service to help me write prompts just like regular well-typed application code. Here's a sample use case - map freeform text to an address form:


Sherin Jacob reposted

For everyone interested in data infra, want to get a quick sense of how big data works, how data systems are designed, and what the tradeoffs are, start with this share from @MOVNTDQ, really nice intro! intro-data-system.xiangpeng.systems

Sherin Jacob reposted

Excited to announce a new side project, a power user terminal UI for your personal finances: moneyflow.dev For years I've used personal finance tools like Mint and now Monarch. The data cleaning can be slow and tedious, so I made this to speed that up!


Sherin Jacob reposted

@ApacheDataFusio 's policy for AI assisted contribution: AI is great, but not AI dumps: maintainers could finish the task faster by using AI directly, and the submitters gain little knowledge when acting as a pass through AI proxy. datafusion.apache.org/contributor-gu…


Sherin Jacob reposted

First one is: "Speedrunning the lakehouse" by Jacopo Tagliabue (CTO of Bauplan) He asks: What if we started from scratch? Building a lakehouse infrastructure from scratch. Hilarious, funny, and informative youtube.com/watch?v=dvBRC9…


Sherin Jacob reposted

We use asserts all the time in Turso DB and also in the Turso Server. They're in release builds and shipped to production. And yes, they could crash the server. Asserts are my favorites, and I use them whenever possible. Just yesterday I merged a PR that contained asserts and…

Arguably, Go doesn't have asserts because, well, Pike doesn't like them 🥲


New post -- A B+Tree Node Underflows: Merge or Borrow? jacobsherin.com/posts/2025-08-… An interesting engineering trade-off I stumbled upon implementing a concurrent B+Tree from scratch; where production databases diverge from textbook algorithms, and each does it their own way.


Sherin Jacob reposted

Our new thrift parser in the Rust @ApacheParquet implementation is a 🎁 that keeps on giving performance wise 🚀 github.com/apache/arrow-r… We are also working on a blog post that has a deeper explanation

Sherin Jacob reposted

So instead of working together, everyone (including us) released their own format:
→ @velox_lib Nimble: github.com/facebookincuba…
→ @cwi_da FastLanes: github.com/cwida/FastLanes
→ @SpiralDB Vortex: vortex.dev


Sherin Jacob reposted

The sordid backstory is that there was a collaboration attempt to unify on a single format with CMU, Tsinghua, Meta, CWI, Voltron, Nvidia, and SpiralDB. The plan was to create a consortium and start with Meta's Nimble. But then lawyers got involved and it all fell apart.


Sherin Jacob reposted

Dynamic Filters for TopK and Join queries landing in DataFusion 50.0.0: datafusion.apache.org/blog/2025/09/1…

Sherin Jacob reposted

People asked me about how OpenDAL makes money: the answer is it doesn’t. OpenDAL is for public goods, it helps you to access storage services and make money 🫡


Sherin Jacob reposted

Tobias Schmidt (TUM) at @VLDBconf presented SQLStorm, which uses LLMs to generate a huge amount of large queries. SQLStorm now has 18K different complex queries and runs on a large real-world dataset (stackoverflow) paper: vldb.org/pvldb/vol18/p4… code: github.com/SQL-Storm/SQLS…

Sherin Jacob reposted

Recording of "Introduction to Variant in @ApacheParquet": youtube.com/watch?v=nlOJD7… Here are the slides: docs.google.com/presentation/d…

Sherin Jacob reposted

How can you slow down a program? And perhaps more importantly, why would you? Blog post on our upcoming @VMIL2025 paper. stefan-marr.de/2025/08/how-to… The research was led by @Humphrey_HCB.


Sherin Jacob reposted

One improvement regarding benchmaxxing is having thousands of diverse benchmark queries instead of dozens. Plugging the new SQLStorm paper below ;)

Sherin Jacob reposted

I'm excited to share that our paper (in collaboration with @peterabcz ) has been accepted at VLDB 2025 in London and will be presented there: The FastLanes File Format In this paper, we introduce the FastLanes file format with Expression Encoding—a new way to define and combine…


Sherin Jacob reposted

It is a common misconception that @ApacheParquet files are restricted to basic statistics. Footer metadata and offset-based addressing permit user-defined index structures today. Latest @ApacheDataFusio blog from Qi Zhi, Jigao Luo and myself explains how datafusion.apache.org/blog/2025/07/1…
