Sherin Jacob

@jcsherin

Programmer

jacobsherin.com

Joined September 2015

285 Posts, 265 Followers, 157 Following

Sherin Jacob reposted

samlaf

@samlafer

Nov 3

New paper by Nancy Lynch summarizing her career's influence on the field of distributed computing. arxiv.org/pdf/2502.20468 If you don't know who she is, she's the L in FLP and DLS. @MarcJBrooker has a good summary article: brooker.co.za/blog/2014/05/1…

Sherin Jacob reposted

Jasim

@jasim_ab

Oct 31

Been working on a tiny LLM service to help me write prompts just like regular well-typed application code. Here's a sample use case - map freeform text to an address form:

Sherin Jacob reposted

Xuanwo

@OnlyXuanwo

Oct 29

For everyone interested in data infra, want to get a quick sense of how big data works, how data systems are designed, and what the tradeoffs are, start with this share from @MOVNTDQ, really nice intro! intro-data-system.xiangpeng.systems

Sherin Jacob reposted

Wes McKinney

@wesmckinn

Oct 27

Excited to announce a new side project, a power user terminal UI for your personal finances: moneyflow.dev For years I've used personal finance tools like Mint and now Monarch. The data cleaning can be slow and tedious, so I made this to speed that up!

Sherin Jacob reposted

Andrew Lamb

@andrewlamb1111

Oct 27

@ApacheDataFusio's policy for AI-assisted contributions: AI is great, but AI dumps are not. Maintainers could finish the task faster by using AI directly, and submitters gain little knowledge when acting as a pass-through AI proxy. datafusion.apache.org/contributor-gu…

Sherin Jacob reposted

Angelo 🇵🇷

@ngeloxyz

Oct 19

First one is: "Speedrunning the lakehouse" by Jacopo Tagliabue (CTO of Bauplan) He asks: What if we started from scratch? Building a lakehouse infrastructure from scratch. Hilarious, funny, and informative youtube.com/watch?v=dvBRC9…

Sherin Jacob reposted

v

@iavins

Oct 18

We use asserts all the time in Turso DB and also in the Turso Server. They're in release builds and shipped to production. And yes, they could crash the server. Asserts are my favorites, and I use them whenever possible. Just yesterday I merged a PR that contained asserts and…

v

@iavins

Oct 17

Arguably, Go doesn't have asserts because, well, Pike doesn't like them 🥲

Sherin Jacob

@jcsherin

Oct 14

New post -- A B+Tree Node Underflows: Merge or Borrow? jacobsherin.com/posts/2025-08-… An interesting engineering trade-off I stumbled upon implementing a concurrent B+Tree from scratch; where production databases diverge from textbook algorithms, and each does it their own way.

Sherin Jacob reposted

Andrew Lamb

@andrewlamb1111

Oct 10

Our new thrift parser in the Rust @ApacheParquet implementation is a 🎁 that keeps on giving performance wise 🚀 github.com/apache/arrow-r… We are also working on a blog post that has a deeper explanation

Sherin Jacob reposted

Andy Pavlo (@andypavlo.bsky.social)

@andy_pavlo

Oct 1

So instead of working together, everyone (including us) released their own format:
→ @velox_lib Nimble: github.com/facebookincuba…
→ @cwi_da FastLanes: github.com/cwida/FastLanes
→ @SpiralDB Vortex: vortex.dev

Vortex | An extensible, SOTA columnar file format

Vortex is an extensible, state-of-the-art columnar file format, with associated tools for working with compressed Apache Arrow arrays in-memory, on-disk, and over-the-wire.

Source: vortex.dev

Sherin Jacob reposted

Andy Pavlo (@andypavlo.bsky.social)

@andy_pavlo

Oct 1

The sordid backstory is that there was a collaboration attempt to unify on a single format with CMU, Tsinghua, Meta, CWI, Voltron, Nvidia, and SpiralDB. The plan was to create a consortium and start with Meta's Nimble. But then lawyers got involved and it all fell apart.

Sherin Jacob reposted

Andrew Lamb

@andrewlamb1111

Sep 11

Dynamic Filters for TopK and Join queries landing in DataFusion 50.0.0: datafusion.apache.org/blog/2025/09/1…

Sherin Jacob reposted

Xuanwo

@OnlyXuanwo

Sep 14

People asked me about how OpenDAL makes money: the answer is it doesn’t. OpenDAL is for public goods, it helps you to access storage services and make money 🫡

Sherin Jacob reposted

Peter Boncz

@peterabcz

Sep 4

Tobias Schmidt (TUM) at @VLDBconf presented SQLStorm, which uses LLMs to generate a huge amount of large queries. SQLStorm now has 18K different complex queries and runs on a large real-world dataset (stackoverflow) paper: vldb.org/pvldb/vol18/p4… code: github.com/SQL-Storm/SQLS…

Sherin Jacob reposted

Andrew Lamb

@andrewlamb1111

Sep 6

Recording of "Introduction to Variant in @ApacheParquet": youtube.com/watch?v=nlOJD7… Here are the slides: docs.google.com/presentation/d…

Sherin Jacob reposted

Stefan Marr

@smarr

Aug 27

How can you slow down a program? And perhaps more importantly, why would you? Blog post on our upcoming @VMIL2025 paper. stefan-marr.de/2025/08/how-to… The research was led by @Humphrey_HCB.

How to Slow Down a Program? And Why it Can Be Useful.

Making programs slower can be useful to find...

Source: stefan-marr.de

Sherin Jacob reposted

Maximilian Kuschewski

@maxikuschewski

Aug 5

One improvement regarding benchmaxxing is having thousands of diverse benchmark queries instead of dozens. Plugging the new SQLStorm paper below ;)

Sherin Jacob reposted

Andrew Lamb

@andrewlamb1111

Jul 29

Multi-level merge sort queued up for DataFusion 50.0.0 next month: github.com/apache/datafus… Thanks to @rluvaton and Yongting You

feat: add multi level merge sort that will always fit in memory by rluvaton · Pull Request #15700 ·...

Closes #14692 (A complete solution for stable and safe sort with spill). Rationale for this change: we need merge sort that does not fail with out of memory.

Source: github.com

Sherin Jacob reposted

Azim Afroozeh

@afroozeh3

Jul 25

I'm excited to share that our paper (in collaboration with @peterabcz) has been accepted at VLDB 2025 in London and will be presented there: "The FastLanes File Format". In this paper, we introduce the FastLanes file format with Expression Encoding, a new way to define and combine…

FastLanes/docs/specification.pdf at dev · cwida/FastLanes

Next-Gen Big Data File Format.

Source: github.com

Sherin Jacob reposted

Andrew Lamb

@andrewlamb1111

Jul 14

It is a common misconception that @ApacheParquet files are restricted to basic statistics. Footer metadata and offset-based addressing permit user-defined index structures today. Latest @ApacheDataFusio blog from Qi Zhi, Jigao Luo and myself explains how datafusion.apache.org/blog/2025/07/1…