Lance Format Deep Dive, Samsara & Summer Events Circuit

🎤 Catch Us on Stage This June!

We’re thrilled to be speaking at several top-tier events this month, alongside our customers, sharing real-world insights from scaling enterprise AI systems in production.

If you’re attending the AI Engineering World Fair (June 3–5), Data + AI Summit (June 9–12), or the Toronto Machine Learning Summit (June 13–18), don’t miss our sessions across multiple tracks. Come say hi and learn what we’ve been building!

AI Engineering World Fair

Data + AI Summit

Toronto Machine Learning Summit

LanceDB Speaking Events

Summer Tech Events


⚙️ Lance Format Deep Dives

In addition to our highly requested deep dives into the Lance format, we also shared our perspective on the future of open source table formats, inspired by feedback and questions from the Iceberg community.

Curious where things are headed? Dig in 👇 [

Columnar File Readers in Depth: Column Shredding

Record shredding is a classic method used to transpose rows of potentially nested data into a flattened tree of buffers that can be written to the file. A similar technique, cascaded encoding, has recently emerged that converts those arrays into a flattened tree of compressed buffers. In this article we

LanceDB Blog · Weston Pace

](GHOST_URL/columnar-file-readers-in-depth-column-shredding/)[

Columnar File Readers in Depth: Repetition & Definition Levels

Repetition and definition levels are a method of converting structural arrays into a set of buffers. The approach was made popular in Parquet and is one of the key ways Parquet, ORC, and Arrow differ. In this blog I will explain how they work by contrasting them with validity & offsets

LanceDB Blog · Weston Pace

](GHOST_URL/columnar-file-readers-in-depth-repetition-definition-levels/)[

The Future of Open Source Table Formats: Apache Iceberg and Lance

As the scale of data continues to grow, open-source table formats have become essential for efficient data lake management. Apache Iceberg has emerged as a leader in this space, while new formats like Lance are introducing optimizations for specific workloads. In this post, we’ll explore how Iceberg and Lance

LanceDB Blog · Jack Ye

](GHOST_URL/the-future-of-open-source-table-formats-iceberg-and-lance/)
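
For readers new to the "validity & offsets" layout that the second article contrasts with repetition and definition levels, here is a minimal pure-Python sketch of the idea. It assumes an Arrow-style encoding of a column of nullable lists; the function names are hypothetical and this is not how Lance or Arrow implement it internally.

```python
def to_validity_and_offsets(rows):
    """Flatten a column of nullable lists into three buffers."""
    validity = []  # True where the row is a list (possibly empty), False where it is null
    offsets = [0]  # row i owns values[offsets[i]:offsets[i + 1]]
    values = []    # every list item, concatenated in row order
    for row in rows:
        validity.append(row is not None)
        if row is not None:
            values.extend(row)
        offsets.append(len(values))
    return validity, offsets, values


def from_validity_and_offsets(validity, offsets, values):
    """Rebuild the original rows from the three buffers."""
    return [
        values[offsets[i]:offsets[i + 1]] if valid else None
        for i, valid in enumerate(validity)
    ]


rows = [[1, 2], None, [], [3]]
validity, offsets, values = to_validity_and_offsets(rows)
restored = from_validity_and_offsets(validity, offsets, values)
```

Note that the null row and the empty list both occupy a zero-length slice of `values`; only the validity buffer tells them apart.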

🎥 Event Recap: AI at Scale with Samsara

The Samsara team is harnessing LanceDB to simplify and streamline AI data infrastructure for massive, real-world datasets.

In May, our cofounder Chang She joined Samsara’s AI Speaker Series, where he shared cutting-edge insights on multimodal AI and the evolving landscape of AI agents.

Missed it? Catch the recording below 👇

Scaling AI Data Infrastructure: A Multimodal Approach


LanceDB Enterprise Product News

  • **Smoother concurrent upserts:** Upsert operations are now conflict-free in typical workloads, so you can write without worrying about collisions.
  • **Significantly reduced storage costs:** Small files are now loaded with a single I/O request instead of several, cutting object store operations by up to 95%. Ideal for small-table workloads.
  • **Filter binary data with ease:** Query large binary columns directly in your filters, no workarounds needed.
  • **Optimized GCP deployment tuning:** Fine-tune weak consistency and concurrency limits to better balance performance, cost, and flexibility.
  • **Intuitive embedding visualization:** New UMAP visualizations help you explore and understand vector data at a glance.

Learn more

Embedding Visualization shown in LanceDB Cloud (Beta)


👥 Community Contributions

💡 A heartfelt thank you to this month’s community contributors to lance and lancedb: @yanghua, @frankliee, @leaves12138, @Jay-ju, @KazuhitoT, @majin1102, @upczsh, @renato2099, @HaochengLIU, @omahs, @xaptronic, @acoliver


🛠️ Open Source Releases Spotlight

  • **Boolean logic for full-text search:** Combine filters with AND/OR or &/|, so full-text search now works the way you think.

    from lancedb.query import MatchQuery
    fts_query = MatchQuery("puppy", "text") & MatchQuery("happy", "text")

  • **Faster, smarter full-text indexing:** Compression and optimized search algorithms speed up index builds and boost performance at scale.

  • **No more stalled upserts:** A new timeout setting ensures merge_insert operations won’t hang forever.

    from datetime import timedelta
    (
        table.merge_insert("id")
        .when_matched_update_all()
        .when_not_matched_insert_all()
        .execute(new_data, timeout=timedelta(seconds=10))
    )

  • **Flexible phrase matching:** Control how loose or tight your matches are with the slop parameter.

    from lancedb.query import PhraseQuery
    fts_query = PhraseQuery("frodo was happy", "text", slop=2)

  • **Spark compatibility built in:** Works with multiple Spark versions out of the box: just drop in the bundled JAR and go. Quick start →
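
To give a feel for what a slop value means in phrase matching, here is a small pure-Python sketch under one common interpretation: the phrase terms must appear in order, and slop is the maximum number of extra tokens allowed between them in total. This is an assumption made for illustration, not a description of how LanceDB scores or evaluates a PhraseQuery, and `phrase_matches` is a hypothetical helper.

```python
def phrase_matches(text, phrase, slop=0):
    """Return True if the phrase terms occur in order in `text`
    with at most `slop` extra tokens between them in total.
    Simplified model: whitespace tokenization, case-insensitive."""
    tokens = text.lower().split()
    terms = phrase.lower().split()
    if not terms:
        return False
    for start, token in enumerate(tokens):
        if token != terms[0]:
            continue
        pos = start
        extra = 0  # tokens skipped between consecutive phrase terms
        found_all = True
        for term in terms[1:]:
            try:
                # The earliest next occurrence keeps the total gap minimal.
                nxt = tokens.index(term, pos + 1)
            except ValueError:
                found_all = False
                break
            extra += nxt - pos - 1
            pos = nxt
        if found_all and extra <= slop:
            return True
    return False


exact = phrase_matches("frodo was happy today", "frodo was happy")
too_tight = phrase_matches("frodo was very happy", "frodo was happy", slop=0)
loose_enough = phrase_matches("frodo was very happy", "frodo was happy", slop=1)
```

Under this model, a slop of 0 demands the exact phrase, while larger values tolerate intervening words.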

Jasmine Wang
