**Catch Us on Stage This June!**

We're thrilled to be speaking at several top-tier events this month, alongside our customers, sharing real-world insights from scaling enterprise AI systems in production. If you're attending the AI Engineering World Fair (June 3–5), Data + AI Summit (June 9–12), or the Toronto Machine Learning Summit (June 13–18), don't miss our sessions across multiple tracks. Come say hi and learn what we've been building!

**Lance Format Deep Dives**

In addition to our highly requested deep dives into the Lance format, we also shared our perspective on the future of open source table formats, inspired by feedback and questions from the Iceberg community. Curious where things are headed? Dig in:

- [Columnar File Readers in Depth: Column Shredding](GHOST_URL/columnar-file-readers-in-depth-column-shredding/) (Weston Pace, LanceDB Blog) - Record shredding is a classic method used to transpose rows of potentially nested data into a flattened tree of buffers that can be written to the file. A similar technique, cascaded encoding, has recently emerged that converts those arrays into a flattened tree of compressed buffers.
- [Columnar File Readers in Depth: Repetition & Definition Levels](GHOST_URL/columnar-file-readers-in-depth-repetition-definition-levels/) (Weston Pace, LanceDB Blog) - Repetition and definition levels are a method of converting structural arrays into a set of buffers. The approach was made popular in Parquet and is one of the key ways Parquet, ORC, and Arrow differ. This post explains how they work by contrasting them with validity & offsets.
- [The Future of Open Source Table Formats: Apache Iceberg and Lance](GHOST_URL/the-future-of-open-source-table-formats-iceberg-and-lance/) (Jack Ye, LanceDB Blog) - As the scale of data continues to grow, open-source table formats have become essential for efficient data lake management. Apache Iceberg has emerged as a leader in this space, while new formats like Lance are introducing optimizations for specific workloads.

**Event Recap: AI at Scale with Samsara**

The Samsara team is harnessing LanceDB to simplify and streamline AI data infrastructure for massive, real-world datasets. In May, our cofounder Chang She joined Samsara's AI Speaker Series, where he shared cutting-edge insights on multimodal AI and the evolving landscape of AI agents. Missed it? Catch the recording below:

*Recording: Scaling AI Data Infrastructure: A Multimodal Approach*

**LanceDB Enterprise Product News**

- **Smoother concurrent upserts:** Upsert operations are now conflict-free in typical workloads, so concurrent writers no longer need to worry about collisions (a minimal sketch of an upsert follows this list).
- **Significantly reduced storage costs:** Object store operations are reduced by up to 95% for small-table workloads; small files are now loaded with a single I/O request instead of many.
- **Filter binary data with ease:** Query large binary columns directly in your filters, no workarounds needed.
- **Optimized GCP deployment tuning:** Fine-tune weak-consistency and concurrency limits to better balance performance, cost, and flexibility.
- **Intuitive embedding visualization:** New UMAP visualizations help you explore and understand vector data at a glance.

*Demo: embedding visualization shown in LanceDB Cloud (Beta)*
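Upserts in LanceDB are expressed with the `merge_insert` API, which is what the concurrent-upsert improvement above applies to. Here is a minimal sketch of that pattern; the database path, table name, and schema are illustrative, not from the announcement:

```python
import lancedb
import pandas as pd

# Illustrative local database and table; any LanceDB connection works the same way.
db = lancedb.connect("./lancedb-demo")
table = db.open_table("docs")  # assumed to have "id" and "text" columns

new_data = pd.DataFrame({"id": [1, 2], "text": ["updated doc 1", "brand new doc 2"]})

# Upsert keyed on "id": update rows whose id already exists, insert the rest.
(
    table.merge_insert("id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_data)
)
```

With the Enterprise improvement, several writers can run this pattern concurrently on typical workloads without tripping over write conflicts.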
**Community Contributions**

A heartfelt thank you to our community contributors to lance and lancedb this month: @yanghua, @frankliee, @leaves12138, @Jay-ju, @KazuhitoT, @majin1102, @upczsh, @renato2099, @HaochengLIU, @omahs, @xaptronic, @acoliver

**Open Source Releases Spotlight**

- **Boolean logic for full-text search:** Combine filters with AND/OR (or `&`/`|`); full-text search now works the way you think.

```python
fts_query = MatchQuery("puppy", "text") & MatchQuery("happy", "text")
```

- **Faster, smarter full-text indexing:** Compression and optimized search algorithms speed up index builds and boost performance at scale.

- **No more stalled upserts:** A new timeout setting ensures merge_insert operations won't hang forever.

```python
from datetime import timedelta

# Give up after 10 seconds instead of retrying indefinitely.
(
    table.merge_insert("id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_data, timeout=timedelta(seconds=10))
)
```

- **Flexible phrase matching:** Control how loose or tight your matches are with the `slop` parameter.

```python
fts_query = PhraseQuery("frodo was happy", "text", slop=2)
```

- **Spark compatibility built in:** Works with multiple Spark versions out of the box; just drop in the bundled JAR and go. Quick start →
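To see the new full-text search pieces working together, here is a minimal end-to-end sketch. It assumes a table with a `text` column, that `MatchQuery` and `PhraseQuery` are importable from `lancedb.query`, and that structured queries can be passed to `table.search()` with `query_type="fts"`; check the current docs for the exact import paths and signatures.

```python
import lancedb
from lancedb.query import MatchQuery, PhraseQuery  # import path assumed; see docs

db = lancedb.connect("./lancedb-demo")  # illustrative local path
table = db.open_table("docs")           # assumes a "docs" table with a "text" column

# Build a native full-text index on the column (skip if one already exists).
table.create_fts_index("text", use_tantivy=False)

# Boolean logic: documents matching both "puppy" AND "happy" in the "text" column.
both = MatchQuery("puppy", "text") & MatchQuery("happy", "text")
results = table.search(both, query_type="fts").limit(10).to_list()

# Phrase matching with slop: "frodo was happy" allowing up to 2 intervening tokens.
phrase = PhraseQuery("frodo was happy", "text", slop=2)
loose_matches = table.search(phrase, query_type="fts").limit(10).to_list()
```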