Tabular's Iceberg vision goes from Netflix and chill to database thrill

Promise of neutral data layer between vendors' vested interests attracts $26M

It is a year since a flurry of vendors including Snowflake, Google, and Cloudera backed the Apache Iceberg table format – promising to bring analytics to data wherever it sits.

Ryan Blue co-developed the table format while working for Netflix, and Tabular – the company he went on to found around it – has just closed a $26 million Series B funding round led by Altimeter Capital, with participation from Andreessen Horowitz and Zetta Venture Partners.

Speaking to The Register, Blue said Tabular aimed to make Iceberg a kind of neutral "database storage" between the blob storage and the data analytics vendors.

In the decade since Snowflake and others pioneered the separation of storage and compute to allow users to scale them independently in the cloud, the market for cloud-based analytics and data platforms relying on the approach has grown into a high-stakes battlefield.

Last week, Databricks sucked in $500 million in Series I funding, valuing it at a nominal $43 billion, while Snowflake was valued at a staggering $120 billion shortly after its 2020 IPO.

At the same time, there is not always a sense of neutrality in the market when it comes to bringing analytics engines to data outside a vendor's systems. The promise is there. Last year, Snowflake, Cloudera, and Google lined up behind Iceberg, an Apache open source project. Since then, AWS and IBM have come into the fold. The idea is that users can bring Snowflake's analytics engines to data stored outside its product portfolio in the Iceberg table format. Users only pay Snowflake for compute, not for data storage or movement.

On the other side of the fence, Salesforce, SAP, and Microsoft have lined up behind the Delta Lake table format developed by Databricks, but open sourced to the Linux Foundation. To clarify, SAP and Microsoft have said they would support Iceberg in time, while Databricks earlier this year announced support for Iceberg and another table format, Apache Hudi. Even Oracle said its MySQL-based HeatWave data warehouse would support these table formats in the future, starting with Iceberg and Delta Lake. But for Blue, it is a question of emphasis and who users will trust to give them the best performance.

"Storage as an object store is just dumb," the Iceberg co-creator told us. "That's not to say that they don't do a lot of work to make S3 a pretty amazing product, but it's dumb in the sense that it doesn't understand the data, and it doesn't do database-like tasks. It will never compact your data files; it doesn't look at the timestamp on a row and get rid of it if it gets too old. Those are tasks for the database storage layer. Tabular is universal database storage. We purposely want to work with any compute engine on top."
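To make the distinction concrete, the two "database-like tasks" Blue mentions, compacting small data files and dropping rows past a retention age, can be sketched in a few lines of Python. This is a toy in-memory illustration only, not Iceberg's or Tabular's API: the `Row` type, the `expire_rows` and `compact` helpers, and the modeling of "files" as lists of rows are all hypothetical names and assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Row:
    """Hypothetical row: a timestamp plus a payload."""
    ts: datetime
    value: str


def expire_rows(files, now, max_age):
    """Drop rows older than max_age, then drop any file left empty."""
    cutoff = now - max_age
    kept = [[row for row in f if row.ts >= cutoff] for f in files]
    return [f for f in kept if f]


def compact(files, target_rows):
    """Rewrite many small files as fewer files of up to target_rows rows each."""
    rows = [row for f in files for row in f]
    return [rows[i:i + target_rows] for i in range(0, len(rows), target_rows)]


# Ten one-row "files", one per day, newest first (illustrative data only).
now = datetime(2023, 9, 20)
files = [[Row(now - timedelta(days=d), f"event-{d}")] for d in range(10)]

fresh = expire_rows(files, now, max_age=timedelta(days=5))
compacted = compact(fresh, target_rows=4)
print(len(files), len(fresh), [len(f) for f in compacted])
```

A plain object store would keep all ten objects indefinitely; a layer that understands rows and timestamps can apply the retention rule and rewrite what remains into fewer, larger files.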

Blue added: "Imagine you use two vendors, Databricks and Snowflake. They are both supporting Iceberg, at least for interchange. You can read through Iceberg tables stored in Databricks. But do you trust Databricks to expose that in the right way that's going to make Snowflake really performant? Basically, every customer that I talked to doesn't.

"We have vendors competing not just for workloads, the dataset, and everything that uses that dataset, but they're competing to store all of your data: your entire lake or your entire warehouse or whatever those two things merged to become. That is really concerning because a database vendor is always going to make that storage – and their compute offer – look best. We really need to separate those layers, and that's where Tabular comes in."

Iceberg was built to address the performance and usability challenges inherent in Apache Hive tables in large and demanding data lake environments. Blue and fellow Netflix data team member Dan Weeks donated it to the Apache Software Foundation as an open source project in November 2018, and together they founded Tabular in 2021.

Earlier this year, Tabular launched its first product, a system for a "headless" data warehouse. Users can start for free on up to 1TB of data, after which the company charges based on the amount of data under management.

In its architectural diagram, Tabular sits between Iceberg and popular analytics compute engines including Apache Spark, Trino, Python, and Snowflake to provide services such as ingestion, optimization, cataloging, and role-based access control.

With Iceberg, the promise is to untangle storage and compute in terms of business and economics, as well as technology, giving users greater freedom to choose the tools they want while optimizing costs.

Blue pointed out that although Snowflake may have pioneered the separation of storage and compute, the two layers were still vertically integrated in its stack.

"It's their storage and their compute and you have to go through their package in order to use it," Blue said. "Iceberg is changing the game because you can actually share the storage underneath and across engines. And that is the transformation that is happening today."

For its part, Databricks has denied that it tightly controls development of the Delta Lake format, and said it welcomes the introduction of other formats. Speaking to The Register late last year, CEO and co-founder Ali Ghodsi said Iceberg, Hudi, and Delta were similar and likely to be adopted across the board by the majority of vendors. But he argued that data warehouse vendors would not be incentivized to offer optimal support for the standards because they make money from storing data in their systems.

Whatever the outcome of the growing interest in table formats to create economic separation of storage and compute, Tabular has launched into a market that is suddenly a focus for some of the world's largest software vendors. It remains to be seen whether the $37 million total investment is sufficient to survive the shark tank. ®
