SaaS

MySQL HeatWave dives into object storage data lakes

Oracle joins the analytics anywhere bandwagon, promises future access to AWS S3

Thu 20 Jul 2023 // 15:36 UTC

Oracle has launched MySQL HeatWave Lakehouse, an extension to its proprietary analytics platform which now supports object storage outside the database.

The analytics system, which was built on top of the open source MySQL database, can query data in the object store in a variety of file formats as well as combine it with data in MySQL. Meanwhile, files in the object store are queried directly by HeatWave without copying the data into the MySQL database, Oracle told us.

The data lake technology supports file formats including CSV, Parquet, and export files from other databases. At the same time, MySQL Autopilot promises to improve performance and scalability without requiring database tuning expertise.

On a 500TB TPC-H benchmark, Oracle claims queries took nine times longer on AWS's data warehouse and 17 times longer on Snowflake and Databricks compared with the new HeatWave data lake. Google's BigQuery would be 36 times slower, Oracle reckons, though it did not publish comparisons with Teradata, the data warehouse vendor founded in 1979.

The system is only available on Oracle Cloud Infrastructure (OCI), but Nipun Agarwal, senior veep of MySQL HeatWave, told The Register that Oracle planned to extend the system to query data held in object storage in other clouds including AWS, Azure and GCP.

"One of the important things to note over here is that data in the object store remains in the object store," he said. "We do not copy data from the object store into the MySQL database. Secondly, the processing of this data, whether it's loading or queried, is done by HeatWave not by the MySQL engine. That's what gives it extreme scalability because the HeatWave cluster can scale up to 500 nodes."

Using analytics engines to query data outside their home database is not new. The approach was used by Snowflake, Cloudera and Google's BigQuery with their support for the Apache Iceberg table format. Similarly, Databricks, Microsoft and SAP have endorsed the Delta Lake table format, an open source format under the Linux Foundation, created by Databricks.

Commentators and vendors have suggested most suppliers will eventually come to support most formats, including Hudi.

Agarwal said Oracle intends HeatWave to support these formats in the future, starting with Iceberg and Delta Lake.

The Autopilot feature offers schema inference, which helps users determine the data types of files in object storage before the data is analyzed by the query engine.

"We can come up with this mapping, even for files which don't have metadata," Agarwal said. "Autopilot can make these predictions in less than one minute. We invented this technique called adaptive data sampling, which very intelligently scans and samples the file without compromising on the accuracy."
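For readers unfamiliar with the general idea, sampling-based schema inference can be sketched in a few lines of Python. This is a minimal illustration only and is not Oracle's adaptive data sampling, whose details were not disclosed: it assumes a CSV file with a header row, takes a plain uniform random sample, and uses hypothetical names (`infer_type`, `infer_schema`) and a simplified three-type system.

```python
import csv
import io
import random


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"
    # Try the narrowest type first, then widen.
    for name, caster in (("INTEGER", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "TEXT"


def infer_schema(csv_text, sample_size=100, seed=0):
    """Guess a column-name -> type mapping from a random sample of rows.

    Assumes the first row is a header. Uniform sampling is a stand-in here;
    it is not the adaptive technique described in the article.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    rng = random.Random(seed)
    sample = body if len(body) <= sample_size else rng.sample(body, sample_size)
    return {
        col: infer_type([row[i] if i < len(row) else "" for row in sample])
        for i, col in enumerate(header)
    }


data = "id,price,name\n1,9.5,apple\n2,3,pear\n3,,fig\n"
print(infer_schema(data))
# {'id': 'INTEGER', 'price': 'DOUBLE', 'name': 'TEXT'}
```

The trade-off in any such scheme is that a small sample is fast but can miss a rare value that breaks the guessed type, which is presumably the accuracy problem Oracle says its technique addresses.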

Autopilot also predicts the in-memory representation for a specific data source, the optimal size of the cluster that is needed to compute the data and how long it's going to take to load the data, he said.

Holger Mueller, vice president and principal analyst at Constellation Research, said Oracle had introduced new features to HeatWave in the last three years at a rapid pace. "The HeatWave team has out-innovated all other cloud databases," he claimed.

The move into object storage was "huge," he added, because it "allows users to bring all the data of the enterprise together – into one single query. It is something enterprises have long waited for."

Meanwhile, the ability to query data in AWS, Azure and GCP object storage would appeal to users who want to work across all their enterprise data using HeatWave, he said.

Like any suite model, Oracle HeatWave had the downside of competing with specialist players in any one of its features. "But, at this point, Oracle is more than good enough," Mueller said. ®
