Teradata is adding support for two open table formats, Apache Iceberg and Linux Foundation’s Delta Lake, to its multi-cloud analytics platform VantageCloud Lake and its AI and machine learning engine AI Unlimited.
Typically, open table formats are designed to improve the performance of data lakes built on cloud-based object storage. They do this by creating a layer of abstraction atop the data lake, combining columnar storage with metadata management so that enterprises can manage and update data more efficiently.
The fundamental advantage of using an open table format is that enterprises can modify their data schema or partitioning strategy without having to reprocess the entire dataset.
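To illustrate the point (this is a generic sketch, not Teradata-specific): in Apache Spark with the Iceberg SQL extensions enabled, a schema or partition change is a metadata-only operation. The catalog, table, and column names below are hypothetical.

```python
from pyspark.sql import SparkSession

# Minimal sketch: assumes a Spark session already configured with the Iceberg
# SQL extensions and an Iceberg catalog named "demo" (names are illustrative).
spark = SparkSession.builder.getOrCreate()

# Adding a column only updates Iceberg's table metadata; existing data files
# are not rewritten.
spark.sql("ALTER TABLE demo.sales.orders ADD COLUMN discount_pct DOUBLE")

# The partitioning strategy can evolve the same way: new writes use the new
# layout, while old files remain readable without reprocessing the dataset.
spark.sql("ALTER TABLE demo.sales.orders ADD PARTITION FIELD days(order_ts)")
```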
A number of Teradata’s rivals, including providers of cloud-based analytics and software such as Snowflake, Starburst, Dremio, Cloudera, and ClickHouse, already support Apache Iceberg.
The Linux Foundation’s Delta Lake format is supported by the likes of Google Cloud, AWS, and Databricks.
The addition of support for the open table formats will, according to Teradata, allow its customers to cross-read and cross-write data stored in multiple open table formats.
This interoperability extends to AWS Glue, Unity, and Apache Hive catalogs and works in multi-cloud and multi-data lake environments, the company said, adding that support for the open table formats will be available for VantageCloud Lake and AI Unlimited on AWS and Azure in June 2024.
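Teradata has not published the syntax it will use, but the idea of cross-format access can be sketched with a generic Apache Spark session that registers both an Iceberg catalog (here backed by AWS Glue) and Delta Lake support. All bucket, table, and column names below are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Enable both the Iceberg and Delta Lake SQL extensions in one session
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,"
            "io.delta.sql.DeltaSparkSessionExtension")
    # An Iceberg catalog backed by AWS Glue (warehouse path is illustrative)
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://example-bucket/warehouse")
    # Route Delta tables through Delta Lake's catalog implementation
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read an Iceberg table registered in the Glue catalog
orders = spark.table("glue.sales.orders")

# Read a Delta Lake table directly from object storage
customers = spark.read.format("delta").load("s3://example-bucket/delta/customers")

# Query across the two formats in a single job
orders.join(customers, "customer_id").show()
```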
AI Unlimited will be available for purchase under public preview on the AWS and Azure Marketplaces in the second quarter of the year.
Teradata is also integrating third-party tools such as Airbyte Cloud, Apache Airflow, and dbt.
The Airbyte Cloud integration will help streamline data ingestion into VantageCloud with a fully managed and hosted service that eliminates the need for time-consuming infrastructure setup and management, while the Apache Airflow integration will allow enterprise teams to programmatically author, schedule, and monitor workflows.
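As a rough sketch of what that looks like, a minimal Airflow DAG could chain an ingestion step and a VantageCloud query as shown below. The task callables are placeholders; in practice the Airbyte and Teradata provider operators would fill those roles.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def trigger_airbyte_sync():
    """Placeholder for kicking off an Airbyte Cloud connection sync."""
    ...


def run_vantage_query():
    """Placeholder for running a SQL step against VantageCloud Lake."""
    ...


with DAG(
    dag_id="vantagecloud_daily_load",   # hypothetical pipeline name
    start_date=datetime(2024, 6, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="airbyte_sync", python_callable=trigger_airbyte_sync)
    query = PythonOperator(task_id="vantage_query", python_callable=run_vantage_query)

    # The dependency is scheduled and monitored from the Airflow UI
    ingest >> query
```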
The dbt tool integration, on the other hand, helps manage the transform part of the extract, load, transform (ELT) process. It can be used as a tool for data transformation in databases, data lakes, and data warehouses, the company said, adding that all the integrations have already been made generally available.
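The transformation logic itself lives in dbt SQL models; as a hedged sketch, recent dbt-core releases (1.5 and later) also expose a programmatic runner, assuming a dbt project and profile are already configured and the "staging" selector below is a placeholder for real model names.

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

# Equivalent to running `dbt run --select staging` from the command line:
# dbt compiles the selected SQL models and executes the transform step
# inside the target database or warehouse.
runner = dbtRunner()
result: dbtRunnerResult = runner.invoke(["run", "--select", "staging"])

print(result.success)  # True if all selected models built successfully
```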