While this might seem pretty straightforward, it involves a change in storage and table format. Iceberg migration is the process of moving data from one table format to Iceberg or introducing Iceberg as a table format to the data. Tenant is an existing customer with only legacy datasets.įigure 3: Size and Count of production datasets migrated to Apache Iceberg Motivations for Migration.Tenant is an existing customer actively building out new integrations having a hybrid of Iceberg and legacy datasets.Tenant is a new customer completely on Apache Iceberg.For each tenant, we can have one of three scenarios: Iceberg records Adobe Experience Platform single-tenant storage architecture exposes us to some interesting challenges when migrating customers to Iceberg. With the introduction of Iceberg, we see a transitional shift in how metadata is captured and recorded. As more data is ingested over time, it becomes difficult to query metadata from Catalog. It is helpful in providing information such as name, description, schema, and applying for permissions, and all metadata recorded on Adobe Experience Platform. Data Lake relies on a Hadoop Distributed File System (HDFS) compatible backend for data storage, which today is the cloud-based storage provided by Azure (Azure’s Gen2 Data Lake Service (ADLS)).įigure 1: Adobe Experience Platform Architecture The ProblemĪdobe Experience Platform Catalog Service provides a way of listing, searching, and provisioning a DataSet, which is our equivalent of a Table in a Relational Database. At the center of our Data Lake Architecture is the underlying storage. That view can then be used with intelligent services to drive experiences across multiple devices, run targeted campaigns, classify profiles and other entities into segments, and leverage advanced analytics. Customers use it to centralize and standardize their data across the enterprise resulting in a 360-degree view of their data of interest. In this blog, we will share our story of migrating 1 PB+ datasets to Iceberg on Adobe Experience Platform Data Lake, the challenges we faced, and lessons learned.Īdobe Experience Platform is an open system for driving real-time personalized experiences.
![adobe incatalog adobe incatalog](https://i.ytimg.com/vi/RJZYKy5aqLA/maxresdefault.jpg)
In our previous blogs, Iceberg At Adobe, Data ingestion with Buffered writes to Iceberg, and Optimized Reads With Iceberg we understood the benefits of Apache Iceberg and how it fits in the overall Adobe Experience Platform architecture.