A Comparative Performance & Metadata Study of Open Table Formats: Iceberg vs Delta vs Hudi at Scale
Abstract
The rapid adoption of open table formats has transformed modern data engineering by enabling ACID transactions, schema evolution, and time travel on cloud object storage. Apache Iceberg, Delta Lake, and Apache Hudi are the three dominant formats that have emerged to address the limitations of traditional data lakes, including missing transactional guarantees, difficulties with concurrent writes, and inefficient metadata management. This study benchmarks the three formats empirically on terabyte-scale datasets along six dimensions: metadata scalability, transaction isolation guarantees, concurrent write handling, compaction strategies, streaming consistency semantics, and cross-engine interoperability. Test scenarios cover bulk ingestion throughput, incremental write latency, selective query performance, time travel operations, schema evolution, and maintenance overhead under varying concurrency levels. The results show that Iceberg excels in read-heavy analytics workloads, with superior query planning efficiency and cross-engine portability; Delta Lake offers operational simplicity, strong Spark integration, and the highest bulk write throughput; and Hudi offers flexible write-read tradeoffs through two table types optimized for streaming upserts. The formats are also converging, with each rapidly adopting features from its competitors. This convergence reduces vendor lock-in risk and lets organizations select a format based on specific workload characteristics rather than searching for a universally optimal solution. The article provides a quantitative foundation for practitioners choosing a table format as lakehouse architectures become the dominant paradigm for enterprise data platforms, with direct implications for infrastructure cost, operational complexity, and analytical performance at scale.
Article information
Journal
Journal of Computer Science and Technology Studies
Volume (Issue)
7 (12)
Pages
513-520
Published
Copyright
Copyright (c) 2025 https://creativecommons.org/licenses/by/4.0/
Open access

This work is licensed under a Creative Commons Attribution 4.0 International License.
