Research Article

A Comparative Performance & Metadata Study of Open Table Formats: Iceberg vs Delta vs Hudi at Scale

Authors

  • Siddhartha Parimi, Dell Technologies, USA

Abstract

The rapid adoption of open table formats has transformed modern data engineering by enabling ACID transactions, schema evolution, and time travel on cloud object storage. Apache Iceberg, Delta Lake, and Apache Hudi are the three dominant formats that have emerged to address the limitations of traditional data lakes, including the lack of transactional guarantees, difficulty handling concurrent writes, and inefficient metadata management. This study benchmarks the three formats empirically on terabyte-scale datasets along six dimensions: metadata scalability, transaction isolation guarantees, concurrent write handling, compaction strategies, streaming consistency semantics, and cross-engine interoperability. Test scenarios cover bulk ingestion throughput, incremental write latency, selective query performance, time travel operations, schema evolution, and maintenance overhead under varying concurrency levels. The results indicate that Iceberg excels in read-heavy analytics workloads, with superior query planning efficiency and cross-engine portability; Delta Lake offers operational simplicity, strong Spark integration, and the highest bulk write throughput; and Hudi offers flexible write-read tradeoffs through two table types optimized for streaming upserts. The formats are converging as each rapidly adopts features from its competitors, which reduces vendor lock-in risk and lets organizations select a format based on specific workload characteristics rather than seek a universally optimal solution. The article establishes a quantitative foundation for practitioners selecting a table format as lakehouse architectures become the dominant paradigm for enterprise data platforms, with direct implications for infrastructure cost, operational complexity, and analytical performance at scale.

Article information

Journal

Journal of Computer Science and Technology Studies

Volume (Issue)

7 (12)

Pages

513-520

Published

2025-12-29

How to Cite

Siddhartha Parimi. (2025). A Comparative Performance & Metadata Study of Open Table Formats: Iceberg vs Delta vs Hudi at Scale. Journal of Computer Science and Technology Studies, 7(12), 513-520. https://doi.org/10.32996/jcsts.2025.7.12.56

Keywords:

Lakehouse Architecture, Table Format Comparison, Metadata Management, Transaction Isolation, Distributed Data Systems