Hard · Multiple Choice
Tags: Databricks, Delta Lake

AZ-305 · Question 22 · Domain 2.2: Data Integration

Your data science team uses Azure Databricks. They are building a data lakehouse architecture and need a storage format that supports ACID transactions, scalable metadata handling, and unifies streaming and batch data processing.

They also require the ability to perform 'time travel' (data versioning) to query previous versions of the data for machine learning model reproducibility.

Which storage format should you recommend?

Answer options:

A.

Apache Parquet

B.

Delta Lake

C.

Avro

D.

Azure Synapse SQL pools

How to approach this question

Recognize the keywords 'ACID transactions', 'time travel', and 'Databricks'. These are the defining features of Delta Lake.

Full Answer

B. Delta Lake ✓ Correct
Delta Lake is an open-source storage layer that brings reliability to data lakes. It provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It stores data in Parquet format but adds a transaction log. This log enables features like 'time travel', allowing data scientists to query older versions of the data, which is critical for reproducing machine learning models.
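In a Databricks notebook, time travel is a one-line clause on the query. A minimal sketch in Spark SQL, assuming a hypothetical Delta table named `features` (the table name and version numbers are illustrative):

```sql
-- Inspect the transaction log: every write is a numbered, auditable version
DESCRIBE HISTORY features;

-- Query the table exactly as it existed at a specific version...
SELECT * FROM features VERSION AS OF 3;

-- ...or as of a specific point in time
SELECT * FROM features TIMESTAMP AS OF '2024-01-15';
```

Pinning a training set to an explicit version number, rather than "whatever is latest", is what makes a machine learning experiment reproducible months later.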

Common mistakes

Choosing Parquet. Delta Lake stores its data as Parquet files under the hood, but Parquet alone provides neither ACID transactions nor time travel.

Choosing Avro. Avro is a row-oriented serialization format suited to event streams and schema evolution; it has no transaction log and no versioned snapshots.

Choosing Azure Synapse SQL pools. This is a data-warehouse compute service, not a storage format, so it does not answer the question being asked.
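Because Delta Lake is essentially Parquet plus a transaction log, an existing Parquet directory can be upgraded in place. A sketch, assuming a hypothetical path `/mnt/lake/events`:

```sql
-- Writes a _delta_log directory over the existing Parquet files;
-- after this, ACID writes and time travel become available
CONVERT TO DELTA parquet.`/mnt/lake/events`;
```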

Practice the full Azure Solutions Architect Expert AZ-305 Practice Exam 5

55 questions · hints · full answers · grading
