Domain 2.2: Data Integration
15 questions across 5 exams
Exams covering this topic
All questions (15)
You are designing a data analytics platform using Azure Synapse Analytics. The data science team needs to query 50 TB of historical log data stored in Azure Data Lake Storage Gen2. They only run these queries sporadically, typically at the end of the month, to generate ad-hoc reports. They require a solution that uses standard T-SQL but minimizes costs by only charging for the queries executed, rather than paying for continuous compute capacity. Which Synapse Analytics component should you recommend?
Your company is building a modern data warehouse. You need to ingest 5 TB of raw JSON data daily from an external REST API, perform complex machine learning transformations on the data, and load the cleansed data into an Azure Synapse Analytics dedicated SQL pool. The data engineering team prefers writing transformations in Python and Scala. The orchestration team prefers a visual, drag-and-drop interface for scheduling and monitoring the end-to-end pipeline. Which TWO services should you combine to meet these requirements? (Select TWO)
You are designing an IoT solution that receives telemetry data from 100,000 sensors globally. The system must ingest up to 2 million events per second. You need a service that can buffer this massive influx of streaming data, partition it for parallel processing, and retain the raw events for up to 7 days so multiple downstream consumer applications can read the data at their own pace. Which Azure service should you recommend?
A data engineering team is building a data lake architecture on Azure. They have petabytes of raw data stored in Azure Data Lake Storage Gen2 in Parquet format. Data scientists need to run ad-hoc, exploratory SQL queries directly against this data. The query workload is highly unpredictable—some days there are hundreds of complex queries, and other days there are none. The company wants to pay only for the queries executed, without provisioning dedicated compute clusters that sit idle. Which Azure Synapse Analytics component should you recommend?
A retail company is designing an ELT (Extract, Load, Transform) pipeline. Requirements: 1. Extract data from 50 different on-premises and cloud data sources (SQL, Oracle, Salesforce, REST APIs). 2. Orchestrate the movement of this data into Azure Data Lake Storage Gen2. 3. Perform complex data transformations using Apache Spark and Python. 4. The transformation code must be developed collaboratively by data engineers using interactive notebooks. Which TWO Azure services should you combine to meet these requirements? (Select TWO)
An IoT startup is deploying 100,000 smart thermostats. Each thermostat sends telemetry data (temperature, humidity) every 5 seconds. You need to design an ingestion layer capable of receiving millions of events per second with low latency. The data must be buffered and then read by multiple downstream consumer applications simultaneously. Which Azure service should you recommend for the ingestion layer?
Your company is building an enterprise data platform. You need an orchestration tool to move data from 50 different on-premises and cloud sources into Azure Data Lake Storage Gen2. The data engineering team requires a visual, code-free interface to build ETL pipelines. They also need to execute SQL queries directly against the files in the Data Lake without provisioning dedicated database compute resources. Which service should you recommend?
You are designing an IoT architecture for a manufacturing plant. Thousands of sensors will send temperature and pressure telemetry every second. You need to ingest this massive stream of telemetry data, calculate the average temperature over a 5-minute tumbling window, and trigger an alert if the average exceeds a threshold. The raw data must also be saved to Azure Data Lake Storage for historical analysis. Which TWO Azure services should you combine to achieve this? (Select TWO)
Your data science team uses Azure Databricks. They are building a data lakehouse architecture and need a storage format that supports ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. They also require the ability to perform 'time travel' (data versioning) to query previous versions of the data for machine learning model reproducibility. Which storage format should you recommend?
A retail company needs to orchestrate data movement from 20 different on-premises SQL Server databases into Azure Data Lake Storage Gen2. The data must be transformed using serverless Apache Spark pools before being loaded into a dedicated SQL pool for reporting. You want to use a single unified workspace to manage the pipelines, Spark pools, and SQL pools. Which service should you recommend?
You are designing a data architecture using Azure Databricks. The business requires ACID transactions, scalable metadata handling, and the ability to unify streaming and batch data processing on their existing Azure Data Lake Storage Gen2. Which TWO technologies should you include in your architecture? (Select TWO)
An IoT solution generates 1 million events per second from sensors worldwide. You need to ingest this telemetry data, perform real-time windowed aggregations (e.g., average temperature over a 5-minute tumbling window), and output the results to Power BI. Which combination of services should you recommend?
You are designing a data integration solution. You need to orchestrate ETL pipelines and perform data transformations using Apache Spark. You want a unified workspace experience. Which service should you choose?
You are designing an IoT solution that ingests 1 million telemetry messages per second. You need to perform real-time windowing aggregations (e.g., average temperature per minute) and output the results to Power BI. Which TWO services should you combine? (Select TWO)
You are building a Data Lake architecture. You need to support ACID transactions, scalable metadata handling, and unify streaming and batch data processing on top of Azure Data Lake Storage Gen2. What should you use?
Practice these questions with detailed guidance
Full answers, grading, and explanations on why each answer is correct.
Expert