Hard · 1 mark · Multiple Choice
Domain 3.5: Data Ingestion · Domain 3 · Performance · ETL · Glue

AWS SAA-C03 · Question 52 · Domain 3.5: Data Ingestion

A data engineering team needs to perform complex Extract, Transform, Load (ETL) operations on large datasets stored in Amazon S3. They want to use Apache Spark for the transformations. Which TWO AWS services can provide a managed Apache Spark environment for this task? (Select TWO.)

Answer options:

A. AWS Glue

B. Amazon Athena

C. Amazon EMR

D. Amazon Kinesis Data Analytics

E. AWS Data Pipeline

How to approach this question

Identify the two AWS services that support Apache Spark: EMR (cluster-based) and Glue (serverless).

Full Answer

Correct answers: A (AWS Glue) and C (Amazon EMR). These are the two primary managed options for running Apache Spark workloads on AWS. Amazon EMR provides managed clusters that give you fine-grained control over the Spark environment. AWS Glue provides a serverless Apache Spark environment optimized for ETL jobs, so there is no cluster to provision or manage.
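To make the cluster-based versus serverless distinction concrete, the sketch below builds the request payloads you might pass to boto3 for each service. It is only an illustration under stated assumptions: the bucket, script path, job names, Glue version, and worker settings are hypothetical placeholders, and the functions only construct dictionaries, they do not call AWS.

```python
# Hypothetical placeholder: an assumed S3 location for a PySpark ETL script.
SCRIPT = "s3://example-bucket/scripts/transform.py"


def glue_job_request(name: str, role_arn: str, workers: int = 10) -> dict:
    """Keyword arguments for boto3's glue.create_job (serverless Spark)."""
    return {
        "Name": name,
        "Role": role_arn,
        # "glueetl" is the Glue command name for a Spark ETL job.
        "Command": {"Name": "glueetl", "ScriptLocation": SCRIPT, "PythonVersion": "3"},
        "GlueVersion": "4.0",       # assumed version, adjust as needed
        "WorkerType": "G.1X",       # assumed worker size
        "NumberOfWorkers": workers,  # capacity is requested, not provisioned by you
    }


def emr_spark_step(name: str) -> dict:
    """One entry for the Steps list of boto3's emr.add_job_flow_steps (cluster-based Spark)."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets the step run spark-submit on the cluster.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", SCRIPT],
        },
    }
```

With boto3 installed and credentials configured, the Glue payload could be passed as `boto3.client("glue").create_job(**glue_job_request(name, role_arn))`, while the EMR step would be submitted to an already running cluster through `add_job_flow_steps` with that cluster's ID. The difference in shape reflects the answer: Glue only needs a script and a worker count, whereas EMR assumes a cluster you created and configured yourself.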

Common mistakes

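Choosing Athena. Athena is primarily a serverless interactive SQL query service whose SQL engine is based on Presto/Trino; it is not the service intended for Spark-based ETL jobs.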
