Hard · 1 mark · Multiple Choice
AWS SAA-C03 · Question 52 · Domain 3.5: Data Ingestion
A data engineering team needs to perform complex Extract, Transform, Load (ETL) operations on large datasets stored in Amazon S3. They want to use Apache Spark for the transformations. Which TWO AWS services can provide a managed Apache Spark environment for this task? (Select TWO.)
Answer options:
A. AWS Glue
B. Amazon Athena
C. Amazon EMR
D. Amazon Kinesis Data Analytics
E. AWS Data Pipeline
How to approach this question
Identify the two services that offer a managed Apache Spark environment for ETL: Amazon EMR (cluster-based) and AWS Glue (serverless).
Full Answer
To run Apache Spark workloads on AWS, you have two primary managed options. Amazon EMR provides managed clusters where you have fine-grained control over the Spark environment. AWS Glue provides a serverless Apache Spark environment specifically optimized for ETL jobs.
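The contrast between the two options can be sketched in code. The following is a minimal illustration, not a definitive setup: it only builds the keyword arguments one might pass to boto3's `glue.create_job` and `emr.run_job_flow`, without calling AWS. The function names, the job and script names, the worker and node counts, the instance type, and the version labels are all assumptions chosen for illustration.

```python
# Sketch only: builds request parameters, makes no AWS calls.
# All names, sizes, and versions below are illustrative assumptions.

def glue_job_request(name, role_arn, script_s3_uri, workers=10):
    """Keyword arguments for boto3 glue.create_job (serverless Spark)."""
    return {
        "Name": name,
        "Role": role_arn,
        # "glueetl" is the Glue command name for a Spark ETL job.
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",      # assumed version
        "WorkerType": "G.1X",      # capacity is requested as workers,
        "NumberOfWorkers": workers,  # not as instances you manage
    }


def emr_cluster_request(name, script_s3_uri, core_nodes=2):
    """Keyword arguments for boto3 emr.run_job_flow (managed Spark cluster)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release
        "Applications": [{"Name": "Spark"}],
        # With EMR you choose the cluster shape explicitly.
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": core_nodes},
            ],
            # Shut the cluster down once the step finishes.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "spark-etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         script_s3_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default role names,
        "ServiceRole": "EMR_DefaultRole",      # assumed to exist
    }
```

With credentials configured, these dictionaries could be passed as `boto3.client("glue").create_job(**glue_job_request(...))` and `boto3.client("emr").run_job_flow(**emr_cluster_request(...))`. The Glue request names only a script and a worker count, while the EMR request spells out instance groups and a `spark-submit` step, which reflects the serverless versus cluster-based distinction described above.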
Common mistakes
Choosing Athena, which is primarily an interactive SQL query service built on Presto/Trino, not a Spark environment designed for ETL jobs.