AWS SAP-C02 · Question 21 · Domain 2.1: Deployment Strategy
A company is designing a serverless data lake architecture. Data is ingested into Amazon S3 via Amazon Kinesis Data Firehose. The data must be transformed (e.g., converting JSON to Parquet, removing PII) before being stored in the final S3 bucket. The transformation logic is complex and requires custom Python libraries. Which solution provides the MOST scalable and lowest maintenance approach?
Answer options:
Configure Kinesis Data Firehose to invoke an AWS Lambda function to perform the data transformation before delivering the data to S3.
Write the data directly to S3. Use S3 Event Notifications to trigger an AWS Glue ETL job to transform the data and write it to a new bucket.
Deploy an Amazon EMR cluster. Configure Firehose to deliver data to EMR, run Apache Spark transformations, and write to S3.
Use Amazon Kinesis Data Analytics to run SQL queries to transform the data before it reaches Firehose.
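For reference, the Lambda-based option relies on Firehose's data-transformation contract. Firehose passes a batch of base64-encoded records to the function, and the function must return every `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed` plus base64-encoded `data`. The sketch below shows only the PII-removal step under stated assumptions. The handler name `lambda_handler`, the helper `strip_pii`, and the `PII_FIELDS` set are hypothetical illustrations, and the JSON-to-Parquet conversion is not attempted here.

```python
import base64
import json

# Hypothetical PII field names (an assumption for illustration only).
PII_FIELDS = {"email", "ssn", "phone"}


def strip_pii(record: dict) -> dict:
    """Return a copy of the record without the assumed PII keys."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}


def lambda_handler(event, context):
    """Sketch of a Firehose transformation handler: one output entry per input recordId."""
    output = []
    for rec in event["records"]:
        try:
            # Firehose delivers each record's payload base64-encoded.
            payload = json.loads(base64.b64decode(rec["data"]))
            cleaned = strip_pii(payload)
            # Newline-delimit the JSON so records written to S3 stay line-separated.
            encoded = base64.b64encode(
                (json.dumps(cleaned) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append({"recordId": rec["recordId"], "result": "Ok", "data": encoded})
        except (ValueError, AttributeError):
            # Undecodable payloads, invalid JSON, or non-object JSON are marked as
            # failed; the original data is returned unchanged.
            output.append(
                {"recordId": rec["recordId"], "result": "ProcessingFailed", "data": rec["data"]}
            )
    return {"records": output}
```

Each output entry reuses the input `recordId` because Firehose matches transformed records back to the originals by that ID. A record that is missing from the response is treated as a failure.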