Domain 3.5: Data Ingestion
17 questions across 7 exams
All questions (17)
A company wants to collect application logs from hundreds of EC2 instances and load them into Amazon Redshift for analytics. The company wants a fully managed service that can batch, compress, and encrypt the data before loading it, without writing any custom code. Which service should be used?
A data engineering team needs to perform complex Extract, Transform, Load (ETL) operations on large datasets stored in Amazon S3. They want to use Apache Spark for the transformations. Which TWO AWS services can provide a managed Apache Spark environment for this task? (Select TWO.)
A company needs to ingest streaming log data from thousands of EC2 instances, transform the data format from JSON to Parquet, and store it in Amazon S3 for analytics. Which service provides a fully managed solution for this?
A research vessel at sea has limited internet connectivity. The crew needs to collect terabytes of sensor data, perform local machine learning inference on the data, and transfer the data to AWS when the vessel returns to port. Which AWS service should they use?
A company uses Amazon Kinesis Data Streams to ingest telemetry data from IoT devices. The data volume has doubled, and the consumer applications are falling behind, resulting in a high `GetRecords.IteratorAgeMilliseconds` metric. How should a solutions architect resolve this?
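A rising `GetRecords.IteratorAgeMilliseconds` value usually means the stream has too few shards (or too little consumer parallelism) for the new volume, since shard count caps both write and read throughput. As a rough sanity check, the published per-shard limits for Kinesis Data Streams can be used to size the stream; the traffic figures in the example are hypothetical:

```python
# Back-of-envelope shard sizing for Kinesis Data Streams.
# Per-shard limits: 1 MiB/s or 1,000 records/s on ingest;
# 2 MiB/s egress shared across standard consumers of a shard.
import math

def shards_needed(ingest_mib_per_s: float, records_per_s: float) -> int:
    """Minimum shards to absorb the given write load."""
    by_bytes = math.ceil(ingest_mib_per_s / 1.0)
    by_records = math.ceil(records_per_s / 1000.0)
    return max(by_bytes, by_records, 1)

# Hypothetical: telemetry volume doubled from 4 MiB/s to 8 MiB/s.
print(shards_needed(8, 6000))  # 8 shards now required on the write side
```

Resharding (or using enhanced fan-out for read contention) follows directly from which limit the check shows is being exceeded.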
A data engineering team needs to run daily ETL (Extract, Transform, Load) jobs to process logs stored in Amazon S3 and load the transformed data into Amazon Redshift. They want a fully managed, serverless Apache Spark environment. Which service should they use?
A research facility has 50 TB of genomic data stored on local NAS devices. They need to transfer this data to Amazon S3. The facility has a slow 50 Mbps internet connection. Which TWO services are MOST appropriate for this transfer? (Select TWO.)
An IoT application receives thousands of sensor readings per second. The data needs to be ingested in real-time, buffered, and then loaded into an Amazon Redshift data warehouse for analysis. Which combination of AWS services should be used to build this ingestion pipeline with the LEAST operational overhead?
A company needs to transfer 50 TB of data from their on-premises data center to Amazon S3. Their internet connection is 100 Mbps and is heavily utilized by other applications. They need to complete the transfer within two weeks. Which solution is the MOST reliable and cost-effective?
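Transfer questions like this one hinge on simple arithmetic: whether the link can move the data within the deadline. A minimal estimate, ignoring protocol overhead and competing traffic (both of which only lengthen the transfer):

```python
# Naive wire-time estimate for a bulk transfer at full line rate.
def transfer_days(data_tb: float, link_mbps: float) -> float:
    bits = data_tb * 1e12 * 8           # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6)  # megabits/s -> bits/s
    return seconds / 86_400             # seconds -> days

print(round(transfer_days(50, 100), 1))  # ~46.3 days at full line rate
```

At roughly 46 days on an uncontended link, the two-week deadline cannot be met over the network, which is what points the question toward a physical transfer device.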
A company is migrating a MySQL database to AWS. The database is 5 TB in size and must remain online and accessible to the legacy on-premises application during the migration. Which AWS service should be used to perform this migration with near-zero downtime?
A company is streaming IoT sensor data into Amazon Kinesis Data Streams. They need to calculate rolling averages of the sensor readings in real-time and trigger alerts if thresholds are exceeded. Which AWS service is BEST suited for this real-time processing?
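Whichever managed service performs the computation, the underlying operation is a windowed aggregate over the stream. A service-agnostic sketch of a fixed-size rolling average with a threshold alert (the window size and threshold below are made up for illustration):

```python
from collections import deque

class RollingAverage:
    """Fixed-size rolling average with a simple threshold check."""
    def __init__(self, window: int, threshold: float):
        self.values = deque(maxlen=window)  # old readings fall off automatically
        self.threshold = threshold

    def add(self, reading: float) -> tuple[float, bool]:
        """Ingest one reading; return (current average, alert triggered?)."""
        self.values.append(reading)
        avg = sum(self.values) / len(self.values)
        return avg, avg > self.threshold

# Hypothetical sensor feed: three readings, 3-sample window, alert above 50.
ra = RollingAverage(window=3, threshold=50.0)
for reading in (40, 45, 70):
    avg, alert = ra.add(reading)
print(round(avg, 2), alert)  # 51.67 True
```

In a real deployment this logic would run inside the managed stream-processing service, with the alert path wired to a notification target rather than a return value.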
A research facility needs to transfer 50 TB of data to AWS. Their internet connection is 100 Mbps and is heavily utilized by daily operations. What is the MOST efficient way to transfer this data to Amazon S3?
A company needs to extract data from various databases, transform it, and load it into a data warehouse. They want a fully managed, serverless ETL (Extract, Transform, Load) service that can automatically discover the data schema. Which service should they use?
A company wants to collect application logs from hundreds of EC2 instances and deliver them to an Amazon S3 bucket for long-term storage. The solution must be fully managed, automatically scale to handle varying data volumes, and require zero custom code. Which AWS service should be used?
A data engineering team needs to extract data from an Amazon RDS database, transform the data by joining it with reference files in Amazon S3, and load the transformed data into Amazon Redshift. They want a fully managed, serverless ETL (Extract, Transform, Load) service. Which AWS service should they use?
A company receives daily CSV files in an Amazon S3 bucket. They need to automatically transform these files into Apache Parquet format and catalog the metadata so the data can be queried using Amazon Athena. Which AWS service should be used for the transformation and cataloging?
A company needs to ingest real-time log data from thousands of EC2 instances. The data must be processed in real-time to detect anomalies, and the raw data must be stored in Amazon S3. Which TWO services should be combined to build this architecture? (Select TWO.)
Practice these questions with detailed guidance
Full answers, grading, and explanations of why each answer is correct.