This question is part of a case study (Case 11).
CASE STUDY: AutoMakers Inc
Company Overview:
AutoMakers Inc is a leading vehicle manufacturer transitioning to connected and autonomous vehicles. They need a platform to ingest, process, and analyze telemetry data from millions of cars.
Current Technical Environment:
- Legacy MQTT brokers on-premises.
- Hadoop cluster for batch processing (nightly runs).
- 100,000 connected cars sending 1 KB of data every minute.
- On-premises data warehouse reaching capacity.
Business Requirements:
- Support 5 million connected cars within 3 years.
- Enable real-time alerting for critical vehicle faults.
- Provide predictive maintenance insights to customers.
- Monetize anonymized traffic data.
Executive Statements:
- CEO: "Data is our new engine. We need real-time insights to improve safety."
- CFO: "The platform must scale cost-effectively. We only want to pay for what we use."
- CTO: "We need a fully managed serverless data pipeline to minimize operational overhead."
Technical Requirements:
- Ingest up to 1 million messages per second with low latency.
- Process data in real-time for anomaly detection.
- Store raw telemetry data indefinitely for machine learning model training.
- Provide a scalable data warehouse for business intelligence analysts.
Constraints:
- Strict data privacy regulations (GDPR) require masking of PII.
- Limited data engineering staff; prefer managed services.
- Must integrate with existing on-premises identity provider (Active Directory).
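The figures in the case study imply a raw data volume that is worth estimating before weighing storage options. The sketch below is a back-of-the-envelope calculation only. It assumes decimal units (1 KB = 1,000 bytes) and that each car keeps sending one 1 KB message per minute at the 5 million car target; the case study states neither, and the function name is illustrative.

```python
# Rough sizing sketch based on the case study figures.
# Assumptions (not stated in the case study): 1 KB = 1,000 bytes, and the
# per-car message rate stays at one 1 KB message per minute as the fleet grows.

KB = 1_000  # bytes, decimal units assumed
MINUTES_PER_DAY = 60 * 24


def daily_volume_bytes(cars, bytes_per_message=1 * KB, messages_per_minute=1):
    """Return the raw telemetry volume in bytes per day for a fleet of `cars`."""
    return cars * bytes_per_message * messages_per_minute * MINUTES_PER_DAY


current = daily_volume_bytes(100_000)    # today's fleet
target = daily_volume_bytes(5_000_000)   # three-year business requirement

print(f"current fleet: {current / 1e9:.0f} GB/day")   # 144 GB/day
print(f"target fleet:  {target / 1e12:.1f} TB/day")   # 7.2 TB/day
```

Under these assumptions the raw data grows from roughly 144 GB per day to about 7.2 TB per day, and because it must be kept indefinitely, the per-gigabyte cost of long-term storage dominates the decision.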
QUESTION:
Which architecture should you recommend for the real-time ingestion and processing pipeline to meet the CTO's requirement for a fully managed serverless solution?
GCP PCA · Question 12 · Storage Systems
QUESTION:
To meet the requirement to store raw telemetry data indefinitely for machine learning model training while adhering to the CFO's cost constraints, which storage solution should you use?
Answer options:
- Store the raw data in Cloud SQL and use read replicas for machine learning extraction.
- Store the raw data in BigQuery and use partition expiration to delete data after 3 years.
- Store the raw data in Cloud Storage using the Standard storage class, and use Object Lifecycle Management to transition older data to Coldline or Archive.
- Store the raw data in Cloud Bigtable to ensure low-latency reads for the machine learning models.
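For readers unfamiliar with Object Lifecycle Management, the Cloud Storage option above can be illustrated with a small sketch that builds a lifecycle configuration in the JSON shape Cloud Storage accepts. The 90-day and 365-day thresholds and the function name are assumptions chosen for illustration; the question does not specify any ages. The configuration deliberately contains no Delete rule, since the requirement is to keep raw telemetry indefinitely.

```python
import json


def build_lifecycle_config(coldline_after_days=90, archive_after_days=365):
    """Build a Cloud Storage lifecycle configuration as a plain dict.

    The day thresholds are hypothetical defaults, not values from the
    case study. Objects are only moved to colder classes, never deleted.
    """
    if not 0 < coldline_after_days < archive_after_days:
        raise ValueError("expected 0 < coldline_after_days < archive_after_days")
    return {
        "rule": [
            {
                # Move Standard objects to Coldline once they reach the first age.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {
                    "age": coldline_after_days,
                    "matchesStorageClass": ["STANDARD"],
                },
            },
            {
                # Move Coldline objects to Archive once they reach the second age.
                "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
                "condition": {
                    "age": archive_after_days,
                    "matchesStorageClass": ["COLDLINE"],
                },
            },
        ]
    }


config = build_lifecycle_config()
print(json.dumps(config, indent=2))
```

Saved to a file such as lifecycle.json, a configuration like this could be applied to a bucket with `gcloud storage buckets update gs://BUCKET_NAME --lifecycle-file=lifecycle.json`, where BUCKET_NAME is a placeholder.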