Hard · 1 mark · Multiple Choice
Subtask 5.3: Cost Optimization · BigQuery · Performance Optimization · Partitioning · Clustering
This question is part of a case study (Case 11).

CASE STUDY: AutoMakers Inc

Company Overview:
AutoMakers Inc is a leading vehicle manufacturer transitioning to connected and autonomous vehicles. They need a platform to ingest, process, and analyze telemetry data from millions of cars.

Current Technical Environment:

  • Legacy MQTT brokers on-premises.
  • Hadoop cluster for batch processing (nightly runs).
  • 100,000 connected cars sending 1 KB of data every minute.
  • On-premises data warehouse reaching capacity.

Business Requirements:

  • Support 5 million connected cars within 3 years.
  • Enable real-time alerting for critical vehicle faults.
  • Provide predictive maintenance insights to customers.
  • Monetize anonymized traffic data.

Executive Statements:

  • CEO: "Data is our new engine. We need real-time insights to improve safety."
  • CFO: "The platform must scale cost-effectively. We only want to pay for what we use."
  • CTO: "We need a fully managed serverless data pipeline to minimize operational overhead."

Technical Requirements:

  • Ingest up to 1 million messages per second with low latency.
  • Process data in real-time for anomaly detection.
  • Store raw telemetry data indefinitely for machine learning model training.
  • Provide a scalable data warehouse for business intelligence analysts.

Constraints:

  • Strict data privacy regulations (GDPR) require masking of PII.
  • Limited data engineering staff; prefer managed services.
  • Must integrate with existing on-premises identity provider (Active Directory).

GCP PCA · Question 14 · Cost Optimization

QUESTION:
To provide a scalable data warehouse for business intelligence analysts, how should you configure BigQuery to optimize query performance and costs for time-series telemetry data?

Answer options:

A. Create a new BigQuery table for every day of telemetry data (sharding).

B. Partition the BigQuery tables by a timestamp column and cluster by vehicle ID.

C. Cluster the BigQuery tables by timestamp and partition by vehicle ID.

D. Use BigQuery BI Engine and load all historical telemetry data into memory.

How to approach this question

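Understand the two BigQuery optimization techniques. Partitioning suits date/time columns (or other low-cardinality values) and reduces bytes scanned by letting a query skip whole partitions. Clustering suits high-cardinality columns (such as IDs) and speeds up filters and aggregations on those columns.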
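To make the effect of partition pruning concrete, here is a rough back-of-the-envelope sketch in Python. The fleet size and message rate come from the case study; treating 1 KB as 1,000 bytes, the one-year retention window, and the assumption that stored size equals raw message size are illustrative simplifications, not figures from the scenario.

```python
# Rough estimate of bytes scanned with and without partition pruning.
# Fleet size and message rate are from the case study; everything else
# (1 KB = 1,000 bytes, stored size == raw size) is an assumption.
CARS = 5_000_000             # target fleet size within 3 years
BYTES_PER_MESSAGE = 1_000    # "1 KB" per message, taken as 1,000 bytes
MESSAGES_PER_DAY = 24 * 60   # one message per car per minute


def daily_bytes(cars: int = CARS) -> int:
    """Raw telemetry volume generated per day, in bytes."""
    return cars * BYTES_PER_MESSAGE * MESSAGES_PER_DAY


def scanned_bytes(days_retained: int, days_queried: int, partitioned: bool) -> int:
    """Upper bound on bytes read by a query that filters on a date range.

    With daily partitions, only the queried days are read; without them,
    the whole retained table is scanned regardless of the filter.
    """
    days = days_queried if partitioned else days_retained
    return days * daily_bytes()


if __name__ == "__main__":
    tb = 1_000_000_000_000
    print(daily_bytes() / tb)                # TB ingested per day
    print(scanned_bytes(365, 1, True) / tb)  # one-day query, partitioned
    print(scanned_bytes(365, 1, False) / tb) # same query, unpartitioned
```

Under these assumptions, a one-day query against a year of data reads 365 times less when the table is partitioned by day. Real numbers will differ because BigQuery bills on the columns actually read and on its own storage encoding.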

Full Answer

B. Partition the BigQuery tables by a timestamp column and cluster by vehicle ID. ✓ Correct
In BigQuery, partitioning a table by a timestamp column lets the engine skip every partition outside a query's time filter, which reduces the bytes scanned and therefore the on-demand query cost. Clustering by a high-cardinality column like vehicle ID sorts the data within each partition, so queries that filter or aggregate by specific vehicles read fewer storage blocks and finish faster.
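As an illustration of option B, the following Python sketch builds the kind of DDL statement and query that such a table might use. The dataset, table, and column names (`telemetry.vehicle_events`, `event_ts`, `vehicle_id`, `speed_kmh`) are hypothetical and do not appear in the case study; the helper functions only produce SQL strings and do not call BigQuery.

```python
# Sketch: generate BigQuery DDL for a table partitioned by day on a
# timestamp column and clustered by vehicle ID. All table and column
# names are hypothetical placeholders.

def telemetry_ddl(table: str = "telemetry.vehicle_events",
                  ts_col: str = "event_ts",
                  id_col: str = "vehicle_id") -> str:
    """Return a CREATE TABLE statement with daily partitions and clustering."""
    return (
        f"CREATE TABLE `{table}` (\n"
        f"  {ts_col} TIMESTAMP NOT NULL,\n"
        f"  {id_col} STRING NOT NULL,\n"
        f"  speed_kmh FLOAT64\n"
        f")\n"
        f"PARTITION BY DATE({ts_col})\n"   # prune by day
        f"CLUSTER BY {id_col}\n"           # sort within each partition
        # Reject queries that omit a filter on the partitioning column.
        f"OPTIONS (require_partition_filter = TRUE)"
    )


def one_day_query(table: str = "telemetry.vehicle_events",
                  ts_col: str = "event_ts",
                  id_col: str = "vehicle_id") -> str:
    """Return a parameterized query that benefits from both optimizations."""
    return (
        f"SELECT {id_col}, AVG(speed_kmh) AS avg_speed\n"
        f"FROM `{table}`\n"
        f"WHERE {ts_col} >= @day_start AND {ts_col} < @day_end\n"  # partition pruning
        f"  AND {id_col} = @vehicle_id\n"                          # cluster pruning
        f"GROUP BY {id_col}"
    )


if __name__ == "__main__":
    print(telemetry_ddl())
    print(one_day_query())
```

Setting `require_partition_filter` is optional; it is included here as one way to stop analysts from accidentally scanning the full table, which fits the CFO's pay-for-what-you-use goal.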

Common mistakes

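Confusing partitioning and clustering (Option C). You partition by time so that whole date ranges can be pruned, and you cluster by ID so that rows for the same vehicle are stored together. The reverse does not work: BigQuery partitions only by a time-unit column, ingestion time, or an integer range, and it caps the number of partitions per table, so millions of vehicle IDs cannot each get their own partition.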
