Domain 4: Analyzing and Optimizing Technical and Business Processes
27 questions across 3 exams
All questions (27)
CASE STUDY: TechStream Gaming. 500 emp, $100M rev. On-prem US/EU, 200 servers, MySQL 5TB. 2M peak users. $100K/mo cost. Req: Cut cost 40%, 5x growth, 3 new regions, daily deploys. CEO: Scale fast. CFO: <$100K/mo, 18mo ROI. CTO: Low cloud skills, 99.95% uptime. Tech: <100ms latency, real-time analytics, 5x spikes, EU data residency, DDoS protection, CI/CD. Constraints: 12mo migration, 4hr downtime, 20 devs (Java/MySQL), 5 ops (no cloud), $2M budget. Design the real-time analytics pipeline for player behavior.
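Real-time player analytics usually comes down to windowed aggregation over an event stream. The sketch below is a minimal, in-memory illustration of one-minute tumbling windows keyed by player. The event shape (`player_id`, `timestamp_s`, `action`) and the helper names are hypothetical. A production pipeline would normally run this logic in a managed stream processor fed by a message queue rather than in a single process.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerEvent:
    # Hypothetical event shape, for illustration only.
    player_id: str
    timestamp_s: int  # event time, in seconds since epoch
    action: str


def tumbling_window_counts(events, window_s=60):
    """Count actions per (window_start, player_id) using fixed, non-overlapping windows."""
    counts = defaultdict(Counter)
    for event in events:
        # Align each event to the start of the window that contains it.
        window_start = event.timestamp_s - (event.timestamp_s % window_s)
        counts[(window_start, event.player_id)][event.action] += 1
    return counts


if __name__ == "__main__":
    sample = [
        PlayerEvent("p1", 0, "jump"),
        PlayerEvent("p1", 30, "shoot"),
        PlayerEvent("p1", 61, "jump"),
        PlayerEvent("p2", 45, "shoot"),
    ]
    for key, counter in sorted(tumbling_window_counts(sample).items()):
        print(key, dict(counter))
```

Tumbling windows are the simplest choice. Sliding or session windows trade extra state for smoother or per-session metrics.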
CASE STUDY: ShopGlobal. Global e-commerce. Monolithic Java on VMware. Oracle RAC (20TB). 10x Black Friday traffic. Req: Microservices, 100% uptime during holidays, personalized recommendations. CEO: Flawless omnichannel. CFO: Predictable spend. CTO: No vendor lock-in, open-source. Tech: Containerize, Global LB, PCI-DSS, async orders, real-time inventory. Constraints: Keep Oracle on-prem for 2 yrs (licensing), low K8s skills, strict security reviews. Which architecture pattern should you use to implement asynchronous order processing to handle 10x Black Friday traffic spikes?
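Asynchronous order processing separates accepting an order from fulfilling it. The front end enqueues the order and returns immediately, and workers drain the queue at their own pace, so a traffic spike grows the backlog instead of overwhelming downstream systems. The sketch below models this with Python's standard-library `queue` and threads. In a cloud design, a managed messaging service would typically play the queue's role. The `process_order` function and the other helper names are hypothetical placeholders.

```python
import queue
import threading


def process_order(order_id, results, lock):
    # Hypothetical placeholder for real fulfillment work (payment, inventory, ...).
    with lock:
        results.append(order_id)


def worker(order_queue, results, lock):
    while True:
        order_id = order_queue.get()
        try:
            if order_id is None:  # sentinel: stop this worker
                return
            process_order(order_id, results, lock)
        finally:
            order_queue.task_done()


def run(num_orders=1000, num_workers=4):
    order_queue = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(order_queue, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    # The "front end" only enqueues; it never waits on fulfillment.
    for order_id in range(num_orders):
        order_queue.put(order_id)
    for _ in threads:
        order_queue.put(None)  # one stop sentinel per worker
    order_queue.join()
    for t in threads:
        t.join()
    return results
```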
CASE STUDY: AutoMakers Inc. 1M connected cars, 100GB/day telemetry. Req: Predictive maintenance, real-time driver dashboard, monetize data. CEO: Data is new engine. CFO: Cut 3rd-party IoT costs. CTO: Highly scalable ingest. Tech: MQTT ingest, stream processing, ML models, 7-yr cold storage, handle intermittent connectivity. Constraints: Anonymize data, low vehicle compute, strict analytics budget. To build and deploy the predictive maintenance ML models with minimal MLOps overhead, which platform should you use?
CASE STUDY: HealthSecure. 50M patient records. Legacy mainframe, on-prem SAN (100TB), .NET portal. Req: Modernize portal, secure hospital sharing, fast audits. CEO: Modern UX. CFO: Automate audits. CISO: Zero breaches. Tech: HIPAA, CMEK, audit logging, API gateway, DR (1h RPO/4h RTO). Constraints: No public DB IPs, Dev/Ops separation, US data only, mainframe stays on-prem via VPN. To satisfy the CFO's requirement to automate and speed up compliance audits, how should you handle Cloud Audit Logs?
To ensure software supply chain security, your CISO requires that only container images verified by the QA team can be deployed to the production GKE cluster. Which combination of services achieves this?
Your SRE team has defined a Service Level Objective (SLO) of 99.9% availability for a service. Over the last 30 days, the service has experienced 45 minutes of downtime. The error budget is depleted. According to SRE best practices, what should the team do?
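The arithmetic behind "the error budget is depleted" is worth making explicit. A 99.9% SLO over 30 days (43,200 minutes) permits 0.1% of that time, about 43.2 minutes, of unavailability, so 45 minutes of downtime overspends the budget. The minimal sketch below assumes availability is measured as time-based uptime; request-based SLIs use ratios of good to valid requests instead. The helper names are illustrative.

```python
# Illustrative helpers; names are hypothetical, not a Google Cloud API.

def error_budget_minutes(slo, window_days=30):
    """Allowed downtime for a time-based availability SLO over a rolling window."""
    total_minutes = window_days * 24 * 60
    return (1 - slo) * total_minutes


def remaining_budget_minutes(slo, downtime_minutes, window_days=30):
    """Negative result means the budget has been overspent."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    remaining = remaining_budget_minutes(0.999, 45)
    print(f"budget={budget:.1f} min, remaining={remaining:.1f} min")
```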
You are planning a massive marketing campaign that will require spinning up 5,000 new Compute Engine cores in a single region within 10 minutes. What must you do to ensure this succeeds?
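Before a launch that depends on a sudden burst of capacity, compare the planned peak against the regional quota and current usage. An autoscaler cannot provision past the quota, and quota increases must be requested ahead of time and may take a while to approve. The sketch below is plain arithmetic. The usage and limit figures are hypothetical; real values come from the project's quota pages.

```python
# Illustrative helper; the name and figures are hypothetical.

def quota_shortfall(required_cores, current_usage, quota_limit):
    """Cores that must be added to the quota before the burst can fit (0 if it already fits)."""
    headroom = quota_limit - current_usage
    return max(0, required_cores - headroom)


if __name__ == "__main__":
    # Hypothetical: 5,000 new cores on top of 800 in use, with a 2,400-core regional limit.
    print(quota_shortfall(5000, current_usage=800, quota_limit=2400))
```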
Your microservices application is experiencing high latency. You need to identify which specific function in the code is consuming the most CPU, and how long requests take as they travel between microservices. Which TWO Google Cloud operations tools should you use? (Select TWO)
To improve system resilience, your SRE team wants to implement Chaos Engineering by intentionally injecting HTTP 500 errors and network delays between microservices running on GKE. Which TWO approaches are best suited for this? (Select TWO)
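Service meshes such as Istio implement fault injection (aborts with a chosen HTTP status, and added delays) at the proxy layer, so no application code changes are needed. To make the mechanics concrete, the sketch below reproduces the idea inside a single Python process. A wrapper adds a fixed delay or returns an HTTP 500 for a configurable fraction of calls instead of calling the real handler. The wrapper, its parameters, and the `(status, body)` handler convention are hypothetical and for illustration only.

```python
import random
import time


def with_fault_injection(handler, abort_fraction=0.0, delay_fraction=0.0,
                         delay_s=0.0, rng=None, sleep=time.sleep):
    """Hypothetical wrapper: make a (status, body) handler sometimes slow down or fail."""
    rng = rng or random.Random()

    def wrapped(request):
        if rng.random() < delay_fraction:
            sleep(delay_s)  # simulate network latency
        if rng.random() < abort_fraction:
            return 500, "injected fault"  # simulate an upstream failure
        return handler(request)

    return wrapped


if __name__ == "__main__":
    def ok_handler(request):
        return 200, f"hello {request}"

    faulty = with_fault_injection(ok_handler, abort_fraction=0.1,
                                  rng=random.Random(42))
    statuses = [faulty(i)[0] for i in range(1000)]
    print("injected 500s:", statuses.count(500))
```

Injecting the `sleep` function and the random generator keeps experiments reproducible and testable.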
**CASE STUDY: TechStream Gaming**

**Company Overview:** TechStream Gaming is a global gaming company with 500 employees and $100M in annual revenue. They develop multiplayer online games.

**Current Technical Environment:**
- On-premises data centers in US and EU
- 200 servers (mix of Windows and Linux)
- MySQL databases (5 TB total)
- Peak concurrent users: 2 million
- Current monthly infrastructure cost: $100K

**Business Requirements:**
- Reduce infrastructure costs by 40%
- Support 5x user growth over 2 years
- Launch in 3 new regions (APAC, SA, Africa)
- Improve deployment speed (current: 1 week -> target: daily)

**Executive Statements:**
- CEO: "We need to scale rapidly to compete with larger gaming companies. Cloud migration is critical to our growth strategy."
- CFO: "Cost reduction is paramount. We cannot exceed $60K/month in cloud costs. ROI must be achieved within 18 months."
- CTO: "Our team has limited cloud experience. We need a solution that doesn't require extensive retraining. Reliability is non-negotiable - 99.95% uptime minimum."

**Technical Requirements:**
- Sub-100ms latency for players globally
- Real-time analytics on player behavior
- Seasonal traffic spikes (5x during holidays)
- DDoS protection
- CI/CD pipeline for daily deployments

**Constraints:**
- Migration must complete in 12 months
- Cannot exceed 4-hour downtime during cutover
- Development team: 20 engineers (Java, MySQL expertise)
- Operations team: 5 engineers (limited cloud experience)

**QUESTION:** How should you address the CTO's concern regarding the operations team's limited cloud experience while ensuring the 12-month migration timeline is met?
**CASE STUDY: TrendWear Apparel**

**Company Overview:** TrendWear Apparel is a global clothing retailer with an e-commerce platform and 500 physical stores.

**Current Technical Environment:**
- On-premises VMware environment
- Legacy IBM Mainframe for core inventory management
- Monolithic e-commerce application running on VMs

**Business Requirements:**
- Modernize the e-commerce platform to handle Black Friday (10x normal traffic)
- Unify online and in-store inventory data in real-time
- Avoid major capital expenditure (CapEx) for data center refreshes

**Executive Statements:**
- CEO: "We need an omnichannel experience. Customers should see accurate store inventory online."
- CFO: "We must shift from CapEx to OpEx. No more buying hardware."
- CTO: "We want to move to microservices, but we cannot retire the mainframe for at least 3 years due to complex legacy dependencies."

**Technical Requirements:**
- Hybrid architecture connecting GCP and on-premises
- Microservices architecture for the new e-commerce platform
- PCI-DSS compliance for all payment processing
- Consistent management plane across on-prem and cloud

**Constraints:**
- Mainframe must remain on-premises
- E-commerce migration must be completed before the next holiday season (8 months)

**QUESTION:** To ensure the security of the new microservices, the CTO wants to guarantee that only container images built by the official CI/CD pipeline and scanned for vulnerabilities can be deployed to GKE. How should you implement this?
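Binary Authorization enforces this kind of rule by requiring attestations for an image digest before GKE admits the image. The sketch below models only the admission decision in plain Python: an image is allowed if every required attestor has attested its exact digest. The attestor names (`built-by-ci`, `vuln-scan-passed`) and the data structures are hypothetical. Real attestations are cryptographically signed and verified by the service, which this toy does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class AdmissionPolicy:
    # Hypothetical attestor names; a real policy references attestor resources.
    required_attestors: set = field(
        default_factory=lambda: {"built-by-ci", "vuln-scan-passed"})

    def allows(self, image_ref, attestations):
        """attestations maps an image digest to the set of attestors that vouched for it."""
        if "@sha256:" not in image_ref:
            # This sketch rejects tag-only references because attestations bind to a digest.
            return False
        digest = image_ref.split("@", 1)[1]
        return self.required_attestors <= attestations.get(digest, set())
```

Keying the decision on the digest rather than a tag matters because a tag can be moved to a different, unverified image after it was attested.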
**CASE STUDY: CareData Health**

**Company Overview:** CareData Health is a large healthcare provider network operating 50 hospitals. They manage petabytes of patient records, medical imaging, and telemetry data.

**Current Technical Environment:**
- Decentralized on-premises data centers at each hospital
- Legacy Electronic Health Record (EHR) systems
- Fragmented data silos preventing holistic patient views

**Business Requirements:**
- Centralize patient data into a single secure data lake
- Enable machine learning for predictive diagnostics
- Securely share anonymized data with external research partners

**Executive Statements:**
- CEO: "We must leverage AI to improve patient outcomes and reduce readmission rates."
- CISO: "Zero tolerance for data breaches. Patient data must be encrypted everywhere, and we must prevent any unauthorized data exfiltration."
- DPO (Data Protection Officer): "We must strictly adhere to HIPAA in the US and GDPR for our European patients. Data residency is mandatory."

**Technical Requirements:**
- End-to-end encryption using keys managed by CareData
- Strict access controls and comprehensive audit logging
- Ingestion of HL7 and FHIR healthcare data formats
- Physical separation of EU and US data

**Constraints:**
- Highly regulated environment
- Legacy systems cannot be modified, only integrated with

**QUESTION:** The DPO mandates physical separation of EU and US data. How should you design the BigQuery architecture to ensure compliance while minimizing operational overhead?
**CASE STUDY: AutoMakers Inc**

**Company Overview:** AutoMakers Inc is a global vehicle manufacturer. They have recently launched a line of connected cars.

**Current Technical Environment:**
- 1 million connected cars currently on the road
- Cars send telemetry data (speed, engine temp, location) every 5 seconds
- Current on-premises MQTT brokers are crashing under the load

**Business Requirements:**
- Enable predictive maintenance to alert drivers before parts fail
- Provide real-time fleet tracking for commercial customers
- Support over-the-air (OTA) software updates

**Executive Statements:**
- CEO: "Data is our new revenue stream. We need to monetize this telemetry data."
- CTO: "We expect to have 10 million connected cars in 3 years. The architecture must scale infinitely without manual intervention."
- CFO: "The cost of ingesting and storing this data must be strictly controlled. We cannot pay for idle capacity."

**Technical Requirements:**
- Ingest up to 100,000 messages per second
- Low-latency processing for real-time alerts
- Time-series data storage for historical analysis
- Handle variable network connectivity (cars driving through tunnels)

**Constraints:**
- Strict budget for data ingestion
- Small data engineering team

**QUESTION:** The operations team needs to be alerted if the number of unacknowledged messages in the Pub/Sub subscription exceeds 500,000, indicating a processing bottleneck. How should you configure this?
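Cloud Monitoring alerting policies compare a metric against a threshold and usually require the condition to hold for a duration before firing, which filters out brief spikes. For a Pub/Sub backlog, the relevant metric is the subscription's undelivered-message count (`pubsub.googleapis.com/subscription/num_undelivered_messages`). The sketch below models that evaluation over a list of samples. The three-sample duration and the helper name are hypothetical choices, and this is not the Monitoring API.

```python
# Illustrative model of a threshold-plus-duration alert condition (hypothetical helper).

def threshold_breached(samples, threshold=500_000, consecutive=3):
    """Return True if samples exceed `threshold` for at least `consecutive` points in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


if __name__ == "__main__":
    backlog = [420_000, 510_000, 530_000, 560_000]  # hypothetical samples
    print(threshold_breached(backlog))
```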
Your organization has adopted Site Reliability Engineering (SRE) practices. The Service Level Objective (SLO) for your e-commerce checkout service is 99.9% availability over a 30-day rolling window. Currently, the service has experienced several outages, and the error budget has been completely exhausted. According to SRE best practices, what action should the team take?
Your company is planning a massive marketing campaign that will launch in 48 hours. The engineering team expects a 20x increase in traffic. The application runs on Compute Engine instances in the `us-central1` region. During a load test, the autoscaler failed to provision enough instances, and the logs showed `QUOTA_EXCEEDED` errors for N2 CPUs. What is the most appropriate action to ensure the campaign's success?
Your team is deploying a major update to a user-facing web application. To minimize risk, you want to route only 5% of the live traffic to the new version. If the error rate remains low, you will gradually increase the traffic to 100%. Which deployment strategy does this describe?
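A canary rollout sends a small, adjustable share of traffic to the new version. One common way to implement the split is to hash a stable request key, such as a user ID, into 100 buckets. Each user then consistently lands on the same version while the canary percentage is raised step by step. Load balancers and service meshes provide weighted routing natively; the sketch below, with hypothetical helper names, only illustrates the idea.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send users whose bucket falls below the canary percentage to the new version."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (5, 25, 50, 100):  # hypothetical promotion schedule
        share = sum(route(u, percent) == "canary" for u in users) / len(users)
        print(f"{percent:>3}% target -> {share:.1%} observed")
```

Because buckets are fixed, raising the percentage only adds users to the canary. Nobody flips back to the old version mid-rollout.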
Your SRE team wants to implement Chaos Engineering to test the resilience of a microservices architecture running on GKE. Which TWO practices should the team adopt? (Select TWO)
You are designing a CI/CD pipeline for a GKE-based application. The development team wants to adopt a GitOps methodology. Which TWO tools/services are central to implementing GitOps on Google Cloud? (Select TWO)
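At the core of GitOps is a reconciliation loop. An agent (on Google Cloud, typically Config Sync; Argo CD and Flux are common open-source alternatives) continuously compares the desired state committed to Git with the live cluster state and applies the difference. The sketch below reduces that to a pure function over dictionaries keyed by resource name. The resource names, specs, and helper name are hypothetical; real controllers work with Kubernetes API objects.

```python
# Illustrative reconcile step; Git holds the desired state (hypothetical helper).

def reconcile(desired: dict, live: dict) -> dict:
    """Compute the actions needed to make `live` match `desired`."""
    plan = {"create": [], "update": [], "delete": []}
    for name, spec in desired.items():
        if name not in live:
            plan["create"].append(name)
        elif live[name] != spec:
            plan["update"].append(name)  # drift: live state diverged from Git
    for name in live:
        if name not in desired:
            plan["delete"].append(name)  # pruning: resource was removed from Git
    return plan


if __name__ == "__main__":
    desired = {"deploy/web": {"replicas": 3}, "svc/web": {"port": 80}}
    live = {"deploy/web": {"replicas": 5}, "cm/old": {}}
    print(reconcile(desired, live))
```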
CASE STUDY: TechStream Gaming
Overview: 500 employees, $100M revenue. On-prem US/EU, 200 servers, 5TB MySQL. 2M peak users, $100K/mo cost.
Business: Reduce cost 40%, 5x growth, launch APAC/SA/Africa, daily deployments.
Executives:
- CEO: "Scale rapidly to compete. Cloud is critical."
- CFO: "Cost reduction paramount. Max $100K/mo. ROI in 18 months."
- CTO: "Team has limited cloud experience. 99.95% uptime non-negotiable."
Tech: <100ms latency globally, real-time analytics, 5x seasonal spikes, EU data residency, DDoS protection, CI/CD.
Constraints: 12-month migration, 4hr max downtime, 20 devs (Java/MySQL), 5 ops (limited cloud), $2M budget.
How should you address the operations team's limited cloud experience while ensuring the 99.95% uptime requirement?
CASE STUDY: RetailMart
Overview: Global e-commerce, 5,000 employees. Legacy monolith on VMware, 20TB Oracle DB on-prem.
Business: Modernize to microservices, 100% uptime during Black Friday (10x traffic), real-time inventory sync, exit data center in 2 years.
Executives:
- CEO: "Innovate faster to beat online-only competitors."
- CFO: "End hardware CAPEX. Move to pure OPEX."
- CTO: "Break monolith safely. Zero downtime during transition."
Tech: Migrate off Oracle to open-source, containerize, secure hybrid connectivity during transition, automated scaling.
Constraints: Zero downtime for storefront, heavy reliance on Oracle stored procedures, all hybrid traffic must be private/encrypted.
To support the CEO's goal to "innovate faster," you need to design a secure CI/CD pipeline for the new microservices. Which architecture should you implement?
CASE STUDY: HealthData Corp
Overview: Healthcare SaaS managing 10PB of sensitive patient records and imaging.
Business: Strict HIPAA/SOC 2 compliance, ransomware protection, secure sharing of anonymized data with researchers, robust DR.
Executives:
- CEO: "Trust is our product. Zero tolerance for breaches."
- CFO: "Storage costs growing exponentially. Need lifecycle management."
- CISO: "Zero-trust architecture, end-to-end encryption."
Tech: RPO 15m, RTO 2h for core DB. All data CMEK encrypted. Strict access controls, audit logging. Prevent data exfiltration.
Constraints: Images retained 7 years but rarely accessed after 90 days. Researchers use external identities. No public IPs on compute.
Which disaster recovery architecture should you design for the core database to meet the RPO of 15 minutes and RTO of 2 hours?
CASE STUDY: AutoIoT
Overview: Connected car manufacturer. 1M vehicles sending telemetry every 5 seconds.
Business: Predictive maintenance alerts, real-time fleet tracking, monetize anonymized data.
Executives:
- CEO: "Leverage AI to predict failures."
- CTO: "Current MQTT brokers crashing. Need fully managed, scalable ingestion."
- DPO: "Vehicle location is sensitive. Strip PII before analytics."
Tech: Ingest millions of msgs/sec, real-time stream processing for anomalies, store raw data for ML, sub-second queries for dashboards.
Constraints: Vehicles lose connection and send late batch data. ML models updated weekly. Strict analytics budget.
How should you monitor the health of the Dataflow streaming pipeline to ensure it is keeping up with the vehicle telemetry?
Your development team is adopting a microservices architecture on GKE. The CTO wants to ensure that if a service fails, it degrades gracefully rather than causing a cascading failure across the entire application. Which architectural pattern should you implement?
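A circuit breaker is one widely used pattern for this. After repeated failures, it stops calling the unhealthy dependency for a cool-down period and serves a fallback instead, then lets a trial request through to test recovery. The minimal, single-threaded sketch below shows the closed/open/half-open state machine. The thresholds and class name are hypothetical. On GKE, the same behavior is often configured in a service mesh (for example, Istio outlier detection) rather than hand-written.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (not thread-safe)."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, func, fallback):
        if self.state == "open":
            return fallback()  # fail fast: do not touch the unhealthy dependency
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Injecting the clock keeps the state transitions testable without real waiting.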
You are defining the Service Level Indicator (SLI) for a user-facing web application. According to Google's Site Reliability Engineering (SRE) practices, which metric is the most appropriate SLI for a synchronous request-response service?
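Google's SRE guidance frames an SLI as the proportion of valid events that were good. In the minimal sketch below, a request counts as good when it did not return a server error and finished under a latency threshold. The 300 ms threshold, the record fields, and the choice to exclude 4xx client errors from the valid set are illustrative assumptions, not fixed rules.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record for illustration.
    status: int
    latency_ms: float


def request_sli(requests, latency_threshold_ms=300.0):
    """Good events / valid events. Returns None when there are no valid events."""
    # Assumption: client errors (4xx) are excluded from the valid set.
    valid = [r for r in requests if not 400 <= r.status < 500]
    if not valid:
        return None
    good = [r for r in valid
            if r.status < 500 and r.latency_ms <= latency_threshold_ms]
    return len(good) / len(valid)
```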
Your team is deploying a new application to Google Cloud. The CFO wants to ensure that the team is alerted immediately if the monthly cloud spend is projected to exceed $10,000. How should you configure this?
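Cloud Billing budgets can alert on forecasted spend as well as on actual spend. To show what a forecast threshold means, the sketch below uses a naive linear projection of month-to-date spend. This is an assumption for illustration only: the billing service computes its own forecast, which will not generally match a straight-line extrapolation. The figures and helper names are hypothetical.

```python
# Illustrative helpers; a straight-line forecast is an assumption, not Google's model.

def projected_month_spend(spend_to_date, days_elapsed, days_in_month):
    """Naive linear projection: assume the average daily spend so far continues."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return spend_to_date / days_elapsed * days_in_month


def should_alert(spend_to_date, days_elapsed, days_in_month, budget=10_000.0):
    return projected_month_spend(spend_to_date, days_elapsed, days_in_month) > budget


if __name__ == "__main__":
    # Hypothetical: $4,000 spent after 10 days of a 30-day month projects to $12,000.
    print(should_alert(4_000, 10, 30))
```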
Your development team is building a new application. They want to implement a CI/CD pipeline that automatically deploys code to a staging environment when a pull request is merged, and requires manual approval before deploying to production. Which TWO Google Cloud services should you use to build this workflow? (Select TWO)
You are conducting a blameless post-mortem after a major outage caused by a misconfigured firewall rule. According to Google's SRE practices, what are TWO primary goals of this post-mortem? (Select TWO)