Hard · 1 mark · Multiple Choice
Subtask 3.1: Design for security · SRE · Error Budgets · Reliability · Operations

GCP PCA · Question 47 · Design for security

Your SRE team has defined an SLO of 99.9% availability for a critical service. Over the past month, the service has experienced multiple outages, and the error budget has been completely exhausted. According to Google SRE best practices, which THREE actions should the team take? (Select THREE)

Answer options:

A. Halt all new feature deployments.

B. Redirect engineering effort from feature development to reliability improvements.

C. Conduct blameless post-mortems for the outages.

D. Lower the SLO to 99.0% so the error budget is no longer exhausted.

E. Fire the engineers responsible for the code that caused the outages.

F. Ignore the error budget and continue deploying features to meet business deadlines.

How to approach this question

Recall the consequences of exhausting an error budget in Google SRE methodology.

Full Answer

Correct answers: A, B, and C (halt all new feature deployments; redirect engineering effort from feature development to reliability improvements; conduct blameless post-mortems for the outages).

In Google SRE practice, an error budget is the amount of unreliability an SLO permits (for example, 0.1% for a 99.9% SLO). Exhausting it means the service has already been less reliable than the SLO allows. The error budget policy, agreed on in advance, calls for halting feature releases and redirecting engineering effort toward reliability work, such as bug fixes and paying down technical debt, until the service is back within its SLO. Blameless post-mortems should also be conducted to understand why the outages occurred and to prevent recurrence. Firing engineers (E) contradicts the blameless culture, and ignoring the budget (F) defeats the purpose of having one.
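To make the arithmetic concrete, here is a minimal Python sketch of how a 99.9% SLO translates into allowed downtime and how an exhausted budget maps to the release decision described above. The 30-day window, the function names, and the downtime figures are assumptions chosen for illustration; they are not part of the exam question or of any Google tooling.

```python
# Illustrative sketch only: the window length, names, and sample
# downtime values are assumptions, not part of the original question.

def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Error budget left after subtracting observed downtime."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


def release_decision(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> str:
    """Apply a simple error budget policy: freeze features once the budget is spent."""
    if budget_remaining(slo, downtime_minutes, window_days) <= 0:
        return "freeze feature releases; prioritize reliability work"
    return "feature releases may continue"


if __name__ == "__main__":
    slo = 0.999
    print(f"Budget for {slo:.1%} over 30 days: "
          f"{error_budget_minutes(slo):.1f} minutes")
    # Hypothetical month with 60 minutes of outages: budget exhausted.
    print(release_decision(slo, downtime_minutes=60))
    # Hypothetical month with 10 minutes of outages: budget remains.
    print(release_decision(slo, downtime_minutes=10))
```

Under these assumptions a 99.9% SLO allows roughly 43 minutes of downtime in 30 days, so a month with 60 minutes of outages has spent the whole budget and triggers the freeze.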

Common mistakes

Lowering the SLO to 'fix' the metric. This makes the numbers look better but ignores the pain users actually experienced during the outages.
