Medium · 1 mark · Multiple Choice

GCP PCA · Question 29 · Ensure solution and operations reliability

Your Site Reliability Engineering (SRE) team has defined a Service Level Objective (SLO) of 99.9% availability for a critical API. Over the last 30 days, the API has experienced several outages, and the error budget has been completely exhausted. According to SRE best practices, what action should the team take?

Answer options:

A. Lower the SLO to 99.0% so the team is no longer in violation.

B. Halt all new feature deployments and focus exclusively on reliability improvements until the error budget recovers.

C. Fire the engineers responsible for the outages.

D. Ignore the error budget and continue deploying features to meet business deadlines.

How to approach this question

Apply Google's SRE principles regarding the consequences of exhausting an error budget.

Full Answer

B. Halt all new feature deployments and focus exclusively on reliability improvements until the error budget recovers. ✓ Correct
An error budget is 100% minus the SLO; here, 100% - 99.9% = 0.1%. It is the amount of unreliability the service is allowed within the measurement window. Once the budget is spent, the error budget policy agreed between the development and operations teams calls for freezing feature releases and redirecting engineering effort to the underlying reliability issues until the budget recovers.
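To make the arithmetic concrete, here is a minimal Python sketch of the budget calculation and the resulting release decision. It assumes a time-based availability SLO, where the budget is counted in minutes of downtime over the 30-day window. The function names and the sample downtime figures are hypothetical, chosen only for illustration; a request-based SLO would count failed requests instead.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window.

    Assumes time-based availability: budget = (1 - SLO) * window length.
    """
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Minutes of budget left; zero or negative means the budget is exhausted."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


def feature_releases_allowed(slo: float, downtime_minutes: float, window_days: int = 30) -> bool:
    """Hypothetical policy check: freeze feature releases once the budget is spent."""
    return budget_remaining(slo, downtime_minutes, window_days) > 0


if __name__ == "__main__":
    slo = 0.999  # 99.9% availability, as in the question
    print(round(error_budget_minutes(slo), 1))  # 43.2 minutes per 30 days
    print(feature_releases_allowed(slo, downtime_minutes=10))  # True: budget left
    print(feature_releases_allowed(slo, downtime_minutes=50))  # False: budget spent
```

At 99.9% over 30 days the budget is roughly 43 minutes of downtime, so a handful of outages is enough to spend it. Lowering the SLO to 99.0% (option A) would raise the budget to about 432 minutes, which removes the violation on paper without making the API any more reliable.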

Common mistakes

Choosing to ignore the budget (D) due to business pressure, which violates the core SRE framework tested on the exam.
