Medium · 1 mark · Multiple Choice
Subtask 5.1: Advise Teams · SRE · Postmortems · SLO · Operations

GCP PCA · Question 45 · Advise Teams

You are advising a traditional IT operations team that is transitioning to a Site Reliability Engineering (SRE) model. They are currently overwhelmed by manual alerts and spend most of their time firefighting. Which TWO SRE practices should you recommend they implement first to reduce toil and improve reliability? (Select TWO)

Answer options:

A. Implement blameless postmortems for every significant incident.

B. Fire the operations team and force the developers to carry the pager.

C. Create an alert for every single CPU spike over 80%.

D. Define Service Level Objectives (SLOs) and only page engineers when the error budget is threatened.

E. Implement a strict ITIL change management process with a 2-week approval window.

How to approach this question

Look for core SRE principles: learning from failure without blame, and alerting based on user experience rather than system metrics.

Full Answer

A, D
To transition to SRE and reduce toil, teams must change how they handle incidents and alerts. Blameless postmortems focus on fixing the system rather than punishing individuals, ensuring that action items are created to prevent recurrence. Defining SLOs and alerting based on error budget burn rates ensures that engineers are only woken up when the user experience is genuinely impacted, eliminating the noise of meaningless CPU or memory alerts.
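
As a rough illustration of alerting on error budget burn rate, the sketch below computes how fast a service is consuming its budget and pages only when both a long and a short window exceed a threshold. The function names, the 99.9% SLO, the request counts, and the 14.4 threshold (roughly 2% of a 30-day budget consumed in one hour) are assumptions chosen for the example, not values taken from the question.

```python
from typing import Tuple


def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the allowed error rate.

    A value of 1.0 means the budget is being spent exactly as fast as the
    SLO permits; higher values mean it will run out early.
    """
    if total == 0:
        return 0.0  # no traffic, so no budget is being consumed
    return (failed / total) / error_budget(slo_target)


def should_page(
    long_window: Tuple[int, int],
    short_window: Tuple[int, int],
    slo_target: float,
    threshold: float = 14.4,  # assumed threshold for this sketch
) -> bool:
    """Page only if both windows burn faster than the threshold.

    Each window is a (failed, total) request count. Requiring the short
    window as well keeps the alert from firing after the problem has
    already stopped.
    """
    long_burn = burn_rate(long_window[0], long_window[1], slo_target)
    short_burn = burn_rate(short_window[0], short_window[1], slo_target)
    return long_burn > threshold and short_burn > threshold


if __name__ == "__main__":
    # Hypothetical counts: 2% errors over the last hour, 5% over the last 5 minutes.
    print(should_page((20, 1000), (5, 100), slo_target=0.999))
    # Hypothetical counts: 0.1% errors over the last hour, none in the last 5 minutes.
    print(should_page((1, 1000), (0, 100), slo_target=0.999))
```

Note that nothing in this check looks at CPU or memory: a host can run hot without paging anyone as long as requests keep succeeding within the SLO.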

Common mistakes

Choosing Option C. Alerting on CPU is a classic traditional IT mistake that leads to alert fatigue. High CPU is fine if the application is still serving requests quickly.

