Workbook

Make the Mission Yours

Role: DevOps Engineer

Use these activities to apply each principle to your current product, service, or project. These activities are a sample to get you started, not an exhaustive list. Adapt and expand them based on your team's context and needs. Capture your answers, share them with your team, and revisit them as you learn.

⚠️

Important: When Using AI Tools

When using AI-assisted activities, always double-check the output for accuracy and meaning. AI tools can accelerate your work, but human judgment, validation, and critical thinking remain essential.

Review AI-generated content with your team, validate it against real user feedback and domain knowledge, and ensure it truly serves your mission and user outcomes before proceeding.

1) Shared Mission and Vision

Align reliability work to user-facing mission outcomes.

💡

Learn More

For a deeper understanding of this principle, see the 1) Shared Mission and Vision section in the framework.

Workbook Activities (do now)

  • ☐ List two mission-critical user journeys and map the infra components each depends on.
  • ☐ Add the mission/outcome to one runbook header (e.g., “keeps checkout under 2s for users”).
  • ☐ Review the next release and note the infra implication for the user outcome it targets.
  • ☐ Tag today’s top ticket with the user journey it protects and the SLO tied to that journey.
  • ☐ Share in standup one infra choice you will make differently because of the mission outcome.
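
The first activities above can be sketched as data: a journey-to-component map that also generates a runbook header tying infra work to a user outcome. All journey, component, and SLO names below are illustrative placeholders, not real services.

```python
# Sketch: map mission-critical user journeys to the infra components and
# SLOs they depend on. Names and targets are illustrative assumptions.
journeys = {
    "checkout": {
        "components": ["api-gateway", "payments-svc", "orders-db"],
        "slo": "p95 latency < 2s",
    },
    "search": {
        "components": ["search-svc", "index-cache"],
        "slo": "99.9% availability",
    },
}

def runbook_header(journey: str) -> str:
    """Build a one-line runbook header tying infra work to a user outcome."""
    j = journeys[journey]
    return f"Protects '{journey}' ({j['slo']}) via: {', '.join(j['components'])}"

print(runbook_header("checkout"))
```

Keeping the map in version control next to runbooks makes the "which journey does this protect?" question answerable in one lookup.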

AI Assisted Activities

  • ☐ Use AI to help draft mission statements or outcome mappings for your infrastructure work, but have your team review and refine them to ensure they reflect real user needs.
  • ☐ Ask AI to generate potential user outcomes for your infrastructure changes, then validate each one against direct user feedback and system performance data.
  • ☐ Use AI to help structure your "why this matters" notes in infrastructure tickets, but ensure human team members validate that each change truly serves the mission before deploying.
  • ☐ Have AI analyze past infrastructure changes to identify mission alignment patterns, then use those insights in team discussions to improve how infrastructure connects to user outcomes.

Evidence of Progress

  • ☐ Runbooks and tickets state which user journey they protect.
  • ☐ You can explain infra tasks in terms of user outcomes (latency, uptime for journey X).

2) Break Down Silos

Co-create delivery with product/engineering/QA.

💡

Learn More

For a deeper understanding of this principle, see the 2) Break Down Silos section in the framework.

Workbook Activities (do now)

  • ☐ Co-design rollout/rollback with engineering and QA for the next release and document it.
  • ☐ Host a 15-minute “deployment huddle” to align on blast radius, metrics, and comms.
  • ☐ Pair with a developer to add health checks and alerts before release.
  • ☐ Invite PM/QA to review the change freeze/allow list for this deploy window.
  • ☐ Replace an email thread with a live review of the deployment plan and risk matrix.
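
One way to make the huddle's agreements concrete is to encode the rollout gate as data, so the rollback triggers everyone agreed on are explicit and reviewable. This is a minimal sketch; the metric names and thresholds are illustrative assumptions, not recommended values.

```python
# Sketch: a pre-agreed rollout gate from the deployment huddle, encoded
# as data. Metric names and limits are illustrative assumptions.
GATE = {
    "error_rate_pct": 1.0,    # roll back if observed value exceeds this
    "p95_latency_ms": 2000,   # roll back if observed value exceeds this
}

def rollout_decision(observed: dict) -> str:
    """Return 'proceed' if every observed metric is within the gate.

    Missing metrics count as failing, so an unreported metric blocks rollout.
    """
    for metric, limit in GATE.items():
        if observed.get(metric, float("inf")) > limit:
            return f"rollback: {metric} exceeded {limit}"
    return "proceed"

print(rollout_decision({"error_rate_pct": 0.4, "p95_latency_ms": 1800}))
print(rollout_decision({"error_rate_pct": 2.5, "p95_latency_ms": 1800}))
```

Because the gate is plain data, PM and QA can review and amend it in the same pull request as the deployment plan.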

AI Assisted Activities

  • ☐ When AI generates deployment plans or infrastructure code, have cross-functional team members (developers, QA, product managers) review them together to ensure they serve users and maintain reliability.
  • ☐ Use AI to help draft deployment huddle agendas or runbooks, but ensure all roles contribute their perspectives during the actual planning session.
  • ☐ Have AI analyze deployment patterns and incident reports to identify handoff friction, then use those insights in cross-functional discussions to improve collaboration.
  • ☐ Use AI to help structure deployment collaboration sessions, but ensure human team members make decisions together about what to deploy and how it serves users.

Evidence of Progress

  • ☐ Rollouts include pre-agreed metrics and rollback steps.
  • ☐ Fewer surprise escalations during deployment windows.

3) User Engagement

See how reliability affects real users.

💡

Learn More

For a deeper understanding of this principle, see the 3) User Engagement section in the framework.

Workbook Activities (do now)

  • ☐ Listen to a support call about performance/outage and capture the user pain in their words.
  • ☐ Add or refine one SLI/SLO tied to a specific user journey (e.g., p95 page load for task X).
  • ☐ Shadow or replay a user session to see how latency/errors show up in the UI.
  • ☐ Trace one recent incident to the user-facing symptom and note how to detect it sooner.
  • ☐ Share a user quote about reliability in your next ops review to anchor priority.
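
The SLI/SLO activity above fits in a few lines: a nearest-rank p95 over latency samples, checked against a journey SLO. The 2-second target and the sample values here are illustrative assumptions, not real measurements.

```python
# Sketch: compute a p95 latency SLI for one user journey and compare it
# to an SLO. Target and sample values are illustrative assumptions.
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples (seconds)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered)) - 1  # 1-based rank -> 0-based index
    return ordered[rank]

latencies = [0.8, 1.1, 0.9, 1.4, 2.6, 1.0, 1.2, 0.7, 1.3, 1.1]
slo_seconds = 2.0
sli = p95(latencies)
print(f"p95={sli:.2f}s, SLO {'met' if sli <= slo_seconds else 'missed'}")
```

Note how a single slow sample dominates the p95 here: that is exactly the tail latency users feel, which an average would hide.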

AI Assisted Activities

  • ☐ Use AI to analyze user feedback, support tickets, or performance logs to identify reliability patterns, but always validate AI insights through direct user observation or support call reviews.
  • ☐ Have AI generate questions for user interviews based on your infrastructure assumptions, then use those questions in real conversations with users to build genuine empathy.
  • ☐ Use AI to help summarize user research findings related to reliability, but ensure you review the summaries and add your own observations from direct user interactions.
  • ☐ Have AI analyze user behavior patterns from your monitoring, then discuss those patterns with actual users to understand the "why" behind reliability issues before making infrastructure changes.

Evidence of Progress

  • ☐ You can quote a user impact when prioritizing infra work.
  • ☐ You track at least one SLO tied to a concrete user journey.

4) Outcomes Over Outputs

Track reliability outcomes, not just deployments.

💡

Learn More

For a deeper understanding of this principle, see the 4) Outcomes Over Outputs section in the framework.

Workbook Activities (do now)

  • ☐ For this change, define the expected outcome (e.g., error rate down, latency down) and observation window.
  • ☐ After rollout, post a brief readout with before/after metrics and next action.
  • ☐ Add a “runway” checklist to the change: outcome metric, alert threshold, rollback trigger.
  • ☐ If the outcome was missed, propose a config/tuning adjustment and schedule it.
  • ☐ Flag one noise-prone alert and tune it to better reflect user impact.
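
The define-observe-read-out loop above can be sketched as a small helper that turns before/after metrics into a one-line readout. The metric name, numbers, and target are illustrative assumptions.

```python
# Sketch: a minimal before/after outcome readout for a change.
# Metric name, values, and target are illustrative assumptions.
def readout(metric: str, before: float, after: float, target_drop_pct: float) -> str:
    """Compare before/after values and report whether the outcome was met."""
    drop_pct = (before - after) / before * 100
    met = drop_pct >= target_drop_pct
    return (f"{metric}: {before} -> {after} "
            f"({drop_pct:.1f}% drop, target {target_drop_pct}%): "
            f"{'outcome met' if met else 'tune or roll back'}")

print(readout("error_rate_per_min", before=12.0, after=4.5, target_drop_pct=50))
```

Posting this one line in the release channel keeps the conversation on the outcome (did the error rate drop?) instead of the output (did the deploy finish?).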

AI Assisted Activities

  • ☐ When AI generates infrastructure configurations or deployment scripts, define reliability outcome metrics upfront and measure whether AI-generated changes achieve intended user outcomes, not just technical completion.
  • ☐ Use AI to help analyze outcome data from your monitoring and identify patterns, but have human team members interpret what those patterns mean for users and the mission.
  • ☐ Have AI help draft outcome definitions and success criteria for your infrastructure changes, but ensure the team validates them against real user needs and business goals before deploying.
  • ☐ Use AI to track and report on reliability outcome metrics, but schedule human team reviews to discuss what the metrics mean and how to adjust infrastructure based on observed impact.

Evidence of Progress

  • ☐ Infra changes include outcome hypotheses and post-release readouts.
  • ☐ You rolled back or tuned based on outcome metrics, not just logs.

5) Domain Knowledge

Understand service dependencies and business constraints.

💡

Learn More

For a deeper understanding of this principle, see the 5) Domain Knowledge section in the framework.

Workbook Activities (do now)

  • ☐ Create a dependency map for a key journey showing services, queues, and external calls with owners.
  • ☐ Mark which dependencies are front stage vs. back stage; highlight weakest links and owners.
  • ☐ Review one compliance/policy constraint (data residency, retention) that affects this change and document it.
  • ☐ Inspect a past reliability incident in this domain; list a guardrail or monitor to add now.
  • ☐ Validate one third-party dependency assumption and record the result in the runbook.
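
The dependency-map activity above can start as a small data structure: each dependency with its owner, stage, and a rough availability estimate, from which the weakest link falls out directly. Service names, owners, and availability numbers here are illustrative assumptions.

```python
# Sketch: a dependency map for one user journey. The weakest link is the
# dependency with the lowest availability; for dependencies in series,
# journey availability is the product of the parts. Values are illustrative.
deps = {
    "api-gateway":   {"owner": "platform", "stage": "front", "availability": 0.9995},
    "payments-svc":  {"owner": "payments", "stage": "back",  "availability": 0.999},
    "tax-api (3rd)": {"owner": "external", "stage": "back",  "availability": 0.995},
}

weakest = min(deps, key=lambda d: deps[d]["availability"])

journey_availability = 1.0
for d in deps.values():
    journey_availability *= d["availability"]

print(f"weakest link: {weakest} (owner: {deps[weakest]['owner']})")
print(f"journey availability (serial): {journey_availability:.4f}")
```

The serial-product rule makes the point the workbook is after: a journey can never be more available than its least reliable dependency, which is usually the third party.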

AI Assisted Activities

  • ☐ Use AI to help summarize domain documentation, service dependencies, or compliance requirements, but validate AI-generated domain knowledge through direct engagement with domain experts and system reviews.
  • ☐ Have AI generate questions about domain constraints or ecosystem relationships for your infrastructure, then use those questions in conversations with domain experts to build deep understanding.
  • ☐ Use AI to help draft dependency maps or service diagrams, but ensure team members review them with domain experts to verify accuracy and completeness.
  • ☐ Have AI analyze past incidents or domain-related infrastructure issues, then discuss those insights with the team and domain experts to identify patterns and prevent similar problems.

Evidence of Progress

  • ☐ You can point to the most fragile dependency for a user journey and the mitigation in place.
  • ☐ Runbooks reference domain/policy constraints explicitly.

6) The Art of Storytelling

Translate infra work into user value.

💡

Learn More

For a deeper understanding of this principle, see the 6) The Art of Storytelling section in the framework.

Workbook Activities (do now)

  • ☐ Explain one reliability improvement as a user story: who is helped, what changed, and by how much.
  • ☐ Create two summaries of an incident: technical (root cause) and user-facing (impact, prevention).
  • ☐ Share a before/after narrative of a perf fix with charts and the user task it improved.
  • ☐ Add a user quote or support snippet about reliability to your next ops update.
  • ☐ Record a 60-second video explaining how this change protects a specific user journey.

AI Assisted Activities

  • ☐ Use AI to help structure or draft infrastructure stories and incident summaries, but refine them with real user anecdotes, emotions, and personal observations from direct user interactions.
  • ☐ Have AI generate different versions of infrastructure updates for different audiences (technical peers vs. stakeholders), but ensure each version includes authentic human stories about real user impact.
  • ☐ Use AI to help summarize infrastructure work in demos, but lead presentations with human stories about real users affected by reliability issues, using AI-generated summaries as supporting material.
  • ☐ Have AI help draft runbooks or infrastructure documentation, but always include real user quotes, data points, or anecdotes that connect your infrastructure work to human impact.

Evidence of Progress

  • ☐ Stakeholders can retell your infra updates in business terms.
  • ☐ Teams cite your incident summaries to justify preventative work.