Blameless on Microsoft Teams gives remote DevOps teams and Site Reliability Engineers a collaborative tool for effective incident resolution.
How two roads lead to Blameless
Deirdre Mahon and Nicolas Philip have worn many hats in their years working with startups. On the marketing and sales side, Deirdre has spent her career selling startup solutions to engineering teams, while Nicolas focused on helping organizations upgrade their data center solutions as they moved applications to the cloud.
Although they played different roles in their startup adventures, Deirdre and Nicolas both ran into the same pain point that many tech-forward companies face: managing incident resolution. Now serving as VP of Marketing and Director of Product Development at Blameless, respectively, they have found their paths converging in an exciting new direction: helping organizations ease their own incident resolution challenges.
The Blameless goal: Eliminating the toils of incident resolution
As consumer demand for fast, high-performing digital services continues to rise, the tools businesses use to deliver those services have advanced to keep up. However, this constant cycle of rising expectations and technological velocity comes with a harsh but inevitable truth: things can and do go wrong.
When something goes wrong on a digital platform, it's called an incident. These disruptions range from minor annoyances, such as a sluggish site, to incidents that demand immediate action, such as a service outage across an entire region. As you can imagine, the longer an incident persists, the more it costs.
Site Reliability Engineers (SREs) are tasked with making sure platforms experience as few incidents as possible. Even the most reliable site, however, will have incidents from time to time. When an urgent disruption arises, speed is of the essence for the DevOps teams who must step in and fix it.
Unfortunately, the process of incident resolution is rife with what the Blameless team calls "toils." Responders often hit friction when sharing critical data and reports with each other, or get distracted just when they need to be laser-focused on the task at hand. Some remote teams also struggle to know when and how to engage the right collaborators. Even when the right people are assembled, it can be a challenge to catch them up quickly on what the next steps must be.
Simply put, the process of solving problems is full of problems.
That’s where Blameless comes in.
When crucial minutes are ticking away and customers are increasingly frantic, it can be hard to know who needs to be involved in the conversation at any given moment. To eliminate the toils that were dragging down the resolution process, Deirdre, Nicolas, and the rest of the team at Blameless considered the end-to-end needs of DevOps teams tasked with remotely fixing urgent issues. They envisioned a service that would:
- Proactively and automatically alert the right people to an incident
- Facilitate an accessible space for all collaborators, regardless of location
- Make it easy to rope in new collaborators and quickly get them up to speed on what’s happening
- Give users a simple way to track and manage their individual tasks
- Capture the timeline of an incident from acknowledgment to resolution for future learning opportunities
Founded in 2017, Blameless offers an SRE platform with lightweight, intuitive tools for incident resolution, retrospective reporting, and reliability insights. However, the team also wanted its incident resolution solution to fit naturally into DevOps teams' existing workflows, which meant integrating with the systems those teams were already using.
In her conversations with DevOps engineers at mid-size to large organizations, Deirdre noticed that many were using Microsoft Teams as their collaboration tool. This was the perfect opportunity for Blameless to put its solution in front of the teams that needed it most. Microsoft Teams offered wide reach and gave the Blameless developers the features they needed to make their incident resolution process as intuitive and effective as possible.
Automated incident creation
The first essential step toward speedy incident resolution is getting the problem in front of the right people as soon as possible. The moment an alert triggers an incident, the Blameless app automatically creates a Teams channel and adds the most relevant team members to the conversation.
Simplified collaboration and context
Rather than having everyone wait for their turn to be useful, Blameless makes it easier to rally people on an as-needed basis. It also gives incoming team members the context they need to understand what's happening and what their assigned role requires of them. Blameless on Teams also lets users assign an incident commander, who orchestrates the response and tracks progress as team members cycle in and out to do their part.
Task management and checklists
Incidents can be extremely complex, sometimes involving hundreds of tasks and activities. Task management was essential for giving individual users a self-guided experience as they collectively work toward a common goal. With checklists tailored to each user's role and responsibilities, users can guide their own actions toward resolution, including assigning roles to other team members and escalating the incident if needed.
Behind-the-scenes runbook documentation
One of the most important components of incident resolution, Nicolas notes, is the ability for engineers to document their activity so they can get to the root of what caused the incident. Because some activities are carried out outside of Blameless, the team needed the app to run in the background and capture all of the engineers' activities. With this comprehensive runbook, Blameless users can uncover the root cause of an incident and identify flaws in their process that may have slowed resolution. That helps them build more reliable systems and resolve similar issues faster in the future.
Plans for going beneath the surface
Looking ahead, Nicolas and the Blameless developers plan to make the Blameless app on Teams even more intuitive. They'll be adding incident summary features, including tabs that give easy access to information kept outside the main conversation thread. They're also developing features to smooth the onboarding experience, so teams can jump back in without a hitch even if they haven't used the app for several weeks.
As for Microsoft's ability to help the Blameless team reach its future goals, Nicolas says, "We've only touched the surface." He has been consistently impressed by the sheer number of emerging Microsoft Teams capabilities for interacting with and building apps, and he's excited about what those capabilities could mean for Blameless and for incident resolution.
Working with a large organization can be daunting for startups. Many young businesses worry that their vision will get lost among layers of decision-makers and processes. The key, Deirdre says, is to be open and honest about what's possible. Startup work can push people into roles and responsibilities they weren't expecting, and sometimes they find themselves trying to do everything at once. The Blameless team's advice to other startups is to be deliberate about where they focus their efforts and to build relationships with partners who understand the challenges young businesses must navigate.
To get access to Microsoft Teams and the rest of the Microsoft 365 suite, sign up for Microsoft for Startups Founders Hub today.