A guide to problem management

Take a deep dive into problem management and learn how to manage recurring issues and improve workflows with one simple solution.

What is problem management?

Last updated March 30, 2023

Problem management identifies the underlying causes of incidents occurring in an IT service and manages those problems through their lifecycle.

On the surface, problem management may simply appear to be handling incidents as they come up. The process, however, is much more robust, consisting of identifying the problem, understanding the root causes, and finding a solution. Additionally, your business needs to continue the process by documenting and communicating the details of the incident across teams to understand how to tackle the issue—or related issues—that may someday arise.

Our guide offers a wealth of knowledge about IT problem management, problem management tools, software features, and the benefits of problem management software to help you handle incidents all in one place.

Key features of problem management software

The best problem management software can help you streamline workflows and handle all aspects of incidents. Here are the key features to look for while comparing and researching problem management tools.

Problem and incident ticket linking, type, and status

When more than one person reports an issue, your ticketing system can streamline the IT problem management process.

For example, if your outbound phone system suddenly stops working, you may see an influx of customer support tickets flooding your IT help desk. Rather than handling each ticket individually, you can link each incident-related ticket to a single problem ticket for your team to manage.

Any comments, changes, or resolutions made to the problem ticket are automatically applied to the linked tickets. When you solve the problem ticket, the status of all linked incident tickets automatically updates to “solved.”
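The linking behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular help desk product implements it: the `Ticket` and `ProblemTicket` classes, their fields, and the method names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical incident ticket; field names are illustrative only."""
    id: int
    subject: str
    status: str = "open"
    comments: list[str] = field(default_factory=list)


@dataclass
class ProblemTicket(Ticket):
    """Hypothetical problem ticket that tracks the incidents linked to it."""
    incidents: list[Ticket] = field(default_factory=list)

    def link(self, incident: Ticket) -> None:
        # Attach an incident ticket to this problem.
        self.incidents.append(incident)

    def add_comment(self, text: str) -> None:
        # A comment on the problem is copied to every linked incident.
        self.comments.append(text)
        for incident in self.incidents:
            incident.comments.append(text)

    def solve(self, resolution: str) -> None:
        # Solving the problem records the resolution and solves every linked incident.
        self.add_comment(resolution)
        self.status = "solved"
        for incident in self.incidents:
            incident.status = "solved"
```

In this sketch, an agent would link each phone-outage ticket to one `ProblemTicket` and call `solve()` once, instead of updating every incident by hand.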

Knowledge base and self-service portal

Including a knowledge base and customer self-service portal can help users manage common problems without outside help. Along with help center articles for assistance, IT teams can document known errors and solutions to help others who may have similar problems. Users can also submit tickets or report incidents through their employee portal.

Analytics and reporting

Analytics and reporting features help teams learn more about the problem and the systems affected in real time. Based on the problem's severity, your management, engineering, and other relevant teams can use the data to collaborate on the incident and formulate a game plan. These reports can display the following:

  • Problem overview

  • Impact on service or operations

  • The root cause

  • Incident details and timing

  • Remediation

The reports and analytics can prepare you for future incidents and provide the steps to handle them should they occur.

Service level agreement (SLA) tracking

The ability to manage and define SLA service targets allows you and your agents to monitor your service level performance and meet your goals. Zendesk highlights tickets that fail to meet service level targets so that you can promptly identify and address problems.

You can also set up automations that escalate high-priority tickets and send them to the top of the queue, and create triggers that alert agents when a time-sensitive SLA target needs attention.
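To make the idea concrete, here is a minimal Python sketch of SLA breach detection and queue ordering. Everything in it is an assumption for illustration: the `SLA_TARGETS` values, the priority names, and the `Ticket` fields are hypothetical and do not reflect any product's actual configuration or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical resolution targets per priority; real targets come from your SLA policy.
SLA_TARGETS = {
    "urgent": timedelta(hours=1),
    "high": timedelta(hours=4),
    "normal": timedelta(hours=24),
}
PRIORITY_ORDER = ["urgent", "high", "normal"]


@dataclass
class Ticket:
    """Hypothetical ticket record used only for this sketch."""
    id: int
    priority: str
    created_at: datetime
    solved: bool = False


def is_breached(ticket: Ticket, now: datetime) -> bool:
    # An open ticket breaches its SLA once its age exceeds the target for its priority.
    return not ticket.solved and now - ticket.created_at > SLA_TARGETS[ticket.priority]


def build_queue(tickets: list, now: datetime) -> list:
    # Open tickets only: breached tickets first, then by priority, then oldest first.
    open_tickets = [t for t in tickets if not t.solved]
    return sorted(
        open_tickets,
        key=lambda t: (
            not is_breached(t, now),
            PRIORITY_ORDER.index(t.priority),
            t.created_at,
        ),
    )
```

Sorting breached tickets ahead of everything else is one possible design choice; a team could just as reasonably rank by time remaining until the target.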

AI and automation

With AI and automation features, the sky’s the limit. Some common uses for AI and automation with problem management software include:

  • Using chatbot software to answer FAQs, recommend help center articles, and provide self-service options

  • Sending out user satisfaction surveys to collect feedback and end-user data

  • Automating ticket notifications

  • Escalating and routing tickets to the agent best suited to handle the problem

Relationship between problem management and other ITIL processes

Problem management shares common DNA with a few other Information Technology Infrastructure Library (ITIL) processes. Here’s what they mean and how they play off each other to accomplish their goals.

  • Incident management process: This process resolves an issue as soon as possible. Teams use this process to find quick fixes or workarounds to keep systems or services running, whereas problem management seeks to find the root cause of issues.

  • Change management process: This process implements strategies to improve business processes, IT infrastructure, or IT operations. Problem management and change management can meet at the discovery point of a root cause, where you can implement changes.

  • Risk management process: This process evaluates potential problems and the likelihood of those issues occurring. Potential risks are often identified during the problem management process flow, usually while handling recurring issues.

Benefits of problem management

When an unpredictable incident occurs, it's crucial to have a problem management strategy and process in place. Here are some benefits for your business.

Reduce downtime costs with faster problem resolution

When your system or service goes down, the cost of downtime adds up quickly, and it can be massive. The 2022 Uptime Institute Outage Report shows that over 60 percent of outages cost a business at least $100,000 in total losses. Effective problem management allows you to be proactive with protocols and systems to prevent downtime.

Boost productivity and efficiency

Problem management software can take your already resourceful team and boost their efficiency, even improving metrics like first contact resolution. Monitoring systems manually for incidents is nearly impossible, but IT problem management software can identify an issue as it occurs and quickly alert your team so they can spring into action.

As mentioned above, your IT service desk may receive multiple requests about the same incident. Because problem management software streamlines the process by linking tickets, your team can focus on resolving the root issue rather than sorting through dozens of duplicates.

Structure your workflow for identifying root causes

Your team needs a problem management platform with customizable workflow configurations that allow them to quickly track incidents, identify root causes, and resolve problems. With out-of-the-box functionality and integrations, Zendesk can immediately put these systems in place for a stellar customer and employee experience.

Increase employee or customer satisfaction and retention

Issues can prevent customers from using your services. IT problem management can help prevent issues from occurring or help you resolve them quickly. Keeping your service up and running, delivering swift resolutions, and minimizing the frequency of disruptions can reduce the number of frustrated customers, increase customer satisfaction, and improve the overall customer experience.

Resolve recurring incidents and minimize future problems

There are two approaches to IT problem management: proactive and reactive. Problem management software can assist with both.

  • Reactive: adapt quickly in real time and eliminate recurring incidents

  • Proactive: minimize future problems

Implementing both approaches enables you to decrease unscheduled downtime and keep your service running seamlessly for your customers.

What is the problem management process?

The problem management process flow consists of seven steps:

  1. Detect the problem: Identify a problem through an incident report, continuous incident analysis, automated detection by problem management software, or supplier notification.

  2. Document the problem: Log all the details associated with the issue, starting with the date and time, user information, and description of the problem.

  3. Diagnose the problem: Investigate the incident and review your problem management database to find prior issues and/or resolutions.

  4. Find a workaround for the problem: Look for a quick fix to get systems up and running and minimize downtime while identifying the root cause.

  5. Create a known error record: Create a known error record to assist with future incident research and resolution.

  6. Resolve the problem: Implement the solution, and test it to confirm service was restored.

  7. Close the case: Close the ticket and associated incidents/tickets.
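The seven steps above can be modeled as an ordered sequence of stages that a problem record moves through one at a time. The Python sketch below is a simplified illustration under that assumption; the `Stage` names and the `ProblemRecord` class are hypothetical, and real workflows may allow steps to overlap or repeat.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages mirroring the seven steps of the process flow."""
    DETECTED = 1
    DOCUMENTED = 2
    DIAGNOSED = 3
    WORKAROUND_FOUND = 4
    KNOWN_ERROR_RECORDED = 5
    RESOLVED = 6
    CLOSED = 7


class ProblemRecord:
    """Hypothetical problem record that only moves forward, one stage at a time."""

    def __init__(self, description: str) -> None:
        self.description = description
        self.stage = Stage.DETECTED
        self.history = [Stage.DETECTED]

    def advance(self) -> Stage:
        # Move to the next stage; a closed problem cannot advance further.
        if self.stage is Stage.CLOSED:
            raise ValueError("problem is already closed")
        self.stage = Stage(self.stage + 1)
        self.history.append(self.stage)
        return self.stage
```

Keeping a `history` list gives a simple audit trail of which stages a problem has passed through, which can feed the known error record and later reviews.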

Examples of problem management + best practices

Follow these best practices to make the most of your problem management process.

Preventative action: Be proactive, not reactive

Some incidents might be unforeseen, requiring you to react in real time to identify the problem, find a workaround, and resolve it on the fly. But integrating problem management software with your help desk puts safeguards in place. This allows you to be proactive, helping you eliminate problems before they can begin or alert your team to potential problems before escalation occurs.

Example: Your IT problem management team identifies a history of the VPN crashing for remote call center teams. They investigate prior incidents and resolutions periodically and implement safeguards to prevent the incident from happening again.

Corrective action: Minimize downtime by addressing ongoing problems

Investigations into existing problems are considered “corrective” actions. Addressing these ongoing problems or those that have already occurred should be a top priority. It’s best to create processes and infrastructure for investigating and solving problems, which may include assigning roles or responsibilities to team members. These roles and responsibilities can include:

  • Problem manager: reports to the process owner and is responsible for identifying daily issues, monitoring recurring incidents, and reviewing the problem management report. The ITIL problem manager and service desk manager should collaborate to identify any potential problems occurring in real time.

  • Process owner: trains agents and ensures they follow the problem management process.

  • Process coordinator: leads communication with management and affected teams.

Some IT teams operate with a low headcount, so it might make sense to add these responsibilities to the duties of existing members, rather than building a new, dedicated team.

Example: Because VPN connectivity for remote teams is an ongoing issue, a problem management team is created and each member is assigned specific roles and responsibilities. While the IT team creates a workaround to get the remote agents logged in, the process coordinator communicates the issue across departments to keep everyone in the loop.

Post-incident evaluation: Conduct a retrospective meeting

After handling an incident, it’s important to evaluate the issue, systems, and team members involved. Conducting retrospective meetings can help you understand what went wrong, how the problem was solved, and how to avoid similar situations in the future.

A retrospective meeting should be held within 72 hours of the incident, while the details are still fresh. The meeting usually includes team members involved in the incident, like the incident manager, team leads or managers, agents, and engineers. Together, they can collaborate and produce an incident report.

The tasks in a retrospective meeting can include:

  • Reviewing all details of the incident, including the systems and participants involved

  • Reviewing and validating the findings of the root-cause analysis

  • Identifying and determining any remediation work needed to address the root causes that led to the incident

  • Assigning remediation tasks to the appropriate teams with clear SLAs

Example: The VPN suddenly went offline and your remote team couldn’t connect, resulting in 30 minutes of downtime. IT identified that the server hosting the VPN wasn’t connecting to the domain, which prevented users from authenticating. Once IT fixed the issue, the team members involved came together the following day for a retrospective meeting to create the incident report.

Knowledge sharing and documentation: Encourage collaboration and uniform logging

Collaborating across teams allows you to look at a problem from several angles to spot the root cause and offer practical solutions. Having a standard system to support knowledge sharing and logging past problems keeps teams organized and documents past and ongoing issues so nothing slips through the cracks.

Aside from implementing problem management software, businesses should promote a culture that welcomes knowledge sharing by breaking down team silos and reducing any perception of internal competition. Support agents who deal with problems directly can offer valuable feedback integral to determining a root cause or finding solutions.

Example: Say the IT team implemented a software update to fix a recurring reporting software issue. The team can document the issue and solution, then share this knowledge across teams. If other employees experience the same or similar issues, this information can help resolve the problem right away.

Continuous incident analysis: Track the original problem and resolution

Tracking the original problem and the effectiveness of its resolution helps you understand how to make continuous improvements. Consistent tracking and analysis can give you new information to find better solutions to the original issue or upgrade to a more effective problem-management tool. It can also provide more context for agents to better understand how to deal with a similar problem the next time it occurs, allowing for faster resolution times and minimizing unscheduled downtime.

Example: Say computer system updates fixed the reporting software issues on your network, and those issues are documented in your problem management database. While reviewing the original problem and resolution, you discover that the technology advancements featured in new software can get rid of the crashing issues altogether.

Zendesk for problem management

Zendesk takes the “problem” out of problem management. It easily integrates with your existing tech stack so your IT team can get vital alerts quickly and react to incidents before they escalate.

Zendesk offers a native problem/incident ticket type, which makes it easy to tie multiple incidents reported by different people to a single problem ticket. Once the problem ticket is solved, the status of all incident tickets is automatically set to “solved,” streamlining work for your IT staff. Zendesk for problem management offers benefits like:

  • A simple user experience for IT staff and employees alike

  • Fast time to value and ongoing agility

  • An open, flexible, and scalable platform, so it can integrate with the tools your IT teams already use

  • The best total cost of ownership

Zendesk also handles most of the work on the back end, so your IT staff can continue working in unaffected systems while you resolve other system issues.

Try a proactive problem management solution

Your team doesn’t need overly complex ITSM solutions. They need something simple, reliable, and intuitive. Whether you’re looking for basic service desk capabilities or something more, Zendesk is easy to use and quick to set up.
