Fault Tree Analysis Explained: Paving the Way for Reliability Mastery

In industrial maintenance, pinpointing system failures is crucial. Fault Tree Analysis (FTA) is key in this process, offering deep insights into complex failure networks.

Take a manufacturing plant’s conveyor belt system halt, for example. FTA digs beyond surface issues like motor failure, uncovering underlying causes such as electrical faults or maintenance oversights.

What is a fault tree analysis (FTA)?

It’s a systematic method that unravels failure events, revealing interconnected faults. FTA provides maintenance teams with a visual map to trace failure origins, aiding in effective decision-making and robust maintenance strategies.

The failure of a component in a system is often caused by the failure of other elements. For example, a vehicle’s braking failure can be caused by water in the brake cylinders, which in turn may be caused by failure of the cylinder seals.

A fault tree analysis, or FTA, provides a method of breaking down these chains of failures, with a key addition: it identifies combinations of faults that together cause other faults. This type of analysis gives maintenance team members a visual representation of how a problem occurred and the potential pathways that led to the main failure event. It gives the analyst a logical sequence that helps identify the root causes of the failure in question. When a fault tree analysis is used alongside other analysis methods, it provides a better overall picture for future maintenance strategies and decisions.

Using A Fault Tree Analysis

Fault Tree Analysis (FTA) is an important tool in risk assessment and system safety engineering, used for analyzing the potential failure modes within a system.

There are three main components to this analysis:

  1. The Fault Tree Diagram

This is the visual representation of the events leading up to the equipment’s breakdown or failure. It begins with the failure, and leads back to the root cause through a series of logical deductions.

  2. The Events

Events include the occurrences that caused the failure, the contributing factors, and the failure itself: everything that has happened, or may have happened, leading up to the failure. Events can be either inputs (i.e., they lead to other occurrences) or outputs (i.e., they result from other circumstances).

  3. Logic Gates

Logic gates use “and/or” logic to connect related events. Input events that must all occur together to produce an output event are connected through an “and” gate, while input events that can each produce the output event on their own are connected through an “or” gate. For instance, if a failure occurs only when the wiring is broken and the light bulb is burnt out at the same time, the two events connect with an “and” gate. However, if faulty wiring alone is enough to cause the failure, an “or” gate is more appropriate.
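
To make the three components concrete, here is a minimal Python sketch that stores events and gates in one small structure and prints the diagram as indented text. The Event class, the render function, and the "Belt jam" event are hypothetical names chosen for illustration, loosely based on the conveyor example above; they are not part of any standard FTA tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """A node in the fault tree; a basic event has no gate and no inputs."""
    name: str
    gate: str = ""  # "AND" or "OR" for output events, empty for basic events
    inputs: List["Event"] = field(default_factory=list)


def render(event: Event, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, top event first."""
    label = event.name + (f" [{event.gate}]" if event.gate else "")
    lines = ["  " * depth + label]
    for child in event.inputs:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical tree: the events below are illustrative assumptions.
tree = Event("Conveyor halts", "OR", [
    Event("Motor failure", "OR", [
        Event("Electrical fault"),
        Event("Missed maintenance"),
    ]),
    Event("Belt jam"),
])

diagram = render(tree)
print("\n".join(diagram))
```

Each output event carries the gate that combines its inputs, so the diagram, the events, and the logic gates all live in the same structure.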

How Does FTA Work?

FTA helps in understanding the relationships between various events and conditions that can lead to a specific undesired outcome. Here’s an overview of how FTA works:

1. Define the Top Event:

The first step in FTA is to clearly define the undesired outcome, often referred to as the “Top Event.” This is the event that is being analyzed and is usually placed at the top of the fault tree.

2. Identify Basic Events:

Basic Events are the most elementary failures or events that can contribute to the Top Event. These are events that are assumed to occur independently of each other. Identify and list these events that could lead to the Top Event.

3. Construct the Fault Tree:

Using logic gates (AND, OR, and NOT), construct the fault tree by connecting the Basic Events to the Top Event.

AND Gate: Represents that all input events must occur for the output event to occur.

OR Gate: Represents that any one or more of the input events can cause the output event to occur.

NOT Gate: Represents the negation of an event.
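
The three gates map directly onto Boolean operators. The following is a minimal sketch, assuming each event is simply true or false; the helper names (basic, and_gate, or_gate, not_gate) and the lighting events are made up for illustration.

```python
from typing import Callable, Dict

State = Dict[str, bool]          # which basic events have occurred
Node = Callable[[State], bool]   # any event that can be evaluated


def basic(name: str) -> Node:
    """A basic event: look up whether it occurred."""
    return lambda state: state[name]


def and_gate(*inputs: Node) -> Node:
    """Output occurs only if every input occurs."""
    return lambda state: all(node(state) for node in inputs)


def or_gate(*inputs: Node) -> Node:
    """Output occurs if at least one input occurs."""
    return lambda state: any(node(state) for node in inputs)


def not_gate(node: Node) -> Node:
    """Output occurs only if the input does not."""
    return lambda state: not node(state)


# Hypothetical top event: the light is dark if the wiring is broken,
# or if the bulb is burnt out and no spare bulb is fitted.
light_is_dark = or_gate(
    basic("wiring_broken"),
    and_gate(basic("bulb_burnt_out"), not_gate(basic("spare_bulb_fitted"))),
)

state = {"wiring_broken": False, "bulb_burnt_out": True, "spare_bulb_fitted": False}
print(light_is_dark(state))
```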

Figure: a sample FTA

4. Assign Probabilities:

Assign probabilities to the Basic Events. This step involves estimating the likelihood of each Basic Event occurring and contributing to the Top Event. Probability values can be based on historical data, expert judgment, or other sources.

5. Calculate Top Event Probability:

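Using the logic gates and assigned probabilities, calculate the probability of the Top Event occurring. This is often done with Boolean algebra and basic probability rules: for independent input events, an AND gate multiplies the input probabilities, while an OR gate gives one minus the product of the probabilities that each input does not occur.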
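
As a rough illustration of this calculation, the sketch below combines probabilities gate by gate, assuming all basic events are independent and each appears only once in the tree. The function names and every probability value are made-up assumptions, not measured data.

```python
from math import prod
from typing import Sequence


def p_and(probs: Sequence[float]) -> float:
    """AND gate: all independent inputs occur, so multiply."""
    return prod(probs)


def p_or(probs: Sequence[float]) -> float:
    """OR gate: one minus the chance that no independent input occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def p_not(p: float) -> float:
    """NOT gate: the complement of the input probability."""
    return 1.0 - p


# Hypothetical probabilities for a conveyor example (illustrative only).
p_electrical_fault = 0.02
p_missed_maintenance = 0.05
p_belt_jam = 0.01

# Motor fails if either cause occurs; conveyor halts if motor fails or belt jams.
p_motor_failure = p_or([p_electrical_fault, p_missed_maintenance])
p_conveyor_halts = p_or([p_motor_failure, p_belt_jam])

print(f"P(motor failure)  = {p_motor_failure:.5f}")
print(f"P(conveyor halts) = {p_conveyor_halts:.5f}")
```

If the same basic event feeds more than one branch, this simple gate-by-gate approach no longer holds, and the tree needs to be reduced with Boolean algebra first.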

6. Sensitivity Analysis:

Conduct sensitivity analysis to identify which Basic Events have the most significant impact on the Top Event. This helps prioritize risk mitigation efforts.
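
One common way to do this is Birnbaum importance: the difference in Top Event probability between a basic event being certain and being impossible. The sketch below applies it to a hypothetical tree in which the top event occurs if power is lost, or if both pump A and pump B fail. The tree, the names, and the probabilities are assumptions for illustration, and independent basic events are assumed.

```python
from typing import Dict


def top_probability(p: Dict[str, float]) -> float:
    """Hypothetical tree: top = power_loss OR (pump_a AND pump_b)."""
    both_pumps_fail = p["pump_a"] * p["pump_b"]               # AND gate
    return 1.0 - (1.0 - p["power_loss"]) * (1.0 - both_pumps_fail)  # OR gate


def birnbaum(p: Dict[str, float], name: str) -> float:
    """Change in top-event probability when one event goes from impossible to certain."""
    certain = top_probability({**p, name: 1.0})
    impossible = top_probability({**p, name: 0.0})
    return certain - impossible


# Made-up basic event probabilities.
probabilities = {"power_loss": 0.01, "pump_a": 0.1, "pump_b": 0.2}

# Rank basic events from most to least influential.
ranking = sorted(probabilities, key=lambda n: birnbaum(probabilities, n), reverse=True)
for name in ranking:
    print(f"{name}: {birnbaum(probabilities, name):.3f}")
```

Events at the top of the ranking are the ones where a reliability improvement changes the Top Event probability the most, which makes them natural candidates for mitigation.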

7. Interpret Results:

Interpret the results of the analysis. If the calculated probability of the Top Event is within acceptable limits, the system may be considered reliable. If it exceeds acceptable limits, further analysis and risk mitigation strategies may be necessary.

8. Risk Mitigation:

Based on the findings of the analysis, develop and implement risk mitigation strategies to reduce the probability of the Top Event occurring. This could involve design changes, redundancies, or other safety measures.

9. Documentation:

Document the fault tree analysis, including the identified events, probabilities, logic gates, and conclusions. This documentation is valuable for communication, future reference, and regulatory compliance.
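
One lightweight option is to keep the analysis in a structured file. The sketch below writes a record to JSON text and reads it back; the field names and all values are hypothetical, and a real record would follow whatever schema the team or regulator requires.

```python
import json

# Hypothetical record of a small analysis; every value here is illustrative.
record = {
    "top_event": "Conveyor halts",
    "gate": "OR",
    "inputs": [
        {"event": "Electrical fault", "probability": 0.02},
        {"event": "Missed maintenance", "probability": 0.05},
    ],
    "conclusion": "Review the maintenance schedule first.",
}

# Serialize for storage or sharing, then load it back to check nothing was lost.
text = json.dumps(record, indent=2)
restored = json.loads(text)
print(text)
```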

Below is an example of a blank fault tree analysis diagram, which is effectively a fill-in-the-blank exercise. You write the initial problem in the top rectangle, then work through the various events that could have led to it in the subsequent boxes.

Figure: a blank FTA

What Are The Benefits of A Fault Tree Analysis?

Implementing FTA practices has significant benefits for organizations, managers, and maintenance teams.

Systematic Identification of Failure Modes:

By identifying the root causes of system failures, teams understand the vulnerabilities of a system and are able to develop targeted risk mitigation strategies.

Risk Assessment and Prioritization:

By assigning probabilities to events and calculating the probability of the top event (the undesired outcome), organizations can prioritize their efforts and resources to address the most critical and potentially catastrophic failure modes. This prioritization is essential for effective risk management and decision-making.

Proactive Risk Management:

By exposing potential failure paths before they lead to a breakdown, FTA supports a proactive approach that minimizes the likelihood of unexpected failures, reduces downtime, and enhances overall system performance.

Communication and Decision Support:

The fault tree’s visual format makes it an effective communication tool for conveying complex risk scenarios to various stakeholders, including engineers, managers, and regulators. FTA results offer decision support by providing insights into the critical factors affecting system reliability, aiding decision-makers in developing effective risk mitigation strategies and optimizing resource allocation.

While FTA is a handy tool for every professional to know, it is not the right answer for all situations and teams. It works best alongside other preventive maintenance strategies, such as condition-based monitoring with IoT sensors or a CMMS that ensures accurate data recording and analysis. With these in place, fault tree analyses can complement an existing maintenance strategy or help teams develop a comprehensive preventive maintenance strategy.