Software Failure Probability: Key to IEC 62304 Safety Classification

Software (SW) products play an increasingly important role in the medical device field, and managing SW-related risks is paramount. Together with other standards such as IEC 82304-1, IEC 62304:2006/A1:2015 represents the current state of the art that manufacturers should follow to ensure compliant SW development and maintenance. Among the activities required by IEC 62304, determining the so-called “safety class” of the software being developed is an important step, as it allows manufacturers to identify the minimum set of activities required by the standard. Depending on the product and the intended markets, this minimum set of development/maintenance activities may not be sufficient to address all applicable regulatory requirements. Understanding the safety classification process is still important, because a wrongly assigned safety class may result in inappropriate development, testing and documentation efforts.

Navigating the IEC 62304 Safety Classification Process

In IEC 62304, the software safety class (A, B or C, the latter being the most stringent class) depends on two main factors:

whether the residual risks in which a SW failure is involved are acceptable or not, and
the type of “injury” to which patients/users can be exposed.

When SW failures cannot result in patient/user exposure to a hazard or when the resulting hazard exposure leads to acceptable residual risks only, the SW is considered class A. When residual risks related to a SW failure are no longer acceptable, the SW will be regarded as either class C or class B depending on whether the resulting injury is considered “serious” or “non-serious”, respectively (you’ll find the definition of “serious injury” in IEC 62304).


Figure 1: IEC 62304 SW safety classification process
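The decision logic described above can be sketched in a few lines of Python. This is only an illustration of the two factors named in the text, not an implementation taken from the standard: the function name, the parameter names and the SafetyClass enum are assumptions introduced for this example.

```python
from enum import Enum


class SafetyClass(Enum):
    A = "A"
    B = "B"
    C = "C"


def classify_software(
    can_lead_to_hazard_exposure: bool,
    residual_risk_acceptable: bool,
    serious_injury_possible: bool,
) -> SafetyClass:
    """Return a safety class for one SW system (illustrative sketch only).

    residual_risk_acceptable is expected to be evaluated by the risk
    management process; this function only combines the answers.
    """
    # No exposure to a hazard, or only acceptable residual risks: class A.
    if not can_lead_to_hazard_exposure or residual_risk_acceptable:
        return SafetyClass.A
    # Non-acceptable residual risk: the type of injury decides between B and C.
    return SafetyClass.C if serious_injury_possible else SafetyClass.B


if __name__ == "__main__":
    print(classify_software(True, False, False).value)
```

The sketch deliberately keeps the risk acceptability judgment outside the function, since that judgment comes from the manufacturer's risk management process rather than from a formula.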

In essence, determining the software safety classification requires a hazard analysis that can be conducted according to the ISO 14971 methodology (see Figure 2 for a simplified visual representation).

In this hazard evaluation, however, IEC 62304 requires two specific constraints to be considered:

SW failure occurrence shall be assumed to be certain (i.e. the SW failure probability of occurrence shall be 100%).
To mitigate (reduce) risks arising from a sequence of events involving a SW failure, only control measures external to the failing SW can be considered. Note here that a control measure is considered external to the failing SW if implemented in a hardware system or another SW system, provided that the latter is sufficiently well segregated from the failing SW (e.g. SW running on a different processor or microcontroller).


Figure 2: Simplified ISO 14971 risk analysis process, intended for IEC 62304 safety class determination

P = harm occurrence probability / S = harm severity / * = possibly modified following the implementation of a risk control measure

Software Failures Don't Always Lead to Hazardous Situations or Harm

Even if a software failure is considered inevitable, it does not necessarily follow that a hazardous situation, and ultimately harm, will occur.

A simplified interpretation of the IEC 62304 safety classification process is to assume that a SW failure probability of 100% leads to a harm occurrence probability of 100%. Such an assessment often assigns an artificially high probability of occurrence to SW-failure-related risks. As a result, these risks are more likely to fall in the non-acceptable area of the acceptability matrices traditionally used in an ISO 14971-compliant risk management process (see Figure 3).


Figure 3: Example of a Risk Acceptance Matrix illustrating acceptable and non-acceptable areas based on combinations of harm occurrence probability and harm severity.
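To make the effect of the probability estimate on acceptability concrete, here is a minimal sketch of a matrix lookup. The level names and the acceptable/non-acceptable cells below are entirely hypothetical and are not those of Figure 3; each manufacturer defines its own matrix in its risk management process.

```python
# Hypothetical severity levels, ordered from least to most severe.
SEVERITIES = ("negligible", "minor", "serious", "critical", "catastrophic")

# Hypothetical acceptance matrix: one row per probability level,
# one boolean per severity level (True = acceptable risk).
ACCEPTABLE = {
    "frequent":   (True, False, False, False, False),
    "probable":   (True, False, False, False, False),
    "occasional": (True, True,  False, False, False),
    "remote":     (True, True,  True,  False, False),
    "improbable": (True, True,  True,  True,  False),
}


def is_acceptable(probability: str, severity: str) -> bool:
    """Look up whether a (probability, severity) pair is acceptable."""
    return ACCEPTABLE[probability][SEVERITIES.index(severity)]


if __name__ == "__main__":
    # Same severity, two different harm occurrence probability estimates.
    for level in ("frequent", "remote"):
        print(level, "serious", is_acceptable(level, "serious"))
```

With this made-up matrix, a "serious" harm is not acceptable when its probability is rated "frequent" but is acceptable when rated "remote", which is why an artificially maximal probability estimate can push a risk into the non-acceptable area.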

To better understand this issue, we need to go back to the definition of risk. As a reminder, ISO 14971 defines risk as the “combination of the probability of occurrence of harm and the severity of that harm”. As shown in Figure 4, the overall probability estimate “P” for a given harm to occur is the product of two components that we’ll call “P₁” and “P₂”:

P₁ reflects the probability of a hazardous situation to occur; and
P₂ reflects the probability that a hazardous situation results in a defined harm.

While P₂ is more likely to depend on clinical factors (e.g. a particular patient characteristic may be required for a given harm to occur), P₁ is the result of a sequence of events that, combined, lead to the occurrence of the hazardous situation. In this sequence of events, the SW failure may be only one element. If the probability of occurrence of the other events in the sequence can be justified as not being maximal, then the hazardous situation occurrence probability P₁ will be lower than 100% even if the SW failure is considered certain, as required by IEC 62304.
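Written out, the reasoning looks as follows. The notation P_SW for the SW failure and P_1.i for the other events in the sequence follows Figure 4; treating the events as independent, so that their probabilities can simply be multiplied, is a simplifying assumption made for this sketch.

```latex
\begin{align}
P   &= P_1 \times P_2 \\
P_1 &= P_{SW} \times \prod_{i=1}^{n} P_{1.i} \\
P   &= \Bigl(\prod_{i=1}^{n} P_{1.i}\Bigr) \times P_2
      \quad \text{when } P_{SW} = 1
\end{align}
```

As soon as one of the P_1.i or P₂ is below 1, P is below 1 as well, even with P_SW fixed at 1.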

Software Failure Only One Step in Event Sequence Leading to Hazards

Often, software failure is only one element in a sequence of events leading to a hazardous situation, as illustrated by the following example. A manufacturer produces a medical device comprising a hardware system that interacts with the patient and is controlled through software with a graphical user interface.

The manufacturer has defined alarm conditions for

safety-relevant issues (critical alarm), e.g. when an applied part (in contact with the patient’s body) overheats to a dangerous temperature, and
non-safety-relevant issues (non-critical alarm), e.g., that preventive maintenance will soon be required (e.g., based on a certain device use duration threshold).

In such a case, if the SW fails to prioritize a critical alarm over a non-critical one, a non-critical alarm warning screen could override a critical one in the SW graphical user interface. The device user may, therefore, not recognize that the applied part is overheating.

Such a hazardous situation requires the SW graphical user interface to fail, and IEC 62304 tells us that, in determining the SW safety class, we shall consider this SW failure as certain (i.e. P_SW in Figure 4 is 100%). However, this hazardous situation also requires a critical and a non-critical alarm to be triggered one after the other. Such a combination of alarm conditions is unlikely to occur with a 100% probability (P_1.1 and P_1.2 in Figure 4). Considering the complete sequence of events, the hazardous situation occurrence probability (P₁ in Figure 4) is therefore expected to be lower than 100%.

In addition, the hazardous situation (overheated applied part touching the patient's skin) may not automatically lead to harm. Let’s assume that the applied part contains a hardware control mechanism preventing overheating above a given temperature. Such a control mechanism would limit the maximum temperature the applied part can reach so that only patients with already damaged or sensitive skin would experience burns. The probability of harm in this hazardous situation (P₂ in Figure 4) then depends on the health status of the patient's skin. Depending on the device's intended purpose, not all patients may present damaged or sensitive skin, in which case the probability that the hazardous situation results in harm (P₂) is lower than 100%.

This example shows that the harm occurrence probability estimate (P = P₁ × P₂ in Figure 4), which is the value used to determine whether a risk is acceptable, can be lower than 100% even though we considered the underlying SW failure to be inevitable.


Figure 4: Impact of considering SW failure probability as certain within overall risk estimation and evaluation

P = Probability / S = Severity / X = multiplication / U = combined with
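The alarm example can be worked through numerically. All probability values below are hypothetical placeholders chosen only to show the arithmetic, and the events are assumed to be independent so that their probabilities can be multiplied; real estimates would have to come from the manufacturer's own data and justification.

```python
from math import prod


def harm_probability(event_probabilities, p2):
    """Return (P1, P) for a sequence of independent events.

    event_probabilities: probabilities of each event in the sequence
                         leading to the hazardous situation.
    p2: probability that the hazardous situation results in harm.
    """
    p1 = prod(event_probabilities)
    return p1, p1 * p2


# SW failure assumed certain, as required for safety classification.
P_SW = 1.0
# Hypothetical values, for illustration only.
P_1_1 = 0.01  # critical alarm condition (applied part overheats)
P_1_2 = 0.05  # non-critical alarm condition occurs at the same time
P_2 = 0.10    # patient has damaged or sensitive skin

if __name__ == "__main__":
    p1, p = harm_probability([P_SW, P_1_1, P_1_2], P_2)
    print(f"P1 = {p1:.2e}, P = {p:.2e}")
```

Even with P_SW set to 1, the resulting harm occurrence probability is far below 100% with these placeholder values, because the other events in the sequence and P₂ are not certain.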

Assumed Certainty of Software Failure Probability in Safety Classification vs. Risk Management Activities

Another common misconception is that SW failures must be assumed certain throughout the entire risk management lifecycle. IEC 62304 only requires this assumption in the context of SW safety class determination. Should you have data demonstrating that a given SW failure probability can reliably be estimated (at a lower value than 100%), IEC 62304 does not forbid the use of a “realistic” SW failure probability estimate in the risk management activities executed to evaluate the benefit/risk profile of the medical device. This is particularly important when developing Software as a Medical Device (SaMD) products, for which risk management based on i) worst-case scenario analysis and ii) the assumption that all risks have a probability of occurrence of 100% quickly leads to a “my software kills everyone, every time” situation.

Accurate Software Failure Risk Assessment: Streamlining Regulatory Efforts

IEC 62304 requirements for the SW safety classification process are often interpreted to mean that the occurrence of SW-related risks shall be considered certain (i.e. having a probability estimate of 100%). In the context of determining a SW safety class, the standard does require SW failures to be considered as occurring with maximal probability. However, concluding from this that the occurrence of SW-failure-related harm is always 100% is an oversimplification. Indeed, the SW failure is usually only one element in the sequence of events leading to a hazardous situation, which itself may not always (i.e. with a 100% probability) result in patient/user harm. As such, systematically estimating the SW-related harm occurrence probability as maximal may result in assigning an artificially high IEC 62304 safety class, leading to unnecessary additional development, testing or documentation effort.

Valentin Chapuis
Senior Quality & Engineering Support
