In a previous blog post we explored how Mean Time Between Failures (MTBF), despite being commonly used, is not an effective metric for measuring and assessing the reliability of existing equipment or systems, or for predicting the reliability of equipment or systems under development.
MTBF isn’t the sole option. Here are five alternative approaches to failure prediction:
1. Reliability at a point in time with a Confidence Interval
This is the proper and most technically correct reliability metric that conforms to the classical definition of reliability:
Reliability is the probability that a system, product or function will perform as intended under stated conditions for a stated period of time.
This approach is common among industrial controls and some automotive manufacturers, especially in the U.S. It measures reliability across a certain number of hours of operation and expresses it as a percentage, such as 99% reliability after 5 years in service, or 97% reliability after 100,000 miles of vehicle usage. A confidence interval – also expressed as a percentage – is then attached to that reliability number to demonstrate the level of certainty surrounding it.
This approach correctly correlates reliability to a point in usage lifetime or usage cycle and forces consideration of the accuracy of the reliability number provided. If the confidence percentage declines, it raises potential concerns about the strength of the prediction or credibility of the data used to characterize field performance.
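As a rough illustration of how a reliability number and a confidence level travel together, the sketch below uses the zero-failure "success run" relationship, R = (1 - C)^(1/n). This is only one simple way to attach confidence to a reliability claim, and it gives a one-sided lower bound rather than a full interval. The function names and the 99%/90% figures are assumptions chosen for the example.

```python
import math


def success_run_reliability(n_units: int, confidence: float) -> float:
    """Lower one-sided reliability bound when all n_units survive the test."""
    if n_units <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need n_units > 0 and 0 < confidence < 1")
    return (1.0 - confidence) ** (1.0 / n_units)


def units_required(reliability: float, confidence: float) -> int:
    """Smallest zero-failure sample size that demonstrates the reliability target."""
    if not 0.0 < reliability < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("reliability and confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


if __name__ == "__main__":
    n = units_required(reliability=0.99, confidence=0.90)
    print(f"Units needed with zero failures: {n}")
    print(f"Demonstrated reliability: {success_run_reliability(n, 0.90):.4f}")
```

Lowering the confidence level reduces the number of units needed, which is exactly why a reliability figure quoted without its confidence level says little about how strong the supporting evidence is.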
2. Bx or Lx life metrics
Bx (Bearing Life metric) was developed in the ball and roller bearing industry. It defines the life point in time (hours, days or years) or cycles when no more than x% of the units in a population will have failed.
The often-cited B10 life point, which might correspond to, for example, 5 years, 100,000 miles or 1 million cycles, is the point by which 10% of a population will have failed. In other words, reliability is 90% at that point in whatever life usage metric suits the type of equipment. Values other than 10 can be used: 5, 2, 1, 0.5 and even 0.1 are regularly used Bx failure risk values.
Use of the Bx metric spread to other machine industries, where it is still widely used. It is sometimes written as Lx instead of Bx, to denote the life point at which a given percentage of any type of system, equipment or component will have failed. The benefits and value of using the Bx/Lx reliability metric are:
- Distinctly correlates the expected or maximum allowable percentile of failures (and therefore survivors) to an application specific durability life point
- Not limited to the classical reliability concept that reliability is only concerned with failures that occur within the hypothetical useful service life (i.e. constant failure rate exponential distribution portion of the Bathtub Hazard Life curve)
- Inclusive of failures at all phases of the Bathtub curve including Infant Mortality (i.e. Quality issues), End of Life Wearout issues, and Random Failures addressed by MTBF/MTTF metrics
- Easily comprehended by non-reliability personnel, helping to lessen the frequent misuses and misunderstandings with the MTBF/MTTF metric
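To make the Bx idea concrete, here is a minimal sketch that computes a Bx life point from a two-parameter Weibull life distribution. The Weibull model is an assumption for the example (Bx can be read from any fitted life distribution or from field data), and the shape and scale values below are hypothetical.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull reliability: R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def bx_life(x_percent: float, beta: float, eta: float) -> float:
    """Life point by which x_percent of the population is expected to have failed."""
    if not 0.0 < x_percent < 100.0:
        raise ValueError("x_percent must be between 0 and 100")
    return eta * (-math.log(1.0 - x_percent / 100.0)) ** (1.0 / beta)


if __name__ == "__main__":
    beta, eta = 2.0, 100_000.0  # hypothetical shape and scale (miles)
    for x in (10, 5, 1, 0.1):
        life = bx_life(x, beta, eta)
        r = weibull_reliability(life, beta, eta)
        print(f"B{x}: {life:,.0f} miles (reliability {r:.3%})")
```

Because the shape parameter is free to take any positive value, the same calculation covers infant mortality (shape below 1), random failures (shape equal to 1) and wearout (shape above 1).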
3. Failure Free Operating Period (FFOP)/Maintenance Free Operating Period (MFOP)
FFOP is a method dating back more than 30 years that aligns mechanical and electronic reliability approaches within a system. It was incorporated into MIL-STD-781D, Reliability Testing for Engineering Development, Qualification and Production, under failure-free period life tests.
There are two approaches to proving FFOP:
- Drive the constant failure rate so low that a failure is highly unlikely during the period
- Replace the exponential distribution with a three-parameter Weibull distribution
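The second approach can be sketched with the three-parameter Weibull reliability function, where the location parameter acts as a failure-free period: reliability stays at 100% until that point and only then begins to fall. The parameter values here are hypothetical and only illustrate the shape of the curve.

```python
import math


def weibull3_reliability(t: float, beta: float, eta: float, gamma: float) -> float:
    """Three-parameter Weibull reliability.

    gamma is the location parameter: no failures are expected before it,
    so it plays the role of the failure-free operating period.
    """
    if t <= gamma:
        return 1.0
    return math.exp(-(((t - gamma) / eta) ** beta))


if __name__ == "__main__":
    beta, eta, gamma = 2.0, 1000.0, 1000.0  # hypothetical values, in hours
    for hours in (500, 1000, 1500, 2000):
        r = weibull3_reliability(hours, beta, eta, gamma)
        print(f"{hours:>5} h: reliability {r:.4f}")
```

With an exponential (constant failure rate) model there is no such flat region, which is why the first approach has to rely on an extremely low failure rate instead.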
MFOP was originally a British concept that is similar to performance-based logistics. It places responsibility for some maintenance upon the manufacturer to address equipment failures in a timely fashion and ultimately avoid unscheduled maintenance. Calculation of MFOP requires an estimate of the probability that the system survives a maintenance-free period.
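One common way to express that survival estimate is as a conditional probability: given that the system has already reached a certain age, how likely is it to get through the next maintenance-free period without failure? The sketch below assumes a two-parameter Weibull life model and a single failure mode; the function names and numbers are illustrative only.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull reliability: R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def mfop_survivability(age: float, t_mf: float, beta: float, eta: float) -> float:
    """Probability of surviving a maintenance-free period of length t_mf,
    given survival to `age`: R(age + t_mf) / R(age)."""
    return weibull_reliability(age + t_mf, beta, eta) / weibull_reliability(age, beta, eta)


if __name__ == "__main__":
    beta, eta, t_mf = 2.0, 1000.0, 100.0  # hypothetical wearout behaviour, in hours
    for age in (0.0, 500.0, 1000.0):
        p = mfop_survivability(age, t_mf, beta, eta)
        print(f"age {age:>6.0f} h: survives next {t_mf:.0f} h with probability {p:.4f}")
```

With a shape parameter of 1 (the constant failure rate behind MTBF) the answer is the same at every age. With a wearout shape above 1, each successive maintenance-free period is harder to survive than the last.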
4. Mean Cumulative Function (MCF)
MCF is designed to replace MTBF. It measures the field performance of repairable systems across their usage life, rather than testing and assessing the cumulative number of failures over a single set period of time. An MCF plot shows how failures accumulate with age and supports further statistical analysis, without relying on MTBF’s assumptions that all systems are repairable and that all failures are independent and evenly distributed. It is especially useful for characterizing recurrent events in a fleet or population of equipment to determine whether your repairable system is improving, deteriorating, or remaining constant.
MCF is superior to MTBF in communicating what’s happening in the real world because it produces “big picture” data instead of one number. For example, Cochran-Mantel-Haenszel analysis is used to identify deviations from the curve, and archetypal analysis separates groups of systems to detect trends and outliers.
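For readers who want to see what goes into an MCF plot, here is a minimal sketch of a nonparametric MCF estimate: at each failure age, the curve steps up by one divided by the number of units still under observation at that age. The fleet data and function name are made up for illustration, and confidence limits are left out.

```python
from typing import List, Sequence, Tuple

# Each unit: (ages at which it failed and was repaired, age at end of observation)
Unit = Tuple[Sequence[float], float]


def mean_cumulative_function(units: Sequence[Unit]) -> List[Tuple[float, float]]:
    """Return (age, estimated mean cumulative number of failures per unit)."""
    failure_ages = sorted(age for failures, _ in units for age in failures)
    curve: List[Tuple[float, float]] = []
    mcf = 0.0
    for age in failure_ages:
        # Units whose observation window still covers this age
        at_risk = sum(1 for _, end in units if end >= age)
        mcf += 1.0 / at_risk
        curve.append((age, mcf))
    return curve


if __name__ == "__main__":
    fleet = [  # hypothetical fleet, ages in operating hours
        ([100.0, 400.0], 500.0),
        ([250.0], 300.0),
        ([], 450.0),
    ]
    for age, value in mean_cumulative_function(fleet):
        print(f"age {age:>5.0f} h: MCF = {value:.3f}")
```

A curve that bends upward suggests a deteriorating system, one that flattens suggests improvement, and a straight line suggests a roughly constant recurrence rate.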
5. Rate of Occurrence of Failure (ROCOF)
ROCOF, also known as the Recurrence Rate (RR), is the mean rate of failures per unit time. It is the derivative of the MCF: where the MCF is differentiable, ROCOF is the instantaneous rate of change in the expected number of failures. Multiplied by a small time interval, it approximates the probability that a failure (not necessarily the first) occurs in that interval. The result is a plot of failure recurrence rate versus the item's age. ROCOF gets to the failure pain point by measuring the in-service performance of repairable units. It is a better way to evaluate and communicate fluctuations in failures and to correlate them to safety risks, information that is otherwise difficult to read from a plain MCF plot.
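The relationship between MCF and ROCOF is easiest to see with a parametric model. The sketch below assumes a power-law MCF, M(t) = lam * t ** beta, which is one common choice for repairable systems but by no means the only one. The parameter names and values are hypothetical.

```python
def power_law_mcf(t: float, lam: float, beta: float) -> float:
    """Expected cumulative failures per unit by age t under the assumed power law."""
    return lam * t ** beta


def power_law_rocof(t: float, lam: float, beta: float) -> float:
    """Derivative of the power-law MCF: the failure recurrence rate at age t."""
    return lam * beta * t ** (beta - 1.0)


def trend(beta: float) -> str:
    """Classify the repairable system from the power-law exponent."""
    if beta < 1.0:
        return "improving"
    if beta > 1.0:
        return "deteriorating"
    return "constant"


if __name__ == "__main__":
    lam = 0.001  # hypothetical scale
    for beta in (0.7, 1.0, 2.0):
        rates = [power_law_rocof(t, lam, beta) for t in (100.0, 500.0, 1000.0)]
        formatted = ", ".join(f"{r:.5f}" for r in rates)
        print(f"beta={beta}: {trend(beta)}; ROCOF at 100/500/1000 h = {formatted}")
```

Only when the exponent equals 1 does the recurrence rate stay flat with age, which is the one case where a single MTBF-style number would describe the fleet adequately.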
I hope this summary of MTBF alternatives can help you and your organization move to accurate and effective ways to measure and predict reliability. For more information on the continuing effort to eradicate the widespread misuse, misunderstanding, and misinformation about MTBF (which is most likely the worst four-letter acronym in the reliability engineering profession), see the No MTBF website maintained by our good friend and consummate reliability professional, Fred Schenkelberg.
To learn more about alternative options to MTBF, view our webinar, Predicting Component Warpage and Package Level Failure Modes.