
Five Myths of Reliability

Dr. Nathan Blattau, Dr. Bob Esser, and James McLeish

 


Myth 1: I don’t worry about design, because most of my problems are with defects from suppliers

While the majority of product failures can be traced back to supplier or manufacturing issues, the most severe warranty issues tend to be design-related, because every product at every customer can be at risk. As a result, design issues are much more likely to result in a recall and therefore place a much larger strain on a company’s bottom line.

Myth 2: I design for more rugged environments and therefore I can’t learn anything from consumer/computer electronics

The stresses experienced during operation of a computer or mobile phone can actually far exceed any loads applied to military, avionics, and industrial designs. For example, laptop computers left in the back of a car can experience temperatures as high as 80°C on a hot summer’s day. Combine that with component temperatures that can exceed 100°C during operation and you can have thermal cycles whose number and severity exceed those experienced by an engine control unit (ECU) used in commercial and military applications.

Myth 3: Design verification is just the same as product qualification

The purpose of design verification is to understand the margins of your design. This is typically performed on prototype units and small sample sizes (1 to 3 units). Tests performed during design verification include HALT, “Paint the Corners”, UL Testing, Ship-Shock, etc. Once the design is robust, product qualification can then be performed.

The purpose of product qualification is to demonstrate that the design plus the manufacturing process is sufficiently robust to ensure a 10-year lifetime. Product qualification should be performed on pilot production units, not prototypes, and should have a sufficiently large sample size (5 to 20 units) to have some confidence of capturing gross manufacturing issues. The tests performed during product qualification primarily consist of accelerated life testing (ALT).

There are real and substantial risks in performing product qualification tests on prototypes. Prototypes that pass qualification may not be representative of production units. This increases the risk of qualification testing not capturing potential field issues. If prototypes fail qualification testing, these failures may be irrelevant and any attempt at root-cause identification may be a misuse of time and resources.
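The sample-size reasoning above can be sketched numerically. Assuming a gross manufacturing issue affects some fraction of the production population (the defect rate below is an illustrative number, not from the article), the chance that a qualification sample catches at least one affected unit grows quickly with sample size:

```python
def detection_probability(p: float, n: int) -> float:
    """Probability that at least one of n sampled units exhibits a
    defect affecting a fraction p of the production population."""
    return 1.0 - (1.0 - p) ** n

# A gross issue hitting 20% of units (hypothetical rate):
for n in (1, 3, 5, 20):
    print(n, round(detection_probability(0.20, n), 3))
```

With 1 to 3 prototypes the issue is more likely missed than caught; at 20 pilot-production units the detection probability is close to 99%, which is the intuition behind the larger qualification sample sizes.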

Myth 4: HALT can be used to demonstrate product reliability

No. HALT can demonstrate product robustness. Only accelerated life testing can demonstrate reliability. What’s the difference?

Robustness is the measure of a product’s ability to withstand stress. For example, one inch of steel is more robust than one mil of paper. This measurement is typically taken at time zero, which can be either immediately after manufacturing or when the product first arrives at the customer.

Reliability is the measure of a product’s ability to perform a required function under stated conditions for an expected duration. The required function may be described by some output characteristic, such as satisfactory transmission from a communication product, the accuracy of weather identification by airborne radar, or the cleanliness of clothes from a washing machine. The stated condition might be a varying AC input, stormy weather, or boiling wash water. The expected duration may be in hours, miles, or the number of wash and dry cycles. Thus, by definition, reliability is an application-specific term. Consider a transmitter in a mobile phone, which worked well for three years, and a similar transmitter in a communication satellite that carried out the same function for ten years. Both have the same reliability, having satisfied customer expectations, even though in some aspects the robustness of the satellite transmitter may be far above that of the transmitter in the mobile phone. See ‘Why HALT is not HALT’ for more details.
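The phone/satellite comparison can be made concrete with a survival function R(t), the probability of performing the required function through time t. A minimal sketch using the common Weibull model (all parameter values are illustrative assumptions, not from the article):

```python
import math

def weibull_reliability(t: float, eta: float, beta: float) -> float:
    """Weibull survival function R(t) = exp(-(t/eta)^beta):
    probability of operating failure-free through time t."""
    return math.exp(-((t / eta) ** beta))

# Two transmitters judged against their own missions (hypothetical
# characteristic lives eta, in years; same wear-out shape beta):
phone_R = weibull_reliability(t=3.0, eta=10.0, beta=2.0)       # 3-year mission
satellite_R = weibull_reliability(t=10.0, eta=33.3, beta=2.0)  # 10-year mission
```

Here the satellite transmitter is far more robust (a characteristic life of 33.3 years versus 10), yet both achieve essentially the same reliability (~91%) over their respective mission durations, which is exactly the article’s point about reliability being application-specific.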

Myth 5: Reliability is all statistics

Companies that produce some of the most reliable products in the world spend a relatively insignificant percentage of their product development effort performing abstruse statistical assessments. For example, many OEMs in telecommunications, military, avionics, and industrial controls require a mean time between failure (MTBF) number from their suppliers.

MTBF, colloquially known as ‘average lifetime’, defines the time period over which the probability of failure will be approximately 63%. The basic process of calculating MTBF involves applying a constant failure rate to each part and summing these rates across all parts in the design. While there have been numerous claims over the years of improvement on this number, by applying additional failure rates or modifying factors to take into account temperature, humidity, printed circuit boards, solder joints, etc., there are several potential flaws to this approach.
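The parts-count calculation described above can be sketched in a few lines. The part failure rates below are illustrative placeholders, not handbook data; the 63% figure follows from the constant-failure-rate (exponential) model the method implies:

```python
import math

# Parts-count sketch: one constant failure rate per part, summed.
# Rates are in failures per hour (hypothetical values).
part_failure_rates = [2e-7, 5e-8, 1.2e-7, 3e-8]  # e.g. IC, resistor, cap, diode

lambda_total = sum(part_failure_rates)   # system failure rate
mtbf_hours = 1.0 / lambda_total          # MTBF = 1 / lambda

# Under the exponential model, P(failure by t) = 1 - exp(-t / MTBF).
# Evaluated at t = MTBF, this is 1 - 1/e, about 63%, regardless of
# the individual rates:
p_fail_at_mtbf = 1.0 - math.exp(-mtbf_hours / mtbf_hours)
```

Note that the 63% result is independent of the rates chosen; it is a property of the exponential model itself.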

The first is misunderstanding. The average engineer, much less a non-engineer (like your customer), will often expect a product with an MTBF of 10 years to operate reliably for a minimum of 10 years. In reality, this product will likely fail well before 10 years. Second, the primary approach for increasing MTBF is to reduce parts count. This can be deleterious if the parts removed are critical for certain functions, such as filtering, timing, etc., that won’t affect product performance under test, but will influence product reliability in the field.
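The misunderstanding can be quantified under the same constant-failure-rate assumption that underlies MTBF (a sketch, not a claim about any particular product):

```python
import math

# A "10-year MTBF" under the exponential model:
mtbf_years = 10.0

# Fraction of units still working at 10 years: exp(-t/MTBF) at t = MTBF,
# i.e. only about 37% survive to the "average lifetime".
p_survive_10y = math.exp(-10.0 / mtbf_years)

# Median life: half the units have failed by MTBF * ln(2), about 6.9 years.
median_life_years = mtbf_years * math.log(2)
```

So a customer reading "MTBF = 10 years" as "lasts 10 years" would see roughly two-thirds of units fail before that point, and half fail before year seven.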

So, how do the leaders in the electronics industry handle MTBF?

Step 1: Find a low-grade engineer (rationale: so critical resources are not used in this process)
Step 2: Ask the customer what MTBF they would like. 10 years? 50,000 hours?
Step 3: Various adjustments are made in the MTBF calculations to provide the customer with the exact MTBF they require, plus a few additional thousand hours for a nice margin.

These leaders then move on with best practices in design reviews and verification, supplier assurance, process control, reliability prediction, and life testing to ensure optimum reliability of their product. Warranty performance, as opposed to MTBF, is then predicated upon the previous year’s performance and a methodical assessment, using the best practices detailed above, of the risk entailed with any and all changes in design, manufacturing, and supply chain.

 

DISCLAIMER

DfR represents that a reasonable effort has been made to ensure the accuracy and reliability of the information within this report. However, DfR Solutions makes no warranty, either express or implied, concerning the content of this report, including, but not limited to, the existence of any latent or patent defects, merchantability, and/or fitness for a particular use. DfR will not be liable for loss of use, revenue, profit, or any special, incidental, or consequential damages arising out of, connected with, or resulting from, the information presented within this report.

 
