Physics of failure

Craig Hillman


When I tell people that I am a practitioner of physics of failure, the response is typically predictable: I either get a blank look, or a large smile breaks out as if a landsman has been discovered after many years in the wilderness.

What is physics of failure? In some respects, that is a good question. Historically, there are some nice academic definitions that will tend to put you to sleep. They include:

  • A science-based approach to reliability that uses modeling and simulation to design-in reliability. It helps to understand system performance and reduce decision risk during design and after the equipment is fielded. This approach models the root causes of failure such as fatigue, fracture, wear, and corrosion.
  • An approach to the design and development of reliable product to prevent failure, based on the knowledge of root cause failure mechanisms. The Physics of Failure (PoF) concept is based on the understanding of the relationships between requirements and the physical characteristics of the product and their variation in the manufacturing processes, and the reaction of product elements and materials to loads (stressors) and interaction under loads and their influence on the fitness for use with respect to the use conditions and time.

I have always thought that there are better ways of explaining physics of failure. The first and foremost is common sense. If you want to use something reliably, you’d better know how it fails (and not necessarily rely on a third party, i.e., the supplier). We all hope and pray that the designers of the bridge we are driving over know EXACTLY the mechanical strength of every rivet, bolt, and beam that makes up that structure. And yet, electronics engineering decided a long time ago that this approach was too complicated and too expensive. Instead we have design rules, ratings/derating (see my previous article on that subject) and failure rates.

The reason for the uniquely electronics approach to reliability is threefold. First, most engineers in the electronics world are electrical engineers, but most of the mechanisms that cause failure in electronics are material or mechanical. (Truism: when most people don’t understand something, they tend to ignore it.) Second, the supply chain for electronics can be amazingly complicated compared to other industries. Automotive manufacturers will buy their steel panels directly from the steel plant (sometimes). With electronics, there can be four/five/six levels of suppliers between the person concerned about reliability and the company actually making the part she is concerned about. (If you were awake, you only counted two, right? Keep reading.)

So, how or where to use physics of failure? The best way to use physics of failure is to understand the mechanisms that can cause failure in a ‘defect-free’ technology. I emphasize defect-free primarily to limit the scope of physics of failure (there are literally a billion ways for defects to kill your technology) and to separate reliability activities, which are underrepresented in most engineering organizations, from quality activities, which are often overrepresented (my humble opinion).

Figure 1

Figure 2

Here are two examples of technology-specific mechanisms:

  • Digital Signal Processor
  • Connectors

How do defect-free digital signal processors (DSPs) fail? At the highest level, DSPs fail in two ways: either due to ‘on-die’ mechanisms or due to packaging mechanisms. On-die mechanisms of concern are primarily time-dependent dielectric breakdown (TDDB), electromigration (EM), hot carrier injection (HCI), and negative bias temperature instability (NBTI). For packaging, you need to consider electromigration of the solder bumps (a rare case of a mechanism existing in both worlds), corrosion of the wire bonds, and creep and fatigue of the solder bumps or wire bonds, die attach, and solder balls.

Connectors, however, are a completely different beast. For one, they don’t have die (at least, not yet). Connectors’ biggest concern is the contacts within the part. Contacts can degrade due to stress relaxation of the metal contact, stress relaxation of the housing (polymers also stress relax), corrosion of the contact plating, or fretting corrosion due to differential movement (see image on right).

Each one of these mechanisms should have an algorithm describing the influence of stress and how it drives damage evolution and eventual failure of the technology.

Surprisingly, connectors, which I find to be one of the most common sources of failure, are actually one of the least complete technologies when it comes to physics of failure algorithms.
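To make the idea of a mechanism-level algorithm concrete, here is a minimal sketch for one of the on-die mechanisms named earlier, electromigration. It uses Black's equation, a widely used empirical model in which median life scales as J^(-n) · exp(Ea / kT). The current-density exponent `n`, the activation energy `ea_ev`, and the use and stress conditions are illustrative assumptions, not values from this article; real values depend on the metallization and have to be measured or taken from the supplier.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def em_acceleration_factor(j_use, j_stress, t_use_c, t_stress_c,
                           n=2.0, ea_ev=0.7):
    """Ratio of median life at use conditions to median life at stress
    conditions, per Black's equation: MTTF = A * J**(-n) * exp(Ea / (k*T)).

    n and ea_ev are illustrative assumptions, not values from the article.
    Current densities only need to share a unit; temperatures are in Celsius.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    # Higher current density shortens life by (J_stress / J_use) ** n.
    current_term = (j_stress / j_use) ** n
    # Higher temperature shortens life through the Arrhenius term.
    thermal_term = math.exp(
        (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    )
    return current_term * thermal_term


# Hypothetical test: twice the use current density, 125 C instead of 85 C.
af = em_acceleration_factor(j_use=1.0, j_stress=2.0,
                            t_use_c=85.0, t_stress_c=125.0)
print(f"Electromigration acceleration factor: {af:.1f}")
```

The point is less the specific numbers than the shape of the model: a stress (current density, temperature) goes in, and a statement about time to failure comes out.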

Once you know the mechanisms, then comes the hard part. You need to figure out the environment the product will experience over time. Temperature, humidity, shock, vibration, voltage, power, current, etc. All this stuff needs to be figured out in some detail. A great example is the graph in Figure 1 showing the daily highs and lows in Phoenix, Arizona, in the United States.

Now, you might be asking, why is Phoenix, Arizona a great example? For one, Phoenix is hot. Really hot. And Phoenix is also really cool (side benefit of being in the desert). These high temperatures and large temperature swings (note that almost every day sees a 15°C change in temperature) can do a good job of accelerating different failure mechanisms. But, most importantly, Phoenix is a big city with lots of people who buy lots of electronics. Which means that, for most electronics used outdoors, Phoenix is a good representation of realistic worst-case conditions. You could try to do a prediction for a 15-year lifetime in the Gobi desert, but what’s the point?
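As a rough illustration of how a daily temperature swing feeds a fatigue calculation, the sketch below uses a simplified Coffin-Manson-style relation, in which cycles to failure scale as ΔT^(-n). The 15°C swing and the 15-year lifetime are borrowed from the paragraph above; the 100°C chamber swing and the exponent of 2.0 are assumptions chosen for illustration only. The simplification ignores cycle frequency, dwell time, and peak temperature, all of which matter for real solder joints.

```python
def thermal_cycle_acceleration(delta_t_test_c, delta_t_field_c, exponent=2.0):
    """Simplified Coffin-Manson-style acceleration factor between a test
    cycle and a field cycle. The exponent is an assumed, material-dependent
    value, not one given in the article."""
    return (delta_t_test_c / delta_t_field_c) ** exponent


FIELD_SWING_C = 15.0   # daily swing mentioned in the text
TEST_SWING_C = 100.0   # hypothetical chamber profile (assumption)
LIFETIME_YEARS = 15    # lifetime mentioned in the text

# One thermal cycle per day over the target lifetime.
field_cycles = LIFETIME_YEARS * 365

af = thermal_cycle_acceleration(TEST_SWING_C, FIELD_SWING_C)
equivalent_test_cycles = field_cycles / af

print(f"Field cycles over {LIFETIME_YEARS} years: {field_cycles}")
print(f"Acceleration factor: {af:.1f}")
print(f"Equivalent test cycles: {equivalent_test_cycles:.0f}")
```

A real analysis would use the measured distribution of daily swings rather than a single value, since the larger swings contribute disproportionately to the damage.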

Once you know your mechanisms and your environment, you are ready to go. Except, most companies have a slight problem. As seen in the previous discussion, there can be three, five, seven or even more failure mechanisms per part, and there can be hundreds if not thousands of parts on a board. With most equations requiring a half-dozen different parameters, a physics of failure practitioner could be looking at gathering over 50,000 pieces of data to perform 10,000 different calculations.
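The bookkeeping behind those totals is simple multiplication, sketched below. The part count and mechanisms-per-part values are hypothetical, picked from within the ranges above so that the totals land on the same order of magnitude as the figures quoted; only the half-dozen parameters per equation comes directly from the text.

```python
from dataclasses import dataclass


@dataclass
class BoardScope:
    """Size of a physics of failure analysis for one board."""
    parts: int
    mechanisms_per_part: int
    parameters_per_model: int

    @property
    def calculations(self) -> int:
        # One life calculation per (part, mechanism) pair.
        return self.parts * self.mechanisms_per_part

    @property
    def data_points(self) -> int:
        # Each calculation needs its own set of input parameters.
        return self.calculations * self.parameters_per_model


# Hypothetical board: 2,000 parts, 5 mechanisms each, 6 parameters per model.
board = BoardScope(parts=2000, mechanisms_per_part=5, parameters_per_model=6)
print(f"{board.calculations:,} calculations, {board.data_points:,} data points")
```

In practice many parts share a package type or material set, so the number of unique inputs is smaller than the raw product, but the raw product is what a practitioner faces before any such grouping is done.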

And this, more than anything else, is what has driven physics of failure into a closeted activity. It’s not that the industry does not see value in the practice. It’s just that the requirements to perform such an activity have led the industry to find effective substitutions and to limit physics of failure to technologies of concern or to periods of radical change in design or materials.

The eventual goal of every high reliability organization should be life curves for every relevant technology (remember, not every technology degrades over time). This, far beyond our current approach, would create electronics as safe and effective as that bridge we drive over every day.

Craig Hillman is CEO and Managing Member for DfR Solutions. Dr. Hillman’s specialties include best practices in Design for Reliability (DfR), strategies for transitioning to Pb-free, supplier qualification (commodity and engineered products), passive component technology (capacitors, resistors, etc.), and printed board failure mechanisms. Dr. Hillman has over 40 publications and has presented on a wide variety of reliability issues to over 250 companies and organizations.