
Overview of New DoD Reliability Revitalization Initiatives

James G. McLeish, CRE, DfR Solutions


ABSTRACT

Since the mid-1990s, the U.S. Department of Defense (DoD) has recognized two disturbing trends. First, the percentage of new systems failing to meet reliability requirements was increasing, resulting in costly delays and redesign activities. Second, the cost of supporting fielded systems was also increasing as durability and reliability performance declined. A number of initiatives are now under way to reverse these trends. This paper summarizes several of the key initiatives that have been announced in the press and at the 2010 RAMS conference, including:

  • The reliability-related portions of the Weapon Systems Acquisition Reform Act, defining acquisition policy updates designed to strengthen oversight and accountability [1];
  • Revitalization of the systems and reliability engineering processes being institutionalized to reduce risk [2, 3, 4];
  • A Reliability Program Scorecard tool developed to standardize and assist new programs in applying reliability best practices and to track planned and completed reliability tasks [5];
  • Reliability initiatives currently under development;
  • The AVSI Reliability Prediction Technology Roadmap;
  • Proposals for resolving the limitations of actuarial reliability prediction methods by updating MIL-HDBK-217 to include science-based Physics of Failure (PoF) reliability modeling and simulation methods [6, 7].

Key Words: Reliability Assurance, Reliability Assessment, Design for Reliability, Physics of Failure, Reliability Physics

INTRODUCTION

An unintended consequence of DoD acquisition reforms and cost reduction efforts during the 1990s was a reduction in the rigor and effectiveness of sustainment planning, systems engineering, and reliability assurance throughout materiel development programs. In the early 2000s, the DoD acquisition community began to recognize and document two disturbing trends in defense acquisition programs: (1) an increasing percentage of systems not meeting their reliability requirements, and (2) the cost of supporting fielded systems running higher than expected. Two major reports recommended significant changes in acquisition policies to address these issues.

The first, “Setting Requirements Differently Could Reduce Weapon Systems’ Total Ownership Costs,” a Government Accountability Office (GAO) report [2], concluded that the requirements generation process should:

  • Include “…total ownership cost, especially operating and support cost, and weapon system readiness rates as performance parameters equal in priority to any other performance parameters for major system before beginning the acquisition program”;
  • Require “…the product developer to establish a firm estimate of a weapon system’s reliability based on demonstrated reliability rates at the component and subsystem level”;
  • Structure “…contracts for major systems acquisitions so that…the product developer has incentives to ensure that proper trade offs are made between reliability and performance prior to the production decision.”

The second, the “Report of the Defense Science Board (DSB) Task Force on Developmental Test and Evaluation” [8], examined the causes of the growing trend toward unsuitable systems and degraded reliability performance. The DSB report’s related recommendation concluded that:

  • “The single most important step necessary to correct high suitability failure rates is to ensure programs are formulated to execute a viable systems engineering strategy from the beginning, including a robust RAM program, as an integral part of design and development. No amount of testing will compensate for deficiencies in RAM program formulation.”

The DoD chartered the Reliability Improvement Working Group (RIWG) to implement the recommendations of the DSB. The RIWG membership was drawn from stakeholders across the DoD, and a number of key policy changes were derived from its recommendations [4]. Some of these reforms have been inscribed into law by congressional legislation to ensure they become permanent DoD policy; others are being developed and implemented under separate initiatives. The remainder of this paper reviews some of the key reforms that have been announced to date.

WEAPON SYSTEMS ACQUISITION REFORM ACT

The Weapon Systems Acquisition Reform Act (WSARA) [1] was originally called the “Weapon Acquisition System Reform Through Enhancing Technical Knowledge and Oversight Act.” It is a bipartisan Act of Congress cosponsored by Sen. Carl Levin (D-MI), Chairman of the Senate Armed Services Committee, and Sen. John McCain (R-AZ), ranking member of the Senate Armed Services Committee. The Act was based on the 2008 Defense Science Board Task Force report on Developmental Test and Evaluation and the report of the RIWG, which cited the need to reform the way the DoD contracts for and purchases major weapon systems.

The Act was introduced Feb. 23, 2009, and quickly passed both the Senate and House unanimously (93-0 and 411-0). It was signed into law on May 22, 2009 by President Obama, who cited the need to end the "waste and inefficiency" in defense acquisition. The need for such reforms was clearly demonstrated by external audits, such as a Government Accountability Office evaluation of 95 major defense projects that uncovered cost overruns totaling $295 billion. “It will strengthen oversight and accountability by appointing officials who will be charged with closely monitoring the weapons systems we're purchasing” [9]. The Congressional Budget Office estimates that the new reforms will cost about $55 million and should be in place by the end of 2010. The reforms are expected to save millions, if not billions, of dollars over the next decade [10].

The reliability-related portions of the Act are intended to reverse the 20-year trend of system development shortcuts in DoD acquisition processes, including reductions in the reliability and acceptance test workforce, that have resulted in excessive cost overruns and delays in fielding weapon systems. The reliability-related items and objectives of the key sections of the Act, summarized below, are designed to revitalize (or institutionalize) up-front systems engineering, total lifetime planning, competent design analysis, and testing while improving program and cost oversight:

TITLE I--ACQUISITION ORGANIZATION

  • Sec. 101. Cost assessment and program evaluation.
  • Sec. 102. Directors of Developmental Test and Evaluation and Systems Engineering.
  • Sec. 103. Performance assessments and root cause analyses.
  • Sec. 104. Assessment of technological maturity.
  • Sec. 105. Role of the combatant command commanders in identifying joint military requirements.

TITLE II--ACQUISITION POLICY
  • Sec. 201. Trade-offs among cost, schedule, and performance objectives.
  • Sec. 202. Strategies to ensure competition.
  • Sec. 203. Prototyping requirements.
  • Sec. 204. Actions to identify and address systemic problems in major defense acquisition programs.
  • Sec. 205. Additional requirements for certain major defense acquisition programs.
  • Sec. 206. Critical cost growth in major defense acquisition programs.
  • Sec. 207. Organizational conflicts of interest in major defense acquisition programs.

TITLE III--ADDITIONAL ACQUISITION PROVISIONS
  • Sec. 301. Awards for Department of Defense personnel for excellence in acquisition.
  • Sec. 302. Earned value management.
  • Sec. 303. Expansion of national security objectives of the national technology and industrial base.
  • Sec. 304. Comptroller General reports on costs and financial information of major defense acquisition programs.

REVITALIZING SYSTEMS & SUSTAINMENT

One of the main reliability effects of WSARA is the codification of key findings and recommendations in the GAO report [2, 11] regarding incorporating sustainment planning into the systems engineering process, especially in the area of Total Ownership Cost analysis and control from the earliest program activities. The DSB 2008 Developmental Test and Evaluation report identified the importance of establishing a viable systems engineering process at the beginning of programs [8]. Unfortunately, the DoD and the services systematically dismantled systems engineering activities beginning in the early 1990s, and revitalization efforts in the reliability arena are incomplete.

This issue is addressed in WSARA by requiring the DoD to (1) evaluate whether the systems engineering, development planning, lifecycle management, and sustainability capabilities needed to ensure that key acquisition decisions are supported by rigorous systems analysis and systems engineering processes are in place, and (2) establish the organizations and develop the skilled employees needed to fill any gaps in such capabilities [12]. Similar capability evaluations and corrective actions are specified for test and evaluation activities.

Re-establishing these capabilities, along with other tasks, is the responsibility of the new Director of Developmental Test and Evaluation (D,DT&E) and Director of Systems Engineering (D,SE), positions that were created within the Office of the Secretary of Defense (OSD) by WSARA.

RELIABILITY PROGRAM SCORECARD [5]

One problem that often faces government source selection teams is how to evaluate the ability of an offeror to develop a “viable RAM strategy.” The RIWG adapted an Army tool, the Army Materiel Systems Analysis Activity (AMSAA) Scorecard, to help assess both the program office’s and the contractor’s system reliability efforts. The scorecard has focus areas for reliability requirements planning, reliability testing, failure tracking and reporting, and reliability verification and validation. When applied early in a program, the scorecard can identify areas for improvement before significant problems emerge. There are 40 elements organized into the following eight evaluation categories:

  • Requirements and Planning
  • Training and Development
  • Reliability Analysis
  • Reliability Testing
  • Supply Chain Management
  • Failure Tracking and Reporting
  • Verification and Validation
  • Reliability Improvements

The scorecard examines a supplier’s use of reliability best practices, as well as the supplier’s planned and completed reliability tasks. A risk assessment score is calculated from the individual reliability risk ratings assigned to each element. Each element is given a color-coded risk rating of high (red), which is allotted a score of 3; medium (yellow), which equals 2; low risk (green), which is rated as 1; or not evaluated (gray). The elements are weighted and normalized to produce a risk score for each of the eight evaluation categories, and these category scores are combined into the overall program score. Elements that are not evaluated are removed from the risk score calculations. The risk scores for each element are adjusted by weighting factors to produce an overall reliability risk normalized to a value between 1 and 100. A low score equates to a low reliability risk and a high score indicates a high risk, correlated to a visual green (go), yellow (caution), and red (stop) scale. A simplified sketch of this scoring arithmetic follows.
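The short Python sketch below illustrates the weighted, normalized risk arithmetic described above. The element names, weights, and ratings are hypothetical placeholders, not actual AMSAA scorecard content; only the 1/2/3 scoring, exclusion of non-evaluated elements, and normalization to a 1-100 scale follow the description in the text.

    # Illustrative sketch of the scorecard scoring scheme (hypothetical data).
    RATING_SCORES = {"low": 1, "medium": 2, "high": 3}   # green / yellow / red

    def category_risk(elements):
        """elements: list of (rating, weight); rating is None if not evaluated."""
        scored = [(RATING_SCORES[r], w) for r, w in elements if r is not None]
        if not scored:
            return None                                   # whole category not evaluated
        worst = 3 * sum(w for _, w in scored)             # all-red reference point
        best = 1 * sum(w for _, w in scored)              # all-green reference point
        actual = sum(s * w for s, w in scored)
        return 1 + 99 * (actual - best) / (worst - best)  # normalize to 1..100

    program = {                                           # hypothetical ratings/weights
        "Requirements and Planning": [("low", 2.0), ("medium", 1.0), (None, 1.0)],
        "Reliability Analysis":      [("high", 1.5), ("medium", 1.0)],
        "Reliability Testing":       [("low", 1.0), ("low", 1.0)],
    }

    scores = {cat: category_risk(elems) for cat, elems in program.items()}
    evaluated = [s for s in scores.values() if s is not None]
    overall = sum(evaluated) / len(evaluated)             # simple average of categories

    for cat, s in scores.items():
        print(f"{cat:28s} risk = {s:5.1f}" if s is not None else f"{cat:28s} not evaluated")
    print(f"Overall program risk score   = {overall:5.1f}  (low score = low risk)")

The actual scorecard's category weighting and roll-up rules may differ; the sketch only shows how individual ratings flow into a normalized 1-100 risk number.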

[Figure 1]

The scorecard provides an early evaluation of RAM capabilities in an acquisition program while helping to identify reliability gaps. It provides a guide to help program offices and contractors think about reliability early in the acquisition process and throughout the program’s life cycle. The Reliability Program Scorecard is a valuable tool for evaluating risks and making an early initial reliability projection for a program. A copy of the scorecard can be obtained from the DoD Reliability Information and Analysis Center’s web site, http://www.theriac.org/ under “AMSAA Reliability Growth Tools and Scorecard”.

RELIABILITY INITIATIVES CURRENTLY UNDER DEVELOPMENT

A number of initiatives to permanently institutionalize reliability activities into major defense acquisition programs were defined under WSARA.

Reliability-related duties of the new Director of Systems Engineering position include using systems engineering approaches to enhance reliability, availability, and maintainability (RAM) and ensuring that provisions related to reliability growth are included in requests for proposals for major defense acquisition programs.

Acquisition executives of the military departments and defense agencies that oversee major defense acquisition programs must ensure that their organizations provide the Systems Engineering (SE) and Developmental Test and Evaluation (DT&E) organizations with appropriate resources and trained personnel to:

  • Define a robust RAM and sustainability program as an integral part of design and development within the systems engineering master plan;
  • Identify systems engineering lifecycle management, RAM, and sustainability requirements and incorporate them into contract requirements;
  • Define appropriate developmental testing requirements;
  • Participate in the planning of DT&E activities;
  • Participate in and oversee DT&E activities, test data analysis, and test reports.

To develop plans for implementing these initiatives, the Under Secretary of Defense for Acquisition, Technology, & Logistics (AT&L) directed the new DoD Director of Systems Engineering to convene a Reliability Senior Steering Group (RSSG) in April 2010. Subordinate Reliability Working Group (RWG) teams were organized and tasked to address reliability-related issues regarding policy, people, and practice across the DoD. These teams have actively developed recommendations regarding the reconstitution of DoD reliability, test, and sustainment-related policy, skills, and processes. Related policy and guidance announcements are expected in the near future.

AVSI RELIABILITY ROADMAP

The Aerospace Vehicle Systems Institute (AVSI) is a Texas A&M University research cooperative of aerospace companies, the DoD, and the Federal Aviation Administration that works to improve aerospace vehicles, their components, systems, and development processes. The AVSI has undertaken a project to chart the future of reliability research by developing a reliability technology road map for electronics reliability assessment practices.

The AVSI team is applying a Quality Function Deployment (QFD) process to analyze and prioritize the experiences and recommendations of a large number of industry experts in order to define future reliability methods and research. QFD is a widely used tool for sorting out, identifying, and prioritizing the requirements of complex issues in order to transform user needs and demands into criteria and plans that incorporate key characteristics from the viewpoint of potential end users.

This project has so far generated a prioritized “wish list” of 64 reliability assessment features that are evaluated against 25 needs criteria. The next step is to evaluate how well existing or near-term reliability assessment/prediction methodologies or tools fulfill the objectives of the wish list items. Current or near-term reliability prediction methods and tools that are determined to adequately fulfill wish list needs and features will be identified as easily achievable “low hanging fruit” items and recommended as suitable for immediate use in current and new programs. A simplified sketch of this kind of QFD-style scoring follows.
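The sketch below shows the general shape of a QFD-style prioritization. The needs criteria, weights, wish-list features, and relationship ratings are invented placeholders, not the AVSI study's actual 25 criteria or 64 features; only the weighted-scoring mechanics are illustrated.

    # Hypothetical QFD-style scoring: needs criteria weight how strongly each
    # candidate wish-list feature supports the end users' needs.
    needs = {                         # criterion -> importance weight (illustrative)
        "improves field correlation": 5,
        "usable early in design":     4,
        "low data-collection burden": 2,
    }

    # Relationship strength of each feature to each need (0/1/3/9 is a common QFD scale).
    features = {
        "PoF wearout simulation": {
            "improves field correlation": 9,
            "usable early in design": 9,
            "low data-collection burden": 1,
        },
        "updated actuarial rate tables": {
            "improves field correlation": 3,
            "usable early in design": 3,
            "low data-collection burden": 9,
        },
    }

    for name, rel in features.items():
        priority = sum(needs[c] * rel.get(c, 0) for c in needs)
        print(f"{name:32s} priority = {priority}")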

The wish list items that cannot be easily implemented in the short term will be evaluated to estimate the effort needed to realize them. These items will be identified as near-term and long-term recommended efforts to help influence the future direction of reliability research. The Reliability Roadmap results will be provided to the DoD for reliability research planning and future revisions to military and aerospace handbooks and processes, and to help direct future industry reliability research and development.

UPDATING MIL-HDBK-217 RELIABILITY PREDICTION METHODS TO INCLUDE PoF

The DoD’s Defense Standardization Program Office (DSPO) has initiated a multi-phase effort to update MIL-HDBK-217, the military’s often imitated and frequently criticized reliability prediction “bible” for electronics equipment, which has not been updated since 1995 [4, 12]. There are numerous concerns about the actuarial reliability prediction methods defined in MIL-HDBK-217, which have been covered thoroughly elsewhere [14, 15]. The main concerns are:

1) Currently, 217 predictions are based on constant failure rates, which model only random failure situations and do not account for infant mortality and wearout issues. Tabulation errors, in which infant mortality and wearout failures are tallied as random failures, are another risk of this scheme and can produce significant inaccuracy in the predicted failure rates.

2) Actuarial reliability predictions typically correlate poorly to actual field performance since they do not account for the physics or mechanics of failure. Hence, they cannot provide insight for controlling actual failure mechanisms, and they are incapable of evaluating new technologies that lack a field history on which to base projections.

3) The models are based on industry-wide average failure rates that are not vendor, device, or event specific, and MTBF results provide no insight into the starting point, growth rate, or distribution range of true failure trends. The MTBF concept is also often misinterpreted by people without formal reliability training (see the worked example after this list).

4) Overemphasis on the Arrhenius model and steady-state temperature as the primary factor in electronic component failure, while the roles of key stress factors such as temperature cycling, humidity, vibration, and shock are not modeled [15, 16, 17, 18].

5) Overemphasis on component failures, when 78% of electronic failures are due to other issues that are not modeled, such as design errors, PCB assembly defects, solder and wiring interconnect failures, PCB insulation resistance and via failures, software errors, etc. [19].

6) The last 217 update was in 1995; new components, technology advancements, and quality improvements developed since then are not reflected in the current actuarial data tables [15, 20].
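The short sketch below illustrates concerns 1, 3, and 4. Under the constant-failure-rate (exponential) model that underlies MTBF, only about 37% of units survive to the MTBF point, a common source of misinterpretation; and a single-point Arrhenius acceleration factor contains no terms for temperature cycling, vibration, humidity, or shock. All numeric values (MTBF, activation energy, temperatures) are illustrative, not values from MIL-HDBK-217.

    import math

    # Concerns 1 and 3: with a constant failure rate lambda, MTBF = 1/lambda and
    # reliability is R(t) = exp(-t/MTBF).  The 10,000 h MTBF is illustrative.
    mtbf_hours = 10_000.0
    for t in (1_000, 5_000, 10_000):
        print(f"R({t:>6} h) = {math.exp(-t / mtbf_hours):.3f}")
    # -> R(10000 h) ~ 0.368: roughly a third of units survive to the MTBF,
    #    and neither infant mortality nor wearout appears anywhere in the model.

    # Concern 4: the Arrhenius model ties acceleration only to steady-state
    # temperature.  Ea and the temperatures are illustrative assumptions.
    K_B = 8.617e-5                  # Boltzmann constant, eV/K
    Ea = 0.7                        # activation energy, eV (illustrative)
    T_use, T_stress = 328.0, 358.0  # 55 C and 85 C, in kelvin
    af = math.exp((Ea / K_B) * (1.0 / T_use - 1.0 / T_stress))
    print(f"Arrhenius acceleration factor (55 C -> 85 C, Ea = {Ea} eV): {af:.1f}")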

In addition to the current effort to create 217 revision G, which entails a simple update of the actuarial failure rate tables, the team also developed a proposal for a future revision H in which improved and more holistic empirical MTBF models could be used for comparative evaluations during a program’s acquisition and supplier selection activities. Later, science-based Physics of Failure (PoF) reliability modeling combined with probabilistic mechanics techniques would be used during the actual system design and development phase to evaluate and optimize the stress and wearout limitations of a design, fostering the creation of highly reliable, robust systems.

The Physics of Failure approach (also known as Reliability Physics) applies analysis early in the design process to predict the reliability and durability of specific design alternatives in specific applications. This provides knowledge that enables designers to make design and manufacturing choices that minimize failure opportunities in order to produce reliability-optimized, robust products.

PoF focuses on understanding the cause-and-effect physical processes and mechanisms that cause degradation and failure of materials and components [21]. It is based on analyzing the loads and stresses in an application and evaluating the ability of the materials to endure them from a strength and mechanics-of-materials point of view. This approach, known as load-to-strength interference analysis, has been used for centuries in mechanical, structural, construction, and civil engineering; it integrates reliability into the design activity via a science-based process for evaluating materials, structures, and technologies (a minimal interference calculation is sketched after the list below). In PoF, failures are organized into three categories:

1) Overstress Failures, such as yield, buckling, and electrical surges, occur when the stresses of the application rapidly or greatly exceed the strength of a device’s materials. This causes immediate or imminent failure.

2) Wearout Failures are defined as stress-driven damage accumulation in materials, which includes failure mechanisms such as fatigue and corrosion.

3) Errors and Excessive Variation issues comprise the PoF view of infant mortality. Opportunities for error and variation touch every aspect of design, supply chain, and manufacturing processes. These issues are the most diverse and challenging of the PoF categories. The diverse, random, and stochastic events involved cannot be modeled using a deterministic PoF cause and effect approach. However, reliability improvements are still possible when PoF knowledge and lessons learned are used to implement error proofing and select capable manufacturing processes that ensure robustness [22].
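As a minimal illustration of the load-to-strength interference idea mentioned above, the sketch below treats both the applied stress and the material strength as normally distributed and computes the probability that stress exceeds strength. The distributions and numbers are illustrative assumptions, not values from any PoF handbook.

    import math

    def overstress_probability(mu_stress, sd_stress, mu_strength, sd_strength):
        """P(stress > strength) assuming both are independent normal variables."""
        margin_mean = mu_strength - mu_stress          # mean of (strength - stress)
        margin_sd = math.sqrt(sd_stress**2 + sd_strength**2)
        z = margin_mean / margin_sd
        # P(margin < 0) via the standard normal CDF
        return 0.5 * math.erfc(z / math.sqrt(2.0))

    # Illustrative example: board-level strain during a shock event vs. the
    # strain-to-failure of a solder interconnect (units and values hypothetical).
    p_fail = overstress_probability(mu_stress=600.0, sd_stress=100.0,
                                    mu_strength=1000.0, sd_strength=150.0)
    print(f"Interference (overstress) probability: {p_fail:.4f}")

The same interference logic underlies more detailed PoF analyses; real assessments replace the assumed normal distributions with measured or simulated load and strength data.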

The proposed PoF circuit card assembly section defines four categories of analysis (see Figure 2) that can be performed with currently available Computer Aided Engineering (CAE) tools. This methodology is aligned with the analysis, modeling, and simulation methods recommended in Section 8 of SAE J1211, Handbook for Robustness Validation of Automotive Electrical/Electronic Modules [23]. The four categories are:

1) E/E Performance and Variation Modeling is used to evaluate whether stable E/E circuit performance objectives are achieved under static and dynamic conditions, including tolerancing and drift concerns.

2) Electromagnetic Compatibility (EMC) and Signal Integrity Analysis is used to evaluate whether an electronic assembly generates, or is susceptible to disruption by, electromagnetic interference and whether the transfer of high-frequency signals is stable.

3) Stress Analysis is used to assess the ability of electronic packaging structures to maintain structural and circuit interconnection integrity, maintain a suitable environment for E/E circuits to function reliably, and determine if a device is susceptible to overstress failures [24].

4) Wearout Durability and Reliability Modeling uses the results of the stress analysis to predict the long-term stress aging/stress endurance, gradual degradation, and wearout capabilities of a CCA [24]. Results are provided in terms of time to first failure and the expected failure distribution, as an ordered list (1st, 2nd, 3rd, etc.) of the devices, features, failure mechanisms, and sites where failures are most likely to occur.

These techniques provide a multi-discipline virtual engineering prototyping process for early identification of design weaknesses and susceptibilities to failure mechanisms, and for predicting reliability at a stage when improvements can still be implemented readily and at low cost. A minimal sketch of the wearout modeling in category 4 follows.
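The sketch below gives the flavor of the category-4 wearout calculation using a generic Coffin-Manson style relationship between temperature-cycle range and solder-joint cycles-to-failure. The exponent and the reference life are illustrative placeholders, not values from MIL-HDBK-217, SAE J1211, or any specific solder fatigue model.

    # Generic inverse-power-law (Coffin-Manson style) thermal-cycling life scaling.
    def cycles_to_failure(delta_t, n_ref, delta_t_ref, exponent=2.5):
        """Scale a known reference life (n_ref cycles at delta_t_ref) to another
        temperature-cycle range delta_t.  All parameters are illustrative."""
        return n_ref * (delta_t_ref / delta_t) ** exponent

    # Assumed reference point: 3,000 cycles to failure at a 100 C lab cycle.
    for dt in (40.0, 60.0, 100.0):
        nf = cycles_to_failure(dt, n_ref=3_000.0, delta_t_ref=100.0)
        print(f"Delta T = {dt:5.1f} C -> predicted cycles to failure ~ {nf:,.0f}")

Real CCA wearout models add package geometry, material properties, and damage accumulation across the mission profile; the point here is only how stress analysis results feed a durability estimate.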

CONCLUSIONS

The 1990s-era attempts at up-front cost reduction in DoD acquisition programs, through reductions in systems engineering, RAM, development testing, and sustainment planning efforts, resulted in large development cost overruns, excessive maintenance burdens, and increased support costs in a number of defense systems. The realization of the long-term consequences of these policies has led DoD leadership to institute a policy and guidance reversal to revitalize these capabilities and return to a leadership role in systems engineering, system assurance, RAM, and sustainment technologies and methods. Some of these policy changes have been incorporated into law by the Weapon Systems Acquisition Reform Act of 2009. The initiatives reported in this paper are only the starting point of the revitalization effort. The DoD Reliability Senior Steering Group (RSSG) and its Reliability Working Groups (RWGs) are diligently and rapidly working to refine these efforts, develop additional initiatives, and launch activities to implement them. Announcements of the next steps are expected in late 2010 or early 2011.

[Figure 2]

BIOGRAPHY

James G. McLeish, CRE

DfR (Design for Reliability) Solutions

5110 Roanoke Place, Suite 101

College Park, Maryland 20740 – USA

e-mail: jmcleish@dfrsolutions.com

Mr. McLeish holds a dual EE/ME master’s degree in vehicle E/E control systems. He is a Certified Reliability Engineer and a core member of the Society of Automotive Engineers Reliability Standards Workgroup, with over 32 years of automotive and military electrical/electronics experience. He started his career as a practicing electronics design engineer who helped invent the first microprocessor-based engine computer at Chrysler Corp. in the 1970s. He has since worked in systems engineering, design, development, product validation, reliability, and quality assurance of both E/E components and vehicle systems at General Motors and GM Military. He is credited with the introduction of Physics-of-Failure methods to GM while serving as an E/E Reliability Manager and E/E QRD (Quality/Reliability/Durability) Technology Expert. Since 2006, Mr. McLeish has been a partner and manager of the Michigan office of DfR Solutions, a quality/reliability engineering consulting and laboratory services firm formed by senior scientists and staffers from the University of Maryland’s CALCE Electronic Products and Systems Center. DfR Solutions is a leader in providing PoF science and expertise to the global electronics industry.

REFERENCES

[1] The Weapon Systems Acquisition Reform Act of 2009. Public Law 111-23. 2009. Washington, DC: U.S. Congress.

[2] “Setting Requirements Differently Could Reduce Weapon Systems’ Total Ownership Costs.” GAO-03-57. Washington, DC: Government Accountability Office, Feb. 2003

[3] P. M. Dallosta, U.S. Defense Acquisition University, “The Impact of Changes in DoD Policy on Reliability and Sustainment”, RAMS 2010

[4] Final Report of the Reliability Improvement Working Group (RIWG). 2008. Washington DC.

[5] M. H. Shepler, N. Welliver, USAMSAA, “New Army and DoD Reliability Scorecard,” RAMS 2010.

[6] L. Gullo, “The Revitalization of MIL-HDBK-217,” IEEE Reliability Newsletter, Sept. 2008. http://www.ieee.org/portal/cms_docs_relsoc/relsoc/Newsletters/Sep2008/Revitalization_MIL-HDBK-217.htm

[7] J. G. McLeish, “Enhancing MIL-HDBK-217 Reliability Predictions with Physics of Failure Methods,” RAMS 2010.

[8] Report of the Defense Science Board Task Force on Developmental Test and Evaluation, U.S. Dept. of Defense, May 2008.

[9] White House press release, “Remarks by the President at the Signing of the Weapon Systems Acquisition Reform Act,” May 22, 2009.

[10] R. Lake, “Weapon bill passes House, goes to Obama,” MSNBC, May 21, 2009. http://firstread.msnbc.msn.com/_news/2009/05/21/4426481-weapons-bill-passes-house

[11] G.R. Schmieder, “Reintegration of Sustainment into Systems Engineering During the DoD Acquisition Process”, RAMS 2010

[12] Summary of the Weapon Systems Acquisition Reform Act of 2009, Senator C. Levin, press release #308525, Feb. 24, 2009

[13] G. F. Decker, “Policy on Incorporating a Performance Based Approach to Reliability in RFPs,” Dept. of the Army Memo, Feb. 15, 1995.

[14] F. R. Nash, “Estimating Device Reliability: Assessment of Credibility,” AT&T Bell Labs/Kluwer Publishing, 1993.

[15] M. Pecht, “Why the Traditional Reliability Prediction Models Do Not Work - Is There an Alternative?,” Electronics Cooling, Vol. 2, pp. 10-12, January 1996.

[16] M. Osterman, “We Still Have a Headache with Arrhenius,” Electronics Cooling, pp. 53-54, Feb. 2001.

[17] M. Pecht, P. Lall, E. Hakim, “Temperature as a Reliability Factor,” 1995 Eurotherm Seminar No. 45: Thermal Management of Electronic Systems, pp. 36.1-22.

[18] M. Ohring, “Reliability & Failure of Electronic Materials & Devices,” Ch. 4.5.8, “Is Arrhenius Erroneous,” Academic Press, San Diego, CA, 1998.

[19] D. D. Dylis, M. G. Priore, “A Comprehensive Reliability Assessment Tool for Electronic Systems,” IIT Research/Reliability Analysis Center, Rome, NY, RAMS 2001.

[20] “PRISM vs. commercially available prediction tools,” RIAC Admin Posting #558, May 17, 2007, RIAC.ORG, http://www.theriac.org/forum/showthread.php?t=12904

[21] R. Alderman, “Physics of Failure: Predicting Reliability in Electronic Components,” Embedded Technology, July 2009.

[22] S. Salemi, L. Yang, J. Dai, J. Qin, J. B. Bernstein, Physics-of-Failure Based Handbook of Microelectronic Systems, Defense Technical Information Center/Air Force Research Lab Report, U. of MD & RIAC, Utica, NY, Mar. 2008.

[23] SAE J1211, “Handbook for Robustness Validation of Automotive E/E Modules,” Section 8, Analysis, Modeling and Simulations, SAE, April 2009.

[24] S. A. McKeown, Mechanical Analysis of Electronic Packaging Systems, Marcel Dekker, New York, 1999.