Are your FMEAs early enough and robust enough to prevent the need for late-project fire-fighting?
Can your FMEA efforts be evolved as your projects evolve?
Although the aerospace industry has been (until recently perhaps) exceptional at not shipping defective products, that does not mean they are not plagued with costs due to the late detection of failure modes. Commercial data tends not to be published, but the US Government Accountability Office does report to Congress on defense acquisitions...
In 2009, they reported that the 96 major defense programs had a cumulative cost overrun of $296B (over $3B per program) and an average schedule overrun of 22 months. They agreed with the DOD on the sources of the problem, including inadequate knowledge, excessive requirements, an imbalance between wants and needs, and permitting programs to move forward despite those shortcomings. They called for programs to adopt "a knowledge-based approach [...] that demonstrates high levels of knowledge before significant commitments are made." [GAO2009]
Despite concerted efforts to contain such cost and schedule growth since then, in 2016 (seven years later), the GAO reported there were 79 major defense programs with a cumulative cost overrun totaling $469B (over $5.9B per program) and average schedule overrun of 29.5 months. They reiterated their calls for knowledge-based product development saying "knowledge supplants risk over time". [GAO2016]
We suggest that it is not for lack of trying; after all, most (if not all) of the people on those 79 programs would prefer to use knowledge to supplant risk. The problem is that existing techniques (such as traditional FMEA) are not able to deliver the required knowledge before key decisions must be made on these programs. The GAO should instead be calling for the programs to adopt set-based (rather than point-based) methodologies to accelerate learning while delaying decision-making until the required knowledge is available.
ABOUT THE SITUATION
Traditional FMEA Happens Late
Given the astronomical costs of product failures in the field (the 737 MAX failures will likely cost Boeing well over $18B), the aerospace industry has been very good at ensuring it doesn’t ship products with catastrophic flaws. One of the key tools for that since the 1960s has been Failure Mode & Effects Analysis (FMEA). In the 1970s, the automotive industry adopted FMEA standards for its supply chains and has seen similar benefits.
That FMEA process generally started with the detailed designs, against which Design FMEAs (DFMEAs) were performed, and then continued once the manufacturing process flows had been developed, against which Process FMEAs (PFMEAs) were performed. Although not nearly as expensive as failures in the field, failure modes and effects identified that late in the development cycle were still hugely expensive to fix; sometimes so expensive that the automotive industry would choose not to fix them and instead just deal with the consequences. For the aerospace or medical device industries, where fixing the issues is necessary, it would mean major project delays and cost overruns, often measured in the tens or hundreds of millions of dollars.
As a result, there has long been a push to move FMEAs earlier in the process. However, the traditional FMEA based on detailed drawings and manufacturing process flows could not easily be moved earlier. In response, the automotive industry (specifically, the AIAG and VDA organizations) recently released a new standard FMEA Handbook designed to be more easily moved to the front end.
Worse, trade studies are often done as part of a learning & innovation process, meaning:
Multi-Timing – there may be a series of due dates for different decisions and different levels of convergence of those decisions
Multi-Perspective – different people from different disciplines may be leveraging the same trade study from different perspectives, wanting to see different things
Multi-Result – the same trade study may be leveraged to deliver different results to support those different due dates and different perspectives, not just one answer
Multi-Input/Assumption/Requirement – the team is learning and innovating and thus the inputs to the trade study may be changing, forcing the trade study to evolve with the rest of the project
While we applaud the intent and basic objective, we will argue that there is more needed to truly enable your engineers to conduct effective FMEAs in the early/conceptual phases of your projects...
Similarly, outside the DOD...
The GAO also audits NASA, which is a very different environment than the DOD...
A higher-precision reflective testing device was designed for the manufacturing of the Hubble Space Telescope. A failure in its assembly led to the Hubble's primary mirror being very precisely ground to the wrong shape. Conventional testing devices correctly detected the spherical aberration, but their results were discarded in favor of the higher-precision device. The eventual effect: the Hubble was unable to create sharp images, collected vastly reduced light, and was unable to execute many of its intended uses.
Fixing or replacing the mirror once it was already in space would have been too expensive. So instead, they chose to replace the other optics with ones having the reverse error, effectively cancelling the spherical aberration. But other instrumentation had to be sacrificed to make that possible.
The Hubble was originally budgeted at $400M with launch planned for 1983. That launch was delayed to 1984, then 1986, at which point the budget had grown to over $1.1B. It eventually launched in 1990 with the program budget well over $4.5B (more than ten times the original budget). And that was before the discovery of the mirror failure. The program is now over $10B. (To be fair, that is a relatively small price tag compared to the tremendous discoveries that it has yielded over the last 30 years, and it will likely continue producing discoveries for another 10-20 years. The point here is that it didn't need to cost nearly that much; better FMEA coupled with "Success is Assured" decision-making may have helped tremendously.)
However, much was learned from those project failures, right? Perhaps. But the Hubble successor's development began in 1986; its $824M contract was awarded in 2001 with a $1.6B budget and 2007 projected launch; the James Webb Space Telescope (JWST) was then re-planned in 2005 with a $4.5B budget and a 2011 launch; which was later delayed 22 months to 2013. Primary construction finally completed in 2016.
In March 2018, in a practice deployment, the JWST's sunshield ripped and the associated cables did not tighten sufficiently, delaying launch to May 2020. In June 2018, an independent review was conducted that identified 344 potential failure modes, any one of which could have had devastating effects on the project. As a result, launch was pushed out to March 2021, with total budget now at $8.8B.
It is certainly better that those failure modes were identified when they could be corrected on the ground; but it would have been much better if those had been identified prior to construction; and better yet if those had been identified prior to design such that they could have been designed out in the first place. And it is not that the designers didn't want to identify those failure modes; the issue is in having the right enablers...
ABOUT THE CHALLENGES
Moving FMEA Earlier
There are two key challenges in moving FMEA prior to making the key design decisions:
Analyzing potential failure modes is done most easily and most efficiently when working against something concrete. As a result, FMEA has typically been performed late in the process, after the drawings are available. If the engineers attempt to do the traditional FMEAs earlier by sketching drawings earlier, they will be effectively making key decisions prematurely… causing engineers to jump to solutions… which defeats the goal of moving FMEA prior to making those decisions such that it can improve those decisions. But without drawings, how do we make the FMEA discussions adequately concrete? The standards suggest working against the requirements and functional breakdowns. But functional breakdowns can similarly force premature decisions; and at the same time, functional breakdowns can be inadequately concrete to really make analyzing potential failure modes efficient and effective.
Simply identifying failure modes prior to decision-making is certainly going to have some positive impact on that decision-making. But it won’t have as much impact as it should if there’s no way for the engineers to manage the complex trade-offs that need to be made between the newly-identified potential failure modes and all the existing performance and cost trade-offs that must be considered. Adding those potential failure modes into the mix is dramatically increasing the complexity… the complexity of the decision-making… the complexity of the trade studies… the complexity of the fundamental relationships.
As described by the joint AIAG & VDA standard FMEA Handbook, Failure Mode and Effects Analysis (FMEA) is intended to drive “decisions for further improvement of the product and process” resulting in a “product and process with reduced risk.” The first five steps together provide the basis for the sixth step: Optimization. “The primary objective of Design Optimization is to develop actions that reduce risk and increase customer satisfaction by improving the design.”
When changing design decisions to reduce risk of a specific failure, you may also increase cost, increase weight, reduce performance, increase scrap, or increase risk of other failures. In other words, to make decisions that truly optimize customer satisfaction, your design team needs visibility to the trade-offs that they are making.
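To illustrate what that visibility might look like at its simplest, the sketch below sweeps a single design decision and tabulates the competing interests it affects. Everything in it is a hypothetical placeholder: the decision (a wall thickness), the candidate values, and the three relations `mass_kg`, `cost_usd`, and `failure_risk` are invented for illustration and are not taken from the FMEA Handbook or from any real program.

```python
# Hypothetical example: sweep one design decision and tabulate the
# competing customer interests it affects. All formulas are made up.

def mass_kg(thickness_mm: float) -> float:
    """Assumed relation: mass grows linearly with wall thickness."""
    return 0.8 * thickness_mm

def cost_usd(thickness_mm: float) -> float:
    """Assumed relation: fixed cost plus material cost."""
    return 12.0 + 3.5 * thickness_mm

def failure_risk(thickness_mm: float) -> float:
    """Assumed relation: failure risk falls as thickness rises."""
    return min(1.0, 0.02 / thickness_mm ** 2)

def sweep(thicknesses):
    """Return one row per candidate value of the decision."""
    return [
        {
            "thickness_mm": t,
            "mass_kg": mass_kg(t),
            "cost_usd": cost_usd(t),
            "failure_risk": failure_risk(t),
        }
        for t in thicknesses
    ]

rows = sweep([1.0, 1.5, 2.0, 2.5, 3.0])
for r in rows:
    print(
        f"{r['thickness_mm']:4.1f} mm  {r['mass_kg']:5.2f} kg  "
        f"${r['cost_usd']:6.2f}  risk {r['failure_risk']:.4f}"
    )
```

Reading down the printed table, the risk column falls while the mass and cost columns rise. That is the kind of trade-off the design team needs to see before choosing a value.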
There are other things we humans tend to be weak at which impact effective creation and use of multi-dimensional trade studies:
Uncertainty. We tend to want to insert a single-value best guess. What we should be doing is looking at the full set of possibilities and making sure we're okay with the worst case.
Statistics. We are notoriously bad at properly interpreting probabilities and statistics.
Aggregation. It is well known and widely observed that when we estimate tasks that take a few hours or a few days, we tend to be fairly accurate; but when we estimate tasks that take a few weeks or months, we tend to be quite inaccurate, as we lose sight of the details and the interactions. A similar pattern occurs when estimating the totals for systems of subsystems of components, or any other aggregation.
Combinatorics. We deal well with linear relationships; not so well with exponential ones. For example, if offered $10,000 per day for a month, or a penny the first day, and then double that the next day, and so on for that same month, which would you choose? Most human intuition would take the $10,000 per day and net $310,000. Those who have worked through the math would choose the doubling penny, and net $21,474,836.47.
Human intuition and innovation are incredibly powerful and should be leveraged in all our analysis and decision-making. But they also have weaknesses that we should be well aware of, and we should use our visualization and analysis tools to back-fill, adjust, and auto-correct for them.
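The doubling-penny figures above are easy to check. The short sketch below works in whole cents to avoid rounding, and assumes a 31-day month, which is what the quoted totals imply; the variable and function names are ours, chosen for illustration.

```python
DAYS = 31                          # assumes a 31-day month
FLAT_CENTS_PER_DAY = 10_000 * 100  # $10,000 per day, expressed in cents

def dollars(cents: int) -> str:
    """Format an integer number of cents as a dollar amount."""
    return f"${cents // 100:,}.{cents % 100:02d}"

flat_running = 0
penny_running = 0
crossover_day = None  # first day the doubling offer's running total is ahead

for day in range(1, DAYS + 1):
    flat_running += FLAT_CENTS_PER_DAY
    penny_running += 2 ** (day - 1)  # 1 cent on day 1, doubling each day
    if crossover_day is None and penny_running > flat_running:
        crossover_day = day

print("Flat offer:    ", dollars(flat_running))
print("Doubling penny:", dollars(penny_running))
print("Doubling offer pulls ahead on day", crossover_day)
```

The doubling offer trails for most of the month and only overtakes the flat offer on day 25, which is exactly why intuition misjudges it.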
Image and video courtesy NASA's Goddard Space Flight Center.
ABOUT OUR SOLUTION
Making it Concrete, without Jumping to Solutions
The Decision Map doesn’t just identify the decisions (the circles)… it also identifies how each decision impacts the requirements -- the customer interests. Those impacts are explicitly captured as Relations (the rectangles) between the Decisions. That makes it possible for our Success Assured® software, leveraging its Set-Based computational engine, to compute the trade-offs between the competing customer interests. With those trade-offs made visible in Trade-Off Charts, the engineers have a mechanism for concretely discussing those trade-offs with all the stakeholders. With that mechanism in place for better making those key trade-off decisions between the customer interests, you also have a concrete model that can visually demonstrate the impact that the potential failure modes need to have on the decision-making. The impacts of the failure modes on the customers and their customer interests can be explicitly computed such that the failure modes can be directly factored into the trade-off decision-making.
In fact, by identifying the causal relationships between the various design decisions and the customer interests, you very often expose additional potential failure modes — the engineers can concretely see the causal paths by which the customer interests are being satisfied such that they can identify all the ways those paths could get broken, or could just fail to deliver the level of impact that is needed or expected.
Better yet, the Decision Map explicitly identifies what you need to know to make the trade-offs visible, such that you can make those decisions fully informed. Where there are gaps in the required knowledge, your team can get focused on acquiring that specific knowledge in the most expedient way possible. That effectively accelerates the learning ahead of the decision-making.
NET: The Decision Map is a key enabler for moving FMEA prior to making the key design decisions in such a way that the FMEA can properly impact the trade-off decision-making on complex product designs.
The same characteristics that make the Causal Map the superior visual model for Root Cause Analysis for better optimized Problem-Solving Decision Making also make it the superior visual model for capturing the Failure Chains that are the core construct of the FMEA process. [FMEA Handbook, p50] Those Failure Chains with common elements can be linked together in a Causal Map forming the larger Failure Analysis Structure Tree. [FMEA Handbook, p55] Better yet, that Causal Map can then be evolved to a full Decision Map by extending to the root causes (decisions that you can control) and to the key objectives (decisions on how well you satisfy the customer).
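One way to picture how Failure Chains with common elements link into a larger structure is as a directed graph. The sketch below is an illustration under our own assumptions, not the Success Assured® data model or the Handbook's notation: the chain contents are invented, and `build_causal_map`, `root_causes`, and `end_effects` are hypothetical helper names.

```python
from collections import defaultdict

# Each hypothetical Failure Chain is (cause, failure mode, effect).
chains = [
    ("seal material too stiff", "seal leaks", "fluid loss"),
    ("groove too shallow", "seal leaks", "fluid loss"),
    ("fluid loss", "pump runs dry", "pump seizes"),
]

def build_causal_map(chains):
    """Merge chains into one directed graph; shared names become shared nodes."""
    edges = defaultdict(set)
    for cause, mode, effect in chains:
        edges[cause].add(mode)
        edges[mode].add(effect)
    return edges

def root_causes(edges):
    """Nodes nothing points to: candidates for decisions you can control."""
    targets = {t for ts in edges.values() for t in ts}
    return sorted(n for n in edges if n not in targets)

def end_effects(edges):
    """Nodes with no outgoing edge: what the customer ultimately experiences."""
    targets = {t for ts in edges.values() for t in ts}
    return sorted(t for t in targets if t not in edges)

causal_map = build_causal_map(chains)
print("Root causes:", root_causes(causal_map))
print("End effects:", end_effects(causal_map))
```

Because the three chains share the elements "seal leaks" and "fluid loss", they merge into a single structure whose two ends correspond to the root causes and the customer-facing effects.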
Without the Causal Map being developed into a Decision Map from which Success Assured® can compute Trade-Off Charts and Trade-Off Solvers, the FMEA process has key limitations, which according to the FMEA Handbook [p17] “include the following:
It is qualitative (subjective), not quantitative (measurable)
It is a single-point failure analysis, not a multi-point failure analysis
It relies on the team’s level of knowledge which may or may not predict future performance
It is a summary of the team’s discussions and decisions, therefore the quality of the FMEA report is subject to the recording skills of the team which may reflect the discussion points in whole or in part”
Executing the FMEA process using the Causal Map as a visual model of the Failure Chains and the fuller Failure Analysis Structure Tree requires no extra effort. And it provides a foundation for evolving to a Decision Map in the FMEA Optimization step. That Decision Map provides a quantitative design optimization, and can model multi-point failure modes. By having those Trade-Off Charts and Solvers backed by a Decision Map, the knowledge on which the decisions are being based can be challenged, vetted, and continuously improved such that the resultant quality is far less dependent on the individuals who wrote the report. Further, it provides a framework for rapid Decision-Focused Learning to close the identified knowledge gaps rather than just relying on the team’s current level of knowledge.
Note that the standard FMEA process is focused only on technical risks. Financial risks, time risks, and strategy risks are considered “out of scope”. [FMEA Handbook p15-16] A Decision Map can optionally be extended to trade off all of those concerns when optimizing the decisions.
The Set-Based computational engine enables superior optimization across the entire trade space (all of its infinitely many points):
Enables human-in-the-loop optimization that corresponds better to how people need to make decisions, particularly regarding how customers will react
Exposes worst-case limits and safe vs. unsafe regions for making wise decisions, even in the face of uncertainty
Avoids models that miss key anomalies or catastrophic failure points
Enables "eliminate the weak" optimization, which in turn enables efficient convergent decision-making
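As a rough illustration of the worst-case and "eliminate the weak" ideas in the list above, the sketch below carries ranges rather than single points through a calculation and discards only the options that fail in the worst case. It uses plain interval arithmetic as a stand-in and makes no claim about how the Set-Based computational engine actually works; the `Interval` class, the candidate options, and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A closed range of possible values, used instead of a single best guess."""
    lo: float
    hi: float

def margin(strength: Interval, load: Interval) -> Interval:
    """Strength minus load, carried through as a range."""
    return Interval(strength.lo - load.hi, strength.hi - load.lo)

# Hypothetical candidate options, each with an uncertain strength range.
candidates = {
    "thin wall": Interval(90.0, 120.0),
    "medium wall": Interval(130.0, 160.0),
    "thick wall": Interval(170.0, 210.0),
}
load = Interval(80.0, 125.0)  # uncertain operating load (made-up numbers)

def eliminate_the_weak(candidates, load):
    """Keep only options whose worst-case margin is still positive."""
    return {
        name: strength
        for name, strength in candidates.items()
        if margin(strength, load).lo > 0
    }

survivors = eliminate_the_weak(candidates, load)
for name, strength in survivors.items():
    print(name, "worst-case margin:", margin(strength, load).lo)
```

Note that nothing here picks a winner. The weak option is removed because it can fail somewhere in its range, and the remaining set stays open until more knowledge narrows it further.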
We suggest addressing both of those challenges by introducing what we call the “Decision Map”. The Decision Map doesn’t stop at a functional breakdown. Rather, it takes the requirements that need to be satisfied in performing those functions, and identifies the decisions that need to be made that will impact the satisfaction of those requirements. By explicitly identifying them as decisions-to-be-made, you prevent engineers from prematurely making those decisions by jumping to solutions to make things concrete. Rather, the identification of the decisions and the full ranges of values that could be chosen for each makes everything concrete, while at the same time keeping the design as open as it should be, until you know the right choice to make for each decision.
Not just for billion dollar programs...
The video at the top of this page is the testing of the tank rupture failure mode of the inexpensive Ford Pinto, not an F-35 or Space Telescope; however, the cost to replace/repair the problem in the field was still so high that Ford chose to settle with the impacted owners instead.
Despite the ignition switch being a fairly inexpensive part, replacing the faulty switches, whose failure could shut off the engine, power steering, power brakes, and airbags, had a net cost to GM of $4.1B.
Then there were the even cheaper and non-essential cruise control deactivation switches. However, their failure mode of overheating could have the effect of setting the engine on fire.
Even much cheaper products, such as cell phones, tablets, or toy ovens, can have very expensive recalls if their failure can have the effect of catching fire or causing burns.
There are a tremendous number of ways things can fail, with many different potential effects. While the goal is to identify those up front such that they can be designed out from the beginning, the reality is that as decisions are made and the designs become more concrete, failure mode identification and prevention needs to be ongoing!
So, it is important to consider the reusability and continuous improvement of the knowledge and FMEA analyses, not just for future product development work, but for your existing work. The current project will need to continue to use, evolve, and refine that knowledge and the FMEAs developed based on that knowledge.
Three Layers of Reusable FMEA Knowledge
A tremendous amount of valuable knowledge is generated in most FMEA efforts. And given the significant effort involved, it is a waste if that knowledge is not easily reused. The key to reuse is that the generic reusable knowledge needs to be separate from the highly situation-specific knowledge. And the mechanisms need to do that automatically – it cannot be an extra effort to make the knowledge reusable. Rather, the natural flow of building the knowledge to get your work done should capture it in a reusable format and organization. If making it reusable requires extra work, it will rarely be done.
And note that the reusability of your FMEAs and the knowledge on which they are based is not just for the benefit of future projects. This project will need to be able to reuse and continuously improve those FMEAs as decisions are made and the designs are refined and become more concrete – because more failure modes and effects will need to be identified and prevented as more specific knowledge becomes available. The K-Briefs (the top level of reusable knowledge) tell that story; within those K-Briefs, the trade-off charts and solvers capture the cause & effect reasoning for the decisions that prevent those failures; and the underlying Decisions and Relations capture the knowledge that enables all of that in a reusable way.
Decision Maps
A Decision Map is a collection of related Decisions and Relations that together connect the decisions you can directly control to the customer interests you want to satisfy. The Relations capture the critical knowledge from the different disciplines.
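To make that structure more tangible, here is a minimal sketch of how Decisions and Relations might be represented. It is our illustration only, not the Success Assured® data model or API: the class names `Decision`, `Relation`, and `DecisionMap`, their fields, and the fan example with its formulas are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Decision:
    name: str
    low: float   # lower end of the range of values still open
    high: float  # upper end of the range of values still open

@dataclass
class Relation:
    name: str
    inputs: list[str]           # names of the Decisions this Relation reads
    output: str                 # name of the Decision it determines
    func: Callable[..., float]  # the captured knowledge (made up here)

@dataclass
class DecisionMap:
    decisions: dict[str, Decision] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def controllable(self):
        """Decisions no Relation computes: the ones you set directly."""
        outputs = {r.output for r in self.relations}
        return sorted(n for n in self.decisions if n not in outputs)

    def customer_interests(self):
        """Decisions feeding no further Relation: the end objectives."""
        inputs = {i for r in self.relations for i in r.inputs}
        return sorted(n for n in self.decisions if n not in inputs)

    def evaluate(self, chosen):
        """Compute Relation outputs for one point (relations in dependency order)."""
        values = dict(chosen)
        for r in self.relations:
            values[r.output] = r.func(*(values[i] for i in r.inputs))
        return values

dm = DecisionMap()
for d in [Decision("blade_length_m", 0.5, 2.0), Decision("motor_power_kw", 1.0, 5.0),
          Decision("airflow", 0.0, 20.0), Decision("noise", 30.0, 60.0)]:
    dm.decisions[d.name] = d
dm.relations.append(Relation("airflow model", ["blade_length_m", "motor_power_kw"],
                             "airflow", lambda length, power: 2.0 * length * power))
dm.relations.append(Relation("noise model", ["motor_power_kw"],
                             "noise", lambda power: 30.0 + 5.0 * power))

print(dm.controllable())
print(dm.customer_interests())
print(dm.evaluate({"blade_length_m": 1.0, "motor_power_kw": 2.0}))
```

Each Decision carries a range rather than a single chosen value, so the map stays open until the trade-offs are understood, and the Relations are the only place where discipline-specific knowledge lives.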
Charts & Solvers
The Charts and Solvers capture how best to see the sensitivities and limits of the trade space and how best to optimize your competing decisions within that trade space to best satisfy your customers (all the stakeholders). From any Chart or Solver, you're always just one click away from the underlying Decision Map that it was computed from.
K-Briefs
The Project, Problem, and similar K-Briefs tell the story of how the Charts and Solvers are being used to make the required decisions, based on the underlying knowledge. That may include identification of gaps in that knowledge that should be closed prior to further converging those decisions.
Contact Us to Schedule a Demo today!
Do you have a past FMEA along with the key trade-offs that needed to be made in a spreadsheet or similar? If so, send it to us and we'll even demo using your own data!
Confidently make decisions you won't need to change...
If you have questions, would like a demo, or would like to schedule us to visit you, please email us at Answers@TargetedConvergence.Com. Alternatively, you may call us at 1-888-LRN-FRST (1-888-576-3778). Either way, we'll route you to the right person.