Although the aerospace industry has been (until recently, perhaps) exceptionally good at not shipping defective products, that does not mean it is free of the costs of late detection of failure modes. Commercial data tends not to be published, but the US Government Accountability Office does report to Congress on defense acquisitions...
In 2009, they reported that the 96 major defense programs had a cumulative cost overrun of $296B (over $3B per program) and an average schedule overrun of 22 months. They agreed with the DOD on the sources of the problem, including inadequate knowledge, excessive requirements, an imbalance between wants and needs, and permitting programs to move forward despite those gaps. They called for programs to adopt "a knowledge-based approach [...] that demonstrates high levels of knowledge before significant commitments are made." [GAO2009]
Despite concerted efforts to contain such cost and schedule growth since then, in 2016 (seven years later), the GAO reported there were 79 major defense programs with a cumulative cost overrun totaling $469B (over $5.9B per program) and an average schedule overrun of 29.5 months. They reiterated their calls for knowledge-based product development, saying "knowledge supplants risk over time". [GAO2016]
We suggest that it is not a lack of trying; after all, most (if not all) of the people on those 79 programs would prefer to use knowledge to supplant risk. The problem is that existing techniques (such as traditional FMEA) are not able to deliver the required knowledge prior to when key decisions must be made on these programs. The GAO should instead be calling for the programs to adopt set-based (rather than point-based) methodologies to accelerate learning while delaying decision-making until the required knowledge is available.
Given the astronomical costs of product failures in the field (the 737 MAX failures will likely cost Boeing well over $18B), the aerospace industry has been very good at ensuring it doesn't ship products with catastrophic flaws. One of its key tools for that since the 1960s has been Failure Mode & Effects Analysis (FMEA). In the 1970s, the automotive industry adopted FMEA standards for its supply chains and has seen similar benefits.
That FMEA process generally started with the detailed designs, against which Design FMEAs (DFMEAs) were performed; it then continued once the manufacturing process flows had been developed, against which Process FMEAs (PFMEAs) were performed. Although not nearly as expensive as failures in the field, failure modes & effects identified that late in the development cycle were still hugely expensive to fix; sometimes so expensive that the automotive industry would choose not to fix them and instead just deal with the consequences. For the aerospace or medical device industries, where fixing the issues is necessary, it would mean major project delays and cost overruns, often measured in the tens or hundreds of millions of dollars.
As a result, there has long been a push to move FMEAs earlier in the process. However, the traditional FMEA, based on detailed drawings and manufacturing process flows, could not easily be moved earlier. In response, the automotive industry (specifically, the AIAG and VDA organizations) recently released a new standard FMEA Handbook designed to be more easily moved to the front end.
While we applaud the intent and basic objective, we will argue that more is needed to truly enable your engineers to conduct effective FMEAs in the early/conceptual phases of your projects...
The GAO also audits NASA, which is a very different environment than the DOD...
A higher-precision reflective testing device was designed for the manufacturing of the Hubble Space Telescope. A flaw in that device's assembly led to the Hubble's primary mirror being very precisely ground to the wrong shape. Conventional testing devices correctly detected the spherical aberration, but their results were discarded in favor of the higher-precision device. The eventual effect: the Hubble was unable to create sharp images, collected far less light than intended, and was unable to perform many of its intended uses.
Fixing or replacing the mirror once already in space would have been too expensive. So instead, they chose to install replacement optics with the reverse aberration, effectively undoing the spherical aberration. But other instrumentation had to be sacrificed to make that possible.
The Hubble was originally budgeted at $400M with launch planned for 1983. That launch was delayed to 1984, then 1986, by which point the budget had grown to over $1.1B. It eventually launched in 1990 with the program budget well over $4.5B (ten times the original budget). And that was before the discovery of the mirror failure. The program is now over $10B. (To be fair, that is a relatively small price tag compared to the tremendous discoveries it has yielded over the last 30 years, and it will likely continue producing discoveries for another 10-20 years. The point here is that it didn't need to cost nearly that much; better FMEA coupled with "Success is Assured" decision-making may have helped tremendously.)
However, much was learned from those project failures, right? Perhaps. But consider the Hubble's successor: its development began in 1986; its $824M contract was awarded in 2001 with a $1.6B budget and a projected 2007 launch; the James Webb Space Telescope (JWST) was then re-planned in 2005 with a $4.5B budget and a 2011 launch, which was later delayed 22 months to 2013. Primary construction finally completed in 2016.
In March 2018, in a practice deployment, the JWST's sunshield ripped and the associated cables did not tighten sufficiently, delaying launch to May 2020. In June 2018, an independent review was conducted that identified 344 potential failure modes, any one of which could have had devastating effects on the project. As a result, launch was pushed out to March 2021, with total budget now at $8.8B.
It is certainly better that those failure modes were identified while they could still be corrected on the ground; but it would have been much better had they been identified prior to construction; and better yet prior to design, such that they could have been designed out in the first place. And it is not that the designers didn't want to identify those failure modes; the issue is in having the right enablers...
There are two key challenges in moving FMEA prior to making the key design decisions:
First, analyzing potential failure modes is easiest and most efficient when working against something concrete. As a result, FMEA has typically been performed late in the process, after the drawings are available. If engineers attempt to do traditional FMEAs earlier by sketching drawings earlier, they will effectively be making key decisions prematurely... causing engineers to jump to solutions... which defeats the goal of moving FMEA prior to those decisions so that it can improve them. But without drawings, how do we make the FMEA discussions adequately concrete? The standards suggest working against the requirements and functional breakdowns. But functional breakdowns can similarly force premature decisions; and at the same time, functional breakdowns can be inadequately concrete to make analyzing potential failure modes efficient and effective.
Second, simply identifying failure modes prior to decision-making will certainly have some positive impact on that decision-making. But it won't have as much impact as it should if there is no way for the engineers to manage the complex trade-offs between the newly identified potential failure modes and all the existing performance and cost trade-offs that must be considered. Adding those potential failure modes into the mix dramatically increases the complexity... the complexity of the decision-making... the complexity of the trade studies... the complexity of the fundamental relationships.
As described by the joint AIAG & VDA standard FMEA Handbook, Failure Mode and Effects Analysis (FMEA) is intended to drive “decisions for further improvement of the product and process” resulting in a “product and process with reduced risk.” The first five steps together provide the basis for the sixth step: Optimization. “The primary objective of Design Optimization is to develop actions that reduce risk and increase customer satisfaction by improving the design.”
When changing design decisions to reduce the risk of a specific failure, you may also increase cost, increase weight, reduce performance, increase scrap, or increase the risk of other failures. In other words, to make decisions that truly optimize customer satisfaction, your design team needs visibility into the trade-offs that they are making.
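To make that tension concrete, here is a minimal sketch using the classic Severity x Occurrence x Detection risk ranking (the "RPN"; note that the AIAG & VDA handbook has since moved to Action Priority tables, but the trade-off problem is the same). The two design options and all their numbers are hypothetical, purely for illustration:

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Classic Risk Priority Number on the traditional 1-10 scales."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError("scores must be 1..10")
    return severity * occurrence * detection

# Two hypothetical design options for the same function:
baseline   = {"rpn": rpn(9, 4, 5), "cost": 1.00, "mass_kg": 2.0}
reinforced = {"rpn": rpn(9, 2, 5), "cost": 1.35, "mass_kg": 2.6}

# The reinforced design halves the risk ranking (90 vs. 180), but at
# 35% higher cost and 30% higher mass. Only visibility into ALL of the
# competing customer interests reveals what that risk reduction costs.
```

Ranking risk alone (the left column) says "reinforce"; seeing the full row of impacts is what lets the team decide whether that is actually the best choice for the customer.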
Image and video courtesy NASA's Goddard Space Flight Center.
The Decision Map doesn't just identify the decisions (the circles)... it also identifies how each decision impacts the requirements, i.e., the customer interests. Those impacts are explicitly captured as Relations (the rectangles) between the Decisions. That makes it possible to compute the trade-offs between the competing customer interests. With those trade-offs made visible in Trade-Off Charts, the engineers have a mechanism for concretely discussing those trade-offs with all the stakeholders. With that mechanism in place for making those key trade-off decisions between the customer interests, you also have a concrete model that can visually demonstrate the impact that the potential failure modes should have on the decision-making. The impacts of the failure modes on the customers and their customer interests can be explicitly computed, such that the failure modes can be directly factored into the trade-off decision-making.
In fact, by identifying the causal relationships between the various design decisions and the customer interests, you very often expose additional potential failure modes — the engineers can concretely see the causal paths by which the customer interests are being satisfied such that they can identify all the ways those paths could get broken, or could just fail to deliver the level of impact that is needed or expected.
Better yet, the Decision Map explicitly identifies what you need to know to make the trade-offs visible, such that you can make those decisions fully informed. Where there are gaps in the required knowledge, your team can get focused on acquiring that specific knowledge in the most expedient way possible. That effectively accelerates the learning ahead of the decision-making.
NET: The Decision Map is a key enabler for moving FMEA prior to making the key design decisions in such a way that the FMEA can properly impact the trade-off decision-making on complex product designs.
We suggest addressing both of those challenges by introducing what we call the "Decision Map". The Decision Map doesn't stop at a functional breakdown. Rather, it takes the requirements that need to be satisfied in performing those functions, and identifies the decisions that need to be made that will impact the satisfaction of those requirements. By explicitly identifying them as decisions-to-be-made, you prevent engineers from prematurely making those decisions by jumping to solutions just to make things concrete. Rather, identifying the decisions and the full range of values that could be chosen for each makes everything concrete, while at the same time keeping the design as open as it should be, until you know the right choice to make for each decision.
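A tiny, hypothetical sketch may help make the idea tangible (this is an illustration of the concept, not TCC's actual software; the names, ranges, and relations below are toy stand-ins): Decisions carry the full range of values still open, and Relations capture the cross-discipline knowledge connecting those decisions to the customer interests.

```python
# Decisions: name -> the full range of values still under consideration.
# Nothing is prematurely locked in; the ranges keep the design open.
decisions = {
    "wall_thickness_mm": (1.0, 4.0),
    "tank_diameter_mm":  (200.0, 400.0),
}

# Relations: customer interest -> function of the decision values.
# (Toy formulas for illustration only, not real engineering knowledge.)
relations = {
    "mass_kg":      lambda t, d: 0.00785 * t * d,   # heavier with thickness & size
    "burst_margin": lambda t, d: 50.0 * t / d,      # stronger with thickness, weaker with size
}

def evaluate(thickness_mm: float, diameter_mm: float) -> dict:
    """Compute every customer interest for one candidate point in the trade space."""
    return {name: f(thickness_mm, diameter_mm) for name, f in relations.items()}

# Any candidate inside the open ranges can be evaluated concretely,
# without having committed to any single point design:
point = evaluate(2.0, 300.0)
```

Because the map is concrete (named decisions, explicit ranges, computable relations), engineers can analyze potential failure modes against it; and because the decisions remain ranges, no one has been forced to jump to a solution.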
The video at the top of this page is the testing of the tank rupture failure mode of the inexpensive Ford Pinto, not an F-35 or Space Telescope; however, the cost to replace/repair the problem in the field was still so high that Ford chose to settle with the impacted owners instead.
The faulty ignition switches were a fairly inexpensive part, but their failure could have effects such as shutting off the engine, power steering, power brakes, and airbags; replacing them had a net cost to GM of $4.1B.
Then there were the even cheaper and non-essential cruise control deactivation switches, whose failure mode of overheating could have the effect of setting the engine on fire.
Even much cheaper products, like a cell phone or tablet, can have very expensive recalls if their failure can have the effect of catching fire or causing burns (e.g., toy ovens).
There are a tremendous number of ways things can fail, with many different potential effects. While the goal is to identify those up front such that they can be designed out from the beginning, the reality is that as decisions are made and the designs become more concrete, failure mode identification and prevention needs to be ongoing!
So, it is important to consider the reusability and continuous improvement of the knowledge and FMEA analyses, not just for future product development work, but for your existing work. The current project will need to continue to use, evolve, and refine that knowledge and the FMEAs developed based on that knowledge.
A tremendous amount of valuable knowledge is generated in most FMEA efforts. And given the significant effort involved, it is a waste if that knowledge is not easily reused. The key to reuse is that the generic, reusable knowledge needs to be kept separate from the highly situation-specific knowledge. And the mechanisms need to do that automatically; it cannot be an extra effort to make the knowledge reusable. Rather, the natural flow of building the knowledge to get your work done should capture it in a reusable format and organization. If making it reusable requires extra work, it will rarely be done.
And note that the reusability of your FMEAs and the knowledge on which they are based is not just for the benefit of future projects. This project will need to be able to reuse and continuously improve those FMEAs as decisions are made and the designs are refined and become more concrete, because more failure modes and effects will need to be identified and prevented given the more specific knowledge as it becomes available. The K-Briefs (the top level of reusable knowledge) tell that story; within those K-Briefs, the trade-off charts and solvers capture the cause & effect reasoning for the decisions that prevent those failures; and the underlying Decisions and Relations capture the knowledge that enables all of that in a reusable way.
A Decision Map is a collection of related Decisions and Relations that together connect the decisions you can directly control to the customer interests you want to satisfy. The Relations capture the critical knowledge from the different disciplines.
The Charts and Solvers capture how best to see the sensitivities and limits of the trade space and how best to optimize your competing decisions within that trade space to best satisfy your customers (all the stakeholders). From any Chart or Solver, you're always just one click away from the underlying Decision Map that it was computed from.
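As a hypothetical sketch of what sits behind such a chart (toy relations and limits, chosen only for illustration): sweep one decision across its still-open range, compute the competing customer interests at each point, and the feasible band of the trade space falls out directly.

```python
# Toy relations for one decision (wall thickness) vs. two competing
# customer interests. These stand in for the Relations of a Decision Map.
def mass_kg(thickness_mm: float) -> float:
    return 1.5 + 0.8 * thickness_mm   # heavier as walls thicken

def burst_margin(thickness_mm: float) -> float:
    return 0.4 * thickness_mm         # stronger as walls thicken

def sweep(lo: float, hi: float, steps: int = 7) -> list:
    """Evaluate both interests across the open thickness range."""
    pts = []
    for i in range(steps):
        t = lo + (hi - lo) * i / (steps - 1)
        pts.append({"thickness_mm": t,
                    "mass_kg": mass_kg(t),
                    "burst_margin": burst_margin(t)})
    return pts

# Hypothetical customer limits: margin >= 1.0 AND mass <= 4.0 kg.
chart = sweep(1.0, 4.0)
feasible = [p for p in chart
            if p["burst_margin"] >= 1.0 and p["mass_kg"] <= 4.0]
# The feasible band (here, thicknesses of 2.5-3.0 mm) is exactly what a
# Trade-Off Chart makes visible: where the competing limits leave room.
```

The point of the sketch: the chart's data is computed from the underlying decisions and relations, which is why one click back to the Decision Map always shows where any given trade-off came from.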
The Project, Problem, and similar K-Briefs tell the story of how the Charts and Solvers are being used to make the required decisions, based on the underlying knowledge. That may include identification of gaps in that knowledge that should be closed prior to further converging those decisions.
If you have questions, would like a demo, or would like to schedule us to visit you, please email us at Answers@TargetedConvergence.Com. Alternatively, you may call us at 1-888-LRN-FRST (1-888-576-3778). Either way, we'll route you to the right person.
Or if you prefer, you can send email direct to:
Our Sales Team at Sales@TargetedConvergence.Com.
Our On-the-Job Coaching Team at Mentors@TargetedConvergence.Com.
Our Software Support Team at Support@TargetedConvergence.Com.
Our Website Team at Webmaster@TargetedConvergence.Com.
Our Accounting Team at Accounting@TargetedConvergence.Com.
Our Human Resources Team at HR@TargetedConvergence.Com.
After 10 years in Carrollton, we have moved a few miles south to the Dallas Communications Complex in the prestigious Las Colinas region of Irving, TX:
Targeted Convergence Corporation
400 E Royal Ln Ste 290 Bldg 3
Irving, TX 75039-3602