Custom Tools as SOUP? (Software of Unknown Provenance)


JSambrook

Hello,

In building software for medical devices it seems fairly common to have certain proprietary software tools that somehow contribute either code or data to be incorporated into the medical device software that is being built.

For example, you might have a tool that does certain calculations and generates a set of data tables as output. Those data tables may then be incorporated, in one way or other, into the medical device software.

I'm interested in good strategies for the process by which those custom tools are developed.

For example, a company that is IEC 62304 compliant could mandate that the development of such tools follow the same process as is followed for the product itself. That is, the custom tools would be developed under an IEC 62304 compliant process.

I'm wondering about another option, however. In this option, the manufacturer elects to treat those tools as SOUP. The tools could then be developed following a relatively lightweight process.

The other half of the equation, however, is to have a quality gate that seeks to ensure that erroneous outputs (e.g., a data table that contains errors) do not get into the medical device software.

The output of the tools (e.g., the generated data tables) would be subject to review for correctness, testing, etc., before it is included in the medical device software itself.

I'd be grateful for your thoughts on this.

Thanks!
 

Stijloor

Leader
Super Moderator

Any comments and thoughts for John?

Thank you very much!

Stijloor.
 

sagai

Quite Involved in Discussions
Hi John, I apologize, but I do not understand what your question is. Regards, Szabolcs
 

pldey42

Hello,

[..]

I'm wondering about another option, however. In this option, the manufacturer elects to treat those tools as SOUP. The tools could then be developed following a relatively lightweight process.

The other half of the equation, however, is to have a quality gate that seeks to ensure that erroneous outputs (e.g., a data table that contains errors) do not get into the medical device software.

The output of the tools (e.g., the generated data tables) would be subject to review for correctness, testing, etc., before it is included in the medical device software itself.

I'd be grateful for your thoughts on this.

Thanks!

I think this would be a mistake, and a bad one because failures in medical devices can raise life or death issues.

Manufacturing industry has known for several decades that waiting until the end of the production line to detect errors with inspections and tests is wasteful, costly and error-prone. Their quality management systems insert checks and balances through design, manufacturing and supplier management processes in order to detect and eliminate problems as early as possible, when they're cheapest to fix.

Software is no different, and often so complex that tests and inspections performed after the product is made will miss errors. That's one reason why we all have the patch management nightmare: problems only surface when a customer does something we did not anticipate in our tests and inspections.

I've worked in software for many years and have yet to see a "lightweight process" that's not a plausible-sounding excuse for "let's skip software engineering discipline, get it done and out there real quick so we can get paid, then pick up the pieces later with the maintenance contracts we'll make the customers sign." Software is the only industry I know of that's not held legally responsible for product failures.

Good software processes (some of them characterized as "lightweight", meaning "without the excessive bureaucracy that some quality people insist on imposing for no good reason") include quality gates in many places - typically reviews, inspections and tests throughout design and development - because these processes are critically reliant upon human skill and good team communications and, being people-intensive, errors inevitably creep in. If errors are not detected early, they get progressively and exponentially more expensive to fix, so bad "lightweight" processes carry massive hidden costs - costs not only of rework, late delivery, etc., but of the consequences of bugs, which in medical devices could be fatal.

Here's a simple example of why it won't work. The celebrated programmer and writer Dijkstra once wrote that, if you want to be sure that a program that performs simple addition is correct, you have to watch it do all possible additions and check all its answers. So if it can add two numbers between 0 and 9, that's 10 times 10 = 100 possible sums it could be asked to do, so you have to do 100 tests to be dead sure it's right. While in manufacturing they can get the number of tests down using statistical methods, we can't do that in software because it's a digital, not an analogue, medium. Things often don't fail progressively as analogue limits approach; they can unexpectedly fall off a cliff. For example, our little adding-up program might fail suddenly and catastrophically if it runs out of memory, perhaps due to a programmer failing to properly manage memory allocation and deallocation - and worse: such a problem might never occur in the lab (on the nice big fat development machines) but only on the client's machine fitted with minimal memory, which, with our "lightweight" process, we failed to identify as a constraint within which the program must operate and which must be tested for.
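To put Dijkstra's arithmetic in concrete terms, here is a minimal sketch in Python (the toy adder and the case counts are purely illustrative):

# Illustrative only: Dijkstra's exhaustive-testing argument in miniature.
# A toy adder over single digits needs 10 * 10 = 100 checks; widen the
# inputs and the number of cases explodes, so exhaustive testing stops scaling.

def add(a, b):
    return a + b

cases = [(a, b) for a in range(10) for b in range(10)]
assert all(add(a, b) == a + b for a, b in cases)  # all 100 sums checked
print(len(cases), "cases for two single-digit inputs")
print(2**32 * 2**32, "cases for two 32-bit inputs")  # about 1.8e19 - untestable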

By analogy, if the SOUP produces data tables, the only way to be sure it's right with a quality gate at the end of production is to check every single cell of every table, else how would we know that there's not one value that's wrong and will kill someone? The probability of this happening would be considerably reduced if we knew that the design and development process included interim reviews, inspections and tests that mitigate the probability of errors - not just in algorithms and calculations, but in memory management, database storage and retrieval, data communications across networks, etc.

Please, no. Software processes have learned much from manufacturing industry's move away from only testing and inspecting at the end of the production line.

Hope this helps,
Pat
 

Peter Selvey

Leader
Super Moderator
For the comment above by pldey42:

It is often pointed out that software usually has too many combinations to test (even 62304 says this). However, the background assumption here is that hardware and mechanical design is fully testable. This is wrong. Hardware/mechanical design is even more complicated than software, impossible to test in a way that covers all real-world situations, and it is normal to have hardware/mechanical problems in medical devices released to market.

One difference seems to be that the presumption of failure is built into the design. So, for example, a connector that gives some trouble after 1-2 years of use might result in some annoyance, but the defensive design is such that the failure of the connector would rarely result in any serious harm. If it did, the designers would not be given a hard time about the connector itself, but about the fact that a weak design allowed something as simple as a bad connection to cause harm.

Software engineers don't think like this (myself included). There is something about software that lulls us into a false sense of security. And when a problem occurs in the market, regulators reinforce this thinking by always blaming the design process (software design controls), not weakness in the design itself.

The original proposal to treat the data table as SOUP and then use a gate to catch erroneous data is in fact a good and reasonable approach.

Of course, this does not mean we should ignore the quality of the data table altogether. In hardware you would not deliberately choose low-quality parts just because you have a defensive design.

So to answer the original post, yes, you could treat the data table as SOUP, and then apply the following controls:

5.3.3, 5.3.4 - specify performance, function, support needed for the data table
5.5 - verify the data table as a software unit (e.g. by sampling plan if the data table is large)
7.1.1 - consider the implication of failure of data table
==> use "gate" as a risk control (assuming high harm potential)
==> gate becomes part of Clause 5 (specification/ architecture/ verification etc)

Actually, when you think about it, it does not really matter much whether the data table comes from a third party or not; the process would be much the same. A good defensive design would consider failure of the data table and conclude a gate is necessary.
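To make the "gate" concrete, here is a minimal sketch in Python (purely illustrative; the CSV table layout, the tolerance and the reference_calc() re-implementation are assumptions, not anything taken from IEC 62304) of a build-time check that recomputes every entry with an independent reference calculation and refuses to release the table if any entry deviates. Under a justified sampling plan, the same check could be run on a sample of rows rather than all of them.

# Illustrative sketch of a build-time "gate" for a generated data table.
# Assumptions: the table is a CSV of (input, output) pairs, and an
# independent reference_calc() exists to recompute each value.

import csv
import math
import sys

TOLERANCE = 1e-9  # acceptance criterion; would come from the risk analysis

def reference_calc(x: float) -> float:
    # Hypothetical independent re-implementation of the tool's calculation.
    return math.sqrt(x)

def gate(table_path: str) -> bool:
    bad_rows = []
    with open(table_path, newline="") as f:
        for row_no, row in enumerate(csv.reader(f), start=1):
            x, y = float(row[0]), float(row[1])
            if abs(y - reference_calc(x)) > TOLERANCE:
                bad_rows.append((row_no, x, y))
    for row_no, x, y in bad_rows:
        print(f"row {row_no}: output {y} for input {x} is outside tolerance")
    return not bad_rows

if __name__ == "__main__":
    sys.exit(0 if gate(sys.argv[1]) else 1)  # non-zero exit blocks the release

Of course, the gate only adds assurance if the reference calculation is genuinely independent of the tool that generated the table in the first place.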
 

dsterling

Hello,

My firm is ISO 13485 registered and IEC 62304 compliant. We develop medical devices for FDA and CE-mark. It wasn't clear from your question whether you are submitting to FDA, but if you are, then 21CFR820.70(i) governs your problem.

Custom tools, as you described, are NOT SOUP. SOUP is software that is actually incorporated into the medical device (e.g., an operating system, a comm library, etc.): something you buy, or open source code, that is of completely or somewhat unknown quality because you don't have access to the qualifying materials (e.g., specs, test protocols, reports, etc.).

In the scenario you described, you have a data table produced by your tool that IS incorporated into the medical device.

You have two choices: 1) validate the table or 2) validate the tool that produced the table.

If you choose to validate the table, then you are under no obligation to validate the tool that produced it. Period. The rationale is that you are putting no trust in the tool to be "correct" because you are validating the end result. This is the same logic that allows you to avoid validation of compilers.

However, if your situation dictates that validating the table is more arduous than validating the tool, you must validate the tool instead.

Under IEC 62304, the extent to which you validate the table is governed by the risk that the table poses in the system. Everything in 62304 is about risk. You have to determine if, under the architecture/design of your system, the table component is of Class A, B, or C and adjust your level of process to comply.

IEC 62304 is less clear about the extent of tool validation... it indicates that the tool must be validated, as does the FDA (per 21CFR820.70(i)), but it is not as clear as to the level of process. I would argue that the Class of the table should be a guide to the process used to validate the tool itself. It would be difficult to argue that a Class C table was generated by a Class A tool. Having said that, it is possible under 62304 to have components of the tool of a lesser class than the whole. You would have to decompose the architecture, start with the top level at a class equal to the table's (so it produces a table of that class), and then show which components within the architecture do not affect the ultimate risk of the end result. An example may be a logging component that produces runtime data but does not affect the output data table.

In any tool validation, as a minimum, you need to write a specification of the requirements the tool is to meet, generate a test protocol that tests those requirements, and generate a report of the results. The report should include the test results and the configuration of the tool itself and of any other tools or equipment used to produce and test it (so the validation is repeatable).
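As an illustration only, here is a minimal sketch in Python of what such a protocol-plus-report might look like; the requirement IDs, the generate_table() interface and the version string are hypothetical placeholders, not a description of any real tool or of what the standard mandates.

# Illustrative sketch: run a tool validation protocol and record a report.
# The requirement IDs, the generate_table() interface and the version
# string are hypothetical placeholders.

import json
import platform
from datetime import datetime, timezone

TOOL_VERSION = "1.4.2"  # hypothetical; record the exact configuration used

def generate_table(inputs):
    # Placeholder for the custom tool under validation.
    return {x: x * x for x in inputs}

def run_protocol():
    results = []
    # REQ-001: the table contains one entry per requested input
    table = generate_table(range(5))
    results.append(("REQ-001", len(table) == 5))
    # REQ-002: each output equals the specified calculation (here, x squared)
    results.append(("REQ-002", all(table[x] == x * x for x in range(5))))
    return results

if __name__ == "__main__":
    results = run_protocol()
    report = {
        "date": datetime.now(timezone.utc).isoformat(),
        "tool_version": TOOL_VERSION,
        "environment": platform.platform(),
        "results": [{"requirement": r, "passed": p} for r, p in results],
        "overall_pass": all(p for _, p in results),
    }
    print(json.dumps(report, indent=2))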

I hope this is clear. If you need more help, you can learn more at the Sterling Smartware website (can't post the link)

Best,
Dan
 

pldey42

Hi Peter,

For the comment above by pldey42:

It is often pointed out that software usually has too many combinations to test (even 62304 says this). However, the background assumption here is that hardware and mechanical design is fully testable. This is wrong. Hardware/mechanical design is even more complicated than software, impossible to test in a way that covers all real-world situations, and it is normal to have hardware/mechanical problems in medical devices released to market.


There’s no such background assumption at all. Nothing is fully testable. The point is that tests in software, and no doubt in other engineering disciplines, can create a false sense of security because the number of combinations of circumstances and events that software can encounter is enormous, especially when it’s multi-tasking real-time control software (and perhaps a data table generator is not so complex). In control systems, I/O is handled by interrupts. If the control program comprises just one million instructions, that’s one million potential failure points if there’s an error in interrupt handling that corrupts the stack or the running program’s data. Impossible to test for. Easier to inspect and design defensively against.

One difference seems to be that the presumption of failure is built into the design. So, for example, a connector that gives some trouble after 1-2 years of use might result in some annoyance, but the defensive design is such that the failure of the connector would rarely result in any serious harm. If it did, the designers would not be given a hard time about the connector itself, but about the fact that a weak design allowed something as simple as a bad connection to cause harm.

Software engineers don't think like this (myself included). There is something about software that lulls us into a false sense of security. And when a problem occurs in the market, regulators reinforce this thinking by always blaming the design process (software design controls), not weakness in the design itself.


I think the regulators are absolutely right. A weakness in the design is a weakness in the software design process - which includes requirements capture. The failure is in the management and requirements capture process, where failing safe gets missed as a requirement so it doesn't get into the design. If the design is weak, then there are weaknesses in the design process and the design review process - probably due to management being unwilling to invest time and money in getting it right and instead doing the bare minimum to get to market. (One suspects that if the medical devices were being fitted to managers and their loved ones, it might be different.)

The original proposal to treat the data table as SOUP and then use a gate to catch erroneous data is in fact a good and reasonable approach.

I don’t think anyone’s opinion is more a fact than anyone else’s.

Of course, this does not mean we should ignore the quality of the data table altogether. In hardware you would not deliberately choose low-quality parts just because you have a defensive design.

So to answer the original post, yes, you could treat the data table as SOUP, and then apply the following controls:

5.3.3, 5.3.4 - specify performance, function, support needed for the data table
5.5 - verify the data table as a software unit (e.g. by sampling plan if the data table is large)
7.1.1 - consider the implication of failure of data table
==> use "gate" as a risk control (assuming high harm potential)
==> gate becomes part of Clause 5 (specification/ architecture/ verification etc)

Actually, when you think about it, it does not really matter much whether the data table comes from a third party or not; the process would be much the same. A good defensive design would consider failure of the data table and conclude a gate is necessary.


Yes, but there's more to it. If the consequence of an error in any single data table entry is fatal to the patient, then surely either all the values need to be checked (making the table-generating tool redundant) or the sampling plan needs to be strong. Software tends to fail catastrophically; it suddenly falls off a cliff due to memory overflow, or a clash over shared resources, or a number that's bigger than the word length allows, or a loop that fails to terminate ... Without a sense of these modes of failure, black box testing, as it's called, tends not to expose all the errors, whilst exposing others that were unwittingly unaccounted for in the design. White box testing (where you can see the design) is differently powerful (neither more nor less so than black box), especially in the hands of an experienced engineer who knows to look for loop terminations, boundary conditions, buffer overflows, matching capture and release of shared resources, weak programming technique and so forth.
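To illustrate the cliff edge, a tiny sketch in Python simulating a fixed 16-bit word (purely illustrative): the sum looks right across almost the whole range and then silently wraps, which is exactly the kind of failure a black box sample can miss.

# Illustrative: a boundary failure that black box sampling can easily miss.
# Simulating 16-bit arithmetic: the sum looks correct until the inputs
# approach the top of the range, where it silently wraps around.

def add_16bit(a, b):
    return (a + b) & 0xFFFF  # models a fixed 16-bit word

print(add_16bit(1000, 2000))    # 3000 - looks fine
print(add_16bit(40000, 30000))  # 4464 - wrapped, fell off the cliff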

So if you’re going to test a data table produced by a tool using a statistical sampling plan, the question is, how do you determine the sample without knowing how the table generator works, and how do you estimate the probability of field failure?
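For a sense of scale, the usual zero-failure ("success run") calculation is sketched below in Python (illustrative only, and it assumes independent, homogeneous cells - exactly the assumption that is doubtful when the generator is a black box):

# Illustrative: zero-failure ("success run") sample size. If n randomly
# chosen cells all pass, we can claim with confidence C that the true
# defect rate is at most p.

import math

def zero_failure_sample_size(p: float, confidence: float) -> int:
    # (1 - p)**n <= 1 - confidence  =>  n >= ln(1 - confidence) / ln(1 - p)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

print(zero_failure_sample_size(0.001, 0.95))   # roughly 3,000 cells must all pass
print(zero_failure_sample_size(0.0001, 0.95))  # roughly 30,000 cells must all pass

So demonstrating even a modest defect rate with statistical confidence already means checking thousands of cells - and that is before questioning whether the sample is representative.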

I think my original point still stands. Testing the output is weaker than inspecting and testing through the process. The question is, as dsterling suggests, which method best mitigates the risks to users to an acceptable level?

Hope this helps,
Pat
 

Peter Selvey

Leader
Super Moderator
Hi Pat,

Some very good points. The lack of defensive design in software is still a design control issue, under the architecture side. And a sampling plan makes certain assumptions (e.g. data is homogeneous) which implies we know something about how the data table was created (i.e. not a black box).

I guess the difference I was trying to get at was that the typical response to a serious software bug is to assume the testing was weak, whereas for a serious hardware bug we assume the architecture was weak (e.g. absence of redundancy, segregation).

One example: infusion pumps have many problems, with around 100 deaths per year. The FDA indicates the most common problem is the pump just stopping and quietly displaying some error code such as "E-038". If the infused drug is critical, the absence of infusion can (and does) result in death or other serious outcomes.

It seems to me that the industry response is more testing, to eliminate the error codes. My guess is that the increase in error codes is itself a problem caused by too much focus on software and internal error checking, with the error checking not being properly validated. This follows an experience working with dialysis systems, where an increased focus on software resulted in a marked increase in nuisance alarms (but at least they provided an alarm...).

A hardware engineer would look at that and say: well, no amount of testing can make sure that a complex "single channel" system will guarantee infusion. So we need an independent system that monitors infusion and provides an alarm if it stops unexpectedly, regardless of the cause, hardware or software.
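As a purely illustrative sketch of that architecture in Python (the sensor, command and alarm functions are hypothetical placeholders, not a real device design), the monitoring channel cares only about whether delivery has stopped while commanded, not why:

# Illustrative only: an independent monitoring channel for an infusion.
# read_flow_sensor(), infusion_commanded() and raise_alarm() are
# hypothetical; the point is the architecture - the monitor runs on its
# own channel and alarms on loss of flow regardless of why it stopped.

import time

MIN_FLOW_ML_PER_H = 0.5  # below this, treat delivery as stopped
GRACE_PERIOD_S = 30      # tolerate brief interruptions before alarming

def monitor_loop(read_flow_sensor, infusion_commanded, raise_alarm):
    stopped_since = None
    while True:
        if infusion_commanded() and read_flow_sensor() < MIN_FLOW_ML_PER_H:
            if stopped_since is None:
                stopped_since = time.monotonic()
            elif time.monotonic() - stopped_since > GRACE_PERIOD_S:
                raise_alarm("infusion stopped while commanded")
        else:
            stopped_since = None
        time.sleep(1.0)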
 

pldey42

Absolutely, Peter. The obscure error code signalling a possibly fatal condition sounds like a classic failure in requirements capture: a failure to consider the hectic environment of a medical facility and the need to draw attention to errors.

It sounds like there's a need in the medical industry to learn from the telecom and aerospace industries where, as no doubt you're aware, they invest significant time and money in double or triple redundancy - two or three different sets of hardware and two or three implementations of the software (yep, get the same program written by independent teams) - and some way of letting the machines vote on critical decisions.

Then, hopefully, an error in one does not exist in the others and isn't fatal.
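A minimal sketch of the voting idea in Python (illustrative only; a real voter also has to handle timing, tolerance bands for analogue values and reporting of the disagreeing channel):

# Illustrative 2-out-of-3 voter for triple-redundant results. In practice
# the three inputs would come from independently developed implementations.

from collections import Counter

def vote(a, b, c):
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count >= 2:
        return value
    raise RuntimeError("no majority - enter the fail-safe state")

print(vote(42, 42, 41))  # -> 42; the single disagreeing channel is outvoted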

They also use monitoring systems as you suggest. In some cases there's a special, highly reliable machine monitoring a critical machine, and phoning home to the vendor if it sees something unusual.

And I'm told that the military sometimes use fancy features in Ada that enable the programmer to tell the compiler something about the behaviour to expect so the software becomes to some extent self-checking.

Good IT network operators do something similar by protecting the network with several firewalls bought from different vendors in the hope that if something nasty gets past one, another will spot and swat it.

Pat
 

igocrazyhere

Registered
My company is looking to build some minor services like these... Can someone point me to a sample quality plan for these, please? Thanks so much.
 