FDA Software Failure Mitigation

blazin912 · Feb 5, 2014

Trying to understand how to interpret the FDA's concept of a "Software Device".

Our system comprises multiple software components running on multiple processors. It is a mix of unique software components and duplicates.

Processor 1:

OS with DSP

Processors 2-5:

4 Copies of RTOS w/ control loop (exact same source/sub system)

When considering a software failure of a software device, does this mean all software or a single software component? I.e., does a software failure mean the DSP goes wacky while the control loops are OK? Or is all software assumed to have gone haywire?

I can't seem to grasp how it could be assumed that all software goes insane at the same time, but maybe I'm off base.

If there is a clear answer somewhere in the FDA guidance documents could a link be provided as well? Hot debate here! Thanks!

Ajit Basrur · Feb 5, 2014

Welcome to the Cove :bigwave:

Pls note that I have deleted the duplicate thread that was in another section.

Marc · Feb 5, 2014

~~We crossed up doing something at the same time. Hang on while I fix.~~ Fixed.

Ronen E · Feb 5, 2014

Typically a "Software Device" is a Medical Device consisting of software only. I'm not sure yours is one.

c.mitch · Feb 6, 2014

Hi,
What you describe sounds like embedded software. Whether standalone or embedded, the method to assess risk is the same. The difference with embedded sw: a risk can also be mitigated by a hw measure.
Anyway, you should assess software risks component by component (a bit like FMEA: what is the consequence for the system if one component crashes or goes wacky?). You should also assess risk taking the sw as a black box, if this is relevant.
There's no guidance, but there is a recognized standard: IEC 62304. It contains requirements for assessing sw risks and the concept of a software safety class (similar to FDA's level of concern). Depending on that class, you can treat the sw system or subsystems as black boxes (class A), or you have to split it into components and assess risks on each of them.

blazin912 · Feb 6, 2014

I understand that a risk can be mitigated by HW. The question is really, can other software subsystems mitigate the risk of a failure of a separate software system?

Failure:

CPU 1 sends command to CPU 2-5 to perform routine at max current. CPU 2-5 have SW controlled current limits.

As they are separate components, do we need to assume that CPUs 2-5 have ALSO lost control and their SW controlled current limits are wiped, i.e. no limit? Or can we assume one failure mode (much like hardware) and say the current limit within CPUs 2-5 will perform as expected?

Make sense?

I'd say building in the software current limit is a mitigation to protect from such a case, but if all software trapping/control is considered to be invalid when "Software fails", and you can only rely on HW, then we may need to address a lot of our design.
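To make the scenario concrete, here is a minimal sketch of the kind of local current limit being discussed: each control-loop CPU clamps whatever the commanding CPU sends, so a bad command from CPU 1 alone cannot push the output past the limit. The class name, the 2.0 A limit and the fall-back-to-zero behaviour are all assumptions for illustration, not our actual design.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CurrentLimiter:
    """Local limit held by a control-loop CPU (hypothetical, for illustration)."""

    max_current_a: float  # assumed limit in amperes

    def apply(self, commanded_a: float) -> float:
        # A non-finite command (NaN or infinity) is treated as invalid
        # and mapped to zero output rather than passed through.
        if not math.isfinite(commanded_a):
            return 0.0
        # Clamp symmetrically about zero, regardless of what was commanded.
        return max(-self.max_current_a, min(self.max_current_a, commanded_a))


limiter = CurrentLimiter(max_current_a=2.0)
safe_output = limiter.apply(10.0)  # command from the failed CPU exceeds the limit
```

The point of the sketch is only that the limit lives in a different component from the one that issued the command; whether that independence can be claimed in the risk analysis is exactly the question.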

c.mitch · Feb 6, 2014

I would assume 1 failure mode, much like hardware, as you said.

Also, there is the rule to set the sw failure probability to 100%, but that applies to one failure mode at a time. The probability of the failure modes of CPUs 2 to 5, while CPU 1 is in a failure mode, shouldn't also be set to 100%.

The probability to have sw failure of all subsystems (CPU 1 to 5) at the same time is very low (Unless, for example, you use exactly the same sw in each subsystem, and that each subsystem is in the same state).

blazin912 said: "but if all software trapping/control is considered to be invalid when "Software fails", and you can only rely on HW, then we may need to address a lot of our design."

I agree. If we were to use this rule, no system would be sw controlled.
:deadhorse:

sagai · Feb 6, 2014

I mostly agree, two things to be mentioned.
There is a way to construct software in such a way that it can at least get to its safe state (for example with watchdogs and Markov chains for states).
The other thing is that there is also a way to mitigate when two or more pieces of software with a similar purpose run on different hardware, are based on different algorithms and possibly come from different development teams, with decisions made by voting among these parallel systems.
Cheers!
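A rough sketch of the voting idea, assuming each redundant channel reports a numeric value and that "agreement" means being within a tolerance of the median. The function name, the tolerance and the strict-majority rule are assumptions for illustration; a real voter would be specified by the system's safety requirements.

```python
from statistics import median
from typing import Optional, Sequence


def vote(readings: Sequence[float], tolerance: float) -> Optional[float]:
    """Return the median if a strict majority of channels agree with it, else None."""
    if not readings:
        return None
    mid = median(readings)
    # Count the channels whose value lies within the tolerance of the median.
    agreeing = sum(1 for r in readings if abs(r - mid) <= tolerance)
    if agreeing * 2 > len(readings):
        return mid
    # No majority: the caller should treat this as a fault and go to a safe state.
    return None


result = vote([1.0, 1.1, 9.0], tolerance=0.2)  # one channel has failed
```

With three channels, a single wild value is outvoted by the other two; if no majority agrees, the function returns `None` so the caller can fall back to the safe state.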

schandra · Feb 26, 2014

Software failure can be at system level (Software Item), subsystem level or software unit level.

schandra · Feb 26, 2014

Yes, you can mitigate one software subsystem failure by another subsystem in some scenarios (you can mitigate by software, electrical, mechanical, user, environment, etc.).
Suppose Subsystem A is controlling a motor, and Subsystem B can look at the motor feedback and determine whether A is doing what it is supposed to do. If not, B initiates fault recovery on A or failure annunciation for A.

Providing fault tolerance, fault detection, fault monitoring, etc. are good risk mitigation strategies. How far you can rely on them depends on how safety classifications have been allocated to each subsystem and on the justification for that allocation. IEC 62304 clearly states this.
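The Subsystem A / Subsystem B arrangement above could look roughly like the following sketch, where B compares what A commanded against the motor feedback and latches a fault after several consecutive mismatches. The class name, the tolerance and the sample count are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MotorMonitor:
    """Runs on Subsystem B and watches Subsystem A (hypothetical sketch)."""

    tolerance: float        # allowed |commanded - feedback| before a sample counts as bad
    max_bad_samples: int    # consecutive bad samples before the fault is latched
    bad_samples: int = 0
    faulted: bool = False

    def check(self, commanded: float, feedback: float) -> bool:
        """Return True while A looks healthy, False once a fault is latched."""
        if self.faulted:
            return False
        if abs(commanded - feedback) > self.tolerance:
            self.bad_samples += 1
        else:
            # A single good sample resets the counter, filtering out transients.
            self.bad_samples = 0
        if self.bad_samples >= self.max_bad_samples:
            # Latched: B would now start fault recovery or annunciation for A.
            self.faulted = True
        return not self.faulted


monitor = MotorMonitor(tolerance=0.5, max_bad_samples=3)
healthy = monitor.check(commanded=10.0, feedback=10.1)
```

The fault is latched on purpose: once B has decided A is misbehaving, a later good sample does not silently clear it, and recovery has to be an explicit action.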
