Introduction to STPA

Series: STPA

This is the first in a short series of notes on STPA (System-Theoretic Process Analysis), based on the STPA handbook. STPA is an analytical method for understanding how accidents involving systems may occur, so that the conditions leading to such accidents can be controlled. To explain STPA, it is useful to first contrast it with more traditional thinking about systems.

Analytical Decomposition

Analytical decomposition is the basis of more traditional methods such as FMEA (failure modes and effects analysis), FMECA (failure modes, effects and criticality analysis, which only considers failures leading to critical loss), and FTA (fault tree analysis).

This method is only valid given the following assumptions:

  • independence of components: each subsystem operates independently; if events are modeled, they are independent except for the immediately preceding and succeeding events
  • components act the same when examined in isolation as they do when operating within the system
  • components and events are not subject to feedback loops and other indirect interactions
  • interactions among components and events can be examined pairwise and combined into composite values (such as when doing logical combinations of fault probabilities)
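
To illustrate the last assumption, here is a minimal sketch of how fault-tree style methods combine probabilities through logic gates. The gate functions, event names, and numbers are all hypothetical and chosen only for illustration; the arithmetic is valid only if the input events really are independent.

```python
def and_gate(probabilities):
    """Probability that ALL input events occur, assuming independence."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def or_gate(probabilities):
    """Probability that AT LEAST ONE input event occurs, assuming independence."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur


# Hypothetical failure probabilities (assumptions, not real data).
pump_a_fails = 0.01
pump_b_fails = 0.01
valve_sticks = 0.001

# Loss of flow needs both redundant pumps to fail, or the valve to stick.
both_pumps_fail = and_gate([pump_a_fails, pump_b_fails])
top_event = or_gate([both_pumps_fail, valve_sticks])

print(f"both pumps fail: {both_pumps_fail:.6f}")
print(f"top event:       {top_event:.6f}")
```

If the two pumps share a power supply or the same control software, their failures are no longer independent and the product in `and_gate` understates the real likelihood. That is the kind of indirect interaction the assumptions above rule out.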

In techniques that consider the probabilities of failure events, the failure events themselves are assumed to be stochastic (probabilistic but unable to be precisely predicted) in order to determine the likelihood of system failures. These assumptions are the basis of event-chain approaches such as FTA, ETA (event tree analysis), FHA (fault hazard analysis), and HAZOP (hazard and operability analysis, which uses a deviation rather than a failure as the event or condition to be considered). The central premise of STPA as a method is that systems of non-trivial complexity (such as those involving software and humans) do not satisfy the above assumptions.

System Theory

STPA is based on system theory, a way of modeling systems intended to replace analytical decomposition in order to better handle complexity.

Some unique aspects of system theory:

  • the system is treated as a whole rather than as a collection of components
  • primarily concerned with emergent properties (properties not expressed in the summation of the component behaviours)
  • emergent properties arise from relationships among the parts of the system

Since emergent properties arise from the totality of the system (from the components and their interactions), controlling those properties requires controlling the behaviour of individual components and the interactions among them. A ‘controller’ fulfills this role by providing control actions to the system and receiving feedback from the system to determine the impact of those control actions. This is a standard feedback control loop.
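
The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tank process, the `Controller` and `Process` classes, the action names, and the setpoint and limit values are all hypothetical and do not come from the handbook.

```python
class Process:
    """Hypothetical controlled process: a tank with an inlet valve."""

    def __init__(self, level):
        self.level = level

    def apply(self, control_action):
        # The control action changes the real state of the process.
        if control_action == "open_inlet":
            self.level += 1.0
        elif control_action == "close_inlet":
            self.level -= 0.5  # the tank drains while the inlet is closed

    def feedback(self):
        # Sensor reading sent back to the controller.
        return self.level


class Controller:
    """Chooses control actions from its belief about the process state."""

    def __init__(self, setpoint):
        self.setpoint = setpoint
        self.believed_level = None  # updated only through feedback

    def update(self, feedback):
        self.believed_level = feedback

    def decide(self):
        if self.believed_level < self.setpoint:
            return "open_inlet"
        return "close_inlet"


LIMIT = 6.0  # assumed constraint: the level must never exceed this value
process = Process(level=0.0)
controller = Controller(setpoint=5.0)

history = []
for _ in range(20):
    controller.update(process.feedback())  # feedback path
    action = controller.decide()
    process.apply(action)                  # control path
    history.append(process.level)

print(history)
print("constraint held:", max(history) <= LIMIT)
```

The controller never sees the tank directly; it acts on `believed_level`, which is only as good as the feedback it receives. If the feedback were delayed or wrong, the same decision logic could issue an action that violates the limit even though nothing has "failed".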

STAMP (System-Theoretic Accident Model and Processes)

STAMP is an accident causality model (not an analysis method) based on system theory, and it provides the theoretical basis for STPA. That is, it is a model of how accidents occur and an alternative to the chain-of-failure models that underlie methods such as FTA. STAMP considers accidents to be the result of inadequate system control: it expands the traditional accident causality model beyond event chains to include complex processes and unsafe interactions (which may or may not be related to failures). In this way STAMP can model causes of accidents arising from unsafe designs or other events that do not relate to failures.

Since STAMP considers safety to be a dynamic control problem rather than a failure prevention problem, the emphasis changes to enforcing constraints on system behaviour rather than preventing failures.
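
A small sketch may make the difference concrete. The train example, the `TrainState` fields, and the constraint itself are hypothetical, chosen only to show a system state that is unsafe even though no component has failed.

```python
from dataclasses import dataclass


@dataclass
class TrainState:
    door_open: bool
    moving: bool
    door_failed: bool = False   # the door mechanism itself is broken
    drive_failed: bool = False  # the drive system itself is broken


def any_component_failed(state):
    """Failure-prevention view: is any individual component broken?"""
    return state.door_failed or state.drive_failed


def violates_constraint(state):
    """Constraint view (assumed constraint): the train must not move
    while a door is open."""
    return state.door_open and state.moving


# Both components work as designed, yet the combination is unsafe.
state = TrainState(door_open=True, moving=True)

print("component failure:", any_component_failed(state))
print("constraint violated:", violates_constraint(state))
```

A check that looks only for failed components reports nothing wrong here, while the constraint check flags the state. The unsafe condition comes from the interaction between a working door and a working drive.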

Some advantages of STAMP:

  • it works on very complex systems because it works top-down rather than bottom-up
  • it includes software, humans, organizations, safety culture, etc. as causal factors in accidents without requiring each to be modeled separately
  • it allows the creation of tools such as STPA, accident analysis (CAST), identification and management of leading indicators of increasing risk, organizational risk analysis, etc.

Note that because STAMP can be applied to model any emergent property, tools such as STPA can be applied to any system property (e.g. security).

The two most widely used STAMP-based tools are STPA (System-Theoretic Process Analysis) and CAST (Causal Analysis based on Systems Theory). STPA is a proactive method that analyzes the potential causes of accidents so that hazards can be controlled. CAST is a retroactive analysis method that examines an accident or incident that has occurred and identifies the causal factors that were involved.

Definitions

The following are some of the important definitions of vocabulary used in the context of STPA along with the pages where you can find these in the STPA handbook:

  • loss (p16): involves something of value to stakeholders. Losses may include a loss of human life or human injury, property damage, environmental pollution, loss of mission, loss of reputation, loss or leak of sensitive information, or any other loss that is unacceptable to the stakeholder.
  • hazard (p17): a system state or set of conditions that, together with a particular set of worst-case environmental conditions, will lead to a loss
  • system (p17): a set of components that act together as a whole to achieve some common goal, objective, or end; a system may contain subsystems and may also be part of a larger system
  • system-level constraint (p20): specifies system conditions or behaviours that need to be satisfied to prevent hazards (and ultimately to prevent losses)
  • hierarchical control structure (p22): a system model composed of feedback control loops; an effective control structure enforces constraints on the behaviour of the overall system
  • unsafe control action (UCA) (p35): a control action that, in a particular context and worst-case environment, will lead to a hazard
  • controller constraint (p41): specifies the controller behaviours that need to be satisfied to prevent UCAs
  • loss scenario (p42): describes the causal factors that can lead to the unsafe control actions and to hazards
  • control path (p49): transfers control actions to the controlled process via (in generic terms) an actuator
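
One way to see how these definitions relate is to record them as linked data, so that each unsafe control action can be traced to the hazards it leads to and each hazard to its losses. The sketch below is an assumption about how such records might look; the class names, identifiers, and tank example are hypothetical and are not taken from the handbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loss:
    id: str
    description: str


@dataclass(frozen=True)
class Hazard:
    id: str
    description: str
    loss_ids: tuple  # losses this hazard can lead to


@dataclass(frozen=True)
class UnsafeControlAction:
    id: str
    controller: str
    control_action: str
    context: str
    hazard_ids: tuple  # hazards this UCA can lead to


def losses_reachable_from(uca, hazards):
    """Return the sorted ids of the losses a UCA can ultimately lead to."""
    by_id = {h.id: h for h in hazards}
    loss_ids = set()
    for hazard_id in uca.hazard_ids:
        loss_ids.update(by_id[hazard_id].loss_ids)
    return sorted(loss_ids)


# Hypothetical example records.
losses = [
    Loss("L-1", "Loss of life or injury"),
    Loss("L-2", "Environmental pollution"),
]
hazards = [
    Hazard("H-1", "Tank contents exceed the safe level", ("L-1", "L-2")),
    Hazard("H-2", "Tank contents are released to a drain", ("L-2",)),
]
uca = UnsafeControlAction(
    id="UCA-1",
    controller="Level controller",
    control_action="open inlet",
    context="while the tank is at or above its safe level",
    hazard_ids=("H-1",),
)

print(losses_reachable_from(uca, hazards))
```

Keeping the links explicit means any later item, such as a controller constraint or a loss scenario, can be traced back to the losses it is meant to prevent.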