How to do a basic STPA


Series: STPA

STPA can be considered a worst-case analysis method: it does not consider the average or best cases, but identifies behaviours that should be prevented in order to mitigate accidents in the worst cases.

Basic STPA is composed of four steps:

  1. Define the purpose of the analysis
  2. Model the control structure
  3. Identify unsafe control actions and controller constraints
  4. Identify loss scenarios

step 1: Define the purpose of the analysis.

This step is composed of four parts:

  1. identify losses
  2. identify system-level hazards
  3. identify system-level constraints
  4. refine hazards (optional)

step 1: part 1. identify losses

A general approach to identifying losses:

  • identify the stakeholders
  • stakeholders identify their ‘stake’ in the system (what do they value and what are their goals)
  • express each goal or value in terms of a loss eg:
L-1: loss of life or injury to people
L-2: loss of or damage to vehicle

points to consider when identifying losses:

  • losses can include any impact that is unacceptable to any stakeholder
  • losses should not reference individual components or specific causes
    • eg. “brake failure” is not a valid loss but a component failure that may lead to loss and therefore a component-level hazard
  • losses may involve aspects of the environment that are not directly controlled by the system designer
  • document any special considerations or assumptions made, such as losses that have been explicitly excluded

step 1: part 2. identify system-level hazards

Hazard identification in STPA is about system-states and conditions that are inherently unsafe regardless of the cause. In order to take advantage of STPA, hazards must be determined at a high-enough level of abstraction that there can be no distinction between causes related to technical failures, design flaws, requirement flaws, human error, etc.

A useful first step to identifying system-level hazards is to identify the system to be analyzed. This can be accomplished by determining the system boundary. In turn, the most useful way to determine this (and to think about it) is to determine what conditions and states the system designer may exert some control over.

This is a primary distinction between hazards and losses: losses may involve conditions or states of the environment (essentially everything outside the system boundary), over which the system designer can exert little control, whereas hazards are confined to states of the system itself. The goal of safety engineering is to mitigate the effects of hazards in the system under control, although awareness of uncontrolled conditions remains important.

basic criteria for defining system-level hazards:

  • hazards are system states or conditions (not component-level causes or environmental states)
  • hazards will lead to a loss in some worst-case environment
  • hazards must describe states or conditions to be prevented
  • hazards should refer to factors that can be influenced by system designers and operators

common mistakes when identifying system-level hazards:

  • too many hazards containing unnecessary detail (in such cases it may be appropriate to group and refine these into sub-hazards - see step 1, part 4)
  • ambiguous or recursive wording
  • confusing hazards with failures

for example:

  • “ambient temperature is under 10C” is an invalid hazard in a vehicle system since that is outside the control of the system designer.
  • “brake failure” is an invalid system-level hazard since it refers to a component failure rather than a system-state.
  • A better system-level hazard might be “car does not decelerate from 30mph to 0mph within 5s”, since the system designer may exert some control over that and, in a worst-case environment, it could be expected to lead to loss of life or injury to people, or to damage to the vehicle (both of which could be reasonable losses according to stakeholders in a vehicle)
  • “vehicle is in motion” is a system state which could lead to a loss. However, that is an invalid hazard since it’s not a system state to be avoided (since that would defeat the purpose of the vehicle).

structure: Hazards are fully specified by the following:

<HAZARD> = <SYSTEM> & <UNSAFE CONDITION> & <LINK TO LOSSES>

for example:

H-1: Aircraft violate minimum separation standards [L-1, L-2, L-3]
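
The hazard structure above can be captured as a small record so that the link to losses becomes checkable. The following is a minimal sketch in Python; the class names, the `render` method and the `undefined_losses` helper are illustrative assumptions, not part of STPA. Only L-1 and L-2 are listed earlier in this article, so the check flags the L-3 link in H-1 as undefined here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loss:
    ident: str
    description: str


@dataclass(frozen=True)
class Hazard:
    ident: str
    system: str
    unsafe_condition: str
    losses: tuple  # identifiers of the linked losses

    def render(self) -> str:
        # <HAZARD> = <SYSTEM> & <UNSAFE CONDITION> & <LINK TO LOSSES>
        links = ", ".join(self.losses)
        return f"{self.ident}: {self.system} {self.unsafe_condition} [{links}]"


def undefined_losses(hazard, losses):
    """Return linked loss identifiers that have no matching Loss record."""
    known = {loss.ident for loss in losses}
    return [ident for ident in hazard.losses if ident not in known]


losses = [
    Loss("L-1", "loss of life or injury to people"),
    Loss("L-2", "loss of or damage to vehicle"),
]
h1 = Hazard("H-1", "Aircraft", "violate minimum separation standards",
            ("L-1", "L-2", "L-3"))

print(h1.render())
print(undefined_losses(h1, losses))  # L-3 has no record in this sketch
```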

step 1: part 3. define system-level constraints

The system-level constraints (SC) specify system conditions or behaviours that need to be satisfied in order to prevent hazards (this might also be an expression of how the system minimizes losses in case the hazards do occur). Consequently, SCs derive immediately from the identified hazards.

SCs should not specify particular solutions or implementations: that is generally premature at such a level of abstraction (and point in time when such an analysis is performed) and can artificially constrain solutions.

structure: SCs are fully specified by the following:

<SYSTEM LEVEL CONSTRAINT> = <SYSTEM> & <CONDITION TO ENFORCE> & <LINK TO HAZARDS>
<SYSTEM LEVEL CONSTRAINT> = If <HAZARD> occurs, then <WHAT NEEDS TO BE DONE TO PREVENT OR MINIMIZE A LOSS> & <LINK TO HAZARD>

for example:

SC-1: Aircraft must satisfy minimum separation standards from other aircraft and objects [H-1]
SC-3: If aircraft violate minimum separation, then the violation must be detected and measures taken to prevent collision [H-1]
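
Both constraint forms can be written as simple templates, and a coverage check can confirm that every hazard has at least one constraint. This is a rough sketch only: the function names are made up for illustration, and `H-2` is a hypothetical placeholder hazard used to show a gap.

```python
def enforce_form(ident, system, condition, hazards):
    # <SYSTEM> & <CONDITION TO ENFORCE> & <LINK TO HAZARDS>
    return f"{ident}: {system} {condition} [{', '.join(hazards)}]"


def response_form(ident, hazard_occurs, response, hazards):
    # If <HAZARD> occurs, then <WHAT NEEDS TO BE DONE> & <LINK TO HAZARD>
    return f"{ident}: If {hazard_occurs}, then {response} [{', '.join(hazards)}]"


def hazards_without_constraint(hazard_ids, constraint_links):
    """constraint_links maps each constraint identifier to its linked hazards."""
    covered = {h for linked in constraint_links.values() for h in linked}
    return [h for h in hazard_ids if h not in covered]


sc1 = enforce_form(
    "SC-1", "Aircraft",
    "must satisfy minimum separation standards from other aircraft and objects",
    ["H-1"])
sc3 = response_form(
    "SC-3", "aircraft violate minimum separation",
    "the violation must be detected and measures taken to prevent collision",
    ["H-1"])

# "H-2" is a hypothetical hazard with no constraint yet.
gaps = hazards_without_constraint(["H-1", "H-2"],
                                  {"SC-1": ["H-1"], "SC-3": ["H-1"]})
print(sc1)
print(sc3)
print(gaps)
```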

step 1: part 4. refine system-level hazards (optional)

Refinement of identified and reviewed hazards can be useful for complex applications and other large analysis efforts in order to:

  • better manage hazards
  • inform control structure modeling

However, refinement is not necessary for most applications. One way to refine hazards is to ask ‘what needs to be controlled to prevent this hazard?’: this helps to identify system processes or activities that are common across hazards (eg. acceleration, deceleration, steering).

step 2: Model the control structure

The hierarchical control structure is a model composed of control loops. These can be used to explain and anticipate complex interactions that can lead to losses.

Generic hierarchical control structures are composed of five types of elements:

  • controllers
  • control actions
  • feedback
  • other inputs and outputs (not control or feedback)
  • controlled processes

By convention, these elements are arranged vertically to indicate control and authority within the system; all downward arrows represent control actions and upward arrows represent feedback.

This modeling should begin with an abstract control structure and further rounds of modeling can add detail iteratively. The control structure can be refined by defining how each subsystem will be controlled (eg. ‘will the wheel braking subsystem be controlled directly and manually by the flight crew?’).

Some additional points to consider:

  • nothing in the model restricts a controller to interacting only with the level immediately above or below it
  • there is no requirement for a 1-to-1 mapping between controllers and processes. a controller may provide control actions to one or more processes and a process may be controlled by zero or more controllers.
  • typically each control action path is parallel to a feedback path but this is not always the case

controllers

Generic controllers may provide “control actions” to control some process or enforce behavioural constraints on the process. Generic controllers are further decomposed into:

  • “control algorithm”: the representation of the controller logic determining the provided control actions
  • “process models”: the representation of the assumptions/premises underlying the logic determining provisions of control actions (which may include assumptions about the process being controlled). Process models may be expected to be updated by feedback from the process under control.

When the controller is human, the process model is usually called a ‘mental model’ and the control algorithm is called the ‘operating procedures’, but the same principles apply.

assigning responsibility and deriving control actions

During development of the control structure, each entity in the structure may be assigned responsibilities as a refinement of the system constraints: what does each entity need to do such that the system-level constraints will be enforced? For example:

physical wheel brakes:

R-1: decelerate wheels when commanded by BSCU or Flight Crew [SC-6.1]

BSCU:

R-2: Actuate brakes when requested by flight crew [SC-6.1]
R-3: Pulse brakes in case of a skid (Anti-skid) [SC-6.2]
...

Control actions for each controller may then be defined based on these responsibilities. For example the BSCU controller must have capability to provide a control action to the physical wheel brakes in order to satisfy R-2.

The control structure can be refined further by adding additional details to the responsibilities to decrease the level of abstraction. For example if a responsibility is that the BSCU will need to execute both normal and automatic braking then this may imply two controllers within the BSCU which cooperatively control those two behaviours.

identifying process models and deriving feedback

Feedback can be derived from the identified control actions and responsibilities by identifying the process models the controllers require to facilitate the logic driving the control action provisions. For example:

  • BSCU responsibility: actuate brakes when requested by flight crew
  • process model: braking is requested by flight crew
  • feedback: brake pedal applied
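
The relationship between the responsibility, the process model and the feedback above can be illustrated with a toy controller. This is a sketch under stated assumptions, not a model of a real BSCU: the `Controller` class, the `braking_requested` variable and the trivial control algorithm are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    name: str
    # Process model: the controller's beliefs about the controlled process.
    process_model: dict = field(default_factory=dict)

    def receive_feedback(self, variable, value):
        # Feedback updates the process model.
        self.process_model[variable] = value

    def control_algorithm(self):
        # Hypothetical logic: provide the Brake control action only if the
        # controller believes braking has been requested.
        if self.process_model.get("braking_requested", False):
            return "Brake"
        return None


bscu = Controller("BSCU")
before = bscu.control_algorithm()
bscu.receive_feedback("braking_requested", True)  # eg. brake pedal applied
after = bscu.control_algorithm()
print(before, after)
```

Note that the control action depends on the belief held in the process model, not on the true state of the process.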

common questions when modeling the control structure

  • does the control structure need to be complete before proceeding?
    • no, STPA can either be applied to a complete system or used to identify missing entities. In general, though, it is easier not to omit information intentionally
  • how specific should the arrow labels be?
    • what matters is the functional information that can be sent and the function/role of entities, not the mechanism/implementation, so:
      • labels for control actions should indicate the type of command (eg. ‘open/close valves’)
      • feedback should indicate the type of information (eg ‘wheel speed’)
      • controller labels should indicate the functional type or role of the controller, not the implementation (eg. ‘Autobrake controller’)
  • should the control structure include all actuators and sensors?
    • these aren’t needed to begin STPA and aren’t appropriate to begin with (at the highest level of abstraction)
  • how do physical processes and physical interactions fit into the control structure?
    • control structures emphasize functional relationships/interactions, they do not typically capture physical relationships. physical processes are typically specified at the lowest levels of the structure.
  • does the control structure require a linear hierarchy?
    • no, the system will dictate the hierarchy. A linear hierarchy is generally the most simple and straightforward though and avoids coordination issues such as diffusion of responsibility and assumptions about other controllers.
  • who controls who?
    • control is different from obedience. Similarly, the ability to trigger a response or influence behaviour does not imply a control relationship. For example, a control action to open a valve may trigger a valve to open, but a feedback signal of high temperature may equally trigger a controller to turn on a fan. The ability to trigger a response is therefore not a sufficient criterion for distinguishing control from feedback. Control actions are always provided in order to achieve goals, while feedback is provided in ignorance of high-level goals and without responsibility for achieving them. Thus the control hierarchy correlates closely with the hierarchy of goals and responsibilities (and authorities), which is why control actions are naturally derived from responsibilities. However, mischaracterizing feedback as control (or vice versa) will not generally have a large impact on the results of the analysis, since STPA considers unsafe control actions (UCAs) as well as potential feedback problems.
  • do I need to document anything other than the control structure diagram?
    • documenting information that is helpful to understand the control structure is good practice (eg. basic descriptions of controllers, purposes, special functionality, controller responsibilities, process models, etc)

common points of confusion

  1. a control structure is not a physical model
  2. a control structure is not an executable model
  3. a control structure does not assume obedience
  4. abstraction should be used to manage complexity

The first three points can be summarised by saying that a control structure describes which mechanisms exist for exchanging functional information, and nothing else. The exact nature of those mechanisms and their implementation are largely irrelevant to the control structure model. This means that a control structure showing that a flight crew may provide control actions to an aircraft only means that mechanisms exist (several levels of indirection may be abstracted away at a given level of analysis) which allow the flight crew to do so. It says nothing about whether the flight crew can, in a particular instance, do so.

The last point is only an expression about the most efficient way to apply STPA: this should begin before implementations or other details are decided, at which point the highest level of abstraction is the natural starting point. Models arising at this level of abstraction can be used to begin STPA and to identify requirements and constraints for the system. STPA results can then be used to inform architecture, design, implementations, and further refinements. Even if those details have already been decided, STPA applied at the highest level of abstraction will provide the quickest results and on the broadest issues.

tips to prevent common mistakes in a control structure

  • labels should describe functional information (not implementations)
  • labels should be unambiguous when information types are known
  • controlled physical processes must be controlled by at least one controller
  • responsibilities (and traceability) must be reviewed for conflicts and gaps
  • control actions and feedback needed to satisfy the responsibilities must be included
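
Some of these tips can be checked mechanically once the control structure is written down as data. The sketch below assumes a simple representation (lists of labelled arrows); the entity names follow the braking example, while the ‘brake request’ label and the `Hydraulics` process are hypothetical additions used to show a finding. A control action without a parallel feedback path is not necessarily a mistake, so the second check only produces candidates for review.

```python
# Each arrow is (source, target, label).
control_actions = [
    ("Flight Crew", "BSCU", "brake request"),              # label is assumed
    ("BSCU", "Physical Wheel Brakes", "Brake command"),
]
feedback = [
    ("Physical Wheel Brakes", "BSCU", "wheel speed"),
]
# "Hydraulics" is a hypothetical process with no controller yet.
physical_processes = ["Physical Wheel Brakes", "Hydraulics"]


def uncontrolled(processes, actions):
    """Physical processes that receive no control action."""
    controlled = {target for _, target, _ in actions}
    return [p for p in processes if p not in controlled]


def actions_without_feedback(actions, feedback_paths):
    """Control paths with no feedback path running in the opposite direction."""
    returns = {(src, dst) for src, dst, _ in feedback_paths}
    return [(ctrl, target) for ctrl, target, _ in actions
            if (target, ctrl) not in returns]


print(uncontrolled(physical_processes, control_actions))
print(actions_without_feedback(control_actions, feedback))
```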

step 3: Identify unsafe control actions and controller constraints

identify unsafe control actions

UCAs are those control actions that are ‘unsafe’ and should be avoided. More precisely, UCAs are those control actions that in certain contexts and worst-case environmental conditions will lead to a hazard. Note then that the context is important in assessing whether a control action is unsafe.

Therefore, a valid UCA must:

  • be traceable to one or more hazards
  • specify the context in which the control action is unsafe

There are four categories of unsafe control actions (ie. ways that actions can lead to hazards):

  1. control action not provided (not providing leads to a hazard)
  2. control action provided (providing leads to a hazard)
  3. control action provided at the wrong time (too early, too late, out of sequence, etc)
  4. control action provided for the wrong duration (only applicable to continuous control actions)

The combination of the above criteria for a valid UCA informs the full specification for UCAs consisting of five parts:

UCA-*: <Source> <Type> <Control Action> <Context> <Link to hazards>

where ‘source’ is the controller that can provide the control action and ‘type’ is the category of UCA. For example:

UCA-2: BSCU Autobrake provides Brake command during a normal takeoff [H-4.3]
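
The five-part UCA specification maps naturally onto a record type. The following is one possible sketch, not a prescribed format: the `UcaType` and `Uca` names are illustrative, and it assumes that timing or duration wording (eg. ‘too late after touchdown’) is carried in the context field.

```python
from dataclasses import dataclass
from enum import Enum


class UcaType(Enum):
    NOT_PROVIDED = 1    # not providing the control action leads to a hazard
    PROVIDED = 2        # providing the control action leads to a hazard
    WRONG_TIMING = 3    # too early, too late, out of sequence
    WRONG_DURATION = 4  # continuous control actions only


@dataclass(frozen=True)
class Uca:
    ident: str
    source: str
    uca_type: UcaType
    control_action: str
    context: str
    hazards: tuple

    def render(self) -> str:
        # UCA-*: <Source> <Type> <Control Action> <Context> <Link to hazards>
        if self.uca_type is UcaType.NOT_PROVIDED:
            verb = "does not provide"
        else:
            verb = "provides"
        links = ", ".join(self.hazards)
        return (f"{self.ident}: {self.source} {verb} {self.control_action} "
                f"{self.context} [{links}]")


uca2 = Uca("UCA-2", "BSCU Autobrake", UcaType.PROVIDED,
           "Brake command", "during a normal takeoff", ("H-4.3",))
print(uca2.render())
```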

common questions about UCAs

  • does a UCA guarantee that a hazard will always result?
    • no, a UCA leads to a hazard in a certain context and worst-case environment
  • can I identify UCAs when we already have safeguards in place?
    • STPA remains a worst-case analysis method. It is concerned with identifying those control actions that are inherently unsafe which is irrespective of any existing safeguards. Also STPA is best applied early in the design phase when those safeguards may not be known. Further, identifying UCAs can be used to provide argumentation about the efficacy of those safeguards which do exist or to inform design features which may act as safeguards. In other words, UCAs should be prevented even if safeguards do exist.
  • the last two categories of UCAs are about timing, what’s the distinction between them?
    • Firstly, category four only applies to continuous control actions. Secondly, the distinction covers those situations where a continuous control action may be provided in the correct situation and at the correct time but for a period that renders it unsafe. For example, brakes may be applied for too short a duration to be effective (‘stops providing brake control action too early’). Thirdly, distinguishing these categories ensures UCAs will be discovered irrespective of whether they are continuous or discrete. For example, the case where brakes are not applied for a long enough time to be effective could also be expressed as a category three UCA (‘provides stop brake control action too early’).
  • do I need to identify exactly one UCA for each category of unsafe control action?
    • no, there may be multiple UCAs falling into each category, or none since categories may not be applicable to every case
  • are there more than four categories of UCA?
    • the four categories considered are provably complete, but they do admit subcategories. For example, the second category (‘providing leads to hazard’) could be divided into subcategories covering contexts in which:
      • the control action may never be safe
      • the control action has an incorrect parameter (eg. wrong radio frequency)
      • insufficient/excessive control action may be unsafe
      • the direction of the control action may be unsafe
      • the control action has already been provided (eg. repeated, oscillatory, intermittent)
      • the control action is provided too quickly or too slowly
    • whenever control actions include one or more parameters it is important to consider how the parameters may be insufficient, excessive, in the wrong direction or otherwise unsafe
  • I identified a UCA not related to a system-level hazard, what should I do?
    • each valid UCA is traceable to at least one system-level hazard. Identifying a UCA that does not trace to a system-level hazard indicates that a system-level hazard may not have been identified. STPA is an iterative method and results at any point can update existing results.
  • where is the outcome or result of the UCA described?
    • UCAs should be traceable to at least one hazard. Where the relationship between UCAs and hazards is more complex or unintuitive, any special reasoning should be recorded either in the UCA itself or in comments on the UCA. Be careful not to confuse the UCA context and the UCA outcome. Recording the outcome rather than the context will make it impossible to derive the requirements and scenarios. Every UCA must contain the context and it may include the result for clarity.
  • should I specify process model flaws in the UCA context?
    • No, the UCA should record the context (the true conditions) making the control action unsafe, not the controller’s process model/belief. Identifying the causes of the UCAs is the outcome of the following step. For example: ‘BSCU provides Brake command during normal takeoff’ records the context, while ‘BSCU provides Brake command when it incorrectly believes aircraft is landing’ incorrectly records a process belief (which artificially limits the analysis)

tips to prevent common mistakes when identifying UCAs

ensure that:

  • every UCA specifies the context that makes the control action unsafe
  • UCA contexts specify the actual states that would make the control action unsafe not potential beliefs about the actual states
  • UCA contexts are defined clearly
  • UCA contexts are included and not replaced by future effects or outcomes
  • every UCA is traced to at least one hazard
  • any control action categories assumed not to apply are verified as not applicable
  • excessive, insufficient, wrong direction, etc are considered for continuous control actions with a parameter
  • any assumption or special reasoning behind UCAs is documented
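
A few of these checks lend themselves to a simple automated review pass. The sketch below is a heuristic only, assuming UCAs are held as a context string plus a list of hazard links; the keyword list used to spot belief wording is a hypothetical choice and will miss many phrasings.

```python
BELIEF_WORDS = ("believes", "thinks", "assumes")  # hypothetical keyword list


def lint_uca(context, hazards):
    """Return a list of problems found in one UCA's context and hazard links."""
    problems = []
    if not context.strip():
        problems.append("missing context")
    if not hazards:
        problems.append("not traced to a hazard")
    if any(word in context.lower() for word in BELIEF_WORDS):
        problems.append("context describes a belief, not the actual state")
    return problems


good = lint_uca("during a normal takeoff", ["H-4.3"])
bad = lint_uca("when it incorrectly believes aircraft is landing", [])
print(good)
print(bad)
```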

identify controller constraints

Once UCAs have been identified, controller constraints (CCs) can be derived. Generally, each UCA can be inverted to directly derive the CC with a link provided for traceability. For example:

UCA-2: BSCU Autobrake provides Brake control action during a normal takeoff [H-4.3, H-4.5]
C-2: BSCU Autobrake must not provide Brake control action during a normal takeoff [UCA-2]
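
For the ‘provided’ and ‘not provided’ categories, the inversion is mechanical enough to sketch in code. The `derive_constraint` helper below is an illustrative assumption; timing and duration UCAs usually need rewording by hand, so they are deliberately not handled. The C-1 text it produces is derived here for illustration.

```python
def derive_constraint(number, source, provides, control_action, context):
    """Invert a 'provides' / 'does not provide' UCA into a controller constraint."""
    verb = "must not provide" if provides else "must provide"
    return f"C-{number}: {source} {verb} {control_action} {context} [UCA-{number}]"


# UCA-2: the controller provides the action in an unsafe context.
c2 = derive_constraint(2, "BSCU Autobrake", True,
                       "Brake control action", "during a normal takeoff")
# A 'does not provide' UCA inverts to a 'must provide' constraint.
c1 = derive_constraint(1, "BSCU Autobrake", False,
                       "the Brake control action",
                       "during landing roll when the BSCU is armed")
print(c2)
print(c1)
```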

step 4: Identify loss scenarios

It is important to consider the following categories of loss scenarios:

  1. scenarios that lead to UCAs (these relate directly to the controller and its inputs/feedback)
  2. scenarios in which control actions are improperly or not executed (these relate directly to the control path and the controlled process)

It can be helpful at this point to refine control structures with additional components such as sensors and actuators.

identifying scenarios that lead to UCAs

These are identified by starting with a UCA and attempting to identify the cause of the controller providing/not providing that control action.

These scenarios can be categorised accordingly:

  1. unsafe controller behaviour
  2. causes of inadequate feedback/information

1. unsafe controller behaviour

  • failures related to the controller (eg. physical, power, etc) (flawed execution/operation)
  • inadequate control algorithm (flawed logic)
    • flawed implementation
    • flawed specification
    • inadequacy over time/degradation
    • assumption of previous control actions being executed properly
  • unsafe control input (UCA received from another controller)
  • inadequate process model (flawed beliefs: start with the flawed process model, then identify how it might arise)
    • controller receives incorrect feedback
    • controller receives conflicting feedback that can not be or is incorrectly resolved
    • controller incorrectly interprets or ignores correct feedback
    • controller does not receive or receives delayed feedback
    • necessary controller feedback does not exist

Example of physical failure:

UCA-1: BSCU Autobrake does not provide the Brake control action during landing roll when the BSCU is armed [H-4.1]
Scenario 1 for UCA-1: The BSCU Autobrake controller fails during landing roll when the BSCU is armed, causing the Brake control action not to be provided [UCA-1]. As a result, insufficient deceleration may be provided upon landing [H-4.1].

Example of inadequate control algorithm:

UCA-3: BSCU Autobrake provides the Brake control action too late after touchdown [H-4.1]
Scenario 1 for UCA-3: The aircraft lands, but processing delays within the BSCU result in the Brake control action being provided too late [UCA-3]. As a result, insufficient deceleration may be provided upon landing [H-4.1].

Example of inadequate process model:

UCA-2: BSCU Autobrake does not provide the Brake control action during landing roll when the BSCU is armed [H-4.1]

  • controller process model (belief) that could cause the UCA: controller believes the aircraft has already stopped on ground
  • controller receives correct feedback but interprets it incorrectly: wheel speed signals momentarily reach zero during anti-skid operation, causing flawed process model

Scenario 1 for UCA-2: The BSCU is armed and the aircraft begins landing roll. The BSCU does not provide the Brake control action [UCA-2] because the BSCU incorrectly believes the aircraft has already come to a stop. This flawed process model will occur if the received feedback momentarily indicates zero speed during landing roll. The received feedback may momentarily indicate zero speed during anti-skid operation, even though the aircraft is not stopped.

  • controller process model (belief) that could cause the UCA: aircraft is in flight
  • controller does not receive information when needed: touchdown indication is not received

Scenario 2 for UCA-2: The BSCU is armed and the aircraft begins landing roll. The BSCU does not provide the Brake control action [UCA-2] because the BSCU incorrectly believes the aircraft is in the air and has not touched down. This flawed process model will occur if the touchdown indication is not received upon touchdown. <insert reason for this to occur>

The reason to be inserted in Scenario 2 is an example of a flawed process model arising from inadequate information.

2. causes of inadequate feedback/information

Any such scenarios must explain why the feedback/information is inadequate. Otherwise it will be impossible to prevent it. The source of such data must also be determined and examined. Feedback comes from the controlled process (by definition) while other information may come from other sources in the system or the environment. In general these scenarios may involve:

  • feedback/information not received
    • sent by sensors but not received by controller
    • not sent by sensors but received or applied to sensors
    • not received or applied to sensors
    • does not exist in control structure or sensors do not exist
  • inadequate feedback is received
    • sensors respond adequately but controller receives inadequate feedback
    • sensors respond inadequately to feedback that is received or applied to sensors
    • sensors are incapable or not designed to provide necessary feedback

These may be caused by transmission errors or failures, unexpected sensor environments, flawed models, etc. Scenarios can be completed by considering a reason for this to occur given the true system state. For example, Scenario 2 for UCA-2 above:

  • true state from UCA context: aircraft is in landing roll
  • information received: touchdown indication is not received upon touchdown
  • how this could occur given true state: reported wheel speed is insufficient, reported weight on wheels is insufficient, wheel speed or weight is delayed, etc
Scenario 2 for UCA-2: The BSCU is armed and the aircraft begins landing roll. The BSCU does not provide the Brake control action [UCA-2] because the BSCU incorrectly believes the aircraft is in the air and has not touched down. This flawed process model will occur if the touchdown indication is not received upon touchdown. The touchdown indication may not be received when needed if any of the following occur:
- wheels hydroplane on wet runway (insufficient wheel speed)
- wheel speed delayed due to filtering
- conflicting air/ground indications due to crosswind landing
- failure of wheel speed sensors
- failure of air/ground switches
- etc
As a result, insufficient deceleration may be provided upon landing [H-4.1]
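
The requirement that a scenario must explain why the information is inadequate can be enforced by a record that is not considered complete until at least one reason is given. This is a minimal sketch; the `FeedbackScenario` class and its field names are assumptions, and the reasons are copied from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackScenario:
    uca: str
    true_state: str
    information_received: str
    reasons: list = field(default_factory=list)

    def is_complete(self) -> bool:
        # Without a reason, the inadequate information cannot be prevented.
        return bool(self.true_state and self.information_received and self.reasons)


draft = FeedbackScenario(
    uca="UCA-2",
    true_state="aircraft is in landing roll",
    information_received="touchdown indication is not received upon touchdown",
)
draft_complete = draft.is_complete()  # no reasons recorded yet

draft.reasons += [
    "wheels hydroplane on wet runway (insufficient wheel speed)",
    "wheel speed delayed due to filtering",
    "failure of wheel speed sensors",
]
print(draft_complete, draft.is_complete())
```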

identifying scenarios in which control actions are improperly executed or not executed

These scenarios consider factors affecting the control path as well as the controlled process.

1. scenarios involving the control path

These might generally include:

  • control action not executed
    • sent by controller but not received by actuators
    • received by actuators but they do not respond
    • actuators respond but control action not applied or received by controlled process
  • control action improperly executed
    • sent by controller but received improperly by actuators
    • received correctly but actuators respond inadequately
    • actuators respond adequately but control action is applied or received improperly at controlled process
    • control action not sent but actuators respond as though it had been sent

To create these scenarios, start with a control action, identify what improper or no execution means for the application, and identify the parts of the control path that could contribute to that behaviour.

example:

  • control action: BSCU sends Brake command
  • no execution: brakes not applied
  • improper execution: insufficient braking
  • Scenario 1: BSCU sends Brake command upon landing but the brakes are not applied due to actuator failure. As a result, insufficient deceleration may be provided upon landing [H-4.1]
  • Scenario 2: BSCU sends Brake command upon landing, but insufficient braking is applied due to slow actuator responses. As a result, insufficient deceleration may be provided upon landing [H-4.1]
  • Scenario 3: BSCU sends Brake command upon landing, but it is not received by the actuator due to a wiring error. As a result, insufficient deceleration may be provided upon landing [H-4.1]
  • Scenario 5: BSCU sends the Brake command but the brakes are not applied because an adversary executes a denial of service attack that blocks the Brake command. As a result, insufficient deceleration may be provided upon landing [H-4.1]

Also consider how control actions may not be sent but actuators respond as though it had been:

  • control action: BSCU does not send Brake command
  • improper execution: brakes applied during normal takeoff (similar to UCA-2)
  • Scenario 4: BSCU does not send Brake command, but the brakes are applied due to hydraulic valve failure. As a result, acceleration may be insufficient during takeoff [H-4.6]

2. scenarios involving the controlled process

In general, these might include those scenarios where the transferred control actions are ineffective or overridden:

  • control action not executed
    • control action applied to controlled process but controlled process does not respond
  • control action improperly executed
    • control action applied to controlled process but the controlled process responds improperly
    • control action not applied but the controlled process responds as though the control action had been applied

To create such scenarios, select a control action and identify what factors can affect the controlled process to make the control action ineffective.

  • control action: BSCU sends Brake command
  • Scenario 6: BSCU sends brake command, but the brakes are not applied because the wheel braking system was previously commanded into alternate braking mode (bypassing BSCU). As a result, insufficient deceleration may be provided upon landing [H-4.1]
  • Scenario 7: BSCU sends brake command, but the brakes are not applied due to insufficient hydraulic pressure. As a result, insufficient deceleration may be provided upon landing [H-4.1]
  • Scenario 8: BSCU sends brake command, the brakes are applied, but the aircraft does not decelerate due to a wet runway. As a result, insufficient deceleration may be provided upon landing [H-4.1]
  • Scenario 9: BSCU sends brake command, but the brakes are not applied because an adversary injected a command that put the wheel braking system into alternate braking mode. As a result, insufficient deceleration may be provided upon landing [H-4.1]

preventing common mistakes

  • identify scenarios rather than individual causal factors: considering single factors reduces to FMEA (only considers single component failures, overlooks non-trivial/non-obvious factors and interactions of factors)

STPA outputs and traceability

The control structure is not explicitly shown above since it is closely related to every STPA output.
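
The bracketed links used throughout (scenario to UCA, constraint to UCA, UCA to hazard, hazard to loss) form a traceability chain that can be followed programmatically. The sketch below assumes the links are stored in a plain dictionary and contain no cycles; it only includes links that appear in the examples in this article, so chains stop wherever a link is not stated.

```python
# Each item maps to the items it links to, as given in the bracketed references.
links = {
    "Scenario 1 for UCA-1": ["UCA-1"],
    "UCA-1": ["H-4.1"],
    "C-2": ["UCA-2"],
    "UCA-2": ["H-4.3"],
    "SC-1": ["H-1"],
    "H-1": ["L-1", "L-2", "L-3"],
}


def trace(item, links):
    """Follow links from an item towards the losses, depth first.

    Assumes the links contain no cycles.
    """
    chain = []
    for parent in links.get(item, []):
        chain.append(parent)
        chain.extend(trace(parent, links))
    return chain


print(trace("Scenario 1 for UCA-1", links))
print(trace("SC-1", links))
```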