Skip to content

Mathematical Roadmap for the Trustable Score

This section describes the theoretical definition of the Trustable Score and intended method of calculation. The current implementation evaluates these definitions using graph algorithms that are mathematically equivalent to the formulations presented here, though not all features (such as user-defined weights and completeness scores) are fully exposed yet. A description of the score as computed by the tooling is documented here.

Scoring is performed in two stages. First, evidence scores are prepared for each leaf node (trudag/score.py). Then, these scores are propagated through the graph using the algorithms described below (trudag/graphalyzer/analysis.py).

Currently, edge weights are included in the scoring calculations but are not user-configurable. All weights are initialised to 1 and normalised by the number of children.

Completeness scores are supported by the graph algorithm but are not yet exposed to users as a separate input. All completeness values passed to the graph algorithm default to 1. However, for Evidence items with both an SME assessment and a Validator score, the SME score acts as completeness and the Validator score acts as correctness. Their product (sme_score * validator_score) is used as the evidence score, effectively incorporating completeness into the leaf score before it reaches the graph algorithm. For Evidence with only an SME score or only a Validator score, the missing value is assumed to be 1, and the single score is treated as correctness.

Notation and naming

Many of the concepts here will be familiar if you have previously encountered graph theory. Much of the following is basic graph theory framed in the language of Trustable. This is done to permit the tools and methodology to be used with an intuitive, rather than mathematical understanding of the concepts at play.

Modelling arguments

Consider some set of Statements related to XYZ, \(S\). We call the power set of \(S\), \(\mathcal{P}(S)\), the set of Arguments. We call the set of ordered pairs \(C=\mathcal{P}(S)\times S\) the set of Conjectures. A Conjecture \((A,c)\) is composed of an Argument, \(A\in\mathcal{P}(S)\) and a Conclusion \(c\in S\). A Conjecture is called a Proof if and only if

\[ \left(\bigwedge_{s'\in A}s'\right) \Rightarrow c. \]

Naturally, the set of Proofs \(Pr \subseteq C\).

Info

This model is not intended as a rigorous mathematical model of a logical proof. Such a treatment of the Trustable Graph is, for the moment, not considered. Rather, we are trying to describe networks of requirements we already have in language that is unambiguous and as concepts that we can reason about mathematically; the model is rigorous, the arguments it describes are not.

Defining the score

A Trustable Graph is the ordered pair \((S,L)\), composed of a set of Statements \(S\) and a set of directed edges or links between Statements, \(L\).

A link between two Statements, \((s,s')\in L\), exists if and only if \(\exists (A,s) \in C, s'\in A\).

We aim to define a "Trustable Score" for each Statement in the Trustable Graph, representing our collective confidence in the truth of that Statement. To do so, we must first define some simpler concepts.

SME Assessment Scores

Mathematically, an SME Assessment score is some function \(c: S \rightarrow [0,1]\), which maps Statements to a level of confidence in the truth of that Statement as a probability.

Completeness of a proof

We previously introduced the idea of an Argument: a set of Statements whose conjunction is claimed to imply the Conclusion. In reality there will be a level of uncertainty that a given Conjecture in the graph is in fact a Proof.

The aggregated SME assessment of this probability is called the completeness score of the Statement, \(c_p: S \rightarrow [0,1]\),

\[ c_p(s) = P\left(\exists (A,s) \in Pr, \left\{s':(s,s')\in L\right\} = A\right). \]

This probability is a measure of both aleatory and epistemic uncertainty.

For Statements without an Argument (or equivalently Evidence) \(c_p(s)\) is defined differently:

  • If the Statement has a Validator, \(c_p\) is the SME confidence assessment in the Validator as an accurate measure of the truth of the Statement.
  • If the Statement has no Validator, \(c_p\) is the SME confidence in the Statement on the basis of the linked Artifacts.

Weighting of the Argument

Different groups of Stakeholders may wish to emphasise particular aspects of an Argument. Each group of Stakeholders who are interested in the output of the project assigns a weight to each link in the graph. In doing so, they define a link-weighting function \(\mathcal{W}:L\rightarrow[0,1]\) which accounts for their organizational priorities and risk management strategy.

This function must satisfy the additional constraint,

\[ \sum_{s'\in A(s)} \mathcal{W}((s,s')) \leq 1. \]

For convenience, we define the weighted-adjacency function \(W:S\times S \rightarrow [0,1]\),

\[ W(s,s') = \begin{cases} 0, \; (s,s') \not\in A(S) \\ \mathcal{W}((s,s')), \; \text{otherwise} \end{cases}. \]

Info

While well-defined, weights do not have a formal interpretation.

Correctness of an Argument

The degree to which the conjunction of an Argument for a Statement is True (regardless of whether that Argument and Statement form a Proof or Conjecture) is measured by the correctness function, \(c_r: S\rightarrow [0,1]\). This is a poorly defined extension of boolean logic where a score of \(0\) means the Argument is False, \(1\) means True and values in between indicate a degree of uncertainty (though expressly not in any rigorous sense).

For Requests, we define the correctness function \(c_r: S \rightarrow [0,1]\) recursively from its Argument,

\[ c_r(s) = \left(\sum_{s'\in A(s)}W(s,s')c_p(s')c_r(s')\right). \]

For Evidence, this value is determined by the Validator, or set to \(1\) if the Statement has no Validator (i.e. it is purely SME assessed).

Info

Again correctness does not have a formal interpretation, but is well-defined.

The Trustable Score

Finally, we'd like to consider our confidence that a given Statement is True, based on its Argument. A reasonable approximation for this is the product of the completeness and correctness of its Argument:

\[ T(s) = c_p(s)c_r(s). \]

We call this the Trustable Score for a Statement. If completeness and correctness scores are defined for all Evidence, the definition of \(T(s)\) above is sufficient to recursively define the scores of all items in the graph.

The Trustable Score is therefore an approximate measure of an organisation's collective confidence that a given Statement is True, solely on the basis of Evidence.

It is important to be clear that the Trustable Score is only an approximate measure, in the sense that:

  • For a given set of weights, equivalent Trustable Scores for two Statements does not mean equal likelihood those Statements are True.
  • For a given set of weights, an increase or decrease in Trustable Score does not necessarily imply an increase or decrease in the likelihood a Statement is True.

However it is still useful, in the sense that:

  • A Trustable Score of \(1\) does correspond to absolute organizational confidence in a Statement.
  • Long-run trends in Trustable Scores should correlate strongly with changes in organizational confidence.

To improve the reliability of the Trustable Score as a measure of organizational confidence, the following avenues of research should be considered:

  • Statistical analysis of the relationship between Trustable Scores and the perceived success or failure of projects.
  • More rigorous application of fuzzy logic and probability theory in the definition of the Trustable Score to help define its formal meaning.

Contributions and critique

Contributions to and criticism of the Trustable Score are welcomed. In the first instance, the preferred method of communication for this is via GitLab issues.

Calculating the score

As suggested above, we must begin by determining the correctness of Evidence. Recall that correctness for Evidence is equal to one or the value returned by a Validator, if one exists.

Next, we define the completeness score for all Statements, using the mean of all recorded SME assessments for a given Statement.

For a graph of \(n\) nodes, label the nodes with indices \(1\leq i\leq n\), such that the set of statements \(S\) is equivalent to \(\{s_i:i= 1,...,n\}\) and we can define the diagonal completeness matrix \(\mathbf{C}_p\), whose entries are given by,

\[ {c_p}_{ij} = \begin{cases} c_p(s_i), \; i=j\\ 0,\;\text{otherwise} \end{cases}. \]

Similarly, the entries of the weighted adjacency matrix \(\mathbf{W}\) are defined as,

\[ w_{ij} = W(s_i, s_j). \]

Then, given the vector of correctness scores for Evidence items, \({\mathbf{c}_r}_E\) whose entries are given by

\[ {c_r}_i = \begin{cases} c_r(s_i), \; A(s) = \emptyset\\ 0, \; \text{otherwise} \end{cases}, \]

we claim that the correctness score for all nodes \(\mathbf{c}_r\), \({c_r}_i = C_r(s_i)\), is given by the following sum

\[ \mathbf{c}_r =\sum_{i=0}^n \left(\mathbf{W}\mathbf{C}_p\right)^i {\mathbf{c}_r}_E, \]

and by extension the trustable score \(\mathbf{t}\), \({t}_i = T(s_i)\),

\[ \mathbf{t} =\mathbf{C}_p\sum_{i=0}^n \left(\mathbf{W}\mathbf{C}_p\right)^i {\mathbf{c}_r}_E, \]

Because Trustable graphs are acyclic, the series above can be evaluated by a finite dynamic program over the graph. In practice this corresponds to computing scores by traversing the graph in topological order rather than by explicit matrix powers, while remaining mathematically equivalent to the expression above.

Sensitivity Analysis

The derivatives of the Trustable score can be used to study the effect of varying weights and scores on the trustable scores of Statements in the Graph.

In implementation, these derivatives are evaluated by propagating influence along the DAG using recurrences equivalent to the matrix expressions above.

Edge-weight sensitivity

The partial derivative of trustable scores with respect to edge weights is given by

\[ \frac{\partial\mathbf{t}}{\partial w_{rs}} = \mathbf{C}_p\sum_{i=0}^n\sum_{k=0}^{i-1}\left(\mathbf{W}\mathbf{C}_p\right)^k \mathbf{E}_{rs}\mathbf{C}_p\left(\mathbf{W}\mathbf{C}_p\right)^{i-1-k}{\mathbf{c}_r}_E, \]

where \(\mathbf{E}_{rs}\) is the matrix with entries

\[ (e_{rs})_{ij} = \begin{cases} 1,\; i=r, j=s\\ 0,\; \text{otherwise} \end{cases}. \]

Node-score sensitivity

The partial derivative of trustable scores with respect to other scores is more complex. Beginning with the definition of the trustable score,

\[ \mathbf{t} =\mathbf{C}_p\sum_{i=0}^n \left(\mathbf{W}\mathbf{C}_p\right)^i {\mathbf{c}_r}_E, \]

expanding gives us

\[ \mathbf{t} =\mathbf{C}_p{\mathbf{c}_r}_E + \sum_{i=0}^n \mathbf{C}_p\left(\mathbf{W}\mathbf{C}_p\right)^{i-1}\mathbf{W}\mathbf{C}_p {\mathbf{c}_r}_E. \]

It is then clear that we can write the trustable score of all nodes in terms of the trustable score of the leaf nodes,

\[ \mathbf{t} =\sum_{i=0}^n \left(\mathbf{C}_p\mathbf{W}\right)^{i}\mathbf{t}_E. \]

We interpret this as meaning the trustable score is the product of powers of the modified adjacency matrix \(\mathbf{C}_p\mathbf{W}\) and the vector of leaf scores. These powers represent the number of walks from one node in the graph to another. Determining the partial derivative is therefore as simple as choosing the correct entry of the modified adjacency matrix,

\[ \frac{\partial t_r}{\partial t_s} = \left(\mathbf{d}_r^\intercal\sum_{i=0}^n\left(\mathbf{C}_p\mathbf{W}\right)^k\right)_s \]

where \((d_r)_s = \delta_{rs}\).