Architecture
The Eclipse Trustable Software Framework is designed to be tool-agnostic, with one exception: the data defining a Trustable Graph is expected to be stored under version control in a repository managed by git.
The CLI tool (trudag) and data store backend (dotstop) that are provided as part of this project are the default implementation of TSF. This page documents the expected architectural design of TSF, which is intended to inform these and any other tool and data store implementations. Note that some of the features documented here may not yet be supported by trudag.
This architecture view looks only at the conceptual elements of the TSF graph (the types of data processed by TSF and the types of processes that consume that data) and how they interact. It does not address:
- Data formats, grouping and segmentation
- Processing and dataflow orchestration and configuration
- User, library and process interface definitions
- Examples of how any of the discussed elements might be used to implement an argument
Overview
In defining this architectural view, we consider which types of information matter when applying TSF. First, we identify which parts of this information are managed by the user. Then, we look at the elements that process this data and create intermediate artifacts. Finally, we discuss these intermediate artifacts.

The TSF architecture distinguishes between three distinct categories of element that correspond to these topics, as illustrated above:
- Managed data defines a Trustable graph. It is always stored as files under version control in a single git repository, but may include references to data stored or managed in other contexts, such as content in the graph itself, files in its local git repository, or files in another repository. This type of data is expected to be persistent, i.e. it should be consistent for any instantiation of the graph unless explicitly changed. Note that this expectation may not apply to referenced data managed in another context (see Output data below).
- Providers are software components for a given TSF implementation, providing functions relating to a specific type of Managed data, which obtain or process Output data. These functions may be 'built-in' behaviours of the tooling associated with the implementation, or they may be implemented as extensions (e.g. plug-in libraries), enabling users to extend or customise the tooling's behaviour to support different types and sources of data.
- Output data is generated by Providers using information from an associated type of Managed data to obtain and process data from a referenced context. This type of data is expected to be transient, i.e. it is generated for each instantiation of the graph, and its content may vary. Examples include the results of executing a test, or of querying an external data source. Also included is referenced data from an external context, such as a file managed in another git repository, which may be subject to uncontrolled changes (e.g. if the reference is to a branch in another git repository).
The combination of the Managed data for a given iteration of the graph and the Output data obtained or generated for it by a given set of Providers is called a Resolved graph.
Managed Data
This is the data that defines a TSF graph, which is created and managed by users, and stored in a git repository.
Statement
A Statement is Managed data that represents the most fundamental element of a TSF Graph, as described by the Model. A Statement must include a textual component, which is expected to consist of a single sentence.
Statements may be connected to other Managed data elements, as characterised by the following relationships:
- Other Statements that it supports, or which support it
- References that qualify it, or provide additional context
- Evidence that is used to confirm its validity
- Scores that record evaluations of its validity by a human
- Graphs that include it as a member (at least one)
- Namespaces that identify a group of related Statements
Reference
A Reference is Managed data that specifies data from this or another context, which is obtained via a Resolver and used to qualify, provide context or validate a Statement. A Reference may be characterised by attributes:
- The type attribute is always required; it determines the Resolver that is used to obtain Content data associated with the Reference. For example, the file type of Reference in trudag specifies a file managed in the same git repository as the graph.
- A class attribute may be specified to characterise the relationship between the element that owns the Reference and the Content data that it specifies.
As with the associated Content data, there are two categories of Reference:
- For Persistent References, the resolved Content data is expected to be consistently reproducible for a given iteration of the associated Reference. Examples include data obtained from a file managed in the same git repository as the Trustable graph, or a file managed in another git repository that is referenced using a specific tag or SHA-1.
- For Changeable References, the associated Content data may be expected to vary each time it is resolved. Examples include a file from the main branch of another git repository, the log of the most recent execution of a CI job, or the result of executing a query on a database of accumulated result data.
In both categories, the Reference implementation may include measures to determine whether the resolved Content data has changed since the reference was last updated. For example, a hash value for the content generated by a cryptographic algorithm might be stored as part of the Reference, and used by the Resolver to determine whether the data has changed.
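The hash-based check described above can be pictured with a minimal sketch. It assumes SHA-256 as the cryptographic algorithm and a Reference record with hypothetical field names (type, target, content_sha256); none of these are defined by TSF or taken from trudag.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Reference:
    """Hypothetical Reference record; field names are illustrative only."""
    type: str            # selects the Resolver
    target: str          # e.g. a path or URL understood by that Resolver
    content_sha256: str  # digest recorded when the Reference was last updated


def digest(content: bytes) -> str:
    """Hex digest of resolved Content data (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def has_changed(ref: Reference, resolved: bytes) -> bool:
    """True if freshly resolved content no longer matches the stored digest."""
    return digest(resolved) != ref.content_sha256
```

For a Persistent Reference, a mismatch would normally signal a problem; for a Changeable Reference, it simply records that the content has moved on since the last update.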
References may also be connected to other Managed data elements, as characterised by the following relationships:
- A Namespace may define a class of References (e.g. the set of documented Misbehaviours)
- A Reference may specify a subgraph of a specified Graph (e.g. all Statements satisfying a given set of criteria)
- Note: a subgraph in this context may consist of a single Statement.
- A Reference may provide a documented description (e.g. context for a group of Statements) for a Namespace
- A Reference may provide documented qualification or context for a Statement
- A Reference may provide an artifact that is used as Evidence
In a Resolved graph, References are also associated with the corresponding Content data provided by their related Resolver.
Evidence
Evidence is Managed data that is used to validate a Statement. It defines the artifacts that are to be evaluated and may define criteria used in their evaluation. It may be characterised by the following additional relationships:
- References define input artifacts for use in an evaluation.
- Validators provide functions that are used to perform automated evaluations.
In a Resolved graph, Evidence is also associated with the Result data that is produced by its related Validator(s).
Where Validators use discrete inputs from an external source of data (e.g. a file managed in an external git repository, or the id of the latest CI pipeline), these should be specified in the Evidence element using References. This serves to document the specific source and context of inputs provided to the Validator, and can be used to capture and store the associated Content data as part of the Resolved graph. This can also be used to verify the reproducibility of the associated Result data in the Resolved graph.
Where a Validator interacts directly with an external source of data (e.g. executing a series of queries that depend upon each other), however, the data source and the input criteria may instead be specified as attributes of the Evidence element.
Score
A Score is Managed data recording the outcome of an evaluation of a Statement by a human, based on its defined Evidence and any Statement(s) that it supports.
Where a Statement has a Score, it must also have Evidence, which identifies the set of inputs that are to be evaluated, and any Validators that may be involved in the evaluation.
A Score may have three components:
- evaluator: Who or what determined this Score?
- verdict: Based on the specified Evidence, is the related Statement true, false, or undetermined?
- confidence: A value expressing the evaluator's degree of confidence in the validity of the verdict
The evaluator component identifies the individual(s) or Validator who provide the Score. A Statement may be assigned more than one Score, so this property is used to distinguish between them.
Multiple Scores provided by humans may be recorded for a single Statement, reflecting reviews by individuals with different areas of expertise or responsibility.
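As an illustration only, the three components of a Score could be modelled as below. The class layout, the use of None for an undetermined verdict, and the restriction of confidence to the range 0 to 1 are all assumptions made for this sketch, not part of the TSF definition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Score:
    """Hypothetical Score record with the three components described above."""
    evaluator: str            # who or what determined this Score
    verdict: Optional[bool]   # True, False, or None for undetermined
    confidence: float         # assumed here to lie in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def scores_by_evaluator(scores: List[Score]) -> Dict[str, Score]:
    """Index the Scores of one Statement by evaluator.

    If an evaluator appears more than once, the last Score in the list wins.
    """
    return {score.evaluator: score for score in scores}
```

Keying on the evaluator mirrors its role in the text: it is the property that distinguishes several Scores attached to the same Statement.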
Graph
A Graph is Managed data consisting of a set of Statements and the relationships between them.
The scope of a Graph may encompass all of the Statements in a Trustable graph, or a subset of these. It is characterised by the following relationships:
- A Graph defines the set of Statements that are its members.
- A Graph may be identified by a Namespace.
- A Reference may specify a subgraph of an associated Graph.
The Graph also documents the Links between Statements, which may have associated attributes.
A Link should be annotated with a cryptographic hash, which records the state of the parent and child Statements. This enables a tooling implementation to report when either of these Statements has changed, which should prompt a re-evaluation of the Link to determine whether it is still valid.
A Link may also be annotated with a weight, which expresses the relative significance or importance of a linked child Statement with respect to other children of the same parent.
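The two annotations can be sketched as follows. This is a minimal illustration under stated assumptions: the hash is SHA-256 over the text of both Statements, and weights are compared by normalising them so that the children of one parent sum to 1. Neither choice is prescribed by TSF, and the function names are hypothetical.

```python
import hashlib
from typing import Dict


def link_hash(parent_text: str, child_text: str) -> str:
    """Digest recording the state of both Statements (SHA-256 is assumed)."""
    h = hashlib.sha256()
    for text in (parent_text, child_text):
        data = text.encode("utf-8")
        # A length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def needs_review(stored_hash: str, parent_text: str, child_text: str) -> bool:
    """True if either Statement has changed since the hash was stored."""
    return link_hash(parent_text, child_text) != stored_hash


def relative_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Express each child's weight as a fraction of its siblings' total."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {child: weight / total for child, weight in weights.items()}
```

With weights of 1.0 and 3.0 on two children, for example, relative_weights assigns them 0.25 and 0.75 respectively.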
Namespace
A Namespace is Managed data that identifies a specific Graph.
Namespaces may be connected to other Managed data elements, as characterised by the following relationships:
- A Namespace may be associated with a class of References (e.g. the set of documented Misbehaviours).
- A Namespace may specify a definition Reference, which defines an associated subgraph of a specified Graph (e.g. all Statements satisfying a given set of criteria)
- A Namespace may also specify a description Reference (e.g. context for a group of Statements).
- A group of Statements may be associated with a Namespace (e.g. because their UIDs share the same prefix).
Providers
These are discrete elements of a TSF tooling implementation that are associated with particular types of Managed and Output data.
How Providers are implemented, scheduled and configured, and how their inputs and outputs are managed by an orchestrating process or processes, is outside the scope of this document.
How outcome documents (such as a Trustable Compliance Report) are constructed and laid out by Publishers is also outside the scope of this document.
Resolver
A Resolver is a Provider that obtains and verifies Content data for a given type of Reference. This data may be obtained from an external context (e.g. a file managed in a different git repository), or from the local context of the graph (e.g. a file in the same git repository), or from an element of the graph itself (e.g. a Statement).
The Resolver may implement a mechanism to determine whether the resolved Content data has changed since the associated Reference was last updated. See Reference for a further discussion of this.
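Since the type of a Reference determines which Resolver handles it, one possible arrangement is a registry keyed by type. The sketch below is an assumption about how an implementation might do this, not a description of trudag: the registry, the decorator, and the "inline" type are hypothetical, and each Resolver is reduced to a function from a target string to bytes.

```python
from pathlib import Path
from typing import Callable, Dict

# A Resolver is modelled here as a function from a Reference target to bytes.
Resolver = Callable[[str], bytes]

RESOLVERS: Dict[str, Resolver] = {}


def register(ref_type: str) -> Callable[[Resolver], Resolver]:
    """Decorator that associates a Resolver with a Reference type."""
    def wrap(fn: Resolver) -> Resolver:
        RESOLVERS[ref_type] = fn
        return fn
    return wrap


@register("file")
def resolve_file(target: str) -> bytes:
    # Local context: a file in the same repository as the graph.
    return Path(target).read_bytes()


@register("inline")
def resolve_inline(target: str) -> bytes:
    # Hypothetical type whose target is the content itself.
    return target.encode("utf-8")


def resolve(ref_type: str, target: str) -> bytes:
    """Look up the Resolver for a Reference type and obtain Content data."""
    try:
        resolver = RESOLVERS[ref_type]
    except KeyError:
        raise LookupError(
            f"no Resolver registered for Reference type {ref_type!r}"
        ) from None
    return resolver(target)
```

A registry of this kind is also one way to support the extension mechanism mentioned earlier, since a plug-in only needs to register a function for its own Reference type.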
Validator
A Validator is a Provider that performs automated evaluation using input data and criteria specified by an Evidence element. Validators may use referenced Content data and generate Result data as part of the validation process.
Every Validator checks whether its input criteria are satisfied and its Result data is as expected, reporting any violations with suitable error codes and messages as part of its Result data.
Publisher
A Publisher is a Provider that generates documents from a given Resolved graph. The Trustable report generated by trudag is an example of such a document.
Publishers are provided with a Resolved graph as an input, which includes all of the necessary Content and Result data, so that this can be included or rendered in the output document.
If the required data referenced by Statements in the provided input graph has not been obtained by the associated Resolvers or produced by the associated Validators, then the Publisher must omit it, or replace it with an explanation.
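The omit-or-explain rule can be illustrated with a small sketch. It assumes a heavily simplified Resolved graph represented as two dictionaries keyed by a hypothetical Statement UID; the function name, the UID format, and the placeholder wording are invented for this example.

```python
from typing import Dict, List, Optional


def render_report(statements: Dict[str, str],
                  content: Dict[str, Optional[str]]) -> str:
    """Render a plain-text report from a simplified Resolved graph.

    statements maps a Statement UID to its text; content maps a UID to its
    resolved Content data, or is missing/None when nothing was obtained.
    """
    lines: List[str] = []
    for uid, text in statements.items():
        lines.append(f"{uid}: {text}")
        data = content.get(uid)
        if data is None:
            # Replace missing data with an explanation instead of failing.
            lines.append("  [content unavailable: not resolved for this graph]")
        else:
            lines.append(f"  {data}")
    return "\n".join(lines)
```

The Publisher here never calls a Resolver or Validator itself; it only reports on what the Resolved graph already contains.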
Output Data
This is data that is produced by Providers acting on Managed data. First, Content data is retrieved by Resolvers, to better contextualise parts of the Managed data. Then, Result data is gathered or generated against that context by Validators. Finally, all of this is collected together with the Managed data to create a Resolved graph.
Content data
Content data is Output data that is obtained when a Reference is processed by a Resolver. Two categories of Content data may be referenced:
- Persistent: the same result should be obtained, unless some attribute of the Reference is changed.
- Changeable: the result obtained may vary each time the Content data is obtained.
For both categories, the associated Reference type and Resolver may include a mechanism to detect when the resolved data has changed. See Reference for a further discussion of this.
Result data
Result data is Output data that is generated and used by a Validator as part of its evaluation of Evidence. This type of data is transient, but may be exported as part of a Resolved graph.
Result data should always include a verdict component, where true means that the evaluated set of artifacts and results fulfil the criteria specified by the Evidence, and false means that they do not, or that an error prevented the Validator from completing its evaluation.
Result data may optionally include other types of data, which may be used to quantify or qualify the verdict. The meaning and significance of such data is Validator-specific. For example, a Validator may perform a calculation, and return a verdict of true if it falls within defined parameters, but also output the calculated result. For some Validators, this may be used to provide a confidence component of the associated Score.
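The verdict semantics and the optional extra data can be sketched with a hypothetical threshold Validator. The ResultData class, its details field, and the function signature are assumptions made for illustration; the only behaviour taken from the text is that the verdict is true when the criteria are met and false when they are not, or when an error prevents the evaluation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ResultData:
    """Hypothetical Result data: a verdict plus Validator-specific details."""
    verdict: bool
    details: Dict[str, Any] = field(default_factory=dict)


def run_threshold_validator(measure: Callable[[], float],
                            lower: float, upper: float) -> ResultData:
    """Evaluate a measured value against an inclusive [lower, upper] range."""
    try:
        value = measure()
    except Exception as exc:
        # An error that prevents evaluation yields a false verdict.
        return ResultData(False, {"error": f"{type(exc).__name__}: {exc}"})
    # Report the calculated value alongside the verdict.
    return ResultData(lower <= value <= upper, {"value": value})
```

The value stored in details is the kind of additional output that a particular Validator might use to inform a confidence component.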
Result data is always related to the corresponding Evidence element, and is typically associated with a corresponding Score element of the related Statement.
Resolved graph
When all of the Output data for a Trustable graph has been obtained or produced by the associated Resolvers and Validators, the result is called a Resolved graph.
This represents the state of the Trustable graph for a specific iteration of the Managed data at a given moment in time and for a given evaluation context.
An implementation of TSF should include methods for storing a Resolved graph as a persistent artifact, and for instantiating a Resolved graph from such an artifact.
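A minimal sketch of such storage, assuming JSON as the artifact format and a deliberately reduced ResolvedGraph record, is shown below. The field names and the choice of JSON are illustrative assumptions; this document does not define a data format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class ResolvedGraph:
    """Hypothetical, heavily reduced Resolved graph."""
    managed_revision: str  # e.g. the git commit of the Managed data
    statements: Dict[str, str] = field(default_factory=dict)
    content: Dict[str, str] = field(default_factory=dict)
    results: Dict[str, bool] = field(default_factory=dict)


def dump_graph(graph: ResolvedGraph) -> str:
    """Serialise a Resolved graph to a JSON artifact."""
    return json.dumps(asdict(graph), sort_keys=True, indent=2)


def load_graph(text: str) -> ResolvedGraph:
    """Instantiate a Resolved graph from a JSON artifact."""
    return ResolvedGraph(**json.loads(text))
```

Recording the revision of the Managed data alongside the Output data ties the artifact to the specific iteration of the graph that it represents.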