Setting Up a Trustable Data Store

Long-term maintenance entails trend analysis, which depends on data about the project's state over time. To perform trend analysis on your TSF project, the graph's state must therefore be stored persistently.

Git is designed as a store for your source code, hence it is suitable for storing the structural component of the graph state. Storing the artifacts of this structure, such as scores, would quickly become unmaintainable in git. Furthermore, a historic snapshot of your output state is not always practical to reproduce from the project's source code and graph structure alone. An external data store provides a maintainable solution for recording the graph's output state.

The following documentation shows how the reference tool, trudag, integrates with external storage. The interface between trudag and external platforms is user-defined, allowing integration with systems like SQL databases or OpenSearch-like data stores. As a starting point for a simple proof of concept, trudag can dump the scores to a file, which can then be transformed and pushed to an external platform using a custom script (as shown below).

trudag score --validate --dump my_data.json
bash my_upload_script.sh my_data.json
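
For illustration, such a custom script could be as simple as the following Python sketch, which reads the dumped file and POSTs it to a hypothetical HTTP endpoint. The endpoint URL is a placeholder; substitute whatever interface your platform actually provides.

# upload_scores.py - illustrative sketch only; the endpoint URL is a placeholder.
import json
import sys
import urllib.request

def upload(path: str, endpoint: str = "https://data-store.example.com/api/scores") -> None:
    """Read a trudag score dump and POST it to an external data store."""
    with open(path, "r", encoding="utf-8") as handle:
        payload = json.load(handle)
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        print(f"Upload finished with HTTP status {response.status}")

if __name__ == "__main__":
    upload(sys.argv[1])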

The rest of this document explains the more robust, long-term data model solutions that trudag provides.

Data Models

For long-term maintainability, it's good practice to define a data model to ensure consistency, simplify data store design, and reduce the risk of corruption. Accordingly, trudag relies on a specific internal data model, so care must be taken to ensure it aligns with the external data store to avoid conflicts.

To prevent clashes, the data_model.py file provides a schema that ensures all required fields are present and in a usable format. The schema ignores extra keys, which is intended to support extended data models: the data model you provide must be a superset of the one expected by trudag.

Transformations let you define backwards compatibility between revisions of your data model. These transformations should be included in your implementation of the data store connector, so that the output of data_store_pull (see below) satisfies the superset criteria mentioned above. To aid with backward and forward compatibility, trudag provides a Schema version field to track changes to the data model.
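
As an illustration, suppose an earlier revision of your data model stored the commit timestamp under a different key. A transformation applied inside your connector could upgrade old records before handing them to trudag. The old field name and the version numbers below are invented for the example.

def upgrade_record(record: dict) -> dict:
    """Upgrade a record from a hypothetical schema version "0" to version "1"."""
    info = record.get("info", {})
    if info.get("Schema version") == "0":
        # Hypothetical rename: the old model stored the timestamp as "Commit timestamp".
        if "Commit timestamp" in info:
            info["Commit date/time"] = info.pop("Commit timestamp")
        info["Schema version"] = "1"
    return record

Your data_store_pull implementation would then apply upgrade_record to every record it retrieves, so the data seen by trudag always satisfies the superset criteria above.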

Data Spec

Below is a schema defining the shape of the core data model expected by trudag. The data is expected in JSON format.

RootLevel

Field   Type             Notes
scores  list[ScoreDict]  See ScoreDict below.
info    Info             See Info below.

ScoreDict

Field  Type   Notes
id     str    The name of the statement, e.g. SUN-BRIGHT.
score  float  The trustable score.

Info

Field             Type  Notes
Repository root   str   The top-level directory of the git repository.
Commit SHA        str   The current git commit SHA.
Commit tag        str   The latest git tag, if one exists.
Commit date/time  int   The UNIX timestamp in seconds at which the commit occurred.
CI job id         str   The id of the current CI job, or "run_locally" if run outside of CI.
Schema version    str   The version of this schema used.
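
For illustration, an info object conforming to this spec might look like the following (all values are placeholders):

{
  "Repository root": "/builds/my-org/my-project",
  "Commit SHA": "1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b",
  "Commit tag": "v1.2.0",
  "Commit date/time": 1747754766,
  "CI job id": "1234567",
  "Schema version": "1"
}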

Configuring a Data Store Connector

A data store connector defines an interface for trudag to connect with your data store without an external script. The connector is implemented by you as a dotstop extension in the file .dotstop_extensions/data_store.py. The extension must include at least two Python functions - data_store_push and data_store_pull - which trudag uses to push and pull data, respectively, in the following format:

Trudag Data Schema

[
  {
    "scores": [
      {"id": "SUN-BRIGHT", "score": 1.0, ...}
      ...
    ],
    "info": {
      "Commit date/time": 1747754766,
      "Schema version": "1",
      ...
    }
  }
  ...
]

Please note that the date format is also important here: Commit date/time must be a UNIX timestamp in seconds, as specified above. The data format is identical to the one created when dumping to a JSON file with trudag. For more information on data models, please see the section above. A simple data_store.py file might look something like this:

def data_store_pull() -> list[dict]:
    data = get_my_data()
    return data

def data_store_push(data: list[dict]) -> None:
    push_my_data(data)

def get_my_data() -> list[dict]:
    # Insert data store interfacing logic here
    raise NotImplementedError

def push_my_data(data: list[dict]) -> None:
    # Insert data store interfacing logic here
    raise NotImplementedError
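
For a minimal end-to-end sketch, the connector below uses a local JSON file as the "data store". The file path is a placeholder, and a production connector would interface with a proper database or OpenSearch-like service instead.

# .dotstop_extensions/data_store.py - minimal file-backed sketch for experimentation.
import json
from pathlib import Path

_STORE = Path("trustable_scores.json")  # placeholder location

def data_store_pull() -> list[dict]:
    # Return every record previously pushed to the store.
    if not _STORE.exists():
        return []
    return json.loads(_STORE.read_text(encoding="utf-8"))

def data_store_push(data: list[dict]) -> None:
    # Append the new records and write the store back out.
    records = data_store_pull()
    records.extend(data)
    _STORE.write_text(json.dumps(records, indent=2), encoding="utf-8")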

Using a Data Store Connector

To push data to the data store, run any command that supports --dump with the argument data_store, e.g.

trudag score --validate --dump data_store

or

trudag publish --validate --dump data_store

Collecting data with each change to your project's mainline branch ensures accountability for that change's impact on the graph state.