Data usage tracking networks

Long-term projects draw on data from many sources. Common databases and core files are often reused many times, while other data can lie dormant for long periods. We built automated data-tracing systems that summarise how data is used, which supports reuse across purposes and helps prevent errors. The software scans entire data servers or project code to find and quantify all data sources, then produces live interactive reports without manual curation.
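As a rough illustration of the scanning step, the sketch below walks a directory of project folders, looks for data file paths quoted in the code, and records which projects mention each one. It is not the software described here: the function names `scan_projects` and `summarise`, the list of code suffixes, and the data file extensions are all assumptions chosen for the example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed for illustration: which files count as code, and which
# extensions count as data. A real scanner would need a broader list.
CODE_SUFFIXES = {".py", ".R", ".sh"}
DATA_PATTERN = re.compile(r"""["']([^"']+\.(?:csv|tsv|fastq|bam|gtf|fa))["']""")


def scan_projects(root):
    """Map each quoted data path to the set of projects whose code mentions it.

    Each immediate subdirectory of `root` is treated as one project.
    """
    usage = defaultdict(set)
    project_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for project_dir in project_dirs:
        for code_file in project_dir.rglob("*"):
            if code_file.is_file() and code_file.suffix in CODE_SUFFIXES:
                text = code_file.read_text(errors="ignore")
                for data_path in DATA_PATTERN.findall(text):
                    usage[data_path].add(project_dir.name)
    return dict(usage)


def summarise(usage):
    """Turn the usage map into report rows, one per data file."""
    rows = []
    for data_path, projects in sorted(usage.items()):
        path = Path(data_path)
        rows.append({
            "path": data_path,
            "projects": sorted(projects),
            "n_projects": len(projects),
            # Size is only known if the path resolves from the current directory.
            "size_bytes": path.stat().st_size if path.is_file() else None,
        })
    return rows
```

Matching quoted paths with a regular expression is a deliberately simple heuristic. It misses paths that are assembled at run time, so a scan of this kind gives a lower bound on usage.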

Usage of all data, showing dates, sizes, purpose of usage, and projects. Double-clicking a file shows all files with the same datatype purpose (e.g. all files required for RNAseq analysis).

The interaction between data uses, within and between projects. ■ Teal-coloured edges indicate single-purpose data. ■ Dark-coloured edges indicate multi-purpose data.

Simplified data usage. Single-purpose, project-specific files connect to only one project (e.g. RNAseq analysis data for a single project). Reused files appear in the centre and connect to multiple projects (e.g. RNAseq reference files reused in several projects).
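The distinction between project-specific and reused files can be sketched as a small edge-building step over a file-to-projects map. This is only an assumed illustration: `build_edges`, the `"single"`/`"multi"` labels, and the example paths are hypothetical, and it classifies files purely by how many projects they connect to.

```python
def build_edges(usage):
    """Return (data_file, project, kind) edges for a usage network.

    `usage` maps each data file to the set of projects that use it.
    kind is "single" when the file connects to one project only,
    and "multi" when it is reused across several projects.
    """
    edges = []
    for data_file, projects in sorted(usage.items()):
        kind = "multi" if len(projects) > 1 else "single"
        for project in sorted(projects):
            edges.append((data_file, project, kind))
    return edges


# Hypothetical example data: one project-specific file, one shared reference.
usage = {
    "projA/counts.csv": {"projA"},
    "ref/genome.fa": {"projA", "projB"},
}
edges = build_edges(usage)
```

A graph library could then colour edges by `kind`, so that shared files with several edges end up near the centre of a force-directed layout.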