Review of QUARE - Ontology-based Quality Assessment for Git Repositories

Published on February 24, 2023

QUARE is an open-source, ontology-based software to evaluate the quality of git repositories associated with scientific publications. It represents all metrics in an RDF knowledge graph and uses SHACL validation to measure repository quality.

This blog series features open-source, ontology-based software solutions that empower scientific research. The purpose is to help understand the technical landscape, highlight valuable use cases and examine the limitations that may impact adoption.

Related: Review of Project DEBBIE - Biomaterials Ontology and Database

This article features QUARE (the Latin word for “Why”), a tool developed by Prof. Dr. Andreas Henrich's team at the University of Bamberg in Germany. QUARE aims to evaluate the quality of git repositories, particularly those associated with scientific publications. Notably, it represents all metrics in the form of an RDF knowledge graph and uses ontology-based validation to measure repository quality.

For each assessment, relevant repository information, such as the project description, topics and license type, is fetched using the GitHub API and converted into a knowledge graph. Validation, based on either SHACL or OWL, is then triggered, and any validation errors are translated into human-readable explanations.
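As a rough illustration of the fetch-and-convert step, the sketch below maps a dictionary shaped like a GitHub REST API repository response onto N-Triples lines. It is not QUARE's actual code: the function names `literal` and `repo_to_ntriples` are hypothetical, the property IRIs are borrowed from the example graph in this article, and the sample values are hard-coded rather than fetched over the network.

```python
import json

# Namespace borrowed from this article's example graph (an assumption, not QUARE's code).
PROPS = "https://example.org/repo/props/"


def literal(value):
    """Serialize a Python value as an N-Triples literal (simple cases only)."""
    if isinstance(value, bool):
        return '"%s"^^<http://www.w3.org/2001/XMLSchema#boolean>' % str(value).lower()
    # json.dumps produces a double-quoted string with quotes and newlines escaped.
    return json.dumps(str(value), ensure_ascii=False)


def repo_to_ntriples(repo):
    """Map a GitHub-API-like repository dict to a list of N-Triples lines."""
    subject = "<%s>" % repo["html_url"]
    triples = []

    def add(prop, value):
        triples.append("%s <%s%s> %s ." % (subject, PROPS, prop, literal(value)))

    if repo.get("description"):
        add("has_description", repo["description"])
    for topic in repo.get("topics", []):
        add("has_topic", topic)
    if repo.get("license"):
        add("has_license", repo["license"]["name"])
    add("is_private", repo.get("private", False))
    return triples


# Hard-coded sample using the field names of a GitHub repository response.
sample = {
    "html_url": "https://github.com/uniba-mi/quare",
    "description": (
        "QuaRe is a tool that allows users to test if GitHub repositories "
        "of interest comply with certain quality criteria that they should "
        "fulfill according to the type of project in the repository."
    ),
    "topics": ["git-management", "python", "svelte"],
    "license": {"name": "GNU General Public License v3.0"},
    "private": False,
}

ntriples = repo_to_ntriples(sample)
print("\n".join(ntriples))
```

Fields that are missing from the response simply produce no triple, which is what later allows a minimum-count constraint to flag them.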

QUARE Validation Process

QUARE supports both OWL-based and SHACL-based assessments, but the authors preferred the SHACL approach for a few reasons:

  • Separation of concerns. The data shape is separated from the ontology model.
  • More comprehensive validation results. OWL-based validation terminates after the first contradiction, while SHACL emits all validation errors.
  • Faster processing time, though this may depend on the specific OWL reasoner and SHACL validator used.

Using the QUARE repository itself as an example, the following is the generated RDF data graph for assessment.

@prefix ex: <https://example.org/repo/props/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://github.com/uniba-mi/quare> a
    <https://example.org/repo/project-types/FinishedResearchProject> ;
    ex:has_branch <https://github.com/uniba-mi/quare/tree/dev>,
        <https://github.com/uniba-mi/quare/tree/main> ;
    ex:has_description "QuaRe is a tool that allows users to test if GitHub repositories of interest comply with certain quality criteria that they should fulfill according to the type of project in the repository." ;
    ex:has_issue <https://github.com/uniba-mi/quare/issues/3> ;
    ex:has_license "GNU General Public License v3.0" ;
    ex:has_readme <https://github.com/uniba-mi/quare/blob/main/README.md> ;
    ex:has_topic "git-management",
        "python",
        "svelte" ;
    ex:is_private false .

<https://github.com/uniba-mi/quare/blob/main/README.md> ex:has_section "Benchmarks",
        "Developer Information",
        "Installation",
        "License",
        "QuaRe: Validate your GitHub Repositories against Quality Criteria",
        "Running the Backend",
        "Running the Frontend",
        "Summary",
        "The Specification Page",
        "The Validation Page",
        "Usage" .

<https://github.com/uniba-mi/quare/issues/3> ex:has_state "open" .

QUARE defines a set of criteria that depend on the type of repository (e.g. finished research project or internal documentation). Some of the key criteria for finished research projects include:

Criterion | Explanation
Topics | There shall be topic(s) assigned to the repository.
Description | There shall be a description of the repository.
Open-Source License | There shall be an open-source license such as GPL.
Software Release | The repository shall provide at least one software release.

Below is the SHACL shape definition for finished research projects. The complete SHACL data shape file can be found in the QUARE repository.

@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix types: <https://example.org/repo/project-types/> .
@prefix props: <https://example.org/repo/props/> .
@prefix entities: <https://example.org/repo/entities/> .

types:FinishedResearchProject
  a rdfs:Class, sh:NodeShape ;
  sh:property [
    sh:path props:has_topic ;
    sh:minCount 1 ;
  ], [
    sh:path props:has_description ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ], [
    sh:path ( props:has_branch props:has_name ) ;
    sh:qualifiedValueShape [
      sh:pattern "main" ;
    ] ;
    sh:qualifiedMinCount 1 ;
    sh:qualifiedMaxCount 1 ;
  ], [
    sh:path ( props:has_issue props:has_state ) ;
    sh:pattern "open" ;
    sh:maxCount 0 ;
  ], [
    sh:path props:has_release ;
    sh:minCount 1 ;
  ], [
    sh:path props:has_license ;
    sh:pattern "GNU General Public License v3.0" ;
  ] .
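To make the cardinality constraints concrete, here is a minimal sketch that checks only the simple sh:minCount and sh:maxCount property shapes against the values listed in the example data graph, and reports every violation with a readable message. It is a stand-in for illustration, not a SHACL engine and not QUARE's implementation: the names `CONSTRAINTS` and `check_counts` are made up, and the path-based and pattern-based shapes are deliberately left out. In practice a SHACL validator would do this work.

```python
# Property values of the QUARE repository, as listed in the example data graph.
data = {
    "has_topic": ["git-management", "python", "svelte"],
    "has_description": [
        "QuaRe is a tool that allows users to test if GitHub repositories "
        "of interest comply with certain quality criteria that they should "
        "fulfill according to the type of project in the repository."
    ],
    "has_license": ["GNU General Public License v3.0"],
    # No has_release values appear in the example graph.
}

# (property, minCount, maxCount or None, human-readable explanation)
CONSTRAINTS = [
    ("has_topic", 1, None, "There shall be topic(s) assigned to the repository."),
    ("has_description", 1, 1, "There shall be a description of the repository."),
    ("has_release", 1, None, "The repository shall provide at least one software release."),
]


def check_counts(data, constraints):
    """Return an explanation for every violated count constraint (no early exit)."""
    violations = []
    for prop, min_count, max_count, explanation in constraints:
        n = len(data.get(prop, []))
        too_few = n < min_count
        too_many = max_count is not None and n > max_count
        if too_few or too_many:
            violations.append("%s: %s (found %d value(s))" % (prop, explanation, n))
    return violations


violations = check_counts(data, CONSTRAINTS)
for message in violations:
    print(message)
```

Because the loop never stops at the first failure, all violated criteria are reported together, which mirrors the "more comprehensive validation results" argument for SHACL made earlier.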

The declarative nature of SHACL data shapes makes it easy to tailor the quality criteria as needed. For example, by default, the only acceptable license is “GNU General Public License v3.0”. We can easily update the property shape to accept other open-source licenses, as follows:

[
  sh:path props:has_license ;
  sh:or (
    [ sh:pattern "GNU General Public License v3.0" ; ]
    [ sh:pattern "Apache License 2.0" ; ]
    [ sh:pattern "MIT License" ; ]
    [ sh:pattern "BSD-2-Clause license" ; ]
    [ sh:pattern "BSD-3-Clause license" ; ]
  ) ;
]
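One subtlety is worth keeping in mind when editing these patterns: sh:pattern takes a regular expression that may match anywhere in the value, so the dots in "v3.0" act as wildcards and the match is not anchored. The sketch below imitates that behavior with Python's `re.search`, whose dialect is close to, but not identical to, the one SHACL relies on. Both helper functions are hypothetical and are not part of QUARE or of any SHACL library.

```python
import re

# The license strings from the sh:or example, used here as regex patterns.
ALLOWED = [
    "GNU General Public License v3.0",
    "Apache License 2.0",
    "MIT License",
    "BSD-2-Clause license",
    "BSD-3-Clause license",
]


def license_conforms(value, patterns=ALLOWED):
    """Imitate sh:or over sh:pattern: conform if any pattern matches somewhere."""
    return any(re.search(p, value) for p in patterns)


def license_conforms_strict(value, patterns=ALLOWED):
    """Stricter variant: each entry is a literal string that must equal the value."""
    return any(re.fullmatch(re.escape(p), value) for p in patterns)


print(license_conforms("MIT License"))                              # True
print(license_conforms("GNU General Public License v3x0"))          # True, "." is a wildcard
print(license_conforms_strict("GNU General Public License v3x0"))   # False
print(license_conforms("Creative Commons Zero v1.0 Universal"))     # False
```

If exact matching is wanted in the shape itself, one option is to anchor the expression and escape the dots, remembering that a backslash has to be doubled inside a Turtle string literal.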

QUARE comes with a simple web app built in Svelte. The assessment process is straightforward: simply specify the repository name, project type and validation model.

QUARE User Interface

To evaluate QUARE on real repositories associated with research projects, I selected the February 2023 issue of the Digital Discovery journal (published by the Royal Society of Chemistry). This issue contains 15 research articles, all freely accessible, 14 of which provide links to public code repositories on GitHub. QUARE was able to assess 11 of them. The results, shown below, suggest that there is a need to improve the standardization of source code sharing for scientific publications.

GitHub Repo | Topics | Description | Open Source License | Software Release
https://github.com/MolecularMaterials/nfp | N | Y | N | N
https://github.com/EmilSkaaning/DeepStruc | N | Y | Y | N
https://github.com/CumbyLab/gridrdf | N | Y | Y | Y
https://github.com/Bayer-Group/CPMolGAN | Y | N | Y | N
https://github.com/oxpig/CoPriNet | N | Y | Y | N
https://github.com/laura-rieger/battery-life-prediction | N | N | Y | N
https://github.com/BlauGroup/HiPRGen | N | N | N | Y
https://github.com/zavalab/ML/tree/SolvGNN | Error
https://github.com/popelier-group/FEREBUS-v7 | N | N | Y | N
https://github.com/KanHatakeyama/qcl | N | N | Y | N
https://github.com/2AUK/pyRISM | Y | Y | Y | Y
https://github.com/Robert-Forrest/EvolutionaryDesignOfBMGs | Error
https://github.com/GormleyLab/LightboxResources_Final/ | Error
https://github.com/tillbiskup/infofile | N | N | Y | Y
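To put numbers on the table above, the following short script tallies how many of the 11 successfully assessed repositories meet each criterion. The Y/N flags are transcribed from the table; the variable names are only illustrative, and the three repositories that returned an error are left out.

```python
# Flags per repository, in column order:
# Topics, Description, Open Source License, Software Release.
results = {
    "MolecularMaterials/nfp": "NYNN",
    "EmilSkaaning/DeepStruc": "NYYN",
    "CumbyLab/gridrdf": "NYYY",
    "Bayer-Group/CPMolGAN": "YNYN",
    "oxpig/CoPriNet": "NYYN",
    "laura-rieger/battery-life-prediction": "NNYN",
    "BlauGroup/HiPRGen": "NNNY",
    "popelier-group/FEREBUS-v7": "NNYN",
    "KanHatakeyama/qcl": "NNYN",
    "2AUK/pyRISM": "YYYY",
    "tillbiskup/infofile": "NNYY",
}

CRITERIA = ["Topics", "Description", "Open Source License", "Software Release"]

# Count the "Y" flags column by column.
passed = {
    name: sum(flags[i] == "Y" for flags in results.values())
    for i, name in enumerate(CRITERIA)
}
all_passed = [repo for repo, flags in results.items() if flags == "YYYY"]

for name, count in passed.items():
    print("%s: %d of %d" % (name, count, len(results)))
print("All four criteria met:", all_passed)
```

By this count, 9 of the 11 repositories carry an open-source license, but only 2 have topics assigned, 5 have a description, 4 provide a release, and a single repository meets all four criteria.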

To summarize, QUARE demonstrates an interesting use case of ontology-based quality assessment for git repositories. The flexibility of SHACL data shapes makes it easy to tailor the assessment to project types and specific needs. Currently, the assessment uses only repository metadata from the GitHub API. As Justin Dowdy has demonstrated, the contents of a Git repository itself, such as its commit history, can also be represented as an RDF knowledge graph. This presents an opportunity for more thorough quality assessment in the future.
