KG-SaF-JDeX: Modular Dataset Generation Workflow

Machine Learning Loaders

This section contains dataset loaders for PyTorch, designed to handle ontology-enriched knowledge graphs. The KnowledgeGraph class provides structured access to ABox triples, TBox taxonomies, and RBox axioms, exposing them as integer-based tensors suitable for embedding models, link prediction, and other graph-based machine learning tasks.

class kgsaf_jdex.loaders.pytorch.dataset.KnowledgeGraph(path: str)

Bases: Dataset

Knowledge Graph Dataset Loader.

This module defines a PyTorch Dataset abstraction for ontology-enhanced knowledge graphs, supporting ABox, TBox, and RBox components.

The dataset structure assumes:
  • Precomputed URI-to-ID mappings for individuals, classes, and object properties

  • ABox triples split into train/validation/test sets

  • Optional class assertions (rdf:type)

  • TBox taxonomy (class subsumption)

  • RBox axioms including object property hierarchies and domain/range constraints

All symbolic components are loaded from disk and exposed as torch.Tensor objects suitable for downstream learning and reasoning tasks.

The implementation supports limited OWL constructs such as owl:unionOf in domain and range definitions while skipping unsupported complex expressions.

property class_assertions: tensor

class_to_id(class_uri: str) → int

Convert class URI to ID.

property dataset_location: str

id_to_class(class_id: int) → str

Convert class ID to URI.

id_to_individual(individual_id: int) → str

Convert individual ID to URI.

id_to_obj_prop(obj_prop_id: int) → str

Convert object property ID to URI.

individual_classes(individual_id: int) → tensor

Get classes asserted for an individual.

individual_to_id(individual_uri: str) → int

Convert individual URI to ID.

is_leaf(class_id: int) → bool

Check if class has no subclasses.

obj_prop_domain(obj_prop_id: int) → tensor

Get domain classes of an object property.

obj_prop_range(obj_prop_id: int) → tensor

Get range classes of an object property.

obj_prop_to_id(obj_prop_uri: str) → int

Convert object property URI to ID.

property obj_props_domain: tensor
property obj_props_domains_range: tensor
property obj_props_hierarchy: tensor
property obj_props_range: tensor

sub_classes(class_id: int) → tensor

Get subclasses of a class.

sub_obj_prop(obj_prop_id: int) → tensor

Get sub-properties of an object property.

sup_classes(class_id: int) → tensor

Get superclasses of a class.

sup_obj_prop(obj_prop_id: int) → tensor

Get super-properties of an object property.

property taxonomy: tensor
property test: tensor
property train: tensor
property valid: tensor
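
The hierarchy accessors above (sub_classes, sup_classes, is_leaf) can be pictured with a minimal pure-Python sketch. It assumes the taxonomy is a list of (subclass_id, superclass_id) pairs; that layout, the sample IDs, and the helper functions are illustrative assumptions, not the loader's actual tensor format or implementation.

```python
# Hypothetical taxonomy: each row is (subclass_id, superclass_id).
# The real layout of KnowledgeGraph.taxonomy may differ.
TAXONOMY = [(1, 0), (2, 0), (3, 1)]


def sub_classes(taxonomy, class_id):
    """Return the IDs of the direct subclasses of class_id."""
    return sorted(sub for sub, sup in taxonomy if sup == class_id)


def sup_classes(taxonomy, class_id):
    """Return the IDs of the direct superclasses of class_id."""
    return sorted(sup for sub, sup in taxonomy if sub == class_id)


def is_leaf(taxonomy, class_id):
    """A class is a leaf when no pair names it as a superclass."""
    return not sub_classes(taxonomy, class_id)
```

In this toy hierarchy, class 0 has two direct subclasses (1 and 2), and class 3 is a leaf because it never appears in the superclass column.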

Reasoning Utilities

Utilities for performing reasoning over OWL ontologies using the ROBOT toolkit. ReasonerUtility supports tasks such as OWL conversion, reasoning with HermiT, filtering unsatisfiable classes, and serializing RDFLib graphs to OWL, making it easy to automate ontology preprocessing and inference pipelines.

class kgsaf_jdex.utils.reason.ReasonerUtility(robot_jar: str)

Bases: object

Utility wrapper for invoking the ROBOT OWL tool and OWL reasoners.

check_result(result)

Check the result of a subprocess execution. Prints whether the command completed successfully based on the return code.

Parameters:

result (subprocess.CompletedProcess) – Result returned by subprocess.run.
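
A return-code check of this kind might look like the following sketch. The function name `report_result`, the printed messages, and the boolean return value are assumptions added for illustration; the documented method only states that it prints the outcome.

```python
import subprocess


def report_result(result: subprocess.CompletedProcess) -> bool:
    """Print whether a finished command succeeded; return True on success."""
    ok = result.returncode == 0
    if ok:
        print("Command completed successfully")
    else:
        print(f"Command failed with return code {result.returncode}")
    return ok


# CompletedProcess objects built by hand stand in for subprocess.run output.
succeeded = subprocess.CompletedProcess(args=["robot"], returncode=0)
failed = subprocess.CompletedProcess(args=["robot"], returncode=1)
```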

convert_owl(input, output)

Convert and normalize an OWL file using ROBOT merge. Writes a merged OWL ontology to the output path.

Parameters:
  • input (Path or str) – Input OWL file.

  • output (Path or str) – Output OWL file.

filter_unsatisfiable(input, output)

Detect and remove unsatisfiable classes from an ontology. Writes a filtered OWL file if unsatisfiable classes are found. Prints detected unsatisfiable class IRIs.

Parameters:
  • input (Path or str) – Input OWL ontology.

  • output (Path or str) – Output OWL ontology with unsatisfiable classes removed.

reason(axiom_generators, input, output, debug)

Run OWL reasoning and materialize inferred axioms. Writes a reasoned OWL ontology to the output path.

Parameters:
  • axiom_generators (list[str]) – ROBOT axiom generators to enable (e.g., ‘SubClassOf’, ‘EquivalentClasses’).

  • input (Path or str) – Input OWL ontology.

  • output (Path or str) – Output OWL ontology with inferred axioms.

  • debug (bool) – Enable ROBOT debug mode.
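
As a rough illustration of how these parameters could translate into a ROBOT invocation, the sketch below only builds the argument list and does not run it. The helper `build_reason_command` is hypothetical, and the exact command line assembled by ReasonerUtility is not shown in this documentation; the flags used (`reason`, `--reasoner`, `--input`, `--axiom-generators`, `--output`, and the global `-vvv`) come from ROBOT's command-line interface, as do the generator names in the usage note.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_reason_command(
    robot_jar: PathLike,
    input_path: PathLike,
    output_path: PathLike,
    axiom_generators: List[str],
    debug: bool = False,
) -> List[str]:
    """Assemble (but do not execute) a ROBOT 'reason' command."""
    cmd = ["java", "-jar", str(robot_jar)]
    if debug:
        # ROBOT's most verbose logging level, placed before the subcommand.
        cmd.append("-vvv")
    cmd += [
        "reason",
        "--reasoner", "hermit",
        "--input", str(input_path),
        # ROBOT expects the generators as one space-separated argument.
        "--axiom-generators", " ".join(axiom_generators),
        "--output", str(output_path),
    ]
    return cmd


command = build_reason_command(
    "robot.jar", Path("in.owl"), Path("out.owl"), ["SubClass", "EquivalentClass"]
)
```

The resulting list can be passed to subprocess.run; keeping it as a list avoids shell quoting issues around the space-separated generator names.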

serialize(graph, output)

Serialize an RDFLib graph to OWL format using ROBOT. Writes an OWL file to disk and removes the temporary XML file.

Parameters:
  • graph (rdflib.Graph) – RDFLib graph to serialize.

  • output (Path) – Output file path (without extension).

Raises:

RuntimeError – If ROBOT merge fails.

Modularization and Decomposition

Modules for breaking down large ontologies or knowledge graphs into smaller, semantically meaningful components. SignatureModularizer extracts a module from a given signature, and SchemaDecomposer splits a schema into RBox, taxonomy, and class-definition graphs.

class kgsaf_jdex.utils.modularization.SignatureModularizer(schema: Graph, seed: Set[URIRef])

Bases: object

Extract a module from an OWL ontology given a signature (a set of target URIs).

modularize(verbose: bool) → Graph

Modularize the graph and output a new RDFLib graph

Parameters:

verbose (bool) – Log printing.

Returns:

Modularized sub graph

Return type:

Graph

class kgsaf_jdex.utils.modularization.SchemaDecomposer(input_graph: Graph)

Bases: object

Decompose a given Ontology into TBox and RBox components.

decompose(verbose: bool) → Tuple[Graph, Graph, Graph]

Decompose a Graph into RBox, Taxonomy and Classes

Parameters:

verbose (bool) – Log printing.

Returns:

RBox graph, Taxonomy Graph and Class definitions Graph

Return type:

Tuple[Graph, Graph, Graph]
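
The three-way split can be sketched with plain string triples instead of RDFLib graphs. The routing rules below (which predicates go to the RBox, the taxonomy, or the class definitions) are assumptions made for illustration; the actual criteria used by SchemaDecomposer are not documented here, and `decompose_triples` is a hypothetical helper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# Assumed routing: property-level axioms belong to the RBox.
RBOX_PREDICATES = {RDFS + "subPropertyOf", RDFS + "domain", RDFS + "range"}


def decompose_triples(triples):
    """Split (s, p, o) string triples into (rbox, taxonomy, classes) lists."""
    rbox, taxonomy, classes = [], [], []
    for s, p, o in triples:
        if p in RBOX_PREDICATES:
            rbox.append((s, p, o))
        elif p == RDFS + "subClassOf":
            taxonomy.append((s, p, o))
        elif p == RDF_TYPE and o == OWL_CLASS:
            classes.append((s, p, o))
        # Any other triple is ignored in this simplified sketch.
    return rbox, taxonomy, classes


schema = [
    ("ex:Dog", RDF_TYPE, OWL_CLASS),
    ("ex:Dog", RDFS + "subClassOf", "ex:Animal"),
    ("ex:hasOwner", RDFS + "domain", "ex:Dog"),
]
rbox, taxonomy, classes = decompose_triples(schema)
```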

Conversion Utilities

Utilities to convert and serialize ontologies and RDF data into formats usable for downstream processing.

class kgsaf_jdex.utils.conversion.OWLConverter(path: str)

Bases: object

Converts a subset of OWL Ontology axioms to JSON Serialization

preprocess(taxonomy: bool = True, class_assertions: bool = True, obj_prop_domain_range: bool = True, obj_prop_hierarchy: bool = True, verbose: bool = True)

Preprocess a subset of the dataset schema into Python data structures.

Parameters:
  • taxonomy (bool, optional) – Load and convert taxonomy axioms. Defaults to True.

  • class_assertions (bool, optional) – Load and convert class assertions axioms. Defaults to True.

  • obj_prop_domain_range (bool, optional) – Load and convert object property domain and range. Defaults to True.

  • obj_prop_hierarchy (bool, optional) – Load and convert object property hierarchy. Defaults to True.

  • verbose (bool) – Log printing. Defaults to True.

preprocess_class_assertions(verbose: bool) → dict

Process class assertions data, the out dictionary will be formatted as:

` uri_individuals : ['uri_class_1',...,'uri_class_n'] `

Parameters:

verbose (bool) – Log printing.

Returns:

Dictionary with list of individuals and their types

Return type:

dict
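
The output shape described above can be reproduced with a small sketch that groups (individual, class) pairs. The helper `group_class_assertions` and the sample URIs are hypothetical; the converter itself reads these assertions from the dataset files.

```python
from collections import defaultdict


def group_class_assertions(pairs):
    """Group (individual_uri, class_uri) pairs into {individual: [classes]}."""
    grouped = defaultdict(list)
    for individual_uri, class_uri in pairs:
        grouped[individual_uri].append(class_uri)
    return dict(grouped)


assertions = group_class_assertions([
    ("ex:rex", "ex:Dog"),
    ("ex:rex", "ex:Pet"),
    ("ex:tom", "ex:Cat"),
])
```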

preprocess_obj_prop_domain_range(verbose: bool) → dict

Process object properties domain and range, the out dictionary will be formatted as:

```
uri_obj_prop : {
    domain : ['uri_c_1', ..., 'uri_c_n'],
    range : ['uri_c_1', ..., 'uri_c_m']
}
```

If complex classes are found (restrictions or lists), they are kept and recursively added as Python dictionaries.

Parameters:

verbose (bool) – Log printing.

Returns:

Dictionary with list of object properties and domain and range classes

Return type:

dict

preprocess_obj_prop_hierarchy(verbose: bool) → dict

Process object properties hierarchy, the out dictionary will be formatted as:

` uri_obj_prop : ['sup_uri_obj_prop_1',...,'sup_uri_obj_prop_n'] `

If complex classes are found (restrictions or lists), they are kept and recursively added as Python dictionaries.

Parameters:

verbose (bool) – Log printing.

Returns:

Dictionary with list of object properties and their hierarchy

Return type:

dict

preprocess_taxonomy(verbose: bool) → dict

Process taxonomy data, the out dictionary will be formatted as:

` uri_class : ['uri_sup_class_1',..., 'uri_sup_class_n'] `

If complex classes are found (restrictions or lists), they are kept and recursively added as Python dictionaries.

Parameters:

verbose (bool) – Log printing.

Returns:

Dictionary with list of classes and their super classes

Return type:

dict

serialize()

Serialize loaded and converted data into JSON format

class kgsaf_jdex.utils.conversion.TSVConverter(path: str)

Bases: object

Converts RDF triple files into TSV format.

convert(triples: bool = True, splits: bool = True)

Convert RDF triple files into TSV files. Prepares TSV representations for serialization.

Parameters:
  • triples (bool, optional) – Convert full ABox triples. Defaults to True.

  • splits (bool, optional) – Convert train/valid/test splits. Defaults to True.

preprocess_triples(path)

Convert an RDF triple file into a TSV string.

Parameters:

path (Path) – Path to an RDF triple file.

Returns:

TSV-formatted string of triples (s, p, o).

Return type:

str
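
The TSV string has one triple per line with tab-separated subject, predicate, and object. The sketch below starts from triples already held in memory; the helper `triples_to_tsv` is hypothetical, and RDF parsing is deliberately left out.

```python
def triples_to_tsv(triples):
    """Render (s, p, o) string triples as tab-separated lines."""
    return "\n".join("\t".join(triple) for triple in triples)


tsv = triples_to_tsv([
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
])
```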

serialize()

Write converted TSV data to disk.

class kgsaf_jdex.utils.conversion.IDMapper(path: str)

Bases: object

Maps ontology URIs to integer identifiers.

map_to_id()

Assign unique integer IDs to ontology elements. IDs are assigned deterministically after sorting URIs. Generates mappings for:

  • Classes

  • Object properties

  • Individuals
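
Sorting before numbering is what makes the assignment deterministic: the same set of URIs always yields the same IDs, regardless of input order. A minimal sketch follows, assuming IDs start at 0 and duplicates are collapsed; both are illustrative choices, not confirmed details of IDMapper, and `assign_ids` is a hypothetical helper.

```python
def assign_ids(uris):
    """Map each distinct URI to an integer ID, numbered in sorted order."""
    return {uri: idx for idx, uri in enumerate(sorted(set(uris)))}


first = assign_ids(["ex:Dog", "ex:Animal", "ex:Cat"])
second = assign_ids(["ex:Cat", "ex:Dog", "ex:Animal", "ex:Dog"])
```

Both calls produce the same mapping, so IDs stay stable across runs as long as the set of URIs does not change.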

serialize()

Write generated ID mappings to JSON files. Writes class, individual, and property mappings to disk.