JDEX Configuration Guide

JDEX is fully configurable through a JSON configuration file. This allows you to control every stage of the pipeline, from reasoning services to dataset splitting and post-processing. This tutorial walks you through the structure of the configuration file and provides practical examples.

Minimal Configuration

At a minimum, you must define the dataset name and the file paths. JDEX will automatically populate all other parameters with sensible defaults. The minimum required inputs to JDEX are:

  • paths.schema: The path to your custom ontology schema file

  • paths.data: The path to your custom knowledge graph assertions file (containing the assertion triples, i.e., ObjectPropertyAssertions and ClassAssertions)

{
  "dataset_name": "example_dataset",
  "paths": {
    "schema": "data/schema.owl",
    "data": "data/triples.ttl",
    "output": "output/"
  }
}

Tip

You do not need to set up every parameter! Configure only what you need; all other parameters will automatically fall back to their defaults.

Configuration Structure Overview

The configuration is logically divided into five primary sections:

  • General Settings: Global execution behavior.

  • Paths: Input and output file management.

  • Reasoning: Logic-based inference and validation settings.

  • Split: Parameters for Machine Learning dataset partitioning.

  • Post-processing: Export formats and mapping utilities.

General Settings

These parameters control the execution environment and the engine's global behavior.

{
  "dataset_name": "my_dataset",
  "verbose": 1,
  "interactive_shell": true
}

  • dataset_name: A unique identifier for the project.

  • verbose: Logging verbosity level (0 for quiet, 1 for info, 2 for debug).

  • interactive_shell: When true, enables interactive prompts during execution.
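The three verbose levels map naturally onto Python's standard logging levels. The sketch below shows one way to wire them up; the mapping itself is an illustrative assumption, not part of JDEX:

```python
import logging

# Hypothetical mapping from JDEX-style verbose levels to logging levels
VERBOSITY = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}

def configure_logging(verbose: int) -> None:
    """Configure the root logger from a verbose setting (defaults to info)."""
    logging.basicConfig(level=VERBOSITY.get(verbose, logging.INFO))

configure_logging(1)
logging.info("pipeline started")       # shown at verbose >= 1
logging.debug("raw triples loaded")    # shown only at verbose == 2
```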

Paths Configuration

Defines where JDEX finds your knowledge base and where it saves processed artifacts.

{
  "paths": {
    "schema": "data/schema.owl",
    "data": "data/data.ttl",
    "output": "output/"
  }
}
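Since schema and data are the two required inputs, it can be worth verifying that they actually exist before launching the pipeline. A minimal sketch (the key names mirror the JSON above; the check itself is not part of JDEX):

```python
from pathlib import Path

def check_paths(config: dict) -> list[str]:
    """Return a list of problems found in the 'paths' section."""
    paths = config.get("paths", {})
    problems = []
    for key in ("schema", "data"):
        if key not in paths:
            problems.append(f"missing required path: {key}")
        elif not Path(paths[key]).is_file():
            problems.append(f"file not found: {paths[key]}")
    return problems

config = {"paths": {"schema": "data/schema.owl", "data": "data/data.ttl"}}
print(check_paths(config))
```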

Reasoning Configuration

The reasoning block manages JVM memory and specific inference tasks. JDEX supports multiple reasoners depending on the task.

{
  "reasoning": {
    "java_max_ram": 8,
    "java_8_home": "/path/to/java8",
    "java_11_home": "/path/to/java11",
    "materialization": {
      "enabled": true,
      "reasoner": "hermit"
    },
    "realization": {
      "enabled": true,
      "reasoner": "konclude"
    },
    "satisfiability": {
      "filter_unsatisfiable": false,
      "reasoner": "hermit"
    },
    "modularization": {
      "enabled": true
    },
    "decomposition": {
      "tbox": true,
      "rbox": true
    },
    "consistency": {
      "convert_ntriples": false
    }
  }
}

  • java_max_ram: Maximum RAM (in GB) allocated to the Java reasoning process.

  • java_8_home: Path to the Java 8 installation directory.

  • java_11_home: Path to the Java 11 installation directory.

  • materialization.enabled: Enables or disables materialization (precomputing inferred knowledge).

  • materialization.reasoner: Reasoner used for materialization (e.g., “hermit”).

  • realization.enabled: Enables or disables realization (computing class memberships of individuals).

  • realization.reasoner: Reasoner used for realization (e.g., “konclude”).

  • satisfiability.filter_unsatisfiable: If true, filters out unsatisfiable classes.

  • satisfiability.reasoner: Reasoner used for satisfiability checking (e.g., “hermit”).

  • modularization.enabled: Enables or disables ontology modularization.

  • decomposition.tbox: Enables decomposition of the TBox (terminological axioms).

  • decomposition.rbox: Enables decomposition of the RBox (role/property axioms).

  • consistency.convert_ntriples: If true, converts the data to N-Triples format for consistency checking.
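A common way settings like these end up in a reasoner invocation is as JVM flags. The sketch below shows how java_max_ram and java_11_home could translate to a command line; this translation is an assumption for illustration, not JDEX's documented behavior:

```python
def build_java_command(reasoning: dict, jar: str = "reasoner.jar") -> list[str]:
    """Assemble a hypothetical Java invocation from the reasoning settings."""
    java_home = reasoning.get("java_11_home", "")
    java_bin = f"{java_home}/bin/java" if java_home else "java"
    max_ram = reasoning.get("java_max_ram", 4)  # in GB, as in the config above
    return [java_bin, f"-Xmx{max_ram}g", "-jar", jar]

cmd = build_java_command({"java_max_ram": 8, "java_11_home": "/path/to/java11"})
print(" ".join(cmd))  # /path/to/java11/bin/java -Xmx8g -jar reasoner.jar
```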

Dataset Splitting

Configure how triples are partitioned for Machine Learning tasks.

{
  "split": {
    "enabled": true,
    "train_percent": 80,
    "validation_percent": 10,
    "test_percent": 10,
    "transductive": true,
    "test_leakage_filtering": {
      "enabled": true,
      "minimum_frequency": 0.97
    }
  }
}

Constraints: Percentages must sum to exactly 100.
Leakage Filtering: A critical feature that prevents “data contamination” by ensuring test entities aren’t overly represented in the training set based on frequency thresholds.
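The sum-to-100 constraint is easy to verify up front. A minimal sketch (the function and its error message are invented for illustration):

```python
def validate_split(split: dict) -> None:
    """Raise ValueError if the three split percentages do not sum to 100."""
    total = (split.get("train_percent", 0)
             + split.get("validation_percent", 0)
             + split.get("test_percent", 0))
    if total != 100:
        raise ValueError(f"split percentages must sum to 100, got {total}")

validate_split({"train_percent": 80, "validation_percent": 10, "test_percent": 10})  # ok
```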

  • enabled: Enables or disables dataset splitting.

  • train_percent: Percentage of data assigned to the training set.

  • validation_percent: Percentage of data assigned to the validation set.

  • test_percent: Percentage of data assigned to the test set.

  • transductive: If true, uses a transductive split (entities may appear across splits).

  • test_leakage_filtering.enabled: Enables or disables leakage filtering for the test set.

  • test_leakage_filtering.minimum_frequency: Threshold frequency for filtering entities to prevent leakage (e.g., 0.97).
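As a rough illustration of an 80/10/10 partition, the sketch below shuffles the triples and cuts them at the configured percentages. JDEX's actual strategy (including transductive handling and leakage filtering) is more involved than this:

```python
import random

def split_triples(triples, train_pct=80, valid_pct=10, seed=42):
    """Shuffle triples and partition them into train/validation/test lists."""
    triples = list(triples)
    random.Random(seed).shuffle(triples)  # fixed seed for reproducibility
    n = len(triples)
    n_train = n * train_pct // 100
    n_valid = n * valid_pct // 100
    return (triples[:n_train],
            triples[n_train:n_train + n_valid],
            triples[n_train + n_valid:])

triples = [(f"e{i}", "p", f"e{i + 1}") for i in range(100)]
train, valid, test = split_triples(triples)
print(len(train), len(valid), len(test))  # 80 10 10
```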

Post-processing

Control the final output formats for downstream consumption.

{
  "post_processing": {
    "json_conversion": true,
    "id_mapping": true,
    "tsv_conversion": true
  }
}

  • json_conversion: Converts processed data into JSON format.

  • id_mapping: Generates mappings between original identifiers and internal IDs.

  • tsv_conversion: Converts processed data into TSV (tab-separated values) format.

Tip

JSON: Best for web apps or document databases.
ID Mapping: Creates a dictionary mapping URI strings to integer IDs.
TSV: Ideal for knowledge graph embedding frameworks like Pykeen or GraphVite.
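To make these outputs concrete, here is a hedged sketch of what ID mapping and TSV export typically look like for embedding frameworks; the exact file layout JDEX produces may differ:

```python
def build_id_mapping(triples):
    """Assign consecutive integer IDs to entities and relations."""
    entities, relations = {}, {}
    for h, r, t in triples:
        for e in (h, t):
            entities.setdefault(e, len(entities))
        relations.setdefault(r, len(relations))
    return entities, relations

def to_tsv(triples):
    """Render triples as tab-separated lines, one triple per line."""
    return "\n".join(f"{h}\t{r}\t{t}" for h, r, t in triples)

triples = [("ex:Alice", "ex:knows", "ex:Bob")]
ents, rels = build_id_mapping(triples)
print(ents)            # {'ex:Alice': 0, 'ex:Bob': 1}
print(to_tsv(triples))
```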

Full Example

A complete configuration for a standard reasoning and splitting pipeline.

{
  "dataset_name": "example",
  "verbose": 1,
  "interactive_shell": true,
  "paths": {
    "schema": "data/schema.owl",
    "data": "data/triples.ttl",
    "output": "output/"
  },
  "reasoning": {
    "java_max_ram": 4,
    "materialization": {
      "enabled": true,
      "reasoner": "hermit"
    },
    "realization": {
      "enabled": true,
      "reasoner": "konclude"
    }
  },
  "split": {
    "train_percent": 80,
    "validation_percent": 10,
    "test_percent": 10
  },
  "post_processing": {
    "json_conversion": true,
    "tsv_conversion": true
  }
}

Loading the Configuration in Python

You can easily interface with JDEX programmatically using the JDEXConfig class.

from jdex.config import JDEXConfig
import json

# Load from file
with open("config.json", "r") as f:
    data = json.load(f)

# Instantiate the config object
config = JDEXConfig.from_dict(data)

# Verify parameters
print(config.pretty_print())

Additional Notes

  • Validation: Providing invalid values (e.g., wrong reasoner names or split percentages that don’t total 100) will raise errors during initialization.

  • Modularity: Each component (reasoning, splitting, etc.) can be toggled independently if you only need specific parts of the pipeline.