JDEX Configuration Guide
JDEX is fully configurable through a JSON configuration file. This allows you to control every stage of the pipeline, from reasoning services to dataset splitting and post-processing. This tutorial walks you through the structure of the configuration file and provides practical examples.
Minimal Configuration
At a minimum, you must define the dataset name and the file paths. JDEX will automatically populate all other parameters with sensible defaults. The minimum required inputs to JDEX are:
paths.schema: The path to your custom ontology schema file.
paths.data: The path to your custom knowledge graph assertions file (containing the ABox assertions, i.e. ObjectPropertyAssertions and ClassAssertions).
```json
{
  "dataset_name": "example_dataset",
  "paths": {
    "schema": "data/schema.owl",
    "data": "data/triples.ttl",
    "output": "output/"
  }
}
```
Tip
You do not need to set every parameter: set only the ones you care about, and all others will fall back to their default values.
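The default-filling behavior can be pictured with a small sketch. The `merge_defaults` helper and the default values below are purely illustrative, not part of JDEX or its actual defaults:

```python
# Illustrative sketch of how a partial config might be overlaid on
# defaults. The default values here are placeholders, not JDEX's real ones.
def merge_defaults(defaults: dict, user: dict) -> dict:
    """Recursively overlay user-supplied keys on top of defaults."""
    merged = dict(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"verbose": 1, "split": {"enabled": True, "train_percent": 80}}
user = {"dataset_name": "example_dataset", "split": {"train_percent": 70}}
config = merge_defaults(defaults, user)
# config keeps verbose=1 from defaults but overrides split.train_percent to 70
```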
Configuration Structure Overview
The configuration is logically divided into five primary sections:
General Settings: Global execution behavior.
Paths: Input and output file management.
Reasoning: Logic-based inference and validation settings.
Split: Parameters for Machine Learning dataset partitioning.
Post-processing: Export formats and mapping utilities.
General Settings
These parameters control the environment and the Description Logic (DL) complexity handled by the engine.
```json
{
  "dataset_name": "my_dataset",
  "verbose": 1,
  "interactive_shell": true
}
```
| Parameter | Description |
|---|---|
| dataset_name | A unique identifier for the project. |
| verbose | Logging verbosity level (0 for quiet, 1 for info, 2 for debug). |
| interactive_shell | When true, enables interactive prompts during execution. |
Paths Configuration
Defines where JDEX finds your knowledge base and where it saves processed artifacts.
```json
{
  "paths": {
    "schema": "data/schema.owl",
    "data": "data/data.ttl",
    "output": "output/"
  }
}
```
Reasoning Configuration
The reasoning block manages JVM memory and specific inference tasks. JDEX supports multiple reasoners depending on the task.
```json
{
  "reasoning": {
    "java_max_ram": 8,
    "java_8_home": "/path/to/java8",
    "java_11_home": "/path/to/java11",
    "materialization": {
      "enabled": true,
      "reasoner": "hermit"
    },
    "realization": {
      "enabled": true,
      "reasoner": "konclude"
    },
    "satisfiability": {
      "filter_unsatisfiable": false,
      "reasoner": "hermit"
    },
    "modularization": {
      "enabled": true
    },
    "decomposition": {
      "tbox": true,
      "rbox": true
    },
    "consistency": {
      "convert_ntriples": false
    }
  }
}
```
| Parameter | Description |
|---|---|
| java_max_ram | Maximum RAM (in GB) allocated to the Java reasoning process. |
| java_8_home | Path to the Java 8 installation directory. |
| java_11_home | Path to the Java 11 installation directory. |
| materialization.enabled | Enables or disables materialization (precomputing inferred knowledge). |
| materialization.reasoner | Reasoner used for materialization (e.g., "hermit"). |
| realization.enabled | Enables or disables realization (computing class memberships of individuals). |
| realization.reasoner | Reasoner used for realization (e.g., "konclude"). |
| satisfiability.filter_unsatisfiable | If true, filters out unsatisfiable classes. |
| satisfiability.reasoner | Reasoner used for satisfiability checking (e.g., "hermit"). |
| modularization.enabled | Enables or disables ontology modularization. |
| decomposition.tbox | Enables decomposition of the TBox (terminological axioms). |
| decomposition.rbox | Enables decomposition of the RBox (role/property axioms). |
| consistency.convert_ntriples | Converts data to N-Triples format for consistency checking if enabled. |
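As a mental model, java_max_ram typically ends up as a JVM heap-size flag on the reasoner process. The sketch below shows one plausible translation using the conventional -Xmx flag; the exact command line JDEX builds is an assumption, not documented behavior:

```python
# Sketch: translate the reasoning block into JVM arguments.
# Mapping java_max_ram to -Xmx is a common JVM convention; it is an
# assumption here, not JDEX's documented command line.
def jvm_args(reasoning_cfg: dict) -> list:
    args = []
    max_ram = reasoning_cfg.get("java_max_ram")
    if max_ram:
        args.append(f"-Xmx{max_ram}g")  # heap limit in gigabytes
    return args

print(jvm_args({"java_max_ram": 8}))  # ['-Xmx8g']
```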
Dataset Splitting
Configure how triples are partitioned for Machine Learning tasks.
```json
{
  "split": {
    "enabled": true,
    "train_percent": 80,
    "validation_percent": 10,
    "test_percent": 10,
    "transductive": true,
    "test_leakage_filtering": {
      "enabled": true,
      "minimum_frequency": 0.97
    }
  }
}
```
Constraints: The three percentages must sum to exactly 100.
Leakage Filtering: A critical feature that prevents "data contamination" by ensuring test entities are not overly represented in the training set, based on the configured frequency threshold.
| Parameter | Description |
|---|---|
| enabled | Enables or disables dataset splitting. |
| train_percent | Percentage of data assigned to the training set. |
| validation_percent | Percentage of data assigned to the validation set. |
| test_percent | Percentage of data assigned to the test set. |
| transductive | If true, uses a transductive split (entities may appear across splits). |
| test_leakage_filtering.enabled | Enables or disables leakage filtering for the test set. |
| test_leakage_filtering.minimum_frequency | Threshold frequency for filtering entities to prevent leakage (e.g., 0.97). |
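Because the three percentages must total exactly 100, it can be worth sanity-checking a config before launching the pipeline. `validate_split` below is a hypothetical helper, not part of JDEX (which performs its own validation at initialization):

```python
# Hypothetical pre-flight check, mirroring the sum-to-100 constraint
# that JDEX enforces; not a JDEX API.
def validate_split(split_cfg: dict) -> None:
    """Raise ValueError if train/validation/test percentages do not sum to 100."""
    total = (split_cfg.get("train_percent", 0)
             + split_cfg.get("validation_percent", 0)
             + split_cfg.get("test_percent", 0))
    if total != 100:
        raise ValueError(f"Split percentages must sum to 100, got {total}")

validate_split({"train_percent": 80, "validation_percent": 10, "test_percent": 10})  # passes silently
```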
Post-processing
Control the final output formats for downstream consumption.
```json
{
  "post_processing": {
    "json_conversion": true,
    "id_mapping": true,
    "tsv_conversion": true
  }
}
```
| Parameter | Description |
|---|---|
| json_conversion | Converts processed data into JSON format. |
| id_mapping | Generates mappings between original identifiers and internal IDs. |
| tsv_conversion | Converts processed data into TSV (tab-separated values) format. |
Tip
JSON: Best for web apps or document databases.
ID Mapping: Creates a dictionary mapping URI strings to integer IDs.
TSV: Ideal for knowledge graph embedding frameworks like PyKEEN or GraphVite.
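The effect of id_mapping and tsv_conversion can be approximated in a few lines of Python. The file layout below (tab-separated head/relation/tail IDs) is an assumption modeled on common embedding-framework formats, not JDEX's exact output:

```python
# Sketch: map URI strings to integer IDs and emit ID triples as TSV lines.
# The "head\trelation\ttail" layout is an assumption, mirroring what
# embedding frameworks commonly consume; JDEX's actual files may differ.
triples = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "ex:knows", "ex:Carol"),
]

entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
relations = sorted({r for _, r, _ in triples})
entity2id = {e: i for i, e in enumerate(entities)}    # URI -> integer ID
relation2id = {r: i for i, r in enumerate(relations)}

tsv_lines = [f"{entity2id[h]}\t{relation2id[r]}\t{entity2id[t]}"
             for h, r, t in triples]
print("\n".join(tsv_lines))
```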
Full Example
A complete configuration for a standard reasoning and splitting pipeline.
```json
{
  "dataset_name": "example",
  "verbose": 1,
  "interactive_shell": true,
  "paths": {
    "schema": "data/schema.owl",
    "data": "data/triples.ttl",
    "output": "output/"
  },
  "reasoning": {
    "java_max_ram": 4,
    "materialization": {
      "enabled": true,
      "reasoner": "hermit"
    },
    "realization": {
      "enabled": true,
      "reasoner": "konclude"
    }
  },
  "split": {
    "train_percent": 80,
    "validation_percent": 10,
    "test_percent": 10
  },
  "post_processing": {
    "json_conversion": true,
    "tsv_conversion": true
  }
}
```
Loading the Configuration in Python
You can interface with JDEX programmatically through the JDEXConfig class.
```python
import json

from jdex.config import JDEXConfig

# Load from file
with open("config.json", "r") as f:
    data = json.load(f)

# Instantiate the config object
config = JDEXConfig.from_dict(data)

# Verify parameters
print(config.pretty_print())
```
Additional Notes
Validation: Providing invalid values (e.g., wrong reasoner names or split percentages that don’t total 100) will raise errors during initialization.
Modularity: Each component (reasoning, splitting, etc.) can be toggled independently if you only need specific parts of the pipeline.