PyKEEN Extension

Constants

This module defines shared constant values used throughout the PyKEEN negative sampler extension. These constants ensure consistency and avoid hardcoding fixed values across the implementation. Typical contents include default parameter values and file naming conventions.

Dataset

Custom dataset loader designed to extend PyKEEN’s dataset handling. Enables support for additional metadata, filtering logic, and preprocessing tailored to advanced negative sampling strategies.

class extension.dataset.OnMemoryDataset(data_path: str | Path = None, load_entity_classes: bool = True, load_domain_range: bool = True, **kwargs)[source]

Bases: Dataset

Dataset located on memory. It requires data that has already been split, in RDF triple format. The folder should contain the following files:

### Folder Structure

train.txt : Training triples in “h r t” format using RDF names
test.txt : Testing triples in “h r t” format using RDF names
valid.txt : Validation triples in “h r t” format using RDF names
entity_to_id.json: JSON file for entity name to ID mapping
relation_to_id.json: JSON file for relation name to ID mapping
entities_classes.json : Additional metadata on class membership for each entity; it needs to have the format:

```json
{
  "<ENTITY_NAME>": [
    "<CLASS_NAME_1>",
    …,
    "<CLASS_NAME_N>"
  ]
}
```

relation_domain_range.json : Additional metadata on domain and range classes for each relation; it needs to have the format:

```json
{
  "<RELATION_NAME>": {
    "domain": "<CLASS_NAME_DOMAIN>" OR "None",
    "range": "<CLASS_NAME_RANGE>" OR "None"
  }
}
```
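The folder layout above can be reproduced with a few lines of standard-library Python. The following is a minimal sketch, not part of the extension: the helper name `write_example_dataset`, the `ex:` entity, relation, and class names, and the single-space separator between h, r, and t are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_example_dataset(folder: Path) -> None:
    """Write a one-triple toy dataset using the documented file names."""
    # Assumption: h, r, and t are separated by a single space.
    triple_line = "ex:alice ex:worksAt ex:acme\n"
    for split in ("train.txt", "valid.txt", "test.txt"):
        (folder / split).write_text(triple_line, encoding="utf-8")

    # Hypothetical mappings and metadata, shaped like the formats above.
    metadata = {
        "entity_to_id.json": {"ex:alice": 0, "ex:acme": 1},
        "relation_to_id.json": {"ex:worksAt": 0},
        "entities_classes.json": {
            "ex:alice": ["ex:Person"],
            "ex:acme": ["ex:Company"],
        },
        "relation_domain_range.json": {
            "ex:worksAt": {"domain": "ex:Person", "range": "ex:Company"},
        },
    }
    for file_name, content in metadata.items():
        text = json.dumps(content, indent=2)
        (folder / file_name).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_example_dataset(Path(tmp))
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

A folder written this way could then be passed as `data_path` to `OnMemoryDataset`. Check the loader for the delimiter it actually expects before relying on the separator used here.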

Filtering

Implements extended filtering mechanisms for training and evaluation. Includes logic for excluding invalid or null-indexed negatives, such as with the NullPythonSetFilterer.

class extension.filtering.NullPythonSetFilterer(mapped_triples)[source]

Bases: PythonSetFilterer

Extension of the Python set-based filterer that also checks for manually inserted invalid negative entities, which are marked with negative indices.

contains(batch)[source]

Check whether a triple is contained.

Supports batching.

Parameters: batch – shape: (batch_size, 3). The batch of triples.
Returns: shape: (batch_size,). Whether the triples are contained in the training triples.
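The behaviour described above can be sketched with plain Python collections. This is an illustrative stand-in, not the extension's implementation: the class name `NullAwareSetFilter` is hypothetical, and lists of tuples are used where the real filterer works on tensors.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[int, int, int]


class NullAwareSetFilter:
    """Hypothetical plain-Python analogue of a set-based filterer."""

    def __init__(self, mapped_triples: Iterable[Triple]) -> None:
        # Store known positive triples for O(1) membership checks.
        self.triples: Set[Triple] = {tuple(t) for t in mapped_triples}

    def contains(self, batch: Iterable[Triple]) -> List[bool]:
        """Flag triples that are known positives or hold a negative entity ID."""
        flags = []
        for h, r, t in batch:
            is_known_positive = (h, r, t) in self.triples
            # Assumption: a negative head or tail index marks an invalid,
            # manually inserted placeholder entity.
            has_null_entity = h < 0 or t < 0
            flags.append(is_known_positive or has_null_entity)
        return flags
```

Treating placeholders as "contained" means they are removed by the same filtering step that removes false negatives, so no separate pass is needed.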

Sampling

Defines custom negative sampling strategies with support for static and dynamic approaches. Built to integrate seamlessly with PyKEEN’s pipeline while enabling schema-aware, type-based, and model-driven sampling logic.

class extension.sampling.ClassesNegativeSampler(*, entity_classes_dict, **kwargs)[source]

Bases: SubSetNegativeSampler

Type-constrained negative sampler derived from “Krompaß, D., Baier, S., Tresp, V.: Type-constrained representation learning in knowledge graphs. In: The Semantic Web-ISWC 2015”. Produces the subset of available negatives using only entities that appear as domain (for corrupting heads) and range (for corrupting tails) of a triple's relation. Uses the class of the entity targeted for corruption to define the set of negative entities, so it can be used when the domain and range of relations are not available.
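One way to read the class-based strategy is that the candidates for replacing an entity are the other entities sharing at least one class with it. The sketch below works under that assumption. The function name `class_based_pool` and the use of integer class IDs are illustrative choices, not the extension's API.

```python
from typing import Dict, List, Set


def class_based_pool(entity: int, entity_classes: Dict[int, List[int]]) -> Set[int]:
    """Return the entities that share at least one class with `entity`."""
    target_classes = set(entity_classes.get(entity, []))
    return {
        other
        for other, classes in entity_classes.items()
        # Exclude the entity itself and keep only those with a common class.
        if other != entity and target_classes.intersection(classes)
    }


# Hypothetical class membership: entity ID -> list of class IDs.
example_classes = {0: [10], 1: [10, 11], 2: [12], 3: [11]}
print(class_based_pool(1, example_classes))
```

An entity whose classes are shared with no other entity gets an empty pool under this reading.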

average_pool_size(check_triples)[source]

Compute the average pool size for every h,r combination and r,t combination

Parameters: check_triples (MappedTriples) – Triples used for computing the pool size
Returns: Average pool size, and dictionary with the number of triples with fewer than X negatives (X from 2 to 100)
Return type: Tuple[int, dict]
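The returned pair can be pictured with a small sketch. It assumes that one pool size has already been computed per (h, r) or (r, t) combination, that the average is truncated to an integer to match the documented return type, and that the thresholds are every integer from 2 to 100. The function name `pool_size_summary` is hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple


def pool_size_summary(
    pool_sizes: Sequence[int],
    thresholds: Iterable[int] = range(2, 101),
) -> Tuple[int, Dict[int, int]]:
    """Summarise pool sizes as (average, {X: count of pools smaller than X})."""
    if not pool_sizes:
        raise ValueError("pool_sizes must not be empty")
    # Integer average, assumed from the documented Tuple[int, dict] return type.
    average = sum(pool_sizes) // len(pool_sizes)
    below = {x: sum(1 for size in pool_sizes if size < x) for x in thresholds}
    return average, below


average, below = pool_size_summary([1, 5, 150])
print(average, below[2], below[100])
```

Counts near the low thresholds show how many combinations have too few candidates for the requested number of negatives per positive.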

compute_poolsize_aggregate(check_triples)[source]

Compute the average pool size for every h,r combination and r,t combination, strategy specific implementation

Parameters:

head_relation (torch.tensor) – Head, Relation tensor
tail_relation (torch.tensor) – Tail, Relation tensor

Returns:

Average pool size, and dictionary with the number of triples with fewer than X negatives (X from 2 to 100)

Return type:

Tuple[int, dict]

generate_subset(mapped_triples, **kwargs)[source]

Generate the supporting subset used to corrupt the triples.

Parameters: mapped_triples (MappedTriples) – Base triples to generate the subset

strategy_negative_pool(h, r, t, target)[source]

Compute the negative pool for a triple and the target for corruption

Parameters:

h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption

Returns:

Tensor with computed negative entities IDs

Return type:

torch.tensor

class extension.sampling.CorruptNegativeSampler(*args, **kwargs)[source]

Bases: SubSetNegativeSampler

Negative sampler from “Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning With Neural Tensor Networks for Knowledge Base Completion.” Corrupts heads and tails using the subset of entities seen as head or tail of the specific relation.
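A minimal sketch of this relation-specific strategy follows. It assumes that heads are replaced with entities seen as heads of the same relation, and tails with entities seen as tails. The function names are hypothetical and plain Python sets stand in for tensors.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

Triple = Tuple[int, int, int]
Pools = DefaultDict[int, Set[int]]


def build_relation_pools(triples: Iterable[Triple]) -> Tuple[Pools, Pools]:
    """Collect, per relation, the entities observed as head and as tail."""
    heads: Pools = defaultdict(set)
    tails: Pools = defaultdict(set)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
    return heads, tails


def relation_pool(h: int, r: int, t: int, target: str, heads: Pools, tails: Pools) -> Set[int]:
    """Candidate replacements for the head or tail of (h, r, t)."""
    if target == "head":
        return heads[r] - {h}  # other heads seen with relation r
    if target == "tail":
        return tails[r] - {t}  # other tails seen with relation r
    raise ValueError(f"target must be 'head' or 'tail', got {target!r}")
```

Because the pools depend only on the relation, they can be built once from the training triples and reused for every batch.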

generate_subset(mapped_triples)[source]

Generate the supporting subset used to corrupt the triples.

Parameters: mapped_triples (MappedTriples) – Base triples to generate the subset

strategy_negative_pool(h, r, t, target)[source]

Compute the negative pool for a triple and the target for corruption

Parameters:

h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption

Returns:

Tensor with computed negative entities IDs

Return type:

torch.tensor

class extension.sampling.NearMissNegativeSampler(*, sampling_model: ERModel = None, prediction_function: Callable[[ERModel, Tensor, tensor], tensor] = None, num_query_results: int = None, **kwargs)[source]

Bases: SubSetNegativeSampler

Auxiliary Model based Negative Sampler from “Kotnis, B., & Nastase, V. (2017). Analysis of the impact of negative sampling on link prediction in knowledge graphs. arXiv preprint arXiv:1708.06816.” Uses a pretrained model on the same dataset to produce harder negatives. Given the predicted entity embedding for each triple, a Nearest Neighbour algorithm is used to produce negatives that could be predicted as positive but in reality are negatives.

choose_from_pools(triple, internal_id) → tensor[source]

Sample negatives from the negative pool

Parameters:

triple (torch.tensor) – Triple for corruption
target (str) – Target of corruption
target_size (int) – Number of negatives to produce

Returns:

Chosen negatives from the negative pool

Return type:

torch.tensor

corrupt_batch(positive_batch: Tensor) → Tensor[source]

Subset batch corruptor. Uniform corruption between head and tail. Corrupts each triple using the generated subset.

Parameters: positive_batch (MappedTriples) – Batch of positive triples
Returns: Batch of negative triples of size (positive_size * num_negs_per_pos, 3)
Return type: MappedTriples

generate_subset(mapped_triples: Tensor, **kwargs) → Dict[source]

Generate the auxiliary subset to aid in triple corruption. Specifically, it creates the BallTree structure with the filtering triples (in NumPy format).

Parameters: mapped_triples (MappedTriples) – Triples used for filtering
Returns: Dictionary with auxiliary data
Return type: Dict

strategy_negative_pool(h, r, t, internal_id)[source]

Compute the negative pool for a triple and the target for corruption

Parameters:

h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption

Returns:

Tensor with computed negative entities IDs

Return type:

torch.tensor

class extension.sampling.NearestNeighbourNegativeSampler(*args, sampling_model: ERModel = None, num_query_results: int = None, **kwargs)[source]

Bases: SubSetNegativeSampler

Nearest Neighbour Negative Sampler from “Kotnis, B., Nastase, V.: Analysis of the impact of negative sampling on link prediction in knowledge graphs”. Uses the entity embedding from a pretrained KGE input model to compute the entity K-Nearest neighbours to be used as negatives.
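The idea of using an entity's nearest neighbours in embedding space as its negatives can be sketched without any dependencies. The example below uses a brute-force search with Euclidean distance in place of a tree index. The function name `k_nearest_entities`, the distance metric, and the toy embeddings are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def k_nearest_entities(
    entity_id: int,
    embeddings: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return the IDs of the k entities closest to `entity_id`."""
    query = embeddings[entity_id]
    distances = [
        (math.dist(query, vector), other)
        for other, vector in enumerate(embeddings)
        if other != entity_id  # an entity is never its own negative
    ]
    distances.sort()  # closest first; ties broken by entity ID
    return [other for _, other in distances[:k]]


# Hypothetical 2-dimensional embeddings indexed by entity ID.
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
print(k_nearest_entities(0, toy_embeddings, k=2))
```

A brute-force scan costs one distance computation per entity per query, which is why a spatial index is the usual choice once the entity set grows.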

generate_subset(mapped_triples, **kwargs)[source]

Generate the supporting subset used to corrupt the triples.

Parameters: mapped_triples (MappedTriples) – Base triples to generate the subset

query_kdtree(entity_id)[source]

strategy_negative_pool(h, r, t, target)[source]

Compute the negative pool for a triple and the target for corruption

Parameters:

h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption

Returns:

Tensor with computed negative entities IDs

Return type:

torch.tensor

class extension.sampling.RelationalNegativeSampler(*args, local_file=None, **kwargs)[source]

Bases: SubSetNegativeSampler

Relation-constrained negative sampler from “Kotnis, B., Nastase, V.: Analysis of the impact of negative sampling on link prediction in knowledge graphs”. It follows the assumption that each head-tail pair is connected by only one relation: with the head (tail) fixed, we take all the tail (head) entities that appear in a triple with a relation different from the original one.
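The rule above translates almost directly into set comprehensions. The sketch below scans the triple list on every call for clarity. The function name `relational_pool` is hypothetical, and a real sampler would more likely precompute an index.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[int, int, int]


def relational_pool(h: int, r: int, t: int, target: str, triples: Iterable[Triple]) -> Set[int]:
    """Entities linked to the fixed side of (h, r, t) by a relation other than r."""
    if target == "tail":
        # Head is fixed: collect tails reached from h through other relations.
        return {t2 for h2, r2, t2 in triples if h2 == h and r2 != r}
    if target == "head":
        # Tail is fixed: collect heads that reach t through other relations.
        return {h2 for h2, r2, t2 in triples if t2 == t and r2 != r}
    raise ValueError(f"target must be 'head' or 'tail', got {target!r}")


example_triples = [(0, 0, 1), (0, 1, 2), (3, 1, 1), (0, 1, 4)]
print(relational_pool(0, 0, 1, "tail", example_triples))
```

Under the one-relation-per-pair assumption, none of the returned entities can form a true triple with the fixed side and the original relation.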

generate_subset(mapped_triples, **kwargs)[source]

Generate the supporting subset used to corrupt the triples.

Parameters: mapped_triples (MappedTriples) – Base triples to generate the subset

get_subset(entity, rel, target)[source]

strategy_negative_pool(h, r, t, target)[source]

Compute the negative pool for a triple and the target for corruption

Parameters:

h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption

Returns:

Tensor with computed negative entities IDs

Return type:

torch.tensor

class extension.sampling.SubSetNegativeSampler(*, mapped_triples: Tensor, num_entities: int = None, num_relations: int = None, num_negs_per_pos: int = None, filtered: int = False, filterer: str = None, filterer_kwargs: dict = None, integrate: bool = False, **kwargs)[source]

Bases: NegativeSampler, ABC

Abstract class handling static negative sampling. Subclasses must implement a method that calculates the correct subset pool of negatives for each entity in the triple set.

average_pool_size(check_triples: Tensor) → Tuple[int, dict][source]

Compute the average pool size for every h,r combination and r,t combination

Parameters: check_triples (MappedTriples) – Triples used for computing the pool size
Returns: Average pool size, and dictionary with the number of triples with fewer than X negatives (X from 2 to 100)
Return type: Tuple[int, dict]

choose_from_pools(triple: tensor, target: str, target_size: int) → tensor[source]

Sample negatives from the negative pool

Parameters:

triple (torch.tensor) – Triple for corruption
target (str) – Target of corruption
target_size (int) – Number of negatives to produce

Returns:

Chosen negatives from the negative pool

Return type:

torch.tensor

compute_poolsize_aggregate(head_relation: tensor, tail_relation: tensor) → Tuple[int, dict][source]

Compute the average pool size for every h,r combination and r,t combination, strategy specific implementation

Parameters:

head_relation (torch.tensor) – Head, Relation tensor
tail_relation (torch.tensor) – Tail, Relation tensor

Returns:

Average pool size, and dictionary with the number of triples with fewer than X negatives (X from 2 to 100)

Return type:

Tuple[int, dict]

corrupt_batch(positive_batch: Tensor) → Tensor[source]

Subset batch corruptor. Uniform corruption between head and tail. Corrupts each triple using the generated subset.

Parameters: positive_batch (MappedTriples) – Batch of positive triples
Returns: Batch of negative triples of size (positive_size * num_negs_per_pos, 3)
Return type: MappedTriples
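The corruption loop can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the extension's code. The name `corrupt_batch_sketch` is hypothetical, `pool_fn` plays the role of `strategy_negative_pool`, and the use of -1 as a placeholder when a pool is empty is an assumption.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

Triple = Tuple[int, int, int]
PoolFn = Callable[[int, int, int, str], Set[int]]


def corrupt_batch_sketch(
    positive_batch: Sequence[Triple],
    pool_fn: PoolFn,
    num_negs_per_pos: int,
    rng: random.Random,
) -> List[Triple]:
    """Return len(positive_batch) * num_negs_per_pos corrupted triples."""
    negatives: List[Triple] = []
    for h, r, t in positive_batch:
        for _ in range(num_negs_per_pos):
            # Choose uniformly which side of the triple to corrupt.
            target = rng.choice(("head", "tail"))
            pool = sorted(pool_fn(h, r, t, target))
            # Assumption: -1 marks "no valid negative available".
            replacement = rng.choice(pool) if pool else -1
            if target == "head":
                negatives.append((replacement, r, t))
            else:
                negatives.append((h, r, replacement))
    return negatives
```

Negatives are emitted in the order of the positives, so the block of `num_negs_per_pos` rows starting at index `i * num_negs_per_pos` belongs to the i-th positive triple.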

abstractmethod generate_subset(mapped_triples: Tensor, **kwargs)[source]

Generate the supporting subset used to corrupt the triples.

Parameters: mapped_triples (MappedTriples) – Base triples to generate the subset

get_positive_pool(e: int, r: int, target: str) → tensor[source]

Returns all the real negatives given an entity, a relation, and the target for corruption. If target == “head”, returns the full set of available negative entities for (*, rel, entity); if target == “tail”, returns the full set of available negative entities for (entity, rel, *).

Parameters:

e (int) – Entity ID
r (int) – Relation ID
target (str) – Target of corruption

Returns:

Positive instance IDs

Return type:

torch.tensor

abstractmethod strategy_negative_pool(h: int, r: int, t: int, target: str) → tensor[source]

Compute the negative pool for a triple and the target for corruption

Parameters:

h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption

Returns:

Tensor with computed negative entities IDs

Return type:

torch.tensor

class extension.sampling.TypedNegativeSampler(*, relation_domain_range_dict, entity_classes_dict, **kwargs)[source]

Bases: SubSetNegativeSampler

Type-constrained negative sampler from “Krompaß, D., Baier, S., Tresp, V.: Type-constrained representation learning in knowledge graphs. In: The Semantic Web-ISWC 2015”. Produces the subset of available negatives using only entities that appear as domain (for corrupting heads) and range (for corrupting tails) of a triple's relation. Needs additional information on the triples: a dictionary with the domain and range for each relation (mapped to IDs) and a dictionary of class membership for each entity (mapped to IDs).
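A domain/range-constrained pool can be sketched from the two dictionaries mentioned above. The function name `typed_pool`, the integer class IDs, and the fallback to all other entities when a relation has no declared domain or range are assumptions for illustration. The extension may handle missing types differently.

```python
from typing import Dict, List, Optional, Set


def typed_pool(
    h: int,
    r: int,
    t: int,
    target: str,
    relation_domain_range: Dict[int, Dict[str, Optional[int]]],
    entity_classes: Dict[int, List[int]],
) -> Set[int]:
    """Entities whose classes match the domain (head) or range (tail) of r."""
    if target not in ("head", "tail"):
        raise ValueError(f"target must be 'head' or 'tail', got {target!r}")
    original = h if target == "head" else t
    required = relation_domain_range[r]["domain" if target == "head" else "range"]
    if required is None:
        # Assumption: with no declared type, every other entity is a candidate.
        return set(entity_classes) - {original}
    return {
        entity
        for entity, classes in entity_classes.items()
        if entity != original and required in classes
    }


# Hypothetical metadata, already mapped to integer IDs.
domain_range = {0: {"domain": 10, "range": 11}}
classes = {0: [10], 1: [11], 2: [10], 3: [11], 4: [12]}
print(typed_pool(0, 0, 1, "head", domain_range, classes))
```

Compared with the class-based variant, this pool depends on the relation rather than on the classes of the entity being replaced.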

generate_subset(mapped_triples, **kwargs)[source]

Generate the supporting subset used to corrupt the triples.

Parameters: mapped_triples (MappedTriples) – Base triples to generate the subset

strategy_negative_pool(h, r, t, target)[source]

Compute the negative pool for a triple and the target for corruption

Parameters:

h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption

Returns:

Tensor with computed negative entities IDs

Return type:

torch.tensor