PyKEEN Extension
Constants
This module defines shared constant values used throughout the PyKEEN negative sampler extension. These constants keep the implementation consistent and avoid hardcoding fixed values in multiple places. Typical contents include default parameter values and file naming conventions.
Dataset
Custom dataset loader designed to extend PyKEEN’s dataset handling. Enables support for additional metadata, filtering logic, and preprocessing tailored to advanced negative sampling strategies.
- class extension.dataset.OnMemoryDataset(data_path: str | Path = None, load_entity_classes: bool = True, load_domain_range: bool = True, **kwargs)[source]
Bases:
Dataset
Dataset located on memory; requires data that has already been split, in RDF triple format. The folder at data_path should contain the files listed below.
### Folder Structure
train.txt : Training triples in “h r t” format using RDF names
test.txt : Testing triples in “h r t” format using RDF names
valid.txt : Validation triples in “h r t” format using RDF names
entity_to_id.json: JSON file for entity name to ID mapping
relation_to_id.json: JSON file for relation name to ID mapping
entities_classes.json : Additional metadata on class membership for each entity, in the format
{
“<ENTITY_NAME>”: [
“<CLASS_NAME_1>”, …, “<CLASS_NAME_N>”
]
}
relation_domain_range.json : Additional metadata on the domain and range classes for each relation, in the format
{
“<RELATION_NAME>”: {
“domain”: “<CLASS_NAME_DOMAIN>” OR “None”,
“range”: “<CLASS_NAME_RANGE>” OR “None”
}
}
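As an illustration of the layout above, the following sketch writes a tiny dataset folder with the standard library and reads the two metadata files back. The entity, relation, and class names (`ex:alice`, `ex:Person`, and so on) and the helper `write_toy_dataset` are invented for the example, and the tab separator in the triple files is an assumption; only the file names and JSON shapes come from the description above.

```python
import json
import tempfile
from pathlib import Path


def write_toy_dataset(folder: Path) -> None:
    """Write a minimal folder in the layout described above (toy data only)."""
    triples = {
        "train.txt": [("ex:alice", "ex:livesIn", "ex:rome")],
        "valid.txt": [("ex:bob", "ex:livesIn", "ex:rome")],
        "test.txt": [("ex:alice", "ex:knows", "ex:bob")],
    }
    for name, rows in triples.items():
        # One "h r t" triple per line; the tab separator is an assumption.
        text = "".join("\t".join(row) + "\n" for row in rows)
        (folder / name).write_text(text, encoding="utf-8")

    entities = ["ex:alice", "ex:bob", "ex:rome"]
    relations = ["ex:knows", "ex:livesIn"]
    metadata = {
        "entity_to_id.json": {name: i for i, name in enumerate(entities)},
        "relation_to_id.json": {name: i for i, name in enumerate(relations)},
        # Entity name -> list of class names.
        "entities_classes.json": {
            "ex:alice": ["ex:Person"],
            "ex:bob": ["ex:Person"],
            "ex:rome": ["ex:City"],
        },
        # Relation name -> domain/range class; the string "None" marks a missing class.
        "relation_domain_range.json": {
            "ex:knows": {"domain": "ex:Person", "range": "ex:Person"},
            "ex:livesIn": {"domain": "None", "range": "ex:City"},
        },
    }
    for name, content in metadata.items():
        (folder / name).write_text(json.dumps(content, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    write_toy_dataset(folder)
    written_files = sorted(p.name for p in folder.iterdir())
    entity_classes = json.loads(
        (folder / "entities_classes.json").read_text(encoding="utf-8")
    )
    domain_range = json.loads(
        (folder / "relation_domain_range.json").read_text(encoding="utf-8")
    )
```

A folder produced this way could then be passed as `data_path`, provided the real loader accepts the same separator.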
Filtering
Implements extended filtering mechanisms for training and evaluation. Includes logic for excluding invalid or null-indexed negatives, such as with the NullPythonSetFilterer.
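The idea of set-based filtering can be sketched in plain Python. This is not the code of NullPythonSetFilterer: the function `keep_mask`, the triple representation as tuples, and the use of `-1` as the "null" index are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[int, int, int]
NULL_INDEX = -1  # assumption: placeholder ID marking a missing negative


def keep_mask(negatives: Iterable[Triple], positives: Set[Triple]) -> List[bool]:
    """Return True for negatives that are neither known positives nor null-indexed."""
    return [
        NULL_INDEX not in triple and triple not in positives
        for triple in negatives
    ]


known_positives = {(0, 0, 1), (2, 1, 0)}
candidates = [(0, 0, 1), (0, 0, 2), (NULL_INDEX, 0, 1), (2, 1, 1)]
mask = keep_mask(candidates, known_positives)
valid_negatives = [triple for triple, keep in zip(candidates, mask) if keep]
```

Here the first candidate is dropped because it is a known positive and the third because it carries the placeholder index.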
Sampling
Defines custom negative sampling strategies with support for static and dynamic approaches. Built to integrate seamlessly with PyKEEN’s pipeline while enabling schema-aware, type-based, and model-driven sampling logic.
- class extension.sampling.ClassesNegativeSampler(*, entity_classes_dict, **kwargs)[source]
Bases:
SubSetNegativeSampler
Type-constrained negative sampler derived from “Krompaß, D., Baier, S., Tresp, V.: Type-constrained representation learning in knowledge graphs. In: The Semantic Web-ISWC 2015”. Produces the subset of available negatives using only entities that appear as domain (for corrupting heads) and range (for corrupting tails) of a triple's relation. Uses the class of the entity targeted for corruption to define the set of negative entities, so it can be used when the domain and range of relations are not available.
- average_pool_size(check_triples)[source]
Compute the average pool size for every h,r combination and r,t combination
- Parameters:
check_triples (MappedTriples) – Triples used for computing the pool size
- Returns:
Average pool size, and a dictionary with the number of triples that have fewer than X negatives (X from 2 to 100)
- Return type:
Tuple[int, dict]
- compute_poolsize_aggregate(check_triples)[source]
Compute the average pool size for every h,r combination and r,t combination, strategy specific implementation
- Parameters:
head_relation (torch.tensor) – Head, Relation tensor
tail_relation (torch.tensor) – Tail, Relation tensor
- Returns:
Average pool size, and a dictionary with the number of triples that have fewer than X negatives (X from 2 to 100)
- Return type:
Tuple[int, dict]
- generate_subset(mapped_triples, **kwargs)[source]
Generate the supporting subset used to corrupt the triples
- Parameters:
mapped_triples (MappedTriples) – Base triples to generate the subset
- strategy_negative_pool(h, r, t, target)[source]
Compute the negative pool for a triple and the target for corruption
- Parameters:
h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption
- Returns:
Tensor with computed negative entities IDs
- Return type:
torch.tensor
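A minimal sketch of a class-based negative pool, in plain Python rather than tensors. It assumes that "using the class of the target entity" means keeping entities that share at least one class with the entity being replaced, and that the original entity is excluded; the function name `class_based_pool` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def class_based_pool(
    h: int, r: int, t: int, target: str, entity_classes: Dict[int, Set[str]]
) -> List[int]:
    """Entities sharing at least one class with the corrupted entity (assumed rule)."""
    if target not in ("head", "tail"):
        raise ValueError("target must be 'head' or 'tail'")
    original = h if target == "head" else t
    wanted = entity_classes.get(original, set())
    # r is unused here; it is kept only to mirror the (h, r, t, target) signature.
    return sorted(
        e
        for e, classes in entity_classes.items()
        if e != original and classes & wanted
    )


toy_classes = {0: {"Person"}, 1: {"Person", "Author"}, 2: {"City"}, 3: {"City"}}
head_pool = class_based_pool(0, 0, 2, "head", toy_classes)
tail_pool = class_based_pool(0, 0, 2, "tail", toy_classes)
```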
- class extension.sampling.CorruptNegativeSampler(*args, **kwargs)[source]
Bases:
SubSetNegativeSampler
Negative sampler from “Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning With Neural Tensor Networks for Knowledge Base Completion.” Corrupts heads and tails using the subset of entities seen as head or tail of the specific relation.
- generate_subset(mapped_triples)[source]
Generate the supporting subset used to corrupt the triples
- Parameters:
mapped_triples (MappedTriples) – Base triples to generate the subset
- strategy_negative_pool(h, r, t, target)[source]
Compute the negative pool for a triple and the target for corruption
- Parameters:
h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption
- Returns:
Tensor with computed negative entities IDs
- Return type:
torch.tensor
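The relation-specific corruption described for CorruptNegativeSampler can be sketched as a per-relation index of seen heads and tails. The helpers `build_relation_index` and `relation_pool` are hypothetical names, and excluding the original entity from its own pool is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[int, int, int]


def build_relation_index(triples: Iterable[Triple]) -> Dict[str, Dict[int, Set[int]]]:
    """Collect, for each relation, the entities seen as its head and as its tail."""
    heads, tails = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
    return {"head": dict(heads), "tail": dict(tails)}


def relation_pool(
    index: Dict[str, Dict[int, Set[int]]], h: int, r: int, t: int, target: str
) -> List[int]:
    """Candidates for the target side: entities seen there for relation r."""
    original = h if target == "head" else t
    return sorted(index[target].get(r, set()) - {original})


toy_triples = [(0, 0, 1), (2, 0, 3), (4, 1, 1)]
index = build_relation_index(toy_triples)
head_candidates = relation_pool(index, 0, 0, 1, "head")
tail_candidates = relation_pool(index, 0, 0, 1, "tail")
lonely_candidates = relation_pool(index, 4, 1, 1, "head")
```

The last call shows that a relation with a single observed head yields an empty pool, which is the situation the pool-size statistics are meant to expose.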
- class extension.sampling.NearMissNegativeSampler(*, sampling_model: ERModel = None, prediction_function: Callable[[ERModel, Tensor, tensor], tensor] = None, num_query_results: int = None, **kwargs)[source]
Bases:
SubSetNegativeSampler
Auxiliary Model based Negative Sampler from “Kotnis, B., & Nastase, V. (2017). Analysis of the impact of negative sampling on link prediction in knowledge graphs. arXiv preprint arXiv:1708.06816.” Uses a pretrained model on the same dataset to produce harder negatives. Given the predicted entity embedding for each triple, a Nearest Neighbour algorithm is used to produce negatives that could be predicted as positive but in reality are negatives.
- choose_from_pools(triple, internal_id) tensor [source]
Sample negatives from the negative pool
- Parameters:
triple (torch.tensor) – Triple for corruption
target (str) – Target of corruption
target_size (int) – Number of negatives to produce
- Returns:
Chosen negatives from the negative pool
- Return type:
torch.tensor
- corrupt_batch(positive_batch: Tensor) Tensor [source]
Subset batch corruptor. Uniform corruption between head and tail. Corrupts each triple using the generated subset
- Parameters:
positive_batch (MappedTriples) – Batch of positive triples
- Returns:
Batch of negative triples of size (positive_size * num_negs_per_pos, 3)
- Return type:
MappedTriples
- generate_subset(mapped_triples: Tensor, **kwargs) Dict [source]
Generate the auxiliary subset that supports triple corruption. Specifically, it builds the BallTree structure from the filtering triples (in NumPy format)
- Parameters:
mapped_triples (MappedTriples) – Triples used for filtering
- Returns:
Dictionary with auxiliary data
- Return type:
Dict
- strategy_negative_pool(h, r, t, internal_id)[source]
Compute the negative pool for a triple and the target for corruption
- Parameters:
h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption
- Returns:
Tensor with computed negative entities IDs
- Return type:
torch.tensor
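A rough sketch of the near-miss idea for tail corruption, using brute-force search in place of the BallTree. The prediction function is assumed to be TransE-style (`h + r`), the distance is assumed to be Euclidean, and `near_miss_tails` with its toy embeddings is hypothetical; the real sampler takes its prediction function as a parameter.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def near_miss_tails(
    h: int,
    r: int,
    true_tails: Set[int],
    entity_emb: Dict[int, Vector],
    relation_emb: Dict[int, Vector],
    k: int,
) -> List[int]:
    """Return the k entities closest to the predicted tail that are not true tails."""
    # Assumed prediction function: translate the head embedding by the relation.
    predicted = [a + b for a, b in zip(entity_emb[h], relation_emb[r])]
    ranked = sorted(
        (e for e in entity_emb if e not in true_tails),
        key=lambda e: math.dist(predicted, entity_emb[e]),
    )
    return ranked[:k]


toy_entities = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.1, 0.1), 3: (5.0, 5.0)}
toy_relations = {0: (1.0, 0.0)}
hard_negatives = near_miss_tails(0, 0, {1}, toy_entities, toy_relations, k=2)
```

Entity 2 sits almost on top of the predicted tail, so it is ranked first: a plausible-looking triple that is nevertheless not in the data.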
- class extension.sampling.NearestNeighbourNegativeSampler(*args, sampling_model: ERModel = None, num_query_results: int = None, **kwargs)[source]
Bases:
SubSetNegativeSampler
Nearest Neighbour Negative Sampler from “Kotnis, B., Nastase, V.: Analysis of the impact of negative sampling on link prediction in knowledge graphs”. Uses the entity embedding from a pretrained KGE input model to compute the entity K-Nearest neighbours to be used as negatives.
- generate_subset(mapped_triples, **kwargs)[source]
Generate the supporting subset used to corrupt the triples
- Parameters:
mapped_triples (MappedTriples) – Base triples to generate the subset
- strategy_negative_pool(h, r, t, target)[source]
Compute the negative pool for a triple and the target for corruption
- Parameters:
h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption
- Returns:
Tensor with computed negative entities IDs
- Return type:
torch.tensor
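The K-nearest-neighbour pool can be illustrated with a precomputed neighbour table. This sketch assumes Euclidean distance and a brute-force search; `knn_table`, `neighbour_pool`, and the toy embeddings are hypothetical, and a pretrained model would supply the real vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def knn_table(entity_emb: Dict[int, Vector], k: int) -> Dict[int, List[int]]:
    """For every entity, list its k nearest other entities in embedding space."""
    table = {}
    for entity, vector in entity_emb.items():
        others = sorted(
            (o for o in entity_emb if o != entity),
            key=lambda o: math.dist(vector, entity_emb[o]),
        )
        table[entity] = others[:k]
    return table


def neighbour_pool(
    table: Dict[int, List[int]], h: int, r: int, t: int, target: str
) -> List[int]:
    """Negatives are the neighbours of whichever entity is being replaced."""
    return table[h if target == "head" else t]


toy_embeddings = {0: (0.0, 0.0), 1: (0.0, 1.0), 2: (3.0, 0.0), 3: (3.0, 1.0)}
table = knn_table(toy_embeddings, k=1)
head_neighbours = neighbour_pool(table, 0, 0, 2, "head")
tail_neighbours = neighbour_pool(table, 0, 0, 2, "tail")
```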
- class extension.sampling.RelationalNegativeSampler(*args, local_file=None, **kwargs)[source]
Bases:
SubSetNegativeSampler
Relation-constrained negative sampler from “Kotnis, B., Nastase, V.: Analysis of the impact of negative sampling on link prediction in knowledge graphs”. It follows the assumption that each head-tail pair is connected by only one relation: with the head (tail) fixed, the pool contains all the tail (head) entities that appear with it in a triple whose relation differs from the original one.
- generate_subset(mapped_triples, **kwargs)[source]
Generate the supporting subset used to corrupt the triples
- Parameters:
mapped_triples (MappedTriples) – Base triples to generate the subset
- strategy_negative_pool(h, r, t, target)[source]
Compute the negative pool for a triple and the target for corruption
- Parameters:
h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption
- Returns:
Tensor with computed negative entities IDs
- Return type:
torch.tensor
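The relational constraint can be sketched directly from its description. The function `relational_pool` is a hypothetical name; dropping the original entity from the pool is an extra safeguard assumed here, not something the description states.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[int, int, int]


def relational_pool(
    triples: Iterable[Triple], h: int, r: int, t: int, target: str
) -> List[int]:
    """Entities linked to the fixed side by a relation other than r."""
    if target == "tail":
        # Head is fixed: collect tails reached from h through other relations.
        pool = {t2 for h2, r2, t2 in triples if h2 == h and r2 != r}
        pool.discard(t)
    elif target == "head":
        # Tail is fixed: collect heads reaching t through other relations.
        pool = {h2 for h2, r2, t2 in triples if t2 == t and r2 != r}
        pool.discard(h)
    else:
        raise ValueError("target must be 'head' or 'tail'")
    return sorted(pool)


toy_graph = [(0, 0, 1), (0, 1, 2), (0, 1, 3), (4, 1, 1)]
tails_for_head_0 = relational_pool(toy_graph, 0, 0, 1, "tail")
heads_for_tail_1 = relational_pool(toy_graph, 0, 0, 1, "head")
```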
- class extension.sampling.SubSetNegativeSampler(*, mapped_triples: Tensor, num_entities: int = None, num_relations: int = None, num_negs_per_pos: int = None, filtered: int = False, filterer: str = None, filterer_kwargs: dict = None, integrate: bool = False, **kwargs)[source]
Bases:
NegativeSampler, ABC
Abstract class handling static negative sampling. Subclasses must implement a method that computes the correct subset pool of negatives for each entity in the triple set.
- average_pool_size(check_triples: Tensor) Tuple[int, dict] [source]
Compute the average pool size for every h,r combination and r,t combination
- Parameters:
check_triples (MappedTriples) – Triples used for computing the pool size
- Returns:
Average pool size, and a dictionary with the number of triples that have fewer than X negatives (X from 2 to 100)
- Return type:
Tuple[int, dict]
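The shape of this return value can be illustrated with a small aggregation over precomputed pool sizes. The helper `summarise_pool_sizes` is hypothetical; rounding the average to an integer and using every integer threshold from 2 to 100 are assumptions based on the documented return type and description.

```python
from typing import Dict, Iterable, Sequence, Tuple


def summarise_pool_sizes(
    pool_sizes: Sequence[int], thresholds: Iterable[int] = range(2, 101)
) -> Tuple[int, Dict[int, int]]:
    """Average pool size plus, for each X, how many pools hold fewer than X negatives."""
    thresholds = list(thresholds)
    if not pool_sizes:
        return 0, {x: 0 for x in thresholds}
    average = round(sum(pool_sizes) / len(pool_sizes))  # integer average (assumed)
    below = {x: sum(size < x for size in pool_sizes) for x in thresholds}
    return average, below


toy_sizes = [0, 1, 5, 50, 200]
average_size, below_threshold = summarise_pool_sizes(toy_sizes)
```

In this toy case two pools hold fewer than 2 negatives, a sign that the strategy cannot produce enough negatives for those triples on its own.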
- choose_from_pools(triple: tensor, target: str, target_size: int) tensor [source]
Sample negatives from the negative pool
- Parameters:
triple (torch.tensor) – Triple for corruption
target (str) – Target of corruption
target_size (int) – Number of negatives to produce
- Returns:
Chosen negatives from the negative pool
- Return type:
torch.tensor
- compute_poolsize_aggregate(head_relation: tensor, tail_relation: tensor) Tuple[int, dict] [source]
Compute the average pool size for every h,r combination and r,t combination, strategy specific implementation
- Parameters:
head_relation (torch.tensor) – Head, Relation tensor
tail_relation (torch.tensor) – Tail, Relation tensor
- Returns:
Average pool size, and a dictionary with the number of triples that have fewer than X negatives (X from 2 to 100)
- Return type:
Tuple[int, dict]
- corrupt_batch(positive_batch: Tensor) Tensor [source]
Subset batch corruptor. Uniform corruption between head and tail. Corrupts each triple using the generated subset
- Parameters:
positive_batch (MappedTriples) – Batch of positive triples
- Returns:
Batch of negative triples of size (positive_size * num_negs_per_pos, 3)
- Return type:
MappedTriples
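A plain-Python sketch of uniform head/tail corruption over a batch, using lists of tuples instead of tensors. The function below only mirrors the documented behaviour; the `pool_fn` callback, the `rng` argument, and the fallback to a random entity when a pool is empty are assumptions, not the class's actual logic.

```python
import random
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[int, int, int]
PoolFn = Callable[[int, int, int, str], Sequence[int]]


def corrupt_batch(
    positive_batch: Sequence[Triple],
    pool_fn: PoolFn,
    num_entities: int,
    num_negs_per_pos: int,
    rng: random.Random,
) -> List[Triple]:
    """Produce num_negs_per_pos negatives per positive, picking head or tail uniformly."""
    negatives = []
    for h, r, t in positive_batch:
        for _ in range(num_negs_per_pos):
            target = rng.choice(("head", "tail"))
            pool = list(pool_fn(h, r, t, target))
            if pool:
                replacement = rng.choice(pool)
            else:
                # Assumed fallback: any entity, when the strategy has no candidates.
                replacement = rng.randrange(num_entities)
            if target == "head":
                negatives.append((replacement, r, t))
            else:
                negatives.append((h, r, replacement))
    return negatives


def toy_pool(h: int, r: int, t: int, target: str) -> List[int]:
    """Toy strategy: every entity except the one being replaced."""
    original = h if target == "head" else t
    return [e for e in range(4) if e != original]


toy_batch = [(0, 0, 1), (2, 1, 3)]
toy_negatives = corrupt_batch(toy_batch, toy_pool, 4, 3, random.Random(0))
```

The output keeps the negatives for each positive contiguous, so row `i` of the result corresponds to positive `i // num_negs_per_pos`.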
- abstractmethod generate_subset(mapped_triples: Tensor, **kwargs)[source]
Generate the supporting subset used to corrupt the triples
- Parameters:
mapped_triples (MappedTriples) – Base triples to generate the subset
- get_positive_pool(e: int, r: int, target: str) tensor [source]
Returns all the real negatives given an entity, a relation, and the target for corruption. If target == “head”, returns the full set of available negative entities for (*, rel, entity); if target == “tail”, returns the full set of available negative entities for (entity, rel, *).
- Parameters:
e (int) – Entity ID
r (int) – Relation ID
target (str) – Target of corruption
- Returns:
Positive instance IDs
- Return type:
torch.tensor
- abstractmethod strategy_negative_pool(h: int, r: int, t: int, target: str) tensor [source]
Compute the negative pool for a triple and the target for corruption
- Parameters:
h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption
- Returns:
Tensor with computed negative entities IDs
- Return type:
torch.tensor
- class extension.sampling.TypedNegativeSampler(*, relation_domain_range_dict, entity_classes_dict, **kwargs)[source]
Bases:
SubSetNegativeSampler
Type-constrained negative sampler from “Krompaß, D., Baier, S., Tresp, V.: Type-constrained representation learning in knowledge graphs. In: The Semantic Web-ISWC 2015”. Produces the subset of available negatives using only entities that appear as domain (for corrupting heads) and range (for corrupting tails) of a triple's relation. Needs additional information on the triples: a dictionary with the domain and range of each relation (mapped to IDs) and a dictionary of class membership for each entity (mapped to IDs).
- generate_subset(mapped_triples, **kwargs)[source]
Generate the supporting subset used to corrupt the triples
- Parameters:
mapped_triples (MappedTriples) – Base triples to generate the subset
- strategy_negative_pool(h, r, t, target)[source]
Compute the negative pool for a triple and the target for corruption
- Parameters:
h (int) – Head entity ID
r (int) – Relation ID
t (int) – Tail entity ID
target (str) – “head” or “tail” corruption
- Returns:
Tensor with computed negative entities IDs
- Return type:
torch.tensor
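A minimal sketch of the domain/range constraint in plain Python. The dictionary shapes (`{relation_id: {"domain": class_id, "range": class_id}}` and `{entity_id: set_of_class_ids}`), the treatment of a missing domain or range as "no constraint", and the name `typed_pool` are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set


def typed_pool(
    h: int,
    r: int,
    t: int,
    target: str,
    relation_domain_range: Dict[int, Dict[str, Optional[int]]],
    entity_classes: Dict[int, Set[int]],
) -> List[int]:
    """Entities whose classes include the relation's domain (head) or range (tail)."""
    if target not in ("head", "tail"):
        raise ValueError("target must be 'head' or 'tail'")
    key, original = ("domain", h) if target == "head" else ("range", t)
    required = relation_domain_range.get(r, {}).get(key)
    return sorted(
        e
        for e, classes in entity_classes.items()
        # A missing class (None) is treated as "any entity" here, by assumption.
        if e != original and (required is None or required in classes)
    )


toy_entity_classes = {0: {10}, 1: {10}, 2: {20}, 3: {20}}
toy_domain_range = {
    0: {"domain": 10, "range": 20},
    1: {"domain": 10, "range": None},
}
typed_heads = typed_pool(0, 0, 2, "head", toy_domain_range, toy_entity_classes)
typed_tails = typed_pool(0, 0, 2, "tail", toy_domain_range, toy_entity_classes)
untyped_tails = typed_pool(0, 1, 2, "tail", toy_domain_range, toy_entity_classes)
```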