prioritize

prioritize(mappings: list[Mapping], priority: list[str], *, progress: bool = True) → list[Mapping]

Get a priority star graph.

Parameters:
  • mappings – An iterable of mappings.

    Warning

    This assumes that inference and inversion have already been run. This means that if there exists any path of exact-match mappings between A and B, then there exists an edge A, exact, B. Further, if there exists a mapping A, exact, B, there must also be a mapping B, exact, A.

  • priority – A priority list of prefixes, where earlier in the list means the priority is higher.
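The precondition in the warning on the mappings parameter can be checked before calling prioritize(). Below is a minimal, hypothetical helper (missing_exact_edges() is not part of semra) that represents mappings as plain (subject, predicate, object) tuples of CURIE strings instead of semra Mapping objects and assumes "skos:exactMatch" as the exact-match predicate label. It groups references into connected components and reports every ordered pair the precondition requires but the input lacks.

```python
from collections import defaultdict

EXACT = "skos:exactMatch"  # assumed label for the exact-match predicate


def missing_exact_edges(triples: list[tuple[str, str, str]]) -> set[tuple[str, str]]:
    """Return ordered pairs (a, b) that the precondition requires but the input lacks.

    After inference and inversion, every pair of distinct references in the same
    exact-match component should be linked by a direct a -> b exact mapping.
    """
    directed: set[tuple[str, str]] = set()
    neighbors: dict[str, set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == EXACT and s != o:
            directed.add((s, o))
            # Undirected adjacency, used only to discover components.
            neighbors[s].add(o)
            neighbors[o].add(s)

    missing: set[tuple[str, str]] = set()
    seen: set[str] = set()
    for start in list(neighbors):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        component, stack = {start}, [start]
        while stack:
            for nxt in neighbors[stack.pop()]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        # Every ordered pair inside the component must be a direct edge.
        for a in component:
            for b in component:
                if a != b and (a, b) not in directed:
                    missing.add((a, b))
    return missing
```

For the two-step chain doid:0050577 → mesh:C562966 → umls:C4551571, this helper reports four missing pairs: the two inverses and both directions of the doid/umls edge that chain inference would add.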

Returns:

A list of mappings representing a “prioritization”, meaning that each reference appears as a subject at most once. This condition means that the prioritization mapping can be applied to upgrade any reference to its “canonical” reference.
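Because each reference appears as a subject at most once, a prioritization can be loaded into a dictionary and used as a lookup table for upgrading references. A minimal sketch, assuming plain (subject, predicate, object) tuples of CURIE strings instead of semra Mapping objects; build_upgrade_table() and upgrade() are hypothetical helpers, not part of semra.

```python
def build_upgrade_table(star: list[tuple[str, str, str]]) -> dict[str, str]:
    """Map each subject to its canonical object, enforcing one entry per subject."""
    table: dict[str, str] = {}
    for subject, _predicate, obj in star:
        if subject in table:
            raise ValueError(f"{subject} appears as a subject more than once")
        table[subject] = obj
    return table


def upgrade(reference: str, table: dict[str, str]) -> str:
    # References that are not subjects (canonical or unmapped) are returned unchanged.
    return table.get(reference, reference)


star = [
    ("doid:0050577", "skos:exactMatch", "mesh:C562966"),
    ("umls:C4551571", "skos:exactMatch", "mesh:C562966"),
]
table = build_upgrade_table(star)
print(upgrade("umls:C4551571", table))  # mesh:C562966
print(upgrade("mesh:C562966", table))  # mesh:C562966 (already canonical)
```

Raising on a repeated subject makes a violation of the “each reference appears as a subject at most once” condition fail loudly instead of silently overwriting an earlier entry.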

This algorithm works in the following way:

  1. Get the subset of exact matches from the input mapping list

  2. Convert the exact matches to an undirected mapping graph

  3. Extract connected components.

    Note

    Because of how the graph is constructed, a connected component might contain just two mappings: A, exact, B and B, exact, A.

  4. For each component
    1. Get the “priority” reference using get_priority_reference()

    2. Construct new mappings where all references in the component are the subject and the priority reference is the object (skip the self mapping)

Here’s an example usage, where inference is run ahead of prioritization.

>>> from semra import EXACT_MATCH, Mapping, Reference
>>> from semra.inference import infer_reversible, infer_chains
>>> curies = "doid:0050577", "mesh:C562966", "umls:C4551571"
>>> r1, r2, r3 = (Reference.from_curie(c) for c in curies)
>>> m1 = Mapping.from_triple((r1, EXACT_MATCH, r2))
>>> m2 = Mapping.from_triple((r2, EXACT_MATCH, r3))
>>> m3 = Mapping.from_triple((r1, EXACT_MATCH, r3))
>>> mappings = [m1, m2, m3]
>>> mappings = infer_reversible(mappings)
>>> mappings = infer_chains(mappings)
>>> prioritize(mappings, ["mesh", "doid", "umls"])