Querying with Cypher

SeMRA generates data artifacts and Docker configuration for locally deploying a Neo4j graph database and a web application via semra.io.write_neo4j() (for example outputs, see semra.database or semra.landscape). The resulting graph database can be queried directly with the Cypher query language in either of the following ways:

  1. By connecting a client via the Bolt protocol on port 7687, which is exposed in the Dockerfile.

  2. By navigating to http://localhost:7474 in a web browser to use Neo4j’s built-in graphical front end, where you can type in Cypher queries and interact with the results.
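Either route accepts the same Cypher. As a first check that the deployment is reachable and populated, a node count per label can be run. This is a minimal sketch that assumes nothing beyond a loaded database; it scans every node, so it may take a moment on a large export.

```cypher
// Count nodes per label to check that the export loaded as expected
MATCH (n)
RETURN labels(n) AS node_labels, count(*) AS n_nodes
ORDER BY n_nodes DESC
```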

The contents of the graph database have the following schema:

Figure: graph database schema (_images/graph-schema.svg)
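The schema can also be inspected from inside the running database. The sketch below uses Neo4j's built-in db.schema.visualization() and db.relationshipTypes() procedures, which are available in recent Neo4j releases (check the version shipped in the Docker image). Run the two statements one at a time if your client does not accept multiple statements.

```cypher
// Draw the node labels and relationship types present in the database
CALL db.schema.visualization();

// List relationship types, such as skos:exactMatch or hasEvidence
CALL db.relationshipTypes() YIELD relationshipType
RETURN relationshipType
ORDER BY relationshipType;
```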

The example Cypher queries below illustrate what direct querying of the database makes possible.

Lookup by CURIE

The following Cypher queries look up concepts, mappings, evidences, and mapping sets.

Look up a concept (e.g., a cell line) by its CURIE:

MATCH (n:concept)
WHERE n.curie = "cellosaurus:0440"
RETURN n
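The equality filter can also be written as an inline property map, which is equivalent to the WHERE clause above:

```cypher
// Equivalent lookup using an inline property map
MATCH (n:concept {curie: "cellosaurus:0440"})
RETURN n
```

When sending queries from a client over Bolt, passing the CURIE as a query parameter (a name such as $curie is only illustrative) is generally preferable to splicing it into the query string.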

The same is possible for mappings, evidences, and mapping sets. The CURIEs for each of these three entity types are generated by SeMRA. For a mapping:

MATCH (m:mapping)
WHERE m.curie = "..."
RETURN m

For an evidence:

MATCH (e:evidence)
WHERE e.curie = "..."
RETURN e

For a mapping set:

MATCH (s:mappingset)
WHERE s.curie = "..."
RETURN s
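Because these CURIEs are generated by SeMRA, it is often easier to discover them from a known concept than to construct them by hand. The following is a minimal sketch that lists up to ten mapping CURIEs, each with its evidence CURIEs, for mappings whose source is cellosaurus:0440. The returned values can then be substituted for the "..." placeholders above.

```cypher
// Find real mapping and evidence CURIEs starting from a known concept
MATCH (m:mapping)-[:`owl:annotatedSource`]->(source:concept)
WHERE source.curie = "cellosaurus:0440"
OPTIONAL MATCH (m)-[:hasEvidence]->(e:evidence)
RETURN m.curie AS mapping_curie, collect(e.curie) AS evidence_curies
LIMIT 10
```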

Cypher also lets you return specific properties of each record. The fields available for each node type are documented in the following variables:

Concept

semra.io.neo4j_io.CONCEPT_NODES_HEADER

Mapping

semra.io.neo4j_io.MAPPING_NODES_HEADER

Evidence

semra.io.neo4j_io.EVIDENCE_NODES_HEADER

Mapping Set

semra.io.neo4j_io.MAPPING_SET_NODES_HEADER

For example, you can look up a concept by its CURIE and return specific parts, such as the name:

MATCH (n:concept)
WHERE n.curie = "cellosaurus:0440"
RETURN n.name
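The header variables list the columns written by the export, but Neo4j does not store properties whose value is null, so a given node may not carry every listed field. The keys() function shows which property names a particular node actually has. This sketch inspects one mapping node; swap the label for concept, evidence, or mappingset to inspect the others.

```cypher
// Show the property names stored on one mapping node
MATCH (m:mapping)
RETURN keys(m) AS property_names
LIMIT 1
```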

Traversing Mappings

Get all targets for exact match mappings where cellosaurus:0440 is the source:

MATCH
    (source:concept)-[:`skos:exactMatch`]->(target:concept)
WHERE source.curie = "cellosaurus:0440"
RETURN target
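To see every mapping edge touching a concept, regardless of predicate or direction, the relationship type can be read back with type(r). This sketch assumes, as the schema suggests, that concept-to-concept edges are the simple mapping edges.

```cypher
// All concepts linked to cellosaurus:0440 by a mapping edge, in either direction
MATCH (source:concept)-[r]-(other:concept)
WHERE source.curie = "cellosaurus:0440"
RETURN
    type(r) AS predicate,
    // true when cellosaurus:0440 is the source of the edge
    startNode(r) = source AS is_outgoing,
    other.curie AS curie,
    other.name AS name
```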

The exact match query can also be written against the reified representation, using owl:annotatedSource, owl:annotatedTarget, and the mapping node type:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept)
WHERE source.curie = "cellosaurus:0440" AND m.predicate = "skos:exactMatch"
RETURN target
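The reified form also makes it easy to aggregate over mapping properties. For example, the following sketch counts the reified mappings from cellosaurus:0440 per predicate, using the same m.predicate property:

```cypher
// Count reified mappings from cellosaurus:0440, grouped by predicate
MATCH (m:mapping)-[:`owl:annotatedSource`]->(source:concept)
WHERE source.curie = "cellosaurus:0440"
RETURN m.predicate AS predicate, count(m) AS n_mappings
ORDER BY n_mappings DESC
```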

After reifying, you can extend the exact match query to return evidences. In the Neo4j Browser's graph view, returning multiple elements also displays the edges between them.

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept),
    (m)-[:hasEvidence]->(e:evidence)
WHERE source.curie = "cellosaurus:0440" AND m.predicate = "skos:exactMatch"
RETURN source, target, m, e
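Since a mapping can be supported by several evidences, the same pattern also supports aggregation. The following minimal sketch counts the evidences behind each exact match from cellosaurus:0440:

```cypher
// Number of evidences supporting each exact match from cellosaurus:0440
MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept),
    (m)-[:hasEvidence]->(e:evidence)
WHERE source.curie = "cellosaurus:0440" AND m.predicate = "skos:exactMatch"
RETURN target.curie AS target, count(e) AS n_evidences
ORDER BY n_evidences DESC
```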

Reification is useful for complex filters, e.g., on the mapping justification. The following query returns exact matches to cellosaurus:0440 that are backed by manual mapping curation:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept),
    (m)-[:hasEvidence]->(e:evidence)
WHERE
    source.curie = "cellosaurus:0440"
    AND m.predicate = "skos:exactMatch"
    AND e.mapping_justification = "semapv:ManualMappingCuration"
RETURN target
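A single evidence filter can accept several values with IN. In this sketch, semapv:LexicalMatching is an illustrative second justification from the SEMAPV vocabulary, not a value this page states is present in the database. DISTINCT collapses the duplicate rows that appear when a target has more than one matching evidence.

```cypher
// Accept either of two justifications and de-duplicate the targets
MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept),
    (m)-[:hasEvidence]->(e:evidence)
WHERE
    source.curie = "cellosaurus:0440"
    AND m.predicate = "skos:exactMatch"
    AND e.mapping_justification IN ["semapv:ManualMappingCuration", "semapv:LexicalMatching"]
RETURN DISTINCT target
```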

The manual curation query can be reformulated to filter for a minimum confidence instead:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept),
    (m)-[:hasEvidence]->(e:evidence)
WHERE
    source.curie = "cellosaurus:0440"
    AND m.predicate = "skos:exactMatch"
    AND e.confidence > 0.3
RETURN target
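Instead of only filtering on confidence, the targets can be ranked by their best-supported evidence. This sketch uses the same e.confidence property; evidences without a confidence value are ignored by max().

```cypher
// Rank exact-match targets by the highest confidence among their evidences
MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept),
    (m)-[:hasEvidence]->(e:evidence)
WHERE source.curie = "cellosaurus:0440" AND m.predicate = "skos:exactMatch"
RETURN target.curie AS target, max(e.confidence) AS best_confidence
ORDER BY best_confidence DESC
```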

The manual curation query can also be extended to return the authors of the evidences:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept),
    (m)-[:hasEvidence]->(e:evidence),
    (e)-[:hasAuthor]->(author:concept)
WHERE
    source.curie = "cellosaurus:0440"
    AND m.predicate = "skos:exactMatch"
    AND e.mapping_justification = "semapv:ManualMappingCuration"
RETURN target, author
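When an author has contributed several evidences, the query above can return the same target and author on several rows. Grouping the authors per target gives a more compact, tabular view. A sketch:

```cypher
// One row per target, listing the distinct curators of its evidences
MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept),
    (m)-[:hasEvidence]->(e:evidence),
    (e)-[:hasAuthor]->(author:concept)
WHERE
    source.curie = "cellosaurus:0440"
    AND m.predicate = "skos:exactMatch"
    AND e.mapping_justification = "semapv:ManualMappingCuration"
RETURN target.curie AS target, collect(DISTINCT author.curie) AS authors
```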

The following query gets all mappings (with associated evidences, mapping sets, and authors) where cellosaurus:0440 is the source, with optional matches for mapping sets and authors:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:`owl:annotatedTarget`]->(target:concept),
    (m)-[:hasEvidence]->(e:evidence)
WHERE source.curie = "cellosaurus:0440"
OPTIONAL MATCH
    (e)-[:fromSet]->(mset:mappingset)
OPTIONAL MATCH
    (e)-[:hasAuthor]->(author:concept)
RETURN source, target, m, e, mset, author
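The fromSet edges can likewise be aggregated to see which mapping sets contribute evidence. The following sketch counts, per mapping set, the distinct mappings from cellosaurus:0440 that it supports:

```cypher
// Mapping sets ranked by how many mappings from cellosaurus:0440 they support
MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),
    (m)-[:hasEvidence]->(e:evidence)-[:fromSet]->(mset:mappingset)
WHERE source.curie = "cellosaurus:0440"
RETURN mset.curie AS mapping_set, count(DISTINCT m) AS n_mappings
ORDER BY n_mappings DESC
```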

Neo4j Output Reference

I/O for Neo4j.

Variables

CONCEPT_NODES_HEADER

The column headers for the concept nodes in the SeMRA Neo4j graph database export

DERIVED_PREDICATE

The predicate used in the graph data model connecting a reasoned evidence node to the mapping node(s) from which it was derived

EDGES_HEADER

The column headers for properties attached to simple mappings

EDGES_SUPPLEMENT_HEADER

The column headers for extra edges that aren't mapping edges, such as those with HAS_EVIDENCE_PREDICATE, FROM_SET_PREDICATE, DERIVED_PREDICATE, and HAS_AUTHOR_PREDICATE

EVIDENCE_NODES_HEADER

The column headers for evidence nodes in the SeMRA Neo4j graph database export

FROM_SET_PREDICATE

The predicate used in the graph data model connecting an evidence node to a mapping set node

HAS_AUTHOR_PREDICATE

The predicate used in the graph data model connecting an evidence node to the concept node of its author

HAS_EVIDENCE_PREDICATE

The predicate used in the graph data model connecting a mapping node to an evidence node

MAPPING_NODES_HEADER

The column headers for the mapping nodes in the SeMRA Neo4j graph database export

MAPPING_SET_NODES_HEADER

The column headers for the mapping set nodes in the SeMRA Neo4j graph database export