Querying with Cypher

SeMRA constructs data artifacts and a Docker configuration for locally deploying a Neo4j graph database and a web application via semra.io.write_neo4j() (for example outputs, see semra.database or semra.landscape). The resulting graph database can be queried directly with the Cypher query language in one of the following ways:

By connecting with a client via the Bolt protocol on port 7687, which is exposed in the Dockerfile.
By navigating to http://localhost:7474 in the web browser to use Neo4j’s built-in graphical front-end, where you can type in Cypher queries and interact with the results.
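
As a sketch of the first option, the following Python snippet runs a Cypher query over Bolt. It assumes that the official neo4j Python driver is installed (pip install neo4j), that the deployed database is reachable at bolt://localhost:7687, and that authentication is disabled. The helper name run_cypher is illustrative and is not part of SeMRA.

```python
"""Sketch: run a Cypher query against a local Neo4j deployment over Bolt."""

# Assumption: the database listens on the Bolt port named above
BOLT_URI = "bolt://localhost:7687"


def run_cypher(query, parameters=None, uri=BOLT_URI, auth=None):
    """Run a query and return each record as a plain dictionary.

    ``auth=None`` assumes that authentication is disabled; pass a
    ``(user, password)`` tuple if your deployment requires credentials.
    """
    # Imported lazily so this module loads even without the driver installed
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            result = session.run(query, parameters or {})
            # Consume the result while the session is still open
            return [record.data() for record in result]
```

Passing values through the parameters dictionary (referenced in the query as, e.g., $curie) avoids assembling query strings by hand.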

The contents of the graph database have the following schema:

Below, some example Cypher queries are given to show what is possible by direct querying of the database.

Lookup by CURIE

The following Cypher queries allow for looking up concepts, mappings, evidences, and mapping sets.

Look up a concept (e.g., a cell line) by its CURIE:

MATCH (n:concept)
WHERE n.curie = "cellosaurus:0440"
RETURN n

The same is possible for mappings, evidences, and mapping sets. SeMRA generates its own CURIEs for each of these three entity types. For a mapping:

MATCH (m:mapping)
WHERE m.curie = "..."
RETURN m

For an evidence:

MATCH (e:evidence)
WHERE e.curie = "..."
RETURN e

For a mapping set:

MATCH (s:mappingset)
WHERE s.curie = "..."
RETURN s
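
Since the four lookups differ only in the node label, they can be generated by one helper. The following Python sketch builds a parametrized query for any of the labels used above; the function name build_lookup and the $curie parameter name are illustrative choices, not part of SeMRA.

```python
"""Sketch: build parametrized CURIE lookups for the four node labels."""

# Node labels as they appear in the queries above
NODE_LABELS = ("concept", "mapping", "evidence", "mappingset")


def build_lookup(label, curie):
    """Return a (query, parameters) pair that looks up one node by its CURIE."""
    # Labels cannot be passed as Cypher parameters, so check them against a fixed list
    if label not in NODE_LABELS:
        raise ValueError(f"unknown node label: {label!r}")
    query = f"MATCH (n:{label})\nWHERE n.curie = $curie\nRETURN n"
    return query, {"curie": curie}


query, parameters = build_lookup("concept", "cellosaurus:0440")
print(query)
print(parameters)
```

The returned pair can be handed to any client that supports query parameters, which keeps the CURIE out of the query text itself.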

Cypher also lets you return specific properties from each record. The fields available for each node type are listed in the following variables:

Concept	`semra.io.neo4j_io.CONCEPT_NODES_HEADER`
Mapping	`semra.io.neo4j_io.MAPPING_NODES_HEADER`
Evidence	`semra.io.neo4j_io.EVIDENCE_NODES_HEADER`
Mapping Set	`semra.io.neo4j_io.MAPPING_SET_NODES_HEADER`

For example, you can look up a concept by its CURIE and return specific parts, such as the name:

MATCH (n:concept)
WHERE n.curie = "cellosaurus:0440"
RETURN n.name

Traversing Mappings

Get all targets for exact match mappings where cellosaurus:0440 is the source:

MATCH
    (source:concept)-[:`skos:exactMatch`]->(target:concept)
WHERE source.curie = "cellosaurus:0440"
RETURN target

The same query can be reified using owl:annotatedSource, owl:annotatedTarget, and the mapping node type:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept) ,
    (m)-[:`owl:annotatedTarget`]->(target:concept)
WHERE source.curie = "cellosaurus:0440" AND m.predicate = "skos:exactMatch"
RETURN target

After reifying, you can extend the query to return evidences. In the interactive view, returning multiple elements will also automatically show the edges between them:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept) ,
    (m)-[:`owl:annotatedTarget`]->(target:concept) ,
    (m)-[:hasEvidence]->(e:evidence)
WHERE source.curie = "cellosaurus:0440" AND m.predicate = "skos:exactMatch"
RETURN source, target, m, e

Reification is useful for complex filters, e.g., on mapping justification. The following query returns exact matches to cellosaurus:0440 that have a manual mapping justification:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept) ,
    (m)-[:`owl:annotatedTarget`]->(target:concept) ,
    (m)-[:hasEvidence]->(e:evidence)
WHERE
    source.curie = "cellosaurus:0440"
    AND m.predicate = "skos:exactMatch"
    AND e.mapping_justification = "semapv:ManualMappingCuration"
RETURN target

The previous query can be reformulated to filter for minimum confidence:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept) ,
    (m)-[:`owl:annotatedTarget`]->(target:concept) ,
    (m)-[:hasEvidence]->(e:evidence)
WHERE
    source.curie = "cellosaurus:0440"
    AND m.predicate = "skos:exactMatch"
    AND e.confidence > 0.3
RETURN target
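
The justification and confidence filters can also be combined in one parametrized query. The following Python sketch assembles such a query from optional arguments; the function name build_target_query and the parameter names are illustrative, and RETURN DISTINCT is an addition here so that a target supported by several matching evidences is listed only once.

```python
"""Sketch: assemble the reified exact-match query with optional evidence filters."""


def build_target_query(source_curie, justification=None, min_confidence=None):
    """Return a (query, parameters) pair for targets of exact matches from a source."""
    conditions = [
        "source.curie = $source",
        "m.predicate = $predicate",
    ]
    parameters = {"source": source_curie, "predicate": "skos:exactMatch"}
    if justification is not None:
        conditions.append("e.mapping_justification = $justification")
        parameters["justification"] = justification
    if min_confidence is not None:
        # Strictly greater than, as in the confidence query above
        conditions.append("e.confidence > $min_confidence")
        parameters["min_confidence"] = min_confidence
    query = "\n".join(
        [
            "MATCH",
            "    (m:mapping)-[:`owl:annotatedSource`]->(source:concept),",
            "    (m)-[:`owl:annotatedTarget`]->(target:concept),",
            "    (m)-[:hasEvidence]->(e:evidence)",
            "WHERE",
            "    " + "\n    AND ".join(conditions),
            # DISTINCT is an addition: one row per target, not per evidence
            "RETURN DISTINCT target",
        ]
    )
    return query, parameters


query, parameters = build_target_query(
    "cellosaurus:0440",
    justification="semapv:ManualMappingCuration",
    min_confidence=0.3,
)
print(query)
```

Omitting both optional arguments yields the plain reified exact-match query, still joined to its evidences.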

It can also be extended to return the authors of the evidences:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept) ,
    (m)-[:`owl:annotatedTarget`]->(target:concept) ,
    (m)-[:hasEvidence]->(e:evidence) ,
    (e)-[:hasAuthor]->(author:concept)
WHERE
    source.curie = "cellosaurus:0440"
    AND m.predicate = "skos:exactMatch"
    AND e.mapping_justification = "semapv:ManualMappingCuration"
RETURN target, author

The following query gets all mappings (with associated evidences, mapping sets, and authors) where cellosaurus:0440 is the source, with optional matches for mapping sets and authors:

MATCH
    (m:mapping)-[:`owl:annotatedSource`]->(source:concept) ,
    (m:mapping)-[:`owl:annotatedTarget`]->(target:concept) ,
    (m)-[:hasEvidence]->(e:evidence)
WHERE source.curie = "cellosaurus:0440"
OPTIONAL MATCH
    (e)-[:fromSet]->(mset:mappingset)
OPTIONAL MATCH
    (e)-[:hasAuthor]->(author:concept)
RETURN source, target, m, e, mset, author
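
Because of the OPTIONAL MATCH clauses, this query returns one row per combination of evidence, mapping set, and author, with null where no mapping set or author exists. The following Python sketch collapses such rows into one entry per mapping. It assumes the RETURN clause has been adapted to return CURIE strings (e.g., m.curie AS m, target.curie AS target, mset.curie AS mset, author.curie AS author) and that rows arrive as dictionaries; the function name and the sample CURIEs are hypothetical placeholders, not real SeMRA output.

```python
"""Sketch: collapse flat OPTIONAL MATCH rows into one entry per mapping."""

from collections import defaultdict


def group_by_mapping(rows):
    """Group rows by mapping CURIE, collecting mapping sets and authors.

    Each row is assumed to be a dictionary of CURIE strings with the keys
    ``m``, ``target``, ``mset``, and ``author``; ``mset`` and ``author`` are
    ``None`` when the corresponding OPTIONAL MATCH found nothing.
    """
    grouped = defaultdict(
        lambda: {"target": None, "mapping_sets": set(), "authors": set()}
    )
    for row in rows:
        entry = grouped[row["m"]]
        entry["target"] = row["target"]
        if row["mset"] is not None:
            entry["mapping_sets"].add(row["mset"])
        if row["author"] is not None:
            entry["authors"].add(row["author"])
    return dict(grouped)


# Hypothetical rows with placeholder CURIEs, for illustration only
rows = [
    {"m": "example:mapping1", "target": "example:target1",
     "mset": "example:set1", "author": None},
    {"m": "example:mapping1", "target": "example:target1",
     "mset": "example:set1", "author": "example:author1"},
    {"m": "example:mapping2", "target": "example:target2",
     "mset": None, "author": None},
]
grouped = group_by_mapping(rows)
print(grouped)
```

Using sets means that repeated mapping sets or authors across rows are recorded only once per mapping.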

Neo4j Output Reference

I/O for Neo4j.

Variables

`CONCEPT_NODES_HEADER`	The column headers for the concept nodes in the SeMRA Neo4j graph database export
`DERIVED_PREDICATE`	The predicate used in the graph data model connecting a reasoned evidence node to the mapping node(s) from which it was derived
`EDGES_HEADER`	The column headers for properties attached to simple mappings
`EDGES_SUPPLEMENT_HEADER`	The column headers for extra edges that aren't mapping edges, such as those with `HAS_EVIDENCE_PREDICATE`, `FROM_SET_PREDICATE`, `DERIVED_PREDICATE`, and `HAS_AUTHOR_PREDICATE`
`EVIDENCE_NODES_HEADER`	The column headers for evidence nodes in the SeMRA Neo4j graph database export
`FROM_SET_PREDICATE`	The predicate used in the graph data model connecting an evidence node to a mapping set node
`HAS_AUTHOR_PREDICATE`	The predicate used in the graph data model connecting an evidence node to the concept node for its author
`HAS_EVIDENCE_PREDICATE`	The predicate used in the graph data model connecting a mapping node to an evidence node
`MAPPING_NODES_HEADER`	The column headers for the mapping nodes in the SeMRA Neo4j graph database export
`MAPPING_SET_NODES_HEADER`	The column headers for the mapping set nodes in the SeMRA Neo4j graph database export