PyTorch Geometric is a Python library for dealing with graph algorithms.
Use the package manager poetry inside a virtual environment to install the dependencies. Install pyenv beforehand.
python3 -m venv .venv
source .venv/bin/activate
pip3 install poetry --no-cache
poetry install
Now build the desired neo4j container.
CONTAINER=$(docker run -d \
-p 7474:7474 -p 7687:7687 \
-v $(pwd)/data/neo4j_db/data:/data \
-v $(pwd)/data/neo4j_db/logs:/logs \
-v $(pwd)/data/neo4j_db/import:/var/lib/neo4j/import \
--name test-neo4j-stx-books-recommender44 \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4J_AUTH=neo4j/stx_books_pass \
-e NEO4JLABS_PLUGINS='["apoc", "graph-data-science"]' \
-e NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
neo4j:4.4-enterprise
)
Note: We use version 4.4 here because the APOC version for 5.x was not stable (as of 30.01). This might change in the future.
Once the Docker container is up and running, create the contents based on the queries in the YOUR_DOCKER_NEO_LOCATION/db_loader.cypher file.
You have a few options:
- (Easy-mode) You can run them in the browser and just copy-paste.
- Within the terminal, run ->
$ docker exec $CONTAINER /var/lib/neo4j/bin/neo4j-shell -f YOUR_DOCKER_NEO_LOCATION/db_loader.cypher
or for interactive mode... (to copy-paste like in the browser)
$ docker exec -ti $CONTAINER /var/lib/neo4j/bin/neo4j-shell
Before running your code, you need to define all variables stored in .env.
Especially:
MLFLOW_USER=
MLFLOW_PASSWORD=
MLFLOW_URL=
So either use your own MLflow account or use your dockerized one.
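A quick way to catch a missing setting before training is to check the three variables up front. The snippet below is a minimal sketch, not part of the project: the helper name `missing_vars` is hypothetical, and it assumes the values from .env have already been exported into the process environment.

```python
import os

# The three MLflow settings listed above.
REQUIRED_VARS = ("MLFLOW_USER", "MLFLOW_PASSWORD", "MLFLOW_URL")


def missing_vars(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the process environment (assumption: .env was
    already loaded or exported before this runs).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All MLflow variables are set.")
```

Passing a plain dictionary instead of the real environment makes the check easy to try out without touching your shell.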
After the graph database has been populated, the following schema should be visible:
Or you can inspect it yourself by calling
CALL db.schema.visualization()
- Users - representing our users with some attributes (including first_name, last_name etc.)
- Titles - representing specific books with their metadata. Connected with a user through the relations RATED_BY and READ_BY. RATED_BY has a weight (0-10) and is used for further embeddings via FastRP to classify and obtain our recommendations (which will be modelled via RECOMMENDED_TO)
- Authors - node that points to the given author of the book, with its metadata (via WRITTEN_BY relation)
- YearsOfPublications - node for a specific year of publication (via WRITTEN_IN_YEAR relation)
- Publishers - node representing the publisher of a given book (via PUBLISHED_BY relation)
A more detailed schema (with specific indices in CSV view) can be read here
Our dataset comes from a Kaggle dataset. It was limited to 50k and, for readability, modified by adding fixtures to Users (first name, last name) generated with faker, so any similarity to a real person is pure coincidence :)
Then run the following command in the terminal to train the model and create the new RECOMMENDED_TO relationship.
python3 main.py
The relationship is between Titles and Users:
(Titles)-[:RECOMMENDED_TO]->(Users)
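To make the direction of the relationship concrete, here is a rough sketch of how predicted (title, user) pairs could be turned into parameters for a Cypher write. This is not the code in main.py: the statement `WRITE_RECOMMENDATIONS`, the helper `to_rows`, and the choice to match nodes by internal id are all assumptions made for illustration.

```python
# Hypothetical Cypher statement: matches both nodes by internal id and
# merges the relationship in the Titles -> Users direction.
WRITE_RECOMMENDATIONS = """
UNWIND $rows AS row
MATCH (t:Titles) WHERE id(t) = row.title_id
MATCH (u:Users) WHERE id(u) = row.user_id
MERGE (t)-[:RECOMMENDED_TO]->(u)
"""


def to_rows(pairs):
    """Turn (title_id, user_id) pairs into de-duplicated parameter rows."""
    seen = set()
    rows = []
    for title_id, user_id in pairs:
        key = (title_id, user_id)
        if key not in seen:
            seen.add(key)
            rows.append({"title_id": title_id, "user_id": user_id})
    return rows
```

With the official Neo4j Python driver, the rows could then be passed as a query parameter, for example `session.run(WRITE_RECOMMENDATIONS, rows=to_rows(pairs))`. MERGE keeps the write idempotent, so re-running training would not duplicate relationships.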
Below is a fraction of the new relationships:

What the embedding process (into the temporary book_titles graph) looks like:

- TODO: See also our blog-post!- link to STX blogpost here for more (TODO: or copy-paste here)
Graph-based recommendations give us a very powerful tool to search by different criteria, where our imagination is the limit.
Results of recommendation for a specific user (in this case Patti Jacobs)

MATCH paths=(u:Users {first_name: 'Patti', last_name: 'Jacobs'})<-[:RECOMMENDED_TO]-(t:Titles) RETURN paths;
List of readers that love "Pride and Prejudice", to check what they have in common:
For results CSV
MATCH (romance_lovers:Users)-[:READ_BY]->(n:Titles) WHERE n.title = 'Pride and Prejudice'
MATCH (other_book:Titles)-[:RECOMMENDED_TO]->(romance_lovers:Users)
WHERE id(other_book) <> id(n)
RETURN other_book.author AS author, other_book.title AS title;
What are the best guesses for top-5 book readers?
Below is the Cypher sub-query obtaining the first part:

Full query showing all recommendations
For results CSV
CALL {
MATCH (users:Users)-[:READ_BY]->(n:Titles)
WITH COUNT(n) AS counter, n, COLLECT(id(users)) AS user_ids
RETURN n.title, counter, user_ids
ORDER BY counter DESC
LIMIT 5
}
WITH user_ids
UNWIND user_ids AS user_id
MATCH (u:Users {user:user_id})<-[:RECOMMENDED_TO]-(t2:Titles)
RETURN t2
LIMIT 10;
Here we limit the results to readers based in the US who have already rated books published after 1984!
For results CSV
CALL {
MATCH (u:Users)-[r:RATED_BY]->(t:Titles)
WITH lTrim(split(u.location, ',')[-1]) AS location, t, u
WHERE location = 'usa' AND t.year_of_publication > 1984
RETURN t, u
LIMIT 10
}
WITH u
MATCH (u)<-[:RECOMMENDED_TO]-(t2:Titles)
RETURN t2.author AS recommended_author, t2.title AS recommended_title
LIMIT 5;
Pulling data from Neo4j and loading results to Neo4j are done with the use of the ["graph-data-science", "apoc"] plugins.
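The location filter in the last query (take the text after the final comma, trim leading whitespace, compare with 'usa') can be mirrored in Python, for instance to sanity-check rows exported to CSV. This is only a sketch: the function names are hypothetical, and it assumes location is a lowercase, comma-separated string such as "city, state, country" and that the year of publication is numeric.

```python
def country_of(location: str) -> str:
    """Mirror lTrim(split(location, ',')[-1]) from the Cypher query."""
    return location.split(",")[-1].lstrip()


def is_us_reader_of_recent_book(location: str, year_of_publication: int) -> bool:
    """Same predicate as the WHERE clause: country is 'usa' and year > 1984."""
    return country_of(location) == "usa" and year_of_publication > 1984
```

As in Cypher, the comparison is case-sensitive and only leading whitespace is trimmed, so a value like "USA" or "usa " would not match.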
For a visualisation, an example of the new mapping can be found in the sample/results.txt file, but it is not updated after each new training.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.