Setting Up a Neo4j GenAI Environment on Fedora
In this article, we will walk through the steps to set up a Neo4j GenAI Python environment on a Fedora system using Ansible automation. This setup will enable you to deploy a Retrieval-Augmented Generation (RAG) system that integrates with a Neo4j graph database and utilizes OpenAI’s language models for interactive data retrieval and analysis.
Prerequisites
Before starting, ensure you have:
- A Fedora system with root access.
- Ansible installed on your control machine.
- Access to the OpenAI API and a valid API key.
- Credentials and URI for your Neo4j database.
Additionally, we will be using the Northwind dataset to populate our Neo4j database. For more information on importing the Northwind dataset into Neo4j, refer to the Northwind Dataset Guide.
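Before handing the Neo4j URI from the prerequisites to Ansible, it can help to check that it has a scheme the Neo4j drivers understand. The following is a minimal sketch using only the standard library; the helper name check_neo4j_uri is hypothetical and not part of any Neo4j package.

```python
from urllib.parse import urlparse

# URI schemes accepted by the official Neo4j drivers.
VALID_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}


def check_neo4j_uri(uri: str) -> str:
    """Return the URI unchanged, or raise ValueError if it looks malformed."""
    parsed = urlparse(uri)
    if parsed.scheme not in VALID_SCHEMES:
        raise ValueError(f"unsupported or missing scheme in {uri!r}")
    if not parsed.hostname:
        raise ValueError(f"no host found in {uri!r}")
    return uri


# Example value only; substitute your own database address.
check_neo4j_uri("neo4j://localhost:7687")
```

A placeholder such as "your-neo4j-uri" has no scheme and is rejected, which catches a forgotten substitution before the playbook runs.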
Step-by-Step Setup
Create an Ansible Playbook
Create a file named setup_neo4j_genai.yml with the following content:
---
- name: Set up Neo4j GenAI Python environment on Fedora
  hosts: all
  become: true
  tasks:
    - name: Install necessary system packages
      ansible.builtin.dnf:
        name:
          - python3
          - python3-pip
        state: present
        update_cache: true

    - name: Install necessary Python packages using pip
      ansible.builtin.pip:
        name:
          - neo4j
          - neo4j_genai
          - openai
        state: present
        executable: /usr/bin/pip3

    - name: Set OpenAI API key as environment variable
      ansible.builtin.lineinfile:
        path: /etc/environment
        line: "OPENAI_API_KEY={{ openai_key }}"
        create: true
        state: present
        mode: '0644'

    # Sourcing affects only this task's own shell; later tasks do not inherit it.
    # The run task below therefore passes the variables explicitly.
    - name: Source environment file to apply changes
      ansible.builtin.shell: source /etc/environment

    - name: Create configuration file for Neo4j connection
      ansible.builtin.copy:
        dest: /etc/neo4j_genai_config.py
        content: |
          from neo4j import GraphDatabase
          URI = "{{ neo4j_uri }}"
          AUTH = ("{{ neo4j_auth.split(':')[0] }}", "{{ neo4j_auth.split(':')[1] }}")
          driver = GraphDatabase.driver(URI, auth=AUTH)
        mode: '0644'

    - name: Create application directory
      ansible.builtin.file:
        path: /opt/neo4j_genai
        state: directory
        mode: '0755'

    - name: Copy Python application to the server
      ansible.builtin.copy:
        src: files/application.py
        dest: /opt/neo4j_genai/application.py
        mode: '0755'

    - name: Run the Neo4j GenAI application
      ansible.builtin.command: python3 /opt/neo4j_genai/application.py
      environment:
        OPENAI_API_KEY: "{{ openai_key }}"
        NEO4J_URI: "{{ neo4j_uri }}"
        NEO4J_AUTH0: "{{ neo4j_auth.split(':')[0] }}"
        NEO4J_AUTH1: "{{ neo4j_auth.split(':')[1] }}"
      register: application_output
      ignore_errors: true

    - name: Debug application output
      ansible.builtin.debug:
        var: application_output.stdout_lines
This playbook automates the setup of the Neo4j GenAI environment by installing necessary packages, configuring environment variables, and deploying the application.
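The final task of the playbook passes NEO4J_URI, NEO4J_AUTH0, NEO4J_AUTH1, and OPENAI_API_KEY to the application through its environment block. As a minimal sketch, the application could check these variables up front and fail with a clear message when one is missing. The Settings type and load_settings helper are hypothetical names introduced here for illustration.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    uri: str
    user: str
    password: str
    openai_api_key: str


# Names match the `environment` block of the playbook's run task.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_AUTH0", "NEO4J_AUTH1", "OPENAI_API_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Return the connection settings, or raise if a variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        uri=env["NEO4J_URI"],
        user=env["NEO4J_AUTH0"],
        password=env["NEO4J_AUTH1"],
        openai_api_key=env["OPENAI_API_KEY"],
    )
```

Calling load_settings() at the top of the application turns a missing variable into one readable error instead of a failure deeper inside the driver or the OpenAI client.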
Define Variables
Create a variables file named servers.yml to store sensitive information. Be sure to replace the values with your actual credentials.
python_version: "3.12"
neo4j_uri: "your-neo4j-uri"
neo4j_auth: "username:password"
openai_key: "your-openai-api-key"
Note: In a production environment, do not keep the Neo4j URI, credentials, and OpenAI API key in plain text; store them with a secure mechanism such as Ansible Vault. To locate your API key, follow the OpenAI guide "Where do I find my OpenAI API Key?".
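The neo4j_auth value packs the username and password into one "username:password" string, and the playbook separates them with split(':'). If the password itself contains a colon, taking element [1] silently drops everything after the second colon. The sketch below shows the difference in plain Python; split_auth is a hypothetical helper name, and the credentials are made-up examples.

```python
def split_auth(auth: str) -> tuple[str, str]:
    """Split "username:password" on the first colon only."""
    user, sep, password = auth.partition(":")
    if not sep or not user or not password:
        raise ValueError("expected credentials in the form 'username:password'")
    return user, password


print(split_auth("neo4j:s3cret"))         # ('neo4j', 's3cret')
print(split_auth("neo4j:pa:ss:word"))     # ('neo4j', 'pa:ss:word')

# For comparison, a plain split keeps only the text up to the second colon.
print("neo4j:pa:ss:word".split(":")[1])   # pa
```

Because Jinja2 expressions call the same string method, the same idea should carry over to the playbook as neo4j_auth.split(':', 1), though that variant is an assumption and is not used in this article.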
Create the Application File
Save the following Python code in files/application.py:
import os
import time

from neo4j import GraphDatabase
from neo4j_genai.embeddings.openai import OpenAIEmbeddings
from neo4j_genai.generation import GraphRAG
from neo4j_genai.llm import OpenAILLM
from neo4j_genai.retrievers import HybridRetriever
from openai import OpenAIError

# Configure Neo4j and OpenAI
URI = os.getenv("NEO4J_URI")
AUTH = (os.getenv("NEO4J_AUTH0"), os.getenv("NEO4J_AUTH1"))
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Initialize driver and retriever
driver = GraphDatabase.driver(URI, auth=AUTH)
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
retriever = HybridRetriever(
    driver=driver,
    vector_index_name="product_vector",
    fulltext_index_name="product_name_index",
    embedder=embedder,
    return_properties=["productID", "productName", "unitPrice"],
)

# Set up RAG pipeline
llm = OpenAILLM(model_name="gpt-4", model_params={"temperature": 0})
rag = GraphRAG(retriever=retriever, llm=llm)

# Perform retrieval with retry logic
query_text = "What is the ID and price of the product Queso Cabrales?"
print(query_text)

max_retries = 5
retry_delay = 10  # seconds

for attempt in range(max_retries):
    try:
        response = rag.search(query_text=query_text, retriever_config={"top_k": 3})
        print(response.answer)
        break  # Exit the loop if successful
    except OpenAIError as e:
        if "rate limit" in str(e).lower() or "insufficient_quota" in str(e).lower():
            print(f"Rate limit exceeded or insufficient quota. Retrying in {retry_delay} seconds...")
            time.sleep(retry_delay)
        else:
            print(f"OpenAI Error: {e}")
            break
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        break
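This application connects to a Neo4j database, retrieves the most relevant Product nodes through a hybrid search that combines the vector index and the full-text index, and passes them to an OpenAI model that answers the question. If the OpenAI API reports a rate limit, the request is retried up to five times with a 10-second pause between attempts.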
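The retry loop in the application waits a fixed 10 seconds between attempts. A common alternative is exponential backoff, where each wait is twice as long as the previous one. The sketch below is one way this might look, not part of neo4j_genai or the OpenAI client; call_with_backoff and is_rate_limit are hypothetical helpers, and the sleep function is injectable so the logic can be tried without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_rate_limit(exc: Exception) -> bool:
    """Treat errors whose message mentions a rate limit as retryable."""
    return "rate limit" in str(exc).lower()


def call_with_backoff(
    func: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying retryable errors with exponentially growing delays."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:
            last_attempt = attempt == max_retries - 1
            if last_attempt or not is_retryable(exc):
                raise  # give up: out of attempts or not worth retrying
            sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
    raise RuntimeError("max_retries must be at least 1")
```

In the application, the search call could then be wrapped as call_with_backoff(lambda: rag.search(query_text=query_text, retriever_config={"top_k": 3}), is_rate_limit). This sketch deliberately does not retry insufficient_quota errors, on the assumption that an exhausted quota usually does not recover by waiting a few seconds.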
Define a Full-Text Index
Full-text indexes optimize text searches on specific properties of nodes or relationships. To define a full-text index on Product nodes, use the following Cypher command:
CREATE FULLTEXT INDEX product_name_index FOR (n:Product) ON EACH [n.productName]
This command creates a full-text index named product_name_index on the Product label, indexing the productName property. On Neo4j 4.x, the equivalent is the procedure call CALL db.index.fulltext.createNodeIndex("product_name_index", ["Product"], ["productName"]), which was removed in Neo4j 5. For more details on full-text indexes, visit the Neo4j Full-Text Indexes Guide.
Define a Vector Index
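Vector indexes enable the storage and retrieval of vector embeddings, which are useful for similarity search. To define a vector index on Product nodes, use the following Cypher command:
CREATE VECTOR INDEX product_vector FOR (n:Product) ON (n.embedding) OPTIONS {indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}}
This command creates a vector index named product_vector on the Product label, indexing the embedding property. The 1536 dimensions match the output size of the text-embedding-ada-002 model used by the application; some Neo4j 5 releases require these options, while newer ones accept the command without them. The index covers only nodes that have an embedding property, so embeddings must be stored on the Product nodes before the retriever can return vector matches. For more information on vector indexes, refer to the Neo4j Vector Indexes Guide.
Run the Playbook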
Execute the Ansible playbook with the following command:
ansible-playbook -i inventory setup_neo4j_genai.yml --extra-vars="@servers.yml"
Replace inventory with your Ansible inventory file that includes your Fedora host.
Verify the Setup
After the playbook runs successfully, verify that the application is working as expected by checking the output of the Ansible debug task, or by accessing the server and reviewing the application logs. Because the run task sets ignore_errors: true, the play can finish with failed=0 even when the application itself fails, so check the debug output rather than the recap alone. A successful run looks similar to this:
TASK [Debug application output] ***************************************************************
ok: [fedora.example.com] => {
    "application_output.stdout_lines": [
        "What is the ID and price of the product Queso Cabrales?",
        "The ID of the product Queso Cabrales is '11' and the price is 21.0."
    ]
}

PLAY RECAP ************************************************************************************
fedora.example.com : ok=10 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
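If the application returns no answer, one thing worth checking is that both indexes the retriever relies on exist and are online. The sketch below assumes Neo4j 5.x and a Python driver version that provides driver.execute_query; missing_indexes and FakeDriver are hypothetical names, and FakeDriver only stands in for a real neo4j driver so the example runs without a database.

```python
# Index names used by the application, with the index type each should have.
EXPECTED_INDEXES = {"product_vector": "VECTOR", "product_name_index": "FULLTEXT"}


def missing_indexes(driver) -> list[str]:
    """Return the expected indexes that are absent, of the wrong type, or not online."""
    records, _summary, _keys = driver.execute_query(
        "SHOW INDEXES YIELD name, type, state"
    )
    online = {r["name"]: r["type"] for r in records if r["state"] == "ONLINE"}
    return sorted(
        name for name, kind in EXPECTED_INDEXES.items() if online.get(name) != kind
    )


class FakeDriver:
    """Stand-in for a neo4j driver; returns canned rows instead of querying."""

    def execute_query(self, query):
        rows = [
            {"name": "product_name_index", "type": "FULLTEXT", "state": "ONLINE"},
            {"name": "product_vector", "type": "VECTOR", "state": "POPULATING"},
        ]
        return rows, None, None


print(missing_indexes(FakeDriver()))
```

With a real connection, the same function would be called with the driver object created by GraphDatabase.driver(URI, auth=AUTH); an empty list means both indexes are ready.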
Conclusion
This guide demonstrates how to set up a Neo4j GenAI environment on Fedora using Ansible automation. By following these steps, you can integrate advanced AI capabilities into your data management workflows and leverage the power of both Neo4j and OpenAI for intelligent data retrieval and analysis.
For additional information on importing datasets and creating indexes in Neo4j, refer to the resources mentioned in this article: the Northwind Dataset Guide, the Neo4j Full-Text Indexes Guide, and the Neo4j Vector Indexes Guide. These resources can help you extend your Neo4j database to support complex queries and AI-based analytics.