Setting Up Neo4j GenAI Environment on Fedora Using Ansible

Learn how to set up a Neo4j GenAI environment on Fedora using Ansible, including full-text and vector indexing and OpenAI integration.

September 25, 2024
Posted by Luca Berton
Neo4j, GenAI, Fedora, Ansible, Python, AI, Graph Database, Full-Text Index, Vector Index, OpenAI, RAG, Database Setup, Data Retrieval, AI Integration, Automation, Neo4j Setup, Ansible Playbook, Linux, System Administration, Data Science

Setting Up a Neo4j GenAI Environment on Fedora

In this article, we will walk through the steps to set up a Neo4j GenAI Python environment on a Fedora system using Ansible automation. This setup will enable you to deploy a Retrieval-Augmented Generation (RAG) system that integrates with a Neo4j graph database and utilizes OpenAI’s language models for interactive data retrieval and analysis.

Prerequisites

Before starting, ensure you have:

A Fedora system with root access.
Ansible installed on your control machine.
Access to the OpenAI API and a valid API key.
Credentials and URI for your Neo4j database.

Additionally, we will be using the Northwind dataset to populate our Neo4j database. For more information on importing the Northwind dataset into Neo4j, refer to the Northwind Dataset Guide.

Step-by-Step Setup

Create an Ansible Playbook

Create a file named setup_neo4j_genai.yml with the following content:

---
- name: Set up Neo4j GenAI Python environment on Fedora
  hosts: all
  become: true

  tasks:

    - name: Install necessary system packages
      ansible.builtin.dnf:
        name:
          - python3
          - python3-pip
        state: present
        update_cache: true

    - name: Install necessary Python packages using pip
      ansible.builtin.pip:
        name:
          - neo4j
          - neo4j_genai
          - openai
        state: present
        executable: /usr/bin/pip3

    - name: Set OpenAI API key as environment variable
      ansible.builtin.lineinfile:
        path: /etc/environment
        line: "OPENAI_API_KEY={{ openai_key }}"
        create: true
        state: present
        mode: '0644'

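    # Note: sourcing the file only affects the shell spawned for this task.
    # Later tasks receive the variables through their own `environment:` section.
    - name: Source environment file to apply changes
      ansible.builtin.shell: source /etc/environment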

    - name: Create configuration file for Neo4j connection
      ansible.builtin.copy:
        dest: /etc/neo4j_genai_config.py
        content: |
          from neo4j import GraphDatabase
          URI = "{{ neo4j_uri }}"
          AUTH = ("{{ neo4j_auth.split(':')[0] }}", "{{ neo4j_auth.split(':')[1] }}")
          driver = GraphDatabase.driver(URI, auth=AUTH)
        mode: '0644'

    - name: Create application directory
      ansible.builtin.file:
        path: /opt/neo4j_genai
        state: directory
        mode: '0755'

    - name: Copy Python application to the server
      ansible.builtin.copy:
        src: files/application.py
        dest: /opt/neo4j_genai/application.py
        mode: '0755'

    - name: Run the Neo4j GenAI application
      ansible.builtin.command: python3 /opt/neo4j_genai/application.py
      environment:
        OPENAI_API_KEY: "{{ openai_key }}"
        NEO4J_URI: "{{ neo4j_uri }}"
        NEO4J_AUTH0: "{{ neo4j_auth.split(':')[0] }}"
        NEO4J_AUTH1: "{{ neo4j_auth.split(':')[1] }}"
      register: application_output
      ignore_errors: true

    - name: Debug application output
      ansible.builtin.debug:
        var: application_output.stdout_lines

This playbook automates the setup of the Neo4j GenAI environment by installing necessary packages, configuring environment variables, and deploying the application.
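
The final task hands the connection settings to the application through `environment:`. As a minimal sketch, assuming the variable names used in the playbook above, a pre-flight check along these lines could make a missing value fail fast with a clear message. The helper name `missing_settings` and the placeholder values are hypothetical and not part of any library.

```python
# Variable names taken from the playbook's `environment:` section.
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_AUTH0", "NEO4J_AUTH1")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Placeholder values for illustration only.
example_env = {"OPENAI_API_KEY": "placeholder-key", "NEO4J_URI": "bolt://localhost:7687"}
print(missing_settings(example_env))  # ['NEO4J_AUTH0', 'NEO4J_AUTH1']
```

In the application, the same function could be called with `os.environ` before the driver is created, exiting early if the returned list is not empty.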

Define Variables
Create a variables file named servers.yml to store sensitive information, replacing the placeholder values with your actual credentials.
```
python_version: "3.12"
neo4j_uri: "your-neo4j-uri"
neo4j_auth: "username:password"
openai_key: "your-openai-api-key"
```
Note: In a production environment, avoid storing the Neo4j URI, credentials, and OpenAI API key in plain text; protect them with a secure mechanism such as Ansible Vault. To locate your API key, follow the guide "Where do I find my OpenAI API Key?".
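
The playbook derives the username and password from `neo4j_auth` with `split(':')`, so a password that itself contains a colon would be cut short at index `[1]`. As a hedged illustration in Python (the helper name `split_auth` is hypothetical), splitting on the first colon only avoids that:

```python
def split_auth(auth):
    """Split "username:password" on the first colon only."""
    username, separator, password = auth.partition(":")
    if not separator or not username:
        raise ValueError("expected the form 'username:password'")
    return username, password


print(split_auth("neo4j:s3cret"))   # ('neo4j', 's3cret')
print(split_auth("neo4j:pa:ss"))    # ('neo4j', 'pa:ss')
print("neo4j:pa:ss".split(":")[1])  # pa  (what split(':')[1] keeps)
```

Inside the playbook, a comparable Jinja2 expression would likely be `neo4j_auth.split(':', 1)[1]`, which limits the split to the first colon.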

Create the Application File

Save the following Python code in files/application.py:

from neo4j import GraphDatabase
from neo4j_genai.embeddings.openai import OpenAIEmbeddings
from neo4j_genai.retrievers import HybridRetriever
from neo4j_genai.llm import OpenAILLM
from neo4j_genai.generation import GraphRAG
from openai import OpenAIError
import os
import time

# Configure Neo4j and OpenAI
URI = os.getenv("NEO4J_URI")
AUTH = (os.getenv("NEO4J_AUTH0"), os.getenv("NEO4J_AUTH1"))
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Initialize driver and retriever
driver = GraphDatabase.driver(URI, auth=AUTH)
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
retriever = HybridRetriever(
    driver=driver,
    vector_index_name="product_vector",
    fulltext_index_name="product_name_index",
    embedder=embedder,
    return_properties=["productID", "productName", "unitPrice"],
)

# Set up RAG pipeline
llm = OpenAILLM(model_name="gpt-4", model_params={"temperature": 0})
rag = GraphRAG(retriever=retriever, llm=llm)

# Perform retrieval with retry logic
query_text = "What is the ID and price of the product Queso Cabrales?"
print(query_text)
max_retries = 5
retry_delay = 10  # seconds

for attempt in range(max_retries):
    try:
        response = rag.search(query_text=query_text, retriever_config={"top_k": 3})
        print(response.answer)
        break  # Exit the loop if successful
    except OpenAIError as e:
        if "rate limit" in str(e).lower() or "insufficient_quota" in str(e).lower():
            print(f"Rate limit exceeded or insufficient quota. Retrying in {retry_delay} seconds...")
            time.sleep(retry_delay)
        else:
            print(f"OpenAI Error: {e}")
            break
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        break

This application connects to a Neo4j database, retrieves matching products through a hybrid (full-text plus vector) search, passes them to an OpenAI model to generate an answer, and prints the result.
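
The retry loop above waits a fixed 10 seconds between attempts. A common variation is exponential backoff, where the wait doubles after each failure. The following is only a sketch of that idea, not the method used in the application: `call_with_backoff` and its parameters are hypothetical names, and the `sleep` argument exists so the helper can be exercised without real delays.

```python
import time


def call_with_backoff(func, is_retryable, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on a retryable exception wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:
            last_attempt = attempt == max_retries - 1
            if last_attempt or not is_retryable(exc):
                raise  # give up: out of attempts or not worth retrying
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in function that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limit exceeded")
    return "ok"


delays = []
result = call_with_backoff(
    flaky,
    is_retryable=lambda exc: "rate limit" in str(exc).lower(),
    sleep=delays.append,  # record the delays instead of sleeping
)
print(result, delays)  # ok [1.0, 2.0]
```

In the application, `func` could wrap the `rag.search(...)` call and `is_retryable` could reuse the same string check that the loop applies to `OpenAIError`.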

Define a Full-Text Index
Full-text indexes optimize text searches on specific properties of nodes or relationships. To define a full-text index on Product nodes, use the following Cypher command:
```
CREATE FULLTEXT INDEX product_name_index FOR (n:Product) ON EACH [n.productName]
```
This command creates a full-text index named product_name_index on the Product label, indexing the productName property. For more details on full-text indexes, visit the Neo4j Full-Text Indexes Guide.
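
If you prefer to create the index from Python rather than from a Cypher shell, one possibility is to send the statement through the driver. This is a sketch under assumptions: it relies on the `CREATE FULLTEXT INDEX` Cypher syntax and the driver's `execute_query` method, both of which require Neo4j 5.x, and the two helper names are hypothetical.

```python
def fulltext_index_statement(name, label, properties):
    """Build a CREATE FULLTEXT INDEX statement for one node label.

    The names are interpolated directly into the statement, so only pass
    trusted values.
    """
    props = ", ".join(f"n.{prop}" for prop in properties)
    return (
        f"CREATE FULLTEXT INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON EACH [{props}]"
    )


def create_fulltext_index(driver, name, label, properties):
    """Run the statement on a neo4j.Driver (or any object with execute_query)."""
    driver.execute_query(fulltext_index_statement(name, label, properties))


print(fulltext_index_statement("product_name_index", "Product", ["productName"]))
# CREATE FULLTEXT INDEX product_name_index IF NOT EXISTS FOR (n:Product) ON EACH [n.productName]
```

With the `driver` object from `application.py`, the call would be `create_fulltext_index(driver, "product_name_index", "Product", ["productName"])`; `IF NOT EXISTS` keeps repeated runs from failing.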

Define a Vector Index
Vector indexes enable the storage and retrieval of vector embeddings, which are useful for similarity search. To define a vector index on Product nodes, use the following Cypher command:
```
CREATE VECTOR INDEX product_vector FOR (n:Product) ON (n.embedding)
OPTIONS {indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}}
```
This command creates a vector index named product_vector on the Product label, indexing the embedding property. For more information on vector indexes, refer to the Neo4j Vector Indexes Guide.
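
A vector index can only match nodes whose indexed property already contains an embedding. As a sketch of how the `embedding` property might be populated, the function below embeds each product name and writes the vector back with a plain `SET`. This is one possible approach under stated assumptions: the function name is hypothetical, `productName` as the text to embed is an assumption, and `embedder` is expected to offer an `embed_query` method like the `OpenAIEmbeddings` object in the application.

```python
def store_product_embeddings(driver, embedder):
    """Embed each Product's name and store the vector on the node.

    `driver` is expected to behave like neo4j.Driver and `embedder` like the
    embedder used in application.py (any object with embed_query).
    Returns the number of products updated.
    """
    records, _, _ = driver.execute_query(
        "MATCH (p:Product) RETURN p.productID AS id, p.productName AS name"
    )
    for record in records:
        vector = embedder.embed_query(record["name"])
        # Keyword arguments are sent to the server as query parameters.
        driver.execute_query(
            "MATCH (p:Product {productID: $id}) SET p.embedding = $vector",
            id=record["id"],
            vector=vector,
        )
    return len(records)
```

With the objects from `application.py`, this could be called once as `store_product_embeddings(driver, embedder)` before the retriever is used. Each `embed_query` call is a separate API request, so larger datasets would benefit from batching.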

Run the Playbook
Execute the Ansible playbook with the following command:
```
ansible-playbook -i inventory setup_neo4j_genai.yml --extra-vars="@servers.yml"
```
Replace inventory with your Ansible inventory file that includes your Fedora host.

Verify the Setup

 TASK [Debug application output] ***************************************************************
 ok: [fedora.example.com] => {
     "application_output.stdout_lines": [
         "What is the ID and price of the product Queso Cabrales?",
         "The ID of the product Queso Cabrales is '11' and the price is 21.0."
     ]
 }

 PLAY RECAP ************************************************************************************
 fedora.example.com         : ok=10   changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

After the playbook runs successfully, verify that the application is working as expected by checking the output in the Ansible debug task or by accessing the server and reviewing the application logs.
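
When the playbook is launched from a script or CI job, the recap line can also be checked programmatically. As a small sketch, assuming the recap format shown above (the helper name `parse_recap` is hypothetical):

```python
import re


def parse_recap(line):
    """Parse one PLAY RECAP line into (host, {counter: value})."""
    host, _, counters = line.partition(":")
    stats = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", counters)}
    return host.strip(), stats


recap = (
    "fedora.example.com         : ok=10   changed=2    unreachable=0    "
    "failed=0    skipped=0    rescued=0    ignored=0"
)
host, stats = parse_recap(recap)
print(host, stats["failed"] == 0 and stats["unreachable"] == 0)  # fedora.example.com True
```

Because the task that runs the application sets `ignore_errors: true`, a failing application is counted under `ignored` rather than `failed`, so a thorough check would look at `stats["ignored"]` as well.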

Conclusion

This guide demonstrates how to set up a Neo4j GenAI environment on Fedora using Ansible automation. By following these steps, you can integrate advanced AI capabilities into your data management workflows and leverage the power of both Neo4j and OpenAI for intelligent data retrieval and analysis.

For additional information on importing datasets and creating indexes in Neo4j, refer to the guides mentioned earlier: the Northwind Dataset Guide, the Neo4j Full-Text Indexes Guide, and the Neo4j Vector Indexes Guide. These resources can help you extend your Neo4j database to support complex queries and AI-based analytics.
