Setting Up a Neo4j GenAI Environment on Fedora

This article walks through setting up a Neo4j GenAI Python environment on a Fedora system with Ansible automation. The result is a Retrieval-Augmented Generation (RAG) application that queries a Neo4j graph database and uses OpenAI’s language models to answer questions about the data.

Prerequisites

Before starting, ensure you have:

  • A Fedora system with root access.
  • Ansible installed on your control machine.
  • Access to the OpenAI API and a valid API key.
  • Credentials and URI for your Neo4j database.

Additionally, we will be using the Northwind dataset to populate our Neo4j database. For more information on importing the Northwind dataset into Neo4j, refer to the Northwind Dataset Guide.

Step-by-Step Setup

  1. Create an Ansible Playbook

    Create a file named setup_neo4j_genai.yml with the following content:

    ---
    - name: Set up Neo4j GenAI Python environment on Fedora
      hosts: all
      become: true
    
      tasks:
    
        - name: Install necessary system packages
          ansible.builtin.dnf:
            name:
              - python3
              - python3-pip
            state: present
            update_cache: true
    
        - name: Install necessary Python packages using pip
          ansible.builtin.pip:
            name:
              - neo4j
              - neo4j_genai
              - openai
            state: present
            executable: /usr/bin/pip3
    
        - name: Set OpenAI API key as environment variable
          ansible.builtin.lineinfile:
            path: /etc/environment
            line: "OPENAI_API_KEY={{ openai_key }}"
            create: true
            state: present
            mode: '0644'
    
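        # Note: this task runs in a throwaway shell, so it does not change the
        # environment of later tasks. The application receives its variables
        # from the "environment" block of the run task further down.
        - name: Source environment file to apply changes
          ansible.builtin.shell: source /etc/environment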
    
        - name: Create configuration file for Neo4j connection
          ansible.builtin.copy:
            dest: /etc/neo4j_genai_config.py
            content: |
              from neo4j import GraphDatabase
              URI = "{{ neo4j_uri }}"
              AUTH = ("{{ neo4j_auth.split(':', 1)[0] }}", "{{ neo4j_auth.split(':', 1)[1] }}")
              driver = GraphDatabase.driver(URI, auth=AUTH)
            mode: '0600'  # the file contains credentials
    
        - name: Create application directory
          ansible.builtin.file:
            path: /opt/neo4j_genai
            state: directory
            mode: '0755'
    
        - name: Copy Python application to the server
          ansible.builtin.copy:
            src: files/application.py
            dest: /opt/neo4j_genai/application.py
            mode: '0755'
    
        - name: Run the Neo4j GenAI application
          ansible.builtin.command: python3 /opt/neo4j_genai/application.py
          environment:
            OPENAI_API_KEY: "{{ openai_key }}"
            NEO4J_URI: "{{ neo4j_uri }}"
            NEO4J_AUTH0: "{{ neo4j_auth.split(':', 1)[0] }}"
            NEO4J_AUTH1: "{{ neo4j_auth.split(':', 1)[1] }}"
          register: application_output
          ignore_errors: true
    
        - name: Debug application output
          ansible.builtin.debug:
            var: application_output.stdout_lines
    

    This playbook automates the setup of the Neo4j GenAI environment by installing necessary packages, configuring environment variables, and deploying the application.
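
    The playbook hands configuration to the application through environment variables (OPENAI_API_KEY, NEO4J_URI, NEO4J_AUTH0, NEO4J_AUTH1). The following is a minimal sketch, not part of the original application, of a fail-fast check that reports which of those variables are missing. The helper name missing_vars and the example values are hypothetical.

```python
# Names the playbook exports for the application in its "Run" task.
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_AUTH0", "NEO4J_AUTH1")


def missing_vars(environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


# Example with an explicit mapping of placeholder values.
# In the real application you would pass os.environ instead.
example_env = {
    "OPENAI_API_KEY": "sk-example",
    "NEO4J_URI": "bolt://localhost:7687",
}
print(missing_vars(example_env))  # ['NEO4J_AUTH0', 'NEO4J_AUTH1']
```

    Running such a check at the top of application.py would turn a silent None from os.getenv into an explicit error message.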

  2. Define Variables

    Create a variables file named servers.yml to store sensitive information. Replace the placeholder values with your actual credentials.

    python_version: "3.12"
    neo4j_uri: "your-neo4j-uri"
    neo4j_auth: "username:password"
    openai_key: "your-openai-api-key"
    

    Note: In a production environment, do not keep the Neo4j URI, credentials, or OpenAI API key in a plain-text file; protect them with a secure mechanism such as Ansible Vault. To locate your API key, follow the guide Where do I find my OpenAI API Key?
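
    The neo4j_auth variable packs both credentials into a single username:password string, which the playbook later splits on the colon. One detail worth keeping in mind is that only the first colon should act as the separator, so that a password containing a colon stays intact. A minimal Python sketch of that rule follows; the function name split_auth and the sample credentials are hypothetical.

```python
def split_auth(neo4j_auth):
    """Split "username:password" on the first colon only."""
    username, separator, password = neo4j_auth.partition(":")
    if not separator or not username:
        raise ValueError('expected neo4j_auth in "username:password" form')
    return username, password


print(split_auth("neo4j:s3cret"))      # ('neo4j', 's3cret')
print(split_auth("neo4j:pa:ss:word"))  # ('neo4j', 'pa:ss:word')
```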

  3. Create the Application File

    Save the following Python code in files/application.py:

    from neo4j import GraphDatabase
    from neo4j_genai.embeddings.openai import OpenAIEmbeddings
    from neo4j_genai.retrievers import HybridRetriever
    from neo4j_genai.llm import OpenAILLM
    from neo4j_genai.generation import GraphRAG
    from openai import OpenAIError
    import os
    import time
    
    # Configure Neo4j and OpenAI
    URI = os.getenv("NEO4J_URI")
    AUTH = (os.getenv("NEO4J_AUTH0"), os.getenv("NEO4J_AUTH1"))
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    
    # Initialize driver and retriever
    driver = GraphDatabase.driver(URI, auth=AUTH)
    embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
    retriever = HybridRetriever(
        driver=driver,
        vector_index_name="product_vector",
        fulltext_index_name="product_name_index",
        embedder=embedder,
        return_properties=["productID", "productName", "unitPrice"],
    )
    
    # Set up RAG pipeline
    llm = OpenAILLM(model_name="gpt-4", model_params={"temperature": 0})
    rag = GraphRAG(retriever=retriever, llm=llm)
    
    # Perform retrieval with retry logic
    query_text = "What is the ID and price of the product Queso Cabrales?"
    print(query_text)
    max_retries = 5
    retry_delay = 10  # seconds
    
    for attempt in range(max_retries):
        try:
            response = rag.search(query_text=query_text, retriever_config={"top_k": 3})
            print(response.answer)
            break  # Exit the loop if successful
        except OpenAIError as e:
            if "rate limit" in str(e).lower() or "insufficient_quota" in str(e).lower():
                print(f"Rate limit exceeded or insufficient quota. Retrying in {retry_delay} seconds...")
                time.sleep(retry_delay)
            else:
                print(f"OpenAI Error: {e}")
                break
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            break
    

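    This application connects to a Neo4j database, retrieves the Product nodes that best match the question through a hybrid (vector and full-text) search, and passes them to OpenAI’s GPT-4 model, which generates the answer that is printed.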
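
    The application retries rate-limited requests with a fixed delay. As an alternative, here is a minimal sketch of a reusable retry helper that uses exponential backoff instead (a different technique from the fixed delay in the listing). The names retry, is_retryable, and flaky are hypothetical, and the sleep function is injectable so the example runs without actually waiting.

```python
import time


def retry(operation, is_retryable, max_retries=5, base_delay=10, sleep=time.sleep):
    """Call operation() until it succeeds or max_retries attempts are used.

    The wait doubles after each retryable failure: base_delay, 2x, 4x, ...
    Non-retryable errors, and the error of the final attempt, are re-raised.
    """
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception as error:
            last_attempt = attempt == max_retries - 1
            if last_attempt or not is_retryable(error):
                raise
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in operation that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limit exceeded")
    return "ok"


waits = []  # records the requested delays instead of sleeping
result = retry(flaky, lambda e: "rate limit" in str(e).lower(), sleep=waits.append)
print(result, waits)  # ok [10, 20]
```

    In the application, operation would wrap the rag.search call and is_retryable would inspect the OpenAIError, under the same assumptions as the original loop.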

  4. Define a Full-Text Index

    Full-text indexes optimize text searches on specific properties of nodes or relationships. To define a full-text index on Product nodes, use the following Cypher command:

    CREATE FULLTEXT INDEX product_name_index FOR (n:Product) ON EACH [n.productName]
    

    This command creates a full-text index named product_name_index on the Product label, indexing the productName property. For more details on full-text indexes, visit the Neo4j Full-Text Indexes Guide.

  5. Define a Vector Index

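    Vector indexes enable the storage and retrieval of vector embeddings, which are useful for similarity search. To define a vector index on Product nodes, use the following Cypher command:

    CREATE VECTOR INDEX product_vector FOR (n:Product) ON (n.embedding)
    OPTIONS {indexConfig: {
      `vector.dimensions`: 1536,
      `vector.similarity_function`: 'cosine'
    }}
    

    This command creates a vector index named product_vector on the Product label, indexing the embedding property. The 1536 dimensions match the output size of the text-embedding-ada-002 model used by the application. The index only returns results for Product nodes whose embedding property has been populated. For more information on vector indexes, refer to the Neo4j Vector Indexes Guide.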
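
    To make "similarity search" concrete, the sketch below ranks a few products by cosine similarity, one common similarity measure for embeddings. It is an illustration only: the three-dimensional vectors are toy values invented for the example (real embeddings are far larger), and it does not reproduce how Neo4j scores results internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy embeddings, invented for this example.
products = {
    "Queso Cabrales": [0.9, 0.1, 0.0],
    "Chai": [0.0, 0.2, 0.9],
    "Tofu": [0.5, 0.5, 0.1],
}
query = [1.0, 0.0, 0.0]

# Rank product names from most to least similar to the query vector.
ranking = sorted(
    products,
    key=lambda name: cosine_similarity(query, products[name]),
    reverse=True,
)
print(ranking)  # ['Queso Cabrales', 'Tofu', 'Chai']
```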

  6. Run the Playbook

    Execute the Ansible playbook with the following command:

    ansible-playbook -i inventory setup_neo4j_genai.yml --extra-vars="@servers.yml"
    

    Replace inventory with your Ansible inventory file that includes your Fedora host.

  7. Verify the Setup

    After the playbook runs successfully, verify that the application is working as expected by checking the output of the Ansible debug task or by accessing the server and reviewing the application logs. The debug task output should resemble the following:

     TASK [Debug application output] ***************************************************************
     ok: [fedora.example.com] => {
         "application_output.stdout_lines": [
             "What is the ID and price of the product Queso Cabrales?",
             "The ID of the product Queso Cabrales is '11' and the price is 21.0."
         ]
     }

     PLAY RECAP ************************************************************************************
     fedora.example.com         : ok=10   changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0 
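
    The PLAY RECAP line can also be checked programmatically. Because the task that runs the application sets ignore_errors: true, a crash of the application is expected to show up under the ignored counter rather than failed, so both are worth inspecting. The following is a minimal sketch that parses one recap line; the function names parse_recap_line and run_is_clean are hypothetical.

```python
import re

# Matches "<host> : key=value key=value ..." as printed in PLAY RECAP.
RECAP_PATTERN = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap_line(line):
    """Return (host, counters) parsed from one PLAY RECAP line."""
    match = RECAP_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("not a PLAY RECAP host line: " + repr(line))
    pairs = re.findall(r"(\w+)=(\d+)", match.group("counters"))
    return match.group("host"), {key: int(value) for key, value in pairs}


def run_is_clean(counters):
    """True when no task was unreachable, failed, or failed-but-ignored."""
    return all(counters.get(name, 0) == 0 for name in ("unreachable", "failed", "ignored"))


line = (
    "fedora.example.com         : ok=10   changed=2    unreachable=0    "
    "failed=0    skipped=0    rescued=0    ignored=0 "
)
host, counters = parse_recap_line(line)
print(host, run_is_clean(counters))  # fedora.example.com True
```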

Conclusion

This guide demonstrates how to set up a Neo4j GenAI environment on Fedora using Ansible automation. By following these steps, you can integrate advanced AI capabilities into your data management workflows and leverage the power of both Neo4j and OpenAI for intelligent data retrieval and analysis.

For additional information on importing datasets and creating indexes in Neo4j, refer to the resources mentioned in this guide: the Northwind Dataset Guide, the Neo4j Full-Text Indexes Guide, and the Neo4j Vector Indexes Guide. These resources can help you extend your Neo4j database to support more complex queries and AI-based analytics.
