HKSingleParty/99_references/supabase-examples/ai/llamaindex/llamaindex.ipynb

{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "4f3b89c3",
      "metadata": {},
      "source": [
        "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/supabase/supabase/blob/master/examples/ai/llamaindex/llamaindex.ipynb)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "2a1f181d-feeb-4b29-aabc-67a75234b92c",
      "metadata": {},
      "source": [
        "# Supabase + LlamaIndex\n",
        "\n",
        "In this example we'll use PostgreSQL + pgvectors with [LlamaIndex's Supabase Vector Store](https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/SupabaseVectorIndexDemo.html#setup-openai).\n"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "3cab93f5-10d0-47c5-9f4e-64921461e7e2",
      "metadata": {},
      "source": [
        "## Install Dependencies"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "id": "a41bc3e4-ea52-43aa-9239-a431b49f029e",
      "metadata": {},
      "outputs": [],
      "source": [
        "!pip install -qU vecs datasets llama_index html2text"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "0026437c",
      "metadata": {},
      "outputs": [],
      "source": [
        "import logging\n",
        "import sys\n",
        "\n",
        "# Uncomment to see debug logs\n",
        "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
        "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
        "\n",
        "from llama_index.core.storage import StorageContext\n",
        "from llama_index.readers.web import SimpleWebPageReader\n",
        "from llama_index.indices.vector_store import VectorStoreIndex\n",
        "from llama_index.vector_stores.supabase import SupabaseVectorStore\n",
        "import textwrap\n",
        "import html2text"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "7a95a440",
      "metadata": {},
      "source": [
        "# Set up OpenAI\n",
        "\n",
        "OpenAI requires an [API key](https://platform.openai.com/api-keys) to run their models. Let's store that in an environment variable:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "ca2437d3",
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "os.environ['OPENAI_API_KEY'] = \"[your_openai_api_key]\""
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "4dbd176f-3b4e-4d41-a72d-1e1affe6ecae",
      "metadata": {},
      "source": [
        "## Load the Dataset\n",
        "\n",
        "Let's load a small data set of Paul Graham's essays:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "dc6b0bc2-b95f-4190-bf77-fa2dc57fc247",
      "metadata": {},
      "outputs": [],
      "source": [
        "essays = [\n",
        "    'paul_graham_essay.txt'\n",
        "]\n",
        "documents = SimpleWebPageReader().load_data([f'https://raw.githubusercontent.com/supabase/supabase/master/examples/ai/llamaindex/data/{essay}' for essay in essays])\n",
        "print('Document ID:', documents[0].doc_id, 'Document Hash:', documents[0].hash)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "955b2700-8242-40eb-ac3f-d479a0312693",
      "metadata": {},
      "source": [
        "## Create an index in Supabase\n",
        "\n",
        "Let's store Paul Graham's essays in Supabase. You can find the Postgres connection string in the [Database Settings](https://supabase.com/dashboard/project/_/settings/database) of your Supabase project.\n",
        "\n",
        "> **Note:** SQLAlchemy requires the connection string to start with `postgresql://` (instead of `postgres://`). Don't forget to rename this after copying the string from the dashboard.\n",
        "\n",
        "> **Note:** You must use the \"connection pooling\" string (domain ending in `*.pooler.supabase.com`) with Google Colab since Colab does not support IPv6.\n",
        "\n",
        "This will also work with any other Postgres provider that supports pgvector."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "id": "ce9bc85b-e844-407c-a0ad-ccf6af3c8866",
      "metadata": {},
      "outputs": [],
      "source": [
        "\n",
        "# Substitute your connection string here\n",
        "DB_CONNECTION = \"postgresql://postgres:password@localhost:5431/db\"\n",
        "\n",
        "vector_store = SupabaseVectorStore(\n",
        "    postgres_connection_string=DB_CONNECTION, \n",
        "    collection_name='base_demo'\n",
        ")\n",
        "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
        "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "id": "9e82862d-440a-4f66-9ed7-0eaa6a0f4062",
      "metadata": {},
      "source": [
        "## Query the index\n",
        "\n",
        "We can now ask questions using our index."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "id": "d2771545-d209-4ceb-a222-ed139a4620f2",
      "metadata": {},
      "outputs": [],
      "source": [
        "query_engine = index.as_query_engine()\n",
        "\n",
        "# Ask a question\n",
        "response = query_engine.query(\"What did the author do growing up?\")\n",
        "\n",
        "# Print the response\n",
        "print(response)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}