Pinecone Vector Store

Image by https://www.pexels.com/@pixabay/

In this blog, we use the Pinecone vector store to perform upsert and query operations. Like most implementations these days, it takes little effort, and the code is concise and simple.

I am going to use the same data from my previous blog, Azure OpenAI: Similarity Search.

I use an Azure OpenAI service. You will need one if you wish to run the code as written, or you can use any other OpenAI service instead.

Dependencies

python = "^3.9"
langchain = "^0.0.228"
openai = "^0.27.8"
python-dotenv = "^1.0.0"
pinecone-client = "^2.2.2"

Environment Parameters

OPENAI_API_TYPE="azure"
OPENAI_API_BASE="https://<my-azure-instance>.openai.azure.com/"
OPENAI_API_KEY="<key>"
OPENAI_API_VERSION="2023-05-15"
TEXT_ENGINE="text-embedding-ada-002"
EMBEDDING_DIMENSIONS="1536"

PINECONE_VECTOR_STORE_KEY="<pinecone key>"
PINECONE_VECTOR_STORE_ENV="<pinecone env>"
PINECONE_VECTOR_STORE_INDEX_NAME="yelp-comments"
PINECONE_VECTOR_STORE_NAMESPACE="comments"
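
Since the script reads every one of these values with os.getenv, a missing entry only shows up later as a confusing error (for example, int(None) when EMBEDDING_DIMENSIONS is unset). The following is a minimal sketch of a check that could run right after load_dotenv(). The helper name missing_settings and the REQUIRED_VARS list are my own illustration, not part of the original script.

```python
import os

# Variable names copied from the environment listing above.
REQUIRED_VARS = [
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "TEXT_ENGINE",
    "EMBEDDING_DIMENSIONS",
    "PINECONE_VECTOR_STORE_KEY",
    "PINECONE_VECTOR_STORE_ENV",
    "PINECONE_VECTOR_STORE_INDEX_NAME",
    "PINECONE_VECTOR_STORE_NAMESPACE",
]


def missing_settings(env=None):
    """Return the required variable names that are unset or empty.

    `env` is any mapping of names to values; it defaults to os.environ.
    (Hypothetical helper, shown only for illustration.)
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report anything that still needs to be added to the .env file.
missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Failing early like this keeps configuration mistakes separate from genuine OpenAI or Pinecone errors.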

Source Code

from langchain.embeddings import OpenAIEmbeddings
from dotenv import load_dotenv

import os
import pinecone

load_dotenv()

embeddings = OpenAIEmbeddings(model=os.getenv("TEXT_ENGINE"))
pinecone_index_name = os.getenv("PINECONE_VECTOR_STORE_INDEX_NAME")
pinecone_namespace = os.getenv("PINECONE_VECTOR_STORE_NAMESPACE")

# a lookup table - typically another datastore
db: dict[str, str] = {}

# initialize pinecone client
pinecone.init(
    api_key=os.getenv("PINECONE_VECTOR_STORE_KEY"),
    environment=os.getenv("PINECONE_VECTOR_STORE_ENV"),
)

# create index if it is absent
if pinecone_index_name not in pinecone.list_indexes():
    pinecone.create_index(
        pinecone_index_name,
        dimension=int(os.getenv("EMBEDDING_DIMENSIONS")),
    )

index = pinecone.Index(pinecone_index_name)

# read from data.txt file which contains a list of yelp comments
# and load into vector store
results = []

with open("data.txt") as fp:
    for i, txt in enumerate(fp.readlines()):
        results.append((str(i), embeddings.embed_query(txt)))
        db[str(i)] = txt

index.upsert(vectors=results, namespace=pinecone_namespace)

# query for result
query_response = index.query(
    namespace=pinecone_namespace,
    top_k=5,
    vector=embeddings.embed_query("I love this food in this resturant"),
)

# print result
for r in query_response["matches"]:
    print(f"""{db[r.id][0:46]}...\t{r["score"]}""")
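
The script above sends all vectors in a single upsert call, which is fine for a small data.txt. For larger files it is usually safer to upsert in smaller batches so that each request stays within Pinecone's size limits. Below is a minimal sketch of such a helper. The name batched and the batch size of 100 are my assumptions, so check the current Pinecone limits before relying on that number.

```python
from typing import Iterator, List, Sequence, Tuple

# Same shape as the `results` list in the script: (id, embedding) pairs.
Vector = Tuple[str, List[float]]


def batched(items: Sequence[Vector], size: int = 100) -> Iterator[Sequence[Vector]]:
    """Yield consecutive slices of `items`, each with at most `size` elements.

    (Hypothetical helper; the default size of 100 is an assumption.)
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Intended use with the names from the script above:
#
# for batch in batched(results):
#     index.upsert(vectors=batch, namespace=pinecone_namespace)
```

The ids and the namespace stay the same, so querying afterwards works exactly as before.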

Running the script from the Source Code section prints:
It's my most favourite food and restaurant. Th...       0.881385624
Absolutely love their food.   But haven't been...       0.84017241
If I'm to keep my reviews honest, DTF received...       0.809663355
First of all, if you can, go on a weekday and/...       0.801368177
How would like it if the waiter put your order...       0.774454892
This matches the output from my previous blog, Azure OpenAI: Similarity Search.
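
The number next to each comment is a similarity score. The index above is created without an explicit metric, and as far as I know the Pinecone client then defaults to cosine similarity, so a score closer to 1 means the comment points in nearly the same direction as the query embedding. Here is a minimal pure-Python sketch of that computation. The function cosine_similarity is my own illustration and is not how Pinecone computes scores internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Identical directions score 1.0, orthogonal directions score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In the real script the two vectors would be 1536-dimensional embeddings, one from the query text and one from a stored comment.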

