In this blog, we use the Pinecone vector store to perform upsert and query operations. Like most implementations these days, it takes little effort, and the code is concise and simple.
I am going to use the same data from my previous blog, Azure OpenAI: Similarity Search.
I use an Azure OpenAI service. You will need one if you wish to run the code, or you can use any other OpenAI service.
Dependencies
python = "^3.9"
langchain = "^0.0.228"
openai = "^0.27.8"
python-dotenv = "^1.0.0"
pinecone-client = "^2.2.2"
Environment Parameters
OPENAI_API_TYPE="azure"
OPENAI_API_BASE="https://<my-azure-instance>.openai.azure.com/"
OPENAI_API_KEY="<key>"
OPENAI_API_VERSION="2023-05-15"
TEXT_ENGINE="text-embedding-ada-002"
EMBEDDING_DIMENSIONS="1536"
PINECONE_VECTOR_STORE_KEY="<pinecone key>"
PINECONE_VECTOR_STORE_ENV="<pinecone env>"
PINECONE_VECTOR_STORE_INDEX_NAME="yelp-comments"
PINECONE_VECTOR_STORE_NAMESPACE="comments"
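The script reads every setting with os.getenv, so a missing variable only shows up later as a None passed to a client call. Below is a minimal sketch of a pre-flight check, assuming the variable names listed above. The helper missing_env_vars is hypothetical and not part of the original script; in practice it would run right after load_dotenv().

```python
import os

# Variable names taken from the environment parameters listed above.
REQUIRED_VARS = [
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "TEXT_ENGINE",
    "EMBEDDING_DIMENSIONS",
    "PINECONE_VECTOR_STORE_KEY",
    "PINECONE_VECTOR_STORE_ENV",
    "PINECONE_VECTOR_STORE_INDEX_NAME",
    "PINECONE_VECTOR_STORE_NAMESPACE",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Report which settings are absent before any client is created.
missing = missing_env_vars(REQUIRED_VARS)
print("Missing:", ", ".join(missing) if missing else "none")
```

Passing a plain dict as env makes the check easy to try without touching the real environment.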
Source Code
from langchain.embeddings import OpenAIEmbeddings
from dotenv import load_dotenv
import os
import pinecone

load_dotenv()

embeddings = OpenAIEmbeddings(model=os.getenv("TEXT_ENGINE"))
pinecone_index_name = os.getenv("PINECONE_VECTOR_STORE_INDEX_NAME")
pinecone_namespace = os.getenv("PINECONE_VECTOR_STORE_NAMESPACE")

# a lookup table - typically another datastore
db: dict[str, str] = {}

# initialize pinecone client
pinecone.init(
    api_key=os.getenv("PINECONE_VECTOR_STORE_KEY"),
    environment=os.getenv("PINECONE_VECTOR_STORE_ENV"),
)

# create index if it is absent
if pinecone_index_name not in pinecone.list_indexes():
    pinecone.create_index(
        pinecone_index_name,
        dimension=int(os.getenv("EMBEDDING_DIMENSIONS")),
    )

index = pinecone.Index(pinecone_index_name)

# read from data.txt file which contains a list of yelp comments
# and load into vector store
results = []
with open("data.txt") as fp:
    for i, txt in enumerate(fp.readlines()):
        results.append((str(i), embeddings.embed_query(txt)))
        db[str(i)] = txt

index.upsert(vectors=results, namespace=pinecone_namespace)

# query for result
query_response = index.query(
    namespace=pinecone_namespace,
    top_k=5,
    vector=embeddings.embed_query("I love this food in this resturant"),
)

# print result
for r in query_response["matches"]:
    print(f"""{db[r.id][0:46]}...\t{r["score"]}""")
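The script sends all vectors in a single upsert call, which is fine for a small file. For larger files, a common approach is to send the vectors in smaller batches. Here is a minimal sketch, assuming a batch size of 100, which is an illustrative value and not something the script above specifies. The helper chunked is hypothetical, and the upsert call appears only as a comment so the snippet runs without a Pinecone account.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# (id, embedding): the same tuple shape the script appends to `results`.
Vector = Tuple[str, List[float]]


def chunked(items: Iterable[Vector], size: int) -> Iterator[List[Vector]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Stand-in data: 250 fake vectors with 3 dimensions instead of 1536.
results = [(str(i), [0.0, 0.0, 0.0]) for i in range(250)]

for batch in chunked(results, size=100):
    # In the real script this line would be:
    # index.upsert(vectors=batch, namespace=pinecone_namespace)
    print(f"would upsert {len(batch)} vectors")
```

With the stand-in data, the loop produces two full batches and one partial batch.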
The output is
It's my most favourite food and restaurant. Th... 0.881385624
Absolutely love their food. But haven't been... 0.84017241
If I'm to keep my reviews honest, DTF received... 0.809663355
First of all, if you can, go on a weekday and/... 0.801368177
How would like it if the waiter put your order... 0.774454892
which is the same as the output from my previous blog, Azure OpenAI: Similarity Search.
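The scores in the output measure how close each stored embedding is to the query embedding. The script creates the index without naming a metric. Assuming the index uses cosine similarity (understood to be the default metric of create_index in pinecone-client 2.x, though this is worth checking against your own index), the score can be sketched in plain Python as follows. The toy vectors are made up for illustration and have 3 dimensions instead of 1536.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up stand-ins for the query embedding and two stored embeddings.
query = [1.0, 2.0, 3.0]
stored = {"0": [1.0, 2.0, 3.5], "1": [-3.0, 0.5, 1.0]}

# Rank ids by similarity to the query, highest first, as a top_k query would.
ranked = sorted(
    stored,
    key=lambda key: cosine_similarity(query, stored[key]),
    reverse=True,
)
for key in ranked:
    print(key, round(cosine_similarity(query, stored[key]), 4))
```

A vector pointing in nearly the same direction as the query scores close to 1, which is why the comment most similar to the query appears first in the output.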