Saving Vector Store as KB

Forermx Dataiku DSS Core Designer, Registered Posts: 3 ✭✭✭

Is there a way to save a FAISS vector store that I create in a Python notebook as a knowledge bank that I can use later on?

I created a vector store (see code below) in which the summaries are the embedded objects and the parent documents are the retrieved documents. I based this on LangChain's MultiVectorRetriever. The vector store uses the DKUEmbedding function as the embedding model, and everything I want to do works within the notebook. The problem is that I can't actually use this vector store elsewhere, because I can't find a way to save the FAISS store into my flow as a knowledge bank. Any thoughts? I would be open to workarounds, such as overwriting an empty knowledge bank with the custom vector store I create, if that's necessary.

Sample Code:

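import uuid

import faiss
# Import paths vary by LangChain version; adjust to the version installed in the code environment.
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.document_loaders import DataFrameLoader
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# embedding_function, extracted_df, summary_df, docs, store and id_key are
# assumed to be defined earlier in the notebook (not shown here).

sample_query = "hello world"
sample_embedding = embedding_function.embed_query(sample_query)
embedding_dimension = len(sample_embedding)  # Simple way to get embedding model output dimension
new_index = faiss.IndexFlatL2(embedding_dimension)
new_docstore = InMemoryDocstore()  # Initialize an in-memory docstore

vector_store = FAISS(
    embedding_function=embedding_function,
    index=new_index,
    docstore=new_docstore,
    index_to_docstore_id={},
)

extracted_loader = DataFrameLoader(extracted_df, page_content_column="extracted_text")
docs.extend(extracted_loader.load())
doc_ids = [str(uuid.uuid4()) for _ in docs]  # One ID per parent document

# Each summary carries the ID of its parent document in its metadata
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(summary_df['text_summary'])
]

retriever = MultiVectorRetriever(
    vectorstore=vector_store,
    byte_store=store,
    id_key=id_key,
)

retriever.vectorstore.add_documents(summary_docs)  # Summaries are embedded and searched
retriever.docstore.mset(list(zip(doc_ids, docs)))  # Parent documents are what gets returned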
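
To make the setup and the kind of workaround being asked about easier to discuss, here is a minimal, dependency-free sketch of the same pattern: summaries are embedded and searched, parent documents are returned, and the whole thing is written to a folder and reloaded. It is an illustration under stated assumptions, not the Dataiku or LangChain API. `toy_embed`, `build_store`, `save_store`, `load_store` and `retrieve` are hypothetical names, the embedding is a toy stand-in for the real model, and the search is a plain-Python exact squared-L2 scan standing in for `faiss.IndexFlatL2`.

```python
import json
import math
import tempfile
import uuid
from pathlib import Path


def toy_embed(text, dim=8):
    """Hypothetical stand-in for the real embedding model (not DKUEmbedding)."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0  # bucket characters into a fixed-size vector
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_store(parents, summaries):
    """Embed the summaries; keep the parents in a separate id -> text mapping."""
    if len(parents) != len(summaries):
        raise ValueError("expected exactly one summary per parent document")
    doc_ids = [str(uuid.uuid4()) for _ in parents]
    return {
        "vectors": [toy_embed(s) for s in summaries],
        "vector_doc_ids": doc_ids,  # row i of "vectors" points to this parent ID
        "parents": dict(zip(doc_ids, parents)),
    }


def save_store(store, folder):
    """Write the vectors, the ID mapping and the parent documents to one JSON file."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "store.json").write_text(json.dumps(store), encoding="utf-8")


def load_store(folder):
    """Read back what save_store wrote."""
    return json.loads((Path(folder) / "store.json").read_text(encoding="utf-8"))


def retrieve(store, query, k=1):
    """Rank summaries by squared L2 distance to the query, return their parents."""
    q = toy_embed(query)

    def sq_l2(vec):
        return sum((a - b) ** 2 for a, b in zip(q, vec))

    ranked = sorted(range(len(store["vectors"])), key=lambda i: sq_l2(store["vectors"][i]))
    return [store["parents"][store["vector_doc_ids"][i]] for i in ranked[:k]]


# Round trip: build in one place, save to a folder, reload and query elsewhere.
with tempfile.TemporaryDirectory() as tmp:
    built = build_store(
        parents=["Full parent document about invoices", "Full parent document about travel"],
        summaries=["summary about invoices", "summary about travel"],
    )
    save_store(built, tmp)
    reloaded = load_store(tmp)
    print(retrieve(reloaded, "summary about invoices"))
```

With the real objects, LangChain's FAISS wrapper provides `save_local` and `load_local` for the index side, and the parent-document store would need to be persisted separately unless `store` is already persistent. Whether the result can then be registered as a knowledge bank in the flow is the open question here.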
