Warning when using sentence-transformers/all-MiniLM-L6-v2 for embedding



We are using a Hugging Face model that does not require an API.

I already downloaded the Hugging Face model into the code environment resources using this code:

from transformers import AutoModel, AutoTokenizer

model_name_fifteen = 'sentence-transformers/all-MiniLM-L6-v2'
MODEL_REVISION_FIFTEEN = '8b3219a92973c328a8e22fadcfa821b5dc75636a'
tokenizer = AutoTokenizer.from_pretrained(model_name_fifteen, revision=MODEL_REVISION_FIFTEEN)
model = AutoModel.from_pretrained(model_name_fifteen, revision=MODEL_REVISION_FIFTEEN)

I had no problem installing it.

Now I want to use it with this code:

import os
# Define which pre-trained model to use
model1 = {"name": "sentence-transformers/all-MiniLM-L6-v2",
"revision": "8b3219a92973c328a8e22fadcfa821b5dc75636a"}
# Load pre-trained model
hf_home_dir = os.getenv("HF_HOME")
model_path1 = os.path.join(hf_home_dir, f"transformers/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/{model1['revision']}")

There is no problem when loading the pre-trained model.

When I try to compute the embeddings, I get a warning:

from sentence_transformers import SentenceTransformer
import faiss

# Step 2: Embed text and store in vector database
embedder = SentenceTransformer(model_path1)  # the warning is logged by this call
chunks = pdf_text.split("\n\n")
embeddings = embedder.encode(chunks, convert_to_tensor=True)
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)

WARNING:sentence_transformers.SentenceTransformer:No sentence-transformers model found with name /app/dataiku/design/code-envs/resources/python/py_39_GenAi/huggingface/transformers/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/8b3219a92973c328a8e22fadcfa821b5dc75636a. Creating a new one with MEAN pooling.

Why is the warning happening? What should I do to fix it?
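
One possible lead, offered as an assumption rather than a confirmed answer: AutoTokenizer.from_pretrained and AutoModel.from_pretrained fetch only the tokenizer and transformer files, so the snapshot folder may lack the extra files that sentence-transformers looks for (such as modules.json). The sketch below only checks which of those files are present in a folder. The helper name missing_st_files and the list of expected files are hypothetical, based on the layout of the public model repository.

```python
import os

# Files that a sentence-transformers model folder usually contains in addition
# to the plain transformers files. This list is an assumption taken from the
# public repository layout, not something confirmed for this setup.
EXPECTED_FILES = [
    "modules.json",
    "config_sentence_transformers.json",
    os.path.join("1_Pooling", "config.json"),
]


def missing_st_files(model_dir):
    """Return the expected sentence-transformers files absent from model_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


# Hypothetical usage with the path built earlier in this post:
# print(missing_st_files(model_path1))
```

If modules.json shows up as missing, that would be consistent with the "Creating a new one with MEAN pooling" message, and downloading the complete model repository into the resources folder (rather than only the files pulled by AutoModel and AutoTokenizer) may make the warning go away.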

Operating system used: Almalinux (8.9)
