Another way to run this using txtai

#8
by thandibilli - opened

I got it working using txtai:

from txtai import Embeddings

# Create embeddings instance with a semantic graph
embeddings = Embeddings({
  "autoid": "uuid5",
  "path": "nomic-ai/nomic-embed-text-v1.5",
  "instructions": {
    "query": "query: ",
    "data": "passage: "
  },
  "content": True,
  "graph": {
      "approximate": False,
      "topics": {}
  }
})

# Load dataset
wikipedia = Embeddings()
wikipedia.load(provider="huggingface-hub", container="neuml/txtai-wikipedia")

query = """
SELECT id, text FROM txtai
order by percentile desc
LIMIT 1000
"""

embeddings.index(wikipedia.search(query))
print(embeddings.search("select id, article, score from txtai where similar(:x)", parameters={"x": "operating system"}))
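For reference, the "instructions" setting in the config above makes txtai prepend the task prefixes that nomic-embed-text expects before embedding. Conceptually it amounts to something like this (a sketch of the idea, not txtai's actual code):

```python
def with_prefix(texts, prefix):
    """Prepend a nomic-style task prefix to each input text."""
    return [f"{prefix}{t}" for t in texts]

# Queries and indexed passages get different prefixes,
# matching the "instructions" block in the embeddings config.
queries = with_prefix(["operating system"], "query: ")
passages = with_prefix(["Linux is a family of open-source operating systems."], "passage: ")

print(queries[0])   # prints: query: operating system
```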

Two issues...

  1. It uses remote code. So, make sure you press 'y' to accept when you run the above script, or make this change to the file:

txtai/models/models.py line 204:

return models[task](path, trust_remote_code=True) if task in models else path
  2. For some weird reason, it expects the "nomic-ai/nomic-embed-text-v1.5/model.safetensors" file in the path where the above script is run. I simply run the following command from the script's directory:

    huggingface-cli download nomic-ai/nomic-embed-text-v1.5 --local-dir ./nomic-ai/nomic-embed-text-v1.5
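If you prefer doing this from Python, the same download can be scripted with `huggingface_hub.snapshot_download` (a sketch; `fetch_model` is a hypothetical helper name, and the directory layout mirrors what the remote code expects):

```python
def fetch_model(local_dir: str = "./nomic-ai/nomic-embed-text-v1.5") -> str:
    """Mirror the full model repo into local_dir so the remote modeling code
    can find nomic-ai/nomic-embed-text-v1.5/model.safetensors next to the script."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    return snapshot_download(repo_id="nomic-ai/nomic-embed-text-v1.5",
                             local_dir=local_dir)
```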

This is coming from https://huggingface.co/nomic-ai/nomic-embed-text-v1-unsupervised/blob/62fe27d25832db69e9002f0ba71f9b3c2e7bad63/modeling_hf_nomic_bert.py#L52

Referenced here:

https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/blob/1c45008249e960fddc50e443a833bfe147fc1e40/config.json#L8

FYI, the "path" in embeddings config actually downloads model to huggingface cache folder just fine. Still the remote code expects it in local folder

Nomic AI org

Hm, that's odd. It seems like there's some local path it's picking up instead of downloading from HF.

zpn changed discussion status to closed
