How to perform vector search and find the semantic similarity of documents in Python?

Last updated 20, Apr 2024

Question

How to perform vector search and find the semantic similarity of documents in Python?

Answer

To run vector similarity searches from Python, first create the index that will be queried for similar documents. The vector dimension must match the output size of the embedding model: for all-distilroberta-v1, set DIM to 768, as in the example below.

FT.CREATE vss_index ON HASH PREFIX 1 "doc:" SCHEMA name TEXT content TEXT creation NUMERIC SORTABLE update NUMERIC SORTABLE content_embedding VECTOR FLAT 6 TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE
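
The same command can also be sent from Python. The following is a minimal sketch, not the only way to do it: the helper names build_create_index_args and create_index are hypothetical, and conn is assumed to be a redis-py client (or any object exposing execute_command()).

```python
def build_create_index_args(index_name="vss_index", prefix="doc:", dim=768):
    """Return the FT.CREATE command shown above as a flat argument list."""
    return [
        "FT.CREATE", index_name,
        "ON", "HASH",
        "PREFIX", "1", prefix,
        "SCHEMA",
        "name", "TEXT",
        "content", "TEXT",
        "creation", "NUMERIC", "SORTABLE",
        "update", "NUMERIC", "SORTABLE",
        # "6" is the number of arguments that follow the algorithm name
        "content_embedding", "VECTOR", "FLAT", "6",
        "TYPE", "FLOAT32",
        "DIM", str(dim),
        "DISTANCE_METRIC", "COSINE",
    ]


def create_index(conn, **kwargs):
    """Send FT.CREATE through a client that exposes execute_command()."""
    return conn.execute_command(*build_create_index_args(**kwargs))
```

Keeping the dimension as a parameter makes it harder to forget to change DIM when switching to a model with a different output size.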

Modeling documents

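Next, import the required libraries. To use all-distilroberta-v1, import the SentenceTransformer class from the sentence_transformers package. NumPy is also needed to convert the embedding to FLOAT32 bytes.

import numpy as np
from sentence_transformers import SentenceTransformer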

Now we need to produce a vector representation of the document. Use a suitable model to compute the vector embedding of the document's content:

content = "This is an arbitrary content"
model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
embedding = model.encode(content).astype(np.float32).tobytes()
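
Because the index declares TYPE FLOAT32 and DIM 768, each stored blob should be exactly 768 × 4 = 3072 bytes long. The sketch below is one way to check that before writing to Redis. It uses only the standard library, the function names are hypothetical, and decoding assumes little-endian byte order, which is what tobytes() produces on common x86 and ARM machines.

```python
import struct

FLOAT32_SIZE = 4  # bytes per FLOAT32 component


def check_embedding_blob(blob, dim=768):
    """Raise ValueError if the blob length does not match dim FLOAT32 values."""
    expected = dim * FLOAT32_SIZE
    if len(blob) != expected:
        raise ValueError(
            "expected {} bytes for DIM {}, got {}".format(expected, dim, len(blob))
        )
    return blob


def decode_embedding(blob):
    """Decode a little-endian FLOAT32 blob back into a list of floats."""
    count = len(blob) // FLOAT32_SIZE
    return list(struct.unpack("<{}f".format(count), blob))
```

A length mismatch usually means the model and the DIM value of the index do not agree.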

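Now you can store the embedding in the Hash that holds the document. The key must start with the doc: prefix so that vss_index picks it up:

# conn is a redis.Redis client; document and pk come from your application
# fields that are not in the index schema, such as state, are stored but not indexed
doc = {"content_embedding": embedding,
       "name": "Document's title",
       "state": document.state}

conn.hset("doc:{}".format(pk), mapping=doc)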
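
If you store many documents, a small helper can bundle the key construction and the field mapping. This is only a sketch: store_document is a hypothetical name, conn is assumed to be a redis-py client (anything with an hset(name, mapping=...) method works), and writing the content field alongside the embedding is an assumption made so that the content TEXT field of the index is populated too.

```python
def store_document(conn, pk, name, content, embedding, prefix="doc:"):
    """Write one document Hash under the indexed prefix and return its key."""
    key = "{}{}".format(prefix, pk)
    mapping = {
        "name": name,
        "content": content,              # indexed as TEXT
        "content_embedding": embedding,  # FLOAT32 bytes, indexed as VECTOR
    }
    conn.hset(key, mapping=mapping)
    return key
```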

Searching for similar documents

To find documents similar to a given document, compute its embedding with the same model that was used to build the database of vector embeddings.

model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
new_embedding = model.encode(content).astype(np.float32).tobytes()

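Then perform the similarity search. The query must reference the vector field defined in the index, content_embedding, and the distance is returned in a field named after it.

from redis.commands.search.query import Query

# KNN 3: return the three nearest neighbours of $vec in content_embedding
q = Query("*=>[KNN 3 @content_embedding $vec]").return_field("__content_embedding_score").dialect(2)
res = conn.ft("vss_index").search(q, query_params={"vec": new_embedding})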
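
With DISTANCE_METRIC COSINE, the score returned for each hit is a cosine distance (1 minus the cosine similarity), so smaller values mean closer documents. The sketch below converts the hits into (id, similarity) pairs. The helper name to_similarities is hypothetical, and it assumes each result object has an id attribute and carries the distance under the field named __<vector field>_score; pass a different score_field if your query returns another name.

```python
def to_similarities(docs, score_field="__content_embedding_score"):
    """Turn KNN hits into (document id, cosine similarity) pairs, best first."""
    pairs = []
    for doc in docs:
        distance = float(getattr(doc, score_field))  # scores arrive as strings
        pairs.append((doc.id, 1.0 - distance))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs
```

It would typically be called as to_similarities(res.docs) on the search result.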

References

https://huggingface.co/sentence-transformers