EmbeddingModel
To convert text into an embedding, you need an embedding model. Embeddings represent text as a vector of numbers. You might be familiar with a 3-dimensional vector (across x, y, z), which is a list of 3 numbers. An embedding might have thousands of dimensions (in an abstract space), meaning it is a list of thousands of numbers.
This kind of multidimensional vector is how deep learning models see any information they process. It turns out that such vector representations are also very useful for tasks such as semantic search.
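To build intuition for why embeddings help with semantic search, here is a toy sketch (the vectors below are made up for illustration and do not come from any real embedding model): texts with similar meaning map to vectors pointing in similar directions, and cosine similarity measures that closeness.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings" for three pieces of text
cat = [0.9, 0.1, 0.2]
kitten = [0.85, 0.15, 0.25]
car = [0.1, 0.9, 0.3]

print(cosine_similarity(cat, kitten))  # high: similar meaning
print(cosine_similarity(cat, car))     # lower: different meaning
```

Real embeddings work the same way, just with thousands of dimensions instead of three.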
`EmbeddingModel` is an abstract class. Inherit from this class and define the `_embed` method. The `_embed` method should take the input text as a string and return its embedding. To use an `EmbeddingModel`, call the class instance like a function with the input text as the argument.
Methods
`_embed` (abstract): Implement this method to convert a text input into an embedding. Do not call this method directly; instead, use the `__call__` method.
- Input: `Union[List[Any], str]`
- Output: `List[Any]`

`__call__`: Internally calls the `_embed` method. Use this method by calling the class instance like a function with the input text as the argument.
- Input: `Union[List[Any], str]`
- Output: `List[Any]`
- Publishes an `EmbeddingStart` event before calling the `_embed` method and an `EmbeddingEnd` event after the `_embed` method returns.
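The relationship between `__call__` and `_embed` can be sketched with a small stand-in class. This is an illustration of the pattern only, not Embedia's actual implementation (in particular, the real class publishes `EmbeddingStart`/`EmbeddingEnd` events, which are reduced to comments here, and `CharCountEmbedding` is a made-up toy model):

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, List, Union

class SketchEmbeddingModel(ABC):
    """Illustrative stand-in for Embedia's EmbeddingModel (not the real class)."""

    @abstractmethod
    async def _embed(self, input: Union[List[Any], str]) -> List[Any]:
        ...

    async def __call__(self, input: Union[List[Any], str]) -> List[Any]:
        # The real class publishes an EmbeddingStart event here...
        embedding = await self._embed(input)
        # ...and an EmbeddingEnd event here
        return embedding

class CharCountEmbedding(SketchEmbeddingModel):
    """Toy model: 'embeds' text as [length, vowel count]."""
    async def _embed(self, input: str) -> List[int]:
        return [len(input), sum(c in 'aeiou' for c in input.lower())]

print(asyncio.run(CharCountEmbedding()('hello')))  # [5, 2]
```

Note that user code calls the instance (`CharCountEmbedding()('hello')`), never `_embed` directly.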
Basic Usage
An `EmbeddingModel` can be used in conjunction with a `VectorDB` (Learn more about: VectorDB) to build a semantic search index in your application. A semantic search index combined with an LLM is the basis of a Retrieval Augmented Generation (RAG) framework.
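Conceptually, a semantic search index stores an embedding for each document and answers a query by finding the stored vector closest to the query's embedding. The sketch below shows that idea end to end with a made-up `embed` function (a character-frequency vector, standing in for a real embedding model) and brute-force search (standing in for a real `VectorDB`, which also handles storage and fast approximate lookup):

```python
import math

def embed(text: str) -> list:
    # Stand-in "embedding": character-frequency vector over a-z
    vec = [0.0] * 26
    for c in text.lower():
        if c.isalpha():
            vec[ord(c) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

docs = ["the cat sat on the mat",
        "stock prices fell sharply",
        "a kitten on a rug"]
index = [(doc, embed(doc)) for doc in docs]  # "indexing" step

query = "cat on a mat"
best = max(index, key=lambda pair: cosine(embed(query), pair[1]))
print(best[0])
```

In a RAG pipeline, the retrieved documents would then be passed to an LLM as context for answering the query.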
Let's connect OpenAI's text embedding model to Embedia.
import asyncio
import os

import openai
from embedia import EmbeddingModel
from tenacity import (
    retry,
    retry_if_not_exception_type,
    stop_after_attempt,
    wait_random_exponential,
)


class OpenAIEmbedding(EmbeddingModel):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    @retry(wait=wait_random_exponential(min=1, max=20),
           stop=stop_after_attempt(6),
           retry=retry_if_not_exception_type(openai.InvalidRequestError))
    async def _embed(self, input: str):
        result = await openai.Embedding.acreate(input=input,
                                                model='text-embedding-ada-002')
        return result["data"][0]["embedding"]


if __name__ == '__main__':
    embedding_model = OpenAIEmbedding()
    embedding = asyncio.run(
        embedding_model(
            'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua'
        ))
    print(len(embedding))
Running the above code should print output similar to the following (timestamps and ids will differ):
[time: 2023-10-01T12:15:20.338184+00:00] [id: 139646081867760] [event: Embedding Start]
Input:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore ...
[time: 2023-10-01T12:15:21.240092+00:00] [id: 139646081867760] [event: Embedding End]
Embedding:
[-0.007770706433802843, -0.017298607155680656, 0.006062322296202183, -0.02754240296781063, -0.020682834088802338]...
1536