LLM

An LLM (Large Language Model) is a next-token generation model: given a sequence of tokens, it generates the most probable next token.
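
Conceptually, you can think of this generation process as a loop that repeatedly picks the most probable next token and appends it to the sequence. The sketch below is only an illustration; the next_token_probabilities interface is hypothetical and not part of Embedia:

def greedy_generate(model, tokens, max_new_tokens=16):
    # `model.next_token_probabilities` is a hypothetical interface returning
    # a {token: probability} mapping; real SDKs expose this differently
    for _ in range(max_new_tokens):
        probs = model.next_token_probabilities(tokens)
        next_token = max(probs, key=probs.get)  # pick the most probable token
        tokens.append(next_token)
    return tokens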

This is a rapidly developing research area and the definitions of its types have some nuances. But broadly speaking, they come in three flavours:

  • Foundation - Trained on a massive corpus of text from scratch
  • Instruct - Foundation model, fine-tuned on instruction following (eg: text-davinci-003)
  • Chat - Instruct models, fine-tuned on chatting with humans (eg: gpt-3.5-turbo)

The Instruct and Chat fine-tunes are the most useful for application development. The classes associated with them in Embedia are LLM and ChatLLM respectively.

LLM is an abstract class. Inherit from it and define the _complete method. The _complete method should take a string as the input prompt, send it to the instruction-tuned large language model, and return the completion. To use LLM, call the class instance like a function with the input prompt as the argument.

ℹ️

You can convert an LLM to a ChatLLM (Learn more about: Converting LLM to ChatLLM)
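
Here's a minimal sketch of that pattern using a dummy model instead of a real API (the canned completion is just for illustration):

import asyncio

from embedia import LLM


class EchoLLM(LLM):

    async def _complete(self, prompt):
        # A real implementation would send the prompt to an instruction-tuned
        # model and return its completion
        return ' a canned completion'


if __name__ == '__main__':
    # Call the class instance like a function with the prompt as the argument
    completion = asyncio.run(EchoLLM()('The capital of France is'))
    print(completion)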

Attributes

  1. tokenizer (Tokenizer, Optional): Used for counting the number of tokens in the prompt and the completion.

  2. max_input_tokens (int, Optional): Used for checking if the prompt is too long.

Methods

  1. _complete (abstract): Implement this method to generate the next token(s) given a prompt. Do not call this method directly. Instead, use the __call__ method.
  • Input: str
  • Output: str
  2. __call__: Internally calls the _complete method. Use this method by calling the class instance like a function with the input text as the argument.
  • Input: str
  • Output: str
  • Counts the number of tokens in the input prompt and the output completion if the tokenizer argument is passed to the constructor.
  • Checks if the length of the input prompt is less than max_input_tokens if the max_input_tokens argument is passed to the constructor.
  • Publishes an LLMStart event before calling the _complete method.
  • Publishes an LLMEnd event after calling the _complete method.

Usage

Basic Usage

An LLM can have a variety of use cases in your web app. One example of chaining multiple LLM calls together would be summarizing a text, then translating it to another language, and finally extracting the keywords from the translated text (sketched below).
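
As a rough sketch of such a chain (llm is assumed to be an instance of any LLM subclass, such as the OpenAILLM defined below; the prompts are only illustrative):

async def summarize_translate_keywords(llm, text, language='French'):
    # Each await sends one prompt through the LLM and feeds its output
    # into the next step of the chain
    summary = await llm(f'Summarize the following text:\n{text}')
    translation = await llm(f'Translate the following text to {language}:\n{summary}')
    keywords = await llm(f'Extract the keywords from the following text:\n{translation}')
    return keywords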

You can connect any LLM to Embedia. It might be an open-source model like Llama-2, Vicuna, Falcon, etc., or a paid API from OpenAI, Google, Anthropic, etc.

Make sure to connect the instruct-based models to LLM and the chat-based models to ChatLLM.

Let's connect OpenAI's text-davinci-003 model to Embedia.

import asyncio
import os
 
import openai
from embedia import LLM
 
 
class OpenAILLM(LLM):

    def __init__(self):
        super().__init__()
        # Read the API key from the environment
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _complete(self, prompt):
        # Send the prompt to the completions endpoint and return the completion text
        completion = await openai.Completion.acreate(model="text-davinci-003",
                                                     prompt=prompt)
        return completion.choices[0].text
 
 
if __name__ == '__main__':
    llm = OpenAILLM()
    completion = asyncio.run(llm('The capital of France is'))

Running the above code will print the following output because two events are published internally, namely LLMStart and LLMEnd. (Learn more about: Publish-Subscribe Event System)

[time: 2023-09-24T00:53:54.957874+00:00] [id: 140480688535392] [event: LLM Start]
Prompt (None tokens):
The capital of France is
 
[time: 2023-09-24T00:53:55.787782+00:00] [id: 140480688535392] [event: LLM End]
Completion (None tokens):
 Paris.

Adding the optional Tokenizer

Notice that the number of tokens is None in the above-printed log. This is because we didn't pass the optional tokenizer argument to the LLM constructor. Let's link a Tokenizer in the next example (Learn more about: Tokenizer)

ℹ️

Note that the way your tokenizer counts the number of tokens might slightly vary from how a service provider (eg: OpenAI) counts them. They might add a few tokens internally for the service to function properly.

import asyncio
import os
 
import openai
import tiktoken
from embedia import LLM, Tokenizer
 
 
class OpenAITokenizer(Tokenizer):
 
    def __init__(self):
        super().__init__()
 
    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)
 
 
class OpenAILLM(LLM):
 
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer())
        openai.api_key = os.environ['OPENAI_API_KEY']
 
    async def _complete(self, prompt):
        completion = await openai.Completion.acreate(
            model="text-davinci-003", prompt=prompt)
        return completion.choices[0].text
 
 
if __name__ == '__main__':
    llm = OpenAILLM()
    completion = asyncio.run(llm('The capital of France is'))

Running the above code will now calculate the number of tokens:

[time: 2023-09-24T01:18:51.464181+00:00] [id: 140227726425776] [event: LLM Start]
Prompt (5 tokens):
The capital of France is
 
[time: 2023-09-24T01:18:52.001154+00:00] [id: 140227726425776] [event: LLM End]
Completion (2 tokens):
 Paris.

Adding the optional max_input_tokens parameter

There's also another optional parameter in the LLM constructor called max_input_tokens. If the length of the input prompt is greater than max_input_tokens, the class will raise a ValueError.

ℹ️

Note that max_input_tokens will not have any effect if the tokenizer argument is not passed to the LLM constructor.

import asyncio
import os
 
import openai
import tiktoken
from embedia import LLM, Tokenizer
 
 
class OpenAITokenizer(Tokenizer):
 
    def __init__(self):
        super().__init__()
 
    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)
 
 
class OpenAILLM(LLM):
 
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer(), max_input_tokens=1)
        openai.api_key = os.environ['OPENAI_API_KEY']
 
    async def _complete(self, prompt):
        completion = await openai.Completion.acreate(model="text-davinci-003",
                                                     prompt=prompt)
        return completion.choices[0].text
 
 
if __name__ == '__main__':
    llm = OpenAILLM()
    completion = asyncio.run(llm('The capital of France is'))
 

The above code will raise the following error:

ValueError: Length of input text: 5 token(s) is longer than max_input_tokens: 1
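
If you'd rather handle this case gracefully, you can catch the ValueError around the call (llm as defined in the example above):

try:
    completion = asyncio.run(llm('The capital of France is'))
except ValueError as err:
    # Raised by __call__ when the prompt is longer than max_input_tokens
    print(f'Prompt too long: {err}')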

Try it out yourself