LLM
An LLM (Large Language Model) is a next-token generation model: given a sequence of tokens, it generates the most probable next token.
This is a rapidly developing research area and the definitions of the model types have some nuances, but broadly speaking they come in three flavours:
- Foundation - Trained from scratch on a massive corpus of text
- Instruct - A foundation model fine-tuned on instruction following (eg: text-davinci-003)
- Chat - An instruct model fine-tuned for conversations with humans (eg: gpt-3.5-turbo)
The Instruct and Chat fine-tunes are the most useful for application development, and the Embedia classes associated with them are `LLM` and `ChatLLM` respectively.
`LLM` is an abstract class. Inherit from this class and define the `_complete` method. The `_complete` method should take in a string as the input prompt, send it to the large language model fine-tuned on instruction following, and return the completion. To use `LLM`, call the class instance like a function with the input prompt as the argument.
You can convert an `LLM` to a `ChatLLM` (Learn more about: Converting LLM to ChatLLM).
Attributes
- `tokenizer` (`Tokenizer`, Optional): Used for counting the number of tokens in the prompt and the response.
- `max_input_tokens` (`int`, Optional): Used for checking if the prompt is too long.
Methods
`_complete` (abstract): Implement this method to generate the next token(s) given a prompt. Do not call this method directly. Instead, use the `__call__` method.
- Input: `str`
- Output: `str`

`__call__`: Internally calls the `_complete` method. Use this method by calling the class instance like a function with the input text as the argument.
- Input: `str`
- Output: `str`
- Counts the number of tokens in the input prompt and the output completion if the `tokenizer` argument is passed to the constructor.
- Checks if the length of the input prompt is less than `max_input_tokens` if the `max_input_tokens` argument is passed to the constructor.
- Publishes an `LLMStart` event before calling the `_complete` method.
- Publishes an `LLMEnd` event after calling the `_complete` method.
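As a minimal sketch of this interface (using a made-up `EchoLLM` that returns a canned string instead of calling a real model), a subclass only needs to implement `_complete`:

```python
import asyncio

from embedia import LLM


class EchoLLM(LLM):
    # A stand-in "model" used only to illustrate the interface.

    async def _complete(self, prompt):
        # A real subclass would send the prompt to an instruction-tuned
        # model here and return its completion text.
        return f'(completion for: {prompt!r})'


if __name__ == '__main__':
    llm = EchoLLM()
    # Call the instance like a function; __call__ runs the token checks,
    # publishes the LLMStart / LLMEnd events, and delegates to _complete.
    print(asyncio.run(llm('The capital of France is')))
```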
Usage
Basic Usage
An `LLM` might have a variety of use cases in your webapp. One example of chaining multiple `LLM` calls together is summarizing a text, then translating it into another language, and finally extracting the keywords from the translated text.
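Here's a rough sketch of such a chain (the prompts and the `summarize_translate_extract` helper are illustrative, and `llm` can be any concrete `LLM` subclass, such as the `OpenAILLM` defined below):

```python
from embedia import LLM


async def summarize_translate_extract(llm: LLM, text: str) -> str:
    # Each step feeds the previous completion into the next prompt.
    summary = await llm(f'Summarize the following text:\n{text}')
    french = await llm(f'Translate the following text to French:\n{summary}')
    keywords = await llm(f'Extract the keywords from the following text:\n{french}')
    return keywords

# Usage: asyncio.run(summarize_translate_extract(OpenAILLM(), long_text))
```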
You can connect any LLM to Embedia. It might be an open-source model like Llama-2, Vicuna, Falcon, etc. or a paid API from OpenAI, Google, Anthropic, etc.
Make sure to connect the instruct-based models to `LLM` and the chat-based models to `ChatLLM`. Let's connect OpenAI's `text-davinci-003` model to Embedia.
```python
import asyncio
import os

import openai

from embedia import LLM


class OpenAILLM(LLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _complete(self, prompt):
        # Send the prompt to the completions endpoint and return only
        # the generated text.
        completion = await openai.Completion.acreate(model="text-davinci-003",
                                                     prompt=prompt)
        return completion.choices[0].text


if __name__ == '__main__':
    llm = OpenAILLM()
    completion = asyncio.run(llm('The capital of France is'))
```
Running the above code will print the following output because two events are published internally, namely `LLMStart` and `LLMEnd`. (Learn more about: Publish-Subscribe Event System)
```
[time: 2023-09-24T00:53:54.957874+00:00] [id: 140480688535392] [event: LLM Start]
Prompt (None tokens):
The capital of France is

[time: 2023-09-24T00:53:55.787782+00:00] [id: 140480688535392] [event: LLM End]
Completion (None tokens):
Paris.
```
Adding the optional `Tokenizer`
Notice that the number of tokens is `None` in the above-printed log. This is because we didn't pass the optional `tokenizer` argument to the `LLM` constructor. Let's link a `Tokenizer` in the next example (Learn more about: Tokenizer).
Note that the way your tokenizer counts tokens might vary slightly from how a service provider (eg: OpenAI) counts them; the provider might add a few tokens internally for the service to function properly.
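For instance, you can sanity-check the count locally with `tiktoken` (assuming it is installed); the number reported by the provider may still differ by a few tokens:

```python
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(len(encoding.encode("The capital of France is")))  # prints 5 for this prompt
```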
```python
import asyncio
import os

import openai
import tiktoken

from embedia import LLM, Tokenizer


class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)


class OpenAILLM(LLM):
    def __init__(self):
        # Passing a tokenizer lets __call__ count the prompt and completion tokens.
        super().__init__(tokenizer=OpenAITokenizer())
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _complete(self, prompt):
        completion = await openai.Completion.acreate(
            model="text-davinci-003", prompt=prompt)
        return completion.choices[0].text


if __name__ == '__main__':
    llm = OpenAILLM()
    completion = asyncio.run(llm('The capital of France is'))
```
Running the above code will now calculate the number of tokens:
```
[time: 2023-09-24T01:18:51.464181+00:00] [id: 140227726425776] [event: LLM Start]
Prompt (5 tokens):
The capital of France is

[time: 2023-09-24T01:18:52.001154+00:00] [id: 140227726425776] [event: LLM End]
Completion (2 tokens):
Paris.
```
Adding the optional `max_input_tokens` parameter
There's also another optional parameter in the `LLM` constructor called `max_input_tokens`. If the length of the input prompt is greater than `max_input_tokens`, the class will raise a `ValueError`.
Note that `max_input_tokens` will not have any effect if the `tokenizer` argument is not passed to the `LLM` constructor.
```python
import asyncio
import os

import openai
import tiktoken

from embedia import LLM, Tokenizer


class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)


class OpenAILLM(LLM):
    def __init__(self):
        # max_input_tokens=1 is deliberately tiny so the length check fails.
        super().__init__(tokenizer=OpenAITokenizer(), max_input_tokens=1)
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _complete(self, prompt):
        completion = await openai.Completion.acreate(model="text-davinci-003",
                                                     prompt=prompt)
        return completion.choices[0].text


if __name__ == '__main__':
    llm = OpenAILLM()
    completion = asyncio.run(llm('The capital of France is'))
```
The above code will throw the following error:

```
ValueError: Length of input text: 5 token(s) is longer than max_input_tokens: 1
```
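If you want to handle this gracefully, one option (a sketch, assuming the `OpenAILLM` class from the example above is in scope) is to catch the `ValueError` and shorten or split the prompt before retrying:

```python
import asyncio


async def safe_complete(llm, prompt):
    try:
        return await llm(prompt)
    except ValueError as err:
        # The prompt exceeded max_input_tokens: shorten it, split it into
        # chunks, or surface the error to the caller.
        print(f'Prompt rejected: {err}')
        return None


if __name__ == '__main__':
    # OpenAILLM as defined in the previous example (not redefined here).
    asyncio.run(safe_complete(OpenAILLM(), 'The capital of France is'))
```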