ChatLLM
An LLM (Large Language Model) is a next-token generation model: given a sequence of tokens, it generates the most probable next token.
This is a rapidly developing research area and the definitions of its types have some nuances, but for a layman they come in three flavours:
- Foundation - Trained on a massive corpus of text from scratch
- Instruct - Foundation model, fine-tuned on instruction following (eg: text-davinci-003)
- Chat - Instruct models, fine-tuned on chatting with humans (eg: gpt-3.5-turbo)
The Instruct and Chat fine-tunes are the most useful for application development. The classes associated with these in Embedia are `LLM` and `ChatLLM` respectively.
`ChatLLM` is an abstract class. Inherit from it and define the `_reply` method, which should take a string as the input prompt, send it to the chat fine-tuned large language model, and return the reply. To use a `ChatLLM`, call the class instance like a function with the input prompt as the argument, as shown in the sketch below.
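Here is a minimal sketch of such a subclass. The `EchoChatLLM` class and its canned reply are hypothetical stand-ins for a real chat model:

import asyncio
from embedia import ChatLLM

class EchoChatLLM(ChatLLM):
    # Hypothetical subclass: instead of querying a real chat model,
    # it simply echoes the latest prompt back.
    async def _reply(self, prompt):
        return f'You said: {prompt}'

if __name__ == '__main__':
    chatllm = EchoChatLLM()
    # Call the instance like a function; __call__ invokes _reply internally
    reply = asyncio.run(chatllm('Hello!'))
    print(reply)  # You said: Hello!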
You can convert an `LLM` to a `ChatLLM`. (Learn more about: Converting LLM to ChatLLM)
Attributes
- `tokenizer` (`Tokenizer`, Optional): Used for counting the number of tokens in the prompt and response.
- `max_input_tokens` (`int`, Optional): Used for checking that the sum of all the message contents in `chat_history` is less than `max_input_tokens`.
- `chat_history` (`List[Message]`): Contains all the messages sent and received by the `ChatLLM` instance. It is automatically initialized to an empty list when the class instance is created. (Learn more about: Message schema)
- `llm` (`Optional[LLM]`): Contains the `LLM` instance if the `ChatLLM` object was created using the `from_llm` classmethod. Otherwise, it is `None`.
Methods
`_reply` (abstract): Implement this method to generate the reply given a prompt. Do not call this method directly; use the `__call__` method instead.
- Input: `Optional[str]`
- Output: `str`

The input prompt is optional in the `_reply` method because a `Message` object with the `role` attribute set to `MessageRole.user` and the `content` attribute set to the input prompt is automatically added to the `chat_history` when the `__call__` method is called with the input prompt.
`__call__`: Internally calls the `_reply` method. Use it by calling the class instance like a function with the input text as the argument.
- Input: `str`
- Output: `str`
- Adds a `Message` object with the `role` attribute set to `MessageRole.user` and the `content` attribute set to the input prompt to the `chat_history` before calling the `_reply` method.
- Counts the number of tokens in the system prompt, input prompt and output reply if the `tokenizer` argument is passed to the constructor.
- Checks that the sum of all the message contents in `chat_history` is less than `max_input_tokens` if the `max_input_tokens` argument is passed to the constructor.
- Publishes a `ChatLLMStart` event before calling the `_reply` method.
- Calls the `_reply` method with the input prompt as the argument if it is defined to take one; otherwise, calls it without any arguments.
- Publishes a `ChatLLMEnd` event after calling the `_reply` method.
- Adds a `Message` object with the `role` attribute set to `MessageRole.assistant` and the `content` attribute set to the output reply to the `chat_history` after calling the `_reply` method (see the sketch below).
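As a quick check of this bookkeeping, the sketch below reuses the hypothetical `EchoChatLLM` (and imports) from earlier to show that one call appends a user message and an assistant message to `chat_history`:

chatllm = EchoChatLLM()
asyncio.run(chatllm('Hi'))
# Print each stored message; the roles appear in order: user, then assistant
for msg in chatllm.chat_history:
    print(msg.role, ':', msg.content)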
If the `ChatLLM` object is created using the `from_llm` classmethod, instead of calling the `_reply` method, it calls the `_complete` method of the `LLM` object with the following prompt structure:

system: <system_prompt>
user: <prompt_1>
assistant: <reply_1>
user: <prompt_2>
assistant: <reply_2>
<...>
assistant:
`set_system_prompt`: Erases the `chat_history` and sets a `Message` object with the `role` set to `MessageRole.system` and `content` set to the provided system prompt as the first message in the `chat_history` (a minimal call is sketched below).
- Input: `str`
- Output: `None`
- Publishes a `ChatLLMInit` event after adding the system prompt to the chat history.
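A minimal call might look like this, assuming `chatllm` is an instance of your `ChatLLM` subclass; the prompt text is just an illustrative example:

asyncio.run(chatllm.set_system_prompt('You are a helpful physics teacher.'))
# chat_history now contains only the system message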
`save_chat`: Saves the `chat_history` to a pickle file.
- Input: `str` (path to the pickle file)
- Output: `None`

`load_chat`: Loads the `chat_history` from a pickle file.
- Input: `str` (path to the pickle file)
- Output: `None`
`from_llm` (classmethod): Converts an `LLM` instance into a `ChatLLM` instance.
- Input: `LLM` (an instance of your `LLM` subclass)
- Output: `ChatLLM` (an instance of `ChatLLM`)
Usage
Basic Usage
A `ChatLLM` has a variety of use cases in your web app. An example would be to add a chatbot with a certain personality, like a physics teacher or a content writer.
You can connect any `ChatLLM` to Embedia. It might be an open-source model like Llama-2, Vicuna, Falcon, etc. or a paid API from OpenAI, Google, Anthropic, etc.
Make sure to connect the instruct-based models to `LLM` and the chat-based models to `ChatLLM`.
Let's connect OpenAI's `gpt-3.5-turbo` model to Embedia. Since OpenAI's ChatCompletion API takes in the entire chat history, we'll need to pass it the entire chat history instead of just the current prompt.
This functionality would be built differently when using Google's PaLM 2 model, for example. At the time of writing, the google-generativeai library keeps a copy of the entire chat history internally, so we'd only need to pass it the current prompt.
import asyncio
import os
import openai
from embedia import ChatLLM

class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        # Send the entire chat history to OpenAI's ChatCompletion API
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content

if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    reply = asyncio.run(chatllm('What is the capital of France?'))
Running the above code will print the following output because two events are published internally, namely `ChatLLMStart` and `ChatLLMEnd`. (Learn more about: Publish-Subscribe Event System)
[time: 2023-09-24T06:31:26.402688+00:00] [id: 140338366051664] [event: ChatLLM Start]
user (None tokens):
What is the capital of France?
[time: 2023-09-24T06:31:27.507751+00:00] [id: 140338366051664] [event: ChatLLM End]
assistant (None tokens):
The capital of France is Paris.
Saving and loading chat_history
You can save the `chat_history` to a pickle file and load it back using the `save_chat` and `load_chat` methods respectively.
import asyncio
import os
import openai
from embedia import ChatLLM

class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content

if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    reply = asyncio.run(chatllm('What is the capital of France?'))
    reply = asyncio.run(chatllm('What is the capital of Italy?'))
    asyncio.run(chatllm.save_chat('openai_chatllm.pkl'))
    asyncio.run(chatllm.load_chat('openai_chatllm.pkl'))
    assert os.path.exists('openai_chatllm.pkl')
    print(chatllm.chat_history)
Running the above code will print the following output:
[time: 2023-09-24T06:36:25.321676+00:00] [id: 139750960313680] [event: ChatLLM Start]
user (None tokens):
What is the capital of France?
[time: 2023-09-24T06:36:26.326593+00:00] [id: 139750960313680] [event: ChatLLM End]
assistant (None tokens):
The capital of France is Paris.
[time: 2023-09-24T06:36:26.329971+00:00] [id: 139750960313680] [event: ChatLLM Start]
user (None tokens):
What is the capital of Italy?
[time: 2023-09-24T06:36:27.316278+00:00] [id: 139750960313680] [event: ChatLLM End]
assistant (None tokens):
The capital of Italy is Rome.
[Message(role=<MessageRole.user: 'user'>, content='What is the capital of France?', id='42a95b2e-89b4-4638-ac79-60633604e2a2', created_at='2023-09-24 06:36:25.321613+00:00'), Message(role=<MessageRole.assistant: 'assistant'>, content='The capital of France is Paris.', id='6a756ef5-3a1a-4d77-8126-3c31f468feba', created_at='2023-09-24 06:36:26.326527+00:00'), Message(role=<MessageRole.user: 'user'>, content='What is the capital of Italy?', id='1d1ea9fc-bac2-40b2-a7ef-580b0c4ce771', created_at='2023-09-24 06:36:26.329854+00:00'), Message(role=<MessageRole.assistant: 'assistant'>, content='The capital of Italy is Rome.', id='ea33491b-1471-4a2d-bf5f-531cf953bef2', created_at='2023-09-24 06:36:27.316237+00:00')]
Adding a system prompt
You can set the system prompt for the `ChatLLM` subclass by using its `set_system_prompt` method. This erases the `chat_history` and sets the provided system prompt as the first message in the `chat_history`. There are a bunch of predefined system prompts available in the `Persona` class. (Learn more about: Persona prompts) You can also create your own system prompt.
Using a lower temperature when asking the LLM to write code gives more predictable results.
import asyncio
import os
import openai
from embedia import ChatLLM, Persona

class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__()
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0.1,
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content

if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
Running the above code will print the following output:
[time: 2023-09-24T06:53:26.213236+00:00] [id: 140328954393936] [event: ChatLLM Init]
system (None tokens):
You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
[time: 2023-09-24T06:53:26.215463+00:00] [id: 140328954393936] [event: ChatLLM Start]
user (None tokens):
Count the number of python code lines in the current folder
[time: 2023-09-24T06:53:29.265166+00:00] [id: 140328954393936] [event: ChatLLM End]
assistant (None tokens):
import os
count = 0
for root, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith('.py'):
            with open(os.path.join(root, file), 'r') as f:
                count += sum(1 for line in f if line.strip())
print(count)
Adding the optional Tokenizer
Notice that the number of tokens is None in the above-printed log. This is because we didn't pass the optional `tokenizer` argument to the `ChatLLM` constructor. Let's link a `Tokenizer` in the next example. (Learn more about: Tokenizer)
Note that the way your tokenizer counts the number of tokens might vary slightly from how a service provider (eg: OpenAI) counts them, since they might add a few tokens internally for the service to function properly.
import asyncio
import os
import openai
import tiktoken
from embedia import ChatLLM, Persona, Tokenizer

class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)

class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer())
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0.1,
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content

if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
Running the above code will print the following output:
[time: 2023-09-24T07:04:30.005131+00:00] [id: 139954984253664] [event: ChatLLM Init]
system (23 tokens):
You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
[time: 2023-09-24T07:04:30.016053+00:00] [id: 139954984253664] [event: ChatLLM Start]
user (11 tokens):
Count the number of python code lines in the current folder
[time: 2023-09-24T07:04:33.392050+00:00] [id: 139954984253664] [event: ChatLLM End]
assistant (65 tokens):
import os
count = 0
for root, dirs, files in os.walk('.'):
    for file in files:
        if file.endswith('.py'):
            with open(os.path.join(root, file), 'r') as f:
                count += sum(1 for line in f if line.strip())
print(count)
Adding the optional max_input_tokens parameter
There's another optional parameter in the `ChatLLM` constructor called `max_input_tokens`. If the sum of all the message contents in `chat_history` is greater than `max_input_tokens`, the class will raise a `ValueError`.
Note that `max_input_tokens` will not have any effect if the `tokenizer` argument is not passed to the `ChatLLM` constructor.
import asyncio
import os
import openai
import tiktoken
from embedia import ChatLLM, Persona, Tokenizer

class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)

class OpenAIChatLLM(ChatLLM):
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer(), max_input_tokens=1)
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _reply(self, prompt):
        completion = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            temperature=0.1,
            messages=[{
                'role': msg.role,
                'content': msg.content
            } for msg in self.chat_history],
        )
        return completion.choices[0].message.content

if __name__ == '__main__':
    chatllm = OpenAIChatLLM()
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
There will be two messages in the `chat_history` when the `__call__` method is called: one with the system prompt (23 tokens) and the other with the user prompt (11 tokens). Their sum is 34, so the above code will throw the following error:
ValueError: Length of input text: 34 token(s) is longer than max_input_tokens: 1
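If you'd rather handle the error than let it propagate, you could wrap the call from the example above in a try/except; this is just one possible sketch:

try:
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
except ValueError as err:
    # Hypothetical handling: log the error, then trim or summarize chat_history
    print('Prompt rejected:', err)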
Converting LLM to ChatLLM
You can convert an instance of an `LLM` subclass into an instance of a `ChatLLM` subclass using the `from_llm` classmethod present in the `ChatLLM` class. Once you've converted the `LLM` instance, you can use it exactly like a `ChatLLM` instance.
This is very useful since a lot of LLM service providers (and even open-source models) only provide a next-token generation interface and not a chat interface.
The `tokenizer` and `max_input_tokens` parameters behave the same way as they would for an `LLM`. Setting the system prompt is also supported for these kinds of instances.
import asyncio
import os
import openai
import tiktoken
from embedia import LLM, ChatLLM, Persona, Tokenizer

class OpenAITokenizer(Tokenizer):
    def __init__(self):
        super().__init__()

    async def _tokenize(self, text):
        return tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)

class OpenAILLM(LLM):
    def __init__(self):
        super().__init__(tokenizer=OpenAITokenizer())
        openai.api_key = os.environ['OPENAI_API_KEY']

    async def _complete(self, prompt):
        completion = await openai.Completion.acreate(model="text-davinci-003",
                                                     prompt=prompt,
                                                     max_tokens=100,
                                                     temperature=0.1)
        return completion.choices[0].text

if __name__ == '__main__':
    llm = OpenAILLM()
    chatllm = ChatLLM.from_llm(llm)
    asyncio.run(
        chatllm.set_system_prompt(
            Persona.CodingLanguageExpert.format(language='Python')))
    reply = asyncio.run(
        chatllm('Count the number of python code lines in the current folder'))
Internally, Embedia combines all the messages from the `chat_history` in the following format:

system: <system_prompt>
user: <prompt_1>
assistant: <reply_1>
user: <prompt_2>
assistant: <reply_2>
<...>
assistant:

This entire string is then sent to the `__call__` function of the underlying `LLM`. This makes an LLM with a next-token generation interface behave like an LLM with a chat interface.
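Conceptually, the flattening works roughly like the sketch below. This is only an illustration of the idea (using plain (role, content) pairs), not Embedia's actual internal code:

def history_to_prompt(history):
    # Render each message as '<role>: <content>' and leave a trailing
    # 'assistant:' line for the next-token LLM to complete
    lines = [f'{role}: {content}' for role, content in history]
    lines.append('assistant:')
    return '\n'.join(lines)

print(history_to_prompt([
    ('system', 'You are an expert in writing Python code.'),
    ('user', 'Count the number of python code lines in the current folder'),
]))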
Running the above code will trigger a `ChatLLMInit` event when the system prompt is set, and then an `LLMStart` and an `LLMEnd` event when the `__call__` method of the `LLM` is called.
[time: 2023-09-24T07:30:30.641946+00:00] [id: 139949812477728] [event: ChatLLM Init]
system (23 tokens):
You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
[time: 2023-09-24T07:30:30.644154+00:00] [id: 139949834272432] [event: LLM Start]
Prompt (43 tokens):
system: You are an expert in writing Python code. Only use Python default libraries. Reply only with the code and nothing else
user: Count the number of python code lines in the current folder
assistant:
[time: 2023-09-24T07:30:33.204390+00:00] [id: 139949834272432] [event: LLM End]
Completion (69 tokens):
import os
def count_lines(path):
    count = 0
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.py'):
                with open(os.path.join(root, file)) as f:
                    count += len(f.readlines())
    return count
print(count_lines('.'))