TextDoc
TextDoc represents a text document that can be used with the VectorDB and EmbeddingModel classes. It has built-in helper functions for text processing.
Attributes
contents (str): The contents of the text document.
meta (dict, optional): Any metadata related to the text document. Defaults to None.
id (str, optional): The id of the text document. Defaults to a random uuid.
created_at (str, optional): The timestamp of the text document. Defaults to the current time in the system's timezone.
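The attribute defaults described above can be sketched with a small dataclass. This is a hypothetical stand-in (TextDocSketch), not the library's TextDoc. It assumes the random id is a UUID4 string and that created_at is an ISO 8601 string carrying the system's UTC offset; the actual formats may differ.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class TextDocSketch:
    """Hypothetical stand-in mirroring the documented attributes."""

    contents: str
    meta: Optional[dict] = None
    # Assumption: the random id is a UUID4 rendered as a string.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Assumption: ISO 8601 timestamp in the system's local timezone.
    created_at: str = field(
        default_factory=lambda: datetime.now().astimezone().isoformat()
    )


# Only `contents` is required; the other fields fall back to their defaults.
doc = TextDocSketch(contents="hello world", meta={"source": "example"})
```

Using default factories means each instance gets its own id and timestamp at construction time rather than sharing values computed once at import.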
Methods
from_file (classmethod): Create a TextDoc instance from a file.
- Input: path: str, meta: Optional[dict] = None, encoding: str = "utf-8"
- Output: TextDoc
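A minimal sketch of how a classmethod with this signature might behave, assuming the whole file is read into contents unchanged. FileDocSketch is a hypothetical stand-in, not the library class, and the temporary file exists only to make the example runnable.

```python
import tempfile
from pathlib import Path
from typing import Optional


class FileDocSketch:
    """Hypothetical stand-in, not the library's TextDoc."""

    def __init__(self, contents: str, meta: Optional[dict] = None):
        self.contents = contents
        self.meta = meta

    @classmethod
    def from_file(
        cls, path: str, meta: Optional[dict] = None, encoding: str = "utf-8"
    ) -> "FileDocSketch":
        # Assumption: the file is read in full and stored as-is.
        return cls(Path(path).read_text(encoding=encoding), meta)


# Write a throwaway file, then load it through the classmethod.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "note.txt"
    sample.write_text("first line\nsecond line", encoding="utf-8")
    doc = FileDocSketch.from_file(str(sample), meta={"source": "note.txt"})
```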
split_on_separator: Split the contents on a separator and return a list of TextDoc instances.
- Input: separator: str = "\n", strip_after_split: bool = False
- Output: List[TextDoc]
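The splitting logic can be illustrated on plain strings. The helper below (split_contents) is hypothetical and returns strings rather than TextDoc instances. It assumes strip_after_split trims whitespace from each piece and that empty pieces are kept; the library may handle either detail differently.

```python
from typing import List


def split_contents(
    contents: str, separator: str = "\n", strip_after_split: bool = False
) -> List[str]:
    """Hypothetical helper showing only the string-splitting step."""
    pieces = contents.split(separator)
    if strip_after_split:
        # Assumption: stripping removes leading and trailing whitespace.
        pieces = [piece.strip() for piece in pieces]
    return pieces


raw = split_contents("alpha \n beta")
# raw == ["alpha ", " beta"]
clean = split_contents("alpha \n beta", strip_after_split=True)
# clean == ["alpha", "beta"]
```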
extract_regex: Extract TextDoc instances from the content using a regex pattern.
- Input: pattern: str
- Output: List[TextDoc]
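As a rough illustration of regex extraction, the hypothetical helper below (extract_matches) returns the text of each match as a string. It assumes one result per non-overlapping match of the full pattern; how the library treats capture groups or metadata on the resulting TextDoc instances is not stated here and is not modeled.

```python
import re
from typing import List


def extract_matches(contents: str, pattern: str) -> List[str]:
    """Hypothetical helper: one string per non-overlapping match."""
    # group(0) is the full match, regardless of any capture groups.
    return [match.group(0) for match in re.finditer(pattern, contents)]


codes = extract_matches("Orders A12 and B34 shipped.", r"[A-Z]\d+")
# codes == ["A12", "B34"]
```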