TextDoc

TextDoc represents a text document that can be used with the VectorDB and EmbeddingModel classes. It provides built-in helper methods for loading text from a file, splitting it on a separator, and extracting regex matches.

Attributes

  • contents (str): The contents of the text document.
  • meta (dict, optional): Any metadata related to the text document. Defaults to None.
  • id (str, optional): The ID of the text document. Defaults to a random UUID.
  • created_at (str, optional): The creation timestamp of the text document. Defaults to the current time in the system's timezone.
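
The attribute defaults listed above can be illustrated with a small stand-in. This is a minimal sketch, not the library's implementation: TextDocSketch is a hypothetical class, and the UUID4 string format for id and the ISO 8601 format for created_at are assumptions, since the reference only says "a random uuid" and "the current time".

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class TextDocSketch:
    """Hypothetical stand-in that mirrors only the documented attributes."""

    contents: str
    meta: Optional[dict] = None
    # Assumption: the random id is a UUID4 rendered as a string.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Assumption: the timestamp is an ISO 8601 string with the local UTC offset.
    created_at: str = field(
        default_factory=lambda: datetime.now().astimezone().isoformat()
    )


# Only contents is required; everything else falls back to its default.
doc = TextDocSketch(contents="Hello, world.", meta={"source": "example"})
other = TextDocSketch(contents="Hello again.")
```

Using default_factory means each instance gets its own id and timestamp at construction time, rather than sharing a value computed once when the class is defined.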

Methods

  1. from_file (classmethod): Create a TextDoc instance from a file.
     • Input: path: str, meta: Optional[dict] = None, encoding: str = "utf-8"
     • Output: TextDoc
  2. split_on_separator: Split the contents on a separator and return a list of TextDoc instances.
     • Input: separator: str = "\n", strip_after_split: bool = False
     • Output: List[TextDoc]
  3. extract_regex: Extract TextDoc instances from the contents using a regex pattern.
     • Input: pattern: str
     • Output: List[TextDoc]
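
The documented signatures can be approximated with a small stand-in class, which may help show how the three methods are likely to behave. This is a sketch under stated assumptions, not the library's code: TextDocMethodsSketch is a hypothetical name, and the choices that child documents inherit the parent's meta, that strip_after_split trims whitespace from each piece, and that extract_regex returns one document per full match are all guesses the reference does not confirm.

```python
import re
import tempfile
import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class TextDocMethodsSketch:
    """Hypothetical stand-in following the documented method signatures."""

    contents: str
    meta: Optional[dict] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @classmethod
    def from_file(
        cls, path: str, meta: Optional[dict] = None, encoding: str = "utf-8"
    ) -> "TextDocMethodsSketch":
        # Read the whole file as text using the requested encoding.
        return cls(contents=Path(path).read_text(encoding=encoding), meta=meta)

    def split_on_separator(
        self, separator: str = "\n", strip_after_split: bool = False
    ) -> List["TextDocMethodsSketch"]:
        pieces = self.contents.split(separator)
        if strip_after_split:
            # Assumption: stripping removes surrounding whitespace from each piece.
            pieces = [piece.strip() for piece in pieces]
        # Assumption: each piece inherits the parent's metadata.
        return [TextDocMethodsSketch(contents=p, meta=self.meta) for p in pieces]

    def extract_regex(self, pattern: str) -> List["TextDocMethodsSketch"]:
        # Assumption: one document per full match, ignoring capture groups.
        return [
            TextDocMethodsSketch(contents=m.group(0), meta=self.meta)
            for m in re.finditer(pattern, self.contents)
        ]


# Write a temporary file so the example does not depend on any existing path.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"
    path.write_text("alpha , beta ,gamma 42", encoding="utf-8")
    doc = TextDocMethodsSketch.from_file(str(path), meta={"source": "notes.txt"})

parts = doc.split_on_separator(",", strip_after_split=True)
numbers = doc.extract_regex(r"\d+")

print([p.contents for p in parts])
print([n.contents for n in numbers])
```

With the real TextDoc class, the calls would presumably look the same apart from the class name; check the library for how it handles empty pieces and metadata on the returned documents.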