textfier.stream.tokenizer

Tokenization-based utilities, such as sentence- and word-level tokenizers.

textfier.stream.tokenizer.tokenize_to_sentences(text: str, language: Optional[str] = 'portuguese')

Tokenizes text into sentence-level tokens.

Parameters
  • text – String holding the text to be tokenized.

  • language – Identifier of the tokenizer’s language.

Returns

Sentence-level tokens.

Return type

(List[str])
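
Example

A minimal usage sketch, assuming textfier is installed; the sample text is illustrative, and only the call signature, default language, and return type come from this reference.

    from textfier.stream.tokenizer import tokenize_to_sentences

    text = 'Olá, mundo. Este é um exemplo de tokenização de sentenças.'

    # Returns a List[str], one element per detected sentence.
    sentences = tokenize_to_sentences(text, language='portuguese')
    print(sentences)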

textfier.stream.tokenizer.tokenize_to_words(text: str, language: Optional[str] = 'portuguese')

Tokenizes text into word-level tokens.

Parameters
  • text – String holding the text to be tokenized.

  • language – Identifier of the tokenizer’s language.

Returns

Word-level tokens.

Return type

(List[str])
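
Example

A minimal usage sketch, assuming textfier is installed; the sample text is illustrative, and only the call signature, default language, and return type come from this reference.

    from textfier.stream.tokenizer import tokenize_to_words

    text = 'Olá, mundo. Este é um exemplo de tokenização de palavras.'

    # Returns a List[str], one element per detected word token.
    words = tokenize_to_words(text, language='portuguese')
    print(words)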