textfier.stream.tokenizer
Tokenization-based utilities, such as sentence- and word-level tokenizers.
- textfier.stream.tokenizer.tokenize_to_sentences(text: str, language: Optional[str] = 'portuguese')
Tokenizes text into sentence-level tokens.
- Parameters
text – String holding the text to be tokenized.
language – Identifier of the tokenizer’s language (defaults to 'portuguese').
- Returns
Sentence-level tokens.
- Return type
(List[str])
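For illustration only, here is a minimal sketch of sentence-level tokenization using a regular expression. The function name `naive_sentence_tokenize` is hypothetical, and this is not textfier's implementation. The rule simply splits on whitespace that follows '.', '!' or '?', so it will mis-split abbreviations such as "Sr." or "etc.", which a trained, language-aware tokenizer handles. The `language` argument is accepted only to mirror the documented signature and is ignored.

```python
import re
from typing import List, Optional

# Naive boundary rule (assumption): a sentence ends at '.', '!' or '?'
# when that punctuation is followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_sentence_tokenize(text: str, language: Optional[str] = "portuguese") -> List[str]:
    """Hypothetical stand-in for tokenize_to_sentences.

    `language` mirrors the documented signature but is ignored here.
    """
    # Split on whitespace after sentence-final punctuation and drop empty pieces.
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


print(naive_sentence_tokenize("Olá, tudo bem? Eu estou bem. Obrigado!"))
# ['Olá, tudo bem?', 'Eu estou bem.', 'Obrigado!']
```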
- textfier.stream.tokenizer.tokenize_to_words(text: str, language: Optional[str] = 'portuguese')
Tokenizes text into word-level tokens.
- Parameters
text – String holding the text to be tokenized.
language – Identifier of the tokenizer’s language (defaults to 'portuguese').
- Returns
Word-level tokens.
- Return type
(List[str])
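Similarly, here is a minimal sketch of word-level tokenization that emits runs of word characters and separates each punctuation mark into its own token. The name `naive_word_tokenize` is hypothetical and does not reflect textfier's implementation. Because it splits on every non-word character, hyphenated words such as "guarda-chuva" and contractions are broken apart, which a language-aware tokenizer may treat differently. As above, `language` is accepted but ignored.

```python
import re
from typing import List, Optional

# Assumed rule: a token is either a run of word characters (Unicode-aware
# for str patterns) or a single character that is neither a word char nor whitespace.
_WORD_OR_PUNCT = re.compile(r"\w+|[^\w\s]")


def naive_word_tokenize(text: str, language: Optional[str] = "portuguese") -> List[str]:
    """Hypothetical stand-in for tokenize_to_words.

    `language` mirrors the documented signature but is ignored here.
    """
    # findall returns every non-overlapping match in order of appearance.
    return _WORD_OR_PUNCT.findall(text)


print(naive_word_tokenize("Olá, tudo bem?"))
# ['Olá', ',', 'tudo', 'bem', '?']
```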