textfier.stream.cleaner
Text-cleaning utilities, such as stemming and stopword removal.
- textfier.stream.cleaner.clean_sentences(sentences: List[str], remove_stopwords: Optional[bool] = False, language: Optional[str] = 'portuguese')
Stems a set of sentence-level tokens using the RSLPStemmer and, optionally, removes stopwords.
- Parameters
sentences – Sentences to be stemmed.
remove_stopwords – Whether stopwords should be removed or not.
language – Identifier of stopwords’ language.
- Returns
Stemmed tokens.
- Return type
(List[str])
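The sentence-level flow described above might look roughly like the following sketch. It is not textfier's implementation: `clean_sentences_sketch`, `strip_plural`, and the hand-picked stopword set are hypothetical names introduced for illustration, the toy stemmer is not RSLP, and splitting on whitespace and returning one string per sentence are assumptions about how sentences are handled.

```python
from typing import AbstractSet, Callable, List


def strip_plural(word: str) -> str:
    """Hypothetical stand-in stemmer (not RSLP): drops a trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def clean_sentences_sketch(
    sentences: List[str],
    remove_stopwords: bool = False,
    stopwords: AbstractSet[str] = frozenset(),
    stem: Callable[[str], str] = strip_plural,
) -> List[str]:
    """Illustrative only: split each sentence, optionally drop stopwords, stem, rejoin."""
    cleaned = []
    for sentence in sentences:
        words = sentence.split()  # assumption: whitespace tokenization
        if remove_stopwords:
            words = [w for w in words if w.lower() not in stopwords]
        cleaned.append(" ".join(stem(w) for w in words))
    return cleaned


result = clean_sentences_sketch(
    ["os gatos dormem", "as casas de pedra"],
    remove_stopwords=True,
    stopwords={"os", "as", "de"},  # tiny illustrative list, not a real stopword corpus
)
print(result)
```

Passing the stemmer and stopword list as arguments keeps the sketch runnable without external data; a real stemmer's `stem` method could be supplied in their place.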
- textfier.stream.cleaner.clean_words(words: List[str], remove_stopwords: Optional[bool] = False, language: Optional[str] = 'portuguese')
Stems a set of word-level tokens using the RSLPStemmer and, optionally, removes stopwords.
- Parameters
words – Tokens to be stemmed.
remove_stopwords – Whether stopwords should be removed or not.
language – Identifier of stopwords’ language.
- Returns
Stemmed tokens.
- Return type
(List[str])
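As a rough illustration of the word-level behaviour documented above, the sketch below filters stopwords and then stems each remaining token. It is a minimal stand-in under stated assumptions, not the library's code: `clean_words_sketch`, `toy_stem`, and `PT_STOPWORDS` are hypothetical, the suffix rules are far simpler than RSLP, and filtering before stemming is an assumed order.

```python
from typing import Callable, Iterable, List


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for an RSLP stemmer: strips the first matching suffix."""
    for suffix in ("mente", "ando", "es", "s"):
        # Keep at least three characters so short words are left untouched.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_words_sketch(
    words: List[str],
    remove_stopwords: bool = False,
    stopwords: Iterable[str] = (),
    stem: Callable[[str], str] = toy_stem,
) -> List[str]:
    """Illustrative only: optionally drop stopwords, then stem what remains."""
    stop = {w.lower() for w in stopwords}
    if remove_stopwords:
        words = [w for w in words if w.lower() not in stop]
    return [stem(w) for w in words]


# Tiny hand-picked list for the example; a real run would use a full stopword corpus.
PT_STOPWORDS = {"o", "a", "os", "as", "de", "e"}
tokens = ["os", "gatos", "e", "as", "casas"]

kept = clean_words_sketch(tokens)
filtered = clean_words_sketch(tokens, remove_stopwords=True, stopwords=PT_STOPWORDS)
print(kept)
print(filtered)
```

With `remove_stopwords` left at its default of `False`, every token is stemmed and returned, so the output list has the same length as the input.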