textfier.stream.cleaner
Cleaning-based utilities, such as stemmers and stopword removal.
textfier.stream.cleaner.clean_sentences(sentences: List[str], remove_stopwords: Optional[bool] = False, language: Optional[str] = 'portuguese')
Stems a set of sentence-level tokens using the RSLPStemmer and, optionally, removes stopwords.
- Parameters
sentences – Sentences to be stemmed.
remove_stopwords – Whether stopwords should be removed or not.
language – Identifier of stopwords’ language.
- Returns
Stemmed tokens.
- Return type
(List[str])
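The sentence-level behaviour described above can be sketched in plain Python. This is a minimal illustration, not textfier's implementation: `clean_sentences_sketch`, `toy_stem`, and the two-word stopword set are hypothetical stand-ins, and the sketch assumes each sentence is split on whitespace, cleaned word by word, and joined back into one string. The library itself is documented as using the RSLPStemmer, a Portuguese stemmer that NLTK provides, with a per-language stopword list.

```python
from typing import Callable, Iterable, List


def toy_stem(word: str) -> str:
    """Crude suffix stripper used only for illustration (NOT the RSLP algorithm)."""
    lowered = word.lower()
    for suffix in ("mente", "s"):
        # Only strip when a reasonably long stem would remain.
        if lowered.endswith(suffix) and len(lowered) > len(suffix) + 2:
            return lowered[: -len(suffix)]
    return lowered


def clean_sentences_sketch(
    sentences: List[str],
    stem: Callable[[str], str],
    stopwords: Iterable[str] = (),
    remove_stopwords: bool = False,
) -> List[str]:
    """Hypothetical sketch: stem every word of every sentence, optionally dropping stopwords."""
    blocked = {s.lower() for s in stopwords}
    cleaned = []
    for sentence in sentences:
        words = sentence.split()
        if remove_stopwords:
            words = [w for w in words if w.lower() not in blocked]
        cleaned.append(" ".join(stem(w) for w in words))
    return cleaned


if __name__ == "__main__":
    toy_stopwords = {"os", "a"}  # tiny stand-in for a real Portuguese stopword list
    print(clean_sentences_sketch(["os gatos dormem", "a casa"], toy_stem,
                                 toy_stopwords, remove_stopwords=True))
    # → ['gato dormem', 'casa']
```

With `remove_stopwords` left at its default of `False`, the stopwords pass through the stemmer like any other word, mirroring the default in the documented signature.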
textfier.stream.cleaner.clean_words(words: List[str], remove_stopwords: Optional[bool] = False, language: Optional[str] = 'portuguese')
Stems a set of word-level tokens using the RSLPStemmer and, optionally, removes stopwords.
- Parameters
words – Tokens to be stemmed.
remove_stopwords – Whether stopwords should be removed or not.
language – Identifier of stopwords’ language.
- Returns
Stemmed tokens.
- Return type
(List[str])
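For word-level tokens, the same idea reduces to a filter followed by a map. The sketch below is an assumption-laden illustration rather than the library's code: `clean_words_sketch` and `truncate_stem` are hypothetical names, the stemmer is a crude truncation stand-in for the RSLPStemmer, and the stopword list is supplied by the caller instead of being looked up from a `language` identifier. It filters before stemming because stopword lists normally hold full word forms; the order textfier actually uses is not stated in this reference.

```python
from typing import Callable, Iterable, List


def truncate_stem(word: str) -> str:
    """Stand-in stemmer: lowercase and keep the first four characters (NOT RSLP)."""
    return word.lower()[:4]


def clean_words_sketch(
    words: List[str],
    stem: Callable[[str], str],
    stopwords: Iterable[str] = (),
    remove_stopwords: bool = False,
) -> List[str]:
    """Hypothetical sketch: optionally drop stopwords, then stem the remaining tokens."""
    blocked = {s.lower() for s in stopwords}
    # Filter on the unstemmed form so that entries in the stopword list still match.
    kept = [w for w in words if not (remove_stopwords and w.lower() in blocked)]
    return [stem(w) for w in kept]


if __name__ == "__main__":
    tokens = ["Os", "meninos", "correram"]
    print(clean_words_sketch(tokens, truncate_stem, ["os"], remove_stopwords=True))
    # → ['meni', 'corr']
    print(clean_words_sketch(tokens, truncate_stem, ["os"]))
    # → ['os', 'meni', 'corr']
```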