textfier.stream.cleaner

Text-cleaning utilities, such as stemming and stopword removal.

textfier.stream.cleaner.clean_sentences(sentences: List[str], remove_stopwords: Optional[bool] = False, language: Optional[str] = 'portuguese')

Stems a set of sentence-level tokens using the RSLPStemmer and, optionally, removes stopwords.

Parameters
  • sentences – Sentences to be stemmed.

  • remove_stopwords – Whether stopwords should be removed or not.

  • language – Identifier of stopwords’ language.

Returns

Stemmed tokens.

Return type

(List[str])
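
The following is a minimal sketch of sentence-level cleaning, not the library's implementation. It assumes each sentence is split on whitespace, cleaned token by token, and joined back into a string; whether clean_sentences itself behaves this way is not stated above. The names clean_sentences_sketch, toy_stem, and stopword_set are hypothetical, and toy_stem is only a stand-in for a real stemmer such as NLTK's nltk.stem.RSLPStemmer.

```python
from typing import Callable, Iterable, List


def toy_stem(word: str) -> str:
    """Placeholder stemmer (NOT RSLP): strips one trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def clean_sentences_sketch(
    sentences: List[str],
    remove_stopwords: bool = False,
    stopword_set: Iterable[str] = (),           # hypothetical: caller supplies the stopwords
    stem: Callable[[str], str] = toy_stem,      # hypothetical: swap in a real stemmer here
) -> List[str]:
    """Lowercase, optionally filter stopwords, and stem each sentence."""
    stop = set(stopword_set)
    cleaned = []
    for sentence in sentences:
        # Assumption: simple whitespace tokenization.
        tokens = sentence.lower().split()
        if remove_stopwords:
            tokens = [t for t in tokens if t not in stop]
        cleaned.append(" ".join(stem(t) for t in tokens))
    return cleaned
```

With NLTK and its data installed, passing stem=RSLPStemmer().stem and stopword_set=stopwords.words('portuguese') would bring the sketch closer to the behavior documented here.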

textfier.stream.cleaner.clean_words(words: List[str], remove_stopwords: Optional[bool] = False, language: Optional[str] = 'portuguese')

Stems a set of word-level tokens using the RSLPStemmer and, optionally, removes stopwords.

Parameters
  • words – Tokens to be stemmed.

  • remove_stopwords – Whether stopwords should be removed or not.

  • language – Identifier of stopwords’ language.

Returns

Stemmed tokens.

Return type

(List[str])
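
As an illustration of the word-level case, the sketch below filters stopwords before stemming, since a stemmed token may no longer match an entry in the stopword list. This ordering is an assumption, not something the reference above confirms. The names clean_words_sketch, suffix_stem, and stopword_set are hypothetical, and suffix_stem is a toy replacement for the RSLPStemmer.

```python
from typing import Callable, Iterable, List


def suffix_stem(word: str) -> str:
    """Toy stemmer (NOT RSLP): drops a trailing 's' from words longer than three letters."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def clean_words_sketch(
    words: List[str],
    remove_stopwords: bool = False,
    stopword_set: Iterable[str] = (),            # hypothetical stand-in for a language lookup
    stem: Callable[[str], str] = suffix_stem,    # hypothetical: pass a real stemmer instead
) -> List[str]:
    """Optionally drop stopwords, then stem the remaining tokens."""
    stop = set(stopword_set)
    # Filter on the surface form first, because stemming can alter a stopword.
    kept = [w for w in words if not (remove_stopwords and w.lower() in stop)]
    return [stem(w.lower()) for w in kept]
```

Note that when remove_stopwords is False the stopword set is ignored entirely, mirroring the default in the signature above.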