VIBE v5.1.6
Search Engine
Static Public Member Functions
| static String | clean (String text) |
| static String[] | split (String text) |
| static String[] | parse (String text) |
| static void | addStems (String line, Stemmer stemmer, Collection<? super String > stems) |
| static ArrayList< String > | listStems (String line, Stemmer stemmer) |
| static ArrayList< String > | listStems (String line) |
| static ArrayList< String > | listStems (Path input) throws IOException |
| static TreeSet< String > | uniqueStems (String line, Stemmer stemmer) |
| static TreeSet< String > | uniqueStems (String line) |
| static TreeSet< String > | uniqueStems (Path input) throws IOException |
| static ArrayList< TreeSet< String > > | listUniqueStems (Path input) throws IOException |
Static Public Attributes
| static final Pattern | SPLIT_REGEX = Pattern.compile("(?U)\\s+") |
| static final Pattern | CLEAN_REGEX = Pattern.compile("(?U)[^\\p{Alpha}\\s]+") |
Utility class for parsing, cleaning, and stemming text and text files into collections of processed words.
static void addStems (String line, Stemmer stemmer, Collection<? super String > stems)
Parses the line into cleaned and stemmed words and adds them to the provided collection.
| line | the line of words to clean, split, and stem |
| stemmer | the stemmer to use |
| stems | the collection to add stems to |
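A minimal sketch of how the parsing described above might fit together, assuming the method cleans the line, splits it on whitespace, and stems each word in turn. The `Stemmer` type is not defined on this page, so the sketch substitutes a hypothetical `UnaryOperator<String>` and a toy suffix-stripping stemmer; neither is the library's actual API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

class AddStemsSketch {
    // Patterns copied from the attribute list on this page.
    static final Pattern SPLIT_REGEX = Pattern.compile("(?U)\\s+");
    static final Pattern CLEAN_REGEX = Pattern.compile("(?U)[^\\p{Alpha}\\s]+");

    // Hypothetical stand-in for a real stemmer: strips one trailing "s".
    static final UnaryOperator<String> TOY_STEMMER =
            word -> word.length() > 1 && word.endsWith("s")
                    ? word.substring(0, word.length() - 1)
                    : word;

    // Cleans, splits, and stems the line, adding each stem to the collection.
    static void addStems(String line, UnaryOperator<String> stemmer,
            Collection<? super String> stems) {
        String cleaned = CLEAN_REGEX.matcher(line).replaceAll("").toLowerCase();
        for (String word : SPLIT_REGEX.split(cleaned)) {
            if (!word.isEmpty()) { // skip the empty token caused by leading whitespace
                stems.add(stemmer.apply(word));
            }
        }
    }

    public static void main(String[] args) {
        List<String> stems = new ArrayList<>();
        addStems("Cats chase 2 dogs!", TOY_STEMMER, stems);
        System.out.println(stems); // prints [cat, chase, dog]
    }
}
```

Because the last parameter is a `Collection<? super String>`, the same method can fill a list (keeping duplicates and order) or a sorted set (keeping unique stems only).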
static String clean (String text)
Cleans the text by removing any characters that are neither alphabetic nor whitespace (such as digits, punctuation, symbols, and diacritical marks like the umlaut) and converting the remaining characters to lowercase.
| text | the text to clean |
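The cleaning step can be sketched as follows. This is an illustration under one assumption not stated on this page: that the text is first normalized to Unicode NFD form, so an accented letter is decomposed into a base letter plus a combining mark that the pattern can then remove.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class CleanSketch {
    // Same expression as CLEAN_REGEX in the attribute list on this page.
    static final Pattern CLEAN_REGEX = Pattern.compile("(?U)[^\\p{Alpha}\\s]+");

    static String clean(String text) {
        // Assumption: decompose accented letters so their marks can be stripped.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Remove everything that is neither alphabetic nor whitespace.
        String lettersOnly = CLEAN_REGEX.matcher(decomposed).replaceAll("");
        return lettersOnly.toLowerCase();
    }

    public static void main(String[] args) {
        // \u00e9 is e with acute accent, \u00f6 is o with umlaut
        System.out.println("[" + clean("H\u00e9llo, W\u00f6rld 42!") + "]"); // prints [hello world ]
    }
}
```

Whitespace is left in place (note the trailing space in the output), so the cleaned text can still be split into words afterwards.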
static ArrayList< String > listStems (Path input) throws IOException
Reads a file line by line and parses each line into cleaned and stemmed words using the default stemmer for English.
| input | the input file to parse and stem |
| IOException | if unable to read or parse file |
static ArrayList< String > listStems (String line)
Parses the line into a list of cleaned and stemmed words using the default stemmer for English.
| line | the line of words to parse and stem |
static ArrayList< String > listStems (String line, Stemmer stemmer)
Parses the line into a list of cleaned and stemmed words.
| line | the line of words to clean, split, and stem |
| stemmer | the stemmer to use |
static ArrayList< TreeSet< String > > listUniqueStems (Path input) throws IOException
Reads a file line by line and parses each line into a set of unique, sorted, cleaned, and stemmed words using the default stemmer for English. The returned list holds one set per line of the file.
| input | the input file to parse and stem |
| IOException | if unable to read or parse file |
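The per-line behavior described above can be sketched as follows. The default English stemmer is not specified on this page, so a hypothetical `toyStem` helper stands in for it. The UTF-8 charset and the choice to emit an empty set for a blank line are also assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

class ListUniqueStemsSketch {
    // Patterns copied from the attribute list on this page.
    static final Pattern SPLIT_REGEX = Pattern.compile("(?U)\\s+");
    static final Pattern CLEAN_REGEX = Pattern.compile("(?U)[^\\p{Alpha}\\s]+");

    // Hypothetical stand-in for the default English stemmer: strips one trailing "s".
    static String toyStem(String word) {
        return word.length() > 1 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    static ArrayList<TreeSet<String>> listUniqueStems(Path input) throws IOException {
        ArrayList<TreeSet<String>> result = new ArrayList<>();
        // Read one line at a time instead of loading the whole file into memory.
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                TreeSet<String> stems = new TreeSet<>();
                String cleaned = CLEAN_REGEX.matcher(line).replaceAll("").toLowerCase();
                for (String word : SPLIT_REGEX.split(cleaned)) {
                    if (!word.isEmpty()) {
                        stems.add(toyStem(word));
                    }
                }
                result.add(stems); // assumption: blank lines contribute an empty set
            }
        }
        return result;
    }

    // Demonstration only: writes a small temporary file and parses it.
    static List<TreeSet<String>> demo() {
        try {
            Path temp = Files.createTempFile("stems", ".txt");
            try {
                Files.writeString(temp, "Dogs and cats\n\ncats, cats, CATS!\n");
                return listUniqueStems(temp);
            } finally {
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[and, cat, dog], [], [cat]]
    }
}
```

Keeping one `TreeSet` per line preserves the line structure of the file while still removing duplicates and sorting within each line.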
static String[] parse (String text)
Parses the text into an array of clean words.
| text | the text to clean and split |
Returns an array of String objects
static String[] split (String text)
Splits the supplied text on whitespace.
| text | the text to split |
Returns an array of String objects
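A rough sketch of splitting, and of parsing as cleaning followed by splitting. Two details are assumptions rather than documented behavior: empty tokens (for example from leading whitespace) are dropped so that blank input yields an empty array, and cleaning normalizes to NFD before applying the pattern.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitSketch {
    // Patterns copied from the attribute list on this page.
    static final Pattern SPLIT_REGEX = Pattern.compile("(?U)\\s+");
    static final Pattern CLEAN_REGEX = Pattern.compile("(?U)[^\\p{Alpha}\\s]+");

    static String[] split(String text) {
        // Assumption: discard empty tokens so blank text gives an empty array.
        return SPLIT_REGEX.splitAsStream(text)
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    static String clean(String text) {
        // Assumption: NFD normalization so diacritical marks can be removed.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return CLEAN_REGEX.matcher(decomposed).replaceAll("").toLowerCase();
    }

    // Parsing as described on this page: clean first, then split.
    static String[] parse(String text) {
        return split(clean(text));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(" a  b ")));
        // prints [a, b]
        System.out.println(Arrays.toString(parse("  Hello,   WORLD!\t42 times ")));
        // prints [hello, world, times]
    }
}
```

Cleaning before splitting matters: a token made only of digits or punctuation (such as `42`) disappears entirely instead of leaving an empty word behind.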
static TreeSet< String > uniqueStems (Path input) throws IOException
Reads a file line by line and parses each line into a set of unique, sorted, cleaned, and stemmed words using the default stemmer for English.
| input | the input file to parse and stem |
| IOException | if unable to read or parse file |
static TreeSet< String > uniqueStems (String line)
Parses the line into a set of unique, sorted, cleaned, and stemmed words using the default stemmer for English.
| line | the line of words to parse and stem |
static TreeSet< String > uniqueStems (String line, Stemmer stemmer)
Parses the line into a set of unique, sorted, cleaned, and stemmed words.
| line | the line of words to parse and stem |
| stemmer | the stemmer to use |
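To illustrate the difference between the list and set variants, here is a small sketch in which both delegate to one shared helper and differ only in the collection they fill. The `UnaryOperator<String>` stemmer and `TOY_STEMMER` are hypothetical stand-ins, since the `Stemmer` type is not defined on this page.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.TreeSet;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

class UniqueStemsSketch {
    // Patterns copied from the attribute list on this page.
    static final Pattern SPLIT_REGEX = Pattern.compile("(?U)\\s+");
    static final Pattern CLEAN_REGEX = Pattern.compile("(?U)[^\\p{Alpha}\\s]+");

    // Hypothetical stand-in for a real stemmer: strips one trailing "s".
    static final UnaryOperator<String> TOY_STEMMER =
            word -> word.length() > 1 && word.endsWith("s")
                    ? word.substring(0, word.length() - 1)
                    : word;

    static void addStems(String line, UnaryOperator<String> stemmer,
            Collection<? super String> stems) {
        String cleaned = CLEAN_REGEX.matcher(line).replaceAll("").toLowerCase();
        for (String word : SPLIT_REGEX.split(cleaned)) {
            if (!word.isEmpty()) {
                stems.add(stemmer.apply(word));
            }
        }
    }

    // Keeps every stem in the order it appears, duplicates included.
    static ArrayList<String> listStems(String line, UnaryOperator<String> stemmer) {
        ArrayList<String> stems = new ArrayList<>();
        addStems(line, stemmer, stems);
        return stems;
    }

    // Keeps each stem once, in sorted order.
    static TreeSet<String> uniqueStems(String line, UnaryOperator<String> stemmer) {
        TreeSet<String> stems = new TreeSet<>();
        addStems(line, stemmer, stems);
        return stems;
    }

    public static void main(String[] args) {
        String line = "The cats and the cat";
        System.out.println(listStems(line, TOY_STEMMER));   // prints [the, cat, and, the, cat]
        System.out.println(uniqueStems(line, TOY_STEMMER)); // prints [and, cat, the]
    }
}
```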
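static final Pattern CLEAN_REGEX = Pattern.compile("(?U)[^\\p{Alpha}\\s]+")
Regular expression that matches one or more characters that are neither alphabetic nor whitespace.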
static final Pattern SPLIT_REGEX = Pattern.compile("(?U)\\s+")
Regular expression that matches one or more whitespace characters.
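Both patterns begin with the embedded `(?U)` flag, which enables `Pattern.UNICODE_CHARACTER_CLASS`. The sketch below compares each pattern with a flagless copy to show the effect; the `ASCII_CLEAN` and `ASCII_SPLIT` patterns and the `strip` helper exist only for this comparison and are not part of the class.

```java
import java.util.regex.Pattern;

class UnicodeFlagSketch {
    // Patterns copied from the attribute list on this page.
    static final Pattern SPLIT_REGEX = Pattern.compile("(?U)\\s+");
    static final Pattern CLEAN_REGEX = Pattern.compile("(?U)[^\\p{Alpha}\\s]+");

    // Hypothetical flagless copies, for comparison only.
    static final Pattern ASCII_SPLIT = Pattern.compile("\\s+");
    static final Pattern ASCII_CLEAN = Pattern.compile("[^\\p{Alpha}\\s]+");

    // Removes every match of the pattern from the text.
    static String strip(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // \u00ef is i with diaeresis, \u00e9 is e with acute accent
        String text = "na\u00efve caf\u00e9 #1";
        System.out.println(strip(CLEAN_REGEX, text)); // accented letters are kept
        System.out.println(strip(ASCII_CLEAN, text)); // accented letters are removed

        // \u2003 is an em space, which only the Unicode-aware \s matches
        System.out.println(SPLIT_REGEX.split("a\u2003b").length); // prints 2
        System.out.println(ASCII_SPLIT.split("a\u2003b").length); // prints 1
    }
}
```

Without the flag, `\p{Alpha}` covers only ASCII letters and `\s` only the ASCII whitespace characters, so non-English text would lose letters and some spaces would not separate words.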