frej.fuzzy
Class Fuzzy

Object
  extended by frej.fuzzy.Fuzzy

public final class Fuzzy
extends Object

Class providing fuzzy string comparison. It is used in fuzzy regexp matching, but can also be used on its own for fuzzy string matching or fuzzy substring search. It is based on Damerau-Levenshtein distance, i.e. there are four types of "mistakes", each counting as 1 point (character deletion, character insertion, character replacement, swap of two adjacent characters). All methods are static, and so are the result fields, so any values you need should be read before starting a new matching or searching attempt.
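
The four kinds of "mistakes" can be illustrated with a small standalone sketch. It is not this class's implementation: the class name DamerauLevenshteinSketch and its distance method are hypothetical, and the sketch assumes the common "optimal string alignment" variant of the distance, which the description above does not specify.

```java
// Hypothetical illustration only; not part of frej.fuzzy.
final class DamerauLevenshteinSketch {

    private DamerauLevenshteinSketch() {
    }

    /**
     * Counts the mistakes needed to turn source into pattern, 1 point each:
     * deletion, insertion, replacement, swap of two adjacent characters.
     * Assumption: optimal string alignment variant (a swapped pair is not edited again).
     */
    static int distance(CharSequence source, CharSequence pattern) {
        int n = source.length();
        int m = pattern.length();
        // d[i][j] = distance between the first i chars of source and the first j chars of pattern
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // insert all j characters
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = source.charAt(i - 1) == pattern.charAt(j - 1) ? 0 : 1;
                int best = Math.min(d[i - 1][j] + 1,         // deletion
                           Math.min(d[i][j - 1] + 1,         // insertion
                                    d[i - 1][j - 1] + cost)); // replacement or exact match
                boolean swapped = i > 1 && j > 1
                        && source.charAt(i - 1) == pattern.charAt(j - 2)
                        && source.charAt(i - 2) == pattern.charAt(j - 1);
                if (swapped) {
                    best = Math.min(best, d[i - 2][j - 2] + 1); // swap of two adjacent chars
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }
}
```

With this sketch, "wrod" against "word" costs a single point (one swap), where plain Levenshtein distance would count two replacements.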

Author:
Rodion Gorkovenko

Field Summary
static String matchedPattern
          keeps best matched string after matching against a list of strings
static double result
          "distance" of last match (roughly the mistake count divided by length)
static int resultEnd
          keeps ending position of matched region after substring search
static int resultIndex
          keeps index of best match after matching against a list of strings
static int resultStart
          keeps starting position of matched region after substring search
static double threshold
          if result of match is higher than threshold, boolean methods return "false"
 
Constructor Summary
Fuzzy()
           
 
Method Summary
static double bestEqual(String string, Object patterns, boolean equality)
          Given a list or array of strings, searches for the one that best matches either the whole original string (equality=true) or some substring of it (equality=false).
static double containability(CharSequence source, CharSequence pattern)
          Core method for searching substring.
static boolean containsOneOf(CharSequence source, CharSequence... patterns)
          Tests whether any of "patterns" is present in "source" as a substring.
static boolean equals(CharSequence source, CharSequence pattern)
          Tests whether "source" matches "pattern".
static double similarity(CharSequence source, CharSequence pattern)
          Core method for measuring Damerau-Levenshtein distance between two strings.
static int substrEnd(CharSequence source, CharSequence pattern)
          Tries to find substring "pattern" in the "source" and if successful, returns the position of the end of the match.
static int substrStart(CharSequence source, CharSequence pattern)
          Tries to find substring "pattern" in the "source" and if successful, returns the position of the beginning of the match.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

resultStart

public static int resultStart
keeps starting position of matched region after substring search


resultEnd

public static int resultEnd
keeps ending position of matched region after substring search


resultIndex

public static int resultIndex
keeps index of best match after matching against a list of strings


matchedPattern

public static String matchedPattern
keeps best matched string after matching against a list of strings


result

public static double result
"distance" of last match (roughly the mistake count divided by length)


threshold

public static double threshold
if result of match is higher than threshold, boolean methods return "false"

Constructor Detail

Fuzzy

public Fuzzy()

Method Detail

substrStart

public static int substrStart(CharSequence source,
                              CharSequence pattern)
Tries to find substring "pattern" in the "source" and if successful, returns the position of the beginning of the match.

Returns:
position of found substring (0 .. source.length() - 1) or (-1) if substring was not found (with given threshold).

substrEnd

public static int substrEnd(CharSequence source,
                            CharSequence pattern)
Tries to find substring "pattern" in the "source" and if successful, returns the position of the end of the match.

Returns:
position of found substring end (0 .. source.length() - 1) or (-1) if substring was not found (with given threshold).

equals

public static boolean equals(CharSequence source,
                             CharSequence pattern)
Tests whether "source" matches "pattern".

Returns:
true or false depending on match quality.

containsOneOf

public static boolean containsOneOf(CharSequence source,
                                    CharSequence... patterns)
Tests whether any of "patterns" is present in "source" as a substring. Stops on the first good match.

Returns:
true or false depending on whether any of pattern search succeeds.

containability

public static double containability(CharSequence source,
                                    CharSequence pattern)
Core method for searching for a substring. Finds the region for which the Damerau-Levenshtein distance is minimal.

Returns:
normalized best distance (i.e. distance / pattern.length())
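
One way to find the region with minimal distance is a semi-global variant of the edit-distance table, in which a match may start at any position of the source for free. The sketch below is an assumption about how such a search could look, not the code of this method; the class FuzzySubstringSketch and its method name are hypothetical, and treating an empty pattern as a perfect match is likewise an assumption.

```java
// Hypothetical illustration only; not part of frej.fuzzy.
final class FuzzySubstringSketch {

    private FuzzySubstringSketch() {
    }

    /**
     * Returns the smallest number of mistakes between pattern and any region
     * of source, divided by pattern.length().
     */
    static double bestNormalizedDistance(CharSequence source, CharSequence pattern) {
        int n = source.length();
        int m = pattern.length();
        if (m == 0) {
            return 0.0; // assumption: an empty pattern matches anywhere
        }
        // d[i][j] = fewest mistakes between the first i chars of pattern
        // and some region of source that ends at position j
        int[][] d = new int[m + 1][n + 1];
        // Row 0 stays all zeros: the matched region may start anywhere in source.
        for (int i = 1; i <= m; i++) {
            d[i][0] = i; // pattern prefix against an empty region
        }
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = pattern.charAt(i - 1) == source.charAt(j - 1) ? 0 : 1;
                int best = Math.min(d[i - 1][j] + 1,
                           Math.min(d[i][j - 1] + 1,
                                    d[i - 1][j - 1] + cost));
                boolean swapped = i > 1 && j > 1
                        && pattern.charAt(i - 1) == source.charAt(j - 2)
                        && pattern.charAt(i - 2) == source.charAt(j - 1);
                if (swapped) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        // The region may end anywhere, so take the minimum over the last row.
        int min = d[m][0];
        for (int j = 1; j <= n; j++) {
            min = Math.min(min, d[m][j]);
        }
        return (double) min / m;
    }
}
```

For example, searching for "world" in "hello wrold" finds a region one swap away, giving 1 / 5 = 0.2.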

bestEqual

public static double bestEqual(String string,
                               Object patterns,
                               boolean equality)
Given a list or array of strings, searches for the one that best matches either the whole original string (equality=true) or some substring of it (equality=false).

Returns:
best match result (normalized distance).
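
The selection step can be pictured as a loop that keeps the candidate with the lowest normalized distance, together with its index and text. The sketch below is hypothetical: instead of the Object patterns and boolean equality parameters of this method, it takes a List and a caller-supplied scoring function, and it returns the outcome as an object rather than storing it in static fields.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration only; not part of frej.fuzzy.
final class BestMatchSketch {

    /** Index, text and score of the best candidate; index is -1 if the list was empty. */
    static final class Outcome {
        final int index;
        final String pattern;
        final double score;

        Outcome(int index, String pattern, double score) {
            this.index = index;
            this.pattern = pattern;
            this.score = score;
        }
    }

    private BestMatchSketch() {
    }

    /**
     * Scores every pattern against string and keeps the lowest score.
     * The scorer stands in for a whole-string or substring distance.
     */
    static Outcome best(String string, List<String> patterns,
                        ToDoubleBiFunction<CharSequence, CharSequence> scorer) {
        int bestIndex = -1;
        String bestPattern = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int i = 0; i < patterns.size(); i++) {
            double score = scorer.applyAsDouble(string, patterns.get(i));
            if (score < bestScore) { // strict comparison: the first of equal candidates wins
                bestIndex = i;
                bestPattern = patterns.get(i);
                bestScore = score;
            }
        }
        return new Outcome(bestIndex, bestPattern, bestScore);
    }
}
```

Returning an object avoids the shared static state, at the cost of differing from the field-based style this class uses.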

similarity

public static double similarity(CharSequence source,
                                CharSequence pattern)
Core method for measuring Damerau-Levenshtein distance between two strings.

Returns:
normalized distance (distance / average(source.length(), pattern.length()))
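
As a worked illustration of the normalization, the sketch below divides a raw mistake count by the average of the two lengths and compares the outcome with a threshold. The class and method names are hypothetical, and returning 0 for two empty strings is an assumption; the comparison follows the rule stated for the threshold field (a result higher than the threshold means "false").

```java
// Hypothetical illustration only; not part of frej.fuzzy.
final class NormalizedDistanceSketch {

    private NormalizedDistanceSketch() {
    }

    /** distance / average(sourceLength, patternLength) */
    static double normalize(int distance, int sourceLength, int patternLength) {
        double average = (sourceLength + patternLength) / 2.0;
        // Assumption: two empty strings are treated as identical (distance 0).
        return average == 0.0 ? 0.0 : distance / average;
    }

    /** A match is accepted unless its normalized distance is higher than the threshold. */
    static boolean withinThreshold(double normalized, double threshold) {
        return normalized <= threshold;
    }
}
```

For instance, one mistake between two 4-character strings gives 1 / 4 = 0.25, which a threshold of 0.3 would accept and a threshold of 0.2 would reject.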