frej
Class Regex

Object
  extended by frej.Regex

public final class Regex
extends Object

Represents a fuzzy regular expression as a whole. The pattern is passed to the constructor as a string. Any string can then be checked against the regexp using the match, matchFromStart, or presentInSequence methods. After a match, the replacement for the matched region is available via the getReplacement method. A few auxiliary methods give access to parts of the original string and to the match result.

Author:
Rodion Gorkovenko

Constructor Summary
Regex(String pattern)
          Creates a new regular expression (built as a tree of elements) from the given pattern.
Regex(String pattern, double threshold, String punctuators)
          Creates a new regular expression from the given pattern, also setting the threshold value and the allowed punctuation marks.
 
Method Summary
 int getMatchEnd()
          Returns the character position, in the string that was matched, where the last match ends (i.e. the position immediately after the last character of the matched region).
 double getMatchResult()
          Returns the result of the last match.
 int getMatchStart()
          Returns the character position, in the string that was matched, at which the match starts.
 String getReplacement()
          Returns the replacement string generated after a successful match, according to the rules specified in the regexp pattern.
 double getThreshold()
          Returns the threshold used by the matching methods to decide whether a match result signifies a match or a mismatch.
 boolean match(String seq)
          Checks whether the given string matches this regexp with all of its tokens.
 int matchedTokenCount()
          Returns the number of tokens in the matched region (mainly useful when the pattern contains optional elements).
 boolean matchFromStart(String seq)
          Checks whether this regexp matches the beginning of the given sequence.
 String pattern()
          Reconstructs the pattern from which this regexp was created.
 String prefix()
          Returns the part of the matched string that precedes the matched region.
 int presentInSequence(String seq)
          Checks whether this regexp matches any subsequence of the given string.
 String setAllowedPunctuationMarks(String punct)
          Sets which punctuation marks are allowed within tokens. By default only slash and dash are allowed, i.e. punct = "/-".
 void setThreshold(double t)
          Sets the threshold used by the matching methods to decide whether a match result signifies a match or a mismatch.
 String suffix()
          Returns the part of the matched string that follows the matched region.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Regex

public Regex(String pattern)
Creates a new regular expression (built as a tree of elements) from the given pattern. Behavior is undefined if the pattern is malformed.


Regex

public Regex(String pattern,
             double threshold,
             String punctuators)
Creates a new regular expression from the given pattern, also setting the threshold value and the allowed punctuation marks.

Method Detail

presentInSequence

public int presentInSequence(String seq)
Checks whether this regexp matches any subsequence of the given string.

Returns:
the number of the token at which the best match starts, or -1 if no match is good enough.

match

public boolean match(String seq)
Checks whether the given string matches this regexp with all of its tokens.

Returns:
true or false, depending on the quality of the best matching variant.

matchFromStart

public boolean matchFromStart(String seq)
Checks whether this regexp matches the beginning of the given sequence.

Returns:
true or false, depending on the quality of the best match.

getMatchResult

public double getMatchResult()
Returns the result of the last match. The result is closely tied to the "distance" between the strings being fuzzy matched: roughly, the number of dissimilarities divided by the length of the matched region. For example, matching "Free" against "Frej" gives 0.25, while matching "Bold" against "Frej" gives 1.0.

Returns:
a measure of dissimilarity; 0 means an exact match.
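
The two figures quoted above can be reproduced with a small stand-alone sketch. It assumes the "count of dissimilarities" behaves like a plain Levenshtein edit distance and that the count is divided by the length of the longer string; both are assumptions made for illustration, and the class and method names are hypothetical rather than part of frej.

```java
// Illustrative sketch only: plain Levenshtein distance, not taken from the frej sources.
final class DissimilaritySketch {

    // Classic dynamic-programming edit distance (insertions, deletions, substitutions).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: edit count divided by the length of the longer string.
    static double dissimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 0.0; // two empty strings are treated as identical
        }
        return (double) editDistance(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(dissimilarity("Free", "Frej")); // prints 0.25
        System.out.println(dissimilarity("Bold", "Frej")); // prints 1.0
    }
}
```

"Free" and "Frej" differ in one character out of four, while "Bold" and "Frej" share none, which gives the 0.25 and 1.0 from the description.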

getReplacement

public String getReplacement()
Returns the replacement string generated after a successful match, according to the rules specified in the regexp pattern.

Returns:
replacement as a string.

getMatchStart

public int getMatchStart()
Returns the character position, in the string that was matched, at which the match starts.

Returns:
position, as an integer in the range 0 .. seq.length() - 1

getMatchEnd

public int getMatchEnd()
Returns the character position, in the string that was matched, where the last match ends (i.e. the position immediately after the last character of the matched region).

Returns:
position, as an integer in the range 0 .. seq.length() (the end position is exclusive, so it can equal the length of seq)
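
Because the start position is inclusive and the end position follows the last matched character, the pair can be passed directly to String.substring. The sketch below uses only the standard library; the start and end values are hypothetical stand-ins for what getMatchStart() and getMatchEnd() might return, and the class name is invented for illustration.

```java
// Illustrative sketch: cutting the matched region out of the original string.
final class MatchRegionSketch {

    // Returns the text from start (inclusive) to end (exclusive).
    static String region(String seq, int start, int end) {
        return seq.substring(start, end);
    }

    public static void main(String[] args) {
        String seq = "call John Smith now";
        int start = 5; // hypothetical value of getMatchStart()
        int end = 15;  // hypothetical value of getMatchEnd()
        System.out.println(region(seq, start, end)); // prints "John Smith"
    }
}
```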

pattern

public String pattern()
Reconstructs the pattern from which this regexp was created.

Returns:
string representation of pattern.

matchedTokenCount

public int matchedTokenCount()
Returns the number of tokens in the matched region (mainly useful when the pattern contains optional elements).

Returns:
token count.

prefix

public String prefix()
Returns the part of the matched string that precedes the matched region. The string is trimmed of spaces, since spaces are token delimiters.

Returns:
the beginning of the seq passed to presentInSequence, etc.

suffix

public String suffix()
Returns the part of the matched string that follows the matched region. The string is trimmed of spaces, since spaces are token delimiters.

Returns:
the ending of the seq passed to presentInSequence, etc.
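
The relationship between the trimmed prefix and suffix and the match positions can be pictured with the following stand-alone sketch. It assumes that prefix() and suffix() correspond to the text before the match start and after the match end; the class, the helper methods, and the position values are hypothetical and are not part of frej.

```java
// Illustrative sketch: trimmed text before and after a matched region.
final class PrefixSuffixSketch {

    // Text preceding the match, with surrounding whitespace removed.
    static String prefix(String seq, int matchStart) {
        return seq.substring(0, matchStart).trim();
    }

    // Text following the match, with surrounding whitespace removed.
    static String suffix(String seq, int matchEnd) {
        return seq.substring(matchEnd).trim();
    }

    public static void main(String[] args) {
        String seq = "call John Smith now";
        // Hypothetical match covering "John Smith" (positions 5 to 15).
        System.out.println(prefix(seq, 5));  // prints "call"
        System.out.println(suffix(seq, 15)); // prints "now"
    }
}
```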

setAllowedPunctuationMarks

public String setAllowedPunctuationMarks(String punct)
Sets which punctuation marks are allowed within tokens. By default only slash and dash are allowed, i.e. punct = "/-".


getThreshold

public double getThreshold()
Returns the threshold used by the matching methods to decide whether a match result signifies a match or a mismatch. Defaults to frej.Fuzzy.threshold.


setThreshold

public void setThreshold(double t)
Sets the threshold used by the matching methods to decide whether a match result signifies a match or a mismatch. Defaults to frej.Fuzzy.threshold.
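
A minimal sketch of the decision the threshold controls. Since the match result is a dissimilarity measure where 0 is exact, the sketch assumes that a result strictly below the threshold counts as a match; this page does not confirm whether the comparison is strict. The class name and the 0.3 threshold are hypothetical and do not reflect the actual value of frej.Fuzzy.threshold.

```java
// Illustrative sketch: turning a dissimilarity score into a match/mismatch decision.
final class ThresholdSketch {

    // Assumed rule: lower scores are better, and a score below the threshold is a match.
    static boolean isMatch(double matchResult, double threshold) {
        return matchResult < threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.3; // hypothetical value, chosen for illustration only
        System.out.println(isMatch(0.25, threshold)); // prints true
        System.out.println(isMatch(1.0, threshold));  // prints false
    }
}
```

Under this assumption, lowering the threshold makes matching stricter and raising it makes matching more tolerant of typos.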