Saturday, February 18, 2012

Implementing a method in a trie for a dictionary?

String[] WordDictionary.findClosest(String str, int dist, int max)

This method will return up to max words that are close to the spelling of str. The words you return may differ in at most dist characters from str.



How would I do this in Java?

Thanks

Well, depending on the dictionary (English has over 100,000 words), I would definitely consider filtering out as many words as I could, as fast as I could.



Now here is a question: is "workbench" a valid match for "bench" if dist is 4 or more? Or do the words need to be the same length, sort of like Mastermind (you have 3 in the correct position and correct color and 2 in the correct color but incorrect position, yet you always have 5 tokens in each guess)?



Filter out words that are shorter than str.length()-dist or longer than str.length()+dist, i.e. when looking for words like "bench" with a dist of 3, "workbench" should not be returned since it is 4 letters off. This may be different depending on what the assignment is supposed to be.
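
A minimal sketch of that length filter in Java, just to make the idea concrete. The class and method names (LengthFilter, filterByLength) are made up for illustration and are not part of the assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the name and signature are illustrative only.
class LengthFilter {

    // Keeps only the words whose length is within dist of str's length.
    static List<String> filterByLength(String[] dictionary, String str, int dist) {
        List<String> candidates = new ArrayList<>();
        int min = str.length() - dist;
        int max = str.length() + dist;
        for (String word : dictionary) {
            if (word.length() >= min && word.length() <= max) {
                candidates.add(word);
            }
        }
        return candidates;
    }
}
```

With "bench" and a dist of 3 the allowed lengths are 2 to 8, so "workbench" (9 letters) is dropped before any letter-by-letter comparison happens.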



One algorithm would be brute force:

for (int i = 0; i < totalwords; i++) {
    // filter based on whether the length is still within range (may need to be the same length)
    if (str.length() - dist <= dictionary[i].length() && dictionary[i].length() <= str.length() + dist) {
        // still a candidate

        // compare each letter in the current dictionary word with the given word and
        // increment a counter of exact matches (sort of like Mastermind: correct color, correct position)
        if (str.length() - dist <= count) {
            // the current dictionary word is a good word, add it to the list
        }

        // Now, if "bench" could return "workbench" with a dist of 4 or more,
        // you would need 2 nested for loops and test:
        //   does b match w, no
        //   does b match o, no
        //   does b match r, no
        //   does b match k, no
        //   does b match b, yes -> increment first word index
        //   does e match e, yes -> increment first word index
        //   does n match n, yes
        //   etc.
        // Count the number of matches found and do the same as above.
    }
}
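
Here is a rough, runnable version of the brute-force idea. It assumes the stricter reading of the assignment (only same-length words count, and dist is the number of positions that differ). Only the findClosest signature comes from the question; the class name BruteForceDictionary and the two helper methods are my own invention. The second helper does the "does b match w, no ... does b match b, yes" walk with two indices in a single pass instead of nested loops.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; only the findClosest signature comes from the question.
class BruteForceDictionary {
    private final String[] dictionary;

    BruteForceDictionary(String[] dictionary) {
        this.dictionary = dictionary;
    }

    // Counts positions where both words hold the same letter
    // (Mastermind style: correct letter in the correct position).
    static int countPositionalMatches(String a, String b) {
        int count = 0;
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    // Walks through word once and counts how many leading letters of str
    // can be found in it, in order.
    static int countInOrderMatches(String str, String word) {
        int si = 0;
        for (int wi = 0; wi < word.length() && si < str.length(); wi++) {
            if (str.charAt(si) == word.charAt(wi)) {
                si++; // matched this letter, move on to the next letter of str
            }
        }
        return si;
    }

    String[] findClosest(String str, int dist, int max) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < dictionary.length && result.size() < max; i++) {
            String word = dictionary[i];
            // Assumption: only same-length words are compared position by position.
            if (word.length() != str.length()) {
                continue;
            }
            int mismatches = str.length() - countPositionalMatches(str, word);
            if (mismatches <= dist) {
                result.add(word);
            }
        }
        return result.toArray(new String[0]);
    }
}
```

If the assignment does allow "workbench" for "bench", the same loop could accept a word when countInOrderMatches equals str.length() and the length difference is at most dist.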



The other option, if you like regexes, is to build a pattern from the word. For "bench" it would be something like

"^(.*)([b])(.*)([e])(.*)([n])(.*)([c])…"

I'm not certain of the exact wildcard: "(.+)" requires at least one letter in every gap, so "(.*)" is probably what you want. Then test each wildcard group, and if the wildcard letters total more than dist, that dictionary word is not a valid word.

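A sketch of how the regex idea might look with java.util.regex, assuming "(.*)" wildcards between the letters. RegexCloseness and its methods are hypothetical names. Each letter goes through Pattern.quote so that a character with special meaning in a regex is matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; illustrates the wildcard-group idea only.
class RegexCloseness {

    // For "bench" this builds, conceptually, ^(.*)b(.*)e(.*)n(.*)c(.*)h(.*)$
    // with every letter wrapped by Pattern.quote.
    static Pattern buildPattern(String str) {
        StringBuilder sb = new StringBuilder("^(.*)");
        for (int i = 0; i < str.length(); i++) {
            sb.append(Pattern.quote(String.valueOf(str.charAt(i)))).append("(.*)");
        }
        sb.append('$');
        return Pattern.compile(sb.toString());
    }

    // Returns the total number of letters captured by the wildcard groups,
    // or -1 if the word does not match the pattern at all.
    static int extraLetters(Pattern pattern, String word) {
        Matcher m = pattern.matcher(word);
        if (!m.matches()) {
            return -1;
        }
        int extra = 0;
        for (int g = 1; g <= m.groupCount(); g++) {
            extra += m.group(g).length();
        }
        return extra;
    }

    static boolean isClose(String str, String word, int dist) {
        int extra = extraLetters(buildPattern(str), word);
        return extra >= 0 && extra <= dist;
    }
}
```

Note that this only catches words that contain all the letters of str in order with extra letters around them, so "workbench" matches "bench" with 4 extra letters but "beach" does not match at all. Whether that is the right notion of "close" depends on the assignment.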




You mentioned the data is in a tree. How is the tree organized? That may change the algorithm as well, since it determines which branches you would follow to test the words.
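
For what it's worth, here is a sketch of how the search could use the tree, assuming it is an ordinary trie with one child node per letter and assuming dist means "number of differing positions among same-length words". The class TrieWordDictionary, its Node layout, and the add method are guesses at the structure, not the actual WordDictionary from the assignment. The point is that a whole branch can be abandoned as soon as it has used up its allowed mismatches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical trie layout; the real WordDictionary may be organized differently.
class TrieWordDictionary {
    private static class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    String[] findClosest(String str, int dist, int max) {
        List<String> result = new ArrayList<>();
        search(root, str, 0, dist, max, new StringBuilder(), result);
        return result.toArray(new String[0]);
    }

    // Depth-first walk; remaining is how many more mismatches this branch may use.
    private void search(Node node, String str, int depth, int remaining, int max,
                        StringBuilder prefix, List<String> result) {
        if (result.size() >= max) {
            return; // already collected enough words
        }
        if (depth == str.length()) {
            if (node.isWord) {
                result.add(prefix.toString());
            }
            return; // assumption: longer words are not considered
        }
        for (Map.Entry<Character, Node> entry : node.children.entrySet()) {
            char c = entry.getKey();
            int cost = (c == str.charAt(depth)) ? 0 : 1;
            if (cost > remaining) {
                continue; // prune: this branch already differs in too many letters
            }
            prefix.append(c);
            search(entry.getValue(), str, depth + 1, remaining - cost, max, prefix, result);
            prefix.setLength(prefix.length() - 1); // backtrack
        }
    }
}
```

Because the children are kept in a TreeMap, the words come back in alphabetical order, and the search stops descending once max words have been collected.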