Soundex is an algorithm that the U.S. National Archives uses to
consolidate different spellings of surnames in census records dating
back to the 19th century. Soundex allows phonetic matches,
meaning that a word is matched on how it sounds rather than how it is
spelled. For example, Smith and Smyth sound similar and map to
the same Soundex code, "S530".
A Soundex code is made up of the first letter of the word followed by
three numbers. These three numbers are decided on using the table below.
B,P,F,V = 1
C,S,G,J,K,Q,X,Z = 2
D,T = 3
L = 4
M,N = 5
R = 6
The letters A,E,I,O,U,Y,H,W and other characters are not coded.
The steps to Soundex coding are:
- Take the first letter in the word and make it the first letter of
the Soundex code.
- For each remaining letter in the word, translate it to a number
with the table above and concatenate the numbers, preserving order,
into the Soundex code.
If two or more letters with the same code appear next to one another,
add only one of them to the code.
- Trim the code if it is over four characters long. If it is
under four characters long, append zeros until it is four characters long.
Below is the Soundex algorithm implemented in Visual Basic.
Function SoundEx(sWord As String) As String
    Dim Num As String ' Holds the generated code
    Dim sChar As String ' Code of the current letter
    Dim lWordLength As Long
    Dim sLastCode As String ' Code of the first letter, then of the last code added
    Dim I As Long

    Num = UCase(Mid$(sWord, 1, 1)) ' Get the first letter
    sLastCode = GetSoundCodeNumber(Num)
    lWordLength = Len(sWord)

    ' Create the code starting at the second letter.
    For I = 2 To lWordLength
        sChar = GetSoundCodeNumber(UCase(Mid$(sWord, I, 1)))
        ' If two letters with the same code are next to each other
        ' only count one of them
        If Len(sChar) > 0 And sLastCode <> sChar Then
            Num = Num & sChar
            sLastCode = sChar
        End If
    Next I

    SoundEx = Mid$(Num, 1, 4) ' Make sure the code isn't longer than 4 characters
    If Len(Num) < 4 Then ' Make sure the code is at least 4 characters long
        SoundEx = SoundEx & String(4 - Len(Num), "0")
    End If
End Function

Private Function GetSoundCodeNumber(sChar As String) As String
    Select Case sChar
        Case "B", "F", "P", "V"
            GetSoundCodeNumber = "1"
        Case "C", "G", "J", "K", "Q", "S", "X", "Z"
            GetSoundCodeNumber = "2"
        Case "D", "T"
            GetSoundCodeNumber = "3"
        Case "L"
            GetSoundCodeNumber = "4"
        Case "M", "N"
            GetSoundCodeNumber = "5"
        Case "R"
            GetSoundCodeNumber = "6"
        Case Else
            ' A, E, I, O, U, Y, H, W and other characters are not coded
            GetSoundCodeNumber = ""
    End Select
End Function
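As a quick check of the listing above, the words mentioned in this article can be run through the function. This is only a small usage sketch: it assumes the SoundEx function above is in the same module, and DemoSoundEx is a made-up name for illustration. The results appear in the Immediate window.

```vb
' Usage sketch: assumes SoundEx (listed above) is in the same module.
' DemoSoundEx is a hypothetical helper, not part of the original sample.
Public Sub DemoSoundEx()
    Debug.Print SoundEx("Smith")      ' S530
    Debug.Print SoundEx("Smyth")      ' S530
    Debug.Print SoundEx("phonetic")   ' P532
    Debug.Print SoundEx("phoneetic")  ' P532
End Sub
```

Both spellings of each word give the same code, which is the property the spell checker relies on.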
The Spell Checker
The spell checker works by loading a list of words from a
file, calculating the Soundex code for each word and using this as the
key to an item in a Visual Basic collection. The item in the
collection that the key points to is another collection. This
second collection holds the words that match the generated key.
For example, the item with the key "P532" in the main
collection contains a collection of the dictionary words that share that code.
The key "P532" also maps to "phoneetic", a
misspelled word, so looking up the code of the misspelling returns the
correctly spelled candidates. That is how the spell checker works.
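The collection-of-collections structure described above could be sketched roughly as follows. This is not the code from the sample project: the names mIndex, AddWord, LoadDictionary and Candidates are made up for illustration, the word file is assumed to hold one word per line, and the SoundEx function from earlier in the article is assumed to be in the same project.

```vb
' Sketch only: hypothetical names, assumes SoundEx() from this article.
' Main collection: key = Soundex code, item = Collection of words.
Private mIndex As New Collection

' Adds one dictionary word to the collection stored under its Soundex code.
Private Sub AddWord(sWord As String)
    Dim sKey As String
    Dim colWords As Collection

    sKey = SoundEx(sWord)

    ' Looking up a key that does not exist raises an error,
    ' so colWords simply stays Nothing in that case.
    On Error Resume Next
    Set colWords = mIndex.Item(sKey)
    On Error GoTo 0

    If colWords Is Nothing Then
        Set colWords = New Collection
        mIndex.Add colWords, sKey
    End If
    colWords.Add sWord
End Sub

' Loads the word list (assumed format: one word per line).
Public Sub LoadDictionary(sPath As String)
    Dim iFile As Integer
    Dim sLine As String

    iFile = FreeFile
    Open sPath For Input As #iFile
    Do While Not EOF(iFile)
        Line Input #iFile, sLine
        sLine = Trim$(sLine)
        If Len(sLine) > 0 Then AddWord sLine
    Loop
    Close #iFile
End Sub

' Returns the words sharing the Soundex code of sWord,
' or Nothing if no dictionary word has that code.
Public Function Candidates(sWord As String) As Collection
    On Error Resume Next
    Set Candidates = mIndex.Item(SoundEx(sWord))
    On Error GoTo 0
End Function
```

With an index built this way, Candidates("phoneetic") would return the collection stored under "P532", provided the dictionary contains at least one word with that code.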
After we use the Soundex code to return a list of possible words, the
program automatically selects the word that has the most characters in
common with the word being checked. See the screenshot below.
Spell Checker Application.
Spell Checker in VB 6.0.
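The selection step, picking the candidate with the most characters in common, might look something like the sketch below. The article does not spell out exactly how characters are counted, so this is one possible reading: each character of the candidate is matched at most once, ignoring case and position. CommonChars and BestMatch are made-up names for illustration.

```vb
' Sketch only: one possible way to count "characters in common".
' Each character of sB can be matched at most once; case is ignored.
Private Function CommonChars(sA As String, sB As String) As Long
    Dim sRest As String
    Dim lPos As Long
    Dim I As Long

    sRest = LCase$(sB)
    For I = 1 To Len(sA)
        lPos = InStr(1, sRest, LCase$(Mid$(sA, I, 1)))
        If lPos > 0 Then
            CommonChars = CommonChars + 1
            ' Remove the matched character so it is not counted twice
            sRest = Left$(sRest, lPos - 1) & Mid$(sRest, lPos + 1)
        End If
    Next I
End Function

' Returns the candidate with the highest score, or "" if there are none.
' On a tie, the first candidate with that score is kept.
Public Function BestMatch(sWord As String, colWords As Collection) As String
    Dim vWord As Variant
    Dim lScore As Long
    Dim lBest As Long

    If colWords Is Nothing Then Exit Function

    lBest = -1
    For Each vWord In colWords
        lScore = CommonChars(sWord, CStr(vWord))
        If lScore > lBest Then
            lBest = lScore
            BestMatch = CStr(vWord)
        End If
    Next vWord
End Function
```

Because this count ignores letter order, it is a fairly crude measure; it only has to rank the handful of words that already share a Soundex code with the word being checked.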
The dictionary included with this sample isn't complete, so if the
program can't guess the spelling, check whether dictionary.txt contains the word.
To add spell checking to your applications, you could use the Spell
Checker DLL by VB Web. It is free and is based on the code in this
article. The dictionary included with it now has 5900 entries.