Spell checking using SoundEx

Soundex is a phonetic algorithm, first patented in 1918, that the U.S. National Archives uses to consolidate different spellings of surnames in census records.  Soundex allows phonetic matches, meaning that a word is matched on how it sounds rather than how it is spelled.  For example, Smith and Smyth sound alike and both map to the same Soundex code, "S530".

A Soundex code is made up of the first letter of the word followed by three digits.

These three digits are assigned using the table below.

B,P,F,V = 1
C,S,G,J,K,Q,X,Z = 2
D,T = 3
L = 4
M,N = 5
R = 6

The letters A,E,I,O,U,Y,H,W and other characters are not coded.

The steps to Soundex coding are:

  1. Take the first letter in the word and make it the first letter of the Soundex code.
  2. For each remaining letter in the word, translate it to a digit using the table above and append the digits to the Soundex code in order.  Letters that are not coded are skipped.
    If two or more letters with the same code appear next to one another, add only one of them to the code.  This also applies to a letter that directly follows the first letter and has the same code as it.
  3. Truncate the code if it is over four characters long.  If it is under four characters long, append zeros until it is four characters long.

Soundex algorithm

Below is the Soundex algorithm implemented in Visual Basic.

Function SoundEx(sWord As String) As String
    Dim Num As String ' Holds the generated code
    Dim sChar As String ' Code digit for the current letter ("" if not coded)
    Dim sLastCode As String ' Code digit of the previous letter
    Dim lWordLength As Long
    Dim I As Long

    Num = UCase(Mid$(sWord, 1, 1)) ' Get the first letter
    sLastCode = GetSoundCodeNumber(Num)
    lWordLength = Len(sWord)
    ' Create the code starting at the second letter.
    For I = 2 To lWordLength
        sChar = GetSoundCodeNumber(UCase(Mid$(sWord, I, 1)))
        ' If two adjacent letters have the same code
        ' only count one of them
        If Len(sChar) > 0 And sLastCode <> sChar Then
            Num = Num & sChar
        End If
        sLastCode = sChar
    Next I
    SoundEx = Mid$(Num, 1, 4) ' Make sure the code isn't longer than 4 characters
    If Len(Num) < 4 Then ' Make sure the code is at least 4 characters long
        SoundEx = SoundEx & String(4 - Len(Num), "0")
    End If
End Function

' Returns the Soundex digit for a single uppercase letter,
' or an empty string for letters that are not coded.
Private Function GetSoundCodeNumber(sChar As String) As String
    Select Case sChar
        Case "B", "F", "P", "V"
            GetSoundCodeNumber = "1"
        Case "C", "G", "J", "K", "Q", "S", "X", "Z"
            GetSoundCodeNumber = "2"
        Case "D", "T"
            GetSoundCodeNumber = "3"
        Case "L"
            GetSoundCodeNumber = "4"
        Case "M", "N"
            GetSoundCodeNumber = "5"
        Case "R"
            GetSoundCodeNumber = "6"
    End Select
End Function
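
As a quick check, the function can be called from the Immediate window or a small test routine. The routine name DemoSoundEx below is only an illustration and is not part of the download; the codes in the comments were worked out by hand from the table above.

```vb
' Hypothetical test routine: prints the Soundex codes for a few words.
Sub DemoSoundEx()
    Debug.Print SoundEx("Smith")     ' S530
    Debug.Print SoundEx("Smyth")     ' S530
    Debug.Print SoundEx("phonetic")  ' P532
    Debug.Print SoundEx("phoneetic") ' P532
End Sub
```

In "phonetic" the H and the vowels are not coded, so only N (5), T (3) and C (2) contribute digits after the leading P.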

The Spell Checker

The spell checker works by loading a list of words from a file, calculating the Soundex code for each word and using that code as the key to an item in a Visual Basic Collection.  The item that the key points to is itself a Collection, and this second Collection holds all the words that produce that key.
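
The following is a minimal sketch of how such an index could be built; it is not the code from the download. The names AddWord and LoadDictionary are made up for this example, it assumes the word list holds one word per line, and it relies on the SoundEx function from this article.

```vb
' Hypothetical helper: files sWord under its Soundex code,
' creating the inner Collection the first time a code is seen.
Sub AddWord(colIndex As Collection, ByVal sWord As String)
    Dim sKey As String
    Dim colBucket As Collection
    sKey = SoundEx(sWord)
    ' Asking a Collection for a missing key raises an error,
    ' so colBucket simply stays Nothing when the key is new.
    On Error Resume Next
    Set colBucket = colIndex(sKey)
    On Error GoTo 0
    If colBucket Is Nothing Then
        Set colBucket = New Collection
        colIndex.Add colBucket, sKey
    End If
    colBucket.Add sWord
End Sub

' Hypothetical loader: assumes one word per line in the file.
Function LoadDictionary(ByVal sPath As String) As Collection
    Dim colIndex As Collection
    Dim iFile As Integer
    Dim sLine As String
    Set colIndex = New Collection
    iFile = FreeFile
    Open sPath For Input As #iFile
    Do While Not EOF(iFile)
        Line Input #iFile, sLine
        sLine = Trim$(sLine)
        If Len(sLine) > 0 Then AddWord colIndex, sLine
    Loop
    Close #iFile
    Set LoadDictionary = colIndex
End Function
```

Using the Soundex code as the Collection key means all words that sound alike end up in the same inner Collection, so a lookup never has to scan the whole dictionary.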

For example, the item with the key "P532" in the main collection contains a collection that includes the following words:

    --> panties
    --> pants
    --> phonetic
    --> phonetician

The misspelled word "phoneetic" also produces the key "P532", so looking up its code returns this collection of correctly spelled candidates.  That is how the spell checker finds its suggestions.
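
A lookup along these lines could be sketched as follows. GetCandidates is a made-up name, and the function assumes an index shaped as described above: a Collection keyed by Soundex code whose items are Collections of words.

```vb
' Hypothetical helper: returns the words that share sWord's Soundex
' code, or an empty Collection when no dictionary word has that code.
Function GetCandidates(colIndex As Collection, ByVal sWord As String) As Collection
    Dim colBucket As Collection
    ' A missing key raises an error; colBucket then stays Nothing.
    On Error Resume Next
    Set colBucket = colIndex(SoundEx(sWord))
    On Error GoTo 0
    If colBucket Is Nothing Then Set colBucket = New Collection
    Set GetCandidates = colBucket
End Function
```

Returning an empty Collection instead of Nothing lets the caller loop over the result with For Each without a separate check.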

After the Soundex code has been used to return a list of possible words, the program automatically selects the word that has the most characters in common with the word being checked.  See the screenshot below.

[Screenshot: Spell Checker Application]
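
The article does not say exactly how "characters in common" is counted, so the sketch below makes an assumption: each character of the candidate may be paired with at most one matching character of the word being checked, ignoring case and position. CommonChars and BestMatch are made-up names, and the download may score candidates differently.

```vb
' Hypothetical scoring: counts how many characters of sA can be
' paired with a distinct character of sB (case-insensitive).
Function CommonChars(ByVal sA As String, ByVal sB As String) As Long
    Dim sRest As String
    Dim lPos As Long
    Dim lCount As Long
    Dim I As Long
    sRest = LCase$(sB)
    For I = 1 To Len(sA)
        lPos = InStr(sRest, LCase$(Mid$(sA, I, 1)))
        If lPos > 0 Then
            lCount = lCount + 1
            ' Remove the matched character so it is not counted twice.
            sRest = Left$(sRest, lPos - 1) & Mid$(sRest, lPos + 1)
        End If
    Next I
    CommonChars = lCount
End Function

' Hypothetical selection: returns the candidate with the highest score,
' or an empty string if the Collection is empty.
Function BestMatch(colCandidates As Collection, ByVal sWord As String) As String
    Dim vCandidate As Variant
    Dim lScore As Long
    Dim lBest As Long
    lBest = -1
    For Each vCandidate In colCandidates
        lScore = CommonChars(CStr(vCandidate), sWord)
        If lScore > lBest Then
            lBest = lScore
            BestMatch = CStr(vCandidate)
        End If
    Next vCandidate
End Function
```

With this scoring, two candidates can tie; because the comparison is a strict "greater than", the candidate that appears first in the Collection is kept.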

Download Spell Checker in VB 6.0.

The dictionary included with this sample is not complete, so if the program cannot suggest a spelling, check whether the word is in dictionary.txt.

To add spell checking to your applications you could use the Spell Checker DLL by VB Web. It is free and is based on the code in this article.  The dictionary included with it now has 5900 entries.