Spell checkers, word2vec, FastText

For a given word, generate all candidate corrections that are one edit away: delete a character, transpose two adjacent characters, replace a character, or insert a character. Then look up each candidate in a word frequency dictionary and pick the one with the highest probability as the most likely correction.
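The steps above can be sketched in a few lines of Python. This is a minimal sketch, not a production spell checker: the WORD_COUNTS dictionary and its counts are made up for illustration, and the helper names (edits1, correct) are hypothetical.

```python
from collections import Counter

# Toy word frequency dictionary; the counts are invented for illustration.
WORD_COUNTS = Counter({"hello": 50, "help": 30, "name": 20, "hell": 10, "held": 5})
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts=WORD_COUNTS):
    """Pick the most frequent known word among the one-edit candidates."""
    if word in counts:
        return word  # already a known word, nothing to correct
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # no known candidate, so leave the input unchanged
    # The highest count is also the highest probability, because every
    # candidate would be divided by the same total.
    return max(candidates, key=lambda w: counts[w])


print(correct("helo"))
print(correct("hlelo"))
```

Only candidates one edit away are considered here; applying edits1 to each candidate again would extend the search to two edits, at the cost of a much larger candidate set.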

Why do we see bad output?

One reason for this behavior could be that the pretrained model was originally trained by FastText with a neighborhood window larger than 1. The FastText documentation page confirms that wordNgrams (the maximum length of a word n-gram) was set to 5 during training. If we train our own word vectors, we can keep the wordNgrams hyperparameter at 1, so that FastText trains with 0 neighbors (i.e. each word is treated as a line of its own).

workspace/fasttext$ cat sample.txt
my name is how are hello you when is this going to happen for what that only when helle helo
workspace/fasttext$ python
Python 3.8.0 (default, Oct 8 2020, 21:35:46)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fasttext
>>> model = fasttext.train_unsupervised('sample.txt', model='skipgram', minCount=1)
Read 0M words
Number of words: 19
Number of labels: 0
Progress: 100.0% words/sec/thread: 51532 lr: 0.000000 avg.loss: -nan ETA: 0h 0m 0s
>>> model.get_nearest_neighbors('hello')
[(0.40682557225227356, 'helle'), (0.19969063997268677, 'helo'), (0.1260446161031723, 'that'), (0.0739162266254425, 'only'), (0.06943869590759277, 'to'), (0.06371329724788666, 'my'), (0.053224191069602966, 'when'), (0.020248744636774063, 'name'), (0.00697528850287199, '</s>'), (-0.00836258102208376, 'happen')]
>>>
>>>
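One plausible contributor to the ranking above is that FastText builds each word vector from character n-grams, so misspellings that share many n-grams with 'hello' tend to land close to it, especially on a corpus this small. The sketch below only counts n-gram overlap and does not reproduce FastText's training. The helper names (char_ngrams, ngram_overlap) are hypothetical, and the n-gram lengths 3 to 6 are assumed to match the library defaults (minn, maxn).

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of `word` wrapped in the < and > boundary markers."""
    token = f"<{word}>"
    return {
        token[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(token) - n + 1)
    }


def ngram_overlap(a, b):
    """Jaccard overlap between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Compare 'hello' with some of the words from sample.txt.
for other in ["helle", "helo", "that", "only"]:
    print(other, round(ngram_overlap("hello", other), 3))
```

In this sketch 'helle' and 'helo' share several n-grams with 'hello' while 'that' and 'only' share none, which is at least consistent with the order of the nearest neighbors in the session above.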
