NLTK WordNet: Find Synonyms from NLTK WordNet in Python
What is Wordnet?
WordNet is a lexical database for English that NLTK exposes through a corpus reader. It can be used to look up the meaning of a word, as well as its synonyms and antonyms. One can define it as a semantically oriented dictionary of English. It is imported with the following command:
from nltk.corpus import wordnet as guru
The English WordNet includes 155,287 words and 117,659 synonym sets (synsets).
The attributes and methods available on the WordNet object can be listed by typing dir(guru):
['_LazyCorpusLoader__args', '_LazyCorpusLoader__kwargs', '_LazyCorpusLoader__load', '_LazyCorpusLoader__name', '_LazyCorpusLoader__reader_cls', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_unload', 'subdir', 'unicode_repr']
Let us look at some of the features available in WordNet:
Synset: Also called a synonym set, a synset is a collection of synonymous words. Let us check an example:
from nltk.corpus import wordnet
syns = wordnet.synsets("dog")
print(syns)
Output:
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
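Each entry in that list is a Synset object that can be inspected further. The following is a minimal sketch of how the name, definition and lemma names of every synset could be collected. The function name synset_summaries and its optional wn parameter are illustrative assumptions, not part of NLTK; the calls synsets(), name(), definition() and lemma_names() are NLTK's own.

```python
def synset_summaries(word, wn=None):
    """Return (name, definition, lemma names) for each synset of `word`."""
    if wn is None:
        # Imported lazily; requires NLTK and its WordNet data to be installed.
        from nltk.corpus import wordnet as wn
    summaries = []
    for synset in wn.synsets(word):
        # name() is the identifier such as 'dog.n.01',
        # definition() is the gloss, lemma_names() are the member words.
        summaries.append((synset.name(), synset.definition(), synset.lemma_names()))
    return summaries
```

Calling synset_summaries("dog") should return one tuple per synset shown in the output above, which makes it easier to see why words such as "frump" or "pawl" appear in the list.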
Lexical Relations: These are semantic relations that are reciprocated. If there is a relationship between {x1, x2, …, xn} and {y1, y2, …, yn}, then there is also a relationship between {y1, y2, …, yn} and {x1, x2, …, xn}. For example, a synonym is the opposite of an antonym, and hypernyms and hyponyms are paired in the same way: if X is a hypernym of Y, then Y is a hyponym of X.
Let us write a Python program to find the synonyms and antonyms of the word “active” using WordNet.
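The reciprocity described above can be checked directly for the hypernym/hyponym pair. This is only a sketch: the helper name hypernym_links_are_reciprocal is hypothetical, while hypernyms() and hyponyms() are methods of NLTK's Synset objects.

```python
def hypernym_links_are_reciprocal(synset):
    """Check that every hypernym of `synset` lists `synset` among its hyponyms.

    A synset without hypernyms (a root) trivially passes the check.
    """
    return all(synset in parent.hyponyms() for parent in synset.hypernyms())
```

With NLTK and the WordNet data installed, the helper could be called as hypernym_links_are_reciprocal(wordnet.synset("dog.n.01")) after importing wordnet from nltk.corpus.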
from nltk.corpus import wordnet

synonyms = []
antonyms = []

for syn in wordnet.synsets("active"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))
The output of the code:
{'dynamic', 'fighting', 'combat-ready', 'active_voice', 'active_agent', 'participating', 'alive', 'active'} (synonyms)
{'stative', 'passive', 'quiet', 'passive_voice', 'extinct', 'dormant', 'inactive'} (antonyms)
Explanation of the code
- WordNet is a corpus, so it is imported from nltk.corpus.
- Two empty lists, synonyms and antonyms, are created to collect the results.
- The synsets of the word "active" are retrieved with wordnet.synsets(), and the name of every lemma in each synset is appended to the synonyms list. If a lemma has antonyms, the name of its first antonym is appended to the antonyms list.
- Both lists are converted to sets to remove duplicates, and the results are printed.
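The same steps can be wrapped in a reusable function. The sketch below is one possible variation, not the only way to do it: it collects every antonym of a lemma instead of only the first, and returns sorted lists so the result is stable between runs. The name synonyms_and_antonyms and the optional wn parameter are assumptions made for illustration.

```python
def synonyms_and_antonyms(word, wn=None):
    """Return (synonyms, antonyms) of `word` as two sorted lists."""
    if wn is None:
        # Imported lazily; requires NLTK and its WordNet data to be installed.
        from nltk.corpus import wordnet as wn
    synonyms, antonyms = set(), set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name())
            # Unlike the program above, keep all antonyms of the lemma.
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name())
    return sorted(synonyms), sorted(antonyms)
```

Calling synonyms_and_antonyms("active") should produce the same words as the program above, in alphabetical order.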
Conclusion
WordNet is a lexical database that has been used by a major search engine. From WordNet, the following information about a given word or phrase can be retrieved:
- synonyms (words having the same meaning)
- hypernyms (a generic term that designates a class of specifics; e.g., meal is a hypernym of breakfast) and hyponyms (a more specific term; e.g., breakfast is a hyponym of meal)
- holonyms (the whole that something is a part of; e.g., meal is a holonym of the proteins and carbohydrates it contains)
- meronyms (a part of a larger whole; e.g., a meal is a meronym of the daily food intake)
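The relations listed above map onto methods of NLTK's Synset objects. The sketch below gathers them into a dictionary; the names RELATIONS and related_names are hypothetical, while the eight relation methods are the ones NLTK provides (meronyms and holonyms are split into part, substance and member variants).

```python
# Relation methods available on an NLTK Synset object.
RELATIONS = (
    "hypernyms", "hyponyms",
    "part_meronyms", "substance_meronyms", "member_meronyms",
    "part_holonyms", "substance_holonyms", "member_holonyms",
)

def related_names(synset):
    """Map each relation name to the names of the related synsets."""
    return {
        relation: [related.name() for related in getattr(synset, relation)()]
        for relation in RELATIONS
    }
```

With NLTK and the WordNet data installed, it could be called as related_names(wordnet.synset("tree.n.01")) after importing wordnet from nltk.corpus. Relations that do not apply to a synset simply come back as empty lists.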
WordNet also provides information on coordinate terms, derived forms, senses and more. It can be used to measure the similarity between any two words, and it also holds information about related words. In a nutshell, one can treat it as a dictionary or a thesaurus. Going deeper, WordNet is divided into four subnets:
- Noun
- Verb
- Adjective
- Adverb
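These four parts of speech can be used to filter a lookup, because synsets() accepts a pos argument. The following sketch groups the synsets of a word by part of speech. The names POS_CODES and synsets_by_pos, and the optional wn parameter, are illustrative; the one-letter codes are the values of NLTK's wordnet.NOUN, wordnet.VERB, wordnet.ADJ and wordnet.ADV constants.

```python
# One-letter codes used by WordNet for the four parts of speech.
POS_CODES = {"noun": "n", "verb": "v", "adjective": "a", "adverb": "r"}

def synsets_by_pos(word, wn=None):
    """Group the synset names of `word` by part of speech."""
    if wn is None:
        # Imported lazily; requires NLTK and its WordNet data to be installed.
        from nltk.corpus import wordnet as wn
    return {
        label: [synset.name() for synset in wn.synsets(word, pos=code)]
        for label, code in POS_CODES.items()
    }
```

For a word such as "dog", which WordNet lists as both a noun and a verb, the noun and verb entries would be filled and the other two would be empty lists.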
It can be used in the area of artificial intelligence for text analysis. With the help of WordNet, you can build your own corpus for spell checking, language translation, spam detection and much more.
In the same way, you can take this corpus and adapt it to build more dynamic functionality. It is a ready-made corpus that you can use in whatever way suits you.
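As an illustration of the word-similarity use mentioned in this conclusion, the sketch below compares the noun senses of two words with path_similarity(), a method of NLTK's Synset objects that returns a score between 0 and 1, or None when two synsets cannot be compared. Taking the best score over all sense pairs is just one simple strategy chosen here for illustration; the name best_path_similarity and the optional wn parameter are assumptions.

```python
def best_path_similarity(word_a, word_b, wn=None):
    """Return the highest path similarity between noun senses of two words.

    Returns None if either word has no noun synsets or no pair is comparable.
    """
    if wn is None:
        # Imported lazily; requires NLTK and its WordNet data to be installed.
        from nltk.corpus import wordnet as wn
    best = None
    for first in wn.synsets(word_a, pos="n"):
        for second in wn.synsets(word_b, pos="n"):
            score = first.path_similarity(second)
            if score is not None and (best is None or score > best):
                best = score
    return best
```

Calling best_path_similarity("dog", "cat") would return a single score for the closest pair of senses.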