A complete guide to using WordNET in NLP applications


SOURCE: ANALYTICSINDIAMAG.COM
SEP 26, 2021

Natural language processing covers a variety of tasks such as automatic text classification, sentiment analysis and text summarization. These tasks depend partly on the pattern of a sentence and partly on what its words mean in different contexts. Two different words can also be similar to some degree. For example, ‘jog’ and ‘run’ are partly different and partly similar in meaning. To perform specific NLP tasks, a model needs to capture what words mean in different positions and how similar words are to one another. This is where WordNET comes into the picture: it helps solve such linguistic problems in NLP models.

WordNET is a lexical database of semantic relations between words in more than 200 languages. In this article, we will discuss WordNET in detail, covering its structure, how it works and how to use it in code. The major points to be discussed are listed below.

Table of Contents

  1. What is WordNET?
  2. The Distinction Between WordNET and Thesaurus
  3. Structure of WordNET
  4. Relations in the WordNET
  5. Implementation of WordNET

What is WordNET?

WordNET is a lexical database of words in more than 200 languages in which adjectives, adverbs, nouns and verbs are grouped into sets of cognitive synonyms, each set expressing a distinct concept. These sets of cognitive synonyms, called synsets, are connected to one another in the database by lexical and semantic relations. WordNET is publicly available for download, and we can also test its network of related words and concepts using this link. Below are a few test images from accessing it through the browser.

For download purposes, you can navigate to this link.

The Distinction Between WordNET and Thesaurus

A thesaurus helps us find the synonyms and antonyms of a word, but WordNET does more than that. WordNET interlinks specific senses of words, whereas a thesaurus links words only by similarity of meaning. As a result, words that sit close to each other in the WordNET network are semantically disambiguated. A thesaurus groups words simply because they have similar meanings, while WordNET labels the semantic relations between words, which is a better way of grouping them.

Structure of WordNET

The below image shows the basic structure of WordNET. The main relation between words in the WordNET network is synonymy, as between sad and unhappy or benefit and profit. Such words express the same concept and can be interchanged in similar contexts. They are grouped into synsets, which are unordered sets. Each synset is linked to other synsets by a small number of conceptual relations. Every synset in the network has its own brief definition, and many are illustrated with an example of how to use their words in a sentence. That definition and example part makes WordNET different from other

In the below picture we can see the structure of a synset: the synonyms of benefit are listed as an array of synset members, together with the definition and an example of how the word benefit is used. This synset is related to another synset in which the words benefit and profit have exactly the same meaning.

Here we can see the structure of WordNET and how the synsets in the network are interlinked through the conceptual relations between words.

Relations in the WordNET

Hypernymy and hyponymy: In linguistics, a hypernym is a word with a broad meaning that constitutes a category into which words with more specific meanings fall; it is also called a superordinate. For example, colour is a hypernym of red. Hyponymy is the relationship between a hypernym and its more specific hyponyms. A hyponym is a word or phrase whose semantic field is more specific than that of its hypernym, and the semantic field of a hypernym is correspondingly broader than that of its hyponyms.

The above image shows an example of the relationship between hyponyms and their hypernym.

These terms matter here because the most frequent relationships between synsets in WordNET are the hyponym and hypernym relations. They link more general synsets such as {paper, piece of paper} to increasingly specific ones. To take an example from the above picture, the category colour in WordNET includes purple, which in turn includes violet. Every noun hierarchy ultimately leads up to a single root node, {entity}. The hyponymy relation is transitive: if violet is a kind of purple and purple is a kind of colour, then violet is a kind of colour.
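The transitivity of hyponymy can be sketched with a small hand-written hierarchy. The data below is a toy example built from the colour, purple and violet words used above; it is not read from WordNET, and the helper names are made up for this illustration. On real data, NLTK synsets offer hypernyms() and hyponyms() methods for the same kind of traversal.

```python
# Toy hypernym links (child -> parent), hand-written for illustration only;
# this is not data read from WordNET.
HYPERNYM_OF = {
    "violet": "purple",
    "purple": "colour",
    "red": "colour",
}


def hypernym_chain(word):
    """Follow hypernym links upward and return every ancestor, nearest first."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain


def is_kind_of(word, category):
    """True if `category` is reachable from `word` through hypernym links."""
    return category in hypernym_chain(word)


print(hypernym_chain("violet"))        # ['purple', 'colour']
print(is_kind_of("violet", "colour"))  # True, although no direct link exists
print(is_kind_of("red", "purple"))     # False
```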

Meronymy: WordNET also encodes meronymy, the part-whole relation between synsets. For example, a bike has two wheels, a handle and a petrol tank. Parts are inherited from superordinates: if a bike has two wheels, then a sports bike has wheels as well. Parts are not inherited upward, because they may be characteristic only of specific kinds of things rather than of the class as a whole. So parts are inherited only in the downward direction: all bikes and types of bikes have two wheels, but not all kinds of automobiles have two wheels.
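The downward-only inheritance of parts can be sketched in a few lines. The hierarchy and part lists below are toy data taken from the bike example, not read from WordNET, and inherited_parts is a made-up helper name.

```python
# Hand-written toy data for illustration; not read from WordNET.
HYPERNYM_OF = {"sports bike": "bike", "bike": "automobile"}
PARTS_OF = {"bike": {"wheel", "handle", "petrol tank"}}


def inherited_parts(word):
    """Collect the parts of `word` and of every category above it."""
    parts = set()
    while word is not None:
        parts |= PARTS_OF.get(word, set())
        word = HYPERNYM_OF.get(word)  # None once the top of the toy hierarchy is reached
    return parts


print(sorted(inherited_parts("sports bike")))  # ['handle', 'petrol tank', 'wheel']
print(sorted(inherited_parts("automobile")))   # []
```

The sports bike picks up the parts declared on bike, while automobile gets nothing, because the lookup only ever walks upward from the word being asked about.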

Troponymy: In linguistics, troponymy is the presence of a ‘manner’ relation between two lexemes. In WordNET, verb synsets are arranged in hierarchies in which verbs towards the bottom express increasingly specific manners of carrying out an event, as in communicate-talk-whisper. Separately, verbs describing events that necessarily and unidirectionally entail one another are linked: {buy}-{pay}, {succeed}-{try}, {show}-{see}, etc.

Antonymy: Adjectives in WordNET are arranged in antonym pairs such as wet and dry. Each member of such a pair is linked to a set of semantically similar words, which count as indirect antonyms of the opposite member. The same idea applies to a pair like smile and cry: cry is linked to weep, shed tears, sob, wail, etc., so all of these can be considered indirect antonyms of smile.

Cross-PoS Relations

Most of the relations in WordNET connect words of the same part of speech. On that basis, WordNET can be divided into four subnets, one each for nouns, verbs, adjectives and adverbs. There are also some cross-PoS pointers in the network. These include morphosemantic links, which hold between semantically similar words that share a stem with the same meaning. For example, in many noun-verb pairs such as (reader, read), the semantic role of the noun with respect to the verb has been specified.

Implementation of WordNET

We can use WordNET from Python with the NLTK library in just a few lines of code.

Importing libraries:

import nltk
from nltk.corpus import wordnet

Downloading the wordnet:

nltk.download('wordnet')

Trying out WordNET by collecting the synonyms and antonyms of a word:

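synonyms = []
antonyms = []
# Walk over every synset (sense) that contains the word "evil"
for synset in wordnet.synsets("evil"):
    for l in synset.lemmas():
        # Every lemma in the synset is a synonym for that sense
        synonyms.append(l.name())
        # Antonyms are stored on lemmas; keep the first one if any exist
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())
# Sets drop the duplicates collected across senses
print(set(synonyms))
print(set(antonyms))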

Here we can see the synonyms of the word evil, and that in the network good and goodness are its opposites.

Checking the word similarity feature:

word1 = wordnet.synset('man.n.01')
word2 = wordnet.synset('boy.n.01')
print(word1.wup_similarity(word2) * 100)

Since grown-up boys are men, we expect the two concepts to be closely related. The Wu-Palmer similarity between man and boy comes out at around 66% (the score lies between 0 and 1 and is multiplied by 100 here), which is a reasonable estimate of their similarity.
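To give a feel for where such a score comes from, here is a minimal sketch of the Wu-Palmer idea on a toy tree: the score is twice the depth of the deepest shared ancestor divided by the sum of the depths of the two concepts. The tree below is hand-written and is not WordNET's real hierarchy, the helper names are made up, and NLTK's own implementation handles details (such as multiple hypernym paths) that this sketch ignores, so the numbers differ from the real output.

```python
# Toy tree (child -> parent), hand-written for illustration; not WordNET data.
PARENT = {
    "person": "entity",
    "male": "person",
    "man": "male",
    "boy": "male",
    "animal": "entity",
    "dog": "animal",
}


def path_to_root(node):
    """Return the node followed by all of its ancestors, root last."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    return None


def wu_palmer(a, b):
    """2 * depth(lcs) / (depth(a) + depth(b)), or None without a shared ancestor."""
    lcs = lowest_common_subsumer(a, b)
    if lcs is None:
        return None
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("man", "boy"))  # 0.75 on this toy tree
print(wu_palmer("man", "dog"))  # lower, because only the root is shared
```

Concepts that share a deep ancestor score close to 1, while concepts that only meet near the root score close to 0.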

Final words

In this article, we had an overview of WordNET and of the basic structure of the network and its synsets. We discussed how it relates words to one another, which matters because a manageable representation of the data can make a model more accurate and workable. We saw which lexical relations the database uses to store rich information about each word, and how to access it using Python and NLTK. The same can be done using TextBlob and R, so you can try WordNET with those tools as well and apply it carefully in your models for better accuracy.
