What is Natural Language Processing and how does it work?

SEP 03, 2021

How does Siri or Alexa understand what you're saying? How can the computer translate your voice perfectly?

Have you ever wondered how virtual assistants like Siri and Cortana work? How do they understand what you're saying?

Well, part of the answer is natural language processing. This interesting field of artificial intelligence has led to some huge breakthroughs over the last few years, but how exactly does it work?

Read on to learn more about natural language processing, how it works, and how it’s being used to make our lives more convenient.

What Is Natural Language Processing?

Natural Language Processing, or NLP, is the branch of artificial intelligence that enables computers to understand human language. For example, when you speak to voice-activated virtual assistants like Alexa or Siri, they listen, interpret your speech, and perform an action based on what you’ve said.

Traditionally, humans could only communicate with computers through programming languages and specific commands. Code is inherently structured and logical, and the same commands will always produce the same output.

In contrast, human language is unstructured and much more complex. The same word or sentence can have multiple meanings depending on inflection and context. There are also many different languages.

So how is AI able to understand what we’re saying?

How Does NLP Work?

NLP systems are trained with machine learning, a branch of artificial intelligence in which an algorithm is fed large amounts of data and adjusts itself to produce more accurate predictions. Generally, the more data and training time the algorithm has, the better it gets. This is one reason NLP machines are so much better today than they were ten years ago.

NLP works by preprocessing the text and then running it through an algorithm trained with machine learning.

Preprocessing Steps

Here are four of the common preprocessing steps that an NLP machine will use.

  • Tokenization: Tokenization is the process of breaking speech or text down into smaller units (called tokens). These are usually individual words, but they can also be parts of words or whole sentences. Tokenization is important because it allows the software to determine which words are present, which the next stages of NLP processing depend on.
  • Stemming and Lemmatization: Stemming and lemmatization are simplifying processes that reduce each word to its root form. For instance, “running” becomes “run.” This lets the NLP machine treat different forms of a word as the same word and process text faster.

Stemming is the simpler process and involves removing any affixes from a word. Affixes are additions to the start or end of a word that give it a slightly different meaning. However, stemming can produce errors when unrelated words look alike. Consider the words “camel” and “came.” A crude stemmer may reduce “camel” to “came,” even though the two words have completely different meanings.
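To make the idea concrete, here is a minimal sketch of a suffix-stripping stemmer. The suffix list and the `naive_stem` function are assumptions made up for illustration and are not the rules of any real stemming algorithm. The sketch shows both the appeal of stemming (it is only a few lines long) and how blindly removing affixes can go wrong.

```python
# Toy suffix-stripping stemmer. The suffix list is an illustrative
# assumption, not the rule set of any real stemming algorithm.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["walking", "walked", "caring", "running"]:
    print(w, "->", naive_stem(w))
# walking -> walk
# walked -> walk
# caring -> car
# running -> runn
```

The first two results are what we want, but the last two are not real root words, because the function has no idea what the words mean.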

Lemmatization is more complicated but more accurate. It involves reducing a word to its lemma, which is the base form of the word (as found in the dictionary). Lemmatization takes context into account and is based on vocabulary and morphological analysis of words. A good example is “caring.” Stemming may reduce “caring” to “car,” whereas lemmatization will accurately reduce it to “care.”
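As a rough sketch of the dictionary-based idea, the snippet below looks up a word together with its part of speech. The tiny `LEMMAS` table and the `lemmatize` function are hypothetical and hand-written for this example. Real lemmatizers rely on far larger vocabularies and proper morphological analysis.

```python
# Hypothetical mini-dictionary mapping (word, part of speech) to a lemma.
# A real lemmatizer would use a full vocabulary, not a hand-written table.
LEMMAS = {
    ("caring", "verb"): "care",
    ("running", "verb"): "run",
    ("better", "adj"): "good",
    ("meeting", "verb"): "meet",
    ("meeting", "noun"): "meeting",
}


def lemmatize(word, pos):
    """Return the dictionary lemma, or the lowercased word if unknown."""
    return LEMMAS.get((word.lower(), pos), word.lower())


print(lemmatize("caring", "verb"))   # care
print(lemmatize("meeting", "verb"))  # meet
print(lemmatize("meeting", "noun"))  # meeting
```

The two “meeting” lookups show why context matters: the same spelling gets a different lemma depending on how the word is used in the sentence.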

Another technique, known as stop word removal, works alongside both processes. This is the simple removal of words that add little relevant information to the meaning of the speech, such as “at” and “a.”
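Tokenization and stop word removal can be sketched in a few lines using only Python's standard library. The regular expression and the short `STOP_WORDS` set below are simplifying assumptions for illustration. Production systems use more careful tokenizers and much longer stop word lists.

```python
import re

# A deliberately short stop word list, assumed for this example.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = tokenize("Meet me at the station in a minute!")
print(tokens)
# ['meet', 'me', 'at', 'the', 'station', 'in', 'a', 'minute']
print(remove_stop_words(tokens))
# ['meet', 'me', 'station', 'minute']
```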

Machine Learning Algorithm Tasks

Once the text has been preprocessed, an NLP machine can perform several tasks, depending on its purpose.

  • Sentiment Analysis: The process of classifying the sentiment of the text. For example, whether a product review is positive, neutral, or negative.
  • Topic Classification: This is where the main topic of the text is identified. An NLP machine can tag documents, paragraphs, and sentences with the topic they concern.
  • Intent Detection: This is the process of determining what the intent is behind a particular text. For example, it can help businesses determine whether customers want to unsubscribe or are interested in a product.
  • Part-of-Speech Tagging: After tokenization, an NLP machine will tag each word with an identifier. These include marking words as nouns, verbs, adjectives, and so on.
  • Speech Recognition: This is the task of converting speech to text. It is particularly challenging because accent, intonation, grammar, and inflection differ from person to person.
  • Named-Entity Recognition: The process of identifying useful names like “England” or “Google.” This is often combined with coreference resolution, which determines whether two words refer to the same thing, such as “Alice” and then “she.”
  • Natural Language Generation: This is the reverse of understanding language and is how NLP machines can generate speech or text to communicate back.
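To show how a machine learning algorithm might handle one of these tasks, here is a minimal sketch of sentiment analysis using a Naive Bayes classifier, one simple technique among many. The four training reviews are made up for illustration, and the function names `train` and `classify` are assumptions of this example. A real system would learn from thousands of labeled examples or more.

```python
import math
from collections import Counter, defaultdict

# Made-up training examples, for illustration only.
TRAINING_DATA = [
    ("great product works perfectly", "positive"),
    ("love it excellent quality", "positive"),
    ("terrible product broke quickly", "negative"),
    ("awful quality waste of money", "negative"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, label_counts = train(TRAINING_DATA)
print(classify("excellent product", word_counts, label_counts))  # positive
print(classify("awful waste", word_counts, label_counts))        # negative
```

The classifier never receives a rule saying that “excellent” is a good word. It works that out from the labeled examples, which is why more training data tends to give better results.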

Why Is NLP So Important?

Natural Language Processing is a huge and ever-growing field that encompasses many functions. Some of the major uses of NLP are:

  • Analyzing Online Information: Businesses and researchers can use NLP to turn swathes of text-based data into usable information. This includes social media comments, reviews, customer support tickets, and even articles. NLP can analyze these for trends and insights of value to the business.
  • Language Translation: Apps such as Google Translate use NLP machines to convert one language into another.
  • Spell and Grammar Check: Word processors and apps like Grammarly check your text for spelling and grammar mistakes, readability, passive voice, and so on, to improve your writing.
  • Interactive Voice Response (IVR): Telephone bots allow humans to communicate with a computer-operated phone system to perform redirections and other tasks.
  • Virtual Assistants: Personal assistants such as Siri, Cortana, Bixby, Google Assistant, and Alexa use NLP to listen to your queries and produce responses or perform actions based on what you say.
  • Predictive Text: Your smartphone automatically provides predicted words based on a few letters or what you’ve already written in the sentence. The smartphone learns based on sentences you usually type and offers words you’re most likely to use. In fact, Microsoft Word is soon to implement this as a feature.
  • Chat Bots: Many websites now have virtual customer service bots that attempt to assist customers before referring them to a human operator.
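The predictive text idea from the list above can be sketched with a simple bigram model, which counts which word tends to follow each word in sentences a user has typed before. This is an illustrative assumption about how such a feature could work, not a description of any particular keyboard. The `history` sentences and function names are invented for the example.

```python
from collections import Counter, defaultdict


def build_bigram_model(sentences):
    """Count which word follows each word in the given sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def suggest(model, word, k=2):
    """Return up to k words most often seen after `word`."""
    counts = model.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]


# Invented typing history, for illustration only.
history = [
    "see you at the office",
    "see you at the gym",
    "see you at the office tomorrow",
    "running late see you soon",
]
model = build_bigram_model(history)
print(suggest(model, "the"))  # ['office', 'gym']
print(suggest(model, "you"))  # ['at', 'soon']
```

Because the counts come from the user's own sentences, the suggestions adapt to the words that person is most likely to use.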

Robot Conversationalists

Natural Language Processing is changing the way we communicate with robots and how they communicate with us. Bloomberg News uses an AI system called Cyborg to produce almost a third of its content. Meanwhile, Forbes, The Guardian, and The Washington Post all use AI to write news articles.

And all of this is only possible thanks to NLP!