Adversarial machine learning: Combating data poisoning


SOURCE: ITBUSINESSEDGE.COM
SEP 21, 2021

The fields of machine learning (ML) and artificial intelligence (AI) have seen rapid developments in recent years. ML, a branch of AI and computer science, is the process through which computers can be programmed or trained to identify patterns in data and make increasingly accurate predictions over time.

Machine learning is a backbone technology of AI that is being used in simple yet valuable applications such as email spam filters and malware detection, as well as in more complex technologies including speech recognition, facial recognition, self-driving cars, and robotics.

Adversarial Machine Learning

Along with its many potential benefits, machine learning comes with vulnerability to manipulation. 'Adversarial machine learning' is the term cybersecurity researchers use for malicious activity in which attackers, or 'adversaries', feed deceptive data into machine learning systems to trick them into making errors.

For example, a few small stickers placed on the ground at an intersection can trick a self-driving car into steering into the opposite lane of traffic. Similarly, a few pieces of tape placed on a stop sign can deceive a self-driving car's computer vision system into misreading it as a speed limit sign.

Today, as machine learning models are widely adopted in domains like business, transportation, and national security, cyber criminals could deploy adversarial machine learning attacks for nefarious ends, from financial fraud to launching drone strikes on unintended targets.

Below is a brief overview of adversarial machine learning for policymakers, entrepreneurs, and other participants in the development of machine learning systems, intended to help them ward off manipulation and corruption of these systems.

Types of Adversarial Machine Learning

Machine learning models are designed to recognize patterns in data. Humans supply 'training data' to algorithms known as 'classifiers', and through repeated exposure to that data, the models learn how to respond to different inputs.

Eventually, these models are expected to make increasingly accurate predictions. Feeding more data into a machine learning system improves the accuracy of its predictions, at least in theory; in practice, the machine learning process, owing to its complexity, can be unpredictable.
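As a minimal illustration of the training process described above, the sketch below fits a toy text classifier on a handful of labelled examples. It assumes scikit-learn is available; the messages and labels are hypothetical placeholders, not a real dataset.

```python
# Minimal sketch: training a text classifier on labelled examples.
# Assumes scikit-learn; the messages and labels are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Human-provided training data: messages paired with labels.
messages = [
    "Win a free prize now", "Limited time offer, click here",
    "Meeting moved to 3pm", "Please review the attached report",
]
labels = ["spam", "spam", "not spam", "not spam"]

# The classifier learns which word patterns separate the two labels.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["Claim your free prize"]))       # likely 'spam'
print(model.predict(["Report attached for review"]))  # likely 'not spam'
```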

Adversaries can devise and wage a variety of attacks to disrupt a machine learning model, either during the training phase or after training is complete.

Data Poisoning Attack

In a data poisoning attack, an adversary supplies misleading data to a classifier, leading the machine learning model to make inaccurate decisions later on. Data poisoning attacks require that the adversary have a degree of control over the training data fed to the classifier.
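To make the idea concrete, here is a minimal sketch of one simple poisoning technique, label flipping, in which an adversary who controls part of the training data relabels malicious examples as benign. The toy dataset and model below are hypothetical illustrations, not drawn from any specific incident.

```python
# Minimal sketch of a label-flipping data poisoning attack.
# The dataset is a hypothetical toy example: one numeric feature,
# label 1 = malicious, label 0 = benign.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(100, 1))      # feature values near 0
malicious = rng.normal(4.0, 1.0, size=(100, 1))   # feature values near 4
X = np.vstack([benign, malicious])
y = np.array([0] * 100 + [1] * 100)

clean_model = LogisticRegression().fit(X, y)

# Poisoning: the adversary flips the labels of many malicious
# training samples so they are marked as benign.
y_poisoned = y.copy()
y_poisoned[100:160] = 0   # 60 malicious samples relabelled as benign

poisoned_model = LogisticRegression().fit(X, y_poisoned)

test_point = np.array([[3.5]])   # a clearly malicious-looking input
print("clean model   :", clean_model.predict(test_point))     # likely [1]
print("poisoned model:", poisoned_model.predict(test_point))  # likely [0]
```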

To human perception, poisoned data can be very hard to detect, and it is even harder to recognize what sorts of inputs will trigger reckless behaviour in an ML model. The complexities involved in detecting malicious data fed into an ML model are described in a 2018 presentation by Doug Tygar, Professor of Computer Science and Information Management at UC Berkeley.

Instance of a Data Poisoning Attack

In 2016, Microsoft launched a Twitter chat bot named 'Tay'. It was programmed to learn to engage in friendly, casual, and playful conversation through repeated interactions with other users.

Detecting that the chat bot lacked a sufficient filter, attackers began to feed offensive tweets into Tay's machine learning algorithm. As Tay engaged with more users, its tweets grew more offensive.

Consequently, Microsoft was forced to shut down Tay 16 hours after its launch.

Evasion Attacks on ML Models

Evasion attacks target a machine learning model after the training phase. Because neither developers nor adversaries know in advance which malicious inputs will fool an ML model, evasion attacks often rely on trial and error.

For example, if a machine learning model has been designed to filter out spam emails, an attacker might craft different emails with different content to see which one can bypass the filter, that is, which one moves the email from being classified as 'spam' to 'not spam'.
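A minimal sketch of that trial-and-error probing might look like the following. It reuses a toy scikit-learn spam filter like the one above; the candidate rewordings are hypothetical, and the point is only to show an attacker searching for an input the filter misclassifies.

```python
# Minimal sketch: trial-and-error evasion against a toy spam filter.
# The filter and the candidate messages are hypothetical illustrations.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_messages = [
    "win a free prize now", "free money click here", "cheap pills offer",
    "meeting at 3pm", "please review the report", "lunch tomorrow?",
]
train_labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(train_messages, train_labels)

# The adversary tries reworded variants of the same pitch until one
# slips past the filter as 'not spam'.
candidates = [
    "win a free prize now",
    "a prize is waiting, click to claim",
    "please review the attached prize report",
]
for msg in candidates:
    verdict = spam_filter.predict([msg])[0]
    print(f"{verdict:9s} <- {msg}")
    if verdict == "not spam":
        print("Evasion candidate found.")
        break
```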

Adversaries can wage evasion attacks to damage the integrity of an ML model, either by making it produce incorrect output in general or by making it produce a specific output the adversary intends.

Other evasion attacks target the confidentiality of an AI-based model: in 2018, Professor Dawn Song demonstrated that she could extract social security numbers from a language-processing ML model.

Combating Adversarial Machine Learning

Adversarial machine learning does not pose an immediate threat today. However, cybersecurity researchers are concerned that as ML and AI are integrated into a broader array of everyday systems, such as self-driving cars where human lives are at stake, it could become a serious problem. Researchers in the cybersecurity space have published hundreds of papers since the threat of adversarial machine learning was identified by the research community a few years ago.

One half of the challenge lies in the 'black box' nature of many machine learning systems, whose logic is largely inscrutable not only to the models' own developers but also to adversaries. The other half rests on the fact that a single small crack in the security of an AI-ML model can open the door to a successful attack on the system.

Of the many promising solutions researchers have put forward, only a few appear to work, and none offers a complete answer to the problem. Two major approaches to warding off adversarial machine learning are described below.

Adversarial Training

Adversarial training is a potential approach for devising and deploying security measures ahead of time. It improves the robustness of an ML model by training the model on examples of adversarial attacks, similar to building up an 'immune system' for it. Though this approach has merits, it cannot guard an ML model against all adversarial attacks, since the range of possible threats is wide and it is difficult to anticipate them and implement safeguards in advance.
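As a rough illustration of the idea (not the specific method used by any particular team), the sketch below augments an ordinary training loop with adversarially perturbed copies of each batch, generated with the well-known fast gradient sign method (FGSM). It assumes PyTorch; the model, data, and epsilon value are placeholder stand-ins.

```python
# Minimal sketch of adversarial training with FGSM, assuming PyTorch.
# The model, data, and epsilon are hypothetical placeholders.
import torch
import torch.nn as nn

def fgsm_perturb(model, loss_fn, x, y, epsilon=0.1):
    """Create adversarial copies of x using the fast gradient sign method."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then detach.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# Toy stand-ins: a tiny classifier and random "images" with random labels.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(100):                    # training loop sketch
    x = torch.rand(32, 1, 28, 28)          # placeholder batch
    y = torch.randint(0, 10, (32,))

    x_adv = fgsm_perturb(model, loss_fn, x, y)

    optimizer.zero_grad()
    # Train on both clean and adversarial examples so the model
    # learns to resist small, deliberately crafted perturbations.
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
```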

‘Moving Target’ Strategy

This strategy calls for continuously altering the ML algorithms that use classifiers, in effect creating a 'moving target' by keeping the algorithms secret and changing the ML model from time to time. The approach proposed by a team of Harvard researchers for countering adversarial threats to medical imaging software gives a clear idea of how a 'moving target' strategy works.
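The Harvard team's exact design is not reproduced here, but a generic 'moving target' setup might look like the sketch below: several independently trained classifiers are kept private and one is chosen at random for each query, so an attacker cannot probe a single fixed decision boundary. The model pool and rotation policy are hypothetical assumptions for illustration.

```python
# Minimal sketch of a 'moving target' defence: rotate among several
# privately held models so no single fixed model can be probed.
# The pool of models and the rotation policy are hypothetical.
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Train a small pool of differently configured classifiers.
model_pool = [
    LogisticRegression(max_iter=1000).fit(X, y),
    RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y),
    SVC(random_state=2).fit(X, y),
]

def predict_moving_target(sample):
    """Answer each query with a randomly chosen model from the pool,
    so repeated probing does not reveal one stable decision boundary."""
    model = random.choice(model_pool)
    return model.predict(sample)

print(predict_moving_target(X[:1]))
```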

How to Stay Safe

More than anything else, ML model developers should regularly check for and identify potential adversarial machine learning threats. It is also advisable that they continuously attempt to hack their own models to identify as many weak points as possible. A deeper understanding of the decision-making processes of neural networks also helps.
