II. Technical solutions to mitigate attacks

Mitigating security attacks

Evasion attacks

Mitigating evasion attacks can be achieved through two main approaches:

  • Improving the resilience of the machine learning (ML) model by design

  • Deploying defenses that protect the machine learning system itself, which remains unmodified

The predominant solution to mitigate evasion attacks by design is ‘adversarial training’. It consists of training machine learning models on both normal training data and adversarial examples (the kind used in evasion attacks). This technique isn’t as simple as injecting adversarial examples into a clean training dataset once. Instead, new versions of adversarial examples are generated and used throughout the several training epochs of the machine learning model. Adversarial training usually causes some decrease in accuracy, so one must consider the accuracy/security trade-off when using it. It is the most effective mitigation method against evasion attacks developed to date. In addition, it can provide theoretical proof of resilience: a guarantee that a clean sample subject to modifications up to some maximum magnitude can’t evade the protected machine learning system.
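
As a rough illustration, the sketch below shows one common variant of this approach – adversarial training with FGSM-generated examples in PyTorch. The toy data, model architecture, loss weighting, and epsilon value are illustrative assumptions, not a prescribed recipe.

```python
# Sketch: adversarial training with FGSM-generated examples (PyTorch).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data and model (stand-ins for a real dataset and architecture).
X = torch.randn(256, 20)
y = (X[:, 0] > 0).long()
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epsilon = 0.1  # maximum perturbation allowed to the adversary (assumed value)

def fgsm_examples(x, labels):
    """Generate adversarial examples with the Fast Gradient Sign Method."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), labels).backward()
    # Step in the direction that maximizes the loss, bounded by epsilon.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

for epoch in range(10):
    # Fresh adversarial examples are generated every epoch, so the model
    # keeps training against up-to-date attacks rather than a fixed set.
    x_adv = fgsm_examples(X, y)
    optimizer.zero_grad()
    # Train on a mix of clean and adversarial inputs; the 50/50 weighting
    # reflects the accuracy/security trade-off mentioned above.
    loss = 0.5 * loss_fn(model(X), y) + 0.5 * loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
```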

An alternative way to mitigate evasion attacks is to monitor and sanitize the requests coming to the model, filtering out potentially adversarial inputs. Protection systems like this can detect the adversarial examples used in evasion attacks and prevent them from reaching the machine learning system. A common way to deploy this defense is to model the expected distribution of the data that’s supposed to be submitted to the machine learning model and to detect queries that fall outside this distribution – in other words, to recognize and flag odd and unexpected queries.
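
A minimal sketch of such an input filter is shown below, assuming scikit-learn’s IsolationForest as the out-of-distribution detector; the feature dimensions, contamination rate, and synthetic query data are illustrative assumptions.

```python
# Sketch: filter incoming queries that look out-of-distribution.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# "Clean" reference data representing what legitimate queries look like.
expected_queries = rng.normal(loc=0.0, scale=1.0, size=(1000, 20))

# Fit the detector once, offline, on the expected distribution.
detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(expected_queries)

def filter_queries(queries):
    """Return only the queries the detector considers in-distribution."""
    # predict() returns +1 for inliers and -1 for suspected outliers.
    verdicts = detector.predict(queries)
    return queries[verdicts == 1]

# Incoming batch: mostly normal queries plus a few far-out-of-distribution ones.
incoming = np.vstack([rng.normal(size=(5, 20)), rng.normal(loc=8.0, size=(2, 20))])
accepted = filter_queries(incoming)
print(f"{len(accepted)} of {len(incoming)} queries forwarded to the model")
```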

It's important to note that no single method can provide foolproof protection against all possible adversarial attacks. Employing a combination of these techniques and adopting a holistic approach to security and robustness can significantly enhance the resilience of AI models against evasion attacks. Ongoing research and development in the field of adversarial machine learning continues to explore new defenses and countermeasures to stay ahead of evolving attack strategies.

Data poisoning

Data poisoning attacks are correlated with low-quality or compromised data. A first mitigation strategy is therefore to ensure the quality and provenance of data. Training data should be collected from trusted sources as much as possible. Then, data pre-processing and cleaning must be employed to discard anomalous and potentially compromised data, thereby increasing data quality. Finally, some pre-processing methods, such as data smoothing (algorithmically fitting noisy data to smoother trends, thereby reducing the irregularities and singularities in the data), can be used to eliminate anomalies and potential adversarial modifications from poisoned data.
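
The sketch below illustrates the smoothing idea on a synthetic one-dimensional series: the data is fitted to a smoother trend with a moving average, and points that deviate too far from that trend are discarded. The window size and threshold are illustrative assumptions.

```python
# Sketch: smooth a noisy series and drop points that stray from the trend.
import numpy as np

rng = np.random.default_rng(0)

# Noisy signal with a few injected spikes of the kind poisoning might add.
t = np.linspace(0, 10, 200)
signal = np.sin(t) + rng.normal(scale=0.1, size=t.size)
signal[[50, 120, 180]] += 3.0  # anomalous values

def moving_average(x, window=9):
    """Simple data smoothing: fit the noisy series to a smoother trend."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

smoothed = moving_average(signal)
residual = np.abs(signal - smoothed)

# Keep only the points that stay close to the smoothed trend;
# large residuals are treated as anomalies and discarded.
threshold = 3 * residual.std()
clean_mask = residual < threshold
print(f"kept {clean_mask.sum()} of {signal.size} samples")
```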

Tailor-made detectors can also be designed to identify and discard poisoned data. These can rely on modeling the distribution of normal training data and using anomaly detection to check whether input data resembles the data we expect, or on clustering techniques to identify compromised clusters of similar data. When using clustering, the first task is to choose a clustering algorithm, for example, k-means. The chosen algorithm is applied to the training dataset, and each data point is assigned to a cluster based on its similarity to other data points. These clusters can then be used to detect anomalies in input data: new input data should, when passed through the clustering model, fall into the cluster of “expected values”, while the other clusters are potentially malicious and should be discarded from the cleaned training dataset. However, poisoned-data detectors are generally difficult to design and often have suboptimal performance, because detecting poisoning attacks depends on the type of data involved and the type of machine learning model being trained.
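
A minimal sketch of this clustering approach, using scikit-learn’s k-means and a simple “small clusters are suspicious” heuristic (an assumption made here purely for illustration), could look like this:

```python
# Sketch: clustering-based screening of training data with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Mostly legitimate training data plus a small group of suspicious points.
legit = rng.normal(loc=0.0, scale=1.0, size=(500, 8))
poisoned = rng.normal(loc=6.0, scale=0.5, size=(15, 8))
training_data = np.vstack([legit, poisoned])

# Cluster the training set; each point is assigned to its nearest centroid.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(training_data)
labels = kmeans.labels_

# Heuristic: clusters holding only a tiny fraction of the data are treated as
# potentially malicious and removed from the cleaned training set.
counts = np.bincount(labels)
suspicious_clusters = np.where(counts < 0.05 * len(training_data))[0]
clean_mask = ~np.isin(labels, suspicious_clusters)
cleaned_training_data = training_data[clean_mask]
print(f"discarded {len(training_data) - len(cleaned_training_data)} suspected points")

# The same model can screen new inputs: points assigned to a suspicious
# cluster are flagged before they reach training or inference.
new_points = rng.normal(loc=6.0, scale=0.5, size=(3, 8))
flagged = np.isin(kmeans.predict(new_points), suspicious_clusters)
print("new points flagged:", flagged.tolist())
```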

Research led by Tartu University investigated how explainable AI (XAI) methods can be used to identify poisoned data. The researchers deployed a convolutional deep learning model trained on the TrashNet litter classification dataset, which consists of 2,527 images. The data was poisoned through blurring and steganography (simply put, concealing information inside an image), and three model-agnostic XAI methods were tested: LIME, SHAP, and occlusion sensitivity. The XAI methods were indeed able to identify poisoned data, with LIME and occlusion sensitivity generally performing better. There are some limitations, however: the methods can’t distinguish between targeted attacks and other malfunctions, and their performance depends on the sensor modality, environment, and input data. Another finding was that XAI methods help to identify the important features of an image even after the data is poisoned – but their effectiveness is affected by the object, the type of attack, and the extent of poisoning the attack introduces.
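
As a generic illustration of one of these methods (not the study’s actual code), the sketch below implements occlusion sensitivity: a patch is slid across the image and the drop in the model’s confidence is recorded, producing a heatmap of the regions the prediction depends on. The patch size, stride, and dummy model are assumptions.

```python
# Sketch: occlusion sensitivity for an image classifier.
import numpy as np

def occlusion_sensitivity(model, image, target_class, patch=8, stride=8, fill=0.5):
    """Return a heatmap of confidence drops caused by occluding each region."""
    h, w = image.shape[:2]
    baseline = model(image)[target_class]
    heatmap = np.zeros(((h - 1) // stride + 1, (w - 1) // stride + 1))
    for i, top in enumerate(range(0, h, stride)):
        for j, left in enumerate(range(0, w, stride)):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            # Large positive values = the occluded region mattered to the prediction.
            heatmap[i, j] = baseline - model(occluded)[target_class]
    return heatmap

# Dummy stand-in for a trained classifier: its "confidence" is driven by the
# mean brightness of the top-left corner of the image.
def dummy_model(image):
    score = image[:16, :16].mean()
    return np.array([score, 1.0 - score])

image = np.random.default_rng(0).random((64, 64))
heatmap = occlusion_sensitivity(dummy_model, image, target_class=0)
print(heatmap.round(2))
```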

In general, the most effective way to mitigate data poisoning attacks is to sanitize data prior to training. However, this process sometimes fails or isn’t possible. In that case, there are methods to determine whether a machine learning model has been poisoned with backdoors by running certain types of evasion attacks against it, designed to reveal and reconstruct potential backdoors.

Mitigating privacy attacks

Differential privacy

Differential privacy deals with various types of information leakage caused by de-anonymization. It is distinct from the simple removal of personally identifiable information (PII) fields, such as name, ID, or contact details. Even if the data is pre-processed to drop the PII fields, studies have shown that individuals can still be identified using the other fields available in the data. For example, if an attacker knows that a certain person is included in the dataset and has information about some of that person’s attributes, the attacker can narrow the search down to the set of people with matching attributes. In general, the more fields that are available and the more possible values those fields can take, the higher the chance of identifying a particular row as a unique person.
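
The toy example below, built on entirely fabricated records, shows how such a linkage attack narrows down candidates using only quasi-identifier fields (here ZIP code, birth year, and gender) after names and IDs have been removed.

```python
# Sketch: re-identification by matching quasi-identifiers in "anonymized" data.
import pandas as pd

released = pd.DataFrame({
    "zip_code":   ["10115", "10115", "20095", "10115", "80331"],
    "birth_year": [1984,    1984,    1990,    1975,    1984],
    "gender":     ["F",     "F",     "M",     "F",     "F"],
    "diagnosis":  ["A",     "B",     "C",     "D",     "E"],  # sensitive field
})

# Attacker's background knowledge about one individual (no name or ID needed).
known = {"zip_code": "10115", "birth_year": 1984, "gender": "F"}

candidates = released
for field, value in known.items():
    candidates = candidates[candidates[field] == value]

# The more quasi-identifier fields the attacker can match, the fewer candidates
# remain; a single remaining row reveals that person's sensitive diagnosis.
print(candidates)
```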

The idea of differential privacy is to deliberately add noise to the data so that it still preserves its overall characteristics and provides accurate responses to queries, while allowing plausible deniability – in other words, preventing conclusions about any individual from being drawn from the sensitive data and providing an additional layer of protection. In practice, the level of noise is carefully calibrated to balance privacy and utility. For example, a differential privacy mechanism can go through the data records (where a record corresponds to an individual) and replace a certain portion of them with randomized records (whose fields have random values). Since any given record could be either the person’s true record or a randomized one, the individual has plausible deniability about whether the record reflects their information.
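
A minimal sketch of this record-randomization idea is shown below (it is essentially randomized response, a building block of local differential privacy); the data and the replacement probability are illustrative assumptions.

```python
# Sketch: randomized response gives plausible deniability per record
# while the aggregate statistic stays recoverable.
import numpy as np

rng = np.random.default_rng(0)

# True sensitive attribute for 10,000 individuals (True = "yes", False = "no").
true_answers = rng.random(10_000) < 0.30
p = 0.5  # probability that a record is replaced by a random one (assumed)

replace = rng.random(true_answers.size) < p
random_answers = rng.random(true_answers.size) < 0.5
released = np.where(replace, random_answers, true_answers)

# Any single released record may be genuine or random, so it proves nothing
# about that individual; but the true rate can still be estimated in aggregate:
# E[released] = (1 - p) * true_rate + p * 0.5, solved for true_rate.
estimated_rate = (released.mean() - p * 0.5) / (1 - p)
print(f"true rate: {true_answers.mean():.3f}, estimated: {estimated_rate:.3f}")
```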

Next section: III. Industry examples of privacy and security