III. Industry examples of privacy and security

Use case 1: Preserving privacy with federated learning

One of the great challenges in preserving users’ privacy is that the machine learning applications using that data are often centralized: all the data from users, clients, and devices needs to be uploaded to a cloud platform for learning. Personal data therefore inevitably has to leave people’s devices. In addition, because the data is collected at a central point, it makes an attractive target for attacks.

That’s where a federated learning approach can help. Telefonica, the largest network provider in Spain, takes a federated learning approach to user privacy in the context of machine learning model development (FLaaSResearch on GitHub). Their research project envisions an environment where multiple machine learning applications use personal data at a large scale. Such large-scale applications have been predominantly centralized. Federated learning instead allows machine learning models to be built in a decentralized fashion, close to the users’ data, without the need to collect and process that data centrally. Rather than collecting personal data from users’ devices, federated learning trains machine learning models locally on the devices themselves. Only the parameters of these local models are then collected to build a global model. This process is called federated aggregation, and it is repeated iteratively, as the sketch below the illustration shows.

An illustration of federated learning
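To make federated aggregation concrete, here is a minimal sketch of federated averaging in Python. Everything in it is an assumption made for illustration: the linear model, the synthetic client data, and the helper names local_update and federated_round are invented here and are not part of the FLaaS project.

```python
# A minimal sketch of federated averaging with a toy linear model.
# All data and helper names are hypothetical, for illustration only.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train locally on one client's data; the raw data never leaves."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """One round of federated aggregation: weighted average of client models."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    total = sum(sizes)
    return sum(n / total * w for w, n in zip(updates, sizes))

# Three simulated client devices, each holding its own private data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

# Iterative federated aggregation: only model parameters are exchanged.
global_w = np.zeros(2)
for _ in range(20):
    global_w = federated_round(global_w, clients)
print(global_w)  # approaches true_w without pooling any raw data
```

The essential property is that federated_round only ever sees model parameters; the raw (X, y) pairs never leave the simulated devices.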

There are some important characteristics of this environment that need to be understood. A federated learning environment typically assumes massively distributed client devices with intermittent and slow connections. As such, the availability of data for training a model is highly dynamic, depending on the hour, the day, and the location of the devices. As a consequence, the devices that contribute to the training process are often not representative of the whole population.

There are some trade-offs due to the dynamic nature of the environment. Firstly, the model’s accuracy could be lower than that of a centralized machine learning model, which collects all data on a central server. Secondly, as federated learning involves numerous client devices and interacts with them repeatedly, it incurs overhead for the clients: communication costs and the resources required for building the local models. Thirdly, federated learning requires more time to reach a stable global model, as federated aggregation is conducted iteratively over time. Lastly, due to the uncertainty of data availability for training, the models can face non-IID issues, meaning the data is not independent and identically distributed across devices. For example, more data could come from a certain group of people who share similar characteristics (like lifestyle, preferences, or socioeconomic status), which can lead to biased datasets, as the sketch below illustrates.
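The last point can be made concrete with a small sketch. The three “behaviour groups” and the label-skewed split below are invented for illustration; the point is only how differently data can end up distributed across devices.

```python
# A minimal sketch of how non-IID data can arise in federated settings,
# assuming a hypothetical population of three behaviour groups.
import numpy as np

rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=300)  # three groups in the population

# IID split: every device sees roughly the same label mix as the population.
iid_devices = np.array_split(rng.permutation(labels), 3)

# Non-IID split: each device is dominated by a single group (label skew).
non_iid_devices = [labels[labels == g] for g in range(3)]

for name, devices in [("IID", iid_devices), ("non-IID", non_iid_devices)]:
    print(name)
    for i, d in enumerate(devices):
        counts = np.bincount(d, minlength=3)
        print(f"  device {i}: label counts {counts}")
# With the non-IID split, each local model only ever sees one group,
# so naive aggregation can produce a biased global model.
```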

Use case 2: Making complex mobile networks secure with AI

Today, 4G/5G mobile networks and IoT (Internet of Things) networks, such as networks of smart IoT-enabled sensors and devices deployed throughout a city to monitor traffic conditions, parking availability, and public transportation systems, offer the potential to deliver exceptional features: ultra-low latency, ultra-high throughput, ultra-high reliability, ultra-low energy consumption, and extensive connectivity.

To ensure the security of these networks, technical measures are rapidly incorporating various machine learning algorithms as an effective approach to intelligent, adaptive, and autonomous security management, which makes it possible to address the increasing complexity of the networks. AI can identify anomalous patterns in vast amounts of dynamic, multi-dimensional data, enabling faster and more accurate decision-making (a simplified sketch of this idea appears at the end of this use case). However, integrating AI methods into IoT and future mobile networks is still at an early stage, and more research is needed on efficient code and data auditing of safety-critical systems to benefit both users and developers. The SPATIAL research project, which this course is a part of, targets three key issues in the 4G/5G/IoT domain:

Lack of real-world datasets. The effectiveness of AI models, including supervised machine learning, heavily depends on the availability of extensive, accurately labeled datasets. As a result, data quality plays a pivotal role in driving advancements in modern AI research. However, obtaining diverse real-world datasets from 4G/5G/IoT networks is challenging due to strict privacy regulations imposed on all telecom operators.

Lack of explainability. Currently, AI approaches implemented in security solutions primarily prioritize accuracy and performance metrics such as precision, recall, and resource utilization. However, these approaches often lack the capability to explain the decision-making process behind a specific output. In the context of the 5G network, the ability to explain decisions becomes crucial, particularly given how many critical services rely on the 5G infrastructure.

Lack of resilience against adversarial attacks. Machine learning models are susceptible to adversarial attacks, which introduce small, deliberately crafted perturbations into the data. These perturbations deceive the underlying machine learning models into making incorrect decisions. Building robust defenses against such attacks remains a significant challenge, as there is currently no foolproof solution that guarantees complete protection against them. A minimal sketch of such an attack follows this list.
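To give a feel for these perturbations, below is a minimal sketch of a gradient-sign (FGSM-style) evasion attack on a toy logistic-regression model. Everything in it is hypothetical: the weights, the input, and the perturbation budget epsilon are invented for illustration, and real attacks target far larger models.

```python
# A minimal FGSM-style evasion attack on a toy logistic-regression model.
# All numbers are hypothetical and chosen for illustration.
import numpy as np

def predict(w, b, x):
    """Probability that input x belongs to class 1 (sigmoid of the score)."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# A hypothetical trained model and a correctly classified input.
w = np.array([3.0, -4.0, 2.0])
b = 0.1
x = np.array([0.5, -0.3, 0.4])
print(predict(w, b, x))            # ~0.97: confidently class 1

# For this linear model, the gradient of the score w.r.t. the input is w.
# FGSM perturbs each feature by epsilon against the sign of the gradient.
epsilon = 0.5                      # perturbation budget per feature
x_adv = x - epsilon * np.sign(w)   # small, deliberately crafted perturbation
print(predict(w, b, x_adv))        # ~0.29: the decision flips to class 0

# In high-dimensional inputs such as images, the many small per-pixel
# changes add up, so a far smaller epsilon is enough to flip the decision.
```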

Montimage, a French network security tool provider, is deploying a real 4G/5G/IoT testbed to evaluate AI techniques for explainability, resiliency, and distribution. They are applying XAI (explainable AI) methods to assess the current techniques for performing cybersecurity analysis and protection of 4G/5G/IoT networks, including tasks such as encrypted traffic analysis and root cause analysis. The sketch below shows, in very simplified form, what such detection and explanation can look like.
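The sketch below is not Montimage’s actual pipeline; it is a minimal illustration, under invented assumptions (two hypothetical flow features, packet rate and mean packet size, plus synthetic traffic), of two ideas from this use case: flagging anomalous network traffic with a machine learning model and producing a simple per-feature explanation for a flagged flow.

```python
# A minimal sketch of ML-based network anomaly detection with a crude,
# ablation-style explanation. Features and traffic are synthetic.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulated "normal" flows: [packets per second, mean packet size in bytes].
normal = rng.normal(loc=[100.0, 500.0], scale=[15.0, 80.0], size=(500, 2))
# A few simulated anomalous flows, e.g. a flooding pattern: many tiny packets.
anomalous = rng.normal(loc=[900.0, 60.0], scale=[50.0, 10.0], size=(10, 2))

model = IsolationForest(contamination=0.02, random_state=0)
model.fit(normal)

print(model.predict(anomalous))   # -1 = flagged as anomalous
print(model.predict(normal[:5]))  # +1 = looks normal

# Crude per-feature explanation for one flagged flow: reset each feature
# to a typical (median) value and see how much the anomaly score recovers.
flow = anomalous[0]
base = model.decision_function([flow])[0]         # negative = anomalous
for i, name in enumerate(["packet_rate", "mean_packet_size"]):
    probe = flow.copy()
    probe[i] = np.median(normal[:, i])
    recovery = model.decision_function([probe])[0] - base
    print(f"{name}: score recovers by {recovery:+.3f} when set to normal")
```

Real deployments work on far richer features (including encrypted traffic metadata) and use more principled XAI methods, but the pattern of detecting first and then explaining the decision is the same.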

Part summary

In this chapter, we learned that…

  • Evasion attacks are common security attacks that target the inference phase of machine learning models, while label flipping attacks are data poisoning attacks that target the training phase; both compromise the models’ integrity. Data inference attacks are privacy attacks that take advantage of information leaked by machine learning models to obtain information about the individuals whose data was used to train them.

  • There are three main types of attacks on machine learning models – evasion attacks, data poisoning attacks, and privacy attacks. The mitigation strategies for each are adversarial training for evasion attacks, data sanitization for data poisoning attacks, and differential privacy for privacy attacks.

  • The industry is using federated learning to preserve user privacy by training machine learning models in a decentralized fashion, close to the users' data. Meanwhile, applying AI algorithms in 4G/5G/IoT networks faces challenges such as the lack of real-world datasets, explainability, and resilience against adversarial attacks, which research projects like SPATIAL are addressing.
