== Federated Learning in Edge Computing ==
=== Overview ===
Federated Learning (FL) is a decentralized learning framework that enables multiple clients—such as smartphones, IoT sensors, or edge gateways—to collaboratively train a shared machine learning model without exchanging their raw data. Each client computes model updates locally and sends only these updates (not the underlying data) to an aggregator, which significantly improves data privacy and reduces network congestion.


When combined with Edge Computing (EC), which brings computation closer to data sources, FL enables real-time, low-latency, privacy-aware learning across a distributed infrastructure. This is particularly valuable in applications involving sensitive data, limited bandwidth, or regulatory constraints such as GDPR and HIPAA [1].


=== Background ===
Traditional machine learning requires all data to be collected and stored centrally before training begins. This is often impractical or unsafe for edge devices, where bandwidth is limited, privacy is crucial, and applications are latency-sensitive. FL addresses this by keeping data on the device and sharing only the learned parameters.


Each FL round follows an iterative pattern (a minimal code sketch of one round appears after the list):
# The central server initializes a global model.
# Selected clients download the model and train it locally on their private datasets.
# Clients send their updated model parameters back to the server.
# The server aggregates these updates into a new global model and redistributes it.
# The process repeats for multiple rounds until the model converges.
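
Below is a minimal Python sketch of one such round, assuming NumPy arrays as model parameters and a toy least-squares objective; the names (local_train, federated_round, clients) are illustrative, not part of any particular FL framework.

<syntaxhighlight lang="python">
import numpy as np

def local_train(global_weights, data, lr=0.1, epochs=1):
    """Hypothetical local update: a few gradient steps on the client's own data."""
    w = global_weights.copy()
    X, y = data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_weights, clients):
    """One FL round: broadcast, local training, size-weighted aggregation."""
    updates, sizes = [], []
    for data in clients:                     # steps 2-3: clients train and report
        updates.append(local_train(global_weights, data))
        sizes.append(len(data[1]))
    total = sum(sizes)                       # step 4: server aggregates updates
    return sum((n / total) * w for n, w in zip(sizes, updates))

# Step 1: the server initializes the global model; steps 2-5 repeat over rounds.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]
global_w = np.zeros(3)
for round_t in range(10):
    global_w = federated_round(global_w, clients)
</syntaxhighlight>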


Mathematically, the goal is to minimize a global loss function <math>F(w)</math>, defined as the weighted average of the local loss functions <math>F_k(w)</math> across all participating clients:

:<math>F(w) = \sum_{k=1}^{N} \lambda_k F_k(w)</math>

where <math>F_k(w)</math> is the loss function of client <math>k</math>, <math>\lambda_k = n_k / n</math> is a weight proportional to the client's dataset size, <math>n_k</math> is the number of data points on client <math>k</math>, and <math>n</math> is the total number of data points across all clients [3].
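
As a small worked example with made-up numbers, the weights <math>\lambda_k</math> and the global loss can be computed directly from the per-client dataset sizes and local losses:

<syntaxhighlight lang="python">
# Hypothetical per-client dataset sizes and local loss values F_k(w) at a fixed w.
n_k = [100, 300, 600]          # data points per client; n = 1000
F_k = [0.80, 0.50, 0.20]       # illustrative local losses

n = sum(n_k)
lambdas = [nk / n for nk in n_k]                        # 0.1, 0.3, 0.6
F_global = sum(l * f for l, f in zip(lambdas, F_k))     # 0.1*0.8 + 0.3*0.5 + 0.6*0.2
print(F_global)                                         # 0.35
</syntaxhighlight>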


=== Architectures ===
FL can follow several system architectures depending on the deployment setting.


'''Centralized''': A single central server orchestrates the learning lifecycle. It distributes the model, collects client updates, and performs aggregation. While simple to manage, this design has scalability limits and introduces a single point of failure [1].


'''Decentralized''': The central server is removed, and clients communicate directly using peer-to-peer or blockchain mechanisms. This improves robustness but increases coordination overhead, communication cost, and system complexity [2].


'''Hierarchical''': Intermediate edge servers sit between clients and the cloud. Each edge server aggregates updates from its nearby clients and forwards the aggregated result to the cloud server. This balances scalability and communication efficiency, especially in distributed systems such as smart cities and factories [1][3].
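
As an illustration of the two-level aggregation this implies, the sketch below lets each edge server take a size-weighted average of its clients' models before the cloud averages the edge-level results; the function names and the scalar "models" are illustrative assumptions, not a specific framework API.

<syntaxhighlight lang="python">
def weighted_average(models, sizes):
    """Size-weighted average of a list of parameter vectors (or scalars)."""
    total = sum(sizes)
    return sum((n / total) * m for n, m in zip(sizes, models))

def hierarchical_aggregate(edge_groups):
    """edge_groups: list of (client_models, client_sizes) handled by one edge server."""
    edge_models, edge_sizes = [], []
    for models, sizes in edge_groups:
        edge_models.append(weighted_average(models, sizes))   # edge-level aggregation
        edge_sizes.append(sum(sizes))
    return weighted_average(edge_models, edge_sizes)           # cloud-level aggregation

# Two edge servers, each aggregating two clients (scalar "models" for brevity).
print(hierarchical_aggregate([([1.0, 3.0], [10, 30]), ([5.0, 7.0], [20, 40])]))  # 4.8
</syntaxhighlight>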


=== Aggregation Algorithms ===


The core task of the server is to combine the clients' model updates into a unified global model. The most widely used aggregation method is '''Federated Averaging (FedAvg)''', in which each client performs multiple steps of local gradient descent before sending its updated weights to the server. The server then computes a weighted average of all submitted models to form the next version of the global model:

:<math>w^{t+1}_C = \sum_{k=1}^{K} \lambda_k w_k</math>

where <math>w_k</math> is the locally trained model of client <math>k</math>, <math>\lambda_k</math> is the client's weight (usually proportional to its data volume), and <math>K</math> is the number of clients participating in round <math>t</math> [3].
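
The following is a direct transcription of this weighted-averaging rule into Python (the aggregation step only), assuming the local models <math>w_k</math> have already been collected as NumPy vectors.

<syntaxhighlight lang="python">
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg: w_{t+1} = sum_k lambda_k * w_k, with lambda_k = n_k / n."""
    n = sum(client_sizes)
    lambdas = [n_k / n for n_k in client_sizes]
    return sum(lam * w for lam, w in zip(lambdas, client_weights))

# Three clients' locally trained parameter vectors and their dataset sizes.
w_clients = [np.array([1.0, 2.0]), np.array([2.0, 0.0]), np.array([0.0, 4.0])]
new_global = fedavg_aggregate(w_clients, client_sizes=[50, 25, 25])
print(new_global)   # [1. 2.]  (0.5*[1,2] + 0.25*[2,0] + 0.25*[0,4])
</syntaxhighlight>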


However, FedAvg can struggle when the data across clients is non-identically distributed (non-IID). To address this, '''FedProx''' adds a proximity term to each client's local objective that penalizes large deviations from the global model:

:<math>F_k(w) = \mathbb{E}_{x_k \sim D_k} [f(w_k; x_k)] + \rho \| w_k - w^t_C \|^2</math>

The term <math>\rho \| w_k - w^t_C \|^2</math> penalizes the local model <math>w_k</math> for straying too far from the global model <math>w^t_C</math>, and the parameter <math>\rho</math> controls the strength of this regularization [3].

During local training, each client minimizes this regularized loss with a modified stochastic gradient descent rule that includes the gradient of the penalty term:

:<math>w_k \leftarrow w_k - \eta \cdot \frac{1}{B} \sum_{x_i \in \mathcal{I}_k} \left( \nabla f(w_k; x_i) + 2\rho (w_k - w^t_C) \right)</math>

where <math>\mathcal{I}_k</math> is a local mini-batch, <math>\eta</math> is the learning rate, and <math>B</math> is the batch size [3].
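
A minimal sketch of one FedProx local step under this rule, using NumPy and a toy squared-error gradient; grad_f is a hypothetical stand-in for whatever per-example gradient the client's model actually provides.

<syntaxhighlight lang="python">
import numpy as np

def fedprox_local_step(w_k, w_global, batch, grad_f, lr=0.01, rho=0.1):
    """One FedProx SGD step: mini-batch average of per-example gradients plus the
    proximal gradient 2*rho*(w_k - w_global), which pulls w_k back toward w_global."""
    B = len(batch)
    grad = sum(grad_f(w_k, x) for x in batch) / B
    return w_k - lr * (grad + 2 * rho * (w_k - w_global))

# Illustrative use with a squared-error "loss" on (features, target) examples.
def grad_f(w, example):
    x, y = example
    return 2 * (w @ x - y) * x            # gradient of (w.x - y)^2 with respect to w

w_global = np.array([0.0, 0.0])
w_local = w_global.copy()
batch = [(np.array([1.0, 2.0]), 1.0), (np.array([2.0, 1.0]), 0.5)]
for _ in range(5):
    w_local = fedprox_local_step(w_local, w_global, batch, grad_f)
</syntaxhighlight>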


=== Communication Efficiency ===
In edge computing scenarios, bandwidth is limited and transmission energy is costly. FL reduces the communication load with several optimization techniques (a sketch of one of them, top-k sparsification, follows the list):
* '''Gradient quantization''': converts floating-point updates to lower-bit formats.
* '''Sparsification''': transmits only the top-k most important model updates.
* '''Client sampling''': selects only a fraction of clients to participate in each round, balancing model quality against cost.
* '''Local update batching''': clients train for multiple local epochs before transmitting updates.
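
As one concrete illustration of these techniques, the sketch below applies top-k sparsification to a model update before transmission; the value of k and the NumPy representation are assumptions made for the example.

<syntaxhighlight lang="python">
import numpy as np

def top_k_sparsify(update, k):
    """Keep only the k largest-magnitude entries of a model update; zero the rest.
    In practice only the surviving (index, value) pairs would be transmitted."""
    idx = np.argsort(np.abs(update))[-k:]      # indices of the k largest entries
    sparse = np.zeros_like(update)
    sparse[idx] = update[idx]
    return sparse

update = np.array([0.02, -0.9, 0.05, 0.4, -0.01, 0.3])
print(top_k_sparsify(update, k=2))             # [ 0.  -0.9  0.   0.4  0.   0. ]
</syntaxhighlight>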


Together, these strategies allow FL to operate efficiently even under limited connectivity.


{| class="wikitable"
{| class="wikitable"
|+ Federated Learning vs Traditional ML
|+ Federated Learning vs Traditional Machine Learning
! Feature !! Federated Learning !! Traditional Learning
! Feature !! Federated Learning !! Traditional ML
|-
|-
| Data location || On-device || Central server
| Data Location || Remains on device || Sent to central server
|-
|-
| Privacy risk || Low || High
| Privacy Risk || Lower || Higher
|-
|-
| Bandwidth usage || Low || High
| Communication Overhead || Low (only updates shared) || High (raw data transfer)
|-
|-
| Latency || Low (edge-based) || High (cloud-based)
| Latency || Lower (local inference) || Higher (cloud round-trip)
|-
|-
| Trust model || Distributed || Centralized
| Scalability || Medium to high || Limited by central compute
|}
|}


=== Privacy and Security ===


Although FL inherently improves privacy by keeping data local, it remains vulnerable to security threats such as:
* gradient inversion attacks (reconstructing private data from shared updates),
* model poisoning (malicious updates that corrupt the global model), and
* backdoor attacks (trigger-based model manipulation).

To address these, FL integrates several mathematical and cryptographic defense mechanisms.


'''Differential Privacy (DP)''' introduces random noise into model updates so that the output of a computation is statistically similar whether or not any one individual's data is included. A computation <math>A</math> is <math>(\epsilon, \delta)</math>-differentially private if:

:<math>P(A(D) \in S) \leq e^\epsilon P(A(D') \in S) + \delta</math>

where <math>D</math> and <math>D'</math> are datasets that differ by one user's record, <math>\epsilon</math> is the privacy budget, and <math>\delta</math> is the failure probability [4].
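
In practice, DP is often realized in FL by clipping each client update and adding Gaussian noise; the sketch below shows this style of update perturbation, with the clipping norm and noise scale chosen arbitrarily rather than calibrated to a specific <math>(\epsilon, \delta)</math> target.

<syntaxhighlight lang="python">
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.5, rng=np.random.default_rng()):
    """Clip the update to a maximum L2 norm, then add Gaussian noise.
    Clipping bounds each client's influence; the noise hides individual contributions."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / norm) if norm > 0 else update
    return clipped + rng.normal(0.0, noise_std, size=update.shape)

noisy = privatize_update(np.array([3.0, 4.0]))   # original norm 5.0, clipped to 1.0
</syntaxhighlight>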


'''Secure Aggregation''' uses cryptographic techniques to ensure that the server sees only the aggregated result, never any individual client's update.

'''Homomorphic Encryption''' is one way to achieve this: the server can aggregate encrypted updates directly without ever decrypting them. For example, in additive homomorphic schemes:

:<math>Enc(a) \cdot Enc(b) = Enc(a + b)</math>

so the updates remain encrypted throughout the computation [4].
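
The sketch below illustrates the intuition behind mask-based secure aggregation: each pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the sum and the server only ever sees masked updates. Key agreement, client dropouts, and the encryption itself are deliberately omitted.

<syntaxhighlight lang="python">
import numpy as np
from itertools import combinations

def masked_updates(updates, seed=0):
    """Add a cancelling pairwise random mask to each client's update."""
    rng = np.random.default_rng(seed)
    masked = [u.astype(float).copy() for u in updates]
    for i, j in combinations(range(len(updates)), 2):
        mask = rng.normal(size=updates[0].shape)
        masked[i] += mask          # client i adds the shared mask
        masked[j] -= mask          # client j subtracts it, so it cancels in the sum
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 1.0]), np.array([0.0, 2.0])]
masked = masked_updates(updates)               # individually these look like noise
print(sum(masked), sum(updates))               # both sums equal [4. 5.]
</syntaxhighlight>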


=== Applications ===


Federated Learning is applicable across diverse domains where privacy and decentralization are critical.

In '''healthcare''', hospitals collaborate on models for disease prediction and medical image analysis (e.g., cancer detection, pandemic tracking) without sharing patient records, improving diagnosis while preserving compliance with regulations such as GDPR and HIPAA [1].

'''Autonomous vehicles''' use FL to collaboratively learn perception and navigation models across a fleet. Each car trains locally on road conditions and object recognition data and contributes only model updates, never raw video or GPS coordinates.

'''Smart cities''' deploy FL across infrastructure such as traffic lights, pollution sensors, and utility meters to train predictive models without centralizing citizen data, reducing latency and enhancing privacy [1][4].

On '''mobile devices''', FL powers personalized services such as next-word prediction, speech recognition, and fitness or activity tracking, with smartphones and smartwatches contributing to a shared model while user data stays on the device.

In the '''Industrial IoT (IIoT)''', FL enables real-time fault detection, predictive maintenance, and energy optimization from machine logs and sensor data while proprietary information remains protected.


=== Challenges ===


While FL offers clear advantages, several technical barriers remain.

'''Scalability''' is hindered by variable device availability, network unreliability, client dropout, and asynchronous updates. Hierarchical aggregation and adaptive scheduling algorithms are active areas of research.

'''Client and data heterogeneity''' cause problems because devices differ in compute power, battery life, and data quality, and client data is often non-IID. Personalized FL techniques address this through client-specific model tuning and adaptive participation strategies.

'''Security risks''' such as poisoning attacks, gradient inversion, and Sybil attacks demand robust aggregation schemes, anomaly detection, and secure hardware enclaves [2].

'''Incentivization''' is another open problem. Clients expend computational resources, so fair contribution tracking and reward mechanisms, such as token systems or FL marketplaces, are needed to sustain long-term participation.

'''Interoperability''' across device platforms, networks, and frameworks remains an engineering challenge, motivating standards for FL APIs, data formats, and deployment protocols.


=== References ===

# Abreha, H. G., Hayajneh, M., & Serhani, M. A. (2022). Federated Learning in Edge Computing: A Systematic Survey. ''Sensors'', 22(2), 450.
# Lyu, L., Yu, H., & Yang, Q. (2020). Threats to Federated Learning: A Survey. arXiv preprint arXiv:2003.02133.
# Li, T., Sahu, A. K., Talwalkar, A., & Smith, V. (2020). Federated Learning: Challenges, Methods, and Future Directions. ''IEEE Signal Processing Magazine'', 37(3), 50–60.
# Kairouz, P., et al. (2019). Advances and Open Problems in Federated Learning. arXiv preprint arXiv:1912.04977.