Federated Learning in Edge Computing

=== Overview ===
Federated Learning (FL) is a decentralized approach to training machine learning models in which multiple devices, such as smartphones, IoT sensors, and edge gateways, collaboratively train a shared model without transmitting their raw data. Each device computes model updates locally and sends only these updates, not the underlying data, to an aggregator. This minimizes privacy risks and reduces communication overhead.


When combined with Edge Computing (EC), which brings computational power closer to data sources, FL enables real-time, privacy-preserving intelligence across distributed systems. It is especially useful in domains such as healthcare, smart cities, autonomous vehicles, and industrial IoT, where data is sensitive, bandwidth is limited, or regulations such as GDPR and HIPAA restrict data movement [1].


=== Background ===
Traditional machine learning aggregates data on a central server or in the cloud for training. This becomes inefficient and risky when devices at the edge generate massive volumes of private data. Federated Learning addresses this by keeping data local and performing training on-device.


A typical FL round involves the following steps (a code sketch follows the list):
# The central server initializes a global model.
# The server sends the global model to a selected subset of clients.
# Each selected client trains the model on its own local dataset.
# Clients send their updated model parameters back to the server.
# The server aggregates the updates into a new global model.
# The process repeats for multiple rounds until the model converges.
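The following is a minimal, self-contained sketch of this loop in Python with NumPy; the linear-regression model, the local_train helper, and the synthetic client datasets are illustrative placeholders rather than part of any specific FL framework:

<syntaxhighlight lang="python">
import numpy as np

def local_train(global_weights, X, y, lr=0.1, epochs=5):
    """Toy local training: a few gradient-descent steps of linear regression."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

# Synthetic local datasets for three clients (placeholders, not real edge data).
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(n, 3)), rng.normal(size=n)) for n in (50, 80, 120)]

global_w = np.zeros(3)                                    # step 1: initialize the global model
for round_id in range(10):                                # step 6: repeat for multiple rounds
    updates, sizes = [], []
    for X, y in clients:                                  # steps 2-3: selected clients train locally
        updates.append(local_train(global_w, X, y))
        sizes.append(len(y))
    n_total = sum(sizes)                                  # steps 4-5: server aggregates the updates
    global_w = sum((n / n_total) * w for w, n in zip(updates, sizes))
</syntaxhighlight>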


Mathematically, the goal is to minimize a global loss function F(w), defined as the weighted average of the local loss functions F_k(w) across all participating clients:

F(w) = Σ λ_k * F_k(w),   summed over k = 1 to N


where:
- F_k(w) is the loss function of client k,
- λ_k = n_k / n is the weight proportional to client k's dataset size,
- n_k is the number of data points on client k, and
- n is the total number of data points across all clients [3].

This weighting ensures that clients with more data influence the global model proportionally.
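For example, a short (hypothetical) computation of the weights λ_k and the resulting global loss:

<syntaxhighlight lang="python">
# Hypothetical client dataset sizes.
n_k = [50, 80, 120]
n = sum(n_k)                      # 250
lambdas = [nk / n for nk in n_k]  # [0.2, 0.32, 0.48]

# The global loss is the corresponding weighted sum of the local losses.
local_losses = [0.9, 0.7, 0.5]    # illustrative F_k(w) values
F = sum(l * f for l, f in zip(lambdas, local_losses))  # 0.644
</syntaxhighlight>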


=== Architectures ===
FL can follow several system architectures:


'''Centralized Federated Learning'''

A single central server manages the learning lifecycle: it distributes the global model, receives client updates, and performs aggregation. This setup is simple to implement, but it limits scalability and creates a single point of failure [1].

'''Decentralized Federated Learning'''

There is no central server. Clients exchange model updates directly using peer-to-peer or blockchain protocols. This improves robustness, but it is harder to synchronize and increases coordination overhead and system complexity [2].

'''Hierarchical Federated Learning'''

Intermediate edge servers are placed between clients and the cloud. Each edge server aggregates updates from its associated devices and forwards the result to the cloud server. This improves scalability, reduces communication latency, and balances load between edge and cloud resources, especially in large deployments such as smart cities or factories [1][3].


=== Aggregation Algorithms ===

The core task of the server is to combine the clients' model updates into a unified global model. The most common approach is '''Federated Averaging (FedAvg)''': each client performs local training and sends back its updated weights, and the server computes a weighted average of the submitted models to form the next version of the global model. The update rule is:

w_next = Σ λ_k * w_k,   summed over k = 1 to K


where:
- w_k is the locally trained model of client k,
- λ_k is the client's weight, usually proportional to its data volume,
- K is the number of clients participating in that round [3].
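As a minimal sketch (not tied to any particular FL framework), the weighted average can be computed per parameter tensor when a model is stored as a dictionary of NumPy arrays; the layer names and client sizes below are made up for illustration:

<syntaxhighlight lang="python">
import numpy as np

def fedavg(client_models, client_sizes):
    """Per-parameter weighted average: w_next = sum_k lambda_k * w_k."""
    total = sum(client_sizes)
    return {
        key: sum((n / total) * m[key] for m, n in zip(client_models, client_sizes))
        for key in client_models[0]
    }

# Two hypothetical clients holding a tiny one-layer model.
client_a = {"dense/w": np.ones((4, 2)), "dense/b": np.zeros(2)}
client_b = {"dense/w": np.full((4, 2), 3.0), "dense/b": np.ones(2)}
global_model = fedavg([client_a, client_b], client_sizes=[100, 300])
# "dense/w" becomes 0.25*1 + 0.75*3 = 2.5 everywhere; "dense/b" becomes 0.75.
</syntaxhighlight>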


When client data is non-IID, FedAvg struggles with stability. '''FedProx''' addresses this by adding a proximal term to each client's local objective that penalizes large deviations from the global model:

F_k(w) = expected loss over client k's data + ρ * ||w - w_global||²

where:
- w is the local model being trained on client k,
- w_global is the global model sent by the server,
- ρ is a regularization parameter that controls the strength of the penalty [3].


During local training, each client updates its model using a modified SGD rule that includes the gradient of this proximal term, which discourages local models from drifting too far from the shared global model.
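A sketch of what such a local step could look like, assuming a generic loss_grad callback supplied by the client (the quadratic toy loss and the value of ρ are illustrative only):

<syntaxhighlight lang="python">
import numpy as np

def fedprox_step(w_local, w_global, loss_grad, lr=0.1, rho=0.1):
    """One SGD step on the FedProx objective: local loss + rho * ||w - w_global||^2."""
    grad = loss_grad(w_local) + 2 * rho * (w_local - w_global)  # gradient of the proximal term
    return w_local - lr * grad

# Toy example: quadratic local loss (w - 3)^2 with gradient 2 * (w - 3).
w_global = np.array([0.0])
w_local = w_global.copy()
for _ in range(100):
    w_local = fedprox_step(w_local, w_global, loss_grad=lambda w: 2 * (w - 3.0))
# Without the penalty the local model would move all the way to 3.0;
# with rho = 0.1 it settles near 3.0 / (1 + rho) ≈ 2.73, closer to w_global.
</syntaxhighlight>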


=== Communication Efficiency ===

Edge devices are often resource-constrained and network conditions can be poor, so FL systems reduce communication costs with several optimization techniques:
 
- '''Gradient quantization''': compress updates by converting floating-point values to lower-bit formats.
- '''Sparsification''': send only the top-k most important update values.
- '''Client sampling''': randomly select a fraction of clients to participate in each round.
- '''Local update batching''': perform multiple local training epochs before communicating.


These strategies save energy and bandwidth without significantly affecting model accuracy, allowing FL to operate efficiently even with limited connectivity. Quantization and top-k sparsification are sketched below.
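A minimal, framework-free sketch of these two techniques applied to a flattened update vector (the bit width and the choice of k are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def quantize_8bit(update):
    """Map float32 values to uint8 plus the offset/scale needed to dequantize."""
    lo, hi = update.min(), update.max()
    scale = (hi - lo) / 255.0 or 1.0
    q = np.round((update - lo) / scale).astype(np.uint8)
    return q, lo, scale          # roughly 4x fewer bits per value than float32

def dequantize(q, lo, scale):
    return q.astype(np.float32) * scale + lo

def top_k_sparsify(update, k):
    """Keep only the k largest-magnitude entries; transmit (indices, values)."""
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

update = np.random.default_rng(1).normal(size=1000).astype(np.float32)
q, lo, scale = quantize_8bit(update)
approx = dequantize(q, lo, scale)            # close to the original update
idx, vals = top_k_sparsify(update, k=100)    # keep 10% of the entries
</syntaxhighlight>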


{| class="wikitable"
{| class="wikitable"
|+ Federated Learning vs Traditional Machine Learning
|+ Comparison: Federated Learning vs Traditional Machine Learning
! Feature !! Federated Learning !! Traditional ML
! Feature !! Federated Learning !! Traditional Machine Learning
|-
|-
| Data Location || Remains on device || Sent to central server
| Data Location || On-device || Centralized
|-
|-
| Privacy Risk || Lower || Higher
| Privacy Risk || Low || High
|-
|-
| Communication Overhead || Low (only updates shared) || High (raw data transfer)
| Bandwidth Use || Low || High
|-
|-
| Latency || Lower (local inference) || Higher (cloud round-trip)
| Latency || Low (local) || High (cloud round-trip)
|-
|-
| Scalability || Medium to high || Limited by central compute
| Scalability || High (with sampling and compression) || Moderate
|}
|}


=== Privacy and Security ===

Although FL improves privacy by keeping data local, it is still vulnerable to threats such as:
- gradient inversion attacks (reconstructing private data from shared updates),
- model poisoning (malicious updates that corrupt the global model), and
- backdoor attacks (trigger-based model manipulation).

To address these, FL integrates several defense mechanisms.

'''Differential Privacy'''

Differential Privacy (DP) introduces random noise into model updates, making it statistically improbable to infer any individual's data. A computation A is (ε, δ)-differentially private if:

P(A(D) ∈ S) ≤ exp(ε) * P(A(D′) ∈ S) + δ
where:
- D and D′ are datasets differing by one user's data,
- ε is the privacy budget, and
- δ is a small probability of failure [4].
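In FL this is commonly realized by clipping each client update and adding calibrated noise before it leaves the device. The sketch below illustrates the idea with a Gaussian mechanism; the clip norm and noise multiplier are illustrative rather than calibrated to a specific (ε, δ) target:

<syntaxhighlight lang="python">
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update to a maximum L2 norm, then add Gaussian noise."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# Each client sanitizes its update before sending it to the aggregator.
raw_update = np.random.default_rng(2).normal(size=10)
private_update = dp_sanitize(raw_update)
</syntaxhighlight>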


'''Secure Aggregation'''

Secure aggregation uses cryptographic techniques so that the server sees only the aggregated result (for example, the sum of all updates), never an individual client's update.
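One way to obtain this property is pairwise additive masking, in which clients add random masks that cancel out in the sum. The sketch below is a simplified illustration only; practical protocols also handle client dropouts and derive masks via key agreement rather than from a shared seed:

<syntaxhighlight lang="python">
import numpy as np

def masked_update(update, client_id, peer_ids, dim, seed=42):
    """Add pairwise masks that cancel when all masked updates are summed."""
    masked = update.copy()
    for peer in peer_ids:
        if peer == client_id:
            continue
        # Both members of a pair derive the same mask from a shared seed.
        pair_rng = np.random.default_rng(seed + min(client_id, peer) * 1000 + max(client_id, peer))
        mask = pair_rng.normal(size=dim)
        masked += mask if client_id < peer else -mask
    return masked

dim, ids = 5, [0, 1, 2]
updates = {i: np.full(dim, float(i + 1)) for i in ids}          # true updates: 1s, 2s, 3s
masked = [masked_update(updates[i], i, ids, dim) for i in ids]  # what the server receives
print(np.round(sum(masked), 6))                                 # every entry equals 1 + 2 + 3 = 6
</syntaxhighlight>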
'''Homomorphic Encryption'''

Homomorphic encryption allows the server to perform aggregation directly on encrypted updates. For example, in additively homomorphic schemes:


Enc(a) + Enc(b) = Enc(a + b)


This ensures updates remain encrypted throughout aggregation [4].
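As an illustration of this additive property, the sketch below uses the third-party python-paillier package (phe), assuming it is installed via pip install phe; it demonstrates only the Enc(a) + Enc(b) = Enc(a + b) identity, not a full secure aggregation protocol:

<syntaxhighlight lang="python">
# Requires the third-party `phe` (python-paillier) package: pip install phe
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Two clients encrypt one coordinate of their model update.
enc_a = public_key.encrypt(0.25)
enc_b = public_key.encrypt(0.75)

# The server adds ciphertexts without ever seeing the plaintext values.
enc_sum = enc_a + enc_b

# Only the holder of the private key can decrypt the aggregate.
print(private_key.decrypt(enc_sum))  # 1.0
</syntaxhighlight>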


=== Applications ===

Federated Learning is applicable across diverse domains where privacy and decentralization are critical.

'''Healthcare'''

Hospitals collaborate on AI models for diagnosis (e.g., cancer detection, pandemic tracking) without exchanging sensitive patient data. FL enables legal and ethical cooperation while improving model generalization [1].

'''Autonomous Vehicles'''

Cars learn locally from driving conditions and contribute only model updates to shared perception and navigation models, without exposing raw camera feeds, GPS coordinates, or other personal data.

'''Smart Cities'''

Cities deploy FL to train predictive models using data from traffic lights, pollution sensors, and public safety systems without centralizing citizen data [1][4].

'''Mobile Applications'''

FL powers personalized on-device services such as next-word prediction, speech recognition, and activity tracking while keeping user data local.

'''Industrial IoT'''

Factories and energy systems use FL to train predictive maintenance models, detect faults, and optimize energy usage based on localized sensor data, without exposing proprietary processes.


=== Challenges ===

While FL offers clear advantages, several barriers to deployment remain.

'''Scalability'''

Large networks require efficient coordination, especially when devices are heterogeneous, frequently offline, or submit updates asynchronously. Hierarchical aggregation and adaptive scheduling algorithms help address this.

'''Data Heterogeneity'''

Client data is often non-IID, unbalanced, and of varying quality, which hurts model convergence and generalization. Personalized FL techniques address this through client-specific model tuning.

'''Security Threats'''

Poisoning, backdoor, inference, and Sybil attacks demand robust aggregation schemes, anomaly detection, and trusted execution environments.

'''Incentives'''

Clients expend battery and compute to participate, so fair reward systems (e.g., token-based models) and contribution scoring are active areas of research.

'''Interoperability'''

FL systems must work across diverse device platforms, operating systems, networks, and frameworks, which requires standards for FL APIs, data formats, and deployment protocols, along with lightweight FL libraries.


=== References ===


# Abreha, H.G., Hayajneh, M., & Serhani, M.A. (2022). Federated Learning in Edge Computing: A Systematic Survey. Sensors, 22(2), 450.
# Lyu, L., Yu, H., & Yang, Q. (2020). Threats to Federated Learning: A Survey. arXiv preprint arXiv:2003.02133.
# Li, T., Sahu, A.K., Talwalkar, A., & Smith, V. (2020). Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal Processing Magazine, 37(3), 50–60.
# Kairouz, P., et al. (2019). Advances and Open Problems in Federated Learning. arXiv preprint arXiv:1912.04977.