Federated Learning

Overview and Motivation

Federated Learning (FL) is a decentralized machine learning paradigm that enables multiple edge devices, referred to as clients, to collaboratively train a shared model without transferring their private data to a central location. Each client performs local training using its own dataset and communicates only model updates (such as gradients or weights) to an orchestrating server or aggregator. These updates are then aggregated to produce a new global model that is redistributed to the clients for further training. This process continues iteratively, allowing the model to learn from distributed data sources while preserving the privacy and autonomy of each client. By design, FL shifts the focus from centralized data collection to collaborative model development, introducing a new direction in scalable, privacy-preserving machine learning [1].

The motivation for Federated Learning arises from growing concerns around data privacy, security, and communication efficiency, particularly in edge computing environments where data is generated in massive volumes across geographically distributed and often resource-constrained devices. Centralized learning architectures struggle in such contexts due to limited bandwidth, high transmission costs, and strict regulatory frameworks such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). FL inherently mitigates these issues by allowing data to remain on-device, thereby minimizing the risk of data exposure and reducing reliance on constant connectivity to cloud services. Furthermore, by exchanging only lightweight model updates instead of full datasets, FL significantly decreases communication overhead, making it well-suited for real-time learning in mobile and edge networks [2].

Within the broader ecosystem of edge computing, FL represents a paradigm shift that enables distributed intelligence under conditions of partial availability, device heterogeneity, and non-identically distributed (non-IID) data. Clients in FL systems can participate asynchronously, tolerate network interruptions, and adapt their computational loads based on local capabilities. This flexibility is particularly important in edge scenarios where devices may differ in processor power, battery life, and storage. Moreover, FL supports the development of personalized and locally adapted models through techniques such as federated personalization and clustered aggregation. These properties make FL not only an effective solution for collaborative learning at the edge but also a foundational approach for building scalable, secure, and trustworthy AI systems that are aligned with emerging demands in distributed computing and privacy-preserving technologies [1][2][3].

Federated Learning Architectures

Federated Learning (FL) can be implemented through various architectural configurations, each defining how clients interact, how updates are aggregated, and how trust and responsibility are distributed. These architectures play a central role in determining the scalability, fault tolerance, communication overhead, and privacy guarantees of a federated system. In edge computing environments, where client devices are heterogeneous and network reliability varies, the choice of architecture significantly affects the efficiency and robustness of learning. The three dominant paradigms are centralized, decentralized, and hierarchical architectures. Each of these approaches balances different trade-offs in terms of coordination complexity, system resilience, and resource allocation.

Visual comparison of Cloud-Based, Edge-Based, and Hierarchical Federated Learning architectures. Source: [1]

Centralized Architecture

In the centralized FL architecture, a central server or cloud orchestrator is responsible for all coordination, aggregation, and distribution activities. The server begins each round by broadcasting a global model to a selected subset of client devices, which then perform local training using their private data. After completing local updates, clients send their modified model parameters, usually in the form of weight vectors or gradients, back to the server. The server performs aggregation, typically using algorithms such as Federated Averaging (FedAvg), and sends the updated global model to the clients for the next round of training.
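The round structure described above can be condensed into a few lines. The following is a minimal sketch in Python with NumPy, not a production implementation: `local_train` is a hypothetical stand-in for a client's local SGD, and the aggregation step is FedAvg-style dataset-size weighting.

```python
import numpy as np

def local_train(global_weights, lr=0.01):
    # Hypothetical stand-in: a real client would run several epochs of
    # SGD on its private data and return the resulting weights.
    return global_weights - lr * np.random.randn(*global_weights.shape)

def fedavg(client_weights, client_sizes):
    # Weighted average of client models, weighted by local dataset size.
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

global_model = np.zeros(10)          # toy model: a 10-dimensional weight vector
client_sizes = [1200, 300, 2500]     # hypothetical local dataset sizes

for round_id in range(5):            # five federated rounds
    updates = [local_train(global_model) for _ in client_sizes]
    global_model = fedavg(updates, client_sizes)
```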

The centralized model is appealing for its simplicity and compatibility with existing cloud-to-client infrastructures. It is relatively easy to deploy, manage, and scale in environments with stable connectivity and limited client churn. However, its reliance on a single server introduces critical vulnerabilities. The server becomes a bottleneck under high communication loads and a single point of failure if it experiences downtime or compromise. Furthermore, this architecture requires clients to trust the central aggregator with metadata, model parameters, and access scheduling. In privacy-sensitive or high-availability contexts, these limitations can restrict centralized FL’s applicability [1].

Decentralized Architecture

Decentralized FL removes the need for a central server altogether. Instead, client devices interact directly with each other to share and aggregate model updates. These peer-to-peer (P2P) networks may operate using structured overlays, such as ring topologies or blockchain systems, or employ gossip-based protocols for stochastic update dissemination. In some implementations, clients collaboratively compute weighted averages or perform federated consensus to update the global model in a distributed fashion.
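As a rough illustration of gossip-style dissemination, the toy sketch below lets a randomly chosen pair of peers average their parameters at each step; real protocols add authentication, update verification, and convergence control on top of this core.

```python
import numpy as np

rng = np.random.default_rng(0)
models = [rng.normal(size=4) for _ in range(6)]   # six peers, 4-dim toy models

for step in range(200):
    i, j = rng.choice(len(models), size=2, replace=False)
    avg = (models[i] + models[j]) / 2             # pairwise model exchange
    models[i], models[j] = avg, avg.copy()

# With enough gossip steps, every peer drifts toward the network-wide mean.
print(np.std(models, axis=0))                     # spread shrinks toward zero
```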

This architecture significantly enhances system robustness, resilience, and trust decentralization. There is no single point of failure, and the absence of a central coordinator eliminates risks of aggregator bias or compromise. Moreover, decentralized FL supports federated learning in contexts where participants belong to different organizations or jurisdictions and cannot rely on a neutral third party. However, these benefits come at the cost of increased communication overhead, complex synchronization requirements, and difficulties in managing convergence, especially under non-identical data distributions and asynchronous updates. Protocols for secure communication, update verification, and identity authentication are necessary to prevent malicious behavior and ensure model integrity. Due to these complexities, decentralized FL is an active area of research and is best suited for scenarios requiring strong autonomy and fault tolerance [2].

Hierarchical Architecture

Hierarchical FL is a hybrid architecture that introduces one or more intermediary layers, often called edge servers or aggregators, between clients and the global coordinator. In this model, clients are organized into logical or geographical groups, with each group connected to an edge server. Clients send their local model updates to their respective edge aggregator, which performs preliminary aggregation. The edge servers then send their aggregated results to the cloud server, where final aggregation occurs to produce the updated global model.
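The two-tier flow can be sketched as follows (an illustrative example with hypothetical clusters and sizes): edge servers average their own clients first, and the cloud then averages the edge results weighted by cluster size, which reproduces the flat weighted average over all clients.

```python
import numpy as np

def weighted_avg(weights, sizes):
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(weights, sizes))

# Each edge server holds (client_update, client_dataset_size) pairs.
clusters = {
    "edge_A": [(np.ones(3) * 1.0, 100), (np.ones(3) * 2.0, 300)],
    "edge_B": [(np.ones(3) * 4.0, 200)],
}

edge_models, edge_sizes = [], []
for name, members in clusters.items():
    ws, ns = zip(*members)
    edge_models.append(weighted_avg(ws, ns))   # tier 1: edge aggregation
    edge_sizes.append(sum(ns))

global_model = weighted_avg(edge_models, edge_sizes)  # tier 2: cloud aggregation
print(global_model)   # identical to a flat weighted average over all clients
```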

This multi-tiered architecture is designed to address the scalability and efficiency challenges inherent in centralized systems while avoiding the coordination overhead of full decentralization. Hierarchical FL is especially well-suited for edge computing environments where data, clients, and compute resources are distributed across structured clusters, such as hospitals within a healthcare network or base stations in a telecommunications infrastructure.

One of the key advantages of hierarchical FL is communication optimization. By aggregating locally at edge nodes, the amount of data transmitted over wide-area networks is significantly reduced. Additionally, this model supports region-specific model personalization by allowing edge servers to maintain specialized sub-models adapted to local client behavior. Hierarchical FL also enables asynchronous and fault-tolerant training by isolating disruptions within specific clusters. However, this architecture still depends on reliable edge aggregators and introduces new challenges in cross-layer consistency, scheduling, and privacy preservation across multiple tiers [1][3].

Aggregation Algorithms and Communication Efficiency

Aggregation is a fundamental operation in Federated Learning (FL), where updates from multiple edge clients are merged to form a new global model. The quality, stability, and efficiency of the federated learning process depend heavily on the aggregation strategy employed. In edge environments characterized by device heterogeneity and non-identical data distributions, choosing the right aggregation algorithm is essential to ensure reliable convergence and effective collaboration.

Federated Learning protocol showing client selection, local training, model update, and aggregation. Source: Adapted from Federated Learning in Edge Computing: A Systematic Survey [4].


Key Aggregation Algorithms

Comparison of Aggregation Algorithms in Federated Learning

| Algorithm | Description | Handles Non-IID Data | Server-Side Optimization | Typical Use Case |
|-----------|-------------|----------------------|--------------------------|------------------|
| FedAvg | Performs weighted averaging of client models based on dataset size. Simple and communication-efficient. | Limited | No | Basic federated setups with IID or mildly non-IID data. |
| FedProx | Adds a proximal term to the local loss function to prevent client drift. Stabilizes training with diverse data. | Yes | No | Edge deployments with high data heterogeneity or resource-limited clients. |
| FedOpt | Applies adaptive optimizers (e.g., FedAdam, FedYogi) on aggregated updates. Enhances convergence in dynamic systems. | Yes | Yes | Large-scale systems or settings with unstable participation and gradient variability. |


Aggregation is the cornerstone of Federated Learning (FL), where locally computed model updates from edge devices are combined into a global model. The most widely adopted aggregation method is Federated Averaging (FedAvg), introduced in the foundational work by McMahan et al. FedAvg operates by averaging model parameters received from participating clients, typically weighted by the size of each client’s local dataset. This simple yet powerful method reduces the frequency of communication by allowing each device to perform multiple local updates before sending its updated weights to the server. However, FedAvg performs optimally only when data across clients is balanced and independent and identically distributed (IID)—conditions rarely satisfied in edge computing environments, where client datasets are often highly non-IID, sparse, or skewed [1][2].

To address these limitations, several advanced aggregation algorithms have been proposed. One notable extension is FedProx, which modifies the local optimization objective by adding a proximal term that penalizes large deviations from the global model. This constrains local training and improves stability in heterogeneous data scenarios. FedProx also allows flexible participation by clients with limited resources or intermittent connectivity, making it more robust in practical edge deployments. Another family of aggregation algorithms is FedOpt, which includes adaptive server-side optimization techniques such as FedAdam and FedYogi. These algorithms build on optimization methods used in centralized training and apply them at the aggregation level, enabling faster convergence and improved generalization under complex, real-world data distributions. Collectively, these variants of aggregation address critical FL challenges such as slow convergence, client drift, and update divergence due to heterogeneity in both data and device capabilities [1][3].
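To make FedProx's modification concrete, the sketch below adds the proximal penalty (mu/2) * ||w - w_global||^2 to a toy least-squares objective; the coefficient mu, the data, and the step size are illustrative assumptions.

```python
import numpy as np

def fedprox_loss(w, w_global, X, y, mu=0.1):
    # Local loss plus the proximal term, which anchors local training
    # near the global model under heterogeneous data.
    local = 0.5 * np.mean((X @ w - y) ** 2)
    return local + (mu / 2) * np.sum((w - w_global) ** 2)

def fedprox_grad(w, w_global, X, y, mu=0.1):
    return (X.T @ (X @ w - y)) / len(y) + mu * (w - w_global)

rng = np.random.default_rng(1)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
w_global = np.zeros(3)
w = w_global.copy()
for _ in range(100):                       # local proximal gradient steps
    w -= 0.05 * fedprox_grad(w, w_global, X, y)
print(fedprox_loss(w, w_global, X, y))
```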

Communication Efficiency in Edge-Based FL

Communication remains one of the most critical bottlenecks in deploying FL at the edge, where devices often suffer from limited bandwidth, intermittent connectivity, and energy constraints. To address this, several strategies have been developed. **Gradient quantization** reduces the size of transmitted updates by lowering numerical precision (e.g., from 32-bit to 8-bit values). **Gradient sparsification** limits communication to only the most significant changes in the model, transmitting top-k updates while discarding negligible ones. **Local update batching** allows devices to perform multiple rounds of local training before sending updates, reducing the frequency of synchronization.
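The sketch below illustrates two of these compression ideas, 8-bit linear quantization and top-k sparsification, in plain NumPy; the scale encoding and the choice of k are illustrative.

```python
import numpy as np

def quantize_int8(update):
    # Linear 8-bit quantization: ship int8 values plus one float scale;
    # the server dequantizes as q * scale.
    scale = float(np.max(np.abs(update))) / 127 or 1.0
    q = np.round(update / scale).astype(np.int8)
    return q, scale

def top_k_sparsify(update, k):
    # Keep only the k largest-magnitude entries; send indices and values.
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

g = np.random.randn(1000).astype(np.float32)
q, s = quantize_int8(g)                  # roughly 4x smaller than float32
idx, vals = top_k_sparsify(g, k=50)      # 95% of the entries are dropped
```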

Further, **client selection strategies** dynamically choose a subset of devices to participate in each round, based on criteria like availability, data quality, hardware capacity, or trust level. These communication optimizations are crucial for ensuring that FL remains scalable, efficient, and deployable across millions of edge nodes without overloading the network or draining device batteries [1][2][3].
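A minimal score-based selection sketch is shown below; the client fields and scoring weights are purely hypothetical illustrations of the criteria named above.

```python
# Hypothetical per-client metadata reported to the coordinator.
clients = [
    {"id": "c1", "battery": 0.9, "bandwidth": 5.0, "trust": 0.8},
    {"id": "c2", "battery": 0.2, "bandwidth": 1.0, "trust": 0.9},
    {"id": "c3", "battery": 0.7, "bandwidth": 8.0, "trust": 0.5},
]

def score(c):
    # Illustrative weighting of availability, capacity, and trust.
    return 0.4 * c["battery"] + 0.4 * (c["bandwidth"] / 10) + 0.2 * c["trust"]

selected = sorted(clients, key=score, reverse=True)[:2]   # pick the top 2
print([c["id"] for c in selected])
```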

Privacy Mechanisms

Privacy and data confidentiality are central design goals of Federated Learning (FL), particularly in edge computing scenarios where numerous IoT devices (e.g., hospital servers, autonomous vehicles) gather sensitive data. Although FL does not require the raw data to leave each client’s device, model updates can still leak private information or be correlated to individual data points. To address these challenges, various privacy-preserving mechanisms have been proposed in the literature [1][2][3].

Differential Privacy (DP)

Differential Privacy is a formal framework ensuring that the model’s outputs (e.g., parameter updates) do not reveal individual records. In FL, DP often involves injecting calibrated noise into gradients or model weights on each client. This noise is designed so that the global model’s performance remains acceptable, yet attackers cannot reliably infer any single data sample’s presence in the training set. A step-by-step timeline of DP in an FL context can be summarized as follows:

1. Clients fetch the global model and compute local gradients.
2. Before transmitting gradients, clients add randomized noise to mask specific data patterns.
3. The central server aggregates the noisy gradients to produce a new global model.
4. Clients download the updated global model for further local training.

By carefully tuning the “privacy budget” (ε and δ), DP can balance privacy against model utility [1][4].
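A minimal sketch of the client-side noising in step 2, in the style of DP-SGD: clip the update's L2 norm to bound sensitivity, then add Gaussian noise. The clip norm and noise multiplier below are illustrative; a real deployment derives them from the (ε, δ) budget using a privacy accountant.

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))  # bound sensitivity
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise        # the noisy update is what leaves the device

noisy_update = dp_sanitize(np.random.randn(100))
```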

Secure Aggregation (SecAgg)

Secure Aggregation, or SecAgg, is a protocol that encrypts local updates before they are sent to the server, ensuring that only the aggregated result is revealed. A typical SecAgg workflow includes:

1. Each client randomly splits its model updates into multiple shares.
2. These shares are exchanged among clients and the server in a way that no single party sees the entirety of any update.
3. The server only obtains the sum of all client updates, rather than individual parameters.

This approach can thwart internal adversaries who might try to reconstruct local data from raw updates [2]. SecAgg is crucial for preserving confidentiality, especially in IoT-based FL systems where data privacy regulations (GDPR, HIPAA) prohibit raw data exposure.
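A common realization of this idea uses pairwise random masks that cancel in the server's sum. The toy sketch below shows only that cancellation; a full protocol also handles key agreement, dropouts, and authentication.

```python
import numpy as np

rng = np.random.default_rng(42)
updates = [rng.normal(size=4) for _ in range(3)]   # three clients' updates
masked = [u.copy() for u in updates]

for i in range(3):
    for j in range(i + 1, 3):
        mask = rng.normal(size=4)   # in practice, a shared pairwise secret
        masked[i] += mask           # client i adds the mask
        masked[j] -= mask           # client j subtracts it

# The server sums the masked updates; every pairwise mask cancels out,
# so only the aggregate is revealed.
assert np.allclose(sum(masked), sum(updates))
```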

Homomorphic Encryption and SMPC

Homomorphic Encryption (HE) supports computations on encrypted data without the need for decryption. In FL, a homomorphically encrypted gradient can be aggregated securely by the server, preventing it from seeing cleartext updates. This approach, however, introduces higher computational overhead, which can be burdensome for resource-limited IoT edge devices [3]. Secure Multi-Party Computation (SMPC) is a related set of techniques that enables multiple parties to perform joint computations on secret inputs. In the context of FL, SMPC allows clients to compute sums of model updates without revealing individual updates. Although performance optimizations exist, SMPC remains challenging for large-scale models with millions of parameters [1][5].
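The additive secret-sharing core of SMPC summation fits in a few lines of pure Python; the prime modulus and the scalar "updates" are toy assumptions (real systems share fixed-point encodings of entire parameter vectors).

```python
import random

P = 2**61 - 1                                  # a large prime modulus

def share(secret, n_parties):
    # Split a secret into n additive shares that sum to it modulo P.
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

secrets = [5, 17, 40]                          # three clients' scalar updates
all_shares = [share(s, 3) for s in secrets]

# Each party sums the shares it holds; combining the partial sums reveals
# only the total, never any individual client's value.
partials = [sum(column) % P for column in zip(*all_shares)]
assert sum(partials) % P == sum(secrets)
```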

IoT-Specific Considerations

In edge computing, IoT devices often capture highly sensitive data (patient records, vehicle sensor logs, etc.). Privacy measures must therefore operate seamlessly on low-power hardware while accommodating intermittent connectivity. For instance, a smart healthcare device storing patient records may use DP-based local training and SecAgg to encrypt updates before uploading. Meanwhile, an autonomous vehicle might adopt HE to guard sensor patterns relevant to real-time traffic analysis. Together, these techniques form a multi-layered privacy defense tailored for distributed, resource-constrained IoT ecosystems [4][5].

System model illustrating privacy-preserving federated learning using homomorphic encryption.
Adapted from Privacy-Preserving Federated Learning Using Homomorphic Encryption.

Security Threats

While Federated Learning (FL) enhances data privacy by ensuring that raw data remains on edge devices, it introduces significant security vulnerabilities due to its decentralized design and reliance on untrusted participants. In edge computing environments, where clients often operate with limited computational power and over unreliable networks, these threats are particularly pronounced.

Model Poisoning Attacks

Model poisoning attacks are a critical threat in Federated Learning (FL), especially in edge computing environments where the infrastructure is distributed and clients may be untrusted or loosely regulated. In this type of attack, malicious clients intentionally craft and submit harmful model updates during the training process to compromise the performance or integrity of the global model. These attacks are typically categorized as either untargeted—aimed at degrading general model accuracy—or targeted (backdoor attacks), where the global model is manipulated to behave incorrectly in specific scenarios while appearing normal in others. For instance, an attacker might train its local model with a backdoor trigger, such as a specific pixel pattern in an image, so that the global model misclassifies inputs containing that pattern, even though it performs well on standard test cases [1][4].

FL's inherent reliance on aggregation algorithms like Federated Averaging (FedAvg), which simply compute the average of local updates, makes it susceptible to these attacks. Since raw data is never shared, poisoned updates can be hard to detect, especially in non-IID settings where variability in updates is expected. Robust aggregation techniques like Krum, Trimmed Mean, and Bulyan have been proposed to resist such manipulation by filtering or down-weighting outlier contributions. However, these algorithms often introduce computational and communication overheads, which are impractical for edge devices with limited power and processing capabilities [2][4]. Furthermore, adversaries can design subtle attacks that mimic benign statistical patterns, making them indistinguishable from legitimate updates. Emerging research explores anomaly detection based on update similarity and trust scoring, yet these solutions face limitations when applied to large-scale or asynchronous FL deployments. Developing lightweight, real-time, and scalable defenses that are effective even under device heterogeneity and unreliable network conditions remains an unresolved challenge in secure edge-based FL [3][4].
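As one example from the robust aggregation family named above, a coordinate-wise trimmed mean can be sketched in a few lines; the trim level b and the toy updates are illustrative.

```python
import numpy as np

def trimmed_mean(updates, b=1):
    # Drop the b largest and b smallest values per coordinate, then average,
    # bounding the influence any single poisoned update can exert.
    stacked = np.sort(np.stack(updates), axis=0)
    return stacked[b:len(updates) - b].mean(axis=0)

honest = [np.ones(4) * v for v in (0.9, 1.0, 1.05, 1.1)]
poisoned = [np.ones(4) * 50.0]              # one malicious client
print(trimmed_mean(honest + poisoned))      # stays near 1.0 despite the outlier
```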

Data Poisoning Attacks

Data poisoning attacks target the integrity of Federated Learning (FL) by manipulating the training data on individual clients before model updates are generated. Unlike model poisoning, which corrupts the gradients or weights directly, data poisoning occurs at the dataset level—allowing adversaries to stealthily influence model behavior through biased or malicious data. This includes techniques such as label flipping (e.g., changing labels of one class to another), outlier injection (introducing data points that fall far outside the normal distribution), or clean-label attacks (subtly altering legitimate data so it has harmful effects without obvious artifacts). Since FL relies on the assumption that client data remains private and uninspected, such poisoned data can easily propagate harmful patterns into the global model, particularly in non-IID settings common in edge environments [2][3].
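A label-flipping manipulation is trivially easy to express, which is part of why it is hard to rule out; the sketch below relabels one hypothetical class as another before local training would begin.

```python
import numpy as np

def flip_labels(labels, source=1, target=7):
    # Relabel every example of the source class as the target class.
    poisoned = labels.copy()
    poisoned[labels == source] = target
    return poisoned

y = np.array([0, 1, 1, 2, 1, 3])
print(flip_labels(y))                # [0 7 7 2 7 3]
```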

Edge devices are especially vulnerable to this form of attack due to their limited compute and energy resources, which often preclude comprehensive input validation or anomaly detection. In addition, the highly diverse and fragmented nature of data collected at the edge—such as medical readings from wearable sensors or driving behavior from connected vehicles—makes it difficult to establish a clear baseline for identifying poisoned updates. Defense strategies include robust aggregation (e.g., Median, Trimmed Mean), anomaly detection techniques, and differentially private mechanisms that inject random noise to reduce precision targeting. However, these methods come with trade-offs, such as reduced model accuracy or increased system complexity [1][4]. There is currently no foolproof solution to detect data poisoning without violating privacy principles. As FL continues to be deployed in critical domains like healthcare, finance, and smart cities, mitigating data poisoning while preserving user data locality and system scalability remains an open and urgent research challenge [3][4].

Inference and Membership Attacks

Inference attacks represent a subtle yet powerful class of threats in Federated Learning (FL), where adversaries seek to extract sensitive information from shared model updates rather than raw data. These attacks exploit the iterative nature of the FL training process, where clients send gradient updates or model weights to the server. By analyzing these updates—especially in overparameterized models—attackers can infer statistical properties of the underlying data or even reconstruct representative inputs. A well-documented example is the membership inference attack, where an adversary determines whether a specific data point was used in training by observing the model’s behavior on that input. This becomes especially problematic in edge computing environments, where data heterogeneity and limited client datasets make it easier to correlate individual updates with specific users or devices [2][3].

The risk of information leakage through gradient sharing grows in proportion to the model’s complexity and the granularity of updates. In FL, where edge clients often have only a small number of data samples, their updates may reveal disproportionately detailed information. Studies have shown that attackers with access to multiple rounds of updates—especially when clients are selected frequently—can perform input reconstruction using gradient inversion techniques. These attacks pose significant risks in domains such as healthcare, where private data like patient symptoms or diagnoses might be inferred from the model’s training dynamics. Mitigation strategies include differential privacy (DP), which adds noise to updates before transmission to obscure precise information. Secure Aggregation protocols also help by ensuring the server only sees aggregated updates from multiple clients. However, both approaches come with trade-offs: DP reduces model accuracy and requires careful calibration of the privacy budget, while Secure Aggregation adds communication and cryptographic overhead [4]. Designing privacy-preserving FL systems that balance utility, efficiency, and strong protection against inference remains a major challenge, particularly at the scale and variability found in real-world edge networks [1][4].
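A minimal loss-threshold sketch illustrates the basic signal membership inference exploits: training members tend to incur unusually low loss. The threshold and the toy loss values are illustrative assumptions.

```python
import numpy as np

def infer_membership(losses, threshold=0.5):
    # Guess "member" whenever the per-sample loss falls below the threshold.
    return losses < threshold

member_losses = np.array([0.05, 0.12, 0.30])      # samples seen in training
nonmember_losses = np.array([0.90, 1.40, 0.75])   # unseen samples
print(infer_membership(member_losses))            # [ True  True  True]
print(infer_membership(nonmember_losses))         # [False False False]
```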

Sybil and Free-Rider Attacks

Sybil attacks are a serious security concern in Federated Learning (FL), particularly within decentralized or large-scale edge environments. In a Sybil attack, a single adversary creates multiple fake identities—or Sybil nodes—that participate in the FL training process. These fake clients can collude to manipulate the global model by amplifying poisoned updates, skewing consensus, or outvoting honest participants during aggregation. This is especially dangerous in FL systems where client selection is randomized and identity verification mechanisms are either weak or absent. In cross-device FL scenarios, where millions of devices participate and authentication is often lightweight, Sybil attacks can be launched without significant computational cost [1]. By overwhelming the training process with manipulated updates from multiple controlled identities, attackers can degrade model accuracy, insert backdoors, or block convergence altogether.

Mitigating Sybil attacks is challenging due to the inherent privacy constraints of FL. Traditional centralized defenses like identity verification or IP-based throttling may violate the privacy-preserving principles of FL or be infeasible in mobile, disconnected, or edge-network settings. Some defense mechanisms include cryptographic client registration, proof-of-work schemes, or client reputation scoring. However, these introduce computational burdens or trust assumptions that may not hold in edge environments. Techniques such as clustering client updates and identifying similarity patterns can help detect coordinated Sybil behavior, but adversaries can adapt by subtly varying their poisoned updates to mimic honest client diversity [4].
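Similarity-based screening of the kind mentioned above can be sketched with pairwise cosine similarity; the cutoff is an illustrative assumption, and, as noted, adaptive adversaries can vary their updates to slip under it.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_sybils(updates, cutoff=0.99):
    # Flag pairs of near-duplicate updates from supposedly independent clients.
    suspicious = set()
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            if cosine(updates[i], updates[j]) > cutoff:
                suspicious.update({i, j})
    return suspicious

rng = np.random.default_rng(7)
honest = [rng.normal(size=8) for _ in range(4)]
clone = honest[0] + 1e-4 * rng.normal(size=8)   # a Sybil replaying client 0
print(flag_sybils(honest + [clone]))            # flags {0, 4}
```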

Free-rider attacks, while less destructive than Sybil attacks, undermine the collaborative foundation of FL. In this scenario, a client participates in the FL protocol but contributes little to no useful computation—e.g., by sending stale updates, randomly initialized models, or dummy gradients—while still downloading the improved global model and benefiting from it. Free-riders reduce overall model quality and fairness, especially in resource-constrained settings like IoT networks, where honest clients expend real bandwidth, battery, and computation to train the model [3]. Addressing free-riding behavior often involves contribution-aware aggregation (e.g., weighting updates based on gradient quality or model improvement) or audit mechanisms that assess client effort over successive training rounds.
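A toy contribution check makes the idea concrete: updates with near-zero norm, such as dummy gradients, are excluded from aggregation. The cutoff factor is an illustrative assumption; real systems use richer effort metrics such as validation improvement.

```python
import numpy as np

def contribution_weights(updates, floor=0.1):
    # Drop updates whose norm is far below the cohort median (likely
    # dummy or stale contributions), then weight the rest equally.
    norms = np.array([np.linalg.norm(u) for u in updates])
    keep = norms >= floor * np.median(norms)
    w = keep.astype(float)
    return w / w.sum()

updates = [np.random.randn(10) for _ in range(4)] + [np.zeros(10)]
print(contribution_weights(updates))   # the free-rider's weight is 0.0
```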

Malicious Server Attacks

In Federated Learning (FL), especially in the classical centralized architecture, the server plays a critical role in coordinating the learning process by aggregating updates from clients and distributing the global model. However, this central position also makes the server a powerful threat vector if it becomes compromised or malicious. A malicious server can violate confidentiality by launching inference attacks on the collected client updates, attempt to reconstruct local data through gradient inversion, or identify statistical properties of the client datasets. Additionally, the server can tamper with model integrity by selectively dropping updates from honest clients, modifying the aggregation process, or injecting adversarial updates into the global model. This single point of failure is particularly concerning in sensitive applications such as healthcare, finance, and autonomous vehicles, where edge clients rely on trusted model updates [1][3].

To defend against these threats, a variety of cryptographic and architectural strategies have been proposed. Secure Aggregation protocols ensure that the server only receives the aggregated sum of updates from clients, preventing it from accessing individual contributions. However, this protection assumes that clients are non-colluding and connected simultaneously, which can be difficult to guarantee in dynamic edge networks. Homomorphic encryption (HE) offers another layer of defense by enabling computations on encrypted data, but its computational cost is often prohibitive for resource-limited edge devices. Similarly, Secure Multi-Party Computation (SMPC) allows multiple clients to jointly compute global updates without revealing local data, but requires heavy communication and coordination overhead [4]. Decentralized or hierarchical FL architectures also aim to reduce the central server’s authority by distributing aggregation roles across trusted intermediaries or peer clients. However, these designs introduce new challenges such as ensuring consensus, managing trust across multiple layers, and maintaining training efficiency [2][4]. Balancing security, scalability, and efficiency remains an open challenge in FL systems—particularly as deployment expands across distributed edge environments where centralized trust cannot be easily assumed.

References

  1. Gabrielli, E., Pica, G., Tolomei, G. A Survey on Decentralized Federated Learning, 2023.
  2. Kairouz, P., et al. Advances and Open Problems in Federated Learning, 2021.
  3. Nguyen, D. C., et al. Federated Learning for Internet of Things: A Comprehensive Survey, IEEE Communications Surveys & Tutorials, vol. 23, no. 3, 2021.
  4. Abreha, H. G., Hayajneh, M., Serhani, M. A. Federated Learning in Edge Computing: A Systematic Survey, Sensors, vol. 22, 2022.
  5. Pinyoanuntapong, P., et al. EdgeML: Towards Network-Accelerated Federated Learning over Wireless Edge, 2022.