
Federated Learning

From Edge Computing Wiki

Overview and Motivation

Federated Learning (FL) is a decentralized machine learning paradigm that enables multiple edge devices, referred to as clients, to collaboratively train a shared model without transferring their private data to a central location. Each client performs local training on its own dataset and communicates only model updates (such as gradients or weights) to an orchestrating server or aggregator. These updates are aggregated to produce a new global model, which is redistributed to the clients for further training. This process continues iteratively, allowing the model to learn from distributed data sources while preserving the privacy and autonomy of each client. By design, FL shifts the focus from centralized data collection to collaborative model development, introducing a new direction in scalable, privacy-preserving machine learning [1].
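
The round-based procedure described above can be sketched in a few lines of code. The example below is a minimal illustration only (it assumes NumPy, a simple linear model, and a single local gradient step per round, none of which are prescribed by FL itself): each client trains on its private data and returns updated weights, and the server averages them into a new global model.

import numpy as np

def local_update(global_weights, X, y, lr=0.1):
    # Client-side step: start from the current global model, take one
    # gradient step on the local (private) data, and return only the weights.
    w = global_weights.copy()
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    return w - lr * grad

def aggregate(client_weights):
    # Server-side step: combine client updates into a new global model.
    return np.mean(client_weights, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each client holds its own private dataset; raw data never leaves the client.
clients = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(20):                                     # iterative training rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = aggregate(updates)                       # redistributed next round
print(global_w)                                         # approaches true_w

Only the weight vectors cross the network in this loop; the arrays X and y stay on their respective clients, which is the core privacy property the paragraph describes.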

The motivation for Federated Learning arises from growing concerns around data privacy, security, and communication efficiency, particularly in edge computing environments where data is generated in massive volumes across geographically distributed and often resource-constrained devices. Centralized learning architectures struggle in such contexts due to limited bandwidth, high transmission costs, and strict regulatory frameworks such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). FL inherently mitigates these issues by allowing data to remain on-device, thereby minimizing the risk of data exposure and reducing reliance on constant connectivity to cloud services. Furthermore, by exchanging only lightweight model updates instead of full datasets, FL significantly decreases communication overhead, making it well suited for real-time learning in mobile and edge networks [2].

Within the broader ecosystem of edge computing, FL represents a paradigm shift that enables distributed intelligence under conditions of partial availability, device heterogeneity, and non-independent and identically distributed (non-IID) data. Clients in FL systems can participate asynchronously, tolerate network interruptions, and adapt their computational loads based on local capabilities. This flexibility is particularly important in edge scenarios where devices may differ in processor power, battery life, and storage. Moreover, FL supports the development of personalized and locally adapted models through techniques such as federated personalization and clustered aggregation. These properties make FL not only an effective solution for collaborative learning at the edge but also a foundational approach for building scalable, secure, and trustworthy AI systems that are aligned with emerging demands in distributed computing and privacy-preserving technologies [1][2][3].

Federated Learning Architectures

Federated Learning (FL) systems can be organized under several architectural paradigms, depending on the deployment scale, communication topology, fault tolerance, and infrastructure availability. The most basic and widely adopted structure is the centralized architecture. In this configuration, a central server is responsible for coordinating the training process by broadcasting the global model to a set of participating clients and collecting their local updates. After receiving client updates, the server performs model aggregation and redistributes the improved global model for the next round. While centralized FL is relatively simple to implement and effective in small-to-medium scale systems, it introduces a single point of failure, poses trust and privacy risks if the server is compromised, and may become a bottleneck in large-scale deployments [1].
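
As a concrete illustration of the server's role in this centralized setup, the sketch below performs FedAvg-style aggregation, weighting each client's update by the size of its local dataset. The function names and the use of NumPy are assumptions made for this example rather than part of any specific framework.

import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    # Weighted average of client parameters: clients with more local data
    # contribute proportionally more to the new global model.
    coeffs = np.array(client_sizes, dtype=float)
    coeffs /= coeffs.sum()
    return coeffs @ np.stack(client_weights)   # result shape: (num_params,)

# Example: three clients with unequal dataset sizes return 4-parameter updates.
updates = [np.array([0.9, 1.1, -0.2, 0.0]),
           np.array([1.1, 0.9,  0.1, 0.2]),
           np.array([1.0, 1.0,  0.0, 0.1])]
new_global = fedavg_aggregate(updates, client_sizes=[100, 300, 600])

Because every client sends its update to this one aggregation point, the function above is also where the single point of failure and the trust concerns mentioned in the paragraph are concentrated.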

To address these limitations, decentralized FL architectures eliminate the need for a central coordinator. In this model, clients communicate directly with one another, either through peer-to-peer (P2P) interactions or overlay networks such as blockchain. Updates are exchanged and aggregated locally among peers, often relying on consensus mechanisms to maintain global model consistency. Decentralized FL improves system robustness and transparency by removing centralized trust dependencies. However, this model presents unique challenges, including increased communication overhead, the complexity of model synchronization, and difficulties in maintaining convergence across asynchronous and potentially unreliable nodes. As such, decentralized approaches are more suitable for applications that demand strong resilience or involve multiple mutually distrustful entities [2].
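
A minimal way to picture serverless aggregation is gossip averaging, in which each node repeatedly averages its model with those of its immediate neighbors. The ring topology and update rule below are illustrative assumptions, not a standard protocol specification.

import numpy as np

def gossip_round(models, neighbors):
    # One synchronous gossip step: every node replaces its model with the
    # mean of its own model and its neighbors' models.
    return [np.mean([models[i]] + [models[j] for j in neighbors[i]], axis=0)
            for i in range(len(models))]

# Ring topology over four peers, each starting from a different local model.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
models = [np.array([1.0]), np.array([2.0]), np.array([3.0]), np.array([4.0])]
for _ in range(10):
    models = gossip_round(models, neighbors)
# All peers converge toward the same model (here the average, 2.5)
# without any central coordinator.

Even this toy example shows the trade-off noted above: every peer must exchange models with its neighbors each round, and convergence depends on the topology and on all nodes staying reachable.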

An alternative and scalable design for FL in edge environments is the hierarchical architecture. This model introduces intermediate edge servers (also known as regional aggregators) between the clients and the cloud. Clients first send their local updates to their nearest edge server, which performs preliminary aggregation. The aggregated updates from edge servers are then forwarded to a central or cloud-based server for final aggregation. Hierarchical FL significantly reduces network load, lowers latency, and supports more efficient bandwidth utilization in geographically distributed systems. It also enables scalability by allowing aggregation to be parallelized at multiple levels. This structure is particularly effective in edge computing scenarios where devices are organized in clusters (e.g., per region or subnet), and where communication costs between edge and cloud servers can be optimized through localized aggregation [1][3].
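
The two-level aggregation can be sketched as follows. The grouping of clients under edge servers and the sample-count weighting are assumptions made for illustration; real deployments may weight or schedule regions differently.

import numpy as np

def weighted_average(weights, sizes):
    coeffs = np.array(sizes, dtype=float)
    coeffs /= coeffs.sum()
    return coeffs @ np.stack(weights)

def hierarchical_aggregate(regions):
    # regions: list of (client_weights, client_sizes), one entry per edge server.
    # Each edge server pre-aggregates its own clients; the cloud then combines
    # the edge-level models, weighted by the total number of samples per region.
    edge_models, edge_sizes = [], []
    for client_weights, client_sizes in regions:
        edge_models.append(weighted_average(client_weights, client_sizes))  # edge level
        edge_sizes.append(sum(client_sizes))
    return weighted_average(edge_models, edge_sizes)                        # cloud level

# Two regions: the first edge server covers two clients, the second covers one.
region_a = ([np.array([1.0, 0.0]), np.array([0.8, 0.2])], [100, 300])
region_b = ([np.array([1.2, -0.1])], [600])
global_model = hierarchical_aggregate([region_a, region_b])

In this arrangement only one pre-aggregated model per region travels from the edge tier to the cloud, which is where the bandwidth and latency savings described above come from.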