<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>http://www.edgecomputingbook.com/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ciarang</id>
	<title>Edge Computing Wiki - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="http://www.edgecomputingbook.com/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ciarang"/>
	<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php/Special:Contributions/Ciarang"/>
	<updated>2026-04-16T16:34:26Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.0</generator>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=925</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=925"/>
		<updated>2025-04-25T05:18:23Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Dynamic Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent rise of machine learning, the deployment of machine learning algorithms will inevitably play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can also ease the burden on cloud systems and the networks connected to them, especially if the model does not require large computational power, as in the case of certain Small Language Models (SLMs). However, the main challenges of machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architectures, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, large datasets can be split across them so that many devices together compute what a single device could not handle alone [5]. Beyond splitting input data across edge devices, machine learning models themselves can be partitioned across edge devices. This allows for deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data that is processed on edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way to supply large amounts of relevant data for a variety of applications. &lt;br /&gt;
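The data-and-model-splitting pattern described above can be sketched in a few lines. The following is a minimal, illustrative federated-averaging loop, not taken from any specific framework: the three-node setup, linear model, and learning rate are all assumptions for demonstration. Each simulated edge node computes a local update on its own data shard, and a coordinator averages those updates into the global model.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient-descent step of linear regression on a node's local shard."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(weights, shards, lr=0.1):
    """Each simulated edge node trains on its own shard;
    a coordinator averages the updates into the global model."""
    updates = [local_update(weights, X, y, lr) for X, y in shards]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# One large dataset split across three simulated edge nodes.
shards = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    shards.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(200):
    w = federated_round(w, shards)
```

After enough rounds the averaged model recovers the underlying weights, even though no node ever sees another node's raw data, which is the privacy property the section emphasizes.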
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their batteries, even if they have sufficient computational power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to draw conclusions about what is happening in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By monitoring a constant stream of data and training machine learning models to detect or even respond to different events, many practical applications become possible. These rely on the combination of edge devices and machine learning to enhance the experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environmental and industrial settings could be trained to recognize anomalies or undesirable behavior, ensuring a quick response and proactive reporting of information [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling helps prevent exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real-World Applications:&#039;&#039;&#039;&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Model Compression Techniques&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that provide more efficient means of training and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of numbers, easing both the computational and memory burden on edge devices. There are multiple forms of quantization, but each one sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes means significantly less memory is used, and for some models the loss in precision may be negligible. Another example is K-means based Weight Quantization, which involves clustering similar weights in a matrix together around centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
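As a rough illustration of K-means based weight quantization, the sketch below clusters a small weight matrix into four centroids, so a device only needs to store a short centroid table plus one small integer index per weight. The matrix values and cluster count are illustrative assumptions:

```python
import numpy as np

def kmeans_quantize(weights, k=4, iters=20):
    """Cluster weights into k centroids; store only a centroid table
    plus a uint8 index per weight instead of full-precision floats."""
    flat = weights.ravel()
    # Initialize centroids evenly across the observed weight range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = flat[idx == c].mean()
    return centroids, idx.reshape(weights.shape).astype(np.uint8)

def dequantize(centroids, idx):
    """Reconstruct an approximate weight matrix from the codebook."""
    return centroids[idx]

# Illustrative 4x4 weight matrix with values near -1, 0, 1.5, and 2.
W = np.array([[ 2.09, -0.98,  1.48,  0.09],
              [ 0.05, -0.14, -1.08,  2.12],
              [-0.91,  1.92,  0.00, -1.03],
              [ 1.87,  0.00,  1.53,  1.49]])
centroids, idx = kmeans_quantize(W, k=4)
W_hat = dequantize(centroids, idx)
```

Here 16 32-bit floats shrink to 16 one-byte indices plus 4 centroids, at the cost of a small approximation error in each reconstructed weight.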
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
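The bitwise trick behind such 1-bit networks can be demonstrated directly: once values are constrained to {-1, +1}, a dot product reduces to an XNOR followed by a popcount. The sketch below (vectors chosen arbitrarily for illustration) checks that equivalence:

```python
import numpy as np

def binarize(x):
    """Map real values to sign bits: 1 for non-negative, 0 for negative."""
    return (np.asarray(x) >= 0).astype(np.uint8)

def binary_dot(a_bits, b_bits):
    """Dot product of two {-1, +1} vectors using XNOR and popcount."""
    n = len(a_bits)
    xnor = ~(a_bits ^ b_bits) & 1   # 1 wherever the signs agree
    matches = int(xnor.sum())       # popcount of agreements
    return 2 * matches - n          # agreements minus disagreements

a = np.array([0.5, -1.2, 0.3, -0.7])
b = np.array([1.1, -0.4, -0.9, 0.2])
signs_a = np.where(a >= 0, 1, -1)
signs_b = np.where(b >= 0, 1, -1)
# The bitwise version matches the ordinary dot product of the sign vectors.
assert binary_dot(binarize(a), binarize(b)) == int(signs_a @ signs_b)
```

On real hardware the XNOR and popcount operate on packed 32- or 64-bit words, which is where the large speed and energy savings reported for QNNs come from.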
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
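A minimal sketch of these two pruning rules follows, assuming positively scored hypotheses (with log-probability scores the threshold comparison would be expressed as a difference from the best score instead); the example sentences and scores are invented for illustration:

```python
def prune_hypotheses(hypotheses, n=3, alpha=0.5):
    """Apply threshold pruning, then histogram pruning, to a decoding stack.

    hypotheses: list of (candidate, score) pairs, higher score is better.
    alpha: threshold pruning - drop hypotheses below alpha * best_score.
    n: histogram pruning - keep at most the n best survivors.
    """
    best = max(score for _, score in hypotheses)
    survivors = [h for h in hypotheses if h[1] >= alpha * best]
    survivors.sort(key=lambda h: h[1], reverse=True)
    return survivors[:n]

stack = [("the cat sat", 0.9), ("a cat sat", 0.8), ("cat the sat", 0.2),
         ("the cat sits", 0.7), ("sat cat the", 0.1)]
pruned = prune_hypotheses(stack, n=3, alpha=0.5)
```

The two weakest candidates fall to the threshold rule, and the histogram cap bounds the stack size regardless of how many strong candidates remain, which is what keeps decoding cost predictable on a constrained device.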
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
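A common formulation of the distillation objective blends a cross-entropy term on the hard label with a KL-divergence term toward the teacher's temperature-softened distribution. In the sketch below the logit values, temperature, and mixing weight are illustrative assumptions; it shows that a student agreeing with the teacher incurs a lower loss than one that does not:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label,
                      T=2.0, alpha=0.5):
    """Blend cross-entropy on the hard label with KL divergence toward
    the teacher's softened distribution (a standard KD objective)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = float(np.sum(p_teacher * (np.log(p_teacher)
                                     - np.log(p_student)))) * T * T
    hard = -float(np.log(softmax(student_logits)[hard_label]))
    return alpha * soft + (1 - alpha) * hard

teacher = [1.0, 4.0, 8.0]        # teacher is confident in class 2
good_student = [0.5, 3.5, 7.5]   # mimics the teacher's distribution
bad_student = [7.5, 3.5, 0.5]    # confidently wrong
```

The soft term is what carries the "70% class 3, 25% class 2" style information described above; the hard term keeps the student anchored to the true label.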
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;width:100%; text-align:left;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Comparison of Model Compression Techniques for Edge Deployment&#039;&#039;&#039;&lt;br /&gt;
! Technique&lt;br /&gt;
! Description&lt;br /&gt;
! Primary Benefit&lt;br /&gt;
! Trade-offs&lt;br /&gt;
! Ideal Use Case&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Pruning&#039;&#039;&#039;&lt;br /&gt;
| Removes unnecessary weights or neurons from a neural network.&lt;br /&gt;
| Reduces model size and computation.&lt;br /&gt;
| May require retraining or fine-tuning to preserve accuracy.&lt;br /&gt;
| Useful for deploying models on devices with strict memory and compute constraints.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Quantization&#039;&#039;&#039;&lt;br /&gt;
| Converts high-precision values (e.g., 32-bit float) to lower precision (e.g., 8-bit integer or binary).&lt;br /&gt;
| Lowers memory usage and accelerates inference.&lt;br /&gt;
| Risk of precision loss, especially in very small or sensitive models.&lt;br /&gt;
| Ideal when real-time inference and power efficiency are essential.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Distillation&#039;&#039;&#039;&lt;br /&gt;
| Trains a smaller model (student) using the output probabilities of a larger, more complex teacher model.&lt;br /&gt;
| Preserves performance while reducing model complexity.&lt;br /&gt;
| Requires access to a trained teacher model and additional training data.&lt;br /&gt;
| Effective when deploying accurate, lightweight models under data or resource constraints.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As artificial intelligence and machine learning technologies continue to mature, they pave the way for the development of intelligent AI agents capable of autonomous, context-aware behavior, with the goal of efficiently performing tasks specified by users. These agents combine perception, reasoning, and decision-making to execute tasks with minimal human intervention. When deployed on edge devices, AI agents can operate with low latency, preserve user privacy, and adapt to local data—making them ideal for real-time, personalized applications in homes, vehicles, factories, and beyond.&lt;br /&gt;
&lt;br /&gt;
To function effectively, an agent must first perceive its environment and understand the task—often defined by the user. Then, it must reason about the optimal steps to accomplish that task, and finally, it must act on those decisions. These three components—perception, reasoning, and action—are essential to the agent’s ability to operate accurately and autonomously in dynamic environments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
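The reasoning-autonomy-tools loop described above can be caricatured in a few lines. Everything here - the tool names, the fixed plan, and the temperature example - is a hypothetical illustration; a real agent would derive its plan from the goal using a learned model rather than a hard-coded list:

```python
def run_agent(goal, tools, plan, memory=None):
    """Minimal agent loop: decompose the goal into steps, invoke the
    matching tool for each step, and record the results in memory."""
    memory = [] if memory is None else memory
    for step in plan(goal):                      # reasoning: decomposition
        tool_name, arg = step
        result = tools[tool_name](arg)           # action: invoke a tool
        memory.append((tool_name, arg, result))  # remember outcomes
    return memory

# Hypothetical tools an edge agent might expose.
tools = {
    "lookup": lambda key: {"temp_c": 21}.get(key),
    "compute": lambda c: c * 9 / 5 + 32,
}

def plan(goal):
    # Fixed two-step plan for illustration only.
    return [("lookup", "temp_c"), ("compute", 21)]

memory = run_agent("report the temperature in Fahrenheit", tools, plan)
```

The memory list is what lets the agent learn from the results of its own action sequence, as described above, rather than treating each step in isolation.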
&lt;br /&gt;
Utilizing AI agents on edge devices can be difficult due to the computational and reasoning power needed. However, there are methods to accomplish this, such as using SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Running agents on edge devices can be crucial if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources and the inherently dynamic environments in which edge devices must operate, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between edge and cloud devices. The cloud has the ability to process workloads that may require far more resources than edge devices can provide. On the other hand, edge devices, which can store and process data locally, may offer lower latency and more privacy. Given the advantages of each, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run representative computing tasks on the relevant devices and measure the time and resources they require. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
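A profiling step of this kind can be reduced to a simple search over split points: given measured per-layer edge and cloud latencies and the cost of transferring intermediate activations, pick the split that minimizes total estimated latency. The sketch below is in the spirit of the Neurosurgeon-style partitioning cited above, not a reimplementation of it; all timing numbers are invented for illustration:

```python
def profile_and_partition(edge_time, cloud_time, transfer_time):
    """Pick the split point minimizing estimated end-to-end latency:
    layers before the split run on the edge, the rest in the cloud.
    transfer_time[i] is the cost of shipping the activations produced at
    split point i (transfer_time[-1] = 0: everything stays on the edge)."""
    n = len(edge_time)
    best_split, best_latency = 0, float("inf")
    for split in range(n + 1):
        latency = (sum(edge_time[:split])      # layers computed on the edge
                   + transfer_time[split]      # activation upload at the split
                   + sum(cloud_time[split:]))  # remaining layers in the cloud
        if latency < best_latency:
            best_split, best_latency = split, latency
    return best_split, best_latency

# Profiled per-layer times (ms) for a hypothetical three-layer model.
edge_time = [5, 8, 20]          # the edge is slow on the heavy final layer
cloud_time = [1, 2, 3]
transfer_time = [50, 30, 4, 0]  # raw input is large; activations shrink

split, latency = profile_and_partition(edge_time, cloud_time, transfer_time)
```

Here the search keeps the two cheap layers on the edge and ships the small mid-network activations to the cloud for the expensive final layer, beating both the all-edge and all-cloud options.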
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that are able to be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments in which edge devices must function, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change frequently depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses that data to estimate the performance of devices and/or layers. During runtime, a similar process is employed to update the data and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the lower latency that edge computing provides. Using all of this data, updated at runtime, the partitioning model can dynamically distribute workloads to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems mentioned above.&lt;br /&gt;
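One simple way to picture the runtime refinement is an exponentially weighted moving average of each device&#039;s observed latency, with new work dispatched to whichever device currently looks fastest. The smoothing factor and function names below are assumptions, not taken from either cited system.&lt;br /&gt;

```python
# Sketch of runtime estimate refinement with an exponential moving average.
ALPHA = 0.3  # assumed smoothing factor: weight given to fresh observations

def update_estimate(estimates, device, observed_ms):
    """Blend a fresh latency observation into the running estimate."""
    prev = estimates.get(device, observed_ms)
    estimates[device] = (1 - ALPHA) * prev + ALPHA * observed_ms
    return estimates[device]

def pick_device(estimates):
    """Dispatch new work to whichever device currently looks fastest."""
    return min(estimates, key=estimates.get)
```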
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning splits the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning in question. If inference is performed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of further work to reach an acceptable threshold, the task may be sent to edge devices to free up network traffic to the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting work among the devices within a single layer rather than among the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. A similar profiling and allocation process to that of vertical partitioning is used, but all of the devices across which the workload is split operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
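The interplay of the two schemes can be sketched in a few lines: vertical partitioning picks a tier based on the size of the workload, and horizontal partitioning then splits the work among that tier&#039;s devices. The thresholds, tier makeup, and even split below are invented for illustration.&lt;br /&gt;

```python
# Toy combination of vertical (tier choice) and horizontal (in-tier split)
# partitioning. All numbers and device names are made up.
TIERS = {
    "edge": ["phone", "pi"],          # low-cost work stays at the edge
    "cloud": ["server_a", "server_b"],
}
EDGE_LIMIT = 100  # assumed maximum workload units the edge tier can absorb

def partition(workload_units):
    tier = "edge" if EDGE_LIMIT >= workload_units else "cloud"  # vertical
    devices = TIERS[tier]
    share = workload_units / len(devices)                       # horizontal
    return {device: share for device in devices}
```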
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of all of the data going to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which holds the main model. Each node performs these periodic updates, and by processing only part of the data, edge devices can handle the workloads much more easily. This also reduces the time and computational burden on the cloud and the network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process but, instead of a single operation, aggregates the gradients calculated by each node in order to update the central model [9]. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
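A minimal worked example of parallel gradient descent, on a one-dimensional least-squares problem, is sketched below: each node computes a gradient on its own data shard, and the server averages the gradients before descending. The data, learning rate, and iteration count are invented for illustration.&lt;br /&gt;

```python
# Parallel gradient descent sketch for fitting y = w*x by least squares.
def shard_gradient(w, shard):
    """Gradient of the mean squared error 0.5*(w*x - y)^2 over one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def step(w, shards, lr=0.1):
    """One synchronous update: average per-node gradients, then descend."""
    avg_grad = sum(shard_gradient(w, s) for s in shards) / len(shards)
    return w - lr * avg_grad

# Two nodes, each holding a shard of points on the line y = 2x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(200):
    w = step(w, shards)
# w converges toward the true slope of 2.0
```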
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, and therefore the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, and only then are the calculations from all nodes applied to the central model. Although this can make updating easier, any nodes that lag behind the others greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. To optimize this, asynchronous scheduling has each node send its results to the main server for an update as soon as it finishes, rather than waiting for all the others. However, this requires more communication, as after each update every node must fetch the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
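The contrast can be made concrete with a toy event-driven simulation: nodes finish at different times, and each applies its contribution to the central model immediately rather than waiting for the slowest node. The finish times and update values below are invented.&lt;br /&gt;

```python
# Toy simulation of asynchronous updates: no global barrier, each node
# updates the central model the moment it finishes.
import heapq

def run_async(node_finish_times, node_updates):
    """Apply each node's update as soon as that node finishes."""
    model = 0.0
    history = []
    events = [(t, n) for n, t in node_finish_times.items()]
    heapq.heapify(events)  # process completion events in time order
    while events:
        t, node = heapq.heappop(events)
        model += node_updates[node]   # update without waiting for stragglers
        history.append((t, node, model))
    return history

finish = {"fast_node": 1.0, "slow_node": 9.0}
updates = {"fast_node": 0.5, "slow_node": 0.25}
```

Under a synchronous scheme both contributions would only land at time 9.0; here the fast node&#039;s update is usable at time 1.0.&lt;br /&gt;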
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data, as well as the computational burden, is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike in distributed learning, the data used in federated learning is never shared with the central servers. Instead, each node trains the model using only its own local data. The only thing shared with the central model is the set of updated parameters. A key consequence is that federated learning provides a greater degree of data privacy, which is crucial for applications dealing with sensitive data, making edge devices especially well suited to performing it. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
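A hedged sketch of one federated averaging round is shown below: raw data never leaves a node, and only the locally updated parameter is sent back and averaged. The single-gradient-step local update and the toy squared loss are simplifying assumptions.&lt;br /&gt;

```python
# Minimal federated-averaging round for the toy model y = w*x.
def local_update(w, local_data, lr=0.1):
    """One gradient step on this node's private data; data stays local."""
    grad = sum((w * x - y) * x for x, y in local_data) / len(local_data)
    return w - lr * grad

def federated_round(w_global, nodes):
    """Each node trains on its own data; the server averages the weights."""
    local_weights = [local_update(w_global, data) for data in nodes]
    return sum(local_weights) / len(local_weights)
```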
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device executing it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This is useful because it reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch [10]. One form of this is knowledge distillation, in which a smaller model is trained to mimic the behavior of a larger model [11]. This is often the case when edge and cloud systems are used together.&lt;br /&gt;
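A deliberately tiny illustration of the distillation idea follows: a one-parameter &quot;student&quot; is fitted to the outputs of a fixed &quot;teacher&quot; function rather than to raw labels. Both models are stand-ins invented for this sketch, not real networks.&lt;br /&gt;

```python
# Toy knowledge distillation: the student matches the teacher's outputs.
def teacher(x):
    return 3.0 * x  # stand-in for a pretrained model sent from the cloud

def distill(student_w, inputs, lr=0.05, epochs=100):
    """Fit student y = w*x to the teacher's outputs by gradient descent."""
    for _ in range(epochs):
        grad = sum((student_w * x - teacher(x)) * x for x in inputs) / len(inputs)
        student_w = student_w - lr * grad
    return student_w
```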
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so only the data that most affects the accuracy of a model needs to be offloaded to edge devices. This decreases the workload on those devices, because less data must be processed, while preserving the accuracy of the model as much as possible. An analogous idea applies to models themselves: placing a Small Language Model that can handle simple commands and prompts on an edge device, rather than offloading an entire LLM onto it, avoids overloading the computational abilities of the device for little added benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
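Batch size tuning can be illustrated with a simple capacity check: pick the largest batch whose memory footprint fits the device&#039;s budget. The per-sample footprint, budget, and candidate sizes below are invented numbers.&lt;br /&gt;

```python
# Illustrative batch-size tuning against a device memory budget.
def max_batch_size(memory_budget_mb, per_sample_mb,
                   candidates=(64, 32, 16, 8, 4, 2, 1)):
    """Return the largest candidate batch whose footprint fits the budget."""
    for batch in candidates:  # candidates are sorted largest-first
        if memory_budget_mb >= batch * per_sample_mb:
            return batch
    return None  # even a single sample does not fit on this device

# e.g. a device with 100 MB free and 3 MB per sample can afford batches of 32
```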
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have become prevalent recently and can assist with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data, but for basic applications on edge devices they are sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and they are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This is useful for a wide variety of edge applications. Where needed, and where privacy constraints permit, an SLM can query an LLM for more complicated tasks. This means that not every prompt leads to a cloud query, so network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
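A hypothetical router between a local SLM and a remote LLM might look like the sketch below: simple prompts are answered on-device, and only complex ones trigger a cloud query, if the privacy policy allows it. The word-count complexity heuristic and all names are invented for illustration.&lt;br /&gt;

```python
# Toy SLM/LLM routing policy. The length heuristic is a crude stand-in for
# a real complexity estimate.
COMPLEXITY_WORD_LIMIT = 12  # assumed threshold for "complex" prompts

def route(prompt, allow_cloud=True):
    """Return which model should handle the prompt."""
    complex_prompt = len(prompt.split()) > COMPLEXITY_WORD_LIMIT
    if complex_prompt and allow_cloud:
        return "cloud_llm"
    return "local_slm"  # stay on-device: lower latency, no data leaves
```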
&lt;br /&gt;
===&#039;&#039;&#039;Synthesis&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
In summary, edge-oriented machine learning optimization requires an integrated approach that combines model-level compression with system-level orchestration. Techniques such as quantization, structured and unstructured pruning, and knowledge distillation reduce the computational footprint and memory requirements of deep learning models, enabling deployment on resource-constrained devices without substantial loss in inference accuracy. Concurrently, dynamic workload partitioning, heterogeneity-aware scheduling, and adaptive runtime profiling allow the system to allocate tasks across edge and cloud tiers based on real-time availability of compute, bandwidth, and energy resources. This joint optimization across model architecture and execution environment is essential to meet the latency, privacy, and resilience demands of edge AI deployments.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. H. Hua, Y. Li, T. Wang, N. Dong, W. Li, and J. Cao, &amp;quot;Edge computing with artificial intelligence: A machine learning perspective,&amp;quot; ACM Computing Surveys, vol. 55, no. 9, Art. no. 184, pp. 1–35, Jan. 2023, doi: 10.1145/3555802.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=924</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=924"/>
		<updated>2025-04-25T05:16:48Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Usage and Applications of AI Agents */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with the training of algorithms to look at sets of data, analyze patterns, and draw conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and in utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architectures, machine learning can be deployed in a distributed manner, allowing ML models to be deployed and trained across many edge devices. The nature of edge devices sharing computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices can compute what a single device might not have been able to handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows the deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data that is processed by edge devices means more training data for machine learning models, which generally increases accuracy, and thus edge computing could be an effective way of providing large amounts of relevant training data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by placing devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could potentially be compromised. This is crucial given the large amounts of data needed for machine learning, and especially when personal data is involved, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient computational capability to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users&#039; rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039; &lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Model Compression Techniques&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can make training more efficient and bring heavy workloads within reach of the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of the numbers in a model, easing both the computational and memory burden on edge devices. There are multiple forms of quantization, but each sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting weights from floating-point to integer datatypes uses significantly less memory, and for many models the loss in precision is negligible. Another example is K-means-based weight quantization, which clusters the values of a weight matrix around a small set of centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
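As a concrete illustration of the basic idea (this sketch is ours, not taken from [12], and the weight values are invented), converting floating-point weights to 8-bit integers can look like the following:&lt;br /&gt;
&lt;br /&gt;
```python
def quantize_int8(weights):
    # Symmetric 8-bit quantization: map each float onto an integer in [-127, 127]
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    # Recover approximate float weights from the stored integers
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.005, 0.64]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)  # close to the originals, in a quarter of the memory
```
&lt;br /&gt;
Each stored value now fits in one byte instead of four, at the cost of a rounding error bounded by half the scale.&lt;br /&gt;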
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
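As a plain-Python sketch of these two rules (the hypothesis stack and scores below are invented for illustration, not taken from a real SMT decoder):&lt;br /&gt;
&lt;br /&gt;
```python
def histogram_prune(hypotheses, n):
    # Keep only the n best-scoring hypotheses in the stack
    return sorted(hypotheses, key=lambda h: h["score"], reverse=True)[:n]

def threshold_prune(hypotheses, alpha):
    # Drop hypotheses scoring below alpha times the best score
    best = max(h["score"] for h in hypotheses)
    return [h for h in hypotheses if h["score"] >= alpha * best]

stack = [{"text": "a", "score": 0.9}, {"text": "b", "score": 0.6},
         {"text": "c", "score": 0.2}]
top2 = histogram_prune(stack, 2)         # keeps "a" and "b"
survivors = threshold_prune(stack, 0.5)  # drops "c", since 0.2 is below 0.45
```
&lt;br /&gt;
In a real decoder both rules are applied to every stack at each expansion step, so even modest values of n and alpha cut the search space dramatically.&lt;br /&gt;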
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
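To make the idea concrete, here is a small sketch (the logits and the temperature value are invented) showing how a teacher&#039;s raw outputs are softened into the soft labels a student trains on:&lt;br /&gt;
&lt;br /&gt;
```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature flattens the distribution, exposing class similarities
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [4.0, 2.5, 0.5]   # hypothetical teacher outputs for classes 0..2
hard_label = [1.0, 0.0, 0.0]       # one-hot target: class 0 only
soft_label = softmax(teacher_logits, temperature=2.0)
# soft_label is roughly [0.61, 0.29, 0.11]: the student also learns that
# class 1 is far more plausible here than class 2, which the one-hot label hides
```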
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;width:100%; text-align:left;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Comparison of Model Compression Techniques for Edge Deployment&#039;&#039;&#039;&lt;br /&gt;
! Technique&lt;br /&gt;
! Description&lt;br /&gt;
! Primary Benefit&lt;br /&gt;
! Trade-offs&lt;br /&gt;
! Ideal Use Case&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Pruning&#039;&#039;&#039;&lt;br /&gt;
| Removes unnecessary weights or neurons from a neural network.&lt;br /&gt;
| Reduces model size and computation.&lt;br /&gt;
| May require retraining or fine-tuning to preserve accuracy.&lt;br /&gt;
| Useful for deploying models on devices with strict memory and compute constraints.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Quantization&#039;&#039;&#039;&lt;br /&gt;
| Converts high-precision values (e.g., 32-bit float) to lower precision (e.g., 8-bit integer or binary).&lt;br /&gt;
| Lowers memory usage and accelerates inference.&lt;br /&gt;
| Risk of precision loss, especially in very small or sensitive models.&lt;br /&gt;
| Ideal when real-time inference and power efficiency are essential.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Distillation&#039;&#039;&#039;&lt;br /&gt;
| Trains a smaller model (student) using the output probabilities of a larger, more complex teacher model.&lt;br /&gt;
| Preserves performance while reducing model complexity.&lt;br /&gt;
| Requires access to a trained teacher model and additional training data.&lt;br /&gt;
| Effective when deploying accurate, lightweight models under data or resource constraints.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As artificial intelligence and machine learning technologies continue to mature, they pave the way for the development of intelligent AI agents capable of autonomous, context-aware behavior, with the goal of efficiently performing tasks specified by users. These agents combine perception, reasoning, and decision-making to execute tasks with minimal human intervention. When deployed on edge devices, AI agents can operate with low latency, preserve user privacy, and adapt to local data—making them ideal for real-time, personalized applications in homes, vehicles, factories, and beyond.&lt;br /&gt;
&lt;br /&gt;
To function effectively, an agent must first perceive its environment and understand the task—often defined by the user. Then, it must reason about the optimal steps to accomplish that task, and finally, it must act on those decisions. These three components—perception, reasoning, and action—are essential to the agent’s ability to operate accurately and autonomously in dynamic environments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, decomposing its specified task into a sequence of concrete steps in order to accomplish its goal. It must also have some memory so it can record what it has done and the results of its actions, and learn from them in future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose among the available steps and operate based on its own reasoning, without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to carry out a task for optimal results, it must have the actual means to do so. This can include the ability to use and interact with APIs, interpret code, and access certain databases.&lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be difficult due to the computational and reasoning power needed. There are, however, ways to accomplish this, such as SLMs that query LLMs as needed (discussed later), or more powerful edge devices that carry out the tasks themselves. Running agents at the edge can be essential when latency is a major issue or when the agent handles sensitive user data. Additionally, an agent running on a user&#039;s own edge device may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources and the inherently dynamic environments in which edge devices must operate, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and letting the model train on them. This is feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs), but it is no longer viable for the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices while compromising as little as possible on model accuracy. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One commonly used methodology involves collaboration between edge and cloud devices. The cloud can process workloads that require far more resources than edge devices can provide. Edge devices, on the other hand, can store and process data locally, offering lower latency and greater privacy. Given the advantages of each, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing.&lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should run on the cloud and which on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices are leveraged. A common approach is to run certain computing tasks on the candidate devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can function efficiently. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this approach is that it utilizes the resources of the edge devices as needed, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud when necessary, the overall processing time is reduced because data is not constantly sent back and forth. It also greatly reduces network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization in machine learning on edge systems is to fully utilize the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to simple sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments in which edge devices must function, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change frequently depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices is of little use if a device already has all of its computing resources or network capabilities consumed by another workload.&lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data, updated at runtime, the partitioning model can dynamically distribute workloads to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems cited above.&lt;br /&gt;
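A heavily simplified sketch of this runtime decision (the device names, profiled latencies, and load figures are all invented for illustration) might look like:&lt;br /&gt;
&lt;br /&gt;
```python
# Profiled seconds per task on each device, from an offline profiling step
profile = {"edge-pi": 4.0, "edge-phone": 1.5, "cloud": 0.5}
# Current queue delay on each device, updated as the system runs
load = {"edge-pi": 0.0, "edge-phone": 0.0, "cloud": 3.0}

def assign(task_count):
    # Greedily place each task on the device with the earliest finish time
    placements = []
    for _ in range(task_count):
        device = min(load, key=lambda d: load[d] + profile[d])
        load[device] += profile[device]
        placements.append(device)
    return placements

placements = assign(4)
# The phone takes the first two tasks; then the already-loaded cloud and the
# slow Pi each become the best remaining choice for one task
```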
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning splits the workload between the layers. For example, if a task is deemed to require a large amount of computational resources, it may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning so far. If a model evaluated on an edge device is found to have very low accuracy, the work can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only minor work to reach the acceptable threshold, it may be sent to edge devices to reduce network traffic to the cloud and lower latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting the workload among the devices within a single layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Workloads are allocated much as in vertical partitioning, but all of the devices across which the workload is split operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used.&lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server that holds the main model. Each node performs these periodic updates, and because a node processes only part of the data, it is much easier for edge devices to handle the workload. This also reduces the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices acting as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
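The aggregation step can be sketched as follows (a toy linear model with an invented dataset; real systems apply the same pattern to neural-network gradients):&lt;br /&gt;
&lt;br /&gt;
```python
# Toy data drawn from y = 3x, split into shards held by two different nodes
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]

def local_gradient(shard, w):
    # Each node computes the squared-error gradient on its own shard only
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)

def parallel_step(w, lr=0.01):
    # The central server averages the per-node gradients, then updates the model
    grads = [local_gradient(shard, w) for shard in shards]
    return w - lr * sum(grads) / len(grads)

w = 0.0
for _ in range(200):
    w = parallel_step(w)
# w has converged to (approximately) the true slope of 3
```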
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will differ. Synchronous algorithms wait for every node to finish its respective processing, and only then are all of the calculations sent to update the central model. Although this can make updating easier, any nodes that lag behind the others greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling addresses this: as soon as a node finishes its processing, it sends its results to the main server for an update rather than waiting for the others. However, this requires more communication, because after each update every node must fetch the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data, as well as the computational burden, is distributed among a set of nodes, reducing network traffic and the strain on central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, the data used in federated learning is never shared with the central servers. Instead, each node trains the model using only its own local data, and only the updated parameters are shared with the central model. A key consequence is that federated learning provides much greater data privacy, which is crucial for applications dealing with sensitive data; this makes edge devices especially well suited to performing federated learning. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
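A minimal FedAvg-style aggregation sketch (the client parameters and sample counts are invented; note that only parameters, never raw data, reach the server):&lt;br /&gt;
&lt;br /&gt;
```python
# Each client trains locally and reports only its parameters and sample count
client_updates = [
    {"weights": [0.9, 1.1], "samples": 100},
    {"weights": [1.3, 0.7], "samples": 300},
]

def federated_average(updates):
    # Average the parameters, weighting each client by its dataset size
    total = sum(u["samples"] for u in updates)
    dims = len(updates[0]["weights"])
    return [sum(u["weights"][i] * u["samples"] for u in updates) / total
            for i in range(dims)]

global_weights = federated_average(client_updates)
# The larger client dominates: the weighted means are 1.2 and 0.8
```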
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and then the pretrained model is sent to the edge device. This can be useful, as it reduces the computational burden on the edge device, and allows it to fine-tune the model using the data it collects without having to completely train the whole system. One form of this is knowledge distillation, in which a smaller model can be trained to mimic that of a larger model. This may often be the case when edge and cloud systems are used in a combined way.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases their workload because less data must be processed, while preserving the accuracy of the model as much as possible. Larger models or datasets may not be needed at all: for example, an edge device can host a Small Language Model that handles simple commands and prompts, rather than an entire LLM, which might overwhelm the device&#039;s computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can assist with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible on edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes.&lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of data they are trained on, but for basic applications on edge devices they are sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This is useful for a wide variety of edge applications. Where privacy constraints permit, an SLM can query an LLM for more complicated tasks. Because not every prompt leads to a query, the benefits in network traffic, privacy, and latency are still preserved.&lt;br /&gt;
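As a sketch of this routing pattern (the two models here are trivial stand-ins, and the 0.8 confidence threshold is an invented tuning knob):&lt;br /&gt;
&lt;br /&gt;
```python
def answer(prompt, slm, llm, threshold=0.8):
    # Try the on-device SLM first; escalate to the cloud LLM only when the
    # SLM is unsure, so most prompts never leave the device
    text, confidence = slm(prompt)
    if confidence >= threshold:
        return text, "slm"
    return llm(prompt), "llm"

def tiny_model(prompt):
    # Hypothetical on-device model: confident only about a known command
    if prompt == "turn on the lights":
        return "lights on", 0.95
    return "not sure", 0.2

def cloud_model(prompt):
    # Hypothetical cloud-hosted LLM stand-in
    return "detailed cloud answer"

local = answer("turn on the lights", tiny_model, cloud_model)        # handled on device
remote = answer("summarize this contract", tiny_model, cloud_model)  # escalated
```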
&lt;br /&gt;
===&#039;&#039;&#039;Synthesis&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
In summary, edge-oriented machine learning optimization requires an integrated approach that combines model-level compression with system-level orchestration. Techniques such as quantization, structured and unstructured pruning, and knowledge distillation reduce the computational footprint and memory requirements of deep learning models, enabling deployment on resource-constrained devices without substantial loss in inference accuracy. Concurrently, dynamic workload partitioning, heterogeneity-aware scheduling, and adaptive runtime profiling allow the system to allocate tasks across edge and cloud tiers based on real-time availability of compute, bandwidth, and energy resources. This joint optimization across model architecture and execution environment is essential to meet the latency, privacy, and resilience demands of edge AI deployments.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. H. Hua, Y. Li, T. Wang, N. Dong, W. Li, and J. Cao, &amp;quot;Edge computing with artificial intelligence: A machine learning perspective,&amp;quot; ACM Computing Surveys, vol. 55, no. 9, Art. no. 184, pp. 1–35, Jan. 2023, doi: 10.1145/3555802.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=923</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=923"/>
		<updated>2025-04-25T05:15:44Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Challenges of Machine Learning Using Edge Computing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence concerned with training algorithms to examine sets of data, analyze patterns, and draw conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant to edge computing [7]. Given the recent rise of the field, the deployment of machine learning algorithms will inevitably play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized effectively, and automation can be employed to improve efficiency. Machine learning on the edge devices themselves can also ease the burden on cloud systems and the networks connected to them, especially if the model does not require large computational power, as in the case of certain Small Language Models (SLMs). However, the main challenges of machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architectures, machine learning can be deployed in a distributed manner, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, a large dataset can be split so that many devices together compute what a single device could not handle alone [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows for deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way of providing large amounts of relevant data for a variety of applications.&lt;br /&gt;
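The split-and-average idea can be sketched in a few lines. The following toy example (all names and data are illustrative, and this is not code from any cited framework) trains a single linear model by averaging gradients computed independently on three simulated edge shards, in the spirit of parallelized stochastic gradient descent [9]:&lt;br /&gt;

```python
import numpy as np

# Illustrative sketch: data-parallel training across simulated edge nodes.
# Each "device" holds one shard and computes a local gradient; a
# coordinator averages the gradients to update a shared linear model.

def local_gradient(w, X, y):
    """Mean-squared-error gradient for a linear model on one shard."""
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w + 0.01 * rng.normal(size=300)

# Split the dataset across three simulated edge devices.
shards = [(X[i::3], y[i::3]) for i in range(3)]

w = np.zeros(4)
for _ in range(500):
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]  # on each device
    w -= 0.05 * np.mean(grads, axis=0)                        # on the coordinator

print(np.round(w, 2))  # recovers a vector close to true_w
```

Because the shards are equally sized, the averaged gradient equals the full-batch gradient, so the coordinator recovers roughly the same model a single, more powerful machine would.&lt;br /&gt;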
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency involved in transmitting data to the cloud and processing it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or traverse a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power reserves, even if they are technically capable of running the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, and energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real-time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real-World Applications:&#039;&#039;&#039;&lt;br /&gt;
A well-known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Model Compression Techniques&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can make training more efficient and bring heavy workloads within the reach of even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that involves reducing the precision of numbers, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision: enough that accuracy is mostly maintained while the numbers become much easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and for some models the difference in precision may be negligible. Another example is K-means-based weight quantization, which involves grouping similar values in a weight matrix together around centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
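To make the clustering idea concrete, here is a minimal sketch of K-means weight quantization, assuming a plain NumPy implementation with deterministic, evenly spaced initial centroids (all values are illustrative):&lt;br /&gt;

```python
import numpy as np

# Illustrative sketch of K-means weight quantization: cluster a weight
# matrix into k centroids and store only small integer indices plus a
# tiny codebook. Real toolchains use optimized kernels; this is a demo.

def kmeans_quantize(weights, k=4, iters=20):
    flat = weights.ravel()
    # Deterministic, evenly spaced initial centroids over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned weights.
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = flat[idx == c].mean()
    return idx.reshape(weights.shape).astype(np.uint8), centroids

W = np.array([[2.09, -0.98, 1.48, 0.09],
              [0.05, -0.14, -1.08, 2.12],
              [-0.91, 1.92, 0.0, -1.03],
              [1.87, 0.0, 1.53, 1.49]])
idx, codebook = kmeans_quantize(W)
W_hat = codebook[idx]           # dequantized approximation of W
print(np.round(codebook, 1))    # codebook ends up near [-1.0, 0.0, 1.5, 2.0]
```

Storing 2-bit indices plus a 4-entry codebook in place of 32-bit floats is what yields the memory savings; the reconstruction `W_hat` differs from `W` only by small per-weight errors.&lt;br /&gt;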
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
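The two rules can be captured in a short sketch. The function below is hypothetical (real SMT decoders operate on log-probabilities and richer hypothesis objects); it applies threshold pruning followed by histogram pruning to a stack of scored hypotheses:&lt;br /&gt;

```python
# Illustrative sketch of the two stack-pruning rules described above.
# Each hypothesis is a (score, text) pair; scores here are made-up
# higher-is-better values rather than real decoder log-probabilities.

def prune_stack(hypotheses, n=3, alpha=0.4):
    """Keep hypotheses scoring at least alpha times the best score
    (threshold pruning), capped at n survivors (histogram pruning)."""
    ranked = sorted(hypotheses, key=lambda h: h[0], reverse=True)
    best = ranked[0][0]
    survivors = [h for h in ranked if h[0] >= alpha * best]  # threshold
    return survivors[:n]                                     # histogram

stack = [(0.90, "a"), (0.80, "b"), (0.42, "c"), (0.40, "d"), (0.10, "e")]
print(prune_stack(stack, n=3, alpha=0.4))  # [(0.9, 'a'), (0.8, 'b'), (0.42, 'c')]
```

Here the threshold (0.4 × 0.90 = 0.36) eliminates the weakest hypothesis outright, and the stack cap then discards the remainder beyond the best three.&lt;br /&gt;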
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
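A minimal numerical sketch of the soft-label idea, assuming a plain NumPy setup with made-up logits, is shown here; the temperature parameter softens the teacher distribution so that non-target classes carry visible probability mass:&lt;br /&gt;

```python
import numpy as np

# Illustrative sketch of soft labels in knowledge distillation.
# Logits, temperature, and class count are made up for demonstration.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

teacher_logits = np.array([1.0, 4.0, 9.0])  # teacher's raw scores for 3 classes
T = 3.0                                     # temperature > 1 softens the target

hard_label = np.array([0.0, 0.0, 1.0])      # "the input is class 3", nothing more
soft_label = softmax(teacher_logits / T)    # spreads probability over all classes

def distill_loss(student_logits, soft_target, T):
    """Cross-entropy of the student's softened prediction vs. the soft target."""
    log_p = np.log(softmax(student_logits / T))
    return -(soft_target * log_p).sum()

# A student that mimics the teacher scores a lower loss than an uninformed one.
print(distill_loss(teacher_logits, soft_label, T))
print(distill_loss(np.zeros(3), soft_label, T))
```

Unlike the one-hot `hard_label`, the soft target tells the student how the teacher ranks the wrong classes, which is the richer supervision signal described above.&lt;br /&gt;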
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;width:100%; text-align:left;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Comparison of Model Compression Techniques for Edge Deployment&#039;&#039;&#039;&lt;br /&gt;
! Technique&lt;br /&gt;
! Description&lt;br /&gt;
! Primary Benefit&lt;br /&gt;
! Trade-offs&lt;br /&gt;
! Ideal Use Case&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Pruning&#039;&#039;&#039;&lt;br /&gt;
| Removes unnecessary weights or neurons from a neural network.&lt;br /&gt;
| Reduces model size and computation.&lt;br /&gt;
| May require retraining or fine-tuning to preserve accuracy.&lt;br /&gt;
| Useful for deploying models on devices with strict memory and compute constraints.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Quantization&#039;&#039;&#039;&lt;br /&gt;
| Converts high-precision values (e.g., 32-bit float) to lower precision (e.g., 8-bit integer or binary).&lt;br /&gt;
| Lowers memory usage and accelerates inference.&lt;br /&gt;
| Risk of precision loss, especially in very small or sensitive models.&lt;br /&gt;
| Ideal when real-time inference and power efficiency are essential.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Distillation&#039;&#039;&#039;&lt;br /&gt;
| Trains a smaller model (student) using the output probabilities of a larger, more complex teacher model.&lt;br /&gt;
| Preserves performance while reducing model complexity.&lt;br /&gt;
| Requires access to a trained teacher model and additional training data.&lt;br /&gt;
| Effective when deploying accurate, lightweight models under data or resource constraints.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As artificial intelligence and machine learning technologies continue to mature, they pave the way for the development of intelligent AI agents capable of autonomous, context-aware behavior, with the goal of efficiently performing tasks specified by users. These agents combine perception, reasoning, and decision-making to execute tasks with minimal human intervention. When deployed on edge devices, AI agents can operate with low latency, preserve user privacy, and adapt to local data—making them ideal for real-time, personalized applications in homes, vehicles, factories, and beyond.&lt;br /&gt;
&lt;br /&gt;
To function effectively, an agent must first perceive its environment and understand the task—often defined by the user. Then, it must reason about the optimal steps to accomplish that task, and finally, it must act on those decisions. These three components—perception, reasoning, and action—are essential to the agent’s ability to operate accurately and autonomously in dynamic environments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the available possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying out a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases.&lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Still, edge devices can be paramount when latency is a major issue, or when the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources and the inherently dynamic environments in which edge devices must operate, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and letting the model train on them. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between edge and cloud devices. The cloud can process workloads that require far more resources than edge devices can offer. On the other hand, edge devices, which can store and process data locally, may offer lower latency and more privacy. Given the advantages of each, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing.&lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
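The outcome of such a profiling step can be used directly to pick a partition point. The sketch below (all latency numbers are invented, and this is only in the spirit of Neurosurgeon-style partitioning, not its actual implementation) chooses the layer boundary that minimizes edge time plus transfer cost plus cloud time:&lt;br /&gt;

```python
# Illustrative sketch of profiling-based edge/cloud partitioning.
# Per-layer latencies and transfer costs are made-up profiling results.

edge_ms = [5, 8, 20, 40, 60]          # profiled per-layer time on the edge device
cloud_ms = [1, 1, 2, 3, 4]            # profiled per-layer time in the cloud
transfer_ms = [30, 25, 12, 10, 9, 2]  # cost of shipping the data at boundary k

def best_split(edge_ms, cloud_ms, transfer_ms):
    """Return (split_index, latency): layers [0, k) run on the edge,
    layers [k, n) in the cloud, plus one transfer at the boundary."""
    n = len(edge_ms)
    candidates = []
    for k in range(n + 1):
        total = sum(edge_ms[:k]) + transfer_ms[k] + sum(cloud_ms[k:])
        candidates.append((total, k))
    total, k = min(candidates)
    return k, total

print(best_split(edge_ms, cloud_ms, transfer_ms))  # (2, 34): split after layer 2
```

With these invented numbers, early layers are cheap locally and shrink the data to transfer, so the optimum runs the first two layers on the edge and offloads the rest.&lt;br /&gt;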
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems is to fully utilize the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that are able to be executed, and the system can continue working.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments in which edge devices must function, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capabilities consumed by another workload.&lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed to update the data and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the low latency that edge computing provides. Using all of this data, updated at runtime, the partitioning model can dynamically distribute workloads to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems, shown above. &lt;br /&gt;
&lt;br /&gt;
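A minimal sketch of this profile-then-refine loop, with invented device names and throughput numbers, might look as follows. The exponential moving average used to blend runtime samples into the profiled baseline is an illustrative assumption, not the exact update rule of Neurosurgeon or EdgeShard.&lt;br /&gt;

```python
class DynamicPartitioner:
    def __init__(self, profiled_throughput, alpha=0.5):
        # profiled_throughput: {device: work units per second}, measured
        # during the offline profiling step
        self.throughput = dict(profiled_throughput)
        self.alpha = alpha  # weight given to fresh runtime samples

    def observe(self, device, work, seconds):
        # blend an observed runtime sample into the baseline estimate
        sample = work / seconds
        old = self.throughput[device]
        self.throughput[device] = self.alpha * sample + (1 - self.alpha) * old

    def place(self, work, network_delay):
        # choose the device with the lowest estimated completion time,
        # counting network transfer delay to preserve the latency benefit
        def eta(device):
            return work / self.throughput[device] + network_delay.get(device, 0.0)
        return min(self.throughput, key=eta)
```

With these invented numbers, the partitioner initially favours the edge device, then shifts work to the cloud once a slow runtime sample lowers its estimate of edge throughput.&lt;br /&gt;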
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning splits the workload between the layers. For example, if a task is deemed to need a large amount of computational resources, it may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning so far. If inference is performed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of further work to reach the acceptable threshold, the task may be sent to edge devices to reduce network traffic to the cloud and lower latency [3].&lt;br /&gt;
&lt;br /&gt;
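The confidence-based escalation described above can be sketched as a simple gate; the models here are stand-in callables, and the 0.9 threshold is an arbitrary illustrative choice.&lt;br /&gt;

```python
import operator

def tiered_predict(x, edge_model, cloud_model, threshold=0.9):
    # the edge model answers when its confidence clears the threshold;
    # otherwise the input escalates to the cloud tier
    label, confidence = edge_model(x)
    if operator.ge(confidence, threshold):
        return label, "edge"
    return cloud_model(x), "cloud"
```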
The second model of partitioning is called horizontal partitioning. This involves splitting the workload among the devices within a single layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. The placement decisions are similar to those made in vertical partitioning, but all of the devices across which the workload is split operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server that maintains the main model. Each node performs these periodic updates, and by processing only part of the data, edge devices can handle the workloads much more easily. It also reduces the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations alone. One popular way of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model [9]. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
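The parallel gradient descent described above can be sketched with a one-parameter least-squares model; the data shards and learning rate are invented for illustration [9].&lt;br /&gt;

```python
def local_gradient(w, shard):
    # gradient of mean squared error for y = w * x on this shard only
    n = len(shard)
    return sum(2 * (w * x - y) * x for x, y in shard) / n

def parallel_gd_step(w, shards, lr=0.05):
    # each shard plays the role of one edge node; the central server
    # averages the per-node gradients before applying a single update
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)
```

Because each node only ever touches its own shard, no single device has to hold the full dataset in memory.&lt;br /&gt;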
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time each takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, and then the calculations from all nodes are applied together to update the central model. Although this can make the updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling addresses this: as soon as a node finishes its respective processing, it sends its results to the main server for an update rather than waiting for all the others to finish. This requires more communication, because after each update every node must fetch the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data and computational burden are distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, the data used in federated learning is never shared with the central servers. Instead, each node trains the model only on its own local data, and only the updated parameters are shared with the central model. A key consequence is that federated learning provides a greater degree of data privacy, which is crucial for applications dealing with sensitive data, making it especially well suited to edge devices. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
&lt;br /&gt;
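A minimal sketch of a federated averaging round, assuming a single-weight model and invented client datasets, shows how only parameters, never the raw local data, reach the server.&lt;br /&gt;

```python
def client_update(w, local_data, lr=0.1, steps=10):
    # local fine-tuning; local_data never leaves the device
    for _ in range(steps):
        n = len(local_data)
        grad = sum(2 * (w * x - y) * x for x, y in local_data) / n
        w = w - lr * grad
    return w

def federated_round(w_global, clients):
    # the server sees only updated parameters, averaged by dataset size
    total = sum(len(c) for c in clients)
    updates = [client_update(w_global, c) * len(c) for c in clients]
    return sum(updates) / total
```

Weighting each client by its dataset size is one common aggregation choice; other weightings are possible.&lt;br /&gt;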
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning [10]. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One related technique is knowledge distillation, in which a smaller model is trained to mimic the behavior of a larger model [11]. This is often used when edge and cloud systems operate in combination.&lt;br /&gt;
&lt;br /&gt;
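The distillation idea mentioned above can be written down as a loss between softened teacher and student outputs; the temperature value and logits below are illustrative, following the general recipe analysed in [11].&lt;br /&gt;

```python
import math

def softmax(logits, temperature=1.0):
    # softened class probabilities; higher temperature flattens the output
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # cross-entropy of the student predictions against the teacher
    # soft targets; the student minimizes this during fine-tuning
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
```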
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so only the data that most affects the accuracy of a model needs to be offloaded to edge devices. This decreases the workload on them, because less data must be processed, while preserving the accuracy of the model as much as possible. The same reasoning applies to models themselves: for example, a Small Language Model that can handle simple commands and prompts can be placed on an edge device, rather than offloading an entire LLM onto the device, which might overwhelm its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar capability can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and they are much faster and more resource-efficient than LLMs. They can also provide much more privacy because they can run on local devices without sharing user data with the cloud. This is useful for a wide variety of edge applications. If needed, and where privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a cloud query, so network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
&lt;br /&gt;
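A sketch of this SLM-first routing might look as follows; the interfaces of the two models are invented stand-ins, and a real system would also need to respect the privacy constraints mentioned above before escalating.&lt;br /&gt;

```python
class SlmRouter:
    # slm returns (answer, handled); handled is False when the prompt
    # exceeds what the small local model can do
    def __init__(self, slm, llm):
        self.slm, self.llm = slm, llm
        self.local_hits, self.escalations = 0, 0

    def ask(self, prompt):
        answer, handled = self.slm(prompt)
        if handled:
            self.local_hits += 1      # answered on-device, no network use
            return answer
        self.escalations += 1         # one remote query to the larger LLM
        return self.llm(prompt)
```

The counters make the point from the text concrete: only the escalated fraction of prompts ever generates cloud traffic.&lt;br /&gt;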
===&#039;&#039;&#039;Synthesis&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
In summary, edge-oriented machine learning optimization requires an integrated approach that combines model-level compression with system-level orchestration. Techniques such as quantization, structured and unstructured pruning, and knowledge distillation reduce the computational footprint and memory requirements of deep learning models, enabling deployment on resource-constrained devices without substantial loss in inference accuracy. Concurrently, dynamic workload partitioning, heterogeneity-aware scheduling, and adaptive runtime profiling allow the system to allocate tasks across edge and cloud tiers based on real-time availability of compute, bandwidth, and energy resources. This joint optimization across model architecture and execution environment is essential to meet the latency, privacy, and resilience demands of edge AI deployments.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. H. Hua, Y. Li, T. Wang, N. Dong, W. Li, and J. Cao, &amp;quot;Edge computing with artificial intelligence: A machine learning perspective,&amp;quot; ACM Computing Surveys, vol. 55, no. 9, Art. no. 184, pp. 1–35, Jan. 2023, doi: 10.1145/3555802.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=922</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=922"/>
		<updated>2025-04-25T05:14:49Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* The Need for Model Optimization at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with the training of algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not need very large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architectures, machine learning can be deployed in a distributed manner, allowing ML models to be deployed and trained across many edge devices. The way edge devices share computational tasks makes them well suited to splitting the computational load of large datasets, so that many devices can together compute what a single device might not have been able to handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split across edge devices as well. This allows the deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way to provide plenty of good, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is involved, many users would prefer the privacy afforded by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices that are meant to have low energy consumption, the tasks associated with machine learning could quickly drain their batteries, even if they have sufficient computational power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, machine learning applied to the data collected by sensors in cities can help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is reduced by a lot. This local data handling is able to prevent exposure of personal or confidential information, providing users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations easily comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real-World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Model Compression Techniques&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can be used to make training more efficient and to make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of the numbers in a model, thus easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes means significantly less memory is used, and for some models the difference in precision may be negligible. Another example is K-means based weight quantization, which involves grouping similar weights in a matrix together and representing each group by a centroid. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
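As a rough illustration, a one-dimensional k-means pass over a flat list of weights might look like the following. This is a minimal pure-Python sketch with evenly spaced initial centroids; the function name and the data are illustrative, not taken from any particular framework.&lt;br /&gt;

```python
def kmeans_quantize(weights, k, iters=20):
    """1-D k-means: cluster a flat list of weights into k centroids so
    each weight can be replaced by (an index into) its nearest centroid."""
    lo, hi = min(weights), max(weights)
    # deterministic init: centroids spread evenly over the weight range
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(w):
        return min(range(k), key=lambda c: abs(w - centroids[c]))

    for _ in range(iters):
        assign = [nearest(w) for w in weights]
        # move each centroid to the mean of the weights assigned to it
        for c in range(k):
            members = [w for w, a in zip(weights, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    assign = [nearest(w) for w in weights]
    return centroids, assign

weights = [2.09, 2.12, 1.92, 1.87, 0.05, -0.02, -1.03, -0.97, 0.01]
centroids, assign = kmeans_quantize(weights, k=3)
quantized = [centroids[a] for a in assign]   # 9 floats, only 3 distinct values
```

In a real deployment only the small centroid table and the per-weight cluster indices would be stored, which is where the memory saving comes from.&lt;br /&gt;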
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
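The XNOR-and-popcount arithmetic mentioned above can be sketched as follows, assuming sign bits are packed into plain Python integers. This is a toy illustration of the idea, not the bit-packed kernels used by real QNN implementations.&lt;br /&gt;

```python
def binarize(vec):
    """Pack the signs of a float vector into an int: bit 1 for values
    that are zero or positive (+1), bit 0 for negative values (-1)."""
    bits = 0
    for v in vec:
        bits = bits * 2 + (1 if v >= 0 else 0)
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors stored as packed bits,
    using XNOR plus popcount instead of multiplies and additions."""
    mask = 2 ** n - 1                  # n low bits set
    agree = (a_bits ^ b_bits) ^ mask   # XNOR: bit is 1 where signs match
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n             # agreements minus disagreements

a = [0.7, -1.2, 0.3, 0.9]    # signs: +, -, +, +
b = [0.5, -0.8, -0.1, 1.1]   # signs: +, -, -, +
dot = binary_dot(binarize(a), binarize(b), len(a))   # sign-vector dot product: 2
```

On suitable hardware, the packed representation lets a single XNOR-and-popcount pair replace many multiply-accumulate operations, which is the source of the energy savings described above.&lt;br /&gt;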
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
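A minimal sketch of one common variant, magnitude pruning, which zeroes the weights with the smallest absolute values; the function name and the sparsity level are illustrative.&lt;br /&gt;

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest
    absolute value; the remaining weights are left unchanged."""
    n_prune = int(len(weights) * sparsity)
    # indices of the n_prune smallest-magnitude weights
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.01, -0.8, 0.03, 1.2, -0.02, 0.5]
pruned = prune_by_magnitude(w, sparsity=0.5)   # the three smallest magnitudes become 0.0
```

As the comparison table later in this section notes, a fine-tuning pass is usually needed afterwards to recover any lost accuracy.&lt;br /&gt;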
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
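The two pruning rules can be sketched together as follows, assuming hypotheses carry positive scores where higher is better. This is a simplification; real SMT decoders typically work with log-probabilities, and the names here are illustrative.&lt;br /&gt;

```python
def prune_hypotheses(scored, n, alpha):
    """scored: list of (hypothesis, score) pairs, higher scores better.
    Threshold pruning keeps hypotheses scoring at least alpha * best;
    histogram pruning then keeps at most the n best survivors."""
    best = max(s for _, s in scored)
    survivors = [(h, s) for h, s in scored if s >= alpha * best]
    survivors.sort(key=lambda hs: -hs[1])   # best first
    return survivors[:n]

stack = [("hyp-a", 10.0), ("hyp-b", 9.5), ("hyp-c", 4.0),
         ("hyp-d", 8.8), ("hyp-e", 1.2)]
kept = prune_hypotheses(stack, n=3, alpha=0.5)   # drops hyp-c and hyp-e
```

Smaller n and larger alpha prune more aggressively, trading translation quality for decoding speed.&lt;br /&gt;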
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
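A minimal sketch of the soft-label training signal, using a temperature-softened softmax for both teacher and student. The temperature value and logits are illustrative, and practical setups usually also mix in a hard-label loss term.&lt;br /&gt;

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between the teacher&#039;s temperature-softened
    distribution (the soft labels) and the student&#039;s distribution."""
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(soft_targets, student_probs))

teacher = [8.0, 5.0, 1.0]   # confident but informative: mostly class 0, some class 1
student = [2.0, 1.0, 0.2]
loss = distillation_loss(student, teacher)
```

Higher temperatures flatten the teacher distribution, exposing more of the inter-class structure that hard labels discard.&lt;br /&gt;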
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;width:100%; text-align:left;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Comparison of Model Compression Techniques for Edge Deployment&#039;&#039;&#039;&lt;br /&gt;
! Technique&lt;br /&gt;
! Description&lt;br /&gt;
! Primary Benefit&lt;br /&gt;
! Trade-offs&lt;br /&gt;
! Ideal Use Case&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Pruning&#039;&#039;&#039;&lt;br /&gt;
| Removes unnecessary weights or neurons from a neural network.&lt;br /&gt;
| Reduces model size and computation.&lt;br /&gt;
| May require retraining or fine-tuning to preserve accuracy.&lt;br /&gt;
| Useful for deploying models on devices with strict memory and compute constraints.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Quantization&#039;&#039;&#039;&lt;br /&gt;
| Converts high-precision values (e.g., 32-bit float) to lower precision (e.g., 8-bit integer or binary).&lt;br /&gt;
| Lowers memory usage and accelerates inference.&lt;br /&gt;
| Risk of precision loss, especially in very small or sensitive models.&lt;br /&gt;
| Ideal when real-time inference and power efficiency are essential.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Distillation&#039;&#039;&#039;&lt;br /&gt;
| Trains a smaller model (student) using the output probabilities of a larger, more complex teacher model.&lt;br /&gt;
| Preserves performance while reducing model complexity.&lt;br /&gt;
| Requires access to a trained teacher model and additional training data.&lt;br /&gt;
| Effective when deploying accurate, lightweight models under data or resource constraints.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As artificial intelligence and machine learning technologies continue to mature, they pave the way for the development of intelligent AI agents capable of autonomous, context-aware behavior, with the goal of efficiently performing tasks specified by users. These agents combine perception, reasoning, and decision-making to execute tasks with minimal human intervention. When deployed on edge devices, AI agents can operate with low latency, preserve user privacy, and adapt to local data—making them ideal for real-time, personalized applications in homes, vehicles, factories, and beyond.&lt;br /&gt;
&lt;br /&gt;
To function effectively, an agent must first perceive its environment and understand the task—often defined by the user. Then, it must reason about the optimal steps to accomplish that task, and finally, it must act on those decisions. These three components—perception, reasoning, and action—are essential to the agent’s ability to operate accurately and autonomously in dynamic environments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose among the possible steps available and operate based on its reasoning, without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying out a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases.&lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. There are methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Nevertheless, edge devices can be paramount when latency is a major issue or when the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, the agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources and the inherently dynamic environments in which edge devices must operate, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs), but it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of how the work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, in which the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can function efficiently. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities to Raspberry Pis to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partly pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments in which edge devices must function, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capacity consumed by another workload.&lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. Then, a machine learning model uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data, the partitioning model is able to dynamically redistribute workloads at runtime in order to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems cited above.&lt;br /&gt;
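A Neurosurgeon-style split decision can be sketched as a simple search over candidate partition points, given profiled per-layer runtimes and activation sizes. All numbers and names below are illustrative, not taken from the cited systems.&lt;br /&gt;

```python
def best_split(edge_ms, cloud_ms, sizes_kb, bandwidth_kb_per_s):
    """Pick the layer at which to hand off from edge to cloud.
    edge_ms[i] / cloud_ms[i]: profiled runtime of layer i on each tier;
    sizes_kb[i]: data uploaded if layers 0..i-1 ran on the edge
    (sizes_kb[0] is the raw input). Returns (split_index, total_ms)."""
    n = len(edge_ms)
    options = []
    for split in range(n + 1):
        transfer_ms = sizes_kb[split] / bandwidth_kb_per_s * 1000.0
        total = sum(edge_ms[:split]) + transfer_ms + sum(cloud_ms[split:])
        options.append((total, split))
    total, split = min(options)   # lowest end-to-end latency wins
    return split, total

# illustrative profile: activations shrink layer by layer, so splitting
# after layer 2 avoids uploading the large raw input
edge_ms = [5.0, 20.0, 200.0]
cloud_ms = [1.0, 2.0, 3.0]
sizes_kb = [100.0, 10.0, 5.0, 1.0]
split, latency = best_split(edge_ms, cloud_ms, sizes_kb, bandwidth_kb_per_s=50.0)
```

Because activations often shrink deeper in the network, the cheapest split is frequently a few layers in rather than at the raw input, which is exactly the effect the profiling step is designed to expose.&lt;br /&gt;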
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning splits the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed; on the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the given model. If inference is performed on an edge device and its accuracy is found to be very low, the task can be sent to the cloud; conversely, if the accuracy is already fairly high and the model needs only a small amount of further work to reach an acceptable threshold, the task may be kept on edge devices to reduce traffic to the cloud and lower latency [3].&lt;br /&gt;
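This kind of confidence-based escalation can be sketched as follows, with toy stand-in models and an assumed tunable threshold.&lt;br /&gt;

```python
def classify(sample, edge_model, cloud_model, threshold=0.85):
    """Confidence-based vertical partitioning: keep the prediction on
    the edge when the local model is sure, otherwise escalate the
    sample to the larger cloud-tier model."""
    label, confidence = edge_model(sample)
    if confidence >= threshold:
        return label, "edge"
    label, _ = cloud_model(sample)
    return label, "cloud"

# toy stand-in models, purely illustrative
def edge_model(sample):
    return ("cat", 0.92) if sample == "easy" else ("cat", 0.55)

def cloud_model(sample):
    return ("dog", 0.99)

where_easy = classify("easy", edge_model, cloud_model)   # stays on the edge
where_hard = classify("hard", edge_model, cloud_model)   # escalated to the cloud
```

Only low-confidence samples pay the round-trip cost, which preserves the latency benefit for the common case.&lt;br /&gt;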
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it allows a means for fully utilizing the heterogeneous abilities found in edge devices. The determination of workload placement is similar to that in vertical partitioning, but all of the devices across which the workload is split operate within the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used.&lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server that holds the main model. Each node performs these periodic updates, and by processing only part of the data, edge devices can handle the workloads much more easily. It can also reduce the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices acting as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
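One step of parallel gradient descent can be sketched for a one-parameter linear model, with each shard standing in for the data held by one edge node; the data and learning rate are purely illustrative.&lt;br /&gt;

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x,
    computed on one node&#039;s local shard of (x, y) pairs."""
    g = sum(2 * x * (w * x - y) for x, y in shard)
    return g / len(shard)

def parallel_sgd_step(w, shards, lr=0.05):
    """Each node computes a gradient on its own shard; the central
    server averages them and applies one update to the global model."""
    grads = [local_gradient(w, shard) for shard in shards]
    avg = sum(grads) / len(grads)
    return w - lr * avg

# data drawn from y = 3x, split across three edge nodes
shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)], [(0.5, 1.5), (1.5, 4.5)]]
w = 0.0
for _ in range(200):
    w = parallel_sgd_step(w, shards)
# w converges toward the true slope of 3
```

With equal shard sizes, averaging the per-node gradients matches a gradient step on the pooled data, yet no node ever needs to hold more than its own shard.&lt;br /&gt;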
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, and then the calculations from all nodes are sent to update the central model. Although this makes updating easier, any nodes that lag behind the others greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling optimizes this: each node sends its results to the main server for an update as soon as it finishes, rather than waiting for all the others. However, this requires more communication, because after each update every node must fetch the updated model from the central server; doing this repeatedly can be challenging, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data as well as computational burden is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, data used in federated learning is not shared with the central servers. Instead, only local data from each of the nodes is used to train the model. Then, the only part shared with the central model is the updated parameters. A key aspect of this is that federated learning provides a greater amount of data privacy, which is crucial for certain applications dealing with sensitive data. Therefore, it is especially useful to utilize edge devices to perform federated learning. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device that executes it, which may not be feasible for edge devices. Therefore, the bulk of the training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One related form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used together.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be selected for processing on edge devices. This decreases the workload, because less data must be processed, while preserving the accuracy of the model as much as possible. The same reasoning applies to model selection: a Small Language Model that handles simple commands and prompts can be placed on an edge device, rather than offloading an entire LLM, which might overload its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes.&lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of data they are trained on, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and they are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud, which is useful for a wide variety of edge applications. If needed, and if privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, so the network traffic, privacy, and latency benefits are still largely preserved.&lt;br /&gt;
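This routing pattern can be sketched as follows, with toy stand-ins for the SLM and LLM and an assumed confidence floor.&lt;br /&gt;

```python
def answer(prompt, slm, llm, confidence_floor=0.7):
    """Try the on-device SLM first; only when its confidence is low is
    the prompt forwarded to the cloud LLM (all names are illustrative)."""
    reply, confidence = slm(prompt)
    if confidence >= confidence_floor:
        return reply, False          # handled locally, no cloud query
    return llm(prompt), True         # escalated to the cloud

# toy stand-ins: this SLM is confident only on short, simple prompts
def tiny_slm(prompt):
    conf = 0.3 if len(prompt.split()) >= 5 else 0.9
    return "slm:" + prompt, conf

def big_llm(prompt):
    return "llm:" + prompt

queries = [answer(p, tiny_slm, big_llm) for p in
           ["set a timer", "turn lights off",
            "summarize this long legal contract for me"]]
cloud_calls = sum(1 for _, escalated in queries if escalated)
```

Only the escalated prompt generates cloud traffic; the simple prompts are answered entirely on-device.&lt;br /&gt;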
&lt;br /&gt;
===&#039;&#039;&#039;Synthesis&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
In summary, edge-oriented machine learning optimization requires an integrated approach that combines model-level compression with system-level orchestration. Techniques such as quantization, structured and unstructured pruning, and knowledge distillation reduce the computational footprint and memory requirements of deep learning models, enabling deployment on resource-constrained devices without substantial loss in inference accuracy. Concurrently, dynamic workload partitioning, heterogeneity-aware scheduling, and adaptive runtime profiling allow the system to allocate tasks across edge and cloud tiers based on real-time availability of compute, bandwidth, and energy resources. This joint optimization across model architecture and execution environment is essential to meet the latency, privacy, and resilience demands of edge AI deployments.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. H. Hua, Y. Li, T. Wang, N. Dong, W. Li, and J. Cao, &amp;quot;Edge computing with artificial intelligence: A machine learning perspective,&amp;quot; ACM Computing Surveys, vol. 55, no. 9, Art. no. 184, pp. 1–35, Jan. 2023, doi: 10.1145/3555802.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=921</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=921"/>
		<updated>2025-04-25T05:10:52Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 4.2 ML Training at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with the training of algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise in the field of machine learning, it is inevitable that deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not need such large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilize it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting that data [5]. With edge computing architectures, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices naturally share computational tasks, large datasets can be split so that many devices together compute what a single device could not handle alone [5]. Beyond splitting input data across edge devices, machine learning models themselves can be partitioned across edge devices as well. This allows deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing can therefore be an effective way to supply large volumes of relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by placing devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by processing it locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even when they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in natural environments and industrial settings could be trained to recognize anomalies or undesirable behavior, ensuring a quick response and proactive reporting of information [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similarly, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is significantly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039; &lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Model Compression Techniques&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can provide a more efficient means of training and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of numbers, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one essentially sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating point to integer datatypes means significantly less memory is used, and for some models the loss of precision may be negligible. Another example is K-means based Weight Quantization, which involves creating a matrix and grouping similar numbers together with centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
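&lt;br /&gt;
As a concrete illustration of the basic float-to-integer conversion described above, the following sketch maps a list of float32-style weights onto unsigned 8-bit integers using a per-tensor affine scheme. This is a simplified, hypothetical example in plain Python; production frameworks also calibrate activations and fuse operations.&lt;br /&gt;

```python
def quantize_uint8(weights):
    # Per-tensor affine quantization: map floats onto the 0..255 integer range.
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 if hi != lo else 1.0
    zero_point = round(-lo / scale)
    quantized = [max(0, min(255, round(w / scale + zero_point))) for w in weights]
    return quantized, scale, zero_point

def dequantize(quantized, scale, zero_point):
    # Approximate reconstruction of the original floats.
    return [(q - zero_point) * scale for q in quantized]

# 8-bit storage cuts memory roughly 4x versus float32; the reconstruction
# error is bounded by half of one quantization step (scale / 2).
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
quantized, scale, zero_point = quantize_uint8(weights)
restored = dequantize(quantized, scale, zero_point)
```

Lowering precision further (down to the 1-bit case discussed next) follows the same principle with a coarser grid.&lt;br /&gt;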
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
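&lt;br /&gt;
In its simplest weight-pruning form, this means zeroing out the lowest-magnitude parameters. The sketch below is an illustrative plain-Python example rather than any particular framework; real pipelines typically fine-tune the model afterwards to recover accuracy.&lt;br /&gt;

```python
def magnitude_prune(weights, sparsity):
    # Unstructured magnitude pruning: zero out the `sparsity` fraction of
    # weights with the smallest absolute value.
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

# Zeroing 40% of these weights removes only the weakest contributors.
print(magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7], 0.4))
# -> [0.9, 0.0, 0.4, 0.0, -0.7]
```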
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
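&lt;br /&gt;
The two decoding-time schemes can be sketched together in plain Python (the candidates and scores below are hypothetical, with higher scores better): threshold pruning first discards hypotheses scoring below a proportion alpha of the best candidate, then histogram pruning caps the survivors at n.&lt;br /&gt;

```python
def prune_hypotheses(hypotheses, n, alpha):
    # hypotheses: list of (candidate, score) pairs, higher score = better.
    best = max(score for _, score in hypotheses)
    # Threshold pruning: drop weak candidates early.
    survivors = [(h, s) for h, s in hypotheses if s >= alpha * best]
    # Histogram pruning: keep at most n of the remainder.
    survivors.sort(key=lambda pair: pair[1], reverse=True)
    return survivors[:n]

hyps = [("a", 1.0), ("b", 0.9), ("c", 0.5), ("d", 0.2)]
print(prune_hypotheses(hyps, n=2, alpha=0.4))  # -> [('a', 1.0), ('b', 0.9)]
```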
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
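&lt;br /&gt;
The soft-label idea can also be expressed as a loss function: the student is penalized by the cross-entropy between the teacher distribution, softened by a temperature, and its own softened predictions. The following is an illustrative plain-Python sketch; a full distillation loss usually also mixes in a hard-label cross-entropy term.&lt;br /&gt;

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature yields softer, more informative distributions.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the teacher soft labels and the student
    # predictions; it is minimized when the student matches the teacher.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))
```

A student whose logits agree with the teacher incurs a strictly lower loss than one that ranks the classes differently.&lt;br /&gt;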
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;width:100%; text-align:left;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Comparison of Model Compression Techniques for Edge Deployment&#039;&#039;&#039;&lt;br /&gt;
! Technique&lt;br /&gt;
! Description&lt;br /&gt;
! Primary Benefit&lt;br /&gt;
! Trade-offs&lt;br /&gt;
! Ideal Use Case&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Pruning&#039;&#039;&#039;&lt;br /&gt;
| Removes unnecessary weights or neurons from a neural network.&lt;br /&gt;
| Reduces model size and computation.&lt;br /&gt;
| May require retraining or fine-tuning to preserve accuracy.&lt;br /&gt;
| Useful for deploying models on devices with strict memory and compute constraints.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Quantization&#039;&#039;&#039;&lt;br /&gt;
| Converts high-precision values (e.g., 32-bit float) to lower precision (e.g., 8-bit integer or binary).&lt;br /&gt;
| Lowers memory usage and accelerates inference.&lt;br /&gt;
| Risk of precision loss, especially in very small or sensitive models.&lt;br /&gt;
| Ideal when real-time inference and power efficiency are essential.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Distillation&#039;&#039;&#039;&lt;br /&gt;
| Trains a smaller model (student) using the output probabilities of a larger, more complex teacher model.&lt;br /&gt;
| Preserves performance while reducing model complexity.&lt;br /&gt;
| Requires access to a trained teacher model and additional training data.&lt;br /&gt;
| Effective when deploying accurate, lightweight models under data or resource constraints.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As artificial intelligence and machine learning technologies continue to mature, they pave the way for the development of intelligent AI agents capable of autonomous, context-aware behavior, with the goal of efficiently performing tasks specified by users. These agents combine perception, reasoning, and decision-making to execute tasks with minimal human intervention. When deployed on edge devices, AI agents can operate with low latency, preserve user privacy, and adapt to local data—making them ideal for real-time, personalized applications in homes, vehicles, factories, and beyond.&lt;br /&gt;
&lt;br /&gt;
To function effectively, an agent must first perceive its environment and understand the task—often defined by the user. Then, it must reason about the optimal steps to accomplish that task, and finally, it must act on those decisions. These three components—perception, reasoning, and action—are essential to the agent’s ability to operate accurately and autonomously in dynamic environments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the available possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to carry out a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
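&lt;br /&gt;
Putting the three components together, a minimal and purely hypothetical sketch of an agent loop might look like the following, where the reasoning step has already decomposed the task into a plan of named tool invocations, and each result is recorded in memory:&lt;br /&gt;

```python
class EdgeAgent:
    # Minimal perceive-reason-act sketch: `tools` maps action names to
    # callables (APIs, code interpreters, database queries, ...), and
    # `memory` records each outcome so later steps can build on it.
    def __init__(self, tools):
        self.tools = tools
        self.memory = []

    def run(self, plan):
        # `plan` is a task already decomposed into (tool_name, argument) steps.
        result = None
        for tool_name, argument in plan:
            result = self.tools[tool_name](argument)
            self.memory.append((tool_name, argument, result))
        return result

agent = EdgeAgent({"double": lambda x: 2 * x, "increment": lambda x: x + 1})
print(agent.run([("double", 3), ("increment", 10)]))  # -> 11
```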
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs which query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Edge devices can nonetheless be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology currently consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources each requires. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
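&lt;br /&gt;
A highly simplified version of this partition-point search, in the spirit of the Neurosurgeon profiling step but with hypothetical per-layer timings, can be sketched as follows: layers before the cut run on the edge, layers after it run in the cloud, and one activation transfer is paid at the cut.&lt;br /&gt;

```python
def best_partition(edge_ms, cloud_ms, transfer_ms):
    # edge_ms[i] / cloud_ms[i]: profiled latency of layer i on each tier.
    # transfer_ms[k]: cost of shipping the tensor that feeds layer k.
    n = len(edge_ms)
    hops = transfer_ms + [0.0]  # cutting after the last layer sends nothing

    def total(k):
        # Layers [0, k) on the edge, one upload, layers [k, n) in the cloud.
        return sum(edge_ms[:k]) + hops[k] + sum(cloud_ms[k:])

    return min(range(n + 1), key=total)

# Cutting after layer 0 wins here: its activations are cheap to ship.
print(best_partition([5, 5, 5], [1, 1, 1], [10, 2, 8]))  # -> 1
```

In a real system the per-layer timings would come from profiling runs on the actual devices rather than fixed numbers.&lt;br /&gt;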
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computation usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. Then, a machine learning model utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and these runtime updates, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is deployed are the Neurosurgeon and EdgeShard systems, referenced above. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the model in question. If inference is performed on an edge device and its confidence is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of work to reach the threshold deemed acceptable, it may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
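&lt;br /&gt;
This confidence-based escalation can be sketched as a simple cascade (the models and threshold below are hypothetical): the cheap edge model is tried first, and the task is escalated to the cloud only when its confidence is too low.&lt;br /&gt;

```python
def cascade_inference(x, edge_model, cloud_model, threshold=0.8):
    # Each model returns a (label, confidence) pair.
    label, confidence = edge_model(x)
    if confidence >= threshold:
        return label, "edge"           # confident enough: stay local
    return cloud_model(x)[0], "cloud"  # low confidence: escalate upward

# Toy stand-in models for illustration only.
edge_model = lambda x: ("cat", 0.95) if x == "easy" else ("dog", 0.30)
cloud_model = lambda x: ("wolf", 0.99)
print(cascade_inference("easy", edge_model, cloud_model))  # -> ('cat', 'edge')
print(cascade_inference("hard", edge_model, cloud_model))  # -> ('wolf', 'cloud')
```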
&lt;br /&gt;
The second model of partitioning is horizontal partitioning. This involves splitting workloads among the devices within a given layer rather than between the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Workloads are allocated using similar logic to vertical partitioning, but all of the devices that the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which contains the main model. Each node performs these periodic updates, and because a node only processes part of the data, it is much easier for edge devices to handle these workloads. It can also reduce the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices being used as nodes.&lt;br /&gt;
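The aggregation step can be sketched as follows. This is a minimal example: the loss (mean squared error for a one-parameter linear model), the learning rate, and the shard contents are all assumptions chosen to keep the sketch self-contained.&lt;br /&gt;

```python
def local_grad(w, shard):
    """Gradient of mean squared error for y ~ w*x on one node's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def parallel_sgd_step(w, shards, lr=0.05):
    """One synchronous step: each node computes a gradient on its own
    data subset; the server averages them and updates the shared model."""
    grads = [local_grad(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)

# Two nodes, each holding part of the data for the line y = 2x.
shards = [[(1, 2), (2, 4)], [(3, 6), (4, 8)]]
w = 0.0
for _ in range(200):
    w = parallel_sgd_step(w, shards)
assert abs(w - 2.0) < 1e-6  # converges to the true slope
```

No node ever holds the full dataset, which is what keeps the per-device memory footprint within edge constraints.&lt;br /&gt;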
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, and then the calculations from all nodes are sent together to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling addresses this: as soon as a node finishes its respective processing, it sends its results to the main server for an update rather than waiting for all the others to finish. However, this requires more communication, because after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
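The straggler effect can be illustrated with a toy timing model (all names and numbers below are assumptions, and communication cost is ignored): synchronous training pays the cost of the slowest node in every round, while asynchronous training lets each node proceed at its own pace.&lt;br /&gt;

```python
def sync_time(node_times):
    """Synchronous schedule: every round waits for the slowest node."""
    return sum(max(round_t) for round_t in zip(*node_times))

def async_time(node_times):
    """Asynchronous schedule: each node streams updates at its own pace,
    so the run finishes when the busiest node does."""
    return max(sum(t) for t in node_times)

# node_times[i][r]: time node i spends computing its round-r update.
times = [[1, 1, 1], [3, 1, 1], [1, 1, 3]]
assert sync_time(times) == 7   # two rounds stalled by a 3-unit straggler
assert async_time(times) == 5  # bounded by the busiest single node
```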
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data, as well as the computational burden, is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning: data used in federated learning is never shared with the central servers. Instead, each node trains the model on its own local data, and only the updated parameters are shared with the central model. As a result, federated learning provides a greater degree of data privacy, which is crucial for applications dealing with sensitive data, and it is therefore especially well suited to edge devices. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
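The server-side parameter aggregation can be sketched as follows, assuming a single scalar weight per client and weighting by local dataset size (in the style of federated averaging; the names and numbers are illustrative).&lt;br /&gt;

```python
def fedavg(client_weights, client_sizes):
    """Federated-averaging sketch: the server never sees raw data, only
    each client's locally trained weight, combined in proportion to how
    much data that client trained on."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total

# Three clients trained locally; only their weights leave the devices.
assert fedavg([1.0, 2.0, 3.0], [10, 10, 20]) == 2.25
```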
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device executing it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used in a combined way.&lt;br /&gt;
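A toy sketch of this split, assuming a frozen &quot;base&quot; feature extractor pretrained elsewhere and a single trainable head parameter fine-tuned on local data (all names and data are illustrative assumptions):&lt;br /&gt;

```python
def fine_tune_head(base_feature, data, lr=0.1, steps=100):
    """Transfer-learning sketch: the base feature extractor stays frozen
    (pretrained in the cloud); only the lightweight head weight `v` is
    trained on the edge device's local data, by gradient descent on MSE."""
    v = 0.0
    for _ in range(steps):
        grad = sum(2 * (v * base_feature(x) - y) * base_feature(x)
                   for x, y in data) / len(data)
        v -= lr * grad
    return v

# Frozen base maps x -> x**2; local data follows y = 3 * x**2.
base = lambda x: x * x
v = fine_tune_head(base, [(1, 3), (2, 12)])
assert abs(v - 3.0) < 1e-9  # the head recovers the local scale factor
```

Only one parameter is updated on-device, which is the point: the expensive representation learning happened elsewhere.&lt;br /&gt;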
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so only the data that most affects the accuracy of a model needs to be offloaded to edge devices. This decreases their workload, because less data must be processed, while conserving the accuracy of the model as much as possible. Larger data or models may not be needed at all: for example, an edge device may only need a Small Language Model that can handle simple commands and prompts, rather than an entire LLM, which could overload its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all the resources and data the device needs for processing the work. By examining the computational abilities of each device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can assist with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is often not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of data they are trained on, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy because they can run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and where privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a cloud query, and thus network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
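One possible routing policy can be sketched as follows; the callables, threshold, and prompts are placeholder assumptions, not real model APIs.&lt;br /&gt;

```python
def answer(prompt, slm, llm, confidence_threshold=0.8):
    """Hybrid routing sketch: try the on-device SLM first; only when its
    confidence is low (and privacy policy allows) fall back to a cloud
    LLM.  Returns the reply and which tier produced it."""
    text, confidence = slm(prompt)
    if confidence >= confidence_threshold:
        return text, "edge"
    return llm(prompt), "cloud"

# Toy stand-ins for the two models.
slm = lambda p: ("turn on the lights", 0.9) if "light" in p else ("?", 0.2)
llm = lambda p: "detailed cloud answer"

assert answer("light on", slm, llm) == ("turn on the lights", "edge")
assert answer("explain relativity", slm, llm) == ("detailed cloud answer", "cloud")
```

Because only low-confidence prompts reach the cloud, latency, traffic, and data exposure all scale with the hard cases rather than with every request.&lt;br /&gt;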
&lt;br /&gt;
===&#039;&#039;&#039;Synthesis&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
In summary, edge-oriented machine learning optimization requires an integrated approach that combines model-level compression with system-level orchestration. Techniques such as quantization, structured and unstructured pruning, and knowledge distillation reduce the computational footprint and memory requirements of deep learning models, enabling deployment on resource-constrained devices without substantial loss in inference accuracy. Concurrently, dynamic workload partitioning, heterogeneity-aware scheduling, and adaptive runtime profiling allow the system to allocate tasks across edge and cloud tiers based on real-time availability of compute, bandwidth, and energy resources. This joint optimization across model architecture and execution environment is essential to meet the latency, privacy, and resilience demands of edge AI deployments.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. H. Hua, Y. Li, T. Wang, N. Dong, W. Li, and J. Cao, &amp;quot;Edge computing with artificial intelligence: A machine learning perspective,&amp;quot; ACM Computing Surveys, vol. 55, no. 9, Art. no. 184, pp. 1–35, Jan. 2023, doi: 10.1145/3555802.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=920</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=920"/>
		<updated>2025-04-25T05:10:12Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Dealing With Challenges */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence primarily concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can also help ease the burden on cloud systems and the networks connected to them, especially if the model does not require large computational power, as in the case of certain Small Language Models (SLMs). However, the main challenges of machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architectures, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices naturally share computational tasks, they are well suited to taking large datasets and splitting the computational load, so that many devices together can compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows for the deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could therefore be an effective way of providing large amounts of good, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; in particular, when personal data is needed for a personal application, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amounts of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even when they have sufficient computational capability to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; Despite the privacy benefits of local processing, more devices mean increased potential for malicious actors to steal data or compromise devices. Encryption, access controls, and other measures must be employed to keep these new attack surfaces from being exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to above, the data collected by sensors in cities can leverage machine learning to help in crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming much more important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039; &lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Model Compression Techniques&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can provide a more efficient means of training and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of numbers, thus easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and for some models the difference in precision may be negligible. Another example is K-means-based weight quantization, which involves taking a weight matrix and grouping similar numbers together using centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
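A minimal sketch of the K-means weight quantization described above, assuming a one-dimensional list of weights and a small illustrative two-centroid codebook:&lt;br /&gt;

```python
def kmeans_quantize(weights, centroids, iters=10):
    """K-means weight-quantization sketch: each weight is replaced by the
    nearest of a small set of centroids, so the matrix can be stored as
    centroid indices plus a short codebook."""
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        clusters = [[] for _ in centroids]
        for w in weights:
            i = min(range(len(centroids)), key=lambda j: abs(w - centroids[j]))
            clusters[i].append(w)
        # Move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return [min(centroids, key=lambda c: abs(w - c)) for w in weights]

weights = [0.9, 1.1, -0.4, -0.6, 1.0]
quantized = kmeans_quantize(weights, centroids=[-1.0, 1.0])
# Five distinct floats collapse onto two codebook values (~1.0 and ~-0.5).
assert all(abs(q - e) < 1e-9
           for q, e in zip(quantized, [1.0, 1.0, -0.5, -0.5, 1.0]))
```

With k centroids, each weight needs only log2(k) index bits plus a shared codebook, which is where the memory saving comes from.&lt;br /&gt;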
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
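The two pruning rules can be sketched as follows, assuming scores where higher is better (real decoders typically work in log-probability space); the stack contents and parameters are illustrative.&lt;br /&gt;

```python
def prune_hypotheses(hyps, n=3, alpha=0.5):
    """Decoder-stack pruning sketch: `hyps` maps hypothesis -> score.
    Threshold pruning drops anything scoring below alpha * best;
    histogram pruning then keeps at most the n best survivors."""
    best = max(hyps.values())
    survivors = {h: s for h, s in hyps.items() if s >= alpha * best}
    ranked = sorted(survivors, key=survivors.get, reverse=True)
    return ranked[:n]

stack = {"a": 1.0, "b": 0.9, "c": 0.8, "d": 0.45, "e": 0.3}
# d and e fall below alpha * best = 0.5; the top three survivors remain.
assert prune_hypotheses(stack) == ["a", "b", "c"]
# A tighter histogram limit keeps only the two best.
assert prune_hypotheses(stack, n=2) == ["a", "b"]
```

A learned controller of the kind Banik et al. describe would choose `n` and `alpha` per input rather than fixing them.&lt;br /&gt;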
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
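The soft-label idea can be made concrete with a small numerical sketch; the logits and temperature value are invented for illustration, and temperature smoothing is one common formulation of distillation rather than the only one:&lt;br /&gt;

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy between the teacher soft labels and the student
    # distribution, both smoothed by the same temperature.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return p, -float(np.sum(p * np.log(q + 1e-12)))

teacher = [4.0, 2.5, 0.5]              # teacher strongly favours class 0
student = [3.0, 2.0, 1.0]
soft_labels, loss = distillation_loss(teacher, student)
hard_peak = softmax(teacher, 1.0)[0]   # un-smoothed teacher confidence
```

Raising the temperature softens the teacher distribution, so the student sees the relative plausibility of the wrong classes instead of a bare one-hot label.&lt;br /&gt;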
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;width:100%; text-align:left;&amp;quot;&lt;br /&gt;
|+ &#039;&#039;&#039;Comparison of Model Compression Techniques for Edge Deployment&#039;&#039;&#039;&lt;br /&gt;
! Technique&lt;br /&gt;
! Description&lt;br /&gt;
! Primary Benefit&lt;br /&gt;
! Trade-offs&lt;br /&gt;
! Ideal Use Case&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Pruning&#039;&#039;&#039;&lt;br /&gt;
| Removes unnecessary weights or neurons from a neural network.&lt;br /&gt;
| Reduces model size and computation.&lt;br /&gt;
| May require retraining or fine-tuning to preserve accuracy.&lt;br /&gt;
| Useful for deploying models on devices with strict memory and compute constraints.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Quantization&#039;&#039;&#039;&lt;br /&gt;
| Converts high-precision values (e.g., 32-bit float) to lower precision (e.g., 8-bit integer or binary).&lt;br /&gt;
| Lowers memory usage and accelerates inference.&lt;br /&gt;
| Risk of precision loss, especially in very small or sensitive models.&lt;br /&gt;
| Ideal when real-time inference and power efficiency are essential.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;Distillation&#039;&#039;&#039;&lt;br /&gt;
| Trains a smaller model (student) using the output probabilities of a larger, more complex teacher model.&lt;br /&gt;
| Preserves performance while reducing model complexity.&lt;br /&gt;
| Requires access to a trained teacher model and additional training data.&lt;br /&gt;
| Effective when deploying accurate, lightweight models under data or resource constraints.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As artificial intelligence and machine learning technologies continue to mature, they pave the way for the development of intelligent AI agents capable of autonomous, context-aware behavior, with the goal of efficiently performing tasks specified by users. These agents combine perception, reasoning, and decision-making to execute tasks with minimal human intervention. When deployed on edge devices, AI agents can operate with low latency, preserve user privacy, and adapt to local data—making them ideal for real-time, personalized applications in homes, vehicles, factories, and beyond.&lt;br /&gt;
&lt;br /&gt;
To function effectively, an agent must first perceive its environment and understand the task—often defined by the user. Then, it must reason about the optimal steps to accomplish that task, and finally, it must act on those decisions. These three components—perception, reasoning, and action—are essential to the agent’s ability to operate accurately and autonomously in dynamic environments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the available possible steps and operate based on its own reasoning, without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying out a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases.&lt;br /&gt;
&lt;br /&gt;
Deploying AI agents on edge devices can be difficult due to the computational and reasoning power required. There are, however, ways to accomplish this, such as using SLMs that query LLMs as needed (discussed later), or delegating tasks to more powerful edge devices. Running agents at the edge is particularly valuable when latency is a major concern or when the agent is exposed to sensitive user data. Additionally, an agent running on a user&#039;s own device can better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and training the model on them. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between edge and cloud devices. The cloud has the ability to process workloads that require far more resources than edge devices can provide. Edge devices, on the other hand, can store and process data locally, offering lower latency and greater privacy. Given the advantages of each, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing.&lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the candidate devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with comparatively powerful computational abilities to Raspberry Pis to simple sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that are able to be executed, and the system can continue working.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capacity consumed by another workload.&lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. Then, a machine learning model utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and these runtime updates, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device is using its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems discussed above.&lt;br /&gt;
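A minimal sketch of the underlying decision (a simplification, not the actual Neurosurgeon or EdgeShard implementation): given profiled per-layer latencies and the amount of data that must be uploaded at each candidate split point, pick the split that minimizes end-to-end latency. All numbers below are invented for illustration:&lt;br /&gt;

```python
def best_split(edge_ms, cloud_ms, upload_kb, bandwidth_kb_per_ms):
    # Split point k means layers 0..k-1 run on the edge device and layers
    # k..n-1 run in the cloud; upload_kb[k] is the data transferred at k.
    n = len(edge_ms)
    best_k, best_t = None, float("inf")
    for k in range(n + 1):
        t = (sum(edge_ms[:k])
             + upload_kb[k] / bandwidth_kb_per_ms
             + sum(cloud_ms[k:]))
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t

edge_ms = [5.0, 5.0, 5.0]             # profiled per-layer latency on device
cloud_ms = [1.0, 1.0, 1.0]            # profiled per-layer latency in cloud
upload_kb = [100.0, 2.0, 50.0, 40.0]  # raw input large; layer-0 output small
k, latency = best_split(edge_ms, cloud_ms, upload_kb, bandwidth_kb_per_ms=1.0)
```

Here the first layer acts as a cheap on-device compressor, so the optimal split runs one layer at the edge and offloads the rest; re-profiling bandwidth at runtime shifts the chosen split dynamically.&lt;br /&gt;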
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split the workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning involves splitting up the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning involved. If inference is performed on an edge device and its accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of further work to reach the acceptable threshold, the task may be kept on edge devices to reduce network traffic to the cloud and reduce latency [3].&lt;br /&gt;
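One hedged sketch of confidence-driven vertical partitioning: run inference on the edge first, and escalate to the cloud tier only when the edge model&#039;s confidence falls below an acceptable threshold. The threshold and the probability vectors are illustrative:&lt;br /&gt;

```python
import numpy as np

def vertical_route(edge_probs, threshold=0.8):
    # If the edge model is confident enough, answer locally; otherwise
    # the input is escalated to the cloud tier for a larger model.
    confidence = float(np.max(edge_probs))
    if confidence >= threshold:
        return "edge", int(np.argmax(edge_probs))
    return "cloud", None

tier_a, label_a = vertical_route(np.array([0.05, 0.92, 0.03]))  # confident
tier_b, label_b = vertical_route(np.array([0.40, 0.35, 0.25]))  # uncertain
```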
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting the workload among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Capability assessment similar to that used in vertical partitioning is performed, but all of the devices that the workload is split across operate in the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used.&lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of all of the data being handled in one place, each node takes in and processes only a subset of the overall data, then sends its results to a central server that holds the main model. Each node contributes these periodic updates, and because only part of the data is processed per node, it is much easier for edge devices to handle these workloads. It can also reduce the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices acting as nodes.&lt;br /&gt;
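The parallel gradient descent step can be sketched as follows for a simple linear model; the sharding scheme, learning rate, and toy data are assumptions for illustration:&lt;br /&gt;

```python
import numpy as np

def local_gradient(w, X, y):
    # Mean-squared-error gradient computed on the data shard of one node.
    return 2.0 * X.T @ (X @ w - y) / len(y)

def parallel_gd_step(w, shards, lr=0.1):
    # Each node computes a gradient on its shard; the central server
    # averages them and applies a single update to the main model.
    grads = [local_gradient(w, X, y) for X, y in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
X = rng.standard_normal((90, 2))
y = X @ w_true
shards = [(X[i::3], y[i::3]) for i in range(3)]  # 3 nodes, disjoint subsets

w = np.zeros(2)
for _ in range(300):
    w = parallel_gd_step(w, shards)
```

No node ever holds more than a third of the data, yet the aggregated updates recover the same model a single machine would.&lt;br /&gt;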
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which the updates are given to the model. Each node may have different amounts of data, memory, and computational power, and therefore the time it takes to process and update will not be the same. Synchronous algorithms wait until every node finishes its respective processing, and then all of the calculations from all nodes are sent to update the central model. Although this can make the updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling optimizes this: as soon as a node finishes its processing, it sends its results to the main server for an update rather than waiting for all the others to finish. However, this requires more communication, as after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
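A small simulation can illustrate the straggler effect; the per-node step times are invented, and the sketch only counts updates rather than training anything:&lt;br /&gt;

```python
import heapq

def sync_updates(step_times, rounds):
    # Synchronous: every round waits for the slowest node, so the total
    # wall-clock time is rounds * max(step_times) for rounds * n updates.
    return len(step_times) * rounds, rounds * max(step_times)

def async_update_count(step_times, horizon):
    # Asynchronous: each node sends its update the moment it finishes,
    # then immediately starts the next step. Count updates within horizon.
    events = [(t, i) for i, t in enumerate(step_times)]
    heapq.heapify(events)
    updates = 0
    while events:
        t, i = heapq.heappop(events)
        if t > horizon:
            break
        updates += 1
        heapq.heappush(events, (t + step_times[i], i))
    return updates

step_times = [1.0, 1.0, 4.0]                 # one straggler node
sync_n, sync_time = sync_updates(step_times, rounds=3)
async_n = async_update_count(step_times, horizon=sync_time)
```

In the same wall-clock window, the fast nodes keep contributing instead of idling behind the straggler, so the asynchronous scheme delivers far more updates.&lt;br /&gt;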
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data as well as computational burden is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, data used in federated learning is not shared with the central servers. Instead, only local data from each of the nodes is used to train the model. Then, the only part shared with the central model is the updated parameters. A key aspect of this is that federated learning provides a greater amount of data privacy, which is crucial for certain applications dealing with sensitive data. Therefore, it is especially useful to utilize edge devices to perform federated learning. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
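A minimal FedAvg-style sketch (a simplification of federated learning, with invented toy data) makes the key difference visible: each client trains on its private shard, and only the parameters are averaged by the server, never the raw data:&lt;br /&gt;

```python
import numpy as np

def local_train(w_global, X, y, lr=0.05, epochs=5):
    # Client side: fine-tune a copy of the global model on private data.
    w = w_global.copy()
    for _ in range(epochs):
        w -= lr * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

def federated_round(w_global, clients):
    # Server side: average returned parameters, weighted by data size.
    updates = [local_train(w_global, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    weights = sizes / sizes.sum()
    return np.sum([wi * u for wi, u in zip(weights, updates)], axis=0)

rng = np.random.default_rng(3)
w_true = np.array([1.5, -0.5])
clients = []
for _ in range(4):
    X = rng.standard_normal((40, 2))
    clients.append((X, X @ w_true))  # each client keeps its data locally

w = np.zeros(2)
for _ in range(40):
    w = federated_round(w, clients)
```

The server only ever sees parameter vectors, which is the privacy property that makes this pattern attractive for edge deployments.&lt;br /&gt;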
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This is useful because it reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One related technique is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
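A hedged sketch of the on-device side of transfer learning: a frozen, pretrained feature extractor stays fixed, and only a small output head is trained on locally collected data. The random frozen weights below stand in for a genuinely pretrained network, and all sizes are illustrative:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(2)
W_frozen = rng.standard_normal((8, 4))   # stands in for pretrained layers

def features(X):
    # Frozen feature extractor: never updated during on-device training.
    return np.tanh(X @ W_frozen)

def finetune_head(X, y, lr=0.2, steps=500):
    # Only the small linear head is trained, which is far cheaper than
    # training the full network on the edge device.
    F = features(X)
    w = np.zeros(F.shape[1])
    for _ in range(steps):
        w -= lr * 2.0 * F.T @ (F @ w - y) / len(y)
    return w

X_local = rng.standard_normal((120, 8))      # data collected on-device
w_head_true = np.array([1.0, -2.0, 0.5, 0.0])
y_local = features(X_local) @ w_head_true
w_head = finetune_head(X_local, y_local)
loss = float(np.mean((features(X_local) @ w_head - y_local) ** 2))
```

Because gradients are only needed for the tiny head, both the memory footprint and the compute cost of on-device adaptation stay small.&lt;br /&gt;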
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be selected for processing on edge devices. This decreases their workload because less data must be processed, while preserving model accuracy as much as possible. Larger models or datasets may not be needed at all; for example, an edge device may only require a Small Language Model that handles simple commands and prompts, rather than an entire LLM, which could overwhelm its computational abilities without providing proportional benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can assist with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible on edge devices. Most modern LLMs are cloud-based, but this can lead to high latency and increased network traffic, especially when working with a large subsystem of nodes.&lt;br /&gt;
&lt;br /&gt;
One way that a similar capability can be achieved on edge devices is by using SLMs. These are not as accurate and lack the vast knowledge of LLMs and the amount of data they are trained on, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and privacy constraints permit, an SLM can query an LLM for more complicated tasks. Because not every prompt leads to a query, the benefits in network traffic, privacy, and latency are still largely preserved.&lt;br /&gt;
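One possible shape of this routing logic; the stub models, the confidence values, and the threshold are all invented for illustration rather than drawn from any real SLM or LLM API:&lt;br /&gt;

```python
def route_prompt(prompt, slm, llm=None, threshold=0.7):
    # Try the on-device SLM first; fall back to the cloud LLM only when
    # the SLM reports low confidence and a cloud link is available.
    answer, confidence = slm(prompt)
    if confidence >= threshold or llm is None:
        return answer, "edge"
    return llm(prompt), "cloud"

def tiny_slm(prompt):
    # Stub: confident on short, simple prompts; unsure on long ones.
    if len(prompt.split()) <= 4:
        return "local answer", 0.9
    return "local guess", 0.3

def cloud_llm(prompt):
    return "detailed cloud answer"

ans_a, tier_a = route_prompt("turn on lights", tiny_slm, cloud_llm)
ans_b, tier_b = route_prompt("summarize this long multi paragraph report",
                             tiny_slm, cloud_llm)
```

Passing llm=None models an offline or privacy-restricted device: every prompt is then answered locally, trading some quality for zero network exposure.&lt;br /&gt;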
&lt;br /&gt;
===&#039;&#039;&#039;Synthesis&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
In summary, edge-oriented machine learning optimization requires an integrated approach that combines model-level compression with system-level orchestration. Techniques such as quantization, structured and unstructured pruning, and knowledge distillation reduce the computational footprint and memory requirements of deep learning models, enabling deployment on resource-constrained devices without substantial loss in inference accuracy. Concurrently, dynamic workload partitioning, heterogeneity-aware scheduling, and adaptive runtime profiling allow the system to allocate tasks across edge and cloud tiers based on real-time availability of compute, bandwidth, and energy resources. This joint optimization across model architecture and execution environment is essential to meet the latency, privacy, and resilience demands of edge AI deployments.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. H. Hua, Y. Li, T. Wang, N. Dong, W. Li, and J. Cao, &amp;quot;Edge computing with artificial intelligence: A machine learning perspective,&amp;quot; ACM Computing Surveys, vol. 55, no. 9, Art. no. 184, pp. 1–35, Jan. 2023, doi: 10.1145/3555802.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=919</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=919"/>
		<updated>2025-04-25T05:01:21Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Usage and Applications of AI Agents */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with the training of algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise in the field of machine learning, it is inevitable that deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not require such large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilize it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architectures, machine learning can be deployed in a distributed manner, allowing ML models to be deployed and trained across many edge devices. The way edge devices share computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices together can compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows for deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; thus edge computing could be an effective way of providing large amounts of good and relevant data for a variety of applications.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by placing devices closer to users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is needed for a personal application, many users prefer the privacy afforded by processing it locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amounts of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices designed for low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help in crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling helps prevent exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039; &lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can provide a more efficient means of training and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of numbers, easing the burden on the computational power as well as the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision - enough that accuracy is mostly maintained but the numbers are easier to handle. For example, converting from floating point to integer datatypes uses significantly less memory, and for some models the difference in precision may be negligible. Another example is K-means based Weight Quantization, which involves creating a matrix and grouping similar numbers together with centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
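As a minimal illustration of the idea (a sketch, not code from any particular framework), the example below applies symmetric int8 post-training quantization to a list of weights: the largest-magnitude weight maps to 127, and each weight is stored as a rounded integer plus one shared floating-point scale.&lt;br /&gt;

```python
def quantize_symmetric_int8(weights):
    """Map floats to int8 values in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v * scale for v in q]

weights = [0.42, -1.30, 0.07, 0.90]
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
```

Each restored weight differs from the original by at most half a quantization step (scale / 2); that bounded loss of precision is what buys the roughly 4x memory saving over 32-bit floats.&lt;br /&gt;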
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
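The histogram and threshold pruning strategies described above can be sketched in a few lines. Here hypotheses carry log-probability scores: threshold pruning discards any hypothesis more than log(α) below the best score, and histogram pruning then keeps at most a fixed number of survivors. The function and data layout are illustrative, not taken from Banik et al.&#039;s implementation.&lt;br /&gt;

```python
import math

def prune_hypotheses(hyps, stack_size, alpha):
    """hyps: list of (hypothesis, log_prob) pairs.
    Threshold pruning: keep scores within log(alpha) of the best score.
    Histogram pruning: keep at most stack_size survivors."""
    best = max(score for _, score in hyps)
    cutoff = best + math.log(alpha)          # alpha in (0, 1]
    kept = [(h, s) for h, s in hyps if s >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:stack_size]

hyps = [("the cat", -1.0), ("a cat", -1.5), ("cat the", -5.0)]
survivors = prune_hypotheses(hyps, stack_size=2, alpha=0.1)
# "cat the" falls below the threshold and is discarded.
```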
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
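A minimal sketch of the soft-label idea, assuming plain logit lists and a pure-Python softmax (the temperature parameter, commonly used in knowledge distillation, softens the teacher&#039;s distribution so more inter-class information survives):&lt;br /&gt;

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature softens the distribution."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened outputs (the soft labels)
    and the student's predictions at the same temperature."""
    soft_labels = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(soft_labels, student_probs))

teacher = [2.0, 1.0, 0.1]   # teacher is fairly confident in class 0
student = [1.5, 1.2, 0.2]   # student roughly agrees
loss = distillation_loss(student, teacher)
```

The loss is minimized when the student reproduces the teacher&#039;s softened distribution exactly, so training against it pulls the small model toward the large model&#039;s behavior rather than toward bare class labels.&lt;br /&gt;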
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As artificial intelligence and machine learning technologies continue to mature, they pave the way for the development of intelligent AI agents capable of autonomous, context-aware behavior, with the goal of efficiently performing tasks specified by users. These agents combine perception, reasoning, and decision-making to execute tasks with minimal human intervention. When deployed on edge devices, AI agents can operate with low latency, preserve user privacy, and adapt to local data—making them ideal for real-time, personalized applications in homes, vehicles, factories, and beyond.&lt;br /&gt;
&lt;br /&gt;
To function effectively, an agent must first perceive its environment and understand the task—often defined by the user. Then, it must reason about the optimal steps to accomplish that task, and finally, it must act on those decisions. These three components—perception, reasoning, and action—are essential to the agent’s ability to operate accurately and autonomously in dynamic environments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. There are methods to accomplish this, such as SLMs which query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Using edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, the agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
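To make the partitioning decision concrete, the hypothetical sketch below uses per-layer latency profiles (of the kind such a profiling step would measure) plus the cost of transferring the intermediate activation, and picks the split point that minimizes total latency. The numbers and function are invented for illustration and are not from EdgeShard or Neurosurgeon.&lt;br /&gt;

```python
def best_split(edge_ms, cloud_ms, transfer_ms):
    """Run layers [0, k) on the edge and [k, n) in the cloud.
    transfer_ms[k] is the cost of shipping layer k's input to the cloud;
    k == n means everything runs on the edge, so nothing is transferred."""
    n = len(edge_ms)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):
        cost = sum(edge_ms[:k]) + sum(cloud_ms[k:])
        if k < n:
            cost += transfer_ms[k]
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost

# Hypothetical per-layer profiles (milliseconds).
edge_ms = [5.0, 5.0, 5.0]
cloud_ms = [1.0, 1.0, 1.0]
transfer_ms = [10.0, 2.0, 10.0]
k, cost = best_split(edge_ms, cloud_ms, transfer_ms)
```

Here the cheapest plan runs the first layer on the edge, ships its small activation, and finishes in the cloud; a large input or a congested link would push the optimum toward running more layers locally.&lt;br /&gt;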
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities to Raspberry Pis to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that are able to be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computation usage and power, as well as network traffic. These may change frequently depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. Then, a machine learning model utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and these runtime updates, the partitioning model is able to dynamically distribute workloads at runtime in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is specifically deployed are the Neurosurgeon and EdgeShard systems, referenced above. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split the workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning involves splitting up the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning in question. If inference is completed on an edge device and the accuracy is found to be very low, the work can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of work to reach the acceptable threshold, it may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting among the devices within a certain layer rather than among the layers themselves. It allows a means of fully utilizing the heterogeneous abilities found in edge devices, similar to what has been described in previous sections. The workload is allocated in much the same way as in vertical partitioning, but all of the devices that the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, rather than all of the data going to one place, each node takes in and processes only a subset of the overall data, then sends its results to a central server which contains the main model. Each node performs these periodic updates, and by only processing part of the data, edge devices can handle these workloads much more easily. This can also reduce the time and computational burden on the cloud and network, because the central server is not performing all of the computations alone. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
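A minimal sketch of one such setup, assuming a toy one-parameter linear model: each node computes a gradient on its own data shard, and the central server averages the gradients to update the shared model.&lt;br /&gt;

```python
def local_gradient(w, shard):
    """Mean-squared-error gradient for the model y = w * x on one node's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def parallel_gd_step(w, shards, lr=0.05):
    """Each node computes a gradient on its own shard; the central
    server averages the gradients and updates the shared parameter."""
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)

# Data follows y = 3x, split across two nodes.
shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]
w = 0.0
for _ in range(30):
    w = parallel_gd_step(w, shards)
# w converges toward the true slope of 3.
```

No node ever holds the full dataset, yet the averaged updates drive the shared parameter to the same answer full-batch gradient descent would reach.&lt;br /&gt;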
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, and then the calculations from all nodes are sent together to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; this also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling optimizes this: as soon as a node finishes its respective processing, it sends its results to the main server for an update rather than waiting for all the others to finish. However, this requires more communication, since after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data as well as the computational burden is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike in distributed learning, the data used in federated learning is not shared with the central servers. Instead, only local data from each of the nodes is used to train the model, and the only parts shared with the central model are the updated parameters. A key consequence is that federated learning provides a greater degree of data privacy, which is crucial for applications dealing with sensitive data; this makes it especially well suited to edge devices. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
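The aggregation step at the heart of this scheme is often a size-weighted average of each client&#039;s locally updated parameters, as in the well-known FedAvg algorithm; a minimal sketch with invented numbers:&lt;br /&gt;

```python
def fedavg(client_params, client_sizes):
    """Size-weighted average of each client's parameter vector.
    Only these parameters ever leave the devices, never the raw data."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
            for i in range(dim)]

# Two clients with locally trained parameters; client 0 holds twice the data,
# so its parameters count for twice as much in the global model.
global_params = fedavg([[1.0, 2.0], [4.0, 5.0]], [200, 100])
```

Weighting by local dataset size keeps a client with very little data from dragging the global model toward its possibly unrepresentative parameters.&lt;br /&gt;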
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as a cloud server, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One related form is knowledge distillation, in which a smaller model is trained to mimic the behavior of a larger model. This is often the case when edge and cloud systems are used together.&lt;br /&gt;
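As a hedged illustration of this split, the toy model below treats the pretrained part as a frozen feature extractor and updates only a small head on the device; all function names and numbers are hypothetical.&lt;br /&gt;

```python
# Sketch of edge fine-tuning: the "base" arrived pretrained from the cloud
# and stays frozen; the device trains only the tiny head on local data.

def base_features(x):
    # Frozen pretrained transform; never updated on the device.
    return [x, x * x]

def head_predict(head_w, feats):
    return sum(w * f for w, f in zip(head_w, feats))

def fine_tune_step(head_w, batch, lr=0.02):
    # Full-batch gradient step on the head only, far cheaper than
    # training the whole model.
    new_w = list(head_w)
    for x, y in batch:
        feats = base_features(x)
        err = head_predict(head_w, feats) - y
        for i, f in enumerate(feats):
            new_w[i] -= lr * 2 * err * f
    return new_w

head = [0.0, 0.0]
data = [(1.0, 3.0), (2.0, 8.0)]   # locally collected samples
for _ in range(2000):
    head = fine_tune_step(head, data)
print([round(w, 1) for w in head])
```

The head converges to the weights that fit the local data exactly (here [2.0, 1.0]), while the frozen base never changes, mirroring the cloud-pretrain / edge-fine-tune division described above.&lt;br /&gt;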
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be selected for processing on edge devices. This decreases the workload on the devices, because less data must be processed, while preserving the accuracy of the model as much as possible. The same reasoning applies at the model level: for example, placing a Small Language Model that can handle simple commands and prompts on an edge device, rather than offloading an entire LLM onto it, avoids overwhelming its computational abilities for little added benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed into a smaller form in order to fit the constraints of edge devices. This is especially valuable given their limited memory, and it also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they package all of the resources and data a device needs to process its share of the work. By examining the computational abilities of each device, workloads of different sizes can be allocated as needed to maximize the efficiency of training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a model is very important given the memory constraints of edge devices. Smaller batch sizes are cheaper to process per step and are less likely to lead to memory bottlenecks. This is related to partitioning, because the compute and memory capacity of the available devices are key factors to consider.&lt;br /&gt;
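Batch size tuning of the kind described above can be sketched as a simple feasibility check; every memory figure below is an assumed placeholder, not a measured value.&lt;br /&gt;

```python
# Toy batch-size selection under a device memory budget. All sizes in MB
# are invented assumptions for illustration.

def batch_memory_mb(batch_size, sample_mb=0.5, activations_per_sample_mb=1.2,
                    model_mb=40.0):
    # Model weights are resident once; inputs and activations scale
    # linearly with the number of samples in the batch.
    return model_mb + batch_size * (sample_mb + activations_per_sample_mb)

def largest_feasible_batch(budget_mb, candidates=(1, 2, 4, 8, 16, 32, 64)):
    # Pick the largest candidate batch size that still fits the budget.
    feasible = [b for b in candidates if budget_mb >= batch_memory_mb(b)]
    return max(feasible) if feasible else None

print(largest_feasible_batch(64.0))   # small edge device
print(largest_feasible_batch(256.0))  # roomier gateway node
```

With the assumed figures, a 64 MB device tops out at batch size 8 while a 256 MB node can use 64, which is exactly the partitioning consideration the paragraph above describes.&lt;br /&gt;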
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have become prevalent recently and can assist with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible on edge devices. Most modern LLMs are cloud based, but this can lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of data they are trained on, but for basic applications on edge devices they are often sufficient. They are also frequently fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy because they can run on local devices without sharing user data with the cloud, which is useful for a wide variety of edge applications. If needed, and privacy constraints permit, they can query an LLM for more complicated tasks. Because not every prompt leads to such a query, the network traffic, privacy, and latency benefits are still largely preserved.&lt;br /&gt;
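A minimal sketch of this SLM-first pattern, with stub functions standing in for the real models and a deliberately crude routing rule (a real system might use the SLM confidence score instead):&lt;br /&gt;

```python
# Hypothetical SLM-first router: simple prompts are answered on-device;
# only complex ones are escalated to a (stubbed) cloud LLM.

def slm_answer(prompt):
    # Stand-in for a small on-device model handling routine commands.
    return f"[on-device] handled: {prompt}"

def cloud_llm_answer(prompt):
    # Stand-in for a remote LLM call; in practice a network request.
    return f"[cloud] handled: {prompt}"

def route(prompt, max_local_words=8):
    # Toy complexity test: long prompts go to the cloud.
    if len(prompt.split()) > max_local_words:
        return cloud_llm_answer(prompt)
    return slm_answer(prompt)

print(route("turn off the lights"))
print(route("compare three routing strategies and justify the trade-offs in detail"))
```

Only the second prompt triggers a cloud query, so most traffic stays local, preserving the latency and privacy benefits discussed above.&lt;br /&gt;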
&lt;br /&gt;
===&#039;&#039;&#039;Synthesis&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
In summary, edge-oriented machine learning optimization requires an integrated approach that combines model-level compression with system-level orchestration. Techniques such as quantization, structured and unstructured pruning, and knowledge distillation reduce the computational footprint and memory requirements of deep learning models, enabling deployment on resource-constrained devices without substantial loss in inference accuracy. Concurrently, dynamic workload partitioning, heterogeneity-aware scheduling, and adaptive runtime profiling allow the system to allocate tasks across edge and cloud tiers based on real-time availability of compute, bandwidth, and energy resources. This joint optimization across model architecture and execution environment is essential to meet the latency, privacy, and resilience demands of edge AI deployments.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. H. Hua, Y. Li, T. Wang, N. Dong, W. Li, and J. Cao, &amp;quot;Edge computing with artificial intelligence: A machine learning perspective,&amp;quot; ACM Computing Surveys, vol. 55, no. 9, Art. no. 184, pp. 1–35, Jan. 2023, doi: 10.1145/3555802.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=918</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=918"/>
		<updated>2025-04-25T04:45:13Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 4.3 ML Model Optimization at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with training algorithms to examine sets of data, analyze patterns, and draw conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architectures, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, large datasets can be split so that many devices together compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up as well. This allows deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data that is processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way to provide large amounts of relevant data for a variety of applications. &lt;br /&gt;
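The idea of splitting a model between tiers can be illustrated with a toy partition-point search in the spirit of reference [4]; all per-layer timings, sizes, and the bandwidth figure below are invented.&lt;br /&gt;

```python
# Hedged sketch of choosing where to split a model between edge and cloud:
# run layers 0..i on the edge, ship the activation, finish in the cloud.
# Every number here is a made-up assumption for illustration.

edge_ms = [5.0, 12.0, 30.0, 55.0]    # cumulative edge time through layer i
cloud_ms = [40.0, 25.0, 10.0, 0.0]   # cloud time for the remaining layers
out_mb = [2.0, 0.5, 0.1, 0.05]       # activation size sent at each split
bandwidth_mbps = 8.0                 # assumed uplink bandwidth

def total_latency(i):
    # End-to-end latency for splitting after layer i: edge compute,
    # plus activation transfer, plus remaining cloud compute.
    transfer = out_mb[i] * 8.0 / bandwidth_mbps * 1000.0  # MB to ms
    return edge_ms[i] + transfer + cloud_ms[i]

best = min(range(len(edge_ms)), key=total_latency)
print(best, round(total_latency(best), 1))
```

With these made-up numbers the later split wins because activations shrink layer by layer; with a fatter uplink or a slower device the optimum would move earlier, which is exactly the trade-off such collaborative schemes balance.&lt;br /&gt;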
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to a third party or traverse a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is involved, many users prefer the privacy of having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their batteries, even when they have sufficient computational power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similarly, machine learning applied to the data collected by sensors in cities can help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can provide a more efficient means of training and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of numbers, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating point to integer datatypes means significantly less memory is used, and for some models the loss in precision may be negligible. Another example is K-means based weight quantization, in which similar weights in a matrix are clustered together and each weight is represented by its cluster centroid. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
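The clustering step can be sketched in a few lines; the weight values below are made up, and the tiny 1-D k-means here stands in for the library routines a real pipeline would use.&lt;br /&gt;

```python
# Minimal 1-D k-means weight quantization sketch: each weight is replaced
# by its nearest cluster centroid, so only short indices plus a small
# codebook need to be stored.

def kmeans_1d(values, k=2, iters=20):
    # Seed centroids by sampling the sorted values at regular strides.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in values:
            j = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[j].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

weights = [2.09, 2.12, 1.92, 1.87, 0.05, -0.02, 0.01, 0.03]
codebook = kmeans_1d(weights, k=2)
quantized = [min(codebook, key=lambda c: abs(w - c)) for w in weights]
print(sorted(round(c, 2) for c in codebook))
```

Eight distinct floats collapse to a two-entry codebook, which is the memory saving the figure above illustrates.&lt;br /&gt;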
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
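The XNOR-and-popcount idea can be sketched at toy scale: once weights and activations are reduced to their signs, a dot product becomes a count of matching bits. The helper names below are hypothetical.&lt;br /&gt;

```python
# Toy sketch of 1-bit inference in the style of QNNs [12].

def binarize(v):
    # Keep only the sign of each value: +1 or -1 (a 1-bit representation).
    return [1 if x > 0 else -1 for x in v]

def bit_dot(a_bits, b_bits):
    # Over values in {-1, +1}, the dot product equals
    # matches - mismatches = 2 * matches - n; in hardware this is an
    # XNOR of the packed bits followed by a popcount.
    matches = sum(1 for x, y in zip(a_bits, b_bits) if x == y)
    return 2 * matches - len(a_bits)

w = binarize([0.8, -1.3, 0.2, -0.4])   # binarized weights
a = binarize([0.5, 0.9, -0.1, -2.0])   # binarized activations
assert bit_dot(w, a) == sum(x * y for x, y in zip(w, a))
print(bit_dot(w, a))
```

The assertion checks that the bit-counting form agrees with the ordinary dot product, which is why the multiply-accumulate units can be replaced by cheap bitwise logic on edge hardware.&lt;br /&gt;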
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
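The two pruning rules can be sketched directly; the hypothesis strings and scores below are invented, and higher scores are taken to be better.&lt;br /&gt;

```python
# Toy versions of the two decoding-stack pruning rules described above.

def histogram_prune(stack, n):
    # Keep only the n best-scoring hypotheses.
    return sorted(stack, key=lambda h: -h[1])[:n]

def threshold_prune(stack, alpha):
    # Drop hypotheses scoring below alpha times the best score.
    best = max(score for _, score in stack)
    return [h for h in stack if h[1] >= alpha * best]

stack = [("la maison", 0.90), ("the house", 0.95),
         ("a house", 0.60), ("house the", 0.20)]
print([h for h, _ in histogram_prune(stack, n=2)])
print([h for h, _ in threshold_prune(stack, alpha=0.5)])
```

Histogram pruning bounds the stack to a fixed size regardless of scores, while threshold pruning adapts to how dominant the best hypothesis is; decoders typically apply both.&lt;br /&gt;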
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
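The soft-label idea can be made concrete with a small sketch: teacher logits are softened by a temperature and the student is scored against the resulting distribution. All logits and the temperature below are illustrative.&lt;br /&gt;

```python
# Minimal sketch of distillation targets: cross-entropy of the student
# against the teacher distribution softened with temperature T.
import math

def softmax(logits, t=1.0):
    exps = [math.exp(z / t) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(student_logits, teacher_logits, t=2.0):
    # Cross-entropy between softened teacher and student distributions;
    # minimized when the student reproduces the teacher distribution.
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [4.0, 2.5, 0.5]    # confident but not one-hot
student = [3.0, 2.0, 1.0]
print([round(x, 2) for x in softmax(teacher, t=2.0)])
print(round(soft_label_loss(student, teacher), 3))
```

Raising the temperature spreads probability onto the non-top classes, which is exactly the extra inter-class information the paragraph above credits for the student generalizing better than it would from hard labels.&lt;br /&gt;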
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops the ability to identify images, converse in human language, and optimize different systems, the next step is for it to perform tasks specified by users efficiently. A system that does this is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; it must then reason about the best means of accomplishing the task; and finally it must act on that reasoning. Three aspects of AI agents are crucial if they are to carry out tasks efficiently and correctly:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, decomposing its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory in order to remember what it has done, as well as the results of its actions, so it can learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the available possible steps and operate based on its own reasoning, without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to carry out a task for optimal results, it must have the actual means to do so. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Running agents on edge devices can nonetheless be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used current methodology consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One commonly used methodology involves collaboration between edge and cloud devices. The cloud has the ability to process workloads that may require far more resources than edge devices can provide. On the other hand, edge devices, which can store and process data locally, offer lower latency and more privacy. Given the advantages of each, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which on the edge. This is a crucial problem to solve, as a correct partition of workloads is the best way to ensure that the respective benefits of the devices are leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, testing the capabilities of different devices in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
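The profiling-then-partitioning step can be illustrated with a toy search over split points, loosely in the spirit of Neurosurgeon. All latency numbers below are made up for illustration, not measured from any real system:&lt;br /&gt;

```python
# Hypothetical per-layer profiling results (all latencies in ms, illustrative).
edge_ms  = [4.0, 6.0, 9.0, 12.0]        # per-layer latency on the edge device
cloud_ms = [0.5, 0.8, 1.0, 1.2]         # per-layer latency in the cloud
# xfer_ms[i] is the cost of shipping the tensor produced after layer i
# (index 0 is the raw input, e.g. a large camera frame; the final output is tiny).
xfer_ms  = [40.0, 6.0, 2.0, 1.5, 0.3]

def total_latency(split):
    # Layers before the split run on the edge, the rest in the cloud.
    return sum(edge_ms[:split]) + xfer_ms[split] + sum(cloud_ms[split:])

best = min(range(len(edge_ms) + 1), key=total_latency)
print(best, total_latency(best))   # prints: 1 13.0
```

With these numbers, running only the first layer on the edge is optimal: it shrinks the large raw input into a small activation before anything crosses the network, which is exactly the trade-off such profiling is meant to expose.&lt;br /&gt;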
&lt;br /&gt;
The key advantage of this is that it utilizes the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, the overall processing time is reduced because data is not constantly sent back and forth. It also causes much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization done for machine learning on edge systems is to fully utilize the heterogeneous devices these systems often contain. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
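A minimal sketch of heterogeneity-aware allocation, including the task-dropping behaviour described above, might look like the following. Device names, capability scores, and task demands are all hypothetical:&lt;br /&gt;

```python
# Hypothetical capability scores and task demands (arbitrary units).
devices = {"smartphone": 8.0, "raspberry_pi": 3.0, "sensor_hub": 1.0}
tasks = [("object_detection", 7.5), ("keyword_spotting", 2.5),
         ("threshold_alert", 0.5), ("video_transcode", 12.0)]

def assign(tasks, devices):
    free = dict(devices)                 # remaining capacity per device
    placement, dropped = {}, []
    # Hardest tasks first; each goes to the least-capable device that fits it,
    # keeping the big devices free for work that truly needs them.
    for name, demand in sorted(tasks, key=lambda t: -t[1]):
        fits = [d for d, cap in free.items() if cap >= demand]
        if fits:
            target = min(fits, key=free.get)
            placement[name] = target
            free[target] -= demand
        else:
            dropped.append(name)         # no device can run it: drop, as in [2]
    return placement, dropped

placement, dropped = assign(tasks, devices)
print(placement)   # object_detection lands on the smartphone
print(dropped)     # video_transcode exceeds every device and is dropped
```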
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then utilizes the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed that may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data, along with updates at runtime, the partitioning model can dynamically distribute workloads in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is specifically deployed are the Neurosurgeon and EdgeShard systems, cited above. &lt;br /&gt;
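One simple way to refine profiled estimates at runtime is an exponential moving average over observed latencies; the sketch below is a hypothetical illustration of that idea, not any particular framework&#039;s implementation:&lt;br /&gt;

```python
# Hypothetical runtime monitor: refine each device's latency estimate with an
# exponential moving average (EMA), then route work to the fastest device.
class LatencyEstimator:
    def __init__(self, profile, alpha=0.3):
        self.estimate = dict(profile)    # seeded by the offline profiling step
        self.alpha = alpha               # weight given to each new observation

    def observe(self, device, measured_ms):
        old = self.estimate[device]
        self.estimate[device] = (1 - self.alpha) * old + self.alpha * measured_ms

    def pick(self):
        # Route the next task to the device currently estimated fastest.
        return min(self.estimate, key=self.estimate.get)

est = LatencyEstimator({"edge_a": 20.0, "edge_b": 35.0})
est.observe("edge_a", 80.0)   # edge_a is suddenly congested
est.observe("edge_a", 90.0)
print(est.pick())             # routing now prefers edge_b
```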
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning so far. If inference performed on an edge device yields very low accuracy, the task can be sent to the cloud; conversely, if the accuracy is already fairly high and the model needs only a small amount of further work to reach the acceptable threshold, the task may be sent to edge devices to free up network traffic to the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting among the devices within a certain layer rather than among the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Profiling and allocation similar to that used in vertical partitioning is performed, but all of the devices the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server containing the main model. Each node performs these periodic updates, and by processing only part of the data, edge devices can handle the workloads much more easily. This also reduces the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
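The aggregation step of parallel gradient descent can be sketched as follows, using a toy linear-regression problem split across three hypothetical nodes:&lt;br /&gt;

```python
import numpy as np

# Toy parallel gradient descent for linear regression: each node holds only a
# shard of the data and computes a local gradient; the server averages the
# local gradients to update the shared model. Data here is synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_w

def local_gradient(X_shard, y_shard, w):
    residual = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ residual / len(y_shard)

w = np.zeros(4)
shards = np.array_split(np.arange(300), 3)        # three edge nodes
for step in range(200):
    grads = [local_gradient(X[idx], y[idx], w) for idx in shards]
    w -= 0.1 * np.mean(grads, axis=0)             # server-side aggregation

assert np.allclose(w, true_w, atol=1e-3)          # converges to the true weights
```

Because each shard is only a third of the dataset, no node ever needs to hold all 300 samples in memory, which is the property that matters on constrained edge hardware.&lt;br /&gt;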
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait until every node finishes its respective processing, and then all of the calculations from all nodes are sent to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling optimizes this: after each node finishes its respective processing, it sends its results to the main server for an update rather than waiting for all the others to finish. However, this requires more communication, as after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data, as well as the computational burden, is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike in distributed learning, the data used in federated learning is never shared with the central servers. Instead, each node trains the model using only its local data, and the only thing shared with the central model is the updated parameters. A key consequence is that federated learning provides a greater amount of data privacy, which is crucial for applications dealing with sensitive data, making edge devices especially well suited to performing it. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
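The parameter-averaging step at the heart of federated learning can be sketched as below. This follows the common FedAvg weighting by client dataset size; the client weights themselves are hypothetical:&lt;br /&gt;

```python
import numpy as np

# Federated averaging (FedAvg) sketch: clients share only locally updated
# parameters, which the server averages weighted by local dataset size.
def fedavg(client_weights, client_sizes):
    total = float(sum(client_sizes))
    coeffs = np.array(client_sizes, dtype=float) / total
    stacked = np.stack(client_weights)
    return (coeffs[:, None] * stacked).sum(axis=0)

# Hypothetical local updates from three edge clients.
w_a = np.array([1.0, 0.0])
w_b = np.array([3.0, 2.0])
w_c = np.array([2.0, 1.0])
global_w = fedavg([w_a, w_b, w_c], client_sizes=[100, 300, 100])
print(global_w)   # pulled toward the client with the most data
```

Note that only `w_a`, `w_b`, `w_c` ever leave the devices; the raw samples that produced them stay local, which is the privacy property described above.&lt;br /&gt;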
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This is useful because it reduces the computational burden on the edge device, allowing it to fine-tune the model using the data it collects without having to train the whole system from scratch. One form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
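A minimal sketch of this division of labour: a frozen, cloud-pretrained feature extractor paired with an edge-trained head. The models and data below are hypothetical stand-ins:&lt;br /&gt;

```python
import numpy as np

# Hypothetical transfer-learning setup: the feature extractor is "pretrained"
# in the cloud and stays frozen; the edge device fits only a small linear head
# on its locally collected data.
rng = np.random.default_rng(1)
W_frozen = rng.normal(size=(8, 16))      # pretrained weights, never updated

def features(x):
    return np.tanh(x @ W_frozen)         # frozen feature extractor

X_local = rng.normal(size=(200, 8))      # data collected on the device
y_local = rng.normal(size=200)

Phi = features(X_local)
# Fitting the head alone is a tiny least-squares problem the device can afford.
head, *_ = np.linalg.lstsq(Phi, y_local, rcond=None)

def predict(x):
    return features(x) @ head            # full model: frozen base + local head
```

The expensive part, learning `W_frozen`, never happens on the device; only the 16 head parameters are fit locally.&lt;br /&gt;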
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases their workload, because less data must be processed, while conserving the accuracy of the model as much as possible. Larger models or datasets may not be needed at all: for example, a Small Language Model that can handle simple commands and prompts can be placed on an edge device, rather than an entire LLM, which might overload the device&#039;s computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
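As a concrete example of such compression, symmetric 8-bit quantization maps float32 weights onto integers in the range -127 to 127, cutting storage by 4x. A minimal sketch:&lt;br /&gt;

```python
import numpy as np

# Symmetric 8-bit quantization sketch: store weights as int8 plus one scale.
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(2).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes, q.nbytes)                         # 4000 bytes down to 1000
print(np.abs(dequantize(q, scale) - w).max())     # worst-case rounding error
```

The worst-case reconstruction error is bounded by half the scale, which is why quantization costs relatively little accuracy for well-behaved weight distributions.&lt;br /&gt;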
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they package all of the resources and data a device needs to process its work. By examining the computational abilities of a device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have become prevalent recently and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud based, but this can lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data, but for the purposes of basic applications and edge devices, they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can be run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and if privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, and thus network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
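A hypothetical confidence-based routing policy for such a system might look like the following; the model calls are illustrative stand-ins, not a real API:&lt;br /&gt;

```python
# Hypothetical SLM/LLM routing policy: answer locally when the small model is
# confident, and fall back to a cloud LLM otherwise.
def local_slm(prompt):
    # Pretend the on-device model returns an answer plus a confidence score,
    # and that longer prompts are harder for it.
    hard = len(prompt.split()) > 8
    return "slm answer", 0.4 if hard else 0.9

def cloud_llm(prompt):
    return "llm answer"                # stands in for a remote API call

def route(prompt, threshold=0.7):
    answer, confidence = local_slm(prompt)
    if confidence >= threshold:
        return answer, "on-device"     # low latency, data never leaves the device
    return cloud_llm(prompt), "cloud"  # only hard prompts are escalated

print(route("turn on the lights"))
print(route("draft a detailed three day travel itinerary for rural Portugal"))
```

Only the second, harder prompt is escalated, so most traffic stays local and the privacy and latency benefits described above are preserved.&lt;br /&gt;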
&lt;br /&gt;
===&#039;&#039;&#039;Synthesis&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
In summary, edge-oriented machine learning optimization requires an integrated approach that combines model-level compression with system-level orchestration. Techniques such as quantization, structured and unstructured pruning, and knowledge distillation reduce the computational footprint and memory requirements of deep learning models, enabling deployment on resource-constrained devices without substantial loss in inference accuracy. Concurrently, dynamic workload partitioning, heterogeneity-aware scheduling, and adaptive runtime profiling allow the system to allocate tasks across edge and cloud tiers based on real-time availability of compute, bandwidth, and energy resources. This joint optimization across model architecture and execution environment is essential to meet the latency, privacy, and resilience demands of edge AI deployments.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. H. Hua, Y. Li, T. Wang, N. Dong, W. Li, and J. Cao, &amp;quot;Edge computing with artificial intelligence: A machine learning perspective,&amp;quot; ACM Computing Surveys, vol. 55, no. 9, Art. no. 184, pp. 1–35, Jan. 2023, doi: 10.1145/3555802.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=917</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=917"/>
		<updated>2025-04-25T04:31:28Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence primarily concerned with training algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architectures, machine learning can be deployed in a distributed manner, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, large datasets can be split so that many devices together compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results from each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way of providing large amounts of good, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy provided by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environmental and industrial settings can be trained to recognize anomalies or undesirable behavior, ensuring a quick response and proactive reporting of information [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similarly, the data collected by sensors in cities can leverage machine learning to help in crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming far more important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is critical for applications like real-time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are further benefits of edge training. Training machine learning models directly on edge devices makes them capable of processing data locally without relying on constant internet connectivity, so they continue operating effectively even where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real-World Applications:&#039;&#039;&#039;&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, a variety of methods can make training more efficient and render heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of the numbers in a model, easing the burden on an edge device&#039;s computational power and memory. There are multiple forms of quantization, but each sacrifices some precision: enough that accuracy is mostly maintained while the numbers become much easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and for some models the difference in precision may be negligible. Another example is K-means-based weight quantization, which involves grouping similar values in a weight matrix together around centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
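As a concrete sketch of the K-means weight-quantization idea described above (a toy example: the matrix values, cluster count, and centroid initialization are arbitrary choices, not taken from any particular system):&lt;br /&gt;

```python
import numpy as np

def kmeans_quantize(weights, k=4, iters=10):
    """Toy 1-D k-means weight quantization: replace each weight with the
    nearest of k shared centroids, storing only small integer codes plus
    a k-entry codebook."""
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), k)  # simple deterministic init
    for _ in range(iters):
        # assign every weight to its nearest centroid
        codes = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # move each centroid to the mean of the weights assigned to it
        for j in range(k):
            if np.any(codes == j):
                centroids[j] = flat[codes == j].mean()
    return codes.reshape(weights.shape), centroids

w = np.array([[ 2.09, -0.98,  1.48,  0.09],
              [ 0.05, -0.14, -1.08,  2.12],
              [-0.91,  1.92,  0.00, -1.03],
              [ 1.87,  0.00,  1.53,  1.49]])
codes, codebook = kmeans_quantize(w, k=4)
w_hat = codebook[codes]                 # reconstructed approximate weights
print(codebook.round(2))                # 4 shared values stand in for 16 floats
print(float(np.abs(w - w_hat).max()))   # worst-case approximation error
```

Only the small integer codes and the k-entry codebook need to be stored instead of one full-precision float per weight, and the reconstruction error stays small.&lt;br /&gt;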
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
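A minimal illustration of this idea is magnitude-based weight pruning, where the lowest-magnitude weights are simply zeroed out (the random matrix and sparsity level here are arbitrary):&lt;br /&gt;

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.75):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold     # keep only the large-magnitude weights
    return weights * mask, mask

rng = np.random.default_rng(42)
w = rng.normal(size=(8, 8))                 # a stand-in for a trained weight matrix
w_pruned, mask = magnitude_prune(w, sparsity=0.75)
print(f"kept {mask.mean():.0%} of the weights")  # kept 25% of the weights
```

The surviving sparse matrix needs far less storage and far fewer multiply-accumulate operations, which is exactly the saving that matters on a constrained edge device.&lt;br /&gt;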
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size &#039;&#039;n&#039;&#039;, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion &#039;&#039;α&#039;&#039; of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
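The soft-label idea can be sketched in a few lines (the logits, class count, and temperature below are made-up values for illustration only):&lt;br /&gt;

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a higher T yields a softer distribution."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical teacher logits for a 4-class problem; class 3 scores highest.
teacher_logits = [1.0, 2.0, 4.5, 6.0]

hard_label = np.eye(4)[3]                    # one-hot target: [0, 0, 0, 1]
soft_label = softmax(teacher_logits, T=4.0)  # smoothed probabilities over all classes

# The student is trained to match the teacher distribution: cross-entropy
# between the soft labels and the student soft output.
student_logits = [0.5, 1.0, 3.0, 5.0]
student_probs = softmax(student_logits, T=4.0)
kd_loss = -np.sum(soft_label * np.log(student_probs))
print(soft_label.round(3), round(float(kd_loss), 3))
```

Raising the temperature T spreads probability mass across the non-target classes, which is exactly the extra inter-class information the student learns from that a hard label cannot convey.&lt;br /&gt;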
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops the ability to identify images, generate conversation in natural language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. A system that does this is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; then reason about the best means of accomplishing the task; and finally act on it. Three aspects of AI agents are crucial to their ability to carry out tasks correctly and efficiently:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to carry out a task for optimal results, it must have the actual means to do so. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. There are, however, methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or the use of more powerful edge devices to carry out tasks. Running agents on edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, an edge device dedicated to a specific user may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and letting the model train on them. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs), but it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices while compromising as little as possible on model accuracy. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how the work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as a correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can function efficiently. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this approach is that it utilizes the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are processed in the cloud only when needed, the overall processing time is reduced because data is not constantly sent back and forth. It also causes far less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems is to fully utilize the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to simple sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor conditions and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data, the partitioning model can dynamically distribute workloads at runtime to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed are the Neurosurgeon and EdgeShard systems, cited above. &lt;br /&gt;
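A profiling-driven partition decision of the kind Neurosurgeon makes can be sketched as follows (all timings are hypothetical illustration numbers, not measurements from any real system):&lt;br /&gt;

```python
# Hypothetical per-layer profiling data (milliseconds) for a 5-layer model.
edge_ms   = [4.0, 6.0, 10.0, 18.0, 25.0]      # per-layer run time on the edge device
cloud_ms  = [0.5, 0.8, 1.2, 2.0, 3.0]         # per-layer run time in the cloud
upload_ms = [30.0, 22.0, 9.0, 6.0, 5.0, 0.0]  # cost of shipping the data at split k
                                              # (k = 5 means everything stays on the edge)

def best_split(edge_ms, cloud_ms, upload_ms):
    """Choose the layer index k at which to hand off to the cloud:
    layers [0, k) run on the edge, layers [k, n) run in the cloud,
    and upload_ms[k] is the transfer cost of the intermediate data."""
    n = len(edge_ms)
    candidates = [
        (sum(edge_ms[:k]) + upload_ms[k] + sum(cloud_ms[k:]), k)
        for k in range(n + 1)
    ]
    latency, k = min(candidates)  # pick the split with the lowest total latency
    return k, latency

print(best_split(edge_ms, cloud_ms, upload_ms))  # splits after layer 2 here
```

In a dynamic system, the same search would simply be re-run whenever the profiled timings or the measured network transfer costs change.&lt;br /&gt;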
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, a task deemed to require a large amount of computational resources may go to the cloud to be completed and preprocessed; on the other hand, work requiring only a small amount of computational power can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning in question. If inference is performed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; conversely, if the accuracy is already fairly high and the model needs only a small amount of further work to reach an acceptable threshold, the task may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
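A minimal sketch of this confidence-based handoff between layers (the threshold and the probability vectors are arbitrary illustration values):&lt;br /&gt;

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.8   # hypothetical cutoff; tuned per application

def classify_with_offload(edge_probs, threshold=CONFIDENCE_THRESHOLD):
    """Keep the edge prediction when it is confident enough; otherwise
    report that the input should be escalated to the cloud tier."""
    confidence = float(np.max(edge_probs))
    if confidence >= threshold:
        return ("edge", int(np.argmax(edge_probs)))
    return ("cloud", None)   # the caller forwards the raw input to the cloud model

print(classify_with_offload(np.array([0.05, 0.92, 0.03])))   # handled on the edge
print(classify_with_offload(np.array([0.40, 0.35, 0.25])))   # escalated to the cloud
```

Only the low-confidence inputs generate cloud traffic, so most requests keep the low-latency, private edge path.&lt;br /&gt;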
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting the workload among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Functionality and workload determination similar to that of vertical partitioning is performed, but all of the devices that the workload is split across sit on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server that contains the main model. Each node performs these periodic updates, and by processing only part of the data, edge devices can handle such workloads much more easily. It can also reduce the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model, and parallel gradient descent uses a similar process; instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
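A toy version of parallel gradient descent on a linear-regression problem (synthetic data, with four equally sized shards standing in for four nodes):&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data, split across 4 worker nodes.
X = rng.normal(size=(400, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=400)
shards = [(X[i::4], y[i::4]) for i in range(4)]  # each node sees 1/4 of the data

w = np.zeros(3)   # the shared model held by the central server
lr = 0.1
for step in range(100):
    # Each node computes a gradient on its local shard (in parallel in practice)...
    grads = [2 * Xi.T @ (Xi @ w - yi) / len(yi) for Xi, yi in shards]
    # ...and the central server averages them to update the shared model.
    w -= lr * np.mean(grads, axis=0)

print(w.round(2))  # close to the true weights [2.0, -1.0, 0.5]
```

Each node only ever touches a quarter of the data, yet the averaged updates converge to nearly the same weights that full-batch training would find.&lt;br /&gt;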
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for each node to finish its respective processing, and then all of the calculations from all nodes are sent to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. To optimize this, each node instead sends its results to the main server for an update as soon as it finishes, rather than waiting for all the others. This requires more communication, since after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
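The trade-off can be seen with a back-of-the-envelope comparison (the per-update times and the time window are hypothetical):&lt;br /&gt;

```python
# Hypothetical per-update compute times (seconds) for 4 heterogeneous nodes.
node_times = [1.0, 1.5, 2.0, 6.0]   # the last node is a straggler
window = 60.0                        # wall-clock budget

# Synchronous: every round waits for the slowest node, then applies 4 updates.
rounds = int(window // max(node_times))
sync_updates = rounds * len(node_times)

# Asynchronous: each node pushes an update as soon as it finishes its own step.
async_updates = sum(int(window // t) for t in node_times)

print(sync_updates, async_updates)  # 40 vs 140 updates in the same window
```

The synchronous scheme is throttled by the 6-second straggler, while the asynchronous scheme lets the fast nodes keep contributing, at the cost of extra model-download traffic after every update.&lt;br /&gt;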
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data as well as computational burden is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, data used in federated learning is not shared with the central servers. Instead, only local data from each of the nodes is used to train the model. Then, the only part shared with the central model is the updated parameters. A key aspect of this is that federated learning provides a greater amount of data privacy, which is crucial for certain applications dealing with sensitive data. Therefore, it is especially useful to utilize edge devices to perform federated learning. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This is useful because it reduces the computational burden on the edge device, allowing it to fine-tune the model using the data it collects without having to train the whole system from scratch. One related form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases the workload on them because less data must be processed, while preserving the accuracy of the model as much as possible. Larger models may likewise be unnecessary: for example, a Small Language Model that can handle simple commands and prompts can be placed on an edge device instead of an entire LLM, which might overload the device&#039;s computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have become prevalent recently and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and lack the vast knowledge of LLMs and the amount of data they are trained on, but for basic applications on edge devices, they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy because they can run on local devices without sharing user data with the cloud, which is useful for a wide variety of edge applications. If needed, and if privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, so network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. H. Hua, Y. Li, T. Wang, N. Dong, W. Li, and J. Cao, &amp;quot;Edge computing with artificial intelligence: A machine learning perspective,&amp;quot; ACM Computing Surveys, vol. 55, no. 9, Art. no. 184, pp. 1–35, Jan. 2023, doi: 10.1145/3555802.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=916</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=916"/>
		<updated>2025-04-25T04:29:00Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Benefits of Machine Learning using Edge Computing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence primarily concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field, the deployment of machine learning algorithms will inevitably play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can also help ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issues regarding machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architectures, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. The nature of edge devices sharing computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices together can compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows for deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data that is processed from edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way to provide abundant, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by processing it locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, and energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039; &lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can provide a more efficient means of training and make the heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of the numbers in a model, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each essentially sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes means significantly less memory is used, and for some models the difference in precision may be negligible. Another example is K-means based Weight Quantization, which involves creating a matrix and grouping similar numbers together with centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
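&lt;br /&gt;
A minimal sketch of the K-means weight quantization described above, assuming NumPy; the toy 4x4 weight matrix and the choice of 4 clusters are illustrative, not taken from any specific model:&lt;br /&gt;

```python
import numpy as np

def kmeans_quantize(weights, n_clusters=4, n_iters=20):
    """Cluster the weights and replace each with its nearest centroid, so a
    layer can be stored as a small codebook plus low-bit indices."""
    w = weights.ravel()
    # Initialise centroids evenly across the weight range.
    centroids = np.linspace(w.min(), w.max(), n_clusters)
    for _ in range(n_iters):
        # Assign each weight to its nearest centroid.
        assign = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned weights.
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = w[assign == k].mean()
    return centroids[assign].reshape(weights.shape), centroids

weights = np.array([[ 2.09, -0.98,  1.48,  0.09],
                    [ 0.05, -0.14, -1.08,  2.12],
                    [-0.91,  1.92,  0.00, -1.03],
                    [ 1.87,  0.00,  1.53,  1.49]])
quantized, codebook = kmeans_quantize(weights)
# 16 floats shrink to a 4-entry codebook plus 2-bit indices per weight.
```

With 4 clusters, each weight index needs only 2 bits, which is where the memory saving comes from.&lt;br /&gt;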
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
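&lt;br /&gt;
The histogram and threshold rules above can be sketched as follows; this is a simplified stand-alone illustration assuming non-negative, higher-is-better scores, whereas real SMT decoders apply the rules per decoding stack to log-probabilities:&lt;br /&gt;

```python
def prune_hypotheses(hypotheses, stack_size=3, alpha=0.5):
    """Histogram pruning keeps at most `stack_size` candidates; threshold
    pruning then drops any candidate scoring below alpha * best_score.
    Assumes (candidate, score) pairs with non-negative, higher-is-better scores."""
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    best_score = ranked[0][1]
    return [h for h in ranked[:stack_size] if h[1] >= alpha * best_score]

hyps = [("hyp_a", 1.0), ("hyp_b", 0.9), ("hyp_c", 0.4), ("hyp_d", 0.2)]
kept = prune_hypotheses(hyps)
# Histogram pruning keeps hyp_a, hyp_b, hyp_c; the 0.5 threshold then
# drops hyp_c, leaving only the two strong candidates.
```
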
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
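&lt;br /&gt;
A minimal sketch of a distillation loss in this spirit, assuming NumPy and the common temperature-softened formulation; the logits, temperature, and mixing weight below are illustrative assumptions:&lt;br /&gt;

```python
import numpy as np

def softmax(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label,
                      T=2.0, alpha=0.5):
    """Blend cross-entropy on the hard label with cross-entropy against the
    teacher's temperature-softened distribution (the soft labels)."""
    hard_loss = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    p_teacher = softmax(teacher_logits, T)   # soft labels, e.g. [0.70, 0.25, 0.05]
    p_student = softmax(student_logits, T)
    soft_loss = -(p_teacher * np.log(p_student + 1e-12)).sum()
    # T**2 keeps the soft-label term's scale comparable across temperatures.
    return alpha * hard_loss + (1 - alpha) * (T ** 2) * soft_loss

loss = distillation_loss([2.0, 1.0, 0.1], [3.0, 2.5, 0.2], hard_label=0)
```

The student minimizes this blended loss, so it learns both the correct class and the teacher&#039;s inter-class relationships.&lt;br /&gt;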
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and is able to identify different images, generate chats and utilize human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; then it must determine the best means to accomplish the task; and finally it must be able to act on it. Three aspects of AI agents are crucial in their development to efficiently and correctly carry out tasks:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs which query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Utilizing edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn the user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The current most widely used methodology consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of the partitioning of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
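&lt;br /&gt;
The profiling-then-partitioning idea can be sketched as a simple cut-point search. This is loosely inspired by, not taken from, the per-layer profiling in EdgeShard and Neurosurgeon, and the latency numbers are invented for illustration:&lt;br /&gt;

```python
def best_partition(edge_ms, cloud_ms, transfer_ms):
    """Choose the cut point p: layers 0..p-1 run on the edge, the rest in the
    cloud, paying transfer_ms[p] to ship the activation at the cut.
    transfer_ms has len(edge_ms) + 1 entries; the last is 0 (all-edge, no transfer)."""
    n = len(edge_ms)
    total = lambda p: sum(edge_ms[:p]) + transfer_ms[p] + sum(cloud_ms[p:])
    best = min(range(n + 1), key=total)
    return best, total(best)

# Hypothetical profile: later layers are slower at the edge, but their
# activations are smaller and therefore cheaper to transfer.
edge_ms     = [5, 5, 20, 30]      # per-layer latency on the edge device
cloud_ms    = [1, 1, 2, 3]        # per-layer latency in the cloud
transfer_ms = [50, 40, 4, 2, 0]   # cost of shipping the activation at each cut
cut, total_ms = best_partition(edge_ms, cloud_ms, transfer_ms)
# The cheapest plan runs the first two layers at the edge, then hands off.
```
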
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities to Raspberry Pis and sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been somewhat pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that are able to be executed, and the system can continue working. &lt;br /&gt;
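&lt;br /&gt;
A toy sketch of this capability-aware placement, including the task-dropping behaviour noted for Mobile-Edge [2]; the device names, capacities, and task costs are invented for illustration:&lt;br /&gt;

```python
def assign_task(task_cost, devices):
    """Send the task to the least-loaded device with enough free capacity;
    return None (drop the task) if no device can currently run it."""
    candidates = [d for d in devices if d["free"] >= task_cost]
    if not candidates:
        return None                 # as in [2]: infeasible tasks are dropped
    chosen = min(candidates, key=lambda d: d["load"])
    chosen["free"] -= task_cost     # reserve capacity on the chosen device
    chosen["load"] += task_cost
    return chosen["name"]

devices = [{"name": "smartphone",   "free": 8, "load": 2},
           {"name": "raspberry_pi", "free": 3, "load": 1},
           {"name": "sensor_hub",   "free": 1, "load": 0}]
first  = assign_task(4, devices)    # only the smartphone has 4 units free
second = assign_task(20, devices)   # too large for any device: dropped
```

Dropping the oversized task keeps the queue moving, so feasible work is never blocked behind an infeasible request.&lt;br /&gt;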
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments in which edge devices must function, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change frequently depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be counterproductive if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the low latency that edge computing promises. Using all of this data, updated at runtime, the partitioning model can dynamically distribute workloads to optimize the workflow and ensure each device uses its resources as efficiently as possible. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems cited above. &lt;br /&gt;
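&lt;br /&gt;
A minimal sketch of such runtime-aware placement, assuming a simple capacity-minus-load heuristic (all names and the scoring rule are hypothetical, chosen only to illustrate why a powerful but busy device should stop receiving new work):&lt;br /&gt;

```python
# Illustrative sketch: refine a static profile with runtime load measurements,
# so a device whose baseline profile says it is powerful is skipped when its
# resources are already consumed by another workload.

def effective_capacity(baseline_capacity, current_load):
    """Capacity left after accounting for the load already on the device.
    current_load is a utilization fraction in [0, 1]."""
    return baseline_capacity * (1.0 - current_load)

def pick_device(devices, task_cost):
    """devices: {name: (baseline_capacity, current_load)}.
    Choose the device with the most effective headroom for this task."""
    best, best_room = None, -1.0
    for name, (cap, load) in devices.items():
        room = effective_capacity(cap, load) - task_cost
        if room > best_room:
            best, best_room = name, room
    return best
```

Here a lightly loaded Raspberry Pi can win over a saturated smartphone, matching the point made above.&lt;br /&gt;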
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning splits the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed; on the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning in question. If inference is performed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; conversely, if the accuracy is already fairly high and the model needs only minor work to reach an acceptable threshold, the task may be kept on edge devices to reduce traffic to the cloud and lower latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is horizontal partitioning. This involves splitting work among the devices within a single layer rather than between the layers themselves. It resembles what has been described in previous sections, providing a means of fully utilizing the heterogeneous capabilities found among edge devices. A determination similar to that used in vertical partitioning is made, but all of the devices across which the workload is split operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
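&lt;br /&gt;
The two directions can be illustrated together in a short sketch (the FLOP limit, the confidence margin of 0.2, and the round-robin split are illustrative assumptions, not rules taken from [3]):&lt;br /&gt;

```python
# Hypothetical sketch of the two partitioning directions described above.

def vertical_place(required_flops, edge_flops_limit, confidence, threshold):
    """Vertical partitioning: send work up to the cloud when the edge tier
    cannot finish it, or when model confidence is still far below target."""
    if required_flops > edge_flops_limit or threshold - confidence > 0.2:
        return "cloud"
    return "edge"

def horizontal_split(work_items, peers):
    """Horizontal partitioning: spread items across devices in one tier,
    round-robin for simplicity."""
    shares = {p: [] for p in peers}
    for i, item in enumerate(work_items):
        shares[peers[i % len(peers)]].append(item)
    return shares
```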
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node ingests and processes only a subset of the overall data, then sends its results to a central server that holds the main model. Each node performs these periodic updates, and because every node processes only part of the data, edge devices can handle the workloads much more easily. It also reduces the time and computational burden on the cloud and the network, since the central server is no longer performing all of the computation. One popular way of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses the same process, but instead of a single operation it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
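&lt;br /&gt;
The parallel gradient descent described above can be sketched for a toy one-parameter linear model (illustrative only; real systems aggregate high-dimensional gradients, but the per-shard compute and central averaging step are the same):&lt;br /&gt;

```python
# Minimal sketch of parallel gradient descent for fitting y = w*x: each node
# computes a gradient on its own data shard, and the central server averages
# the shards' gradients to update the shared weight.

def local_gradient(w, shard):
    """Mean-squared-error gradient for y = w*x on one node's data shard."""
    g = 0.0
    for x, y in shard:
        g += 2.0 * (w * x - y) * x
    return g / len(shard)

def parallel_gd_step(w, shards, lr=0.01):
    """One synchronous update: average the per-node gradients, then step."""
    grads = [local_gradient(w, s) for s in shards]
    return w - lr * sum(grads) / len(grads)
```

On data generated by y = 2x, repeating this step drives w toward 2 even though no single node ever holds the whole dataset.&lt;br /&gt;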
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will differ. Synchronous algorithms wait for every node to finish its respective processing, and then the calculations from all nodes are applied together to update the central model. Although this can simplify the updating, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. To address this, asynchronous scheduling has each node send its results to the main server for an update as soon as it finishes, rather than waiting for all the others. This requires more communication, since after each update every node must fetch the updated model from the central server; doing so repeatedly can be challenging, but the efficiency gained is often worth it.&lt;br /&gt;
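&lt;br /&gt;
A toy simulation makes the trade-off concrete (the node timings are invented; the point is that a synchronous round is paced by the slowest node, while asynchronous updates keep every node busy):&lt;br /&gt;

```python
import heapq

# Toy event simulation contrasting the two schedules: a synchronous round
# ends when the slowest node finishes, while in the asynchronous scheme each
# node reports to the central server as soon as it is done and starts again.

def synchronous_rounds(node_times, rounds):
    """Total wall-clock time: every round waits for the slowest node."""
    return rounds * max(node_times)

def asynchronous_updates(node_times, total_updates):
    """Wall-clock time until the server has received total_updates updates,
    with each node immediately starting its next batch after reporting."""
    heap = [(t, t) for t in node_times]  # (next finish time, per-batch cost)
    heapq.heapify(heap)
    done = 0
    now = 0.0
    while done != total_updates:
        now, cost = heapq.heappop(heap)
        done += 1
        heapq.heappush(heap, (now + cost, cost))
    return now
```

With one fast node (1s per batch) and one slow node (4s), two synchronous rounds take 8s to gather 4 updates, while the asynchronous server collects its 4th update at the 4s mark.&lt;br /&gt;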
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data, as well as the computational burden, is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, the data used in federated learning is never shared with the central servers. Instead, each node trains the model on its own local data, and only the updated parameters are shared with the central model. A key consequence is that federated learning provides much stronger data privacy, which is crucial for applications dealing with sensitive data, making edge devices especially well suited to performing it. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
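&lt;br /&gt;
A minimal federated-averaging sketch, assuming a one-parameter model and sample-count weighting (a common aggregation choice, though this chapter does not prescribe a specific rule):&lt;br /&gt;

```python
# Minimal federated-learning sketch: raw data never leaves a node; only the
# locally updated parameter is sent back, and the server averages the
# parameters weighted by each node's sample count.

def local_train(w, data, lr=0.05, epochs=5):
    """Fit y = w*x on this node's private data; return the parameter only."""
    for _ in range(epochs):
        g = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w = w - lr * g
    return w

def federated_round(global_w, node_datasets):
    """One round: each node trains locally, then the server averages the
    returned parameters, weighting each node by how many samples it holds."""
    total = sum(len(d) for d in node_datasets)
    updates = [(local_train(global_w, d), len(d)) for d in node_datasets]
    return sum(w * n for w, n in updates) / total
```

Note that `federated_round` never sees the tuples inside `node_datasets` beyond their lengths; only trained parameters cross the network.&lt;br /&gt;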
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device executing it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One related technique is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often used when edge and cloud systems operate in combination.&lt;br /&gt;
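&lt;br /&gt;
The knowledge-distillation idea can be shown with a toy loss (a standard soft-target cross-entropy with a temperature; the specific formulation here is illustrative rather than one prescribed by this chapter):&lt;br /&gt;

```python
import math

# Toy knowledge-distillation loss: the small student model is trained to
# match the large teacher's softened output distribution, so an edge model
# can mimic a cloud-trained one without seeing the original training data.

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

Minimizing this loss pushes the student's logits toward the teacher's; the temperature softens both distributions so the student also learns from the teacher's relative preferences among wrong classes [11].&lt;br /&gt;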
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be prioritized for edge devices. This decreases their workload, because less data must be processed, while preserving the accuracy of the model as much as possible. A related idea applies to models themselves: deploying a Small Language Model that can handle simple commands and prompts, rather than offloading an entire LLM onto a device, avoids overwhelming its computational abilities for little added benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a model is very important given the memory constraints of edge devices. Smaller batch sizes allow quicker, more memory-efficient training steps and are less likely to cause memory bottlenecks. This is related to partitioning because the compute and memory capacity of the available devices are key factors to consider.&lt;br /&gt;
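&lt;br /&gt;
A back-of-the-envelope version of this batch-size check might look as follows (the linear memory model and power-of-two search are simplifying assumptions, not a real memory profiler):&lt;br /&gt;

```python
# Hypothetical batch-size check: pick the largest power-of-two batch whose
# estimated activation memory still fits the edge device's budget.

def activation_bytes(batch, sample_bytes, model_overhead_bytes):
    """Crude linear estimate of training memory for a given batch size."""
    return batch * sample_bytes + model_overhead_bytes

def largest_batch(memory_budget, sample_bytes, model_overhead_bytes):
    """Double the batch until the next doubling would exceed the budget."""
    batch = 1
    while memory_budget >= activation_bytes(batch * 2, sample_bytes,
                                            model_overhead_bytes):
        batch *= 2
    return batch
```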
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible on edge devices. Most modern LLMs are cloud based, but this can lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way a similar capability can be achieved on edge devices is by using SLMs. These are not as accurate and lack the vast knowledge of LLMs and the amount of data they are trained on, but for basic applications on edge devices they are sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This is useful for a wide variety of edge applications. Where privacy constraints permit, an SLM can query an LLM as needed for more complicated tasks. Not every prompt then leads to a remote query, so the network traffic, privacy, and latency benefits are still preserved.&lt;br /&gt;
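&lt;br /&gt;
Such SLM-first routing might be sketched as follows (the confidence-threshold rule and all names are hypothetical; real routers use richer signals than a single confidence score):&lt;br /&gt;

```python
# Hypothetical prompt router: answer with the on-device SLM when it is
# confident, and escalate to a cloud LLM only when allowed and necessary,
# so most queries never leave the device.

def route(prompt, slm, llm, confidence_threshold=0.8, allow_cloud=True):
    """slm(prompt) returns (answer, confidence); llm(prompt) returns answer.
    Returns the chosen answer and where it was produced."""
    answer, confidence = slm(prompt)
    if confidence >= confidence_threshold:
        return answer, "edge"
    if allow_cloud:
        return llm(prompt), "cloud"
    return answer, "edge"  # privacy constraints forbid escalation
```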
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=915</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=915"/>
		<updated>2025-04-25T04:27:26Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 4.2 ML Training at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence concerned primarily with training algorithms to examine sets of data, analyze patterns, and draw conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant to edge computing [7]. Given the recent rise of the field, the deployment of machine learning algorithms will inevitably play a crucial role in the functioning of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be used very effectively, and automation can be employed to improve efficiency. Machine learning on the edge devices themselves can also ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main challenges for machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and harnessing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architectures may not be able to keep up with the massive volume of data often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting that data [5]. With an edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices naturally share computational tasks, a large dataset can be split so that many devices together compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split across devices as well. This allows deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data that is processed on edge devices means more training data for machine learning models, which generally increases accuracy; edge computing can thus be an effective way of providing abundant, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency gained by placing devices closer to users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than in the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or traverse a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; in particular, when personal data is used for a personal application, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amounts of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices designed for low energy consumption, the tasks associated with machine learning could quickly drain their power, even when they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in natural environments and industrial settings can be trained to recognize anomalies or undesirable behavior, ensuring a quick response and proactive reporting [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similarly, the data collected by sensors in cities can be combined with machine learning to help with crime and emergency detection, traffic management, and energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of relying solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real-World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well-known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can make training more efficient and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of the numbers in a model, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each sacrifices some precision: enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and for some models the difference in precision may be negligible. Another example is K-means-based weight quantization, which clusters similar weights in a weight matrix around centroids, so that each weight can be stored as a small index into a table of centroid values. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
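As a rough illustration of the general idea (a hypothetical uniform affine scheme, not the K-means method shown in the figure), weights can be mapped to 8-bit integers and approximately recovered:&lt;br /&gt;

```python
def quantize_uint8(weights):
    # Map float weights onto 256 integer levels (uint8) with an affine scheme:
    # q = round(w / scale) + zero_point. Illustrative sketch only.
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Approximate recovery of the original floats from the integer codes.
    return [(v - zero_point) * scale for v in q]
```

Each dequantized value differs from the original by at most half a quantization step, which is the sense in which accuracy is mostly maintained while memory per value drops from 32 bits to 8.&lt;br /&gt;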
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
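The two pruning rules described above can be sketched as follows (an illustrative toy over (score, hypothesis) pairs, not the actual decoder from [13]):&lt;br /&gt;

```python
def prune(hypotheses, stack_size, alpha):
    # hypotheses: list of (score, hypothesis) pairs; higher score is better.
    best = max(score for score, _ in hypotheses)
    # Threshold pruning: drop candidates scoring below alpha * best.
    kept = [(s, h) for s, h in hypotheses if s >= alpha * best]
    # Histogram pruning: keep at most stack_size of the survivors, best first.
    kept.sort(reverse=True)
    return kept[:stack_size]
```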
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
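The soft-label training signal can be sketched as a loss function (a minimal illustration of the standard distillation loss with an assumed temperature parameter; not a full training loop):&lt;br /&gt;

```python
import math

def softmax(logits, temperature=1.0):
    # Convert raw logits into a probability distribution; a higher
    # temperature produces softer (more spread out) probabilities.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the softened teacher distribution (soft labels)
    # and the student distribution. A real setup would also mix in the
    # hard-label loss and backpropagate through the student.
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
```

The loss is smallest when the student reproduces the full teacher distribution, not just the top class, which is how the richer supervision signal is enforced.&lt;br /&gt;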
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops the ability to identify images, generate conversations in human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and its task, often specified by a user; then it must determine the best means of accomplishing the task; and finally it must be able to act on it. Three aspects of AI agents are crucial to their ability to efficiently and correctly carry out tasks:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying out a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. There are, however, ways to accomplish this, such as using SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Running agents on edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, an edge device dedicated to a specific user may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environments that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and letting the model train on them. This can be feasible when hardware is very advanced and powerful, and it is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of how work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between edge and cloud devices. The cloud can process workloads that require far more resources than edge devices can provide. Edge devices, on the other hand, can store and process data locally, offering lower latency and more privacy. Given the advantages of each, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the candidate devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
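A toy version of such profiling-based partitioning, assuming per-layer execution costs have already been measured, might look like the following (illustrative only, not the actual EdgeShard or Neurosurgeon algorithm):&lt;br /&gt;

```python
def place_layers(edge_costs, cloud_costs, upload_cost):
    # Try every split point k: layers before k run on the edge, the rest in
    # the cloud, paying a one-off transfer cost unless everything stays local.
    # Costs are assumed to come from a prior profiling step.
    n = len(edge_costs)
    best_k, best_total = 0, float("inf")
    for k in range(n + 1):
        transfer = 0.0 if k == n else upload_cost
        total = sum(edge_costs[:k]) + sum(cloud_costs[k:]) + transfer
        if best_total > total:
            best_k, best_total = k, total
    return best_k, best_total
```

With slow edge layers and a cheap uplink the optimum pushes everything to the cloud; with an expensive uplink it keeps everything local, matching the intuition in the text.&lt;br /&gt;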
&lt;br /&gt;
The key advantage of this approach is that it utilizes the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, the overall processing time is reduced, because data is not constantly sent back and forth. It also causes much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partly pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then uses the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and runtime updating, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is specifically deployed are the Neurosurgeon and EdgeShard systems, described above. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize the machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning achieved so far. If inference completed on an edge device yields very low accuracy, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of work to reach an acceptable threshold, the task may be sent to edge devices to free up network traffic to the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is horizontal partitioning. This involves splitting the workload among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Profiling and allocation similar to that used in vertical partitioning is performed, but all of the devices that the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server that holds the main model. Each node performs these periodic updates, and by processing only part of the data, edge devices can handle these workloads much more easily. This also reduces the time and computational burden on the cloud and the network, because the central server is not performing all of the computations alone. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices acting as nodes.&lt;br /&gt;
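One synchronous round of parallel gradient descent can be sketched as follows (a toy one-parameter model with least-squares loss; real systems aggregate full gradient vectors in the same way):&lt;br /&gt;

```python
def local_gradient(w, shard):
    # Gradient of mean squared error for a toy one-parameter model y = w * x,
    # computed only on the data shard held by this node.
    g = 0.0
    for x, y in shard:
        g += 2.0 * (w * x - y) * x
    return g / len(shard)

def parallel_sgd_step(w, shards, lr=0.01):
    # One synchronous round: every node computes a gradient on its own shard,
    # and the central server averages them to update the shared model.
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(shards)
```

Because each node only ever touches its own shard, no single device has to hold the full dataset in memory, which is the point made above.&lt;br /&gt;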
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is how updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and send updates will differ. Synchronous algorithms wait for every node to finish its processing, then apply the calculations from all nodes to the central model at once. Although this can make updating easier, any nodes that lag behind greatly reduce the speed at which the model is trained, and nodes sit idle while others are still processing, which can be highly inefficient. Asynchronous scheduling optimizes this: as soon as a node finishes its processing, it sends its results to the main server for an update rather than waiting for the others. However, this requires more communication, as after each update every node must get the updated model from the central server; doing this repeatedly can be challenging, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data, as well as the computational burden, is distributed among a set of nodes, thus reducing network traffic and the strain on central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, the data used in federated learning is not shared with the central servers. Instead, each node trains the model using only its local data. Then, only the updated parameters are shared with the central model. A key benefit is that federated learning provides a greater amount of data privacy, which is crucial for applications dealing with sensitive data, making edge devices especially well suited to performing it. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
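The server-side aggregation step can be sketched as a weighted average of client parameters (a minimal single-parameter version of the widely used federated averaging scheme):&lt;br /&gt;

```python
def fedavg(client_updates):
    # client_updates: list of (local_weights, sample_count) pairs. Only the
    # updated parameters reach the server; raw training data never does.
    # Clients with more local samples get proportionally more influence.
    total = sum(n for _, n in client_updates)
    return sum(w * n for w, n in client_updates) / total
```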
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This is useful because it reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One related form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, and therefore the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases the workload on them, because less data must be processed, while conserving the accuracy of the model as much as possible. Larger models may not always be needed; for example, a Small Language Model that handles simple commands and prompts can be placed on an edge device rather than an entire LLM, which could overload its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed into a smaller form in order to fit the constraints of edge devices. This is especially valuable given their limited memory, and it also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they package all of the resources and data a device needs for processing the work. By examining the computational abilities of the device, different-sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a model is very important when considering the memory constraints of edge devices. Smaller batch sizes allow for quicker and more efficient training and are less likely to lead to memory bottlenecks. This is related to partitioning because the computing and memory capacity of the available devices are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and they are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and where privacy constraints permit, an SLM can query an LLM for more complicated tasks. This means that not every prompt leads to a query, so the network traffic, privacy, and latency benefits are still preserved.&lt;br /&gt;
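Such an SLM-first fallback can be sketched as a confidence cascade (all function names and the threshold here are illustrative assumptions, not from a specific framework):&lt;br /&gt;

```python
def cascade(prompt, run_slm, run_llm, threshold=0.7):
    # Try the on-device SLM first; escalate to the cloud LLM only when the
    # SLM is unsure. run_slm is assumed to return an (answer, confidence)
    # pair. A real system would also check privacy policy before escalating.
    answer, confidence = run_slm(prompt)
    if confidence >= threshold:
        return answer, "edge"
    return run_llm(prompt), "cloud"
```

Only low-confidence prompts leave the device, so most traffic stays local and the latency and privacy benefits described above are preserved.&lt;br /&gt;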
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=914</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=914"/>
		<updated>2025-04-25T04:26:22Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Benefits: */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with training algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can also help ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. The nature of edge devices sharing computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices can compute what a single device might not have been able to handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split across edge devices as well. This allows deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; thus edge computing could be an effective way of providing large amounts of relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to a third party, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is needed for a personal application, many users prefer the privacy of having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices that are meant to have low energy consumption, the tasks associated with machine learning could quickly drain their batteries, even if they have sufficient computational power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and proactive reporting of information [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to above, the data collected by sensors in cities can leverage machine learning to help in crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|right| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is significantly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users&#039; rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples: &#039;&#039;&#039;&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can be used to provide a more efficient means of training, and to make the heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that involves reducing the precision of numbers, thus easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one essentially sacrifices some precision - enough that accuracy is mostly maintained but the numbers are easier to handle. For example, converting from floating-point to integer datatypes means significantly less memory is used, and for some models the difference in precision may be negligible. Another example is K-means based Weight Quantization, which involves creating a matrix and grouping similar numbers together with centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
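&lt;br /&gt;
A minimal sketch of K-means weight quantization on a small made-up weight matrix (the cluster count and values are illustrative, not taken from the figure). Each weight is replaced by its nearest centroid, so only the centroid table and per-weight indices need to be stored:&lt;br /&gt;

```python
import numpy as np

def kmeans_quantize(weights, n_clusters=4, iters=20, seed=0):
    """Cluster weights and replace each one with its nearest centroid."""
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    # Initialize centroids from distinct weight values.
    centroids = rng.choice(flat, size=n_clusters, replace=False)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Recompute each non-empty cluster's centroid as the mean of its weights.
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = flat[idx == k].mean()
    return centroids[idx].reshape(weights.shape), centroids, idx

w = np.array([[2.09, -0.98, 1.48, 0.09],
              [0.05, -0.14, -1.08, 2.12],
              [-0.91, 1.92, 0.0, -1.03],
              [1.87, 0.0, 1.53, 1.49]])
q, centroids, idx = kmeans_quantize(w, n_clusters=4)
```

After quantization, the 16 floating-point weights are represented by at most 4 shared centroid values plus small integer indices, which is where the memory savings come from.&lt;br /&gt;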
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
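&lt;br /&gt;
The two stack-pruning rules can be sketched as follows, using made-up hypothesis scores (higher is better); this is a minimal illustration, not a production decoder:&lt;br /&gt;

```python
def prune_stack(hypotheses, n=3, alpha=0.5):
    """Apply threshold and histogram pruning to a decoding stack.

    hypotheses: list of (score, hypothesis) pairs, higher score is better.
    Threshold pruning drops candidates scoring below alpha times the best
    score; histogram pruning then keeps at most the n best survivors.
    """
    best = max(score for score, _ in hypotheses)
    # Threshold pruning: keep only candidates close enough to the best.
    survivors = [(s, h) for s, h in hypotheses if s > alpha * best]
    # Histogram pruning: cap the stack at n entries, best first.
    survivors.sort(key=lambda pair: -pair[0])
    return survivors[:n]

stack = [(0.9, "a"), (0.7, "b"), (0.6, "c"), (0.3, "d"), (0.1, "e")]
pruned = prune_stack(stack, n=3, alpha=0.5)  # keeps a, b, c
```

Here threshold pruning with alpha = 0.5 discards "d" and "e" (below 0.45), and histogram pruning would cap the survivors at three entries.&lt;br /&gt;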
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
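&lt;br /&gt;
The soft-label idea can be sketched numerically: the student is trained to match the teacher&#039;s softened output distribution rather than a hard label. This is a minimal NumPy illustration with made-up logits; a temperature above 1 is a common (but here assumed) way to smooth the distributions:&lt;br /&gt;

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student against the teacher's softened outputs.

    The temperature T smooths both distributions, so the student sees the
    teacher's relative preferences across classes, not just one hard label.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.sum(p_teacher * np.log(p_student + 1e-12))

# Teacher is confident in class 3 but assigns some mass to class 2.
teacher = [0.1, 0.5, 2.0, 3.0]
good_student = [0.0, 0.4, 1.9, 3.1]   # mimics the teacher's distribution
poor_student = [3.0, 0.1, 0.2, 0.1]   # confident in the wrong class
```

A student whose distribution resembles the teacher&#039;s incurs a lower loss than one that only matches or misses the hard label, which is exactly the richer supervision signal described above.&lt;br /&gt;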
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and is able to identify different images, generate chats utilizing human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and task, often specified by a user, then reason about the best means to accomplish the task, and finally act on it. Three aspects of AI agents are crucial in their development to efficiently and correctly carry out tasks:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying out a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs which query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Using edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, the agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
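&lt;br /&gt;
The perceive-reason-act cycle described above can be sketched as a toy loop. All names here are hypothetical and this is purely illustrative, not a real agent framework:&lt;br /&gt;

```python
def run_agent(goal, tools, perceive, max_steps=10):
    """Minimal perceive-reason-act loop for an AI agent (illustrative only)."""
    memory = []                                        # remembers completed steps
    for _ in range(max_steps):
        state = perceive()                             # perceive the environment
        remaining = [t for t in goal if t not in memory]   # reason: decompose the task
        if not remaining:
            break
        step = remaining[0]                            # autonomy: pick the next action
        tools[step](state)                             # act using an available tool
        memory.append(step)                            # record the result for later steps
    return memory

# Hypothetical task: read a document, then summarize it.
log = []
tools = {"read": lambda s: log.append("read " + s),
         "summarize": lambda s: log.append("summarize " + s)}
result = run_agent(["read", "summarize"], tools, lambda: "doc1")
```

The loop keeps a memory of completed steps, chooses its own next action without step-by-step user instructions, and can only act through the tools it has been given, mirroring the three aspects above.&lt;br /&gt;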
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, and they include either optimization or augmentation of the dataset itself, or optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources that they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
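&lt;br /&gt;
The idea of using per-layer profiles to pick a split point can be sketched as follows. The timings are entirely hypothetical and the exhaustive search is a simplification of what systems like Neurosurgeon actually do:&lt;br /&gt;

```python
def best_split(edge_ms, cloud_ms, upload_ms):
    """Pick the layer index at which to hand off from edge to cloud.

    Layers before index i run on the edge, the intermediate activations
    are uploaded, and the remaining layers run in the cloud. All inputs
    are hypothetical profiled per-layer times in milliseconds;
    upload_ms[i] is the cost of shipping the activations at split i.
    """
    n = len(edge_ms)
    best_i, best_t = 0, float("inf")
    for i in range(n + 1):
        total = sum(edge_ms[:i]) + upload_ms[i] + sum(cloud_ms[i:])
        if total > best_t:
            continue
        best_i, best_t = i, total
    return best_i, best_t

# Edge is slower per layer, but activations shrink deeper in the network,
# so uploading later can still win overall.
split, total = best_split([5, 8, 12, 20], [1, 2, 3, 4], [50, 30, 10, 8, 2])
```

With these made-up profiles the minimum total latency is reached by running the first two layers on the edge and the rest in the cloud, illustrating why profiling both compute and transfer costs matters.&lt;br /&gt;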
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that are able to be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computation usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device has all of its computing resources or network capabilities used up by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. Then, a machine learning model utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of providing lower latency. Using all of this data and these runtime updates, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems mentioned above. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split the workloads in order to optimize the machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning involves splitting up the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if a small amount of computational power is required, this type of work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning in question. If inference is completed on an edge device and the accuracy is found to be very low, the work can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the learning model needs only a small amount of work to reach the threshold deemed acceptable, it may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
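&lt;br /&gt;
A common concrete form of this confidence-based vertical partitioning is an early-exit scheme: try the lightweight edge model first and escalate to the cloud only when confidence is too low. The threshold and the toy models below are hypothetical:&lt;br /&gt;

```python
def classify(x, edge_model, cloud_model, threshold=0.8):
    """Run the lightweight edge model first; escalate to the larger cloud
    model only when the edge model's confidence falls below the threshold."""
    label, confidence = edge_model(x)
    if confidence > threshold:
        return label, "edge"          # confident enough: answer locally
    label, _ = cloud_model(x)         # otherwise pay the round trip
    return label, "cloud"

# Hypothetical models: the edge model is unsure about the "blurry" input.
edge_model = lambda x: ("cat", 0.95) if x == "clear" else ("cat", 0.40)
cloud_model = lambda x: ("dog", 0.99)
```

Confident inputs never leave the device (low latency, no network traffic), while hard inputs get the accuracy of the larger cloud model.&lt;br /&gt;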
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it allows a means for fully utilizing the heterogeneous abilities that are found in edge devices. Similar functionality and determination to what is found in vertical partitioning is done, but all of the devices that the workload is split across function on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which contains the main model. These periodic updates are done by each node, and by only processing part of the data, it is much easier for edge devices to handle these workloads. It can also reduce the time and computational burden on the cloud and network, because the central server is not performing all of the computations alone. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model, and parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices being used as nodes.&lt;br /&gt;
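&lt;br /&gt;
Parallel gradient descent can be sketched for a simple linear-regression loss; the data, shard sizes, and learning rate below are made up for illustration:&lt;br /&gt;

```python
import numpy as np

def parallel_gd_step(w, node_data, lr=0.1):
    """One synchronous step of data-parallel gradient descent.

    Each node computes the gradient of a squared-error loss on its own
    shard; the central server averages the gradients and updates the
    shared model.
    """
    grads = []
    for X, y in node_data:                        # each node sees only its shard
        residual = X @ w - y
        grads.append(2 * X.T @ residual / len(y)) # local gradient
    return w - lr * np.mean(grads, axis=0)        # server-side aggregation

# Synthetic task: recover true_w from data split across three nodes.
true_w = np.array([2.0, -1.0])
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 2))
y = X @ true_w
shards = [(X[i:i + 40], y[i:i + 40]) for i in range(0, 120, 40)]
w = np.zeros(2)
for _ in range(200):
    w = parallel_gd_step(w, shards)
```

Each node only ever touches its 40-sample shard, yet the aggregated updates drive the shared model toward the same solution full-batch gradient descent would find.&lt;br /&gt;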
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which the updates are given to the model. Each node may have different amounts of data, memory, and computational power, and therefore the time it takes to process and update will not be the same. Synchronous algorithms make sure that each node finishes its respective processing, and then all of the calculations from all nodes are sent to update the central model. Although this can make the updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; this also leaves nodes idle while others are still processing, which can be highly inefficient. To optimize this, asynchronous algorithms have each node send its results to the main server for an update as soon as it finishes its respective processing, rather than waiting for all the others to finish. However, this requires more communication, as after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency that is gained is often worth it.&lt;br /&gt;
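&lt;br /&gt;
The difference can be illustrated by simulating arrival order: an asynchronous server applies whichever node&#039;s update arrives first instead of waiting at a barrier for stragglers. The per-update times below are hypothetical:&lt;br /&gt;

```python
import heapq

def async_update_order(node_times, rounds=2):
    """Order in which an asynchronous server applies node updates.

    Each node repeatedly trains for its own per-update time and sends its
    result; the server applies updates as they arrive, so fast nodes are
    never left idle waiting for slow ones.
    """
    events = [(t, name, t) for name, t in node_times.items()]
    heapq.heapify(events)
    applied = []
    for _ in range(rounds * len(node_times)):
        finish, name, step = heapq.heappop(events)
        applied.append(name)                       # update applied on arrival
        heapq.heappush(events, (finish + step, name, step))
    return applied

# A fast sensor (1 ms per update) and a slow phone (3 ms per update):
order = async_update_order({"sensor": 1, "phone": 3})
```

The fast node contributes three updates in the time the slow node contributes one; a synchronous barrier would instead have forced the sensor to sit idle after every round.&lt;br /&gt;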
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data as well as computational burden is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, data used in federated learning is not shared with the central servers. Instead, only local data from each of the nodes is used to train the model. Then, the only part shared with the central model is the updated parameters. A key aspect of this is that federated learning provides a greater amount of data privacy, which is crucial for certain applications dealing with sensitive data. Therefore, it is especially useful to utilize edge devices to perform federated learning. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
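&lt;br /&gt;
A FedAvg-style round can be sketched as follows; the clients, shard sizes, and learning rate are made up, and the key point is that only parameters, never raw data, reach the server:&lt;br /&gt;

```python
import numpy as np

def fedavg_round(global_w, clients, lr=0.1, local_steps=5):
    """One round of federated averaging (FedAvg-style sketch).

    Each client fine-tunes a copy of the global model on its own private
    shard and sends back only the updated parameters; the server averages
    them, weighted by client data size. Raw data never leaves a client.
    """
    updates, sizes = [], []
    for X, y in clients:
        w = global_w.copy()
        for _ in range(local_steps):               # local training only
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w = w - lr * grad
        updates.append(w)                          # parameters, not data
        sizes.append(len(y))
    shares = np.array(sizes) / sum(sizes)
    return sum(u * s for u, s in zip(updates, shares))

# Hypothetical setup: three clients hold disjoint shards of the same task.
true_w = np.array([1.5, -0.5])
rng = np.random.default_rng(1)
clients = []
for size in (30, 60, 90):
    X = rng.normal(size=(size, 2))
    clients.append((X, X @ true_w))
w = np.zeros(2)
for _ in range(30):
    w = fedavg_round(w, clients)
```

Weighting the average by shard size means clients with more data have proportionally more influence on the global model, a common (though not the only) aggregation choice.&lt;br /&gt;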
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and then the pretrained model is sent to the edge device. This can be useful, as it reduces the computational burden on the edge device, and allows it to fine-tune the model using the data it collects without having to completely train the whole system. One form of this is knowledge distillation, in which a smaller model can be trained to mimic that of a larger model. This may often be the case when edge and cloud systems are used in a combined way.&lt;br /&gt;
&lt;br /&gt;
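The fine-tuning step can be sketched as follows: a stand-in for a cloud-pretrained feature extractor is frozen, and only a small output head is trained on the device's local data. The extractor, the data, and the learning rate are all illustrative assumptions:&lt;br /&gt;

```python
# Sketch of edge-side fine-tuning: the "pretrained" feature extractor is
# frozen, and only a small output layer is trained on local data.

def frozen_features(x):
    """Stand-in for a cloud-pretrained extractor (frozen on the device)."""
    return [x, x * x]

def finetune_head(data, lr=0.05, epochs=200):
    w = [0.0, 0.0]                     # only these head weights are updated
    for _ in range(epochs):
        for x, y in data:
            f = frozen_features(x)
            err = w[0] * f[0] + w[1] * f[1] - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

# Local data generated from y = 3x + x^2 (an illustrative assumption).
local_data = [(1.0, 4.0), (-1.0, -2.0)]
w = finetune_head(local_data)
print([round(v, 1) for v in w])  # approaches [3.0, 1.0]
```

Only the two head weights ever change on the device; the extractor's behavior is fixed, which is what keeps the on-device workload small.&lt;br /&gt;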
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so only the data that most affects the accuracy of a model needs to be offloaded to edge devices. This decreases the workload on the devices, because less data must be processed, while preserving the accuracy of the model as much as possible. The same logic applies to models themselves: it may be preferable to deploy a Small Language Model that can handle simple commands and prompts on an edge device, rather than offloading an entire LLM onto a device whose computational abilities it would overwhelm without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
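One simple, hypothetical way to implement such selection is to rank samples by the current model's loss on them and keep only the most informative ones; the one-parameter model and data below are illustrative:&lt;br /&gt;

```python
# Sketch of loss-based data selection: keep only the samples the current
# model gets most wrong, so a constrained edge device trains on the subset
# that most affects accuracy.

def select_top_k(model_w, data, k):
    """Rank samples by squared error under the current model; keep the top k."""
    def loss(sample):
        x, y = sample
        return (model_w * x - y) ** 2
    return sorted(data, key=loss, reverse=True)[:k]

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 9.0), (4.0, 8.0)]  # mostly y ~ 2x
selected = select_top_k(model_w=2.0, data=data, k=2)
print(selected)  # the outlier (3.0, 9.0) ranks first
```
&lt;br /&gt;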
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
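A back-of-envelope sketch of batch-size tuning under a memory budget; the per-sample activation size and device budget are invented numbers, not measurements of any real device:&lt;br /&gt;

```python
# Pick the largest power-of-two batch whose activation memory fits the
# device budget. All sizes are assumed, illustrative values.

def max_batch_size(per_sample_bytes, budget_bytes, cap=1024):
    b = 1
    while b * 2 <= cap and (b * 2) * per_sample_bytes <= budget_bytes:
        b *= 2
    return b

# e.g. ~1.5 MB of activations per sample, 64 MB free on the device
print(max_batch_size(per_sample_bytes=1_500_000, budget_bytes=64_000_000))  # 32
```
&lt;br /&gt;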
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible on edge devices. Most modern LLMs are cloud-based, but this can lead to high latency and increased network traffic, especially when working with a large subsystem of nodes.&lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and lack the vast knowledge of LLMs and the amount of data they are trained on, but for basic applications on edge devices they are sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This is useful for a wide variety of edge applications. Where needed, and where privacy constraints permit, they can query an LLM for more complicated tasks. Because not every prompt leads to such a query, network traffic, privacy, and latency constraints are still largely preserved.&lt;br /&gt;
&lt;br /&gt;
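This SLM-first pattern can be sketched as a simple router; `local_slm`, `cloud_llm`, and the word-count complexity proxy are stand-ins for illustration, not real APIs:&lt;br /&gt;

```python
# Hedged sketch of an SLM-first router: answer simple prompts with a local
# small model and only fall back to a (hypothetical) cloud LLM for complex
# ones. Both model functions are stand-ins.

def local_slm(prompt):
    return f"[slm] {prompt}"           # fast, private, on-device

def cloud_llm(prompt):
    return f"[llm] {prompt}"           # slower, networked, more capable

def route(prompt, max_local_words=8):
    # Crude complexity proxy: long prompts go to the cloud model.
    if len(prompt.split()) <= max_local_words:
        return local_slm(prompt)
    return cloud_llm(prompt)

print(route("turn off the lights"))                         # handled locally
print(route("compare three routing strategies for heterogeneous "
            "edge clusters and justify a recommendation"))  # escalated
```

A production router would use a better complexity signal than word count, but the control flow (local first, escalate rarely) is the point.&lt;br /&gt;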
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=913</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=913"/>
		<updated>2025-04-25T04:24:22Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 4.2 ML Training at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence primarily concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent rise and prevalence of the field, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on the edge devices themselves can also help ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main challenges of machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, large datasets can be split so that many devices together compute what a single device could not handle alone [5]. Beyond splitting input data across edge devices, machine learning models themselves can be partitioned across devices. This allows the deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results from each smaller portion [5]. Additionally, the sheer amount of data that is processed on edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way of providing large amounts of good, relevant data for a variety of applications.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; in particular, when personal data is needed for a personal application, many users prefer the privacy afforded by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices designed for low energy consumption, the tasks associated with machine learning can quickly drain their power, even when they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; Despite the privacy benefits of local processing, more devices mean more opportunities for malicious actors to steal data or compromise systems. Encryption, access controls, and other measures must be employed to prevent these new attack surfaces from being exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Using data to make inferences about what is happening in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By monitoring a constant stream of data and training machine learning models to detect, or even respond to, different events, many practical systems become possible. These rely on the combination of edge devices and machine learning to enhance the user experience and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similarly, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, and energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|right| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Examples&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can make training more efficient and make heavy workloads compatible with the limited computing power of edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of numbers, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and for some models the resulting difference in precision may be negligible. Another example is K-means-based weight quantization, which involves grouping similar values in a weight matrix together under centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
&lt;br /&gt;
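A toy version of K-means weight quantization, clustering a small weight list into three centroids so that many similar weights share one stored value; the weights and the choice of k are illustrative:&lt;br /&gt;

```python
# Toy K-means weight quantization: cluster the values of a weight matrix
# into k centroids and store only centroid indices, so similar weights
# share one representative value.

def kmeans_quantize(weights, k, iters=20):
    flat = sorted(weights)
    # Initialize centroids spread across the sorted values.
    centroids = [flat[i * (len(flat) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        groups = [[] for _ in range(k)]
        for w in weights:
            j = min(range(k), key=lambda c: abs(w - centroids[c]))
            groups[j].append(w)
        # Move each centroid to the mean of its group.
        centroids = [sum(g) / len(g) if g else centroids[j]
                     for j, g in enumerate(groups)]
    # Replace every weight by its nearest centroid.
    return [min(centroids, key=lambda c: abs(w - c)) for w in weights]

weights = [2.09, 2.12, 1.92, 1.87, 0.05, -0.02, -1.08, -1.03]
quantized = kmeans_quantize(weights, k=3)
print(quantized)  # eight weights collapse to three shared values
```
&lt;br /&gt;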
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
&lt;br /&gt;
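The bitwise arithmetic behind such 1-bit networks can be sketched directly: with weights and activations constrained to {-1, +1}, a dot product reduces to an XNOR followed by a popcount. The vectors below are illustrative:&lt;br /&gt;

```python
# Sketch of binarized inference arithmetic: encode -1 as bit 0 and +1 as
# bit 1, then a dot product is XNOR (count agreements) plus popcount,
# which is far cheaper than floating-point multiply-accumulate.

def binarize(v):
    return [1 if x >= 0 else -1 for x in v]

def binary_dot(a, b):
    n = len(a)
    bits_a = sum((x > 0) << i for i, x in enumerate(a))
    bits_b = sum((x > 0) << i for i, x in enumerate(b))
    matches = bin(~(bits_a ^ bits_b) & ((1 << n) - 1)).count("1")
    return 2 * matches - n          # agreements minus disagreements

w = binarize([0.7, -1.2, 0.1, -0.4])   # -> [1, -1, 1, -1]
x = binarize([0.3, 0.9, -0.2, -0.8])   # -> [1, 1, -1, -1]

# The bitwise result matches the ordinary dot product of the +/-1 vectors.
assert binary_dot(w, x) == sum(wi * xi for wi, xi in zip(w, x))
print(binary_dot(w, x))  # 0
```
&lt;br /&gt;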
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
&lt;br /&gt;
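Both pruning rules described above can be sketched in a few lines; the hypothesis names and scores below are made up for illustration:&lt;br /&gt;

```python
# Histogram pruning keeps at most n hypotheses per stack; threshold pruning
# drops any hypothesis scoring below alpha times the best score.

def histogram_prune(hypotheses, n):
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)[:n]

def threshold_prune(hypotheses, alpha):
    best = max(score for _, score in hypotheses)
    return [h for h in hypotheses if h[1] >= alpha * best]

stack = [("hyp-a", 0.90), ("hyp-b", 0.72), ("hyp-c", 0.30), ("hyp-d", 0.05)]
print(histogram_prune(stack, n=2))       # the two best hypotheses survive
print(threshold_prune(stack, alpha=0.5)) # scores >= 0.45 survive
```

Real decoders score hypotheses in log-probability space, but the filtering logic is the same.&lt;br /&gt;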
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
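The soft-label idea can be made concrete with a small sketch: the teacher's logits are softened by a temperature, and the student is scored by cross-entropy against that distribution. The logits and temperature below are illustrative values:&lt;br /&gt;

```python
import math

# Sketch of knowledge-distillation targets: soften teacher logits with a
# temperature T and train the student toward that distribution.

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)   # soft labels, e.g. 70/25/5-style
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [4.0, 2.5, 0.5]
aligned = [3.8, 2.4, 0.6]     # student close to the teacher
off     = [0.5, 0.5, 4.0]     # student disagreeing with the teacher

# A student matching the teacher's soft distribution incurs a lower loss.
assert distill_loss(aligned, teacher) < distill_loss(off, teacher)
print(distill_loss(aligned, teacher))
```
&lt;br /&gt;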
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and becomes able to identify different images, generate conversations in human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; then it must reason about the best means of accomplishing the task; and finally it must be able to act on it. Three aspects of AI agents are crucial to their ability to carry out tasks efficiently and correctly:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the available possible steps and operate based on its reasoning, without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying out a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases.&lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power required. However, there are ways to accomplish this, such as SLMs which query LLMs as needed (discussed later), or using more powerful edge devices to carry out tasks. Running agents on edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environments that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and letting the model train on them. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs), but it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices while compromising as little as possible on model accuracy. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One common methodology involves collaboration between edge and cloud devices. The cloud can process workloads that require far more resources than edge devices can supply. Edge devices, on the other hand, can store and process data locally, offering lower latency and greater privacy. Given these complementary advantages, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should run on the cloud and which on the edge. This is a crucial problem to solve, as a correct partition of workloads is the best way to ensure that the respective benefits of the devices are leveraged. A common approach is to run certain computing tasks on the candidate devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can still function efficiently. If a workload is beyond the limits of the edge devices, it can be sent to the cloud for processing.&lt;br /&gt;
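&lt;br /&gt;
The profiling-then-routing idea can be sketched as follows. This is an illustrative toy in the spirit of the profiling steps above, not code from EdgeShard or Neurosurgeon; the workload, the time budget, and the routing rule are assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
import time

# Toy profiling step: time a representative workload on the device,
# then keep work on the edge only if its measured cost fits an assumed
# per-task time budget; otherwise route it to the cloud.

def profile(workload, repeats=3):
    """Return average wall-clock seconds for one run of the workload."""
    start = time.perf_counter()
    for _ in range(repeats):
        workload()
    return (time.perf_counter() - start) / repeats

def route(workload, edge_budget_s):
    cost = profile(workload)
    # run on the edge only if the measured cost fits the budget
    on_edge = min(cost, edge_budget_s) == cost
    return "edge" if on_edge else "cloud"

light = lambda: sum(i * i for i in range(1000))
print(route(light, edge_budget_s=1.0))   # a tiny task stays on the edge
```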
&lt;br /&gt;
The key advantage of this approach is that it utilizes the resources of the edge devices where possible, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, overall processing time is reduced because data is not constantly sent back and forth. It also produces far less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization in machine learning on edge systems is fully utilizing the heterogeneous devices these systems often contain. As such, it is important to understand the capabilities of each device so as to exploit its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to simple sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the available resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses that data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, updating the data and helping the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the low latency that edge computing provides. Using all of this data and its runtime updates, the partitioning model can dynamically distribute workloads so as to optimize the workflow and ensure each device uses its resources as efficiently as possible. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems mentioned above. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning splits the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed; if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning so far. If inference on an edge device yields very low accuracy, the work can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of further work to reach an acceptable threshold, the work may be sent to edge devices to reduce traffic to the cloud and lower latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is horizontal partitioning. This splits work among the devices within a single layer rather than among the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Profiling and allocation similar to that used in vertical partitioning is performed, but all of the devices the workload is split across sit on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
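&lt;br /&gt;
A minimal sketch of the two kinds of partitioning working together, with invented layer names, capacities, and workload sizes:&lt;br /&gt;
&lt;br /&gt;
```python
# Sketch combining vertical partitioning (pick a layer: cloud, edge
# server, or device, based on required compute) with horizontal
# partitioning (split the chosen layer's work across its devices).
# Capacities and costs are made-up numbers.

LAYERS = {
    "cloud":       {"capacity": 1000.0, "devices": ["dc-1"]},
    "edge_server": {"capacity": 100.0,  "devices": ["srv-a", "srv-b"]},
    "device":      {"capacity": 10.0,   "devices": ["pi-1", "pi-2", "pi-3"]},
}

def vertical_pick(required):
    # choose the least powerful layer whose capacity still covers the task
    fitting = [name for name, layer in LAYERS.items()
               if max(layer["capacity"], required) == layer["capacity"]]
    return min(fitting, key=lambda n: LAYERS[n]["capacity"])

def horizontal_split(layer, units):
    devices = LAYERS[layer]["devices"]
    share, extra = divmod(units, len(devices))
    # spread the work as evenly as possible across the layer's devices
    return {d: share + (1 if i in range(extra) else 0)
            for i, d in enumerate(devices)}

layer = vertical_pick(required=8.0)       # fits on plain devices
print(layer, horizontal_split(layer, units=7))
```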
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node ingests and processes only a subset of the overall data, then sends its results to a central server that holds the main model. Each node performs these periodic updates, and because each processes only part of the data, it is much easier for edge devices to handle the workloads. It can also reduce the time and computational burden on the cloud and the network, since the central server is no longer performing all of the computation. One popular way to accomplish this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses the same process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices acting as nodes.&lt;br /&gt;
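&lt;br /&gt;
The parallel gradient descent idea can be sketched for a one-parameter linear model. The shards, learning rate, and data are illustrative assumptions:&lt;br /&gt;
&lt;br /&gt;
```python
# Sketch of parallel gradient descent for a 1-D linear model y = w*x:
# each node computes a gradient on its own data shard, and the central
# server averages the shards' gradients to update the shared weight.

def shard_gradient(w, shard):
    # mean gradient of squared error 0.5*(w*x - y)**2 over the shard
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def parallel_gd(shards, w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        grads = [shard_gradient(w, s) for s in shards]   # done in parallel
        w = w - lr * sum(grads) / len(grads)             # aggregate update
    return w

# data generated from y = 2x, split across three "edge nodes"
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(4.0, 8.0), (5.0, 10.0)]]
print(round(parallel_gd(shards), 3))   # converges near 2.0
```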
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is how updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its processing, then send all of the calculations to update the central model at once. Although this makes the update simpler, any nodes that lag behind the others greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling addresses this: as soon as a node finishes its processing, it sends its results to the main server for an update rather than waiting for the others. This requires more communication, since after each update every node must fetch the updated model from the central server; doing this repeatedly can be challenging, but the efficiency gained is often worth it.&lt;br /&gt;
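&lt;br /&gt;
A toy calculation of the trade-off: with uneven per-round times, synchronous training is gated by the slowest node, while asynchronous updates arrive as each node finishes. The per-round times are invented:&lt;br /&gt;
&lt;br /&gt;
```python
# Toy comparison of synchronous vs asynchronous update scheduling for
# nodes with uneven per-round compute times. Times are invented.

times = {"fast": 1.0, "medium": 2.0, "slow": 5.0}   # seconds per round

def sync_updates(budget):
    # one aggregated update per round, gated by the slowest node
    return int(budget // max(times.values()))

def async_updates(budget):
    # each node contributes an update every time it finishes a round
    return sum(int(budget // t) for t in times.values())

print(sync_updates(20.0), async_updates(20.0))   # prints 4 34
```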
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data, as well as the computational burden, is distributed among a set of nodes, reducing network traffic and the strain on central servers. It is similar to distributed learning, but the key difference lies in the data partitioning: data used in federated learning is never shared with the central servers. Instead, each node trains the model on its own local data, and only the updated parameters are shared with the central model. A key consequence is that federated learning provides far greater data privacy, which is crucial for applications dealing with sensitive data, making edge devices especially well suited to performing it. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
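&lt;br /&gt;
A sketch of one federated round for the same kind of toy linear model, assuming invented local datasets; note that only parameters, never data, reach the server:&lt;br /&gt;
&lt;br /&gt;
```python
# Sketch of federated averaging: each node trains on private local data
# and shares only its updated parameter, never the data itself. The
# server combines parameters weighted by local dataset size.

def local_train(w, data, lr=0.05, epochs=20):
    # 1-D linear model y = w*x, trained only on this node's data
    for _ in range(epochs):
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w = w - lr * grad
    return w

def federated_round(w, node_data):
    sizes = [len(d) for d in node_data]
    local_ws = [local_train(w, d) for d in node_data]    # on-device
    # server sees parameters only, weighted by how much data backs them
    return sum(n * wl for n, wl in zip(sizes, local_ws)) / sum(sizes)

# private shards, both generated from y = 3x
node_data = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)]]
w = 0.0
for _ in range(5):
    w = federated_round(w, node_data)
print(round(w, 2))   # approaches 3.0
```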
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device executing it, which may not be feasible for edge devices. Therefore, the bulk of the training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one; this is often used when edge and cloud systems are combined.&lt;br /&gt;
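&lt;br /&gt;
The knowledge distillation idea can be illustrated with softened softmax targets. The logits and temperature are made-up values, and a real distillation loss would typically also mix in a hard-label term:&lt;br /&gt;
&lt;br /&gt;
```python
import math

# Sketch of the knowledge-distillation idea: a small "student" is
# trained to match the softened output distribution of a large
# "teacher". Here we just compute the soft targets and the student's
# cross-entropy against them; the logits are invented.

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [8.0, 2.0, 1.0]
student_logits = [5.0, 3.0, 2.0]

# a higher temperature softens the teacher's distribution, exposing
# how it ranks the wrong classes ("dark knowledge")
soft_targets = softmax(teacher_logits, temperature=4.0)
student_probs = softmax(student_logits, temperature=4.0)

distill_loss = -sum(t * math.log(p)
                    for t, p in zip(soft_targets, student_probs))
print(round(distill_loss, 3))
```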
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be selected for processing on edge devices. This decreases the workload, because less data must be processed, while preserving model accuracy as much as possible. Larger models may likewise not be needed; for example, a Small Language Model that handles simple commands and prompts can be placed on an edge device instead of an entire LLM, which might overwhelm its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed into a smaller form to fit the constraints of edge devices. This is especially valuable given their limited memory, and it also shrinks the workload. Quantization, discussed previously, is a prevalent example.&lt;br /&gt;
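&lt;br /&gt;
A sketch of 8-bit affine quantization, one common form of the compression mentioned above; the weight values are invented:&lt;br /&gt;
&lt;br /&gt;
```python
# Sketch of 8-bit affine quantization: 32-bit floats are mapped to
# integers in 0..255 plus a scale and offset, shrinking memory roughly
# 4x at the cost of a small rounding error. Weight values are invented.

def quantize(values):
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantize(q, scale, lo):
    return [qi * scale + lo for qi in q]

weights = [0.12, -0.48, 0.91, 0.05, -0.33]
q, scale, lo = quantize(weights)
restored = dequantize(q, scale, lo)
error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(error, 4))   # small error, about a quarter the storage
```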
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they package all of the resources and data a device needs to process the work. By examining the computational abilities of a device, appropriately sized workloads can be allocated to maximize training efficiency.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size a model uses is very important given the memory constraints of edge devices. Smaller batch sizes allow quicker, more efficient training and are less likely to cause memory bottlenecks. This relates to partitioning because the computing and memory capacities of the available devices are key factors to consider.&lt;br /&gt;
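&lt;br /&gt;
A back-of-the-envelope batch-size calculation under assumed memory numbers; real frameworks measure activation footprints rather than taking them as fixed constants:&lt;br /&gt;
&lt;br /&gt;
```python
# Rough batch sizing for a memory-constrained edge device: activations
# per sample dominate, so the feasible batch size is roughly the memory
# left after the model weights, divided by the per-sample activation
# footprint. All byte counts are illustrative.

def max_batch_size(device_mem_mb, model_mb, per_sample_kb, cap=256):
    free_kb = (device_mem_mb - model_mb) * 1024
    feasible = int(free_kb // per_sample_kb)
    # never exceed a sane training cap, whatever memory allows
    return min(feasible, cap)

# e.g. a 512 MB device, a 180 MB model, ~1.5 MB of activations/sample
print(max_batch_size(512, 180, per_sample_kb=1536))   # prints 221
```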
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible on edge devices. Most modern LLMs are cloud based, but this can lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way a similar capability can be achieved on edge devices is by using SLMs. These are not as accurate and lack the vast knowledge of LLMs and the volume of data they are trained on, but for basic applications on edge devices they are sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they run on local devices without sharing user data with the cloud, which is useful for a wide variety of edge applications. If needed, and where privacy constraints permit, they can query an LLM for more complicated tasks. Not every prompt then leads to a query, so the benefits in network traffic, privacy, and latency are preserved.&lt;br /&gt;
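&lt;br /&gt;
The SLM-first pattern can be sketched with stub models; the confidence threshold, the stub answers, and the routing helper are all assumptions for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
# Sketch of the SLM-first pattern: answer locally when the small model
# is confident, and only escalate to a cloud LLM otherwise. The models
# are stubs; a real deployment would call actual inference APIs.

CONFIDENCE_THRESHOLD = 0.8

def slm_answer(prompt):
    # stub: a local model returns an answer and a confidence score
    known = {"set a timer": ("timer set", 0.95), "weather": ("sunny", 0.9)}
    return known.get(prompt, ("unsure", 0.2))

def llm_answer(prompt):
    return f"cloud answer for: {prompt}"   # stub for the remote fallback

def respond(prompt):
    answer, confidence = slm_answer(prompt)
    # true exactly when confidence meets or exceeds the threshold
    confident = max(confidence, CONFIDENCE_THRESHOLD) == confidence
    if confident:
        return "edge", answer              # no network traffic, private
    return "cloud", llm_answer(prompt)     # escalate only when needed

print(respond("set a timer"))
print(respond("explain quantum computing"))
```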
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=912</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=912"/>
		<updated>2025-04-25T04:23:00Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Dealing With Challenges */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with the training of algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise in the field of machine learning, it is inevitable that deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not need such large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilize it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. The way edge devices share computational tasks makes them ideal for taking large datasets and splitting the computational load, so that many devices together can compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split across edge devices. This allows deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way of providing plentiful, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is involved, many users prefer the privacy of having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their batteries, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; Despite the enhanced privacy, more devices mean increased potential for malicious actors to steal data or compromise devices. Encryption, access controls, and other measures must be employed to keep these new attack surfaces from being exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The use of data to infer what is happening in an environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By monitoring a constant stream of data and training machine learning models to detect or even respond to different events, many practical applications become possible. These rely on the combination of edge devices and machine learning to enhance the user experience and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can be combined with machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users&#039; rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
A smart thermostat in a home can learn a user&#039;s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices do not need to rely on cloud servers to update or personalize their behavior; they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
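&lt;br /&gt;
The thermostat example can be sketched as a tiny on-device learner: an exponential moving average of the temperatures the user chooses, updated locally with no cloud round-trip. The class and its smoothing factor are invented for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
# Toy on-device personalization: an exponential moving average of the
# user's chosen temperature updates instantly and locally, with no
# cloud round-trip. All names and constants are illustrative.

class SmartThermostat:
    def __init__(self, start=20.0, alpha=0.3):
        self.preference = start
        self.alpha = alpha   # how quickly new choices override history

    def user_adjusts(self, chosen_temp):
        # blend the new observation into the learned preference
        self.preference = (self.alpha * chosen_temp
                           + (1 - self.alpha) * self.preference)
        return self.preference

t = SmartThermostat()
for temp in [22.0, 22.5, 23.0]:   # the user keeps nudging warmer
    t.user_adjusts(temp)
print(round(t.preference, 2))     # drifts toward the chosen temperatures
```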
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real time. RFID tags placed on products allow the system to detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real-World Applications&#039;&#039;&#039;===&lt;br /&gt;
A well-known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can make training more efficient and heavy workloads compatible with even the limited computing power of edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of the numbers a model stores and computes with, easing both the computational and memory burden on edge devices. There are multiple forms of quantization, but each one sacrifices some precision: enough that accuracy is largely maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and for some models the resulting loss in precision may be negligible. Another example is K-means based weight quantization, which groups similar values in a weight matrix together and represents each group by its centroid. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
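&lt;br /&gt;
To make the idea concrete, here is a minimal sketch of K-means based weight quantization. The matrix values, the choice of four clusters, and the helper name kmeans_quantize are illustrative assumptions, not taken from any particular framework:&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

# A minimal sketch of K-means based weight quantization. The matrix values,
# cluster count, and helper name are illustrative assumptions.

def kmeans_quantize(weights, n_clusters=4, n_iters=20):
    """Cluster weight values and represent each weight by its centroid index."""
    flat = weights.ravel()
    # Spread the initial centroids across the range of weight values.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iters):
        # Assign every weight to its nearest centroid...
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # ...then move each centroid to the mean of its assigned weights.
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = flat[idx == k].mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, idx.reshape(weights.shape)

w = np.array([[2.09, -0.98, 1.48, 0.09],
              [0.05, -0.14, -1.08, 2.12],
              [-0.91, 1.92, 0.00, -1.03],
              [1.87, 0.00, 1.53, 1.49]])
codebook, codes = kmeans_quantize(w, n_clusters=4)
w_quant = codebook[codes]   # approximate weights rebuilt from the codebook
```
&lt;br /&gt;
After quantization, only the small codebook of centroids plus one small index per weight needs to be stored, which takes far less memory than the original floating-point matrix.&lt;br /&gt;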
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
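&lt;br /&gt;
As a rough illustration of why 1-bit networks are so cheap, the sketch below computes the dot product of two binarized vectors both with ordinary arithmetic and with a bitwise XOR-and-popcount shortcut (hardware uses the equivalent XNOR form). The vectors and helper names are illustrative, not taken from [12]:&lt;br /&gt;
&lt;br /&gt;
```python
# Illustrative sketch (not code from [12]): the dot product of two vectors
# with entries in {-1, +1}, computed two ways. Packing the signs into bit
# masks collapses the whole dot product into XOR plus popcount, the kind of
# cheap bitwise work that makes 1-bit networks fast on edge hardware.

def pack_signs(bits):
    """Pack a list of {-1, +1} values into an integer bitmask (+1 maps to 1)."""
    mask = 0
    for i, b in enumerate(bits):
        if b == 1:
            mask += 2 ** i
    return mask

def bitwise_dot(x_mask, w_mask, n):
    """Dot product of two packed {-1, +1} vectors via XOR and popcount.

    XOR marks mismatching positions; each match contributes +1 and each
    mismatch -1, so the dot product equals n - 2 * mismatches. (Hardware
    implementations use the equivalent XNOR-and-popcount formulation.)
    """
    mismatches = bin(x_mask ^ w_mask).count("1")
    return n - 2 * mismatches

x = [1, -1, 1, 1, -1, -1, 1, -1]
w = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(a * b for a, b in zip(x, w))              # ordinary arithmetic
fast = bitwise_dot(pack_signs(x), pack_signs(w), len(x))  # bitwise version
```
&lt;br /&gt;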
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
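&lt;br /&gt;
A minimal sketch of these two rules, applied to a single decoding stack. The hypothesis names and scores are illustrative:&lt;br /&gt;
&lt;br /&gt;
```python
# A minimal sketch of the two pruning rules, applied to one decoding stack.
# Hypothesis names and scores are illustrative; scores are probabilities,
# so "below alpha * best" marks hypotheses far weaker than the current best.

def prune_stack(hypotheses, stack_size, alpha):
    """Apply threshold pruning, then histogram pruning, to (name, score) pairs."""
    best = max(score for _, score in hypotheses)
    # Threshold pruning: discard hypotheses scoring below alpha * best.
    kept = [(h, s) for h, s in hypotheses if s >= alpha * best]
    # Histogram pruning: retain only the stack_size best of what remains.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:stack_size]

stack = [("hyp-a", 0.90), ("hyp-b", 0.85), ("hyp-c", 0.40),
         ("hyp-d", 0.70), ("hyp-e", 0.05)]
survivors = prune_stack(stack, stack_size=3, alpha=0.5)  # keeps a, b, d
```
&lt;br /&gt;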
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
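&lt;br /&gt;
The core of the idea can be sketched numerically: the student minimizes the divergence between its own softened output distribution and the teacher&#039;s soft labels. The logits and temperature below are illustrative values:&lt;br /&gt;
&lt;br /&gt;
```python
import math

# A numeric sketch of the distillation loss: the student is trained to match
# the teacher model softened probability distribution rather than a one-hot
# hard label. The logits and temperature below are illustrative values.

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_logits = [4.0, 2.5, 0.5]   # large cloud-trained teacher
student_logits = [3.0, 2.0, 1.0]   # compact edge-deployed student
T = 2.0                            # temperature softens both distributions

soft_targets = softmax(teacher_logits, T)   # soft labels, like "70/25/5"
student_probs = softmax(student_logits, T)
distill_loss = kl_divergence(soft_targets, student_probs)  # minimized in training
```
&lt;br /&gt;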
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and is able to identify images, converse using human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. A system that does this is known as an AI agent. The agent must first perceive its environment and its task, often specified by a user; then reason about the best means of accomplishing the task; and finally act on it. Three aspects of AI agents are crucial to their ability to carry out tasks efficiently and correctly:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to carry out a task for optimal results, it must have the actual means to do so. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Running agents on edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the candidate devices and measure the time and resources they require. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to simple sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and these runtime updates, the partitioning model can dynamically distribute workloads to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed are the Neurosurgeon and EdgeShard systems mentioned above. &lt;br /&gt;
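&lt;br /&gt;
In the spirit of Neurosurgeon-style profiling (though with made-up numbers, not measurements from either system), the sketch below picks the layer split that minimizes total latency, given per-layer edge and cloud execution times and the amount of data that would cross the network at each candidate split point:&lt;br /&gt;
&lt;br /&gt;
```python
# A hedged sketch of profiling-based split-point selection in the spirit of
# Neurosurgeon/EdgeShard. All profile numbers below are invented for
# illustration, not measurements from either system.

def best_split(edge_ms, cloud_ms, transfer_kbit, bandwidth_kbps):
    """Pick the split point k that minimizes end-to-end latency.

    Layers 0..k-1 run on the edge device, the intermediate data at split
    point k crosses the network, and layers k.. run in the cloud. k = 0 is
    cloud-only (the raw input is sent); k = len(edge_ms) is edge-only.
    """
    n = len(edge_ms)
    candidates = []
    for k in range(n + 1):
        latency = (sum(edge_ms[:k])                            # edge compute
                   + transfer_kbit[k] / bandwidth_kbps * 1000  # network, in ms
                   + sum(cloud_ms[k:]))                        # cloud compute
        candidates.append((latency, k))
    return min(candidates)

# Illustrative profile: three layers; the last layer is slow on the edge,
# and the first layer shrinks the data a lot.
edge_ms = [5.0, 10.0, 400.0]
cloud_ms = [1.0, 2.0, 8.0]
transfer_kbit = [500.0, 50.0, 200.0, 1.0]  # data crossing the net at each k
latency, split = best_split(edge_ms, cloud_ms, transfer_kbit, 1000.0)
```
&lt;br /&gt;
Here the cheapest plan runs only the first layer on the edge, because that layer compresses the data enough to make the network hop inexpensive.&lt;br /&gt;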
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the given model. If inference is performed on an edge device and its accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if accuracy is already fairly high and the model needs only a small amount of additional work to reach the acceptable threshold, the task may be kept on edge devices to reduce network traffic to the cloud and reduce latency [3].&lt;br /&gt;
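&lt;br /&gt;
A minimal sketch of this confidence-gated form of vertical partitioning: run the lightweight edge model first, and escalate to the cloud only when the edge prediction falls below a confidence threshold. Both models here are stand-in functions, not real classifiers:&lt;br /&gt;
&lt;br /&gt;
```python
# A minimal sketch of confidence-gated vertical partitioning. Both models
# below are stand-in functions with made-up outputs, not real classifiers:
# the point is only the control flow between the edge and cloud layers.

def edge_model(x):
    # Pretend the small local model is confident only on inputs from 0-9.
    return ("cat", 0.95) if x in range(10) else ("cat", 0.55)

def cloud_model(x):
    # Pretend the large remote model is always accurate but costly to reach.
    return ("dog", 0.99)

def classify(x, threshold=0.8):
    label, confidence = edge_model(x)   # cheap local inference first
    if confidence >= threshold:
        return label, "edge"            # confident enough: stay local
    return cloud_model(x)[0], "cloud"   # uncertain: escalate to the cloud

on_device = classify(5)    # handled locally, low latency
escalated = classify(50)   # below the threshold, sent up a layer
```
&lt;br /&gt;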
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting the workload among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Similar profiling and allocation to that used in vertical partitioning is done, but all of the devices that the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of all of the data being processed by a single machine, each node takes in and processes only a subset of the overall data, then sends its results to a central server which contains the main model. Each node performs these periodic updates, and because only part of the data is processed per node, edge devices can handle these workloads much more easily. It can also reduce the time and computational burden on the cloud and network, because the central server is not performing all of the computations alone. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices acting as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
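&lt;br /&gt;
A minimal numeric sketch of parallel (synchronous) gradient descent for a one-parameter least-squares model: each node computes the gradient over its own data shard, and the central server averages the node gradients to update the shared weight. The data shards and learning rate are illustrative:&lt;br /&gt;
&lt;br /&gt;
```python
# A minimal sketch of parallel (synchronous) gradient descent for a single
# weight w in the model y = w * x. Each node computes the mean gradient of
# the squared error over its own data shard; the central server averages
# the node gradients and applies one update. Shards and learning rate are
# illustrative.

def shard_gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)**2 over one data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

# Three nodes, each holding a different subset of points from the line y = 2*x.
shards = [[(1.0, 2.0), (2.0, 4.0)],
          [(3.0, 6.0), (4.0, 8.0)],
          [(0.5, 1.0), (5.0, 10.0)]]

w, lr = 0.0, 0.05
for _ in range(200):
    grads = [shard_gradient(w, s) for s in shards]  # computed in parallel
    w -= lr * sum(grads) / len(grads)               # server-side aggregation
```
&lt;br /&gt;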
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, and therefore the time it takes to process and update will not be the same. Synchronous algorithms wait until each node finishes its respective processing, then send the calculations from all nodes together to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. To optimize this, asynchronous scheduling has each node send its results to the main server for an update as soon as it finishes, rather than waiting for all the others. This requires more communication, since after each update every node must get the updated model from the central server; this can be costly to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data as well as computational burden is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, data used in federated learning is not shared with the central servers. Instead, only local data from each of the nodes is used to train the model. Then, the only part shared with the central model is the updated parameters. A key aspect of this is that federated learning provides a greater amount of data privacy, which is crucial for certain applications dealing with sensitive data. Therefore, it is especially useful to utilize edge devices to perform federated learning. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and then the pretrained model is sent to the edge device. This is useful because it reduces the computational burden on the edge device, allowing it to fine-tune the model using the data it collects without having to train the whole system from scratch. One related form of this is knowledge distillation, in which a smaller model is trained to mimic the behavior of a larger model. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
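&lt;br /&gt;
A toy numeric sketch of the idea: a frozen, &amp;quot;pretrained&amp;quot; feature extractor stays fixed, and only a small linear head is trained on the device&#039;s local data. The feature map and data points are purely illustrative:&lt;br /&gt;
&lt;br /&gt;
```python
# A toy numeric sketch of transfer learning: a frozen, "pretrained" feature
# extractor stays fixed, and only a small linear head is trained on local
# data. The feature map and data points are purely illustrative.

def pretrained_features(x):
    """Stand-in for a frozen, cloud-trained feature extractor."""
    return [x, x * x]       # these "weights" are never updated on the edge

def predict(head_w, x):
    feats = pretrained_features(x)
    return sum(w * f for w, f in zip(head_w, feats))

# Local edge data follows y = 3*x + x*x exactly.
data = [(1.0, 4.0), (2.0, 10.0), (3.0, 18.0)]

head_w, lr = [0.0, 0.0], 0.01
for _ in range(2000):
    for x, y in data:       # plain SGD, updating only the small head
        err = predict(head_w, x) - y
        feats = pretrained_features(x)
        head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
```
&lt;br /&gt;
Because only the two head weights are updated, the edge device does a small fraction of the work that training the full feature extractor would require.&lt;br /&gt;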
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so only the data that most affects the accuracy of a model need be sent to edge devices. This decreases their workload, because less data must be processed, while conserving the accuracy of the model as much as possible. The same reasoning applies to model size: for example, a Small Language Model that handles simple commands and prompts can be placed on an edge device rather than an entire LLM, which might overload the device&#039;s computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can assist with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of data they are trained on, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This is useful for a wide variety of edge applications. If needed, and if privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, and thus network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=911</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=911"/>
		<updated>2025-04-25T04:18:49Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence concerned with training algorithms to examine sets of data, analyze patterns, and draw conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant to edge computing [7]. Given the recent rise of the field, the deployment of machine learning algorithms will inevitably play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be used very effectively, and automation can be employed to improve efficiency. Machine learning on edge devices themselves can also ease the burden on cloud systems and the networks connected to them, especially if the model does not require large computational power, as in the case of certain Small Language Models (SLMs). However, the main challenges of machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and using it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, large datasets can be split so that many devices together compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split across edge devices as well. This allows the deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data that is processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way of providing abundant, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices designed for low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to above, the data collected by sensors in cities can leverage machine learning to help in crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information and gives users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can make training more efficient and render heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of numbers, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision: enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating point to integer datatypes means significantly less memory is used, and for some models the difference in precision may be negligible. Another example is K-means based weight quantization, which involves clustering the values of a weight matrix and representing each group by its centroid. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
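To make the clustering step concrete, here is a minimal sketch of K-means weight quantization in plain Python (illustrative only; the weight values and cluster count are made up, not taken from the chapter). Each weight is replaced by its nearest centroid, so only a small codebook plus integer indices must be stored.&lt;br /&gt;

```python
import random

def kmeans_quantize(weights, k, iters=20, seed=0):
    random.seed(seed)
    centroids = random.sample(weights, k)   # initialize centroids from the data
    for _ in range(iters):
        # assign each weight to its nearest centroid
        groups = [[] for _ in range(k)]
        assign = []
        for w in weights:
            idx = min(range(k), key=lambda j: abs(w - centroids[j]))
            groups[idx].append(w)
            assign.append(idx)
        # move each centroid to the mean of its group
        for j in range(k):
            if groups[j]:
                centroids[j] = sum(groups[j]) / len(groups[j])
    return centroids, assign

weights = [2.09, 2.12, 1.92, 1.87, 0.05, -0.02, -1.03, -0.98]
centroids, assign = kmeans_quantize(weights, k=3)
# reconstruct the (approximate) weights from the codebook
dequantized = [centroids[i] for i in assign]
```

Storing three centroids plus 2-bit indices in place of eight full-precision floats is exactly the memory saving the figure illustrates.&lt;br /&gt;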
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
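As a toy illustration of why 1-bit networks are so cheap (a sketch of the idea, not the actual QNN kernels from [12]), the dot product of two vectors whose entries are all -1 or +1, packed into integer bit masks, reduces to an XNOR followed by a popcount:&lt;br /&gt;

```python
def binary_dot(a_bits, b_bits, n):
    # XNOR marks the positions where the two sign vectors agree;
    # XOR against an all-ones mask is used here to stay in non-negative ints
    agree = (a_bits ^ b_bits) ^ (2 ** n - 1)
    pop = bin(agree).count('1')     # popcount of agreeing positions
    return 2 * pop - n              # dot product over -1/+1 entries

# bit set = +1, bit clear = -1 (ordering is arbitrary but must match)
a = 0b1011   # (+1, -1, +1, +1)
b = 0b1101   # (+1, +1, -1, +1)
dot = binary_dot(a, b, 4)
```

On real hardware the XNOR and popcount are single instructions over 64 lanes at once, which is where the reported speedups and energy savings come from.&lt;br /&gt;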
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
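The two pruning rules can be sketched in a few lines of Python (illustrative only; the hypothesis names and scores are hypothetical, and operator.ge simply performs the at-least comparison):&lt;br /&gt;

```python
import operator

def histogram_prune(hypotheses, n):
    # keep only the n best-scoring hypotheses in the stack
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    return ranked[:n]

def threshold_prune(hypotheses, alpha):
    # keep hypotheses scoring at least alpha times the best score
    best = max(score for _, score in hypotheses)
    return [h for h in hypotheses if operator.ge(h[1], alpha * best)]

stack = [('hyp-a', 0.90), ('hyp-b', 0.85), ('hyp-c', 0.40), ('hyp-d', 0.10)]
survivors = threshold_prune(histogram_prune(stack, n=3), alpha=0.5)
```

Here histogram pruning first caps the stack at three candidates, then threshold pruning discards anything scoring below half of the best candidate.&lt;br /&gt;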
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
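A minimal sketch of the soft-label mechanism (the logits and temperature are made up for illustration): the teacher distribution is softened with a temperature-scaled softmax, and the student is penalized by the cross-entropy between the two softened distributions.&lt;br /&gt;

```python
import math

def softmax(logits, temperature=1.0):
    # temperature above 1 flattens the distribution, exposing class similarities
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # cross-entropy between softened teacher and student distributions
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

teacher = [4.0, 2.5, 0.5]   # confident, but the soft label keeps class similarities
student = [3.0, 2.0, 1.0]
loss = distillation_loss(student, teacher)
```

In practice this distillation term is usually mixed with an ordinary hard-label loss, but the soft term is what transfers the teacher&#039;s knowledge of inter-class structure.&lt;br /&gt;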
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops the ability to identify different images, converse in human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and task, often specified by a user, then reason about the best means to accomplish the task, and finally act on it. Three aspects of AI agents are crucial to their ability to efficiently and correctly carry out tasks:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs which query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should be done on the cloud and which on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
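A minimal sketch of such a profiling-driven split decision, in the spirit of the per-layer profiling used by Neurosurgeon and EdgeShard (the latency numbers here are entirely made up): given measured per-layer latencies on the edge and in the cloud, plus the cost of shipping the intermediate activations at each possible cut, choose the cut that minimizes total latency.&lt;br /&gt;

```python
def best_split(edge_ms, cloud_ms, transfer_ms):
    # splitting after layer i: layers 0..i-1 run on the edge, the rest in the cloud;
    # transfer_ms[i] is the cost of shipping the activations at cut point i
    n = len(edge_ms)
    costs = []
    for i in range(n + 1):
        total = sum(edge_ms[:i]) + transfer_ms[i] + sum(cloud_ms[i:])
        costs.append(total)
    split = min(range(n + 1), key=lambda i: costs[i])
    return split, costs[split]

edge_ms = [5.0, 8.0, 20.0, 40.0]            # profiled per-layer latency on the edge device
cloud_ms = [1.0, 1.5, 3.0, 5.0]             # profiled per-layer latency in the cloud
transfer_ms = [30.0, 12.0, 6.0, 4.0, 2.0]   # activation-transfer cost at each cut point
split, latency = best_split(edge_ms, cloud_ms, transfer_ms)
```

With these numbers the cheapest plan runs the first layer on the edge (where activations shrink enough to ship cheaply) and the rest in the cloud, illustrating why early, data-reducing layers tend to stay local.&lt;br /&gt;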
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities to Raspberry Pis to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices is ineffective if a device has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and these runtime updates, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed are the Neurosurgeon and EdgeShard systems, mentioned above. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split the workloads in order to optimize the machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning in progress. If inference is performed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of further work to reach an acceptable threshold, the task may be kept on edge devices to reduce traffic to the cloud and lower latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous capabilities found in edge devices. Profiling and allocation similar to that used in vertical partitioning is performed, but all of the devices the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
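A toy sketch of the confidence-based vertical hand-off described above (the model stubs, labels, and threshold are hypothetical): the edge tier answers locally when its model is confident enough, and escalates the input to the cloud tier otherwise.&lt;br /&gt;

```python
import operator

def route(sample, edge_model, cloud_model, threshold=0.8):
    label, confidence = edge_model(sample)
    if operator.ge(confidence, threshold):
        return 'edge', label                   # confident enough: answer locally
    return 'cloud', cloud_model(sample)[0]     # low confidence: escalate upward

# stand-in models; a real system would run actual classifiers here
def edge_model(sample):
    return ('cat', 0.95) if sample == 'easy' else ('cat', 0.40)

def cloud_model(sample):
    return ('dog', 0.99)

easy = route('easy', edge_model, cloud_model)
hard = route('hard', edge_model, cloud_model)
```

Only low-confidence inputs cross the network, so the cloud sees a fraction of the traffic while overall accuracy stays close to the cloud model&#039;s.&lt;br /&gt;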
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which holds the main model. Each node provides these periodic updates, and because each one processes only part of the data, it is much easier for edge devices to handle the workloads. It can also reduce the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
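&lt;br /&gt;
As a toy sketch of parallel gradient descent, assume a one-parameter linear model and a handful of data shards, one per node; the learning rate, step count, and data values are illustrative assumptions.&lt;br /&gt;

```python
def node_gradient(w, shard):
    # mean-squared-error gradient for y = w * x over the shard held by one node
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def parallel_sgd(shards, w=0.0, lr=0.05, steps=200):
    for _ in range(steps):
        grads = [node_gradient(w, s) for s in shards]  # computed on each node
        w = w - lr * sum(grads) / len(grads)           # server averages and updates
    return w
```

With shards drawn from the relation y = 3x, the averaged updates drive w towards 3 even though no single node sees all of the data.&lt;br /&gt;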
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait until each node finishes its respective processing, and then the calculations from all nodes are sent together to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling optimizes this: as soon as a node finishes its respective processing, it sends its results to the main server for an update rather than waiting for all the others to finish. However, this requires more communication, as after each update every node must fetch the updated model from the central server; doing this repeatedly can be challenging, but the efficiency gained is often worth it.&lt;br /&gt;
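&lt;br /&gt;
The trade-off can be made concrete with a toy accounting of how many updates a fleet of heterogeneous nodes contributes within a fixed time budget; the per-node step times below are illustrative assumptions.&lt;br /&gt;

```python
def updates_within(budget_s, step_times, synchronous):
    """Count model updates achievable in budget_s seconds."""
    if synchronous:
        # every round waits for the slowest node, leaving fast nodes idle
        rounds = int(budget_s / max(step_times))
        return rounds * len(step_times)
    # asynchronously, each node pushes updates at its own pace
    return sum(int(budget_s / t) for t in step_times)
```

With a 10 second budget and step times of 1, 2, and 5 seconds, the synchronous schedule yields 6 updates while the asynchronous one yields 17.&lt;br /&gt;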
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data as well as the computational burden are distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, the data used in federated learning is never shared with the central servers. Instead, each node trains on its own local data, and the only thing shared with the central model is the updated parameters. A key consequence is that federated learning provides a greater degree of data privacy, which is crucial for applications dealing with sensitive data. This makes it especially well suited to edge devices. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
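&lt;br /&gt;
A minimal sketch of the server-side aggregation step, in the spirit of federated averaging: clients upload only parameter vectors, never raw data, and the server averages them weighted by local dataset size. The function name and values are illustrative assumptions; chapter 5 covers the real protocols.&lt;br /&gt;

```python
def fedavg(client_params, client_sizes):
    """Weighted average of per-client parameter vectors.

    client_params holds one parameter list per client; client_sizes holds
    the number of local examples each client trained on.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]
```

Clients with more local data pull the global parameters further towards their own update.&lt;br /&gt;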
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device that executes it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One form of this is knowledge distillation, in which a smaller model is trained to mimic the behavior of a larger model. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
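&lt;br /&gt;
The division of labor can be sketched with a toy model whose feature extractor is pretrained (and frozen) elsewhere, while the edge device fits only a small head on its local data; all names and numbers here are illustrative assumptions.&lt;br /&gt;

```python
def features(x, frozen_w):
    # pretrained feature extractor, trained on the cloud and never updated
    return frozen_w * x

def fine_tune_head(data, frozen_w, head=0.0, lr=0.04, steps=60):
    # gradient descent on the small task head only, using local edge data
    for _ in range(steps):
        grad = sum(
            2.0 * (head * features(x, frozen_w) - y) * features(x, frozen_w)
            for x, y in data
        ) / len(data)
        head = head - lr * grad
    return head
```

Only the single head parameter is updated on the device, which is what keeps the edge-side cost small.&lt;br /&gt;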
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases the workload on them, because less data must be processed, while conserving the accuracy of the model as much as possible. Larger models or datasets may not be needed; for example, an edge device may host only a Small Language Model that can handle simple commands and prompts, rather than an entire LLM, which could overload its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially valuable given their limited memory, and it also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a model is very important when considering the memory constraints of edge devices. Smaller batch sizes provide a quicker and more memory-efficient means of training, and are less likely to lead to memory bottlenecks. This is related to partitioning because the computing and memory capacity of the available devices are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can assist with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data, but for the purposes of basic applications and edge devices, they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and if privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, and thus network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8588318 13. D. Banik, A. Ekbal, and P. Bhattacharyya, &amp;quot;Machine learning based optimized pruning approach for decoding in statistical machine translation,&amp;quot; IEEE Access, vol. 7, pp. 1736–1751, Dec. 2018, doi: 10.1109/ACCESS.2018.2883738.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=910</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=910"/>
		<updated>2025-04-25T04:16:56Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Dealing With Challenges */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with training algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. The nature of edge devices sharing computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices can compute what a single device might not have been able to handle [5]. Beyond splitting input data across edge devices, machine learning models can be split up across edge devices as well. This allows for the deployment of models that would otherwise be too large to fit on a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the workings of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; thus edge computing could be an effective way of providing large amounts of good, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to above, the data collected by sensors in cities can leverage machine learning to help in crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming much more important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply more easily with privacy laws, protecting users&#039; rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039; &lt;br /&gt;
A well-known example is Apple&#039;s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can be used to provide a more efficient means of training and to make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that involves reducing the precision of numbers, thus easing the burden on computational power as well as memory management on edge devices. There are multiple forms of quantization, but each one essentially sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes means significantly less memory is used, and for some models the resulting difference in precision may be negligible. Another example is K-means based Weight Quantization, which involves grouping similar values in a weight matrix together around centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
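&lt;br /&gt;
In code, the clustering step might look like the following toy one-dimensional k-means loop; the initialization strategy, weight values, and cluster count are illustrative assumptions.&lt;br /&gt;

```python
def kmeans_quantize(weights, k, iters=20):
    # spread the initial centroids across the sorted weight range
    centroids = sorted(weights)[:: max(1, len(weights) // k)][:k]
    assign = [0] * len(weights)
    for _ in range(iters):
        # assign each weight to its nearest centroid
        assign = [
            min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights
        ]
        # move each centroid to the mean of its assigned weights
        for j in range(k):
            members = [w for w, a in zip(weights, assign) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, assign
```

Storing only the per-weight cluster index plus a short centroid table is what shrinks the memory footprint.&lt;br /&gt;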
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
Pruning is an optimization technique that systematically removes low-salience parameters—such as weakly contributing weights or redundant hypothesis paths—from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.&lt;br /&gt;
&lt;br /&gt;
In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed size 𝑛, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a proportion 𝛼 of the best-scoring candidate, effectively filtering out weak candidates early.&lt;br /&gt;
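&lt;br /&gt;
These two rules can be sketched over a stack of (hypothesis, score) pairs as follows, assuming for simplicity that higher scores are better and positive; n and alpha are illustrative hyperparameters rather than values from any particular system.&lt;br /&gt;

```python
def prune_stack(stack, n, alpha):
    """Apply threshold pruning, then histogram pruning, to a decoding stack."""
    best = max(score for _, score in stack)
    # threshold pruning: drop hypotheses scoring below alpha * best
    kept = [(h, s) for h, s in stack if s >= alpha * best]
    # histogram pruning: retain at most the n highest-scoring survivors
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:n]
```

The adaptive framework described below would then tune n and alpha per input rather than fixing them in advance.&lt;br /&gt;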
&lt;br /&gt;
The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes pruning parameters—namely stack size and beam threshold—based on structural features of the input text, such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13].&lt;br /&gt;
&lt;br /&gt;
This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must maintain a balance between latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
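&lt;br /&gt;
The soft-label idea can be illustrated with a minimal sketch: the student is trained against the teacher&#039;s temperature-scaled probability distribution rather than a hard label. The temperature and toy logits below are illustrative assumptions, not values from any particular system.&lt;br /&gt;

```python
import math

# Minimal sketch of the soft-label part of knowledge distillation.
def softmax(logits, t):
    # Temperature t softens the distribution (larger t, softer labels).
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, t):
    # Cross-entropy of the student's softened distribution against the
    # teacher's soft labels, a richer signal than a one-hot label.
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [4.0, 2.5, 0.5]   # confident but not absolute, like "70/25/5"
student = [3.0, 2.0, 1.0]
print(round(distill_loss(teacher, student, t=2.0), 4))
```

In practice this term is usually combined with a standard hard-label loss, weighted by a mixing coefficient.&lt;br /&gt;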
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI systems become able to identify images, converse in human language, and optimize different systems, the next step is for them to efficiently perform tasks specified by users. This is the role of an AI agent. The agent must first perceive its environment and task, often specified by a user, then reason about the best means of accomplishing the task, and finally act on it. Three aspects of AI agents are crucial to their ability to carry out tasks efficiently and correctly:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying out a task for optimal results, it must have the actual means to do so. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. There are, however, methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or more powerful edge devices that carry out the tasks. Using edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, an edge device dedicated to a single user may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
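&lt;br /&gt;
A profiling-then-offloading decision of this general kind might be sketched as follows. The budget, task, and helper names are invented for illustration and are not taken from EdgeShard or Neurosurgeon, which profile at a much finer (per-layer) granularity.&lt;br /&gt;

```python
import time

# Illustrative profiling step: time a workload on the edge device and
# offload it to the cloud when it exceeds the device's budget.
def profile(task, *args):
    start = time.perf_counter()
    task(*args)
    return time.perf_counter() - start

def place(task, args, edge_budget_s):
    measured = profile(task, *args)
    if measured > edge_budget_s:
        return "cloud"   # beyond the device's limit: send upstream
    return "edge"        # within budget: keep it local

cheap = lambda n: sum(range(n))
print(place(cheap, (10_000,), edge_budget_s=1.0))  # "edge"
```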
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be useless if a device already has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and these runtime updates, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is deployed are the Neurosurgeon and EdgeShard systems, referenced above. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the given learning. If inference is completed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of work to reach the acceptable threshold, the task may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
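&lt;br /&gt;
One common way such accuracy-gated vertical partitioning is realized is an early-exit pattern: run the light edge model first, and escalate to the cloud model only when confidence is low. The following is a hedged sketch with stub models and an arbitrary confidence threshold.&lt;br /&gt;

```python
# Confidence-gated vertical partitioning (early-exit) sketch;
# the models and threshold below are invented stand-ins.
def classify(x, edge_model, cloud_model, conf_threshold):
    label, confidence = edge_model(x)
    if confidence >= conf_threshold:
        return label, "edge"           # accurate enough locally
    return cloud_model(x)[0], "cloud"  # low confidence: escalate

# Stub models: the edge model is unsure only on "blurry" inputs.
edge_model = lambda x: ("cat", 0.55) if x == "blurry" else ("cat", 0.97)
cloud_model = lambda x: ("dog", 0.99)
print(classify("sharp", edge_model, cloud_model, 0.9))   # ('cat', 'edge')
print(classify("blurry", edge_model, cloud_model, 0.9))  # ('dog', 'cloud')
```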
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting work among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means for fully utilizing the heterogeneous abilities found in edge devices. Functionality and workload determination similar to that of vertical partitioning is performed, but all of the devices that the workload is split across function on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which contains the main model. Each node performs these periodic updates, and by only processing part of the data, edge devices can handle these workloads much more easily. This can also reduce the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices being used as nodes.&lt;br /&gt;
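&lt;br /&gt;
A toy version of parallel gradient descent, with two nodes fitting a one-parameter model y = w * x on separate data shards, might look like this (all data and step counts are illustrative):&lt;br /&gt;

```python
# Toy parallel gradient descent for 1-D least squares (y = w * x):
# each node computes a gradient on its own shard and the central
# server averages the gradients to update the shared weight.
def grad(w, shard):
    # derivative of mean squared error over the shard with respect to w
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def parallel_sgd_step(w, shards, lr):
    grads = [grad(w, s) for s in shards]  # computed on each node
    avg = sum(grads) / len(grads)         # aggregated centrally
    return w - lr * avg

data = [(x, 3.0 * x) for x in range(1, 9)]  # true weight is 3.0
shards = [data[:4], data[4:]]               # two nodes, half the data each
w = 0.0
for _ in range(200):
    w = parallel_sgd_step(w, shards, lr=0.01)
print(round(w, 3))  # converges close to 3.0
```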
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are given to the model. Each node may have different amounts of data, memory, and computational power, and therefore the time it takes to process and update will not be the same. Synchronous algorithms wait until every node finishes its respective processing, and then all of the calculations from all nodes are sent to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. To optimize this, asynchronous scheduling has each node send its results to the main server for an update as soon as it finishes, rather than waiting for all the others. However, this requires more communication, as after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data as well as computational burden is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, data used in federated learning is not shared with the central servers. Instead, only local data from each of the nodes is used to train the model. Then, the only part shared with the central model is the updated parameters. A key aspect of this is that federated learning provides a greater amount of data privacy, which is crucial for certain applications dealing with sensitive data. Therefore, it is especially useful to utilize edge devices to perform federated learning. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
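&lt;br /&gt;
The parameter-sharing step at the heart of this idea can be sketched as a FedAvg-style weighted average. This is a deliberate simplification: real federated systems run many rounds, sample clients, and often add secure aggregation; the weights and sizes here are invented.&lt;br /&gt;

```python
# FedAvg-style server step: each client trains locally on private data
# and only its parameters (here, plain lists of floats) are shared.
# Raw data never leaves the device.
def fed_avg(client_weights, client_sizes):
    total = sum(client_sizes)
    dims = len(client_weights[0])
    merged = []
    for i in range(dims):
        # weight each client's parameter by its local dataset size
        merged.append(sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total)
    return merged

clients = [[1.0, 2.0], [3.0, 4.0]]  # locally trained parameters
sizes = [100, 300]                  # local dataset sizes
print(fed_avg(clients, sizes))      # [2.5, 3.5]
```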
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and then the pretrained model is sent to the edge device. This can be useful, as it reduces the computational burden on the edge device, and allows it to fine-tune the model using the data it collects without having to completely train the whole system. One form of this is knowledge distillation, in which a smaller model can be trained to mimic that of a larger model. This may often be the case when edge and cloud systems are used in a combined way.&lt;br /&gt;
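&lt;br /&gt;
The on-device fine-tuning step can be sketched as freezing a pretrained feature extractor and training only a small head on local data. The feature function and dataset below are invented stand-ins, not a real pretrained model.&lt;br /&gt;

```python
# Transfer-learning sketch: the "pretrained" feature extractor is
# frozen and only a small linear head is updated with local data.
def features(x):
    return [1.0, x]  # frozen pretrained part (illustrative stand-in)

def fine_tune(data, steps, lr):
    w = [0.0, 0.0]   # only this small head is trained on-device
    for _ in range(steps):
        for x, y in data:
            f = features(x)
            pred = sum(wi * fi for wi, fi in zip(w, f))
            err = pred - y
            # plain SGD update of the head only
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

local = [(x * 0.1, 2.0 * x * 0.1) for x in range(1, 6)]  # local data: y = 2x
head = fine_tune(local, steps=500, lr=0.1)
print([round(wi, 2) for wi in head])
```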
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases the workload on them because less data must be processed, while also preserving the accuracy of the model as much as possible. Larger models and datasets may not be needed; for example, a Small Language Model that can handle simple commands and prompts can be placed on an edge device, rather than offloading an entire LLM, which may overload the device&#039;s computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
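&lt;br /&gt;
As a concrete illustration of compression by quantization, the following sketch maps floating-point weights to 8-bit integers with a single shared scale factor, shrinking storage roughly fourfold versus 32-bit floats. This is a deliberately minimal scheme; production quantizers are considerably more sophisticated [12].&lt;br /&gt;

```python
# Minimal uniform 8-bit quantization sketch: real-valued weights are
# mapped to integers in [-127, 127] plus one shared scale factor.
def quantize(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.31, -0.9, 0.02, 0.45]
q, scale = quantize(w)
print(q)                      # small integers
print(dequantize(q, scale))   # close to the original weights
```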
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have become prevalent recently and are able to help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data, but for the purposes of basic applications and edge devices, they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy because they can be run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and if privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, and thus network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
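&lt;br /&gt;
A minimal sketch of such SLM-first routing follows, using an invented word-count heuristic and stub functions standing in for real language models; real routers would judge complexity far more carefully.&lt;br /&gt;

```python
# Hypothetical router: simple prompts are answered on-device by the
# SLM, and only complex ones (crudely judged by length here) trigger
# a cloud query to the LLM. All heuristics and stubs are invented.
def route(prompt, slm, llm, max_local_words):
    if len(prompt.split()) > max_local_words:
        return llm(prompt), "cloud"   # complex: escalate to the LLM
    return slm(prompt), "edge"        # simple: stay local and private

slm = lambda p: "short local answer"
llm = lambda p: "detailed cloud answer"
print(route("turn on the lights", slm, llm, 8))    # handled on the edge
print(route(" ".join(["word"] * 20), slm, llm, 8)) # escalated to cloud
```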
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=909</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=909"/>
		<updated>2025-04-25T03:59:36Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 4.2 ML Training at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence primarily concerned with training algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issues regarding machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architecture, machine learning can be deployed in a distributed manner, allowing ML models to be deployed and trained across many edge devices. The nature of edge devices sharing computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices can compute what a single device might not have been able to handle [5]. Beyond splitting input data across edge devices, machine learning models can be split up across edge devices as well. This allows for deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes are able to collaboratively pass data between them, and the global ML model is later updated based on the workings of each smaller portion [5]. Additionally, the sheer amount of data that is processed by edge devices means more training data for machine learning models, which generally increases accuracy; thus edge computing could be an effective solution for providing plentiful, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; in particular, if personal data is needed for a personal application, many users would prefer the privacy afforded by processing it locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming much more important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws more easily, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Examples:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real-World Applications&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, a variety of methods can provide more efficient training and make heavy workloads compatible with the limited computing power of many edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of the numbers in a model, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and for some models the resulting difference in precision may be negligible. Another example is K-means-based weight quantization, which clusters similar values in a weight matrix around shared centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
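The K-means weight quantization idea above can be sketched in a few lines. This is our own illustrative Python (the function name and the sample weights are made up for the example), not code from the cited papers:&lt;br /&gt;

```python
# Illustrative sketch of 1-D K-means weight quantization: each weight is
# replaced by its nearest centroid, so the matrix can be stored as small
# integer codes plus a short codebook of centroid values.

def kmeans_quantize(weights, k=4, iters=10):
    """Cluster scalar weights into k centroids (Lloyd's algorithm)."""
    # Initialize centroids evenly across the weight range.
    lo, hi = min(weights), max(weights)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each weight.
        assign = [min(range(k), key=lambda c: abs(w - centroids[c]))
                  for w in weights]
        # Update step: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [w for w, a in zip(weights, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    codes = [min(range(k), key=lambda c: abs(w - centroids[c]))
             for w in weights]
    return codes, centroids

# A toy 3x3 weight matrix, flattened; 9 floats become 9 small codes
# plus a 4-entry codebook.
weights = [2.09, -0.98, 1.48, 0.09, 0.05, -0.14, -1.08, 2.12, -0.91]
codes, codebook = kmeans_quantize(weights, k=4)
dequantized = [codebook[c] for c in codes]
```

Storing two-bit codes instead of 32-bit floats is where the memory savings come from; the approximation error is whatever distance remains between each weight and its centroid.&lt;br /&gt;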
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between nodes and layers are removed to reduce the overall computation. Certain links between layers are much more important than others, and only some significantly affect the accuracy. By cutting out the least necessary ones, nearly the same accuracy can be achieved with much less computation. The same idea can be applied to entire neurons that are deemed least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
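As a hedged illustration of the idea above (our own sketch, not a framework API), magnitude-based pruning simply zeroes out the fraction of weights closest to zero:&lt;br /&gt;

```python
# Hypothetical sketch of magnitude-based pruning: the smallest-magnitude
# fraction of a layer's weights is zeroed out, shrinking the effective
# network while keeping the connections that matter most.

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights closest to zero."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

layer = [0.31, -0.02, 1.24, 0.005, -0.87, 0.04, 0.66, -0.01]
pruned = prune_by_magnitude(layer, sparsity=0.5)
# Half of the connections are removed; the large weights survive.
```

In practice frameworks prune iteratively and fine-tune between rounds, but the selection criterion is often exactly this kind of magnitude ranking.&lt;br /&gt;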
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
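The soft-label supervision described above can be written as a small loss function. The following is a minimal sketch with made-up logits (the temperature value and function names are our own illustration):&lt;br /&gt;

```python
import math

# Illustrative sketch of a distillation loss: the teacher's logits are
# softened with a temperature, and the student is penalized by
# cross-entropy against those soft labels rather than a one-hot target.

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy of the student against the teacher's soft labels."""
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student_probs))

teacher = [1.0, 5.0, 9.0]   # teacher is confident in class 2
student = [0.5, 2.0, 6.0]   # student roughly agrees
loss = distillation_loss(student, teacher)
```

A higher temperature spreads probability mass over the wrong-but-related classes, which is precisely the extra signal hard labels cannot carry.&lt;br /&gt;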
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and is able to identify images, generate chats utilizing human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. Such a system is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; then it must reason about the best means of accomplishing the task; and finally it must act. Three aspects of AI agents are crucial to developing agents that efficiently and correctly carry out tasks:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Using edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, the agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and letting the model train on them. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the work done by these devices while compromising as little as possible on model accuracy. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should be done on the cloud and which on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices are leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change frequently depending on the workloads and devices in the system, so training models to continuously monitor and dynamically distribute workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be counterproductive if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then uses that data to estimate the performance of devices and/or layers. At runtime, a similar process is employed to update the data and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the low latency that edge computing provides. Using all of this data, updated at runtime, the partitioning model can dynamically distribute workloads to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed are the Neurosurgeon and EdgeShard systems cited above. &lt;br /&gt;
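A profiling-driven split can be sketched as a search over candidate split points, in the spirit of Neurosurgeon. The per-layer timings, output sizes, and bandwidth below are hypothetical illustrations, not values from the cited papers:&lt;br /&gt;

```python
# Hypothetical sketch of split-point selection: given profiled per-layer
# latencies and intermediate output sizes, pick the layer after which
# execution moves from the edge device to the cloud so that total latency
# (edge compute plus transfer plus cloud compute) is minimized.

def choose_split(edge_ms, cloud_ms, output_kb, bandwidth_kbps):
    """Return (split_index, latency_ms): layers before split run on the edge."""
    n = len(edge_ms)
    def total_latency(split):
        edge_time = sum(edge_ms[:split])
        cloud_time = sum(cloud_ms[split:])
        # Data transferred is the output of the last edge layer, or the
        # raw input when everything is offloaded (split of 0).
        transfer = output_kb[split] / bandwidth_kbps * 1000.0
        return edge_time + transfer + cloud_time
    best = min(range(n + 1), key=total_latency)
    return best, total_latency(best)

edge_ms = [5.0, 12.0, 30.0, 120.0]           # profiled on-device layer times
cloud_ms = [1.0, 2.0, 4.0, 6.0]              # profiled cloud layer times
output_kb = [600.0, 150.0, 40.0, 40.0, 2.0]  # input size, then layer outputs
split, latency = choose_split(edge_ms, cloud_ms, output_kb,
                              bandwidth_kbps=500.0)
```

Re-running the search whenever the bandwidth estimate changes is what makes the partition dynamic rather than fixed at deployment time.&lt;br /&gt;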
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways these models split workloads to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the model in question. If inference completed on an edge device yields very low accuracy, the task can be sent to the cloud; on the other hand, if accuracy is already fairly high and only a small amount of work is needed to reach an acceptable threshold, the task may be sent to edge devices to free up network traffic to the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting among the devices within a single layer rather than among the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Profiling and allocation similar to those in vertical partitioning are performed, but all of the devices the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server containing the main model. Each node performs these periodic updates, and because each node processes only part of the data, edge devices can handle these workloads much more easily. This also reduces the time and computational burden on the cloud and network, because the central server is not performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process but, instead of a single operation, aggregates the gradients calculated by each node to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
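The parallel gradient descent described above can be sketched as follows. This is a minimal illustration with made-up data for a one-weight linear model, not a production framework:&lt;br /&gt;

```python
# Hypothetical sketch of data-parallel gradient descent for y = w * x:
# each node computes a gradient on its own data shard, and the central
# server averages the gradients to update the shared weight.

def local_gradient(w, shard):
    """Mean-squared-error gradient of one shard with respect to w."""
    n = len(shard)
    return sum(2.0 * (w * x - y) * x for x, y in shard) / n

def train(shards, w=0.0, lr=0.05, steps=100):
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]   # done on each node
        w = w - lr * sum(grads) / len(grads)             # aggregated centrally
    return w

# Data generated from y = 3x, split across three "nodes".
shards = [[(1.0, 3.0), (2.0, 6.0)],
          [(3.0, 9.0)],
          [(4.0, 12.0), (0.5, 1.5)]]
w = train(shards)   # converges toward the true weight of 3.0
```

The averaging step is the only point of coordination; everything before it runs independently on each node's shard.&lt;br /&gt;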
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are given to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, and then all of the calculations from all nodes are used to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. To optimize this, asynchronous scheduling has each node send its results to the main server for an update as soon as it finishes, rather than waiting for all the others. This requires more communication, as after each update every node must get the updated model from the central server; doing this repeatedly can be challenging, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data and computational burden are distributed among a set of nodes, reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning: the raw data is never shared with the central servers. Instead, each node trains the model on its own local data, and only the updated parameters are shared with the central model. As a result, federated learning provides greater data privacy, which is crucial for applications dealing with sensitive data, and edge devices are especially well suited to it. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
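A minimal sketch of the central aggregation step, in the style of federated averaging (the client parameters and sample counts below are our own illustration):&lt;br /&gt;

```python
# Hypothetical sketch of federated parameter averaging: clients never send
# raw data, only locally trained parameters, which the server combines
# weighted by how many local samples each client trained on.

def federated_average(client_params, client_sizes):
    """Weighted average of each client's parameter vector."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    averaged = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        for j in range(dim):
            averaged[j] += params[j] * size / total
    return averaged

# Three clients return locally trained parameters of a 2-weight model.
clients = [[0.9, 1.1], [1.2, 0.8], [1.0, 1.0]]
sizes = [100, 300, 600]   # samples seen locally by each client
global_params = federated_average(clients, sizes)
```

Weighting by local sample count keeps a client with very little data from dragging the global model toward an under-trained update.&lt;br /&gt;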
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and then the pretrained model is sent to the edge device. This can be useful, as it reduces the computational burden on the edge device, and allows it to fine-tune the model using the data it collects without having to completely train the whole system. One form of this is knowledge distillation, in which a smaller model can be trained to mimic that of a larger model. This may often be the case when edge and cloud systems are used in a combined way.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases the workload because less data must be processed, while preserving the accuracy of the model as much as possible. Larger models may not be needed either; for example, a Small Language Model that handles simple commands and prompts can be placed on an edge device, rather than an entire LLM, which might overload the device&#039;s computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this can lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way a similar capability can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge or training data of LLMs, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and privacy constraints permit, they can query an LLM for more complicated tasks. Not every prompt leads to a query, so network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
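One way such escalation might be routed is by a confidence threshold on the local model. Every name below (run_slm, query_cloud_llm, the threshold, the canned replies) is a hypothetical placeholder, not a real API:&lt;br /&gt;

```python
# Hypothetical sketch of confidence-based routing between an on-device SLM
# and a cloud LLM: the SLM answers locally, and only low-confidence prompts
# are escalated, so most requests never leave the device.

CONFIDENCE_THRESHOLD = 0.8

def run_slm(prompt):
    """Stand-in for on-device SLM inference: returns (answer, confidence)."""
    known = {
        "turn on the lights": ("ok, lights are on", 0.95),
        "set a timer for five minutes": ("timer set", 0.9),
    }
    return known.get(prompt.lower(), ("not sure", 0.2))

def query_cloud_llm(prompt):
    """Stand-in for a cloud LLM call, made only when the SLM is unsure."""
    return "cloud answer for: " + prompt

def answer(prompt):
    reply, confidence = run_slm(prompt)
    # Stay local when confidence meets the threshold; escalate otherwise.
    stay_local = max(confidence, CONFIDENCE_THRESHOLD) == confidence
    return reply if stay_local else query_cloud_llm(prompt)
```

Tuning the threshold trades answer quality against how often private data and extra traffic reach the cloud.&lt;br /&gt;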
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=908</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=908"/>
		<updated>2025-04-25T03:58:05Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Dealing With Challenges */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant to edge computing [7]. Given the recent rise of machine learning, the deployment of machine learning algorithms will inevitably play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be used very effectively, and automation methods can be employed to improve efficiency. Running machine learning on edge devices themselves can also ease the burden on cloud systems and the networks connected to them, especially if the model does not require large computational power, as in the case of certain Small Language Models (SLMs). However, the main challenges for machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and use it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, a large dataset can be split so that many devices together compute what a single device could not handle alone [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way of providing abundant, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is involved, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors, or other devices that are meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environmental and industrial settings could be trained to recognize anomalies or undesirable behavior, ensuring a quick response and proactive reporting of information [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can be combined with machine learning to help in crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of relying solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Examples:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
A smart thermostat in a home can learn a user’s preferences for temperature and adjust automatically based on real-time inputs, like time of day or weather conditions. Similarly, a fitness tracker can track user activity patterns and adapt its recommendations for workouts or rest periods based on how the user is performing each day. These devices don’t need to rely on cloud servers to update or personalize their behavior — they can do it instantly on the device, which makes them more responsive and efficient.&lt;br /&gt;
&lt;br /&gt;
In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices like soil sensors, drones, and automated irrigation systems are equipped with sensors that collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.&lt;br /&gt;
&lt;br /&gt;
In smart retail, edge computing is used to improve inventory management and customer experience. Retailers use smart shelves, RFID tags, and in-store cameras equipped with sensors to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real-time. RFID tags placed on products can detect when an item is removed from the shelf. Using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item’s stock is low.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well-known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can make training more efficient and render heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that involves reducing the precision of numbers, thus easing the burden on the computational power and memory of edge devices. There are multiple forms of quantization, but each one essentially sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating point to integer datatypes uses significantly less memory, and for some models the loss of precision may be negligible. Another example is K-means-based weight quantization, which involves creating a matrix and grouping similar numbers together with centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
&lt;br /&gt;
In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization—such as using just 1-bit values for weights and activations—can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations like XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. also show that quantized gradients—using as little as 6 bits—can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].&lt;br /&gt;
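The float-to-integer conversion described above can be sketched in a few lines. This is a minimal, framework-free illustration of symmetric 8-bit quantization (not any particular library's implementation): weights are scaled into the signed int8 range, rounded, and later dequantized for use, with a bounded rounding error per weight.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [qi * scale for qi in q]

# Illustrative weights; each recovered value is within half a
# quantization step (scale / 2) of the original.
weights = [0.42, -1.3, 0.07, 0.91]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Each int8 value occupies a quarter of the memory of a float32, which is the main saving that makes quantized models practical on memory-constrained edge devices.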
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between nodes and layers are removed to reduce the overall computation. Certain links between the layers matter much more than others, so only some significantly affect the accuracy. By cutting out the least necessary ones, nearly the same accuracy and training can be achieved with much less computation. This can also be applied to entire neurons in the network that are deemed least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
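Magnitude-based pruning is one common way of choosing which connections to cut: weights whose absolute value is smallest are assumed to matter least. The sketch below is purely illustrative (the weight values and pruning fraction are made up), not a specific framework's API.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the smallest-magnitude fraction of weights."""
    n_prune = int(len(weights) * fraction)
    # Rank connections by absolute weight; the weakest are cut first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.12]
pruned = prune_by_magnitude(w, 0.5)  # remove the weakest half
# pruned == [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

In practice the zeroed connections are stored in sparse form, which is where the memory and computation savings for edge devices come from.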
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
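The soft-label supervision described above can be sketched with a temperature-scaled softmax and a cross-entropy loss against the teacher's distribution. This is a minimal, framework-free illustration; the logit values and temperature are arbitrary, and real distillation losses usually combine this term with a hard-label term.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy of the student against the teacher's soft targets."""
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student_probs))

teacher = [2.0, 5.0, 1.0]   # teacher is most confident in class 1
student = [1.5, 4.0, 0.5]
loss = distillation_loss(student, teacher)
```

The loss is minimized when the student reproduces the teacher's full distribution, which is exactly the richer supervision signal that hard labels cannot provide.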
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops the ability to identify images, converse using human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; then it must reason about the best means to accomplish the task; and finally it must be able to act. Three aspects of AI agents are crucial to developing agents that carry out tasks efficiently and correctly:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. There are methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Running agents on edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
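The perceive-reason-act cycle with memory and tools described above can be sketched as a simple control loop. Everything here is hypothetical (the task name, the tool registry, and the dispatch logic are invented purely to illustrate the flow); a real agent would use a language model for the planning step.

```python
def plan(task):
    """Hypothetical reasoning step: decompose a task into tool calls."""
    if task == "report temperature":
        return ["read_sensor", "format_reading"]
    return []

# Hypothetical tool registry the agent is allowed to use.
TOOLS = {
    "read_sensor": lambda memory: 21.5,  # stand-in for a real sensor API
    "format_reading": lambda memory: f"Current temperature: {memory[-1]} C",
}

def run_agent(task):
    memory = []  # the agent remembers the results of prior steps
    for tool_name in plan(task):  # autonomy: no per-step user input
        result = TOOLS[tool_name](memory)
        memory.append(result)
    return memory[-1] if memory else None

print(run_agent("report temperature"))  # → Current temperature: 21.5 C
```

The three crucial aspects map directly onto the sketch: `plan` is the reasoning step, the loop runs without user intervention (autonomy), and `TOOLS` is the set of means the agent can actually invoke.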
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computation usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices is ineffective if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then uses this data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and these runtime updates, the partitioning model can dynamically distribute workloads in order to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems, referenced above. &lt;br /&gt;
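A Neurosurgeon-style partition decision can be sketched as a search over candidate split points: for each layer boundary, estimate total latency as the edge time for the early layers, plus the transfer time for that boundary's intermediate data, plus the cloud time for the remaining layers, then pick the minimum. The profiling numbers below are invented for illustration; a real system would obtain them from the profiling step described above.

```python
def best_split(edge_ms, cloud_ms, transfer_ms):
    """Pick the layer boundary minimizing estimated end-to-end latency.

    edge_ms[i], cloud_ms[i]: profiled per-layer times on each tier.
    transfer_ms[k]: time to ship layer k's output to the cloud
    (transfer_ms[0] ships the raw input; transfer_ms[n] is 0.0,
    meaning everything runs at the edge and nothing is shipped).
    """
    n = len(edge_ms)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):  # k = number of layers run at the edge
        cost = sum(edge_ms[:k]) + transfer_ms[k] + sum(cloud_ms[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost

# Hypothetical profile: the edge is slower per layer, but layer
# outputs shrink, so shipping later activations is cheaper.
edge_ms = [5.0, 6.0, 8.0]
cloud_ms = [1.0, 1.0, 1.0]
transfer_ms = [40.0, 12.0, 6.0, 0.0]
split, cost = best_split(edge_ms, cloud_ms, transfer_ms)
```

With these numbers the best plan runs two layers at the edge and the rest in the cloud, because the raw input is far more expensive to transmit than the compressed intermediate activations.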
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting up the workload between the layers. For example, if a task is deemed to require a large amount of computational resources, it may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the given learning task. If inference is completed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model only needs a small amount of work to reach the acceptable threshold, it may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
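The confidence-based vertical routing just described reduces to a simple rule: answer on the edge when the local model is confident, and escalate to the cloud only otherwise. The threshold and confidence values below are illustrative, not drawn from any cited system.

```python
def route(confidence, threshold=0.85):
    """Vertical partitioning rule: escalate low-confidence results upward."""
    return "edge" if confidence >= threshold else "cloud"

# Edge inference produced these confidences for a batch of inputs;
# only the uncertain ones pay the latency cost of a cloud round-trip.
confidences = [0.97, 0.55, 0.91, 0.40]
decisions = [route(c) for c in confidences]
# decisions == ["edge", "cloud", "edge", "cloud"]
```

Raising the threshold trades network traffic for accuracy: more inputs are escalated to the cloud, but fewer low-confidence answers are served directly from the edge.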
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. The same kind of profiling and workload determination used in vertical partitioning is performed, but all of the devices that the workload is split across sit on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
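&lt;br /&gt;
A minimal sketch of the horizontal case, with hypothetical device capacities: one workload on a single tier is divided among devices in proportion to what each can handle.&lt;br /&gt;

```python
# Hypothetical sketch: horizontal partitioning assigns shares of one
# workload to devices on the same tier, proportional to their capacity.

def horizontal_shares(total_items, capacities):
    """Split total_items across devices in proportion to each capacity."""
    cap_sum = float(sum(capacities))
    shares = [int(total_items * c / cap_sum) for c in capacities]
    shares[0] += total_items - sum(shares)  # give the rounding remainder to one device
    return shares

# Four same-tier devices, one twice as capable as the next two, etc.
print(horizontal_shares(1000, [4, 2, 1, 1]))  # [500, 250, 125, 125]
```

A real scheduler would refresh the capacity estimates at runtime, as described in the dynamic-models section above.&lt;br /&gt;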
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which contains the main model. These periodic updates are performed by each node, and by only processing part of the data, it is much easier for edge devices to handle these workloads. It can also reduce the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model, and parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices being used as nodes.&lt;br /&gt;
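&lt;br /&gt;
The aggregation step can be sketched as follows, using a toy linear-regression problem with synthetic data; the four shards stand in for four edge nodes, each holding only its own subset of the data:&lt;br /&gt;

```python
import numpy as np

# Minimal sketch of parallel (synchronous) gradient descent: each node
# computes a gradient on its own data shard, and the central server
# averages the gradients to update the shared model. Data is synthetic.

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(400, 2))
y = X @ true_w
shards = np.array_split(np.arange(400), 4)   # 4 nodes, 100 rows each

def node_gradient(w, idx):
    """Mean-squared-error gradient on one node's shard."""
    Xi, yi = X[idx], y[idx]
    return 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)

w = np.zeros(2)
for step in range(200):
    grads = [node_gradient(w, idx) for idx in shards]
    w -= 0.1 * np.mean(grads, axis=0)        # aggregate and update central model

print(np.round(w, 3))   # should approach [ 2. -1.]
```

Because the shards are equal-sized, averaging the per-node gradients recovers the full-batch gradient, so the central model converges as if it had seen all the data, while no single node ever holds more than its shard.&lt;br /&gt;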
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, and therefore the time it takes to process and update will not be the same. Synchronous algorithms wait until each node finishes its respective processing, and then all of the calculations from all nodes are sent to update the central model. Although this can make updating easier, any node that lags behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. To optimize this, asynchronous scheduling has each node send its results to the main server for an update as soon as it finishes, rather than waiting for all the others. However, this requires more communication, as after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
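&lt;br /&gt;
The straggler effect can be made concrete with a toy event simulation; the per-step timings below are made up for illustration:&lt;br /&gt;

```python
import heapq

# Toy simulation (hypothetical timings) contrasting synchronous and
# asynchronous update schedules for four nodes with unequal step times.

step_ms = [10, 10, 10, 40]            # node 3 is a straggler
rounds = 5
target = rounds * len(step_ms)        # same total number of updates either way

# Synchronous: every round waits for the slowest node.
sync_time = rounds * max(step_ms)

# Asynchronous: each node pushes its update as soon as it finishes,
# then immediately starts its next step.
events = [(t, i) for i, t in enumerate(step_ms)]
heapq.heapify(events)
async_time = 0
for _ in range(target):
    t, i = heapq.heappop(events)
    async_time = t                              # arrival time of this update
    heapq.heappush(events, (t + step_ms[i], i))

print("synchronous:", sync_time, "ms  asynchronous:", async_time, "ms")
```

The fast nodes never sit idle waiting for the straggler, so the same number of updates arrives in a fraction of the wall-clock time; the cost, as noted above, is the extra model-download traffic after each update.&lt;br /&gt;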
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data as well as the computational burden is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning. Unlike distributed learning, the data used in federated learning is never shared with the central servers. Instead, each node trains the model on its own local data, and only the updated parameters are shared with the central model. A key consequence is that federated learning provides a greater amount of data privacy, which is crucial for applications dealing with sensitive data, making edge devices especially well suited to performing it. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
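&lt;br /&gt;
The server-side step can be sketched as a FedAvg-style weighted average; the client weights and dataset sizes below are hypothetical, and only parameters (never raw data) reach the server:&lt;br /&gt;

```python
import numpy as np

# Hypothetical FedAvg-style aggregation round: clients train locally and
# share only parameters; the server averages them weighted by data size.

def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors."""
    total = float(sum(client_sizes))
    stacked = np.stack(client_weights)           # one row per client
    coeffs = np.array(client_sizes) / total      # contribution proportional to data
    return coeffs @ stacked

# Three clients with made-up local parameters and dataset sizes.
w_clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 100, 200]
print(fedavg(w_clients, sizes))   # [0.75 0.75]
```

Clients with more local data pull the global model further toward their parameters, which is the usual FedAvg weighting.&lt;br /&gt;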
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This is useful because it reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One related form of this is knowledge distillation, in which a smaller model is trained to mimic a larger model. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
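&lt;br /&gt;
The division of labour can be sketched with a stand-in backbone: a frozen random projection plays the role of the cloud-pretrained feature extractor, and only a small linear head is fine-tuned on the device&#039;s local data. All sizes and data here are synthetic:&lt;br /&gt;

```python
import numpy as np

# Sketch of edge fine-tuning: the backbone is frozen (pretrained in the
# cloud), and only the lightweight head is trained on local data.

rng = np.random.default_rng(1)
W_frozen = rng.normal(size=(5, 8))      # "pretrained" weights, never updated

def features(x):
    return np.tanh(x @ W_frozen)        # frozen backbone

X = rng.normal(size=(200, 5))           # edge-local data
y = features(X) @ rng.normal(size=8)    # synthetic local targets

F = features(X)                         # computed once: the backbone never changes
head = np.zeros(8)                      # only this part is trained on-device
mse_before = float(np.mean((F @ head - y) ** 2))
for _ in range(300):
    grad = 2.0 * F.T @ (F @ head - y) / len(X)
    head -= 0.1 * grad
mse_after = float(np.mean((F @ head - y) ** 2))
print(round(mse_before, 3), round(mse_after, 3))
```

Only the eight head parameters are ever updated on the device, so the per-step cost stays small even though the backbone may represent a large cloud-trained network.&lt;br /&gt;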
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be selected for processing on edge devices. This decreases their workload, because less data must be processed, while preserving the accuracy of the model as much as possible. The same principle applies to models themselves: for example, deploying only Small Language Models on edge devices to handle simple commands and prompts, rather than offloading an entire LLM onto the device, which may overload its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
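&lt;br /&gt;
A back-of-envelope calculation shows why batch size matters so much for memory; the layer widths below are hypothetical (loosely CNN-shaped), not taken from any particular model:&lt;br /&gt;

```python
# Rough estimate of activation memory, which typically scales linearly
# with batch size and is a common bottleneck on edge devices.

def activation_mb(batch, layer_widths, bytes_per_value=4):
    """Approximate activation memory in MB for one forward pass."""
    per_item = sum(layer_widths)      # activation values kept per sample
    return batch * per_item * bytes_per_value / 1e6

widths = [150528, 401408, 100352, 4096, 1000]   # hypothetical feature-map sizes
for b in (1, 8, 32):
    print(b, round(activation_mb(b, widths), 1), "MB")
```

Going from batch size 1 to 32 multiplies activation memory by 32, which can easily exceed the RAM of a small edge device even when the weights themselves fit.&lt;br /&gt;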
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have become a prevalent system recently and are able to do and help with a variety of tasks. However, running and training an LLM requires a significant amount of computational resources which is not feasible when working with edge devices. Most modern LLMs are cloud based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data, but for the purposes of basic applications and edge devices, they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can be run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, and thus network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
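&lt;br /&gt;
The routing idea can be sketched as a confidence-gated fallback. Everything here is a stand-in: slm() and cloud_llm() are placeholder functions, not real model APIs, and the word-count heuristic merely simulates a confidence score:&lt;br /&gt;

```python
# Hypothetical routing sketch: an on-device SLM answers when confident,
# and only low-confidence prompts are escalated to a cloud LLM, so most
# traffic stays local.

CONFIDENCE_THRESHOLD = 0.8

def slm(prompt):
    # Stand-in for a local small model: returns (answer, confidence).
    hard = len(prompt.split()) > 6        # toy proxy for a hard prompt
    conf = 0.4 if hard else 0.95
    return ("local answer to: " + prompt, conf)

def cloud_llm(prompt):
    return "cloud answer to: " + prompt   # stand-in for a remote LLM call

def answer(prompt):
    text, conf = slm(prompt)
    if conf >= CONFIDENCE_THRESHOLD:
        return ("edge", text)             # handled entirely on-device
    return ("cloud", cloud_llm(prompt))   # fallback only when needed

print(answer("turn on the lights")[0])
print(answer("summarise this 40-page maintenance report for me")[0])
```

Only the second, harder prompt leaves the device, so most interactions incur no cloud latency and expose no user data over the network.&lt;br /&gt;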
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=906</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=906"/>
		<updated>2025-04-25T03:51:38Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with the training of algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise in the field of machine learning, it is inevitable that deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not need such large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilize it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architecture, machine learning can be deployed in a distributed nature, allowing ML models to be deployed and trained across many edge devices. The nature of edge devices sharing computational tasks makes it perfect for taking large datasets and splitting the computational load so that many devices can compute what a single device might not have been able to handle [5]. More than just splitting input data across edge devices, machine learning models can be split up across edge devices as well. This allows for deployment of models that would otherwise be too large to be deployed on a single edge device with limited memory. In such cases, edge nodes are able to collaboratively pass data between them, and the global ML model is later updated based on the workings of each smaller portion [5]. Additionally, the sheer amount of data that is processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective solution for providing large amounts of relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high to make these applications efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could potentially be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is needed for a personal application, many users would prefer the privacy afforded by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices that are meant to have low energy consumption, the tasks associated with machine learning could quickly drain their batteries, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is reduced by a lot. This local data handling is able to prevent exposure of personal or confidential information, providing users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations easily comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can be used to provide more efficient training and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that involves reducing the precision of numbers, thus easing the burden on the computational power as well as the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision: enough that accuracy is mostly maintained, but the numbers become easier to handle. For example, converting from floating point to integer datatypes means significantly less memory is used, and for some models the difference in precision may be negligible. Another example is K-means based weight quantization, which involves grouping similar weights in a matrix together and representing each group by a centroid. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
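&lt;br /&gt;
The clustering step can also be sketched directly. The 4x4 weight matrix below is a made-up toy example whose values happen to cluster near four levels, so four centroids reconstruct it closely:&lt;br /&gt;

```python
import numpy as np

# Toy K-means weight quantization: similar weights share a centroid, so
# the matrix can be stored as small integer indices plus a short float
# codebook. The weight values are hypothetical.

W = np.array([[2.09, -0.98, 1.48, 0.09],
              [0.05, -0.14, -1.08, 2.12],
              [-0.91, 1.92, 0.0, -1.03],
              [1.87, 0.0, 1.53, 1.49]])

def kmeans_quantize(W, k=4, iters=20):
    flat = W.ravel()
    centroids = np.linspace(flat.min(), flat.max(), k)   # simple init
    for _ in range(iters):
        # assign each weight to its nearest centroid, then recompute means
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = flat[idx == j].mean()
    return idx.reshape(W.shape), centroids

idx, codebook = kmeans_quantize(W)
W_hat = codebook[idx]               # reconstructed (approximate) weights
print(codebook.round(2))            # four shared values replace 16 floats
```

Storing 2-bit indices plus a 4-entry codebook in place of 16 full-precision floats is exactly the memory saving the paragraph above describes, at the cost of a small reconstruction error.&lt;br /&gt;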
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between different nodes and layers are taken out to remove the overall computation. Certain links between the layers are much more important than others and therefore only some will significantly affect the accuracy. By cutting out the least necessary ones, the same accuracy and training can be achieved with much less computation. This can also be applied to entire neurons in the neural network which are seen as the least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
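&lt;br /&gt;
One common and simple variant, magnitude pruning, can be sketched as follows; the weight matrix and sparsity level are hypothetical:&lt;br /&gt;

```python
import numpy as np

# Sketch of magnitude pruning: zero out the smallest-magnitude weights
# (the "least important" connections) and keep a sparsity mask.

def prune_by_magnitude(W, sparsity):
    """Zero the given fraction of weights with the smallest magnitude."""
    k = int(round(sparsity * W.size))
    if k == 0:
        return W.copy(), np.ones_like(W, dtype=bool)
    threshold = np.sort(np.abs(W).ravel())[k - 1]   # k-th smallest magnitude
    mask = np.abs(W) > threshold                    # keep only larger weights
    return W * mask, mask

W = np.array([[0.02, -1.3, 0.4], [0.9, -0.05, 0.1], [2.1, 0.003, -0.7]])
W_pruned, mask = prune_by_magnitude(W, 0.5)
print(int(mask.sum()), "of", W.size, "weights kept")
```

In practice the pruned model is usually fine-tuned afterwards so the remaining weights can compensate for the removed connections.&lt;br /&gt;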
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
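&lt;br /&gt;
The contrast between hard and soft labels can be made concrete with a temperature-scaled softmax; the teacher logits below are made-up numbers for a single three-class example:&lt;br /&gt;

```python
import numpy as np

# Toy illustration of soft labels in knowledge distillation: the
# teacher's temperature-scaled probabilities give the student graded
# targets instead of a single hard class. Logits are hypothetical.

def softmax(z, T=1.0):
    """Softmax with temperature T; larger T spreads the distribution out."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())           # shift for numerical stability
    return e / e.sum()

teacher_logits = [1.0, 4.0, 9.0]      # one example, three classes

hard_label = np.eye(3)[np.argmax(teacher_logits)]   # what hard training sees
soft_T1 = softmax(teacher_logits, T=1.0)
soft_T4 = softmax(teacher_logits, T=4.0)            # higher T: softer targets

print(hard_label)
print(soft_T1.round(3))
print(soft_T4.round(3))
```

At temperature 1 the teacher already concentrates almost all its mass on the top class, while the higher temperature reveals the relative plausibility of the other classes, which is the richer supervision signal described above.&lt;br /&gt;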
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and is able to identify different images, generate chats utilizing human language, and optimize different systems, the next step is for it to be able to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; then it must reason about the best means to accomplish the task; and finally it must be able to act on it. Three aspects of AI agents are crucial in their development to efficiently and correctly carry out tasks:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. There are methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or offloading tasks to more powerful edge devices. Running agents at the edge can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, an edge device dedicated to a single user may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
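The perceive-reason-act loop described above can be sketched in a few lines. Everything here is hypothetical: the tool names, the naive keyword-matching planner, and the fixed step budget stand in for the far richer reasoning a real agent would use:&lt;br /&gt;

```python
def run_agent(goal, tools, max_steps=5):
    # Perceive-reason-act loop with a toy keyword planner (hypothetical).
    memory = []  # the agent remembers which steps it has already taken
    for _ in range(max_steps):
        # Reason: pick the next tool mentioned in the goal that has not run yet.
        step = next((name for name in tools if name in goal and name not in memory), None)
        if step is None:
            break  # goal fully decomposed; nothing left to do
        tools[step](goal)    # Act: invoke the chosen tool
        memory.append(step)  # Remember the action for later reasoning
    return memory

tools = {"search": lambda g: "results", "summarize": lambda g: "summary"}
plan = run_agent("search the web and summarize findings", tools)
```

The returned plan records the sequence of actions taken, mirroring the memory component described above.&lt;br /&gt;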
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and training the model on it. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs), but it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on model accuracy. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
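A minimal version of such a profiling step might time a task locally and offload it only when the estimated cloud compute time plus network transfer beats local execution. The thresholds and timing values below are illustrative assumptions, not taken from EdgeShard or Neurosurgeon:&lt;br /&gt;

```python
import time

def profile(task, arg, runs=3):
    # Profiling step: time the task on this device to estimate its local cost.
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        task(arg)
        best = min(best, time.perf_counter() - start)
    return best

def place_task(edge_seconds, cloud_seconds, network_seconds):
    # Offload only when cloud compute plus network transfer beats local execution.
    if cloud_seconds + network_seconds >= edge_seconds:
        return "edge"
    return "cloud"

edge_cost = profile(lambda n: sum(range(n)), 100_000)
decision = place_task(edge_cost, cloud_seconds=0.001, network_seconds=0.05)
```

Real frameworks profile per layer rather than per whole task, but the decision structure is the same: compare measured local cost against remote cost plus transfer.&lt;br /&gt;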
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems is to fully utilize the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the available resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change very often depending on the workloads and devices in the system, so training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. Then, a machine learning model uses that data to estimate the performance of devices and/or layers. During runtime, a similar process is employed to update the data and help the model refine its predictions. Network traffic is also taken into account at this stage, in order to preserve the edge computing benefit of lower latency. Using this data, updated at runtime, the partitioning model can dynamically distribute workloads to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is specifically deployed are the Neurosurgeon and EdgeShard systems, discussed above. &lt;br /&gt;
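One simple way to refine profiled estimates at runtime is an exponential moving average over observed latencies, dispatching each workload to the device currently predicted to finish soonest. The device names and numbers below are hypothetical:&lt;br /&gt;

```python
# Latency estimates (seconds) seeded by an offline profiling step.
estimates = {"phone": 0.12, "pi": 0.40, "server": 0.05}

def update_estimate(old, observed, alpha=0.3):
    # Exponential moving average: blend the baseline with runtime observations.
    return (1 - alpha) * old + alpha * observed

def dispatch(estimates):
    # Route the next workload to the device predicted to finish soonest.
    return min(estimates, key=estimates.get)

# At runtime the server becomes congested, so its observed latency rises
# and its estimate is updated accordingly; dispatch adapts automatically.
estimates["server"] = update_estimate(estimates["server"], 0.50)
```

After the update, the scheduler stops favoring the congested server even though it was the fastest device in the offline profile.&lt;br /&gt;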
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways these models split workloads in order to optimize the machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning so far: if a model evaluated on an edge device is found to have very low accuracy, the task can be sent to the cloud; if the accuracy is already fairly high and only a small amount of further work is needed to reach the acceptable threshold, the task may be sent to edge devices to relieve network traffic to the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is horizontal partitioning. This involves splitting the workload among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. The same kind of capability assessment used in vertical partitioning is performed, but all of the devices that the workload is split across function on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which holds the main model. Each node performs these periodic updates, and by processing only part of the data, edge devices can handle the workloads much more easily. It also reduces the time and computational burden on the cloud and network, because the central server is not performing all of the computations alone. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices being used as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
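The parallel gradient descent idea can be sketched as follows: the data is split into shards, each node computes a gradient on its shard alone, and the server averages the gradients to update the shared model. This is a toy least-squares example with synthetic data, not a distributed implementation:&lt;br /&gt;

```python
import numpy as np

def local_gradient(w, X, y):
    # Least-squares gradient computed on this node's shard only.
    return 2 * X.T @ (X @ w - y) / len(y)

def parallel_gd_step(w, shards, lr=0.1):
    # Each node reports its gradient; the server averages them and updates.
    grads = [local_gradient(w, X, y) for X, y in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 2))
true_w = np.array([1.0, -2.0])
y = X @ true_w
shards = [(X[i:i + 30], y[i:i + 30]) for i in (0, 30, 60)]  # 3 nodes, 30 rows each
w = np.zeros(2)
for _ in range(200):
    w = parallel_gd_step(w, shards)
```

Because each node only ever touches its own 30-row shard, no single device has to hold the full dataset in memory, which is exactly the constraint described above.&lt;br /&gt;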
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait until every node finishes its respective processing, and then all of the calculations from all nodes are applied to the central model. Although this makes updating easier, any nodes that lag behind the others greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. In asynchronous scheduling, each node sends its results to the main server for an update as soon as it finishes, rather than waiting for the others. This requires more communication, as after each update every node must fetch the updated model from the central server; doing this repeatedly can be challenging, but the efficiency gained is often worth it.&lt;br /&gt;
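The difference from synchronous updating can be mimicked with threads standing in for nodes: each one pushes its update to a shared queue the moment it finishes, so faster nodes never wait for stragglers. The sleep durations below are an assumption simulating heterogeneous processing times:&lt;br /&gt;

```python
import queue
import threading
import time

updates = queue.Queue()

def node(node_id, work_seconds):
    # Each node trains at its own pace and pushes its update the moment it
    # finishes, instead of waiting for slower peers (asynchronous scheduling).
    time.sleep(work_seconds)  # simulated heterogeneous processing time
    updates.put(node_id)

threads = [threading.Thread(target=node, args=(i, 0.01 * i)) for i in (1, 2, 3)]
for t in threads:
    t.start()
arrival_order = [updates.get() for _ in range(3)]  # server applies updates as they arrive
for t in threads:
    t.join()
```

In a synchronous scheme the server would instead block until all three threads had joined before applying any update, leaving the fast nodes idle.&lt;br /&gt;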
&lt;br /&gt;
&#039;&#039;&#039;Federated learning:&#039;&#039;&#039; Federated learning is another means by which the data, as well as the computational burden, is distributed among a set of nodes, thus reducing network traffic and strain on the central servers. It is similar to distributed learning, but the key difference lies in the data partitioning: data used in federated learning is never shared with the central servers. Instead, each node trains on its own local data, and only the updated parameters are shared with the central model. A key consequence is that federated learning provides a greater degree of data privacy, which is crucial for applications dealing with sensitive data; this makes edge devices especially well suited to performing federated learning. Federated learning is discussed in detail in chapter 5 of this wiki.&lt;br /&gt;
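A federated-averaging-style round can be sketched as follows: each client trains on its private data, and only the resulting parameters are averaged by the server. The data here is synthetic and the training loop is a bare-bones least-squares fit, purely for illustration:&lt;br /&gt;

```python
import numpy as np

def local_train(w, X, y, lr=0.1, epochs=20):
    # Gradient descent on private local data; the raw data never leaves the node.
    for _ in range(epochs):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

def federated_round(w_global, clients):
    # Clients return parameters, not data; the server averages them.
    local_models = [local_train(w_global.copy(), X, y) for X, y in clients]
    return np.mean(local_models, axis=0)

rng = np.random.default_rng(1)
true_w = np.array([0.5, 1.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(40, 2))
    clients.append((X, X @ true_w))  # each node holds its own private shard

w = np.zeros(2)
for _ in range(10):
    w = federated_round(w, clients)
```

Note that the only values crossing the network are the parameter vectors, which is the privacy property that distinguishes this scheme from plain distributed learning.&lt;br /&gt;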
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device executing it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One related form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used together.&lt;br /&gt;
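A minimal sketch of this division of labor: a frozen &amp;quot;pretrained&amp;quot; feature extractor stands in for the cloud-trained base, while only a small linear head is fitted on-device. The random base weights and synthetic targets are assumptions for illustration:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(2)
W_base = rng.normal(size=(8, 4))  # stands in for cloud-pretrained layers (frozen)

def features(X):
    # Frozen base: the expensive pretrained layers are never updated on-device.
    return np.tanh(X @ W_base)

def fine_tune_head(X, y, lr=0.1, epochs=300):
    # Only this small linear head is trained on the edge device.
    F = features(X)
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        w = w - lr * 2 * F.T @ (F @ w - y) / len(y)
    return w

X = rng.normal(size=(60, 8))
y = features(X) @ np.array([1.0, 0.0, -1.0, 0.5])  # synthetic on-device targets
w_head = fine_tune_head(X, y)
```

Because gradients are only computed for the four head weights, the on-device cost is a small fraction of retraining the whole model.&lt;br /&gt;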
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so only the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases the workload on them, because less data must be processed, while preserving as much of the model&#039;s accuracy as possible. The same principle applies to models themselves: for example, a Small Language Model that handles simple commands and prompts can be placed on an edge device, rather than an entire LLM, which might overload its computational abilities without providing enough additional use.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they package all of the resources and data a device needs to process its work. By examining the computational abilities of each device, different-sized workloads can be allocated as necessary to maximize training efficiency.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can assist with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of data they are trained on, but for the purposes of basic applications and edge devices, they can be sufficient to accomplish many tasks. They are also often fine-tuned and trained to accomplish the specific tasks for which they are deployed, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, so network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
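Such a fallback can be sketched as a confidence-gated router: answer on-device when the SLM is confident, and query the cloud LLM only otherwise. The models here are stand-in stubs and the 0.8 threshold is an arbitrary assumption:&lt;br /&gt;

```python
def slm_answer(prompt):
    # Hypothetical on-device small model: returns (answer, confidence).
    known = {"turn on the lights": ("lights on", 0.95)}
    return known.get(prompt.lower(), ("unsure", 0.20))

def cloud_llm(prompt):
    # Stand-in for a remote LLM call, reached only when the SLM is unsure.
    return "detailed cloud answer"

def answer(prompt, threshold=0.8):
    reply, confidence = slm_answer(prompt)
    if confidence >= threshold:
        return reply, "edge"               # private, low-latency on-device path
    return cloud_llm(prompt), "cloud"      # occasional fallback query

route = answer("Turn on the lights")
```

Only low-confidence prompts cross the network, so most traffic, and most user data, stays on the device.&lt;br /&gt;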
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;br /&gt;
&lt;br /&gt;
[https://www.jmlr.org/papers/volume18/16-456/16-456.pdf 12. I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, &amp;quot;Quantized neural networks: Training neural networks with low precision weights and activations,&amp;quot; Journal of Machine Learning Research, vol. 18, no. 187, pp. 1–30, 2018.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=904</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=904"/>
		<updated>2025-04-25T03:37:47Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 4.2 ML Training at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with training algorithms to look at sets of data, analyze patterns, and generate conclusions, so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can also help ease the burden on cloud systems and the networks connected to them, especially if the model does not require large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilize it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. The way edge devices share computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices can compute what a single device might not have been able to handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows for deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the workings of each smaller portion [5]. Additionally, the sheer amount of data that is processed by edge devices means more training data for machine learning models, which generally increases accuracy, and thus edge computing could be an effective solution for providing plentiful, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors, or other devices that are meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to above, the data collected by sensors in cities can leverage machine learning to help in crime and emergency detection, traffic management, or energy management[7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is reduced by a lot. This local data handling is able to prevent exposure of personal or confidential information, providing users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations easily comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real-World Applications:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
A well-known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can make training more efficient and bring heavy workloads within reach of even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the numerical precision of model parameters, easing both the computational and the memory burden on edge devices. There are multiple forms of quantization, but each sacrifices some precision, enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting weights from floating-point to integer datatypes uses significantly less memory, and for many models the loss of precision is negligible. Another example is K-means-based weight quantization, which groups similar values in a weight matrix together under shared centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
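&lt;br /&gt;
The clustering step can be sketched in plain Python. This is a minimal illustration, assuming a tiny one-dimensional weight list and a first-k initialization; real quantizers operate on full tensors and use random or k-means++ initialization:&lt;br /&gt;

```python
def kmeans_quantize(weights, k=4, iters=20):
    # Initialize centroids from the first k weights for simplicity;
    # real implementations use random or k-means++ initialization.
    centroids = list(weights[:k])
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        groups = [[] for _ in range(k)]
        for w in weights:
            nearest = min(range(k), key=lambda i: abs(w - centroids[i]))
            groups[nearest].append(w)
        # Move each centroid to the mean of its assigned weights.
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    # Store only the small codebook plus one index per weight.
    indices = [min(range(k), key=lambda i: abs(w - centroids[i]))
               for w in weights]
    return centroids, indices

weights = [0.01, 0.02, 0.5, 0.49, -0.3, -0.31, 0.0, 0.51]
codebook, indices = kmeans_quantize(weights, k=3)
dequantized = [codebook[i] for i in indices]
```

After clustering, only the small codebook and one integer index per weight need to be stored, which is where the memory savings come from.&lt;br /&gt;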
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between nodes and layers are removed to reduce the overall computation. Certain links between the layers matter much more than others, so only some significantly affect the accuracy. By cutting out the least necessary connections, nearly the same accuracy and training can be achieved with much less computation. The same idea can also be applied to entire neurons in the neural network that are deemed least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
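&lt;br /&gt;
Magnitude-based pruning, one of the simplest forms, can be sketched as follows; the weight values and the 50% sparsity target are illustrative:&lt;br /&gt;

```python
def prune_by_magnitude(weights, sparsity=0.5):
    # Zero out the given fraction of weights with the smallest
    # absolute value; these contribute least to the output.
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned = prune_by_magnitude(w, sparsity=0.5)
# The three smallest-magnitude weights (-0.05, 0.01, 0.02) are zeroed.
```

In practice the zeroed connections are then stored in a sparse format or removed from the architecture entirely, which is what yields the memory and compute savings.&lt;br /&gt;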
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
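&lt;br /&gt;
The soft-label objective can be written in a few lines. The logits and temperature below are illustrative values, not outputs of a real teacher or student network:&lt;br /&gt;

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature produces softer, more informative distributions.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the teacher soft labels and the softened
    # student predictions; lower is better.
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

teacher = [1.0, 4.0, 8.0]   # confident, but richer than a hard one-hot label
student = [0.8, 3.5, 7.0]
loss = distillation_loss(student, teacher)
```

The student is trained to minimize this loss, so it learns the relative probabilities the teacher assigns to every class rather than only the single correct answer.&lt;br /&gt;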
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and is able to identify different images, generate conversations in human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and its task, often specified by a user, then reason about the best means to accomplish the task, and finally act on it. Three aspects of AI agents are crucial to their ability to carry out tasks efficiently and correctly:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Running agents at the edge can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, an edge device dedicated to a specific user may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
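&lt;br /&gt;
The perceive-reason-act loop described above can be sketched schematically. Everything here is hypothetical: the tool registry is a toy, and the plan is hand-written, whereas a real agent would derive its plan from the task with a language model:&lt;br /&gt;

```python
# Hypothetical tool registry; a real agent might expose APIs,
# a code interpreter, or database access instead.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def run_agent(plan):
    # Reasoning: the task has been decomposed into a sequence of steps.
    # Autonomy: each step names a tool; no per-step user input is needed.
    # Tools: steps are executed through the registry above.
    memory = []  # remembers the result of every step taken so far
    result = None
    for tool_name, args in plan:
        tool = TOOLS[tool_name]
        result = tool(*args) if result is None else tool(result, *args)
        memory.append((tool_name, result))
    return result, memory

# Hand-written plan for computing (2 + 3) * 4; in a real agent the
# plan itself would be produced by the model, not by the user.
answer, trace = run_agent([("add", (2, 3)), ("multiply", (4,))])
```

The memory list is what lets the agent inspect the results of earlier steps and adjust later ones, which is the role described under reasoning above.&lt;br /&gt;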
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources each takes. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to simple sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capabilities consumed by another workload.&lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the low latency that edge computing promises. Using all of this data, the partitioning model can dynamically redistribute workloads at runtime, optimizing the workflow and ensuring each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems cited above.&lt;br /&gt;
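&lt;br /&gt;
A minimal sketch of such profiling-guided partitioning is shown below, assuming per-layer latencies have already been measured; all the numbers are illustrative, not taken from EdgeShard or Neurosurgeon:&lt;br /&gt;

```python
def best_split(edge_ms, cloud_ms, transfer_ms):
    # edge_ms[i] / cloud_ms[i]: profiled runtime of layer i on each tier.
    # transfer_ms[k]: cost of shipping the activation at boundary k.
    # Split point k means layers [0, k) run on the edge and the rest in
    # the cloud; pick the k with the lowest estimated total latency.
    n = len(edge_ms)
    costs = []
    for k in range(n + 1):
        cost = sum(edge_ms[:k]) + transfer_ms[k] + sum(cloud_ms[k:])
        costs.append((cost, k))
    best_cost, best_k = min(costs)
    return best_k, best_cost

# Illustrative profiling numbers, not measurements from a real system.
edge_ms = [5, 5, 40, 60]
cloud_ms = [1, 1, 4, 6]
transfer_ms = [50, 8, 8, 30, 2]
k, cost = best_split(edge_ms, cloud_ms, transfer_ms)
```

Rerunning this search whenever the profiled numbers change (for example, when network congestion raises the transfer costs) is what makes the partitioning dynamic.&lt;br /&gt;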
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize the machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, a task that requires a large amount of computational resources may go to the cloud to be completed and preprocessed. On the other hand, work that requires little computational power can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning so far: if inference is completed on an edge device and its accuracy is found to be very low, the task can be escalated to the cloud; conversely, if the accuracy is already fairly high and the model needs only a small amount of further work to reach an acceptable threshold, the task can be kept on edge devices to free up network traffic to the cloud and reduce latency [3].&lt;br /&gt;
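&lt;br /&gt;
A minimal sketch of this confidence-based escalation is shown below; the models here are hypothetical stand-ins that simply return a (label, confidence) pair:&lt;br /&gt;

```python
def classify_with_escalation(sample, edge_model, cloud_model, threshold=0.8):
    # Run the cheap edge model first; escalate to the cloud model
    # only when the edge prediction is not confident enough.
    label, confidence = edge_model(sample)
    if confidence >= threshold:
        return label, "edge"
    label, _ = cloud_model(sample)
    return label, "cloud"

# Hypothetical stand-in models returning (label, confidence).
edge_model = lambda s: ("cat", 0.95) if s == "easy" else ("dog", 0.40)
cloud_model = lambda s: ("wolf", 0.99)

local = classify_with_escalation("easy", edge_model, cloud_model)
remote = classify_with_escalation("hard", edge_model, cloud_model)
```

Only low-confidence samples ever cross the network, which is what keeps latency and traffic low for the common case.&lt;br /&gt;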
&lt;br /&gt;
The second model of partitioning is horizontal partitioning. This involves splitting among the devices within a single layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Functionality and decision-making similar to vertical partitioning is applied, but all of the devices that the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used.&lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server that holds the main model. Each node performs these periodic updates, and because each node processes only part of the data, edge devices can handle the workloads much more easily. This also reduces the time and computational burden on the cloud and the network, because the central server is no longer performing all of the computation. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices acting as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
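&lt;br /&gt;
A minimal sketch of parallel gradient descent for a one-parameter linear model is shown below; the data, the shard layout, and the learning rate are all illustrative:&lt;br /&gt;

```python
def local_gradient(w, shard):
    # Gradient of mean squared error for the model y = w * x,
    # computed only on the (x, y) pairs held by this node.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def parallel_sgd_step(w, shards, lr=0.05):
    # Each node computes a gradient on its own data subset; the
    # central server averages them and updates the global model.
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)

# Data drawn from y = 3x, split across three edge nodes.
shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)], [(5, 15)]]
w = 0.0
for _ in range(200):
    w = parallel_sgd_step(w, shards)
```

Each call to parallel_sgd_step corresponds to one synchronous round: every node works only on its own shard, and only the averaged gradient is applied to the central model.&lt;br /&gt;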
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is how updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, and then all of the calculations from all nodes are applied to the central model at once. Although this makes updating easier, any nodes that lag behind will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling optimizes this: each node sends its results to the main server for an update as soon as it finishes, rather than waiting for all the others. However, this requires more communication, since after each update every node must fetch the updated model from the central server; doing this repeatedly can be challenging, but the efficiency gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This is useful because it reduces the computational burden on the edge device, allowing it to fine-tune the model using the data it collects without having to train the whole system from scratch. One related form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
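&lt;br /&gt;
The idea can be sketched with a toy frozen feature extractor and a trainable head; both are hypothetical stand-ins, since a real deployment would freeze the layers of an actual pretrained network:&lt;br /&gt;

```python
def pretrained_features(x):
    # Stand-in for a frozen feature extractor: in practice this would
    # be the pretrained layers of a model trained in the cloud, and
    # its parameters are never updated on the edge device.
    return [x, x * x]

def fine_tune_head(data, lr=0.01, epochs=5000):
    # Fine-tune only the small output head on locally collected data,
    # leaving the pretrained layers untouched.
    head = [0.0, 0.0]  # the only parameters the edge device trains
    for _ in range(epochs):
        for x, y in data:
            feats = pretrained_features(x)
            pred = sum(w * f for w, f in zip(head, feats))
            err = pred - y
            # SGD step on the head parameters alone
            head = [w - lr * err * f for w, f in zip(head, feats)]
    return head

# Local data follows y = 2x + x^2, so the head should learn [2, 1].
data = [(0.5, 1.25), (1.0, 3.0), (1.5, 5.25)]
head = fine_tune_head(data)
```

Because only the two head parameters are updated, the per-step cost on the device stays small no matter how large the frozen extractor is.&lt;br /&gt;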
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the examples that most affect the accuracy of a model can be selected for edge devices. This decreases their workload, because less data must be processed, while preserving the accuracy of the model as much as possible. Larger models may not be needed either: for example, a Small Language Model that can handle simple commands and prompts can be placed on an edge device, rather than offloading an entire LLM onto the device, which could overwhelm its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
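&lt;br /&gt;
A back-of-the-envelope version of this tuning step is sketched below; the memory figures are illustrative, and real frameworks measure activation footprints empirically:&lt;br /&gt;

```python
def max_batch_size(memory_budget_mb, model_mb, per_sample_mb, reserve_mb=16):
    # Memory left after loading the model and keeping a safety
    # reserve, divided by the per-sample activation footprint.
    available = memory_budget_mb - model_mb - reserve_mb
    return max(0, int(available // per_sample_mb))

# A device with 512 MB free, a 120 MB model, and roughly 6 MB of
# activation memory per sample (all numbers illustrative).
batch = max_batch_size(512, 120, 6)
```

A result of zero signals that the model itself does not fit within the budget, in which case the workload must be partitioned or offloaded as described above.&lt;br /&gt;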
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes.&lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data, but for basic applications on edge devices they are sufficient to accomplish many tasks. They are often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they run on local devices without sharing user data with the cloud. This is useful for a wide variety of edge applications. If needed, and where privacy constraints permit, an SLM can query an LLM for more complicated tasks. Not every prompt then leads to a query, so the network traffic, privacy, and latency benefits are largely preserved.&lt;br /&gt;
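&lt;br /&gt;
This routing pattern can be sketched as follows; the SLM and LLM here are hypothetical stand-ins that return canned answers with a confidence score:&lt;br /&gt;

```python
def answer(prompt, slm, llm, threshold=0.75):
    # Serve the prompt from the on-device SLM when it is confident;
    # query the remote LLM only for prompts the SLM cannot handle.
    reply, confidence = slm(prompt)
    if confidence >= threshold:
        return reply, "on-device"
    return llm(prompt), "cloud"

# Hypothetical stand-ins: the SLM answers simple prompts confidently
# and reports low confidence on anything else.
slm = lambda p: ("turning the lights off", 0.9) if "lights" in p else ("", 0.1)
llm = lambda p: "a detailed cloud-generated answer"

local = answer("turn off the lights", slm, llm)
remote = answer("summarize this legal contract", slm, llm)
```

Simple prompts never leave the device, so user data is exposed to the network only for the rare queries that genuinely need the larger model.&lt;br /&gt;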
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=903</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=903"/>
		<updated>2025-04-25T03:30:36Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 4.3 ML Model Optimization at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with the training of algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise in the field of machine learning, it is inevitable that deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilize it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. The nature of edge devices sharing computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices can compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way of providing large amounts of good, relevant data for a variety of applications.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors, or other devices that are meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the data available about them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environmental and industrial settings could be trained to recognize anomalies or undesirable behavior, thus ensuring a quick response and proactive reporting [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users&#039; rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can be used to provide a more efficient means of training, and to make the heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that involves reducing the precision of numbers, thus easing the computational and memory burden on edge devices. There are multiple forms of quantization, but each one essentially sacrifices some precision - enough that accuracy is mostly maintained but the numbers are easier to handle. For example, converting from floating point to integer datatypes means significantly less memory is used, and for some models the difference in precision may be negligible. Another example is K-means based Weight Quantization, which involves creating a matrix and grouping similar numbers together with centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
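As a rough illustration of the first form mentioned above, the following sketch performs symmetric int8 post-training quantization of a float32 weight array. The weight values are made up for the example, and real frameworks add per-channel scales and calibration, but the core idea is the same:

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric quantization: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float32 weights.
    return q.astype(np.float32) * scale

w = np.array([0.82, -0.41, 0.05, -1.27], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than float32, and the round-trip error
# is bounded by half a quantization step (scale / 2).
```

The accuracy cost is the rounding error introduced by the shared scale, which for many models is small enough to be negligible.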
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between different nodes and layers are removed to reduce the overall computation. Certain links between the layers are much more important than others, and only some will significantly affect the accuracy. By cutting out the least necessary ones, nearly the same accuracy can be achieved with much less computation. This can also be applied to entire neurons in the neural network that are deemed least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
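A minimal sketch of one common criterion, magnitude-based pruning, assuming the weights are a plain NumPy matrix (the values below are invented for illustration): the smallest-magnitude fraction of connections is zeroed, leaving the most influential links intact.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the `sparsity` fraction of weights with smallest magnitude.
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold    # keep strictly larger weights
    return weights * mask

w = np.array([[0.9, -0.02, 0.4],
              [0.01, -0.7, 0.03]], dtype=np.float32)
pruned = magnitude_prune(w, sparsity=0.5)
# Half the connections are removed; 0.9, 0.4 and -0.7 survive.
```

Pruned matrices can then be stored in sparse formats so the zeroed connections cost neither memory nor multiplications.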
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
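The soft-label idea can be sketched as a loss function. This is a generic temperature-softened distillation loss in the style described above, not tied to any particular framework, and the logits below are made-up numbers:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature T > 1 softens the distribution, exposing the
    # teacher's relative preferences among the non-top classes.
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between the teacher's softened distribution and
    # the student's; the T*T factor rescales the gradients.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -T * T * float(np.sum(p_t * np.log(p_s + 1e-12)))

teacher      = [0.5, 1.2, 3.1, 6.0]   # confident in class 3
good_student = [0.4, 1.0, 3.0, 5.8]   # tracks the teacher's shape
bad_student  = [6.0, 1.0, 0.5, 0.4]   # disagrees with the teacher
```

A student whose logits track the teacher's distribution incurs a lower loss than one that merely commits to some hard label, which is exactly the richer supervision signal described above.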
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and is able to identify different images, generate chats and utilize human language, and optimize different systems, the next step is for it to be able to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; then it must think about the best means to accomplish the task; and finally it must be able to act on it. Three aspects of AI agents are crucial to their development if they are to carry out tasks efficiently and correctly:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying out a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases.&lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs which query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Using edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may be able to better learn a user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology currently consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources that they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
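Once per-layer profiles exist, the offloading decision reduces to a search over split points. The sketch below uses invented latency numbers, and real frameworks of this kind also weigh energy and server load, but the structure of the decision is the same:

```python
def best_split(edge_ms, cloud_ms, transfer_ms):
    # edge_ms[i] / cloud_ms[i]: profiled latency of layer i on each side.
    # transfer_ms[k]: cost of shipping the activation at boundary k
    # (k = 0 is the raw input; k = n means nothing is uploaded).
    # Split k runs layers [0, k) on the edge and the rest in the cloud.
    n = len(edge_ms)
    def total(k):
        return sum(edge_ms[:k]) + transfer_ms[k] + sum(cloud_ms[k:])
    best = min(range(n + 1), key=total)
    return best, total(best)

# Hypothetical 3-layer model: the early layers shrink the data a lot,
# so splitting after layer 1 beats both all-cloud and all-edge runs.
split, cost = best_split([5, 5, 20], [1, 1, 2], [50, 8, 6, 0])
```

Here the all-cloud plan costs 54 ms (dominated by uploading raw input) and the all-edge plan 30 ms, while handing off after the first layer costs only 16 ms.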
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been somewhat pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that have the ability to be executed, and therefore the system can continue working.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computation usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device has all of its computing resources or network capabilities consumed by another workload.&lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. Then, a machine learning model utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and these runtime updates, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is specifically deployed are the Neurosurgeon and EdgeShard systems cited above.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split the workloads in order to optimize the machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning involves splitting up the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if a small amount of computational power is required, this type of work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning so far. If a task is completed on an edge device and the accuracy is found to be very low, it can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the learning model needs only a small amount of work to reach the threshold deemed acceptable, it may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities that are found in edge devices. Similar functionality and determination to what is found in vertical partitioning is done, but all of the devices that the workload is split across function on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used.&lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
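As a toy example of horizontal partitioning, a batch of work can be divided across same-layer devices in proportion to their profiled throughput. The device names and speed figures below are hypothetical:

```python
def split_batch(n_items, device_speeds):
    # Give each device a share proportional to its measured throughput;
    # any rounding remainder goes to the fastest device.
    total = sum(device_speeds)
    shares = [n_items * s // total for s in device_speeds]
    fastest = device_speeds.index(max(device_speeds))
    shares[fastest] += n_items - sum(shares)
    return shares

# A smartphone (relative speed 3), a Raspberry Pi (2) and a sensor
# hub (1) splitting a batch of 100 inference requests.
shares = split_batch(100, [3, 2, 1])
```

With these numbers the smartphone receives 51 requests, the Raspberry Pi 33, and the sensor hub 16, so every device finishes in roughly the same wall-clock time.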
&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to each node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which contains the main model. These periodic updates are done by each node, and by only processing part of the data, it is much easier for edge devices to handle these workloads. It can also reduce the time and computational burden on the cloud and network, because the central server is not performing all of the computations alone. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model, and parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices being used as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
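A minimal sketch of the synchronous variant of parallel gradient descent on a least-squares problem: each node computes a gradient on its own shard only, and the central server averages the gradients into one update. The data, shard layout, and learning rate are illustrative:

```python
import numpy as np

def local_gradient(w, X, y):
    # Least-squares gradient computed on one node's shard of the data.
    return 2.0 * X.T @ (X @ w - y) / len(y)

def parallel_step(w, shards, lr=0.1):
    # Central server averages the per-node gradients (synchronous update).
    grads = [local_gradient(w, X, y) for X, y in shards]
    return w - lr * np.mean(grads, axis=0)

# Two nodes, each holding part of a dataset generated by y = 2x.
shards = [
    (np.array([[1.0]]), np.array([2.0])),
    (np.array([[2.0]]), np.array([4.0])),
]
w = np.zeros(1)
for _ in range(50):
    w = parallel_step(w, shards)
# w converges toward the true coefficient 2.0 without any node
# ever holding the full dataset in memory.
```

No node ever sees more than its own shard, which is what keeps per-device memory within the constraints described above.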
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which the updates are given to the model. Each node may have different amounts of data, memory, and computational power, and therefore the time it takes to process and update will not be the same. Synchronous algorithms make sure that each node finishes its respective processing, and then all of the calculations from all nodes are sent to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; this also leaves nodes idle while others are still processing, and can be highly inefficient. To optimize this, asynchronous scheduling has each node send its results to the main server for an update as soon as it finishes, rather than waiting for all the others. However, this requires more communication, as after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency that is gained is often worth it.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and then the pretrained model is sent to the edge device. This is useful because it reduces the computational burden on the edge device, and allows it to fine-tune the model using the data it collects without having to train the whole system from scratch. One form of this is knowledge distillation, in which a smaller model is trained to mimic the behavior of a larger model. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
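The fine-tuning side of this can be sketched as follows, with a made-up frozen feature extractor standing in for the cloud-pretrained model and synthetic data standing in for the device's local observations; only the lightweight head is trained on the device.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_features(x):
    # Stand-in for a cloud-pretrained feature extractor; its weights
    # are fixed and never updated on the edge device.
    W = np.array([[1.0, -1.0], [0.5, 2.0]])
    return np.tanh(x @ W)

# Local data collected on the device (synthetic for this sketch).
X = rng.normal(size=(64, 2))
y = (X[:, 0] > 0).astype(float)

F = frozen_features(X)
w = np.zeros(2)                            # only this small head is trained

def loss(w):
    p = 1.0 / (1.0 + np.exp(-F @ w))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

before = loss(w)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-F @ w))       # logistic head
    w -= 0.5 * F.T @ (p - y) / len(y)      # gradient step on the head only
after = loss(w)
# Fine-tuning the head alone reduces the loss on the local data,
# at a tiny fraction of the cost of training the full model.
```

The expensive extractor is computed but never updated, which is what makes the workload feasible on a constrained device.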
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, and therefore the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases the workload on them because less data must be processed, while also conserving the accuracy of the model as much as possible. Some larger models may not be needed either; for example, a Small Language Model that can handle simple commands and prompts can be placed on an edge device, rather than offloading an entire LLM onto the device, which may overload its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and are able to help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes.&lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of data they are trained on, but for the purposes of basic applications and edge devices, they can be sufficient to accomplish many tasks. They are also often fine-tuned and trained to accomplish the specific tasks for which they are deployed, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy because they can be run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, and thus network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
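This hybrid pattern can be sketched as a simple confidence-gated router. The models, confidence scores, and threshold below are stand-ins, not a real API:

```python
def answer(prompt, slm, llm, threshold=0.75):
    # Serve the prompt on-device when the SLM is confident enough;
    # escalate to the cloud LLM only for hard cases, so most traffic
    # never leaves the device.
    text, confidence = slm(prompt)
    if confidence >= threshold:
        return text, "edge"
    return llm(prompt), "cloud"

# Toy stand-ins: this "SLM" is only confident on short prompts.
slm = lambda p: ("local reply", 0.9 if len(p) < 40 else 0.3)
llm = lambda p: "cloud reply"

r1 = answer("turn on the lights", slm, llm)
r2 = answer("summarize this forty-plus character request in detail please", slm, llm)
```

Only the second, low-confidence request generates any network traffic, which is the property the paragraph above relies on.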
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=902</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=902"/>
		<updated>2025-04-25T03:25:13Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Dealing With Challenges */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence concerned with training algorithms to examine sets of data, analyze patterns, and draw conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant to edge computing [7]. Given the recent rise of machine learning, the deployment of machine learning algorithms will inevitably play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be put to effective use, and automation methods can be employed to improve efficiency. Running machine learning on the edge devices themselves can also ease the burden on cloud systems and the networks connected to them, especially if the model does not require large computational power, as in the case of certain Small Language Models (SLMs). However, the main challenges of machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architecture, machine learning can be deployed in a distributed manner, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, they are well suited to taking large datasets and splitting the computational load, so that many devices together can compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows for deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data that is processed from edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective means of providing large amounts of relevant, high-quality data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to remain efficient and safe. A key benefit of edge computing lies in the reduced latency provided by placing devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is needed for a personal application, many users prefer the privacy of having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices that are meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even when the devices are computationally capable of running the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can be combined with machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is reduced by a lot. This local data handling is able to prevent exposure of personal or confidential information, providing users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations easily comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that provide a more efficient means of training and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of the numbers used by a model, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting weights from floating point to integer datatypes uses significantly less memory, and for some models the resulting loss of precision may be negligible. Another example is K-means based weight quantization, which groups similar values in a weight matrix together and represents each group by its centroid. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
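As a rough illustration of the idea (a sketch, not taken from any particular framework), the K-means approach can be written in a few lines of NumPy: the weights are clustered, and the matrix is stored as a small codebook of centroids plus an index per weight. The cluster count, iteration count, and example matrix below are arbitrary choices.&lt;br /&gt;

```python
import numpy as np

def kmeans_quantize(weights, n_clusters=4, n_iter=20, seed=0):
    """Cluster the weights and replace each one with its nearest centroid."""
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    # Initialise centroids by sampling existing weight values.
    centroids = rng.choice(flat, size=n_clusters, replace=False)
    for _ in range(n_iter):
        # Assignment step: each weight goes to its closest centroid.
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Update step: move each non-empty centroid to the mean of its members.
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = flat[assign == k].mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids, assign.reshape(weights.shape)

# A small example weight matrix (values chosen to form rough groups).
W = np.array([[2.09, -0.98, 1.48, 0.09],
              [0.05, -0.14, -1.08, 2.12],
              [-0.91, 1.92, 0.00, -1.03],
              [1.87, 0.00, 1.53, 1.49]])
codebook, idx = kmeans_quantize(W)
W_quant = codebook[idx]  # reconstructed low-precision weights
```

After quantization, only the small codebook needs full precision; each weight is stored as an index into it, which is where the memory saving comes from.&lt;br /&gt;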
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between nodes and layers are removed to reduce the overall computation. Certain links between the layers matter much more than others, and therefore only some will significantly affect the accuracy. By cutting out the least necessary connections, nearly the same accuracy and training can be achieved with much less computation. The same idea can also be applied to entire neurons in the neural network that are deemed least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
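A minimal sketch of magnitude-based pruning (one common criterion; the sparsity level and weight matrix below are illustrative) zeroes out the smallest weights so they can be skipped in storage and computation:&lt;br /&gt;

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold   # keep only the stronger connections
    return weights * mask

W = np.array([[0.01, -0.90, 0.30],
              [1.20, -0.02, 0.50],
              [0.04, 0.80, -0.03]])
W_pruned = prune_by_magnitude(W, sparsity=0.5)
# Roughly half the entries are now exactly zero and can be skipped.
```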
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. &lt;br /&gt;
&lt;br /&gt;
In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common—but emerging—practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
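The soft-label training signal described above can be sketched as a loss function; the temperature, the soft/hard weighting, and the example logits below are illustrative hyperparameters, not values from the text:&lt;br /&gt;

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Blend soft-label cross-entropy against the teacher with hard-label CE."""
    p_teacher = softmax(teacher_logits, T)       # soft targets over all classes
    p_student_soft = softmax(student_logits, T)  # student at the same temperature
    soft_ce = -np.sum(p_teacher * np.log(p_student_soft))
    hard_ce = -np.log(softmax(student_logits)[hard_label])
    return alpha * soft_ce + (1 - alpha) * hard_ce

teacher_logits = np.array([1.0, 2.5, 3.5])  # large cloud-trained model
student_logits = np.array([0.8, 2.0, 3.0])  # small on-device model
loss = distillation_loss(student_logits, teacher_logits, hard_label=2)
```

The student is rewarded for matching the whole probability distribution of the teacher, not just the winning class, which is what conveys the richer supervision described above.&lt;br /&gt;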
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops the ability to identify different images, generate conversations in human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and its task, often specified by a user, then reason about the best means to accomplish the task, and finally act on it. Three aspects of AI agents are crucial to their ability to efficiently and correctly carry out tasks:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying out a task for optimal results, it must have the actual means to do so. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. There are methods to accomplish this, however, such as SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Using edge devices can be paramount if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome for that user.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology at present consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that are able to be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then uses that data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and the updates at runtime, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is specifically deployed are the Neurosurgeon and EdgeShard systems, referenced above. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning in progress. If inference is performed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of work to reach an acceptable threshold, the task may be kept on edge devices to relieve network traffic to the cloud and reduce latency [3].&lt;br /&gt;
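This kind of confidence-based escalation can be sketched as follows; the models, the threshold, and the toy inputs are hypothetical stand-ins, not from any cited system:&lt;br /&gt;

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def classify(x, edge_model, cloud_model, confidence_threshold=0.8):
    """Answer on the edge when confident; otherwise escalate to the cloud."""
    probs = softmax(edge_model(x))
    if probs.max() >= confidence_threshold:
        return int(probs.argmax()), "edge"
    # Low confidence: send the task up a layer to the larger model.
    return int(np.argmax(cloud_model(x))), "cloud"

# Toy stand-ins: the edge model is only confident for positive inputs.
def edge_model(x):
    return np.array([4.0, 0.1, 0.2]) if x > 0 else np.array([0.4, 0.5, 0.6])

def cloud_model(x):
    return np.array([0.0, 9.0, 0.0])

easy = classify(1.0, edge_model, cloud_model)    # handled locally
hard = classify(-1.0, edge_model, cloud_model)   # escalated to the cloud
```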
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting work among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Similar profiling and workload determination to that found in vertical partitioning is done, but all of the devices that the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which contains the main model. Each node performs these periodic updates, and by processing only part of the data, edge devices can handle these workloads much more easily. This can also reduce the time and computational burden on the cloud and network, because the central server is not performing all of the computations alone. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process but, instead of a single operation, aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices acting as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
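As a hedged sketch of synchronous parallel gradient descent on a simple linear model (the data, shard count, and learning rate are made up for illustration), each node computes the gradient on its own shard and the server averages them:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(400, 2))
y = X @ true_w + 0.01 * rng.normal(size=400)

# Split the dataset across 4 "edge nodes" of 100 examples each.
shards = np.array_split(np.arange(400), 4)

w = np.zeros(2)
lr = 0.1
for _ in range(200):
    grads = []
    for idx in shards:  # in a real system, each node runs this in parallel
        Xi, yi = X[idx], y[idx]
        grads.append(2 * Xi.T @ (Xi @ w - yi) / len(idx))  # local MSE gradient
    w = w - lr * np.mean(grads, axis=0)  # server aggregates and updates
# w converges near true_w
```

Averaging the shard gradients here is mathematically the same as one full-batch gradient step, but no single node ever has to hold the whole dataset.&lt;br /&gt;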
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are given to the model. Each node may have different amounts of data, memory, and computational power, and therefore the time it takes to process and update will not be the same. Synchronous algorithms wait until every node finishes its respective processing, and then all of the calculations from all nodes are used to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. To optimize this, asynchronous scheduling has each node send its results to the main server for an update as soon as it finishes, rather than waiting for all the others. However, this requires more communication, as after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
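A toy calculation (with entirely made-up node timings) shows why asynchronous updates help: in the time one synchronous round takes, faster nodes can contribute several updates instead of sitting idle:&lt;br /&gt;

```python
# Made-up per-update processing times (seconds) for three unequal nodes.
node_times = {"sensor": 9.0, "phone": 2.0, "raspberry_pi": 4.0}

# Synchronous: every round lasts as long as the slowest node,
# so the window below yields exactly one joint update.
sync_round = max(node_times.values())

# Asynchronous: each node sends its update as soon as it finishes,
# so in the same window the faster nodes contribute several updates.
updates_in_window = {name: int(sync_round // t) for name, t in node_times.items()}
```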
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and then the pretrained model is sent to the edge device. This can be useful, as it reduces the computational burden on the edge device, and allows it to fine-tune the model using the data it collects without having to completely train the whole system. One form of this is knowledge distillation, in which a smaller model can be trained to mimic that of a larger model. This may often be the case when edge and cloud systems are used in a combined way.&lt;br /&gt;
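A minimal sketch of the idea, with entirely synthetic data: the feature extractor below stands in for the cloud-pretrained portion of the model and stays frozen, while the edge device fits only a small head on its local data:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the portion pretrained on a large cloud dataset;
# the edge device receives it and keeps it frozen.
pretrained_features = rng.normal(size=(10, 3))  # 10-dim input, 3 features

# Small dataset collected locally on the edge device.
X_local = rng.normal(size=(50, 10))
y_local = rng.normal(size=50)

# "Fine-tuning" here is just a least-squares fit of the small head
# on top of the frozen features.
F = X_local @ pretrained_features
head, *_ = np.linalg.lstsq(F, y_local, rcond=None)

def predict(x):
    return (x @ pretrained_features) @ head  # cheap on-device inference
```

Only the three head parameters are trained on-device; the bulk of the representation was paid for once, in the cloud.&lt;br /&gt;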
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, and therefore the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases the workload on them, because less data must be processed, while preserving the accuracy of the model as much as possible. Larger workloads may not be needed at all: for example, a Small Language Model that handles simple commands and prompts can be placed on an edge device rather than an entire LLM, which might overload its computational abilities without providing enough additional use.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they package all of the resources and data the device needs to process the work. By examining the computational abilities of each device, different-sized workloads can be allocated as deemed necessary to maximize the efficiency of training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a model is very important when considering the memory constraints of edge devices. Smaller batch sizes allow for quicker and more efficient training and are less likely to lead to memory bottlenecks. This is related to partitioning, because the computing and memory capacity of the available devices are key factors to consider.&lt;br /&gt;
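As one hedged illustration, a device might pick the largest batch that fits its memory; the simple cost model here (fixed bytes per sample) is an assumption:&lt;br /&gt;

```python
def max_batch_size(memory_budget_bytes, bytes_per_sample, model_bytes):
    # Memory left once the model weights are resident on the device.
    free = memory_budget_bytes - model_bytes
    if free > 0:
        return free // bytes_per_sample   # largest whole number of samples
    return 0
```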
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud based, but this can lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and lack the vast knowledge of LLMs and the amount of data they are trained on, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sending user data to the cloud, which is useful for a wide variety of edge applications. If needed, and privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, so network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
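The SLM-first pattern described above can be sketched as follows; the confidence threshold, the stub models, and the routing function are all invented for illustration:&lt;br /&gt;

```python
CONFIDENCE_THRESHOLD = 0.8   # assumed cut-off for answering on-device
llm_queries = []             # tracks what actually leaves the device

def local_slm(prompt):
    # Stand-in for a small on-device model returning (answer, confidence).
    if prompt in {"turn on lights", "set timer"}:
        return ("ok: " + prompt, 0.95)
    return ("unsure", 0.3)

def remote_llm(prompt):
    # Stand-in for a cloud LLM call; the network round-trip happens only here.
    llm_queries.append(prompt)
    return "llm answer for: " + prompt

def answer(prompt):
    reply, confidence = local_slm(prompt)
    if confidence >= CONFIDENCE_THRESHOLD:
        return reply              # stays local: low latency, data kept private
    return remote_llm(prompt)     # escalate only the hard cases
```

Only low-confidence prompts generate network traffic, which is why latency and privacy benefits are mostly preserved.&lt;br /&gt;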
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=901</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=901"/>
		<updated>2025-04-25T03:03:55Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Dealing With Challenges */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence primarily concerned with training algorithms to examine sets of data, analyze patterns, and draw conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on the edge devices themselves can also help ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue with machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, large datasets can be split so that many devices together compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split across devices as well. This allows for deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way of providing large amounts of relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by processing it locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well-known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can make training more efficient and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that involves reducing the precision of numbers, easing both the computational and memory burden on edge devices. There are multiple forms of quantization, but each sacrifices some precision - enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating point to integer datatypes means significantly less memory is used, and for some models the difference in precision may be negligible. Another example is K-means based Weight Quantization, which involves grouping similar values in a weight matrix together and representing them with centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
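A minimal sketch of the clustering idea, with hand-picked centroids standing in for ones learned by K-means:&lt;br /&gt;

```python
def quantize(weights, centroids):
    # Replace each weight by the index of its nearest centroid, so only a
    # small index plus a short centroid table needs to be stored.
    return [min(range(len(centroids)), key=lambda i: abs(w - centroids[i]))
            for w in weights]

def dequantize(indices, centroids):
    # Reconstruct approximate weights from the centroid table.
    return [centroids[i] for i in indices]
```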
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between nodes and layers are removed to reduce the overall computation. Certain links between the layers are much more important than others, and therefore only some significantly affect the accuracy. By cutting out the least necessary ones, nearly the same accuracy and training can be achieved with much less computation. This can also be applied to entire neurons in the neural network that are deemed the least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
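One common heuristic consistent with this, magnitude pruning, can be sketched as follows; the keep-fraction logic is an illustrative assumption:&lt;br /&gt;

```python
def prune(weights, fraction):
    # Zero out the given fraction of connections with the smallest magnitude.
    keep = len(weights) - int(len(weights) * fraction)
    ranked = sorted(weights, key=abs, reverse=True)   # strongest first
    threshold = abs(ranked[keep - 1]) if keep > 0 else float("inf")
    return [w if abs(w) >= threshold else 0.0 for w in weights]
```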
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. Distillation can also be combined with quantization or pruning to further optimize performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
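The soft-label target can be sketched with a temperature-softened softmax; the temperature value and logits below are invented:&lt;br /&gt;

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature spreads probability mass across more classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_probs, temperature=2.0):
    # Cross-entropy of the student predictions against the soft labels
    # produced by the teacher.
    soft_targets = softmax(teacher_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student_probs))
```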
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and is able to identify images, generate conversations in human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; then it must consider the best means to accomplish the task; and finally it must act. Three aspects of AI agents are crucial to their ability to efficiently and correctly carry out tasks:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs that query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Using edge devices can be essential if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment in which edge devices must function, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices are leveraged. A common approach is to run certain computing tasks on the relevant devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can function efficiently. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
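A hedged sketch of such a profiling-driven placement decision (the per-layer timings and transfer costs are invented, and this is not the actual EdgeShard or Neurosurgeon algorithm):&lt;br /&gt;

```python
def place_layers(edge_ms, cloud_ms, transfer_ms):
    # edge_ms / cloud_ms: profiled per-layer latency on each tier;
    # transfer_ms: cost of shipping the activations of that layer upward.
    plan = []
    for layer in range(len(edge_ms)):
        # Keep the layer on the edge unless the cloud wins even after
        # paying the transfer cost.
        if cloud_ms[layer] + transfer_ms[layer] >= edge_ms[layer]:
            plan.append("edge")
        else:
            plan.append("cloud")
    return plan
```

Layers stay local when the cloud speedup does not cover the transfer cost, which is the intuition behind the profiling step described above.&lt;br /&gt;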
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments in which edge devices must function, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device has all of its computing resources or network capabilities consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and these runtime updates, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is specifically deployed are the Neurosurgeon and EdgeShard systems, described above. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, vertical partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning in question. If inference is completed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of work to reach an acceptable threshold, it may be sent to edge devices to reduce network traffic to the cloud and reduce latency [3].&lt;br /&gt;
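The confidence-based escalation described above can be sketched as follows; the threshold and both model stubs are invented:&lt;br /&gt;

```python
ACCEPT_THRESHOLD = 0.9   # assumed acceptable-confidence cut-off

def edge_model(sample):
    # Stand-in for a small local model returning (prediction, confidence).
    return ("cat", 0.95) if sample == "easy" else ("dog", 0.4)

def cloud_model(sample):
    # Stand-in for the heavyweight cloud model, used only when needed.
    return ("dog", 0.99)

def classify(sample):
    prediction, confidence = edge_model(sample)
    if confidence >= ACCEPT_THRESHOLD:
        return prediction          # confident enough: no cloud round-trip
    return cloud_model(sample)[0]  # low confidence: send it up a layer
```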
&lt;br /&gt;
The second model of partitioning is horizontal partitioning. This involves splitting work among the devices within a single layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous capabilities found in edge devices. The same kind of profiling and decision-making used in vertical partitioning is performed, but all of the devices that the workload is split across sit on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of all of the data being given to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which holds the main model. Each node performs these periodic updates, and because each one processes only part of the data, edge devices can handle the workloads much more easily. It can also reduce the time and computational burden on the cloud and network, because the central server is no longer performing all of the computations. One popular way of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
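The aggregation step can be sketched in a few lines. The 1-D linear model, the data shards, and the learning rate below are illustrative; real systems would run the shard gradients on separate nodes in parallel.&lt;br /&gt;

```python
# Minimal sketch of parallel gradient descent for a 1-D linear model
# y = w * x: each node computes a gradient on its own data shard, and the
# central server averages the shard gradients to update the shared weight.

shards = [
    [(1.0, 2.0), (2.0, 4.0)],   # node 0's subset of (x, y) pairs
    [(3.0, 6.0), (4.0, 8.0)],   # node 1's subset
]

def shard_gradient(w, data):
    # Gradient of mean squared error 0.5*(w*x - y)^2 over one shard.
    return sum((w * x - y) * x for x, y in data) / len(data)

def train(w, lr=0.05, steps=100):
    for _ in range(steps):
        grads = [shard_gradient(w, shard) for shard in shards]  # one per node
        w = w - lr * sum(grads) / len(grads)   # aggregate on central server
    return w

w = train(w=0.0)
print(round(w, 3))  # converges toward the true slope of 2.0
```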
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, and then the calculations from all nodes are sent together to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling addresses this: as soon as a node finishes its respective processing, it sends its results to the main server for an update rather than waiting for all the others to finish. However, this requires more communication, since after each update every node must fetch the updated model from the central server; this can be costly to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
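The difference can be illustrated with a small simulation: the server applies whichever node finishes first instead of waiting at a synchronous barrier for the slowest node. The node speeds and update counts below are made-up numbers.&lt;br /&gt;

```python
# Illustrative simulation of asynchronous updates: each node has a different
# step time, and the central server applies updates in order of arrival.

import heapq

step_time = {"node-a": 1.0, "node-b": 2.5, "node-c": 4.0}

def async_schedule(updates_per_node=3):
    # (finish_time, node) events; pop the earliest finisher each time.
    events = [(t, n) for n, t in step_time.items()]
    heapq.heapify(events)
    order = []
    done = {n: 0 for n in step_time}
    while events:
        t, n = heapq.heappop(events)
        order.append((t, n))          # server applies n's update at time t
        done[n] += 1
        if done[n] != updates_per_node:
            heapq.heappush(events, (t + step_time[n], n))
    return order

for t, n in async_schedule():
    print(f"t={t:4.1f}: update from {n}")
```

Note how the fast node contributes several updates before the slow node finishes even one; under a synchronous barrier, every round would take as long as the slowest node.&lt;br /&gt;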
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device executing it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This is useful because it reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
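A hedged sketch of the fine-tuning pattern: a "pretrained" feature extractor stays frozen on the edge device, and only a small linear head is trained on local data. The extractor, the data, and the learning rate are all illustrative stand-ins.&lt;br /&gt;

```python
# Sketch of transfer learning at the edge: the cloud-pretrained extractor is
# frozen, and only a small logistic-regression head is fine-tuned locally.

import math

def frozen_features(x):
    # Stand-in for a cloud-pretrained extractor; never updated on the edge.
    return [x, x * x]

data = [(0.5, 0), (1.0, 0), (2.0, 1), (3.0, 1)]   # local (input, label) pairs

def fine_tune_head(steps=5000, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(steps):
        for x, y in data:
            f = frozen_features(x)
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))        # sigmoid
            g = p - y
            # Only the head's parameters receive gradient updates.
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b = b - lr * g
    return w, b

def predict(x, w, b):
    f = frozen_features(x)
    z = w[0] * f[0] + w[1] * f[1] + b
    return 1 if z > 0 else 0

w, b = fine_tune_head()
print([predict(x, w, b) for x, y in data])
```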
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so only the data that most affects the accuracy of a model need be offloaded to edge devices. This decreases the workload on those devices, because less data must be processed, while preserving the accuracy of the model as much as possible. Larger artifacts may not be needed at all; for example, an edge device that only needs to handle simple commands and prompts can run a Small Language Model, rather than having an entire LLM offloaded onto it, which could overwhelm its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed into a smaller form in order to fit the constraints of edge devices. This is especially valuable given their limited memory, and it also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
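The batch-size constraint above amounts to simple arithmetic against the device memory budget. All sizes in the sketch (model footprint, per-sample activation memory, device RAM) are made-up numbers; real values would come from profiling the model.&lt;br /&gt;

```python
# Illustrative back-of-envelope batch-size check for a memory-constrained
# edge device. All sizes here are hypothetical.

MB = 1024 * 1024
device_ram      = 512 * MB   # memory available for training
model_footprint = 180 * MB   # weights + optimizer state, batch-independent
per_sample_acts =  24 * MB   # activation memory per sample in the batch

def largest_batch():
    budget = device_ram - model_footprint
    return budget // per_sample_acts

print(largest_batch())  # 13 samples fit within this hypothetical budget
```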
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can assist with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data behind them, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks for which they are deployed, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This can be useful for a wide variety of edge applications. If needed, and if privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, so network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
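The SLM-first routing pattern can be sketched as below. Both "models" are stubs and the confidence threshold is illustrative; the point is only that uncertain prompts, and only those, trigger a cloud query.&lt;br /&gt;

```python
# Hypothetical sketch of SLM-first routing: the on-device small model answers
# when it is confident, and only uncertain prompts go to a cloud LLM.

ROUTE_THRESHOLD = 0.8

def slm_answer(prompt):
    # Stub SLM: confident on short, simple prompts only.
    words = len(prompt.split())
    confidence = 0.5 if words > 4 else 0.95
    return f"slm:{prompt}", confidence

def llm_answer(prompt):
    # Stub for a remote LLM call; costs latency and network traffic.
    return f"llm:{prompt}"

def route(prompt):
    answer, confidence = slm_answer(prompt)
    if confidence >= ROUTE_THRESHOLD:
        return answer               # stays local: private and fast
    return llm_answer(prompt)       # escalate only when necessary

print(route("turn lights off"))
print(route("summarize this very long quarterly financial report please"))
```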
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=900</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=900"/>
		<updated>2025-04-25T03:03:29Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Dealing With Challenges */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence primarily concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issues regarding machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. The way edge devices share computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices can together compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split up across edge devices as well. This allows the deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data that is processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing could thus be an effective way of providing abundant, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is involved, many users would prefer the privacy afforded by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The use of data to make inferences about what is happening in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By monitoring a constant stream of data and training machine learning models to detect or even respond to different events, such systems can have many practical applications. These rely on the combination of edge devices and machine learning to enhance the user experience and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is reduced by a lot. This local data handling is able to prevent exposure of personal or confidential information, providing users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations easily comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can be used to provide more efficient training and to make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that involves reducing the precision of numbers, thus easing the burden on the edge devices&#039; computational power as well as their memory. There are multiple forms of quantization, but each one sacrifices some precision: enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating point to integer datatypes means significantly less memory is used, and for some models the difference in precision may be negligible. Another example is K-means based weight quantization, which involves taking a weight matrix and grouping similar numbers together around centroids. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
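A toy version of K-means weight quantization in the spirit described above: cluster the weight values around a few centroids, then store only a small index per weight plus the centroid table. The weight values and centroid count are illustrative.&lt;br /&gt;

```python
# Toy K-means weight quantization: each weight is replaced by the nearest
# of a handful of centroids, so only indices plus a centroid table remain.

def kmeans_1d(values, centroids, iters=20):
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            i = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            groups[i].append(v)
        # Move each centroid to the mean of its group.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

weights = [2.09, 2.12, 1.92, 1.87, 0.05, -0.02, -1.03, -0.98]
centroids = kmeans_1d(weights, centroids=[-1.0, 0.0, 2.0])

indices = [min(range(3), key=lambda j: abs(w - centroids[j])) for w in weights]
quantized = [centroids[i] for i in indices]
print(centroids)
print(quantized)   # each weight replaced by its nearest centroid
```

Storing a 2-bit index per weight instead of a 32-bit float is where the memory saving comes from; the small centroid table is shared across the whole matrix.&lt;br /&gt;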
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between different nodes and layers are taken out to remove the overall computation. Certain links between the layers are much more important than others and therefore only some will significantly affect the accuracy. By cutting out the least necessary ones, the same accuracy and training can be achieved with much less computation. This can also be applied to entire neurons in the neural network which are seen as the least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
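A minimal sketch of magnitude-based pruning, matching the idea above: the smallest-magnitude connections contribute least, so they are zeroed out. The weight values and the sparsity target are illustrative.&lt;br /&gt;

```python
# Toy magnitude pruning: zero out the smallest-magnitude fraction of a
# layer's weights, keeping the connections that matter most.

def prune(weights, sparsity=0.5):
    n_drop = int(len(weights) * sparsity)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(by_magnitude[:n_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

layer = [0.01, -0.80, 0.03, 0.55, -0.02, 0.90, -0.04, 0.20]
print(prune(layer))  # half the connections removed, large weights survive
```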
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels—discrete class labels like 0, 1, or 2—it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry—when class distributions are well-separated and aligned—and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. Distillation can also be combined with quantization or pruning to further optimize performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|500px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
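A toy distillation loss following the soft-label idea above: the student is penalized by cross-entropy against the teacher&#039;s temperature-softened probability distribution rather than a one-hot label. The logits and the temperature are illustrative, and practical recipes usually mix this term with a hard-label loss.&lt;br /&gt;

```python
# Toy knowledge-distillation loss: cross-entropy between the teacher's and
# the student's softened (temperature-scaled) softmax distributions.

import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=3.0):
    teacher_p = softmax(teacher_logits, temperature)   # soft labels
    student_p = softmax(student_logits, temperature)
    # Cross-entropy between teacher and student distributions.
    return -sum(t * math.log(s) for t, s in zip(teacher_p, student_p))

teacher = [1.0, 4.0, 9.0]   # teacher is most confident in class 2
student = [0.5, 0.5, 2.0]
print(round(distillation_loss(student, teacher), 3))
```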
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and becomes able to identify different images, generate conversations in human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. This is known as an AI agent. The agent must first perceive its environment and task, often specified by a user; it must then reason about the best means of accomplishing the task; and finally it must be able to act on it. Three aspects of AI agents are crucial to their ability to efficiently and correctly carry out tasks:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to go about carrying a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs which query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Using edge devices can be essential if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, by using edge devices specific to a user, an agent may be better able to learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology currently consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of the partitioning of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should run on the cloud and which should run on the edge. This is a crucial problem to solve, as a correct partition of workloads is the best way to ensure that the respective benefits of the devices are leveraged. A common approach is to run representative computing tasks on the candidate devices and measure the time and resources they take. Examples of this are the profiling steps done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, testing the capabilities of different devices in order to allocate workloads and determine the limit at which each can function efficiently. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
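The profiling-and-partitioning step described above can be sketched in a few lines. This is only an illustrative sketch, not the actual EdgeShard or Neurosurgeon implementation; the layer list, the timing loop, and the additive cost model are all simplifying assumptions.&lt;br /&gt;

```python
import time

def profile_layers(layers, sample_input):
    # Run a sample input through each layer, timing each one on-device.
    times, x = [], sample_input
    for layer in layers:
        start = time.perf_counter()
        x = layer(x)
        times.append(time.perf_counter() - start)
    return times

def choose_split(edge_times, cloud_times, transfer_times):
    # Pick the point k where layers [0, k) run on the edge and the rest
    # in the cloud, minimizing edge compute + transfer + cloud compute.
    # transfer_times[k] is the cost of shipping the activations at cut k.
    n = len(edge_times)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):
        cost = sum(edge_times[:k]) + transfer_times[k] + sum(cloud_times[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

The split that minimizes total latency is chosen per-layer, mirroring the per-layer profiling idea used by the frameworks above.&lt;br /&gt;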
&lt;br /&gt;
The key advantage of this approach is that it utilizes the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, data is not constantly sent back and forth, reducing the overall time needed for processing. It also greatly reduces network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems is fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to simple sensors. More demanding tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the available resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change frequently depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed to update the data and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the low latency that edge computing provides. Using this data and its runtime updates, the partitioning model can dynamically distribute workloads at runtime to optimize the workflow and ensure each device uses its resources in the most efficient manner. Two good examples of how such a system is specifically deployed are the Neurosurgeon and EdgeShard systems mentioned above. &lt;br /&gt;
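As a toy illustration of such dynamic distribution, the sketch below greedily assigns each task to the device with the lowest estimated finish time under its currently monitored load. The device and task dictionaries are hypothetical stand-ins for the runtime measurements described above.&lt;br /&gt;

```python
def dynamic_assign(tasks, devices):
    # Greedy heuristic: hand each task (largest first) to the device
    # whose current load and speed give the lowest estimated finish time.
    for task in sorted(tasks, key=lambda t: t["cost"], reverse=True):
        best = min(devices, key=lambda d: (d["load"] + task["cost"]) / d["speed"])
        best["load"] += task["cost"]        # update the monitored load
        task["assigned"] = best["name"]
    return tasks
```

A real scheduler would refresh `load` and `speed` from live telemetry rather than tracking them in-process.&lt;br /&gt;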
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning splits the workload between the layers. For example, a task deemed to need a large amount of computational resources may go to the cloud to be completed and preprocessed, while work requiring only a small amount of computational power can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning so far. If inference is performed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of work to reach the acceptable threshold, the task may be sent to edge devices to free up network traffic to the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is horizontal partitioning. This splits work among the devices within a single layer rather than among the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. The same kind of profiling and allocation used in vertical partitioning is done, but all of the devices the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
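A minimal sketch of confidence-based vertical partitioning is shown below: inference stays on the edge layer when the small model is confident, and escalates to the cloud layer otherwise. The models here are placeholder callables returning (label, confidence) pairs, an assumption made purely for illustration.&lt;br /&gt;

```python
def tiered_inference(sample, edge_model, cloud_model, threshold=0.8):
    # Run the lightweight edge model first; escalate to the cloud tier
    # only when the edge model's confidence falls below the threshold.
    label, confidence = edge_model(sample)
    if confidence >= threshold:
        return label, "edge"
    label, _ = cloud_model(sample)
    return label, "cloud"
```
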
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server that holds the main model. Each node performs these periodic updates, and by processing only part of the data, edge devices can handle the workloads much more easily. It also reduces the time and computational burden on the cloud and the network, because the central server is no longer performing all of the computations. One popular way of accomplishing this is parallel gradient descent [9]. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses the same process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
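A minimal sketch of one parallel gradient descent step for a linear model is given below; the sharding into equal subsets and the simple averaging of gradients are illustrative assumptions.&lt;br /&gt;

```python
import numpy as np

def local_gradient(w, X, y):
    # Mean-squared-error gradient for a linear model on one node's shard.
    return 2 * X.T @ (X @ w - y) / len(y)

def parallel_gd_step(w, shards, lr=0.1):
    # Each node computes a gradient on its own subset of the data;
    # the central server averages them and updates the shared model.
    grads = [local_gradient(w, X, y) for X, y in shards]
    return w - lr * np.mean(grads, axis=0)
```

Each shard stays within a single node's memory; only the small gradient vector travels to the server.&lt;br /&gt;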
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is how updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, then send all of the calculations from all nodes to update the central model at once. Although this makes the updating easier, any nodes that lag behind will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. Asynchronous scheduling addresses this: as soon as a node finishes its processing, it sends its update to the main server rather than waiting for the others to finish. This requires more communication, since after each update every node must fetch the updated model from the central server; this can be costly to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
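The contrast with synchronous updating can be sketched as a tiny parameter server that applies each gradient the moment it arrives. This is a single-process illustration of the idea, not a real networked implementation.&lt;br /&gt;

```python
import numpy as np

class AsyncParameterServer:
    # Applies each node's update as soon as it arrives; there is no
    # synchronization barrier, so slow nodes never leave fast nodes idle.
    def __init__(self, w, lr=0.1):
        self.w = np.asarray(w, dtype=float)
        self.lr = lr
        self.version = 0

    def push(self, grad):
        # A node reports its gradient; the model is updated immediately.
        self.w -= self.lr * np.asarray(grad, dtype=float)
        self.version += 1

    def pull(self):
        # A node fetches the latest model before its next local step.
        return self.w.copy(), self.version
```

The `version` counter is what each node checks on `pull` to pick up updates contributed by its peers.&lt;br /&gt;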
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device executing it, which may not be feasible for edge devices. Therefore, the bulk of the training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch [10]. One related form of this is knowledge distillation, in which a smaller model is trained to mimic the behavior of a larger model [11]. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
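The division of labor in transfer learning can be sketched as follows: a frozen feature extractor stands in for the cloud-pretrained model (here just a tanh, purely a placeholder assumption), and only a small linear head is trained on the device&#039;s local data.&lt;br /&gt;

```python
import numpy as np

def extract_features(X):
    # Stand-in for a frozen, cloud-pretrained feature extractor
    # (hypothetical; a real deployment would ship actual weights).
    return np.tanh(X)

def fine_tune_head(features, labels, lr=0.5, steps=300):
    # On-device fine-tuning: train only a small linear head on local
    # data; the frozen extractor above is never updated.
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        grad = 2 * features.T @ (features @ w - labels) / len(labels)
        w -= lr * grad
    return w
```

Only the head&#039;s few parameters are trained on the device, which is far cheaper than retraining the full model.&lt;br /&gt;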
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be selected for processing on edge devices. This decreases the workload, because less data must be processed, while preserving the accuracy of the model as much as possible. The same idea applies to model size: for example, placing a Small Language Model that can handle simple commands and prompts on an edge device, rather than offloading an entire LLM onto the device, which could overwhelm its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially important given their limited memory, and it also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a model is very important given the memory constraints of edge devices. Smaller batch sizes allow for quicker and more efficient training and are less likely to lead to memory bottlenecks. This relates to partitioning because the computing and memory capacity of the available devices are very important factors to consider.&lt;br /&gt;
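A back-of-the-envelope version of this tuning might look like the sketch below; the memory accounting (a flat per-sample cost plus the model&#039;s footprint and a safety margin) is a deliberate simplification.&lt;br /&gt;

```python
def max_batch_size(per_sample_bytes, model_bytes, device_memory_bytes,
                   headroom=0.2):
    # Largest batch that fits after reserving a safety margin and space
    # for the model's own weights.
    budget = device_memory_bytes * (1 - headroom) - model_bytes
    return max(1, int(budget // per_sample_bytes))
```
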
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this can lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and lack the vast knowledge and training data of LLMs, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned and trained for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This is useful for a wide variety of edge applications. If needed, and if privacy constraints permit, they can query an LLM for more complicated tasks. Because not every prompt leads to such a query, the network traffic, privacy, and latency benefits are still largely preserved.&lt;br /&gt;
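Such a routing policy can be sketched as follows; the SLM and LLM are placeholder callables, and the confidence threshold is an assumed tuning knob.&lt;br /&gt;

```python
def route_prompt(prompt, slm, llm, threshold=0.7, privacy_sensitive=False):
    # Answer locally with the SLM when it is confident enough, or when
    # privacy constraints forbid sending the prompt off-device;
    # otherwise fall back to the cloud LLM.
    text, confidence = slm(prompt)
    if privacy_sensitive or confidence >= threshold:
        return text, "local"
    return llm(prompt), "cloud"
```
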
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=899</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=899"/>
		<updated>2025-04-25T03:00:40Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Dealing With Challenges */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of machine learning, the deployment of machine learning algorithms will inevitably play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can also help ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main challenges for machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices share computational tasks, large datasets can be split so that many devices together compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split across devices as well. This allows deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy; edge computing can thus be an effective way to provide a large amount of good, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes increased potential for malicious actors to steal data or compromise devices, despite the privacy benefits of local processing. Encryption, access controls, and other measures must be employed to keep these new attack surfaces from being exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environments and industrial settings could be trained to recognize when there are anomalies or undesirable behavior, thus ensuring a quick response and active information sending [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can be combined with machine learning to help with crime and emergency detection, traffic management, and energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications grows. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. This local data handling prevents exposure of personal or confidential information and gives users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users&#039; rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications&#039;&#039;&#039;===&lt;br /&gt;
A well-known example is Apple&#039;s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can provide a more efficient means of training and make heavy workloads compatible with even the limited computing power of certain edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of numbers, easing the burden on both the computational power and memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision: enough that accuracy is mostly maintained but the numbers are easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and for some models the difference in precision may be negligible. Another example is K-means-based weight quantization, which involves clustering similar weights in a matrix and representing each group with a centroid. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
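The clustering idea in the figure can be sketched as below. This is a simplified one-dimensional K-means over the flattened weights, with a naive linear initialization assumed for brevity.&lt;br /&gt;

```python
import numpy as np

def kmeans_quantize(weights, k=4, iters=20):
    # Cluster the weights into k groups; each weight is then stored as
    # a small index into a k-entry codebook of centroids.
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), k)  # simple init
    for _ in range(iters):
        # Assign each weight to its nearest centroid, then re-center.
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = flat[idx == j].mean()
    return centroids[idx].reshape(weights.shape), centroids
```

Storage drops from one full-precision value per weight to one small index per weight plus a k-entry codebook.&lt;br /&gt;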
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between nodes and layers are removed to reduce the overall computation. Certain links between the layers are much more important than others, and only some will significantly affect the accuracy. By cutting out the least necessary ones, nearly the same accuracy and training can be achieved with much less computation. This can also be applied to entire neurons in the neural network that are deemed least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
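One common criterion, magnitude-based pruning, can be sketched in a few lines. The weight matrix and sparsity level below are invented purely for illustration:&lt;br /&gt;

```python
# Magnitude-based pruning sketch (hypothetical weights): the fraction
# `sparsity` of connections with the smallest absolute weights is zeroed
# out, removing the links least likely to affect accuracy.

def prune(matrix, sparsity=0.5):
    """Zero out the weakest `sparsity` fraction of connections."""
    magnitudes = sorted(abs(w) for row in matrix for w in row)
    k = int(len(magnitudes) * sparsity)            # how many weights to drop
    threshold = magnitudes[k - 1] if k else -1.0
    return [[w if abs(w) > threshold else 0.0 for w in row] for row in matrix]

weights = [[0.9, -0.05, 0.4],
           [-0.02, 0.7, -0.1]]
pruned = prune(weights, sparsity=0.5)
zeros = sum(1 for row in pruned for w in row if w == 0.0)   # 3 of 6 dropped
```

In a real network the zeroed connections would then be skipped during inference, or the surviving weights stored in a sparse format, which is where the computational and memory savings come from.&lt;br /&gt;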
&lt;br /&gt;
&#039;&#039;&#039;Distillation:&#039;&#039;&#039;&lt;br /&gt;
Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels (discrete class labels like 0, 1, or 2), it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output “70% class 3, 25% class 2, 5% class 1.” This richer feedback helps the student model capture subtle relationships between classes that hard labels miss. Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry (when class distributions are well-separated and aligned) and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally suited for edge devices where training data may be limited, but efficient inference is crucial. Distillation can also be combined with quantization or pruning to further optimize performance under hardware constraints. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Knowledge_Distillation.png|150px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment]]&lt;br /&gt;
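The soft labels at the heart of distillation can be illustrated with a temperature-scaled softmax. The logits below are invented, and this is only a sketch of the idea rather than any particular framework&#039;s API:&lt;br /&gt;

```python
import math

# Knowledge distillation sketch (hypothetical logits). A temperature above 1
# softens the teacher's output distribution, so the student is supervised by
# graded class similarities instead of a single hard label.

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature):
    """Cross-entropy between the teacher's soft targets and the student."""
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))

teacher_logits = [1.0, 4.0, 8.0]    # teacher is most confident in class 2
student_logits = [0.5, 3.0, 7.0]

soft_targets = softmax(teacher_logits, temperature=4.0)  # graded distribution
hard_like = softmax(teacher_logits, temperature=0.1)     # approaches one-hot
loss = distill_loss(teacher_logits, student_logits, temperature=4.0)
```

In practice this soft-target term is usually blended with the ordinary hard-label loss, and the temperature acts as a tuning knob: higher values expose more of the teacher&#039;s class-similarity structure.&lt;br /&gt;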
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops the ability to identify images, generate conversations in human language, and optimize different systems, the next step is for it to perform tasks specified by users efficiently. This is the role of an AI agent. The agent must first perceive its environment and its task, often specified by a user; then reason about the best means of accomplishing the task; and finally act on that reasoning. Three aspects of AI agents are crucial to carrying out tasks correctly and efficiently:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose among the available possible steps and operate based on its own reasoning, without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to carry out a task for optimal results, it must have the actual means to do so. This can include the ability to use and interact with APIs, interpret code, and access certain databases. &lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power required. There are methods to accomplish it, such as SLMs that query LLMs as needed (discussed later), or offloading tasks to more powerful edge devices. Running agents on edge devices can be paramount when latency is a major issue or when the agent is exposed to sensitive user data. Additionally, an edge device dedicated to a particular user may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of specifying an exceptionally large set of parameters and training the model over them. This can be feasible when hardware is very powerful, and is necessary for systems such as Large Language Models (LLMs), but it is no longer viable for the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how work is partitioned among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should be done on the cloud and which on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common approach is to run certain computing tasks on the candidate devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, testing the capabilities of different devices in order to allocate workloads and determine the limit up to which each can function efficiently. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this approach is that it utilizes the resources of the edge devices as needed, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud when necessary, the overall processing time is reduced because data is not constantly sent back and forth. It also produces much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems is to fully utilize the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can keep working. &lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments in which edge devices must function, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change frequently depending on the workloads and devices in the system. As such, training models to continuously monitor conditions and dynamically distribute workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices is of little use if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. A similar process runs during execution, updating the data and helping the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the low latency that edge computing promises. Using all of this data and its runtime updates, the partitioning model can dynamically distribute workloads so that each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems cited above. &lt;br /&gt;
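In the spirit of that profiling step, a partition point can be chosen by summing profiled per-layer costs. All of the latency numbers below are hypothetical, not measurements from EdgeShard or Neurosurgeon:&lt;br /&gt;

```python
# Profiling-based partition selection sketch (all latencies invented):
# for each candidate split point, total latency is the edge time for the
# early layers, plus the transfer time for the intermediate data, plus
# the cloud time for the remaining layers.

# Hypothetical per-layer profiles: (edge_ms, cloud_ms, output_transfer_ms)
layers = [
    (5.0, 1.0, 30.0),    # conv layer: cheap to run, large activation to ship
    (8.0, 1.5, 12.0),
    (20.0, 2.0, 2.0),    # fully connected layer: heavy compute, tiny output
    (15.0, 1.5, 0.5),
]

def total_latency(split):
    """Latency if layers[:split] run at the edge and layers[split:] in the cloud."""
    edge = sum(layer[0] for layer in layers[:split])
    cloud = sum(layer[1] for layer in layers[split:])
    transfer = layers[split - 1][2] if split > 0 else 25.0  # raw input upload
    return edge + cloud + transfer

best_split = min(range(len(layers) + 1), key=total_latency)
```

Early convolutional layers tend to be cheap to compute but produce large activations, while later layers are compute-heavy with small outputs, which is why the best split often sits somewhere in the middle of the network.&lt;br /&gt;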
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
These models split workloads in two major ways: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud down to the edge, vertical partitioning splits the workload between the layers. For example, a task deemed to need a large amount of computational resources may go to the cloud to be completed and preprocessed; on the other hand, work requiring only a small amount of computational power can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning so far. If inference completed on an edge device yields very low accuracy, the work can be sent to the cloud; if the accuracy is already fairly high and only a small amount of further work is needed to reach an acceptable threshold, it may be sent to edge devices to relieve network traffic to the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is horizontal partitioning. This splits work among the devices within a single layer rather than between the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found among edge devices. The workload is divided using criteria similar to those of vertical partitioning, but all of the devices sharing the workload operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which holds the main model. Each node performs these periodic updates, and because each processes only part of the data, the workloads are much easier for edge devices to handle. It also reduces the time and computational burden on the cloud and the network, because the central server is no longer performing all of the computation. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model; parallel gradient descent uses the same process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices serving as nodes.&lt;br /&gt;
&lt;br /&gt;
[[File:Distributed.png|600px|thumb|center| A visualization of data partitioning among nodes doing distributed learning]]&lt;br /&gt;
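Parallel gradient descent can be sketched in a few lines: each node computes a gradient over its own shard only, and the central server averages the results into one update. The data, model, and learning rate below are invented for illustration:&lt;br /&gt;

```python
# Synchronous parallel gradient descent sketch (made-up data): fitting
# y = w * x by least squares, where each "node" holds only its own shard
# of the data and the server averages the per-node gradients.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]   # roughly y = 2x
shards = [data[:2], data[2:]]          # one shard per edge node

def node_gradient(shard, w):
    """Gradient of mean squared error on this node's local data only."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

w, lr = 0.0, 0.02
for _ in range(200):
    grads = [node_gradient(shard, w) for shard in shards]  # run in parallel
    w -= lr * sum(grads) / len(grads)                      # server aggregates
# w converges to roughly 2, without any node ever seeing the full dataset
```

No node ever holds the full dataset, which is exactly what keeps the per-device memory and compute requirements within the limits of edge hardware.&lt;br /&gt;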
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates reach the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will differ. Synchronous algorithms wait until every node finishes its respective processing, then apply the calculations from all nodes to the central model at once. Although this makes the update simpler, any nodes that lag behind greatly reduce the speed at which the model is trained, and faster nodes sit idle while others are still processing, which can be highly inefficient. Asynchronous scheduling optimizes this: each node sends its results to the main server for an update as soon as it finishes, rather than waiting for the others. This requires more communication, because after each update every node must fetch the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
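A toy calculation makes the straggler effect concrete. The node speeds below are invented; the comparison counts how many local updates reach the server within a fixed time budget under each scheme:&lt;br /&gt;

```python
# Synchronous vs. asynchronous update scheduling sketch (invented speeds).
node_times = [1.0, 1.5, 6.0]   # seconds per local update; one straggler
budget = 12.0                  # total training time available

# Synchronous: every round is gated by the slowest node, and each round
# delivers one contribution from every node.
sync_updates = int(budget / max(node_times)) * len(node_times)

# Asynchronous: each node pushes an update whenever it finishes,
# independently of the others.
async_updates = sum(int(budget / t) for t in node_times)
# async_updates (22) far exceeds sync_updates (6) under the same budget
```

The trade-off noted above still applies: each asynchronous push obliges the node to fetch the refreshed model from the server, so the gain in utilization is paid for with extra communication.&lt;br /&gt;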
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and place a major burden on the device executing it, which may not be feasible for edge devices. Therefore, the bulk of the training is done by a more powerful machine, such as the cloud, and the pretrained model is then sent to the edge device. This reduces the computational burden on the edge device and allows it to fine-tune the model using the data it collects, without having to train the whole system from scratch. One form of this is knowledge distillation, in which a smaller model is trained to mimic a larger one. This is often the case when edge and cloud systems are used in combination.&lt;br /&gt;
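A minimal sketch of the idea, with invented numbers: the feature extractor below stands in for frozen pretrained layers shipped from the cloud, and the edge device trains only a small linear head on its locally collected data:&lt;br /&gt;

```python
# Transfer learning sketch (hypothetical model and data): only the small
# linear head is trained on-device; the "pretrained" features stay frozen.

def features(x):
    """Frozen pretrained layers: fixed on the device, never updated."""
    return [x, x * x]

# Local data the edge device collected; it happens to follow y = 3x + x^2.
local_data = [(1.0, 4.0), (2.0, 10.0), (3.0, 18.0)]

head = [0.0, 0.0]    # the only trainable parameters on the device
lr = 0.01
for _ in range(5000):
    for x, y in local_data:
        f = features(x)
        err = sum(h * fi for h, fi in zip(head, f)) - y   # prediction error
        head = [h - lr * err * fi for h, fi in zip(head, f)]
# head converges to roughly [3.0, 1.0], recovering the local relationship
```

Because gradients only flow through the two head parameters, the on-device cost is a small fraction of training the full network, which is the point of shipping a pretrained model to the edge.&lt;br /&gt;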
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects a model&#039;s accuracy can be selected for processing on edge devices. This decreases the workload, because less data must be processed, while preserving the accuracy of the model as much as possible. The same principle applies to models themselves: for example, deploying a Small Language Model that can handle simple commands and prompts on an edge device, rather than offloading an entire LLM onto the device, which might overwhelm its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed into a smaller form to fit the constraints of edge devices. This is especially valuable given their limited memory, and it also shrinks the workload itself. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and can help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible on edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes. &lt;br /&gt;
&lt;br /&gt;
One way a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and lack the vast knowledge of LLMs and the volume of data they are trained on, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy, because they can run on local devices without sharing user data with the cloud. This is useful for a wide variety of edge applications. Where necessary, and where privacy constraints permit, they can query an LLM for more complicated tasks. Because not every prompt leads to a query, the benefits in network traffic, privacy, and latency are still largely preserved.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=File:Knowledge_Distillation.png&amp;diff=898</id>
		<title>File:Knowledge Distillation.png</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=File:Knowledge_Distillation.png&amp;diff=898"/>
		<updated>2025-04-25T02:56:52Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: Diagram of knowledge distillation: a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Diagram of knowledge distillation: a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment.&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=File:Knowledge-Distillation.webp&amp;diff=897</id>
		<title>File:Knowledge-Distillation.webp</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=File:Knowledge-Distillation.webp&amp;diff=897"/>
		<updated>2025-04-25T02:53:33Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: Diagram of knowledge distillation: a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Diagram of knowledge distillation: a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment.&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=891</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=891"/>
		<updated>2025-04-25T02:11:24Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===&#039;&#039;&#039;Introduction&#039;&#039;&#039;===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence primarily concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent rise of the field, it is inevitable that machine learning algorithms will play a crucial role in the functioning of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be put to very effective use and automation can be employed to improve efficiency. Machine learning on edge devices can also ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issues for machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilize it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits of Machine Learning using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data often generated by IoT devices and sensors, increasing costs as well as pressure and congestion on the networks transmitting the data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. Because edge devices can share computational tasks, large datasets can be split so that many devices together compute what a single device could not handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split across devices. This allows deployment of models that would otherwise be too large for a single edge device with limited memory; the edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data processed by edge devices means more training data for machine learning models, which generally increases accuracy, so edge computing could be an effective way of providing abundant, relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing is the reduced latency gained by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is used for a personal application, many users prefer the privacy afforded by local processing [5].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Challenges of Machine Learning Using Edge Computing&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may be sensors or other devices designed for low energy consumption, the tasks associated with machine learning could quickly drain their batteries, even if they have sufficient computational capacity to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Applications for ML at the Edge&#039;&#039;&#039;===&lt;br /&gt;
The utilization of data to make conjectures about what is going on in the environment and how to respond has a variety of use cases that can greatly benefit people, cities, and the environment. By leveraging and monitoring a constant stream of data and training machine learning models to detect or even respond to different events, there can be many practical applications for such systems. These would rely on the combination of edge devices and machine learning to better enhance experience for users and detect events of interest.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Self-Driving Cars&#039;&#039;&#039;: Imitation learning can be leveraged to better understand and emulate human driving. The low latency that can be provided by edge computing is especially useful for the quick decision making needed by these systems [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Home Devices&#039;&#039;&#039;: Understanding user habits by leveraging the available data for them can make smart homes more convenient for users. The increased privacy that can come with edge computing, along with a few extra cybersecurity measures, can ensure the personal data that may be used for training is not compromised.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Environmental and Industrial Monitoring:&#039;&#039;&#039; Sensors deployed in environmental and industrial settings can be trained to recognize anomalies or undesirable behavior, ensuring a quick response and prompt reporting [7].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Smart Cities:&#039;&#039;&#039; Similar to the above, the data collected by sensors in cities can leverage machine learning to help with crime and emergency detection, traffic management, or energy management [7].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending only on centralized cloud infrastructure. This approach is becoming far more important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center| this diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is substantially reduced. This local data handling prevents exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real-World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dealing With Challenges&#039;&#039;&#039;===&lt;br /&gt;
Despite the challenges of ML at the edge, there are a variety of methods that can provide a more efficient means of training and make heavy workloads compatible with the limited computing power of many edge devices.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quantization:&#039;&#039;&#039;&lt;br /&gt;
Quantization is a method that reduces the precision of numbers, easing the burden on both the computational power and the memory of edge devices. There are multiple forms of quantization, but each one sacrifices some precision: enough that accuracy is mostly maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and the resulting loss of precision may be negligible for some models. Another example is K-means-based weight quantization, which involves clustering the values of a weight matrix and approximating each group by its centroid. An example is shown below:&lt;br /&gt;
&lt;br /&gt;
[[File:Screenshot_2025-04-24_170604.png|500px|thumb|center|By clustering each index in the matrix, and using centroids to approximate, the overall computations can be done much quicker and are more easily handled by edge devices]]&lt;br /&gt;
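&lt;br /&gt;
As a rough illustration of the idea, the following sketch applies affine (asymmetric) quantization to a list of floating-point weights, mapping them onto 8-bit integers and back. This is a simplified, hypothetical example, not the implementation of any particular framework:&lt;br /&gt;
&lt;br /&gt;
```python
def quantize(weights, num_bits=8):
    # Affine quantization: map floats onto the integer range 0 .. 2**b - 1.
    qmax = 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi != lo else 1.0
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, zero):
    # Recover approximate floats; the error per weight is at most scale / 2.
    return [zero + scale * v for v in q]

weights = [0.12, -0.53, 0.91, 0.33, -0.07]
q, scale, zero = quantize(weights)
approx = dequantize(q, scale, zero)  # close to the originals, at a quarter of float32 memory
```
&lt;br /&gt;
Storing the 8-bit values plus a single scale and zero point per tensor is what yields the memory savings described above.&lt;br /&gt;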
&lt;br /&gt;
&#039;&#039;&#039;Pruning:&#039;&#039;&#039;&lt;br /&gt;
This is a method often used in neural networks, in which the least important connections between nodes and layers are removed to reduce the overall computation. Certain links between the layers matter much more than others, so only some significantly affect the accuracy. By cutting out the least necessary ones, nearly the same accuracy and training can be achieved with much less computation. The same idea can be applied to entire neurons that are deemed least significant.&lt;br /&gt;
&lt;br /&gt;
[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, thus leading to better computational and memory management]]&lt;br /&gt;
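&lt;br /&gt;
A minimal sketch of unstructured magnitude pruning, which zeroes out a chosen fraction of the smallest-magnitude weights (a hypothetical illustration, not tied to any specific framework):&lt;br /&gt;
&lt;br /&gt;
```python
def prune_by_magnitude(weights, sparsity=0.5):
    # Unstructured magnitude pruning: zero out the fraction of weights
    # with the smallest absolute values, keeping the influential ones.
    k = int(len(weights) * sparsity)                     # how many weights to drop
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])                             # indices of the k smallest
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

pruned = prune_by_magnitude([0.5, -0.1, 0.8, 0.05], sparsity=0.5)
# The two smallest-magnitude weights (-0.1 and 0.05) are zeroed.
```
&lt;br /&gt;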
&lt;br /&gt;
===&#039;&#039;&#039;Usage and Applications of AI Agents&#039;&#039;&#039;===&lt;br /&gt;
As AI develops and becomes able to identify images, generate conversations in human language, and optimize different systems, the next step is for it to efficiently perform tasks specified by users. A system that does this is known as an AI agent. The agent must first perceive its environment and task, often specified by a user, then reason about the best means to accomplish the task, and finally act on it. Three aspects of AI agents are crucial to their ability to carry out tasks efficiently and correctly:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reasoning:&#039;&#039;&#039; The agent must be able to think sequentially, and decompose its specified tasks into a sequence of specific steps in order to accomplish its goal. It must also have some memory storage in order to remember what it has done, as well as the results of its sequence of actions in order to learn for future steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autonomy:&#039;&#039;&#039; The agent must choose from the availability of possible steps, and operate based on its reasoning without step-by-step instructions from the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to carry out a task for optimal results, it must have the actual means to do it. This can include the ability to use and interact with APIs, interpret code, and access certain databases.&lt;br /&gt;
&lt;br /&gt;
Utilizing AI agents on edge devices can be tricky due to the computational and reasoning power needed. However, there are methods to accomplish this, such as SLMs which query LLMs as needed (discussed later), or utilizing more powerful edge devices to carry out tasks. Edge devices can be essential if latency is a major issue, or if the agent is exposed to sensitive user data. Additionally, an edge device specific to a user may better learn that user&#039;s patterns and preferences and react accordingly to provide the best possible outcome.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;The Need for Model Optimization at the Edge&#039;&#039;&#039;===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Edge and Cloud Collaboration&#039;&#039;&#039;===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common approach is to run certain computing tasks on the candidate devices and measure the time and resources they take. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
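&lt;br /&gt;
A toy sketch of this profile-then-offload decision, assuming hypothetical per-task latency measurements (the function names and numbers are illustrative, not taken from EdgeShard or Neurosurgeon):&lt;br /&gt;
&lt;br /&gt;
```python
import time

def profile_task(fn, args, runs=5):
    # Time a candidate workload on this device, averaged over several runs,
    # so the scheduler has a baseline for placement decisions.
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return sum(times) / runs

def should_offload(local_ms, cloud_ms, transfer_ms):
    # Offload only when cloud compute plus network transfer beats local execution.
    return local_ms >= cloud_ms + transfer_ms
```
&lt;br /&gt;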
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Optimizing Workload Partitioning&#039;&#039;&#039;===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to simple sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that are able to be executed, and the system can continue working.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Dynamic Models&#039;&#039;&#039;===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device has all of its computing resources or network capabilities consumed by another workload.&lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. Then, a machine learning model utilizes the data to estimate the performance of devices and/or layers. During runtime, a similar process is employed which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data and its runtime updates, the partitioning model is able to dynamically distribute workloads in order to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is deployed are the Neurosurgeon and EdgeShard systems, cited above.&lt;br /&gt;
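&lt;br /&gt;
The per-layer flavour of this idea can be sketched as a search over split points, in the spirit of Neurosurgeon&#039;s profiling step (the latency profiles below are invented for illustration):&lt;br /&gt;
&lt;br /&gt;
```python
def best_partition(edge_ms, cloud_ms, transfer_ms):
    # edge_ms[i] / cloud_ms[i]: profiled latency of layer i on each side.
    # transfer_ms[k]: time to ship the activations produced after layer k
    # (transfer_ms[0] = raw input upload; transfer_ms[n] = 0, fully on edge).
    n = len(edge_ms)

    def total(split):
        # Run layers 0..split-1 on the edge, upload once, finish in the cloud.
        return sum(edge_ms[:split]) + transfer_ms[split] + sum(cloud_ms[split:])

    return min(range(n + 1), key=total)

# Example profile: cheap early layers, an expensive final layer, shrinking activations.
split = best_partition([5.0, 5.0, 20.0], [1.0, 1.0, 1.0], [50.0, 8.0, 2.0, 0.0])
# split == 2: run two layers on the edge, then hand off to the cloud.
```
&lt;br /&gt;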
&lt;br /&gt;
===&#039;&#039;&#039;Horizontal and Vertical Partitioning&#039;&#039;&#039;===&lt;br /&gt;
There are two major ways that these models split the workloads in order to optimize the machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, vertical partitioning involves splitting up the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the model. If inference is completed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of work to reach the acceptable threshold, the task may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called horizontal partitioning. This involves splitting among the devices within a certain layer rather than among the layers themselves. It provides a means of fully utilizing the heterogeneous abilities found in edge devices, as described in previous sections. Similar profiling and allocation to that of vertical partitioning is done, but all of the devices that the workload is split across function on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used.&lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Distributed Learning&#039;&#039;&#039;===&lt;br /&gt;
Distributed learning is a process in which, instead of giving all of the data to every node, each node takes in and processes only a subset of the overall data, then sends its results to a central server which contains the main model. These periodic updates are done by each node, and by only processing part of the data, it is much easier for edge devices to handle these workloads. It can also reduce the time and computational burden on the cloud and network, because the central server is not performing all of the computations. One popular method of accomplishing this is parallel gradient descent. Gradient descent by itself is a very useful means of training a model, and parallel gradient descent uses a similar process, but instead of a single operation, it aggregates the gradients calculated by each node in order to update the central model. This efficiently utilizes each node and ensures that the data used to update the model does not exceed the memory constraints of the edge devices being used as nodes.&lt;br /&gt;
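&lt;br /&gt;
The parallel-gradient idea can be sketched for a one-parameter linear model, where each node computes a gradient on its own shard and the server averages them (a simplified illustration, not a production training loop):&lt;br /&gt;
&lt;br /&gt;
```python
def local_gradient(w, batch):
    # Gradient of mean squared error for the model y = w * x,
    # computed on one node's local batch of (x, y) pairs.
    g = 0.0
    for x, y in batch:
        g += 2 * (w * x - y) * x
    return g / len(batch)

def parallel_sgd_step(w, node_batches, lr=0.01):
    # Each node works only on its own shard; the central server averages
    # the per-node gradients and applies a single update to the shared model.
    grads = [local_gradient(w, b) for b in node_batches]
    return w - lr * sum(grads) / len(grads)

shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]  # data split across two nodes
w = parallel_sgd_step(0.0, shards)                 # one averaged update toward w = 2
```
&lt;br /&gt;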
&lt;br /&gt;
&#039;&#039;&#039;Asynchronous Scheduling:&#039;&#039;&#039; One important aspect of distributed learning is the means by which updates are delivered to the model. Each node may have different amounts of data, memory, and computational power, so the time it takes to process and update will not be the same. Synchronous algorithms wait for every node to finish its respective processing, and then all of the calculations from all nodes are sent to update the central model. Although this can make updating easier, any nodes that lag behind the others will greatly reduce the speed at which the model is trained; it also leaves nodes idle while others are still processing, which can be highly inefficient. To optimize this, asynchronous scheduling has each node send its results to the main server for an update as soon as it finishes, rather than waiting for all the others. However, this requires more communication, because after each update every node must get the updated model from the central server; this can be challenging to do repeatedly, but the efficiency gained is often worth it.&lt;br /&gt;
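&lt;br /&gt;
The asynchronous variant can be sketched as a parameter server that applies each node&#039;s gradient the moment it arrives, instead of waiting at a barrier (again a minimal, hypothetical illustration):&lt;br /&gt;
&lt;br /&gt;
```python
class ParameterServer:
    # Each node pushes its gradient as soon as it finishes and immediately
    # pulls back the latest model, so fast nodes never wait for stragglers.
    def __init__(self, w, lr=0.1):
        self.w = w
        self.lr = lr

    def push_gradient(self, grad):
        self.w = self.w - self.lr * grad  # apply the update right away
        return self.w                     # the node pulls the fresh model
```
&lt;br /&gt;
The communication trade-off described above shows up directly: every push implies an extra round trip to hand the updated weights back to the node.&lt;br /&gt;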
&lt;br /&gt;
===&#039;&#039;&#039;Transfer Learning&#039;&#039;&#039;===&lt;br /&gt;
Transfer learning is a method of machine learning in which a pretrained model is sent to the different nodes for further processing and fine-tuning. The initial training of the model may involve a very large amount of data and could place a major burden on the device that must execute it, which may not be feasible for edge devices. Therefore, the bulk of the model training is done by a more powerful machine, such as the cloud, and then the pretrained model is sent to the edge device. This can be useful, as it reduces the computational burden on the edge device, and allows it to fine-tune the model using the data it collects without having to completely train the whole system.&lt;br /&gt;
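&lt;br /&gt;
A minimal sketch of this split: a frozen pretrained backbone has already turned raw inputs into feature vectors (supplied directly here), and only a small linear head is fine-tuned on-device (a hypothetical example, not any framework&#039;s API):&lt;br /&gt;
&lt;br /&gt;
```python
def fine_tune_head(features, labels, head_w, lr=0.1, epochs=20):
    # Transfer learning on-device: the heavy backbone stays frozen, and
    # only the small linear head (head_w) is updated with local data.
    for _ in range(epochs):
        for f, y in zip(features, labels):
            pred = sum(wi * fi for wi, fi in zip(head_w, f))
            err = pred - y
            head_w = [wi - lr * err * fi for wi, fi in zip(head_w, f)]
    return head_w

# Features from the frozen backbone plus local labels; only head_w is trained.
head = fine_tune_head([[1.0], [2.0]], [2.0, 4.0], [0.0])
```
&lt;br /&gt;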
&lt;br /&gt;
===&#039;&#039;&#039;Methods of Data Optimization&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Selection:&#039;&#039;&#039; Certain types of data may be more useful than others, so the data that most affects the accuracy of a model can be offloaded to edge devices. This decreases the workload on them because less data must be processed, while conserving the accuracy of the model as much as possible. Larger models or datasets may not be needed: for example, putting a Small Language Model that can handle simple commands and prompts on an edge device, rather than offloading an entire LLM onto the device, which may overload its computational abilities without providing enough benefit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data Compression:&#039;&#039;&#039; Data can be compressed to a smaller form in order to fit the constraints of edge devices. This is especially true given their limited memory, and also makes the workload smaller. Quantization, discussed previously, is a prevalent example of this.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Container Workloads:&#039;&#039;&#039; Container workloads can be very useful, as they provide all resources and important data the device needs for processing the work. By examining the computational abilities of the device, different sized workloads can be allocated as deemed necessary to maximize the efficiency of the training.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Batch size tuning:&#039;&#039;&#039; The batch size used by a certain model is very important when considering the memory constraints of edge devices. Batch sizes that are smaller allow for a quicker and more efficient means of training models, and are less likely to lead to bottlenecks in memory. This is related to partitioning because the computing and memory capacity of the devices available are very important factors to consider.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Utilizing Small Language Models (SLMs)&#039;&#039;&#039;===&lt;br /&gt;
Large Language Models (LLMs) have recently become prevalent and are able to help with a wide variety of tasks. However, running and training an LLM requires a significant amount of computational resources, which is not feasible when working with edge devices. Most modern LLMs are cloud-based, but this may lead to high latency and increased network traffic, especially when working with a large subsystem of nodes.&lt;br /&gt;
&lt;br /&gt;
One way that a similar system can be achieved on edge devices is by using SLMs. These are not as accurate and do not have the vast knowledge of LLMs or the amount of training data, but for basic applications on edge devices they can be sufficient to accomplish many tasks. They are also often fine-tuned for the specific tasks they are deployed for, and are much faster and more resource-efficient than LLMs. They can also provide much more privacy because they can run on local devices without sharing user data with the cloud. If needed, and if privacy constraints permit, they can query an LLM for more complicated tasks. This means that not every prompt leads to a query, so network traffic, privacy, and latency constraints are still preserved.&lt;br /&gt;
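&lt;br /&gt;
This SLM-first pattern can be sketched as confidence-based routing; the model stubs and the 0.7 threshold below are invented for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
def route_query(prompt, slm, llm, threshold=0.7):
    # Answer locally when the on-device small model is confident enough;
    # escalate to the cloud LLM only when it is not.
    answer, confidence = slm(prompt)
    if confidence >= threshold:
        return answer, "edge"
    return llm(prompt), "cloud"

# Stubs standing in for real SLM / LLM inference calls.
def tiny_model(prompt):
    known = {"turn on the lights": ("ok, lights on", 0.95)}
    return known.get(prompt, ("unsure", 0.2))

def big_model(prompt):
    return "detailed cloud answer"
```
&lt;br /&gt;
Only low-confidence prompts ever leave the device, which is what preserves the latency and privacy benefits.&lt;br /&gt;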
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. S. Rafatirad, H. Homayoun, Z. Chen, and S. M. Pudukotai Dinakarrao, &amp;quot;What Is Applied Machine Learning?,&amp;quot; in Machine Learning for Computer Scientists and Data Analysts, Cham, Switzerland: Springer, 2022.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/article/10.1007/s13748-012-0035-5 8. Peteiro-Barral, Diego, and Bertha Guijarro-Berdiñas. &amp;quot;A survey of methods for distributed machine learning.&amp;quot; Progress in Artificial Intelligence 2 (2013): 1-11.]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.neurips.cc/paper/2010/hash/abea47ba24142ed16b7d8fbf2c740e0d-Abstract.html 9. Zinkevich, Martin, et al. &amp;quot;Parallelized stochastic gradient descent.&amp;quot; Advances in neural information processing systems 23 (2010).]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=9134370 10. F. Zhuang et al., &amp;quot;A Comprehensive Survey on Transfer Learning,&amp;quot; in Proceedings of the IEEE, vol. 109, no. 1, pp. 43-76, Jan. 2021, doi: 10.1109/JPROC.2020.3004555]&lt;br /&gt;
&lt;br /&gt;
[https://proceedings.mlr.press/v97/phuong19a/phuong19a.pdf 11. M. Phuong and C. Lampert, &amp;quot;Towards understanding knowledge distillation,&amp;quot; in Proc. Int. Conf. Mach. Learn., May 2019, pp. 5142–5151.]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=641</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=641"/>
		<updated>2025-04-07T03:01:05Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Benefits of Machine Learning using Edge Computing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===Introduction===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence primarily concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can also ease the burden on cloud systems and the networks connected to them, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilize it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===Benefits of Machine Learning using Edge Computing===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing costs as well as pressure and congestion on the networks transmitting data [5]. With edge computing architecture, machine learning can be deployed in a distributed fashion, allowing ML models to be deployed and trained across many edge devices. The nature of edge devices sharing computational tasks makes them well suited to taking large datasets and splitting the computational load, so that many devices can compute what a single device might not have been able to handle [5]. Beyond splitting input data across edge devices, machine learning models themselves can be split across edge devices as well. This allows for deployment of models that would otherwise be too large for a single edge device with limited memory. In such cases, edge nodes collaboratively pass data between them, and the global ML model is later updated based on the results of each smaller portion [5]. Additionally, the sheer amount of data that is processed from edge devices means more training data for machine learning models, which generally increases accuracy; thus edge computing could be an effective way to provide abundant, relevant data for a variety of applications.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or traverse a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning, and especially when personal data is needed for a personal application, many users would prefer the privacy afforded by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===Challenges of Machine Learning Using Edge Computing===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices designed for low energy consumption, the tasks associated with machine learning could quickly drain their batteries, even if they have sufficient computational power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is substantially reduced. This local data handling helps prevent exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===The Need for Model Optimization at the Edge===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===Edge and Cloud Collaboration===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure how long they take and what resources they consume. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
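As a rough illustration of how profiled timings can drive the placement decision, the sketch below compares each layer&#039;s measured edge latency against an estimated cloud latency plus transfer overhead. This is a simplified sketch, not the actual EdgeShard or Neurosurgeon algorithm, and all timing numbers are invented.&lt;br /&gt;
&lt;br /&gt;
```python
# Illustrative sketch (not the cited systems' real code): place each layer
# wherever the profiled cost is lower, counting network transfer time
# against the cloud option.
from operator import lt   # lt(a, b) is True when a is less than b

def place_layers(edge_latency_ms, cloud_compute_ms, network_ms):
    """Choose 'edge' or 'cloud' per layer from profiled timings."""
    plan = {}
    for layer, edge_ms in edge_latency_ms.items():
        cloud_ms = cloud_compute_ms[layer] + network_ms  # transfer overhead
        plan[layer] = "edge" if lt(edge_ms, cloud_ms) else "cloud"
    return plan

# Hypothetical numbers gathered in a profiling step.
edge = {"conv1": 3.0, "conv2": 8.0, "fc": 40.0}
cloud = {"conv1": 1.0, "conv2": 2.0, "fc": 4.0}
plan = place_layers(edge, cloud, network_ms=10.0)
print(plan)
```
Cheap layers stay local because the network overhead dominates, while the expensive layer is worth shipping to the cloud.&lt;br /&gt;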
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
===Optimizing Workload Partitioning===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
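A minimal sketch of this kind of capability-aware offloading, including dropping tasks that no device can serve (as in [2]), might look as follows; the device names, tasks, and resource units are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
# Hedged sketch: greedily send each task to the device with the most free
# capacity that can satisfy its demand; tasks no device can serve are
# dropped so they do not block the rest of the queue.
from operator import le   # le(a, b) is True when a is at most b

def schedule(tasks, free_capacity):
    """Greedy assignment; returns (assignments, dropped task names)."""
    assigned, dropped = {}, []
    for name, demand in tasks:
        feasible = [d for d, cap in free_capacity.items() if le(demand, cap)]
        if not feasible:
            dropped.append(name)   # no device can run it: drop and move on
            continue
        best = max(feasible, key=free_capacity.get)   # most headroom
        assigned[name] = best
        free_capacity[best] -= demand
    return assigned, dropped

devices = {"smartphone": 8.0, "raspberry_pi": 2.0, "sensor_hub": 0.5}
tasks = [("video_inference", 6.0), ("keyword_spot", 0.4), ("training_epoch", 12.0)]
assigned, dropped = schedule(tasks, devices)
print(assigned, dropped)
```
Here the oversized training task is dropped rather than stalling the two tasks the devices can actually run.&lt;br /&gt;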
&lt;br /&gt;
===Dynamic Models===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
The way this is commonly done is by using the profiling step described above as a baseline. A machine learning model then utilizes the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data used and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data, the partitioning model is able to dynamically distribute workloads at runtime, optimizing the workflow and ensuring that each device utilizes its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems cited above. &lt;br /&gt;
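The runtime refinement described above can be sketched as a simple exponential moving average over observed latencies. This is only an illustration of the idea (the cited systems use their own prediction models), and the latency numbers are invented to show a congested edge node eventually pushing work to the cloud.&lt;br /&gt;
&lt;br /&gt;
```python
# Sketch of runtime adaptation: start from a profiled estimate, then blend
# in each observed latency so placement decisions track changing load.

def update_estimate(estimate, observed, alpha=0.3):
    """EMA: the estimate moves partway toward the latest measurement."""
    return (1 - alpha) * estimate + alpha * observed

est = {"edge": 5.0, "cloud": 12.0}        # profiled baseline latencies (ms)
decisions = []
for observed_edge in [5.0, 12.0, 18.0, 20.0]:  # edge node getting congested
    est["edge"] = update_estimate(est["edge"], observed_edge)
    decisions.append(min(est, key=est.get))    # re-decide at every step
print(decisions)
```
The first few rounds stay on the edge; once the smoothed edge estimate exceeds the cloud cost, the decision flips.&lt;br /&gt;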
&lt;br /&gt;
===Horizontal and Vertical Partitioning===&lt;br /&gt;
There are two major ways that these models split the workloads in order to optimize the machine learning: horizontal and vertical partitioning [3]. Given a set of layers ranging from the cloud to the edge, horizontal partitioning involves splitting up the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy level of the learning so far. If inference is completed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the learning model needs only a small amount of work to reach the threshold deemed acceptable, it may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called vertical partitioning. This involves splitting among the devices within a certain layer rather than among the layers themselves. This is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. Similar functionality and decision-making to what is found in horizontal partitioning is employed, but all of the devices that the workload is split across function on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
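The two partitioning axes can be illustrated with a toy sketch: horizontally, a sample escalates from the edge tier to the cloud when edge-side confidence falls below a threshold; vertically, the edge-tier layers are spread across peer devices on the same layer. The threshold value, layer names, and round-robin split are invented for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
# Toy illustration of the two partitioning axes from [3]; all names and
# numbers are hypothetical.
from operator import ge   # ge(a, b) is True when a is at least b

def horizontal_route(edge_confidence, threshold=0.8):
    """Keep the result at the edge only if confidence clears the threshold."""
    return "edge" if ge(edge_confidence, threshold) else "cloud"

def vertical_split(layers, peers):
    """Spread the edge-tier layers round-robin across same-layer devices."""
    return {layer: peers[i % len(peers)] for i, layer in enumerate(layers)}

print(horizontal_route(0.93))   # confident result: stays at the edge
print(horizontal_route(0.41))   # uncertain result: escalates to the cloud
print(vertical_split(["l1", "l2", "l3", "l4"], ["pi_a", "pi_b"]))
```
A full system would apply both: horizontal routing decides which tier handles a sample, and vertical splitting spreads that tier&#039;s work across its devices.&lt;br /&gt;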
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
===Methods of Training Optimization===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. Rafatirad, S., Homayoun, H., Chen, Z., Pudukotai Dinakarrao, S.M. (2022). What Is Applied Machine Learning?. In: Machine Learning for Computer Scientists and Data Analysts. Springer, Cham. https://doi.org/10.1007/978-3-030-96756-7_1]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=615</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=615"/>
		<updated>2025-04-07T02:27:09Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===Introduction===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with the training of algorithms to look at sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to generate results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them, especially if the model does not require such large computational power, as in the case of certain Small Language Models (SLMs). However, the main issue regarding machine learning at the edge lies in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===Benefits of Machine Learning using Edge Computing===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing cost as well as pressure and congestion on networks transmitting this data [5]. Additionally, more training data for machine learning models generally increases accuracy, and thus edge computing could be an effective solution to providing lots of good and relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high to make these applications efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this applies to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than on the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to anyone else, or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; especially when personal data is involved, many users prefer the privacy afforded by processing it locally [5].&lt;br /&gt;
&lt;br /&gt;
===Challenges of Machine Learning Using Edge Computing===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and complex operations that may be involved [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors or other devices designed for low energy consumption, the tasks associated with machine learning could quickly drain their batteries, even if they have sufficient computational power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate in environments with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is substantially reduced. This local data handling helps prevent exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===The Need for Model Optimization at the Edge===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===Edge and Cloud Collaboration===&lt;br /&gt;
One methodology that is often used involves collaboration between both Edge and Cloud Devices. The cloud has the ability to process workloads that may require much more resources and cannot be done on edge devices. On the other hand, edge devices, which can store and process data locally, may have lower latency and more privacy. Given the advantages of each of these, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure how long they take and what resources they consume. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If the workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
&lt;br /&gt;
The key advantage of this is that it is able to utilize the resources of the edge devices as necessary, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, this will reduce the overall amount of time needed for processing because data is not constantly sent back and forth. It also allows for much less network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
===Optimizing Workload Partitioning===&lt;br /&gt;
The key idea for much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===Dynamic Models===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data, the partitioning model is able to dynamically distribute workloads at runtime, optimizing the workflow and ensuring each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed are the Neurosurgeon and EdgeShard systems, described above. &lt;br /&gt;
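&lt;br /&gt;
One minimal way to realize this runtime refinement is to blend each new measurement into the profiled baseline; the baseline numbers and the exponential-moving-average rule here are illustrative assumptions, not details of the cited systems:&lt;br /&gt;

```python
class LatencyEstimator:
    """Refine a per-device latency estimate at runtime using an
    exponential moving average over observed execution times.
    The profiling numbers used below are made-up placeholders."""

    def __init__(self, baseline, alpha=0.3):
        # baseline: device name to latency (ms) from an offline profiling step
        self.estimates = dict(baseline)
        self.alpha = alpha  # weight given to each new runtime observation

    def observe(self, device, measured_ms):
        old = self.estimates[device]
        # Blend the new measurement into the running estimate.
        self.estimates[device] = (1 - self.alpha) * old + self.alpha * measured_ms

    def fastest(self):
        # Route the next workload to the device currently predicted fastest.
        return min(self.estimates, key=self.estimates.get)

est = LatencyEstimator({"edge-gpu": 12.0, "cloud": 40.0})
est.observe("edge-gpu", 60.0)  # e.g. the edge device became congested
print(est.fastest())           # one spike is damped by the moving average
```

A lower alpha trusts the offline profile more; a higher alpha reacts faster to congestion.&lt;br /&gt;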
&lt;br /&gt;
===Horizontal and Vertical Partitioning===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, horizontal partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning in question. If inference is performed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only minor additional work to reach the acceptable threshold, the task may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
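&lt;br /&gt;
A minimal sketch of this confidence-based horizontal routing, with an assumed acceptance threshold (the 0.90 figure is purely illustrative):&lt;br /&gt;

```python
import operator

def route(edge_confidence, accept=0.90):
    """Horizontal-partitioning sketch: a small edge model runs first;
    its result is accepted locally when confidence clears the assumed
    threshold, otherwise the input escalates to a larger cloud model."""
    if operator.ge(edge_confidence, accept):
        return "edge"   # already fairly accurate: finish locally
    return "cloud"      # too uncertain: pay the network cost once

print(route(0.95))  # confident result stays on the edge
print(route(0.40))  # low-confidence result goes to the cloud
```

Only the uncertain inputs cross the network, which is what frees cloud traffic and keeps average latency low.&lt;br /&gt;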
&lt;br /&gt;
The second model of partitioning is called vertical partitioning. This involves splitting among the devices within a given layer rather than among the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. A determination similar to that in horizontal partitioning is made, but all of the devices that the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
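&lt;br /&gt;
A toy illustration of vertical partitioning, splitting a workload across hypothetical same-layer devices in proportion to assumed capacity scores:&lt;br /&gt;

```python
def vertical_split(total_units, peers):
    """Vertical-partitioning sketch: divide a workload of total_units
    slices among same-layer devices in proportion to their capacity
    scores (the scores below are hypothetical), rounding so that every
    slice is assigned exactly once."""
    total_cap = sum(peers.values())
    shares = {}
    assigned = 0
    items = sorted(peers.items())           # deterministic order
    for i, (name, cap) in enumerate(items):
        if i == len(items) - 1:
            share = total_units - assigned  # last peer takes the remainder
        else:
            share = round(total_units * cap / total_cap)
        shares[name] = share
        assigned += share
    return shares

print(vertical_split(10, {"pi-a": 1.0, "pi-b": 1.0, "phone": 3.0}))
```

The last device absorbs the rounding error, so the shares always sum to the full workload.&lt;br /&gt;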
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. Rafatirad, S., Homayoun, H., Chen, Z., Pudukotai Dinakarrao, S.M. (2022). What Is Applied Machine Learning?. In: Machine Learning for Computer Scientists and Data Analysts. Springer, Cham. https://doi.org/10.1007/978-3-030-96756-7_1]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=608</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=608"/>
		<updated>2025-04-07T02:22:55Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 4.2 ML Training at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===Introduction===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issues regarding machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and in utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===Benefits of Machine Learning using Edge Computing===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing cost as well as pressure and congestion on the networks transmitting this data [5]. Additionally, more training data for machine learning models generally increases accuracy, so edge computing could be an effective way of providing large amounts of relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than in the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to a third party or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; in particular, when personal data is required for a personal application, many users prefer the privacy afforded by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===Challenges of Machine Learning Using Edge Computing===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and the complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors, or other devices that are meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Machine Learning Pipeline.png|500px|thumb|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This immediate responsiveness is extremely important for applications like real-time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is substantially reduced. This local data handling helps prevent exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
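&lt;br /&gt;
These frameworks commonly rely on techniques such as post-training quantization to fit models onto low-power hardware. As a back-of-envelope illustration (the parameter count is made up, and this counts weight storage only, ignoring activations and framework overhead):&lt;br /&gt;

```python
def model_size_bytes(num_params, bits_per_weight):
    """Rough storage footprint of a model given only its weights."""
    return num_params * bits_per_weight // 8

params = 5_000_000  # a small illustrative CNN

full  = model_size_bytes(params, 32)  # float32 weights
quant = model_size_bytes(params, 8)   # int8 after quantization

# Quantizing from 32-bit floats to 8-bit integers cuts weight
# storage by 4x, often the difference between fitting on an
# embedded device or not.
print(full // (1024 * 1024), "MiB vs", quant // (1024 * 1024), "MiB")
```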
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
A well known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===The Need for Model Optimization at the Edge===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology currently consists of simply specifying an exceptionally large set of parameters and giving it to the model to train on. This can be feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, or optimization of the partitioning of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===Edge and Cloud Collaboration===&lt;br /&gt;
One methodology that is often used involves collaboration between edge and cloud devices. The cloud has the ability to process workloads that require far more resources than edge devices can supply. Edge devices, on the other hand, which can store and process data locally, offer lower latency and more privacy. Given the advantages of each, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as a correct partition of workloads is the best way to ensure that the respective benefits of the devices are leveraged. A common way to do this is to run certain computing tasks on the relevant devices and measure the time and resources they require. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
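&lt;br /&gt;
A minimal profiling-step sketch in this spirit (the dummy workload stands in for a real model layer, and the best-of-N timing rule is an assumption, not the method of [1] or [4]):&lt;br /&gt;

```python
import time

def profile(task, runs=5):
    """Time a representative task several times on this device and
    keep the best observation as its capability estimate, so that
    one-off scheduling noise does not inflate the number."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return min(samples)

# A stand-in workload; a real system would run an actual model layer.
def dummy_layer():
    sum(i * i for i in range(10_000))

local_cost = profile(dummy_layer)
print(f"estimated per-layer cost: {local_cost:.6f} s")
```

Repeating this on each device yields the capability table that the partitioning logic consults when deciding what stays local and what goes to the cloud.&lt;br /&gt;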
&lt;br /&gt;
The key advantage of this approach is that it utilizes the resources of the edge devices as needed, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud when necessary, overall processing time is reduced because data is not constantly sent back and forth. It also greatly reduces network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
===Optimizing Workload Partitioning===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems involves fully utilizing the heterogeneous devices that are often contained in these systems. As such, it is important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities to Raspberry Pis to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially trained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], the task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
&lt;br /&gt;
===Dynamic Models===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational usage and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, which may update the data and help the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data, the partitioning model is able to dynamically distribute workloads at runtime, optimizing the workflow and ensuring each device uses its resources in the most efficient manner. Two good examples of how such a system is deployed are the Neurosurgeon and EdgeShard systems, described above. &lt;br /&gt;
&lt;br /&gt;
===Horizontal and Vertical Partitioning===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, horizontal partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning in question. If inference is performed on an edge device and the accuracy is found to be very low, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only minor additional work to reach the acceptable threshold, the task may be sent to edge devices to free up network traffic on the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is called vertical partitioning. This involves splitting among the devices within a given layer rather than among the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. A determination similar to that in horizontal partitioning is made, but all of the devices that the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. Rafatirad, S., Homayoun, H., Chen, Z., Pudukotai Dinakarrao, S.M. (2022). What Is Applied Machine Learning?. In: Machine Learning for Computer Scientists and Data Analysts. Springer, Cham. https://doi.org/10.1007/978-3-030-96756-7_1]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=File:Machine_Learning_Pipeline.png&amp;diff=605</id>
		<title>File:Machine Learning Pipeline.png</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=File:Machine_Learning_Pipeline.png&amp;diff=605"/>
		<updated>2025-04-07T02:18:42Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: Machine Learning Pipeline&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Machine Learning Pipeline&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=602</id>
		<title>Machine Learning at the Edge</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Machine_Learning_at_the_Edge&amp;diff=602"/>
		<updated>2025-04-07T02:15:26Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 4.1 Overview of ML at the Edge */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=&#039;&#039;&#039;Machine Learning at the Edge&#039;&#039;&#039;=&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.1 Overview of ML at the Edge&#039;&#039;&#039;==&lt;br /&gt;
===Introduction===&lt;br /&gt;
Machine Learning is a branch of Artificial Intelligence that is primarily concerned with training algorithms to examine sets of data, analyze patterns, and generate conclusions so that the gathered data can be used to produce results and carry out tasks. As an application, machine learning is especially relevant when considering edge computing [7]. Given the recent prevalence and rise of the field of machine learning, it is inevitable that the deployment of machine learning algorithms will play a crucial role in the function of edge computing systems as well. The abundant sensor networks often involved in edge systems provide a means of gathering very large amounts of data. With the help of machine learning, such data can be utilized in very effective ways, and automation methods can be employed to improve efficiency. Machine learning on edge devices themselves can help ease the burden on cloud systems and the networks connected to them as well, especially if the model does not need large computational power, as in the case of certain Small Language Models (SLMs). However, the main issues regarding machine learning at the edge lie in the devices, which may have limited computational power, and the environment, which may change dynamically and be affected by network congestion, device outages, or other unexpected events. If these challenges can be overcome, edge systems could play a pivotal role in advancing machine learning and in utilizing it to maximize efficiency.&lt;br /&gt;
&lt;br /&gt;
[[File:ECML.png|600px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===Benefits of Machine Learning using Edge Computing===&lt;br /&gt;
&#039;&#039;&#039;Data Volumes:&#039;&#039;&#039; Traditional cloud computing architecture may not be able to keep up with the massive volume of data that is often generated by IoT devices and sensors, thus increasing cost as well as pressure and congestion on the networks transmitting this data [5]. Additionally, more training data for machine learning models generally increases accuracy, so edge computing could be an effective way of providing large amounts of relevant data for a variety of applications. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lower Latency:&#039;&#039;&#039; For certain real-time applications, such as Virtual and Augmented Reality (VR/AR) or smart cars, the latency required to transmit data to the cloud and process it there may be too high for these applications to be efficient and safe. A key benefit of edge computing lies in the reduced latency provided by putting devices closer to the users, and this is applicable to machine learning as well [5].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enhanced privacy:&#039;&#039;&#039; Processing data locally rather than in the cloud reduces the risk of data theft and enhances privacy. The data does not have to be sent to a third party or travel over a network where it could be compromised. This is crucial given the large amounts of data needed for machine learning; in particular, when personal data is required for a personal application, many users prefer the privacy afforded by having it processed locally [5].&lt;br /&gt;
&lt;br /&gt;
===Challenges of Machine Learning Using Edge Computing===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Potentially Low Computational Power:&#039;&#039;&#039; Many edge devices lack the computational power needed for deep learning applications, especially given the large amount of data and the complex operations that may have to take place [1].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Management:&#039;&#039;&#039; Given that edge devices may consist of sensors, or other devices that are meant to have low energy consumption, the tasks associated with machine learning could quickly drain their power, even if they have sufficient power to run the workloads [6].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;New Attack Surfaces:&#039;&#039;&#039; With more devices comes the increased potential for malicious hackers to steal data or compromise devices, despite the enhanced privacy. Encryption, access controls, and other methodologies must be employed to mitigate the potential for new attack surfaces to be exploited [6].&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.2 ML Training at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, instead of depending solely on centralized cloud infrastructure. This approach is becoming increasingly important as the demand for real-time, personalized AI applications continues to grow. By training models closer to where the data is generated, edge-based ML enables faster responses, helps reduce latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful in scenarios where devices operate with limited or unreliable network connectivity, allowing them to function more efficiently.&lt;br /&gt;
&lt;br /&gt;
[[File:Learning.png|500px|thumb|center|]]&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Benefits:&#039;&#039;&#039;===&lt;br /&gt;
One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This responsiveness is critical for applications like real-time health monitoring, autonomous driving, and industrial automation.&lt;br /&gt;
&lt;br /&gt;
Additionally, training machine learning models at the edge significantly enhances user privacy. Since sensitive data can be processed and stored directly on the user&#039;s device rather than being sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is substantially reduced. This local data handling helps prevent exposure of personal or confidential information, giving users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users’ rights and maintaining trust.&lt;br /&gt;
&lt;br /&gt;
Efficiency and resilience are important benefits of edge training. By training machine learning models directly on edge devices, these devices become capable of processing data locally without relying on constant internet connectivity. This local processing allows edge devices to continue operating effectively even in environments where network connections are weak, unstable, or completely unavailable. Because they are not fully dependent on cloud infrastructure, edge devices can quickly adapt to changes, respond in real-time, and update their ML models based on immediate local data. As a result, edge training ensures reliable performance and uninterrupted operation, making it particularly valuable for remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.&lt;br /&gt;
===&#039;&#039;&#039;Research Papers:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
An important contribution to the understanding of machine learning (ML) training at the edge is the research paper &amp;quot;Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges&amp;quot; by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations, such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the necessity of developing context-aware ML training methods specifically tailored to these environments. Traditional centralized ML training methods often fail to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore various challenges, including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work points out significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Tools and Frameworks:&#039;&#039;&#039;===&lt;br /&gt;
 &lt;br /&gt;
Frameworks like TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.&lt;br /&gt;
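As a rough illustration of the kind of optimization these frameworks perform when preparing models for low-power devices, the sketch below shows symmetric 8-bit quantization in plain Python. It is illustrative only: the helper names are invented, and real frameworks such as TensorFlow Lite implement this internally with far more care.&lt;br /&gt;

```python
def quantize_int8(weights):
    """Symmetric linear quantization sketch: map float weights onto the
    signed 8-bit range [-127, 127] with one shared scale factor."""
    peak = max(abs(w) for w in weights) or 1.0  # guard against all-zero weights
    scale = peak / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the quantized values."""
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)  # one signed byte per weight, plus the scale
approx = dequantize(q, scale)      # close to the originals, within scale / 2
```

Each 32-bit float weight is stored as a single signed byte plus one shared scale, roughly a 4x size reduction at a small cost in precision.&lt;br /&gt;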
&lt;br /&gt;
===&#039;&#039;&#039;Technical Challenges:&#039;&#039;&#039;===&lt;br /&gt;
&lt;br /&gt;
Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and tight energy budgets. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.&lt;br /&gt;
&lt;br /&gt;
===&#039;&#039;&#039;Real World Applications:&#039;&#039;&#039;===&lt;br /&gt;
A well-known example is Apple’s use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;4.3 ML Model Optimization at the Edge&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===The Need for Model Optimization at the Edge===&lt;br /&gt;
Given the constrained resources, along with the inherently dynamic environment that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today consists of simply specifying an exceptionally large set of parameters and training the model on them. This is feasible when hardware is very advanced and powerful, and is necessary for systems such as Large Language Models (LLMs). However, it is no longer viable when dealing with the devices and environments at the edge. It is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices, while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of the partition of work among the edge devices.&lt;br /&gt;
&lt;br /&gt;
===Edge and Cloud Collaboration===&lt;br /&gt;
One commonly used methodology involves collaboration between edge and cloud devices. The cloud can process workloads that require far more resources than edge devices can provide. Edge devices, on the other hand, can store and process data locally, offering lower latency and more privacy. Given the respective advantages of each, many have proposed that the best way to handle machine learning is through a combination of edge and cloud computing. &lt;br /&gt;
&lt;br /&gt;
The primary issue facing this computing paradigm, however, is the problem of optimally selecting which workloads should be done on the cloud and which should be done on the edge. This is a crucial problem to solve, as the correct partition of workloads is the best way to ensure that the respective benefits of the devices can be leveraged. A common approach is to run certain computing tasks on the relevant devices and measure the time and resources each takes. An example of this is the profiling step done in [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 EdgeShard] [1] and [https://dl.acm.org/doi/pdf/10.1145/3093337.3037698 Neurosurgeon] [4]. Other frameworks implement similar steps, where the capabilities of different devices are tested in order to allocate their workloads and determine the limit at which they can provide efficient functionality. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.&lt;br /&gt;
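To make the profiling idea concrete, the following sketch (all latency numbers are hypothetical, in the spirit of the Neurosurgeon partition-point search) picks the layer at which a model should be handed off from an edge device to the cloud:&lt;br /&gt;

```python
# Hypothetical per-layer latencies (ms) obtained from a profiling step.
edge_ms   = [4.0, 6.0, 9.0, 12.0]       # cost of running layer i on the edge device
cloud_ms  = [1.0, 1.5, 2.0, 2.5]        # cost of running layer i in the cloud
upload_ms = [20.0, 5.0, 3.0, 1.0, 0.0]  # cost of shipping the data after i edge layers

def best_split(edge_ms, cloud_ms, upload_ms):
    """Choose k so that layers 0..k-1 run on the edge and layers k.. run in
    the cloud, minimizing total estimated latency (profiling-based sketch)."""
    n = len(edge_ms)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):
        cost = sum(edge_ms[:k]) + upload_ms[k] + sum(cloud_ms[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

With these made-up numbers the search keeps only the first layer on the edge and offloads the rest, because the raw input is expensive to transmit while later activations are cheap.&lt;br /&gt;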
&lt;br /&gt;
The key advantage of this approach is that it utilizes the resources of edge devices as far as possible, allowing increased data privacy and lower latency. Since workloads are only sent to the cloud as needed, the overall processing time is reduced because data is not constantly sent back and forth. It also greatly reduces network congestion, which is crucial for many applications.&lt;br /&gt;
&lt;br /&gt;
[[File:ECcollab.png|400px|thumb|center|The collaboration of Edge and Cloud Devices]]&lt;br /&gt;
&lt;br /&gt;
===Optimizing Workload Partitioning===&lt;br /&gt;
The key idea behind much of the optimization done in machine learning on edge systems is fully utilizing the heterogeneous devices that these systems often contain. As such, it is important to understand the capabilities of each device so as to fully utilize its advantages. Devices can vary greatly, from smartphones with more powerful computational abilities, to Raspberry Pis, to sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in [https://ieeexplore.ieee.org/abstract/document/8690980 Mobile-Edge] [2], a task may be dropped altogether if the resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working. &lt;br /&gt;
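A minimal sketch of this capability-aware allocation, including the drop behaviour described in [2], might look as follows (device names and capacity units are purely illustrative):&lt;br /&gt;

```python
def assign_tasks(tasks, devices):
    """Greedy sketch: hand each task (name -> resource demand) to the least
    capable device (name -> remaining capacity) that can still run it, and
    drop tasks no device can handle, as in [2]. Mutates `devices` in place."""
    assignments, dropped = {}, []
    for name, demand in sorted(tasks.items(), key=lambda t: -t[1]):
        candidates = [d for d, cap in devices.items() if cap >= demand]
        if not candidates:
            dropped.append(name)  # beyond every device: drop (or send to the cloud)
            continue
        # pick the candidate with the least spare capacity, saving stronger devices
        chosen = min(candidates, key=lambda d: devices[d])
        assignments[name] = chosen
        devices[chosen] -= demand
    return assignments, dropped

devices = {"smartphone": 8, "raspberry_pi": 4, "sensor": 1}
tasks = {"video_inference": 6, "keyword_spotting": 1, "model_training": 20}
assignments, dropped = assign_tasks(tasks, devices)
```

Here the oversized training job is dropped, the video task lands on the smartphone, and the tiny keyword task goes to the sensor, keeping the more capable devices free.&lt;br /&gt;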
&lt;br /&gt;
===Dynamic Models===&lt;br /&gt;
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computational capacity and power, as well as network traffic. These may change very often depending on the workloads and devices in the system. As such, training models to continuously monitor conditions and dynamically distribute the workloads is a very important part of optimization. Simply offloading larger tasks to more powerful devices may be ineffective if a device already has all of its computing resources or network capacity consumed by another workload. &lt;br /&gt;
&lt;br /&gt;
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. During runtime, a similar process is employed, updating the data and helping the model refine its predictions. Network traffic is also taken into account at this stage in order to preserve the edge computing benefit of lower latency. Using all of this data, the partitioning model can dynamically redistribute workloads at runtime to optimize the workflow and ensure each device is utilizing its resources in the most efficient manner. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems cited above. &lt;br /&gt;
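One simple way to refine profiled baselines at runtime is an exponential moving average over observed latencies; the sketch below (hypothetical class and numbers, not taken from either paper) shows the idea:&lt;br /&gt;

```python
class DynamicScheduler:
    """Runtime sketch: start from profiled baseline latencies and refine
    them with an exponential moving average of observed latencies.
    Device names and all numbers here are hypothetical."""

    def __init__(self, baseline_ms, alpha=0.3):
        self.estimate = dict(baseline_ms)  # device name -> estimated latency (ms)
        self.alpha = alpha                 # weight given to each new observation

    def pick(self):
        # route the next workload to the device currently estimated fastest
        return min(self.estimate, key=self.estimate.get)

    def observe(self, device, measured_ms):
        # blend the new measurement into the running estimate
        old = self.estimate[device]
        self.estimate[device] = (1 - self.alpha) * old + self.alpha * measured_ms

sched = DynamicScheduler({"edge": 10.0, "cloud": 30.0})
first = sched.pick()          # the edge looks fastest at baseline
sched.observe("edge", 200.0)  # congestion pushes the edge estimate up
second = sched.pick()         # later work is routed to the cloud instead
```

After a congested measurement arrives, the edge estimate rises above the cloud baseline and subsequent work is routed to the cloud, matching the monitor-and-redistribute loop described above.&lt;br /&gt;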
&lt;br /&gt;
===Horizontal and Vertical Partitioning===&lt;br /&gt;
There are two major ways that these models split workloads in order to optimize machine learning: horizontal and vertical partitioning [3]. Given a set of layers that ranges from the cloud to the edge, horizontal partitioning involves splitting the workload between the layers. For example, if a large amount of computational resources is deemed necessary, a task may go to the cloud to be completed and preprocessed. On the other hand, if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the model so far. If processing on an edge device yields very low accuracy, the task can be sent to the cloud; conversely, if the accuracy is already fairly high and the model needs only a little more work to reach the acceptable threshold, the task may be kept on edge devices to reduce network traffic to the cloud and reduce latency [3].&lt;br /&gt;
&lt;br /&gt;
The second model of partitioning is vertical partitioning. This involves splitting the work among the devices within a given layer rather than among the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous abilities found in edge devices. The decision-making is similar to that of horizontal partitioning, but all of the devices that the workload is split across operate on the same layer [3]. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used. &lt;br /&gt;
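The two axes can be combined in a single placement decision; the toy function below first chooses a layer horizontally (cloud versus edge) and then a device vertically within that layer (the threshold, names, and capacities are invented for illustration):&lt;br /&gt;

```python
def place(task_demand, confidence, tiers, threshold=0.9):
    """Toy placement combining both axes: horizontally choose a tier from
    the task demand and current model confidence, then vertically choose
    the most capable device inside that tier."""
    edge_limit = max(tiers["edge"].values())
    # horizontal split: low confidence or heavy work goes up to the cloud
    tier = "cloud" if confidence < threshold or task_demand > edge_limit else "edge"
    # vertical split: pick the most capable device within the chosen tier
    device = max(tiers[tier], key=tiers[tier].get)
    return tier, device

tiers = {"edge": {"phone": 8, "pi": 4}, "cloud": {"server": 100}}
```

A confident, lightweight task stays on the strongest edge device, while low-confidence or oversized work is pushed up a layer to the cloud.&lt;br /&gt;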
&lt;br /&gt;
[[File:Edge_computing_layers.png|400px|thumb|center|An example of different layers with multiple devices]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;References&#039;&#039;&#039;==&lt;br /&gt;
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=10818760&amp;amp;tag=1 1. M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, &amp;quot;EdgeShard: Efficient LLM Inference via Collaborative Edge Computing,&amp;quot; in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8690980 2. X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, &amp;quot;Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning,&amp;quot; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]&lt;br /&gt;
&lt;br /&gt;
[https://ieeexplore.ieee.org/abstract/document/8976180 3. X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, &amp;quot;Convergence of Edge Computing and Deep Learning: A Comprehensive Survey,&amp;quot; in IEEE Communications Surveys &amp;amp; Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 4. Kang, Yiping and Hauswald, Johann and Gao, Cao and Rovinski, Austin and Mudge, Trevor and Mars, Jason and Tang, Lingjia, &amp;quot;Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge&amp;quot; 2017 Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]&lt;br /&gt;
&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3555802 5. Hua, Haochen, et al. &amp;quot;Edge computing with artificial intelligence: A machine learning perspective.&amp;quot; ACM Computing Surveys 55.9 (2023): 1-35.]&lt;br /&gt;
&lt;br /&gt;
[https://www.mdpi.com/2079-9292/13/3/640 6. Grzesik, Piotr, and Dariusz Mrozek. &amp;quot;Combining machine learning and edge computing: Opportunities, challenges, platforms, frameworks, and use cases.&amp;quot; Electronics 13.3 (2024): 640.]&lt;br /&gt;
&lt;br /&gt;
[https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas 7. Rafatirad, S., Homayoun, H., Chen, Z., Pudukotai Dinakarrao, S.M. (2022). What Is Applied Machine Learning?. In: Machine Learning for Computer Scientists and Data Analysts. Springer, Cham. https://doi.org/10.1007/978-3-030-96756-7_1]&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Federated_Learning&amp;diff=575</id>
		<title>Federated Learning</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Federated_Learning&amp;diff=575"/>
		<updated>2025-04-07T01:29:06Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 5.9 References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 5.1 Overview and Motivation ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated Learning (FL)&#039;&#039;&#039; is a decentralized approach to machine learning where many edge devices—called clients—work together to train a shared model &#039;&#039;without&#039;&#039; sending their private data to a central server. Each client trains the model locally using its own data and then sends only the model updates (like gradients or weights) to a central server, often called an &#039;&#039;&#039;aggregator&#039;&#039;&#039;. The server combines these updates to create a new global model, which is then sent back to the clients for the next round of training. This cycle repeats, allowing the model to improve while keeping the raw data on the devices. FL shifts the focus from central data collection to collaborative training, supporting privacy and scalability in machine learning systems.&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&lt;br /&gt;
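The aggregation cycle described above can be sketched in a few lines. This is an illustrative FedAvg-style weighted average, not the API of any particular FL framework:&lt;br /&gt;

```python
def fed_avg(client_weights, client_sizes):
    """Server-side aggregation sketch: combine each client's locally trained
    model parameters, weighted by the size of its local dataset."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * size for w, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# One round: two clients send updated parameters plus their local data counts,
# and the aggregator produces the next global model to broadcast back.
global_model = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Clients with more local data pull the global parameters further toward their own updates; the raw training data itself never leaves the devices.&lt;br /&gt;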
&lt;br /&gt;
The main reason for using FL is to address concerns about data privacy, security, and communication efficiency—especially in edge computing, where huge amounts of data are created across many different, often limited, devices. Centralized learning struggles here due to limited bandwidth, high data transfer costs, and strict privacy regulations like the &#039;&#039;&#039;General Data Protection Regulation (GDPR)&#039;&#039;&#039; and the &#039;&#039;&#039;Health Insurance Portability and Accountability Act (HIPAA)&#039;&#039;&#039;. FL helps solve these problems by keeping data on the device, reducing the risk of leaks and the need for constant cloud connectivity. It also lowers communication costs by sharing small model updates instead of full datasets, making it ideal for real-time learning in mobile and edge networks.&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In edge computing, FL is a major shift in how we do machine learning. It supports distributed intelligence even when devices have limited resources, unreliable connections, or very different types of data (&#039;&#039;&#039;non-IID&#039;&#039;&#039;). Clients can join training sessions at different times, work around network delays, and adjust based on their hardware limitations. This makes FL a flexible option for edge environments with varying battery levels, processing power, and storage. FL can also support personalized models using techniques like &#039;&#039;&#039;federated personalization&#039;&#039;&#039; or &#039;&#039;&#039;clustered aggregation&#039;&#039;&#039;. Overall, it provides a strong foundation for building AI systems that are scalable, private, and better suited for the challenges of modern distributed computing.&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== 5.2 Federated Learning Architectures ==&lt;br /&gt;
&lt;br /&gt;
Federated Learning (FL) can be implemented through various architectural configurations, each defining how clients interact, how updates are aggregated, and how trust and responsibility are distributed. These architectures play a central role in determining the scalability, fault tolerance, communication overhead, and privacy guarantees of a federated system. In edge computing environments, where client devices are heterogeneous and network reliability varies, the choice of architecture significantly affects the efficiency and robustness of learning. The three dominant paradigms are centralized, decentralized, and hierarchical architectures. Each of these approaches balances different trade-offs in terms of coordination complexity, system resilience, and resource allocation.&lt;br /&gt;
&lt;br /&gt;
[[File:Architecture.png|thumb|center|800px|Visual comparison of Cloud-Based, Edge-Based, and Hierarchical Federated Learning architectures. Source: &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
=== 5.2.1 Centralized Architecture ===&lt;br /&gt;
&lt;br /&gt;
In the centralized FL architecture, a central server or cloud orchestrator is responsible for all coordination, aggregation, and distribution activities. The server begins each round by broadcasting a global model to a selected subset of client devices, which then perform local training using their private data. After completing local updates, clients send their modified model parameters—usually in the form of weight vectors or gradients—back to the server. The server performs aggregation, typically using algorithms such as Federated Averaging (FedAvg), and sends the updated global model to the clients for the next round of training.&lt;br /&gt;
&lt;br /&gt;
The centralized model is appealing for its simplicity and compatibility with existing cloud-to-client infrastructures. It is relatively easy to deploy, manage, and scale in environments with stable connectivity and limited client churn. However, its reliance on a single server introduces critical vulnerabilities. The server becomes a bottleneck under high communication loads and a single point of failure if it experiences downtime or compromise. Furthermore, this architecture requires clients to trust the central aggregator with metadata, model parameters, and access scheduling. In privacy-sensitive or high-availability contexts, these limitations can restrict centralized FL’s applicability. &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== 5.2.2 Decentralized Architecture ===&lt;br /&gt;
&lt;br /&gt;
Decentralized FL removes the need for a central server altogether. Instead, client devices interact directly with each other to share and aggregate model updates. These peer-to-peer (P2P) networks may operate using structured overlays, such as ring topologies or blockchain systems, or employ gossip-based protocols for stochastic update dissemination. In some implementations, clients collaboratively compute weighted averages or perform federated consensus to update the global model in a distributed fashion.&lt;br /&gt;
&lt;br /&gt;
This architecture significantly enhances system robustness, resilience, and trust decentralization. There is no single point of failure, and the absence of a central coordinator eliminates risks of aggregator bias or compromise. Moreover, decentralized FL supports federated learning in contexts where participants belong to different organizations or jurisdictions and cannot rely on a neutral third party. However, these benefits come at the cost of increased communication overhead, complex synchronization requirements, and difficulties in managing convergence—especially under non-identical data distributions and asynchronous updates. Protocols for secure communication, update verification, and identity authentication are necessary to prevent malicious behavior and ensure model integrity. Due to these complexities, decentralized FL is an active area of research and is best suited for scenarios requiring strong autonomy and fault tolerance. &amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== 5.2.3 Hierarchical Architecture ===&lt;br /&gt;
&lt;br /&gt;
Hierarchical FL is a hybrid architecture that introduces one or more intermediary layers—often called edge servers or aggregators—between clients and the global coordinator. In this model, clients are organized into logical or geographical groups, with each group connected to an edge server. Clients send their local model updates to their respective edge aggregator, which performs preliminary aggregation. The edge servers then send their aggregated results to the cloud server, where final aggregation occurs to produce the updated global model.&lt;br /&gt;
&lt;br /&gt;
This multi-tiered architecture is designed to address the scalability and efficiency challenges inherent in centralized systems while avoiding the coordination overhead of full decentralization. Hierarchical FL is especially well-suited for edge computing environments where data, clients, and compute resources are distributed across structured clusters, such as hospitals within a healthcare network or base stations in a telecommunications infrastructure.&lt;br /&gt;
&lt;br /&gt;
One of the key advantages of hierarchical FL is communication optimization. By aggregating locally at edge nodes, the amount of data transmitted over wide-area networks is significantly reduced. Additionally, this model supports region-specific model personalization by allowing edge servers to maintain specialized sub-models adapted to local client behavior. Hierarchical FL also enables asynchronous and fault-tolerant training by isolating disruptions within specific clusters. However, this architecture still depends on reliable edge aggregators and introduces new challenges in cross-layer consistency, scheduling, and privacy preservation across multiple tiers. &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&lt;br /&gt;
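&lt;br /&gt;
The two-tier aggregation flow described above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the cited surveys; the cluster layout, parameter vectors, and dataset-size weights are invented for the example.&lt;br /&gt;

```python
# Minimal sketch of hierarchical (two-tier) FL aggregation.
# Each model is a list of floats; weights are local dataset sizes.

def weighted_average(models, weights):
    """Average parameter vectors, weighted by dataset size."""
    total = sum(weights)
    dim = len(models[0])
    return [
        sum(m[i] * w for m, w in zip(models, weights)) / total
        for i in range(dim)
    ]

def hierarchical_round(clusters):
    """clusters: list of (client_models, client_weights) per edge server.
    Edge servers aggregate their own clients first; the cloud then
    aggregates the edge results, weighting each edge model by its
    cluster's total data size."""
    edge_models, edge_weights = [], []
    for client_models, client_weights in clusters:
        edge_models.append(weighted_average(client_models, client_weights))
        edge_weights.append(sum(client_weights))
    return weighted_average(edge_models, edge_weights)

clusters = [
    ([[1.0, 2.0], [3.0, 4.0]], [10, 30]),   # edge server A, two clients
    ([[5.0, 6.0]], [60]),                    # edge server B, one client
]
global_model = hierarchical_round(clusters)  # [4.0, 5.0]
```

Because each edge result carries its cluster's total weight, the two-stage average matches what a single flat weighted average over all clients would produce, while only the small edge aggregates cross the wide-area network.&lt;br /&gt;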
&lt;br /&gt;
== 5.3 Aggregation Algorithms and Communication Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Aggregation is a fundamental operation in Federated Learning (FL), where updates from multiple edge clients are merged to form a new global model. The quality, stability, and efficiency of the federated learning process depend heavily on the aggregation strategy employed. In edge environments characterized by device heterogeneity and non-identical data distributions, choosing the right aggregation algorithm is essential to ensure reliable convergence and effective collaboration.&lt;br /&gt;
&lt;br /&gt;
[[File:Aggregation.png|thumb|center|700px|Federated Learning protocol showing client selection, local training, model update, and aggregation. Source: Adapted from Federated Learning in Edge Computing: A Systematic Survey [1].]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== 5.3.1 Key Aggregation Algorithms ===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Comparison of Aggregation Algorithms in Federated Learning&lt;br /&gt;
! Algorithm !! Description !! Handles Non-IID Data !! Server-Side Optimization !! Typical Use Case&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;FedAvg&#039;&#039;&#039; || Performs weighted averaging of client models based on dataset size. Simple and communication-efficient. || Limited || No || Basic federated setups with IID or mildly non-IID data.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;FedProx&#039;&#039;&#039; || Adds a proximal term to the local loss function to prevent client drift. Stabilizes training with diverse data. || Yes || No || Suitable for edge deployments with high data heterogeneity or resource-limited clients.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;FedOpt&#039;&#039;&#039; || Applies adaptive optimizers (e.g., FedAdam, FedYogi) on aggregated updates. Enhances convergence in dynamic systems. || Yes || Yes || Used in large-scale systems or settings with unstable participation and gradient variability.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Aggregation is the cornerstone of Federated Learning (FL), where locally computed model updates from edge devices are combined into a global model. The most widely adopted aggregation method is Federated Averaging (FedAvg), introduced in the foundational work by McMahan et al. FedAvg operates by averaging model parameters received from participating clients, typically weighted by the size of each client’s local dataset. This simple yet powerful method reduces the frequency of communication by allowing each device to perform multiple local updates before sending gradients to the server. However, FedAvg performs optimally only when data across clients is balanced and independent and identically distributed (IID)—conditions rarely satisfied in edge computing environments, where client datasets are often highly non-IID, sparse, or skewed [1][2].&lt;br /&gt;
&lt;br /&gt;
To address these limitations, several advanced aggregation algorithms have been proposed. One notable extension is FedProx, which modifies the local optimization objective by adding a proximal term that penalizes large deviations from the global model. This constrains local training and improves stability in heterogeneous data scenarios. FedProx also allows flexible participation by clients with limited resources or intermittent connectivity, making it more robust in practical edge deployments. Another family of aggregation algorithms is FedOpt, which includes adaptive server-side optimization techniques such as FedAdam and FedYogi. These algorithms build on optimization methods used in centralized training and apply them at the aggregation level, enabling faster convergence and improved generalization under complex, real-world data distributions. Collectively, these variants of aggregation address critical FL challenges such as slow convergence, client drift, and update divergence due to heterogeneity in both data and device capabilities [1][3].&lt;br /&gt;
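&lt;br /&gt;
The effect of the FedProx proximal term can be shown with a toy scalar example. The local loss, learning rate, and mu value below are hypothetical, chosen only to make the pull toward the global model visible.&lt;br /&gt;

```python
# Sketch of a FedProx-style local step on a scalar model.
# Local objective: f(w) + (mu / 2) * (w - w_global)**2
# The proximal term penalizes drifting away from the global model.

def fedprox_step(w, w_global, grad_f, mu=1.0, lr=0.01):
    """One gradient step on the proximal objective."""
    grad = grad_f(w) + mu * (w - w_global)
    return w - lr * grad

# Toy local loss f(w) = (w - 3)**2 with gradient 2*(w - 3);
# the client's optimum (w = 3) differs from the global model (w = 0).
grad_f = lambda w: 2.0 * (w - 3.0)

w = 0.0          # start local training from the global model
w_global = 0.0
for _ in range(1000):
    w = fedprox_step(w, w_global, grad_f, mu=1.0, lr=0.01)

# With mu = 1.0 the fixed point solves 2*(w - 3) + (w - 0) = 0, i.e.
# w = 2.0: between the purely local optimum (3.0) and the global model.
```

Plain local SGD (mu = 0) would converge to the client optimum 3.0; the proximal term bounds that drift, which is what stabilizes training under heterogeneous client data.&lt;br /&gt;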
&lt;br /&gt;
=== 5.3.2 Communication Efficiency in Edge-Based FL ===&lt;br /&gt;
&lt;br /&gt;
Communication remains one of the most critical bottlenecks in deploying Federated Learning (FL) at the edge, where devices often suffer from limited bandwidth, intermittent connectivity, and energy constraints. To address this, several strategies have been developed. &#039;&#039;&#039;Gradient quantization&#039;&#039;&#039; reduces the size of transmitted updates by lowering numerical precision (e.g., from 32-bit to 8-bit values). &#039;&#039;&#039;Gradient sparsification&#039;&#039;&#039; limits communication to only the most significant changes in the model, transmitting top-k updates while discarding negligible ones. &#039;&#039;&#039;Local update batching&#039;&#039;&#039; allows devices to perform multiple rounds of local training before sending updates, reducing the frequency of synchronization.&lt;br /&gt;
&lt;br /&gt;
Further, &#039;&#039;&#039;client selection strategies&#039;&#039;&#039; dynamically choose a subset of devices to participate in each round, based on criteria like availability, data quality, hardware capacity, or trust level. These communication optimizations are crucial for ensuring that FL remains scalable, efficient, and deployable across millions of edge nodes without overloading the network or draining device batteries.&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&lt;br /&gt;
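&lt;br /&gt;
Two of these compression steps, top-k sparsification and linear 8-bit quantization, can be sketched as follows; the gradient values and k are illustrative, not from the cited works.&lt;br /&gt;

```python
# Sketch of two update-compression steps used in edge FL:
# top-k gradient sparsification and simple 8-bit linear quantization.

def top_k_sparsify(grad, k):
    """Keep only the k largest-magnitude entries; zero the rest."""
    ranked = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(ranked[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

def quantize_8bit(grad):
    """Map floats to integers in [0, 255] plus a (lo, scale) pair."""
    lo, hi = min(grad), max(grad)
    scale = (hi - lo) / 255.0 if hi != lo else 1.0
    q = [round((g - lo) / scale) for g in grad]
    return q, lo, scale

def dequantize_8bit(q, lo, scale):
    """Lossy reconstruction on the server side."""
    return [lo + qi * scale for qi in q]

grad = [0.9, -0.05, 0.02, -1.2, 0.3]
sparse = top_k_sparsify(grad, k=2)        # only the two largest survive
q, lo, scale = quantize_8bit(grad)        # ints are 4x smaller than f32
approx = dequantize_8bit(q, lo, scale)
```

In practice the two techniques are complementary: sparsification shrinks how many values are sent, quantization shrinks how many bits each value costs.&lt;br /&gt;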
&lt;br /&gt;
== 5.4 Privacy Mechanisms ==&lt;br /&gt;
Privacy and data confidentiality are central design goals of Federated Learning (FL), particularly in edge computing scenarios where numerous IoT devices (e.g., hospital servers, autonomous vehicles) gather sensitive data. Although FL does not require the raw data to leave each client’s device, model updates can still leak private information or be correlated to individual data points. To address these challenges, various privacy-preserving mechanisms have been proposed in the literature &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== 5.4.1 Differential Privacy (DP) ===&lt;br /&gt;
Differential Privacy is a formal framework ensuring that the model’s outputs (e.g., parameter updates) do not reveal individual records. In FL, DP often involves injecting calibrated noise into gradients or model weights on each client. This noise is designed so that the global model’s performance remains acceptable, yet attackers cannot reliably infer any single data sample’s presence in the training set. A step-by-step timeline of DP in an FL context can be summarized as follows:&lt;br /&gt;
# Clients fetch the global model and compute local gradients.&lt;br /&gt;
# Before transmitting gradients, clients add randomized noise to mask specific data patterns.&lt;br /&gt;
# The central server aggregates the noisy gradients to produce a new global model.&lt;br /&gt;
# Clients download the updated global model for further local training.&lt;br /&gt;
By carefully tuning the “privacy budget” (ε and δ), DP can balance privacy against model utility &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
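&lt;br /&gt;
Step 2 of this timeline, adding calibrated noise on the client, is commonly realized by clipping the update's L2 norm and adding Gaussian noise scaled to the clip bound. The sketch below is illustrative only; the clip bound and noise multiplier are placeholder values, not a tuned privacy budget.&lt;br /&gt;

```python
import random

# Sketch of client-side DP sanitization for an FL update: clip the
# L2 norm (bounding each sample's influence), then add Gaussian noise
# proportional to the clipping bound.

def dp_sanitize(update, clip=1.0, noise_multiplier=1.0):
    norm = sum(u * u for u in update) ** 0.5
    factor = min(1.0, clip / norm) if norm != 0 else 1.0
    clipped = [u * factor for u in update]     # norm is now at most clip
    sigma = noise_multiplier * clip
    return [u + random.gauss(0.0, sigma) for u in clipped]

update = [3.0, 4.0]            # L2 norm 5.0, scaled down to norm 1.0
noisy = dp_sanitize(update, clip=1.0, noise_multiplier=0.5)
```

The noise multiplier is where the privacy budget enters: a larger multiplier gives stronger (epsilon, delta) guarantees per round at the cost of noisier aggregates.&lt;br /&gt;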
&lt;br /&gt;
=== 5.4.2 Secure Aggregation ===&lt;br /&gt;
&#039;&#039;&#039;Secure Aggregation&#039;&#039;&#039;, or &#039;&#039;&#039;SecAgg&#039;&#039;&#039;, is a protocol that encrypts local updates before they are sent to the server, ensuring that only the aggregated result is revealed. A typical SecAgg workflow includes:&lt;br /&gt;
# Each client randomly splits its model updates into multiple shares.&lt;br /&gt;
# These shares are exchanged among clients and the server in a way that no single party sees the entirety of any update.&lt;br /&gt;
# The server only obtains the sum of all client updates, rather than individual parameters.&lt;br /&gt;
&lt;br /&gt;
This approach can thwart internal adversaries who might try to reconstruct local data from raw updates.&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt; &#039;&#039;&#039;SecAgg&#039;&#039;&#039; is crucial for preserving confidentiality, especially in IoT-based FL systems where data privacy regulations (such as GDPR and HIPAA) prohibit raw data exposure.&lt;br /&gt;
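&lt;br /&gt;
The cancelling-mask idea underlying this workflow can be sketched as follows. Real SecAgg protocols add key agreement and dropout recovery, which this toy version omits: each pair of clients simply shares a random mask that one adds and the other subtracts, so all masks vanish in the sum the server computes.&lt;br /&gt;

```python
import random

# Toy illustration of pairwise cancelling masks in Secure Aggregation.
# Individual masked vectors look random to the server, but the
# element-wise sum equals the sum of the true updates.

def masked_updates(updates, rng):
    n = len(updates)
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-1.0, 1.0) for _ in updates[0]]
            for k, m in enumerate(mask):
                masked[i][k] += m      # client i adds the shared mask
                masked[j][k] -= m      # client j subtracts it
    return masked

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = masked_updates(updates, random.Random(0))

# Server-side: sum the masked vectors; the masks cancel pairwise.
total = [sum(col) for col in zip(*masked)]   # approximately [9.0, 12.0]
```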
&lt;br /&gt;
=== 5.4.3 Homomorphic Encryption and SMPC ===&lt;br /&gt;
Homomorphic Encryption (HE) supports computations on encrypted data without the need for decryption. In FL, a homomorphically encrypted gradient can be aggregated securely by the server, preventing it from seeing cleartext updates. This approach, however, introduces higher computational overhead, which can be burdensome for resource-limited IoT edge devices [3]. Secure Multi-Party Computation (SMPC) is a related set of techniques that enables multiple parties to perform joint computations on secret inputs. In the context of FL, SMPC allows clients to compute sums of model updates without revealing individual updates. Although performance optimizations exist, SMPC remains challenging for large-scale models with millions of parameters&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[5]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== 5.4.4 IoT-Specific Considerations ===&lt;br /&gt;
In edge computing, IoT devices often capture highly sensitive data (patient records, vehicle sensor logs, etc.). Privacy measures must therefore operate seamlessly on low-power hardware while accommodating intermittent connectivity. For instance, a smart healthcare device storing patient records may use DP-based local training and SecAgg to encrypt updates before uploading. Meanwhile, an autonomous vehicle might adopt HE to guard sensor patterns relevant to real-time traffic analysis. Together, these techniques form a multi-layered privacy defense tailored for distributed, resource-constrained IoT ecosystems &amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[5]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:System-model-for-privacy-preserving-federated-learning.png|thumb|center|500px|System model illustrating privacy-preserving federated learning using homomorphic encryption.&amp;lt;br&amp;gt;Adapted from Privacy-Preserving Federated Learning Using Homomorphic Encryption.]]&lt;br /&gt;
&lt;br /&gt;
== 5.5 Security Threats ==&lt;br /&gt;
&lt;br /&gt;
While Federated Learning (FL) enhances data privacy by ensuring that raw data remains on edge devices, it introduces significant security vulnerabilities due to its decentralized design and reliance on untrusted participants. In edge computing environments, where clients often operate with limited computational power and over unreliable networks, these threats are particularly pronounced.&lt;br /&gt;
&lt;br /&gt;
=== 5.5.1 Model Poisoning Attacks ===&lt;br /&gt;
&lt;br /&gt;
Model poisoning attacks are a critical threat in Federated Learning (FL), especially in edge computing environments where the infrastructure is distributed and clients may be untrusted or loosely regulated. In this type of attack, malicious clients intentionally craft and submit harmful model updates during the training process to compromise the performance or integrity of the global model. These attacks are typically categorized as either untargeted (aimed at degrading general model accuracy) or targeted (backdoor attacks), where the global model is manipulated to behave incorrectly in specific scenarios while appearing normal in others. For instance, an attacker might train its local model with a backdoor trigger, such as a specific pixel pattern in an image, so that the global model misclassifies inputs containing that pattern, even though it performs well on standard test cases &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
FL&#039;s inherent reliance on aggregation algorithms like Federated Averaging (FedAvg), which simply compute the average of local updates, makes it susceptible to these attacks. Since raw data is never shared, poisoned updates can be hard to detect, especially in non-IID settings where variability in updates is expected. Robust aggregation techniques like Krum, Trimmed Mean, and Bulyan have been proposed to resist such manipulation by filtering or down-weighting outlier contributions. However, these algorithms often introduce computational and communication overheads, which are impractical for edge devices with limited power and processing capabilities [2][4]. Furthermore, adversaries can design subtle attacks that mimic benign statistical patterns, making them indistinguishable from legitimate updates. Emerging research explores anomaly detection based on update similarity and trust scoring, yet these solutions face limitations when applied to large-scale or asynchronous FL deployments. Developing lightweight, real-time, and scalable defenses that are effective even under device heterogeneity and unreliable network conditions remains an unresolved challenge in secure edge-based FL &amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
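&lt;br /&gt;
As a concrete illustration of one such robust rule, a coordinate-wise trimmed mean discards the b largest and b smallest client values per parameter before averaging, bounding the influence of any single outlier. The client updates below are invented for the example.&lt;br /&gt;

```python
# Sketch of coordinate-wise trimmed mean, a robust alternative to
# plain FedAvg averaging: per parameter, drop the b highest and b
# lowest client values, then average what remains.

def trimmed_mean(client_updates, b=1):
    dim = len(client_updates[0])
    result = []
    for i in range(dim):
        column = sorted(u[i] for u in client_updates)
        kept = column[b:len(column) - b]   # discard b from each end
        result.append(sum(kept) / len(kept))
    return result

# Four honest clients plus one poisoned update with a huge parameter.
updates = [[1.0], [1.1], [0.9], [1.0], [100.0]]
robust = trimmed_mean(updates, b=1)   # the 100.0 outlier is trimmed away
```

A plain mean of this example would be pulled above 20 by the single poisoned value; the trimmed mean stays near the honest clients' consensus around 1.0.&lt;br /&gt;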
&lt;br /&gt;
&lt;br /&gt;
[[File:FL_IoT_Attack_Detection.png|thumb|center|600px|Federated Learning-based architecture for adversarial sample detection and defense in IoT environments. Adapted from Federated Learning for Internet of Things: A Comprehensive Survey.]]&lt;br /&gt;
&lt;br /&gt;
=== 5.5.2 Data Poisoning Attacks ===&lt;br /&gt;
&lt;br /&gt;
Data poisoning attacks target the integrity of Federated Learning (FL) by manipulating the training data on individual clients before model updates are generated. Unlike model poisoning, which corrupts the gradients or weights directly, data poisoning occurs at the dataset level—allowing adversaries to stealthily influence model behavior through biased or malicious data. This includes techniques such as label flipping (e.g., changing labels of one class to another), outlier injection (introducing data points that fall far outside the normal distribution), or clean-label attacks (subtly altering legitimate data so it has harmful effects without obvious artifacts). Since FL relies on the assumption that client data remains private and uninspected, such poisoned data can easily propagate harmful patterns into the global model, particularly in non-IID settings common in edge environments &amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Edge devices are especially vulnerable to this form of attack due to their limited compute and energy resources, which often preclude comprehensive input validation or anomaly detection. In addition, the highly diverse and fragmented nature of data collected at the edge, such as medical readings from wearable sensors or driving behavior from connected vehicles, makes it difficult to establish a clear baseline for identifying poisoned updates. Defense strategies include robust aggregation (e.g., Median, Trimmed Mean), anomaly detection techniques, and differentially private mechanisms that inject random noise to reduce precision targeting. However, these methods come with trade-offs, such as reduced model accuracy or increased system complexity &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;. There is currently no foolproof solution to detect data poisoning without violating privacy principles. As FL continues to be deployed in critical domains like healthcare, finance, and smart cities, mitigating data poisoning while preserving user data locality and system scalability remains an open and urgent research challenge &amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== 5.5.3 Inference and Membership Attacks ===&lt;br /&gt;
&lt;br /&gt;
Inference attacks represent a subtle yet powerful class of threats in Federated Learning (FL), where adversaries seek to extract sensitive information from shared model updates rather than raw data. These attacks exploit the iterative nature of the FL training process, where clients send gradient updates or model weights to the server. By analyzing these updates, especially in over-parameterized models, attackers can infer statistical properties of the underlying data or even reconstruct representative inputs. A well-documented example is the membership inference attack, where an adversary determines whether a specific data point was used in training by observing the model’s behavior on that input. This becomes especially problematic in edge computing environments, where data heterogeneity and limited client datasets make it easier to correlate individual updates with specific users or devices &amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The risk of information leakage through gradient sharing grows in proportion to the model’s complexity and the granularity of updates. In FL, where edge clients often have only a small number of data samples, their updates may reveal disproportionately detailed information. Studies have shown that attackers with access to multiple rounds of updates, especially when clients are selected frequently, can perform input reconstruction using gradient inversion techniques. These attacks pose significant risks in domains such as healthcare, where private data like patient symptoms or diagnoses might be inferred from the model’s training dynamics. Mitigation strategies include differential privacy (DP), which adds noise to updates before transmission to obscure precise information. Secure Aggregation protocols also help by ensuring the server only sees aggregated updates from multiple clients. However, both approaches come with trade-offs: DP reduces model accuracy and requires careful calibration of the privacy budget, while Secure Aggregation adds communication and cryptographic overhead [4]. Designing privacy-preserving FL systems that balance utility, efficiency, and strong protection against inference remains a major challenge, particularly at the scale and variability found in real-world edge networks &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== 5.5.4 Sybil and Free-Rider Attacks ===&lt;br /&gt;
&lt;br /&gt;
Sybil attacks are a serious security concern in Federated Learning (FL), particularly within decentralized or large-scale edge environments. In a Sybil attack, a single adversary creates multiple fake identities, or Sybil nodes, that participate in the FL training process. These fake clients can collude to manipulate the global model by amplifying poisoned updates, skewing consensus, or outvoting honest participants during aggregation. This is especially dangerous in FL systems where client selection is randomized and identity verification mechanisms are either weak or absent. In cross-device FL scenarios, where millions of devices participate and authentication is often lightweight, Sybil attacks can be launched without significant computational cost [1]. By overwhelming the training process with manipulated updates from multiple controlled identities, attackers can degrade model accuracy, insert backdoors, or block convergence altogether.&lt;br /&gt;
&lt;br /&gt;
Mitigating Sybil attacks is challenging due to the inherent privacy constraints of FL. Traditional centralized defenses like identity verification or IP-based throttling may violate the privacy-preserving principles of FL or be infeasible in mobile, disconnected, or edge-network settings. Some defense mechanisms include cryptographic client registration, proof-of-work schemes, or client reputation scoring. However, these introduce computational burdens or trust assumptions that may not hold in edge environments. Techniques such as clustering client updates and identifying similarity patterns can help detect coordinated Sybil behavior, but adversaries can adapt by subtly varying their poisoned updates to mimic honest client diversity &amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Free-rider attacks, while less destructive than Sybil attacks, undermine the collaborative foundation of FL. In this scenario, a client participates in the FL protocol but contributes little to no useful computation (e.g., by sending stale updates, randomly initialized models, or dummy gradients) while still downloading the improved global model and benefiting from it. Free-riders reduce overall model quality and fairness, especially in resource-constrained settings like IoT networks, where honest clients expend real bandwidth, battery, and computation to train the model [3]. Addressing free-riding behavior often involves contribution-aware aggregation (e.g., weighting updates based on gradient quality or model improvement) or audit mechanisms that assess client effort over time.&lt;br /&gt;
&lt;br /&gt;
=== 5.5.5 Malicious Server Attacks ===&lt;br /&gt;
&lt;br /&gt;
In Federated Learning (FL), especially in the classical centralized architecture, the server plays a critical role in coordinating the learning process by aggregating updates from clients and distributing the global model. However, this central position also makes the server a powerful threat vector if it becomes compromised or malicious. A malicious server can violate confidentiality by launching inference attacks on the collected client updates, attempt to reconstruct local data through gradient inversion, or identify statistical properties of the client datasets. Additionally, the server can tamper with model integrity by selectively dropping updates from honest clients, modifying the aggregation process, or injecting adversarial updates into the global model. This single point of failure is particularly concerning in sensitive applications such as healthcare, finance, and autonomous vehicles, where edge clients rely on trusted model updates &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To defend against these threats, a variety of cryptographic and architectural strategies have been proposed. Secure Aggregation protocols ensure that the server only receives the aggregated sum of updates from clients, preventing it from accessing individual contributions. However, this protection assumes that clients are non-colluding and connected simultaneously, which can be difficult to guarantee in dynamic edge networks. Homomorphic encryption (HE) offers another layer of defense by enabling computations on encrypted data, but its computational cost is often prohibitive for resource-limited edge devices. Similarly, Secure Multi-Party Computation (SMPC) allows multiple clients to jointly compute global updates without revealing local data, but requires heavy communication and coordination overhead &amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;. Decentralized or hierarchical FL architectures also aim to reduce the central server’s authority by distributing aggregation roles across trusted intermediaries or peer clients. However, these designs introduce new challenges such as ensuring consensus, managing trust across multiple layers, and maintaining training efficiency &amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;. Balancing security, scalability, and efficiency remains an open challenge in FL systems, particularly as deployment expands across distributed edge environments where centralized trust cannot be easily assumed.&lt;br /&gt;
&lt;br /&gt;
== 5.6 Resource-Efficient Model Training ==&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Federated Learning (FL)&#039;&#039;&#039;, especially within edge computing environments, resource-efficient model training is crucial due to the inherent limitations of edge devices, such as constrained computational power, limited memory, and restricted communication bandwidth. Addressing these challenges involves implementing strategies that optimize the use of local resources while maintaining the integrity and performance of the global model. Key approaches include &#039;&#039;&#039;model compression techniques&#039;&#039;&#039;, &#039;&#039;&#039;efficient communication protocols&#039;&#039;&#039;, and &#039;&#039;&#039;adaptive client selection methods&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
=== 5.6.1 Model Compression Techniques ===&lt;br /&gt;
&lt;br /&gt;
Model compression techniques are essential for reducing the computational and storage demands of FL models on edge devices. By decreasing the model size, these techniques facilitate more efficient local training and minimize the communication overhead during model updates. Common methods include:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Pruning&#039;&#039;&#039;: This technique involves removing less significant weights or neurons from the neural network, resulting in a sparser model that requires less computation and storage. For instance, a study proposed a framework where the global model is pruned on a powerful server before being distributed to clients, effectively reducing the computational load on resource-constrained edge devices.&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Quantization&#039;&#039;&#039;: This method reduces the precision of the model&#039;s weights, such as converting 32-bit floating-point numbers to 8-bit integers, thereby decreasing the model size and accelerating computations. However, careful implementation is necessary to balance the trade-off between model size reduction and potential accuracy loss.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Knowledge Distillation&#039;&#039;&#039;: In this approach, a large, complex model (teacher) is used to train a smaller, simpler model (student) by transferring knowledge, allowing the student model to achieve comparable performance with reduced resource requirements. This technique has been effectively applied in FL to accommodate the constraints of edge devices.&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&lt;br /&gt;
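As a rough illustration of the pruning idea above, a magnitude-based pruning pass can be sketched in a few lines of NumPy; the sparsity level and threshold rule here are illustrative assumptions, not the exact scheme of the cited framework.&lt;br /&gt;

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold is the k-th smallest absolute value; weights at or below it are cut.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([0.05, -0.9, 0.002, 0.4, -0.01, 0.7])
p = magnitude_prune(w, 0.5)   # zeros the three smallest-magnitude weights
```

In the server-side pruning setup described above, a pass like this would run on the powerful server before the sparse model is distributed to clients.&lt;br /&gt;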
&lt;br /&gt;
=== 5.6.2 Efficient Communication Protocols ===&lt;br /&gt;
&lt;br /&gt;
Efficient communication protocols are vital for mitigating the communication bottleneck in FL, as frequent transmission of model updates between clients and the central server can overwhelm limited network resources. Strategies to enhance communication efficiency include:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Update Sparsification&#039;&#039;&#039;: This technique involves transmitting only the most significant updates or gradients, reducing the amount of data sent during each communication round. By focusing on the most impactful changes, update sparsification decreases communication overhead without substantially affecting model performance.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Compression Algorithms&#039;&#039;&#039;: Applying data compression methods to model updates before transmission can significantly reduce the data size. For example, using techniques like Huffman coding or run-length encoding can compress the updates, leading to more efficient communication.&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Adaptive Communication Frequency&#039;&#039;&#039;: Adjusting the frequency of communications based on the training progress or model convergence can help in conserving bandwidth. For instance, clients may perform multiple local training iterations before sending updates to the server, thereby reducing the number of communication rounds required.&lt;br /&gt;
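The update-sparsification idea above can be sketched as a top-k filter: the client transmits only the indices and values of the largest-magnitude entries, and the server rebuilds a dense update. The helper names are hypothetical.&lt;br /&gt;

```python
import numpy as np

def top_k_sparsify(update: np.ndarray, k: int):
    """Client side: keep only the k largest-magnitude entries, send (indices, values)."""
    flat = update.ravel()
    idx = np.argsort(np.abs(flat))[-k:]   # indices of the top-k magnitudes
    return idx, flat[idx]

def densify(idx, vals, size):
    """Server side: reconstruct a dense update from the sparse transmission."""
    out = np.zeros(size)
    out[idx] = vals
    return out

g = np.array([0.01, -2.0, 0.3, 0.0, 1.5])
idx, vals = top_k_sparsify(g, 2)          # transmit 2 of 5 entries
recon = densify(idx, vals, g.size)
```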
&lt;br /&gt;
=== 5.6.3 Adaptive Client Selection Methods ===&lt;br /&gt;
&lt;br /&gt;
Adaptive client selection methods focus on optimizing the selection of participating clients during each training round to enhance resource utilization and overall model performance. Approaches include:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Resource-Aware Selection&#039;&#039;&#039;: Prioritizing clients with higher computational capabilities and better network connectivity can lead to more efficient training processes. By assessing the resource availability of clients, the FL system can make informed decisions on which clients to involve in each round.&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Clustered Federated Learning&#039;&#039;&#039;: Grouping clients based on similarities in data distribution or system characteristics allows for more efficient training. Clients within the same cluster can collaboratively train a sub-model, which is then aggregated to form the global model, reducing the overall communication and computation burden.&amp;lt;sup&amp;gt;[5]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Early Stopping Strategies&#039;&#039;&#039;: Implementing mechanisms to terminate training early when certain criteria are met can conserve resources. For example, if a client&#039;s local model reaches a predefined accuracy threshold, it can stop training and send the update to the server, thereby saving computational resources.&amp;lt;sup&amp;gt;[6]&amp;lt;/sup&amp;gt;&lt;br /&gt;
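A minimal sketch of the resource-aware selection idea above, assuming a hypothetical weighted score over normalized CPU, bandwidth, and battery readings (the field names and weights are illustrative, not taken from the cited work):&lt;br /&gt;

```python
# Hypothetical scoring rule: rank clients by a weighted mix of compute
# power, bandwidth, and battery level, then pick the top-m for this round.
def select_clients(clients, m, w_cpu=0.5, w_bw=0.3, w_batt=0.2):
    def score(c):
        return w_cpu * c["cpu"] + w_bw * c["bandwidth"] + w_batt * c["battery"]
    return sorted(clients, key=score, reverse=True)[:m]

clients = [
    {"id": "a", "cpu": 0.9, "bandwidth": 0.8, "battery": 0.5},
    {"id": "b", "cpu": 0.2, "bandwidth": 0.9, "battery": 0.9},
    {"id": "c", "cpu": 0.7, "bandwidth": 0.4, "battery": 0.95},
]
chosen = select_clients(clients, 2)   # the two best-resourced clients
```

In practice the score would also weigh factors such as data quality, staleness, and fairness, so that weak clients are not permanently excluded.&lt;br /&gt;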
&lt;br /&gt;
Incorporating these strategies into the FL framework enables more efficient utilization of the limited resources available on edge devices. By tailoring model training processes to the specific constraints of these devices, it is possible to achieve effective and scalable FL deployments in edge computing environments.&lt;br /&gt;
&lt;br /&gt;
= 5.8 Challenges in Federated Learning =&lt;br /&gt;
&lt;br /&gt;
Federated Learning (FL) helps protect user data by training models directly on devices. While useful, FL faces many challenges that make it hard to use in real-world systems [1][2][3][4][5][6]:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;System Heterogeneity&#039;&#039;&#039;&lt;br /&gt;
Devices used in FL—like smartphones, sensors, or edge servers—have different processing speeds, memory capacities, and battery levels. Some may be too slow or lose power during training, causing delays. Solutions include using smaller models, pruning, partial updates, and early stopping [1][6].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Communication Bottlenecks&#039;&#039;&#039;&lt;br /&gt;
FL requires frequent communication between devices and a central server. Sending full model updates can overload weak or slow networks. To fix this, researchers use update compression techniques like quantization, sparsification, and knowledge distillation [2][3].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Statistical Heterogeneity (Non-IID Data)&#039;&#039;&#039;&lt;br /&gt;
Each device collects different types of data depending on users and context, making the data non-IID. This harms the global model’s accuracy. Solutions include clustered FL (grouping similar clients), personalized FL, and meta-learning to adapt locally [2][5].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Privacy and Security&#039;&#039;&#039;&lt;br /&gt;
Even if raw data stays on devices, updates can leak private info or be attacked. Malicious devices may poison the model. Solutions include secure aggregation, differential privacy, and encryption—though they may increase cost [2][3].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Scalability&#039;&#039;&#039;&lt;br /&gt;
FL needs to support thousands of devices, many of which may go offline. Picking reliable clients and using hierarchical FL, in which some devices act as local aggregators, can help scale better [4].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Incentive Mechanisms&#039;&#039;&#039;&lt;br /&gt;
Users may not want to spend energy or bandwidth on FL. Without rewards, participation drops. Research explores systems like blockchain tokens or credit points, but real-world use is still rare [2].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Lack of Standardization and Benchmarks&#039;&#039;&#039;&lt;br /&gt;
FL lacks shared benchmarks and datasets, making it hard to compare approaches. Simulations often ignore real-world issues like device failures or concept drift. Frameworks like LEAF, FedML, and Flower help, but more real-world data is needed [2][3].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Concept Drift and Continual Learning&#039;&#039;&#039;&lt;br /&gt;
User data changes over time (concept drift), so the model can become outdated. Continual learning helps the model adapt over time. Techniques include memory replay, adaptive learning rates, and early stopping [1][6].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Deployment Complexity&#039;&#039;&#039;&lt;br /&gt;
Devices use different hardware, software, and networks. It’s hard to manage updates or fix bugs across so many different systems.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Reliability and Fault Tolerance&#039;&#039;&#039;&lt;br /&gt;
Devices can crash, go offline, or send incorrect updates. FL must be designed to handle such failures without harming the global model.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Monitoring and Debugging&#039;&#039;&#039;&lt;br /&gt;
In FL, it&#039;s difficult to track and debug model behavior across many devices. New tools are needed to observe and fix problems in distributed training.&lt;br /&gt;
&lt;br /&gt;
= 5.9 References =&lt;br /&gt;
# &#039;&#039;Efficient Pruning Strategies for Federated Edge Devices&#039;&#039;. Springer Journal, 2023. https://link.springer.com/article/10.1007/s40747-023-01120-5  &lt;br /&gt;
# &#039;&#039;Federated Learning with Knowledge Distillation&#039;&#039;. arXiv preprint, 2020. https://arxiv.org/abs/2007.14513  &lt;br /&gt;
# &#039;&#039;Communication-Efficient FL using Data Compression&#039;&#039;. MDPI Applied Sciences, 2022. https://www.mdpi.com/2076-3417/12/18/9124  &lt;br /&gt;
# &#039;&#039;Resource-Aware Federated Client Selection&#039;&#039;. arXiv preprint, 2021. https://arxiv.org/pdf/2111.01108  &lt;br /&gt;
# &#039;&#039;Clustered Federated Learning for Heterogeneous Clients&#039;&#039;. MDPI Telecom, 2023. https://www.mdpi.com/2673-2688/6/2/30  &lt;br /&gt;
# &#039;&#039;Early Stopping in FL for Energy-Constrained Devices&#039;&#039;. arXiv preprint, 2023. https://arxiv.org/abs/2310.09789&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
	<entry>
		<id>http://www.edgecomputingbook.com/index.php?title=Federated_Learning&amp;diff=574</id>
		<title>Federated Learning</title>
		<link rel="alternate" type="text/html" href="http://www.edgecomputingbook.com/index.php?title=Federated_Learning&amp;diff=574"/>
		<updated>2025-04-07T01:28:25Z</updated>

		<summary type="html">&lt;p&gt;Ciarang: /* 5.9 References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== 5.1 Overview and Motivation ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Federated Learning (FL)&#039;&#039;&#039; is a decentralized approach to machine learning where many edge devices—called clients—work together to train a shared model &#039;&#039;without&#039;&#039; sending their private data to a central server. Each client trains the model locally using its own data and then sends only the model updates (like gradients or weights) to a central server, often called an &#039;&#039;&#039;aggregator&#039;&#039;&#039;. The server combines these updates to create a new global model, which is then sent back to the clients for the next round of training. This cycle repeats, allowing the model to improve while keeping the raw data on the devices. FL shifts the focus from central data collection to collaborative training, supporting privacy and scalability in machine learning systems.&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The main reason for using FL is to address concerns about data privacy, security, and communication efficiency—especially in edge computing, where huge amounts of data are created across many different, often limited, devices. Centralized learning struggles here due to limited bandwidth, high data transfer costs, and strict privacy regulations like the &#039;&#039;&#039;General Data Protection Regulation (GDPR)&#039;&#039;&#039; and the &#039;&#039;&#039;Health Insurance Portability and Accountability Act (HIPAA)&#039;&#039;&#039;. FL helps solve these problems by keeping data on the device, reducing the risk of leaks and the need for constant cloud connectivity. It also lowers communication costs by sharing small model updates instead of full datasets, making it ideal for real-time learning in mobile and edge networks.&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In edge computing, FL is a major shift in how we do machine learning. It supports distributed intelligence even when devices have limited resources, unreliable connections, or very different types of data (&#039;&#039;&#039;non-IID&#039;&#039;&#039;). Clients can join training sessions at different times, work around network delays, and adjust based on their hardware limitations. This makes FL a flexible option for edge environments with varying battery levels, processing power, and storage. FL can also support personalized models using techniques like &#039;&#039;&#039;federated personalization&#039;&#039;&#039; or &#039;&#039;&#039;clustered aggregation&#039;&#039;&#039;. Overall, it provides a strong foundation for building AI systems that are scalable, private, and better suited for the challenges of modern distributed computing.&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== 5.2 Federated Learning Architectures ==&lt;br /&gt;
&lt;br /&gt;
Federated Learning (FL) can be implemented through various architectural configurations, each defining how clients interact, how updates are aggregated, and how trust and responsibility are distributed. These architectures play a central role in determining the scalability, fault tolerance, communication overhead, and privacy guarantees of a federated system. In edge computing environments, where client devices are heterogeneous and network reliability varies, the choice of architecture significantly affects the efficiency and robustness of learning. The three dominant paradigms are centralized, decentralized, and hierarchical architectures. Each of these approaches balances different trade-offs in terms of coordination complexity, system resilience, and resource allocation.&lt;br /&gt;
&lt;br /&gt;
[[File:Architecture.png|thumb|center|800px|Visual comparison of Cloud-Based, Edge-Based, and Hierarchical Federated Learning architectures. Source: &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;]]&lt;br /&gt;
&lt;br /&gt;
=== 5.2.1 Centralized Architecture ===&lt;br /&gt;
&lt;br /&gt;
In the centralized FL architecture, a central server or cloud orchestrator is responsible for all coordination, aggregation, and distribution activities. The server begins each round by broadcasting a global model to a selected subset of client devices, which then perform local training using their private data. After completing local updates, clients send their modified model parameters—usually in the form of weight vectors or gradients—back to the server. The server performs aggregation, typically using algorithms such as Federated Averaging (FedAvg), and sends the updated global model to the clients for the next round of training.&lt;br /&gt;
&lt;br /&gt;
The centralized model is appealing for its simplicity and compatibility with existing cloud-to-client infrastructures. It is relatively easy to deploy, manage, and scale in environments with stable connectivity and limited client churn. However, its reliance on a single server introduces critical vulnerabilities. The server becomes a bottleneck under high communication loads and a single point of failure if it experiences downtime or compromise. Furthermore, this architecture requires clients to trust the central aggregator with metadata, model parameters, and access scheduling. In privacy-sensitive or high-availability contexts, these limitations can restrict centralized FL’s applicability. &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== 5.2.2 Decentralized Architecture ===&lt;br /&gt;
&lt;br /&gt;
Decentralized FL removes the need for a central server altogether. Instead, client devices interact directly with each other to share and aggregate model updates. These peer-to-peer (P2P) networks may operate using structured overlays, such as ring topologies or blockchain systems, or employ gossip-based protocols for stochastic update dissemination. In some implementations, clients collaboratively compute weighted averages or perform federated consensus to update the global model in a distributed fashion.&lt;br /&gt;
&lt;br /&gt;
This architecture significantly enhances system robustness, resilience, and trust decentralization. There is no single point of failure, and the absence of a central coordinator eliminates risks of aggregator bias or compromise. Moreover, decentralized FL supports federated learning in contexts where participants belong to different organizations or jurisdictions and cannot rely on a neutral third party. However, these benefits come at the cost of increased communication overhead, complex synchronization requirements, and difficulties in managing convergence—especially under non-identical data distributions and asynchronous updates. Protocols for secure communication, update verification, and identity authentication are necessary to prevent malicious behavior and ensure model integrity. Due to these complexities, decentralized FL is an active area of research and is best suited for scenarios requiring strong autonomy and fault tolerance. &amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== 5.2.3 Hierarchical Architecture ===&lt;br /&gt;
&lt;br /&gt;
Hierarchical FL is a hybrid architecture that introduces one or more intermediary layers—often called edge servers or aggregators—between clients and the global coordinator. In this model, clients are organized into logical or geographical groups, with each group connected to an edge server. Clients send their local model updates to their respective edge aggregator, which performs preliminary aggregation. The edge servers then send their aggregated results to the cloud server, where final aggregation occurs to produce the updated global model.&lt;br /&gt;
&lt;br /&gt;
This multi-tiered architecture is designed to address the scalability and efficiency challenges inherent in centralized systems while avoiding the coordination overhead of full decentralization. Hierarchical FL is especially well-suited for edge computing environments where data, clients, and compute resources are distributed across structured clusters, such as hospitals within a healthcare network or base stations in a telecommunications infrastructure.&lt;br /&gt;
&lt;br /&gt;
One of the key advantages of hierarchical FL is communication optimization. By aggregating locally at edge nodes, the amount of data transmitted over wide-area networks is significantly reduced. Additionally, this model supports region-specific model personalization by allowing edge servers to maintain specialized sub-models adapted to local client behavior. Hierarchical FL also enables asynchronous and fault-tolerant training by isolating disruptions within specific clusters. However, this architecture still depends on reliable edge aggregators and introduces new challenges in cross-layer consistency, scheduling, and privacy preservation across multiple tiers. &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== 5.3 Aggregation Algorithms and Communication Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Aggregation is a fundamental operation in Federated Learning (FL), where updates from multiple edge clients are merged to form a new global model. The quality, stability, and efficiency of the federated learning process depend heavily on the aggregation strategy employed. In edge environments characterized by device heterogeneity and non-identical data distributions, choosing the right aggregation algorithm is essential to ensure reliable convergence and effective collaboration.&lt;br /&gt;
&lt;br /&gt;
[[File:Aggregation.png|thumb|center|700px|Federated Learning protocol showing client selection, local training, model update, and aggregation. Source: Adapted from Federated Learning in Edge Computing: A Systematic Survey [1].]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== 5.3.1 Key Aggregation Algorithms ===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Comparison of Aggregation Algorithms in Federated Learning&lt;br /&gt;
! Algorithm !! Description !! Handles Non-IID Data !! Server-Side Optimization !! Typical Use Case&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;FedAvg&#039;&#039;&#039; || Performs weighted averaging of client models based on dataset size. Simple and communication-efficient. || Limited || No || Basic federated setups with IID or mildly non-IID data.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;FedProx&#039;&#039;&#039; || Adds a proximal term to the local loss function to prevent client drift. Stabilizes training with diverse data. || Yes || No || Suitable for edge deployments with high data heterogeneity or resource-limited clients.&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;FedOpt&#039;&#039;&#039; || Applies adaptive optimizers (e.g., FedAdam, FedYogi) on aggregated updates. Enhances convergence in dynamic systems. || Yes || Yes || Used in large-scale systems or settings with unstable participation and gradient variability.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Aggregation is the cornerstone of Federated Learning (FL), where locally computed model updates from edge devices are combined into a global model. The most widely adopted aggregation method is Federated Averaging (FedAvg), introduced in the foundational work by McMahan et al. FedAvg operates by averaging model parameters received from participating clients, typically weighted by the size of each client’s local dataset. This simple yet powerful method reduces the frequency of communication by allowing each device to perform multiple local updates before sending gradients to the server. However, FedAvg performs optimally only when data across clients is balanced and independent and identically distributed (IID)—conditions rarely satisfied in edge computing environments, where client datasets are often highly non-IID, sparse, or skewed [1][2].&lt;br /&gt;
&lt;br /&gt;
To address these limitations, several advanced aggregation algorithms have been proposed. One notable extension is FedProx, which modifies the local optimization objective by adding a proximal term that penalizes large deviations from the global model. This constrains local training and improves stability in heterogeneous data scenarios. FedProx also allows flexible participation by clients with limited resources or intermittent connectivity, making it more robust in practical edge deployments. Another family of aggregation algorithms is FedOpt, which includes adaptive server-side optimization techniques such as FedAdam and FedYogi. These algorithms build on optimization methods used in centralized training and apply them at the aggregation level, enabling faster convergence and improved generalization under complex, real-world data distributions. Collectively, these variants of aggregation address critical FL challenges such as slow convergence, client drift, and update divergence due to heterogeneity in both data and device capabilities [1][3].&lt;br /&gt;
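The weighted-averaging step of FedAvg described above can be written directly. This sketch aggregates plain parameter vectors weighted by dataset size and omits client sampling and local training:&lt;br /&gt;

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (the FedAvg aggregation step)."""
    total = sum(client_sizes)
    # Each client's contribution is proportional to its local dataset size.
    return sum(n / total * w for w, n in zip(client_weights, client_sizes))

w1 = np.array([1.0, 2.0])   # client with 100 local samples
w2 = np.array([3.0, 4.0])   # client with 300 local samples
global_w = fedavg([w1, w2], [100, 300])
```

FedProx keeps this same server-side step but changes each client&#039;s local objective, adding a proximal penalty that discourages drifting far from the received global model.&lt;br /&gt;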
&lt;br /&gt;
=== 5.3.2 Communication Efficiency in Edge-Based FL ===&lt;br /&gt;
&lt;br /&gt;
Communication remains one of the most critical bottlenecks in deploying Federated Learning (FL) at the edge, where devices often suffer from limited bandwidth, intermittent connectivity, and energy constraints. To address this, several strategies have been developed. &#039;&#039;&#039;Gradient quantization&#039;&#039;&#039; reduces the size of transmitted updates by lowering numerical precision (e.g., from 32-bit to 8-bit values). &#039;&#039;&#039;Gradient sparsification&#039;&#039;&#039; limits communication to only the most significant changes in the model, transmitting top-k updates while discarding negligible ones. &#039;&#039;&#039;Local update batching&#039;&#039;&#039; allows devices to perform multiple rounds of local training before sending updates, reducing the frequency of synchronization.&lt;br /&gt;
&lt;br /&gt;
Further, &#039;&#039;&#039;client selection strategies&#039;&#039;&#039; dynamically choose a subset of devices to participate in each round, based on criteria like availability, data quality, hardware capacity, or trust level. These communication optimizations are crucial for ensuring that FL remains scalable, efficient, and deployable across millions of edge nodes without overloading the network or draining device batteries.&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&lt;br /&gt;
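As a sketch of the gradient-quantization idea above, the following reduces a float32 update to int8 with a single symmetric scale factor; production systems refine the scheme (per-layer scales, stochastic rounding) considerably.&lt;br /&gt;

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Uniform symmetric quantization of a float32 update to int8."""
    scale = np.abs(x).max() / 127.0 or 1.0   # guard against an all-zero update
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale                           # client transmits q plus one float

def dequantize(q, scale):
    """Server side: recover an approximate float update."""
    return q.astype(np.float32) * scale

g = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, s = quantize_int8(g)
g_hat = dequantize(q, s)   # close to g, at one quarter the payload size
```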
&lt;br /&gt;
== 5.4 Privacy Mechanisms ==&lt;br /&gt;
Privacy and data confidentiality are central design goals of Federated Learning (FL), particularly in edge computing scenarios where numerous IoT devices (e.g., hospital servers, autonomous vehicles) gather sensitive data. Although FL does not require the raw data to leave each client’s device, model updates can still leak private information or be correlated to individual data points. To address these challenges, various privacy-preserving mechanisms have been proposed in the literature &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== 5.4.1 Differential Privacy (DP) ===&lt;br /&gt;
Differential Privacy is a formal framework ensuring that the model’s outputs (e.g., parameter updates) do not reveal individual records. In FL, DP often involves injecting calibrated noise into gradients or model weights on each client. This noise is designed so that the global model’s performance remains acceptable, yet attackers cannot reliably infer any single data sample’s presence in the training set. A step-by-step timeline of DP in an FL context can be summarized as follows:&lt;br /&gt;
# Clients fetch the global model and compute local gradients.&lt;br /&gt;
# Before transmitting gradients, clients add randomized noise to mask specific data patterns.&lt;br /&gt;
# The central server aggregates the noisy gradients to produce a new global model.&lt;br /&gt;
# Clients download the updated global model for further local training.&lt;br /&gt;
By carefully tuning the “privacy budget” (ε and δ), DP can balance privacy against model utility &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
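The client-side noise injection in step 2 above is usually combined with norm clipping, as in DP-SGD. A minimal sketch follows, with illustrative clip_norm and noise_std values rather than ones derived from a specific (ε, δ) budget:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sanitize(grad, clip_norm=1.0, noise_std=0.1):
    """Clip the update's L2 norm, then add Gaussian noise (DP-SGD style).
    clip_norm and noise_std are illustrative; in practice they are calibrated
    to the chosen privacy budget (epsilon, delta)."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))  # bound sensitivity
    return clipped + rng.normal(0.0, noise_std, size=grad.shape)

g = np.array([3.0, 4.0])     # norm 5, so it is scaled down to norm 1
noisy = dp_sanitize(g)       # what the client actually transmits
```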
&lt;br /&gt;
=== 5.4.2 Secure Aggregation ===&lt;br /&gt;
&#039;&#039;&#039;Secure Aggregation&#039;&#039;&#039;, or &#039;&#039;&#039;SecAgg&#039;&#039;&#039;, is a protocol that encrypts local updates before they are sent to the server, ensuring that only the aggregated result is revealed. A typical SecAgg workflow includes:&lt;br /&gt;
# Each client randomly splits its model updates into multiple shares.&lt;br /&gt;
# These shares are exchanged among clients and the server in a way that no single party sees the entirety of any update.&lt;br /&gt;
# The server only obtains the sum of all client updates, rather than individual parameters.&lt;br /&gt;
&lt;br /&gt;
This approach can thwart internal adversaries who might try to reconstruct local data from raw updates.&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt; &#039;&#039;&#039;SecAgg&#039;&#039;&#039; is crucial for preserving confidentiality, especially in IoT-based FL systems where data privacy regulations (such as GDPR and HIPAA) prohibit raw data exposure.&lt;br /&gt;
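A toy version of the masking idea behind SecAgg: each pair of clients shares a random mask that one adds and the other subtracts, so individual updates stay hidden while the masks cancel in the server&#039;s sum. Real SecAgg adds key agreement and dropout recovery, both omitted here.&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(42)

updates = {1: np.array([1.0, 2.0]),
           2: np.array([3.0, 4.0]),
           3: np.array([5.0, 6.0])}

# One shared random mask per client pair (i < j).
masks = {(i, j): rng.normal(size=2)
         for i in updates for j in updates if i < j}

def masked_update(cid):
    """What client `cid` sends: its update plus/minus the pairwise masks."""
    y = updates[cid].copy()
    for (i, j), r in masks.items():
        if cid == i:
            y += r   # lower-id client adds the shared mask
        elif cid == j:
            y -= r   # higher-id client subtracts it
    return y

# The server sees only masked vectors, yet their sum equals the true sum.
server_sum = sum(masked_update(c) for c in updates)
```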
&lt;br /&gt;
=== 5.4.3 Homomorphic Encryption and SMPC ===&lt;br /&gt;
Homomorphic Encryption (HE) supports computations on encrypted data without the need for decryption. In FL, a homomorphically encrypted gradient can be aggregated securely by the server, preventing it from seeing cleartext updates. This approach, however, introduces higher computational overhead, which can be burdensome for resource-limited IoT edge devices [3]. Secure Multi-Party Computation (SMPC) is a related set of techniques that enables multiple parties to perform joint computations on secret inputs. In the context of FL, SMPC allows clients to compute sums of model updates without revealing individual updates. Although performance optimizations exist, SMPC remains challenging for large-scale models with millions of parameters&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[5]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== 5.4.4 IoT-Specific Considerations ===&lt;br /&gt;
In edge computing, IoT devices often capture highly sensitive data (patient records, vehicle sensor logs, etc.). Privacy measures must therefore operate seamlessly on low-power hardware while accommodating intermittent connectivity. For instance, a smart healthcare device storing patient records may use DP-based local training and SecAgg to encrypt updates before uploading. Meanwhile, an autonomous vehicle might adopt HE to guard sensor patterns relevant to real-time traffic analysis. Together, these techniques form a multi-layered privacy defense tailored for distributed, resource-constrained IoT ecosystems &amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[5]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:System-model-for-privacy-preserving-federated-learning.png|thumb|center|500px|System model illustrating privacy-preserving federated learning using homomorphic encryption.&amp;lt;br&amp;gt;Adapted from Privacy-Preserving Federated Learning Using Homomorphic Encryption.]]&lt;br /&gt;
&lt;br /&gt;
== 5.5 Security Threats ==&lt;br /&gt;
&lt;br /&gt;
While Federated Learning (FL) enhances data privacy by ensuring that raw data remains on edge devices, it introduces significant security vulnerabilities due to its decentralized design and reliance on untrusted participants. In edge computing environments, where clients often operate with limited computational power and over unreliable networks, these threats are particularly pronounced.&lt;br /&gt;
&lt;br /&gt;
=== 5.5.1 Model Poisoning Attacks ===&lt;br /&gt;
&lt;br /&gt;
Model poisoning attacks are a critical threat in Federated Learning (FL), especially in edge computing environments where the infrastructure is distributed and clients may be untrusted or loosely regulated. In this type of attack, malicious clients intentionally craft and submit harmful model updates during the training process to compromise the performance or integrity of the global model. These attacks are typically categorized as either untargeted, aimed at degrading general model accuracy, or targeted (backdoor attacks), where the global model is manipulated to behave incorrectly in specific scenarios while appearing normal in others. For instance, an attacker might train its local model with a backdoor trigger, such as a specific pixel pattern in an image, so that the global model misclassifies inputs containing that pattern, even though it performs well on standard test cases &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
FL&#039;s inherent reliance on aggregation algorithms like Federated Averaging (FedAvg), which simply compute the average of local updates, makes it susceptible to these attacks. Since raw data is never shared, poisoned updates can be hard to detect, especially in non-IID settings where variability in updates is expected. Robust aggregation techniques like Krum, Trimmed Mean, and Bulyan have been proposed to resist such manipulation by filtering or down-weighting outlier contributions. However, these algorithms often introduce computational and communication overheads, which are impractical for edge devices with limited power and processing capabilities [2][4]. Furthermore, adversaries can design subtle attacks that mimic benign statistical patterns, making them indistinguishable from legitimate updates. Emerging research explores anomaly detection based on update similarity and trust scoring, yet these solutions face limitations when applied to large-scale or asynchronous FL deployments. Developing lightweight, real-time, and scalable defenses that are effective even under device heterogeneity and unreliable network conditions remains an unresolved challenge in secure edge-based FL &amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
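The coordinate-wise Trimmed Mean mentioned above can be sketched as follows; the client values are synthetic and the trim level is an illustrative parameter:&lt;br /&gt;

```python
import numpy as np

def trimmed_mean(updates: np.ndarray, trim: int) -> np.ndarray:
    """Coordinate-wise trimmed mean: per coordinate, drop the `trim` largest
    and `trim` smallest client values, then average the rest."""
    s = np.sort(updates, axis=0)                 # sort each coordinate across clients
    return s[trim:updates.shape[0] - trim].mean(axis=0)

# Four honest clients plus one poisoned update with an extreme value.
updates = np.array([[1.0], [1.1], [0.9], [1.0], [100.0]])
robust = trimmed_mean(updates, trim=1)   # extreme value is discarded
plain = updates.mean(axis=0)             # plain FedAvg-style mean is skewed
```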
&lt;br /&gt;
&lt;br /&gt;
[[File:FL_IoT_Attack_Detection.png|thumb|center|600px|Federated Learning-based architecture for adversarial sample detection and defense in IoT environments. Adapted from Federated Learning for Internet of Things: A Comprehensive Survey.]]&lt;br /&gt;
&lt;br /&gt;
=== 5.5.2 Data Poisoning Attacks ===&lt;br /&gt;
&lt;br /&gt;
Data poisoning attacks target the integrity of Federated Learning (FL) by manipulating the training data on individual clients before model updates are generated. Unlike model poisoning, which corrupts the gradients or weights directly, data poisoning occurs at the dataset level—allowing adversaries to stealthily influence model behavior through biased or malicious data. This includes techniques such as label flipping (e.g., changing labels of one class to another), outlier injection (introducing data points that fall far outside the normal distribution), or clean-label attacks (subtly altering legitimate data so it has harmful effects without obvious artifacts). Since FL relies on the assumption that client data remains private and uninspected, such poisoned data can easily propagate harmful patterns into the global model, particularly in non-IID settings common in edge environments &amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
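&lt;br /&gt;
As a concrete illustration of label flipping, the sketch below poisons a toy dataset before local training; the samples and class indices are invented for the example.&lt;br /&gt;

```python
# Label-flipping data poisoning, a minimal sketch (pure Python).
# A malicious client flips every label of the victim class src to the
# attacker-chosen class dst before running local training.

def flip_labels(dataset, src, dst):
    """Return a poisoned copy of a list of (features, label) pairs."""
    return [(x, dst if y == src else y) for (x, y) in dataset]

clean = [([0.2, 0.4], 0), ([0.9, 0.1], 1), ([0.8, 0.3], 1)]
poisoned = flip_labels(clean, src=1, dst=0)
# Every class-1 example now carries label 0, so a model trained on this
# shard pushes the global model toward misclassifying class 1.
```

Because the server never inspects the shard itself, nothing in the transmitted update directly reveals that the labels were altered, which is why the defenses below operate on update statistics instead.&lt;br /&gt;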
&lt;br /&gt;
Edge devices are especially vulnerable to this form of attack due to their limited compute and energy resources, which often preclude comprehensive input validation or anomaly detection. In addition, the highly diverse and fragmented nature of data collected at the edge, such as medical readings from wearable sensors or driving behavior from connected vehicles, makes it difficult to establish a clear baseline for identifying poisoned updates. Defense strategies include robust aggregation (e.g., Median, Trimmed Mean), anomaly detection techniques, and differentially private mechanisms that inject random noise to reduce precision targeting. However, these methods come with trade-offs, such as reduced model accuracy or increased system complexity &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;. There is currently no foolproof solution to detect data poisoning without violating privacy principles. As FL continues to be deployed in critical domains like healthcare, finance, and smart cities, mitigating data poisoning while preserving user data locality and system scalability remains an open and urgent research challenge &amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== 5.5.3 Inference and Membership Attacks ===&lt;br /&gt;
&lt;br /&gt;
Inference attacks represent a subtle yet powerful class of threats in Federated Learning (FL), where adversaries seek to extract sensitive information from shared model updates rather than raw data. These attacks exploit the iterative nature of the FL training process, where clients send gradient updates or model weights to the server. By analyzing these updates, especially in over-parameterized models, attackers can infer statistical properties of the underlying data or even reconstruct representative inputs. A well-documented example is the membership inference attack, where an adversary determines whether a specific data point was used in training by observing the model’s behavior on that input. This becomes especially problematic in edge computing environments, where data heterogeneity and limited client datasets make it easier to correlate individual updates with specific users or devices &amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The risk of information leakage through gradient sharing grows in proportion to the model’s complexity and the granularity of updates. In FL, where edge clients often have only a small number of data samples, their updates may reveal disproportionately detailed information. Studies have shown that attackers with access to multiple rounds of updates, especially when clients are selected frequently, can perform input reconstruction using gradient inversion techniques. These attacks pose significant risks in domains such as healthcare, where private data like patient symptoms or diagnoses might be inferred from the model’s training dynamics. Mitigation strategies include differential privacy (DP), which adds noise to updates before transmission to obscure precise information. Secure Aggregation protocols also help by ensuring the server only sees aggregated updates from multiple clients. However, both approaches come with trade-offs: DP reduces model accuracy and requires careful calibration of the privacy budget, while Secure Aggregation adds communication and cryptographic overhead [4]. Designing privacy-preserving FL systems that balance utility, efficiency, and strong protection against inference remains a major challenge, particularly at the scale and variability found in real-world edge networks &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
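&lt;br /&gt;
The clip-then-noise step behind DP update release can be sketched as follows; the clipping norm and noise multiplier are illustrative placeholders, not a calibrated privacy budget.&lt;br /&gt;

```python
# Clip-then-noise release of a client update, a minimal DP-style sketch
# (pure Python). clip_norm and noise_multiplier are illustrative values;
# a real deployment must calibrate them to an (epsilon, delta) budget.
import math
import random

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the update to L2 norm clip_norm, then add Gaussian noise."""
    rng = rng or random.Random(0)  # fixed seed: deterministic demo only
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [v * scale for v in update]
    sigma = noise_multiplier * clip_norm  # noise scaled to the clip bound
    return [v + rng.gauss(0.0, sigma) for v in clipped]

# An update of norm 5 is first rescaled to norm 1, then perturbed.
noisy = dp_sanitize([3.0, 4.0])
```

Clipping bounds any single client&#039;s influence on the aggregate, and the noise masks what remains; as the text notes, larger noise buys stronger privacy at the cost of accuracy.&lt;br /&gt;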
&lt;br /&gt;
=== 5.5.4 Sybil and Free-Rider Attacks ===&lt;br /&gt;
&lt;br /&gt;
Sybil attacks are a serious security concern in Federated Learning (FL), particularly within decentralized or large-scale edge environments. In a Sybil attack, a single adversary creates multiple fake identities, or Sybil nodes, that participate in the FL training process. These fake clients can collude to manipulate the global model by amplifying poisoned updates, skewing consensus, or outvoting honest participants during aggregation. This is especially dangerous in FL systems where client selection is randomized and identity verification mechanisms are either weak or absent. In cross-device FL scenarios, where millions of devices participate and authentication is often lightweight, Sybil attacks can be launched without significant computational cost [1]. By overwhelming the training process with manipulated updates from multiple controlled identities, attackers can degrade model accuracy, insert backdoors, or block convergence altogether.&lt;br /&gt;
&lt;br /&gt;
Mitigating Sybil attacks is challenging due to the inherent privacy constraints of FL. Traditional centralized defenses like identity verification or IP-based throttling may violate the privacy-preserving principles of FL or be infeasible in mobile, disconnected, or edge-network settings. Some defense mechanisms include cryptographic client registration, proof-of-work schemes, or client reputation scoring. However, these introduce computational burdens or trust assumptions that may not hold in edge environments. Techniques such as clustering client updates and identifying similarity patterns can help detect coordinated Sybil behavior, but adversaries can adapt by subtly varying their poisoned updates to mimic honest client diversity &amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
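&lt;br /&gt;
Update-similarity screening of the sort described above can be sketched with pairwise cosine similarity; the threshold and toy updates are assumptions chosen for illustration.&lt;br /&gt;

```python
# Flagging coordinated Sybil updates via pairwise cosine similarity
# (pure Python). Colluding clones that submit near-identical updates
# stand out against honest clients, whose non-IID data yields more
# diverse gradient directions.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def flag_sybils(updates, threshold=0.99):
    """Return index pairs whose updates are suspiciously aligned."""
    suspicious = []
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            if cosine(updates[i], updates[j]) > threshold:
                suspicious.append((i, j))
    return suspicious

# Clients 0 and 2 submit colinear updates (same direction, rescaled),
# a pattern consistent with one adversary behind both identities.
updates = [[1.0, 0.1], [0.2, 1.0], [2.0, 0.2], [-0.5, 0.8]]
flagged = flag_sybils(updates)
```

As the text cautions, an adaptive adversary can add small perturbations so its clones fall below any fixed threshold, so this screening is a heuristic rather than a guarantee.&lt;br /&gt;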
&lt;br /&gt;
Free-rider attacks, while less destructive than Sybil attacks, undermine the collaborative foundation of FL. In this scenario, a client participates in the FL protocol but contributes little to no useful computation (e.g., by sending stale updates, randomly initialized models, or dummy gradients) while still downloading the improved global model and benefiting from it. Free-riders reduce overall model quality and fairness, especially in resource-constrained settings like IoT networks, where honest clients expend real bandwidth, battery, and computation to train the model [3]. Addressing free-riding behavior often involves contribution-aware aggregation (e.g., weighting updates based on gradient quality or model improvement) or audit mechanisms that assess client effort over time.&lt;br /&gt;
&lt;br /&gt;
=== 5.5.5 Malicious Server Attacks ===&lt;br /&gt;
&lt;br /&gt;
In Federated Learning (FL), especially in the classical centralized architecture, the server plays a critical role in coordinating the learning process by aggregating updates from clients and distributing the global model. However, this central position also makes the server a powerful threat vector if it becomes compromised or malicious. A malicious server can violate confidentiality by launching inference attacks on the collected client updates, attempt to reconstruct local data through gradient inversion, or identify statistical properties of the client datasets. Additionally, the server can tamper with model integrity by selectively dropping updates from honest clients, modifying the aggregation process, or injecting adversarial updates into the global model. This single point of failure is particularly concerning in sensitive applications such as healthcare, finance, and autonomous vehicles, where edge clients rely on trusted model updates &amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To defend against these threats, a variety of cryptographic and architectural strategies have been proposed. Secure Aggregation protocols ensure that the server only receives the aggregated sum of updates from clients, preventing it from accessing individual contributions. However, this protection assumes that clients are non-colluding and connected simultaneously, which can be difficult to guarantee in dynamic edge networks. Homomorphic encryption (HE) offers another layer of defense by enabling computations on encrypted data, but its computational cost is often prohibitive for resource-limited edge devices. Similarly, Secure Multi-Party Computation (SMPC) allows multiple clients to jointly compute global updates without revealing local data, but requires heavy communication and coordination overhead &amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;. Decentralized or hierarchical FL architectures also aim to reduce the central server’s authority by distributing aggregation roles across trusted intermediaries or peer clients. However, these designs introduce new challenges such as ensuring consensus, managing trust across multiple layers, and maintaining training efficiency &amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;. Balancing security, scalability, and efficiency remains an open challenge in FL systems, particularly as deployment expands across distributed edge environments where centralized trust cannot be easily assumed.&lt;br /&gt;
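&lt;br /&gt;
The core intuition behind Secure Aggregation can be sketched with pairwise masking: each client pair derives a shared mask that cancels in the server-side sum. Real protocols (e.g., Bonawitz et al.) add key agreement and dropout recovery, which this toy version omits; the seeds and values are illustrative.&lt;br /&gt;

```python
# Pairwise-masking sketch of Secure Aggregation (pure Python). For each
# client pair, the lower-id client adds a shared pseudorandom mask and the
# higher-id client subtracts it, so every mask cancels in the server-side
# sum and only the aggregate is revealed.
import random

def masked_update(client_id, update, peers, seeds, scale=1000.0):
    masked = list(update)
    for peer in peers:
        if peer == client_id:
            continue
        pair = tuple(sorted((client_id, peer)))
        rng = random.Random(seeds[pair])  # both clients derive the same mask
        mask = [rng.uniform(-scale, scale) for _ in update]
        sign = 1.0 if client_id < peer else -1.0
        masked = [m + sign * v for m, v in zip(masked, mask)]
    return masked

peers = [0, 1, 2]
seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}  # via key agreement in practice
raw = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0]}
masked = [masked_update(c, raw[c], peers, seeds) for c in peers]
# The server sums the masked vectors and recovers the true aggregate
# [9.0, 12.0] (up to float rounding) without seeing any individual update.
total = [sum(col) for col in zip(*masked)]
```

Each individual masked vector looks like noise to the server, which is exactly the property the paragraph above describes; if a client drops out, its unmatched masks no longer cancel, which is why production protocols need recovery machinery.&lt;br /&gt;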
&lt;br /&gt;
= 5.6 Resource-Efficient Model Training =&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Federated Learning (FL)&#039;&#039;&#039;, especially within edge computing environments, resource-efficient model training is crucial due to the inherent limitations of edge devices, such as constrained computational power, limited memory, and restricted communication bandwidth. Addressing these challenges involves implementing strategies that optimize the use of local resources while maintaining the integrity and performance of the global model. Key approaches include &#039;&#039;&#039;model compression techniques&#039;&#039;&#039;, &#039;&#039;&#039;efficient communication protocols&#039;&#039;&#039;, and &#039;&#039;&#039;adaptive client selection methods&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
=== 5.6.1 Model Compression Techniques ===&lt;br /&gt;
&lt;br /&gt;
Model compression techniques are essential for reducing the computational and storage demands of FL models on edge devices. By decreasing the model size, these techniques facilitate more efficient local training and minimize the communication overhead during model updates. Common methods include:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Pruning&#039;&#039;&#039;: This technique involves removing less significant weights or neurons from the neural network, resulting in a sparser model that requires less computation and storage. For instance, a study proposed a framework where the global model is pruned on a powerful server before being distributed to clients, effectively reducing the computational load on resource-constrained edge devices.&amp;lt;sup&amp;gt;[1]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Quantization&#039;&#039;&#039;: This method reduces the precision of the model&#039;s weights, such as converting 32-bit floating-point numbers to 8-bit integers, thereby decreasing the model size and accelerating computations. However, careful implementation is necessary to balance the trade-off between model size reduction and potential accuracy loss.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Knowledge Distillation&#039;&#039;&#039;: In this approach, a large, complex model (teacher) is used to train a smaller, simpler model (student) by transferring knowledge, allowing the student model to achieve comparable performance with reduced resource requirements. This technique has been effectively applied in FL to accommodate the constraints of edge devices.&amp;lt;sup&amp;gt;[2]&amp;lt;/sup&amp;gt;&lt;br /&gt;
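&lt;br /&gt;
The pruning and quantization steps above can be sketched on a flat weight vector; the sparsity level, bit width, and weights are illustrative choices, not values from the cited studies.&lt;br /&gt;

```python
# Magnitude pruning followed by uniform int8 quantization on a flat
# weight vector, a minimal sketch of the compression steps described above.

def prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction sparsity of the weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize(weights, bits=8):
    """Symmetric uniform quantization to signed bits-wide integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale  # dequantize with [v * scale for v in q]

w = [0.02, -1.5, 0.6, -0.03, 0.9]
sparse = prune(w, sparsity=0.4)  # zeroes the two smallest: 0.02 and -0.03
q, scale = quantize(sparse)      # 8-bit integers plus one float scale
```

Shipping the integer codes plus a single scale factor is what shrinks the update, and the rounding step is where the accuracy trade-off mentioned above comes from.&lt;br /&gt;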
&lt;br /&gt;
=== 5.6.2 Efficient Communication Protocols ===&lt;br /&gt;
&lt;br /&gt;
Efficient communication protocols are vital for mitigating the communication bottleneck in FL, as frequent transmission of model updates between clients and the central server can overwhelm limited network resources. Strategies to enhance communication efficiency include:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Update Sparsification&#039;&#039;&#039;: This technique involves transmitting only the most significant updates or gradients, reducing the amount of data sent during each communication round. By focusing on the most impactful changes, update sparsification decreases communication overhead without substantially affecting model performance.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Compression Algorithms&#039;&#039;&#039;: Applying data compression methods to model updates before transmission can significantly reduce the data size. For example, using techniques like Huffman coding or run-length encoding can compress the updates, leading to more efficient communication.&amp;lt;sup&amp;gt;[3]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Adaptive Communication Frequency&#039;&#039;&#039;: Adjusting the frequency of communications based on the training progress or model convergence can help in conserving bandwidth. For instance, clients may perform multiple local training iterations before sending updates to the server, thereby reducing the number of communication rounds required.&lt;br /&gt;
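&lt;br /&gt;
Update sparsification as described above can be sketched as a top-k filter that transmits only (index, value) pairs; k and the toy update are illustrative.&lt;br /&gt;

```python
# Top-k update sparsification sketch (pure Python): keep only the k
# largest-magnitude coordinates and ship them as (index, value) pairs,
# which the server expands back into a dense vector.

def top_k_sparsify(update, k):
    ranked = sorted(range(len(update)),
                    key=lambda i: abs(update[i]), reverse=True)
    kept = sorted(ranked[:k])  # stable index order for transmission
    return [(i, update[i]) for i in kept]

def densify(pairs, dim):
    dense = [0.0] * dim
    for i, v in pairs:
        dense[i] = v
    return dense

update = [0.01, -0.9, 0.05, 1.2, -0.02]
pairs = top_k_sparsify(update, k=2)   # [(1, -0.9), (3, 1.2)]
restored = densify(pairs, dim=5)
```

Here 5 floats shrink to 2 pairs; at realistic model sizes the same idea cuts each round&#039;s traffic by orders of magnitude, at the cost of the small discarded coordinates.&lt;br /&gt;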
&lt;br /&gt;
=== 5.6.3 Adaptive Client Selection Methods ===&lt;br /&gt;
&lt;br /&gt;
Adaptive client selection methods focus on optimizing the selection of participating clients during each training round to enhance resource utilization and overall model performance. Approaches include:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Resource-Aware Selection&#039;&#039;&#039;: Prioritizing clients with higher computational capabilities and better network connectivity can lead to more efficient training processes. By assessing the resource availability of clients, the FL system can make informed decisions on which clients to involve in each round.&amp;lt;sup&amp;gt;[4]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Clustered Federated Learning&#039;&#039;&#039;: Grouping clients based on similarities in data distribution or system characteristics allows for more efficient training. Clients within the same cluster can collaboratively train a sub-model, which is then aggregated to form the global model, reducing the overall communication and computation burden.&amp;lt;sup&amp;gt;[5]&amp;lt;/sup&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Early Stopping Strategies&#039;&#039;&#039;: Implementing mechanisms to terminate training early when certain criteria are met can conserve resources. For example, if a client&#039;s local model reaches a predefined accuracy threshold, it can stop training and send the update to the server, thereby saving computational resources.&amp;lt;sup&amp;gt;[6]&amp;lt;/sup&amp;gt;&lt;br /&gt;
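&lt;br /&gt;
Resource-aware selection can be sketched as a weighted scoring rule over per-client metrics; the metric names, weights, and client values below are assumptions for illustration, not a published scheme.&lt;br /&gt;

```python
# Resource-aware client selection sketch (pure Python): score each
# candidate by a weighted mix of normalized compute speed, bandwidth,
# and battery level, then pick the top-m clients for the round.
# The weights are illustrative assumptions.

def select_clients(clients, m, weights=(0.5, 0.3, 0.2)):
    wc, wb, we = weights
    def score(c):
        return wc * c["compute"] + wb * c["bandwidth"] + we * c["battery"]
    return sorted(clients, key=score, reverse=True)[:m]

clients = [
    {"id": "a", "compute": 0.9, "bandwidth": 0.8, "battery": 0.7},
    {"id": "b", "compute": 0.2, "bandwidth": 0.9, "battery": 0.9},
    {"id": "c", "compute": 0.7, "bandwidth": 0.3, "battery": 0.2},
]
chosen = select_clients(clients, m=2)  # strongest two candidates
```

A real selector would also need to counter the bias this introduces: always favoring well-resourced clients skews the global model toward their data, which ties back to the fairness concerns raised elsewhere in this chapter.&lt;br /&gt;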
&lt;br /&gt;
Incorporating these strategies into the FL framework enables more efficient utilization of the limited resources available on edge devices. By tailoring model training processes to the specific constraints of these devices, it is possible to achieve effective and scalable FL deployments in edge computing environments.&lt;br /&gt;
&lt;br /&gt;
= 5.8 Challenges in Federated Learning =&lt;br /&gt;
&lt;br /&gt;
Federated Learning (FL) helps protect user data by training models directly on devices. While useful, FL faces many challenges that make it hard to use in real-world systems [1][2][3][4][5][6]:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;System Heterogeneity&#039;&#039;&#039;&lt;br /&gt;
Devices used in FL—like smartphones, sensors, or edge servers—have different speeds, memory capacities, and battery life. Some may be too slow or lose power during training, causing delays. Solutions include using smaller models, pruning, partial updates, and early stopping [1][6].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Communication Bottlenecks&#039;&#039;&#039;&lt;br /&gt;
FL requires frequent communication between devices and a central server. Sending full model updates can overload weak or slow networks. To fix this, researchers use update compression techniques like quantization, sparsification, and knowledge distillation [2][3].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Statistical Heterogeneity (Non-IID Data)&#039;&#039;&#039;&lt;br /&gt;
Each device collects different types of data depending on users and context, making the data non-IID. This harms the global model’s accuracy. Solutions include clustered FL (grouping similar clients), personalized FL, and meta-learning to adapt locally [2][5].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Privacy and Security&#039;&#039;&#039;&lt;br /&gt;
Even if raw data stays on devices, updates can leak private info or be attacked. Malicious devices may poison the model. Solutions include secure aggregation, differential privacy, and encryption—though they may increase cost [2][3].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Scalability&#039;&#039;&#039;&lt;br /&gt;
FL needs to support thousands of devices, many of which may go offline. Picking reliable clients and using hierarchical FL, where some devices act as local aggregators, can help scale better [4].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Incentive Mechanisms&#039;&#039;&#039;&lt;br /&gt;
Users may not want to spend energy or bandwidth on FL. Without rewards, participation drops. Research explores systems like blockchain tokens or credit points, but real-world use is still rare [2].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Lack of Standardization and Benchmarks&#039;&#039;&#039;&lt;br /&gt;
FL lacks shared benchmarks and datasets, making it hard to compare approaches. Simulations often ignore real-world issues like device failures or concept drift. Frameworks like LEAF, FedML, and Flower help, but more real-world data is needed [2][3].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Concept Drift and Continual Learning&#039;&#039;&#039;&lt;br /&gt;
User data changes over time (concept drift), so the model can become outdated. Continual learning helps the model adapt over time. Techniques include memory replay, adaptive learning rates, and early stopping [1][6].&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Deployment Complexity&#039;&#039;&#039;&lt;br /&gt;
Devices use different hardware, software, and networks. It’s hard to manage updates or fix bugs across so many different systems.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Reliability and Fault Tolerance&#039;&#039;&#039;&lt;br /&gt;
Devices can crash, go offline, or send incorrect updates. FL must be designed to handle such failures without harming the global model.&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Monitoring and Debugging&#039;&#039;&#039;&lt;br /&gt;
In FL, it&#039;s difficult to track and debug model behavior across many devices. New tools are needed to observe and fix problems in distributed training.&lt;br /&gt;
&lt;br /&gt;
= 5.9 References =&lt;br /&gt;
# &#039;&#039;Efficient Pruning Strategies for Federated Edge Devices&#039;&#039;. Springer Journal, 2023. https://link.springer.com/article/10.1007/s40747-023-01120-5  &lt;br /&gt;
# &#039;&#039;Federated Learning with Knowledge Distillation&#039;&#039;. arXiv preprint, 2020. https://arxiv.org/abs/2007.14513  &lt;br /&gt;
# &#039;&#039;Communication-Efficient FL using Data Compression&#039;&#039;. MDPI Applied Sciences, 2022. https://www.mdpi.com/2076-3417/12/18/9124  &lt;br /&gt;
# &#039;&#039;Resource-Aware Federated Client Selection&#039;&#039;. arXiv preprint, 2021. https://arxiv.org/pdf/2111.01108  &lt;br /&gt;
# &#039;&#039;Clustered Federated Learning for Heterogeneous Clients&#039;&#039;. MDPI Telecom, 2023. https://www.mdpi.com/2673-2688/6/2/30  &lt;br /&gt;
# &#039;&#039;Early Stopping in FL for Energy-Constrained Devices&#039;&#039;. arXiv preprint, 2023. https://arxiv.org/abs/2310.09789&lt;br /&gt;
# &#039;&#039;What is Applied Machine Learning?&#039;&#039;. Springer Journal, 2022. https://link.springer.com/chapter/10.1007/978-3-030-96756-7_1#citeas&lt;/div&gt;</summary>
		<author><name>Ciarang</name></author>
	</entry>
</feed>