=='''4.2 ML Training at the Edge'''==

Machine Learning (ML) training at the edge is the process of developing, updating, or fine-tuning ML models directly on edge devices such as smartphones, IoT sensors, wearables, and other embedded systems, rather than relying solely on centralized cloud infrastructure. This approach is becoming increasingly important as demand for real-time, personalized AI applications grows. By training models closer to where data is generated, edge-based ML enables faster responses, reduces latency, and enhances user privacy by minimizing the need to transmit sensitive data to the cloud. It is also especially useful where devices operate with limited or unreliable network connectivity, allowing them to function more independently.

[[File:Machine Learning Pipeline.png|500px|thumb|right|Diagram of the machine learning pipeline: raw structured and unstructured data is preprocessed, analyzed, and used for feature selection, followed by model construction, evaluation, and deployment for application use.]]
[[File:Learning.png|500px|thumb|center|This diagram shows the relationship among different devices in the edge-cloud system. As shown, the deep learning workload is partitioned among all layers so that each one contributes based on its advantages.]]

==='''Benefits'''===

One significant advantage of training ML models directly on edge devices is reduced latency. By processing data locally, devices can make immediate decisions without the delays caused by transmitting data back and forth to cloud servers. This responsiveness is critical for applications such as real-time health monitoring, autonomous driving, and industrial automation.

Training machine learning models at the edge also significantly enhances user privacy. Because sensitive data can be processed and stored directly on the user's device rather than sent to centralized cloud servers, the risk of data breaches or unauthorized access during transmission is greatly reduced. Local data handling prevents exposure of personal or confidential information and gives users greater control over their data. Edge-based training naturally aligns with privacy regulations such as the General Data Protection Regulation (GDPR), which emphasizes strict data security, transparency, and explicit user consent. By keeping personal data localized, edge training not only improves security but also helps organizations comply with privacy laws, protecting users' rights and maintaining trust.

Efficiency and resilience are further benefits of edge training. Because edge devices can process data locally, they do not depend on constant internet connectivity and can continue operating effectively even where network connections are weak, unstable, or unavailable. Not being fully dependent on cloud infrastructure allows edge devices to adapt quickly to changes, respond in real time, and update their ML models based on immediate local data. As a result, edge training supports reliable, uninterrupted operation, making it particularly valuable in remote locations, emergency scenarios, and harsh environments where cloud-based solutions might fail or become unreliable.
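To make the idea of on-device training concrete, the following is a minimal, purely illustrative sketch of a few local fine-tuning steps: the model, the buffered data, and the hyperparameters are placeholders rather than any specific product's implementation, and the point is simply that the raw samples never leave the device.

<syntaxhighlight lang="python">
# Minimal sketch of on-device fine-tuning: a small model is updated with
# locally collected samples, so raw data never leaves the device.
import torch
import torch.nn as nn

# Hypothetical lightweight model and locally buffered data (illustrative only).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
local_x = torch.randn(64, 16)          # e.g. recent sensor readings
local_y = torch.randint(0, 4, (64,))   # e.g. user-confirmed labels

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                 # a few short passes to limit energy use
    optimizer.zero_grad()
    loss = loss_fn(model(local_x), local_y)
    loss.backward()
    optimizer.step()

# The updated weights stay on the device; only the model (or a weight delta)
# would ever need to be shared, not the underlying user data.
</syntaxhighlight>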
==='''Examples'''===

A smart thermostat in a home can learn a user's temperature preferences and adjust automatically based on real-time inputs such as time of day or weather conditions. Similarly, a fitness tracker can track activity patterns and adapt its workout or rest recommendations based on how the user is performing each day. These devices do not need to rely on cloud servers to update or personalize their behavior; they can do it instantly on the device, which makes them more responsive and efficient.

In smart agriculture, edge computing is used to enhance crop monitoring and optimize farming practices. Devices such as soil sensors, drones, and automated irrigation systems collect data on soil moisture, temperature, and crop health. Edge devices process this data locally, enabling real-time decisions for tasks like irrigation, fertilization, and pest control.

In smart retail, edge computing is used to improve inventory management and the customer experience. Retailers use smart shelves, RFID tags, and in-store cameras to track inventory and monitor customer behavior. By processing this data locally on edge devices, retailers can manage stock levels, detect theft, and optimize store layouts in real time. RFID tags placed on products can detect when an item is removed from the shelf; using edge processing, the system can immediately update the inventory count and trigger a restocking request if an item's stock is low.

==='''Research Papers'''===

An important contribution to the understanding of ML training at the edge is the paper "Making Distributed Edge Machine Learning for Resource-Constrained Communities and Environments Smarter: Contexts and Challenges" by Truong et al. (2023). This paper focuses on training ML models directly on edge devices in communities and environments facing limitations such as unstable network connections, limited computational resources, and scarce technical expertise. The authors emphasize the need for context-aware ML training methods tailored to these environments, since traditional centralized training often fails to operate effectively in such constrained settings, highlighting the need for decentralized, localized solutions. Truong et al. explore challenges including managing data efficiently, deploying suitable software frameworks, and designing intelligent runtime strategies that allow edge devices to train models effectively despite limited resources. Their work identifies significant research directions, advocating for more adaptable and sustainable ML training solutions that genuinely reflect the technological and social contexts of resource-limited environments.

==='''Tools and Frameworks'''===

Frameworks such as TensorFlow Lite, PyTorch Mobile, and Edge Impulse are designed to support edge-based model training and inference. These tools allow developers to build and fine-tune models specifically for deployment on low-power devices.

==='''Technical Challenges'''===

Despite its advantages, ML training at the edge presents challenges, including limited processing power, memory constraints, and energy efficiency. Edge devices often lack the computational resources of cloud servers, requiring lightweight models, optimized algorithms, and energy-efficient hardware.
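As an illustration of how such frameworks are typically used to work within these constraints, the sketch below converts a trained Keras model into the TensorFlow Lite format with default post-training optimizations so it can run on a resource-constrained device. The small model is a stand-in for whatever model was actually trained; the conversion calls are TensorFlow's standard TFLiteConverter API.

<syntaxhighlight lang="python">
# Sketch: preparing a trained model for an edge device with TensorFlow Lite.
# The tiny Keras model below is a placeholder for a real trained model.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4),
])

# Convert to the TensorFlow Lite format with default post-training optimizations
# (including weight quantization), shrinking the model and speeding up inference.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The resulting flatbuffer can be shipped to the device and run with the
# TFLite interpreter.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
</syntaxhighlight>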
==='''Real-World Applications'''===

A well-known example is Apple's use of on-device training for personalized voice recognition with Siri. Instead of uploading user voice data to the cloud, Apple uses local training to improve accuracy over time while maintaining user privacy.

==='''Model Compression Techniques'''===

Despite the challenges of ML at the edge, a variety of methods can make training more efficient and bring heavy workloads within reach of the limited computing power of many edge devices.

'''Quantization:''' Quantization reduces the numerical precision of a model's parameters, easing the burden on both computation and memory on edge devices. There are multiple forms of quantization, but each sacrifices some precision, enough that accuracy is largely maintained while the numbers become easier to handle. For example, converting from floating-point to integer datatypes uses significantly less memory, and for some models the resulting difference in precision may be negligible. Another example is K-means-based weight quantization, which groups similar weight values into clusters and represents each cluster by its centroid. An example is shown below:

[[File:Screenshot_2025-04-24_170604.png|500px|thumb|right|By clustering each index in the matrix and using centroids to approximate the values, the overall computations can be done much more quickly and are more easily handled by edge devices.]]

In recent work, Quantized Neural Networks (QNNs) have demonstrated that even extreme quantization, such as using just 1-bit values for weights and activations, can retain near state-of-the-art accuracy across vision and language tasks [12]. This type of quantization drastically reduces memory access requirements and replaces expensive arithmetic operations with fast, low-power bitwise operations such as XNOR and popcount. These benefits are especially important for edge deployment, where energy efficiency is critical. In addition to model compression, Hubara et al. show that quantized gradients, using as little as 6 bits, can be employed during training with minimal performance loss, further enabling efficient on-device learning [12]. QNNs have achieved strong results even on demanding benchmarks like ImageNet, while offering significant speedups and memory savings, making them one of the most practical solutions for edge AI deployment [12].
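The K-means-based weight quantization described above can be sketched in a few lines. The example below is illustrative only: the layer weights and cluster count are placeholders, and scikit-learn's KMeans is used simply to build a small codebook of shared centroid values.

<syntaxhighlight lang="python">
# Sketch of K-means weight quantization: weights in a layer are grouped into
# k clusters, and each weight is replaced by its cluster centroid. Only the
# k centroid values plus small integer indices need to be stored.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)   # placeholder layer weights

k = 16                                                    # 16 centroids -> 4-bit indices
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
kmeans.fit(weights.reshape(-1, 1))

codebook = kmeans.cluster_centers_.flatten()              # k shared float values
indices = kmeans.labels_.astype(np.uint8)                 # one small index per weight

# Reconstruct the approximate weights from the codebook when running the model.
quantized = codebook[indices].reshape(weights.shape)
print("mean absolute error:", np.abs(weights - quantized).mean())
</syntaxhighlight>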
'''Pruning:''' Pruning is an optimization technique that systematically removes low-salience parameters, such as weakly contributing weights or redundant hypothesis paths, from a machine learning model or decoding algorithm to reduce computational overhead. In the context of edge computing, where resources like memory bandwidth, power, and processing time are limited, pruning enables the deployment of performant models within strict efficiency constraints.

In statistical machine translation (SMT) systems, pruning is particularly critical during the decoding phase, where the search space of possible translations grows exponentially with sentence length. Techniques such as histogram pruning and threshold pruning are employed to manage this complexity. Histogram pruning restricts the number of candidate hypotheses retained in a decoding stack to a fixed maximum size, discarding the remainder. Threshold pruning eliminates hypotheses whose scores fall below a fixed proportion of the best-scoring candidate's score, effectively filtering out weak candidates early.

The paper by Banik et al. introduces a machine learning-based dynamic pruning framework that adaptively tunes the pruning parameters, namely stack size and beam threshold, based on structural features of the input text such as sentence length, syntactic complexity, and the distribution of stop words. Rather than relying on static hyperparameters, this method uses a classifier (the CN2 algorithm) trained on performance data to predict optimal pruning configurations at runtime. Experimental results showed consistent reductions in decoding latency (up to 90%) while maintaining or improving translation quality, as measured by BLEU scores [13]. This adaptive pruning paradigm is highly relevant to edge inference pipelines, where models must balance latency and predictive accuracy. By intelligently limiting the hypothesis space and focusing computational resources on high-probability paths, pruning supports real-time, resource-efficient processing in edge NLP and embedded translation systems.

[[File:Pruning.png|350px|thumb|center|This shows how pruning can significantly reduce the overall network, leading to better computational and memory management.]]

'''Distillation:''' Distillation is a key strategy for reducing model complexity in edge computing environments. Instead of training a compact student model on hard labels (discrete class labels like 0, 1, or 2), it is trained on the soft outputs of a larger teacher model. These soft labels represent probability distributions over all classes, offering more nuanced supervision. For instance, rather than telling the student the input belongs strictly to class 3, a teacher might output "70% class 3, 25% class 2, 5% class 1." This richer feedback helps the student model capture subtle relationships between classes that hard labels miss.

Beyond reducing computational demands, distillation enhances generalization by conveying more informative training signals. It also benefits from favorable data geometry, when class distributions are well separated and aligned, and exhibits strong monotonicity, meaning the student model reliably improves as more data becomes available [11]. These properties make it exceptionally well suited to edge devices, where training data may be limited but efficient inference is crucial.

In most cases, knowledge distillation in edge environments involves a large, high-capacity model trained in the cloud acting as the teacher, while the smaller, lightweight student model is deployed on edge devices. A less common but emerging practice is edge-to-edge distillation, where a more powerful edge node or edge server functions as the teacher for other nearby edge devices. This setup is especially valuable in federated, collaborative, or hierarchical edge networks, where cloud connectivity may be limited or privacy concerns necessitate local training. Distillation can also be combined with techniques such as quantization or pruning to further optimize model performance under hardware constraints. An example is shown below:

[[File:Knowledge_Distillation.png|700px|thumb|center|This shows how a complex teacher model transfers learned knowledge to a smaller student model using soft predictions to enable efficient edge deployment.]]
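The soft-label training objective can also be sketched directly. The code below is a hypothetical example rather than a specific system: the teacher and student networks, the batch of data, the temperature, and the mixing weight are all placeholders, and the loss shown is the commonly used combination of a temperature-softened KL-divergence term with ordinary cross-entropy on the hard labels.

<syntaxhighlight lang="python">
# Sketch of one knowledge-distillation training step: the student is trained on
# the teacher's softened probabilities in addition to the ground-truth labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))  # large model (placeholder)
student = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))    # small edge model (placeholder)

x = torch.randn(32, 16)               # a batch of inputs (illustrative)
y = torch.randint(0, 4, (32,))        # hard labels
T, alpha = 4.0, 0.7                   # temperature and mixing weight (tunable)

with torch.no_grad():
    teacher_logits = teacher(x)       # the teacher is frozen during distillation
student_logits = student(x)

# Soft targets: KL divergence between softened teacher and student distributions.
soft_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=1),
    F.softmax(teacher_logits / T, dim=1),
    reduction="batchmean",
) * (T * T)                           # standard scaling so gradients match the hard loss
hard_loss = F.cross_entropy(student_logits, y)
loss = alpha * soft_loss + (1 - alpha) * hard_loss
loss.backward()                       # then step an optimizer over the student only
</syntaxhighlight>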
{| class="wikitable" style="width:100%; text-align:left;"
|+ '''Comparison of Model Compression Techniques for Edge Deployment'''
! Technique
! Description
! Primary Benefit
! Trade-offs
! Ideal Use Case
|-
| '''Pruning'''
| Removes unnecessary weights or neurons from a neural network.
| Reduces model size and computation.
| May require retraining or fine-tuning to preserve accuracy.
| Useful for deploying models on devices with strict memory and compute constraints.
|-
| '''Quantization'''
| Converts high-precision values (e.g., 32-bit float) to lower precision (e.g., 8-bit integer or binary).
| Lowers memory usage and accelerates inference.
| Risk of precision loss, especially in very small or sensitive models.
| Ideal when real-time inference and power efficiency are essential.
|-
| '''Distillation'''
| Trains a smaller model (student) using the output probabilities of a larger, more complex teacher model.
| Preserves performance while reducing model complexity.
| Requires access to a trained teacher model and additional training data.
| Effective when deploying accurate, lightweight models under data or resource constraints.
|}

==='''Usage and Applications of AI Agents'''===

As artificial intelligence and machine learning technologies continue to mature, they pave the way for intelligent AI agents capable of autonomous, context-aware behavior, with the goal of efficiently performing tasks specified by users. These agents combine perception, reasoning, and decision-making to execute tasks with minimal human intervention. When deployed on edge devices, AI agents can operate with low latency, preserve user privacy, and adapt to local data, making them ideal for real-time, personalized applications in homes, vehicles, factories, and beyond.

To function effectively, an agent must first perceive its environment and understand the task, which is often defined by the user. It must then reason about the optimal steps to accomplish that task, and finally act on those decisions. These three components, perception, reasoning, and action, are essential to the agent's ability to operate accurately and autonomously in dynamic environments; a minimal illustrative sketch of this loop appears at the end of this section.

'''Reasoning:''' The agent must be able to think sequentially and decompose its assigned task into a sequence of specific steps in order to accomplish its goal. It must also have some memory in order to record what it has done, along with the results of its actions, so that it can learn for future steps.

'''Autonomy:''' The agent must choose from the available possible steps and operate based on its reasoning without step-by-step instructions from the user.

'''Tools:''' These tasks, however, are impossible to accomplish without the correct tools. Even if an AI agent understands how to carry out a task for optimal results, it must have the actual means to do so. This can include the ability to use and interact with APIs, interpret code, and access certain databases.

Running AI agents on edge devices can be difficult because of the computational and reasoning power required. There are, however, ways to accomplish this, such as small language models (SLMs) that query larger LLMs as needed (discussed later), or using more powerful edge devices to carry out tasks. Running on the edge can be paramount when latency is a major concern or when the agent handles sensitive user data. Additionally, by using edge devices specific to a user, an agent may better learn that user's patterns and preferences and react accordingly to provide the best possible outcome for that user.
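The perceive-reason-act cycle described above can be summarized in a short, purely illustrative sketch. Every function here is a hypothetical placeholder (sensor access, a simple rule standing in for the agent's reasoning, and a print standing in for tool or actuator use); it is not the API of any particular agent framework.

<syntaxhighlight lang="python">
# Minimal sketch of a perceive-reason-act loop for an on-device agent.
# All functions are placeholders for device-specific code.

def perceive():
    """Read the latest local observations, e.g. sensor values or user input."""
    return {"temperature": 22.5, "occupied": True}      # illustrative data

def reason(observation, memory):
    """Choose the next step toward the goal, using past actions stored in memory."""
    if observation["occupied"] and observation["temperature"] < 21.0:
        return "raise_setpoint"
    return "hold"

def act(action):
    """Carry out the chosen step through whatever tools the agent has (APIs, actuators)."""
    print(f"executing: {action}")

memory = []                      # the agent records what it did and what it observed
for _ in range(3):               # each iteration: perceive -> reason -> act
    obs = perceive()
    action = reason(obs, memory)
    act(action)
    memory.append((obs, action))
</syntaxhighlight>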