=='''References'''==
[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10818760&tag=1 M. Zhang, X. Shen, J. Cao, Z. Cui and S. Jiang, "EdgeShard: Efficient LLM Inference via Collaborative Edge Computing," in IEEE Internet of Things Journal, doi: 10.1109/JIOT.2024.3524255.]
[https://ieeexplore.ieee.org/abstract/document/8690980 X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji and M. Bennis, "Performance Optimization in Mobile-Edge Computing via Deep Reinforcement Learning," 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, IL, USA, 2018, pp. 1-6, doi: 10.1109/VTCFall.2018.8690980.]
[https://ieeexplore.ieee.org/abstract/document/8976180 X. Wang, Y. Han, V. C. M. Leung, D. Niyato, X. Yan and X. Chen, "Convergence of Edge Computing and Deep Learning: A Comprehensive Survey," in IEEE Communications Surveys & Tutorials, vol. 22, no. 2, pp. 869-904, Secondquarter 2020, doi: 10.1109/COMST.2020.2970550.]
[https://dl.acm.org/doi/abs/10.1145/3093337.3037698 Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars and L. Tang, "Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge," Association for Computing Machinery, New York, NY, USA, 2017, doi: 10.1145/3093337.3037698.]
Revision as of 21:22, 5 April 2025
Machine Learning at the Edge
4.1 Overview of ML at the Edge
4.2 ML Training at the Edge
4.3 ML Model Optimization at the Edge
The Need for Model Optimization at the Edge
Given the constrained resources and the inherently dynamic environments that edge devices must function in, model optimization is a crucial part of machine learning in edge computing. The most widely used methodology today is simply to specify an exceptionally large set of parameters and let the model train on it. This is feasible when hardware is very powerful, and it is necessary for systems such as Large Language Models (LLMs), but it is no longer viable for the devices and environments at the edge. Instead, it is crucial to identify the best parameters and training methodology so as to minimize the amount of work done by these devices while compromising as little as possible on the accuracy of the models. There are multiple ways to do this, including optimization or augmentation of the dataset itself, and optimization of how work is partitioned among the edge devices.
Edge and Cloud Collaboration
One methodology that is often used involves collaboration between edge and cloud devices. The cloud can process workloads that require far more resources than edge devices can provide. Edge devices, on the other hand, can store and process data locally, offering lower latency and better privacy. Given the complementary advantages of each, many have proposed that the best way to handle machine learning at the edge is through a combination of edge and cloud computing.
The primary issue facing this computing paradigm, however, is optimally selecting which workloads should run in the cloud and which should run at the edge. This is a crucial problem to solve, as a correct partition of workloads is the best way to ensure that the respective benefits of each tier are leveraged. A common approach is to run representative computing tasks on the candidate devices and measure the time and resources they consume; the profiling steps in EdgeShard and Neurosurgeon are examples of this. Other frameworks implement similar steps, testing the capabilities of different devices in order to allocate workloads and determine the limit up to which each can function efficiently. If a workload is beyond the limits of the devices, it can be sent to the cloud for processing.
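The profile-then-place idea described above can be sketched in a few lines. This is a minimal illustration, not the actual EdgeShard or Neurosurgeon implementation; the function name, task names, and time budget are all hypothetical.

```python
def assign_workloads(profiled_cost_s, edge_budget_s):
    """Place each task on the edge if its profiled running time fits the
    device's budget; otherwise offload it to the cloud.

    profiled_cost_s: {task_name: measured edge running time in seconds},
                     as produced by an offline profiling pass.
    edge_budget_s:   the longest a task is allowed to take on the edge.
    """
    return {
        task: ("edge" if cost <= edge_budget_s else "cloud")
        for task, cost in profiled_cost_s.items()
    }
```

For example, with a 0.5 s budget, `assign_workloads({"detect": 0.1, "train": 4.2}, 0.5)` keeps the detection task local and offloads the training task to the cloud.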
The key advantage of this approach is that it utilizes the resources of the edge devices wherever possible, allowing increased data privacy and lower latency. Since workloads are only processed in the cloud as needed, the overall processing time is reduced because data is not constantly sent back and forth. It also produces much less network congestion, which is crucial for many applications.
Optimizing Workload Partitioning
Much of the optimization done in machine learning on edge systems centers on fully utilizing the heterogeneous devices that these systems often contain. It is therefore important to understand the capabilities of each device so as to fully exploit its advantages. Devices can vary greatly, from smartphones with relatively powerful computational abilities to Raspberry Pis to simple sensors. More difficult tasks are offloaded to the powerful devices, while simpler tasks, or models that have already been partially pretrained, can be sent to the smaller devices. In some cases, as in the Mobile-Edge work, a task may be dropped altogether if the available resources are deemed insufficient. In this way, exceptionally difficult tasks do not block tasks that can be executed, and the system can continue working.
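A capability-aware scheduler of the kind just described can be sketched as follows. The greedy matching rule, the "resource units", and the device names are illustrative assumptions, not taken from any of the cited systems; only the drop-on-insufficient-resources behavior mirrors the Mobile-Edge idea above.

```python
def schedule(tasks, devices):
    """Greedy matching of task demands to heterogeneous device capacities.

    tasks:   {task_name: resource demand (arbitrary units)}
    devices: {device_name: resource capacity (same units)}
    Returns (assignments, dropped); tasks no device can handle are
    dropped rather than blocking the rest of the system.
    """
    assignments, dropped = {}, []
    # Handle the hardest tasks first so powerful devices take heavy work.
    for task, demand in sorted(tasks.items(), key=lambda kv: -kv[1]):
        fitting = [d for d, cap in devices.items() if cap >= demand]
        if not fitting:
            dropped.append(task)  # insufficient resources anywhere
            continue
        # Use the least capable device that still fits, keeping the
        # powerful devices free for harder work.
        assignments[task] = min(fitting, key=devices.get)
    return assignments, dropped
```

With `{"vision": 8, "logging": 1, "llm": 50}` and devices `{"phone": 10, "pi": 2}`, the vision task lands on the phone, logging on the Pi, and the oversized LLM task is dropped.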
Dynamic Models
Given the dynamic nature of the environments that edge devices must function in, as well as the heterogeneity of the devices themselves, a dynamic model of machine learning is often employed. Such models must keep track of the currently available resources, including computation usage and power, as well as network traffic, all of which may change frequently depending on the workloads and devices in the system. Training models to continuously monitor and dynamically distribute workloads is therefore a very important part of optimization. Simply offloading larger tasks to more powerful devices can be counterproductive if a device's computing resources or network capacity are already consumed by another workload.
This is commonly done by using the profiling step described above as a baseline. A machine learning model then uses the profiling data to estimate the performance of devices and/or layers. A similar process is employed at runtime, updating the data and helping the model refine its predictions; network traffic is also taken into account at this stage in order to preserve the low latency that edge computing promises. Using all of this data and the runtime updates, the partitioning model can dynamically distribute workloads at runtime, optimizing the workflow and ensuring that each device uses its resources as efficiently as possible. Two good examples of how such a system is deployed in practice are the Neurosurgeon and EdgeShard systems, cited in the references.
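The refine-at-runtime loop described above can be illustrated with a simple estimator that starts from the offline profile and blends in observed latencies. The exponential-moving-average rule and the `alpha` weight are assumptions for this sketch, not the update rule of any cited system.

```python
class LatencyEstimator:
    """Blend offline profiling estimates with runtime observations so
    placement decisions track current load and network conditions."""

    def __init__(self, profiled_s, alpha=0.3):
        self.estimate = dict(profiled_s)  # baseline from the profiling step
        self.alpha = alpha                # weight given to new observations

    def observe(self, target, measured_s):
        # Runtime feedback: shift the estimate toward what was just measured.
        old = self.estimate[target]
        self.estimate[target] = (1 - self.alpha) * old + self.alpha * measured_s

    def best_target(self):
        # Route the next workload to wherever we currently expect to be fastest.
        return min(self.estimate, key=self.estimate.get)
```

Starting from a profile of 0.2 s on the edge and 0.5 s in the cloud, a single observation of 2.0 s on a congested edge device is enough to flip the routing decision toward the cloud.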
Horizontal and Vertical Partitioning
There are two major ways these models split workloads in order to optimize the machine learning: horizontal and vertical partitioning. Given a set of layers ranging from the cloud down to the edge, horizontal partitioning splits the workload between the layers. For example, if a task is deemed to need a large amount of computational resources, it may go to the cloud to be completed and preprocessed; if only a small amount of computational power is required, the work can go to edge devices. Such partitioning also depends on the confidence and accuracy of the learning so far. If inference performed on an edge device yields very low accuracy, the task can be sent to the cloud; on the other hand, if the accuracy is already fairly high and the model needs only a small amount of further work to reach an acceptable threshold, the task may be kept on edge devices to reduce network traffic to the cloud and lower latency.
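The confidence-gated escalation just described can be sketched as a simple two-tier classifier. Both "models" here are stand-in callables and the 0.9 threshold is an arbitrary assumption; the point is only the routing logic of horizontal partitioning.

```python
def classify(sample, edge_model, cloud_model, threshold=0.9):
    """Run the cheap edge model first; escalate to the cloud model only
    when the edge confidence falls below the acceptable threshold.

    Each model is any callable returning a (label, confidence) pair.
    """
    label, confidence = edge_model(sample)
    if confidence >= threshold:
        return label, "edge"        # confident enough locally: low latency
    label, _ = cloud_model(sample)  # uncertain: pay the cloud round trip
    return label, "cloud"
```

A confident edge prediction is returned immediately, while a low-confidence one is re-run by the (presumably larger) cloud model.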
The second model of partitioning is vertical partitioning. This involves splitting work among the devices within a given layer rather than among the layers themselves. It is similar to what has been described in previous sections, as it provides a means of fully utilizing the heterogeneous capabilities found in edge devices. The same kind of profiling and assignment used in horizontal partitioning is performed, but all of the devices across which the workload is split sit on the same layer. To fully optimize a machine learning model, both horizontal and vertical partitioning must be used.
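One way to realize vertical partitioning is to shard a model's layers across peer edge devices in proportion to each device's measured speed, in the spirit of EdgeShard's model sharding. The greedy contiguous split below is a hypothetical sketch under that assumption, not EdgeShard's actual placement algorithm.

```python
def shard_layers(layer_costs, device_speeds):
    """Contiguous greedy split of per-layer compute costs across peer
    devices on the same tier, proportional to each device's relative
    throughput, so all shards finish in roughly equal time.

    layer_costs:   list of per-layer compute costs (arbitrary units)
    device_speeds: {device_name: relative throughput}
    Returns {device_name: [layer indices assigned to that device]}.
    """
    total = sum(layer_costs)
    speed_sum = sum(device_speeds.values())
    names = list(device_speeds)
    shards = {d: [] for d in names}
    i, done = 0, 0.0
    for k, d in enumerate(names):
        # This device's fair share of the cumulative cost.
        target = done + total * device_speeds[d] / speed_sum
        last = (k == len(names) - 1)
        # Take layers until the fair share is reached; the last device
        # always absorbs whatever remains.
        while i < len(layer_costs) and (last or done + layer_costs[i] <= target + 1e-9):
            shards[d].append(i)
            done += layer_costs[i]
            i += 1
    return shards
```

Two equally fast devices split four equal layers evenly, while a device three times as fast as its peer takes three of the four layers.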