Article title

Gradient scale monitoring for federated learning systems

Publication languages
EN
Abstracts
EN
As the computational and communication capabilities of edge and IoT devices grow, so do the opportunities for novel Machine Learning solutions. This has increased the popularity of Federated Learning (FL), especially in cross-device settings. However, while a multitude of ongoing research works analyze various aspects of the FL process, most do not focus on operationalization and monitoring. For instance, there is a noticeable lack of research on effective problem diagnosis in FL systems. This work begins with a case study in which we compare the performance of four selected approaches to the topology of FL systems. For this purpose, we constructed and executed simulations of their training process in a controlled environment. Analyzing the results, we encountered concerning periodic drops in accuracy for some of the scenarios. A reexamination of the experiments led us to diagnose the problem as caused by exploding gradients. In view of these findings, we formulated a potential new method for continuous monitoring of the FL training process, hinging on the regular local computation of a handpicked metric: the gradient scale coefficient (GSC). We then extend our prior research with a preliminary analysis of the effectiveness of the GSC and of average gradients per layer as metrics potentially suitable for FL diagnostics. To examine their usefulness in different FL scenarios more thoroughly, we simulate the exploding gradient problem, the vanishing gradient problem, and stable gradients serving as a baseline. We then evaluate the resulting visualizations based on their clarity and computational requirements. Based on these results, we introduce a gradient monitoring suite for the FL training process.
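To make the proposed monitoring step concrete, below is a minimal sketch of how a local FL client could record per-layer average gradients and a GSC-style sensitivity value after each backward pass. It assumes PyTorch; the helper names (layer_gradient_stats, gsc_proxy, local_step) and the simplified proxy ||dL/da|| * ||a|| / |L| are illustrative assumptions, not the paper's published implementation.

import torch
import torch.nn as nn


def layer_gradient_stats(model: nn.Module) -> dict[str, float]:
    # Average absolute gradient per parameter tensor; call after backward().
    return {
        name: p.grad.abs().mean().item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }


def gsc_proxy(activation: torch.Tensor, loss: torch.Tensor) -> float:
    # Simplified gradient-scale-coefficient reading for one layer: the
    # relative sensitivity ||dL/da|| * ||a|| / |L|. This proxy is an
    # assumption for illustration, not the paper's exact definition.
    # Requires activation.retain_grad() before loss.backward().
    return (activation.grad.norm() * activation.norm() / loss.abs()).item()


def local_step(model, loss_fn, optimizer, x, y, history):
    # One local training step that logs gradient statistics before the update.
    optimizer.zero_grad()
    hidden = model[0](x)   # keep one intermediate activation
    hidden.retain_grad()   # so its gradient survives backward()
    out = model[2](model[1](hidden))
    loss = loss_fn(out, y)
    loss.backward()
    history.append({
        "avg_grads": layer_gradient_stats(model),
        "gsc_hidden": gsc_proxy(hidden, loss),
    })
    optimizer.step()
    return loss.item()


# Toy usage: a small MLP on a synthetic batch. Over many rounds, exploding
# gradients would show as rapidly growing logged values, vanishing gradients
# as values collapsing toward zero.
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
history: list[dict] = []
local_step(model, nn.CrossEntropyLoss(), opt,
           torch.randn(8, 10), torch.randint(0, 2, (8,)), history)

In a full FL deployment, each client would attach such statistics to its round report so that the aggregator can flag anomalous clients early, in the spirit of the monitoring suite described in the abstract.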
Authors
  • Warsaw University of Technology, Plac Politechniki 1, 00-661 Warszawa, Poland
  • Warsaw University of Technology, Plac Politechniki 1, 00-661 Warszawa, Poland
  • Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warszawa, Poland
Notes
Record created with funds from the Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the "Społeczna odpowiedzialność nauki II" (Social Responsibility of Science II) programme, module: science popularisation (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-4c7853f5-3108-4a0e-b75b-9892a41ae234