Article title

Developing artificial intelligence in the cloud: the AI_INFN platform

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The INFN CSN5-funded project AI_INFN ("Artificial Intelligence at INFN") aims to promote ML and AI adoption within INFN by providing comprehensive support, including state-of-the-art hardware and cloud-native solutions within INFN Cloud. This facilitates efficient sharing of hardware accelerators without hindering the institute’s diverse research activities. AI_INFN advances from a Virtual-Machine-based model to a flexible Kubernetes-based platform, offering features such as JWT-based authentication, a multitenant JupyterHub interface, a distributed file system, customizable conda environments, and specialized monitoring and accounting systems. It also enables virtual nodes in the cluster, offloading computing payloads to remote resources through the Virtual Kubelet technology, with InterLink as the provider. This setup can manage workflows across various providers and hardware types, which is crucial for scientific use cases that require dedicated infrastructures for different parts of the workload. Results of initial tests to validate its production applicability, emerging case studies, and integration scenarios are presented.
Publisher
Journal
Year
Pages
9-28
Physical description
Bibliography: 51 items, figures, tables, charts
Authors
  • Istituto Nazionale di Fisica Nucleare, Sezione di Firenze, via G. Sansone 1, Sesto Fiorentino (FI), 50019, Italy
  • Istituto Nazionale di Fisica Nucleare, CNAF, Viale Berti Pichat 6/2, Bologna (IT), 40127, Italy
  • Istituto Nazionale di Fisica Nucleare, Sezione di Perugia, via A. Pascoli, Perugia (PG), 06123, Italy
  • Istituto Nazionale di Fisica Nucleare, Sezione di Perugia, via A. Pascoli, Perugia (PG), 06123, Italy
  • Istituto Nazionale di Fisica Nucleare, CNAF, Viale Berti Pichat 6/2, Bologna (IT), 40127, Italy
author
  • Istituto Nazionale di Fisica Nucleare, Sezione di Firenze, via G. Sansone 1, Sesto Fiorentino (FI), 50019, Italy
  • Istituto Nazionale di Fisica Nucleare, Sezione di Perugia, via A. Pascoli, Perugia (PG), 06123, Italy
author
  • Istituto Nazionale di Fisica Nucleare, CNAF, via G. Sansone 1, Sesto Fiorentino (FI), 50019, Italy
Bibliography
  • [1] AMD Vitis™ AI Software, 2024. https://www.amd.com/en/products/software/vitis-ai.html. Accessed: 15/09/2024.
  • [2] Apache Airflow, 2024. https://airflow.apache.org/. Accessed: 12/12/2024.
  • [3] Borg, 2024. https://borgbackup.readthedocs.io/en/stable/#. Accessed: 15/09/2024.
  • [4] Conda, 2024. https://conda.io. Accessed: 15/09/2024.
  • [5] Docker Swarm Mode, 2024. https://docs.docker.com/engine/swarm/. Accessed: 12/12/2024.
  • [6] ELOG, 2024. https://elog.psi.ch/elog/. Accessed: 15/09/2024.
  • [7] Intel® Distribution of OpenVINO™ Toolkit, 2024. https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html. Accessed: 15/09/2024.
  • [8] InterLink, 2024. https://intertwin-eu.github.io/interLink/. Accessed: 15/09/2024.
  • [9] InterTwin, 2024. https://www.intertwin.eu/. Accessed: 15/09/2024.
  • [10] JuiceFS, 2024. https://juicefs.com/en/. Accessed: 15/09/2024.
  • [11] Jupyter Server Proxy, 2024. https://jupyter-server-proxy.readthedocs.io/en/latest/. Accessed: 15/09/2024.
  • [12] Kueue, 2024. https://kueue.sigs.k8s.io/. Accessed: 15/09/2024.
  • [13] Leonardo, 2024. https://leonardo-supercomputer.cineca.eu/. Accessed: 15/09/2024.
  • [14] MinIO, 2024. https://min.io/. Accessed: 15/09/2024.
  • [15] Multi-instance GPU, 2024. https://www.nvidia.com/it-it/technologies/multi-instance-gpu/. Accessed: 15/09/2024.
  • [16] Nomad by HashiCorp, 2024. https://www.nomadproject.io/. Accessed: 12/12/2024.
  • [17] NVIDIA DCGM Exporter, 2024. https://docs.nvidia.com/datacenter/cloud-native/gpu-telemetry/latest/dcgm-exporter.html. Accessed: 15/09/2024.
  • [18] NVIDIA GPU Operator, 2024. https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html. Accessed: 15/09/2024.
  • [19] PostgreSQL, 2024. https://www.postgresql.org/. Accessed: 15/09/2024.
  • [20] Python venv, 2024. https://docs.python.org/3/library/venv.html. Accessed: 15/09/2024.
  • [21] Rados Gateway, 2024. https://docs.ceph.com/en/reef/radosgw/. Accessed: 15/09/2024.
  • [22] Squashfs, 2024. https://www.kernel.org/doc/Documentation/filesystems/squashfs.txt. Accessed: 15/09/2024.
  • [23] Virtual Kubelet, 2024. https://virtual-kubelet.io/. Accessed: 15/09/2024.
  • [24] Abadi M., Agarwal A., Barham P., Brevdo E., Chen Z., Citro C., Corrado G.S., et al.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015. https://www.tensorflow.org/. Software available from tensorflow.org.
  • [25] Adamec M., Attebury G., Bloom K., Bockelman B., Lundstedt C., Shadura O., Thiltges J.: Coffea-casa: an analysis facility prototype, EPJ Web of Conferences, vol. 251, 02061, 2021. doi: 10.1051/epjconf/202125102061.
  • [26] Anderlini L., Boccali T., Dal Pra S., Duma D., Giommi L., Spiga D., Vino G.: ML_INFN project: Status report and future perspectives, EPJ Web of Conferences, vol. 295, 2024. doi: 10.1051/epjconf/202429508013.
  • [27] Antonacci M., Salomoni D.: Leveraging TOSCA orchestration to enable fully automated cloud-based research environments on federated heterogeneous e-infrastructures, PoS, vol. ISGC&HEPiX2023, 020, 2023. doi: 10.22323/1.434.0020.
  • [28] Bergholm V., Izaac J., Schuld M., Gogolin C., Ahmed S., Ajith V., Alam M.S., et al.: PennyLane: Automatic differentiation of hybrid quantum-classical computations, 2022. https://arxiv.org/abs/1811.04968.
  • [29] Bockelman B., Livny M., Lin B., Prelz F.: Principles, technologies, and time: The translational journey of the HTCondor-CE, Journal of Computational Science, vol. 52, 101213, 2021. doi: 10.1016/j.jocs.2020.101213. Case Studies in Translational Computer Science.
  • [30] Ceccanti A., Hardt M., Wegh B., Millar A., Caberletti M., Vianello E., Licehammer S.: The INDIGO-Datacloud Authentication and Authorization Infrastructure, Journal of Physics: Conference Series, vol. 898(10), 102016, 2017. doi: 10.1088/1742-6596/898/10/102016.
  • [31] Chen S., Glioti A., Panico G., Wulzer A.: Boosting likelihood learning with event reweighting, Journal of High Energy Physics, vol. 2024, 117, 2024. doi: 10.1007/JHEP03(2024)117.
  • [32] Chollet F., et al.: Keras, https://keras.io, 2015.
  • [33] Ciangottini D.: rclone, 2022. https://github.com/DODAS-TS/rclone.
  • [34] Eddelbuettel D.: A Brief Introduction to Redis, 2022. https://arxiv.org/abs/2203.06559.
  • [35] FastML Team: fastmachinelearning/hls4ml, 2023. doi: 10.5281/zenodo.1201549.
  • [36] Grafana Labs: Grafana Documentation, 2018. https://grafana.com/docs/.
  • [37] Grant T., Karau H., Lublinsky B., Liu R., Filonenko I.: Kubeflow for Machine Learning, O’Reilly Media, 2020. https://books.google.it/books?id=YLICEAAAQBAJ.
  • [38] Janssens D., Brunbauer F., Flöthner K., Lisowska M., Muller H., Oliveri E., Orlandini G., et al.: Studying signals in particle detectors with resistive elements such as the 2D resistive strip bulk MicroMegas, Journal of Instrumentation, vol. 18(08), C08010, 2023. doi: 10.1088/1748-0221/18/08/C08010.
  • [39] Kluyver T., Ragan-Kelley B., Pérez F., Granger B., Bussonnier M., Frederic J., Kelley K., et al.: Jupyter Notebooks – a publishing format for reproducible computational workflows. In: F. Loizides, B. Schmidt (eds.), Positioning and Power in Academic Publishing: Players, Agents and Agendas, pp. 87–90, IOS Press, 2016.
  • [40] Lizzi F., Postuma I., Brero F., Cabini R., Fantacci M., Oliva P., Rinaldi L., et al.: Quantification of pulmonary involvement in COVID-19 pneumonia: an upgrade of the LungQuant software for lung CT segmentation, The European Physical Journal Plus, vol. 138, 2023. doi: 10.1140/epjp/s13360-023-03896-4.
  • [41] Mariani S., Anderlini L., Di Nezza P., Franzoso E., Graziani G., Pappalardo L.L.: A neural-network-defined Gaussian mixture model for particle identification applied to the LHCb fixed-target programme, Journal of Physics: Conference Series, vol. 2438(1), 012107, 2023. doi: 10.1088/1742-6596/2438/1/012107.
  • [42] Mariotti M., Magalotti D., Spiga D., Storchi L.: The BondMachine, a moldable computer architecture, Parallel Computing, vol. 109, 102873, 2022. doi: 10.1016/j.parco.2021.102873.
  • [43] NVIDIA, Vingelmann P., Fitzek F.H.: CUDA, release: 10.2.89, 2020. https://developer.nvidia.com/cuda-toolkit.
  • [44] Paszke A., Gross S., Massa F., Lerer A., Bradbury J., Chanan G., Killeen T., et al.: PyTorch: An Imperative Style, High-Performance Deep Learning Library. In: Advances in Neural Information Processing Systems 32, pp. 8024–8035, Curran Associates, Inc., 2019. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
  • [45] Salomoni D., Campos I., Gaido L., de Lucas J.M., Solagna P., Gomes J., Matyska L., et al.: INDIGO-DataCloud: a Platform to Facilitate Seamless Access to E-Infrastructures, Journal of Grid Computing, vol. 16(3), pp. 381–408, 2018. doi: 10.1007/s10723-018-9453-3.
  • [46] Schneppenheim M.: Kube eagle, 2020. https://github.com/cloudworkz/kube-eagle.
  • [47] Stetzler S., Jurić M., Boone K., Connolly A., Slater C.T., Zečević P.: The Astronomy Commons Platform: A Deployable Cloud-based Analysis Platform for Astronomy, The Astronomical Journal, vol. 164(2), 68, 2022. doi: 10.3847/1538-3881/ac77fb.
  • [48] Tejedor E., Bocchi E., Castro D., Gonzalez H., Lamanna M., Mato P., Moscicki J., et al.: Facilitating Collaborative Analysis in SWAN, EPJ Web of Conferences, vol. 214, 07022, 2019. doi: 10.1051/epjconf/201921407022.
  • [49] Weil S.A., Brandt S.A., Miller E.L., Long D.D.E., Maltzahn C.: Ceph: a scalable, high-performance distributed file system. In: Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pp. 307–320, OSDI ’06, USENIX Association, USA, 2006.
  • [50] Winikoff M., Padgham L.: The Prometheus Methodology. In: F. Bergenti, M.P. Gleizes, F. Zambonelli (eds.), Methodologies and Software Engineering for Agent Systems, pp. 217–234, Springer, Boston, 2004. doi: 10.1007/1-4020-8058-1_14.
  • [51] Yoo A.B., Jette M.A., Grondona M.: SLURM: Simple Linux Utility for Resource Management. In: D. Feitelson, L. Rudolph, U. Schwiegelshohn (eds.), Job Scheduling Strategies for Parallel Processing, pp. 44–60, Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-03b63b31-fce7-4338-8f8c-95268751bd07