Article information
2019, Volume 24, No. 5, p. 49-60
Smagin S.I., Sorokin A.A., Malkovsky S.I., Korolev S.P., Lukyanova O.A., Nikitin O.Y., Kondrashev V.A., Chernykh V.Y.
The organization of effective multi-user operation of hybrid computing systems
Purpose. Advances in machine learning, deep learning and artificial intelligence play an important role in acquiring new knowledge, in technological modernization and in the development of the digital economy. An important factor in the progress of these areas is the availability of an appropriate high-performance computing infrastructure capable of processing large amounts of data. The creation of coprocessor-based hybrid computing systems, together with new parallel programming technologies and application development tools, partially solves this problem. However, many issues of organizing the effective multi-user operation of this class of systems require a separate study. The present paper addresses research in this area.
Methodology. Using the OpenPOWER architecture-based cluster of the Shared Services Center "The Data Center of the Far Eastern Branch of the Russian Academy of Sciences", the features of the functioning of hybrid computing systems are examined and solutions are proposed for organizing their work in a multi-user mode. Based on the virtual node concept, the PBS Professional job scheduling system was adapted to provide efficient allocation of cluster hardware resources among user jobs. Application virtualization technology was employed for the effective execution of machine learning and deep learning tasks.
Findings. The implemented cluster software environment with the integrated job scheduling system is designed to work with a wide range of computing applications, including programs built with parallel programming technologies. Virtualization technologies were used in this environment for the effective execution of software based on machine learning, deep learning and artificial intelligence. Using the capabilities of the Singularity container platform, a specialized software stack and its operating mode were implemented for executing machine learning, deep learning and artificial intelligence tasks on a unified digital computing platform.
Originality. The features of the functioning of hybrid computing platforms are considered, and an approach to organizing their effective multi-user operation is proposed. An effective resource management model based on virtualization technology is developed.
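For illustration, the sketch below (Python, not taken from the article) shows how a containerized deep-learning job could be submitted to a PBS Professional cluster and executed through Singularity, in the spirit of the environment described above. The queue name, GPU resource name, container image path and training script are hypothetical placeholders; actual resource names depend on how the cluster's virtual nodes are configured.

# Illustrative sketch only (not from the article): submitting a containerized
# deep-learning job to a PBS Professional cluster that runs it through
# Singularity. Queue name, GPU resource name, container image path and the
# training script are hypothetical placeholders.
import subprocess
import tempfile

# PBS Professional job script: requests a slice of one (virtual) node with
# CPU cores and a GPU, then runs the workload inside a Singularity container.
# The --nv flag exposes the host NVIDIA driver and GPUs to the container.
JOB_SCRIPT = """#!/bin/bash
#PBS -N dl-train-example
#PBS -q workq
#PBS -l select=1:ncpus=4:ngpus=1
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
singularity exec --nv /opt/images/tensorflow.sif python3 train.py
"""

def submit(script_text: str) -> str:
    """Write the job script to a temporary file and submit it with qsub."""
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(script_text)
        path = f.name
    # qsub prints the job identifier (e.g. "1234.pbs-server") on success.
    result = subprocess.run(["qsub", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print("Submitted job:", submit(JOB_SCRIPT))

In practice such a script would usually be submitted directly with qsub from the command line; the Python wrapper is used here only to keep the example self-contained.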
Keywords: hybrid computing cluster, coprocessor, multi-user access mode, computer architecture, job scheduling system, workload manager, virtualization, container
doi: 10.25743/ICT.2019.24.5.005.
Author(s):
Smagin Sergey Ivanovich, Dr., Corresponding Member of RAS, Professor. Position: Director. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk. Phone Office: (4212) 22 72 67. E-mail: smagin@ccfebras.ru. SPIN-code: 2419-4990.
Sorokin Aleksei Anatolyevich, PhD. Position: Leading Researcher. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: alsor@febras.net. SPIN-code: 1767-2259.
Malkovsky Sergey Ivanovich. Position: Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: sergey.malkovsky@ccfebras.ru. SPIN-code: 3653-9904.
Korolev Sergey Pavlovich. Position: Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: serejk@febras.net. SPIN-code: 5884-4506.
Lukyanova Olga Alexandrovna. Position: Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (924) 411-6656. E-mail: ollukyan@gmail.com. SPIN-code: 5347-8092.
Nikitin Oleg Yurievich. Position: Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: olegioner@gmail.com. SPIN-code: 8499-0846.
Kondrashev Vadim Adolfovich, PhD. Position: Senior Research Scientist. Office: FRC CSC RAS. Address: 119333, Russia, Moscow, 44/2, Vavilova St. Phone Office: (499) 1373494. E-mail: vkondrashev@frccsc.ru. SPIN-code: 1060-3954.
Chernykh Vladimir Yurievich. Position: Junior Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: syler1983.9@gmail.com.
References: [1] Strohmaier, E., Meuer, H.W., Dongarra, J., Simon, H.D. The TOP500 List and Progress in High-Performance Computing. Computer. 2015; 48(11):42–49. DOI:10.1109/MC.2015.338.
[2] Keckler, S.W., Dally, W.J., Khailany, B., Garland, M., Glasco, D. GPUs and the Future of Parallel Computing. IEEE Micro. 2011; 31(5):7–17. DOI:10.1109/MM.2011.89.
[3] Garland, M., Le Grand, S., Nickolls, J., Anderson, J., Hardwick, J., Morton, S., Phillips, E., Zhang, Y., Volkov, V. Parallel computing experiences with CUDA. IEEE Micro. 2008; 28(4):13–27. DOI:10.1109/MM.2008.57.
[4] Stone, J.E., Gohara, D., Shi, G. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering. 2010; 12(3):66–72. DOI:10.1109/MCSE.2010.69.
[5] Liao, C., Yan, Y., de Supinski, B.R., Quinlan, D.J., Chapman, B. Early experiences with the OpenMP accelerator model. Lecture Notes in Computer Science. 2013; (8122):84–98. DOI:10.1007/978-3-642-40698-0_7.
[6] Ruetsch, G., Fatica, M. CUDA Fortran for scientists and engineers: Best practices for efficient CUDA Fortran Programming (1st ed.). San Francisco: Morgan Kaufmann Publishers Inc.; 2013: 338.
[7] Steinkraus, D., Buck, I., Simard, P.Y. Using GPUs for machine learning algorithms. Proc. of Eighth Intern. Conf. on Document Analysis and Recognition (ICDAR’05). Seoul. 2005; (2):1115–1120. DOI:10.1109/ICDAR.2005.251.
[8] Reuther, A., Byun, C., Arcand, W., Bestor, D., Bergeron, B., Hubbell, M., Jones, M., Michaleas, P., Prout, A., Rosa, A., Kepner, J. Scalable system scheduling for HPC and big data. Journal of Parallel and Distributed Computing. 2018; (111):76–92. DOI:10.1016/j.jpdc.2017.06.009.
[9] Sterling, T., Anderson, M., Brodowicz, M. Chapter 5 — The essential resource management. High Performance Computing. 2018; 141–190. DOI:10.1016/B978-0-12-420158-3.00005-8.
[10] Quintero, D., de Souza Casali, D., Luis Cerdas Moya, E., Fros, F., Olejniczak, M. IBM spectrum computing solutions. First Edition. USA: International Business Machines Corporation; 2017: 214.
[11] Lameter, C. NUMA (Non-Uniform Memory Access): An overview. Queue. 2013; 11(7):1–12. DOI:10.1145/2508834.2513149.
[12] Gschwind, M. OpenPOWER: Reengineering a server ecosystem for large-scale data centers. Proceedings of IEEE Hot Chips 26 Symposium (HCS). Cupertino, CA, USA; 2014. DOI:10.1109/HOTCHIPS.2014.7478829.
[13] Sinharoy, B., Van Norstrand, J.A., Eickemeyer, R.J., Le, H.Q., Leenstra, J., Nguyen, D.Q., Konigsburg, B., Ward, K., Brown, M.D., Moreira, J.E., Levitan, D., Tung, S., Hrusecky, D., Bishop, J.W., Gschwind, M., Boersma, M., Kroener, M., Kaltenbach, M., Karkhanis, T., Fernsler, K.M. IBM POWER8 processor core microarchitecture. IBM Journal of Research and Development. 2015; 59(1):2:1–2:21. DOI:10.1147/JRD.2014.2376112.
[14] Foley, D., Danskin, J. Ultra-Performance Pascal GPU and NVLink Interconnect. IEEE Micro. 2017; 37(2):7–17. DOI:10.1109/MM.2017.37.
[15] Appelhans, D., Walkup, B. Leveraging NVLINK and asynchronous data transfer to scale beyond the memory capacity of GPUs. Proc. of the 8th Works. on Latest Advances in Scalable Algorithms for Large-Scale Systems. Denver, CO, USA; 2017; Article No. 5. DOI:10.1145/3148226.3148232.
[16] Merkel, D. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal. 2014; (239):76–90.
[17] Kurtzer, G.M., Sochat, V., Bauer, M.W. Singularity: Scientific containers for mobility of compute. PLOS ONE. 2017; 12(5):e0177459. DOI:10.1371/journal.pone.0177459.
[18] Gerhardt, L., Bhimji, W., Canon, S., Fasel, M., Jacobsen, D., Mustafa, M., Porter, J., Tsulaia, V. Shifter: Containers for HPC. Journal of Physics: Conference Series. 2017; (898):082021. DOI:10.1088/1742-6596/898/8/082021.
[19] Furlani, J.L. Modules: Providing a flexible user environment. Proceedings of the Fifth Large Installation Systems Administration Conference (LISA V). 1991: 141–152.
[20] Sorokin, A.A., Makogonov, S.V., Korolev, S.P. The information infrastructure for collective scientific work in the Far East of Russia. Scientific and Technical Information Processing. 2017; 44(4):302–304. DOI: 10.3103/S0147688217040153.
Bibliography link: Smagin S.I., Sorokin A.A., Malkovsky S.I., Korolev S.P., Lukyanova O.A., Nikitin O.Y., Kondrashev V.A., Chernykh V.Y. The organization of effective multi-user operation of hybrid computing systems // Computational technologies. 2019. V. 24. No. 5. P. 49-60.