Article information
2019, Volume 24, No. 5, p. 49-60
Smagin S.I., Sorokin A.A., Malkovsky S.I., Korolev S.P., Lukyanova O.A., Nikitin O.Y., Kondrashev V.A., Chernykh V.Y.
The organization of effective multi-user operation of hybrid computing systems
Purpose. Advances in machine learning, deep learning and artificial intelligence play an important role in acquiring new knowledge, in technological modernization and in the development of the digital economy. An important factor in the progress of these areas is the availability of an appropriate high-performance computing infrastructure capable of processing large amounts of data. The creation of coprocessor-based hybrid computing systems, together with new parallel programming technologies and application development tools, partially solves this problem. However, many issues of organizing the effective multi-user operation of this class of systems require a separate study. The present paper addresses research in this area.
Methodology. Using the OpenPOWER architecture-based cluster of the Shared Services Center "The Data Center of the Far Eastern Branch of the Russian Academy of Sciences", the features of the functioning of hybrid computing systems are examined and solutions are proposed for organizing their work in a multi-user mode. Based on the virtual node concept, the PBS Professional job scheduling system was adapted to provide efficient allocation of cluster hardware resources among user jobs. Application virtualization technology was employed for the effective execution of machine learning and deep learning tasks.
Findings. The implemented cluster software environment with the integrated job scheduling system is designed to work with a wide range of computing applications, including programs built with parallel programming technologies. Virtualization technologies were used in this environment for the effective execution of software based on machine learning, deep learning and artificial intelligence. Using the capabilities of the Singularity container platform, a specialized software stack and its operating mode were implemented for executing machine learning, deep learning and artificial intelligence tasks on a unified digital computing platform.
Originality. The features of the functioning of hybrid computing platforms are considered, and an approach to organizing their effective multi-user operation is proposed. An effective resource management model based on virtualization technology is developed.
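For illustration, the sketch below (Python, not taken from the article) shows how a containerized deep-learning job could be submitted to a PBS Professional cluster and executed through Singularity, in the spirit of the environment described above. The queue name, GPU resource name, container image path and training script are hypothetical placeholders; actual resource names depend on how the cluster's virtual nodes are configured.

# Illustrative sketch only (not from the article): submitting a containerized
# deep-learning job to a PBS Professional cluster that runs it through
# Singularity. Queue name, GPU resource name, container image path and the
# training script are hypothetical placeholders.
import subprocess
import tempfile

# PBS Professional job script: requests a slice of one (virtual) node with
# CPU cores and a GPU, then runs the workload inside a Singularity container.
# The --nv flag exposes the host NVIDIA driver and GPUs to the container.
JOB_SCRIPT = """#!/bin/bash
#PBS -N dl-train-example
#PBS -q workq
#PBS -l select=1:ncpus=4:ngpus=1
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
singularity exec --nv /opt/images/tensorflow.sif python3 train.py
"""

def submit(script_text: str) -> str:
    """Write the job script to a temporary file and submit it with qsub."""
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(script_text)
        path = f.name
    # qsub prints the job identifier (e.g. "1234.pbs-server") on success.
    result = subprocess.run(["qsub", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print("Submitted job:", submit(JOB_SCRIPT))

In practice such a script would usually be submitted directly with qsub from the command line; the Python wrapper is used here only to keep the example self-contained.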
Keywords: hybrid computing cluster, coprocessor, multi-user access mode, computer architecture, job scheduling system, workload manager, virtualization, container
doi: 10.25743/ICT.2019.24.5.005.
Author(s):
Smagin Sergey Ivanovich, Dr., Corresponding Member of RAS, Professor. Position: Director. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk. Phone Office: (4212) 22 72 67. E-mail: smagin@ccfebras.ru. SPIN-code: 2419-4990.
Sorokin Aleksei Anatolyevich, PhD. Position: Leading Researcher. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: alsor@febras.net. SPIN-code: 1767-2259.
Malkovsky Sergey Ivanovich. Position: Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: sergey.malkovsky@ccfebras.ru. SPIN-code: 3653-9904.
Korolev Sergey Pavlovich. Position: Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: serejk@febras.net. SPIN-code: 5884-4506.
Lukyanova Olga Alexandrovna. Position: Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (924) 411-6656. E-mail: ollukyan@gmail.com. SPIN-code: 5347-8092.
Nikitin Oleg Yurievich. Position: Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: olegioner@gmail.com. SPIN-code: 8499-0846.
Kondrashev Vadim Adolfovich, PhD. Position: Senior Research Scientist. Office: FRC CSC RAS. Address: 119333, Russia, Moscow, 44/2, Vavilova St. Phone Office: (499) 1373494. E-mail: vkondrashev@frccsc.ru. SPIN-code: 1060-3954.
Chernykh Vladimir Yurievich. Position: Junior Research Scientist. Office: Computing Center FEB RAS. Address: 680000, Russia, Khabarovsk, 65, Kim Yu Chen St. Phone Office: (4212) 703913. E-mail: syler1983.9@gmail.com.
References: [1] Strohmaier, E., Meuer, H.W., Dongarra, J., Simon, H.D. The TOP500 List and Progress in High-Performance Computing. Computer. 2015; 48(11):42–49. DOI:10.1109/MC.2015.338.
[2] Keckler, S.W., Dally, W.J., Khailany, B., Garland, M., Glasco, D. GPUs and the Future of Parallel Computing. IEEE Micro. 2011; 31(5):7–17. DOI:10.1109/MM.2011.89.
[3] Garland, M., Le Grand, S., Nickolls, J., Anderson, J., Hardwick, J., Morton, S., Phillips, E., Zhang, Y., Volkov, V. Parallel computing experiences with CUDA. IEEE Micro. 2008; 28(4):13–27. DOI:10.1109/MM.2008.57.
[4] Stone, J.E., Gohara, D., Shi, G. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering. 2010; 12(3):66–72. DOI:10.1109/MCSE.2010.69.
[5] Liao, C., Yan, Y., de Supinski, B.R., Quinlan, D.J., Chapman, B. Early experiences with the OpenMP accelerator model. Lecture Notes in Computer Science. 2013; (8122):84–98. DOI:10.1007/978-3-642-40698-0_7.
[6] Ruetsch, G., Fatica, M. CUDA Fortran for scientists and engineers: Best practices for efficient CUDA Fortran Programming (1st ed.). San Francisco: Morgan Kaufmann Publishers Inc.; 2013: 338.
[7] Steinkraus, D., Buck, I., Simard, P.Y. Using GPUs for machine learning algorithms. Proc. of Eighth Intern. Conf. on Document Analysis and Recognition (ICDAR’05). Seoul. 2005; (2):1115–1120. DOI:10.1109/ICDAR.2005.251.
[8] Reuther, A., Byun, C., Arcand, W., Bestor, D., Bergeron, B., Hubbell, M., Jones, M., Michaleas, P., Prout, A., Rosa, A., Kepner, J. Scalable system scheduling for HPC and big data. Journal of Parallel and Distributed Computing. 2018; (111):76–92. DOI:10.1016/j.jpdc.2017.06.009.
[9] Sterling, T., Anderson, M., Brodowicz, M. Chapter 5 — The essential resource management. High Performance Computing. 2018; 141–190. DOI:10.1016/B978-0-12-420158-3.00005-8.
[10] Quintero, D., de Souza Casali, D., Luis Cerdas Moya, E., Fros, F., Olejniczak, M. IBM spectrum computing solutions. First Edition. USA: International Business Machines Corporation; 2017: 214.
[11] Lameter, C. NUMA (Non-Uniform Memory Access): An overview. Queue. 2013; 11(7):1–12. DOI:10.1145/2508834.2513149.
[12] Gschwind, M. OpenPOWER: Reengineering a server ecosystem for large-scale data centers. Proceedings of IEEE Hot Chips 26 Symposium (HCS). Cupertino, CA, USA; 2014. DOI:10.1109/HOTCHIPS.2014.7478829.
[13] Sinharoy, B., Van Norstrand, J.A., Eickemeyer, R.J., Le, H.Q., Leenstra, J., Nguyen, D.Q., Konigsburg, B., Ward, K., Brown, M.D., Moreira, J.E., Levitan, D., Tung, S., Hrusecky, D., Bishop, J.W., Gschwind, M., Boersma, M., Kroener, M., Kaltenbach, M., Karkhanis, T., Fernsler, K.M. IBM POWER8 processor core microarchitecture. IBM Journal of Research and Development. 2015; 59(1):2:1–2:21. DOI:10.1147/JRD.2014.2376112.
[14] Foley, D., Danskin, J. Ultra-Performance Pascal GPU and NVLink Interconnect. IEEE Micro. 2017; 37(2):7–17. DOI:10.1109/MM.2017.37.
[15] Appelhans, D., Walkup, B. Leveraging NVLINK and asynchronous data transfer to scale beyond the memory capacity of GPUs. Proc. of the 8th Works. on Latest Advances in Scalable Algorithms for Large-Scale Systems. Denver, CO, USA; 2017; Article No. 5. DOI:10.1145/3148226.3148232.
[16] Merkel, D. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal. 2014; (239):76–90.
[17] Kurtzer, G.M., Sochat, V., Bauer, M.W. Singularity: Scientific containers for mobility of compute. PLOS ONE. 2017; 12(5):e0177459. DOI:10.1371/journal.pone.0177459.
[18] Gerhardt, L., Bhimji, W., Canon, S., Fasel, M., Jacobsen, D., Mustafa, M., Porter, J., Tsulaia, V. Shifter: Containers for HPC. Journal of Physics: Conference Series. 2017; (898):082021. DOI:10.1088/1742-6596/898/8/082021.
[19] Furlani, J.L. Modules: Providing a flexible user environment. Proceedings of the Fifth Large Installation Systems Administration Conference (LISA V). 1991: 141–152.
[20] Sorokin, A.A., Makogonov, S.V., Korolev, S.P. The information infrastructure for collective scientific work in the Far East of Russia. Scientific and Technical Information Processing. 2017; 44(4):302–304. DOI: 10.3103/S0147688217040153.
Bibliography link: Smagin S.I., Sorokin A.A., Malkovsky S.I., Korolev S.P., Lukyanova O.A., Nikitin O.Y., Kondrashev V.A., Chernykh V.Y. The organization of effective multi-user operation of hybrid computing systems // Computational technologies. 2019. V. 24. No. 5. P. 49-60.