Article information

2022, Volume 27, No. 3, p. 46-65

Belov V.A., Illin D.Y., Nikulchev E.V.

Comparative evaluation of the efficiency of data processing by storing the data in a relational database and column format files

Purpose. When developing information and analytical systems, choosing the most effective data storage tool is important. The purpose of this study is to compare data processing characteristics across different data storage tools, with particular attention to how these characteristics change as the data volume grows.

Methodology. Test benches were prepared for experimental evaluation of the two alternatives. The evaluation criteria were data volume, processing time, RAM and CPU usage, and the dynamics of these characteristics as the data volume changes. Two data queries with different requirements for obtaining results were prepared: filtering and data aggregation. Each was evaluated both as a single query and as several queries running simultaneously.
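The measurement procedure described in the methodology can be sketched roughly as follows. This is a minimal illustration only, not the authors' test bench: it uses SQLite from the Python standard library as a stand-in for the systems under study, and the `events` table, its columns, both SQL statements and all function names are hypothetical. It records only storage size and query time; RAM and CPU usage are not measured here.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical queries: one filters rows, the other aggregates them.
FILTER_SQL = "SELECT id, amount FROM events WHERE amount > 900"
AGGREGATE_SQL = "SELECT category, SUM(amount) FROM events GROUP BY category"


def build_dataset(path, n_rows):
    """Create a toy table with n_rows synthetic rows (illustrative schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE events (id INTEGER, category INTEGER, amount INTEGER)"
    )
    rows = ((i, i % 10, i % 1000) for i in range(n_rows))
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()


def timed_query(path, sql):
    """Run one query on its own connection; return (row_count, seconds)."""
    conn = sqlite3.connect(path)
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    conn.close()
    return len(rows), elapsed


def run_concurrently(path, sql, n_clients):
    """Launch n_clients copies of the query at once; return per-client results."""
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        return list(pool.map(lambda _: timed_query(path, sql), range(n_clients)))


def benchmark(sizes=(1_000, 10_000), n_clients=4):
    """Measure both queries, alone and concurrently, for growing data volumes."""
    results = []
    for n_rows in sizes:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "bench.db")
            build_dataset(path, n_rows)
            for name, sql in (("filter", FILTER_SQL), ("aggregate", AGGREGATE_SQL)):
                count, single = timed_query(path, sql)
                concurrent = run_concurrently(path, sql, n_clients)
                results.append({
                    "rows": n_rows,
                    "bytes": os.path.getsize(path),  # storage volume on disk
                    "query": name,
                    "result_rows": count,
                    "single_s": single,
                    # Slowest client under simultaneous load.
                    "concurrent_s": max(t for _, t in concurrent),
                })
    return results
```

Calling `benchmark()` returns one record per data volume and query type, which makes it possible to see how storage size and timings change as the number of rows grows. Comparing two storage tools would mean repeating the same loop against each of them with identical data and queries.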

Findings. Numerical characteristics for the examined criteria were obtained. With small data volumes, processing with a relational database was several times faster than with a big data processing system. As the data volume grows, big data processing systems perform better. In terms of storage volume, column formats are more efficient for any amount of data.

Value. The results show that a relational database is a feasible choice for small amounts of data. As the data volume grows, alternative ways of storing and processing data become necessary. System design therefore requires analysis not only of the data structure but also of the estimated data volume.


Keywords: big data, data storage formats, relational databases, PostgreSQL, Apache Hive

doi: 10.25743/ICT.2022.27.3.005

Author(s):
Belov Vladimir Aleksandrovich
Position: Student
Office: MIREA Russian Technological University
Address: 119454, Russia, avenue Vernadskogo, 78
Phone Office: (995) 001-32-67
E-mail: belov_v.a@mail.ru

Illin Dmitry Yurievich
PhD, Associate Professor
Position: Associate Professor
Office: MIREA Russian Technological University
Address: 119454, Russia, avenue Vernadskogo, 78
Phone Office: (926) 617-63-01
SPIN-code: 5801-3500

Nikulchev Evgeny Vitalievich
Dr.
Position: Professor
Office: MIREA Russian Technological University
Address: 119454, Russia, avenue Vernadskogo, 78
Phone Office: (916) 2324317
E-mail: nikulchev@mail.ru
SPIN-code: 9380-1627

References:

1. Gusev A., Ilin D., Kolyasnikov P., Nikulchev E. Effective selection of software components based on experimental evaluations of quality of operation. Engineering Letters. 2020; 28(2):420–427.

2. Belov V., Tatarintsev A., Nikulchev E. Choosing a data storage format in the Apache Hadoop system based on experimental evaluation using Apache Spark. Symmetry. 2021; 13(2):195. DOI:10.3390/sym13020195.

3. PostgreSQL. Available at: http://www.postgresql.org (accessed January 31, 2022).

4. Andjelic S., Obradovic S., Gacesa B. A performance analysis of the DBMS — MySQL vs PostgreSQL. Communications — Scientific Letters of the University of Zilina. 2008; 10(4):53–57.

5. Lee S., Jo J.Y., Kim Y. Survey of data locality in Apache Hadoop. 2019 IEEE International Conference on Big Data, Cloud Computing, Data Science & Engineering (BCD). Honolulu, USA; 2019: 46–53.

6. Camacho-Rodriguez J., Chauhan A., Gates A., Koifman E., O’Malley O., Garg V., Haindrich Z., Shelukhin S., Jayachandran P., Seth S., Jaiswal D., Bouguerra S., Bangarwa N., Hariappan S., Agarwal A., Dere J., Dai D., Nair T., Dembla N., Vijayaraghavan G., Hagleitner G. Apache Hive: from MapReduce to enterprise-grade Big Data warehousing. Proceedings of the 2019 International Conference on Management of Data (SIGMOD’19). N.Y., USA: Association for Computing Machinery; 2019: 1773–1786. DOI:10.1145/3299869.3314045.

7. Apache Parquet. Available at: https://parquet.apache.org/documentation/latest.

8. Alasta A.F., Enaba M.A. Data warehouse on manpower employment for decision support system. International Journal of Computing, Communication and Instrumentation Engineering. 2014; (1):48–53.

9. Cappa F., Oriani R., Peruffo E., McCarthy I.P. Big Data for creating and capturing value in the digitalized environment: unpacking the effects of volume, variety and veracity on firm performance. Journal of Product Innovation Management. 2021; 38(1):49–67.

10. Martins P., Tome P., Wanzeller C., Sa F., Abbasi M. Comparing Oracle and PostgreSQL, performance and optimization. Trends and Applications in Information Systems and Technologies. Springer, Cham; 2021: 1366. DOI:10.1007/978-3-030-72651-5_46.

11. Truica C.-O., Boicea A., Radulescu F. Asynchronous replication in Microsoft SQL Server, PostgreSQL and MySQL. International Conference on Cyber Science and Engineering. China, Guangzhou; 2013: 50–55.

12. Li Y., Manoharan S. A performance comparison of SQL and NoSQL databases. 2013 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM). 2013: 15–19. DOI:10.1109/PACRIM.2013.6625441.

13. Cattell R. Scalable SQL and NoSQL data stores. SIGMOD Record. 2010; 39(4):12–27. DOI:10.1145/1978915.1978919.

14. Parker Z., Poe S., Vrbsky S. Comparing NoSQL MongoDB to an SQL DB. Proceedings of the 51st ACM Southeast Conference (ACMSE’13). N.Y., USA: Association for Computing Machinery; 2013: 1–6. DOI:10.1145/2498328.2500047.

15. Jung M., Youn S., Bae J., Choi Y. A study on data input and output performance comparison of MongoDB and PostgreSQL in the Big Data environment. 2015 8th International Conference on Database Theory and Application (DTA). Jeju, Korea (South): IEEE; 2015: 14–17. DOI:10.1109/DTA.2015.14.

16. Kumar R., Gupta N., Charu Sh., Bansal S., Yadav K. Comparison of SQL with HiveQL. International Journal for Research in Technological Studies. 2014; 1(9):28–30.

17. Ahmed S., Ali M.U., Ferzund J., Sarwar M.A., Rehman A., Mehmood A. Modern data formats for Big Bioinformatics Data analytics. International Journal of Advanced Computer Science and Applications. 2017; 8(4):366–377. DOI:10.14569/IJACSA.2017.080450.

18. Plase D., Niedrite L., Taranovs R. A comparison of HDFS compact data formats: Avro Versus Parquet. Mokslas–Lietuvos Ateitis/Science-Future of Lithuania. 2017; 9(3):267–276.

19. Izergin D.A., Eremeev M.A., Magomedov S.G., Smirnov S.I. Information security evaluation for Android mobile operating system. Russian Technological Journal. 2019; 7(6):44–55. DOI:10.32362/2500-316X-2019-7-6-44-55. (In Russ.)

20. Nikulchev E., Ilin D., Gusev A. Technology stack selection model for software design of digital platforms. Mathematics. 2021; 9(4):308. DOI:10.3390/math9040308.

21. Patroni. Available at: https://patroni.readthedocs.io/en/latest (accessed March 01, 2022).

22. etcd. Available at: https://etcd.io (accessed March 01, 2022).

23. Belov V., Kosenkov A.N., Nikulchev E. Experimental characteristics study of data storage formats for data marts development within Data Lakes. Applied Sciences. 2021; 11(18):8651. DOI:10.3390/app11188651.

24. Chaudhuri S., Dayal U. An overview of data warehousing and OLAP technology. SIGMOD Record. 1997; 26(1):65–74. DOI:10.1145/248603.248616.

25. Popescul A., Flake G.W., Lawrence S., Ungar L.H., Giles C.L. Clustering and identifying temporal trends in document databases. Proceedings IEEE Advances in Digital Libraries 2000. Washington, USA; 2000: 173–182. DOI:10.1109/ADL.2000.848380.

26. Li D., Han L., Ding Y. SQL query optimization methods of relational database system. 2010 Second International Conference on Computer Engineering and Applications. Bali Island; 2010: 557–560. DOI:10.1109/ICCEA.2010.113.

Bibliography link:
Belov V.A., Illin D.Y., Nikulchev E.V. Comparative evaluation of the efficiency of data processing by storing the data in a relational database and column format files // Computational technologies. 2022. V. 27. No. 3. P. 46-65
ISSN 1560-7534
© 2024 FRC ICT