Article information

2021, Volume 26, No. 5, p. 95-105

Zhukova G.N., Ulyanov M.V.

The influence of the cardinality of the alphabet on the quality of reconstruction of a symbolic periodic sequence from a sequence with noise

This study is motivated by a wide range of applied problems in real-world data processing and analysis in which it is natural to encode information using symbols from a finite alphabet. Varying the cardinality of the alphabet lets the symbolic representation describe the process at a level of detail sufficient for the analysis. However, in many subject areas where the trajectories of the examined processes can be encoded symbolically, researchers face distortions, noise, and fragmentation of information. This occurs in bioinformatics, medicine, the digital economy, time series forecasting, and the analysis of business processes. Periodic processes are widely represented in these subject areas. Without noise, these processes correspond to periodic symbolic sequences, i.e. words over a finite alphabet. Instead of the expected periodic symbolic sequence, a researcher often receives as experimental data a sequence distorted by noise of various origins. Under these conditions, identifying the periodicity, which means determining both the periodically repeating symbolic fragment and its length (hereinafter called the period), requires reducing the effect of noise on the experimental results.

The article deals with the problem of recovering periodic sequences distorted by noise in the form of replaced and deleted symbols. Since the level of detail in the description of the process depends on the cardinality of the alphabet, it is of interest to study how this level of detail affects the possibility of recovering complete information about the initially periodic sequences.

The article experimentally examines how the quality characteristics of the period recovery method proposed by the authors depend on the cardinality of the alphabet. For alphabets of different cardinalities, the proportion of sequences with a satisfactorily reconstructed period and the relative error in determining the length of the period are given. The quality of reconstruction of the periodically repeating fragment is estimated by the ratio of the edit distance from the reconstructed periodic sequence to the original sequence distorted by noise.
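To make the setting concrete, the following is a minimal Python sketch of the kind of experiment described above. It is not the authors' period recovery method. It only builds a periodic sequence, distorts it with replacement and deletion noise, and computes the classical Levenshtein edit distance. All function names, the independent per-symbol noise model, and the probabilities `p_replace` and `p_delete` are assumptions made for illustration.

```python
import random


def periodic_sequence(period_word, length):
    """Repeat period_word and cut the result to exactly `length` symbols."""
    reps = length // len(period_word) + 1
    return (period_word * reps)[:length]


def add_noise(seq, alphabet, p_replace, p_delete, rng):
    """Hypothetical noise model: each symbol is independently deleted
    with probability p_delete or replaced with probability p_replace."""
    out = []
    for ch in seq:
        r = rng.random()
        if r < p_delete:
            continue  # deletion noise: drop the symbol
        if r < p_delete + p_replace:
            # replacement noise: substitute a different symbol, if one exists
            others = [a for a in alphabet if a != ch]
            if others:
                ch = rng.choice(others)
        out.append(ch)
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance with unit-cost insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    alphabet = "abcd"  # alphabet of cardinality 4 (illustrative choice)
    clean = periodic_sequence("abca", 40)
    noisy = add_noise(clean, alphabet, p_replace=0.1, p_delete=0.1, rng=rng)
    print(noisy, edit_distance(clean, noisy))
```

Repeating the run for alphabets of different cardinalities would give a rough feel for how the distance between a periodic sequence and its noisy version behaves. The abstract does not state what the edit distance is divided by, so the sketch reports the raw distance rather than a ratio.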

Keywords: symbolic sequence, cardinality of an alphabet, periodic sequence, sequence with noise, noise of insertion, noise of deletion, noise of change

doi: 10.25743/ICT.2021.26.5.008

Zhukova Galina Nikolayevna
PhD, Associate Professor
Position: Associate Professor
Office: HSE University
Address: 101000, Russia, Moscow, 20, Myasnitskaya ulitsa
SPIN-code: 5754-5615

Ulyanov Mikhail Vasilievich
Dr., Professor
Position: Professor
Office: V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences, Lomonosov Moscow State University
Address: 117997, Russia, Moscow, 65 Profsoyuznaya street
Phone Office: (495) 334-89-10

Zhukova G.N., Ulyanov M.V. The influence of the cardinality of the alphabet on the quality of reconstruction of a symbolic periodic sequence from a sequence with noise // Computational technologies. 2021. V. 26. No. 5. P. 95-105
