Article information

2015, Volume 20, No. 2, p. 20-28

Berikov V.B., Pestunov I.A., Gerasimov M.K.

Method for clustering of heterogeneous time series

Purpose. The paper addresses the problem of partitioning a set of multidimensional time series into groups of similar series (clusters). Each time series represents characteristics (qualitative or quantitative) of an object that changes over time. By assumption, the data generating mechanism is unknown and may vary across the set of time series, in the sense that the observed values of an individual time series depend on one of several unobserved generative functions. Methodology. We suggest a way to define a measure of difference between time series using decision trees as approximating functions. The proposed dissimilarity measure satisfies some useful properties such as non-negativity, identity, and symmetry. Findings. We suggest a mathematical model of the data generating mechanism and prove that, given good approximations of initially well-distinguished generative functions, time series from the same cluster are more similar to each other (in the sense of the proposed dissimilarity measure) than series from different clusters. Originality/value. The suggested approach makes it possible to define a distance/dissimilarity measure between time series with heterogeneous components, different lengths, and large sizes and dimensions, while accounting for the interdependencies between observed values at different time points. The approach does not rely on prior assumptions about the data. It is simple to understand and interpret, and it can be combined with other decision-making techniques such as regression analysis and clustering. An algorithm for time series clustering that uses the resulting dissimilarity matrix is also suggested.
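
The abstract does not state the exact form of the dissimilarity measure, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes one-dimensional numeric series, approximates each series by a small regression tree that predicts the next value from the previous one, and takes the dissimilarity to be the mean absolute difference between the two trees' predictions over the pooled inputs of both series. The names lagged_pairs, fit_tree, predict, dissimilarity and dissimilarity_matrix are hypothetical. Under these assumptions the measure is non-negative, symmetric, and zero for identical series, and it accepts series of different lengths.

```python
from statistics import mean

def lagged_pairs(series):
    """Turn a series into (previous value, next value) training pairs."""
    return list(zip(series[:-1], series[1:]))

def sse(pairs):
    """Sum of squared deviations of the targets from their mean."""
    ys = [y for _, y in pairs]
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_tree(pairs, depth=2, min_leaf=2):
    """Fit a tiny regression tree on 1-D inputs (assumed approximating function)."""
    leaf = ("leaf", mean(y for _, y in pairs))
    if depth == 0 or len(pairs) < 2 * min_leaf:
        return leaf
    pairs = sorted(pairs)
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal inputs
        cost = sse(pairs[:i]) + sse(pairs[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return leaf
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return ("split", threshold,
            fit_tree(pairs[:i], depth - 1, min_leaf),
            fit_tree(pairs[i:], depth - 1, min_leaf))

def predict(tree, x):
    """Follow the splits down to a leaf and return its mean value."""
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]

def dissimilarity(a, b, depth=2):
    """Assumed measure: mean |f_a(x) - f_b(x)| over the pooled inputs of a and b."""
    tree_a = fit_tree(lagged_pairs(a), depth)
    tree_b = fit_tree(lagged_pairs(b), depth)
    points = list(a[:-1]) + list(b[:-1])
    return mean(abs(predict(tree_a, x) - predict(tree_b, x)) for x in points)

def dissimilarity_matrix(series_list, depth=2):
    """Pairwise dissimilarities, ready to pass to a clustering routine."""
    return [[dissimilarity(a, b, depth) for b in series_list] for a in series_list]

# Toy data: a1 and a2 follow "next = 1 - previous", b1 follows "next = previous / 2".
a1 = [0, 1, 0, 1, 0, 1, 0, 1]
a2 = [0, 1, 0, 1, 0, 1]          # same rule as a1, different length
b1 = [8, 4, 2, 1, 0.5, 0.25, 0.125]
D = dissimilarity_matrix([a1, a2, b1])
```

In this toy example a1 and a2 yield the same tree, so their dissimilarity is zero, while a1 and b1 are separated by a positive value. The matrix D could then be passed to any clustering procedure that works from pairwise dissimilarities, for example agglomerative clustering.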

Keywords: multidimensional heterogeneous time series, cluster analysis, decision trees

Author(s):
Berikov Vladimir Borisovich
Dr., Associate Professor
Position: General Scientist
Office: Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences
Address: 630090, Russia, Novosibirsk, 4, Acad. Koptyug Avenue
Phone Office: (383) 3333291
E-mail: berikov@math.nsc.ru
SPIN-code: 8108-2591

Pestunov Igor Alekseevich
PhD, Associate Professor
Position: Leading research officer
Office: Federal Research Center for Information and Computational Technologies
Address: 630090, Russia, Novosibirsk, Ac. Lavrentiev ave., 6
Phone Office: (383) 334-91-55
E-mail: pestunov@ict.nsc.ru
SPIN-code: 9159-3765

Gerasimov Maxim Konstantinovich
Position: Leader Expert
Office: Institute of Mathematics SB RAS
Address: 630090, Russia, Novosibirsk, Koptyug St., bl.4
Phone Office: (383) 3634667
E-mail: max_post@ngs.ru


Bibliography link:
Berikov V.B., Pestunov I.A., Gerasimov M.K. Method for clustering of heterogeneous time series // Computational technologies. 2015. V. 20. No. 2. P. 20-28
ISSN 1560-7534