Article information
2015, Volume 20, No. 2, pp. 20-28
Berikov V.B., Pestunov I.A., Gerasimov M.K.
Method for clustering of heterogeneous time series
Purpose. The paper addresses the problem of partitioning a set of multidimensional time series into subsets of similar series (clusters). Each time series represents the characteristics (qualitative or quantitative) of an object that change over time. By assumption, the data-generating mechanism is unknown and may vary across the set of time series, in the sense that the observed values of each individual time series depend on one of the unobserved generative functions.

Methodology. We propose a way to define a measure of dissimilarity between time series using decision trees as approximating functions. The proposed dissimilarity measure satisfies several useful properties, such as non-negativity, identity, and symmetry.

Findings. We propose a mathematical model of the data-generating mechanism and prove that, given good approximations of the initial, well-distinguished generative functions, time series from the same cluster are more similar to each other (in the sense of the proposed dissimilarity measure) than series from different clusters.

Originality/value. The suggested approach makes it possible to define a distance (dissimilarity) measure between time series that have heterogeneous components, different lengths, and large sizes and dimensions, while taking into account the interdependencies between observed values at different time points. The approach does not rely on prior assumptions about the data. It is simple to understand and interpret, and it can be combined with other decision-making techniques such as regression analysis and clustering. A time series clustering algorithm that uses the obtained dissimilarity matrix is also proposed.
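The abstract does not give the exact dissimilarity formula or clustering algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each series has a single numeric component (the paper also handles qualitative and multidimensional components), approximates each series by a small regression tree over time rescaled to [0, 1] so that series of different lengths become comparable, and measures dissimilarity as the mean squared gap between two tree approximations on a shared grid. That measure is non-negative, symmetric, and zero when two approximations coincide. Average-linkage agglomerative clustering is swapped in as a standard way to cluster from a dissimilarity matrix. The names `fit_tree`, `approximate`, `dissimilarity`, `dissimilarity_matrix`, `average_linkage` and the parameters `max_depth`, `min_leaf`, `grid_size` are hypothetical.

```python
# Hypothetical sketch, NOT the authors' published algorithm: tree-based
# dissimilarity for univariate numeric series + average-linkage clustering.
from itertools import combinations


def _sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def fit_tree(xs, ys, max_depth=3, min_leaf=2):
    """Fit a piecewise-constant 1-D regression tree to points (xs, ys).

    Returns ("leaf", mean) or ("split", threshold, left_subtree, right_subtree).
    """
    if max_depth == 0 or len(ys) < 2 * min_leaf:
        return ("leaf", sum(ys) / len(ys))
    pts = sorted(zip(xs, ys))
    best = None  # (sse, threshold, split_index)
    for k in range(min_leaf, len(pts) - min_leaf + 1):
        if pts[k - 1][0] == pts[k][0]:
            continue  # no threshold separates equal time stamps
        sse = _sse([y for _, y in pts[:k]]) + _sse([y for _, y in pts[k:]])
        if best is None or sse < best[0]:
            best = (sse, (pts[k - 1][0] + pts[k][0]) / 2, k)
    if best is None or best[0] >= _sse(ys):
        return ("leaf", sum(ys) / len(ys))  # splitting does not reduce error
    _, thr, k = best
    lx, ly = zip(*pts[:k])
    rx, ry = zip(*pts[k:])
    return ("split", thr,
            fit_tree(list(lx), list(ly), max_depth - 1, min_leaf),
            fit_tree(list(rx), list(ry), max_depth - 1, min_leaf))


def predict(tree, t):
    """Evaluate the tree approximation at time t."""
    while tree[0] == "split":
        _, thr, left, right = tree
        tree = left if t <= thr else right
    return tree[1]


def approximate(series, max_depth=3):
    """Rescale time to [0, 1] so series of different lengths are comparable."""
    n = len(series)
    xs = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    return fit_tree(xs, list(series), max_depth)


def dissimilarity(tree_a, tree_b, grid_size=50):
    """Mean squared gap between two approximations on a shared time grid.

    Non-negative, symmetric, and 0 when the two approximations coincide.
    """
    grid = [g / (grid_size - 1) for g in range(grid_size)]
    return sum((predict(tree_a, t) - predict(tree_b, t)) ** 2
               for t in grid) / grid_size


def dissimilarity_matrix(series_list, max_depth=3):
    trees = [approximate(s, max_depth) for s in series_list]
    n = len(trees)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = dissimilarity(trees[i], trees[j])
    return d


def average_linkage(d, n_clusters):
    """Merge the two clusters with the smallest mean pairwise dissimilarity
    until n_clusters remain; returns sorted lists of series indices."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = sum(d[i][j] for i in clusters[a] for j in clusters[b])
            link /= len(clusters[a]) * len(clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    data = [
        [0, 0, 0, 5, 5, 5],           # step up
        [0, 0, 1, 5, 5, 5, 5, 5],     # step up, longer and noisier
        [5, 5, 5, 0, 0, 0],           # step down
        [5, 5, 5, 5, 0, 0, 0, 0],     # step down, longer
    ]
    print(average_linkage(dissimilarity_matrix(data), 2))  # [[0, 1], [2, 3]]
```

In the usage example, the two "step up" series and the two "step down" series come from two clearly different generating patterns, and the sketch groups them accordingly despite their different lengths. Trees are used because a piecewise-constant approximation stays easy to read; extending the sketch to several or mixed-type components would require a tree that predicts all components jointly, which is not attempted here.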
Keywords: multidimensional heterogeneous time series, cluster analysis, decision trees
Author(s):

Berikov Vladimir Borisovich, Dr., Associate Professor
Position: General Scientist
Office: Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences
Address: 630090, Russia, Novosibirsk, 4, Acad. Koptyug Avenue
Phone Office: (383) 3333291
E-mail: berikov@math.nsc.ru
SPIN-code: 8108-2591

Pestunov Igor Alekseevich, PhD, Associate Professor
Position: Leading Research Officer
Office: Federal Research Center for Information and Computational Technologies
Address: 630090, Russia, Novosibirsk, Ac. Lavrentiev ave., 6
Phone Office: (383) 334-91-55
E-mail: pestunov@ict.nsc.ru
SPIN-code: 9159-3765

Gerasimov Maxim Konstantinovi
Position: Leading Expert
Office: Institute of Mathematics SB RAS
Address: 630090, Russia, Novosibirsk, Koptyug St., bl.4
Phone Office: (383) 3634667
E-mail: max_post@ngs.ru
Bibliography link: Berikov V.B., Pestunov I.A., Gerasimov M.K. Method for clustering of heterogeneous time series // Computational Technologies. 2015. V. 20. No. 2. P. 20-28