Tree-based methods for clustering time series using domain-relevant attributes

Jun 1, 2019

Mahsa Ashouri

Galit Shmueli

Chor-Yiu Sin

Abstract

This research proposes two new methods for clustering time series that capture temporal information (trend, seasonality, and autocorrelation) and domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can be used as an automated yet transparent tool for clustering a large collection of time series. Using a single linear regression model, the single-step method clusters series using trend, seasonality, time series lags, and domain-relevant cross-sectional attributes. The two-step method first clusters by trend, seasonality, and domain-relevant cross-sectional attributes, then further clusters the residual series by autocorrelation and the domain-relevant cross-sectional attributes. Both methods produce clusters that are interpretable by domain experts. We illustrate the usefulness of the proposed clustering approach by considering one-step-ahead forecasting. We present empirical results on a large set of Wikipedia article pageview time series, comparing our approach with forecasting each series separately using an Autoregressive Integrated Moving Average (ARIMA) model. Our results show that the tree-based approach produces forecasts practically on par with ARIMA models, yet is significantly faster and more efficient, and is therefore suitable for scaling to large collections of time series. Moreover, our method produces simple parametric forecasting models for interpretable clusters of time series, whereas ARIMA cannot provide such interpretability.
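
To make the single-step idea concrete, the following is a minimal sketch, not the authors' implementation. It fits one linear regression on trend, seasonal dummies, and a lag-1 term, then searches for the single categorical attribute split that most reduces the pooled sum of squared errors. This greedy SSE criterion is a simplified stand-in for the parameter-instability tests that MOB trees actually use. The function names, the lag order of 1, and the weekly period of 7 are all assumptions made for illustration.

```python
import numpy as np

def design_matrix(y, period=7):
    """Regressors for one series: intercept, linear trend, seasonal dummies, lag-1.

    The lag order (1) and the period (7) are illustrative assumptions.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n)                          # row 0 is lost to the lag
    season = np.eye(period)[t % period][:, 1:]   # drop one dummy (intercept is present)
    X = np.column_stack([np.ones(n - 1), t, season, y[:-1]])
    return X, y[1:]

def pooled_sse(series_list, period=7):
    """Fit one OLS model to the stacked rows of all series; return its SSE."""
    parts = [design_matrix(y, period) for y in series_list]
    X = np.vstack([p[0] for p in parts])
    target = np.concatenate([p[1] for p in parts])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return float(resid @ resid)

def best_split(series_list, attributes, period=7):
    """Return (attribute, value, gain) for the best binary split.

    `attributes` is a list of dicts, one per series, holding categorical
    cross-sectional attributes. NOTE: greedy SSE reduction is a stand-in
    for MOB's parameter-instability test, not the test itself.
    """
    base = pooled_sse(series_list, period)
    best = (None, None, 0.0)
    for name in attributes[0]:
        for value in sorted({a[name] for a in attributes}):
            left = [s for s, a in zip(series_list, attributes) if a[name] == value]
            right = [s for s, a in zip(series_list, attributes) if a[name] != value]
            if not left or not right:
                continue
            gain = base - pooled_sse(left, period) - pooled_sse(right, period)
            if gain > best[2]:
                best = (name, value, gain)
    return best
```

Applying `best_split` recursively to each resulting group would grow a tree whose leaves are clusters, each with its own simple regression model that can be used for one-step-ahead forecasting. A real MOB tree would also need a stopping rule, such as a significance threshold or a minimum node size.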

Type

Journal article

Publication

Journal of Business Analytics