Big Data, Data Mining, and Machine Learning Research: June 2017

This paper was presented at PAKDD 2017 in Jeju, Korea.

Abstract
A data stream's concept may evolve over time, which is known as the concept drift. Concept drifts affect the prediction accuracy of the learning model and are required to be handled to maintain the model quality. In most cases, there is a trade-off between maintaining prediction quality and learning efficiency. We present a novel framework known as the Volatility-Adaptive Classifier System (VACS) to balance the trade-off. The system contains an adaptive classifier and a non-adaptive classifier. The former can maintain a higher prediction quality but requires additional computational overhead, and the latter requires less computational overhead but its prediction quality may be susceptible to concept drifts. VACS automatically applies the adaptive classifier when the concept drifts are frequent, and switches to the non-adaptive classifier when drifts are infrequent. As a result, VACS can maintain a relatively low computational cost while still maintaining a high enough overall prediction quality. To the best of our knowledge, this is the first data stream mining framework that applies different learners to reduce the learning overheads.

Summary

Data streams are sequences of unbounded data arriving in real time. For example, electricity usage records produced by a power station, online tweets generated in a region, transactions recorded in a stock market can all be presented as data streams. Such real-world data are generated in order and are considered to be infinite. The task of data stream mining is to find valuable information from these unbounded streams of data. Data stream's properties raise various requirements when designing data stream algorithms. Instances in a stream can arrive very fast, allowing only limited time and memory for the algorithm to learn its underlying concepts. Moreover, a data stream may evolve over time such that the underlying concepts in a stream may change. Consequently, the learning model loses prediction accuracy over time. This is known as concept drifts. To maintain the quality of a learning model, stream learning algorithms are expected to detect changes and update their models to overcome these concept drifts. The frequency of concept drifts is known as stream volatility \cite{huang2014detecting}. High volatility means a high frequency of concept drifts.

Some data stream learners can overcome a concept drift by adjusting their models to generalise the new concept and maintain a high prediction quality during the drift. These learning models can be classified as adaptive learners. Other data stream learners that cannot adjust their models are known as non-adaptive learners. Model adaptations come with a large computational cost. Thus, there is a trade-off between the model quality and the learning efficiency. It is also known that stream volatility may change in a stream over time. For example, in stock market transactions, an anomalous event can result in an increasing number of concept drifts over a short period. One way to balance the trade-off between model quality and efficiency is to apply the model adaptation only when the stream volatility is high to maintain a stable prediction quality. When the volatility becomes low, we disable the model adaptation to save cost. We are addressing this problem by creating a new learning framework containing both adaptive and non-adaptive learners.

We designed a framework called Volatility Adaptive Classifier System (VACS). VACS has lower computational cost than the state-of-art adaptive learner while maintaining a similar prediction quality in a stream with volatility changes. VACS is composed of both adaptive and non-adaptive classifiers. VACS uses stream volatility as the criterion to switch between classifiers. In particular, when the volatility is high, VACS applies the adaptive learner to maintain a better prediction quality. When volatility is low, it is deemed to be unnecessary to spend large overheads to handle infrequent concept drifts, so it switches to the non-adaptive classifier. As a result, VACS will maintain a sufficiently high prediction accuracy with relatively low overheads.

Our contributions are as follows: (1) We proposed a Volatility Adaptive Classifier System (VACS), which is able to choose between the adaptive classifier and the non-adaptive classifier given different levels of stream volatility. (2) We show that the accuracy of VACS is comparable to state-of-the-art techniques, while maintaining low computational cost. To the best of our knowledge, this is the first data stream learning technique that uses stream volatility to adjust model adaptation behaviour to reduce computational cost while maintaining high model quality.

Friday, June 2, 2017

Volatility-Adaptive Classifier System