author={Viktor Losing and Barbara Hammer and Heiko Wersing},
keywords={Incremental learning, On-line learning, Data streams, Hyperparameter optimization, Model selection},
abstract={Incremental and on-line learning have recently gained increasing attention, especially in the context of big data and learning from data streams, where the traditional assumption of complete data availability does not hold. Even though a variety of methods are available, it often remains unclear which of them is suitable for a specific task and how they compare to each other. We analyze the key properties of eight popular incremental methods representing different algorithm classes. We evaluate them with regard to their on-line classification error as well as their behavior in the limit. Further, we discuss the often-neglected issue of hyperparameter optimization specifically for each method and test how robustly it can be done based on a small set of examples. Our extensive evaluation on data sets with different characteristics gives an overview of the performance with respect to accuracy, convergence speed, and model complexity, facilitating the choice of the best method for a given application.}
abstract={Deep learning is an effective method for extracting the underlying information in text. However, it performs better on closed datasets and is less effective in real-world text-classification scenarios. As data are updated and their volume grows, models must be retrained, which is often a long process. We therefore propose a novel incremental learning strategy to address these problems. Our method, called Learn#, includes four components: a Student model, a reinforcement learning (RL) module, a Teacher model, and a discriminator model. The Student models first extract features from the texts; the RL module then filters the results of the multiple Student models. After that, the Teacher model reclassifies the filtered results to obtain the final text category. To prevent the number of Student models from growing without bound as samples accumulate, the discriminator model filters Student models according to their similarity. The Learn# method has the advantage of a shorter training time than the One-Time model, because it needs to train only a new Student model each time, leaving the existing Student models unchanged. Furthermore, it can obtain feedback during application and tune the models' parameters over time. Experiments on different datasets show that our method for text classification outperforms many traditional One-Time methods, reducing training time by nearly 80%.}
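The abstract above describes an incremental pipeline (per-batch Student models, a filtering step, a Teacher that emits the final label, and a discriminator that prunes redundant Students). The following is a minimal, hypothetical sketch of that idea only; the class and function names (LearnSharp, Student, partial_fit) and the nearest-centroid/argmax stand-ins are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a Learn#-style incremental pipeline (not the authors' code).
import numpy as np

class Student:
    """One lightweight per-batch classifier (nearest-centroid, for illustration only)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_scores(self, X):
        # Negative distance to each class centroid, used as a confidence score.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=-1)
        return -d

class LearnSharp:
    def __init__(self, similarity_threshold=0.95):
        self.students = []
        self.similarity_threshold = similarity_threshold

    def partial_fit(self, X, y):
        """Train only a new Student on the incoming batch; existing Students stay fixed."""
        candidate = Student().fit(X, y)
        if not self._is_redundant(candidate):
            self.students.append(candidate)
        return self

    def _is_redundant(self, candidate):
        # Stand-in for the discriminator: drop the candidate if its centroids are
        # nearly identical (cosine similarity) to those of an existing Student.
        for s in self.students:
            if s.centroids_.shape == candidate.centroids_.shape:
                a, b = s.centroids_.ravel(), candidate.centroids_.ravel()
                sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
                if sim > self.similarity_threshold:
                    return True
        return False

    def predict(self, X):
        """Keep the most confident Student output per sample (stand-in for the RL
        filter), then take the argmax as the final label (stand-in for the Teacher)."""
        scores = [s.predict_scores(X) for s in self.students]
        best = np.maximum.reduce(scores)
        return self.students[0].classes_[best.argmax(axis=1)]

# Usage: two incoming batches train two (or fewer) Students; prediction on new samples.
rng = np.random.default_rng(0)
model = LearnSharp()
for _ in range(2):
    X = rng.normal(size=(100, 8))
    y = (X[:, 0] > 0).astype(int)
    model.partial_fit(X, y)
print(model.predict(rng.normal(size=(5, 8))))
```

The sketch only mirrors the training-cost argument made in the abstract: each call to partial_fit fits one new small model while earlier models are left untouched, so per-update cost stays roughly constant instead of growing with the accumulated data.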