Research Article

Data Stream Classification Based on the Gamma Classifier

Abril Valeria Uriarte-Arcia,1 Itzamá López-Yáñez,2 Cornelio Yáñez-Márquez,1 João Gama,3 and Oscar Camacho-Nieto2

1 Neural Networks and Unconventional Computing Lab/Alpha-Beta Group, Centro de Investigación en Computación, Instituto Politécnico Nacional, Avenida Juan de Dios Bátiz, Colonia Nuevo Industrial Vallejo, Delegación Gustavo A. Madero, 07738 Mexico City, DF, Mexico
2 Intelligent Computing Lab/Alpha-Beta Group, Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Avenida Juan de Dios Bátiz, Colonia Nuevo Industrial Vallejo, Delegación Gustavo A. Madero, 07700 Mexico City, DF, Mexico
3 LIAAD-INESC TEC and Faculty of Economics, University of Porto, Rua Dr. Roberto Frias 378, 4200-378 Porto, Portugal

Correspondence should be addressed to Abril Valeria Uriarte-Arcia; [email protected]

Received 13 April 2015; Revised 7 July 2015; Accepted 29 July 2015

Academic Editor: Konstantinos Karamanos

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 939175, 17 pages, http://dx.doi.org/10.1155/2015/939175

Copyright © 2015 Abril Valeria Uriarte-Arcia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The ever increasing data generation confronts us with the problem of handling online massive amounts of information. One of the biggest challenges is how to extract valuable information from these massive continuous data streams in a single scan. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed to address this context must be efficient regarding memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent to data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. In order to test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.

1. Introduction

In recent years, technological advances have promoted the generation of a vast amount of information from different areas of knowledge: sensor networks, financial data, fraud detection, and web data, among others. According to the study performed by IDC (International Data Corporation) [1], the digital universe in 2013 was estimated at 4.4 trillion gigabytes. Of this digital data, only 22% would be a candidate for analysis, while the available storage capacity could hold just 33% of the generated information. Under this scenario, the extraction of knowledge is becoming a very challenging task. We are now confronted with the problem of handling very large datasets and even with the possibility of an infinite data flow. Therefore the algorithms developed to address this context, unlike traditional ones, must meet some constraints, as defined in [2]: work with a limited amount of time, use a limited amount of memory, and make one or only a few passes over the data. They should also be capable of reacting to concept drift, that is, changes in the distribution of the data over time. In [3], more details of the requirements to be considered for data stream algorithms can be found.

As the idea suggests, a data stream can roughly be thought of as an ordered, endless sequence of data items, where the input arrives more or less continuously as time progresses [4]. The extensive research on data stream algorithms performed over the last years has produced a large variety of techniques for data stream problems. Several relevant data mining tasks



have been addressed: clustering, classification, prediction, and concept drift detection. Data stream algorithms can be roughly divided into the single classifier approach and the ensemble approach. The single model approach can be further divided into model based approaches and instance based approaches. Model based approaches have the disadvantage of the computational cost of updating the model every time a new element of the stream arrives. In instance based approaches, just the representative subset of elements (case base) from the data stream has to be adapted; this does not come for free, since the computational cost is reflected during the classification stage [4]. The single classifier approach includes several data mining techniques such as decision trees, decision rules, instance based learners (IBL), and neural networks, among others. A more detailed taxonomy of data stream algorithms can be found in [5].

Decision trees have been widely used in data stream classification [6-9]. One of the most representative models for data stream classification using decision trees was proposed by Domingos and Hulten [6, 7] and has been the inspiration for several of the tree models used for data stream mining. In [6], the authors proposed a Very Fast Decision Tree (VFDT) learner that uses Hoeffding bounds, which guarantee that the tree can be trained in constant time. An extension of this work was presented in [7], where the decision trees are built from a sliding window of fixed size. Trees are also commonly used as base learners for ensemble approaches.

Decision rule models have also been applied in the data stream context. One of the first methods proposed for rule induction in an online environment was FLORA (FLOating Rough Approximation) [10]. To deal with concept drift and hidden contexts, this framework uses three techniques: keep only a window of trusted examples, store concept descriptions to reuse them when a previous context reappears, and use heuristics to control these two functions. Another relevant work was presented by Ferrer-Troyano et al. in [11]. The method is called FACIL (Fast and Adaptive Classifier by Incremental Learning), an incremental classifier based on decision rules. The classifier uses a partial instance memory where examples are rejected if they do not describe a decision boundary.

Algorithms for the data stream context have to be incremental and highly adaptive. In IBL algorithms, incremental learning and model adaptation are very simple, since these methods come down to adapting the case base (i.e., the representative examples). This allows obtaining a flexible and easily updated model without complex computations, which optimizes the update and classification times [12]. Beringer and Hüllermeier proposed an IBL algorithm for data stream classification [4]; the system maintains an implicit concept description in the form of a case base that is updated taking into account three indicators: temporal relevance, spatial relevance, and consistency. On the arrival of a new example, this example is incorporated into the case base, and then a replacement policy is applied based on the three previously mentioned indicators. In [12], a new technique named Similarity-Based Data Stream Classifier (SimC) is introduced. The system maintains a representative and small set of information in order to conserve the distribution of classes over the stream. It controls noise and outliers, applying an insertion/removal policy designed to retain the best representation of the knowledge using appropriate estimators. Shaker and Hüllermeier presented an instance based algorithm that can be applied to classification and regression problems [13]. The algorithm is called IBLStream, and it optimizes the composition and size of the case base autonomously. On arrival of a new example, this example is first added to the case base, and then it is checked whether other examples might be removed, either since they have become redundant or since they are noisy data.

For data stream scenarios, neural network-based models have hardly been used; one of the main issues of this approach is the weight update caused by the arrival of a new example. The influence on the network of this single example cannot simply be canceled later on; at best, it can be reduced gradually over the course of time [12]. However, there are some works where neural networks are used for data stream tasks, such as [14], where an online perceptron is used to classify nonstationary imbalanced data streams. They have also been used as base learners for ensemble methods [15, 16].

Ensemble methods have also been extensively used in data stream classification. The ensemble approach, as its name implies, maintains in memory an ensemble of multiple models whose individual predictions are combined, generally by average or voting techniques, to output a final prediction [2]. The main idea behind an ensemble classifier is based on the concept that different learning algorithms explore different search spaces and evaluations of the hypothesis [3]. Different classifiers can be used as base learners in ensemble methods, for instance, trees [17-19], Naïve Bayes [20, 21], neural networks [15], and KNN [22].

SVMs have been used for different tasks in data stream scenarios, such as sentiment analysis, fault detection and prediction, medical applications, spam detection, and multilabel classification. In [23], a fault-detection system based on data stream prediction is proposed; an SVM algorithm is used to classify the incoming data and trigger a fault alarm in case the data is considered abnormal. Sentiment analysis has also been addressed using SVM methods. Kranjc et al. [24] introduce a cloud-based scientific workflow platform which is able to perform online dynamic adaptive sentiment analysis on Twitter data. They used a linear SVM for sentiment classification. While this work does not focus on a specific topic, Smailovic et al. [25] present a methodology that analyzes whether the sentiment expressed in Twitter feeds which discuss selected companies can indicate their stock price changes. They propose the use of the Pegasos SVM [26] to classify the tweets as positive, negative, and neutral. Then the tweets classified as positive were used to calculate a sentiment probability for each day, and this is used in a Granger causality analysis to correlate positive sentiment probability and stock closing price. In [27], a stream clustering SVM algorithm to treat spam identification was presented. While in general this problem has previously been addressed as a classification problem, they handle it as an anomaly detection problem. Krawczyk and Woźniak [28] proposed a version of incremental One-Class Support Vector Machine that assigns weights to each object according to its level of significance. They also introduce two schemes for estimating weights for new incoming data and examine their usefulness.

Another well-studied issue in the data stream context is concept drift detection. The change in the concept might occur due to changes in hidden variables affecting the phenomena or changes in the characteristics of the observed variables. According to [29], two main tactics are used in concept drift detection: (1) monitoring the evolution of performance indicators using statistical techniques and (2) monitoring the distributions of two different time windows. Some algorithms use methods that adapt the decision model at regular intervals without taking into account whether change really happened; this involves extra processing time. On the other hand, explicit change methods can detect the point of change, quantify the change, and take the needed actions. In general, the methods for concept drift detection can be integrated with different base learners. Some methods related to this issue can be found in [30-32].

Some other issues related to data stream classification have also been addressed, such as unbalanced data stream classification [14, 15, 19], classification in the presence of uncertain data [8], classification with unlabeled instances [8, 33, 34], novel class detection [22, 35], and multilabel classification [36].

In this paper we describe a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier [37], which is a supervised pattern recognition model inspired by the Alpha-Beta associative memories [38]. The proposal has been developed to work in data stream scenarios using a sliding window approach, and it is capable of handling the space and time constraints inherent to this type of scenario. Various methods have been designed for data stream classification, but to our knowledge the associative approach has never been used for this kind of problem. The main contribution of this work is the implementation of a novel method for data stream classification based on an associative model.

The rest of this paper is organized as follows. Section 2 describes all the materials and methods needed to develop our proposal. Section 3 describes how the experimental phase was conducted and discusses the results. Some conclusions are presented in Section 4, and finally the Acknowledgments and References are included.

2. Materials and Methods

2.1. Gamma Classifier. This classifier is a supervised method whose name is derived from the similarity operator that it uses: the generalized Gamma operator. This operator takes as input two binary vectors, $x$ and $y$, and a positive integer $\theta$, and returns 1 if both vectors are similar ($\theta$ being the degree of allowed dissimilarity) or 0 otherwise. The Gamma operator uses other operators (namely, $\alpha$, $\beta$, and $u_\beta$) that will be introduced first. The rest of this section is strongly based on [37, 38].

Table 1: The Alpha ($\alpha$) and Beta ($\beta$) operators.

(a) $\alpha : A \times A \to B$

$x$   $y$   $\alpha(x, y)$
0     0     1
0     1     0
1     0     2
1     1     1

(b) $\beta : B \times A \to A$

$x$   $y$   $\beta(x, y)$
0     0     0
0     1     0
1     0     0
1     1     1
2     0     1
2     1     1

Definition 1 (Alpha and Beta operators). Given the sets $A = \{0, 1\}$ and $B = \{0, 1, 2\}$, the Alpha ($\alpha$) and Beta ($\beta$) operators are defined in tabular form as shown in Table 1.
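Since both operators are defined by finite tables, they translate directly into lookup tables. The following is a minimal sketch in Python (the names `ALPHA`, `BETA`, and `alpha` are ours, not from the paper):

```python
# The Alpha and Beta operators of Table 1 as lookup tables.
# alpha: A x A -> B, with A = {0, 1} and B = {0, 1, 2}
ALPHA = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}
# beta: B x A -> A
BETA = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1, (2, 0): 1, (2, 1): 1}

def alpha(x, y):
    """Alpha applied componentwise to two equal-length binary vectors."""
    return [ALPHA[(xi, yi)] for xi, yi in zip(x, y)]
```

For example, `alpha([1, 0, 1, 1, 1], [1, 0, 0, 1, 0])` yields `[1, 1, 2, 1, 2]`, the vector used in the worked Gamma example later in this section.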

Definition 2 (Alpha operator applied to vectors). Let $x, y \in A^n$, with $n \in \mathbb{Z}^+$, be input column vectors. The output of $\alpha(x, y)$ is an $n$-dimensional vector whose $i$th component ($i = 1, 2, \ldots, n$) is computed as follows:

$$\alpha(x, y)_i = \alpha(x_i, y_i). \quad (1)$$

Definition 3 ($u_\beta$ operator). Considering the binary pattern $x \in A^n$ as input, this unary operator gives the following nonnegative integer as output, calculated as follows:

$$u_\beta(x) = \sum_{i=1}^{n} \beta(x_i, x_i). \quad (2)$$

Thus, if $x = [1\ 1\ 0\ 0\ 1\ 0]$, then

$$u_\beta(x) = \beta(1,1) + \beta(1,1) + \beta(0,0) + \beta(0,0) + \beta(1,1) + \beta(0,0) = 1 + 1 + 0 + 0 + 1 + 0 = 3. \quad (3)$$
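Since $\beta(0,0) = 0$ and $\beta(1,1) = 1$, the $u_\beta$ operator amounts to counting the ones in a binary pattern. A minimal sketch (function name ours):

```python
# u_beta from Definition 3: the sum of beta(x_i, x_i) over a binary pattern.
# Because beta(0, 0) = 0 and beta(1, 1) = 1, this counts the ones in x.
BETA = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1, (2, 0): 1, (2, 1): 1}

def u_beta(x):
    return sum(BETA[(xi, xi)] for xi in x)
```

With the pattern of the example above, `u_beta([1, 1, 0, 0, 1, 0])` returns 3, as in (3).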

Definition 4 (pruning operator). Let $x \in A^n$ and $y \in A^m$, with $n, m \in \mathbb{Z}^+$ and $n < m$, be two binary vectors; then $y$ pruned by $x$, denoted by $y \| x$, is an $n$-dimensional binary vector whose $i$th component ($i = 1, 2, \ldots, n$) is defined as follows:

$$(y \| x)_i = y_{i+m-n}. \quad (4)$$


For instance, let $y = [1\ 1\ 1\ 1\ 0\ 1]$ and $x = [0\ 1\ 0]$, with $m = 6$ and $n = 3$; then

$$(y \| x)_1 = y_{1+6-3} = y_4,$$
$$(y \| x)_2 = y_{2+6-3} = y_5,$$
$$(y \| x)_3 = y_{3+6-3} = y_6,$$
$$y \| x = [1\ 0\ 1]. \quad (5)$$

Thus, only the 4th, 5th, and 6th elements of $y$ are used in $(y \| x)_i$. Notice that the first $m - n$ components of $y$ are discarded when building $y \| x$.
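In code, the pruning operator is simply a slice that keeps the last $n$ components of $y$; a minimal sketch (function name ours):

```python
# Pruning operator of Definition 4: (y || x)_i = y_{i+m-n}, that is,
# keep only the last n components of y, where n = len(x) < m = len(y).
def prune(y, x):
    n, m = len(x), len(y)
    return y[m - n:]
```

With the vectors of the example, `prune([1, 1, 1, 1, 0, 1], [0, 1, 0])` returns `[1, 0, 1]`, as in (5).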

The Gamma operator needs binary vectors as input. In order to deal with real and/or integer vectors, a method to represent such vectors in binary form is needed. In this work we use the modified Johnson-Möbius code [39], which is a variation of the classical Johnson-Möbius code. To convert a set of real/integer numbers into a binary representation, we follow these steps:

(1) Let $R$ be a set of real numbers such that $R = \{r_1, r_2, \ldots, r_i, \ldots, r_n\}$, with $n \in \mathbb{Z}^+$.

(2) If there are one or more negative numbers in the set, create a new set by subtracting the minimum (i.e., $r_i$) from each number in $R$, obtaining a new set $T$ with only nonnegative real numbers: $T = \{t_1, t_2, \ldots, t_i, \ldots, t_n\}$, where $t_j = r_j - r_i\ \forall j \in \{1, 2, \ldots, n\}$ and, in particular, $t_i = 0$.

(3) Choose a fixed number $d$ and truncate each number of the set $T$ to have only $d$ decimal positions.

(4) Scale up all the numbers of the set obtained in the previous step by multiplying them by $10^d$, in order to leave only nonnegative integer numbers: $E = \{e_1, e_2, \ldots, e_m, \ldots, e_n\}$, where $e_j = t_j \cdot 10^d\ \forall j \in \{1, 2, \ldots, n\}$ and $e_m$ is the maximum value of the set $E$.

(5) For each element $e_j \in E$, with $j = 1, 2, \ldots, n$, concatenate $(e_m - e_j)$ zeroes with $e_j$ ones.

For instance, let $R \subset \mathbb{R}$ be defined as $R = \{2.4, -0.7, 1.8, 1.5\}$; now let us use the modified Johnson-Möbius code to convert the elements of $R$ into binary vectors.

(1) Subtract the minimum. Since $-0.7$ is the minimum in $R$, the members of the transformed set $T$ are obtained by subtracting $-0.7$ from each member in $R$: $t_i = r_i - (-0.7) = r_i + 0.7$. Consider

$$R \to T \subset \mathbb{R}, \quad T = \{3.1, 0, 2.5, 2.2\}. \quad (6)$$

(2) Scale up the numbers. We select $d = 1$, so each number is truncated to 1 decimal and then multiplied by 10 ($10^d = 10^1 = 10$). One has

$$T \to E \subset \mathbb{Z}^+, \quad E = \{31, 0, 25, 22\}. \quad (7)$$

(3) Concatenate $(e_m - e_j)$ zeroes with $e_j$ ones, where $e_m$ is the maximum number in $E$ and $e_j$ is the current number to be coded. Given that the maximum number in $E$ is 31, all the binary vectors have 31 components; that is, they all are 31 bits long. For example, $e_4 = 22$ is converted into its binary representation $c_4$ by appending $e_m - e_4 = 31 - 22 = 9$ zeroes followed by $e_4 = 22$ ones, which gives the final vector "0000000001111111111111111111111". Consider

$$E \to C = \{c_i \mid c_i \in A^{31}\},$$
$$C = \{[1111111111111111111111111111111],\ [0000000000000000000000000000000],\ [0000001111111111111111111111111],\ [0000000001111111111111111111111]\}. \quad (8)$$
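The five coding steps above can be sketched as follows. This is a minimal illustration (the function name is ours); one deliberate deviation is that it rounds at the $d$th decimal instead of truncating, to sidestep binary floating-point error on inputs such as $2.4 - (-0.7)$, which is not exactly representable:

```python
def johnson_mobius(values, d=1):
    """Modified Johnson-Mobius coding of a list of reals into binary strings."""
    lo = min(values)
    shifted = [v - lo for v in values] if lo < 0 else list(values)  # step (2)
    ints = [int(round(t * 10 ** d)) for t in shifted]               # steps (3)-(4)
    e_max = max(ints)
    # step (5): concatenate (e_max - e) zeroes with e ones
    return ["0" * (e_max - e) + "1" * e for e in ints]
```

Calling `johnson_mobius([2.4, -0.7, 1.8, 1.5])` reproduces the four 31-bit vectors of (8).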

Definition 5 (Gamma operator). The similarity Gamma operator takes two binary patterns, $x \in A^n$ and $y \in A^m$, with $n, m \in \mathbb{Z}^+$ and $n \le m$, and a nonnegative integer $\theta$ as input, and outputs a binary number, for which there are two cases.

Case 1. If $n = m$, then the output is computed according to the following:

$$\gamma_g(x, y, \theta) = \begin{cases} 1 & \text{if } m - u_\beta[\alpha(x, y) \bmod 2] \le \theta, \\ 0 & \text{otherwise}, \end{cases} \quad (9)$$

where mod denotes the usual modulo operation.

Case 2. If $n < m$, then the output is computed using $y \| x$ instead of $y$, as follows:

$$\gamma_g(x, y, \theta) = \begin{cases} 1 & \text{if } m - u_\beta[\alpha(x, y \| x) \bmod 2] \le \theta, \\ 0 & \text{otherwise}. \end{cases} \quad (10)$$

In order to better illustrate how the generalized Gamma operator works, let us work through some examples of its application. If $x = [1\ 0\ 1\ 1\ 1]$, $y = [1\ 0\ 0\ 1\ 0]$, and $\theta = 1$, what is the result of $\gamma_g(x, y, \theta)$? First we calculate $\alpha(x, y) = [1\ 1\ 2\ 1\ 2]$; then

$$\alpha(x, y) \bmod 2 = [1\ 1\ 2\ 1\ 2] \bmod 2 = [1\ 1\ 0\ 1\ 0]; \quad (11)$$

now

$$u_\beta(\alpha(x, y) \bmod 2) = u_\beta([1\ 1\ 0\ 1\ 0]) = \beta(1,1) + \beta(1,1) + \beta(0,0) + \beta(1,1) + \beta(0,0) = 1 + 1 + 0 + 1 + 0 = 3, \quad (12)$$

and since $m = 5$, $5 - 3 = 2$; given that $2 \not\le (\theta = 1)$, the result of $\gamma_g$ is 0, and it is decided that, given $\theta$, the two vectors are not similar. However, if we increase $\theta$ to 2, the result of $\gamma_g$ will be 1, since $2 \le (\theta = 2)$. In this case the two vectors will be considered similar. The main idea of the generalized Gamma operator is to indicate (with a result equal to 1) that two binary vectors are similar, allowing up to $\theta$ bits to be different and still considering those vectors similar. If more than $\theta$ bits are different, the vectors are said to be not similar (result equal to 0). Thus, if $\theta = 0$, both vectors must be equal for $\gamma_g$ to output a 1.
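Definition 5 can be rendered compactly by noting that $\alpha(x_i, y_i)$ is odd exactly when $x_i = y_i$, so $u_\beta[\alpha(x, y) \bmod 2]$ counts the agreeing positions. A minimal sketch (names ours):

```python
ALPHA = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}  # Alpha from Table 1

def gamma_g(x, y, theta):
    """Generalized Gamma operator (Definition 5): outputs 1 when
    m - u_beta[alpha(x, y) mod 2] <= theta, with y pruned by x if needed."""
    m = len(y)
    if len(x) < m:
        y = y[m - len(x):]          # Case 2: use y pruned by x
    # alpha(x_i, y_i) mod 2 is 1 exactly when x_i == y_i, so this sum is
    # u_beta[alpha(x, y) mod 2], the number of agreeing positions
    agree = sum(ALPHA[(xi, yi)] % 2 for xi, yi in zip(x, y))
    return 1 if m - agree <= theta else 0
```

With $x = [1\ 0\ 1\ 1\ 1]$ and $y = [1\ 0\ 0\ 1\ 0]$ this reproduces the example above: `gamma_g(x, y, 1)` is 0 and `gamma_g(x, y, 2)` is 1.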

2.1.1. Gamma Classifier Algorithm. Let $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$ be the fundamental pattern set with cardinality $p$; when a test pattern $\tilde{x} \in \mathbb{R}^n$, with $n \in \mathbb{Z}^+$, is presented to the Gamma classifier, these steps are followed:

(1) Code the fundamental set using the modified Johnson-Möbius code, obtaining a value $e_m$ for each component of the $n$-dimensional vectors in the fundamental set. These $e_m$ values are calculated from the original real-valued vectors as

$$e_m(j) = \text{MAX}_{i=1}^{p}(x_{ij}) \quad \forall j \in \{1, 2, \ldots, n\}. \quad (13)$$

(2) Compute the stop parameter $\rho = \text{MAX}_{j=1}^{n}[e_m(j)]$.

(3) Code the test pattern using the modified Johnson-Möbius code, with the same parameters used to code the fundamental set. If any $e_j$ (obtained by offsetting, scaling, and truncating the original $\tilde{x}_j$) is greater than the corresponding $e_m(j)$, code it using the larger number of bits.

(4) Transform the index of all fundamental patterns into two indices, one for their class and another for their position in the class (e.g., $x^\mu$, which is the $\omega$th pattern for class $i$, becomes $x^{i\omega}$).

(5) Initialize $\theta$ to 0.

(6) If $\theta = 0$, test whether $\tilde{x}$ is a fundamental pattern by calculating $\gamma_g(x^\mu_j, \tilde{x}_j, 0)$ for $\mu = 1, 2, \ldots, p$ and then computing the initial weighted addition $c^0_\mu$ for each fundamental pattern as follows:

$$c^0_\mu = \sum_{j=1}^{n} \gamma_g(x^\mu_j, \tilde{x}_j, 0) \quad \text{for } \mu = 1, 2, \ldots, p. \quad (14)$$

If there is a unique maximum whose value equals $n$, assign the class associated with such maximum to the test pattern. Consider

$$y = y^\sigma \text{ such that } \text{MAX}_{i=1}^{p} c^0_i = c^0_\sigma = n. \quad (15)$$

(7) Calculate $\gamma_g(x^{i\omega}_j, \tilde{x}_j, \theta)$ for each component of the fundamental patterns.

(8) Compute a weighted sum $c_i$ for each class, according to this equation:

$$c_i = \frac{\sum_{\omega=1}^{k_i} \sum_{j=1}^{n} \gamma_g(x^{i\omega}_j, \tilde{x}_j, \theta)}{k_i}, \quad (16)$$

where $k_i$ is the cardinality in the fundamental set of class $i$.

(9) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (7) and (8) until there is a unique maximum or the stop condition $\theta > \rho$ is fulfilled.

(10) If there is a unique maximum among the $c_i$, assign $\tilde{x}$ to the class corresponding to such maximum:

$$y = y^j \text{ such that } \text{MAX}\ c_i = c_j. \quad (17)$$

(11) Otherwise, assign $\tilde{x}$ to the class of the first maximum.
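The steps above can be sketched end to end. The following is a simplified illustration (all names are ours): it omits the $\theta = 0$ fundamental-pattern shortcut of step (6), and it assumes the input patterns are already nonnegative integers, so no offsetting or truncation is needed before coding:

```python
ALPHA = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}

def gamma_g(x, y, theta):
    """Generalized Gamma operator (Definition 5)."""
    m = len(y)
    if len(x) < m:
        y = y[m - len(x):]                     # Case 2: prune y by x
    agree = sum(ALPHA[(xi, yi)] % 2 for xi, yi in zip(x, y))
    return 1 if m - agree <= theta else 0

def encode(value, length):
    """Modified Johnson-Mobius code: (length - value) zeroes, then value ones."""
    return [0] * (length - value) + [1] * value

def gamma_classify(fundamental, labels, test):
    """Sketch of steps (1)-(5) and (7)-(11) for integer-valued patterns."""
    n = len(test)
    e_m = [max(x[j] for x in fundamental) for j in range(n)]      # step (1)
    rho = max(e_m)                                                # step (2)
    coded = [[encode(x[j], e_m[j]) for j in range(n)] for x in fundamental]
    # step (3): a test component larger than e_m(j) gets more bits
    coded_test = [encode(test[j], max(e_m[j], test[j])) for j in range(n)]
    classes = sorted(set(labels))
    theta = 0                                                     # step (5)
    while True:
        c = {}                                                    # steps (7)-(8)
        for cl in classes:
            members = [coded[i] for i, l in enumerate(labels) if l == cl]
            c[cl] = sum(gamma_g(p[j], coded_test[j], theta)
                        for p in members for j in range(n)) / len(members)
        winners = [cl for cl in classes if c[cl] == max(c.values())]
        if len(winners) == 1 or theta > rho:                      # steps (9)-(11)
            return winners[0]
        theta += 1
```

On the worked example that follows, `gamma_classify([(2, 8), (1, 9), (6, 3), (7, 4)], [1, 1, 2, 2], (6, 2))` returns class 2, matching the weighted sums $c_1 = 0$ and $c_2 = 0.5$ computed in the text.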

As an example, let us consider the following fundamental patterns:

$$x^1 = \begin{pmatrix} 2 \\ 8 \end{pmatrix}, \quad x^2 = \begin{pmatrix} 1 \\ 9 \end{pmatrix}, \quad x^3 = \begin{pmatrix} 6 \\ 3 \end{pmatrix}, \quad x^4 = \begin{pmatrix} 7 \\ 4 \end{pmatrix}, \quad (18)$$

grouped in two classes: $C_1 = \{x^1, x^2\}$ and $C_2 = \{x^3, x^4\}$. Then the patterns to be classified are

$$\tilde{x}^1 = \begin{pmatrix} 6 \\ 2 \end{pmatrix}, \quad \tilde{x}^2 = \begin{pmatrix} 3 \\ 9 \end{pmatrix}. \quad (19)$$

As can be seen, the dimension of all patterns is $n = 2$, and there are 2 classes, both with the same cardinality: $k_1 = 2$ and $k_2 = 2$. Now the steps in the algorithm are followed.


(1) Code the fundamental set with the modified Johnson-Möbius code, obtaining a value $e_m$ for each component. Consider

$$x^1 = \begin{bmatrix} 0000011 \\ 011111111 \end{bmatrix}, \quad x^2 = \begin{bmatrix} 0000001 \\ 111111111 \end{bmatrix}, \quad x^3 = \begin{bmatrix} 0111111 \\ 000000111 \end{bmatrix}, \quad x^4 = \begin{bmatrix} 1111111 \\ 000001111 \end{bmatrix}. \quad (20)$$

Since the maximum value among the first components of all fundamental patterns is 7, then $e_m(1) = 7$, and the maximum value among the second components is $e_m(2) = 9$. Thus the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.

(2) Compute the stop parameter:

$$\rho = \text{MAX}_{j=1}^{n}[e_m(j)] = \text{MAX}_{j=1}^{2}[e_m(1), e_m(2)] = \text{MAX}(7, 9) = 9. \quad (21)$$

(3) Code the patterns to be classified with the modified Johnson-Möbius code, using the same parameters used with the fundamental set:

$$\tilde{x}^1 = \begin{bmatrix} 0111111 \\ 000000011 \end{bmatrix}, \quad \tilde{x}^2 = \begin{bmatrix} 0000111 \\ 111111111 \end{bmatrix}. \quad (22)$$

Again, the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.

(4) Transform the index of all fundamental patterns into two indices, one for the class they belong to and another for their position in the class. Consider

$$x^1 = x^{11}, \quad x^2 = x^{12}, \quad x^3 = x^{21}, \quad x^4 = x^{22}. \quad (23)$$

Given that $C_1 = \{x^1, x^2\}$ and $C_2 = \{x^3, x^4\}$, we know that both $x^1$ and $x^2$ belong to class $C_1$, so the first index for both patterns becomes 1; the second index is used to differentiate between patterns assigned to the same class, thus $x^1$ becomes $x^{11}$ and $x^2$ is now $x^{12}$. Something similar happens to $x^3$ and $x^4$, but with the first index being 2, since they belong to class $C_2$: $x^3$ becomes $x^{21}$ and $x^4$ is now $x^{22}$.

(5) Initialize $\theta = 0$.

(6) Calculate $\gamma_g(x^{i\omega}_j, \tilde{x}_j, \theta)$ for each component of the fundamental patterns in each class. Thus, for $\tilde{x}^1$ we have

$$\gamma_g(x^{11}_1, \tilde{x}^1_1, 0) = 0, \quad \gamma_g(x^{11}_2, \tilde{x}^1_2, 0) = 0,$$
$$\gamma_g(x^{12}_1, \tilde{x}^1_1, 0) = 0, \quad \gamma_g(x^{12}_2, \tilde{x}^1_2, 0) = 0,$$
$$\gamma_g(x^{21}_1, \tilde{x}^1_1, 0) = 1, \quad \gamma_g(x^{21}_2, \tilde{x}^1_2, 0) = 0,$$
$$\gamma_g(x^{22}_1, \tilde{x}^1_1, 0) = 0, \quad \gamma_g(x^{22}_2, \tilde{x}^1_2, 0) = 0. \quad (24)$$

Since $\theta = 0$, for $\gamma_g$ to give a result of 1 it is necessary that both vectors $x^{i\omega}_j$ and $\tilde{x}^1_j$ are equal. But only one fundamental pattern has its first component equal to that of $\tilde{x}^1$, namely $x^{21}_1 = [0111111] = \tilde{x}^1_1$, while no fundamental pattern has a second component equal to that of the test pattern. Given that this is the only case for which a component of a fundamental pattern is equal to the same component of the test pattern, it is also the only instance in which $\gamma_g$ outputs 1.

Now, for $\tilde{x}^2$, we have

$$\begin{aligned}
\gamma_g(x^{11}_1, \tilde{x}^2_1, 0) &= 0, & \gamma_g(x^{11}_2, \tilde{x}^2_2, 0) &= 0, \\
\gamma_g(x^{12}_1, \tilde{x}^2_1, 0) &= 0, & \gamma_g(x^{12}_2, \tilde{x}^2_2, 0) &= 1, \\
\gamma_g(x^{21}_1, \tilde{x}^2_1, 0) &= 0, & \gamma_g(x^{21}_2, \tilde{x}^2_2, 0) &= 0, \\
\gamma_g(x^{22}_1, \tilde{x}^2_1, 0) &= 0, & \gamma_g(x^{22}_2, \tilde{x}^2_2, 0) &= 0.
\end{aligned} \tag{25}$$

Mathematical Problems in Engineering 7

Again, $\theta = 0$, thus forcing both vectors $x^{i\omega}_j$ and $\tilde{x}^2_j$ to be equal in order to obtain a 1. Similarly to what happened in the case of $\tilde{x}^1$, there is but one instance of such an occurrence: $x^{12}_2 = [111111111] = \tilde{x}^2_2$.

(7) Compute a weighted sum $c_i$ for each class:

$$c_1 = \frac{\sum_{\omega=1}^{k_1}\sum_{j=1}^{n}\gamma_g(x^{1\omega}_j, \tilde{x}^1_j, \theta)}{k_1} = \frac{\sum_{\omega=1}^{2}\sum_{j=1}^{2}\gamma_g(x^{1\omega}_j, \tilde{x}^1_j, \theta)}{2} = \frac{(0+0)+(0+0)}{2} = \frac{0}{2} = 0. \tag{26}$$

Here we add together the results obtained on all components of all fundamental patterns belonging to class $C_1$. Since all gave 0 (i.e., none of these were similar to the corresponding component of the test pattern, given the value of $\theta$ in effect), the result of the weighted addition for class $C_1$ is $c_1 = 0$. One has

$$c_2 = \frac{\sum_{\omega=1}^{k_2}\sum_{j=1}^{n}\gamma_g(x^{2\omega}_j, \tilde{x}^1_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2}\sum_{j=1}^{2}\gamma_g(x^{2\omega}_j, \tilde{x}^1_j, \theta)}{2} = \frac{(1+0)+(0+0)}{2} = \frac{1}{2} = 0.5. \tag{27}$$

In this case there was one instance similar to the corresponding component of the test pattern (equal, in fact, since $\theta = 0$ during this run). Thus the weighted addition for class $C_2$ is $c_2 = 0.5$, given that the sum of results is divided by the cardinality of the class, $k = 2$. And for $\tilde{x}^2$:

$$c_1 = \frac{\sum_{\omega=1}^{k_1}\sum_{j=1}^{n}\gamma_g(x^{1\omega}_j, \tilde{x}^2_j, \theta)}{k_1} = \frac{\sum_{\omega=1}^{2}\sum_{j=1}^{2}\gamma_g(x^{1\omega}_j, \tilde{x}^2_j, \theta)}{2} = \frac{(0+0)+(0+1)}{2} = \frac{1}{2} = 0.5. \tag{28}$$

There was one result of $\gamma_g$ equal to 1, which, coupled with the cardinality of class $C_1$ being 2, makes $c_1 = 0.5$. One has

$$c_2 = \frac{\sum_{\omega=1}^{k_2}\sum_{j=1}^{n}\gamma_g(x^{2\omega}_j, \tilde{x}^2_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2}\sum_{j=1}^{2}\gamma_g(x^{2\omega}_j, \tilde{x}^2_j, \theta)}{2} = \frac{(0+0)+(0+0)}{2} = \frac{0}{2} = 0. \tag{29}$$

No similarities were found between the fundamental patterns of class $C_2$ and the test pattern $\tilde{x}^2$ (for $\theta = 0$); thus $c_2 = 0$.

(8) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (6) and (7) until there is a unique maximum or the stop condition $\theta \ge \rho$ is fulfilled. Here there is a unique maximum (for each test pattern), so we go directly to step (9).

(9) If there is a unique maximum, assign $y$ to the class corresponding to that maximum. For $\tilde{x}^1$:

$$C_{\tilde{x}^1} = C_2, \quad \text{since } \max_{i=1}^{2} c_i = \max(0, 0.5) = 0.5 = c_2, \tag{30}$$

while for $\tilde{x}^2$:

$$C_{\tilde{x}^2} = C_1, \quad \text{since } \max_{i=1}^{2} c_i = \max(0.5, 0) = 0.5 = c_1. \tag{31}$$

(10) Otherwise, assign $y$ to the class of the first maximum. This step is unnecessary here, since both test patterns have already been assigned a class.
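The steps above can be sketched in code. The following is an illustrative sketch, not the authors' implementation: we assume that, for binary components, $\gamma_g(a, b, \theta)$ outputs 1 when the two vectors differ in at most $\theta$ bit positions (so $\theta = 0$ reduces to the equality test used in the example), and the fundamental-pattern components other than $x^{21}_1$ and $x^{12}_2$ are hypothetical placeholders chosen only so that they match no test-pattern component.

```python
# Illustrative sketch of steps (5)-(10) of the Gamma classifier example.
# Assumption: for binary (Johnson-Mobius coded) components, gamma_g(a, b, theta)
# outputs 1 when a and b differ in at most theta bit positions, so that
# theta = 0 reduces to the equality test used in the example above.

def gamma_g(a: str, b: str, theta: int) -> int:
    mismatches = sum(ai != bi for ai, bi in zip(a, b))
    return 1 if mismatches <= theta else 0

def classify(classes, test, theta=0, rho=8):
    """classes: {label: [pattern, ...]}; each pattern is a tuple of bit strings."""
    while theta <= rho:
        # Step (7): weighted sum c_i = similarities / class cardinality k_i.
        c = {label: sum(gamma_g(p[j], test[j], theta)
                        for p in patterns for j in range(len(test))) / len(patterns)
             for label, patterns in classes.items()}
        best = max(c.values())
        winners = [label for label, v in c.items() if v == best]
        if len(winners) == 1:       # step (9): unique maximum -> assign its class
            return winners[0], c
        theta += 1                  # step (8): increment theta and retry
    return winners[0], c            # step (10): fall back to the first maximum

# Fundamental patterns x^{11}, x^{12} (class C1) and x^{21}, x^{22} (class C2).
# Only x^{21}_1 and x^{12}_2 are fixed by the example; the remaining components
# are hypothetical placeholders that match no test-pattern component.
classes = {
    "C1": [("1000000", "000000001"), ("0000001", "111111111")],
    "C2": [("0111111", "100000000"), ("1100000", "010000000")],
}
x1 = ("0111111", "000000011")   # test pattern from (22)
x2 = ("0000111", "111111111")   # test pattern from (22)

label1, c_for_x1 = classify(classes, x1)   # c = {C1: 0.0, C2: 0.5} -> class C2
label2, c_for_x2 = classify(classes, x2)   # c = {C1: 0.5, C2: 0.0} -> class C1
```

Running it reproduces the weighted sums of (26) through (29) and the class assignments of (30) and (31).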

2.2. Sliding Windows. A widespread approach for data stream mining is the use of a sliding window, to keep only a representative portion of the data. We are not interested in keeping all the information of the data stream, particularly when we have to meet the memory space constraints required of algorithms working in this context. This technique is able to deal with concept drift, eliminating those data points that come from an old concept. According to [5], the training window size can be fixed or variable over time.

Sliding windows of fixed size store in memory a fixed number of the $w$ most recent examples. Whenever a new example arrives, it is saved to memory and the oldest one is discarded. These windows behave like first-in, first-out data structures. This simple adaptive learning method is often used as a baseline in the evaluation of new algorithms.
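The fixed-size variant maps directly onto a bounded double-ended queue; a minimal sketch using Python's standard library:

```python
from collections import deque

# Fixed-size sliding window: appending to a full deque(maxlen=W) silently
# discards the oldest element, giving the first-in, first-out behavior
# described above.
W = 3
window = deque(maxlen=W)
for example in [1, 2, 3, 4, 5]:
    window.append(example)

recent = list(window)   # the W most recent examples: [3, 4, 5]
```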

Sliding windows of variable size vary the number of examples in the window over time, typically depending on the indications of a change detector. A straightforward approach is to shrink the window whenever a change is detected, so that the training data reflects the most recent concept, and to grow the window otherwise.

2.3. Proposed Method. The Gamma classifier has previously been used for time series prediction in air pollution [37] and oil well production [40], showing very promising results. The current work combines the Gamma classifier with a sliding window approach, with the aim of developing a new method capable of performing classification over a continuous data stream, referred to as the DS-Gamma classifier.


(1) Initialize window
(2) for each $x_\tau$ do
(3)   If $\tau < W$ then
(4)     Add $x_\tau$ to the tail of the window
(5)   Else
(6)     Use $x_\tau$ to classify
(7)     Add $x_\tau$ to the tail of the window

Algorithm 1: Learning policy algorithm.

A data stream $S$ can be defined as a sequence of data elements of the form $S = s_1, s_2, \ldots, s_{\mathrm{now}}$ from a universe of size $D$, where $D$ is huge and potentially infinite but countable. We assume that each element $s_i$ has the following form:

$$s_i = (x_\tau, y_\tau), \tag{32}$$

where $x_\tau$ is an input vector of size $n$ and $y_\tau$ is the class label associated with it. $\tau \in \mathbb{Z}^+$ is an index that indicates the sequential order of the elements in the data stream. $\tau$ is bounded by the size of the data stream, such that if the data stream is infinite, $\tau$ takes infinitely many, but countable, values.

One of the main challenges in mining data streams is the ability to maintain an accurate decision model. The current model has to incorporate new knowledge from new data, and it should also forget old information when the data is outdated. To address these issues, the proposed method implements a learning and forgetting policy. The learning (insertion) and forgetting (removal) policies were designed to guarantee the spatial and temporal relevance of the data and the consistency of the current model.

The proposed method implements a fixed window approach, which achieves concept drift detection in an indirect way, by considering only the most recent data. Algorithm 1 outlines the general steps used as the learning policy. Initially, the window is created with the first $W$ labeled records, where $W$ is the size of the window. After that, every sample is used for classification and evaluation of the performance, and then it is used to update the window.

Classifiers that deal with concept drift are forced to implement forgetting, adaptation, or drift detection mechanisms in order to adjust to changing environments. In this proposal, a forgetting mechanism is implemented. Algorithm 2 outlines the basic forgetting policy implemented in this proposal: the oldest element in the window, $x_\delta$, is removed.
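Taken together, Algorithms 1 and 2 amount to a test-then-train loop over the stream. The sketch below is illustrative (the function names and the toy majority-vote classifier are ours, not part of the proposal):

```python
# Sketch of the learning (Algorithm 1) and forgetting (Algorithm 2) policies:
# the first W labeled records only fill the window; afterwards each sample is
# classified first (test-then-train), then inserted at the tail, and the
# oldest element is removed to keep the window at size W.

def run_stream(stream, W, classify):
    window, predictions = [], []
    for tau, (x, y) in enumerate(stream, start=1):
        if tau > W:
            predictions.append(classify(window, x))   # evaluate before updating
        window.append((x, y))                         # learning policy: insert at tail
        if len(window) > W:
            window.pop(0)                             # forgetting policy: drop oldest
    return predictions

# Toy run with a majority-label "classifier" standing in for DS-Gamma.
majority = lambda win, x: max({y for _, y in win},
                              key=lambda lab: sum(y == lab for _, y in win))
stream = [(0, "a"), (1, "a"), (2, "a"), (3, "b"), (4, "b")]
preds = run_stream(stream, W=3, classify=majority)    # ["a", "a"]
```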

2.4. MOA. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [41]. MOA contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation on data streams with concept drift. MOA can also be used to build synthetic data streams, using generators, reading ARFF files, or joining several streams. The MOA stream generators allow simulating potentially infinite sequences of data with different characteristics (different types of concept drift and recurrent concepts, among others). All the algorithms used for the comparative analysis are implemented under the MOA framework.

Table 2: Dataset characteristics.

Dataset      Instances   Attributes   Classes
Agrawal      100000      10           2
Bank         45211       16           2
CoverType    581012      54           7
Electricity  45312       8            2
Hyperplane   100000      10           2
RandomRBF    100000      10           2

2.5. Data Streams. This section provides a brief description of the data streams used during the experimental phase. The proposed solution has been tested using synthetic and real data streams. For data stream classification problems, only a few suitable data streams can be successfully used for evaluation; a few data streams with enough examples are hosted in [42, 43]. For further evaluation, the use of synthetic data stream generators becomes necessary. The synthetic data streams used in our experiments were generated using the MOA framework mentioned in the previous section. A summary of the main characteristics of the data streams is shown in Table 2. Descriptions of the data streams were taken from [42, 43] in the case of the real ones and from [44] for the synthetic ones.

2.5.1. Real-World Data Streams

Bank Marketing Dataset. The Bank Marketing dataset is available from the UCI Machine Learning Repository [43]. The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe a term deposit. The dataset contains 45211 instances; each instance has 16 attributes and the class label. It has two classes, identifying whether a bank term deposit would or would not be subscribed.

Electricity Dataset. This data was collected from the Australian New South Wales electricity market. In this market prices are not fixed; they are affected by the demand and supply of the market, and they are set every five minutes. The dataset was first described by Harries in [45]. During the time period described in the dataset, the electricity market was expanded with the inclusion of adjacent areas, which led to more elaborate management of the supply: the excess production of one region could be sold in an adjacent region. The Electricity dataset contains 45312 instances. Each example refers to a period of 30 minutes (i.e., there are 48 instances for each one-day period). Each example has 8 fields: the day of the week, the time stamp, the New South Wales electricity demand, the New South Wales electricity price, the Victoria electricity demand, the Victoria electricity price, and the scheduled electricity transfer


(1) for each $x_\tau$ do
(2)   If $\tau > W$ then
(3)     Remove $x_\delta$ ($\delta = \tau - W + 1$) from the beginning of the window

Algorithm 2: Forgetting policy algorithm.

between states, plus the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. We use the normalized version of this dataset, so that the numerical values are between 0 and 1. The dataset is available from [42].

Forest CoverType Dataset. The Forest CoverType dataset is one of the largest datasets available from the UCI Machine Learning Repository [43]. It contains information that describes forest cover types from cartographic variables. A given observation (a 30 × 30 meter cell) was determined from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. The dataset contains 581012 instances and 54 attributes. The data is in raw form (not scaled) and contains binary (0 or 1) columns for the qualitative independent variables (wilderness areas and soil types). The dataset has 7 classes that identify the type of forest cover.

2.5.2. Synthetic Data Streams

Agrawal Data Stream. This generator was originally introduced by Agrawal et al. in [46]; the MOA implementation was used to generate the data streams for the experimental phase of this work. The generator produces a stream containing nine attributes. Although not explicitly stated by the authors, a sensible conclusion is that these attributes describe hypothetical loan applications. There are ten functions defined for generating binary class labels from the attributes; presumably, these determine whether a loan should be approved. Perturbation shifts numeric attributes from their true value, adding an offset drawn randomly from a uniform distribution whose range is a specified percentage of the total value range [44]. For each experiment, a data stream with 100000 examples was generated.

RandomRBF Data Stream. This generator was devised to offer an alternative complex concept type that is not straightforward to approximate with a decision tree model. The RBF (Radial Basis Function) generator works as follows. A fixed number of random centroids are generated; each center has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a center at random, taking the weights into consideration, so that centers with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the central point; the length of the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid. The chosen centroid also determines the class label of the example. Only numeric attributes are generated [44]. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.
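The generation process just described can be sketched as follows. This is a simplified illustration, not MOA's implementation; the centroid count and parameter ranges are arbitrary choices:

```python
import random

# Simplified RandomRBF-style generator: choose a centroid with probability
# proportional to its weight, then displace it in a random direction by a
# Gaussian-distributed length scaled by the centroid's standard deviation.
def make_rbf_generator(n_centroids=50, n_attrs=10, n_classes=2, seed=1):
    rng = random.Random(seed)
    centroids = [{"center": [rng.random() for _ in range(n_attrs)],
                  "stddev": rng.random(),
                  "label": rng.randrange(n_classes),
                  "weight": rng.random()} for _ in range(n_centroids)]
    weights = [c["weight"] for c in centroids]

    def next_example():
        c = rng.choices(centroids, weights=weights)[0]
        direction = [rng.gauss(0, 1) for _ in range(n_attrs)]
        norm = sum(d * d for d in direction) ** 0.5
        length = rng.gauss(0, c["stddev"])            # displacement length
        x = [ci + length * di / norm for ci, di in zip(c["center"], direction)]
        return x, c["label"]                          # centroid fixes the class

    return next_example

gen = make_rbf_generator()
x, y = gen()   # one numeric example with its class label
```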

Rotating Hyperplane Data Stream. This synthetic data stream was generated using the MOA framework. A hyperplane in $d$-dimensional space is the set of points $x$ that satisfy [44]

$$\sum_{i=1}^{d} x_i w_i = w_0 = \sum_{i=1}^{d} w_i, \tag{33}$$

where $x_i$ is the $i$th coordinate of $x$. Examples for which $\sum_{i=1}^{d} x_i w_i \ge w_0$ are labeled positive, and examples for which $\sum_{i=1}^{d} x_i w_i < w_0$ are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because the orientation and position of the hyperplane can be adjusted gradually by changing the relative size of the weights. In MOA, change is introduced to this data stream by adding drift to each weight attribute using the formula $w_i = w_i + d\sigma$, where $\sigma$ is the probability that the direction of change is reversed and $d$ is the change applied to every example. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.
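The labeling rule of (33) and the weight drift can be sketched as follows. Setting $w_0$ to half the weight sum is our normalization (made so that both classes occur for attributes in $[0, 1]$), not necessarily MOA's exact choice:

```python
import random

# Labeling rule of equation (33): an example is positive iff sum(w_i * x_i) >= w_0.
# Assumption: w_0 is set to half the weight sum so both classes occur for
# attribute values drawn from [0, 1].
rng = random.Random(0)
d = 10
w = [rng.random() for _ in range(d)]
w0 = 0.5 * sum(w)

def label(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= w0 else 0

def drift(weights, delta=0.001, sigma=0.1):
    """Gradual drift: shift each weight, reversing direction with probability sigma."""
    sign = -1 if rng.random() < sigma else 1
    return [wi + sign * delta for wi in weights]

x = [rng.random() for _ in range(d)]
y = label(x)    # 0 or 1
w = drift(w)    # weights move slightly after each example
```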

2.6. Algorithm Evaluation. One of the objectives of this work is to perform a consistent comparison between the classification performance of our proposal and that of other well-known data stream classification algorithms. Evaluation of data stream algorithms is a relatively new field that has not been studied as thoroughly as evaluation in batch learning. In traditional batch learning, evaluation has been mainly limited to evaluating a model using different random reorderings of the same static dataset ($k$-fold cross-validation, leave-one-out, and bootstrap) [2]. For data stream scenarios, the problem of infinite data raises new challenges; moreover, the batch approaches cannot effectively measure the accuracy of a model in a context with concepts that change over time. In the data stream setting, one of the main concerns is how to design an evaluation practice that allows us to obtain a reliable measure of accuracy over time. According to [41], two main approaches arise.

2.6.1. Holdout. When traditional batch learning reaches a scale where cross-validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. An extension of this approach can be applied to data stream scenarios: first, a set of $M$ instances is used to train the model; then a set of the following unseen $N$ instances is used to test the model; then the model is trained again with the next $M$ instances and tested with the subsequent $N$ instances, and so forth.


2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data, which is sometimes difficult to obtain, while prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different evaluation mechanisms, including the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors, and concluded that the use of the prequential error with forgetting mechanisms is advantageous for assessing performance and for comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we have selected the prequential error over a sliding window for the evaluation of the proposed method. The prequential error $P_W$ for a sliding window of size $W$, computed at time $i$, is based on an accumulated sum of a 0-1 loss function $L$ between the predicted values $\hat{y}_k$ and the observed values $y_k$, using the following expression [47]:

$$P_W(i) = \frac{1}{W} \sum_{k=i-W+1}^{i} L(y_k, \hat{y}_k). \tag{34}$$
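Equation (34) is straightforward to compute from the last $W$ prediction outcomes; the following sketch (ours, with illustrative data) uses the 0-1 loss:

```python
# Prequential 0-1 error over a sliding window, as in equation (34):
# the mean of L(y_k, y_hat_k) over the W most recent examples.
def prequential_error(y_true, y_pred, W):
    losses = [0 if yt == yp else 1 for yt, yp in zip(y_true, y_pred)]
    window = losses[-W:]            # outcomes k = i - W + 1, ..., i
    return sum(window) / len(window)

y_true = ["a", "a", "b", "b", "a", "b"]
y_pred = ["a", "b", "b", "a", "a", "b"]
err = prequential_error(y_true, y_pred, W=4)   # one error in the last 4 -> 0.25
```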

We also compare the performance results of each competing algorithm against the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test that ranks the differences in performance between two classifiers over the datasets, ignoring the signs, and compares the ranks for the positive and the negative differences.
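As a concrete illustration, the asymptotic (2-sided) significance reported later in Table 7 can be reproduced from the Table 5 accuracies. The helper below is our sketch of the standard normal-approximation form of the test, applied to the DS-Gamma versus Bayes-DDM pair:

```python
import math

# Normal-approximation (asymptotic) Wilcoxon signed-ranks test on paired
# per-dataset accuracies; the data are the DS-Gamma and Bayes-DDM rows of
# Table 5, and the result matches the first row of Table 7.
def wilcoxon_asymptotic(a, b):
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ranked = sorted(abs(d) for d in diffs)
    ranks = {v: i + 1 for i, v in enumerate(ranked)}   # no ties in this data
    w_pos = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_neg = sum(ranks[abs(d)] for d in diffs if d < 0)
    t, n = min(w_pos, w_neg), len(diffs)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mu) / sigma
    # Two-sided p-value from the standard normal distribution.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

acc_gamma = [92.82, 78.96, 47.77, 77.14, 86.30, 70.85]   # DS-Gamma, Table 5
acc_bayes = [88.16, 83.38, 67.61, 81.05, 86.02, 71.87]   # Bayes-DDM, Table 5
p_value = wilcoxon_asymptotic(acc_gamma, acc_bayes)      # ~0.345 > alpha = 0.05
```

Note that exact-distribution implementations (the default in some statistics packages for small samples) can give a slightly different p-value than this asymptotic form.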

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. The classification performance of each of the compared algorithms was calculated using the prequential error approach; these results are used to compare the performance of our proposal with that of the other data stream classification algorithms. We have aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on their implementation can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times, and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset the experiment was executed just one time. The Bank and CoverType datasets were reordered 10 times using the randomized filter from the WEKA platform [49] for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each of them, using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification, to evaluate the efficiency of each algorithm. All experiments were conducted on a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system, with 4096 MB of RAM.

3. Results and Discussion

In this section we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance of each of the compared classification algorithms. First, we evaluate the sensitivity of the proposed method to changes in the window size, a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized in boldface. Table 4 shows the average total time used to classify each data stream. The results of the average performance, with indicators of the standard deviation, are also depicted in Figure 1.

In general, the DS-Gamma classifier performance improved as the window size increased, with the exception of the Electricity data stream, for which performance was better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with an accuracy of 89.12%; this is better than all the performances achieved by the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance remained stable, and in some cases even degraded, as we can observe in the results for the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination $R^2$ were included in the graphic for each data stream. The $R^2$ values, ranging between 0.9743 and 0.9866, show that time grows linearly as the size of the window increases.

To evaluate the DS-Gamma classifier, we compared its performance with that of other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation; the best performance is emphasized in boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and

Mathematical Problems in Engineering 11

Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset        50      100     200     500     750     1000    1250    1500    1750    2000
Agrawal      79.64   85.97   89.86   92.59   92.88   92.82   92.67   92.51   92.32   92.07
Bank         68.19   73.40   77.05   79.13   79.12   78.96   78.60   78.24   77.76   77.37
CoverType    38.21   37.61   39.70   44.30   46.59   47.77   48.62   49.19   49.51   49.74
Electricity  89.12   85.26   81.71   79.84   78.58   77.14   76.36   75.94   75.31   74.49
Hyperplane   78.25   82.65   85.28   86.56   86.55   86.30   85.98   85.65   85.34   85.00
RandomRBF    64.61   67.39   69.33   70.56   70.81   70.85   70.74   70.63   70.50   70.33

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

Dataset        50      100      200      500      750      1000     1250     1500     1750     2000
Agrawal       2.24    4.50     7.93    19.26    28.04    39.00    45.92    57.57    63.90    73.49
Bank          0.66    1.14     2.07     4.75     7.01     9.17    11.54    14.55    15.83    17.87
CoverType    83.47  140.81   251.20   340.05   254.12   335.03   428.94   522.39   669.24   676.99
Electricity   0.34    0.53     0.98     2.33     3.56     4.63     5.99     7.05     8.30     9.49
Hyperplane    2.24    4.01     7.56    18.19    26.76    30.28    44.28    52.96    61.87    70.63
RandomRBF     2.79    4.09     7.73    18.54    27.35    30.72    45.02    54.12    62.23    71.04

Table 5: Averaged accuracy (%) comparison (σ in parentheses).

Classifier         Agrawal         Bank            CoverType       Electricity   Hyperplane      RandomRBF
Bayes-DDM          88.16 (0.19)    83.38 (1.41)    67.61 (6.81)    81.05 (—)     86.02 (1.39)    71.87 (3.82)
Perceptron DDM     67.20 (0.11)    73.43 (10.53)   70.76 (5.55)    78.61 (—)     85.65 (1.49)    71.88 (5.93)
Hoeffding Tree     94.39 (0.24)    88.52 (0.11)    72.29 (2.68)    82.41 (—)     82.41 (5.62)    86.72 (1.69)
Active Classifier  90.82 (1.62)    88.21 (0.14)    67.31 (1.03)    78.59 (—)     79.50 (6.55)    75.32 (2.70)
IBL-DS             86.76 (0.48)    86.52 (0.21)    61.73 (9.46)    75.34 (—)     84.20 (0.77)    93.32 (1.21)
AWE                92.17 (0.59)    87.90 (0.38)    76.16 (3.66)    78.36 (—)     85.88 (1.65)    91.98 (1.14)
OzaBoostAdwin      87.11 (2.62)    87.67 (0.50)    78.55 (4.97)    87.92 (—)     83.21 (1.91)    91.49 (1.14)
SGD                67.20 (0.11)    76.56 (1.66)    59.01 (7.37)    84.52 (—)     83.19 (0.42)    61.74 (3.31)
SPegasos           55.82 (0.17)    77.46 (1.99)    50.47 (5.17)    57.60 (—)     52.80 (6.08)    53.33 (3.13)
DS-Gamma           92.82 (0.10)    78.96 (0.22)    47.77 (4.31)    77.14 (—)     86.30 (1.35)    70.85 (2.54)

Table 6: Averaged classification time (seconds) comparison (σ in parentheses).

Classifier         Agrawal          Bank            CoverType            Electricity   Hyperplane       RandomRBF
Bayes-DDM            0.62 (0.11)     0.39 (0.08)      17.76 (1.62)        0.61 (—)      0.70 (0.01)      0.69 (0.01)
Perceptron DDM       0.40 (0.02)     0.20 (0.05)       8.28 (0.10)        0.19 (—)      0.45 (0.02)      0.44 (0.02)
Hoeffding Tree       0.98 (0.23)     0.54 (0.03)      30.05 (11.06)       0.78 (—)      1.36 (0.05)      1.40 (0.05)
Active Classifier    0.64 (0.03)     0.26 (0.02)      15.14 (0.74)        0.23 (—)      0.72 (0.04)      0.70 (0.02)
IBL-DS              53.19 (2.11)    38.51 (8.26)    1057.60 (60.95)       8.61 (—)     86.39 (0.96)     88.10 (1.06)
AWE                 14.32 (0.20)     6.63 (0.30)    1012.47 (304.41)      6.70 (—)     18.03 (3.64)     37.52 (3.03)
OzaBoostAdwin      192.62 (44.47)   46.93 (10.56)  10589.73 (3896.64)    39.75 (—)    223.99 (51.34)    57.04 (16.66)
SGD                  0.14 (0.04)     0.11 (0.009)     40.35 (0.07)        0.12 (—)      0.27 (0.02)      0.27 (0.02)
SPegasos             0.22 (0.006)    0.11 (0.016)     16.84 (0.12)        0.12 (—)      0.31 (0.05)      0.24 (0.01)
DS-Gamma            39.00 (3.58)     9.16 (0.05)     335.03 (12.91)       4.63 (—)     30.28 (0.79)     30.72 (0.94)


Figure 1: Average accuracy of DS-Gamma for different window sizes (one panel per data stream: Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF).

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with the Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier on the CoverType data stream. This data stream presents a heavy class imbalance, which seems to negatively affect the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when one class greatly outnumbers the other(s), this sum will be biased toward the class with the most members.

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using $\alpha = 0.05$. The obtained values are all greater than $\alpha$, so the null hypothesis is not rejected, and


Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes (one panel per data stream, each with a dotted linear trend line; $R^2$ values per panel: 0.9866, 0.9854, 0.9804, 0.9796, 0.9743, and 0.9842).

we can infer that there are no significant differences among the compared algorithms.

4. Conclusions

In this paper we describe a novel method for the task of pattern classification over a continuous data stream, based on an associative model. The proposed method is a supervised pattern recognition model built on the Gamma classifier. During the experimental phase, the proposed method was tested using different data stream scenarios, with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms       Asymptotic significance (2-sided)
Gamma-Bayes-DDM           0.345
Gamma-Perceptron DDM      0.917
Gamma-Hoeffding Tree      0.116
Gamma-Active Classifier   0.463
Gamma-IBL-DS              0.345
Gamma-AWE                 0.116
Gamma-OzaBoostAdwin       0.116
Gamma-SGD                 0.600
Gamma-SPegasos            0.075


Figure 3: Averaged accuracy (%) comparison (one box plot per data stream, covering the ten compared classifiers).

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000; for larger window sizes, performance remained stable and in some cases even declined.

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained in the experiments performed with the CoverType dataset,


Figure 4: Averaged classification time (seconds) comparison (one box plot per data stream, covering the ten compared classifiers).

a heavily imbalanced dataset, for which the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), CONACyT, and SNI for their economic support to develop this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to the statistical significance analysis.


References

[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: rich data and the increasing value of the internet of things," Tech. Rep., IDC Information and Data, 2014, http://www.emc.com/leadership/digital-universe/2014iview/index.htm.

[2] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press, Amsterdam, The Netherlands, 2010.

[3] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.

[4] J. Beringer and E. Hullermeier, "Efficient instance-based learning on data streams," Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[5] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Computing Surveys, vol. 46, no. 4, article 44, 2014.

[6] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, Boston, Mass, USA, August 2000.

[7] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, San Francisco, Calif, USA, August 2001.

[8] C. Liang, Y. Zhang, P. Shi, and Z. Hu, "Learning very fast decision tree from uncertain data streams with positive and unlabeled samples," Information Sciences, vol. 213, pp. 50–67, 2012.

[9] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "Decision trees for mining data streams based on the Gaussian approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.

[10] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69–101, 1996.

[11] F. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. Riquelme, "Data streams classification by incremental rule learning with parameterized generalization," in Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 657–661, Dijon, France, April 2006.

[12] D. Mena-Torres and J. S. Aguilar-Ruiz, "A similarity-based approach for data stream classification," Expert Systems with Applications, vol. 41, no. 9, pp. 4224–4234, 2014.

[13] A. Shaker and E. Hullermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems, vol. 3, no. 4, pp. 235–249, 2012.

[14] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Recursive least square perceptron model for non-stationary and imbalanced data stream classification," Evolving Systems, vol. 4, no. 2, pp. 119–131, 2013.

[15] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Ensemble of online neural networks for non-stationary and imbalanced data streams," Neurocomputing, vol. 122, pp. 535–544, 2013.

[16] A. Sancho-Asensio, A. Orriols-Puig, and E. Golobardes, "Robust on-line neural learning classifier system for data stream classification tasks," Soft Computing, vol. 18, no. 8, pp. 1441–1461, 2014.

[17] D. M. Farid, L. Zhang, A. Hossain, et al., "An adaptive ensemble classifier for mining concept drifting data streams," Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, 2013.

[18] D. Brzezinski and J. Stefanowski, "Combining block-based and online methods in learning ensembles from concept drifting data streams," Information Sciences, vol. 265, pp. 50–67, 2014.

[19] G. Ditzler and R. Polikar, "Incremental learning of concept drift from streaming imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2283–2301, 2013.

[20] M. J. Hosseini, Z. Ahmadi, and H. Beigy, "Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification," Evolving Systems, vol. 4, no. 1, pp. 43–60, 2013.

[21] N. Saunier and S. Midenet, "Creating ensemble classifiers through order and incremental data selection in a stream," Pattern Analysis and Applications, vol. 16, no. 3, pp. 333–347, 2013.

[22] M. M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham, "Classification and novel class detection in concept-drifting data streams under time constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.

[23] A. Alzghoul, M. Lofstrand, and B. Backe, "Data stream forecasting for system fault prediction," Computers and Industrial Engineering, vol. 62, no. 4, pp. 972–978, 2012.

[24] J. Kranjc, J. Smailovic, V. Podpecan, M. Grcar, M. Znidarsic, and N. Lavrac, "Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform," Information Processing and Management, vol. 51, no. 2, pp. 187–203, 2015.

[25] J. Smailovic, M. Grcar, N. Lavrac, and M. Znidarsic, "Stream-based active learning for sentiment analysis in the financial domain," Information Sciences, vol. 285, pp. 181–203, 2014.

[26] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: primal estimated sub-gradient solver for SVM," in Proceedings of the 24th International Conference on Machine Learning, pp. 807–814, June 2007.

[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, "Twitter spammer detection using data stream clustering," Information Sciences, vol. 260, pp. 64–73, 2014.

[28] B. Krawczyk and M. Wozniak, "Incremental weighted one-class classifier for mining stationary data streams," Journal of Computational Science, vol. 9, pp. 19–25, 2015.

[29] J. Gama, "A survey on learning from data streams: current and future trends," Progress in Artificial Intelligence, vol. 1, no. 1, pp. 45–55, 2012.

[30] L. I. Kuncheva, "Change detection in streaming multivariate data using likelihood detectors," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1175–1180, 2013.

[31] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, "Exponentially weighted moving average charts for detecting concept drift," Pattern Recognition Letters, vol. 33, no. 2, pp. 191–198, 2012.

[32] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence – SBIA 2004, vol. 3171 of Lecture Notes in Computer Science, pp. 286–295, Springer, Berlin, Germany, 2004.

[33] X. Wu, P. Li, and X. Hu, "Learning from concept drifting data streams with unlabeled data," Neurocomputing, vol. 92, pp. 145–155, 2012.

[34] X. Qin, Y. Zhang, C. Li, and X. Li, "Learning from data streams with only positive and unlabeled data," Journal of Intelligent Information Systems, vol. 40, no. 3, pp. 405–430, 2013.

[35] M. M. Masud, C. Woolam, J. Gao, et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.

[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.

[37] I. Lopez-Yanez, A. J. Arguelles-Cruz, O. Camacho-Nieto, and C. Yanez-Marquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.

[38] M. E. Acevedo-Mosqueda, C. Yanez-Marquez, and I. Lopez-Yanez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.

[39] C. Yanez-Marquez, E. M. Felipe-Riveron, I. Lopez-Yanez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.

[40] I. Lopez-Yanez, L. Sheremetov, and C. Yanez-Marquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.

[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.

[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets.

[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml.

[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation.

[45] M. Harries, "Splice-2 comparative evaluation: electricity pricing," Tech. Rep., University of South Wales, Cardiff, UK, 1999.

[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.

[47] J. Gama, R. Sebastiao, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.

[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka.


have been addressed: clustering, classification, prediction, and concept drift detection. Data stream algorithms can be roughly divided into single classifier approaches and ensemble approaches. The single model approach can be further divided into model-based and instance-based approaches. Model-based approaches have the disadvantage of the computational cost of updating the model every time a new element of the stream arrives. In instance-based approaches, just a representative subset of elements (the case base) from the data stream has to be adapted; this does not come for free, since the computational cost is reflected during the classification stage [4]. The single classifier approach includes several data mining techniques, such as decision trees, decision rules, instance-based learners (IBL), and neural networks, among others. A more detailed taxonomy of data stream algorithms can be found in [5].

Decision trees have been widely used in data stream classification [6–9]. One of the most representative models for data stream classification using decision trees was proposed by Domingos and Hulten [6, 7] and has been the inspiration for several of the tree models used for data stream mining. In [6] the authors proposed the Very Fast Decision Tree (VFDT) learner, which uses Hoeffding bounds to guarantee that the tree can be trained in constant time. An extension of this work was presented in [7], where the decision trees are built from a sliding window of fixed size. Trees are also commonly used as base learners for ensemble approaches.

Decision rule models have also been applied in the data stream context. One of the first methods proposed for rule induction in an online environment was FLORA (FLOating Rough Approximation) [10]. To deal with concept drift and hidden contexts, this framework uses three techniques: keeping only a window of trusted examples, storing concept descriptions to reuse them when a previous context reappears, and using heuristics to control these two functions. Another relevant work was presented by Ferrer-Troyano et al. in [11]. Their method, called FACIL (Fast and Adaptive Classifier by Incremental Learning), is an incremental classifier based on decision rules. The classifier uses a partial instance memory, where examples are rejected if they do not describe a decision boundary.

Algorithms for the data stream context have to be incremental and highly adaptive. In IBL algorithms, incremental learning and model adaptation are very simple, since these methods come down to adapting the case base (i.e., the set of representative examples). This allows obtaining a flexible and easily updated model, without complex computations, which optimizes the update and classification times [12]. Beringer and Hullermeier proposed an IBL algorithm for data stream classification [4]; the system maintains an implicit concept description in the form of a case base that is updated taking into account three indicators: temporal relevance, spatial relevance, and consistency. On the arrival of a new example, this example is incorporated into the case base, and then a replacement policy is applied based on the three previously mentioned indicators. In [12] a new technique named Similarity-Based Data Stream Classifier (SimC) is introduced. The system maintains a representative and small set of information in order to conserve the distribution of classes over the stream. It controls noise and outliers by applying an insertion/removal policy designed to retain the best representation of the knowledge, using appropriate estimators. Shaker and Hullermeier presented an instance-based algorithm that can be applied to classification and regression problems [13]. The algorithm, called IBLStreams, optimizes the composition and size of the case base autonomously. On arrival of a new example, this example is first added to the case base, and then it is checked whether other examples might be removed, either because they have become redundant or because they are noisy data.

For data stream scenarios, neural network-based models have hardly been used; one of the main issues of this approach is the weight update caused by the arrival of a new example. The influence of this single example on the network cannot simply be canceled later on; at best, it can be reduced gradually over the course of time [12]. However, there are some works where neural networks are used for data stream tasks, such as [14], where an online perceptron is used to classify nonstationary imbalanced data streams. They have also been used as base learners for ensemble methods [15, 16].

Ensemble methods have also been extensively used in data stream classification. The ensemble approach, as its name implies, maintains in memory an ensemble of multiple models whose individual predictions are combined (generally by averaging or voting techniques) to output a final prediction [2]. The main idea behind an ensemble classifier is that different learning algorithms explore different search spaces and evaluations of the hypothesis [3]. Different classifiers can be used as base learners in ensemble methods, for instance, trees [17–19], Naive Bayes [20, 21], neural networks [15], and KNN [22].

SVMs have been used for different tasks in data stream scenarios, such as sentiment analysis, fault detection and prediction, medical applications, spam detection, and multilabel classification. In [23] a fault-detection system based on data stream prediction is proposed; an SVM algorithm is used to classify the incoming data and trigger a fault alarm in case the data are considered abnormal. Sentiment analysis has also been addressed using SVM methods. Kranjc et al. [24] introduce a cloud-based scientific workflow platform which is able to perform online dynamic adaptive sentiment analysis on Twitter data; they used a linear SVM for sentiment classification. While this work does not focus on a specific topic, Smailovic et al. [25] present a methodology that analyzes whether the sentiment expressed in Twitter feeds which discuss selected companies can indicate their stock price changes. They propose the use of the Pegasos SVM [26] to classify the tweets as positive, negative, and neutral. The tweets classified as positive were then used to calculate a sentiment probability for each day, which is used in a Granger causality analysis to correlate positive sentiment probability and stock closing price. In [27] a stream clustering SVM algorithm to treat spam identification was presented; while in general this problem has previously been addressed as a classification problem, they handle it as an anomaly detection problem. Krawczyk and Wozniak [28] proposed a version of an incremental One-Class Support Vector Machine that assigns weights to each object according to its level of significance. They also introduce two schemes for estimating weights for new incoming data and examine their usefulness.

Another well-studied issue in the data stream context is concept drift detection. The change in the concept might occur due to changes in hidden variables affecting the phenomena, or changes in the characteristics of the observed variables. According to [29], two main tactics are used in concept drift detection: (1) monitoring the evolution of performance indicators using statistical techniques, and (2) monitoring the distributions of two different time windows. Some algorithms use methods that adapt the decision model at regular intervals without taking into account whether change really happened; this involves extra processing time. On the other hand, explicit change methods can detect the point of change, quantify the change, and take the needed actions. In general, the methods for concept drift detection can be integrated with different base learners. Some methods related to this issue can be found in [30–32].
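As an illustration of tactic (1), the sketch below monitors a classifier's online error rate and raises warning/drift signals when it degrades significantly above its historical minimum, in the spirit of the DDM method of Gama et al. [32]. The class name, API, and thresholds are our own illustrative choices, not taken from any of the cited implementations.

```python
import math

class DriftMonitor:
    """Sketch of performance-indicator drift detection (DDM-style)."""
    def __init__(self):
        self.n = 0                      # examples seen so far
        self.p = 1.0                    # running error probability
        self.s = 0.0                    # its standard deviation
        self.p_min = float("inf")       # best (p + s) seen, split in two parts
        self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the last example was misclassified, 0 otherwise."""
        self.n += 1
        self.p += (error - self.p) / self.n                  # incremental mean
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.p + self.s < self.p_min + self.s_min:        # track the minimum
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + 3 * self.s_min:
            return "drift"          # change detected: rebuild the model
        if self.p + self.s > self.p_min + 2 * self.s_min:
            return "warning"        # start caching examples for a new model
        return "in-control"
```

A stream of mostly correct predictions keeps the monitor in control; a sustained burst of errors pushes the error estimate past the drift threshold.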

Some other issues related to data stream classification have also been addressed, such as unbalanced data stream classification [14, 15, 19], classification in the presence of uncertain data [8], classification with unlabeled instances [8, 33, 34], novel class detection [22, 35], and multilabel classification [36].

In this paper we describe a novel method for the task of pattern classification over a continuous data stream, based on an associative model. The proposed method builds on the Gamma classifier [37], which is a supervised pattern recognition model inspired by the Alpha-Beta associative memories [38]. The proposal has been developed to work in data stream scenarios using a sliding window approach, and it is capable of handling the space and time constraints inherent to this type of scenario. Various methods have been designed for data stream classification, but to our knowledge the associative approach has never been used for this kind of problem. The main contribution of this work is the implementation of a novel method for data stream classification based on an associative model.

The rest of this paper is organized as follows. Section 2 describes all the materials and methods needed to develop our proposal. Section 3 describes how the experimental phase was conducted and discusses the results. Some conclusions are presented in Section 4, and finally the Acknowledgments and References are included.

2. Materials and Methods

2.1. Gamma Classifier. This classifier is a supervised method whose name is derived from the similarity operator that it uses: the generalized Gamma operator. This operator takes as input two binary vectors, x and y, and a positive integer θ, and returns 1 if both vectors are similar (θ being the degree of allowed dissimilarity) or 0 otherwise. The Gamma operator uses other operators (namely, α, β, and u_β) that will be introduced first. The rest of this section is strongly based on [37, 38].

Table 1: The Alpha (α) and Beta (β) operators.

(a) α : A × A → B

x  y  α(x, y)
0  0  1
0  1  0
1  0  2
1  1  1

(b) β : B × A → A

x  y  β(x, y)
0  0  0
0  1  0
1  0  0
1  1  1
2  0  1
2  1  1

Definition 1 (Alpha and Beta operators). Given the sets A = {0, 1} and B = {0, 1, 2}, the Alpha (α) and Beta (β) operators are defined in tabular form, as shown in Table 1.

Definition 2 (Alpha operator applied to vectors). Let x, y ∈ Aⁿ, with n ∈ Z⁺, be input column vectors. The output of α(x, y) is an n-dimensional vector whose ith component (i = 1, 2, …, n) is computed as follows:

    α(x, y)_i = α(x_i, y_i).    (1)
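Definitions 1 and 2 can be mirrored directly in code. The following minimal Python sketch (the names are our own) encodes Table 1 as lookup tables and applies α componentwise, as in (1):

```python
# The Alpha and Beta operators of Table 1 as lookup tables; A = {0, 1} and
# B = {0, 1, 2} are modeled with plain Python ints.
ALPHA = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}   # alpha: A x A -> B
BETA = {(0, 0): 0, (0, 1): 0, (1, 0): 0,
        (1, 1): 1, (2, 0): 1, (2, 1): 1}               # beta: B x A -> A

def alpha_vec(x, y):
    """Alpha operator applied to vectors (Definition 2): componentwise alpha."""
    return [ALPHA[(xi, yi)] for xi, yi in zip(x, y)]
```

For example, `alpha_vec([1, 0, 1, 1, 1], [1, 0, 0, 1, 0])` yields `[1, 1, 2, 1, 2]`, one table lookup per component pair.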

Definition 3 (u_β operator). Taking the binary pattern x ∈ Aⁿ as input, this unary operator outputs the following nonnegative integer:

    u_β(x) = Σ_{i=1}^{n} β(x_i, x_i).    (2)

Thus, if x = [1 1 0 0 1 0], then

    u_β(x) = β(1, 1) + β(1, 1) + β(0, 0) + β(0, 0) + β(1, 1) + β(0, 0)
           = 1 + 1 + 0 + 0 + 1 + 0 = 3.    (3)

Definition 4 (pruning operator). Let x ∈ Aⁿ and y ∈ Aᵐ, with n, m ∈ Z⁺ and n < m, be two binary vectors; then y pruned by x, denoted by y‖x, is the n-dimensional binary vector whose ith component (i = 1, 2, …, n) is defined as follows:

    (y‖x)_i = y_{i+m−n}.    (4)


For instance, let y = [1 1 1 1 0 1] and x = [0 1 0], so m = 6 and n = 3; then

    (y‖x)_1 = y_{1+6−3} = y_4,
    (y‖x)_2 = y_{2+6−3} = y_5,
    (y‖x)_3 = y_{3+6−3} = y_6,
    (y‖x) = [1 0 1].    (5)

Thus, only the 4th, 5th, and 6th components of y are used in (y‖x). Notice that the first m − n components of y are discarded when building y‖x.
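Both u_β and the pruning operator are only a few lines each in Python. This sketch (function names ours) reproduces the two worked examples above:

```python
def u_beta(x):
    """u_beta operator (Definition 3): for a binary pattern, beta(x_i, x_i)
    equals x_i, so the sum is simply the number of ones in the pattern."""
    return sum(x)

def prune(y, x):
    """Pruning operator y||x (Definition 4): keep only the last n = len(x)
    components of y, discarding the first m - n."""
    return y[len(y) - len(x):]
```

With x = [1 1 0 0 1 0], `u_beta` returns 3, matching (3); pruning y = [1 1 1 1 0 1] by x = [0 1 0] returns [1, 0, 1], matching (5).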

The Gamma operator needs a binary vector as input. In order to deal with real and/or integer vectors, a method to represent such vectors in binary form is needed. In this work we use the modified Johnson-Mobius code [39], which is a variation of the classical Johnson-Mobius code. To convert a set of real/integer numbers into a binary representation, we follow these steps:

(1) Let R be a set of real numbers, R = {r_1, r_2, …, r_i, …, r_n}, with n ∈ Z⁺.

(2) If there are one or more negative numbers in the set, create a new set by subtracting the minimum (i.e., r_i) from each number in R, obtaining a new set T with only nonnegative real numbers: T = {t_1, t_2, …, t_i, …, t_n}, where t_j = r_j − r_i, ∀j ∈ {1, 2, …, n}, and particularly t_i = 0.

(3) Choose a fixed number d and truncate each number of the set T to d decimal positions.

(4) Scale up all the numbers of the set obtained in the previous step by multiplying them by 10^d, so that only nonnegative integers remain: E = {e_1, e_2, …, e_m, …, e_n}, where e_j = t_j · 10^d, ∀j ∈ {1, 2, …, n}, and e_m is the maximum value of the set E.

(5) For each element e_j ∈ E, with j = 1, 2, …, n, concatenate (e_m − e_j) zeroes with e_j ones.

For instance, let R ⊂ ℝ be defined as R = {2.4, −0.7, 1.8, 1.5}; now let us use the modified Johnson-Mobius code to convert the elements of R into binary vectors.

(1) Subtract the minimum. Since −0.7 is the minimum in R, the members of the transformed set T are obtained by subtracting −0.7 from each member in R: t_i = r_i − (−0.7) = r_i + 0.7. Consider

    R → T ⊂ ℝ,
    T = {3.1, 0, 2.5, 2.2}.    (6)

(2) Scale up the numbers. We select d = 1, so each number is truncated to 1 decimal and then multiplied by 10 (10^d = 10^1 = 10). One has

    T → E ⊂ Z⁺,
    E = {31, 0, 25, 22}.    (7)

(3) Concatenate (e_m − e_j) zeroes with e_j ones, where e_m is the maximum number in E and e_j is the current number to be coded. Given that the maximum number in E is 31, all the binary vectors have 31 components; that is, they all are 31 bits long. For example, e_4 = 22 is converted into its binary representation c_4 by appending e_m − e_4 = 31 − 22 = 9 zeroes followed by e_4 = 22 ones, which gives the final vector "0000000001111111111111111111111". Consider

    E → C = {c_i | c_i ∈ A³¹},
    C = {[1111111111111111111111111111111],
         [0000000000000000000000000000000],
         [0000001111111111111111111111111],
         [0000000001111111111111111111111]}.    (8)
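The five encoding steps above can be sketched in a few lines of Python; the function name and the string representation of the code words are our own choices. One deviation from the letter of step (3): we round rather than truncate after scaling, to sidestep floating-point representation error (e.g., 2.4 + 0.7 is stored as 3.0999…); for inputs that already have at most d decimals, the two operations agree.

```python
def johnson_mobius(values, d=1):
    """Modified Johnson-Mobius coding of a set of reals (steps 1-5)."""
    t = [v - min(values) for v in values]          # step 2: offset by the minimum
    e = [int(round(v * 10 ** d)) for v in t]       # steps 3-4: d decimals, scaled
    e_max = max(e)                                 # every code word is e_max bits
    return ["0" * (e_max - ej) + "1" * ej for ej in e]   # step 5: zeroes, then ones
```

On the example set R = {2.4, −0.7, 1.8, 1.5}, this reproduces the four 31-bit vectors of (8).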

Definition 5 (Gamma operator). The similarity Gamma operator takes two binary patterns, x ∈ Aⁿ and y ∈ Aᵐ, with n, m ∈ Z⁺ and n ≤ m, and a nonnegative integer θ as input, and outputs a binary number, for which there are two cases.

Case 1. If n = m, then the output is computed according to the following:

    γ_g(x, y, θ) = 1  if m − u_β[α(x, y) mod 2] ≤ θ,
                   0  otherwise,    (9)

where mod denotes the usual modulo operation.

Case 2. If n < m, then the output is computed using y‖x instead of y, as follows:

    γ_g(x, y, θ) = 1  if m − u_β[α(x, y‖x) mod 2] ≤ θ,
                   0  otherwise.    (10)

In order to better illustrate how the generalized Gamma operator works, let us work through an example of its application. If x = [1 0 1 1 1], y = [1 0 0 1 0], and θ = 1, what is the result of γ_g(x, y, θ)? First we calculate α(x, y) = [1 1 2 1 2]; then

    α(x, y) mod 2 = [1 1 2 1 2] mod 2 = [1 1 0 1 0];    (11)

now

    u_β(α(x, y) mod 2) = u_β([1 1 0 1 0])
        = β(1, 1) + β(1, 1) + β(0, 0) + β(1, 1) + β(0, 0)
        = 1 + 1 + 0 + 1 + 0 = 3,    (12)

and since m = 5, we have 5 − 3 = 2. Given that 2 > (θ = 1), the result of γ_g is 0, and it is decided that, for this θ, the two vectors are not similar. However, if we increase θ to 2, the result of γ_g will be 1, since 2 ≤ (θ = 2); in this case the two vectors are considered similar. The main idea of the generalized Gamma operator is to indicate (with a result of 1) that two binary vectors are similar, allowing up to θ bits to be different while still considering the vectors similar. If more than θ bits are different, the vectors are said to be not similar (result equal to 0). Thus, if θ = 0, both vectors must be equal for γ_g to output a 1.
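Putting the pieces together, the generalized Gamma operator of Definition 5 can be sketched as follows (function and table names are ours). The key observation, noted in the comment, is that α(x_i, y_i) mod 2 is 1 exactly when the two bits are equal.

```python
# Generalized Gamma operator (Definition 5), covering both cases.
ALPHA = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}

def gamma_g(x, y, theta):
    m = len(y)
    if len(x) < m:                       # Case 2: prune y down to len(x) bits
        y = y[m - len(x):]
    # alpha(x_i, y_i) mod 2 is 1 exactly when x_i == y_i, so u_beta of that
    # vector counts matching positions; m minus the count is the dissimilarity.
    matches = sum(ALPHA[(a, b)] % 2 for a, b in zip(x, y))
    return 1 if m - matches <= theta else 0
```

On the example above, `gamma_g([1, 0, 1, 1, 1], [1, 0, 0, 1, 0], 1)` returns 0, while raising θ to 2 returns 1, matching (11) and (12).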

2.1.1. Gamma Classifier Algorithm. Let {(x^μ, y^μ) | μ = 1, 2, …, p} be the fundamental pattern set, with cardinality p; when a test pattern x ∈ ℝⁿ, with n ∈ Z⁺, is presented to the Gamma classifier, these steps are followed:

(1) Code the fundamental set using the modified Johnson-Mobius code, obtaining a value e_m for each component of the n-dimensional vectors in the fundamental set. These e_m values are calculated from the original real-valued vectors as

    e_m(j) = MAX_{i=1}^{p} (x_{ij}), ∀j ∈ {1, 2, …, n}.    (13)

(2) Compute the stop parameter ρ = MAX_{j=1}^{n} [e_m(j)].

(3) Code the test pattern using the modified Johnson-Mobius code, with the same parameters used to code the fundamental set. If any e_j (obtained by offsetting, scaling, and truncating the original x_j) is greater than the corresponding e_m(j), code it using the larger number of bits.

(4) Transform the index of all fundamental patterns into two indices, one for their class and another for their position in the class (e.g., x^μ, which is the ωth pattern of class i, becomes x^{iω}).

(5) Initialize θ to 0.

(6) If θ = 0, test whether x is a fundamental pattern by calculating γ_g(x^μ_j, x_j, 0) for μ = 1, 2, …, p, and then computing the initial weighted addition c^0_μ for each fundamental pattern, as follows:

    c^0_μ = Σ_{j=1}^{n} γ_g(x^μ_j, x_j, 0), for μ = 1, 2, …, p.    (14)

If there is a unique maximum whose value equals n, assign the class associated with that maximum to the test pattern:

    y = y^σ such that MAX_{i=1}^{p} c^0_i = c^0_σ = n.    (15)

(7) Calculate γ_g(x^{iω}_j, x_j, θ) for each component of the fundamental patterns.

(8) Compute a weighted sum c_i for each class, according to this equation:

    c_i = (Σ_{ω=1}^{k_i} Σ_{j=1}^{n} γ_g(x^{iω}_j, x_j, θ)) / k_i,    (16)

where k_i is the cardinality of class i in the fundamental set.

(9) If there is more than one maximum among the different c_i, increment θ by 1 and repeat steps (7) and (8) until there is a unique maximum, or until the stop condition θ > ρ is fulfilled.

(10) If there is a unique maximum among the c_i, assign x to the class corresponding to that maximum:

    y = y^j such that MAX c_i = c_j.    (17)

(11) Otherwise, assign x to the class of the first maximum.
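Steps (1)-(11) can be condensed into a short sketch for integer-valued patterns (all names are ours). This is a simplification, not the authors' implementation: it assumes each test-pattern component is at most the corresponding e_m(j), and it folds step (6)'s fundamental-pattern shortcut into the same θ loop, since a unique maximum at θ = 0 is resolved there anyway.

```python
def unary(value, length):
    """Johnson-Mobius code word: (length - value) zeroes, then value ones."""
    return [0] * (length - value) + [1] * value

def gamma_g(x, y, theta):
    """Case 1 of Definition 5 for two equal-length binary vectors."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return 1 if len(x) - matches <= theta else 0

def gamma_classify(fundamental, labels, test):
    n = len(test)
    e_m = [max(p[j] for p in fundamental) for j in range(n)]   # step (1)
    rho = max(e_m)                                             # step (2)
    classes = sorted(set(labels))
    for theta in range(rho + 1):                               # steps (5)-(9)
        c = {}
        for ci in classes:
            members = [p for p, l in zip(fundamental, labels) if l == ci]
            c[ci] = sum(gamma_g(unary(p[j], e_m[j]),           # step (8):
                                unary(test[j], e_m[j]), theta) # weighted sum
                        for p in members for j in range(n)) / len(members)
        winners = [ci for ci in classes if c[ci] == max(c.values())]
        if len(winners) == 1:                                  # step (10)
            return winners[0]
    return winners[0]                                          # step (11)
```

On the worked example that follows (patterns (2, 8) and (1, 9) in class 1; (6, 3) and (7, 4) in class 2), this sketch assigns (6, 2) to class 2 and (3, 9) to class 1.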

As an example, let us consider the following fundamental patterns:

    x¹ = (2, 8),  x² = (1, 9),  x³ = (6, 3),  x⁴ = (7, 4),    (18)

grouped in two classes: C₁ = {x¹, x²} and C₂ = {x³, x⁴}. Then the patterns to be classified are

    x̃¹ = (6, 2),  x̃² = (3, 9).    (19)

As can be seen, the dimension of all patterns is n = 2, and there are 2 classes, both with the same cardinality: k₁ = 2 and k₂ = 2. Now the steps in the algorithm are followed.


(1) Code the fundamental set with themodified Johnson-Mobius code obtaining a value 119890119898 for each compo-nent Consider

1199091= [

0000011011111111

]

1199092= [

0000001111111111

]

1199093= [

0111111000000111

]

1199094= [

1111111000001111

]

(20)

Since the maximum value among the first compo-nents of all fundamental patterns is 7 then 119890119898(1) = 7and the maximum value among the second compo-nents is 119890119898(2) = 9Thus the binary vectors represent-ing the first components are 7 bits long while thoserepresenting the second component are 9 bits long

(2) Compute the stop parameter

120588 = MAX119899119895=1 [119890119898 (119895)] = MAX2119895=1 [119890119898 (1) 119890119898 (2)]

= MAX (7 9) = 7(21)

(3) Code the patterns to be classified into the modifiedJohnson-Mobius code using the same parametersused with the fundamental set

$\tilde{x}^1 = [0111111\;\; 000000011]$,
$\tilde{x}^2 = [0000111\;\; 111111111]$.  (22)

Again, the binary vectors representing the first components are 7 bits long, while those representing the second component are 9 bits long.

(4) Transform the index of all fundamental patterns into two indices, one for the class they belong to and another for their position in the class. Consider

$x^1 = x^{11}$, $x^2 = x^{12}$, $x^3 = x^{21}$, $x^4 = x^{22}$.  (23)

Given that $C_1 = \{x^1, x^2\}$ and $C_2 = \{x^3, x^4\}$, we know that both $x^1$ and $x^2$ belong to class $C_1$, so the first index for both patterns becomes 1; the second index is used to differentiate between patterns assigned to the same class. Thus $x^1$ becomes $x^{11}$ and $x^2$ is now $x^{12}$. Something similar happens to $x^3$ and $x^4$, but with the first index being 2, since they belong to class $C_2$: $x^3$ becomes $x^{21}$ and $x^4$ is now $x^{22}$.

(5) Initialize $\theta = 0$.

(6) Calculate $\gamma_g(x^{i\omega}_j, \tilde{x}_j, \theta)$ for each component of the fundamental patterns in each class. Thus, for $\tilde{x}^1$ we have

$\gamma_g(x^{11}_1, \tilde{x}^1_1, 0) = 0$,
$\gamma_g(x^{11}_2, \tilde{x}^1_2, 0) = 0$,
$\gamma_g(x^{12}_1, \tilde{x}^1_1, 0) = 0$,
$\gamma_g(x^{12}_2, \tilde{x}^1_2, 0) = 0$,
$\gamma_g(x^{21}_1, \tilde{x}^1_1, 0) = 1$,
$\gamma_g(x^{21}_2, \tilde{x}^1_2, 0) = 0$,
$\gamma_g(x^{22}_1, \tilde{x}^1_1, 0) = 0$,
$\gamma_g(x^{22}_2, \tilde{x}^1_2, 0) = 0$.  (24)

Since $\theta = 0$, for $\gamma_g$ to give a result of 1 it is necessary that both vectors $x^{i\omega}_j$ and $\tilde{x}^1_j$ be equal. But only one fundamental pattern has its first component equal to that of $\tilde{x}^1$: $x^{21}_1 = [0111111] = \tilde{x}^1_1$, while no fundamental pattern has a second component equal to that of the test pattern. Given that this is the only case for which a component of a fundamental pattern is equal to the same component of the test pattern, it is also the only instance in which $\gamma_g$ outputs 1.

Now, for $\tilde{x}^2$, we have

$\gamma_g(x^{11}_1, \tilde{x}^2_1, 0) = 0$,
$\gamma_g(x^{11}_2, \tilde{x}^2_2, 0) = 0$,
$\gamma_g(x^{12}_1, \tilde{x}^2_1, 0) = 0$,
$\gamma_g(x^{12}_2, \tilde{x}^2_2, 0) = 1$,
$\gamma_g(x^{21}_1, \tilde{x}^2_1, 0) = 0$,
$\gamma_g(x^{21}_2, \tilde{x}^2_2, 0) = 0$,
$\gamma_g(x^{22}_1, \tilde{x}^2_1, 0) = 0$,
$\gamma_g(x^{22}_2, \tilde{x}^2_2, 0) = 0$.  (25)


Again $\theta = 0$, thus forcing both vectors $x^{i\omega}_j$ and $\tilde{x}^2_j$ to be equal in order to obtain a 1. Similarly to what happened in the case of $\tilde{x}^1$, there is but one instance of such an occurrence: $x^{12}_2 = [111111111] = \tilde{x}^2_2$.
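The equality checks above can be sketched as a thresholded Hamming-distance test. This assumes one common formulation of the generalized gamma operator, in which $\gamma_g$ outputs 1 when its coded operands differ in at most $\theta$ bit positions (exact equality when $\theta = 0$, as in this run):

```python
def gamma_g(a, b, theta):
    # 1 when the binary strings a and b differ in at most theta positions
    return 1 if sum(ca != cb for ca, cb in zip(a, b)) <= theta else 0

# x^21_1 vs. the first component of test pattern 1: equal, so 1
print(gamma_g("0111111", "0111111", 0))   # -> 1
# x^11_1 vs. the same component: different, so 0 at theta = 0
print(gamma_g("0000011", "0111111", 0))   # -> 0
```

Raising $\theta$ relaxes the comparison, which is what the tie-breaking loop of the algorithm exploits.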

(7) Compute a weighted sum $c_i$ for each class:

$c_1 = \frac{\sum_{\omega=1}^{k_1} \sum_{j=1}^{n} \gamma_g(x^{1\omega}_j, \tilde{x}^1_j, \theta)}{k_1} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{1\omega}_j, \tilde{x}^1_j, 0)}{2} = \frac{(0+0) + (0+0)}{2} = \frac{0}{2} = 0.$  (26)

Here we add together the results obtained on all components of all fundamental patterns belonging to class $C_1$. Since all gave 0 (i.e., none of these were similar to the corresponding component of the test pattern, given the value of $\theta$ in effect), the result of the weighted addition for class $C_1$ is $c_1 = 0$. One has

$c_2 = \frac{\sum_{\omega=1}^{k_2} \sum_{j=1}^{n} \gamma_g(x^{2\omega}_j, \tilde{x}^1_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{2\omega}_j, \tilde{x}^1_j, 0)}{2} = \frac{(1+0) + (0+0)}{2} = \frac{1}{2} = 0.5.$  (27)

In this case there was one instance similar to the corresponding component of the test pattern (equal, in fact, since $\theta = 0$ during this run). Thus the weighted addition for class $C_2$ is $c_2 = 0.5$, given that the sum of results is divided by the cardinality of the class, $k_2 = 2$. And for $\tilde{x}^2$:

$c_1 = \frac{\sum_{\omega=1}^{k_1} \sum_{j=1}^{n} \gamma_g(x^{1\omega}_j, \tilde{x}^2_j, \theta)}{k_1} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{1\omega}_j, \tilde{x}^2_j, 0)}{2} = \frac{(0+0) + (0+1)}{2} = \frac{1}{2} = 0.5.$  (28)

There was one result of $\gamma_g$ equal to 1 which, coupled with the cardinality of class $C_1$ being 2, makes $c_1 = 0.5$. One has

$c_2 = \frac{\sum_{\omega=1}^{k_2} \sum_{j=1}^{n} \gamma_g(x^{2\omega}_j, \tilde{x}^2_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{2\omega}_j, \tilde{x}^2_j, 0)}{2} = \frac{(0+0) + (0+0)}{2} = \frac{0}{2} = 0.$  (29)

No similarities were found between the fundamental patterns of class $C_2$ and the test pattern $\tilde{x}^2$ (for $\theta = 0$); thus $c_2 = 0$.

(8) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (6) and (7) until there is a unique maximum or the stop condition $\theta \geq \rho$ is fulfilled. There is a unique maximum (for each test pattern), so we go directly to step (9).

(9) If there is a unique maximum, assign $y$ to the class corresponding to such maximum. For $\tilde{x}^1$:

$C_{\tilde{x}^1} = C_2$, since $\max_{i=1}^{2} c_i = \max(0, 0.5) = 0.5 = c_2$,  (30)

while for $\tilde{x}^2$:

$C_{\tilde{x}^2} = C_1$, since $\max_{i=1}^{2} c_i = \max(0.5, 0) = 0.5 = c_1$.  (31)

(10) Otherwise, assign $y$ to the class of the first maximum. This is unnecessary here, since both test patterns have already been assigned a class.
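The whole worked example can be condensed into a short script. This is a sketch under stated assumptions (the modified Johnson-Möbius code as trailing ones, $\gamma_g$ as a thresholded Hamming distance, and $\rho$ as the minimum of the $e_m(j)$); it is illustrative, not the authors' implementation:

```python
def johnson_mobius(v, length):
    # `length` bits with `v` trailing ones (modified Johnson-Mobius code)
    return "0" * (length - v) + "1" * v

def gamma_g(a, b, theta):
    # 1 when the coded operands differ in at most theta bit positions
    return 1 if sum(x != y for x, y in zip(a, b)) <= theta else 0

def gamma_classify(classes, test):
    # classes: {label: [pattern, ...]}; patterns are tuples of small non-negative ints
    pats = [p for ps in classes.values() for p in ps]
    n = len(test)
    e_m = [max(p[j] for p in pats) for j in range(n)]       # bits per component
    rho = min(e_m)                                          # stop parameter
    code = lambda p: [johnson_mobius(p[j], e_m[j]) for j in range(n)]
    coded = {lab: [code(p) for p in ps] for lab, ps in classes.items()}
    t = code(test)
    theta = 0
    while True:
        # weighted sum c_i per class, as in (16)
        c = {lab: sum(gamma_g(cp[j], t[j], theta)
                      for cp in ps for j in range(n)) / len(ps)
             for lab, ps in coded.items()}
        winners = [lab for lab, v in c.items() if v == max(c.values())]
        if len(winners) == 1 or theta > rho:
            return winners[0], c          # first maximum breaks any remaining tie
        theta += 1                        # tie: relax the operator and retry

classes = {1: [(2, 8), (1, 9)], 2: [(6, 3), (7, 4)]}
print(gamma_classify(classes, (6, 2)))    # -> (2, {1: 0.0, 2: 0.5})
print(gamma_classify(classes, (3, 9)))    # -> (1, {1: 0.5, 2: 0.0})
```

The two printed results match equations (27)-(31): the first test pattern is assigned to $C_2$ and the second to $C_1$.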

2.2. Sliding Windows. A widespread approach for data stream mining is the use of a sliding window to keep only a representative portion of the data. We are not interested in keeping all the information of the data stream, particularly when we have to meet the memory space constraint required for algorithms working in this context. This technique is also able to deal with concept drift, eliminating those data points that come from an old concept. According to [5], the training window size can be fixed or variable over time.

Sliding windows of a fixed size store in memory a fixed number of the $w$ most recent examples. Whenever a new example arrives, it is saved to memory and the oldest one is discarded. These types of windows are similar to first-in, first-out data structures. This simple adaptive learning method is often used as a baseline in the evaluation of new algorithms.

Sliding windows of variable size vary the number of examples in a window over time, typically depending on the indications of a change detector. A straightforward approach is to shrink the window whenever a change is detected, such that the training data reflects the most recent concept, and to grow the window otherwise.

2.3. Proposed Method. The Gamma classifier has previously been used for time series prediction in air pollution [37] and oil well production [40], showing very promising results. The current work takes the Gamma classifier and combines it with a sliding window approach, with the aim of developing a new method capable of performing classification over a continuous data stream, referred to as the DS-Gamma classifier.


(1) Initialize window
(2) for each $x_\tau$ do
(3)   If $\tau < W$ then
(4)     Add $x_\tau$ to the tail of the window
(5)   Else
(6)     Use $x_\tau$ to classify
(7)     Add $x_\tau$ to the tail of the window

Algorithm 1: Learning policy algorithm.

A data stream $S$ can be defined as a sequence of data elements of the form $S = \{s_1, s_2, \ldots, s_{\text{now}}\}$ from a universe of size $D$, where $D$ is huge and potentially infinite but countable. We assume that each element $s_i$ has the following form:

$s_i = (x_\tau, y_\tau),$  (32)

where $x_\tau$ is an input vector of size $n$ and $y_\tau$ is the class label associated with it. $\tau \in \mathbb{Z}^+$ is an index indicating the sequential order of the elements in the data stream; $\tau$ is bounded by the size of the data stream, such that if the data stream is infinite, $\tau$ takes infinite, but countable, values.

One of the main challenges in mining data streams is the ability to maintain an accurate decision model. The current model has to incorporate new knowledge from new data, and it should also forget old information when data is outdated. To address these issues, the proposed method implements a learning and forgetting policy. The learning (insertion) and forgetting (removal) policies were designed to guarantee the spatial and temporal relevance of the data and the consistency of the current model.

The proposed method implements a fixed window approach. This approach achieves concept drift detection in an indirect way, by considering only the most recent data. Algorithm 1 outlines the general steps used as the learning policy. Initially, the window is created with the first $W$ labeled records, where $W$ is the size of the window. After that, every sample is used for classification and evaluation of the performance, and then it is used to update the window.

Classifiers that deal with concept drift are forced to implement forgetting, adaptation, or drift detection mechanisms in order to adjust to changing environments. In this proposal, a forgetting mechanism is implemented. Algorithm 2 outlines the basic forgetting policy implemented in this proposal: the oldest element in the window, $x_\delta$, is removed.

2.4. MOA. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [41]. MOA contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation on data streams with concept drift. MOA can also be used to build synthetic data streams using generators, reading ARFF files, or joining several streams. The MOA stream generators allow simulating potentially infinite sequences of data with different characteristics (different types of concept drift and recurrent concepts, among others). All the algorithms used for the comparative analysis are implemented under the MOA framework.

Table 2: Dataset characteristics.

Dataset      Instances  Attributes  Classes
Agrawal      100000     10          2
Bank         45211      16          2
CoverType    581012     54          7
Electricity  45312      8           2
Hyperplane   100000     10          2
RandomRBF    100000     10          2

2.5. Data Streams. This section provides a brief description of the data streams used during the experimental phase. The proposed solution has been tested using synthetic and real data streams. For data stream classification problems, only a few suitable data streams can be successfully used for evaluation; a few data streams with enough examples are hosted in [42, 43]. For further evaluation, the use of synthetic data stream generators becomes necessary. The synthetic data streams used in our experiments were generated using the MOA framework mentioned in the previous section. A summary of the main characteristics of the data streams is shown in Table 2. Descriptions of the data streams were taken from [42, 43] in the case of the real ones and from [44] for the synthetic ones.

2.5.1. Real-World Data Streams

Bank Marketing Dataset. The Bank Marketing dataset is available from the UCI Machine Learning Repository [43]. The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit. The dataset contains 45211 instances; each instance has 16 attributes and the class label. The two classes identify whether a bank term deposit would or would not be subscribed.

Electricity Dataset. This data was collected from the Australian New South Wales electricity market. In this market, prices are not fixed and are affected by the demand and supply of the market; the prices are set every five minutes. The dataset was first described by Harries in [45]. During the time period described in the dataset, the electricity market was expanded with the inclusion of adjacent areas, which led to more elaborate management of the supply: the excess production of one region could be sold to an adjacent region. The Electricity dataset contains 45312 instances. Each example of the dataset refers to a period of 30 minutes (i.e., there are 48 instances for each one-day period). Each example in the dataset has 8 fields: the day of week, the time stamp, the New South Wales electricity demand, the New South Wales electricity price, the Victoria electricity demand, the Victoria electricity price, the scheduled electricity transfer


(1) for each $x_\tau$ do
(2)   If $\tau > W$ then
(3)     Remove $x_\delta$ ($\delta = \tau - W + 1$) from the beginning of the window

Algorithm 2: Forgetting policy algorithm.
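Taken together, Algorithms 1 and 2 amount to a fixed-size FIFO buffer: the first $W$ samples only fill the window, and afterwards each sample is classified against the window, appended to its tail, and the oldest sample dropped. A minimal sketch, where the `classify` callback stands in for any window-based classifier:

```python
from collections import deque

def run_stream(stream, W, classify):
    # deque(maxlen=W) drops the head automatically, implementing Algorithm 2
    window = deque(maxlen=W)
    predictions = []
    for x, y in stream:
        if len(window) >= W:                  # window full: classify first
            predictions.append(classify(window, x))
        window.append((x, y))                 # then learn the labeled sample
    return predictions

# Toy usage: "classifier" that just predicts the label of the newest sample
stream = [((i,), i % 2) for i in range(6)]
preds = run_stream(stream, W=3, classify=lambda win, x: win[-1][1])
print(preds)   # -> [0, 1, 0]
```

Using `deque(maxlen=W)` keeps both policies in a single data structure, so no explicit removal step is needed.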

between states, and the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. We use the normalized version of this dataset, so that the numerical values are between 0 and 1. The dataset is available from [42].

Forest CoverType Dataset. The Forest CoverType dataset is one of the largest datasets available from the UCI Machine Learning Repository [43]. This dataset contains information that describes forest cover types from cartographic variables. A given observation (a 30 × 30 meter cell) was determined from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581012 instances and 54 attributes. The data is in raw form (not scaled) and contains binary (0 or 1) columns for the qualitative independent variables (wilderness areas and soil types). The dataset has 7 classes that identify the type of forest cover.

2.5.2. Synthetic Data Streams

Agrawal Data Stream. This generator was originally introduced by Agrawal et al. in [46]. The MOA implementation was used to generate the data streams for the experimental phase of this work. The generator produces a stream containing nine attributes. Although not explicitly stated by the authors, a sensible conclusion is that these attributes describe hypothetical loan applications. There are ten functions defined for generating binary class labels from the attributes; presumably, these determine whether a loan should be approved. Perturbation shifts numeric attributes from their true value, adding an offset drawn randomly from a uniform distribution whose range is a specified percentage of the total value range [44]. For each experiment, a data stream with 100000 examples was generated.

RandomRBF Data Stream. This generator was devised to offer an alternative complex concept type that is not straightforward to approximate with a decision tree model. The RBF (Radial Basis Function) generator works as follows. A fixed number of random centroids are generated; each center has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a center at random, taking the weights into consideration, so that centers with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the central point; the length of the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid. The chosen centroid also determines the class label of the example. Only numeric attributes are generated [44]. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.
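The generation process just described can be sketched as follows; parameter names and defaults are illustrative assumptions, and MOA's actual implementation differs in its details:

```python
import math
import random

def random_rbf(n_samples, d=10, n_centroids=50, n_classes=2, seed=3):
    # Fixed random centroids: position, spread, class label, and weight
    rng = random.Random(seed)
    cents = [{"pos": [rng.random() for _ in range(d)],
              "std": rng.random(),
              "label": rng.randrange(n_classes),
              "weight": rng.random()} for _ in range(n_centroids)]
    total = sum(c["weight"] for c in cents)
    for _ in range(n_samples):
        r, acc = rng.random() * total, 0.0
        for c in cents:                        # weight-proportional choice
            acc += c["weight"]
            if acc >= r:
                break
        direction = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in direction)) or 1.0
        length = rng.gauss(0, c["std"])        # Gaussian displacement length
        x = [p + length * v / norm for p, v in zip(c["pos"], direction)]
        yield x, c["label"]                    # label comes from the centroid
```

A call such as `list(random_rbf(100000))` would produce a stream of the size used in the experiments.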

Rotating Hyperplane Data Stream. This synthetic data stream was generated using the MOA framework. A hyperplane in $d$-dimensional space is the set of points $x$ that satisfy [44]

$\sum_{i=1}^{d} x_i w_i = w_0 = \sum_{i=1}^{d} w_i,$  (33)

where $x_i$ is the $i$th coordinate of $x$. Examples for which $\sum_{i=1}^{d} x_i w_i \geq w_0$ are labeled positive, and examples for which $\sum_{i=1}^{d} x_i w_i < w_0$ are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because the orientation and position of the hyperplane can be adjusted in a gradual way by changing the relative size of the weights. In MOA, change is introduced into this data stream by adding drift to each weight attribute using the formula $w_i = w_i + d\sigma$, where $\sigma$ is the probability that the direction of change is reversed and $d$ is the change applied to every example. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.

2.6. Algorithm Evaluation. One of the objectives of this work is to perform a consistent comparison between the classification performance of our proposal and that of other well-known data stream pattern classification algorithms. Evaluation of data stream algorithms is a relatively new field that has not been studied as thoroughly as evaluation in batch learning. In traditional batch learning, evaluation has been mainly limited to evaluating a model using different random reorderings of the same static dataset ($k$-fold cross validation, leave-one-out, and bootstrap) [2]. For data stream scenarios, the problem of infinite data raises new challenges; moreover, the batch approaches cannot effectively measure the accuracy of a model in a context with concepts that change over time. In the data stream setting, one of the main concerns is how to design an evaluation practice that allows us to obtain a reliable measure of accuracy over time. According to [41], two main approaches arise.

2.6.1. Holdout. When traditional batch learning reaches a scale where cross validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. An extension of this approach can be applied to data stream scenarios: first, a set of $M$ instances is used to train the model; then, a set of the following unseen $N$ instances is used to test the model; the model is then trained again with the next $M$ instances and tested with the subsequent $N$ instances, and so forth.
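This train-$M$ / test-$N$ alternation can be sketched as follows; the `fit` and `score` callbacks stand for any batch learner and metric, and all names are illustrative:

```python
def holdout_eval(stream, M, N, fit, score):
    # Repeatedly: train on the next M instances, test on the following N
    it = iter(stream)
    accs = []
    while True:
        train = [s for _, s in zip(range(M), it)]
        test = [s for _, s in zip(range(N), it)]
        if len(train) < M or len(test) < N:
            return accs                 # stream exhausted
        model = fit(train)
        accs.append(score(model, test))

# Toy usage: majority-class "model" on a stream of (x, y) pairs
stream = [((i,), 0) for i in range(20)]
majority = lambda data: max({y for _, y in data},
                            key=[y for _, y in data].count)
accuracy = lambda m, data: sum(y == m for _, y in data) / len(data)
print(holdout_eval(stream, M=5, N=5, fit=majority, score=accuracy))  # -> [1.0, 1.0]
```

Each entry of the returned list is the accuracy on one holdout block, so the sequence traces how performance evolves along the stream.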


2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data, which is sometimes difficult to obtain, while prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different mechanisms of evaluation, including the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors, and they state that the use of the prequential error with forgetting mechanisms proves advantageous for assessing performance and for comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we selected the prequential error over a sliding window for the evaluation of the proposed method. The prequential error $P_W$ for a sliding window of size $W$, computed at time $i$, is based on an accumulated sum of a 0-1 loss function $L$ between the predicted values $\hat{y}_k$ and the observed values $y_k$, using the following expression [47]:

$P_W(i) = \frac{1}{W} \sum_{k=i-W+1}^{i} L(y_k, \hat{y}_k).$  (34)
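Equation (34) can be sketched directly; for the first $i < W$ predictions, this sketch averages over the examples seen so far, a boundary choice the formula itself leaves open:

```python
from collections import deque

def prequential_error(pairs, W):
    # pairs: iterable of (observed y_k, predicted y_hat_k)
    losses = deque(maxlen=W)            # keeps only the last W 0-1 losses
    curve = []
    for y, y_hat in pairs:
        losses.append(0 if y == y_hat else 1)
        curve.append(sum(losses) / len(losses))   # P_W(i), as in Eq. (34)
    return curve

print(prequential_error([(1, 1), (1, 0), (0, 0), (1, 0)], W=2))
# -> [0.0, 0.5, 0.5, 0.5]
```

Because only the last $W$ losses are retained, old mistakes are forgotten, which is what makes the measure sensitive to recent concept drift.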

We also evaluate the performance results of each competing algorithm against the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test that ranks the differences in the performances of two classifiers over the datasets, ignoring the signs, and compares the ranks for the positive and the negative differences.

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. The classification performance of each of the compared algorithms was calculated using the prequential error approach, and these results are used to compare the performance of our proposal with that of other data stream classification algorithms. We aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on their implementation can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times, and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset the experiment was executed just one time. The Bank and CoverType datasets were reordered 10 times using the randomized filter from the WEKA platform [49] for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each of them, using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification, to evaluate the efficiency of each algorithm. All experiments were conducted on a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system, with 4 GB of RAM.

3. Results and Discussion

In this section we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance of each of the compared classification algorithms. First, we evaluate the sensitivity of the proposed method to changes in the window size, a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized in boldface. Table 4 shows the average total time used to classify each data stream. The results of the average performance, with an indicator of the standard deviation, are also depicted in Figure 1.

In general, the performance of the DS-Gamma classifier improved as the window size increased, with the exception of the Electricity data stream, for which performance is better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with a performance of 89.12%. This performance is better than all the performances achieved by the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance remains stable and in some cases even degrades, as we can observe in the results for the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination $R^2$ were included in the graphic of each data stream. The $R^2$ values, ranging between 0.9743 and 0.9866, show that time grows linearly as the size of the window increases.

To evaluate the DS-Gamma classifier, we compared its performance with that of other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation; the best performance is emphasized in boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, the Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and


Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset      50     100    200    500    750    1000   1250   1500   1750   2000
Agrawal      79.64  85.97  89.86  92.59  92.88  92.82  92.67  92.51  92.32  92.07
Bank         68.19  73.40  77.05  79.13  79.12  78.96  78.60  78.24  77.76  77.37
CoverType    38.21  37.61  39.70  44.30  46.59  47.77  48.62  49.19  49.51  49.74
Electricity  89.12  85.26  81.71  79.84  78.58  77.14  76.36  75.94  75.31  74.49
Hyperplane   78.25  82.65  85.28  86.56  86.55  86.30  85.98  85.65  85.34  85.00
RandomRBF    64.61  67.39  69.33  70.56  70.81  70.85  70.74  70.63  70.50  70.33

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

Dataset      50     100     200     500     750     1000    1250    1500    1750    2000
Agrawal      2.24   4.50    7.93    19.26   28.04   39.00   45.92   57.57   63.90   73.49
Bank         0.66   1.14    2.07    4.75    7.01    9.17    11.54   14.55   15.83   17.87
CoverType    83.47  140.81  251.20  340.05  254.12  335.03  428.94  522.39  669.24  676.99
Electricity  0.34   0.53    0.98    2.33    3.56    4.63    5.99    7.05    8.30    9.49
Hyperplane   2.24   4.01    7.56    18.19   26.76   30.28   44.28   52.96   61.87   70.63
RandomRBF    2.79   4.09    7.73    18.54   27.35   30.72   45.02   54.12   62.23   71.04

Table 5: Averaged accuracy (%) comparison.

Classifier         Agrawal         Bank            CoverType       Electricity   Hyperplane      RandomRBF
                   Acc     σ       Acc    σ        Acc    σ        Acc    σ      Acc    σ        Acc    σ
Bayes-DDM          88.16   0.19    83.38  1.41     67.61  6.81     81.05  —      86.02  1.39     71.87  3.82
Perceptron DDM     67.20   0.11    73.43  10.53    70.76  5.55     78.61  —      85.65  1.49     71.88  5.93
Hoeffding Tree     94.39   0.24    88.52  0.11     72.29  2.68     82.41  —      82.41  5.62     86.72  1.69
Active Classifier  90.82   1.62    88.21  0.14     67.31  1.03     78.59  —      79.50  6.55     75.32  2.70
IBL-DS             86.76   0.48    86.52  0.21     61.73  9.46     75.34  —      84.20  0.77     93.32  1.21
AWE                92.17   0.59    87.90  0.38     76.16  3.66     78.36  —      85.88  1.65     91.98  1.14
OzaBoostAdwin      87.11   2.62    87.67  0.50     78.55  4.97     87.92  —      83.21  1.91     91.49  1.14
SGD                67.20   0.11    76.56  1.66     59.01  7.37     84.52  —      83.19  0.42     61.74  3.31
SPegasos           55.82   0.17    77.46  1.99     50.47  5.17     57.60  —      52.80  6.08     53.33  3.13
DS-Gamma           92.82   0.10    78.96  0.22     47.77  4.31     77.14  —      86.30  1.35     70.85  2.54

Table 6: Averaged classification time (seconds) comparison.

Classifier         Agrawal          Bank            CoverType           Electricity   Hyperplane       RandomRBF
                   Time     σ       Time   σ        Time      σ         Time   σ      Time     σ       Time   σ
Bayes-DDM          0.62     0.11    0.39   0.08     17.76     1.62      0.61   —      0.70     0.01    0.69   0.01
Perceptron DDM     0.40     0.02    0.20   0.05     8.28      0.10      0.19   —      0.45     0.02    0.44   0.02
Hoeffding Tree     0.98     0.23    0.54   0.03     30.05     11.06     0.78   —      1.36     0.05    1.40   0.05
Active Classifier  0.64     0.03    0.26   0.02     15.14     0.74      0.23   —      0.72     0.04    0.70   0.02
IBL-DS             53.19    2.11    38.51  8.26     1057.60   60.95     8.61   —      86.39    0.96    88.10  1.06
AWE                14.32    0.20    6.63   0.30     1012.47   304.41    6.70   —      18.03    3.64    37.52  3.03
OzaBoostAdwin      192.62   44.47   46.93  10.56    10589.73  3896.64   39.75  —      223.99   51.34   57.04  16.66
SGD                0.14     0.04    0.11   0.009    40.35     0.07      0.12   —      0.27     0.02    0.27   0.02
SPegasos           0.22     0.006   0.11   0.016    16.84     0.12      0.12   —      0.31     0.05    0.24   0.01
DS-Gamma           39.00    3.58    9.16   0.05     335.03    12.91     4.63   —      30.28    0.79    30.72  0.94


Figure 1: Average accuracy of DS-Gamma for different window sizes (one panel per data stream: Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF; accuracy (%) versus window size, from 50 to 2000).

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with the Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier on the CoverType data stream. This data stream presents a heavy class imbalance that seems to negatively affect the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased toward the class with the most members.

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using $\alpha = 0.05$. The obtained values are all greater than $\alpha$, so the null hypothesis is not rejected and


Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes (one panel per data stream; each panel includes a dotted linear trend line, with $R^2$ values of 0.9866, 0.9854, 0.9804, 0.9796, 0.9743, and 0.9842).

we can infer that there are no significant differences among the compared algorithms.

4. Conclusions

In this paper we describe a novel method for the task of pattern classification over a continuous data stream, based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase, the proposed method was tested using different data stream scenarios, with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms       Asymptotic significance (2-sided)
Gamma-Bayes-DDM           0.345
Gamma-Perceptron DDM      0.917
Gamma-Hoeffding Tree      0.116
Gamma-Active Classifier   0.463
Gamma-IBL-DS              0.345
Gamma-AWE                 0.116
Gamma-OzaBoostAdwin       0.116
Gamma-SGD                 0.600
Gamma-SPegasos            0.075


Figure 3: Averaged accuracy (%) comparison (box plots per data stream: Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF, one box per compared classifier).

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on the classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000; for larger window sizes, performance remained stable and in some cases even declined.

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as can be observed from the results obtained in the experiments with the CoverType dataset,


[Figure 4: Averaged classification time (seconds) comparison. One panel per data stream (Agrawal, Hyperplane, CoverType, RandomRBF, Bank, and Electricity), comparing the same ten algorithms as Figure 3.]

a heavily imbalanced dataset, on which the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), CONACyT, and SNI for their economic support to develop this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to the statistical significance analysis.


References

[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: rich data and the increasing value of the internet of things," Tech. Rep., IDC Information and Data, 2014, http://www.emc.com/leadership/digital-universe/2014iview/index.htm.

[2] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press, Amsterdam, The Netherlands, 2010.

[3] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.

[4] J. Beringer and E. Hüllermeier, "Efficient instance-based learning on data streams," Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[5] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Computing Surveys, vol. 46, no. 4, article 44, 2014.

[6] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, Boston, Mass, USA, August 2000.

[7] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, San Francisco, Calif, USA, August 2001.

[8] C. Liang, Y. Zhang, P. Shi, and Z. Hu, "Learning very fast decision tree from uncertain data streams with positive and unlabeled samples," Information Sciences, vol. 213, pp. 50–67, 2012.

[9] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "Decision trees for mining data streams based on the Gaussian approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.

[10] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69–101, 1996.

[11] F. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. Riquelme, "Data streams classification by incremental rule learning with parameterized generalization," in Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 657–661, Dijon, France, April 2006.

[12] D. Mena-Torres and J. S. Aguilar-Ruiz, "A similarity-based approach for data stream classification," Expert Systems with Applications, vol. 41, no. 9, pp. 4224–4234, 2014.

[13] A. Shaker and E. Hüllermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems, vol. 3, no. 4, pp. 235–249, 2012.

[14] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Recursive least square perceptron model for non-stationary and imbalanced data stream classification," Evolving Systems, vol. 4, no. 2, pp. 119–131, 2013.

[15] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Ensemble of online neural networks for non-stationary and imbalanced data streams," Neurocomputing, vol. 122, pp. 535–544, 2013.

[16] A. Sancho-Asensio, A. Orriols-Puig, and E. Golobardes, "Robust on-line neural learning classifier system for data stream classification tasks," Soft Computing, vol. 18, no. 8, pp. 1441–1461, 2014.

[17] D. M. Farid, L. Zhang, A. Hossain, et al., "An adaptive ensemble classifier for mining concept drifting data streams," Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, 2013.

[18] D. Brzezinski and J. Stefanowski, "Combining block-based and online methods in learning ensembles from concept drifting data streams," Information Sciences, vol. 265, pp. 50–67, 2014.

[19] G. Ditzler and R. Polikar, "Incremental learning of concept drift from streaming imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2283–2301, 2013.

[20] M. J. Hosseini, Z. Ahmadi, and H. Beigy, "Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification," Evolving Systems, vol. 4, no. 1, pp. 43–60, 2013.

[21] N. Saunier and S. Midenet, "Creating ensemble classifiers through order and incremental data selection in a stream," Pattern Analysis and Applications, vol. 16, no. 3, pp. 333–347, 2013.

[22] M. M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham, "Classification and novel class detection in concept-drifting data streams under time constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.

[23] A. Alzghoul, M. Löfstrand, and B. Backe, "Data stream forecasting for system fault prediction," Computers and Industrial Engineering, vol. 62, no. 4, pp. 972–978, 2012.

[24] J. Kranjc, J. Smailović, V. Podpečan, M. Grčar, M. Žnidaršič, and N. Lavrač, "Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform," Information Processing and Management, vol. 51, no. 2, pp. 187–203, 2015.

[25] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, "Stream-based active learning for sentiment analysis in the financial domain," Information Sciences, vol. 285, pp. 181–203, 2014.

[26] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: primal estimated sub-gradient solver for SVM," in Proceedings of the 24th International Conference on Machine Learning, pp. 807–814, June 2007.

[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, "Twitter spammer detection using data stream clustering," Information Sciences, vol. 260, pp. 64–73, 2014.

[28] B. Krawczyk and M. Woźniak, "Incremental weighted one-class classifier for mining stationary data streams," Journal of Computational Science, vol. 9, pp. 19–25, 2015.

[29] J. Gama, "A survey on learning from data streams: current and future trends," Progress in Artificial Intelligence, vol. 1, no. 1, pp. 45–55, 2012.

[30] L. I. Kuncheva, "Change detection in streaming multivariate data using likelihood detectors," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1175–1180, 2013.

[31] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, "Exponentially weighted moving average charts for detecting concept drift," Pattern Recognition Letters, vol. 33, no. 2, pp. 191–198, 2012.

[32] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence - SBIA 2004, vol. 3171 of Lecture Notes in Computer Science, pp. 286–295, Springer, Berlin, Germany, 2004.

[33] X. Wu, P. Li, and X. Hu, "Learning from concept drifting data streams with unlabeled data," Neurocomputing, vol. 92, pp. 145–155, 2012.

[34] X. Qin, Y. Zhang, C. Li, and X. Li, "Learning from data streams with only positive and unlabeled data," Journal of Intelligent Information Systems, vol. 40, no. 3, pp. 405–430, 2013.

[35] M. M. Masud, C. Woolam, J. Gao, et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.

[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.

[37] I. López-Yáñez, A. J. Argüelles-Cruz, O. Camacho-Nieto, and C. Yáñez-Márquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.

[38] M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.

[39] C. Yáñez-Márquez, E. M. Felipe-Riverón, I. López-Yáñez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.

[40] I. López-Yáñez, L. Sheremetov, and C. Yáñez-Márquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.

[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.

[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets/.

[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml/.

[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation/.

[45] M. Harries, "Splice-2 comparative evaluation: electricity pricing," Tech. Rep., University of South Wales, Cardiff, UK, 1999.

[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.

[47] J. Gama, R. Sebastião, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.

[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka/.


incremental One-Class Support Vector Machine that assigns weights to each object according to its level of significance. They also introduce two schemes for estimating weights for new incoming data and examine their usefulness.

Another well-studied issue in the data stream context is concept drift detection. The change in the concept might occur due to changes in hidden variables affecting the phenomena or changes in the characteristics of the observed variables. According to [29], two main tactics are used in concept drift detection: (1) monitoring the evolution of performance indicators using statistical techniques, and (2) monitoring the distributions of two different time windows. Some algorithms adapt the decision model at regular intervals without taking into account whether change really happened, which involves extra processing time. On the other hand, explicit change methods can detect the point of change, quantify the change, and take the needed actions. In general, the methods for concept drift detection can be integrated with different base learners. Some methods related to this issue can be found in [30–32].
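As an illustration of tactic (1), an error-rate monitor in the spirit of the drift detection method of [32] can be sketched as follows. This is a simplified sketch under our own naming, not the mechanism used in this paper; the two- and three-standard-deviation thresholds follow the usual DDM convention.

```python
import math

class DDM:
    """Sketch of a drift detector in the spirit of [32]: track the online
    error rate and flag drift when it rises well above its historical minimum."""
    def __init__(self):
        self.n = 0
        self.p = 1.0                  # running error-rate estimate
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the base learner misclassified the last example, else 0."""
        self.n += 1
        self.p += (error - self.p) / self.n          # incremental mean
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.p + s < self.p_min + self.s_min:     # new best operating point
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + 3 * self.s_min:
            return "drift"
        if self.p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "in-control"

detector = DDM()
for _ in range(100):
    detector.update(0)        # stable period: no errors
print(detector.update(1))     # an error after a long error-free run: "drift"
```

On a drift signal, a typical reaction is to shrink the training window and rebuild the model from recent examples only.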

Some other issues related to data stream classification have also been addressed, such as imbalanced data stream classification [14, 15, 19], classification in the presence of uncertain data [8], classification with unlabeled instances [8, 33, 34], novel class detection [22, 35], and multilabel classification [36].

In this paper we describe a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier [37], which is a supervised pattern recognition model inspired by the Alpha-Beta associative memories [38]. The proposal has been developed to work in data stream scenarios using a sliding window approach, and it is capable of handling the space and time constraints inherent to this type of scenario. Various methods have been designed for data stream classification, but to our knowledge the associative approach has never been used for this kind of problem. The main contribution of this work is the implementation of a novel method for data stream classification based on an associative model.

The rest of this paper is organized as follows. Section 2 describes the materials and methods needed to develop our proposal. Section 3 describes how the experimental phase was conducted and discusses the results. Some conclusions are presented in Section 4, followed by the Acknowledgments and References.

2. Materials and Methods

2.1. Gamma Classifier. This classifier is a supervised method whose name is derived from the similarity operator that it uses: the generalized Gamma operator. This operator takes as input two binary vectors $x$ and $y$ and a positive integer $\theta$, and returns 1 if both vectors are similar ($\theta$ being the degree of allowed dissimilarity) or 0 otherwise. The Gamma operator builds on other operators (namely, $\alpha$, $\beta$, and $u_\beta$) that will be introduced first. The rest of this section is strongly based on [37, 38].

Table 1: The Alpha ($\alpha$) and Beta ($\beta$) operators.

(a) $\alpha: A \times A \to B$

    x   y   α(x, y)
    0   0   1
    0   1   0
    1   0   2
    1   1   1

(b) $\beta: B \times A \to A$

    x   y   β(x, y)
    0   0   0
    0   1   0
    1   0   0
    1   1   1
    2   0   1
    2   1   1

Definition 1 (Alpha and Beta operators). Given the sets $A = \{0, 1\}$ and $B = \{0, 1, 2\}$, the Alpha ($\alpha$) and Beta ($\beta$) operators are defined in tabular form, as shown in Table 1.
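In code, the $\alpha$ and $\beta$ operators are simply lookup tables; the following small sketch (an illustration of Table 1, not part of the original paper) encodes them as Python dictionaries:

```python
# Alpha: A x A -> B and Beta: B x A -> A, transcribed from Table 1.
ALPHA = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}
BETA  = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1, (2, 0): 1, (2, 1): 1}

print(ALPHA[(1, 0)])  # 2
print(BETA[(2, 0)])   # 1
```

Note that $\alpha(a, b) \bmod 2$ equals 1 exactly when $a = b$, a fact the Gamma operator exploits below.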

Definition 2 (Alpha operator applied to vectors). Let $x, y \in A^n$, with $n \in \mathbb{Z}^+$, be input column vectors. The output of $\alpha(x, y)$ is an $n$-dimensional vector whose $i$th component ($i = 1, 2, \ldots, n$) is computed as follows:

$$\alpha(x, y)_i = \alpha(x_i, y_i). \quad (1)$$

Definition 3 ($u_\beta$ operator). Considering the binary pattern $x \in A^n$ as input, this unary operator outputs the following nonnegative integer:

$$u_\beta(x) = \sum_{i=1}^{n} \beta(x_i, x_i). \quad (2)$$

Thus, if $x = [1\ 1\ 0\ 0\ 1\ 0]$, then

$$u_\beta(x) = \beta(1,1) + \beta(1,1) + \beta(0,0) + \beta(0,0) + \beta(1,1) + \beta(0,0) = 1 + 1 + 0 + 0 + 1 + 0 = 3. \quad (3)$$

Definition 4 (pruning operator). Let $x \in A^n$ and $y \in A^m$, with $n, m \in \mathbb{Z}^+$ and $n < m$, be two binary vectors; then $y$ pruned by $x$, denoted by $y \,\|\, x$, is the $n$-dimensional binary vector whose $i$th component is defined as follows:

$$(y \,\|\, x)_i = y_{i+m-n}, \quad (4)$$

where $i = 1, 2, \ldots, n$.

For instance, let $y = [1\ 1\ 1\ 1\ 0\ 1]$ and $x = [0\ 1\ 0]$, so that $m = 6$ and $n = 3$; then

$$(y \,\|\, x)_1 = y_{1+6-3} = y_4, \quad (y \,\|\, x)_2 = y_{2+6-3} = y_5, \quad (y \,\|\, x)_3 = y_{3+6-3} = y_6, \quad y \,\|\, x = [1\ 0\ 1]. \quad (5)$$

Thus, only the 4th, 5th, and 6th components of $y$ are used in $y \,\|\, x$. Notice that the first $m - n$ components of $y$ are discarded when building $y \,\|\, x$.
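In code, the pruning operator amounts to keeping the last $n$ components of the longer vector. The helper below is a hypothetical illustration of Definition 4, not the authors' code:

```python
def prune(y, x):
    """Return y pruned by x: the last len(x) components of y (Definition 4)."""
    return y[len(y) - len(x):]

print(prune([1, 1, 1, 1, 0, 1], [0, 1, 0]))  # [1, 0, 1], as in equation (5)
```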

The Gamma operator needs a binary vector as input. In order to deal with real and/or integer vectors, a method to represent such vectors in binary form is needed. In this work we use the modified Johnson-Möbius code [39], which is a variation of the classical Johnson-Möbius code. To convert a set of real/integer numbers into binary representations, we follow these steps:

(1) Let $R$ be a set of real numbers such that $R = \{r_1, r_2, \ldots, r_i, \ldots, r_n\}$, where $n \in \mathbb{Z}^+$.

(2) If there are one or more negative numbers in the set, create a new set by subtracting the minimum (say, $r_i$) from each number in $R$, obtaining a new set $T$ with only nonnegative real numbers: $T = \{t_1, t_2, \ldots, t_i, \ldots, t_n\}$, where $t_j = r_j - r_i$ for all $j \in \{1, 2, \ldots, n\}$, and in particular $t_i = 0$.

(3) Choose a fixed number $d$ and truncate each number of the set $T$ to only $d$ decimal positions.

(4) Scale up all the numbers of the set obtained in the previous step by multiplying them by $10^d$, leaving only nonnegative integers: $E = \{e_1, e_2, \ldots, e_m, \ldots, e_n\}$, where $e_j = t_j \cdot 10^d$ for all $j \in \{1, 2, \ldots, n\}$, and $e_m$ is the maximum value of the set $E$.

(5) For each element $e_j \in E$, with $j = 1, 2, \ldots, n$, concatenate $(e_m - e_j)$ zeroes followed by $e_j$ ones.

For instance, let $R \subset \mathbb{R}$ be defined as $R = \{2.4, -0.7, 1.8, 1.5\}$; now let us use the modified Johnson-Möbius code to convert the elements of $R$ into binary vectors.

(1) Subtract the minimum. Since $-0.7$ is the minimum in $R$, the members of the transformed set $T$ are obtained by subtracting $-0.7$ from each member in $R$: $t_i = r_i - (-0.7) = r_i + 0.7$. Consider

$$R \to T \subset \mathbb{R}, \qquad T = \{3.1, 0, 2.5, 2.2\}. \quad (6)$$

(2) Scale up the numbers. We select $d = 1$, so each number is truncated to 1 decimal and then multiplied by 10 ($10^d = 10^1 = 10$). One has

$$T \to E \subset \mathbb{Z}^+, \qquad E = \{31, 0, 25, 22\}. \quad (7)$$

(3) Concatenate $(e_m - e_j)$ zeroes followed by $e_j$ ones, where $e_m$ is the maximum number in $E$ and $e_j$ is the current number to be coded. Given that the maximum number in $E$ is 31, all the binary vectors have 31 components; that is, they are all 31 bits long. For example, $e_4 = 22$ is converted into its binary representation $c_4$ by appending $e_m - e_4 = 31 - 22 = 9$ zeroes followed by $e_4 = 22$ ones, which gives the final vector "0000000001111111111111111111111". Consider

$$E \to C = \{c_i \mid c_i \in A^{31}\},$$
$$C = \{[1111111111111111111111111111111],\ [0000000000000000000000000000000],\ [0000001111111111111111111111111],\ [0000000001111111111111111111111]\}. \quad (8)$$

Definition 5 (Gamma operator). The similarity Gamma operator takes two binary patterns $x \in A^n$ and $y \in A^m$, with $n, m \in \mathbb{Z}^+$ and $n \le m$, and a nonnegative integer $\theta$ as input, and outputs a binary number, for which there are two cases.

Case 1. If $n = m$, then the output is computed according to the following:

$$\gamma_g(x, y, \theta) = \begin{cases} 1 & \text{if } m - u_\beta[\alpha(x, y) \bmod 2] \le \theta, \\ 0 & \text{otherwise}, \end{cases} \quad (9)$$

where mod denotes the usual modulo operation.

Case 2. If $n < m$, then the output is computed using $y \,\|\, x$ instead of $y$, as follows:

$$\gamma_g(x, y, \theta) = \begin{cases} 1 & \text{if } m - u_\beta[\alpha(x, y \,\|\, x) \bmod 2] \le \theta, \\ 0 & \text{otherwise}. \end{cases} \quad (10)$$

In order to better illustrate how the generalized Gamma operator works, let us work through an example of its application. If $x = [1\ 0\ 1\ 1\ 1]$, $y = [1\ 0\ 0\ 1\ 0]$, and $\theta = 1$, what is the result of $\gamma_g(x, y, \theta)$? First we calculate $\alpha(x, y) = [1\ 1\ 2\ 1\ 2]$; then

$$\alpha(x, y) \bmod 2 = [1\ 1\ 2\ 1\ 2] \bmod 2 = [1\ 1\ 0\ 1\ 0]; \quad (11)$$

now

$$u_\beta(\alpha(x, y) \bmod 2) = u_\beta([1\ 1\ 0\ 1\ 0]) = \beta(1,1) + \beta(1,1) + \beta(0,0) + \beta(1,1) + \beta(0,0) = 1 + 1 + 0 + 1 + 0 = 3, \quad (12)$$

and since $m = 5$, $5 - 3 = 2$; given that $2 > (\theta = 1)$, the result of $\gamma_g$ is 0, and it is decided that, given $\theta$, the two vectors are not similar. However, if we increase $\theta$ to 2, the result of $\gamma_g$ will be 1, since $2 \le (\theta = 2)$; in this case the two vectors are considered similar. The main idea of the generalized Gamma operator is to indicate (with a result of 1) that two binary vectors are similar, allowing up to $\theta$ bits to be different and still considering those vectors similar. If more than $\theta$ bits are different, the vectors are said to be not similar (result equal to 0). Thus, if $\theta = 0$, both vectors must be equal for $\gamma_g$ to output 1.
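The two cases of Definition 5 can be written compactly as follows; this is a minimal sketch under our own naming, not the authors' code. It uses the fact that $\alpha(a, b) \bmod 2$ equals 1 exactly when $a = b$, so $u_\beta$ of that vector counts the matching positions:

```python
def gamma_g(x, y, theta):
    """Generalized Gamma operator: 1 if x and y are theta-similar, else 0."""
    m = len(y)
    if len(x) < m:                 # Case 2: prune y, keeping its last n bits
        y = y[m - len(x):]
    # alpha(a, b) mod 2 == 1 iff a == b, and u_beta counts the ones.
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return 1 if m - matches <= theta else 0

x = [1, 0, 1, 1, 1]
y = [1, 0, 0, 1, 0]
print(gamma_g(x, y, 1), gamma_g(x, y, 2))  # 0 1, as in the worked example
```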

2.1.1. Gamma Classifier Algorithm. Let $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$ be the fundamental pattern set, with cardinality $p$; when a test pattern $x \in \mathbb{R}^n$, with $n \in \mathbb{Z}^+$, is presented to the Gamma classifier, these steps are followed:

(1) Code the fundamental set using the modified Johnson-Möbius code, obtaining a value $e_m$ for each component of the $n$-dimensional vectors in the fundamental set. These $e_m$ values are calculated from the original real-valued vectors as

$$e_m(j) = \mathrm{MAX}_{i=1}^{p}\,(x_{ij}) \quad \forall j \in \{1, 2, \ldots, n\}. \quad (13)$$

(2) Compute the stop parameter $\rho = \mathrm{MAX}_{j=1}^{n}[e_m(j)]$.

(3) Code the test pattern using the modified Johnson-Möbius code with the same parameters used to code the fundamental set. If any $e_j$ (obtained by offsetting, scaling, and truncating the original $x_j$) is greater than the corresponding $e_m(j)$, code it using the larger number of bits.

(4) Transform the index of all fundamental patterns into two indices, one for their class and another for their position in the class (e.g., $x^\mu$, which is the $\omega$th pattern of class $i$, becomes $x^{i\omega}$).

(5) Initialize $\theta$ to 0.

(6) If $\theta = 0$, test whether $x$ is a fundamental pattern by calculating $\gamma_g(x^\mu_j, x_j, 0)$ for $\mu = 1, 2, \ldots, p$ and then computing the initial weighted addition $c^0_\mu$ for each fundamental pattern as follows:

$$c^0_\mu = \sum_{j=1}^{n} \gamma_g(x^\mu_j, x_j, 0) \quad \text{for } \mu = 1, 2, \ldots, p. \quad (14)$$

If there is a unique maximum whose value equals $n$, assign the class associated with that maximum to the test pattern:

$$y = y^\sigma \quad \text{such that } \mathrm{MAX}_{i=1}^{p}\, c^0_i = c^0_\sigma = n. \quad (15)$$

(7) Calculate $\gamma_g(x^{i\omega}_j, x_j, \theta)$ for each component of the fundamental patterns.

(8) Compute a weighted sum $c_i$ for each class, according to this equation:

$$c_i = \frac{\sum_{\omega=1}^{k_i} \sum_{j=1}^{n} \gamma_g(x^{i\omega}_j, x_j, \theta)}{k_i}, \quad (16)$$

where $k_i$ is the cardinality in the fundamental set of class $i$.

(9) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (7) and (8) until there is a unique maximum or the stop condition $\theta > \rho$ is fulfilled.

(10) If there is a unique maximum among the $c_i$, assign $x$ to the class corresponding to such maximum:

$$y = y^j \quad \text{such that } \mathrm{MAX}\, c_i = c_j. \quad (17)$$

(11) Otherwise, assign $x$ to the class of the first maximum.

As an example, let us consider the following fundamental patterns:

$$x^1 = \begin{pmatrix} 2 \\ 8 \end{pmatrix}, \quad x^2 = \begin{pmatrix} 1 \\ 9 \end{pmatrix}, \quad x^3 = \begin{pmatrix} 6 \\ 3 \end{pmatrix}, \quad x^4 = \begin{pmatrix} 7 \\ 4 \end{pmatrix}, \quad (18)$$

grouped in two classes: $C_1 = \{x^1, x^2\}$ and $C_2 = \{x^3, x^4\}$. The patterns to be classified are

$$\tilde{x}^1 = \begin{pmatrix} 6 \\ 2 \end{pmatrix}, \quad \tilde{x}^2 = \begin{pmatrix} 3 \\ 9 \end{pmatrix}. \quad (19)$$

As can be seen, the dimension of all patterns is $n = 2$, and there are 2 classes, both with the same cardinality: $k_1 = 2$ and $k_2 = 2$. Now the steps in the algorithm are followed.

(1) Code the fundamental set with the modified Johnson-Möbius code, obtaining a value $e_m$ for each component:

$$x^1 = \begin{bmatrix} 0000011 \\ 011111111 \end{bmatrix}, \quad x^2 = \begin{bmatrix} 0000001 \\ 111111111 \end{bmatrix}, \quad x^3 = \begin{bmatrix} 0111111 \\ 000000111 \end{bmatrix}, \quad x^4 = \begin{bmatrix} 1111111 \\ 000001111 \end{bmatrix}. \quad (20)$$

Since the maximum value among the first components of all fundamental patterns is 7, $e_m(1) = 7$, and the maximum value among the second components is $e_m(2) = 9$. Thus, the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.

(2) Compute the stop parameter:

$$\rho = \mathrm{MAX}_{j=1}^{n}[e_m(j)] = \mathrm{MAX}[e_m(1), e_m(2)] = \mathrm{MAX}(7, 9) = 9. \quad (21)$$

(3) Code the patterns to be classified with the modified Johnson-Möbius code, using the same parameters used with the fundamental set:

$$\tilde{x}^1 = \begin{bmatrix} 0111111 \\ 000000011 \end{bmatrix}, \quad \tilde{x}^2 = \begin{bmatrix} 0000111 \\ 111111111 \end{bmatrix}. \quad (22)$$

Again, the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.

(4) Transform the index of all fundamental patterns into two indices, one for the class they belong to and another for their position in the class:

$$x^1 = x^{11}, \quad x^2 = x^{12}, \quad x^3 = x^{21}, \quad x^4 = x^{22}. \quad (23)$$

Given that $C_1 = \{x^1, x^2\}$ and $C_2 = \{x^3, x^4\}$, we know that both $x^1$ and $x^2$ belong to class $C_1$, so the first index for both patterns becomes 1; the second index is used to differentiate between patterns assigned to the same class, so $x^1$ becomes $x^{11}$ and $x^2$ becomes $x^{12}$. Something similar happens to $x^3$ and $x^4$, but with the first index being 2, since they belong to class $C_2$: $x^3$ becomes $x^{21}$ and $x^4$ becomes $x^{22}$.

(5) Initialize $\theta = 0$.

(6) Calculate $\gamma_g(x^{i\omega}_j, \tilde{x}_j, \theta)$ for each component of the fundamental patterns in each class. Thus, for $\tilde{x}^1$ we have

$$\gamma_g(x^{11}_1, \tilde{x}^1_1, 0) = 0, \qquad \gamma_g(x^{11}_2, \tilde{x}^1_2, 0) = 0,$$
$$\gamma_g(x^{12}_1, \tilde{x}^1_1, 0) = 0, \qquad \gamma_g(x^{12}_2, \tilde{x}^1_2, 0) = 0,$$
$$\gamma_g(x^{21}_1, \tilde{x}^1_1, 0) = 1, \qquad \gamma_g(x^{21}_2, \tilde{x}^1_2, 0) = 0,$$
$$\gamma_g(x^{22}_1, \tilde{x}^1_1, 0) = 0, \qquad \gamma_g(x^{22}_2, \tilde{x}^1_2, 0) = 0. \quad (24)$$

Since $\theta = 0$, for $\gamma_g$ to give a result of 1 it is necessary that both vectors $x^{i\omega}_j$ and $\tilde{x}^1_j$ are equal. But only one fundamental pattern has its first component equal to that of $\tilde{x}^1$, namely $x^{21}_1 = [0111111] = \tilde{x}^1_1$, while no fundamental pattern has a second component equal to that of the test pattern. Given that this is the only case in which a component of a fundamental pattern is equal to the same component of the test pattern, it is also the only instance in which $\gamma_g$ outputs 1.

Now for $\tilde{x}^2$ we have

$$\gamma_g(x^{11}_1, \tilde{x}^2_1, 0) = 0, \qquad \gamma_g(x^{11}_2, \tilde{x}^2_2, 0) = 0,$$
$$\gamma_g(x^{12}_1, \tilde{x}^2_1, 0) = 0, \qquad \gamma_g(x^{12}_2, \tilde{x}^2_2, 0) = 1,$$
$$\gamma_g(x^{21}_1, \tilde{x}^2_1, 0) = 0, \qquad \gamma_g(x^{21}_2, \tilde{x}^2_2, 0) = 0,$$
$$\gamma_g(x^{22}_1, \tilde{x}^2_1, 0) = 0, \qquad \gamma_g(x^{22}_2, \tilde{x}^2_2, 0) = 0. \quad (25)$$

Again $\theta = 0$, thus forcing both vectors $x^{i\omega}_j$ and $\tilde{x}^2_j$ to be equal in order to obtain a 1. Similarly to what happened in the case of $\tilde{x}^1$, there is but one instance of such an occurrence: $x^{12}_2 = [111111111] = \tilde{x}^2_2$.

(7) Compute a weighted sum $c_i$ for each class. For $\tilde{x}^1$:

$$c_1 = \frac{\sum_{\omega=1}^{k_1} \sum_{j=1}^{n} \gamma_g(x^{1\omega}_j, \tilde{x}^1_j, \theta)}{k_1} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{1\omega}_j, \tilde{x}^1_j, \theta)}{2} = \frac{(0+0) + (0+0)}{2} = \frac{0}{2} = 0. \quad (26)$$

Here we add together the results obtained on all components of all fundamental patterns belonging to class $C_1$. Since all gave 0 (i.e., none of these were similar to the corresponding component of the test pattern, given the value of $\theta$ in effect), the result of the weighted addition for class $C_1$ is $c_1 = 0$. One has

$$c_2 = \frac{\sum_{\omega=1}^{k_2} \sum_{j=1}^{n} \gamma_g(x^{2\omega}_j, \tilde{x}^1_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{2\omega}_j, \tilde{x}^1_j, \theta)}{2} = \frac{(1+0) + (0+0)}{2} = \frac{1}{2} = 0.5. \quad (27)$$

In this case there was one instance similar to thecorresponding component of the test pattern (equalsince 120579 = 0 during this run) Thus the weightedaddition for class 1198622 is 1198882 = 05 given that the sum ofresults is divided by the cardinality of the class 119896 = 2And for 1199092

1198881 =sum1198961120596=1sum119899119895=1 120574119892 (119909

1120596119895 119909

2119895 120579)

1198961

=

sum2120596=1sum

2119895=1 120574119892 (119909

1120596119895 119909

2119895 120579)

2

=

sum2120596=1 [(0 + 0) (0 + 1)]

2=

(0 + 0)2

= 05

(28)

There was one result of 120574119892 equal to 1 which coupledwith the cardinality of class1198621 being 2makes 1198881 = 05One has

$$c_2 = \frac{\sum_{\omega=1}^{k_2}\sum_{j=1}^{n}\gamma_g(x^{2\omega}_j, \tilde{x}^2_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2}\sum_{j=1}^{2}\gamma_g(x^{2\omega}_j, \tilde{x}^2_j, \theta)}{2} = \frac{(0+0)+(0+0)}{2} = 0. \tag{29}$$

No similarities were found between the fundamental patterns of class $C_2$ and the test pattern $\tilde{x}^2$ (for $\theta = 0$); thus $c_2 = 0$.

(8) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (6) and (7) until there is a unique maximum or the stop condition $\theta \ge \rho$ is fulfilled. There is a unique maximum (for each test pattern), so we go directly to step (9).

(9) If there is a unique maximum, assign $\tilde{y}$ to the class corresponding to such maximum. For $\tilde{x}^1$:

$$C_{\tilde{x}^1} = C_2, \quad \text{since } \max_{i=1}^{2} c_i = \max(0, 0.5) = 0.5 = c_2, \tag{30}$$

while for $\tilde{x}^2$:

$$C_{\tilde{x}^2} = C_1, \quad \text{since } \max_{i=1}^{2} c_i = \max(0.5, 0) = 0.5 = c_1. \tag{31}$$

(10) Otherwise, assign $\tilde{y}$ to the class of the first maximum. This is unnecessary, since both test patterns have already been assigned a class.
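The classification steps (6)-(10) worked through above can be condensed into a short sketch. The operator `gamma_g` below is a simplified stand-in for the generalized Gamma operator (it returns 1 when two components differ by at most θ, which matches the example, where θ = 0 forces equality), and the class and pattern values are illustrative toy data, not the paper's:

```python
# Simplified stand-in for the generalized Gamma operator: similarity is 1
# when the components differ by at most theta (theta = 0 means equality).
def gamma_g(a, b, theta):
    return 1 if abs(a - b) <= theta else 0

def classify(classes, x_test, rho):
    """classes: dict label -> list of fundamental patterns (lists of ints).
    Steps (7)-(10): compute the weighted sum c_i per class, and increment
    theta until the maximum is unique or theta >= rho (stop condition)."""
    theta = 0
    while True:
        c = {}
        for label, patterns in classes.items():
            total = sum(gamma_g(p_j, x_j, theta)
                        for p in patterns
                        for p_j, x_j in zip(p, x_test))
            c[label] = total / len(patterns)  # divide by class cardinality k_i
        best = max(c.values())
        winners = [lab for lab, v in c.items() if v == best]
        if len(winners) == 1 or theta >= rho:
            return winners[0]  # unique maximum, or first maximum at the stop
        theta += 1

# Toy data mirroring the example: c1 = 0 and c2 = 0.5 for this test pattern,
# as in Eqs. (26)-(27), so the pattern is assigned to class C2.
classes = {"C1": [[0, 0], [2, 2]], "C2": [[1, 0], [0, 0]]}
print(classify(classes, [1, 1], rho=3))  # -> C2
```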

2.2. Sliding Windows. A widespread approach for data stream mining is the use of a sliding window to keep only a representative portion of the data. We are not interested in keeping all the information of the data stream, particularly when we have to meet the memory space constraint required for algorithms working in this context. This technique is also able to deal with concept drift, eliminating those data points that come from an old concept. According to [5], the training window size can be fixed or variable over time.

Sliding windows of a fixed size store in memory a fixed number of the $w$ most recent examples. Whenever a new example arrives, it is saved to memory and the oldest one is discarded. These types of windows are similar to first-in first-out data structures. This simple adaptive learning method is often used as a baseline in the evaluation of new algorithms.
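This first-in first-out behaviour maps directly onto a bounded double-ended queue; a minimal illustration (the window size w = 3 is arbitrary):

```python
from collections import deque

# A fixed-size sliding window: appending to a full deque with maxlen=w
# silently discards the oldest element, i.e., first-in first-out.
w = 3
window = deque(maxlen=w)
for example in [1, 2, 3, 4, 5]:
    window.append(example)

print(list(window))  # the w most recent examples: [3, 4, 5]
```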

Sliding windows of variable size vary the number of examples in a window over time, typically depending on the indications of a change detector. A straightforward approach is to shrink the window whenever a change is detected, such that the training data reflects the most recent concept, and to grow the window otherwise.

2.3. Proposed Method. The Gamma classifier has previously been used for time series prediction in air pollution [37] and oil well production [40], showing very promising results. The current work takes the Gamma classifier and combines it with a sliding window approach, with the aim of developing a new method capable of performing classification over a continuous data stream, referred to as the DS-Gamma classifier.


(1) Initialize window
(2) for each $x_\tau$ do
(3)   if $\tau < W$ then
(4)     Add $x_\tau$ to the tail of the window
(5)   else
(6)     Use $x_\tau$ to classify
(7)     Add $x_\tau$ to the tail of the window

Algorithm 1: Learning policy algorithm.

A data stream $S$ can be defined as a sequence of data elements of the form $S = s_1, s_2, \ldots, s_{now}$ from a universe of size $D$, where $D$ is huge and potentially infinite but countable. We assume that each element $s_i$ has the following form:

$$s_i = (x_\tau, y_\tau), \tag{32}$$

where $x_\tau$ is an input vector of size $n$ and $y_\tau$ is the class label associated with it. $\tau \in \mathbb{Z}^+$ is an index indicating the sequential order of the elements in the data stream. $\tau$ is bounded by the size of the data stream, such that if the data stream is infinite, $\tau$ takes infinitely many (but countable) values.

One of the main challenges for mining data streams is the ability to maintain an accurate decision model. The current model has to incorporate new knowledge from new data, and it should also forget old information when the data is outdated. To address these issues, the proposed method implements a learning and a forgetting policy. The learning (insertion) and forgetting (removal) policies were designed to guarantee the spatial and temporal relevance of the data and the consistency of the current model.

The proposed method implements a fixed window approach. This approach achieves concept drift detection in an indirect way, by considering only the most recent data. Algorithm 1 outlines the general steps used as the learning policy. Initially, the window is created with the first $W$ labeled records, where $W$ is the size of the window. After that, every sample is used for classification and evaluation of the performance, and then it is used to update the window.

Classifiers that deal with concept drift are forced to implement forgetting, adaptation, or drift detection mechanisms in order to adjust to changing environments. In this proposal, a forgetting mechanism is implemented. Algorithm 2 outlines the basic forgetting policy implemented in this proposal: the oldest element in the window, $x_\delta$, is removed.
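Taken together, the learning policy of Algorithm 1 and the forgetting policy of Algorithm 2 amount to a test-then-train loop over a bounded buffer. A compact sketch, in which the classifier is an arbitrary placeholder function and the deque's `maxlen` implements the removal of the oldest element:

```python
from collections import deque

def run_stream(stream, W, classify):
    """Learning + forgetting policies (Algorithms 1 and 2): the first W
    labeled records fill the window; afterwards each record is classified
    first and then appended, while the oldest record is dropped."""
    window = deque(maxlen=W)   # maxlen implements the forgetting policy
    predictions = []
    for x, y in stream:
        if len(window) == W:               # window warmed up: classify first
            predictions.append(classify(window, x))
        window.append((x, y))              # learning policy: add to the tail
    return predictions

# Toy run: the placeholder "classifier" predicts the label of the newest
# element currently in the window.
stream = [([i], i % 2) for i in range(6)]
preds = run_stream(stream, W=3, classify=lambda win, x: win[-1][1])
print(preds)  # predictions for the 3 records after warm-up: [0, 1, 0]
```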

2.4. MOA. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [41]. MOA contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation on data streams with concept drift. MOA can also be used to build synthetic data streams using generators, reading ARFF files, or joining several streams. MOA stream generators allow simulating potentially infinite sequences of

Table 2: Dataset characteristics.

Dataset      Instances  Attributes  Classes
Agrawal         100000          10        2
Bank             45211          16        2
CoverType       581012          54        7
Electricity      45312           8        2
Hyperplane      100000          10        2
RandomRBF       100000          10        2

data with different characteristics (different types of concept drift and recurrent concepts, among others). All the algorithms used for the comparative analysis are implemented under the MOA framework.

2.5. Data Streams. This section provides a brief description of the data streams used during the experimental phase. The proposed solution has been tested using synthetic and real data streams. For data stream classification problems, only a few suitable data streams can be successfully used for evaluation; a few data streams with enough examples are hosted in [42, 43]. For further evaluation, the use of synthetic data stream generators becomes necessary. The synthetic data streams used in our experiments were generated using the MOA framework mentioned in the previous section. A summary of the main characteristics of the data streams is shown in Table 2. Descriptions of the data streams were taken from [42, 43] in the case of the real ones and from [44] for the synthetic ones.

2.5.1. Real World Data Streams

Bank Marketing Dataset. The Bank Marketing dataset is available from the UCI Machine Learning Repository [43]. This data is related to the direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe a term deposit. It contains 45211 instances; each instance has 16 attributes and the class label. The dataset has two classes, identifying whether a bank term deposit would or would not be subscribed.

Electricity Dataset. This data was collected from the Australian New South Wales electricity market. In this market, prices are not fixed and are affected by the demand and supply of the market. The prices of the market are set every five minutes. The dataset was first described by Harries in [45]. During the time period described in the dataset, the electricity market was expanded with the inclusion of adjacent areas, which led to more elaborate management of the supply: the excess production of one region could be sold in the adjacent region. The Electricity dataset contains 45312 instances. Each example of the dataset refers to a period of 30 minutes (i.e., there are 48 instances for each time period of one day). Each example in the dataset has 8 fields: the day of the week, the time stamp, the New South Wales electricity demand, the New South Wales electricity price, the Victoria electricity demand, the Victoria electricity price, the scheduled electricity transfer


(1) for each $x_\tau$ do
(2)   if $\tau > W$ then
(3)     Remove $x_\delta$ ($\delta = \tau - W + 1$) from the beginning of the window

Algorithm 2: Forgetting policy algorithm.

between states, and the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. We use the normalized version of this dataset, so that the numerical values are between 0 and 1. The dataset is available from [42].

Forest CoverType Dataset. The Forest CoverType dataset is one of the largest datasets available from the UCI Machine Learning Repository [43]. This dataset contains information that describes forest cover types from cartographic variables. A given observation (30 × 30 meter cell) was determined from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581012 instances and 54 attributes. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types). The dataset has 7 classes that identify the type of forest cover.

2.5.2. Synthetic Data Streams

Agrawal Data Stream. This generator was originally introduced by Agrawal et al. in [46]. The MOA implementation was used to generate the data streams for the experimental phase of this work. The generator produces a stream containing nine attributes. Although not explicitly stated by the authors, a sensible conclusion is that these attributes describe hypothetical loan applications. There are ten functions defined for generating binary class labels from the attributes; presumably, these determine whether the loan should be approved. Perturbation shifts numeric attributes from their true value, adding an offset drawn randomly from a uniform distribution, the range of which is a specified percentage of the total value range [44]. For each experiment, a data stream with 100000 examples was generated.

RandomRBF Data Stream. This generator was devised to offer an alternative complex concept type that is not straightforward to approximate with a decision tree model. The RBF (Radial Basis Function) generator works as follows. A fixed number of random centroids are generated. Each center has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a center at random, taking weights into consideration so that centers with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the central point. The length of the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid. The chosen centroid also determines the class label of the example. Only numeric attributes are generated [44]. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.
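The generation scheme just described can be sketched in a few lines; all parameter values (number of centroids, attributes, and classes) and helper names are illustrative choices, not MOA's actual implementation:

```python
import random

random.seed(7)

# Fixed random centroids, each with a position, a standard deviation,
# a class label, and a weight (illustrative parameters).
n_centroids, n_attrs, n_classes = 5, 10, 2
centroids = [{
    "center": [random.random() for _ in range(n_attrs)],
    "stdev": random.random(),
    "label": random.randrange(n_classes),
    "weight": random.random(),
} for _ in range(n_centroids)]

def next_example():
    # Pick a centroid with probability proportional to its weight.
    cen = random.choices(centroids, weights=[c["weight"] for c in centroids])[0]
    # Random direction, then a Gaussian-length displacement from the center.
    direction = [random.gauss(0, 1) for _ in range(n_attrs)]
    norm = sum(v * v for v in direction) ** 0.5
    length = random.gauss(0, cen["stdev"])
    x = [ci + length * v / norm for ci, v in zip(cen["center"], direction)]
    return x, cen["label"]  # the class label comes from the chosen centroid

x, y = next_example()
```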

Rotating Hyperplane Data Stream. This synthetic data stream was generated using the MOA framework. A hyperplane in $d$-dimensional space is the set of points $x$ that satisfy [44]

$$\sum_{i=1}^{d} x_i w_i = w_0 = \sum_{i=1}^{d} w_i, \tag{33}$$

where $x_i$ is the $i$th coordinate of $x$. Examples for which $\sum_{i=1}^{d} x_i w_i \ge w_0$ are labeled positive, and examples for which $\sum_{i=1}^{d} x_i w_i < w_0$ are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because the orientation and position of the hyperplane can be adjusted in a gradual way by changing the relative size of the weights. In MOA, change is introduced to this data stream by adding drift to each weight attribute using the formula $w_i = w_i + d\sigma$, where $\sigma$ is the probability that the direction of change is reversed and $d$ is the change applied to every example. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.

2.6. Algorithm Evaluation. One of the objectives of this work is to perform a consistent comparison between the classification performance of our proposal and the classification performance of other well-known data stream pattern classification algorithms. Evaluation of data stream algorithms is a relatively new field that has not been studied as thoroughly as evaluation in batch learning. In traditional batch learning, evaluation has been mainly limited to evaluating a model using different random reorderings of the same static dataset ($k$-fold cross validation, leave-one-out, and bootstrap) [2]. For data stream scenarios, the problem of infinite data raises new challenges. Moreover, batch approaches cannot effectively measure the accuracy of a model in a context with concepts that change over time. In the data stream setting, one of the main concerns is how to design an evaluation practice that allows us to obtain a reliable measure of accuracy over time. According to [41], two main approaches arise.

2.6.1. Holdout. When traditional batch learning reaches a scale where cross validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. An extension of this approach can be applied to data stream scenarios: first, a set of $M$ instances is used to train the model; then a set of the following unseen $N$ instances is used to test the model; then the model is again trained with the next $M$ instances and tested with the subsequent $N$ instances, and so forth.


2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data, which is sometimes difficult to obtain, while prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different mechanisms of evaluation, including the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors. They stated that the use of the prequential error with forgetting mechanisms proves advantageous in assessing performance and in comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we have selected the prequential error over a sliding window for the evaluation of the proposed method. The prequential error $P_W$ for a sliding window of size $W$, computed at time $i$, is based on an accumulated sum of a 0-1 loss function $L$ between the predicted values $\hat{y}_k$ and the observed values $y_k$, using the following expression [47]:

$$P_W(i) = \frac{1}{W} \sum_{k=i-W+1}^{i} L(\hat{y}_k, y_k). \tag{34}$$
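Equation (34) can be computed incrementally with a bounded buffer of 0-1 losses; a minimal sketch (during the first $W-1$ steps the strict formula would index losses before the start of the stream, and here those missing terms are treated as zero):

```python
from collections import deque

def prequential_error(losses, W):
    """Prequential error over a sliding window, Eq. (34):
    P_W(i) = (1/W) * sum of the 0-1 losses L(y_hat_k, y_k)
    for the W most recent predictions, k = i-W+1, ..., i."""
    window = deque(maxlen=W)
    errors = []
    for loss in losses:      # loss = 0 if the prediction was correct, else 1
        window.append(loss)
        errors.append(sum(window) / W)
    return errors

# 0-1 losses for 6 consecutive predictions, window of size W = 4.
print(prequential_error([0, 1, 0, 0, 1, 1], W=4))
# -> [0.0, 0.25, 0.25, 0.25, 0.5, 0.5]
```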

We also evaluate the performance results of each competing algorithm against the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test that ranks the differences in performance of two classifiers over the datasets, ignoring the signs, and compares the ranks for the positive and the negative differences.
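As an illustration of the signed-ranks computation, the sketch below reimplements the $W$ statistic from scratch; the function name and the use of the Table 5 accuracies of DS-Gamma and Hoeffding Tree as sample input are our own illustrative choices, not part of the paper's tooling:

```python
def wilcoxon_w(perf_a, perf_b):
    """Wilcoxon signed-ranks statistic W = min(R+, R-): rank the absolute
    performance differences (zero differences dropped, ties share their
    average rank), then sum the ranks of the positive and of the negative
    differences separately."""
    diffs = [a - b for a, b in zip(perf_a, perf_b) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + 1 + j) / 2          # average of the 1-based ranks i+1..j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(r_plus, r_minus)

# Accuracies over the six data streams, taken from Table 5.
ds_gamma = [92.82, 78.96, 47.77, 77.14, 86.30, 70.85]
hoeffding = [94.39, 88.52, 72.29, 82.41, 82.41, 86.72]
print(wilcoxon_w(ds_gamma, hoeffding))  # -> 2.0
```

For n = 6 datasets, W = 2.0 exceeds the two-sided critical value at α = 0.05, so the null hypothesis is not rejected, consistent with the non-rejection reported in Table 7.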

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. Classification performance of each of the compared algorithms was calculated using the prequential error approach. These results are used to compare the performance of our proposal with that of other data stream classification algorithms. We have aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on the implementation of these algorithms can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times, and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset the experiment was executed just one time. The Bank and CoverType datasets were reordered 10 times using the randomized filter from the WEKA platform [49] for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each one of them, using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification, to evaluate the efficiency of each algorithm. All experiments were conducted on a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system, with 4 GB of RAM.

3. Results and Discussion

In this section, we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance of each of the compared classification algorithms. First, we evaluate the sensitivity of the proposed method to changes in the window size, a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized with boldface. Table 4 shows the average total time used to classify each data stream. The results of the average performance, with indicators of the standard deviation, are also depicted in Figure 1.

In general, the DS-Gamma classifier performance improved as the window size increased, with the exception of the Electricity data stream, for which performance is better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with a performance of 89.12%. This performance is better than all the performances achieved by the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance keeps stable and in some cases even degrades, as we can observe in the results for the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination $R^2$ were included in the graphic for each data stream. The $R^2$ values, ranging between 0.9743 and 0.9866, show that time grows linearly as the size of the window increases.

To evaluate the DS-Gamma classifier, we compared its performance with that of other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation. The best performance is emphasized with boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and


Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset     |    50 |   100 |   200 |   500 |   750 |  1000 |  1250 |  1500 |  1750 |  2000
Agrawal     | 79.64 | 85.97 | 89.86 | 92.59 | 92.88 | 92.82 | 92.67 | 92.51 | 92.32 | 92.07
Bank        | 68.19 | 73.40 | 77.05 | 79.13 | 79.12 | 78.96 | 78.60 | 78.24 | 77.76 | 77.37
CoverType   | 38.21 | 37.61 | 39.70 | 44.30 | 46.59 | 47.77 | 48.62 | 49.19 | 49.51 | 49.74
Electricity | 89.12 | 85.26 | 81.71 | 79.84 | 78.58 | 77.14 | 76.36 | 75.94 | 75.31 | 74.49
Hyperplane  | 78.25 | 82.65 | 85.28 | 86.56 | 86.55 | 86.30 | 85.98 | 85.65 | 85.34 | 85.00
RandomRBF   | 64.61 | 67.39 | 69.33 | 70.56 | 70.81 | 70.85 | 70.74 | 70.63 | 70.50 | 70.33

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

Dataset     |    50 |    100 |    200 |    500 |    750 |   1000 |   1250 |   1500 |   1750 |   2000
Agrawal     |  2.24 |   4.50 |   7.93 |  19.26 |  28.04 |  39.00 |  45.92 |  57.57 |  63.90 |  73.49
Bank        |  0.66 |   1.14 |   2.07 |   4.75 |   7.01 |   9.17 |  11.54 |  14.55 |  15.83 |  17.87
CoverType   | 83.47 | 140.81 | 251.20 | 340.05 | 254.12 | 335.03 | 428.94 | 522.39 | 669.24 | 676.99
Electricity |  0.34 |   0.53 |   0.98 |   2.33 |   3.56 |   4.63 |   5.99 |   7.05 |   8.30 |   9.49
Hyperplane  |  2.24 |   4.01 |   7.56 |  18.19 |  26.76 |  30.28 |  44.28 |  52.96 |  61.87 |  70.63
RandomRBF   |  2.79 |   4.09 |   7.73 |  18.54 |  27.35 |  30.72 |  45.02 |  54.12 |  62.23 |  71.04

Table 5: Averaged accuracy (%) comparison.

Classifier        | Agrawal       | Bank          | CoverType     | Electricity | Hyperplane    | RandomRBF
                  | Acc     σ     | Acc     σ     | Acc     σ     | Acc    σ    | Acc     σ     | Acc     σ
Bayes-DDM         | 88.16   0.19  | 83.38   1.41  | 67.61   6.81  | 81.05  —    | 86.02   1.39  | 71.87   3.82
Perceptron DDM    | 67.20   0.11  | 73.43  10.53  | 70.76   5.55  | 78.61  —    | 85.65   1.49  | 71.88   5.93
Hoeffding Tree    | 94.39   0.24  | 88.52   0.11  | 72.29   2.68  | 82.41  —    | 82.41   5.62  | 86.72   1.69
Active Classifier | 90.82   1.62  | 88.21   0.14  | 67.31   1.03  | 78.59  —    | 79.50   6.55  | 75.32   2.70
IBL-DS            | 86.76   0.48  | 86.52   0.21  | 61.73   9.46  | 75.34  —    | 84.20   0.77  | 93.32   1.21
AWE               | 92.17   0.59  | 87.90   0.38  | 76.16   3.66  | 78.36  —    | 85.88   1.65  | 91.98   1.14
OzaBoostAdwin     | 87.11   2.62  | 87.67   0.50  | 78.55   4.97  | 87.92  —    | 83.21   1.91  | 91.49   1.14
SGD               | 67.20   0.11  | 76.56   1.66  | 59.01   7.37  | 84.52  —    | 83.19   0.42  | 61.74   3.31
SPegasos          | 55.82   0.17  | 77.46   1.99  | 50.47   5.17  | 57.60  —    | 52.80   6.08  | 53.33   3.13
DS-Gamma          | 92.82   0.10  | 78.96   0.22  | 47.77   4.31  | 77.14  —    | 86.30   1.35  | 70.85   2.54

Table 6: Averaged classification time (seconds) comparison.

Classifier        | Agrawal        | Bank          | CoverType         | Electricity | Hyperplane     | RandomRBF
                  | Time     σ     | Time    σ     | Time       σ      | Time   σ    | Time     σ     | Time    σ
Bayes-DDM         | 0.62     0.11  | 0.39    0.08  | 17.76      1.62   | 0.61   —    | 0.70     0.01  | 0.69    0.01
Perceptron DDM    | 0.40     0.02  | 0.20    0.05  | 8.28       0.10   | 0.19   —    | 0.45     0.02  | 0.44    0.02
Hoeffding Tree    | 0.98     0.23  | 0.54    0.03  | 30.05      11.06  | 0.78   —    | 1.36     0.05  | 1.40    0.05
Active Classifier | 0.64     0.03  | 0.26    0.02  | 15.14      0.74   | 0.23   —    | 0.72     0.04  | 0.70    0.02
IBL-DS            | 53.19    2.11  | 38.51   8.26  | 1057.60    60.95  | 8.61   —    | 86.39    0.96  | 88.10   1.06
AWE               | 14.32    0.20  | 6.63    0.30  | 1012.47    304.41 | 6.70   —    | 18.03    3.64  | 37.52   3.03
OzaBoostAdwin     | 192.62   44.47 | 46.93   10.56 | 10589.73   3896.64| 39.75  —    | 223.99   51.34 | 57.04   16.66
SGD               | 0.14     0.04  | 0.11    0.009 | 40.35      0.07   | 0.12   —    | 0.27     0.02  | 0.27    0.02
SPegasos          | 0.22     0.006 | 0.11    0.016 | 16.84      0.12   | 0.12   —    | 0.31     0.05  | 0.24    0.01
DS-Gamma          | 39.00    3.58  | 9.16    0.05  | 335.03     12.91  | 4.63   —    | 30.28    0.79  | 30.72   0.94


Figure 1: Average accuracy of DS-Gamma for different window sizes (one panel per data stream: Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF; x-axis: window size from 50 to 2000; y-axis: accuracy, %).

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier on the CoverType data stream. This data stream presents a heavy class imbalance that seems to negatively affect the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased towards the class with the most members.

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using $\alpha = 0.05$. The obtained values are all greater than $\alpha$, so the null hypothesis is not rejected and


Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes (one panel per data stream: Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF), each with a linear trend line; the coefficients of determination $R^2$ range from 0.9743 to 0.9866.

we can infer that there are no significant differences among the compared algorithms.

4. Conclusions

In this paper, we described a novel method for the task of pattern classification over a continuous data stream, based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase, the proposed method was tested using different data stream scenarios, with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms       Asymptotic significance (2-sided)
Gamma-Bayes-DDM           0.345
Gamma-Perceptron DDM      0.917
Gamma-Hoeffding Tree      0.116
Gamma-Active Classifier   0.463
Gamma-IBL-DS              0.345
Gamma-AWE                 0.116
Gamma-OzaBoostAdwin       0.116
Gamma-SGD                 0.600
Gamma-SPegasos            0.075


Figure 3: Averaged accuracy (%) comparison; box plots per data stream (Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity), with the compared algorithms (Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma) on the x-axis.

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000. For larger window sizes, performance remained stable and in some cases even declined.

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained in the experiments performed with the CoverType dataset,


Figure 4: Averaged classification time (seconds) comparison; box plots per data stream (Agrawal, Bank, Hyperplane, CoverType, RandomRBF, and Electricity), with the compared algorithms on the x-axis.

a heavily imbalanced dataset, for which the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve the classification accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), CONACyT, and SNI for their economic support to develop this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to the statistical significance analysis.


References

[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: rich data and the increasing value of the internet of things," Tech. Rep., IDC Information and Data, 2014, http://www.emc.com/leadership/digital-universe/2014iview/index.htm.

[2] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press, Amsterdam, The Netherlands, 2010.

[3] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.

[4] J. Beringer and E. Hüllermeier, "Efficient instance-based learning on data streams," Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[5] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Computing Surveys, vol. 46, no. 4, article 44, 2014.

[6] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, Boston, Mass, USA, August 2000.

[7] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, San Francisco, Calif, USA, August 2001.

[8] C. Liang, Y. Zhang, P. Shi, and Z. Hu, "Learning very fast decision tree from uncertain data streams with positive and unlabeled samples," Information Sciences, vol. 213, pp. 50–67, 2012.

[9] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "Decision trees for mining data streams based on the Gaussian approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.

[10] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69–101, 1996.

[11] F. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. Riquelme, "Data streams classification by incremental rule learning with parameterized generalization," in Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 657–661, Dijon, France, April 2006.

[12] D. Mena-Torres and J. S. Aguilar-Ruiz, "A similarity-based approach for data stream classification," Expert Systems with Applications, vol. 41, no. 9, pp. 4224–4234, 2014.

[13] A. Shaker and E. Hüllermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems, vol. 3, no. 4, pp. 235–249, 2012.

[14] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Recursive least square perceptron model for non-stationary and imbalanced data stream classification," Evolving Systems, vol. 4, no. 2, pp. 119–131, 2013.

[15] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Ensemble of online neural networks for non-stationary and imbalanced data streams," Neurocomputing, vol. 122, pp. 535–544, 2013.

[16] A. Sancho-Asensio, A. Orriols-Puig, and E. Golobardes, "Robust on-line neural learning classifier system for data stream classification tasks," Soft Computing, vol. 18, no. 8, pp. 1441–1461, 2014.

[17] D. M. Farid, L. Zhang, A. Hossain, et al., "An adaptive ensemble classifier for mining concept drifting data streams," Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, 2013.

[18] D. Brzezinski and J. Stefanowski, "Combining block-based and online methods in learning ensembles from concept drifting data streams," Information Sciences, vol. 265, pp. 50–67, 2014.

[19] G. Ditzler and R. Polikar, "Incremental learning of concept drift from streaming imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2283–2301, 2013.

[20] M. J. Hosseini, Z. Ahmadi, and H. Beigy, "Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification," Evolving Systems, vol. 4, no. 1, pp. 43–60, 2013.

[21] N. Saunier and S. Midenet, "Creating ensemble classifiers through order and incremental data selection in a stream," Pattern Analysis and Applications, vol. 16, no. 3, pp. 333–347, 2013.

[22] M. M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham, "Classification and novel class detection in concept-drifting data streams under time constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.

[23] A Alzghoul M Lofstrand and B Backe ldquoData stream fore-casting for system fault predictionrdquo Computers and IndustrialEngineering vol 62 no 4 pp 972ndash978 2012

[24] J Kranjc J Smailovic V Podpecan M Grcar M Znidarsicand N Lavrac ldquoActive learning for sentiment analysis on datastreams methodology and workflow implementation in theClowdFlows platformrdquo Information Processing and Manage-ment vol 51 no 2 pp 187ndash203 2015

[25] J Smailovic M Grcar N Lavrac and M Znidarsic ldquoStream-based active learning for sentiment analysis in the financialdomainrdquo Information Sciences vol 285 pp 181ndash203 2014

[26] S Shalev-Shwartz Y Singer N Srebro and A Cotter ldquoPegasosprimal estimated sub-gradient solver for SVMrdquo in Proceedingsof the 24th International Conference on Machine Learning pp807ndash814 June 2007

[27] Z Miller B Dickinson W Deitrick W Hu and A H WangldquoTwitter spammer detection using data stream clusteringrdquoInformation Sciences vol 260 pp 64ndash73 2014

[28] B Krawczyk and M Wozniak ldquoIncremental weighted one-class classifier for mining stationary data streamsrdquo Journal ofComputational Science vol 9 pp 19ndash25 2015

[29] J Gama ldquoA survey on learning from data streams current andfuture trendsrdquo Progress in Artificial Intelligence vol 1 no 1 pp45ndash55 2012

[30] L I Kuncheva ldquoChange detection in streaming multivariatedata using likelihood detectorsrdquo IEEE Transactions on Knowl-edge and Data Engineering vol 25 no 5 pp 1175ndash1180 2013

[31] G J Ross N M Adams D K Tasoulis and D J Hand ldquoExpo-nentially weighted moving average charts for detecting conceptdriftrdquoPatternRecognition Letters vol 33 no 2 pp 191ndash198 2012

[32] J Gama P Medas G Castillo and P Rodrigues ldquoLearning withdrift detectionrdquo in Advances in Artificial IntelligencemdashSBIA2004 vol 3171 of Lecture Notes in Computer Science pp 286ndash295 Springer Berlin Germany 2004

[33] X Wu P Li and X Hu ldquoLearning from concept drifting datastreams with unlabeled datardquoNeurocomputing vol 92 pp 145ndash155 2012

[34] X Qin Y Zhang C Li and X Li ldquoLearning from data streamswith only positive and unlabeled datardquo Journal of IntelligentInformation Systems vol 40 no 3 pp 405ndash430 2013

Mathematical Problems in Engineering 17

[35] M. M. Masud, C. Woolam, J. Gao, et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.

[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.

[37] I. López-Yáñez, A. J. Argüelles-Cruz, O. Camacho-Nieto, and C. Yáñez-Márquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.

[38] M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.

[39] C. Yáñez-Márquez, E. M. Felipe-Riverón, I. López-Yáñez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.

[40] I. López-Yáñez, L. Sheremetov, and C. Yáñez-Márquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.

[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.

[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets/.

[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml.

[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation/.

[45] M. Harries, "Splice-2 comparative evaluation: electricity pricing," Tech. Rep., University of South Wales, Cardiff, UK, 1999.

[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.

[47] J. Gama, R. Sebastião, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.

[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka/.


For instance, let $y = [1\ 1\ 1\ 1\ 0\ 1]$, $x = [0\ 1\ 0]$, and $m = 6$; then

$$(y \| x)_1 = y_{1+6-3} = y_4,\quad (y \| x)_2 = y_{2+6-3} = y_5,\quad (y \| x)_3 = y_{3+6-3} = y_6,$$
$$(y \| x) = [1\ 0\ 1]. \qquad (5)$$

Thus, only the 4th, 5th, and 6th elements of $y$ are used in $(y \| x)_i$. Notice that the first $m - n$ components of $y$ are discarded when building $y \| x$.
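This alignment can be sketched in one line of Python (the function name is illustrative, not from the paper):

```python
def expand(y, x):
    """(y || x): align y (length m) to x (length n <= m) by keeping only
    the last n components of y; the first m - n components are discarded."""
    return y[len(y) - len(x):]
```

For the example above, `expand([1, 1, 1, 1, 0, 1], [0, 1, 0])` yields `[1, 0, 1]`, matching (5).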

The Gamma operator needs a binary vector as input. In order to deal with real and/or integer vectors, a method to represent such vectors in binary form is needed. In this work we use the modified Johnson-Möbius code [39], which is a variation of the classical Johnson-Möbius code. To convert a set of real/integer numbers into a binary representation, we follow these steps.

(1) Let $R$ be a set of real numbers such that $R = \{r_1, r_2, \ldots, r_i, \ldots, r_n\}$, where $n \in \mathbb{Z}^+$.

(2) If there are one or more negative numbers in the set, create a new set by subtracting the minimum (i.e., $r_i$) from each number in $R$, obtaining a new set $T$ with only nonnegative real numbers: $T = \{t_1, t_2, \ldots, t_i, \ldots, t_n\}$, where $t_j = r_j - r_i\ \forall j \in \{1, 2, \ldots, n\}$, and particularly $t_i = 0$.

(3) Choose a fixed number $d$ and truncate each number of the set $T$ to $d$ decimal positions.

(4) Scale up all the numbers of the set obtained in the previous step by multiplying them by $10^d$, in order to leave only nonnegative integer numbers: $E = \{e_1, e_2, \ldots, e_m, \ldots, e_n\}$, where $e_j = t_j \cdot 10^d\ \forall j \in \{1, 2, \ldots, n\}$, and $e_m$ is the maximum value of the set $E$.

(5) For each element $e_j \in E$, with $j = 1, 2, \ldots, n$, concatenate $(e_m - e_j)$ zeroes followed by $e_j$ ones.

For instance, let $R \subset \mathbb{R}$ be defined as $R = \{2.4, -0.7, 1.8, 1.5\}$; now let us use the modified Johnson-Möbius code to convert the elements of $R$ into binary vectors.

(1) Subtract the minimum. Since $-0.7$ is the minimum in $R$, the members of the transformed set $T$ are obtained by subtracting $-0.7$ from each member of $R$: $t_i = r_i - (-0.7) = r_i + 0.7$. Consider

$$R \longrightarrow T \subset \mathbb{R},\quad T = \{3.1, 0, 2.5, 2.2\}. \qquad (6)$$

(2) Scale up the numbers. We select $d = 1$, so each number is truncated to 1 decimal position and then multiplied by 10 ($10^d = 10^1 = 10$). One has

$$T \longrightarrow E \subset \mathbb{Z}^+,\quad E = \{31, 0, 25, 22\}. \qquad (7)$$

(3) Concatenate $(e_m - e_j)$ zeroes followed by $e_j$ ones, where $e_m$ is the maximum number in $E$ and $e_j$ is the current number to be coded. Given that the maximum number in $E$ is 31, all the binary vectors have 31 components; that is, they are all 31 bits long. For example, $e_4 = 22$ is converted into its binary representation $c_4$ by appending $e_m - e_4 = 31 - 22 = 9$ zeroes followed by $e_4 = 22$ ones, which gives the final vector "0000000001111111111111111111111". Consider

$$E \longrightarrow C = \{c_i \mid c_i \in A^{31}\},$$
$$C = \begin{bmatrix} 1111111111111111111111111111111 \\ 0000000000000000000000000000000 \\ 0000001111111111111111111111111 \\ 0000000001111111111111111111111 \end{bmatrix}. \qquad (8)$$
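The five coding steps can be sketched as follows (the function name and the small float-error guard are illustrative assumptions, not from the paper):

```python
def johnson_mobius(values, d=1):
    """Modified Johnson-Mobius code: shift by the minimum, truncate to d
    decimal positions, scale by 10**d, then write each integer e as
    (e_max - e) zeroes followed by e ones."""
    shifted = [v - min(values) for v in values]
    # guard against float error (e.g. 3.0999...96) before truncating
    ints = [int(v * 10**d + 1e-6) for v in shifted]
    e_max = max(ints)
    return ["0" * (e_max - e) + "1" * e for e in ints]
```

Calling `johnson_mobius([2.4, -0.7, 1.8, 1.5])` reproduces the four 31-bit vectors of (8).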

Definition 5 (Gamma operator). The similarity Gamma operator takes two binary patterns $x \in A^n$ and $y \in A^m$, with $n, m \in \mathbb{Z}^+$, $n \le m$, and a nonnegative integer $\theta$ as input, and outputs a binary number, for which there are two cases.

Case 1. If $n = m$, then the output is computed according to the following:

$$\gamma_g(x, y, \theta) = \begin{cases} 1 & \text{if } m - u_\beta[\alpha(x, y) \bmod 2] \le \theta \\ 0 & \text{otherwise,} \end{cases} \qquad (9)$$

where mod denotes the usual modulo operation.

Case 2. If $n < m$, then the output is computed using $y \| x$ instead of $y$, as follows:

$$\gamma_g(x, y, \theta) = \begin{cases} 1 & \text{if } m - u_\beta[\alpha(x, y \| x) \bmod 2] \le \theta \\ 0 & \text{otherwise.} \end{cases} \qquad (10)$$

In order to better illustrate how the generalized Gamma operator works, let us work through an example of its application. If $x = [1\ 0\ 1\ 1\ 1]$, $y = [1\ 0\ 0\ 1\ 0]$, and $\theta = 1$, what is the result of $\gamma_g(x, y, \theta)$? First we calculate $\alpha(x, y) = [1\ 1\ 2\ 1\ 2]$; then

$$\alpha(x, y) \bmod 2 = [1\ 1\ 2\ 1\ 2] \bmod 2 = [1\ 1\ 0\ 1\ 0]; \qquad (11)$$

now

$$u_\beta(\alpha(x, y) \bmod 2) = u_\beta[1\ 1\ 0\ 1\ 0] = \sum[\beta(1,1), \beta(1,1), \beta(0,0), \beta(1,1), \beta(0,0)] = 1 + 1 + 0 + 1 + 0 = 3, \qquad (12)$$


and since $m = 5$, we have $5 - 3 = 2$; given that $2 > (\theta = 1)$, the result of $\gamma_g$ is 0, and it is decided that, given $\theta$, the two vectors are not similar. However, if we increase $\theta$ to 2, the result of $\gamma_g$ will be 1, since $2 \le (\theta = 2)$; in this case the two vectors will be considered similar. The main idea of the generalized Gamma operator is to indicate (with a result equal to 1) that two binary vectors are similar, allowing up to $\theta$ bits to be different while still considering those vectors similar. If more than $\theta$ bits are different, the vectors are said to be not similar (result equal to 0). Thus, if $\theta = 0$, both vectors must be equal for $\gamma_g$ to output a 1.
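Combining Definition 5 with the $\alpha$ and $\beta$ operators of the underlying alpha-beta model ($\alpha(0,0) = 1$, $\alpha(0,1) = 0$, $\alpha(1,0) = 2$, $\alpha(1,1) = 1$; $\beta$ restricted to the binary arguments that appear after the mod-2 step), the operator can be sketched as follows; the function names are illustrative:

```python
def alpha(a, b):
    # alpha operator of the alpha-beta model on binary arguments:
    # alpha(0,0)=1, alpha(0,1)=0, alpha(1,0)=2, alpha(1,1)=1
    return a - b + 1

def beta(a, b):
    # beta operator restricted to the pairs used here: beta(0,0)=0, beta(1,1)=1
    return 1 if (a == 1 and b == 1) else 0

def gamma_g(x, y, theta):
    """Generalized Gamma operator: 1 if x (length n) and y (length m, n <= m)
    are similar, i.e. differ in at most theta of the compared positions."""
    m, n = len(y), len(x)
    if n < m:
        y = y[m - n:]  # use (y || x) instead of y, as in Case 2
    z = [alpha(xi, yi) % 2 for xi, yi in zip(x, y)]
    u = sum(beta(zi, zi) for zi in z)  # u_beta: counts matching positions
    return 1 if m - u <= theta else 0
```

For the vectors of the worked example, `gamma_g([1,0,1,1,1], [1,0,0,1,0], 1)` returns 0, and raising $\theta$ to 2 returns 1.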

2.1.1. Gamma Classifier Algorithm. Let $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$ be the fundamental pattern set, with cardinality $p$; when a test pattern $x \in \mathbb{R}^n$, with $n \in \mathbb{Z}^+$, is presented to the Gamma classifier, these steps are followed.

(1) Code the fundamental set using the modified Johnson-Möbius code, obtaining a value $e_m$ for each component of the $n$-dimensional vectors in the fundamental set. These $e_m$ values are calculated from the original real-valued vectors as
$$e_m(j) = \operatorname{MAX}_{i=1}^{p}(x_{ij}) \quad \forall j \in \{1, 2, \ldots, n\}. \qquad (13)$$

(2) Compute the stop parameter $\rho = \operatorname{MAX}_{j=1}^{n}[e_m(j)]$.

(3) Code the test pattern using the modified Johnson-Möbius code, with the same parameters used to code the fundamental set. If any $e_j$ (obtained by offsetting, scaling, and truncating the original $x_j$) is greater than the corresponding $e_m(j)$, code it using the larger number of bits.

(4) Transform the index of all fundamental patterns into two indices, one for their class and another for their position in the class (e.g., $x^\mu$, which is the $\omega$th pattern of class $i$, becomes $x^{i\omega}$).

(5) Initialize $\theta$ to 0.

(6) If $\theta = 0$, test whether $x$ is a fundamental pattern, by calculating $\gamma_g(x^\mu_j, x_j, 0)$ for $\mu = 1, 2, \ldots, p$ and then computing the initial weighted addition $c^0_\mu$ for each fundamental pattern as follows:
$$c^0_\mu = \sum_{j=1}^{n} \gamma_g(x^\mu_j, x_j, 0) \quad \text{for } \mu = 1, 2, \ldots, p. \qquad (14)$$
If there is a unique maximum whose value equals $n$, assign the class associated with such maximum to the test pattern. Consider
$$y = y^\sigma \text{ such that } \operatorname{MAX}_{i=1}^{p} c^0_i = c^0_\sigma = n. \qquad (15)$$

(7) Calculate $\gamma_g(x^{i\omega}_j, x_j, \theta)$ for each component of the fundamental patterns.

(8) Compute a weighted sum $c_i$ for each class, according to this equation:
$$c_i = \frac{\sum_{\omega=1}^{k_i} \sum_{j=1}^{n} \gamma_g(x^{i\omega}_j, x_j, \theta)}{k_i}, \qquad (16)$$
where $k_i$ is the cardinality in the fundamental set of class $i$.

(9) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (7) and (8) until there is a unique maximum, or the stop condition $\theta > \rho$ is fulfilled.

(10) If there is a unique maximum among the $c_i$, assign $x$ to the class corresponding to such maximum:
$$y = y^j \text{ such that } \operatorname{MAX}\ c_i = c_j. \qquad (17)$$

(11) Otherwise, assign $x$ to the class of the first maximum.

As an example, let us consider the following fundamental patterns:
$$x^1 = \begin{pmatrix} 2 \\ 8 \end{pmatrix},\quad x^2 = \begin{pmatrix} 1 \\ 9 \end{pmatrix},\quad x^3 = \begin{pmatrix} 6 \\ 3 \end{pmatrix},\quad x^4 = \begin{pmatrix} 7 \\ 4 \end{pmatrix}, \qquad (18)$$
grouped in two classes: $C_1 = \{x^1, x^2\}$ and $C_2 = \{x^3, x^4\}$. Then the patterns to be classified are
$$\tilde{x}^1 = \begin{pmatrix} 6 \\ 2 \end{pmatrix},\quad \tilde{x}^2 = \begin{pmatrix} 3 \\ 9 \end{pmatrix}. \qquad (19)$$
As can be seen, the dimension of all patterns is $n = 2$, and there are 2 classes, both with the same cardinality: $k_1 = 2$ and $k_2 = 2$. Now the steps in the algorithm are followed.


(1) Code the fundamental set with the modified Johnson-Möbius code, obtaining a value $e_m$ for each component. Consider
$$x^1 = \begin{bmatrix} 0000011 \\ 011111111 \end{bmatrix},\quad x^2 = \begin{bmatrix} 0000001 \\ 111111111 \end{bmatrix},\quad x^3 = \begin{bmatrix} 0111111 \\ 000000111 \end{bmatrix},\quad x^4 = \begin{bmatrix} 1111111 \\ 000001111 \end{bmatrix}. \qquad (20)$$
Since the maximum value among the first components of all fundamental patterns is 7, $e_m(1) = 7$, and since the maximum value among the second components is 9, $e_m(2) = 9$. Thus the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.

(2) Compute the stop parameter:
$$\rho = \operatorname{MAX}_{j=1}^{n}[e_m(j)] = \operatorname{MAX}[e_m(1), e_m(2)] = \operatorname{MAX}(7, 9) = 9. \qquad (21)$$

(3) Code the patterns to be classified with the modified Johnson-Möbius code, using the same parameters used with the fundamental set:
$$\tilde{x}^1 = \begin{bmatrix} 0111111 \\ 000000011 \end{bmatrix},\quad \tilde{x}^2 = \begin{bmatrix} 0000111 \\ 111111111 \end{bmatrix}. \qquad (22)$$
Again, the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.

(4) Transform the index of all fundamental patterns into two indices, one for the class they belong to and another for their position in the class. Consider
$$x^1 = x^{11},\quad x^2 = x^{12},\quad x^3 = x^{21},\quad x^4 = x^{22}. \qquad (23)$$
Given that $C_1 = \{x^1, x^2\}$ and $C_2 = \{x^3, x^4\}$, we know that both $x^1$ and $x^2$ belong to class $C_1$, so the first index for both patterns becomes 1; the second index is used to differentiate between patterns assigned to the same class, so $x^1$ becomes $x^{11}$ and $x^2$ is now $x^{12}$. Something similar happens to $x^3$ and $x^4$, but with the first index being 2, since they belong to class $C_2$: $x^3$ becomes $x^{21}$ and $x^4$ is now $x^{22}$.

(5) Initialize $\theta = 0$.

(6) Calculate $\gamma_g(x^{i\omega}_j, \tilde{x}_j, \theta)$ for each component of the fundamental patterns in each class. Thus, for $\tilde{x}^1$ we have:
$$\gamma_g(x^{11}_1, \tilde{x}^1_1, 0) = 0,\quad \gamma_g(x^{11}_2, \tilde{x}^1_2, 0) = 0,$$
$$\gamma_g(x^{12}_1, \tilde{x}^1_1, 0) = 0,\quad \gamma_g(x^{12}_2, \tilde{x}^1_2, 0) = 0,$$
$$\gamma_g(x^{21}_1, \tilde{x}^1_1, 0) = 1,\quad \gamma_g(x^{21}_2, \tilde{x}^1_2, 0) = 0,$$
$$\gamma_g(x^{22}_1, \tilde{x}^1_1, 0) = 0,\quad \gamma_g(x^{22}_2, \tilde{x}^1_2, 0) = 0. \qquad (24)$$
Since $\theta = 0$, for $\gamma_g$ to give a result of 1 it is necessary that both vectors $x^{i\omega}_j$ and $\tilde{x}^1_j$ be equal. But only one fundamental pattern has its first component equal to that of $\tilde{x}^1$, namely $x^{21}_1 = [0111111] = \tilde{x}^1_1$, while no fundamental pattern has a second component equal to that of the test pattern. Given that this is the only case for which a component of a fundamental pattern is equal to the same component of the test pattern, it is also the only instance in which $\gamma_g$ outputs 1.

Now for $\tilde{x}^2$ we have:
$$\gamma_g(x^{11}_1, \tilde{x}^2_1, 0) = 0,\quad \gamma_g(x^{11}_2, \tilde{x}^2_2, 0) = 0,$$
$$\gamma_g(x^{12}_1, \tilde{x}^2_1, 0) = 0,\quad \gamma_g(x^{12}_2, \tilde{x}^2_2, 0) = 1,$$
$$\gamma_g(x^{21}_1, \tilde{x}^2_1, 0) = 0,\quad \gamma_g(x^{21}_2, \tilde{x}^2_2, 0) = 0,$$
$$\gamma_g(x^{22}_1, \tilde{x}^2_1, 0) = 0,\quad \gamma_g(x^{22}_2, \tilde{x}^2_2, 0) = 0. \qquad (25)$$


Again $\theta = 0$, thus forcing both vectors $x^{i\omega}_j$ and $\tilde{x}^2_j$ to be equal in order to obtain a 1. Similarly to what happened in the case of $\tilde{x}^1$, there is but one instance of such an occurrence: $x^{12}_2 = [111111111] = \tilde{x}^2_2$.

(7) Compute a weighted sum $c_i$ for each class:
$$c_1 = \frac{\sum_{\omega=1}^{k_1} \sum_{j=1}^{n} \gamma_g(x^{1\omega}_j, \tilde{x}^1_j, \theta)}{k_1} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{1\omega}_j, \tilde{x}^1_j, \theta)}{2} = \frac{(0+0) + (0+0)}{2} = \frac{0+0}{2} = 0. \qquad (26)$$
Here we add together the results obtained on all components of all fundamental patterns belonging to class $C_1$. Since all gave 0 (i.e., none of these were similar to the corresponding component of the test pattern, given the value of $\theta$ in effect), the result of the weighted addition for class $C_1$ is $c_1 = 0$. One has
$$c_2 = \frac{\sum_{\omega=1}^{k_2} \sum_{j=1}^{n} \gamma_g(x^{2\omega}_j, \tilde{x}^1_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{2\omega}_j, \tilde{x}^1_j, \theta)}{2} = \frac{(1+0) + (0+0)}{2} = \frac{1+0}{2} = 0.5. \qquad (27)$$
In this case there was one instance similar to the corresponding component of the test pattern (equal, since $\theta = 0$ during this run). Thus the weighted addition for class $C_2$ is $c_2 = 0.5$, given that the sum of results is divided by the cardinality of the class, $k_2 = 2$. And for $\tilde{x}^2$:

$$c_1 = \frac{\sum_{\omega=1}^{k_1} \sum_{j=1}^{n} \gamma_g(x^{1\omega}_j, \tilde{x}^2_j, \theta)}{k_1} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{1\omega}_j, \tilde{x}^2_j, \theta)}{2} = \frac{(0+0) + (0+1)}{2} = \frac{0+1}{2} = 0.5. \qquad (28)$$
There was one result of $\gamma_g$ equal to 1, which, coupled with the cardinality of class $C_1$ being 2, makes $c_1 = 0.5$. One has
$$c_2 = \frac{\sum_{\omega=1}^{k_2} \sum_{j=1}^{n} \gamma_g(x^{2\omega}_j, \tilde{x}^2_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{2\omega}_j, \tilde{x}^2_j, \theta)}{2} = \frac{(0+0) + (0+0)}{2} = \frac{0+0}{2} = 0. \qquad (29)$$
No similarities were found between the fundamental patterns of class $C_2$ and the test pattern $\tilde{x}^2$ (for $\theta = 0$); thus $c_2 = 0$.

(8) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (6) and (7) until there is a unique maximum, or the stop condition $\theta \ge \rho$ is fulfilled. Here there is a unique maximum (for each test pattern), so we go directly to step (9).

(9) If there is a unique maximum, assign $y$ to the class corresponding to such maximum. For $\tilde{x}^1$:
$$C_{\tilde{x}^1} = C_2, \text{ since } \operatorname{MAX}_{i=1}^{2} c_i = \operatorname{MAX}(0, 0.5) = 0.5 = c_2, \qquad (30)$$
while for $\tilde{x}^2$:
$$C_{\tilde{x}^2} = C_1, \text{ since } \operatorname{MAX}_{i=1}^{2} c_i = \operatorname{MAX}(0.5, 0) = 0.5 = c_1. \qquad (31)$$

(10) Otherwise, assign $y$ to the class of the first maximum. This is unnecessary here, since both test patterns have already been assigned a class.
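The whole procedure can be condensed into a short sketch. It assumes patterns with nonnegative integer components (as produced by the offset/scale steps of the modified Johnson-Möbius code), reduces $\gamma_g$ on equal-length codes to a count of differing bits, and omits the exact-match shortcut of step (6); all names are illustrative:

```python
def _gamma(cx, cy, theta):
    # Generalized Gamma operator on two equal-length binary strings:
    # outputs 1 when they differ in at most theta bit positions.
    return 1 if sum(a != b for a, b in zip(cx, cy)) <= theta else 0

def gamma_classify(fundamental, labels, test):
    """Sketch of the Gamma classifier, steps (1)-(11)."""
    n = len(test)
    # Steps (1)-(3): per-attribute coding; e_m(j) is the attribute maximum.
    e_m = [max(max(p[j] for p in fundamental), test[j]) for j in range(n)]

    def code(v, j):
        return "0" * (e_m[j] - v) + "1" * v

    coded = [[code(p[j], j) for j in range(n)] for p in fundamental]
    ctest = [code(test[j], j) for j in range(n)]
    rho = max(e_m)  # step (2): stop parameter
    classes = sorted(set(labels))
    theta = 0
    while True:  # steps (5)-(9): grow theta until a unique maximum appears
        c = []
        for cl in classes:  # step (8): weighted sum per class
            members = [cp for cp, lb in zip(coded, labels) if lb == cl]
            s = sum(_gamma(cp[j], ctest[j], theta)
                    for cp in members for j in range(n))
            c.append(s / len(members))
        best = max(c)
        if c.count(best) == 1 or theta > rho:
            # steps (10)-(11): unique maximum, or first maximum at the stop
            return classes[c.index(best)]
        theta += 1
```

Run on the example above, the sketch assigns $\tilde{x}^1 = (6, 2)$ to class 2 and $\tilde{x}^2 = (3, 9)$ to class 1, matching (30) and (31).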

2.2. Sliding Windows. A widespread approach for data stream mining is the use of a sliding window to keep only a representative portion of the data. We are not interested in keeping all the information of the data stream, particularly when we have to meet the memory space constraints required for algorithms working in this context. This technique is able to deal with concept drift, eliminating those data points that come from an old concept. According to [5], the training window size can be fixed or variable over time.

Sliding windows of a fixed size store in memory a fixed number of the $w$ most recent examples. Whenever a new example arrives, it is saved to memory and the oldest one is discarded. These types of windows are similar to first-in, first-out data structures. This simple adaptive learning method is often used as a baseline in the evaluation of new algorithms.

Sliding windows of variable size vary the number of examples in a window over time, typically depending on the indications of a change detector. A straightforward approach is to shrink the window whenever a change is detected, such that the training data reflects the most recent concept, and to grow the window otherwise.
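A minimal sketch of such a variable-size policy follows; the function name and the `keep` fraction are illustrative assumptions, not taken from [5]:

```python
def variable_window_update(window, example, change_detected, keep=0.5):
    """Shrink the window to its most recent fraction `keep` when the change
    detector fires; otherwise let the window keep growing."""
    if change_detected:
        window = window[int(len(window) * (1 - keep)):]  # drop the oldest part
    window.append(example)
    return window
```

For example, a 10-element window shrinks to its 5 most recent elements (plus the new example) when a change is signaled.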

2.3. Proposed Method. The Gamma classifier has previously been used for time series prediction in air pollution [37] and oil well production [40], showing very promising results. The current work takes the Gamma classifier and combines it with a sliding window approach, with the aim of developing a new method capable of performing classification over a continuous data stream, referred to as the DS-Gamma classifier.


(1) Initialize window
(2) for each $x_\tau$ do
(3)   If $\tau < W$ then
(4)     Add $x_\tau$ to the tail of the window
(5)   Else
(6)     Use $x_\tau$ to classify
(7)     Add $x_\tau$ to the tail of the window

Algorithm 1: Learning policy algorithm.

A data stream $S$ can be defined as a sequence of data elements of the form $S = s_1, s_2, \ldots, s_{\text{now}}$ from a universe of size $D$, where $D$ is huge and potentially infinite but countable. We assume that each element $s_i$ has the following form:

$$s_i = (x_\tau, y_\tau), \qquad (32)$$

where $x_\tau$ is an input vector of size $n$ and $y_\tau$ is the class label associated with it. $\tau \in \mathbb{Z}^+$ is an index that indicates the sequential order of the elements in the data stream; $\tau$ is bounded by the size of the data stream, such that if the data stream is infinite, $\tau$ takes infinitely many (but countable) values.

One of the main challenges in mining data streams is the ability to maintain an accurate decision model. The current model has to incorporate new knowledge from new data, and it should also forget old information when the data becomes outdated. To address these issues, the proposed method implements a learning and forgetting policy. The learning (insertion) and forgetting (removal) policies were designed to guarantee the spatial and temporal relevance of the data and the consistency of the current model.

The proposed method implements a fixed window approach. This approach achieves concept drift detection in an indirect way, by considering only the most recent data. Algorithm 1 outlines the general steps used as the learning policy. Initially, the window is created with the first $W$ labeled records, where $W$ is the size of the window. After that, every sample is used for classification and evaluation of the performance, and then it is used to update the window.

Classifiers that deal with concept drift are forced to implement forgetting, adaptation, or drift detection mechanisms in order to adjust to changing environments. In this proposal, a forgetting mechanism is implemented. Algorithm 2 outlines the basic forgetting policy implemented in this proposal: the oldest element in the window, $x_\delta$, is removed.
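Algorithms 1 and 2 combine naturally into a single loop. The sketch below is an assumed harness (the `classify` and `evaluate` callbacks stand in for the DS-Gamma classification and the performance bookkeeping), using a deque whose `maxlen` implements the forgetting policy:

```python
from collections import deque

def stream_learn(stream, W, classify, evaluate):
    """Learning/forgetting policy of Algorithms 1 and 2: fill the window
    with the first labeled examples, then classify each arriving example
    before adding it; the deque drops the oldest element automatically."""
    window = deque(maxlen=W)
    for tau, (x, y) in enumerate(stream, start=1):
        if tau >= W:                    # warm-up done: classify first
            y_pred = classify(window, x)
            evaluate(y_pred, y)
        window.append((x, y))           # learning policy; oldest is removed
```

Using `deque(maxlen=W)` is a design shortcut: appending to a full deque silently discards its head, which is exactly the removal step of Algorithm 2.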

2.4. MOA. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [41]. MOA contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation on data streams with concept drift. MOA can also be used to build synthetic data streams using generators, by reading ARFF files, or by joining several streams. The MOA stream generators allow simulating potentially infinite sequences of

Table 2: Datasets characteristics.

Dataset       Instances   Attributes   Classes
Agrawal         100000        10          2
Bank             45211        16          2
CoverType       581012        54          7
Electricity      45312         8          2
Hyperplane      100000        10          2
RandomRBF       100000        10          2

data with different characteristics (different types of concept drift and recurrent concepts, among others). All the algorithms used for the comparative analysis are implemented under the MOA framework.

2.5. Data Streams. This section provides a brief description of the data streams used during the experimental phase. The proposed solution has been tested using synthetic and real data streams. For data stream classification problems, only a few suitable data streams can be successfully used for evaluation; a few data streams with enough examples are hosted in [42, 43]. For further evaluation, the use of synthetic data stream generators becomes necessary. The synthetic data streams used in our experiments were generated using the MOA framework mentioned in the previous section. A summary of the main characteristics of the data streams is shown in Table 2. The descriptions of the data streams were taken from [42, 43] in the case of the real ones, and from [44] for the synthetic ones.

2.5.1. Real World Data Streams

Bank Marketing Dataset. The Bank Marketing dataset is available from the UCI Machine Learning Repository [43]. This data is related to the direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe a term deposit. It contains 45211 instances. Each instance has 16 attributes and the class label. The dataset has two classes, which identify whether a bank term deposit would or would not be subscribed.

Electricity Dataset. This data was collected from the Australian New South Wales electricity market. In this market, prices are not fixed; they are affected by the demand and supply of the market, and they are set every five minutes. The dataset was first described by Harries in [45]. During the time period described in the dataset, the electricity market was expanded with the inclusion of adjacent areas, which led to more elaborate management of the supply: the excess production of one region could be sold in an adjacent region. The Electricity dataset contains 45312 instances. Each example of the dataset refers to a period of 30 minutes (i.e., there are 48 instances for each day). Each example in the dataset has 8 fields: the day of the week, the time stamp, the New South Wales electricity demand, the New South Wales electricity price, the Victoria electricity demand, the Victoria electricity price, the scheduled electricity transfer


(1) for each $x_\tau$ do
(2)   If $\tau > W$ then
(3)     Remove $x_\delta$ ($\delta = \tau - W + 1$) from the beginning of the window

Algorithm 2: Forgetting policy algorithm.

between states, and the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. We use the normalized version of this dataset, so that the numerical values are between 0 and 1. The dataset is available from [42].

Forest CoverType Dataset. The Forest CoverType dataset is one of the largest datasets available from the UCI Machine Learning Repository [43]. This dataset contains information that describes forest cover types from cartographic variables. A given observation (a 30 × 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581012 instances and 54 attributes. The data is in raw form (not scaled) and contains binary (0 or 1) columns for the qualitative independent variables (wilderness areas and soil types). The dataset has 7 classes, which identify the type of forest cover.

2.5.2. Synthetic Data Streams

Agrawal Data Stream. This generator was originally introduced by Agrawal et al. in [46]. The MOA implementation was used to generate the data streams for the experimental phase of this work. The generator produces a stream containing nine attributes. Although not explicitly stated by the authors, a sensible conclusion is that these attributes describe hypothetical loan applications. There are ten functions defined for generating binary class labels from the attributes; presumably, these determine whether the loan should be approved. Perturbation shifts numeric attributes from their true value, adding an offset drawn randomly from a uniform distribution, the range of which is a specified percentage of the total value range [44]. For each experiment, a data stream with 100,000 examples was generated.
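The perturbation mechanism just described can be sketched in a few lines. This is an illustration only, not MOA's code; the function name and the symmetric interpretation of the offset range are assumptions:

```python
import random

def perturb(value, vmin, vmax, fraction, rng):
    """Shift a numeric attribute by an offset drawn uniformly at random
    from a range that is a given fraction of the total value range."""
    half_range = fraction * (vmax - vmin)
    return value + rng.uniform(-half_range, half_range)

rng = random.Random(42)
# hypothetical salary attribute with range [20000, 150000], 5% perturbation
salary = perturb(50000, 20000, 150000, 0.05, rng)  # offset within ±6500
```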

RandomRBF Data Stream. This generator was devised to offer an alternate complex concept type that is not straightforward to approximate with a decision tree model. The RBF (Radial Basis Function) generator works as follows. A fixed number of random centroids are generated; each center has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a center at random, taking weights into consideration, so that centers with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the central point. The length of the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid. The chosen centroid also determines the class label of the example. Only numeric attributes are generated [44]. For our experiments, a data stream with 100,000 data instances, 10 numerical attributes, and 2 classes was generated.
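The generation procedure described above can be sketched as follows. This is a simplified illustration of the idea, not MOA's implementation; all names are hypothetical:

```python
import math
import random

def make_centroids(num, dims, num_classes, rng):
    # each centroid: random position, one standard deviation, class label, weight
    return [{"center": [rng.random() for _ in range(dims)],
             "stdev": rng.random(),
             "label": rng.randrange(num_classes),
             "weight": rng.random()} for _ in range(num)]

def rbf_example(centroids, rng):
    # choose a centroid with probability proportional to its weight
    c = rng.choices(centroids, weights=[ct["weight"] for ct in centroids])[0]
    # random direction; displacement length drawn from a Gaussian whose
    # standard deviation is the chosen centroid's stdev
    direction = [rng.gauss(0.0, 1.0) for _ in c["center"]]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    length = rng.gauss(0.0, c["stdev"])
    attributes = [ci + (di / norm) * length
                  for ci, di in zip(c["center"], direction)]
    return attributes, c["label"]  # the centroid also fixes the class label

rng = random.Random(1)
centroids = make_centroids(num=50, dims=10, num_classes=2, rng=rng)
x, y = rbf_example(centroids, rng)
```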

Rotating Hyperplane Data Stream. This synthetic data stream was generated using the MOA framework. A hyperplane in d-dimensional space is the set of points x that satisfy [44]

∑_{i=1}^{d} x_i w_i = w_0 = ∑_{i=1}^{d} w_i,  (33)

where x_i is the ith coordinate of x. Examples for which ∑_{i=1}^{d} x_i w_i ≥ w_0 are labeled positive, and examples for which ∑_{i=1}^{d} x_i w_i < w_0 are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because the orientation and position of the hyperplane can be adjusted in a gradual way if we change the relative size of the weights. In MOA, change is introduced to this data stream by adding drift to each weight attribute using the formula w_i = w_i + dσ, where σ is the probability that the direction of change is reversed and d is the change applied to every example. For our experiments, a data stream with 100,000 data instances, 10 numerical attributes, and 2 classes was generated.
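Labeling against the hyperplane, together with the per-weight drift step, can be sketched like this (a simplified reading of the procedure above; the helper names are hypothetical):

```python
import random

def hyperplane_label(x, w, w0):
    # positive (1) iff sum_i x_i * w_i >= w_0, negative (0) otherwise
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) >= w0 else 0

def drift_weights(w, d, sigma, rng):
    # add the change d to every weight; with probability sigma the
    # direction of change is reversed before being applied
    if rng.random() < sigma:
        d = -d
    return [wi + d for wi in w], d

rng = random.Random(0)
w = [0.5, 0.5]
w0 = sum(w)                                  # w_0 = sum of weights, as in (33)
print(hyperplane_label([2.0, 1.0], w, w0))   # 1.5 >= 1.0, so prints 1
w, d = drift_weights(w, 0.001, 0.1, rng)     # gradual rotation of the concept
```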

2.6. Algorithms Evaluation. One of the objectives of this work is to perform a consistent comparison between the classification performance of our proposal and that of other well-known data stream pattern classification algorithms. Evaluation of data stream algorithms is a relatively new field that has not been studied as thoroughly as evaluation in batch learning. In traditional batch learning, evaluation has been mainly limited to evaluating a model using different random reorderings of the same static dataset (k-fold cross validation, leave-one-out, and bootstrap) [2]. For data stream scenarios, the problem of infinite data raises new challenges; moreover, the batch approaches cannot effectively measure the accuracy of a model in a context with concepts that change over time. In the data stream setting, one of the main concerns is how to design an evaluation practice that allows us to obtain a reliable measure of accuracy over time. According to [41], two main approaches arise.

2.6.1. Holdout. When traditional batch learning reaches a scale where cross validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. An extension of this approach can be applied to data stream scenarios: first, a set of M instances is used to train the model; then, a set of the following unseen N instances is used to test the model; the model is then trained again with the next M instances and tested with the subsequent N instances, and so forth.
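This alternating train/test cycle can be sketched as follows (M and N as above; the helper and the toy model are hypothetical):

```python
def holdout_evaluate(stream, train, test, m, n):
    """Alternately train on M instances, then evaluate on the next N
    unseen instances, accumulating one score per test block."""
    scores, buffer, training = [], [], True
    for example in stream:
        buffer.append(example)
        if training and len(buffer) == m:
            train(buffer)                 # model update on M instances
            buffer, training = [], False
        elif not training and len(buffer) == n:
            scores.append(test(buffer))   # score on the next N unseen instances
            buffer, training = [], True
    return scores

# toy model: "train" learns nothing, "test" just reports the block size
scores = holdout_evaluate(range(100), train=lambda b: None,
                          test=lambda b: len(b), m=30, n=20)
print(scores)  # [20, 20]: two full train/test cycles fit in 100 examples
```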


2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data that is sometimes difficult to obtain, while prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different mechanisms of evaluation, including the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors. They stated that the use of the prequential error with forgetting mechanisms proves advantageous in assessing performance and in comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we have selected the prequential error over a sliding window for the evaluation of the proposed method. The prequential error P_W for a sliding window of size W, computed at time i, is based on an accumulated sum of a 0-1 loss function L between the predicted values ŷ_k and the observed values y_k, using the following expression [47]:

P_W(i) = (1/W) ∑_{k=i−W+1}^{i} L(ŷ_k, y_k).  (34)
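A minimal sketch of this estimator follows. Until the window fills, the sketch averages over the observations seen so far, a boundary choice the paper does not specify:

```python
from collections import deque

def prequential_error(losses, w):
    """Running prequential error over a sliding window of size w.
    losses[k] is the 0-1 loss L(yhat_k, y_k) of the prediction made
    for example k before that example was used for training."""
    window = deque(maxlen=w)
    history = []
    for loss in losses:
        window.append(loss)                  # oldest loss drops out at size w
        history.append(sum(window) / len(window))
    return history

print(prequential_error([1, 0, 1, 0], w=2))  # [1.0, 0.5, 0.5, 0.5]
```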

We also evaluate the performance results of each competing algorithm against the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test which ranks the differences in the performances of two classifiers on each dataset, ignoring the signs, and compares the ranks for the positive and the negative differences.
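The ranking step of the test can be sketched in pure Python. The accuracies below are taken from Table 5 (DS-Gamma versus Hoeffding Tree); the final lookup of the statistic against the null distribution to obtain a p-value is omitted for brevity:

```python
def wilcoxon_w(a, b):
    """Smaller of the two signed-rank sums for paired samples a, b
    (zero differences dropped; tied |differences| share average ranks)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average rank of the tie group
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)

gamma = [92.82, 78.96, 47.77, 77.14, 86.30, 70.85]      # DS-Gamma, Table 5
hoeffding = [94.39, 88.52, 72.29, 82.41, 82.41, 86.72]  # Hoeffding Tree
print(wilcoxon_w(gamma, hoeffding))  # 2.0
```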

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. The classification performance of each of the compared algorithms was calculated using the prequential error approach. These results are used to compare the performance of our proposal with that of other data stream classification algorithms. We have aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on their implementation can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times, and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset the experiment was executed just one time. The Bank and CoverType datasets were reordered 10 times using the randomized filter from the WEKA platform [49] for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each of them, using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification, to evaluate the efficiency of each algorithm. All experiments were conducted on a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system, with 4096 MB of RAM.

3. Results and Discussion

In this section, we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance of each of the compared classification algorithms. First, we evaluate the sensitivity of the proposed method to changes in the window size, which is a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized with boldface. Table 4 shows the average total time used to classify each data stream. The results of the average performance, with indicators of the standard deviation, are also depicted in Figure 1.

In general, the performance of the DS-Gamma classifier improved as the window size increased, with the exception of the Electricity data stream, for which performance is better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with a performance of 89.12%. This performance is better than all the performances achieved by the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance remains stable and in some cases even degrades, as we can observe in the results for the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination R² were included in the graphic for each data stream. The R² values, ranging between 0.9743 and 0.9866, show that time grows in a linear way as the size of the window increases.

To evaluate the DS-Gamma classifier, we compared its performance with that of other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation; the best performance is emphasized with boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and


Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset     |    50    100    200    500    750   1000   1250   1500   1750   2000
Agrawal     | 79.64  85.97  89.86  92.59  92.88  92.82  92.67  92.51  92.32  92.07
Bank        | 68.19  73.40  77.05  79.13  79.12  78.96  78.60  78.24  77.76  77.37
CoverType   | 38.21  37.61  39.70  44.30  46.59  47.77  48.62  49.19  49.51  49.74
Electricity | 89.12  85.26  81.71  79.84  78.58  77.14  76.36  75.94  75.31  74.49
Hyperplane  | 78.25  82.65  85.28  86.56  86.55  86.30  85.98  85.65  85.34  85.00
RandomRBF   | 64.61  67.39  69.33  70.56  70.81  70.85  70.74  70.63  70.50  70.33

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

Dataset     |    50    100    200    500    750   1000   1250   1500   1750   2000
Agrawal     |  2.24   4.50   7.93  19.26  28.04  39.00  45.92  57.57  63.90  73.49
Bank        |  0.66   1.14   2.07   4.75   7.01   9.17  11.54  14.55  15.83  17.87
CoverType   | 83.47 140.81 251.20 340.05 254.12 335.03 428.94 522.39 669.24 676.99
Electricity |  0.34   0.53   0.98   2.33   3.56   4.63   5.99   7.05   8.30   9.49
Hyperplane  |  2.24   4.01   7.56  18.19  26.76  30.28  44.28  52.96  61.87  70.63
RandomRBF   |  2.79   4.09   7.73  18.54  27.35  30.72  45.02  54.12  62.23  71.04

Table 5: Averaged accuracy (%) comparison.

Classifier        |  Agrawal       Bank          CoverType     Electricity  Hyperplane    RandomRBF
                  |  Acc     σ     Acc     σ     Acc     σ     Acc     σ    Acc     σ     Acc     σ
Bayes-DDM         | 88.16  0.19   83.38  1.41   67.61  6.81   81.05   —    86.02  1.39   71.87  3.82
Perceptron DDM    | 67.20  0.11   73.43 10.53   70.76  5.55   78.61   —    85.65  1.49   71.88  5.93
Hoeffding Tree    | 94.39  0.24   88.52  0.11   72.29  2.68   82.41   —    82.41  5.62   86.72  1.69
Active Classifier | 90.82  1.62   88.21  0.14   67.31  1.03   78.59   —    79.50  6.55   75.32  2.70
IBL-DS            | 86.76  0.48   86.52  0.21   61.73  9.46   75.34   —    84.20  0.77   93.32  1.21
AWE               | 92.17  0.59   87.90  0.38   76.16  3.66   78.36   —    85.88  1.65   91.98  1.14
OzaBoostAdwin     | 87.11  2.62   87.67  0.50   78.55  4.97   87.92   —    83.21  1.91   91.49  1.14
SGD               | 67.20  0.11   76.56  1.66   59.01  7.37   84.52   —    83.19  0.42   61.74  3.31
SPegasos          | 55.82  0.17   77.46  1.99   50.47  5.17   57.60   —    52.80  6.08   53.33  3.13
DS-Gamma          | 92.82  0.10   78.96  0.22   47.77  4.31   77.14   —    86.30  1.35   70.85  2.54

Table 6: Averaged classification time (seconds) comparison.

Classifier        |  Agrawal        Bank           CoverType         Electricity  Hyperplane      RandomRBF
                  |  Time     σ     Time     σ     Time        σ     Time     σ   Time      σ     Time     σ
Bayes-DDM         |  0.62   0.11    0.39   0.08     17.76    1.62    0.61    —     0.70   0.01    0.69   0.01
Perceptron DDM    |  0.40   0.02    0.20   0.05      8.28    0.10    0.19    —     0.45   0.02    0.44   0.02
Hoeffding Tree    |  0.98   0.23    0.54   0.03     30.05   11.06    0.78    —     1.36   0.05    1.40   0.05
Active Classifier |  0.64   0.03    0.26   0.02     15.14    0.74    0.23    —     0.72   0.04    0.70   0.02
IBL-DS            | 53.19   2.11   38.51   8.26   1057.60   60.95    8.61    —    86.39   0.96   88.10   1.06
AWE               | 14.32   0.20    6.63   0.30   1012.47  304.41    6.70    —    18.03   3.64   37.52   3.03
OzaBoostAdwin     | 192.62 44.47   46.93  10.56  10589.73 3896.64   39.75    —   223.99  51.34   57.04  16.66
SGD               |  0.14   0.04    0.11  0.009     40.35    0.07    0.12    —     0.27   0.02    0.27   0.02
SPegasos          |  0.22  0.006    0.11  0.016     16.84    0.12    0.12    —     0.31   0.05    0.24   0.01
DS-Gamma          | 39.00   3.58    9.16   0.05    335.03   12.91    4.63    —    30.28   0.79   30.72   0.94


Figure 1: Average accuracy of DS-Gamma for different window sizes. [Plot: six panels (Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF data streams) show accuracy (%) against window sizes from 50 to 2000.]

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier for the CoverType data stream. This data stream presents a heavy class imbalance that seems to be negatively affecting the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased towards the class with the most members.

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using α = 0.05. The obtained values are all greater than α, so the null hypothesis is not rejected, and


Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes. [Plot: six panels (Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF data streams) show classification time against window sizes from 50 to 2000, each with a linear trend line; the reported coefficients of determination are R² = 0.9866, 0.9854, 0.9804, 0.9796, 0.9743, and 0.9842.]

we can infer that there are no significant differences among the compared algorithms.

4. Conclusions

In this paper, we describe a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase, the proposed method was tested using different data stream scenarios with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms       Asymptotic significance (2-sided)
Gamma-Bayes-DDM           0.345
Gamma-Perceptron DDM      0.917
Gamma-Hoeffding Tree      0.116
Gamma-Active Classifier   0.463
Gamma-IBL-DS              0.345
Gamma-AWE                 0.116
Gamma-OzaBoostAdwin       0.116
Gamma-SGD                 0.600
Gamma-SPegasos            0.075


Figure 3: Averaged accuracy (%) comparison. [Plot: six box-plot panels (Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF data streams) compare the accuracy of Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma.]

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on the classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000; for larger window sizes, performance remained stable and in some cases even declined.

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained in the experiments performed with the CoverType dataset,


Figure 4: Averaged classification time (seconds) comparison. [Plot: six panels (Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF data streams) compare the classification time of Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma.]

a heavily imbalanced dataset, where the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), the CONACyT, and SNI for their economic support to develop this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to the statistical significance analysis.


References

[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: rich data and the increasing value of the internet of things," Tech. Rep., IDC, 2014, http://www.emc.com/leadership/digital-universe/2014iview/index.htm.

[2] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press, Amsterdam, The Netherlands, 2010.

[3] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.

[4] J. Beringer and E. Hüllermeier, "Efficient instance-based learning on data streams," Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[5] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Computing Surveys, vol. 46, no. 4, article 44, 2014.

[6] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, Boston, Mass, USA, August 2000.

[7] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, San Francisco, Calif, USA, August 2001.

[8] C. Liang, Y. Zhang, P. Shi, and Z. Hu, "Learning very fast decision tree from uncertain data streams with positive and unlabeled samples," Information Sciences, vol. 213, pp. 50–67, 2012.

[9] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "Decision trees for mining data streams based on the Gaussian approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.

[10] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69–101, 1996.

[11] F. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. Riquelme, "Data streams classification by incremental rule learning with parameterized generalization," in Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 657–661, Dijon, France, April 2006.

[12] D. Mena-Torres and J. S. Aguilar-Ruiz, "A similarity-based approach for data stream classification," Expert Systems with Applications, vol. 41, no. 9, pp. 4224–4234, 2014.

[13] A. Shaker and E. Hüllermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems, vol. 3, no. 4, pp. 235–249, 2012.

[14] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Recursive least square perceptron model for non-stationary and imbalanced data stream classification," Evolving Systems, vol. 4, no. 2, pp. 119–131, 2013.

[15] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Ensemble of online neural networks for non-stationary and imbalanced data streams," Neurocomputing, vol. 122, pp. 535–544, 2013.

[16] A. Sancho-Asensio, A. Orriols-Puig, and E. Golobardes, "Robust on-line neural learning classifier system for data stream classification tasks," Soft Computing, vol. 18, no. 8, pp. 1441–1461, 2014.

[17] D. M. Farid, L. Zhang, A. Hossain, et al., "An adaptive ensemble classifier for mining concept drifting data streams," Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, 2013.

[18] D. Brzezinski and J. Stefanowski, "Combining block-based and online methods in learning ensembles from concept drifting data streams," Information Sciences, vol. 265, pp. 50–67, 2014.

[19] G. Ditzler and R. Polikar, "Incremental learning of concept drift from streaming imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2283–2301, 2013.

[20] M. J. Hosseini, Z. Ahmadi, and H. Beigy, "Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification," Evolving Systems, vol. 4, no. 1, pp. 43–60, 2013.

[21] N. Saunier and S. Midenet, "Creating ensemble classifiers through order and incremental data selection in a stream," Pattern Analysis and Applications, vol. 16, no. 3, pp. 333–347, 2013.

[22] M. M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham, "Classification and novel class detection in concept-drifting data streams under time constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.

[23] A. Alzghoul, M. Löfstrand, and B. Backe, "Data stream forecasting for system fault prediction," Computers and Industrial Engineering, vol. 62, no. 4, pp. 972–978, 2012.

[24] J. Kranjc, J. Smailović, V. Podpečan, M. Grčar, M. Žnidaršič, and N. Lavrač, "Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform," Information Processing and Management, vol. 51, no. 2, pp. 187–203, 2015.

[25] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, "Stream-based active learning for sentiment analysis in the financial domain," Information Sciences, vol. 285, pp. 181–203, 2014.

[26] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: primal estimated sub-gradient solver for SVM," in Proceedings of the 24th International Conference on Machine Learning, pp. 807–814, June 2007.

[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, "Twitter spammer detection using data stream clustering," Information Sciences, vol. 260, pp. 64–73, 2014.

[28] B. Krawczyk and M. Woźniak, "Incremental weighted one-class classifier for mining stationary data streams," Journal of Computational Science, vol. 9, pp. 19–25, 2015.

[29] J. Gama, "A survey on learning from data streams: current and future trends," Progress in Artificial Intelligence, vol. 1, no. 1, pp. 45–55, 2012.

[30] L. I. Kuncheva, "Change detection in streaming multivariate data using likelihood detectors," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1175–1180, 2013.

[31] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, "Exponentially weighted moving average charts for detecting concept drift," Pattern Recognition Letters, vol. 33, no. 2, pp. 191–198, 2012.

[32] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence—SBIA 2004, vol. 3171 of Lecture Notes in Computer Science, pp. 286–295, Springer, Berlin, Germany, 2004.

[33] X. Wu, P. Li, and X. Hu, "Learning from concept drifting data streams with unlabeled data," Neurocomputing, vol. 92, pp. 145–155, 2012.

[34] X. Qin, Y. Zhang, C. Li, and X. Li, "Learning from data streams with only positive and unlabeled data," Journal of Intelligent Information Systems, vol. 40, no. 3, pp. 405–430, 2013.

[35] M. M. Masud, C. Woolam, J. Gao, et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.

[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.

[37] I. López-Yáñez, A. J. Argüelles-Cruz, O. Camacho-Nieto, and C. Yáñez-Márquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.

[38] M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.

[39] C. Yáñez-Márquez, E. M. Felipe-Riverón, I. López-Yáñez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.

[40] I. López-Yáñez, L. Sheremetov, and C. Yáñez-Márquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.

[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.

[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets/.

[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml/.

[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation/.

[45] M. Harries, "SPLICE-2 comparative evaluation: electricity pricing," Tech. Rep., The University of New South Wales, Sydney, Australia, 1999.

[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.

[47] J. Gama, R. Sebastião, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.

[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka/.



and since m = 5, 5 − 3 = 2; given that 2 ≰ (θ = 1), the result of γ_g is 0, and it is decided that, given θ, the two vectors are not similar. However, if we increase θ to 2, the result of γ_g will be 1, since 2 ≤ (θ = 2); in this case the two vectors will be considered similar. The main idea of the generalized Gamma operator is to indicate (with a result equal to 1) that two binary vectors are similar, allowing up to θ bits to be different while still considering those vectors similar. If more than θ bits are different, the vectors are said to be not similar (result equal to 0). Thus, if θ = 0, both vectors must be equal for γ_g to output a 1.
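For binary vectors of equal length, the behavior described above reduces to a thresholded count of differing bit positions, which can be sketched as:

```python
def gamma_g(x, y, theta):
    """Generalized Gamma operator on binary vectors: returns 1 ("similar")
    when x and y differ in at most theta positions, 0 otherwise."""
    differing = sum(1 for xi, yi in zip(x, y) if xi != yi)
    return 1 if differing <= theta else 0

x = [1, 1, 0, 1, 0]
y = [1, 1, 0, 0, 1]      # differs from x in 2 of its m = 5 positions
print(gamma_g(x, y, 1))  # 0: more than theta = 1 bits differ
print(gamma_g(x, y, 2))  # 1: at most theta = 2 bits differ
```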

2.1.1. Gamma Classifier Algorithm. Let $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$ be the fundamental pattern set, with cardinality $p$. When a test pattern $\tilde{x} \in \mathbb{R}^n$, with $n \in \mathbb{Z}^+$, is presented to the Gamma classifier, these steps are followed.

(1) Code the fundamental set using the modified Johnson-Möbius code, obtaining a value $e_m$ for each component of the $n$-dimensional vectors in the fundamental set. These $e_m$ values are calculated from the original real-valued vectors as

$$e_m(j) = \mathrm{MAX}_{i=1}^{p}\,(x_{ij}) \quad \forall j \in \{1, 2, \ldots, n\}. \quad (13)$$

(2) Compute the stop parameter $\rho = \mathrm{MAX}_{j=1}^{n}[e_m(j)]$.

(3) Code the test pattern using the modified Johnson-Möbius code, with the same parameters used to code the fundamental set. If any $e_j$ (obtained by offsetting, scaling, and truncating the original $\tilde{x}_j$) is greater than the corresponding $e_m(j)$, code it using the larger number of bits.

(4) Transform the index of all fundamental patterns into two indices: one for their class and another for their position in the class (e.g., $x^\mu$, which is the $\omega$th pattern of class $i$, becomes $x^{i\omega}$).

(5) Initialize $\theta$ to 0.

(6) If $\theta = 0$, test whether $\tilde{x}$ is a fundamental pattern by calculating $\gamma_g(x^\mu_j, \tilde{x}_j, 0)$ for $\mu = 1, 2, \ldots, p$ and then computing the initial weighted addition $c^\mu_0$ for each fundamental pattern as follows:

$$c^\mu_0 = \sum_{j=1}^{n} \gamma_g(x^\mu_j, \tilde{x}_j, 0) \quad \text{for } \mu = 1, 2, \ldots, p. \quad (14)$$

If there is a unique maximum whose value equals $n$, assign the class associated with such maximum to the test pattern:

$$y = y^\sigma \quad \text{such that } \mathrm{MAX}_{i=1}^{p}\, c^i_0 = c^\sigma_0 = n. \quad (15)$$

(7) Calculate $\gamma_g(x^{i\omega}_j, \tilde{x}_j, \theta)$ for each component of the fundamental patterns.

(8) Compute a weighted sum $c_i$ for each class, according to this equation:

$$c_i = \frac{\sum_{\omega=1}^{k_i} \sum_{j=1}^{n} \gamma_g(x^{i\omega}_j, \tilde{x}_j, \theta)}{k_i}, \quad (16)$$

where $k_i$ is the cardinality in the fundamental set of class $i$.

(9) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (7) and (8) until there is a unique maximum or the stop condition $\theta > \rho$ is fulfilled.

(10) If there is a unique maximum among the $c_i$, assign $\tilde{x}$ to the class corresponding to such maximum:

$$y = y^j \quad \text{such that } \mathrm{MAX}\, c_i = c_j. \quad (17)$$

(11) Otherwise, assign $\tilde{x}$ to the class of the first maximum.

As an example, let us consider the following fundamental patterns:

$$x^1 = \begin{pmatrix} 2 \\ 8 \end{pmatrix}, \quad x^2 = \begin{pmatrix} 1 \\ 9 \end{pmatrix}, \quad x^3 = \begin{pmatrix} 6 \\ 3 \end{pmatrix}, \quad x^4 = \begin{pmatrix} 7 \\ 4 \end{pmatrix}, \quad (18)$$

grouped in two classes: $C_1 = \{x^1, x^2\}$ and $C_2 = \{x^3, x^4\}$. Then the patterns to be classified are

$$\tilde{x}^1 = \begin{pmatrix} 6 \\ 2 \end{pmatrix}, \quad \tilde{x}^2 = \begin{pmatrix} 3 \\ 9 \end{pmatrix}. \quad (19)$$

As can be seen, the dimension of all patterns is $n = 2$, and there are 2 classes, both with the same cardinality: $k_1 = 2$ and $k_2 = 2$. Now the steps in the algorithm are followed.


(1) Code the fundamental set with the modified Johnson-Möbius code, obtaining a value $e_m$ for each component:

$$x^1 = \begin{bmatrix} 0000011 \\ 011111111 \end{bmatrix}, \quad x^2 = \begin{bmatrix} 0000001 \\ 111111111 \end{bmatrix}, \quad x^3 = \begin{bmatrix} 0111111 \\ 000000111 \end{bmatrix}, \quad x^4 = \begin{bmatrix} 1111111 \\ 000001111 \end{bmatrix}. \quad (20)$$

Since the maximum value among the first components of all fundamental patterns is 7, then $e_m(1) = 7$; and since the maximum value among the second components is 9, $e_m(2) = 9$. Thus the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.
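The coding used in this step can be sketched as follows. This is our simplified rendering for nonnegative integers (the paper's modified Johnson-Möbius code also handles offsetting, scaling, and truncation of real values, which we omit), with function and variable names of our choosing:

```python
# Johnson-Mobius-style coding (sketch) of a nonnegative integer v into
# e_m bits: (e_m - v) zeros followed by v ones, a unary-style code.
def jm_code(v, e_m):
    return "0" * (e_m - v) + "1" * v

fundamental = [(2, 8), (1, 9), (6, 3), (7, 4)]
# e_m(j) is the maximum of the j-th components over the fundamental set
e_m = [max(col) for col in zip(*fundamental)]

print(e_m)                 # -> [7, 9]
print(jm_code(2, e_m[0]))  # -> '0000011'
print(jm_code(8, e_m[1]))  # -> '011111111'
```

Note that with this code, two coded values differ in exactly $|a - b|$ bit positions, which is what makes the Gamma operator's bit-difference threshold meaningful.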

(2) Compute the stop parameter:

$$\rho = \mathrm{MAX}_{j=1}^{n}[e_m(j)] = \mathrm{MAX}_{j=1}^{2}[e_m(1), e_m(2)] = \mathrm{MAX}(7, 9) = 9. \quad (21)$$

(3) Code the patterns to be classified with the modified Johnson-Möbius code, using the same parameters used with the fundamental set:

$$\tilde{x}^1 = \begin{bmatrix} 0111111 \\ 000000011 \end{bmatrix}, \quad \tilde{x}^2 = \begin{bmatrix} 0000111 \\ 111111111 \end{bmatrix}. \quad (22)$$

Again, the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.

(4) Transform the index of all fundamental patterns into two indices: one for the class they belong to and another for their position within the class:

$$x^1 = x^{11}, \quad x^2 = x^{12}, \quad x^3 = x^{21}, \quad x^4 = x^{22}. \quad (23)$$

Given that $C_1 = \{x^1, x^2\}$ and $C_2 = \{x^3, x^4\}$, we know that both $x^1$ and $x^2$ belong to class $C_1$, so the first index for both patterns becomes 1; the second index is used to differentiate between patterns assigned to the same class, thus $x^1$ becomes $x^{11}$ and $x^2$ is now $x^{12}$. Something similar happens to $x^3$ and $x^4$, but with the first index being 2, since they belong to class $C_2$: $x^3$ becomes $x^{21}$ and $x^4$ is now $x^{22}$.

(5) Initialize $\theta = 0$.

(6) Calculate $\gamma_g(x^{i\omega}_j, \tilde{x}_j, \theta)$ for each component of the fundamental patterns in each class. Thus, for $\tilde{x}^1$ we have

$$\begin{aligned}
\gamma_g(x^{11}_1, \tilde{x}^1_1, 0) &= 0, \qquad & \gamma_g(x^{11}_2, \tilde{x}^1_2, 0) &= 0, \\
\gamma_g(x^{12}_1, \tilde{x}^1_1, 0) &= 0, \qquad & \gamma_g(x^{12}_2, \tilde{x}^1_2, 0) &= 0, \\
\gamma_g(x^{21}_1, \tilde{x}^1_1, 0) &= 1, \qquad & \gamma_g(x^{21}_2, \tilde{x}^1_2, 0) &= 0, \\
\gamma_g(x^{22}_1, \tilde{x}^1_1, 0) &= 0, \qquad & \gamma_g(x^{22}_2, \tilde{x}^1_2, 0) &= 0.
\end{aligned} \quad (24)$$

Since $\theta = 0$, for $\gamma_g$ to give a result of 1 it is necessary that both vectors $x^{i\omega}_j$ and $\tilde{x}^1_j$ be equal. But only one fundamental pattern has its first component equal to that of $\tilde{x}^1$, namely $x^{21}_1 = [0111111] = \tilde{x}^1_1$, while no fundamental pattern has a second component equal to that of the test pattern. Given that this is the only case for which a component of a fundamental pattern is equal to the same component of the test pattern, it is also the only instance in which $\gamma_g$ outputs 1.

Now, for $\tilde{x}^2$ we have

$$\begin{aligned}
\gamma_g(x^{11}_1, \tilde{x}^2_1, 0) &= 0, \qquad & \gamma_g(x^{11}_2, \tilde{x}^2_2, 0) &= 0, \\
\gamma_g(x^{12}_1, \tilde{x}^2_1, 0) &= 0, \qquad & \gamma_g(x^{12}_2, \tilde{x}^2_2, 0) &= 1, \\
\gamma_g(x^{21}_1, \tilde{x}^2_1, 0) &= 0, \qquad & \gamma_g(x^{21}_2, \tilde{x}^2_2, 0) &= 0, \\
\gamma_g(x^{22}_1, \tilde{x}^2_1, 0) &= 0, \qquad & \gamma_g(x^{22}_2, \tilde{x}^2_2, 0) &= 0.
\end{aligned} \quad (25)$$


Again $\theta = 0$, thus forcing both vectors $x^{i\omega}_j$ and $\tilde{x}^2_j$ to be equal in order to obtain a 1. Similarly to what happened in the case of $\tilde{x}^1$, there is but one instance of such an occurrence: $x^{12}_2 = [111111111] = \tilde{x}^2_2$.

(7) Compute a weighted sum $c_i$ for each class. For $\tilde{x}^1$:

$$c_1 = \frac{\sum_{\omega=1}^{k_1} \sum_{j=1}^{n} \gamma_g(x^{1\omega}_j, \tilde{x}^1_j, \theta)}{k_1} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{1\omega}_j, \tilde{x}^1_j, \theta)}{2} = \frac{(0 + 0) + (0 + 0)}{2} = \frac{0}{2} = 0. \quad (26)$$

Here we add together the results obtained on all components of all fundamental patterns belonging to class $C_1$. Since all gave 0 (i.e., none of these were similar to the corresponding component of the test pattern, given the value of $\theta$ in effect), the result of the weighted addition for class $C_1$ is $c_1 = 0$. One also has

$$c_2 = \frac{\sum_{\omega=1}^{k_2} \sum_{j=1}^{n} \gamma_g(x^{2\omega}_j, \tilde{x}^1_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{2\omega}_j, \tilde{x}^1_j, \theta)}{2} = \frac{(1 + 0) + (0 + 0)}{2} = \frac{1}{2} = 0.5. \quad (27)$$

In this case there was one instance similar to the corresponding component of the test pattern (equal, in fact, since $\theta = 0$ during this run). Thus the weighted addition for class $C_2$ is $c_2 = 0.5$, given that the sum of results is divided by the cardinality of the class, $k_2 = 2$. And for $\tilde{x}^2$:

$$c_1 = \frac{\sum_{\omega=1}^{k_1} \sum_{j=1}^{n} \gamma_g(x^{1\omega}_j, \tilde{x}^2_j, \theta)}{k_1} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{1\omega}_j, \tilde{x}^2_j, \theta)}{2} = \frac{(0 + 0) + (0 + 1)}{2} = \frac{1}{2} = 0.5. \quad (28)$$

There was one result of $\gamma_g$ equal to 1 which, coupled with the cardinality of class $C_1$ being 2, makes $c_1 = 0.5$. One also has

$$c_2 = \frac{\sum_{\omega=1}^{k_2} \sum_{j=1}^{n} \gamma_g(x^{2\omega}_j, \tilde{x}^2_j, \theta)}{k_2} = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g(x^{2\omega}_j, \tilde{x}^2_j, \theta)}{2} = \frac{(0 + 0) + (0 + 0)}{2} = \frac{0}{2} = 0. \quad (29)$$

No similarities were found between the fundamental patterns of class $C_2$ and the test pattern $\tilde{x}^2$ (for $\theta = 0$); thus $c_2 = 0$.

(8) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (6) and (7) until there is a unique maximum or the stop condition $\theta > \rho$ is fulfilled. Here there is a unique maximum (for each test pattern), so we go directly to step (9).

(9) If there is a unique maximum, assign the test pattern to the class corresponding to such maximum. For $\tilde{x}^1$:

$$C_{\tilde{x}^1} = C_2, \quad \text{since } \mathrm{MAX}_{i=1}^{2}\, c_i = \mathrm{MAX}(0, 0.5) = 0.5 = c_2, \quad (30)$$

while for $\tilde{x}^2$:

$$C_{\tilde{x}^2} = C_1, \quad \text{since } \mathrm{MAX}_{i=1}^{2}\, c_i = \mathrm{MAX}(0.5, 0) = 0.5 = c_1. \quad (31)$$

(10) Otherwise, assign the test pattern to the class of the first maximum. This is unnecessary here, since both test patterns have already been assigned a class.

2.2. Sliding Windows. A widespread approach for data stream mining is the use of a sliding window to keep only a representative portion of the data. We are not interested in keeping all the information of the data stream, particularly when we have to meet the memory space constraints required of algorithms working in this context. This technique is also able to deal with concept drift, by eliminating those data points that come from an old concept. According to [5], the training window size can be fixed or variable over time.

Sliding windows of fixed size store in memory a fixed number of the $w$ most recent examples. Whenever a new example arrives, it is saved to memory and the oldest one is discarded. These types of windows are similar to first-in, first-out data structures. This simple adaptive learning method is often used as a baseline in the evaluation of new algorithms.

Sliding windows of variable size vary the number of examples in the window over time, typically depending on the indications of a change detector. A straightforward approach is to shrink the window whenever a change is detected, such that the training data reflects the most recent concept, and to grow the window otherwise.

2.3. Proposed Method. The Gamma classifier has previously been used for time series prediction in air pollution [37] and oil well production [40], showing very promising results. The current work takes the Gamma classifier and combines it with a sliding window approach, with the aim of developing a new method capable of performing classification over a continuous data stream, referred to as the DS-Gamma classifier.


(1) Initialize window
(2) for each $x^\tau$ do
(3)   If $\tau < W$ then
(4)     Add $x^\tau$ to the tail of the window
(5)   Else
(6)     Use $x^\tau$ to classify
(7)     Add $x^\tau$ to the tail of the window

Algorithm 1: Learning policy algorithm.

A data stream $S$ can be defined as a sequence of data elements of the form $S = \{s_1, s_2, \ldots, s_{\mathrm{now}}\}$ from a universe of size $D$, where $D$ is huge and potentially infinite but countable. We assume that each element $s_i$ has the following form:

$$s_i = (x^\tau, y^\tau), \quad (32)$$

where $x^\tau$ is an input vector of size $n$ and $y^\tau$ is the class label associated with it; $\tau \in \mathbb{Z}^+$ is an index that indicates the sequential order of the elements in the data stream. $\tau$ is bounded by the size of the data stream, such that if the data stream is infinite, $\tau$ takes infinitely many, but countable, values.

One of the main challenges in mining data streams is the ability to maintain an accurate decision model. The current model has to incorporate new knowledge from new data, and it should also forget old information when data become outdated. To address these issues, the proposed method implements a learning and forgetting policy. The learning (insertion) and forgetting (removal) policies were designed to guarantee the spatial and temporal relevance of the data and the consistency of the current model.

The proposed method implements a fixed-window approach. This approach achieves concept drift detection in an indirect way, by considering only the most recent data. Algorithm 1 outlines the general steps used as the learning policy. Initially, the window is created with the first $W$ labeled records, where $W$ is the size of the window. After that, every sample is used for classification and evaluation of the performance, and then it is used to update the window.

Classifiers that deal with concept drift are forced to implement forgetting, adaptation, or drift detection mechanisms in order to adjust to changing environments. In this proposal, a forgetting mechanism is implemented. Algorithm 2 outlines the basic forgetting policy implemented in this proposal: the oldest element in the window, $x^\delta$, is removed.
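Taken together, the learning and forgetting policies amount to a bounded first-in, first-out buffer. A minimal sketch (the class and method names are ours, chosen for illustration):

```python
from collections import deque

# Fixed-size sliding window: new labeled samples enter at the tail;
# once the window holds W elements, the oldest is dropped from the head.
class SlidingWindow:
    def __init__(self, W):
        self.W = W
        self.window = deque()

    def update(self, sample):
        if len(self.window) == self.W:
            self.window.popleft()   # forgetting policy: remove oldest
        self.window.append(sample)  # learning policy: insert newest

w = SlidingWindow(3)
for t in range(5):
    w.update(t)
print(list(w.window))  # -> [2, 3, 4]
```

In practice one could also pass `maxlen=W` to `deque`, which performs the same head-eviction automatically.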

2.4. MOA. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [41]. MOA contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation on data streams with concept drift. MOA can also be used to build synthetic data streams, using generators, reading ARFF files, or joining several streams. The MOA stream generators allow simulating potentially infinite sequences of

Table 2: Datasets characteristics.

Dataset      Instances  Attributes  Classes
Agrawal      100,000    10          2
Bank         45,211     16          2
CoverType    581,012    54          7
Electricity  45,312     8           2
Hyperplane   100,000    10          2
RandomRBF    100,000    10          2

data with different characteristics (different types of concept drift and recurrent concepts, among others). All the algorithms used for the comparative analysis are implemented under the MOA framework.

2.5. Data Streams. This section provides a brief description of the data streams used during the experimental phase. The proposed solution has been tested using synthetic and real data streams. For data stream classification problems, only a few suitable data streams can be successfully used for evaluation; a few data streams with enough examples are hosted in [42, 43]. For further evaluation, the use of synthetic data stream generators becomes necessary. The synthetic data streams used in our experiments were generated using the MOA framework mentioned in the previous section. A summary of the main characteristics of the data streams is shown in Table 2. Descriptions of the data streams were taken from [42, 43] in the case of the real ones and from [44] for the synthetic ones.

2.5.1. Real-World Data Streams

Bank Marketing Dataset. The Bank Marketing dataset is available from the UCI Machine Learning Repository [43]. The data are related to the direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe a term deposit. The dataset contains 45,211 instances; each instance has 16 attributes and the class label. The two classes identify whether a bank term deposit was or was not subscribed.

Electricity Dataset. This data was collected from the Australian New South Wales electricity market. In this market, prices are not fixed and are affected by demand and supply; the prices of the market are set every five minutes. The dataset was first described by Harries in [45]. During the time period described in the dataset, the electricity market was expanded with the inclusion of adjacent areas, which leads to more elaborate management of the supply: the excess production of one region could be sold in the adjacent region. The Electricity dataset contains 45,312 instances. Each example of the dataset refers to a period of 30 minutes (i.e., there are 48 instances for each time period of one day). Each example in the dataset has 8 fields: the day of week, the time stamp, the New South Wales electricity demand, the New South Wales electricity price, the Victoria electricity demand, the Victoria electricity price, the scheduled electricity transfer


(1) for each $x^\tau$ do
(2)   If $\tau > W$ then
(3)     Remove $x^\delta$ ($\delta = \tau - W + 1$) from the beginning of the window

Algorithm 2: Forgetting policy algorithm.

between states, and the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. We use the normalized version of this dataset, so that the numerical values are between 0 and 1. The dataset is available from [42].

Forest CoverType Dataset. The Forest CoverType dataset is one of the largest datasets available from the UCI Machine Learning Repository [43]. This dataset contains information that describes forest cover types from cartographic variables. A given observation (30 × 30 meter cell) was determined from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes. Data are in raw form (not scaled) and contain binary (0 or 1) columns for the qualitative independent variables (wilderness areas and soil types). The dataset has 7 classes that identify the type of forest cover.

2.5.2. Synthetic Data Streams

Agrawal Data Stream. This generator was originally introduced by Agrawal et al. in [46]. The MOA implementation was used to generate the data streams for the experimental phase of this work. The generator produces a stream containing nine attributes. Although not explicitly stated by the authors, a sensible conclusion is that these attributes describe hypothetical loan applications. There are ten functions defined for generating binary class labels from the attributes; presumably these determine whether the loan should be approved. Perturbation shifts numeric attributes from their true value, adding an offset drawn randomly from a uniform distribution, the range of which is a specified percentage of the total value range [44]. For each experiment, a data stream with 100,000 examples was generated.

RandomRBF Data Stream. This generator was devised to offer an alternative complex concept type that is not straightforward to approximate with a decision tree model. The RBF (Radial Basis Function) generator works as follows: a fixed number of random centroids are generated, each center having a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a center at random, taking the weights into consideration, so that centers with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the central point. The length of the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid. The chosen centroid also determines the class label of the example. Only numeric attributes are generated [44]. For our experiments, a data stream with 100,000 data instances, 10 numerical attributes, and 2 classes was generated.

Rotating Hyperplane Data Stream. This synthetic data stream was generated using the MOA framework. A hyperplane in $d$-dimensional space is the set of points $x$ that satisfy [44]

$$\sum_{i=1}^{d} w_i x_i = w_0 = \sum_{i=1}^{d} w_i, \quad (33)$$

where $x_i$ is the $i$th coordinate of $x$; examples for which $\sum_{i=1}^{d} w_i x_i \geq w_0$ are labeled positive, and examples for which $\sum_{i=1}^{d} w_i x_i < w_0$ are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because the orientation and position of the hyperplane can be adjusted in a gradual way if we change the relative size of the weights. In MOA, change is introduced to this data stream by adding drift to each weight attribute using the formula $w_i = w_i + d\sigma$, where $\sigma$ is the probability that the direction of change is reversed and $d$ is the change applied to every example. For our experiments, a data stream with 100,000 data instances, 10 numerical attributes, and 2 classes was generated.

2.6. Algorithm Evaluation. One of the objectives of this work is to perform a consistent comparison between the classification performance of our proposal and that of other well-known data stream pattern classification algorithms. Evaluation of data stream algorithms is a relatively new field that has not been studied as thoroughly as evaluation of batch learning. In traditional batch learning, evaluation has been mainly limited to evaluating a model using different random reorderings of the same static dataset ($k$-fold cross-validation, leave-one-out, and bootstrap) [2]. For data stream scenarios, the problem of infinite data raises new challenges; moreover, the batch approaches cannot effectively measure the accuracy of a model in a context with concepts that change over time. In the data stream setting, one of the main concerns is how to design an evaluation practice that allows us to obtain a reliable measure of accuracy over time. According to [41], two main approaches arise.

2.6.1. Holdout. When traditional batch learning reaches a scale where cross-validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. An extension of this approach can be applied to data stream scenarios: first, a set of $M$ instances is used to train the model; then a set of the following unseen $N$ instances is used to test the model; then the model is again trained with the next $M$ instances and tested with the subsequent $N$ instances, and so forth.


2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data, which is sometimes difficult to obtain, while prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different mechanisms of evaluation that include the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors. They stated that the use of the prequential error with forgetting mechanisms proves to be advantageous in assessing performance and in comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we have selected the prequential error over a sliding window for the evaluation of the proposed method. The prequential error $P_W$ for a sliding window of size $W$, computed at time $i$, is based on an accumulated sum of a 0-1 loss function $L$ between the predicted values $\hat{y}_k$ and the observed values $y_k$, using the following expression [47]:

$$P_W(i) = \frac{1}{W} \sum_{k=i-W+1}^{i} L(y_k, \hat{y}_k). \quad (34)$$
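The estimator in (34) can be sketched directly with a 0-1 loss. The function name and example values below are ours; the sketch assumes the evaluation time $i$ is the end of the sequence seen so far and that at least $W$ pairs are available:

```python
# Prequential error over a sliding window of size W at time i:
# the mean 0-1 loss over the W most recent (prediction, label) pairs.
def prequential_error(y_pred, y_true, W):
    losses = [int(p != t) for p, t in zip(y_pred, y_true)]
    return sum(losses[-W:]) / W

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]   # two mistakes, both among the last 3 examples

print(prequential_error(y_pred, y_true, 3))  # -> 2/3, about 0.667
```

As the window slides forward, old losses fall out of the sum, so the estimate tracks the classifier's behavior on the current concept rather than its full history.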

We also evaluate the performance results of each competing algorithm against the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test that ranks the differences in performance of two classifiers over the datasets, ignoring the signs, and compares the ranks for the positive and the negative differences.

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. Classification performance of each of the compared algorithms was calculated using the prequential error approach. These results are used to compare the performance of our proposal with that of other data stream classification algorithms. We have aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on their implementation can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times, and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset the experiment was executed just one time. The Bank and CoverType datasets

were reordered 10 times using the randomized filter from the WEKA platform [49] for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each of them, using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification, to evaluate the efficiency of each algorithm. All experiments were conducted on a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system, with 4096 MB of RAM.

3. Results and Discussion

In this section we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance of each of the compared classification algorithms. First, we evaluate the sensitivity of the proposed method to changes in the window size, a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized with boldface. Table 4 shows the average total time used to classify each data stream. The average performance results, with indicators of the standard deviation, are also depicted in Figure 1.

In general, the DS-Gamma classifier performance improved as the window size increased, with the exception of the Electricity data stream, for which performance is better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with a performance of 89.12%. This performance is better than all the performances achieved by the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance remains stable and in some cases even degrades, as we can observe in the results for the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination $R^2$ were included in the graphic for each data stream. The $R^2$ values, ranging between 0.9743 and 0.9866, show that time grows linearly as the size of the window increases.

To evaluate the DS-Gamma classifier, we compared its performance with that of other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation; the best performance is emphasized with boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and


Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset      50     100    200    500    750    1000   1250   1500   1750   2000
Agrawal      79.64  85.97  89.86  92.59  92.88  92.82  92.67  92.51  92.32  92.07
Bank         68.19  73.40  77.05  79.13  79.12  78.96  78.60  78.24  77.76  77.37
CoverType    38.21  37.61  39.70  44.30  46.59  47.77  48.62  49.19  49.51  49.74
Electricity  89.12  85.26  81.71  79.84  78.58  77.14  76.36  75.94  75.31  74.49
Hyperplane   78.25  82.65  85.28  86.56  86.55  86.30  85.98  85.65  85.34  85.00
RandomRBF    64.61  67.39  69.33  70.56  70.81  70.85  70.74  70.63  70.50  70.33

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

Dataset      50     100     200     500     750     1000    1250    1500    1750    2000
Agrawal      2.24   4.50    7.93    19.26   28.04   39.00   45.92   57.57   63.90   73.49
Bank         0.66   1.14    2.07    4.75    7.01    9.17    11.54   14.55   15.83   17.87
CoverType    83.47  140.81  251.20  340.05  254.12  335.03  428.94  522.39  669.24  676.99
Electricity  0.34   0.53    0.98    2.33    3.56    4.63    5.99    7.05    8.30    9.49
Hyperplane   2.24   4.01    7.56    18.19   26.76   30.28   44.28   52.96   61.87   70.63
RandomRBF    2.79   4.09    7.73    18.54   27.35   30.72   45.02   54.12   62.23   71.04

Table 5: Averaged accuracy (%) comparison; standard deviation σ in parentheses (Electricity was run once, so no σ is reported).

Classifier        Agrawal        Bank           CoverType      Electricity  Hyperplane     RandomRBF
Bayes-DDM         88.16 (0.19)   83.38 (1.41)   67.61 (6.81)   81.05        86.02 (1.39)   71.87 (3.82)
Perceptron DDM    67.20 (0.11)   73.43 (10.53)  70.76 (5.55)   78.61        85.65 (1.49)   71.88 (5.93)
Hoeffding Tree    94.39 (0.24)   88.52 (0.11)   72.29 (2.68)   82.41        82.41 (5.62)   86.72 (1.69)
Active Classifier 90.82 (1.62)   88.21 (0.14)   67.31 (1.03)   78.59        79.50 (6.55)   75.32 (2.70)
IBL-DS            86.76 (0.48)   86.52 (0.21)   61.73 (9.46)   75.34        84.20 (0.77)   93.32 (1.21)
AWE               92.17 (0.59)   87.90 (0.38)   76.16 (3.66)   78.36        85.88 (1.65)   91.98 (1.14)
OzaBoostAdwin     87.11 (2.62)   87.67 (0.50)   78.55 (4.97)   87.92        83.21 (1.91)   91.49 (1.14)
SGD               67.20 (0.11)   76.56 (1.66)   59.01 (7.37)   84.52        83.19 (0.42)   61.74 (3.31)
SPegasos          55.82 (0.17)   77.46 (1.99)   50.47 (5.17)   57.60        52.80 (6.08)   53.33 (3.13)
DS-Gamma          92.82 (0.10)   78.96 (0.22)   47.77 (4.31)   77.14        86.30 (1.35)   70.85 (2.54)

Table 6: Averaged classification time (seconds) comparison; standard deviation σ in parentheses (Electricity was run once, so no σ is reported).

Classifier        Agrawal         Bank           CoverType            Electricity  Hyperplane      RandomRBF
Bayes-DDM         0.62 (0.11)     0.39 (0.08)    17.76 (1.62)         0.61         0.70 (0.01)     0.69 (0.01)
Perceptron DDM    0.40 (0.02)     0.20 (0.05)    8.28 (0.10)          0.19         0.45 (0.02)     0.44 (0.02)
Hoeffding Tree    0.98 (0.23)     0.54 (0.03)    30.05 (11.06)        0.78         1.36 (0.05)     1.40 (0.05)
Active Classifier 0.64 (0.03)     0.26 (0.02)    15.14 (0.74)         0.23         0.72 (0.04)     0.70 (0.02)
IBL-DS            53.19 (2.11)    38.51 (8.26)   1057.60 (60.95)      8.61         86.39 (0.96)    88.10 (1.06)
AWE               14.32 (0.20)    6.63 (0.30)    1012.47 (304.41)     6.70         18.03 (3.64)    37.52 (3.03)
OzaBoostAdwin     192.62 (44.47)  46.93 (10.56)  10589.73 (3896.64)   39.75        223.99 (51.34)  57.04 (16.66)
SGD               0.14 (0.04)     0.11 (0.009)   40.35 (0.07)         0.12         0.27 (0.02)     0.27 (0.02)
SPegasos          0.22 (0.006)    0.11 (0.016)   16.84 (0.12)         0.12         0.31 (0.05)     0.24 (0.01)
DS-Gamma          39.00 (3.58)    9.16 (0.05)    335.03 (12.91)       4.63         30.28 (0.79)    30.72 (0.94)


[Figure 1: Average accuracy of DS-Gamma for different window sizes (50 to 2000); one panel per data stream: Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF.]

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier for the CoverType data stream. This data stream presents a heavy class imbalance that seems to be negatively affecting the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased toward the class with the most members.

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using α = 0.05. The obtained values are all greater than α, so the null hypothesis is not rejected and


[Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes (50 to 2000), one panel per data stream (Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF), each with a fitted trend line; the reported coefficients of determination are R² = 0.9866, 0.9854, 0.9804, 0.9796, 0.9743, and 0.9842.]

we can infer that there are no significant differences among the compared algorithms.
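The signed-rank comparison described above can be reproduced with a small, standard-library-only implementation of the test's normal approximation. The sketch below is illustrative: the paired accuracy values are hypothetical placeholders, not the figures reported in this paper.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.
    Zero differences are discarded; tied |differences| get average ranks."""
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                       # average ranks over runs of ties
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    t = min(w_plus, w_minus)
    mu = n * (n + 1) / 4               # null mean of the statistic T
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mu) / sigma               # z <= 0 because t is the minimum
    return t, 1 + math.erf(z / math.sqrt(2))   # two-sided p = 2 * Phi(z)

# Hypothetical per-stream accuracies of two classifiers (illustrative only).
acc_a = [0.83, 0.78, 0.91, 0.64, 0.88, 0.90]
acc_b = [0.80, 0.79, 0.87, 0.66, 0.83, 0.84]
t, p = wilcoxon_signed_rank(acc_a, acc_b)
print(t, p > 0.05)   # p is about 0.12 here: no significant difference at α = 0.05
```

With values greater than α = 0.05, as in Table 7, the null hypothesis of no difference between the paired classifiers is retained.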

4. Conclusions

In this paper we describe a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase, the proposed method was tested using different data stream scenarios with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms        Asymptotic significance (2-sided)
Gamma-Bayes-DDM            0.345
Gamma-Perceptron DDM       0.917
Gamma-Hoeffding Tree       0.116
Gamma-Active Classifier    0.463
Gamma-IBL-DS               0.345
Gamma-AWE                  0.116
Gamma-OzaBoostAdwin        0.116
Gamma-SGD                  0.600
Gamma-SPegasos             0.075


[Figure 3: Averaged accuracy (%) comparison of Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma, one panel per data stream (Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity).]

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on the classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000; for larger window sizes, performance remained stable and in some cases even declined.

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained in the experiments performed with the CoverType dataset,


[Figure 4: Averaged classification time (seconds) comparison of Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma, one panel per data stream; the Hyperplane and CoverType panels use a logarithmic time axis.]

a heavily imbalanced dataset on which the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), CONACyT, and SNI for their economic support to develop this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to performing the statistical significance analysis.


References

[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: rich data and the increasing value of the internet of things," Tech. Rep., IDC Information and Data, 2014, http://www.emc.com/leadership/digital-universe/2014iview/index.htm.
[2] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press, Amsterdam, The Netherlands, 2010.
[3] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.
[4] J. Beringer and E. Hüllermeier, "Efficient instance-based learning on data streams," Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.
[5] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Computing Surveys, vol. 46, no. 4, article 44, 2014.
[6] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, Boston, Mass, USA, August 2000.
[7] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, San Francisco, Calif, USA, August 2001.
[8] C. Liang, Y. Zhang, P. Shi, and Z. Hu, "Learning very fast decision tree from uncertain data streams with positive and unlabeled samples," Information Sciences, vol. 213, pp. 50–67, 2012.
[9] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "Decision trees for mining data streams based on the Gaussian approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.
[10] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69–101, 1996.
[11] F. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. Riquelme, "Data streams classification by incremental rule learning with parameterized generalization," in Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 657–661, Dijon, France, April 2006.
[12] D. Mena-Torres and J. S. Aguilar-Ruiz, "A similarity-based approach for data stream classification," Expert Systems with Applications, vol. 41, no. 9, pp. 4224–4234, 2014.
[13] A. Shaker and E. Hüllermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems, vol. 3, no. 4, pp. 235–249, 2012.
[14] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Recursive least square perceptron model for non-stationary and imbalanced data stream classification," Evolving Systems, vol. 4, no. 2, pp. 119–131, 2013.
[15] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Ensemble of online neural networks for non-stationary and imbalanced data streams," Neurocomputing, vol. 122, pp. 535–544, 2013.
[16] A. Sancho-Asensio, A. Orriols-Puig, and E. Golobardes, "Robust on-line neural learning classifier system for data stream classification tasks," Soft Computing, vol. 18, no. 8, pp. 1441–1461, 2014.
[17] D. M. Farid, L. Zhang, A. Hossain et al., "An adaptive ensemble classifier for mining concept drifting data streams," Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, 2013.
[18] D. Brzezinski and J. Stefanowski, "Combining block-based and online methods in learning ensembles from concept drifting data streams," Information Sciences, vol. 265, pp. 50–67, 2014.
[19] G. Ditzler and R. Polikar, "Incremental learning of concept drift from streaming imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2283–2301, 2013.
[20] M. J. Hosseini, Z. Ahmadi, and H. Beigy, "Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification," Evolving Systems, vol. 4, no. 1, pp. 43–60, 2013.
[21] N. Saunier and S. Midenet, "Creating ensemble classifiers through order and incremental data selection in a stream," Pattern Analysis and Applications, vol. 16, no. 3, pp. 333–347, 2013.
[22] M. M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham, "Classification and novel class detection in concept-drifting data streams under time constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.
[23] A. Alzghoul, M. Löfstrand, and B. Backe, "Data stream forecasting for system fault prediction," Computers and Industrial Engineering, vol. 62, no. 4, pp. 972–978, 2012.
[24] J. Kranjc, J. Smailović, V. Podpečan, M. Grčar, M. Žnidaršič, and N. Lavrač, "Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform," Information Processing and Management, vol. 51, no. 2, pp. 187–203, 2015.
[25] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, "Stream-based active learning for sentiment analysis in the financial domain," Information Sciences, vol. 285, pp. 181–203, 2014.
[26] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: primal estimated sub-gradient solver for SVM," in Proceedings of the 24th International Conference on Machine Learning, pp. 807–814, June 2007.
[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, "Twitter spammer detection using data stream clustering," Information Sciences, vol. 260, pp. 64–73, 2014.
[28] B. Krawczyk and M. Woźniak, "Incremental weighted one-class classifier for mining stationary data streams," Journal of Computational Science, vol. 9, pp. 19–25, 2015.
[29] J. Gama, "A survey on learning from data streams: current and future trends," Progress in Artificial Intelligence, vol. 1, no. 1, pp. 45–55, 2012.
[30] L. I. Kuncheva, "Change detection in streaming multivariate data using likelihood detectors," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1175–1180, 2013.
[31] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, "Exponentially weighted moving average charts for detecting concept drift," Pattern Recognition Letters, vol. 33, no. 2, pp. 191–198, 2012.
[32] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence – SBIA 2004, vol. 3171 of Lecture Notes in Computer Science, pp. 286–295, Springer, Berlin, Germany, 2004.
[33] X. Wu, P. Li, and X. Hu, "Learning from concept drifting data streams with unlabeled data," Neurocomputing, vol. 92, pp. 145–155, 2012.
[34] X. Qin, Y. Zhang, C. Li, and X. Li, "Learning from data streams with only positive and unlabeled data," Journal of Intelligent Information Systems, vol. 40, no. 3, pp. 405–430, 2013.
[35] M. M. Masud, C. Woolam, J. Gao et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.
[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.
[37] I. López-Yáñez, A. J. Argüelles-Cruz, O. Camacho-Nieto, and C. Yáñez-Márquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.
[38] M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.
[39] C. Yáñez-Márquez, E. M. Felipe-Riverón, I. López-Yáñez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.
[40] I. López-Yáñez, L. Sheremetov, and C. Yáñez-Márquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.
[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.
[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets/.
[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml.
[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation/.
[45] M. Harries, "Splice-2 comparative evaluation: electricity pricing," Tech. Rep., University of South Wales, Cardiff, UK, 1999.
[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.
[47] J. Gama, R. Sebastião, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.
[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka/.




(1) Code the fundamental set with the modified Johnson-Möbius code, obtaining a value e_m for each component. Consider

x^1 = [0000011, 011111111],
x^2 = [0000001, 111111111],
x^3 = [0111111, 000000111],
x^4 = [1111111, 000001111].    (20)

Since the maximum value among the first components of all fundamental patterns is 7, then e_m(1) = 7, and the maximum value among the second components is e_m(2) = 9. Thus, the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.

(2) Compute the stop parameter:

ρ = MAX_{j=1}^{n} [e_m(j)] = MAX[e_m(1), e_m(2)] = MAX(7, 9) = 9.    (21)

(3) Code the patterns to be classified with the modified Johnson-Möbius code, using the same parameters used with the fundamental set:

x̃^1 = [0111111, 000000011],
x̃^2 = [0000111, 111111111].    (22)

Again, the binary vectors representing the first components are 7 bits long, while those representing the second components are 9 bits long.

(4) Transform the index of each fundamental pattern into two indices: one for the class it belongs to and another for its position within the class:

x^1 = x^{11},  x^2 = x^{12},  x^3 = x^{21},  x^4 = x^{22}.    (23)

Given that C_1 = {x^1, x^2} and C_2 = {x^3, x^4}, we know that both x^1 and x^2 belong to class C_1, so the first index for both patterns becomes 1; the second index is used to differentiate between patterns assigned to the same class, thus x^1 becomes x^{11} and x^2 is now x^{12}. Something similar happens to x^3 and x^4, but with the first index being 2, since they belong to class C_2: x^3 becomes x^{21} and x^4 is now x^{22}.

(5) Initialize θ = 0.

(6) Calculate γ_g(x^{iω}_j, x̃_j, θ) for each component of the fundamental patterns in each class. Thus, for x̃^1 we have

γ_g(x^{11}_1, x̃^1_1, 0) = 0,    γ_g(x^{11}_2, x̃^1_2, 0) = 0,
γ_g(x^{12}_1, x̃^1_1, 0) = 0,    γ_g(x^{12}_2, x̃^1_2, 0) = 0,
γ_g(x^{21}_1, x̃^1_1, 0) = 1,    γ_g(x^{21}_2, x̃^1_2, 0) = 0,
γ_g(x^{22}_1, x̃^1_1, 0) = 0,    γ_g(x^{22}_2, x̃^1_2, 0) = 0.    (24)

Since θ = 0, for γ_g to give a result of 1 it is necessary that both vectors x^{iω}_j and x̃^1_j be equal. But only one fundamental pattern has its first component equal to that of x̃^1, namely x^{21}_1 = [0111111] = x̃^1_1, while no fundamental pattern has a second component equal to that of the test pattern. Given that this is the only case in which a component of a fundamental pattern is equal to the same component of the test pattern, it is also the only instance in which γ_g outputs 1.

Now for x̃^2 we have

γ_g(x^{11}_1, x̃^2_1, 0) = 0,    γ_g(x^{11}_2, x̃^2_2, 0) = 0,
γ_g(x^{12}_1, x̃^2_1, 0) = 0,    γ_g(x^{12}_2, x̃^2_2, 0) = 1,
γ_g(x^{21}_1, x̃^2_1, 0) = 0,    γ_g(x^{21}_2, x̃^2_2, 0) = 0,
γ_g(x^{22}_1, x̃^2_1, 0) = 0,    γ_g(x^{22}_2, x̃^2_2, 0) = 0.    (25)


Again θ = 0, thus forcing both vectors x^{iω}_j and x̃^2_j to be equal in order to obtain a 1. Similarly to what happened in the case of x̃^1, there is but one instance of such an occurrence: x^{12}_2 = [111111111] = x̃^2_2.

(7) Compute a weighted sum c_i for each class. For x̃^1:

c_1 = (Σ_{ω=1}^{k_1} Σ_{j=1}^{n} γ_g(x^{1ω}_j, x̃^1_j, θ)) / k_1
    = (Σ_{ω=1}^{2} Σ_{j=1}^{2} γ_g(x^{1ω}_j, x̃^1_j, θ)) / 2
    = [(0 + 0) + (0 + 0)] / 2 = 0.    (26)

Here we add together the results obtained on all components of all fundamental patterns belonging to class C_1. Since all gave 0 (i.e., none of these were similar to the corresponding component of the test pattern, given the value of θ in effect), the result of the weighted addition for class C_1 is c_1 = 0. One has

c_2 = (Σ_{ω=1}^{k_2} Σ_{j=1}^{n} γ_g(x^{2ω}_j, x̃^1_j, θ)) / k_2
    = (Σ_{ω=1}^{2} Σ_{j=1}^{2} γ_g(x^{2ω}_j, x̃^1_j, θ)) / 2
    = [(1 + 0) + (0 + 0)] / 2 = 0.5.    (27)

In this case there was one instance similar to the corresponding component of the test pattern (equal, since θ = 0 during this run). Thus the weighted addition for class C_2 is c_2 = 0.5, given that the sum of results is divided by the cardinality of the class, k = 2. And for x̃^2:

c_1 = (Σ_{ω=1}^{k_1} Σ_{j=1}^{n} γ_g(x^{1ω}_j, x̃^2_j, θ)) / k_1
    = (Σ_{ω=1}^{2} Σ_{j=1}^{2} γ_g(x^{1ω}_j, x̃^2_j, θ)) / 2
    = [(0 + 0) + (0 + 1)] / 2 = 0.5.    (28)

There was one result of γ_g equal to 1, which, coupled with the cardinality of class C_1 being 2, makes c_1 = 0.5. One has

c_2 = (Σ_{ω=1}^{k_2} Σ_{j=1}^{n} γ_g(x^{2ω}_j, x̃^2_j, θ)) / k_2
    = (Σ_{ω=1}^{2} Σ_{j=1}^{2} γ_g(x^{2ω}_j, x̃^2_j, θ)) / 2
    = [(0 + 0) + (0 + 0)] / 2 = 0.    (29)

No similarities were found between the fundamental patterns of class C_2 and the test pattern x̃^2 (for θ = 0); thus c_2 = 0.

(8) If there is more than one maximum among the different c_i, increment θ by 1 and repeat steps (6) and (7) until there is a unique maximum or the stop condition θ ≥ ρ is fulfilled. Here there is a unique maximum (for each test pattern), so we go directly to step (9).

(9) If there is a unique maximum, assign y to the class corresponding to such maximum. For x̃^1:

C_{x̃^1} = C_2, since MAX_{i=1}^{2} c_i = MAX(0, 0.5) = 0.5 = c_2,    (30)

while for x̃^2:

C_{x̃^2} = C_1, since MAX_{i=1}^{2} c_i = MAX(0.5, 0) = 0.5 = c_1.    (31)

(10) Otherwise, assign y to the class of the first maximum. This is unnecessary here, since both test patterns have already been assigned a class.
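The worked example above can be reproduced in a few lines. The sketch below is our own reading of the procedure, not the authors' implementation: the modified Johnson-Möbius code is taken to place v ones, left-padded with zeros to e_m bits, and γ_g is taken to output 1 when the two coded components differ in at most θ bit positions (which reduces to the equality test used above when θ = 0).

```python
def johnson_mobius(value, e_m):
    """Code an integer as e_m bits with 'value' ones in the lower positions."""
    return "0" * (e_m - value) + "1" * value

def gamma_g(a, b, theta):
    """Generalized Gamma operator (assumed form): 1 when the coded
    components a and b differ in at most theta bit positions."""
    return 1 if sum(x != y for x, y in zip(a, b)) <= theta else 0

def classify(classes, test, rho):
    """Steps (5)-(10): grow theta until a unique class maximum appears."""
    for theta in range(rho + 1):
        # weighted sum per class: total gamma hits divided by class size
        c = {label: sum(gamma_g(p[j], test[j], theta)
                        for p in patterns for j in range(len(test))) / len(patterns)
             for label, patterns in classes.items()}
        winners = [lab for lab, v in c.items() if v == max(c.values())]
        if len(winners) == 1 or theta == rho:
            return winners[0], c       # step (10): first maximum on a final tie

# Fundamental set of the example: C1 = {(2, 8), (1, 9)}, C2 = {(6, 3), (7, 4)},
# coded with e_m(1) = 7 and e_m(2) = 9.
em = (7, 9)
code = lambda pair: [johnson_mobius(v, m) for v, m in zip(pair, em)]
classes = {"C1": [code((2, 8)), code((1, 9))],
           "C2": [code((6, 3)), code((7, 4))]}

print(classify(classes, code((6, 2)), rho=9))  # test pattern 1 -> ('C2', {'C1': 0.0, 'C2': 0.5})
print(classify(classes, code((3, 9)), rho=9))  # test pattern 2 -> ('C1', {'C1': 0.5, 'C2': 0.0})
```

Both test patterns resolve at θ = 0 with the same weighted sums as in equations (26)-(31).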

2.2. Sliding Windows. A widespread approach for data stream mining is the use of a sliding window to keep only a representative portion of the data. We are not interested in keeping all the information of the data stream, particularly when we have to meet the memory space constraints required for algorithms working in this context. This technique is able to deal with concept drift, eliminating those data points that come from an old concept. According to [5], the training window size can be fixed or variable over time.

Sliding windows of a fixed size store in memory a fixed number of the w most recent examples. Whenever a new example arrives, it is saved to memory and the oldest one is discarded. These types of windows are similar to first-in first-out data structures. This simple adaptive learning method is often used as a baseline in the evaluation of new algorithms.
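The fixed-size window just described maps directly onto a bounded FIFO queue; in Python, for instance, collections.deque with maxlen evicts the oldest example automatically:

```python
from collections import deque

w = 3                          # window size
window = deque(maxlen=w)       # bounded FIFO: append discards the oldest item
for example in ["s1", "s2", "s3", "s4", "s5"]:
    window.append(example)

print(list(window))  # the w most recent examples: ['s3', 's4', 's5']
```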

Sliding windows of variable size vary the number of examples in a window over time, typically depending on the indications of a change detector. A straightforward approach is to shrink the window whenever a change is detected, such that the training data reflects the most recent concept, and to grow the window otherwise.

2.3. Proposed Method. The Gamma classifier has previously been used for time series prediction in air pollution [37] and oil well production [40], showing very promising results. The current work takes the Gamma classifier and combines it with a sliding window approach, with the aim of developing a new method capable of performing classification over a continuous data stream, referred to as the DS-Gamma classifier.


(1) Initialize window
(2) for each x_τ do
(3)   if τ < W then
(4)     Add x_τ to the tail of the window
(5)   else
(6)     Use x_τ to classify
(7)     Add x_τ to the tail of the window

Algorithm 1: Learning policy algorithm.

A data stream S can be defined as a sequence of data elements of the form S = s_1, s_2, ..., s_now, from a universe of size D, where D is huge and potentially infinite but countable. We assume that each element s_i has the following form:

s_i = (x_τ, y_τ),    (32)

where x_τ is an input vector of size n and y_τ is the class label associated with it. τ ∈ Z+ is an index that indicates the sequential order of the elements in the data stream. τ is bounded by the size of the data stream, such that if the data stream is infinite, τ takes infinitely many, but countable, values.

One of the main challenges for mining data streams is the ability to maintain an accurate decision model. The current model has to incorporate new knowledge from new data, and it should also forget old information when the data is outdated. To address these issues, the proposed method implements a learning and forgetting policy. The learning (insertion) and forgetting (removal) policies were designed to guarantee the spatial and temporal relevance of the data and the consistency of the current model.

The proposed method implements a fixed window approach. This approach achieves concept drift detection in an indirect way, by considering only the most recent data. Algorithm 1 outlines the general steps used as the learning policy. Initially, the window is created with the first W labeled records, where W is the size of the window. After that, every sample is used for classification and evaluation of the performance, and then it is used to update the window.

Classifiers that deal with concept drift are forced to implement forgetting, adaptation, or drift detection mechanisms in order to adjust to changing environments. In this proposal, a forgetting mechanism is implemented. Algorithm 2 outlines the basic forgetting policy implemented in this proposal: the oldest element in the window, x_δ, is removed.

2.4. MOA. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [41]. MOA contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation on data streams with concept drift. MOA can also be used to build synthetic data streams using generators, reading ARFF files, or joining several streams. The MOA stream generators allow simulating potentially infinite sequences of

Table 2: Dataset characteristics.

Dataset      Instances   Attributes   Classes
Agrawal      100000      10           2
Bank         45211       16           2
CoverType    581012      54           7
Electricity  45312       8            2
Hyperplane   100000      10           2
RandomRBF    100000      10           2

data with different characteristics (different types of concept drift and recurrent concepts, among others). All the algorithms used for the comparative analysis are implemented under the MOA framework.

2.5. Data Streams. This section provides a brief description of the data streams used during the experimental phase. The proposed solution has been tested using synthetic and real data streams. For data stream classification problems, just a few suitable data streams can be successfully used for evaluation; a few data streams with enough examples are hosted in [42, 43]. For further evaluation, the use of synthetic data stream generators becomes necessary. The synthetic data streams used in our experiments were generated using the MOA framework mentioned in the previous section. A summary of the main characteristics of the data streams is shown in Table 2. Descriptions of the data streams were taken from [42, 43] in the case of the real ones and from [44] for the synthetic ones.

2.5.1. Real World Data Streams

Bank Marketing Dataset. The Bank Marketing dataset is available from the UCI Machine Learning Repository [43]. This data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether or not the client will subscribe a term deposit. The dataset contains 45211 instances; each instance has 16 attributes and the class label. The two classes identify whether a bank term deposit would or would not be subscribed.

Electricity Dataset. This data was collected from the Australian New South Wales electricity market. In this market, prices are not fixed; they are affected by demand and supply in the market, and the prices are set every five minutes. The dataset was first described by Harries in [45]. During the time period described in the dataset, the electricity market was expanded with the inclusion of adjacent areas, which led to a more elaborate management of the supply: the excess production of one region could be sold in the adjacent region. The Electricity dataset contains 45312 instances. Each example refers to a period of 30 minutes (i.e., there are 48 instances for each day). Each example has 8 fields: the day of the week, the time stamp, the New South Wales electricity demand, the New South Wales electricity price, the Victoria electricity demand, the Victoria electricity price, the scheduled electricity transfer


(1) for each x_τ do
(2)   if τ > W then
(3)     Remove x_δ (δ = τ − W + 1) from the beginning of the window

Algorithm 2: Forgetting policy algorithm.
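Algorithms 1 and 2 can be combined into one small driver. The sketch below is a schematic reading of the two policies, with the classification step left as a pluggable callable (a stand-in for the DS-Gamma classification, which is not reimplemented here); the majority-vote classifier is only a toy placeholder.

```python
from collections import deque

class SlidingWindowLearner:
    """Fixed-size window with the learning (Algorithm 1) and
    forgetting (Algorithm 2) policies."""

    def __init__(self, classify, w):
        self.classify = classify   # callable(window, x) -> predicted label
        self.w = w                 # window size W
        self.window = deque()      # stores (x, y) pairs, oldest at the left

    def process(self, x, y):
        """Returns a prediction once the window is full, then updates it."""
        prediction = None
        if len(self.window) >= self.w:
            prediction = self.classify(self.window, x)  # step (6), Algorithm 1
        self.window.append((x, y))                      # add to the tail
        if len(self.window) > self.w:
            self.window.popleft()                       # forget x_delta, Algorithm 2
        return prediction

# Toy stand-in classifier: predict the majority label of the window.
def majority(window, x):
    labels = [y for _, y in window]
    return max(set(labels), key=labels.count)

learner = SlidingWindowLearner(majority, w=3)
stream = [(0.1, "a"), (0.2, "a"), (0.3, "b"), (0.4, "a"), (0.5, "b")]
predictions = [learner.process(x, y) for x, y in stream]
print(predictions)  # [None, None, None, 'a', 'a']
```

The first W elements only fill the window; from then on, every arriving sample is classified before being inserted, and the oldest sample is discarded.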

between states, and the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. We use the normalized version of this dataset, so that the numerical values are between 0 and 1. The dataset is available from [42].

Forest CoverType Dataset. The Forest CoverType dataset is one of the largest datasets available from the UCI Machine Learning Repository [43]. This dataset contains information that describes forest cover types from cartographic variables. A given observation (30 × 30 meter cell) was determined from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581012 instances and 54 attributes. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types). The dataset has 7 classes that identify the type of forest cover.

2.5.2. Synthetic Data Streams

Agrawal Data Stream. This generator was originally introduced by Agrawal et al. in [46]. The MOA implementation was used to generate the data streams for the experimental phase of this work. The generator produces a stream containing nine attributes. Although not explicitly stated by the authors, a sensible conclusion is that these attributes describe hypothetical loan applications. There are ten functions defined for generating binary class labels from the attributes; presumably, these determine whether a loan should be approved. Perturbation shifts numeric attributes from their true value, adding an offset drawn randomly from a uniform distribution whose range is a specified percentage of the total value range [44]. For each experiment, a data stream with 100000 examples was generated.

RandomRBF Data Stream. This generator was devised to offer an alternative complex concept type that is not straightforward to approximate with a decision tree model. The RBF (Radial Basis Function) generator works as follows. A fixed number of random centroids are generated; each center has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a center at random, taking weights into consideration, so that centers with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the central point. The length of the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid. The chosen centroid also determines the class label of the example. Only numeric attributes are generated [44]. For our experiments, a data stream with 100,000 data instances, 10 numerical attributes, and 2 classes was generated.
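The generation scheme just described can be sketched as follows; the centroid count, seed, and attribute ranges are illustrative assumptions, not MOA's actual implementation:

```python
import math
import random

def make_rbf_generator(n_centroids=5, n_attrs=10, n_classes=2, seed=42):
    rng = random.Random(seed)
    # Each centroid: random position, one standard deviation,
    # a class label, and a selection weight.
    cents = [{"pos": [rng.random() for _ in range(n_attrs)],
              "std": rng.random(),
              "label": rng.randrange(n_classes),
              "w": rng.random()} for _ in range(n_centroids)]
    weights = [c["w"] for c in cents]

    def next_example():
        c = rng.choices(cents, weights=weights, k=1)[0]  # weight-biased pick
        direction = [rng.gauss(0, 1) for _ in range(n_attrs)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        length = rng.gauss(0, c["std"])  # Gaussian displacement length
        x = [p + d / norm * length for p, d in zip(c["pos"], direction)]
        return x, c["label"]

    return next_example

gen = make_rbf_generator()
x, y = gen()
print(len(x), y in range(2))  # -> 10 True
```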

Rotating Hyperplane Data Stream. This synthetic data stream was generated using the MOA framework. A hyperplane in d-dimensional space is the set of points x that satisfy [44]

\[ \sum_{i=1}^{d} w_i x_i = w_0 = \sum_{i=1}^{d} w_i, \tag{33} \]

where x_i is the ith coordinate of x. Examples for which \(\sum_{i=1}^{d} w_i x_i \ge w_0\) are labeled positive, and examples for which \(\sum_{i=1}^{d} w_i x_i < w_0\) are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because the orientation and position of the hyperplane can be adjusted in a gradual way by changing the relative size of the weights. In MOA, change is introduced to this data stream by adding drift to each weight attribute using the formula w_i = w_i + dσ, where σ is the probability that the direction of change is reversed and d is the change applied to every example. For our experiments, a data stream with 100,000 data instances, 10 numerical attributes, and 2 classes was generated.
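The labeling rule of eq. (33) and one reading of the drift rule can be sketched as follows; the weights and drift parameters are illustrative assumptions, and MOA's generator has more options:

```python
import random

def hyperplane_label(x, w):
    # Eq. (33): the threshold w0 is the sum of the weights; examples
    # with sum_i w_i * x_i >= w0 are positive, the rest negative.
    w0 = sum(w)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= w0 else 0

def drift_weights(w, d, sigma, rng):
    # Gradual drift: each weight moves by d, with the direction of
    # change reversed with probability sigma.
    s = -1.0 if rng.random() < sigma else 1.0
    return [wi + d * s for wi in w]

rng = random.Random(1)
w = [1.0, -1.0, 0.5]                          # w0 = 0.5
print(hyperplane_label([1.0, 0.5, 0.2], w))   # 1.0 - 0.5 + 0.1 = 0.6 >= 0.5 -> 1
print(hyperplane_label([0.0, 1.0, 0.0], w))   # -1.0 < 0.5 -> 0
w = drift_weights(w, d=0.001, sigma=0.1, rng=rng)
```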

2.6. Algorithms Evaluation. One of the objectives of this work is to perform a consistent comparison between the classification performance of our proposal and that of other well-known data stream pattern classification algorithms. Evaluation of data stream algorithms is a relatively new field that has not been studied as well as evaluation of batch learning. In traditional batch learning, evaluation has been mainly limited to evaluating a model using different random reorderings of the same static dataset (k-fold cross validation, leave-one-out, and bootstrap) [2]. For data stream scenarios, the problem of infinite data raises new challenges. On the other hand, the batch approaches cannot effectively measure the accuracy of a model in a context with concepts that change over time. In the data stream setting, one of the main concerns is how to design an evaluation practice that allows us to obtain a reliable measure of accuracy over time. According to [41], two main approaches arise.

2.6.1. Holdout. When traditional batch learning reaches a scale where cross validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. An extension of this approach can be applied to data stream scenarios: first, a set of M instances is used to train the model; then a set of the following unseen N instances is used to test the model; then the model is again trained with the next M instances and tested with the subsequent N instances, and so forth.


2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.
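A sketch of the test-then-train loop; the one-label "learner" is an illustrative assumption:

```python
def prequential(stream, predict, update):
    # Test first, then train: the model is always evaluated on
    # examples it has not yet seen; accuracy is updated incrementally.
    correct, accs = 0, []
    for t, (x, y) in enumerate(stream, start=1):
        if predict(x) == y:
            correct += 1
        accs.append(correct / t)
        update(x, y)
    return accs

# Toy learner: always predict the last label seen (starts with 0).
last = {"y": 0}
accs = prequential([((i,), 1) for i in range(5)],
                   predict=lambda x: last["y"],
                   update=lambda x, y: last.update(y=y))
print(accs[-1])  # first example missed, the rest correct -> 0.8
```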

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data that is sometimes difficult to obtain, while prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different mechanisms of evaluation that include the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors. They stated that the use of the prequential error with forgetting mechanisms proves advantageous in assessing performance and in comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we have selected the prequential error over a sliding window for evaluation of the proposed method. The prequential error P_W for a sliding window of size W, computed at time i, is based on an accumulated sum of a 0-1 loss function L between the predicted values ŷ_k and the observed values y_k, using the following expression [47]:

\[ P_W(i) = \frac{1}{W} \sum_{k=i-W+1}^{i} L(\hat{y}_k, y_k). \tag{34} \]
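Equation (34) can be computed incrementally with a sliding window of 0-1 losses; W and the prediction pairs below are illustrative, and before the window fills the sketch averages over the samples seen so far:

```python
from collections import deque

def make_prequential_error(W):
    # Sliding window of 0-1 losses L(y_hat_k, y_k); P_W(i) in eq. (34)
    # is their mean over the last W examples.
    losses = deque(maxlen=W)
    def add(y_pred, y_true):
        losses.append(0 if y_pred == y_true else 1)
        return sum(losses) / len(losses)
    return add

p_w = make_prequential_error(W=3)
for y_pred, y_true in [(1, 1), (0, 1), (1, 1), (1, 1)]:
    err = p_w(y_pred, y_true)
print(err)  # window now holds losses [1, 0, 0] -> 1/3
```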

We also evaluate the performance results of each competing algorithm against the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test that ranks the differences in performance of two classifiers over each dataset, ignoring the signs, and compares the ranks for the positive and the negative differences.
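A sketch of the Wilcoxon signed-rank statistic on paired accuracies; the input values are the DS-Gamma and Bayes-DDM columns of Table 5, and a real analysis would use a statistics package to obtain the p value:

```python
def wilcoxon_signed_rank(a, b):
    # Rank the absolute differences (zeros dropped, ties get average
    # ranks), then compare rank sums of positive and negative differences.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)  # the Wilcoxon T statistic

# Accuracies over the six data streams: DS-Gamma vs. Bayes-DDM (Table 5).
gamma = [92.82, 78.96, 47.77, 77.14, 86.30, 70.85]
bayes = [88.16, 83.38, 67.61, 81.05, 86.02, 71.87]
t_stat = wilcoxon_signed_rank(gamma, bayes)
print(t_stat)  # -> 6.0
```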

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. Classification performance of each of the compared algorithms was calculated using the prequential error approach. These results are used to compare the performance of our proposal and other data stream classification algorithms. We have aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on the implementation of these algorithms can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times, and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset the experiment was executed just one time. The Bank and CoverType datasets were reordered 10 times using the randomized filter from the WEKA platform [49] for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each of them using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification to evaluate the efficiency of each algorithm. All experiments were conducted on a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system, with 4.096 GB of RAM.

3. Results and Discussion

In this section we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance of each of the compared classification algorithms. First, we evaluate the sensitivity of the proposed method to changes in the window size, which is a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized with boldface. Table 4 shows the average total time used to classify each data stream. The results of the average performance with an indicator of the standard deviation are also depicted in Figure 1.

In general, the DS-Gamma classifier performance improved as the window size increased, with the exception of the Electricity data stream; for this data stream, performance is better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with a performance of 89.12%. This performance is better than all the performances achieved by all the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance keeps stable and in some cases even degrades, as we can observe in the results from the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination R² were included in the graphic of each data stream. The R² values, ranging between 0.9743 and 0.9866, show that time grows linearly as the size of the window increases.
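The coefficient of determination for the linear trend can be recomputed from the rounded averages in Table 4 (Agrawal row shown below); the result will not match the figure exactly, since the figure's fits use the underlying per-run data:

```python
def r_squared(xs, ys):
    # Coefficient of determination of the least-squares linear fit.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Window sizes and DS-Gamma classification times for Agrawal (Table 4).
sizes = [50, 100, 200, 500, 750, 1000, 1250, 1500, 1750, 2000]
times = [2.24, 4.50, 7.93, 19.26, 28.04, 39.00, 45.92, 57.57, 63.90, 73.49]
r2 = r_squared(sizes, times)
print(r2 > 0.97)  # the trend is very close to linear
```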

To evaluate the DS-Gamma classifier, we compared its performance with other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation. The best performance is emphasized with boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and


Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

| Dataset | 50 | 100 | 200 | 500 | 750 | 1000 | 1250 | 1500 | 1750 | 2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| Agrawal | 79.64 | 85.97 | 89.86 | 92.59 | 92.88 | 92.82 | 92.67 | 92.51 | 92.32 | 92.07 |
| Bank | 68.19 | 73.40 | 77.05 | 79.13 | 79.12 | 78.96 | 78.60 | 78.24 | 77.76 | 77.37 |
| CoverType | 38.21 | 37.61 | 39.70 | 44.30 | 46.59 | 47.77 | 48.62 | 49.19 | 49.51 | 49.74 |
| Electricity | 89.12 | 85.26 | 81.71 | 79.84 | 78.58 | 77.14 | 76.36 | 75.94 | 75.31 | 74.49 |
| Hyperplane | 78.25 | 82.65 | 85.28 | 86.56 | 86.55 | 86.30 | 85.98 | 85.65 | 85.34 | 85.00 |
| RandomRBF | 64.61 | 67.39 | 69.33 | 70.56 | 70.81 | 70.85 | 70.74 | 70.63 | 70.50 | 70.33 |

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

| Dataset | 50 | 100 | 200 | 500 | 750 | 1000 | 1250 | 1500 | 1750 | 2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| Agrawal | 2.24 | 4.50 | 7.93 | 19.26 | 28.04 | 39.00 | 45.92 | 57.57 | 63.90 | 73.49 |
| Bank | 0.66 | 1.14 | 2.07 | 4.75 | 7.01 | 9.17 | 11.54 | 14.55 | 15.83 | 17.87 |
| CoverType | 83.47 | 140.81 | 251.20 | 340.05 | 254.12 | 335.03 | 428.94 | 522.39 | 669.24 | 676.99 |
| Electricity | 0.34 | 0.53 | 0.98 | 2.33 | 3.56 | 4.63 | 5.99 | 7.05 | 8.30 | 9.49 |
| Hyperplane | 2.24 | 4.01 | 7.56 | 18.19 | 26.76 | 30.28 | 44.28 | 52.96 | 61.87 | 70.63 |
| RandomRBF | 2.79 | 4.09 | 7.73 | 18.54 | 27.35 | 30.72 | 45.02 | 54.12 | 62.23 | 71.04 |

Table 5: Averaged accuracy (%) comparison; each cell is Acc (σ).

| Classifier | Agrawal | Bank | CoverType | Electricity | Hyperplane | RandomRBF |
|---|---|---|---|---|---|---|
| Bayes-DDM | 88.16 (0.19) | 83.38 (1.41) | 67.61 (6.81) | 81.05 (—) | 86.02 (1.39) | 71.87 (3.82) |
| Perceptron DDM | 67.20 (0.11) | 73.43 (10.53) | 70.76 (5.55) | 78.61 (—) | 85.65 (1.49) | 71.88 (5.93) |
| Hoeffding Tree | 94.39 (0.24) | 88.52 (0.11) | 72.29 (2.68) | 82.41 (—) | 82.41 (5.62) | 86.72 (1.69) |
| Active Classifier | 90.82 (1.62) | 88.21 (0.14) | 67.31 (1.03) | 78.59 (—) | 79.50 (6.55) | 75.32 (2.70) |
| IBL-DS | 86.76 (0.48) | 86.52 (0.21) | 61.73 (9.46) | 75.34 (—) | 84.20 (0.77) | 93.32 (1.21) |
| AWE | 92.17 (0.59) | 87.90 (0.38) | 76.16 (3.66) | 78.36 (—) | 85.88 (1.65) | 91.98 (1.14) |
| OzaBoostAdwin | 87.11 (2.62) | 87.67 (0.50) | 78.55 (4.97) | 87.92 (—) | 83.21 (1.91) | 91.49 (1.14) |
| SGD | 67.20 (0.11) | 76.56 (1.66) | 59.01 (7.37) | 84.52 (—) | 83.19 (0.42) | 61.74 (3.31) |
| SPegasos | 55.82 (0.17) | 77.46 (1.99) | 50.47 (5.17) | 57.60 (—) | 52.80 (6.08) | 53.33 (3.13) |
| DS-Gamma | 92.82 (0.10) | 78.96 (0.22) | 47.77 (4.31) | 77.14 (—) | 86.30 (1.35) | 70.85 (2.54) |

Table 6: Averaged classification time (seconds) comparison; each cell is Time (σ).

| Classifier | Agrawal | Bank | CoverType | Electricity | Hyperplane | RandomRBF |
|---|---|---|---|---|---|---|
| Bayes-DDM | 0.62 (0.11) | 0.39 (0.08) | 17.76 (1.62) | 0.61 (—) | 0.70 (0.01) | 0.69 (0.01) |
| Perceptron DDM | 0.40 (0.02) | 0.20 (0.05) | 8.28 (0.10) | 0.19 (—) | 0.45 (0.02) | 0.44 (0.02) |
| Hoeffding Tree | 0.98 (0.23) | 0.54 (0.03) | 30.05 (11.06) | 0.78 (—) | 1.36 (0.05) | 1.40 (0.05) |
| Active Classifier | 0.64 (0.03) | 0.26 (0.02) | 15.14 (0.74) | 0.23 (—) | 0.72 (0.04) | 0.70 (0.02) |
| IBL-DS | 53.19 (2.11) | 38.51 (8.26) | 1057.60 (60.95) | 8.61 (—) | 86.39 (0.96) | 88.10 (1.06) |
| AWE | 14.32 (0.20) | 6.63 (0.30) | 1012.47 (304.41) | 6.70 (—) | 18.03 (3.64) | 37.52 (3.03) |
| OzaBoostAdwin | 192.62 (44.47) | 46.93 (10.56) | 10589.73 (3896.64) | 39.75 (—) | 223.99 (51.34) | 57.04 (16.66) |
| SGD | 0.14 (0.04) | 0.11 (0.009) | 40.35 (0.07) | 0.12 (—) | 0.27 (0.02) | 0.27 (0.02) |
| SPegasos | 0.22 (0.006) | 0.11 (0.016) | 16.84 (0.12) | 0.12 (—) | 0.31 (0.05) | 0.24 (0.01) |
| DS-Gamma | 39.00 (3.58) | 9.16 (0.05) | 335.03 (12.91) | 4.63 (—) | 30.28 (0.79) | 30.72 (0.94) |


[Figure 1 omitted: six line plots of average accuracy (%) against window size (50 to 2000), one panel per data stream (Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF).]

Figure 1: Average accuracy of DS-Gamma for different window sizes.

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier on the CoverType data stream. This data stream presents a heavy class imbalance that seems to be negatively affecting the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased toward the class with the most members.

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using α = 0.05. The obtained values are all greater than α, so the null hypothesis is not rejected, and


[Figure 2 omitted: six plots of average classification time (seconds) against window size (50 to 2000) with dotted linear trend lines, one panel per data stream. The per-panel coefficients of determination are R² = 0.9866, 0.9854, 0.9804, 0.9796, 0.9743, and 0.9842.]

Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes.

we can infer that there are no significant differences among the compared algorithms.

4. Conclusions

In this paper we describe a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase, the proposed method was tested using different data stream scenarios with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

| Compared algorithms | Asymptotic significance (2-sided) |
|---|---|
| Gamma-Bayes-DDM | 0.345 |
| Gamma-Perceptron DDM | 0.917 |
| Gamma-Hoeffding Tree | 0.116 |
| Gamma-Active Classifier | 0.463 |
| Gamma-IBL-DS | 0.345 |
| Gamma-AWE | 0.116 |
| Gamma-OzaBoostAdwin | 0.116 |
| Gamma-SGD | 0.600 |
| Gamma-SPegasos | 0.075 |


[Figure 3 omitted: six box plots of accuracy (%), one panel per data stream (Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF), comparing Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma.]

Figure 3: Averaged accuracy (%) comparison.

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on the classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000. For larger window sizes, performance remained stable and in some cases even declined.

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained in the experiments performed with the CoverType dataset,


[Figure 4 omitted: six bar charts of classification time (seconds), one panel per data stream, comparing the same ten algorithms as Figure 3; some panels use a logarithmic time scale.]

Figure 4: Averaged classification time (seconds) comparison.

a heavily imbalanced dataset, where the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), the CONACyT, and SNI for their economic support to develop this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to performing the statistical significance analysis.


References

[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: rich data and the increasing value of the internet of things," Tech. Rep., IDC Information and Data, 2014, http://www.emc.com/leadership/digital-universe/2014iview/index.htm.

[2] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press, Amsterdam, The Netherlands, 2010.

[3] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.

[4] J. Beringer and E. Hüllermeier, "Efficient instance-based learning on data streams," Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[5] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Computing Surveys, vol. 46, no. 4, article 44, 2014.

[6] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, Boston, Mass, USA, August 2000.

[7] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, San Francisco, Calif, USA, August 2001.

[8] C. Liang, Y. Zhang, P. Shi, and Z. Hu, "Learning very fast decision tree from uncertain data streams with positive and unlabeled samples," Information Sciences, vol. 213, pp. 50–67, 2012.

[9] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "Decision trees for mining data streams based on the Gaussian approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.

[10] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69–101, 1996.

[11] F. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. Riquelme, "Data streams classification by incremental rule learning with parameterized generalization," in Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 657–661, Dijon, France, April 2006.

[12] D. Mena-Torres and J. S. Aguilar-Ruiz, "A similarity-based approach for data stream classification," Expert Systems with Applications, vol. 41, no. 9, pp. 4224–4234, 2014.

[13] A. Shaker and E. Hüllermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems, vol. 3, no. 4, pp. 235–249, 2012.

[14] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Recursive least square perceptron model for non-stationary and imbalanced data stream classification," Evolving Systems, vol. 4, no. 2, pp. 119–131, 2013.

[15] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Ensemble of online neural networks for non-stationary and imbalanced data streams," Neurocomputing, vol. 122, pp. 535–544, 2013.

[16] A. Sancho-Asensio, A. Orriols-Puig, and E. Golobardes, "Robust on-line neural learning classifier system for data stream classification tasks," Soft Computing, vol. 18, no. 8, pp. 1441–1461, 2014.

[17] D. M. Farid, L. Zhang, A. Hossain, et al., "An adaptive ensemble classifier for mining concept drifting data streams," Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, 2013.

[18] D. Brzezinski and J. Stefanowski, "Combining block-based and online methods in learning ensembles from concept drifting data streams," Information Sciences, vol. 265, pp. 50–67, 2014.

[19] G. Ditzler and R. Polikar, "Incremental learning of concept drift from streaming imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2283–2301, 2013.

[20] M. J. Hosseini, Z. Ahmadi, and H. Beigy, "Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification," Evolving Systems, vol. 4, no. 1, pp. 43–60, 2013.

[21] N. Saunier and S. Midenet, "Creating ensemble classifiers through order and incremental data selection in a stream," Pattern Analysis and Applications, vol. 16, no. 3, pp. 333–347, 2013.

[22] M. M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham, "Classification and novel class detection in concept-drifting data streams under time constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.

[23] A. Alzghoul, M. Löfstrand, and B. Backe, "Data stream forecasting for system fault prediction," Computers and Industrial Engineering, vol. 62, no. 4, pp. 972–978, 2012.

[24] J. Kranjc, J. Smailović, V. Podpečan, M. Grčar, M. Žnidaršič, and N. Lavrač, "Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform," Information Processing and Management, vol. 51, no. 2, pp. 187–203, 2015.

[25] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, "Stream-based active learning for sentiment analysis in the financial domain," Information Sciences, vol. 285, pp. 181–203, 2014.

[26] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: primal estimated sub-gradient solver for SVM," in Proceedings of the 24th International Conference on Machine Learning, pp. 807–814, June 2007.

[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, "Twitter spammer detection using data stream clustering," Information Sciences, vol. 260, pp. 64–73, 2014.

[28] B. Krawczyk and M. Woźniak, "Incremental weighted one-class classifier for mining stationary data streams," Journal of Computational Science, vol. 9, pp. 19–25, 2015.

[29] J. Gama, "A survey on learning from data streams: current and future trends," Progress in Artificial Intelligence, vol. 1, no. 1, pp. 45–55, 2012.

[30] L. I. Kuncheva, "Change detection in streaming multivariate data using likelihood detectors," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1175–1180, 2013.

[31] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, "Exponentially weighted moving average charts for detecting concept drift," Pattern Recognition Letters, vol. 33, no. 2, pp. 191–198, 2012.

[32] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence—SBIA 2004, vol. 3171 of Lecture Notes in Computer Science, pp. 286–295, Springer, Berlin, Germany, 2004.

[33] X. Wu, P. Li, and X. Hu, "Learning from concept drifting data streams with unlabeled data," Neurocomputing, vol. 92, pp. 145–155, 2012.

[34] X. Qin, Y. Zhang, C. Li, and X. Li, "Learning from data streams with only positive and unlabeled data," Journal of Intelligent Information Systems, vol. 40, no. 3, pp. 405–430, 2013.


[35] M. M. Masud, C. Woolam, J. Gao, et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.

[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.

[37] I. López-Yáñez, A. J. Argüelles-Cruz, O. Camacho-Nieto, and C. Yáñez-Márquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.

[38] M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.

[39] C. Yáñez-Márquez, E. M. Felipe-Riverón, I. López-Yáñez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.

[40] I. López-Yáñez, L. Sheremetov, and C. Yáñez-Márquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.

[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.

[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets/.

[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml.

[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation/.

[45] M. Harries, "Splice-2 comparative evaluation: electricity pricing," Tech. Rep., University of South Wales, Cardiff, UK, 1999.

[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.

[47] J. Gama, R. Sebastião, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.

[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka/.



Again θ = 0, thus forcing both vectors x_j^{iω} and x̃_j^{2} to be equal in order to obtain a 1. Similarly to what happened in the case of x̃^{1}, there is but one instance of such an occurrence: x_2^{12} = [1 1 1 1 1 1 1] = x̃_2^{2}.

(7) Compute a weighted sum c_i for each class:

\[ c_1 = \frac{\sum_{\omega=1}^{k_1}\sum_{j=1}^{n}\gamma_g\left(x_j^{1\omega}, \tilde{x}_j^{1}, \theta\right)}{k_1} = \frac{\sum_{\omega=1}^{2}\sum_{j=1}^{2}\gamma_g\left(x_j^{1\omega}, \tilde{x}_j^{1}, \theta\right)}{2} = \frac{(0+0)+(0+0)}{2} = \frac{0+0}{2} = 0. \tag{26} \]

Here we add together the results obtained on all components of all fundamental patterns belonging to class C_1. Since all gave 0 (i.e., none of these were similar to the corresponding component of the test pattern, given the value of θ in effect), the result of the weighted addition for class C_1 is c_1 = 0. One has

\[ c_2 = \frac{\sum_{\omega=1}^{k_2}\sum_{j=1}^{n}\gamma_g\left(x_j^{2\omega}, \tilde{x}_j^{1}, \theta\right)}{k_2} = \frac{\sum_{\omega=1}^{2}\sum_{j=1}^{2}\gamma_g\left(x_j^{2\omega}, \tilde{x}_j^{1}, \theta\right)}{2} = \frac{(1+0)+(0+0)}{2} = \frac{1+0}{2} = 0.5. \tag{27} \]

In this case there was one instance similar to the corresponding component of the test pattern (equal, in fact, since $\theta = 0$ during this run). Thus the weighted addition for class $C_2$ is $c_2 = 0.5$, given that the sum of results is divided by the cardinality of the class, $k = 2$. And for $\tilde{x}^2$:

$$
c_1 = \frac{\sum_{\omega=1}^{k_1} \sum_{j=1}^{n} \gamma_g\left(x_j^{1\omega}, \tilde{x}_j^{2}, \theta\right)}{k_1}
    = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g\left(x_j^{1\omega}, \tilde{x}_j^{2}, \theta\right)}{2}
    = \frac{(0+0) + (0+1)}{2} = \frac{0+1}{2} = 0.5. \tag{28}
$$

There was one result of $\gamma_g$ equal to 1, which, coupled with the cardinality of class $C_1$ being 2, makes $c_1 = 0.5$. One has

$$
c_2 = \frac{\sum_{\omega=1}^{k_2} \sum_{j=1}^{n} \gamma_g\left(x_j^{2\omega}, \tilde{x}_j^{2}, \theta\right)}{k_2}
    = \frac{\sum_{\omega=1}^{2} \sum_{j=1}^{2} \gamma_g\left(x_j^{2\omega}, \tilde{x}_j^{2}, \theta\right)}{2}
    = \frac{(0+0) + (0+0)}{2} = \frac{0+0}{2} = 0. \tag{29}
$$

No similarities were found between the fundamental patterns of class $C_2$ and the test pattern $\tilde{x}^2$ (for $\theta = 0$); thus $c_2 = 0$.

(8) If there is more than one maximum among the different $c_i$, increment $\theta$ by 1 and repeat steps (6) and (7) until there is a unique maximum, or the stop condition $\theta \ge \rho$ is fulfilled. There is a unique maximum (for each test pattern), so we go directly to step (9).

(9) If there is a unique maximum, assign $y$ to the class corresponding to such maximum. For $\tilde{x}^1$:
$$
C_{\tilde{x}^1} = C_2, \quad \text{since } \max_{i=1}^{2} c_i = \max(0, 0.5) = 0.5 = c_2, \tag{30}
$$
while for $\tilde{x}^2$:
$$
C_{\tilde{x}^2} = C_1, \quad \text{since } \max_{i=1}^{2} c_i = \max(0.5, 0) = 0.5 = c_1. \tag{31}
$$

(10) Otherwise, assign $y$ to the class of the first maximum. This is unnecessary, since both test patterns have already been assigned a class.
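The decision rule walked through in steps (7)-(10) can be sketched in code. This is only a minimal illustration, not the authors' implementation: `gamma_g` below is a simplified stand-in for the generalized Gamma operator that returns 1 when two binary component vectors differ in at most theta positions, which reproduces the theta = 0 (exact equality) behavior used in the example; all names are ours.

```python
def gamma_g(a, b, theta):
    """Simplified stand-in for the generalized Gamma operator on binary
    component vectors: 1 if they differ in at most theta positions."""
    return 1 if sum(ai != bi for ai, bi in zip(a, b)) <= theta else 0


def classify(classes, test_pattern, rho):
    """classes: {label: [pattern, ...]}, where each pattern is a list of
    binary component vectors; test_pattern has the same shape.
    Implements the weighted sum per class, divided by class cardinality,
    with theta incremented until a unique maximum or theta >= rho."""
    for theta in range(rho + 1):
        c = {}
        for label, patterns in classes.items():
            total = sum(gamma_g(comp, test_comp, theta)
                        for pattern in patterns
                        for comp, test_comp in zip(pattern, test_pattern))
            c[label] = total / len(patterns)   # divide by class cardinality k_i
        best = max(c.values())
        winners = [label for label, v in c.items() if v == best]
        if len(winners) == 1:                  # unique maximum: assign its class
            return winners[0]
    return winners[0]                          # stop condition: first maximum
```

With two classes of two patterns each, a test pattern identical to a fundamental pattern of class 1 is assigned to class 1 at theta = 0, mirroring the worked example.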

2.2. Sliding Windows. A widespread approach for data stream mining is the use of a sliding window to keep only a representative portion of the data. We are not interested in keeping all the information of the data stream, particularly when we have to meet the memory space constraints required for algorithms working in this context. This technique is able to deal with concept drift, eliminating those data points that come from an old concept. According to [5], the training window size can be fixed or variable over time.

Sliding windows of a fixed size store in memory a fixed number of the $W$ most recent examples. Whenever a new example arrives, it is saved to memory and the oldest one is discarded. These types of windows are similar to first-in, first-out data structures. This simple adaptive learning method is often used as a baseline in the evaluation of new algorithms.

Sliding windows of variable size vary the number of examples in a window over time, typically depending on the indications of a change detector. A straightforward approach is to shrink the window whenever a change is detected, such that the training data reflects the most recent concept, and to grow the window otherwise.

2.3. Proposed Method. The Gamma classifier has previously been used for time series prediction in air pollution [37] and oil well production [40], showing very promising results. The current work takes the Gamma classifier and combines it with a sliding window approach, with the aim of developing a new method capable of performing classification over a continuous data stream; it is referred to as the DS-Gamma classifier.


(1) Initialize window
(2) for each $x_\tau$ do
(3)     if $\tau < W$ then
(4)         Add $x_\tau$ to the tail of the window
(5)     else
(6)         Use $x_\tau$ to classify
(7)         Add $x_\tau$ to the tail of the window

Algorithm 1: Learning policy algorithm.

A data stream $S$ can be defined as a sequence of data elements of the form $S = s_1, s_2, \ldots, s_{\text{now}}$ from a universe of size $D$, where $D$ is huge and potentially infinite but countable. We assume that each element $s_i$ has the following form:

$$s_i = (x_\tau, y_\tau), \tag{32}$$

where $x_\tau$ is an input vector of size $n$ and $y_\tau$ is the class label associated with it. $\tau \in \mathbb{Z}^{+}$ is an index that indicates the sequential order of the elements in the data stream. $\tau$ is bounded by the size of the data stream, such that if the data stream is infinite, $\tau$ takes infinitely many, but countable, values.

One of the main challenges for mining data streams is the ability to maintain an accurate decision model. The current model has to incorporate new knowledge from new data, and it should also forget old information when data is outdated. To address these issues, the proposed method implements a learning and forgetting policy. The learning (insertion) and forgetting (removal) policies were designed to guarantee the spatial and temporal relevance of the data and the consistency of the current model.

The proposed method implements a fixed window approach. This approach achieves concept drift detection in an indirect way, by considering only the most recent data. Algorithm 1 outlines the general steps used as the learning policy. Initially, the window is created with the first $W$ labeled records, where $W$ is the size of the window. After that, every sample is used for classification and evaluation of the performance, and then it is used to update the window.

Classifiers that deal with concept drift are forced to implement forgetting, adaptation, or drift detection mechanisms in order to adjust to changing environments. In this proposal, a forgetting mechanism is implemented. Algorithm 2 outlines the basic forgetting policy implemented in this proposal: the oldest element in the window, $x_\delta$, is removed.
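Taken together, the learning policy (Algorithm 1) and the forgetting policy (Algorithm 2) behave like a first-in, first-out buffer of at most $W$ elements. A minimal sketch in Python using `collections.deque`; the `classify` argument is a hypothetical placeholder for whatever model is built over the current window:

```python
from collections import deque

def process_stream(stream, W, classify):
    """stream yields (x, y) pairs; classify(window, x) stands in for the
    classifier applied to x using the labeled window contents."""
    window = deque(maxlen=W)           # oldest element evicted automatically
    predictions = []
    for tau, (x, y) in enumerate(stream, start=1):
        if tau > W:                    # window is full: classify before updating
            predictions.append(classify(window, x))
        window.append((x, y))          # insertion; removal handled by maxlen
    return predictions
```

Using `deque(maxlen=W)` folds Algorithm 2's explicit removal of $x_\delta$ into the append: once the buffer holds $W$ elements, adding a new one drops the oldest.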

2.4. MOA. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [41]. MOA contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation on data streams with concept drift. MOA can also be used to build synthetic data streams, using generators, reading ARFF files, or joining several streams. The MOA stream generators allow simulating potentially infinite sequences of

Table 2: Datasets characteristics.

Dataset      Instances  Attributes  Classes
Agrawal      100000     10          2
Bank         45211      16          2
CoverType    581012     54          7
Electricity  45312      8           2
Hyperplane   100000     10          2
RandomRBF    100000     10          2

data with different characteristics (different types of concept drift and recurrent concepts, among others). All the algorithms used for the comparative analysis are implemented under the MOA framework.

2.5. Data Streams. This section provides a brief description of the data streams used during the experimental phase. The proposed solution has been tested using synthetic and real data streams. For data stream classification problems, just a few suitable data streams can be successfully used for evaluation; a few data streams with enough examples are hosted in [42, 43]. For further evaluation, the use of synthetic data stream generators becomes necessary. The synthetic data streams used in our experiments were generated using the MOA framework mentioned in the previous section. A summary of the main characteristics of the data streams is shown in Table 2. Descriptions of the data streams were taken from [42, 43] in the case of the real ones and from [44] for the synthetic ones.

2.5.1. Real World Data Streams

Bank Marketing Dataset. The Bank Marketing dataset is available from the UCI Machine Learning Repository [43]. This data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether or not the client will subscribe to a term deposit. It contains 45211 instances. Each instance has 16 attributes and the class label. The dataset has two classes, which identify whether a bank term deposit would or would not be subscribed.

Electricity Dataset. This data was collected from the Australian New South Wales electricity market. In this market, prices are not fixed and are affected by demand and supply of the market. The prices of the market are set every five minutes. The dataset was first described by Harries in [45]. During the time period described in the dataset, the electricity market was expanded with the inclusion of adjacent areas, which leads to more elaborate management of the supply. The excess production of one region could be sold in the adjacent region. The Electricity dataset contains 45312 instances. Each example of the dataset refers to a period of 30 minutes (i.e., there are 48 instances for each time period of one day). Each example in the dataset has 8 fields: the day of week, the time stamp, the New South Wales electricity demand, the New South Wales electricity price, the Victoria electricity demand, the Victoria electricity price, the scheduled electricity transfer


(1) for each $x_\tau$ do
(2)     if $\tau > W$ then
(3)         Remove $x_\delta$ ($\delta = \tau - W + 1$) from the beginning of the window

Algorithm 2: Forgetting policy algorithm.

between states, and the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. We use the normalized version of this dataset, so that the numerical values are between 0 and 1. The dataset is available from [42].

Forest CoverType Dataset. The Forest CoverType dataset is one of the largest datasets available from the UCI Machine Learning Repository [43]. This dataset contains information that describes forest cover types from cartographic variables. A given observation (30 × 30 meter cell) was determined from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581012 instances and 54 attributes. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types). The dataset has 7 classes that identify the type of forest cover.

2.5.2. Synthetic Data Streams

Agrawal Data Stream. This generator was originally introduced by Agrawal et al. in [46]. The MOA implementation was used to generate the data streams for the experimental phase of this work. The generator produces a stream containing nine attributes. Although not explicitly stated by the authors, a sensible conclusion is that these attributes describe hypothetical loan applications. There are ten functions defined for generating binary class labels from the attributes; presumably, these determine whether the loan should be approved. Perturbation shifts numeric attributes from their true value, adding an offset drawn randomly from a uniform distribution, the range of which is a specified percentage of the total value range [44]. For each experiment, a data stream with 100000 examples was generated.

RandomRBF Data Stream. This generator was devised to offer an alternative complex concept type that is not straightforward to approximate with a decision tree model. The RBF (Radial Basis Function) generator works as follows. A fixed number of random centroids are generated; each center has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a center at random, taking weights into consideration, so that centers with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the central point. The length of the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid. The chosen centroid also determines the class label of the example. Only numeric attributes are generated [44]. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.
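The generation scheme just described can be sketched as follows. This is an illustrative reading of the description above, not MOA's RandomRBFGenerator code; all parameter names, defaults, and the seed are ours:

```python
import math
import random

def random_rbf_stream(n, d=10, n_classes=2, n_centroids=50, seed=1):
    """Yield (x, label) pairs: each example is a Gaussian-length offset in a
    random direction from a weight-sampled random centroid."""
    rng = random.Random(seed)
    centroids = [{
        "center": [rng.random() for _ in range(d)],
        "std": rng.random(),                 # single standard deviation per centroid
        "label": rng.randrange(n_classes),
        "weight": rng.random(),
    } for _ in range(n_centroids)]
    total_w = sum(c["weight"] for c in centroids)
    for _ in range(n):
        # choose a centroid with probability proportional to its weight
        r = rng.random() * total_w
        for c in centroids:
            r -= c["weight"]
            if r <= 0:
                break
        # random direction; displacement length drawn from a Gaussian
        direction = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in direction)) or 1.0
        length = rng.gauss(0, c["std"])
        x = [ci + vi / norm * length for ci, vi in zip(c["center"], direction)]
        yield x, c["label"]
```

The centroid's class label is attached to every example it emits, so the concept is a mixture of Gaussian-like clusters rather than an axis-aligned partition, which is what makes it hard for decision trees.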

Rotating Hyperplane Data Stream. This synthetic data stream was generated using the MOA framework. A hyperplane in $d$-dimensional space is the set of points $x$ that satisfy [44]
$$
\sum_{i=1}^{d} x_i w_i = w_0 = \sum_{i=1}^{d} w_i, \tag{33}
$$
where $x_i$ is the $i$th coordinate of $x$; examples for which $\sum_{i=1}^{d} x_i w_i \ge w_0$ are labeled positive, and examples for which $\sum_{i=1}^{d} x_i w_i < w_0$ are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because the orientation and position of the hyperplane can be adjusted in a gradual way if we change the relative size of the weights. In MOA, change is introduced to this data stream by adding drift to each weight attribute, using the formula $w_i = w_i + d\sigma$, where $\sigma$ is the probability that the direction of change is reversed and $d$ is the change applied to every example. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.
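The labeling rule and gradual weight drift described above can be sketched in a few lines. This is an illustrative reconstruction, not MOA's HyperplaneGenerator: in particular, the threshold here is set to half the weight sum so that both classes occur with comparable frequency for uniform inputs, and the parameter names, defaults, and seed are ours:

```python
import random

def hyperplane_stream(d, n, change=0.001, sigma=0.1, seed=7):
    """Yield (x, label) pairs from a gradually rotating hyperplane."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(d)]
    direction = 1
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        w0 = 0.5 * sum(w)                # decision threshold: half the weight sum
        label = 1 if sum(xi * wi for xi, wi in zip(x, w)) >= w0 else 0
        yield x, label
        if rng.random() < sigma:         # reverse drift direction with probability sigma
            direction = -direction
        w = [wi + direction * change for wi in w]   # gradual drift on every weight
```

Because the weights change a little after every example, the decision boundary rotates smoothly, which is the gradual concept drift the generator is meant to simulate.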

2.6. Algorithms Evaluation. One of the objectives of this work is to perform a consistent comparison between the classification performance of our proposal and that of other well-known data stream pattern classification algorithms. Evaluation of data stream algorithms is a relatively new field that has not been studied as thoroughly as evaluation on batch learning. In traditional batch learning, evaluation has been mainly limited to evaluating a model using different random reorderings of the same static dataset ($k$-fold cross validation, leave-one-out, and bootstrap) [2]. For data stream scenarios, the problem of infinite data raises new challenges; on the other hand, the batch approaches cannot effectively measure the accuracy of a model in a context with concepts that change over time. In the data stream setting, one of the main concerns is how to design an evaluation practice that allows us to obtain a reliable measure of accuracy over time. According to [41], two main approaches arise.

2.6.1. Holdout. When traditional batch learning reaches a scale where cross validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. An extension of this approach can be applied to data stream scenarios: first, a set of $M$ instances is used to train the model; then, a set of the following unseen $N$ instances is used to test the model; then it is again trained with the next $M$ instances and tested with the subsequent $N$ instances, and so forth.


2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data that is sometimes difficult to obtain, while prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different mechanisms of evaluation that include the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors. They stated that the use of the prequential error with forgetting mechanisms proves to be advantageous in assessing performance and in comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we have selected the prequential error over a sliding window for evaluation of the proposed method. The prequential error $P_W$ for a sliding window of size $W$, computed at time $i$, is based on an accumulated sum of a 0-1 loss function $L$ between the predicted values $\hat{y}_k$ and the observed values $y_k$, using the following expression [47]:
$$
P_W(i) = \frac{1}{W} \sum_{k=i-W+1}^{i} L\left(y_k, \hat{y}_k\right). \tag{34}
$$
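Equation (34) can be computed incrementally with a bounded buffer of the $W$ most recent losses; a small sketch (names are ours):

```python
from collections import deque

def prequential_error(pairs, W):
    """pairs: iterable of (y_true, y_pred) in stream order.
    Returns P_W at each time step once at least W pairs have been seen."""
    losses = deque(maxlen=W)          # keeps only the W most recent 0-1 losses
    errors = []
    for y_true, y_pred in pairs:
        losses.append(0 if y_true == y_pred else 1)   # 0-1 loss L(y_k, y_hat_k)
        if len(losses) == W:
            errors.append(sum(losses) / W)            # mean loss over the window
    return errors
```

Each step costs O(1) space beyond the $W$-element buffer, which matches the memory constraints of the stream setting.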

We also evaluate the performance results of each competing algorithm against the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test that can be used to rank the differences in performance of two classifiers on each dataset, ignoring the signs and comparing the ranks for the positive and the negative differences.

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. The classification performance of each of the compared algorithms was calculated using the prequential error approach. These results are used to compare the performance of our proposal and the other data stream classification algorithms. We aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on the implementation of these algorithms can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times, and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset, the experiment was executed just one time. The Bank and CoverType datasets were reordered 10 times using the randomized filter from the WEKA platform [49] for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each one of them, using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification, to evaluate the efficiency of each algorithm. All experiments were conducted using a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system, with 4096 MB of RAM.

3. Results and Discussion

In this section we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance of each of the compared classification algorithms. First, we evaluate the sensitivity of the proposed method to changes in the window size, which is a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized with boldface. Table 4 shows the average total time used to classify each data stream. The results of the average performance, with an indicator of the standard deviation, are also depicted in Figure 1.

In general, the DS-Gamma classifier performance improved as the window size increased, with the exception of the Electricity data stream; for this data stream, performance is better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with a performance of 89.12%. This performance is better than all the performances achieved by all the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance keeps stable and in some cases even degrades, as we can observe in the results for the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination $R^2$ were included in the graphic for each data stream. The $R^2$ values, ranging between 0.9743 and 0.9866, show that time grows linearly as the size of the window increases.

To evaluate the DS-Gamma classifier, we compared its performance with other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation. The best performance is emphasized with boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and


Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset      50     100    200    500    750    1000   1250   1500   1750   2000
Agrawal      79.64  85.97  89.86  92.59  92.88  92.82  92.67  92.51  92.32  92.07
Bank         68.19  73.40  77.05  79.13  79.12  78.96  78.60  78.24  77.76  77.37
CoverType    38.21  37.61  39.70  44.30  46.59  47.77  48.62  49.19  49.51  49.74
Electricity  89.12  85.26  81.71  79.84  78.58  77.14  76.36  75.94  75.31  74.49
Hyperplane   78.25  82.65  85.28  86.56  86.55  86.30  85.98  85.65  85.34  85.00
RandomRBF    64.61  67.39  69.33  70.56  70.81  70.85  70.74  70.63  70.50  70.33

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

Dataset      50     100     200     500     750     1000    1250    1500    1750    2000
Agrawal      2.24   4.50    7.93    19.26   28.04   39.00   45.92   57.57   63.90   73.49
Bank         0.66   1.14    2.07    4.75    7.01    9.17    11.54   14.55   15.83   17.87
CoverType    83.47  140.81  251.20  340.05  254.12  335.03  428.94  522.39  669.24  676.99
Electricity  0.34   0.53    0.98    2.33    3.56    4.63    5.99    7.05    8.30    9.49
Hyperplane   2.24   4.01    7.56    18.19   26.76   30.28   44.28   52.96   61.87   70.63
RandomRBF    2.79   4.09    7.73    18.54   27.35   30.72   45.02   54.12   62.23   71.04

Table 5: Averaged accuracy (%) comparison.

Classifier         Agrawal        Bank           CoverType      Electricity   Hyperplane     RandomRBF
                   Acc     σ      Acc     σ      Acc     σ      Acc     σ     Acc     σ      Acc     σ
Bayes-DDM          88.16   0.19   83.38   1.41   67.61   6.81   81.05   —     86.02   1.39   71.87   3.82
Perceptron DDM     67.20   0.11   73.43  10.53   70.76   5.55   78.61   —     85.65   1.49   71.88   5.93
Hoeffding Tree     94.39   0.24   88.52   0.11   72.29   2.68   82.41   —     82.41   5.62   86.72   1.69
Active Classifier  90.82   1.62   88.21   0.14   67.31   1.03   78.59   —     79.50   6.55   75.32   2.70
IBL-DS             86.76   0.48   86.52   0.21   61.73   9.46   75.34   —     84.20   0.77   93.32   1.21
AWE                92.17   0.59   87.90   0.38   76.16   3.66   78.36   —     85.88   1.65   91.98   1.14
OzaBoostAdwin      87.11   2.62   87.67   0.50   78.55   4.97   87.92   —     83.21   1.91   91.49   1.14
SGD                67.20   0.11   76.56   1.66   59.01   7.37   84.52   —     83.19   0.42   61.74   3.31
SPegasos           55.82   0.17   77.46   1.99   50.47   5.17   57.60   —     52.80   6.08   53.33   3.13
DS-Gamma           92.82   0.10   78.96   0.22   47.77   4.31   77.14   —     86.30   1.35   70.85   2.54

Table 6: Averaged classification time (seconds) comparison.

Classifier         Agrawal          Bank            CoverType            Electricity   Hyperplane       RandomRBF
                   Time     σ       Time    σ       Time       σ         Time    σ     Time     σ       Time    σ
Bayes-DDM          0.62     0.11    0.39    0.08    17.76      1.62      0.61    —     0.70     0.01    0.69    0.01
Perceptron DDM     0.40     0.02    0.20    0.05    8.28       0.10      0.19    —     0.45     0.02    0.44    0.02
Hoeffding Tree     0.98     0.23    0.54    0.03    30.05      11.06     0.78    —     1.36     0.05    1.40    0.05
Active Classifier  0.64     0.03    0.26    0.02    15.14      0.74      0.23    —     0.72     0.04    0.70    0.02
IBL-DS             53.19    2.11    38.51   8.26    1057.60    60.95     8.61    —     86.39    0.96    88.10   1.06
AWE                14.32    0.20    6.63    0.30    1012.47    304.41    6.70    —     18.03    3.64    37.52   3.03
OzaBoostAdwin      192.62   44.47   46.93   10.56   10589.73   3896.64   39.75   —     223.99   51.34   57.04   16.66
SGD                0.14     0.04    0.11    0.009   40.35      0.07      0.12    —     0.27     0.02    0.27    0.02
SPegasos           0.22     0.006   0.11    0.016   16.84      0.12      0.12    —     0.31     0.05    0.24    0.01
DS-Gamma           39.00    3.58    9.16    0.05    335.03     12.91     4.63    —     30.28    0.79    30.72   0.94


[Figure 1: Average accuracy of DS-Gamma for different window sizes, with one panel per data stream (Agrawal, Bank, CoverType, Electricity, Hyperplane, RandomRBF) over window sizes 50-2000.]

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier for the CoverType data stream. This data stream presents a heavy class imbalance that seems to be negatively affecting the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased toward the class with the most members.

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using $\alpha = 0.05$. The obtained values are all greater than $\alpha$, so the null hypothesis is not rejected, and


[Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes, with one panel per data stream; each panel includes a linear trend line, with R² values between 0.9743 and 0.9866.]

we can infer that there are no significant differences among the compared algorithms.

4. Conclusions

In this paper we describe a novel method for the task of pattern classification over a continuous data stream, based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase, the proposed method was tested using different data stream scenarios, with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms       Asymptotic significance (2-sided)
Gamma-Bayes-DDM           0.345
Gamma-Perceptron DDM      0.917
Gamma-Hoeffding Tree      0.116
Gamma-Active Classifier   0.463
Gamma-IBL-DS              0.345
Gamma-AWE                 0.116
Gamma-OzaBoostAdwin       0.116
Gamma-SGD                 0.600
Gamma-SPegasos            0.075


[Figure 3: Averaged accuracy (%) comparison; one box plot panel per data stream, covering all ten evaluated classifiers.]

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000. For larger window sizes, performance remained stable and in some cases even declined.

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained in the experiments performed with the CoverType dataset,


[Figure 4: Averaged classification time (seconds) comparison; one panel per data stream, covering all ten evaluated classifiers, with a logarithmic time axis in some panels.]

a heavily imbalanced dataset, where the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), CONACyT, and SNI for their economic support to develop this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to the statistical significance analysis.


References

[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: rich data and the increasing value of the internet of things," Tech. Rep., IDC Information and Data, 2014, http://www.emc.com/leadership/digital-universe/2014iview/index.htm.

[2] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press, Amsterdam, The Netherlands, 2010.

[3] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.

[4] J. Beringer and E. Hüllermeier, "Efficient instance-based learning on data streams," Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[5] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Computing Surveys, vol. 46, no. 4, article 44, 2014.

[6] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, Boston, Mass, USA, August 2000.

[7] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, San Francisco, Calif, USA, August 2001.

[8] C. Liang, Y. Zhang, P. Shi, and Z. Hu, "Learning very fast decision tree from uncertain data streams with positive and unlabeled samples," Information Sciences, vol. 213, pp. 50–67, 2012.

[9] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "Decision trees for mining data streams based on the Gaussian approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.

[10] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69–101, 1996.

[11] F. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. Riquelme, "Data streams classification by incremental rule learning with parameterized generalization," in Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 657–661, Dijon, France, April 2006.

[12] D. Mena-Torres and J. S. Aguilar-Ruiz, "A similarity-based approach for data stream classification," Expert Systems with Applications, vol. 41, no. 9, pp. 4224–4234, 2014.

[13] A. Shaker and E. Hüllermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems, vol. 3, no. 4, pp. 235–249, 2012.

[14] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Recursive least square perceptron model for non-stationary and imbalanced data stream classification," Evolving Systems, vol. 4, no. 2, pp. 119–131, 2013.

[15] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Ensemble of online neural networks for non-stationary and imbalanced data streams," Neurocomputing, vol. 122, pp. 535–544, 2013.

[16] A. Sancho-Asensio, A. Orriols-Puig, and E. Golobardes, "Robust on-line neural learning classifier system for data stream classification tasks," Soft Computing, vol. 18, no. 8, pp. 1441–1461, 2014.

[17] D. M. Farid, L. Zhang, A. Hossain, et al., "An adaptive ensemble classifier for mining concept drifting data streams," Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, 2013.

[18] D. Brzezinski and J. Stefanowski, "Combining block-based and online methods in learning ensembles from concept drifting data streams," Information Sciences, vol. 265, pp. 50–67, 2014.

[19] G. Ditzler and R. Polikar, "Incremental learning of concept drift from streaming imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2283–2301, 2013.

[20] M. J. Hosseini, Z. Ahmadi, and H. Beigy, "Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification," Evolving Systems, vol. 4, no. 1, pp. 43–60, 2013.

[21] N. Saunier and S. Midenet, "Creating ensemble classifiers through order and incremental data selection in a stream," Pattern Analysis and Applications, vol. 16, no. 3, pp. 333–347, 2013.

[22] M. M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham, "Classification and novel class detection in concept-drifting data streams under time constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.

[23] A. Alzghoul, M. Löfstrand, and B. Backe, "Data stream forecasting for system fault prediction," Computers and Industrial Engineering, vol. 62, no. 4, pp. 972–978, 2012.

[24] J. Kranjc, J. Smailović, V. Podpečan, M. Grčar, M. Žnidaršič, and N. Lavrač, "Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform," Information Processing and Management, vol. 51, no. 2, pp. 187–203, 2015.

[25] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, "Stream-based active learning for sentiment analysis in the financial domain," Information Sciences, vol. 285, pp. 181–203, 2014.

[26] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: primal estimated sub-gradient solver for SVM," in Proceedings of the 24th International Conference on Machine Learning, pp. 807–814, June 2007.

[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, "Twitter spammer detection using data stream clustering," Information Sciences, vol. 260, pp. 64–73, 2014.

[28] B. Krawczyk and M. Woźniak, "Incremental weighted one-class classifier for mining stationary data streams," Journal of Computational Science, vol. 9, pp. 19–25, 2015.

[29] J. Gama, "A survey on learning from data streams: current and future trends," Progress in Artificial Intelligence, vol. 1, no. 1, pp. 45–55, 2012.

[30] L. I. Kuncheva, "Change detection in streaming multivariate data using likelihood detectors," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1175–1180, 2013.

[31] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, "Exponentially weighted moving average charts for detecting concept drift," Pattern Recognition Letters, vol. 33, no. 2, pp. 191–198, 2012.

[32] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence—SBIA 2004, vol. 3171 of Lecture Notes in Computer Science, pp. 286–295, Springer, Berlin, Germany, 2004.

[33] X. Wu, P. Li, and X. Hu, "Learning from concept drifting data streams with unlabeled data," Neurocomputing, vol. 92, pp. 145–155, 2012.

[34] X. Qin, Y. Zhang, C. Li, and X. Li, "Learning from data streams with only positive and unlabeled data," Journal of Intelligent Information Systems, vol. 40, no. 3, pp. 405–430, 2013.

[35] M. M. Masud, C. Woolam, J. Gao, et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.

[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.

[37] I. López-Yáñez, A. J. Argüelles-Cruz, O. Camacho-Nieto, and C. Yáñez-Márquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.

[38] M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.

[39] C. Yáñez-Márquez, E. M. Felipe-Riverón, I. López-Yáñez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.

[40] I. López-Yáñez, L. Sheremetov, and C. Yáñez-Márquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.

[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.

[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets/.

[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml.

[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation/.

[45] M. Harries, "Splice-2 comparative evaluation: electricity pricing," Tech. Rep., University of New South Wales, Sydney, Australia, 1999.

[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.

[47] J. Gama, R. Sebastião, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.

[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka/.




(1) Initialize window
(2) for each x_τ do
(3)   if τ < W then
(4)     add x_τ to the tail of the window
(5)   else
(6)     use x_τ to classify
(7)     add x_τ to the tail of the window

Algorithm 1: Learning policy algorithm.
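As a rough sketch (not the authors' implementation), the learning policy of Algorithm 1 can be written as follows; `classify` stands in for any classifier that predicts from the current window (the Gamma classifier in this work), and the names are ours:

```python
from collections import deque

def run_learning_policy(stream, W, classify):
    """Algorithm 1: only add the first samples to the window; once tau
    reaches W, test each arriving sample before adding it."""
    window = deque()
    predictions = []
    for tau, (x, y) in enumerate(stream, start=1):
        if tau >= W:                      # window is warmed up: test-then-train
            predictions.append(classify(window, x))
        window.append((x, y))             # add x_tau to the tail of the window
    return predictions

# Toy run: a stand-in "classifier" that predicts the newest stored label.
preds = run_learning_policy([(0, 0), (1, 1), (2, 0), (3, 1)],
                            W=2, classify=lambda w, x: w[-1][1])
```

Any incremental learner with the same `classify(window, x)` shape can be plugged in without changing the policy itself.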

A data stream S can be defined as a sequence of data elements of the form S = {s_1, s_2, ..., s_now} from a universe of size D, where D is huge and potentially infinite but countable. We assume that each element s_i has the following form:

s_i = (x_τ, y_τ), (32)

where x_τ is an input vector of size n and y_τ is the class label associated with it. τ ∈ Z⁺ is an index that indicates the sequential order of the elements in the data stream. τ is bounded by the size of the data stream, such that if the data stream is infinite, τ takes infinitely many, but countable, values.

One of the main challenges in mining data streams is the ability to maintain an accurate decision model. The current model has to incorporate new knowledge from new data, and it should also forget old information when data becomes outdated. To address these issues, the proposed method implements a learning and forgetting policy. The learning (insertion) and forgetting (removal) policies were designed to guarantee the spatial and temporal relevance of the data and the consistency of the current model.

The proposed method implements a fixed window approach. This approach achieves concept drift detection in an indirect way, by considering only the most recent data. Algorithm 1 outlines the general steps used as the learning policy. Initially, the window is created with the first W labeled records, where W is the size of the window. After that, every sample is used for classification and evaluation of the performance, and then it is used to update the window.

Classifiers that deal with concept drift are forced to implement forgetting, adaptation, or drift detection mechanisms in order to adjust to changing environments. In this proposal, a forgetting mechanism is implemented. Algorithm 2 outlines the basic forgetting policy implemented in this proposal: the oldest element in the window, x_δ, is removed.

2.4. MOA. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams [41]. MOA contains a collection of offline and online algorithms for both classification and clustering, as well as tools for evaluation on data streams with concept drift. MOA can also be used to build synthetic data streams using generators, reading ARFF files, or joining several streams. The MOA stream generators allow simulating potentially infinite sequences of

Table 2: Dataset characteristics.

Dataset      Instances  Attributes  Classes
Agrawal         100000          10        2
Bank             45211          16        2
CoverType       581012          54        7
Electricity      45312           8        2
Hyperplane      100000          10        2
RandomRBF       100000          10        2

data with different characteristics (different types of concept drift and recurrent concepts, among others). All the algorithms used for the comparative analysis are implemented under the MOA framework.

2.5. Data Streams. This section provides a brief description of the data streams used during the experimental phase. The proposed solution has been tested using synthetic and real data streams. For data stream classification problems, just a few suitable data streams can be successfully used for evaluation, and only a few data streams with enough examples are hosted in [42, 43]. For further evaluation, the use of synthetic data stream generators becomes necessary. The synthetic data streams used in our experiments were generated using the MOA framework mentioned in the previous section. A summary of the main characteristics of the data streams is shown in Table 2. Descriptions of the data streams were taken from [42, 43] in the case of the real ones, and from [44] for the synthetic ones.

2.5.1. Real World Data Streams

Bank Marketing Dataset. The Bank Marketing dataset is available from the UCI Machine Learning Repository [43]. This data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit. It contains 45211 instances; each instance has 16 attributes and the class label. The dataset has two classes, which identify whether a bank term deposit would or would not be subscribed.

Electricity Dataset. This data was collected from the Australian New South Wales electricity market. In this market, prices are not fixed and are affected by demand and supply; the prices of the market are set every five minutes. The dataset was first described by Harries in [45]. During the time period described in the dataset, the electricity market was expanded with the inclusion of adjacent areas, which led to more elaborate management of the supply: the excess production of one region could be sold in the adjacent region. The Electricity dataset contains 45312 instances. Each example of the dataset refers to a period of 30 minutes (i.e., there are 48 instances for each day). Each example on the dataset has 8 fields: the day of week, the time stamp, the New South Wales electricity demand, the New South Wales electricity price, the Victoria electricity demand, the Victoria electricity price, the scheduled electricity transfer


(1) for each x_τ do
(2)   if τ > W then
(3)     remove x_δ (δ = τ - W + 1) from the beginning of the window

Algorithm 2: Forgetting policy algorithm.
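Taken together, the insertion policy of Algorithm 1 and the removal policy of Algorithm 2 keep exactly the W most recent samples. As an illustrative sketch (not the authors' code), a bounded deque in Python reproduces this behavior directly:

```python
from collections import deque

W = 3
window = deque(maxlen=W)      # a full deque evicts its oldest element
for x in [10, 20, 30, 40, 50]:
    window.append(x)          # appending x_tau drops x_delta when full,
                              # exactly as Algorithm 2 prescribes

print(list(window))           # → [30, 40, 50]
```

The eviction is O(1) per sample, so the forgetting policy adds no meaningful overhead to the stream loop.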

between states, and the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. We use the normalized versions of these datasets, so that the numerical values are between 0 and 1. The dataset is available from [42].

Forest CoverType Dataset. The Forest CoverType dataset is one of the largest datasets available from the UCI Machine Learning Repository [43]. This dataset contains information that describes forest cover types from cartographic variables. A given observation (30 × 30 meter cell) was determined from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581012 instances and 54 attributes. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types). The dataset has 7 classes that identify the type of forest cover.

2.5.2. Synthetic Data Streams

Agrawal Data Stream. This generator was originally introduced by Agrawal et al. in [46]. The MOA implementation was used to generate the data streams for the experimental phase of this work. The generator produces a stream containing nine attributes. Although not explicitly stated by the authors, a sensible conclusion is that these attributes describe hypothetical loan applications. There are ten functions defined for generating binary class labels from the attributes; presumably, these determine whether the loan should be approved. Perturbation shifts numeric attributes from their true value, adding an offset drawn randomly from a uniform distribution whose range is a specified percentage of the total value range [44]. For each experiment, a data stream with 100000 examples was generated.

RandomRBF Data Stream. This generator was devised to offer an alternative complex concept type that is not straightforward to approximate with a decision tree model. The RBF (Radial Basis Function) generator works as follows: a fixed number of random centroids are generated; each center has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a center at random, taking weights into consideration, so that centers with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the central point. The length of the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid. The chosen centroid also determines the class label of the example. Only numeric attributes are generated [44]. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.
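The generation procedure described above can be sketched as follows. This is an illustrative reimplementation of the stated description, not MOA's actual generator, and names such as `make_rbf_stream` are ours:

```python
import math
import random

def make_rbf_stream(n_centroids=50, n_attrs=10, n_classes=2, seed=1):
    """Sketch of the RandomRBF generator: fixed random centroids, each
    with a position, a standard deviation, a class label, and a weight."""
    rng = random.Random(seed)
    centroids = [{"center": [rng.random() for _ in range(n_attrs)],
                  "stdev": rng.random(),
                  "label": rng.randrange(n_classes),
                  "weight": rng.random()} for _ in range(n_centroids)]
    weights = [c["weight"] for c in centroids]
    while True:
        c = rng.choices(centroids, weights=weights)[0]   # weight-biased pick
        direction = [rng.gauss(0.0, 1.0) for _ in range(n_attrs)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        length = rng.gauss(0.0, c["stdev"])              # displacement length
        x = [ci + length * di / norm
             for ci, di in zip(c["center"], direction)]
        yield x, c["label"]                              # centroid sets the label

stream = make_rbf_stream()
x, y = next(stream)
```

Because the labels follow the centroids rather than any axis-aligned split, a stream like this is indeed hard for a decision tree to carve up, which is the point of the generator.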

Rotating Hyperplane Data Stream. This synthetic data stream was generated using the MOA framework. A hyperplane in d-dimensional space is the set of points x that satisfy [44]

∑_{i=1}^{d} x_i w_i = w_0 = ∑_{i=1}^{d} w_i, (33)

where x_i is the ith coordinate of x. Examples for which ∑_{i=1}^{d} x_i w_i ≥ w_0 are labeled positive, and examples for which ∑_{i=1}^{d} x_i w_i < w_0 are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because the orientation and position of the hyperplane can be adjusted in a gradual way by changing the relative size of the weights. In MOA, change is introduced into this data stream by adding drift to each weight attribute using the formula w_i = w_i + dσ, where σ is the probability that the direction of change is reversed and d is the change applied to every example. For our experiments, a data stream with 100000 data instances, 10 numerical attributes, and 2 classes was generated.
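Under the description above, the labeling rule and the weight drift can be sketched as follows (illustrative only; `hyperplane_label` and `drift_weights` are our names, and the toy threshold below is halved so that both labels actually occur):

```python
import random

def hyperplane_label(x, w, w0):
    """Label rule: positive (1) iff sum_i x_i * w_i >= w_0."""
    return int(sum(xi * wi for xi, wi in zip(x, w)) >= w0)

def drift_weights(w, d, sigma, rng):
    """Gradual drift: every weight moves by d, and with probability
    sigma the direction of change is reversed."""
    sign = -1.0 if rng.random() < sigma else 1.0
    return [wi + sign * d for wi in w]

rng = random.Random(7)
w = [0.5] * 10                 # d = 10 attributes
w0 = sum(w) / 2                # toy threshold (half of sum_i w_i) so that
                               # examples fall on both sides of the plane
x = [rng.random() for _ in w]
label = hyperplane_label(x, w, w0)
```

Repeatedly calling `drift_weights` between examples rotates the plane slowly, which is exactly how gradual concept drift is simulated here.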

2.6. Algorithms Evaluation. One of the objectives of this work is to perform a consistent comparison between the classification performance of our proposal and that of other well-known data stream pattern classification algorithms. Evaluation of data stream algorithms is a relatively new field that has not been studied as thoroughly as evaluation of batch learning. In traditional batch learning, evaluation has been mainly limited to evaluating a model using different random reorderings of the same static dataset (k-fold cross validation, leave-one-out, and bootstrap) [2]. For data stream scenarios, the problem of infinite data raises new challenges; moreover, the batch approaches cannot effectively measure the accuracy of a model in a context with concepts that change over time. In the data stream setting, one of the main concerns is how to design an evaluation practice that allows us to obtain a reliable measure of accuracy over time. According to [41], two main approaches arise.

2.6.1. Holdout. When traditional batch learning reaches a scale where cross validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. An extension of this approach can be applied to data stream scenarios: first, a set of M instances is used to train the model; then a set of the following unseen N instances is used to test the model; the model is then trained again with the next M instances and tested with the subsequent N instances, and so forth.
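A sketch of this extended holdout procedure follows; the `fit` and `score` callables are placeholders for whatever learner and accuracy measure are under evaluation:

```python
def holdout_eval(stream, M, N, fit, score):
    """Alternating holdout over a stream: repeatedly train on M
    instances, then evaluate on the next N unseen instances."""
    results, it = [], iter(stream)
    while True:
        train = [s for _, s in zip(range(M), it)]
        test = [s for _, s in zip(range(N), it)]
        if len(train) < M or len(test) < N:   # stream exhausted
            break
        model = fit(train)
        results.append(score(model, test))
    return results
```

Each entry of `results` is an accuracy estimate on data the model has not yet seen, which is what makes the scheme a holdout rather than a resubstitution measure.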


2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data that is sometimes difficult to obtain, while prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different mechanisms of evaluation that include the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors. They stated that the use of the prequential error with forgetting mechanisms proves to be advantageous in assessing performance and in comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we have selected the prequential error over a sliding window for the evaluation of the proposed method. The prequential error P_W for a sliding window of size W, computed at time i, is based on an accumulated sum of a 0-1 loss function L between the predicted values ŷ_k and the observed values y_k, using the following expression [47]:

P_W(i) = (1/W) ∑_{k=i-W+1}^{i} L(y_k, ŷ_k). (34)
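In practice, (34) amounts to a running mean of the 0-1 losses over the last W predictions. A minimal sketch (the loss sequence would come from the test-then-train loop; while fewer than W examples have been seen, the mean is taken over all of them):

```python
from collections import deque

def prequential_error(losses, W):
    """P_W(i) from (34): mean 0-1 loss L(y_k, yhat_k) over the last W
    examples, evaluated at every time step i."""
    window = deque(maxlen=W)
    curve = []
    for loss in losses:               # 1 = misclassified, 0 = correct
        window.append(loss)
        curve.append(sum(window) / len(window))
    return curve
```

The returned curve is exactly what a plot of prequential error over time would show: early mistakes fade out of the estimate once they leave the window.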

We also compare the performance results of each competing algorithm with those of the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test that ranks the differences in performance of two classifiers for each dataset, ignoring the signs, and compares the ranks for the positive and the negative differences.

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. Classification performance of each of the compared algorithms was calculated using the prequential error approach. These results are used to compare the performance of our proposal with that of other data stream classification algorithms. We have aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on their implementation can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times, and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset the experiment was executed just one time. The Bank and CoverType datasets were reordered 10 times using the randomized filter from the WEKA platform [49] for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each of them, using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification, to evaluate the efficiency of each algorithm. All experiments were conducted on a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system, with 4096 MB of RAM.

3. Results and Discussion

In this section, we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance of each of the compared classification algorithms. First, we evaluate the sensitivity of the proposed method to changes in the window size, a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized with boldface. Table 4 shows the average total time used to classify each data stream. The results of the average performance with an indicator of the standard deviation are also depicted in Figure 1.

In general, the performance of the DS-Gamma classifier improved as the window size increased, with the exception of the Electricity data stream, for which performance was better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with a performance of 89.12%. This performance is better than all the performances achieved by the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance remained stable and in some cases even degraded, as we can observe in the results for the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination R² were included in the graphic for each data stream. The R² values, ranging between 0.9743 and 0.9866, show that time grows linearly as the size of the window increases.

To evaluate the DS-Gamma classifier, we compared its performance with that of other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation; the best performance is emphasized with boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and


Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset     |    50    100    200    500    750   1000   1250   1500   1750   2000
Agrawal     | 79.64  85.97  89.86  92.59  92.88  92.82  92.67  92.51  92.32  92.07
Bank        | 68.19  73.40  77.05  79.13  79.12  78.96  78.60  78.24  77.76  77.37
CoverType   | 38.21  37.61  39.70  44.30  46.59  47.77  48.62  49.19  49.51  49.74
Electricity | 89.12  85.26  81.71  79.84  78.58  77.14  76.36  75.94  75.31  74.49
Hyperplane  | 78.25  82.65  85.28  86.56  86.55  86.30  85.98  85.65  85.34  85.00
RandomRBF   | 64.61  67.39  69.33  70.56  70.81  70.85  70.74  70.63  70.50  70.33

Table 4: Average classification time (seconds) of DS-Gamma for different window sizes.

Dataset     |    50    100    200    500    750   1000   1250   1500   1750   2000
Agrawal     |  2.24   4.50   7.93  19.26  28.04  39.00  45.92  57.57  63.90  73.49
Bank        |  0.66   1.14   2.07   4.75   7.01   9.17  11.54  14.55  15.83  17.87
CoverType   | 83.47 140.81 251.20 340.05 254.12 335.03 428.94 522.39 669.24 676.99
Electricity |  0.34   0.53   0.98   2.33   3.56   4.63   5.99   7.05   8.30   9.49
Hyperplane  |  2.24   4.01   7.56  18.19  26.76  30.28  44.28  52.96  61.87  70.63
RandomRBF   |  2.79   4.09   7.73  18.54  27.35  30.72  45.02  54.12  62.23  71.04

Table 5: Average accuracy (%) comparison.

Classifier        | Agrawal       | Bank          | CoverType     | Electricity | Hyperplane    | RandomRBF
                  | Acc      σ    | Acc      σ    | Acc      σ    | Acc     σ   | Acc      σ    | Acc      σ
Bayes-DDM         | 88.16   0.19  | 83.38   1.41  | 67.61   6.81  | 81.05   —   | 86.02   1.39  | 71.87   3.82
Perceptron DDM    | 67.20   0.11  | 73.43  10.53  | 70.76   5.55  | 78.61   —   | 85.65   1.49  | 71.88   5.93
Hoeffding Tree    | 94.39   0.24  | 88.52   0.11  | 72.29   2.68  | 82.41   —   | 82.41   5.62  | 86.72   1.69
Active Classifier | 90.82   1.62  | 88.21   0.14  | 67.31   1.03  | 78.59   —   | 79.50   6.55  | 75.32   2.70
IBL-DS            | 86.76   0.48  | 86.52   0.21  | 61.73   9.46  | 75.34   —   | 84.20   0.77  | 93.32   1.21
AWE               | 92.17   0.59  | 87.90   0.38  | 76.16   3.66  | 78.36   —   | 85.88   1.65  | 91.98   1.14
OzaBoostAdwin     | 87.11   2.62  | 87.67   0.50  | 78.55   4.97  | 87.92   —   | 83.21   1.91  | 91.49   1.14
SGD               | 67.20   0.11  | 76.56   1.66  | 59.01   7.37  | 84.52   —   | 83.19   0.42  | 61.74   3.31
SPegasos          | 55.82   0.17  | 77.46   1.99  | 50.47   5.17  | 57.60   —   | 52.80   6.08  | 53.33   3.13
DS-Gamma          | 92.82   0.10  | 78.96   0.22  | 47.77   4.31  | 77.14   —   | 86.30   1.35  | 70.85   2.54

Table 6: Average classification time (seconds) comparison.

Classifier        | Agrawal        | Bank           | CoverType          | Electricity | Hyperplane     | RandomRBF
                  | Time      σ    | Time     σ     | Time        σ      | Time    σ   | Time      σ    | Time     σ
Bayes-DDM         |   0.62   0.11  |  0.39   0.08   |    17.76     1.62  |  0.61   —   |   0.70   0.01  |  0.69   0.01
Perceptron DDM    |   0.40   0.02  |  0.20   0.05   |     8.28     0.10  |  0.19   —   |   0.45   0.02  |  0.44   0.02
Hoeffding Tree    |   0.98   0.23  |  0.54   0.03   |    30.05    11.06  |  0.78   —   |   1.36   0.05  |  1.40   0.05
Active Classifier |   0.64   0.03  |  0.26   0.02   |    15.14     0.74  |  0.23   —   |   0.72   0.04  |  0.70   0.02
IBL-DS            |  53.19   2.11  | 38.51   8.26   |  1057.60    60.95  |  8.61   —   |  86.39   0.96  | 88.10   1.06
AWE               |  14.32   0.20  |  6.63   0.30   |  1012.47   304.41  |  6.70   —   |  18.03   3.64  | 37.52   3.03
OzaBoostAdwin     | 192.62  44.47  | 46.93  10.56   | 10589.73  3896.64  | 39.75   —   | 223.99  51.34  | 57.04  16.66
SGD               |   0.14   0.04  |  0.11   0.009  |    40.35     0.07  |  0.12   —   |   0.27   0.02  |  0.27   0.02
SPegasos          |   0.22   0.006 |  0.11   0.016  |    16.84     0.12  |  0.12   —   |   0.31   0.05  |  0.24   0.01
DS-Gamma          |  39.00   3.58  |  9.16   0.05   |   335.03    12.91  |  4.63   —   |  30.28   0.79  | 30.72   0.94


[Figure 1: Average accuracy of DS-Gamma for different window sizes, one panel per data stream (Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF), with accuracy (%) plotted against window sizes from 50 to 2000.]

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with the Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier for the CoverType data stream. This data stream presents a heavy class imbalance that seems to be negatively affecting the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased toward the class with the most members.

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using α = 0.05. The obtained values are all greater than α, so the null hypothesis is not rejected, and


[Figure 2: Average classification time (seconds) of DS-Gamma for different window sizes, one panel per data stream (Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF), each with a dotted linear trend line; the R² values per panel are 0.9866, 0.9854, 0.9804, 0.9796, 0.9743, and 0.9842.]

we can infer that there are no significant differences among the compared algorithms.

4. Conclusions

In this paper we describe a novel method for the task of pattern classification over a continuous data stream, based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase the proposed method was tested using different data stream scenarios, with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms        Asymptotic significance (2-sided)
Gamma-Bayes-DDM            0.345
Gamma-Perceptron DDM       0.917
Gamma-Hoeffding Tree       0.116
Gamma-Active Classifier    0.463
Gamma-IBL-DS               0.345
Gamma-AWE                  0.116
Gamma-OzaBoostAdwin        0.116
Gamma-SGD                  0.600
Gamma-SPegasos             0.075


[Figure 3: Averaged accuracy (%) comparison. Box plots per data stream (Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity), one box per classifier: Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma.]

time, when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on the classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000. For larger window sizes performance remained stable and in some cases even declined.

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained with the experiments performed with the CoverType dataset,


[Figure 4: Averaged classification time (seconds) comparison. One panel per data stream (Agrawal, Hyperplane, CoverType, RandomRBF, Bank, and Electricity), with one value per classifier: Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma.]

a heavily imbalanced dataset, where the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), the CONACyT, and SNI for their economic support to develop this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to perform the statistical significance analysis.


References

[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: rich data and the increasing value of the internet of things," Tech. Rep., IDC Information and Data, 2014, http://www.emc.com/leadership/digital-universe/2014iview/index.htm.

[2] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press, Amsterdam, The Netherlands, 2010.

[3] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.

[4] J. Beringer and E. Hüllermeier, "Efficient instance-based learning on data streams," Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[5] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Computing Surveys, vol. 46, no. 4, article 44, 2014.

[6] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, Boston, Mass, USA, August 2000.

[7] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, San Francisco, Calif, USA, August 2001.

[8] C. Liang, Y. Zhang, P. Shi, and Z. Hu, "Learning very fast decision tree from uncertain data streams with positive and unlabeled samples," Information Sciences, vol. 213, pp. 50–67, 2012.

[9] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "Decision trees for mining data streams based on the Gaussian approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.

[10] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69–101, 1996.

[11] F. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. Riquelme, "Data streams classification by incremental rule learning with parameterized generalization," in Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 657–661, Dijon, France, April 2006.

[12] D. Mena-Torres and J. S. Aguilar-Ruiz, "A similarity-based approach for data stream classification," Expert Systems with Applications, vol. 41, no. 9, pp. 4224–4234, 2014.

[13] A. Shaker and E. Hüllermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems, vol. 3, no. 4, pp. 235–249, 2012.

[14] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Recursive least square perceptron model for non-stationary and imbalanced data stream classification," Evolving Systems, vol. 4, no. 2, pp. 119–131, 2013.

[15] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Ensemble of online neural networks for non-stationary and imbalanced data streams," Neurocomputing, vol. 122, pp. 535–544, 2013.

[16] A. Sancho-Asensio, A. Orriols-Puig, and E. Golobardes, "Robust on-line neural learning classifier system for data stream classification tasks," Soft Computing, vol. 18, no. 8, pp. 1441–1461, 2014.

[17] D. M. Farid, L. Zhang, A. Hossain, et al., "An adaptive ensemble classifier for mining concept drifting data streams," Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, 2013.

[18] D. Brzezinski and J. Stefanowski, "Combining block-based and online methods in learning ensembles from concept drifting data streams," Information Sciences, vol. 265, pp. 50–67, 2014.

[19] G. Ditzler and R. Polikar, "Incremental learning of concept drift from streaming imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2283–2301, 2013.

[20] M. J. Hosseini, Z. Ahmadi, and H. Beigy, "Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification," Evolving Systems, vol. 4, no. 1, pp. 43–60, 2013.

[21] N. Saunier and S. Midenet, "Creating ensemble classifiers through order and incremental data selection in a stream," Pattern Analysis and Applications, vol. 16, no. 3, pp. 333–347, 2013.

[22] M. M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham, "Classification and novel class detection in concept-drifting data streams under time constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.

[23] A. Alzghoul, M. Löfstrand, and B. Backe, "Data stream forecasting for system fault prediction," Computers and Industrial Engineering, vol. 62, no. 4, pp. 972–978, 2012.

[24] J. Kranjc, J. Smailović, V. Podpečan, M. Grčar, M. Žnidaršič, and N. Lavrač, "Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform," Information Processing and Management, vol. 51, no. 2, pp. 187–203, 2015.

[25] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, "Stream-based active learning for sentiment analysis in the financial domain," Information Sciences, vol. 285, pp. 181–203, 2014.

[26] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: primal estimated sub-gradient solver for SVM," in Proceedings of the 24th International Conference on Machine Learning, pp. 807–814, June 2007.

[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, "Twitter spammer detection using data stream clustering," Information Sciences, vol. 260, pp. 64–73, 2014.

[28] B. Krawczyk and M. Woźniak, "Incremental weighted one-class classifier for mining stationary data streams," Journal of Computational Science, vol. 9, pp. 19–25, 2015.

[29] J. Gama, "A survey on learning from data streams: current and future trends," Progress in Artificial Intelligence, vol. 1, no. 1, pp. 45–55, 2012.

[30] L. I. Kuncheva, "Change detection in streaming multivariate data using likelihood detectors," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1175–1180, 2013.

[31] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, "Exponentially weighted moving average charts for detecting concept drift," Pattern Recognition Letters, vol. 33, no. 2, pp. 191–198, 2012.

[32] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence—SBIA 2004, vol. 3171 of Lecture Notes in Computer Science, pp. 286–295, Springer, Berlin, Germany, 2004.

[33] X. Wu, P. Li, and X. Hu, "Learning from concept drifting data streams with unlabeled data," Neurocomputing, vol. 92, pp. 145–155, 2012.

[34] X. Qin, Y. Zhang, C. Li, and X. Li, "Learning from data streams with only positive and unlabeled data," Journal of Intelligent Information Systems, vol. 40, no. 3, pp. 405–430, 2013.

[35] M. M. Masud, C. Woolam, J. Gao, et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.

[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.

[37] I. López-Yáñez, A. J. Argüelles-Cruz, O. Camacho-Nieto, and C. Yáñez-Márquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.

[38] M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.

[39] C. Yáñez-Márquez, E. M. Felipe-Riverón, I. López-Yáñez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.

[40] I. López-Yáñez, L. Sheremetov, and C. Yáñez-Márquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.

[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.

[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets/.

[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml.

[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation/.

[45] M. Harries, "Splice-2 comparative evaluation: electricity pricing," Tech. Rep., University of South Wales, Cardiff, UK, 1999.

[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.

[47] J. Gama, R. Sebastião, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.

[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka/.



(1) for each xτ do
(2)   if τ > W then
(3)     remove xδ (δ = τ − W + 1) from the beginning of the window

Algorithm 2: Forgetting policy algorithm.
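As an illustration (not part of the original implementation), the forgetting policy of Algorithm 2 can be sketched with a bounded deque, which evicts the oldest pattern xδ automatically once the window holds W patterns, matching steps (2)–(3):

```python
from collections import deque

# Minimal sketch of the sliding-window forgetting policy of Algorithm 2,
# assuming patterns arrive one at a time. With maxlen=W the deque drops
# the oldest pattern automatically when a new one is appended.
W = 3
window = deque(maxlen=W)

for tau, x in enumerate(["x1", "x2", "x3", "x4", "x5"], start=1):
    window.append(x)  # insert x_tau; the oldest pattern is evicted if needed

print(list(window))  # ['x3', 'x4', 'x5']
```

The deque's `maxlen` argument makes the explicit "if τ > W" test unnecessary, since eviction is handled internally.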

between states, and the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours. We use the normalized versions of these datasets, so that the numerical values are between 0 and 1. The dataset is available from [42].

Forest CoverType Dataset. The Forest CoverType dataset is one of the largest datasets available from the UCI Machine Learning Repository [43]. This dataset contains information that describes forest cover types from cartographic variables. A given observation (30 × 30 meter cell) was determined from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes. Data is in raw form (not scaled) and contains binary (0 or 1) columns of data for qualitative independent variables (wilderness areas and soil types). The dataset has 7 classes that identify the type of forest cover.

2.5.2. Synthetic Data Streams

Agrawal Data Stream. This generator was originally introduced by Agrawal et al. in [46]. The MOA implementation was used to generate the data streams for the experimental phase in this work. The generator produces a stream containing nine attributes. Although not explicitly stated by the authors, a sensible conclusion is that these attributes describe hypothetical loan applications. There are ten functions defined for generating binary class labels from the attributes; presumably, these determine whether the loan should be approved. Perturbation shifts numeric attributes from their true value, adding an offset drawn randomly from a uniform distribution, the range of which is a specified percentage of the total value range [44]. For each experiment, a data stream with 100,000 examples was generated.

RandomRBF Data Stream. This generator was devised to offer an alternate complex concept type that is not straightforward to approximate with a decision tree model. The RBF (Radial Basis Function) generator works as follows. A fixed number of random centroids are generated; each center has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a center at random, taking weights into consideration, so that centers with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the central point. The length of the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid. The chosen centroid also determines the class label of the example. Only numeric attributes are generated [44]. For our experiments, a data stream with 100,000 data instances, 10 numerical attributes, and 2 classes was generated.
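The generation procedure just described can be sketched as follows. This is a simplified, assumed reading of the MOA generator (the actual implementation is in Java and has more options); centroid counts and ranges here are arbitrary:

```python
import math
import random

random.seed(42)

# Sketch of the RandomRBF generator described above: fixed random centroids,
# each with a position, a standard deviation, a class label, and a weight.
D, N_CENTROIDS = 10, 5
centroids = [{
    "pos": [random.random() for _ in range(D)],
    "std": random.random(),
    "label": random.randint(0, 1),
    "weight": random.random(),
} for _ in range(N_CENTROIDS)]

def next_example():
    # choose a centroid with probability proportional to its weight
    c = random.choices(centroids, weights=[c["weight"] for c in centroids])[0]
    # pick a random direction, then scale it by a Gaussian-drawn displacement
    direction = [random.gauss(0, 1) for _ in range(D)]
    norm = math.sqrt(sum(v * v for v in direction)) or 1.0
    length = random.gauss(0, c["std"])
    x = [p + length * v / norm for p, v in zip(c["pos"], direction)]
    return x, c["label"]  # the centroid also fixes the class label

x, y = next_example()
print(len(x), y)
```

Each example is therefore a noisy sample around one of the fixed centers, which is what makes the concept hard to capture with axis-parallel decision-tree splits.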

Rotating Hyperplane Data Stream. This synthetic data stream was generated using the MOA framework. A hyperplane in d-dimensional space is the set of points x that satisfy [44]

    Σ_{i=1}^{d} x_i w_i = w_0 = Σ_{i=1}^{d} w_i,    (33)

where x_i is the ith coordinate of x; examples for which Σ_{i=1}^{d} x_i w_i ≥ w_0 are labeled positive, and examples for which Σ_{i=1}^{d} x_i w_i < w_0 are labeled negative. Hyperplanes are useful for simulating time-changing concepts, because the orientation and position of the hyperplane can be adjusted in a gradual way if we change the relative size of the weights. In MOA, change is introduced to this data stream by adding drift to each weight attribute using the formula w_i = w_i + dσ, where σ is the probability that the direction of change is reversed and d is the change applied to every example. For our experiments, a data stream with 100,000 data instances, 10 numerical attributes, and 2 classes was generated.

2.6. Algorithms Evaluation. One of the objectives of this work is to perform a consistent comparison between the classification performance of our proposal and that of other well-known data stream pattern classification algorithms. Evaluation of data stream algorithms is a relatively new field that has not been studied as thoroughly as evaluation in batch learning. In traditional batch learning, evaluation has been mainly limited to evaluating a model using different random reorderings of the same static dataset (k-fold cross validation, leave-one-out, and bootstrap) [2]. For data stream scenarios, the problem of infinite data raises new challenges; on the other hand, the batch approaches cannot effectively measure the accuracy of a model in a context with concepts that change over time. In the data stream setting, one of the main concerns is how to design an evaluation practice that allows us to obtain a reliable measure of accuracy over time. According to [41], two main approaches arise.

2.6.1. Holdout. When traditional batch learning reaches a scale where cross validation is too time-consuming, it is often accepted to instead measure performance on a single holdout set. An extension of this approach can be applied to data stream scenarios: first, a set of M instances is used to train the model; then a set of the following unseen N instances is used to test the model; then the model is again trained with the next M instances and tested with the subsequent N instances, and so forth.


2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data that is sometimes difficult to obtain, while prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different mechanisms of evaluation, including the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors. They stated that the use of the prequential error with forgetting mechanisms reveals to be advantageous in assessing performance and in comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we have selected the prequential error over a sliding window for the evaluation of the proposed method. The prequential error P_W for a sliding window of size W, computed at time i, is based on an accumulated sum of a 0-1 loss function L between the predicted values ŷ_k and the observed values y_k, using the following expression [47]:

    P_W(i) = (1/W) Σ_{k=i−W+1}^{i} L(ŷ_k, y_k).    (34)
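Equation (34) can be maintained incrementally with a bounded window of losses. The sketch below is illustrative only; note that, before the window fills, it divides by the number of examples seen so far rather than by W:

```python
from collections import deque

# Minimal sketch of the prequential error of equation (34): the mean of a
# 0-1 loss over the last W predictions, maintained with a sliding window.
def make_prequential(W):
    losses = deque(maxlen=W)                # keeps only the last W losses
    def update(y_pred, y_true):
        losses.append(0 if y_pred == y_true else 1)
        return sum(losses) / len(losses)    # P_W(i) at the current time i
    return update

p_w = make_prequential(W=4)
errors = [p_w(pred, obs) for pred, obs in
          [(1, 1), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]]
print(errors[-1])  # prequential error over the last 4 predictions: 0.25
```

Each call tests the model's prediction before that example would be used for training, which is the interleaved test-then-train order described above.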

We also evaluate the performance results of each competing algorithm against the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test which can be used to rank the differences in performances of two classifiers for each dataset, ignoring the signs and comparing the ranks for the positive and the negative differences.
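The ranking step of the test can be sketched as follows (in practice a library routine such as scipy.stats.wilcoxon would be used, which also returns the p value; the accuracies below are rounded values for DS-Gamma and Bayes-DDM taken from Table 5, used only as an example):

```python
# Hand-rolled sketch of the Wilcoxon signed-ranks statistic: rank the
# absolute performance differences (averaging ranks over ties, dropping
# zero differences), then compare positive- and negative-rank sums.
def wilcoxon_statistic(a, b):
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and abs(diffs[ranked[j]]) == abs(diffs[ranked[i]]):
            j += 1                      # extend over a run of tied |diffs|
        avg = (i + j + 1) / 2.0         # mean of the ranks i+1 .. j
        for k in ranked[i:j]:
            ranks[k] = avg
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)         # the test statistic W

# rounded accuracies of two classifiers on the six data streams
print(wilcoxon_statistic([92.8, 79.0, 47.8, 77.1, 86.3, 70.9],
                         [88.2, 83.4, 67.6, 81.1, 86.0, 71.9]))  # 6.0
```

A small statistic means the differences are heavily one-sided; the significance values in Table 7 come from comparing such statistics against the null distribution.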

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. Classification performance of each of the compared algorithms was calculated using the prequential error approach. These results will be used to compare the performance of our proposal and other data stream classification algorithms. We have aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on the implementation of these algorithms can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset the experiment was executed just one time. The Bank and CoverType datasets

were reordered 10 times using the randomized filter from the WEKA platform [49] for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each of them, using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification, to evaluate the efficiency of each algorithm. All experiments were conducted using a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system, with 4096 GB of RAM.

3. Results and Discussion

In this section we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance for each of the compared classification algorithms. First we evaluate the sensitivity of the proposed method to changes in the window size, which is a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized with boldface. Table 4 shows the average total time used to classify each data stream. The results of the average performance, with an indicator of the standard deviation, are also depicted in Figure 1.

In general, the DS-Gamma classifier performance improved as the window size increased, with the exception of the Electricity data stream; for this data stream, performance is better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with a performance of 89.12%. This performance is better than all the performances achieved by all the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance keeps stable and in some cases even degrades, as we can observe in the results from the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination R² were included in the graphic for each data stream. The R² values, ranging between 0.9743 and 0.9866, show that time grows linearly as the size of the window increases.

To evaluate the DS-Gamma classifier, we compared its performance with that of other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation; the best performance is emphasized with boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and


Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset       50      100     200     500     750     1000    1250    1500    1750    2000
Agrawal       79.64   85.97   89.86   92.59   92.88   92.82   92.67   92.51   92.32   92.07
Bank          68.19   73.40   77.05   79.13   79.12   78.96   78.60   78.24   77.76   77.37
CoverType     38.21   37.61   39.70   44.30   46.59   47.77   48.62   49.19   49.51   49.74
Electricity   89.12   85.26   81.71   79.84   78.58   77.14   76.36   75.94   75.31   74.49
Hyperplane    78.25   82.65   85.28   86.56   86.55   86.30   85.98   85.65   85.34   85.00
RandomRBF     64.61   67.39   69.33   70.56   70.81   70.85   70.74   70.63   70.50   70.33

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

Dataset       50      100     200     500     750     1000    1250    1500    1750    2000
Agrawal       2.24    4.50    7.93    19.26   28.04   39.00   45.92   57.57   63.90   73.49
Bank          0.66    1.14    2.07    4.75    7.01    9.17    11.54   14.55   15.83   17.87
CoverType     83.47   140.81  251.20  340.05  254.12  335.03  428.94  522.39  669.24  676.99
Electricity   0.34    0.53    0.98    2.33    3.56    4.63    5.99    7.05    8.30    9.49
Hyperplane    2.24    4.01    7.56    18.19   26.76   30.28   44.28   52.96   61.87   70.63
RandomRBF     2.79    4.09    7.73    18.54   27.35   30.72   45.02   54.12   62.23   71.04

Table 5: Averaged accuracy (%) comparison; each cell gives Acc (σ).

Classifier        Agrawal        Bank           CoverType      Electricity   Hyperplane     RandomRBF
Bayes-DDM         88.16 (0.19)   83.38 (1.41)   67.61 (6.81)   81.05 (—)     86.02 (1.39)   71.87 (3.82)
Perceptron DDM    67.20 (0.11)   73.43 (10.53)  70.76 (5.55)   78.61 (—)     85.65 (1.49)   71.88 (5.93)
Hoeffding Tree    94.39 (0.24)   88.52 (0.11)   72.29 (2.68)   82.41 (—)     82.41 (5.62)   86.72 (1.69)
Active Classifier 90.82 (1.62)   88.21 (0.14)   67.31 (1.03)   78.59 (—)     79.50 (6.55)   75.32 (2.70)
IBL-DS            86.76 (0.48)   86.52 (0.21)   61.73 (9.46)   75.34 (—)     84.20 (0.77)   93.32 (1.21)
AWE               92.17 (0.59)   87.90 (0.38)   76.16 (3.66)   78.36 (—)     85.88 (1.65)   91.98 (1.14)
OzaBoostAdwin     87.11 (2.62)   87.67 (0.50)   78.55 (4.97)   87.92 (—)     83.21 (1.91)   91.49 (1.14)
SGD               67.20 (0.11)   76.56 (1.66)   59.01 (7.37)   84.52 (—)     83.19 (0.42)   61.74 (3.31)
SPegasos          55.82 (0.17)   77.46 (1.99)   50.47 (5.17)   57.60 (—)     52.80 (6.08)   53.33 (3.13)
DS-Gamma          92.82 (0.10)   78.96 (0.22)   47.77 (4.31)   77.14 (—)     86.30 (1.35)   70.85 (2.54)

Table 6: Averaged classification time (seconds) comparison; each cell gives Time (σ).

Classifier        Agrawal         Bank           CoverType           Electricity   Hyperplane      RandomRBF
Bayes-DDM         0.62 (0.11)     0.39 (0.08)    17.76 (1.62)        0.61 (—)      0.70 (0.01)     0.69 (0.01)
Perceptron DDM    0.40 (0.02)     0.20 (0.05)    8.28 (0.10)         0.19 (—)      0.45 (0.02)     0.44 (0.02)
Hoeffding Tree    0.98 (0.23)     0.54 (0.03)    30.05 (11.06)       0.78 (—)      1.36 (0.05)     1.40 (0.05)
Active Classifier 0.64 (0.03)     0.26 (0.02)    15.14 (0.74)        0.23 (—)      0.72 (0.04)     0.70 (0.02)
IBL-DS            53.19 (2.11)    38.51 (8.26)   1057.60 (60.95)     8.61 (—)      86.39 (0.96)    88.10 (1.06)
AWE               14.32 (0.20)    6.63 (0.30)    1012.47 (304.41)    6.70 (—)      18.03 (3.64)    37.52 (3.03)
OzaBoostAdwin     192.62 (44.47)  46.93 (10.56)  10589.73 (3896.64)  39.75 (—)     223.99 (51.34)  57.04 (16.66)
SGD               0.14 (0.04)     0.11 (0.009)   40.35 (0.07)        0.12 (—)      0.27 (0.02)     0.27 (0.02)
SPegasos          0.22 (0.006)    0.11 (0.016)   16.84 (0.12)        0.12 (—)      0.31 (0.05)     0.24 (0.01)
DS-Gamma          39.00 (3.58)    9.16 (0.05)    335.03 (12.91)      4.63 (—)      30.28 (0.79)    30.72 (0.94)



Hyperplane data stream

Window size

Window size

CoverType data stream Electricity data stream

RandomRBF data stream

Agrawal data stream Bank data stream80

70

60

50

40

30

20

20

15

10

5

010

050 100 200 500 750 1000 1250 1500 1750 2000

50 100 200 500 750 1000 1250 1500 1750 2000

Window size50 100 200 500 750 1000 1250 1500 1750 2000 Window size

50 100 200 500 750 1000 1250 1500 1750 2000

Window size50 100 200 500 750 1000 1250 1500 1750 2000

Window size50 100 200 500 750 1000 1250 1500 1750 2000

minus10

70

60

50

40

30

20

10

0

550

450

350

250

150

50

10

8

6

4

2

0

minus10

minus50

70

60

50

40

30

20

10

0

minus10

minus2

minus5

R

2

= 09866

R

2

= 09854

R

2

= 09804

R

2

= 09796

R

2

= 09743

R

2

= 09842

Figure 2 Averaged classification time (seconds) of DS-Gamma for different windows size

we can infer that there are no significant differences amongthe compared algorithms

4 Conclusions

In this paper we describe a novel method for the task ofpattern classification over a continuous data stream based onan associative model The proposed method is a supervisedpattern recognition model based on the Gamma classifierDuring the experimental phase the proposed method wastested using different data stream scenarios with real andsynthetic data streams The proposed method presented verypromising results in terms of performance and classification

Table 7 Wilcoxon results

Compared algorithms Asymptotic significance (2-sided)Gamma-Bayes-DDM 0345Gamma-Perceptron DDM 0917Gamma-Hoeffding Tree 0116Gamma-Active Classifier 0463Gamma-IBL-DS 0345Gamma-AWE 0116Gamma-OzaBoostAdwin 0116Gamma-SGD 0600Gamma-SPegasos 0075

14 Mathematical Problems in Engineering

Agrawal data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

95

90

85

80

75

70

65

60

55

50

Bank data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

90

80

70

60

50

40

Hyperplane data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

95

90

85

80

75

70

65

60

55

45

50

RandomRBF data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

95

90

85

80

75

70

65

60

55

45

50

CoverType data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

90

85

80

75

70

65

60

55

45

50

Electricity data stream100

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

90

80

70

60

50

40

30

20

10

0

Figure 3 Averaged accuracy () comparison

time when compared with some of the state-of-the-art algo-rithms for data stream classification

We also study the effect of the window size over the clas-sification accuracy and time In general accuracy improvedas the window size increased We observed that best perfor-mances were obtained with window sizes between 500 and

1000 For larger window sizes performance remained stableand in some cases even declined

Since the proposed method based its classification ona weighted sum per class it is strongly affected by severeclass imbalance as we can observe from the results obtainedwith the experiments performed with the CoverType dataset

Mathematical Problems in Engineering 15

Agrawal data stream

200

150

100

50

0

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

1

10

100

Hyperplane data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

1

10

100

1000

10000

100000CoverType data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

0102030405060708090

RandomRBF data stream

0

10

20

30

40

50

60Bank data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

05

10152025303540

Electricity data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

Figure 4 Averaged classification time (seconds) comparison

a heavily imbalance dataset where the accuracy of theclassifier was the worst of the compared algorithms As futurework a mechanism to address severe class imbalance shouldbe integrated to improve classification accuracy

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors would like to thank the Instituto PolitecnicoNacional (Secretarıa Academica COFAA SIP CIC andCIDETEC) the CONACyT and SNI for their economicalsupport to develop this workThe support of theUniversity ofPorto was provided during the research stay from August toDecember 2014 The authors also want to thank Dr YennyVilluendas Rey for her contribution to perform the statisticalsignificance analysis

16 Mathematical Problems in Engineering

References

[1] V Turner J FGantzD Reinsel and SMinton ldquoThedigital uni-verse of opportunities rich data and the increasing value of theinternet of thingsrdquo Tech Rep IDC Information andData 2014httpwwwemccomleadershipdigital-universe2014iviewindexhtm

[2] A Bifet Adaptive Stream Mining Pattern Learning and Miningfrom Evolving Data Streams IOS Press Amsterdam The Neth-erlands 2010

[3] J Gama Knowledge Discovery from Data Streams Chapman ampHallCRC 2010

[4] J Beringer and E Hullermeier ldquoEfficient instance-based learn-ing on data streamsrdquo Intelligent Data Analysis vol 11 no 6 pp627ndash650 2007

[5] J Gama I Zliobaite A Bifet M Pechenizkiy and A Bou-chachia ldquoA survey on concept drift adaptationrdquo ACM Comput-ing Surveys vol 46 no 4 article 44 2014

[6] P Domingos and G Hulten ldquoMining high-speed data streamsrdquoinProceedings of the 6thACMSIGKDD International Conferenceon Knowledge Discovery and Data Mining pp 71ndash80 BostonMass USA August 2000

[7] G Hulten L Spencer and P Domingos ldquoMining time-chang-ing data streamsrdquo in Proceedings of the 7th ACM SIGKDD Inter-national Conference on Knowledge Discovery and Data Mining(KDD rsquo01) pp 97ndash106 San Francisco Calif USA August 2001

[8] C Liang Y Zhang P Shi and Z Hu ldquoLearning very fast deci-sion tree from uncertain data streams with positive and unla-beled samplesrdquo Information Sciences vol 213 pp 50ndash67 2012

[9] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoDeci-sion trees for mining data streams based on the gaussianapproximationrdquo IEEE Transactions on Knowledge and DataEngineering vol 26 no 1 pp 108ndash119 2014

[10] G Widmer andM Kubat ldquoLearning in the presence of conceptdrift and hidden contextsrdquo Machine Learning vol 23 no 1 pp69ndash101 1996

[11] F Ferrer-Troyano J S Aguilar-Ruiz and J C RiquelmeldquoData streams classification by incremental rule learning withparameterized generalizationrdquo in Proceedings of the 2006 ACMSymposium on Applied Computing pp 657ndash661 Dijon FranceApril 2006

[12] D Mena-Torres and J S Aguilar-Ruiz ldquoA similarity-basedapproach for data stream classificationrdquo Expert Systems withApplications vol 41 no 9 pp 4224ndash4234 2014

[13] A Shaker and E Hullermeier ldquoIBLStreams a system forinstance-based classification and regression on data streamsrdquoEvolving Systems vol 3 no 4 pp 235ndash249 2012

[14] A Ghazikhani R Monsefi and H S Yazdi ldquoRecursive leastsquare perceptron model for non-stationary and imbalanceddata stream classificationrdquo Evolving Systems vol 4 no 2 pp119ndash131 2013

[15] A Ghazikhani R Monsefi and H S Yazdi ldquoEnsemble ofonline neural networks for non-stationary and imbalanced datastreamsrdquo Neurocomputing vol 122 pp 535ndash544 2013

[16] A Sancho-Asensio A Orriols-Puig and E GolobardesldquoRobust on-line neural learning classifier system for data streamclassification tasksrdquo Soft Computing vol 18 no 8 pp 1441ndash14612014

[17] D M Farid L Zhang A Hossain et al ldquoAn adaptive ensembleclassifier for mining concept drifting data streamsrdquo ExpertSystems with Applications vol 40 no 15 pp 5895ndash5906 2013

[18] D Brzezinski and J Stefanowski ldquoCombining block-based andonline methods in learning ensembles from concept driftingdata streamsrdquo Information Sciences vol 265 pp 50ndash67 2014

[19] G Ditzler and R Polikar ldquoIncremental learning of conceptdrift from streaming imbalanced datardquo IEEE Transactions onKnowledge and Data Engineering vol 25 no 10 pp 2283ndash23012013

[20] M J Hosseini Z Ahmadi andH Beigy ldquoUsing a classifier poolin accuracy based tracking of recurring concepts in data streamclassificationrdquo Evolving Systems vol 4 no 1 pp 43ndash60 2013

[21] N Saunier and S Midenet ldquoCreating ensemble classifiersthrough order and incremental data selection in a streamrdquo Pat-tern Analysis and Applications vol 16 no 3 pp 333ndash347 2013

[22] MMMasud J Gao L Khan J Han and BMThuraisinghamldquoClassification and novel class detection in concept-driftingdata streams under time constraintsrdquo IEEE Transactions onKnowledge and Data Engineering vol 23 no 6 pp 859ndash8742011

[23] A Alzghoul M Lofstrand and B Backe ldquoData stream fore-casting for system fault predictionrdquo Computers and IndustrialEngineering vol 62 no 4 pp 972ndash978 2012

[24] J Kranjc J Smailovic V Podpecan M Grcar M Znidarsicand N Lavrac ldquoActive learning for sentiment analysis on datastreams methodology and workflow implementation in theClowdFlows platformrdquo Information Processing and Manage-ment vol 51 no 2 pp 187ndash203 2015

[25] J Smailovic M Grcar N Lavrac and M Znidarsic ldquoStream-based active learning for sentiment analysis in the financialdomainrdquo Information Sciences vol 285 pp 181ndash203 2014

[26] S Shalev-Shwartz Y Singer N Srebro and A Cotter ldquoPegasosprimal estimated sub-gradient solver for SVMrdquo in Proceedingsof the 24th International Conference on Machine Learning pp807ndash814 June 2007

[27] Z Miller B Dickinson W Deitrick W Hu and A H WangldquoTwitter spammer detection using data stream clusteringrdquoInformation Sciences vol 260 pp 64ndash73 2014

[28] B Krawczyk and M Wozniak ldquoIncremental weighted one-class classifier for mining stationary data streamsrdquo Journal ofComputational Science vol 9 pp 19ndash25 2015

[29] J Gama ldquoA survey on learning from data streams current andfuture trendsrdquo Progress in Artificial Intelligence vol 1 no 1 pp45ndash55 2012

[30] L I Kuncheva ldquoChange detection in streaming multivariatedata using likelihood detectorsrdquo IEEE Transactions on Knowl-edge and Data Engineering vol 25 no 5 pp 1175ndash1180 2013

[31] G J Ross N M Adams D K Tasoulis and D J Hand ldquoExpo-nentially weighted moving average charts for detecting conceptdriftrdquoPatternRecognition Letters vol 33 no 2 pp 191ndash198 2012

[32] J Gama P Medas G Castillo and P Rodrigues ldquoLearning withdrift detectionrdquo in Advances in Artificial IntelligencemdashSBIA2004 vol 3171 of Lecture Notes in Computer Science pp 286ndash295 Springer Berlin Germany 2004

[33] X Wu P Li and X Hu ldquoLearning from concept drifting datastreams with unlabeled datardquoNeurocomputing vol 92 pp 145ndash155 2012

[34] X Qin Y Zhang C Li and X Li ldquoLearning from data streamswith only positive and unlabeled datardquo Journal of IntelligentInformation Systems vol 40 no 3 pp 405ndash430 2013

Mathematical Problems in Engineering 17

[35] M M Masud C Woolam J Gao et al ldquoFacing the reality ofdata stream classification coping with scarcity of labeled datardquoKnowledge and Information Systems vol 33 no 1 pp 213ndash2442012

[36] J Read A Bifet G Holmes and B Pfahringer ldquoScalable andefficient multi-label classification for evolving data streamsrdquoMachine Learning vol 88 no 1-2 pp 243ndash272 2012

[37] I Lopez-Yanez A J Arguelles-Cruz O Camacho-Nieto andC Yanez-Marquez ldquoPollutants time-series prediction usingthe gamma classifierrdquo International Journal of ComputationalIntelligence Systems vol 4 no 4 pp 680ndash711 2011

[38] M E Acevedo-Mosqueda C Yanez-Marquez and I Lopez-Yanez ldquoAlpha-beta bidirectional associative memories theoryand applicationsrdquo Neural Processing Letters vol 26 no 1 pp 1ndash40 2007

[39] C Yanez-Marquez E M Felipe-Riveron I Lopez-Yanez andR Flores-Carapia ldquoA novel approach to automatic color match-ingrdquo in Progress in Pattern Recognition Image Analysis andApplications vol 4225 of Lecture Notes in Computer Science pp529ndash538 Springer Berlin Germany 2006

[40] I Lopez-Yanez L Sheremetov and C Yanez-Marquez ldquoAnovel associative model for time series data miningrdquo PatternRecognition Letters vol 41 no 1 pp 23ndash33 2014

[41] A Bifet G Holmes R Kirkby and B Pfahringer ldquoData streammining a practical approachrdquo Tech RepUniversity ofWaikato2011

[42] Massive Online Analysis (MOA) 2014 httpmoacmswaikatoacnzdatasets

[43] K Bache and M Lichman UCI Machine Learning RepositorySchool of Information and Computer Science University ofCalifornia Irvine Calif USA 2013 httparchiveicsucieduml

[44] A Bifet R Kirkby P Kranen and P Reutemann ldquoMassiveOnline Analysis Manualrdquo 2012 httpmoacmswaikatoacnzdocumentation

[45] M Harries ldquoSplice-2 comparative evaluation electricity pric-ingrdquo Tech Rep University of South Wales Cardiff UK 1999

[46] R Agrawal T Imielinski and A Swami ldquoDatabase mining aperformance perspectiverdquo IEEE Transactions on Knowledge andData Engineering vol 5 no 6 pp 914ndash925 1993

[47] J Gama R Sebastiao and P P Rodrigues ldquoOn evaluatingstream learning algorithmsrdquo Machine Learning vol 90 no 3pp 317ndash346 2013

[48] F Wilcoxon ldquoIndividual comparisons by ranking methodsrdquoBiometrics Bulletin vol 1 no 6 pp 80ndash83 1945

[49] M Hall E Frank G Holmes B Pfahringer P Reutemannand I H Witten WEKA 3 Data Mining Software in Java 2010httpwwwcswaikatoacnzmlweka


10 Mathematical Problems in Engineering

2.6.2. Interleaved Test-Then-Train or Prequential. Each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. When intentionally performed in this order, the model is always being tested on examples it has not seen.

Holdout evaluation gives a more accurate estimation of the performance of the classifier on more recent data. However, it requires recent test data that is sometimes difficult to obtain, while the prequential evaluation has the advantage that no holdout set is needed for testing, making maximum use of the available data. In [47], Gama et al. present a general framework to assess data stream algorithms. They studied different mechanisms of evaluation that include the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors. They stated that the use of the prequential error with forgetting mechanisms reveals to be advantageous in assessing performance and in comparing stream learning algorithms. Since our proposed method works with a sliding window approach, we have selected the prequential error over a sliding window for the evaluation of the proposed method. The prequential error $P_W$ for a sliding window of size $W$, computed at time $i$, is based on an accumulated sum of a 0-1 loss function $L$ between the predicted values $\hat{y}_k$ and the observed values $y_k$, using the following expression [47]:

$$P_W(i) = \frac{1}{W} \sum_{k=i-W+1}^{i} L(\hat{y}_k, y_k). \quad (34)$$
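The evaluation loop implied by (34) is easy to sketch. The following is an illustrative sketch, not the authors' implementation; `predict` and `train` are hypothetical hooks standing in for any incremental classifier:

```python
from collections import deque

def prequential_error(stream, predict, train, window=1000):
    """Interleaved test-then-train: every example is first used to test the
    current model and then to train it; the 0-1 losses of the most recent
    `window` examples give the prequential error P_W(i) of equation (34)."""
    losses = deque(maxlen=window)   # sliding window of 0-1 losses
    errors = []
    for x, y in stream:
        y_hat = predict(x)                      # test first ...
        losses.append(0 if y_hat == y else 1)
        train(x, y)                             # ... then train
        errors.append(sum(losses) / len(losses))
    return errors

# Hypothetical toy learner: always predict the last label seen.
state = {"last": 0}
stream = [(None, 0), (None, 0), (None, 1), (None, 1)]
errs = prequential_error(stream, lambda x: state["last"],
                         lambda x, y: state.update(last=y), window=2)
print(errs)  # → [0.0, 0.0, 0.5, 0.5]
```

Because the window forgets old losses, the estimate tracks the model's behavior on recent data, which is the forgetting mechanism [47] argues for.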

We also evaluate the performance results of each competing algorithm against the Gamma classifier using a statistical test. We use the Wilcoxon test [48] to assess the statistical significance of the results. The Wilcoxon signed-ranks test is a nonparametric test that ranks the differences in performance of two classifiers on each dataset, ignoring the signs, and compares the ranks of the positive and the negative differences.
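As a sketch of the test itself (assuming the asymptotic normal approximation with no zero differences, no ties, and no continuity correction, which is what statistics packages typically report as "asymptotic significance"), the Gamma versus SPegasos comparison can be reproduced from the Table 5 accuracies:

```python
import math

def wilcoxon_signed_rank_p(a, b):
    """Two-sided asymptotic p-value of the Wilcoxon signed-ranks test
    (normal approximation; assumes no zero differences and no ties)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ranked = sorted(abs(d) for d in diffs)
    rank = {v: i + 1 for i, v in enumerate(ranked)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)
    n = len(diffs)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Averaged accuracies from Table 5
# (Agrawal, Bank, CoverType, Electricity, Hyperplane, RandomRBF):
gamma    = [92.82, 78.96, 47.77, 77.14, 86.30, 70.85]
spegasos = [55.82, 77.46, 50.47, 57.60, 52.80, 53.33]
print(round(wilcoxon_signed_rank_p(gamma, spegasos), 3))  # → 0.075
```

The resulting 0.075 matches the Gamma-SPegasos entry of Table 7.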

2.7. Experimental Setup. To ensure a valid comparison of classification performance, the same conditions and validation schemes were applied in each experiment. Classification performance of each of the compared algorithms was calculated using the prequential error approach. These results are used to compare the performance of our proposal with that of other data stream classification algorithms. We aimed at choosing representative algorithms from different approaches covering the state of the art in data stream classification, such as Hoeffding Tree, IBL, Naïve Bayes with DDM (Drift Detection Method), SVM, and ensemble methods. All of these algorithms are implemented in the MOA framework; further details on their implementation can be found in [44]. For all data streams, with the exception of the Electricity dataset, the experiments were executed 10 times, and the averages of the performance and execution time results are presented. Records in the Electricity dataset have a temporal relation that should not be altered; for this reason, with this specific dataset the experiment was executed just once. The Bank and CoverType datasets were reordered 10 times, using the randomized filter from the WEKA platform [49], for each experiment. For the synthetic data streams (Agrawal, Rotating Hyperplane, and RandomRBF), 10 data streams were generated for each of them, using different random seeds in MOA. Performance was calculated using the prequential error introduced in the previous section. We also measured the total time used for classification to evaluate the efficiency of each algorithm. All experiments were conducted on a personal computer with an Intel Core i3-2100 processor, running the Ubuntu 13.04 64-bit operating system with 4.096 GB of RAM.
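The Acc and σ entries reported in the result tables are plain averages and standard deviations over these repeated runs; a minimal sketch with invented per-run accuracies (the ten values below are hypothetical, not measured):

```python
from statistics import mean, stdev

# Hypothetical per-run accuracies (%) of one algorithm on one stream;
# each run uses a different dataset reordering / generator seed.
runs = [92.7, 92.9, 92.8, 93.0, 92.6, 92.9, 92.8, 92.7, 93.1, 92.7]
acc, sigma = mean(runs), stdev(runs)
print(f"{acc:.2f} +/- {sigma:.2f}")  # → 92.82 +/- 0.15
```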

3. Results and Discussion

In this section we present and discuss the results obtained during the experimental phase, throughout which six data streams, three real and three synthetic, were used to obtain the classification and time performance of each of the compared classification algorithms. First, we evaluate the sensitivity of the proposed method to changes in the window size, a parameter that can largely influence the performance of the classifier. Table 3 shows the average accuracy of the DS-Gamma classifier for different window sizes; the best accuracy for each data stream is emphasized with boldface. Table 4 shows the average total time used to classify each data stream. The results of the average performance, with an indicator of the standard deviation, are also depicted in Figure 1.

In general, the performance of the DS-Gamma classifier improved as the window size increased, with the exception of the Electricity data stream, for which performance is better with smaller window sizes. It is worth noting that the DS-Gamma classifier achieved its best performance for the Electricity data stream with a window size of 50, with a performance of 89.12%. This performance is better than all the performances achieved by the other compared algorithms with a window size of 1000. For the other data streams, the best performance was generally obtained with window sizes between 500 and 1000. With window sizes greater than 1000, performance keeps stable and in some cases even degrades, as we can observe in the results for the Bank and Hyperplane data streams in Figure 1. Figure 2 shows the average classification time for different window sizes. A linear trend line (dotted line) and the coefficient of determination R² were included in the graphic for each data stream. The R² values, ranging between 0.9743 and 0.9866, show that time grows linearly as the size of the window increases.
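The linear trend fit behind those R² values is an ordinary least-squares fit of time against window size. The sketch below applies it to the Agrawal row of Table 4; since the text does not say which R² belongs to which panel, no exact match to a figure value is asserted, only the near-linear behavior:

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - slope * x - intercept) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

windows      = [50, 100, 200, 500, 750, 1000, 1250, 1500, 1750, 2000]
agrawal_time = [2.24, 4.50, 7.93, 19.26, 28.04, 39.00, 45.92, 57.57, 63.90, 73.49]
r2 = r_squared(windows, agrawal_time)
print(round(r2, 4))  # close to 1: time grows essentially linearly with window size
```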

To evaluate the DS-Gamma classifier, we compared its performance with that of other data stream classifiers. Table 5 presents the performance results of each evaluated algorithm on each data stream, including the standard deviation; the best performance is emphasized with boldface. We also include box plots with the same results in Figure 3. For the Agrawal data stream, Hoeffding Tree and the DS-Gamma classifier achieved the two best performances. OzaBoostAdwin presents the best performance for the CoverType and Electricity data streams, but this ensemble classifier exhibits the highest classification time for all the data streams, as we can observe in Table 6 and Figure 4.


Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset     |    50 |   100 |   200 |   500 |   750 |  1000 |  1250 |  1500 |  1750 |  2000
Agrawal     | 79.64 | 85.97 | 89.86 | 92.59 | 92.88 | 92.82 | 92.67 | 92.51 | 92.32 | 92.07
Bank        | 68.19 | 73.40 | 77.05 | 79.13 | 79.12 | 78.96 | 78.60 | 78.24 | 77.76 | 77.37
CoverType   | 38.21 | 37.61 | 39.70 | 44.30 | 46.59 | 47.77 | 48.62 | 49.19 | 49.51 | 49.74
Electricity | 89.12 | 85.26 | 81.71 | 79.84 | 78.58 | 77.14 | 76.36 | 75.94 | 75.31 | 74.49
Hyperplane  | 78.25 | 82.65 | 85.28 | 86.56 | 86.55 | 86.30 | 85.98 | 85.65 | 85.34 | 85.00
RandomRBF   | 64.61 | 67.39 | 69.33 | 70.56 | 70.81 | 70.85 | 70.74 | 70.63 | 70.50 | 70.33

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

Dataset     |    50 |    100 |    200 |    500 |    750 |   1000 |   1250 |   1500 |   1750 |   2000
Agrawal     |  2.24 |   4.50 |   7.93 |  19.26 |  28.04 |  39.00 |  45.92 |  57.57 |  63.90 |  73.49
Bank        |  0.66 |   1.14 |   2.07 |   4.75 |   7.01 |   9.17 |  11.54 |  14.55 |  15.83 |  17.87
CoverType   | 83.47 | 140.81 | 251.20 | 340.05 | 254.12 | 335.03 | 428.94 | 522.39 | 669.24 | 676.99
Electricity |  0.34 |   0.53 |   0.98 |   2.33 |   3.56 |   4.63 |   5.99 |   7.05 |   8.30 |   9.49
Hyperplane  |  2.24 |   4.01 |   7.56 |  18.19 |  26.76 |  30.28 |  44.28 |  52.96 |  61.87 |  70.63
RandomRBF   |  2.79 |   4.09 |   7.73 |  18.54 |  27.35 |  30.72 |  45.02 |  54.12 |  62.23 |  71.04

Table 5: Averaged accuracy (%) comparison. For each data stream, the averaged accuracy is given with the standard deviation σ in parentheses. (Electricity was run only once, so no standard deviation is reported.)

Classifier        | Agrawal       | Bank          | CoverType    | Electricity | Hyperplane   | RandomRBF
Bayes-DDM         | 88.16 (0.19)  | 83.38 (1.41)  | 67.61 (6.81) | 81.05       | 86.02 (1.39) | 71.87 (3.82)
Perceptron DDM    | 67.20 (0.11)  | 73.43 (10.53) | 70.76 (5.55) | 78.61       | 85.65 (1.49) | 71.88 (5.93)
Hoeffding Tree    | 94.39 (0.24)  | 88.52 (0.11)  | 72.29 (2.68) | 82.41       | 82.41 (5.62) | 86.72 (1.69)
Active Classifier | 90.82 (1.62)  | 88.21 (0.14)  | 67.31 (1.03) | 78.59       | 79.50 (6.55) | 75.32 (2.70)
IBL-DS            | 86.76 (0.48)  | 86.52 (0.21)  | 61.73 (9.46) | 75.34       | 84.20 (0.77) | 93.32 (1.21)
AWE               | 92.17 (0.59)  | 87.90 (0.38)  | 76.16 (3.66) | 78.36       | 85.88 (1.65) | 91.98 (1.14)
OzaBoostAdwin     | 87.11 (2.62)  | 87.67 (0.50)  | 78.55 (4.97) | 87.92       | 83.21 (1.91) | 91.49 (1.14)
SGD               | 67.20 (0.11)  | 76.56 (1.66)  | 59.01 (7.37) | 84.52       | 83.19 (0.42) | 61.74 (3.31)
SPegasos          | 55.82 (0.17)  | 77.46 (1.99)  | 50.47 (5.17) | 57.60       | 52.80 (6.08) | 53.33 (3.13)
DS-Gamma          | 92.82 (0.10)  | 78.96 (0.22)  | 47.77 (4.31) | 77.14       | 86.30 (1.35) | 70.85 (2.54)

Table 6: Averaged classification time (seconds) comparison. For each data stream, the averaged time is given with the standard deviation σ in parentheses. (Electricity was run only once, so no standard deviation is reported.)

Classifier        | Agrawal        | Bank          | CoverType          | Electricity | Hyperplane     | RandomRBF
Bayes-DDM         |   0.62 (0.11)  |  0.39 (0.08)  |    17.76 (1.62)    |  0.61       |   0.70 (0.01)  |  0.69 (0.01)
Perceptron DDM    |   0.40 (0.02)  |  0.20 (0.05)  |     8.28 (0.10)    |  0.19       |   0.45 (0.02)  |  0.44 (0.02)
Hoeffding Tree    |   0.98 (0.23)  |  0.54 (0.03)  |    30.05 (11.06)   |  0.78       |   1.36 (0.05)  |  1.40 (0.05)
Active Classifier |   0.64 (0.03)  |  0.26 (0.02)  |    15.14 (0.74)    |  0.23       |   0.72 (0.04)  |  0.70 (0.02)
IBL-DS            |  53.19 (2.11)  | 38.51 (8.26)  |  1057.60 (60.95)   |  8.61       |  86.39 (0.96)  | 88.10 (1.06)
AWE               |  14.32 (0.20)  |  6.63 (0.30)  |  1012.47 (304.41)  |  6.70       |  18.03 (3.64)  | 37.52 (3.03)
OzaBoostAdwin     | 192.62 (44.47) | 46.93 (10.56) | 10589.73 (3896.64) | 39.75       | 223.99 (51.34) | 57.04 (16.66)
SGD               |   0.14 (0.04)  |  0.11 (0.009) |    40.35 (0.07)    |  0.12       |   0.27 (0.02)  |  0.27 (0.02)
SPegasos          |   0.22 (0.006) |  0.11 (0.016) |    16.84 (0.12)    |  0.12       |   0.31 (0.05)  |  0.24 (0.01)
DS-Gamma          |  39.00 (3.58)  |  9.16 (0.05)  |   335.03 (12.91)   |  4.63       |  30.28 (0.79)  | 30.72 (0.94)


[Figure 1: Average accuracy of DS-Gamma for different window sizes (50 to 2000), one panel per data stream: Agrawal, Hyperplane, Bank, Electricity, RandomRBF, and CoverType. The plotted data are not recoverable from the text extraction; see Table 3 for the underlying values.]

This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with the Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier for the CoverType data stream. This data stream presents a heavy class imbalance that seems to be negatively affecting the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased toward the class with the most members.
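A toy illustration of this bias (a deliberately simplified vote, not the actual Gamma similarity computation): with a plain per-class similarity sum, many weak matches from a large class can outweigh a few strong matches from a small one:

```python
def weighted_sum_vote(similarities_by_class):
    """Assign the class whose stored patterns have the largest summed
    similarity to the query; a simplified stand-in for a per-class
    weighted-sum decision rule."""
    return max(similarities_by_class,
               key=lambda c: sum(similarities_by_class[c]))

# Minority class: two stored patterns, both very similar to the query.
# Majority class: fifty stored patterns, each only weakly similar.
sims = {"minority": [0.9, 0.8], "majority": [0.2] * 50}
print(weighted_sum_vote(sims))  # → majority (sum 10.0 beats 1.7)
```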

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using α = 0.05. The obtained values are all greater than α, so the null hypothesis is not rejected.


[Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes (50 to 2000), one panel per data stream (Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity), each with a dotted linear trend line; the coefficients of determination of the panels are R² = 0.9866, 0.9854, 0.9804, 0.9796, 0.9743, and 0.9842. See Table 4 for the underlying values.]

We can infer, therefore, that there are no significant differences among the compared algorithms.

4. Conclusions

In this paper we describe a novel method for the task of pattern classification over a continuous data stream, based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase the proposed method was tested using different data stream scenarios, with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms     | Asymptotic significance (2-sided)
Gamma-Bayes-DDM         | 0.345
Gamma-Perceptron DDM    | 0.917
Gamma-Hoeffding Tree    | 0.116
Gamma-Active Classifier | 0.463
Gamma-IBL-DS            | 0.345
Gamma-AWE               | 0.116
Gamma-OzaBoostAdwin     | 0.116
Gamma-SGD               | 0.600
Gamma-SPegasos          | 0.075


[Figure 3: Averaged accuracy (%) comparison of Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma, one box-plot panel per data stream (Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity). See Table 5 for the averaged values.]

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on the classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000; for larger window sizes, performance remained stable and in some cases even declined.

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained in the experiments performed with the CoverType dataset,


[Figure 4: Averaged classification time (seconds) comparison of the same ten algorithms, one panel per data stream. See Table 6 for the averaged values.]

a heavily imbalance dataset where the accuracy of theclassifier was the worst of the compared algorithms As futurework a mechanism to address severe class imbalance shouldbe integrated to improve classification accuracy

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors would like to thank the Instituto PolitecnicoNacional (Secretarıa Academica COFAA SIP CIC andCIDETEC) the CONACyT and SNI for their economicalsupport to develop this workThe support of theUniversity ofPorto was provided during the research stay from August toDecember 2014 The authors also want to thank Dr YennyVilluendas Rey for her contribution to perform the statisticalsignificance analysis

16 Mathematical Problems in Engineering

References

[1] V Turner J FGantzD Reinsel and SMinton ldquoThedigital uni-verse of opportunities rich data and the increasing value of theinternet of thingsrdquo Tech Rep IDC Information andData 2014httpwwwemccomleadershipdigital-universe2014iviewindexhtm

[2] A Bifet Adaptive Stream Mining Pattern Learning and Miningfrom Evolving Data Streams IOS Press Amsterdam The Neth-erlands 2010

[3] J Gama Knowledge Discovery from Data Streams Chapman ampHallCRC 2010

[4] J Beringer and E Hullermeier ldquoEfficient instance-based learn-ing on data streamsrdquo Intelligent Data Analysis vol 11 no 6 pp627ndash650 2007

[5] J Gama I Zliobaite A Bifet M Pechenizkiy and A Bou-chachia ldquoA survey on concept drift adaptationrdquo ACM Comput-ing Surveys vol 46 no 4 article 44 2014

[6] P Domingos and G Hulten ldquoMining high-speed data streamsrdquoinProceedings of the 6thACMSIGKDD International Conferenceon Knowledge Discovery and Data Mining pp 71ndash80 BostonMass USA August 2000

[7] G Hulten L Spencer and P Domingos ldquoMining time-chang-ing data streamsrdquo in Proceedings of the 7th ACM SIGKDD Inter-national Conference on Knowledge Discovery and Data Mining(KDD rsquo01) pp 97ndash106 San Francisco Calif USA August 2001

[8] C Liang Y Zhang P Shi and Z Hu ldquoLearning very fast deci-sion tree from uncertain data streams with positive and unla-beled samplesrdquo Information Sciences vol 213 pp 50ndash67 2012

[9] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoDeci-sion trees for mining data streams based on the gaussianapproximationrdquo IEEE Transactions on Knowledge and DataEngineering vol 26 no 1 pp 108ndash119 2014

[10] G Widmer andM Kubat ldquoLearning in the presence of conceptdrift and hidden contextsrdquo Machine Learning vol 23 no 1 pp69ndash101 1996

[11] F Ferrer-Troyano J S Aguilar-Ruiz and J C RiquelmeldquoData streams classification by incremental rule learning withparameterized generalizationrdquo in Proceedings of the 2006 ACMSymposium on Applied Computing pp 657ndash661 Dijon FranceApril 2006

[12] D Mena-Torres and J S Aguilar-Ruiz ldquoA similarity-basedapproach for data stream classificationrdquo Expert Systems withApplications vol 41 no 9 pp 4224ndash4234 2014

[13] A Shaker and E Hullermeier ldquoIBLStreams a system forinstance-based classification and regression on data streamsrdquoEvolving Systems vol 3 no 4 pp 235ndash249 2012

[14] A Ghazikhani R Monsefi and H S Yazdi ldquoRecursive leastsquare perceptron model for non-stationary and imbalanceddata stream classificationrdquo Evolving Systems vol 4 no 2 pp119ndash131 2013

[15] A Ghazikhani R Monsefi and H S Yazdi ldquoEnsemble ofonline neural networks for non-stationary and imbalanced datastreamsrdquo Neurocomputing vol 122 pp 535ndash544 2013

[16] A Sancho-Asensio A Orriols-Puig and E GolobardesldquoRobust on-line neural learning classifier system for data streamclassification tasksrdquo Soft Computing vol 18 no 8 pp 1441ndash14612014

[17] D M Farid L Zhang A Hossain et al ldquoAn adaptive ensembleclassifier for mining concept drifting data streamsrdquo ExpertSystems with Applications vol 40 no 15 pp 5895ndash5906 2013

[18] D Brzezinski and J Stefanowski ldquoCombining block-based andonline methods in learning ensembles from concept driftingdata streamsrdquo Information Sciences vol 265 pp 50ndash67 2014

[19] G Ditzler and R Polikar ldquoIncremental learning of conceptdrift from streaming imbalanced datardquo IEEE Transactions onKnowledge and Data Engineering vol 25 no 10 pp 2283ndash23012013

[20] M J Hosseini Z Ahmadi andH Beigy ldquoUsing a classifier poolin accuracy based tracking of recurring concepts in data streamclassificationrdquo Evolving Systems vol 4 no 1 pp 43ndash60 2013

[21] N Saunier and S Midenet ldquoCreating ensemble classifiersthrough order and incremental data selection in a streamrdquo Pat-tern Analysis and Applications vol 16 no 3 pp 333ndash347 2013

[22] MMMasud J Gao L Khan J Han and BMThuraisinghamldquoClassification and novel class detection in concept-driftingdata streams under time constraintsrdquo IEEE Transactions onKnowledge and Data Engineering vol 23 no 6 pp 859ndash8742011

[23] A Alzghoul M Lofstrand and B Backe ldquoData stream fore-casting for system fault predictionrdquo Computers and IndustrialEngineering vol 62 no 4 pp 972ndash978 2012

[24] J Kranjc J Smailovic V Podpecan M Grcar M Znidarsicand N Lavrac ldquoActive learning for sentiment analysis on datastreams methodology and workflow implementation in theClowdFlows platformrdquo Information Processing and Manage-ment vol 51 no 2 pp 187ndash203 2015

[25] J Smailovic M Grcar N Lavrac and M Znidarsic ldquoStream-based active learning for sentiment analysis in the financialdomainrdquo Information Sciences vol 285 pp 181ndash203 2014

[26] S Shalev-Shwartz Y Singer N Srebro and A Cotter ldquoPegasosprimal estimated sub-gradient solver for SVMrdquo in Proceedingsof the 24th International Conference on Machine Learning pp807ndash814 June 2007

[27] Z Miller B Dickinson W Deitrick W Hu and A H WangldquoTwitter spammer detection using data stream clusteringrdquoInformation Sciences vol 260 pp 64ndash73 2014

[28] B Krawczyk and M Wozniak ldquoIncremental weighted one-class classifier for mining stationary data streamsrdquo Journal ofComputational Science vol 9 pp 19ndash25 2015

[29] J Gama ldquoA survey on learning from data streams current andfuture trendsrdquo Progress in Artificial Intelligence vol 1 no 1 pp45ndash55 2012

[30] L I Kuncheva ldquoChange detection in streaming multivariatedata using likelihood detectorsrdquo IEEE Transactions on Knowl-edge and Data Engineering vol 25 no 5 pp 1175ndash1180 2013

[31] G J Ross N M Adams D K Tasoulis and D J Hand ldquoExpo-nentially weighted moving average charts for detecting conceptdriftrdquoPatternRecognition Letters vol 33 no 2 pp 191ndash198 2012

[32] J Gama P Medas G Castillo and P Rodrigues ldquoLearning withdrift detectionrdquo in Advances in Artificial IntelligencemdashSBIA2004 vol 3171 of Lecture Notes in Computer Science pp 286ndash295 Springer Berlin Germany 2004

[33] X Wu P Li and X Hu ldquoLearning from concept drifting datastreams with unlabeled datardquoNeurocomputing vol 92 pp 145ndash155 2012

[34] X Qin Y Zhang C Li and X Li ldquoLearning from data streamswith only positive and unlabeled datardquo Journal of IntelligentInformation Systems vol 40 no 3 pp 405ndash430 2013

Mathematical Problems in Engineering 17

[35] M M Masud C Woolam J Gao et al ldquoFacing the reality ofdata stream classification coping with scarcity of labeled datardquoKnowledge and Information Systems vol 33 no 1 pp 213ndash2442012

[36] J Read A Bifet G Holmes and B Pfahringer ldquoScalable andefficient multi-label classification for evolving data streamsrdquoMachine Learning vol 88 no 1-2 pp 243ndash272 2012

[37] I Lopez-Yanez A J Arguelles-Cruz O Camacho-Nieto andC Yanez-Marquez ldquoPollutants time-series prediction usingthe gamma classifierrdquo International Journal of ComputationalIntelligence Systems vol 4 no 4 pp 680ndash711 2011

[38] M E Acevedo-Mosqueda C Yanez-Marquez and I Lopez-Yanez ldquoAlpha-beta bidirectional associative memories theoryand applicationsrdquo Neural Processing Letters vol 26 no 1 pp 1ndash40 2007

[39] C Yanez-Marquez E M Felipe-Riveron I Lopez-Yanez andR Flores-Carapia ldquoA novel approach to automatic color match-ingrdquo in Progress in Pattern Recognition Image Analysis andApplications vol 4225 of Lecture Notes in Computer Science pp529ndash538 Springer Berlin Germany 2006

[40] I Lopez-Yanez L Sheremetov and C Yanez-Marquez ldquoAnovel associative model for time series data miningrdquo PatternRecognition Letters vol 41 no 1 pp 23ndash33 2014

[41] A Bifet G Holmes R Kirkby and B Pfahringer ldquoData streammining a practical approachrdquo Tech RepUniversity ofWaikato2011

[42] Massive Online Analysis (MOA) 2014 httpmoacmswaikatoacnzdatasets

[43] K Bache and M Lichman UCI Machine Learning RepositorySchool of Information and Computer Science University ofCalifornia Irvine Calif USA 2013 httparchiveicsucieduml

[44] A Bifet R Kirkby P Kranen and P Reutemann ldquoMassiveOnline Analysis Manualrdquo 2012 httpmoacmswaikatoacnzdocumentation

[45] M Harries ldquoSplice-2 comparative evaluation electricity pric-ingrdquo Tech Rep University of South Wales Cardiff UK 1999

[46] R Agrawal T Imielinski and A Swami ldquoDatabase mining aperformance perspectiverdquo IEEE Transactions on Knowledge andData Engineering vol 5 no 6 pp 914ndash925 1993

[47] J Gama R Sebastiao and P P Rodrigues ldquoOn evaluatingstream learning algorithmsrdquo Machine Learning vol 90 no 3pp 317ndash346 2013

[48] F Wilcoxon ldquoIndividual comparisons by ranking methodsrdquoBiometrics Bulletin vol 1 no 6 pp 80ndash83 1945

[49] M Hall E Frank G Holmes B Pfahringer P Reutemannand I H Witten WEKA 3 Data Mining Software in Java 2010httpwwwcswaikatoacnzmlweka

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: Research Article Data Stream Classification Based …downloads.hindawi.com/journals/mpe/2015/939175.pdfuncertain data [ ], classi cation with unlabeled instances [ , , ], novel class

Mathematical Problems in Engineering 11

Table 3: Average accuracy (%) of DS-Gamma for different window sizes.

Dataset        50     100    200    500    750    1000   1250   1500   1750   2000
Agrawal      79.64  85.97  89.86  92.59  92.88  92.82  92.67  92.51  92.32  92.07
Bank         68.19  73.40  77.05  79.13  79.12  78.96  78.60  78.24  77.76  77.37
CoverType    38.21  37.61  39.70  44.30  46.59  47.77  48.62  49.19  49.51  49.74
Electricity  89.12  85.26  81.71  79.84  78.58  77.14  76.36  75.94  75.31  74.49
Hyperplane   78.25  82.65  85.28  86.56  86.55  86.30  85.98  85.65  85.34  85.00
RandomRBF    64.61  67.39  69.33  70.56  70.81  70.85  70.74  70.63  70.50  70.33

Table 4: Averaged classification time (seconds) of DS-Gamma for different window sizes.

Dataset        50      100     200     500     750     1000    1250    1500    1750    2000
Agrawal       2.24    4.50    7.93   19.26   28.04   39.00   45.92   57.57   63.90   73.49
Bank          0.66    1.14    2.07    4.75    7.01    9.17   11.54   14.55   15.83   17.87
CoverType    83.47  140.81  251.20  340.05  254.12  335.03  428.94  522.39  669.24  676.99
Electricity   0.34    0.53    0.98    2.33    3.56    4.63    5.99    7.05    8.30    9.49
Hyperplane    2.24    4.01    7.56   18.19   26.76   30.28   44.28   52.96   61.87   70.63
RandomRBF     2.79    4.09    7.73   18.54   27.35   30.72   45.02   54.12   62.23   71.04

Table 5: Averaged accuracy (%) comparison.

Classifier          Agrawal        Bank           CoverType      Electricity   Hyperplane     RandomRBF
                    Acc    σ       Acc    σ       Acc    σ       Acc    σ      Acc    σ       Acc    σ
Bayes-DDM          88.16  0.19    83.38  1.41    67.61  6.81    81.05  —      86.02  1.39    71.87  3.82
Perceptron DDM     67.20  0.11    73.43 10.53    70.76  5.55    78.61  —      85.65  1.49    71.88  5.93
Hoeffding Tree     94.39  0.24    88.52  0.11    72.29  2.68    82.41  —      82.41  5.62    86.72  1.69
Active Classifier  90.82  1.62    88.21  0.14    67.31  1.03    78.59  —      79.50  6.55    75.32  2.70
IBL-DS             86.76  0.48    86.52  0.21    61.73  9.46    75.34  —      84.20  0.77    93.32  1.21
AWE                92.17  0.59    87.90  0.38    76.16  3.66    78.36  —      85.88  1.65    91.98  1.14
OzaBoostAdwin      87.11  2.62    87.67  0.50    78.55  4.97    87.92  —      83.21  1.91    91.49  1.14
SGD                67.20  0.11    76.56  1.66    59.01  7.37    84.52  —      83.19  0.42    61.74  3.31
SPegasos           55.82  0.17    77.46  1.99    50.47  5.17    57.60  —      52.80  6.08    53.33  3.13
DS-Gamma           92.82  0.10    78.96  0.22    47.77  4.31    77.14  —      86.30  1.35    70.85  2.54

Table 6: Averaged classification time (seconds) comparison.

Classifier          Agrawal         Bank           CoverType           Electricity   Hyperplane      RandomRBF
                    Time    σ       Time   σ       Time      σ         Time   σ      Time    σ       Time   σ
Bayes-DDM            0.62  0.11     0.39  0.08      17.76    1.62      0.61  —       0.70   0.01     0.69  0.01
Perceptron DDM       0.40  0.02     0.20  0.05       8.28    0.10      0.19  —       0.45   0.02     0.44  0.02
Hoeffding Tree       0.98  0.23     0.54  0.03      30.05   11.06      0.78  —       1.36   0.05     1.40  0.05
Active Classifier    0.64  0.03     0.26  0.02      15.14    0.74      0.23  —       0.72   0.04     0.70  0.02
IBL-DS              53.19  2.11    38.51  8.26    1057.60   60.95      8.61  —      86.39   0.96    88.10  1.06
AWE                 14.32  0.20     6.63  0.30    1012.47  304.41      6.70  —      18.03   3.64    37.52  3.03
OzaBoostAdwin      192.62 44.47    46.93 10.56   10589.73 3896.64     39.75  —     223.99  51.34    57.04 16.66
SGD                  0.14  0.04     0.11  0.009     40.35    0.07      0.12  —       0.27   0.02     0.27  0.02
SPegasos             0.22  0.006    0.11  0.016     16.84    0.12      0.12  —       0.31   0.05     0.24  0.01
DS-Gamma            39.00  3.58     9.16  0.05     335.03   12.91      4.63  —      30.28   0.79    30.72  0.94


Figure 1: Average accuracy (%) of DS-Gamma for different window sizes (50 to 2000) on the Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF data streams.

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with the Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier on the CoverType data stream. This data stream presents a heavy class imbalance that seems to be negatively affecting the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased toward the class with the most members.
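The bias described above can be illustrated with a minimal sketch of a weighted-sum-per-class decision rule. The similarity function and data here are hypothetical stand-ins, not the paper's Gamma operator: the point is only that summing similarities per class lets many weakly similar majority-class patterns outweigh one highly similar minority-class pattern.

```python
from collections import defaultdict

def weighted_sum_classify(query, patterns, labels, similarity):
    """Assign `query` to the class with the largest summed similarity.

    `similarity` is an illustrative stand-in for the Gamma operator's
    per-pattern weight, not the authors' exact definition.
    """
    per_class = defaultdict(float)
    for x, y in zip(patterns, labels):
        per_class[y] += similarity(query, x)
    return max(per_class, key=per_class.get)

# Toy similarity: decreases with L1 distance.
def sim(a, b):
    return 1.0 / (1.0 + sum(abs(u - v) for u, v in zip(a, b)))

# Severe imbalance: twelve far-away majority patterns collectively
# outweigh the single near-identical minority pattern.
patterns = [(0.1 * i, 0.1 * (i % 3)) for i in range(12)] + [(5.0, 5.1)]
labels = [0] * 12 + [1]
print(weighted_sum_classify((5.0, 5.0), patterns, labels, sim))  # prints 0
```

Even though the query (5.0, 5.0) is almost identical to the only class-1 pattern, the summed similarity of the twelve class-0 patterns is larger, so the majority class wins.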

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using α = 0.05. The obtained values are all greater than α, so the null hypothesis is not rejected and


Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes on the Agrawal, Bank, CoverType, Electricity, Hyperplane, and RandomRBF data streams; the fitted trend lines have R² values between 0.9743 and 0.9866.

we can infer that there are no significant differences among the compared algorithms.
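The reported asymptotic significance values can be reproduced with a standard statistics library. As a sketch, the Gamma versus Bayes-DDM comparison over the six per-dataset accuracies of Table 5, using SciPy's `wilcoxon` with the normal approximation (the pairing by dataset is our reading of the setup):

```python
from scipy.stats import wilcoxon

# Per-dataset averaged accuracies from Table 5, in the order
# Agrawal, Bank, CoverType, Electricity, Hyperplane, RandomRBF.
ds_gamma  = [92.82, 78.96, 47.77, 77.14, 86.30, 70.85]
bayes_ddm = [88.16, 83.38, 67.61, 81.05, 86.02, 71.87]

# Two-sided test with the normal approximation, matching the
# asymptotic significance values reported in Table 7.
stat, p = wilcoxon(ds_gamma, bayes_ddm, method="approx")
print(round(p, 3))  # 0.345, greater than 0.05: no significant difference
```

With only six paired samples the exact distribution would give a different p-value; `method="approx"` reproduces the asymptotic 0.345 reported for the Gamma-Bayes-DDM pair.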

4. Conclusions

In this paper we describe a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase, the proposed method was tested in different data stream scenarios with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms         Asymptotic significance (2-sided)
Gamma-Bayes-DDM             0.345
Gamma-Perceptron DDM        0.917
Gamma-Hoeffding Tree        0.116
Gamma-Active Classifier     0.463
Gamma-IBL-DS                0.345
Gamma-AWE                   0.116
Gamma-OzaBoostAdwin         0.116
Gamma-SGD                   0.600
Gamma-SPegasos              0.075


Figure 3: Averaged accuracy (%) comparison of Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma on the Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity data streams.

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000. For larger window sizes, performance remained stable and in some cases even declined.
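The sliding-window evaluation discussed above can be sketched as a generic test-then-train loop. The helper names and the 1-NN stand-in classifier below are illustrative assumptions, not the DS-Gamma algorithm itself; the sketch only shows how a bounded window trades memory for adaptivity.

```python
from collections import deque

def prequential_window(stream, classify, window_size=1000):
    """Prequential (test-then-train) evaluation over a sliding window.

    `classify(window, x)` is a stand-in for the classifier's decision
    step; the window keeps only the most recent labeled instances,
    so older concepts are gradually forgotten.
    """
    window = deque(maxlen=window_size)   # oldest instances are evicted
    correct = total = 0
    for x, y in stream:
        if window:                       # test on the new instance first...
            correct += classify(window, x) == y
            total += 1
        window.append((x, y))            # ...then add it as training data
    return correct / max(total, 1)

# Illustrative 1-NN stand-in for the classifier inside the window.
def knn1(window, x):
    return min(window, key=lambda p: sum(abs(u - v) for u, v in zip(p[0], x)))[1]

stream = [((i % 2,), i % 2) for i in range(10)]   # tiny alternating toy stream
print(prequential_window(stream, knn1, window_size=4))
```

Larger `window_size` values give the classifier more evidence per decision (higher accuracy, as in Table 3) at the cost of more comparisons per instance (longer times, as in Table 4).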

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained in the experiments performed with the CoverType dataset,


Figure 4: Averaged classification time (seconds) comparison of Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma on the Agrawal, Hyperplane, CoverType, RandomRBF, Bank, and Electricity data streams (logarithmic time axis on the Agrawal, Hyperplane, and CoverType panels).

a heavily imbalanced dataset, where the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), CONACyT, and SNI for their economic support in developing this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to the statistical significance analysis.


References

[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, "The digital universe of opportunities: rich data and the increasing value of the internet of things," Tech. Rep., IDC Information and Data, 2014, http://www.emc.com/leadership/digital-universe/2014iview/index.htm.

[2] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, IOS Press, Amsterdam, The Netherlands, 2010.

[3] J. Gama, Knowledge Discovery from Data Streams, Chapman & Hall/CRC, 2010.

[4] J. Beringer and E. Hüllermeier, "Efficient instance-based learning on data streams," Intelligent Data Analysis, vol. 11, no. 6, pp. 627–650, 2007.

[5] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Computing Surveys, vol. 46, no. 4, article 44, 2014.

[6] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80, Boston, Mass, USA, August 2000.

[7] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 97–106, San Francisco, Calif, USA, August 2001.

[8] C. Liang, Y. Zhang, P. Shi, and Z. Hu, "Learning very fast decision tree from uncertain data streams with positive and unlabeled samples," Information Sciences, vol. 213, pp. 50–67, 2012.

[9] L. Rutkowski, M. Jaworski, L. Pietruczuk, and P. Duda, "Decision trees for mining data streams based on the Gaussian approximation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 108–119, 2014.

[10] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69–101, 1996.

[11] F. Ferrer-Troyano, J. S. Aguilar-Ruiz, and J. C. Riquelme, "Data streams classification by incremental rule learning with parameterized generalization," in Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 657–661, Dijon, France, April 2006.

[12] D. Mena-Torres and J. S. Aguilar-Ruiz, "A similarity-based approach for data stream classification," Expert Systems with Applications, vol. 41, no. 9, pp. 4224–4234, 2014.

[13] A. Shaker and E. Hüllermeier, "IBLStreams: a system for instance-based classification and regression on data streams," Evolving Systems, vol. 3, no. 4, pp. 235–249, 2012.

[14] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Recursive least square perceptron model for non-stationary and imbalanced data stream classification," Evolving Systems, vol. 4, no. 2, pp. 119–131, 2013.

[15] A. Ghazikhani, R. Monsefi, and H. S. Yazdi, "Ensemble of online neural networks for non-stationary and imbalanced data streams," Neurocomputing, vol. 122, pp. 535–544, 2013.

[16] A. Sancho-Asensio, A. Orriols-Puig, and E. Golobardes, "Robust on-line neural learning classifier system for data stream classification tasks," Soft Computing, vol. 18, no. 8, pp. 1441–1461, 2014.

[17] D. M. Farid, L. Zhang, A. Hossain, et al., "An adaptive ensemble classifier for mining concept drifting data streams," Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, 2013.

[18] D. Brzezinski and J. Stefanowski, "Combining block-based and online methods in learning ensembles from concept drifting data streams," Information Sciences, vol. 265, pp. 50–67, 2014.

[19] G. Ditzler and R. Polikar, "Incremental learning of concept drift from streaming imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2283–2301, 2013.

[20] M. J. Hosseini, Z. Ahmadi, and H. Beigy, "Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification," Evolving Systems, vol. 4, no. 1, pp. 43–60, 2013.

[21] N. Saunier and S. Midenet, "Creating ensemble classifiers through order and incremental data selection in a stream," Pattern Analysis and Applications, vol. 16, no. 3, pp. 333–347, 2013.

[22] M. M. Masud, J. Gao, L. Khan, J. Han, and B. M. Thuraisingham, "Classification and novel class detection in concept-drifting data streams under time constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 6, pp. 859–874, 2011.

[23] A. Alzghoul, M. Löfstrand, and B. Backe, "Data stream forecasting for system fault prediction," Computers and Industrial Engineering, vol. 62, no. 4, pp. 972–978, 2012.

[24] J. Kranjc, J. Smailović, V. Podpečan, M. Grčar, M. Žnidaršič, and N. Lavrač, "Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform," Information Processing and Management, vol. 51, no. 2, pp. 187–203, 2015.

[25] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, "Stream-based active learning for sentiment analysis in the financial domain," Information Sciences, vol. 285, pp. 181–203, 2014.

[26] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: primal estimated sub-gradient solver for SVM," in Proceedings of the 24th International Conference on Machine Learning, pp. 807–814, June 2007.

[27] Z. Miller, B. Dickinson, W. Deitrick, W. Hu, and A. H. Wang, "Twitter spammer detection using data stream clustering," Information Sciences, vol. 260, pp. 64–73, 2014.

[28] B. Krawczyk and M. Woźniak, "Incremental weighted one-class classifier for mining stationary data streams," Journal of Computational Science, vol. 9, pp. 19–25, 2015.

[29] J. Gama, "A survey on learning from data streams: current and future trends," Progress in Artificial Intelligence, vol. 1, no. 1, pp. 45–55, 2012.

[30] L. I. Kuncheva, "Change detection in streaming multivariate data using likelihood detectors," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1175–1180, 2013.

[31] G. J. Ross, N. M. Adams, D. K. Tasoulis, and D. J. Hand, "Exponentially weighted moving average charts for detecting concept drift," Pattern Recognition Letters, vol. 33, no. 2, pp. 191–198, 2012.

[32] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence—SBIA 2004, vol. 3171 of Lecture Notes in Computer Science, pp. 286–295, Springer, Berlin, Germany, 2004.

[33] X. Wu, P. Li, and X. Hu, "Learning from concept drifting data streams with unlabeled data," Neurocomputing, vol. 92, pp. 145–155, 2012.

[34] X. Qin, Y. Zhang, C. Li, and X. Li, "Learning from data streams with only positive and unlabeled data," Journal of Intelligent Information Systems, vol. 40, no. 3, pp. 405–430, 2013.

[35] M. M. Masud, C. Woolam, J. Gao, et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.

[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.

[37] I. López-Yáñez, A. J. Argüelles-Cruz, O. Camacho-Nieto, and C. Yáñez-Márquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.

[38] M. E. Acevedo-Mosqueda, C. Yáñez-Márquez, and I. López-Yáñez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.

[39] C. Yáñez-Márquez, E. M. Felipe-Riverón, I. López-Yáñez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.

[40] I. López-Yáñez, L. Sheremetov, and C. Yáñez-Márquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.

[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.

[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets/.

[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml/.

[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation/.

[45] M. Harries, "Splice-2 comparative evaluation: electricity pricing," Tech. Rep., University of South Wales, Cardiff, UK, 1999.

[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.

[47] J. Gama, R. Sebastião, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.

[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka/.



Mathematical Problems in Engineering

[Figure 1: Average accuracy (%) of DS-Gamma for different window sizes (50 to 2000). Panels: Agrawal, Hyperplane, Bank, RandomRBF, CoverType, and Electricity data streams.]

Figure 4. This can be a serious disadvantage, especially in data stream scenarios. The lowest times are achieved by the perceptron with Drift Detection Method (DDM), but its performance is generally low when compared to the other methods. A fact that caught our attention was the low performance of the DS-Gamma classifier for the CoverType data stream. This data stream presents a heavy class imbalance that seems to be negatively affecting the performance of the classifier. As we can observe in step (8) of the DS-Gamma classifier algorithm, classification relies on a weighted sum per class; when a class greatly outnumbers the other(s), this sum will be biased toward the class with the most members.
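This bias can be illustrated with a small schematic. The inverse-distance similarity below is a hypothetical stand-in for illustration only, not the actual Gamma operator:

```python
from collections import defaultdict

def classify_weighted_sum(query, window, similarity):
    """Assign the class with the largest summed similarity over the window.

    window: list of (pattern, label) pairs; similarity: callable -> float.
    Because contributions are summed per class, a class with many more
    members in the window can win even when its members are less similar.
    """
    totals = defaultdict(float)
    for pattern, label in window:
        totals[label] += similarity(query, pattern)
    return max(totals, key=totals.get)

# Toy 1-D demo with a hypothetical inverse-distance similarity.
sim = lambda q, p: 1.0 / (1.0 + abs(q - p))
window = [(10.0, "A")] * 9 + [(0.5, "B")]   # 9 distant "A"s vs. 1 nearby "B"
print(classify_weighted_sum(0.0, window, sim))  # prints "A": the majority wins
```

With nine distant majority-class members against one much closer minority-class member, the accumulated sum still favors the majority class.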

To statistically compare the results obtained by the DS-Gamma classifier and the other evaluated algorithms, the Wilcoxon signed-ranks test was used. Table 7 shows the values of the asymptotic significance (2-sided) obtained by the Wilcoxon test using α = 0.05. The obtained values are all greater than α, so the null hypothesis is not rejected and


[Figure 2: Averaged classification time (seconds) of DS-Gamma for different window sizes (50 to 2000). Panels: Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity data streams; each panel includes a linear fit, with R² values of 0.9866, 0.9854, 0.9804, 0.9796, 0.9743, and 0.9842.]

we can infer that there are no significant differences among the compared algorithms.
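As a sketch of how such a comparison can be computed, the following implements the two-sided Wilcoxon signed-ranks test with the normal approximation (no tie or continuity correction; in practice a library routine such as SciPy's `wilcoxon` would be preferred):

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-ranks test for paired samples.

    Normal approximation, no tie/continuity correction; assumes at least
    one nonzero difference. Returns (W+, p-value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))  # 2 * P(Z <= -|z|)
```

A p-value greater than the chosen α (here 0.05) means the null hypothesis of equal performance cannot be rejected, which is how Table 7 is read.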

4. Conclusions

In this paper we describe a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is a supervised pattern recognition model based on the Gamma classifier. During the experimental phase the proposed method was tested using different data stream scenarios with real and synthetic data streams. The proposed method presented very promising results in terms of performance and classification

Table 7: Wilcoxon results.

Compared algorithms        Asymptotic significance (2-sided)
Gamma-Bayes-DDM            0.345
Gamma-Perceptron DDM       0.917
Gamma-Hoeffding Tree       0.116
Gamma-Active Classifier    0.463
Gamma-IBL-DS               0.345
Gamma-AWE                  0.116
Gamma-OzaBoostAdwin        0.116
Gamma-SGD                  0.600
Gamma-SPegasos             0.075


[Figure 3: Averaged accuracy (%) comparison of Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma over the Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity data streams.]

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on classification accuracy and time. In general, accuracy improved as the window size increased. We observed that the best performances were obtained with window sizes between 500 and 1000; for larger window sizes, performance remained stable and in some cases even declined.
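The window-size study can be framed as a generic prequential (test-then-train) loop whose accuracy is averaged over a sliding window. The `predict`/`learn` interface and the majority-class learner below are hypothetical stand-ins, not the DS-Gamma API:

```python
from collections import deque

class MajorityClass:
    """Toy incremental learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

def prequential_accuracy(stream, learner, window_size):
    """Test-then-train: predict each arriving (x, y) before training on it;
    report accuracy averaged over the last `window_size` outcomes."""
    outcomes = deque(maxlen=window_size)
    curve = []
    for x, y in stream:
        outcomes.append(1.0 if learner.predict(x) == y else 0.0)
        learner.learn(x, y)
        curve.append(sum(outcomes) / len(outcomes))
    return curve

# Tiny synthetic stream: label "B" every third instance, "A" otherwise.
stream = [(i, "A" if i % 3 else "B") for i in range(9)]
curve = prequential_accuracy(stream, MajorityClass(), window_size=4)
```

Varying `window_size` in such a loop is what produces accuracy curves like those in Figure 1: small windows react quickly but are noisy, while large windows smooth the estimate at the cost of responsiveness.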

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as we can observe from the results obtained in the experiments performed with the CoverType dataset,


[Figure 4: Averaged classification time (seconds) comparison of Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and Gamma over the Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity data streams.]

a heavily imbalanced dataset, where the accuracy of the classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.
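One simple mechanism of the kind suggested here is to normalize each class's accumulated similarity by the class's member count in the window, turning the sum into an average. This is an illustrative sketch under that assumption, not part of the published method:

```python
from collections import defaultdict

def classify_frequency_normalized(query, window, similarity):
    """Per-class weighted sum divided by the class's member count in the
    window (i.e., an average similarity), so a majority class no longer
    wins merely by accumulating many small contributions."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pattern, label in window:
        totals[label] += similarity(query, pattern)
        counts[label] += 1
    return max(totals, key=lambda c: totals[c] / counts[c])

# Imbalanced toy window with a hypothetical inverse-distance similarity:
# 9 distant "A"s vs. 1 nearby "B".
sim = lambda q, p: 1.0 / (1.0 + abs(q - p))
window = [(10.0, "A")] * 9 + [(0.5, "B")]
print(classify_frequency_normalized(0.0, window, sim))  # prints "B"
```

Here the nearby minority-class member wins, since the majority class's many small contributions are averaged rather than summed.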

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), CONACyT, and SNI for their economic support to develop this work. The support of the University of Porto was provided during the research stay from August to December 2014. The authors also want to thank Dr. Yenny Villuendas Rey for her contribution to the statistical significance analysis.

16 Mathematical Problems in Engineering



Page 13: Research Article Data Stream Classification Based …downloads.hindawi.com/journals/mpe/2015/939175.pdfuncertain data [ ], classi cation with unlabeled instances [ , , ], novel class

Mathematical Problems in Engineering 13

Hyperplane data stream

Window size

Window size

CoverType data stream Electricity data stream

RandomRBF data stream

Agrawal data stream Bank data stream80

70

60

50

40

30

20

20

15

10

5

010

050 100 200 500 750 1000 1250 1500 1750 2000

50 100 200 500 750 1000 1250 1500 1750 2000

Window size50 100 200 500 750 1000 1250 1500 1750 2000 Window size

50 100 200 500 750 1000 1250 1500 1750 2000

Window size50 100 200 500 750 1000 1250 1500 1750 2000

Window size50 100 200 500 750 1000 1250 1500 1750 2000

minus10

70

60

50

40

30

20

10

0

550

450

350

250

150

50

10

8

6

4

2

0

minus10

minus50

70

60

50

40

30

20

10

0

minus10

minus2

minus5

R

2

= 09866

R

2

= 09854

R

2

= 09804

R

2

= 09796

R

2

= 09743

R

2

= 09842

Figure 2 Averaged classification time (seconds) of DS-Gamma for different windows size

we can infer that there are no significant differences amongthe compared algorithms

4 Conclusions

In this paper we describe a novel method for the task ofpattern classification over a continuous data stream based onan associative model The proposed method is a supervisedpattern recognition model based on the Gamma classifierDuring the experimental phase the proposed method wastested using different data stream scenarios with real andsynthetic data streams The proposed method presented verypromising results in terms of performance and classification

Table 7 Wilcoxon results

Compared algorithms Asymptotic significance (2-sided)Gamma-Bayes-DDM 0345Gamma-Perceptron DDM 0917Gamma-Hoeffding Tree 0116Gamma-Active Classifier 0463Gamma-IBL-DS 0345Gamma-AWE 0116Gamma-OzaBoostAdwin 0116Gamma-SGD 0600Gamma-SPegasos 0075

14 Mathematical Problems in Engineering

Agrawal data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

95

90

85

80

75

70

65

60

55

50

Bank data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

90

80

70

60

50

40

Hyperplane data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

95

90

85

80

75

70

65

60

55

45

50

RandomRBF data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

95

90

85

80

75

70

65

60

55

45

50

CoverType data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

90

85

80

75

70

65

60

55

45

50

Electricity data stream100

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

90

80

70

60

50

40

30

20

10

0

Figure 3 Averaged accuracy () comparison

time when compared with some of the state-of-the-art algo-rithms for data stream classification

We also study the effect of the window size over the clas-sification accuracy and time In general accuracy improvedas the window size increased We observed that best perfor-mances were obtained with window sizes between 500 and

1000 For larger window sizes performance remained stableand in some cases even declined

Since the proposed method based its classification ona weighted sum per class it is strongly affected by severeclass imbalance as we can observe from the results obtainedwith the experiments performed with the CoverType dataset

Mathematical Problems in Engineering 15

Agrawal data stream

200

150

100

50

0

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

1

10

100

Hyperplane data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

1

10

100

1000

10000

100000CoverType data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

0102030405060708090

RandomRBF data stream

0

10

20

30

40

50

60Bank data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

05

10152025303540

Electricity data stream

Baye

s-D

DM

Perc

eptro

n D

DM

Activ

e Cla

ssifi

er

Hoe

ffdin

g Tr

ee

IBL-

DS

AWE

Oza

Boos

tAdw

in

SGD

SPeg

asos

Gam

ma

Figure 4 Averaged classification time (seconds) comparison

a heavily imbalance dataset where the accuracy of theclassifier was the worst of the compared algorithms As futurework a mechanism to address severe class imbalance shouldbe integrated to improve classification accuracy

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors would like to thank the Instituto PolitecnicoNacional (Secretarıa Academica COFAA SIP CIC andCIDETEC) the CONACyT and SNI for their economicalsupport to develop this workThe support of theUniversity ofPorto was provided during the research stay from August toDecember 2014 The authors also want to thank Dr YennyVilluendas Rey for her contribution to perform the statisticalsignificance analysis

16 Mathematical Problems in Engineering

References

[1] V Turner J FGantzD Reinsel and SMinton ldquoThedigital uni-verse of opportunities rich data and the increasing value of theinternet of thingsrdquo Tech Rep IDC Information andData 2014httpwwwemccomleadershipdigital-universe2014iviewindexhtm

[2] A Bifet Adaptive Stream Mining Pattern Learning and Miningfrom Evolving Data Streams IOS Press Amsterdam The Neth-erlands 2010

[3] J Gama Knowledge Discovery from Data Streams Chapman ampHallCRC 2010

[4] J Beringer and E Hullermeier ldquoEfficient instance-based learn-ing on data streamsrdquo Intelligent Data Analysis vol 11 no 6 pp627ndash650 2007

[5] J Gama I Zliobaite A Bifet M Pechenizkiy and A Bou-chachia ldquoA survey on concept drift adaptationrdquo ACM Comput-ing Surveys vol 46 no 4 article 44 2014

[6] P Domingos and G Hulten ldquoMining high-speed data streamsrdquoinProceedings of the 6thACMSIGKDD International Conferenceon Knowledge Discovery and Data Mining pp 71ndash80 BostonMass USA August 2000

[7] G Hulten L Spencer and P Domingos ldquoMining time-chang-ing data streamsrdquo in Proceedings of the 7th ACM SIGKDD Inter-national Conference on Knowledge Discovery and Data Mining(KDD rsquo01) pp 97ndash106 San Francisco Calif USA August 2001

[8] C Liang Y Zhang P Shi and Z Hu ldquoLearning very fast deci-sion tree from uncertain data streams with positive and unla-beled samplesrdquo Information Sciences vol 213 pp 50ndash67 2012

[9] L Rutkowski M Jaworski L Pietruczuk and P Duda ldquoDeci-sion trees for mining data streams based on the gaussianapproximationrdquo IEEE Transactions on Knowledge and DataEngineering vol 26 no 1 pp 108ndash119 2014

[10] G Widmer andM Kubat ldquoLearning in the presence of conceptdrift and hidden contextsrdquo Machine Learning vol 23 no 1 pp69ndash101 1996

[11] F Ferrer-Troyano J S Aguilar-Ruiz and J C RiquelmeldquoData streams classification by incremental rule learning withparameterized generalizationrdquo in Proceedings of the 2006 ACMSymposium on Applied Computing pp 657ndash661 Dijon FranceApril 2006

[12] D Mena-Torres and J S Aguilar-Ruiz ldquoA similarity-basedapproach for data stream classificationrdquo Expert Systems withApplications vol 41 no 9 pp 4224ndash4234 2014

[13] A Shaker and E Hullermeier ldquoIBLStreams a system forinstance-based classification and regression on data streamsrdquoEvolving Systems vol 3 no 4 pp 235ndash249 2012

[14] A Ghazikhani R Monsefi and H S Yazdi ldquoRecursive leastsquare perceptron model for non-stationary and imbalanceddata stream classificationrdquo Evolving Systems vol 4 no 2 pp119ndash131 2013

[15] A Ghazikhani R Monsefi and H S Yazdi ldquoEnsemble ofonline neural networks for non-stationary and imbalanced datastreamsrdquo Neurocomputing vol 122 pp 535ndash544 2013

[16] A Sancho-Asensio A Orriols-Puig and E GolobardesldquoRobust on-line neural learning classifier system for data streamclassification tasksrdquo Soft Computing vol 18 no 8 pp 1441ndash14612014

[17] D M Farid L Zhang A Hossain et al ldquoAn adaptive ensembleclassifier for mining concept drifting data streamsrdquo ExpertSystems with Applications vol 40 no 15 pp 5895ndash5906 2013

[18] D Brzezinski and J Stefanowski ldquoCombining block-based andonline methods in learning ensembles from concept driftingdata streamsrdquo Information Sciences vol 265 pp 50ndash67 2014

[19] G Ditzler and R Polikar ldquoIncremental learning of conceptdrift from streaming imbalanced datardquo IEEE Transactions onKnowledge and Data Engineering vol 25 no 10 pp 2283ndash23012013

[20] M J Hosseini Z Ahmadi andH Beigy ldquoUsing a classifier poolin accuracy based tracking of recurring concepts in data streamclassificationrdquo Evolving Systems vol 4 no 1 pp 43ndash60 2013

[21] N Saunier and S Midenet ldquoCreating ensemble classifiersthrough order and incremental data selection in a streamrdquo Pat-tern Analysis and Applications vol 16 no 3 pp 333ndash347 2013

[22] MMMasud J Gao L Khan J Han and BMThuraisinghamldquoClassification and novel class detection in concept-driftingdata streams under time constraintsrdquo IEEE Transactions onKnowledge and Data Engineering vol 23 no 6 pp 859ndash8742011

[23] A Alzghoul M Lofstrand and B Backe ldquoData stream fore-casting for system fault predictionrdquo Computers and IndustrialEngineering vol 62 no 4 pp 972ndash978 2012

[24] J Kranjc J Smailovic V Podpecan M Grcar M Znidarsicand N Lavrac ldquoActive learning for sentiment analysis on datastreams methodology and workflow implementation in theClowdFlows platformrdquo Information Processing and Manage-ment vol 51 no 2 pp 187ndash203 2015

[25] J Smailovic M Grcar N Lavrac and M Znidarsic ldquoStream-based active learning for sentiment analysis in the financialdomainrdquo Information Sciences vol 285 pp 181ndash203 2014

[26] S Shalev-Shwartz Y Singer N Srebro and A Cotter ldquoPegasosprimal estimated sub-gradient solver for SVMrdquo in Proceedingsof the 24th International Conference on Machine Learning pp807ndash814 June 2007

[27] Z Miller B Dickinson W Deitrick W Hu and A H WangldquoTwitter spammer detection using data stream clusteringrdquoInformation Sciences vol 260 pp 64ndash73 2014

[28] B Krawczyk and M Wozniak ldquoIncremental weighted one-class classifier for mining stationary data streamsrdquo Journal ofComputational Science vol 9 pp 19ndash25 2015

[29] J Gama ldquoA survey on learning from data streams current andfuture trendsrdquo Progress in Artificial Intelligence vol 1 no 1 pp45ndash55 2012

[30] L I Kuncheva ldquoChange detection in streaming multivariatedata using likelihood detectorsrdquo IEEE Transactions on Knowl-edge and Data Engineering vol 25 no 5 pp 1175ndash1180 2013

[31] G J Ross N M Adams D K Tasoulis and D J Hand ldquoExpo-nentially weighted moving average charts for detecting conceptdriftrdquoPatternRecognition Letters vol 33 no 2 pp 191ndash198 2012

[32] J Gama P Medas G Castillo and P Rodrigues ldquoLearning withdrift detectionrdquo in Advances in Artificial IntelligencemdashSBIA2004 vol 3171 of Lecture Notes in Computer Science pp 286ndash295 Springer Berlin Germany 2004

[33] X Wu P Li and X Hu ldquoLearning from concept drifting datastreams with unlabeled datardquoNeurocomputing vol 92 pp 145ndash155 2012

[34] X Qin Y Zhang C Li and X Li ldquoLearning from data streamswith only positive and unlabeled datardquo Journal of IntelligentInformation Systems vol 40 no 3 pp 405ndash430 2013

Mathematical Problems in Engineering 17

[35] M. M. Masud, C. Woolam, J. Gao et al., "Facing the reality of data stream classification: coping with scarcity of labeled data," Knowledge and Information Systems, vol. 33, no. 1, pp. 213–244, 2012.

[36] J. Read, A. Bifet, G. Holmes, and B. Pfahringer, "Scalable and efficient multi-label classification for evolving data streams," Machine Learning, vol. 88, no. 1-2, pp. 243–272, 2012.

[37] I. Lopez-Yanez, A. J. Arguelles-Cruz, O. Camacho-Nieto, and C. Yanez-Marquez, "Pollutants time-series prediction using the gamma classifier," International Journal of Computational Intelligence Systems, vol. 4, no. 4, pp. 680–711, 2011.

[38] M. E. Acevedo-Mosqueda, C. Yanez-Marquez, and I. Lopez-Yanez, "Alpha-beta bidirectional associative memories: theory and applications," Neural Processing Letters, vol. 26, no. 1, pp. 1–40, 2007.

[39] C. Yanez-Marquez, E. M. Felipe-Riveron, I. Lopez-Yanez, and R. Flores-Carapia, "A novel approach to automatic color matching," in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225 of Lecture Notes in Computer Science, pp. 529–538, Springer, Berlin, Germany, 2006.

[40] I. Lopez-Yanez, L. Sheremetov, and C. Yanez-Marquez, "A novel associative model for time series data mining," Pattern Recognition Letters, vol. 41, no. 1, pp. 23–33, 2014.

[41] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer, "Data stream mining: a practical approach," Tech. Rep., University of Waikato, 2011.

[42] Massive Online Analysis (MOA), 2014, http://moa.cms.waikato.ac.nz/datasets/.

[43] K. Bache and M. Lichman, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, Calif, USA, 2013, http://archive.ics.uci.edu/ml/.

[44] A. Bifet, R. Kirkby, P. Kranen, and P. Reutemann, "Massive Online Analysis Manual," 2012, http://moa.cms.waikato.ac.nz/documentation/.

[45] M. Harries, "Splice-2 comparative evaluation: electricity pricing," Tech. Rep., University of South Wales, Cardiff, UK, 1999.

[46] R. Agrawal, T. Imielinski, and A. Swami, "Database mining: a performance perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914–925, 1993.

[47] J. Gama, R. Sebastiao, and P. P. Rodrigues, "On evaluating stream learning algorithms," Machine Learning, vol. 90, no. 3, pp. 317–346, 2013.

[48] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[49] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, WEKA 3: Data Mining Software in Java, 2010, http://www.cs.waikato.ac.nz/ml/weka/.



[Figure 3 in the original shows six bar-chart panels of averaged accuracy (%) — one each for the Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity data streams — comparing Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and the Gamma classifier.]

Figure 3: Averaged accuracy (%) comparison.

time when compared with some of the state-of-the-art algorithms for data stream classification.

We also studied the effect of the window size on classification accuracy and time. In general, accuracy improved as the window size increased, and the best performances were obtained with window sizes between 500 and 1000. For larger window sizes, performance remained stable and in some cases even declined.
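The window-size trade-off described above can be illustrated with a sliding-window, test-then-train (prequential) evaluation loop. The sketch below is a minimal illustration, not the authors' implementation: the synthetic stream and the nearest-class-mean predictor are hypothetical stand-ins for the Gamma classifier.

```python
from collections import deque
import random

random.seed(0)

def stream(n=3000):
    """Synthetic two-class stream: the label is a noisy threshold on one feature."""
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.1:  # 10% label noise
            y = 1 - y
        yield x, y

def nearest_mean_predict(window, x):
    """Predict the class whose mean feature value inside the window is closest to x."""
    sums, counts = {}, {}
    for xi, yi in window:
        sums[yi] = sums.get(yi, 0.0) + xi
        counts[yi] = counts.get(yi, 0) + 1
    return min(sums, key=lambda c: abs(sums[c] / counts[c] - x))

def prequential_accuracy(window_size):
    """Interleaved test-then-train over a sliding window of stored patterns."""
    window = deque(maxlen=window_size)   # old patterns fall out automatically
    hits = total = 0
    for x, y in stream():
        if window:                       # test on the incoming pattern first...
            hits += int(nearest_mean_predict(window, x) == y)
            total += 1
        window.append((x, y))            # ...then store it (training step)
    return hits / total

for w in (50, 500, 1000):
    print(f"window={w:4d}  accuracy={prequential_accuracy(w):.3f}")
```

The loop scans the stream once with memory proportional to the window size, which is the regime a stream classifier must operate in; the window length then bounds both how much evidence is available per prediction and how quickly stale patterns are discarded.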

Since the proposed method bases its classification on a weighted sum per class, it is strongly affected by severe class imbalance, as can be observed from the results of the experiments performed on the CoverType dataset.


[Figure 4 in the original shows six bar-chart panels of averaged classification time (seconds) — one each for the Agrawal, Bank, Hyperplane, RandomRBF, CoverType, and Electricity data streams — comparing Bayes-DDM, Perceptron DDM, Active Classifier, Hoeffding Tree, IBL-DS, AWE, OzaBoostAdwin, SGD, SPegasos, and the Gamma classifier; the tick values suggest a logarithmic time axis on some panels.]

Figure 4: Averaged classification time (seconds) comparison.

CoverType is a heavily imbalanced dataset, and on it the accuracy of the proposed classifier was the worst among the compared algorithms. As future work, a mechanism to address severe class imbalance should be integrated to improve classification accuracy.
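The sensitivity of a per-class weighted sum to class imbalance can be seen with a small numeric sketch. The similarity weights below are invented for illustration and do not use the actual gamma similarity operator: the majority class wins simply because it contributes more terms to the sum.

```python
# Hypothetical similarity weights between one query and the stored patterns of
# each class; the values are invented for illustration (not the gamma operator).
majority_sims = [0.3] * 90   # 90 stored patterns with modest similarity each
minority_sims = [0.8] * 10   # 10 stored patterns with high similarity each

# A per-class weighted SUM favors the majority class (about 27.0 vs 8.0),
# even though every minority pattern is far more similar to the query.
sum_major = sum(majority_sims)
sum_minor = sum(minority_sims)
print("weighted sum picks:", "majority" if sum_major > sum_minor else "minority")

# Normalizing by class size (one possible mitigation) reverses the decision.
avg_major = sum_major / len(majority_sims)
avg_minor = sum_minor / len(minority_sims)
print("averaged sum picks:", "majority" if avg_major > avg_minor else "minority")
```

Size normalization is only one candidate fix; cost-sensitive weighting or per-class windows are other options a future mechanism could explore.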

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Instituto Politécnico Nacional (Secretaría Académica, COFAA, SIP, CIC, and CIDETEC), CONACyT, and SNI for their financial support of this work. The support of the University of Porto was provided during a research stay from August to December 2014. The authors also thank Dr. Yenny Villuendas Rey for her contribution to the statistical significance analysis.






