


Title:
A METHOD FOR SOLVING THE CLASSIFICATION PROBLEM USING CELLULAR AUTOMATA BASED ON HEAT TRANSFER PROCESS
Document Type and Number:
WIPO Patent Application WO/2019/059871
Kind Code:
A2
Abstract:
The present invention relates to a method developed for solving the classification problem using cellular automata (CA) based on the heat transfer (propagation) process. In the present invention, the data set is randomly divided into training, validation and test sets, and the data points in the training set are then mapped to the cells of the CA. A state value is then assigned to each cell containing a data point; this state value represents the class label of the data sample contained in the cell. That is to say, a cell that does not initially contain a data sample has no class label. The class labels are then spread to the other cells of the automaton through local interactions. This process is inspired by the heat transfer process in nature, so that the regions near the data points are heated. A second rule is used to change the state of the heated cells: cells that are not assigned to a class begin to join the existing classes under certain conditions. The validation set is used to determine when the training process is complete. After training is carried out within this framework, the test set is used to determine the success rate.

Inventors:
KORKMAZ EMIN ERKAN (TR)
USTA TUĞBA (TR)
DÜNDAR ENES BURAK (TR)
Application Number:
PCT/TR2018/050517
Publication Date:
March 28, 2019
Filing Date:
September 21, 2018
Assignee:
UNIV YEDITEPE (TR)
International Classes:
G06F7/00
Attorney, Agent or Firm:
ANKARA PATENT BUREAU LIMITED (TR)
Claims:
CLAIMS

1. A computer-implemented method for classifying data using cellular automata, essentially characterized by the steps of

a) randomly dividing the data points of a data set containing data points having the same or different attribute values into three sets, namely training set, validation set and test set,

b) assigning the data points of the training set to the cells of a cellular automaton,

c) assigning to these cells, according to the attribute values of the data points assigned to them, a state value and a constant temperature value; and assigning to all of the cells that do not contain a data point a unique state value, different from the state values used for cells that contain a data point, and a temperature value lower than the said constant temperature value,

d) creating a selection list containing the cells to which the data points in the training set are assigned and the neighbors of these cells,

e) selecting a cell in the selection list randomly,

f) calculating the average temperature value of the selected cell and its neighbor cells,

g) determining if the selected cell and its neighbor cells have an assigned data point or not,

- updating the temperature of the cells to which no data point in the training set is assigned to the said average temperature, and adding these cells with updated temperatures to the selection list,

- not updating the temperature of the cells to which data points in the training set are assigned,

h) selecting a cell in the selection list randomly,

i) determining if the temperature of the selected cell and its neighbor cells is above a threshold value or not,

- moving the said neighbor cell(s) having a temperature above the threshold value to the same state (class) as the selected cell,

j) repeating steps (e) - (i) in a loop for a predetermined number of times,

k) assigning the data points in the validation set to the cells of the cellular automaton,

l) determining whether a state value (class label) has been assigned in steps (e) - (i) to a predetermined percentage (for example 95%) of the cells to which the data points in the validation set are assigned,

- if assigned, assigning the closest state value in the cellular automaton to the remaining cells of the said validation set,

- if not assigned, returning to step (e).

2. A computer-implemented method according to Claim 1, which enables the success rate of the computer-implemented method of Claim 1 to be determined, characterized by

m) assigning each data point in the test set to the cells of the cellular automaton,

n) assigning a state value (class label) according to the attribute values of the data points assigned to the said cells,

o) comparing the state value (class label) assigned in steps (b) - (i) to the cell, to which each data point in the test set is assigned, with the state value (class label) of the data point in the test set,

p) determining the success rate based on the number of cells having the same state value (class label) and the total number of cells to which the data points in the test set are assigned.

3. An electronic device which enables a computer-implemented method according to Claim 1 or 2 to be operated, and which comprises a processor, a memory unit, a data entry interface and a monitor.

Description:
A METHOD FOR SOLVING THE CLASSIFICATION PROBLEM USING CELLULAR AUTOMATA BASED ON HEAT TRANSFER PROCESS

Field of the Invention

The present invention relates to the methods utilized for processing data with rules. More specifically, the invention relates to a method that is developed for solving the classification problem using cellular automata based on heat transfer (propagation) process.

Background of the Invention

The data classification problem is an important problem that arises in very different areas and has been addressed by many methods in the literature. The problem can be defined as producing a model that determines the categories of the elements (data points) of a data set. A training set comprising data points with certain attributes is required to produce this model. One of the attributes of each data point is the class label, and the model is produced by using the other attributes to determine the class label. The success rate of the model is determined by using a test set that is different from the training set. The model is then used to determine the classes of future data points or of data points whose class labels are unknown.

Classification algorithms have many different areas of use. For example, by generating a model from various pathological or personal features of diagnosed patients by means of classification methods, it becomes possible to perform automatic diagnosis of new patients. Similarly, a model produced from features extracted from the contents of written texts can be used to categorize texts, and similar methods can be used to estimate the behavior of users in any environment.

Many techniques have been proposed and used for classification problems, such as Decision Tree, Support Vector Machine and Naive Bayesian approaches [1, 2, 3]. All of these approaches define a number of separators in the data space using the attributes of the data points in the training set, and the classes of new data points are determined by means of these separators. However, each classification approach has to adopt a certain orientation or presupposition when creating these separators. This prevents the emergence of a general classification approach that will be successful on all kinds of data. Therefore, developing different classification approaches remains necessary and attractive in this research area.

References

[1] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.

[2] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.

[3] Peter Cheeseman, Matthew Self, James Kelly, Will Taylor, Don Freeman, and John C. Stutz. Bayesian classification. In AAAI, volume 88, pages 607-611, 1988.

[4] Boerlijst M, Hogeweg P (1991) Self-structuring and selection: Spiral waves as a substrate for prebiotic evolution. Artificial Life, 2:255-276.

[5] Ermentrout GB, Edelstein-Keshet L (1993) Cellular automata approaches to biological modeling. Journal of Theoretical Biology, 160(1):97-133.

[6] Langton CG (1984) Self-reproduction in cellular automata. Physica D: Nonlinear Phenomena, 10(1):135-144.

[7] Mai J, Von Niessen W (1992) A cellular automaton model with diffusion for a surface reaction system. Chemical Physics, 165(1):57-63.

[8] Margolus N, Toffoli T, Vichniac G (1986) Cellular-automata supercomputers for fluid-dynamics modeling. Physical Review Letters, 56(16):1694.

[9] C.L. Blake, D.J. Newman, and C.J. Merz. UCI repository of machine learning databases, 1998. URL http://www.ics.uci.edu/~mlearn/MLRepository.html.

[10] Mark A. Hall and Lloyd A. Smith. Practical feature subset selection for machine learning. 1998.

[11] Anuj Gupta. Classification of complex UCI datasets using machine learning and evolutionary algorithms. International Journal of Scientific and Technology Research, 4(5):85-94, 2015.

[12] Anahita Ghazvini, Jamilu Awwalu, and Azuraliza Abu Bakar. Comparative analysis of algorithms in supervised classification: A case study of bank notes dataset. Computer Trends and Technology, 17(1):39-43, 2014.

[13] Manika Verma and Devarshi Mehta. A comparative study of techniques in data mining. International Journal of Emerging Technology and Advanced Engineering, 4(4):314-321, 2014.

[14] A. Shruti and B.I. Khodanpur. Comparative study of advanced classification methods. International Journal on Recent and Innovation Trends in Computing and Communication, 3(3):1216-1220, 2015.

[15] S. Syed Shajahaan, S. Shanthi, and V. Mano Chitra. Application of data mining techniques to model breast cancer data. International Journal of Emerging Technology and Advanced Engineering, 3(11):362-369, 2013.

Problems Solved by the Invention

The present invention relates to a method that is developed for solving the classification problem based on heat transfer (propagation) process.

In the method of the present invention, Cellular Automata (CA) are used to solve the classification problem. A cellular automaton is a discrete dynamic system consisting of cells that have neighborhood relationships with each other. Each cell can be in one of a set of predetermined states, and computations are carried out by changing these state values according to certain rules. The rules are based on the interactions between neighboring cells: the state of a cell in the next step is determined by the states of its neighbor cells. Because every change in a CA results from such local interactions, CAs have a strong capacity for parallel computation. CAs are used to simulate different processes in different disciplines [4, 5, 6, 7, 8].

In the present invention, a classification method named Stochastic Cellular Automata (SCA) classification is proposed. First, the data set is randomly divided into training, validation and test sets, and the data points in the training set are mapped to the cells of the CA. A state value is then assigned to each cell containing data points; this state value represents the class label of the data sample contained in the cell. That is to say, a cell that does not initially contain a data sample has no class label. The class labels are then spread to the other cells of the automaton through local interactions. This process is inspired by the heat transfer process in nature. The cells containing data points are defined as heat sources that continuously produce heat energy, and this energy spreads to the neighboring cells, so that the regions near the data points are heated. A second rule is used to change the state of the heated cells: cells that are not assigned to a class begin to join the classes denoted by the different states. Finally, the data space represented by the automaton is categorized according to the attributes and classes of the data points in the training set.

At the end of the training process, the different class labels in the training set have spread to various regions of the data space represented by the CA. The validation set is used to determine the stopping criterion of the training process. After the training process is completed, in other words, after the class labels have been spread to the empty cells by the heat and state transfer rules, the test set is used to determine the success rate.

The approach disclosed in Turkish patent application No. 2016/19702, titled "A Method for Solving the Clustering Problem Using Cellular Automata Based on Heat Transfer Process", is adapted in this study so that it also solves the classification problem. As mentioned in the background of the invention, each classification algorithm has to use a certain orientation or presupposition. The proposed approach creates a new classification method based on the principle of heat transfer in nature. Realizing the method on cellular automata makes the system suitable for parallel operation, which can be used to enhance its performance. In the classification of huge data sets, where classical methods may experience performance problems, the proposed system may stand out thanks to parallel operation.

Detailed Description of the Invention

A method developed to fulfill the objectives of the present invention is illustrated in the accompanying figures, in which:

Figure 1 is the initial configuration of the cellular automaton used in an embodiment of the invention.

Figure 2 is the final configuration of the cellular automaton in Figure 1.

Figure 3 illustrates the heating process in a cellular automaton:

(a) initial state

(b) state after heat transfer

A method for classifying data using cellular automata essentially comprises the following steps:

a) randomly dividing the data points of a data set containing data points having the same or different attribute values into three sets, namely training set, validation set and test set,

b) assigning the data points of the training set to the cells of a cellular automaton,

c) assigning to these cells, according to the attribute values of the data points assigned to them, a state value (class label) and a constant temperature value; and assigning to all of the cells that do not contain a data point a unique state value different from the state values used for cells that contain a data point (for instance state "zero") and a temperature value lower than the said constant temperature value,

d) creating a selection list containing the cells to which the data points in the training set are assigned and the neighbors of these cells,

e) selecting a cell in the selection list randomly,

f) calculating the average temperature value of the selected cell and its neighbor cells,

g) determining if the selected cell and its neighbor cells have an assigned data point or not,

- updating the temperature of the cells to which no data point in the training set is assigned to the said average temperature, and adding these cells with updated temperatures to the selection list,

- not updating the temperature of the cells to which data points in the training set are assigned,

h) selecting a cell in the selection list randomly,

i) determining if the temperature of the selected cell and its neighbor cells is above a threshold value or not,

- moving the said neighbor cell(s) having a temperature above the threshold value to the state (class) of the selected cell,

j) repeating steps (e) - (i) in a loop for a predetermined number of times,

k) assigning the data points in the validation set to the cells of the cellular automaton,

l) determining whether a state value (class label) has been assigned in steps (e) - (i) to a predetermined percentage (for example 95%) of the cells to which the data points in the validation set are assigned,

- if assigned, assigning the closest state value (class label) in the cellular automaton to the remaining cells of the said validation set,

- if not assigned, returning to step (e).

The method of the present invention is a computerized method that can be executed by an electronic device (e.g. a desktop computer, a laptop computer, a tablet computer, etc.). The electronic device comprises a storage unit (e.g. a hard disk, flash disk, etc.) for storing the rules used in the invention, a processing unit (e.g. a microprocessor) for running/applying these rules, a data entry interface (e.g. a mouse, keyboard or virtual keyboard) for inputting the rules and the data set into the electronic device, and a monitor (e.g. an LCD monitor, touchscreen, etc.) for displaying the results to the user.

In the first step of the method, a data set comprising data points having the same or different attributes is input to the electronic device via the data entry interface; alternatively, the data set can be preloaded into a memory unit. The data set is then randomly divided into training, validation and test sets. Preferably, the training set contains more data points than the validation and test sets. In a preferred application of the invention, the data points are divided such that the training set comprises 60% of the data points, while the validation set and the test set each comprise 20%; however, the present invention is not limited to these ratios.
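The random division into training, validation and test sets can be sketched in Python as follows. This is a minimal illustration, not the inventors' implementation: the function name `split_dataset`, its parameters and the use of a seeded `random.Random` are assumptions made for the example, with the default fractions taken from the preferred 60/20/20 ratio.

```python
import random


def split_dataset(points, train_frac=0.6, val_frac=0.2, seed=None):
    """Randomly partition points into training, validation and test sets.

    The defaults mirror the preferred 60/20/20 ratio; other fractions
    may be passed in. Whatever remains after the first two sets are
    filled goes to the test set.
    """
    rng = random.Random(seed)  # seeded generator for reproducible splits
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

For example, `split_dataset(range(100), seed=0)` yields sets of 60, 20 and 20 points that together contain every input point exactly once.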

The training set, validation set and test set are data sets whose points have certain attributes, one of which is the class label. For example, if $\mathbf{x}_i$ is the vector of the attributes of the $i$-th data point except the class label, and $l_i$ is its class label, the training set consists of $\{(\mathbf{x}_1, l_1), (\mathbf{x}_2, l_2), \ldots, (\mathbf{x}_n, l_n)\}$. A model that determines the class label from the other attributes is produced. This model can be defined as $f : X \rightarrow L$, where $X$ represents the data space and $L$ represents the result (label) space. The success rate of this model is determined by using a test set different from the training set. The model is then used to determine the classes of future data points or of data points whose class labels are unknown. The function $f$ can be thought of as a separator in the sample space that enables the categories in the data to be differentiated.

In the second step of the method, the samples in the data set are assigned to the cells of the cellular automaton (CA). First, only the data points in the training set are assigned to the cells of the CA; as described in the following sections, the data points in the validation and test sets are also assigned to the cells of the CA in subsequent steps.

The CA used for assigning the data in the training, validation and test sets to the cells has $d$ dimensions, where $d$ is equal to the number of attributes in the data set. The data points are assigned to the cells of the CA based on their attribute values. Let $c$ be the number of cells used in dimension $d$, let $A^{d}_{\min}$ and $A^{d}_{\max}$ be the smallest and largest values of the attribute corresponding to that dimension in the data, and let $x_{A_d}$ be the value of that attribute for the data point to be assigned. The index $x^{d}$ of the cell to which this data point is assigned in dimension $d$ is then calculated by Equation 1. After the data instances in the training set are assigned to the cells using Equation 1, the state values (class labels) of the cells containing data points are assigned based on the class labels of the data points in those cells. The empty cells do not have any class label. As described in the following paragraphs, the class labels are spread to the empty cells of the CA by a method inspired by the heat transfer process.
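Equation 1 itself is not reproduced in this text. The sketch below assumes a linear (min-max) binning of each attribute range into c equal-width cells, which is one plausible reading of the variables defined above, not the confirmed form of Equation 1; the function names `cell_index` and `map_point` are hypothetical.

```python
import math


def cell_index(x, a_min, a_max, c):
    """Map attribute value x to a cell index in [0, c - 1].

    Assumption: linear (min-max) binning of [a_min, a_max] into c
    equal-width cells. The value a_max is placed in the last cell
    instead of a non-existent cell c.
    """
    if a_max == a_min:
        return 0  # degenerate attribute: every value falls into one cell
    idx = math.floor((x - a_min) / (a_max - a_min) * c)
    return min(max(idx, 0), c - 1)  # clamp to the valid index range


def map_point(attrs, mins, maxs, c):
    """Return the d-dimensional cell coordinates of one data point."""
    return tuple(cell_index(x, lo, hi, c) for x, lo, hi in zip(attrs, mins, maxs))
```

Under this assumption, with c = 4 and both attribute ranges equal to [0, 10], the point (1.0, 7.5) maps to cell (0, 3).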

The cells containing the data points in the training set are treated as heat sources, and the heat energy produced by these cells is transferred to other cells. Each cell therefore has an initial temperature value. If a cell contains a data point, its temperature is set to a constant value (e.g. 100°), which is kept constant throughout the process. The cells that are initially empty have a lower temperature (e.g. 0°). Figure 1 presents an example two-dimensional CA, in which the first value in each cell indicates the temperature of the cell and the second value indicates its state. In the example, a training set consisting of six data points with two different class labels is assigned to the cells of this automaton. As mentioned above, the temperature of the cells containing data points is set to 100°, and the empty cells have a temperature of 0°. The two classes in the training set are represented in the automaton by the state values 1 and 2, while the empty cells are in state 0. After the data points in the training set are assigned to the cells of the cellular automaton and given temperature and state values (class labels), the process of spreading the class labels through the automaton is carried out.
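Because most cells of a high-dimensional CA are never touched, one way to hold the initial temperatures and states is a sparse dictionary in which any absent cell is implicitly empty (temperature 0°, state 0). The sketch below assumes this representation and that class labels are nonzero integers; the names `init_automaton`, `SOURCE_TEMP`, `EMPTY_TEMP` and `EMPTY_STATE` are illustrative, and the temperatures are the example values from the description.

```python
SOURCE_TEMP = 100.0  # example constant temperature of a heat source
EMPTY_TEMP = 0.0     # example initial temperature of an empty cell
EMPTY_STATE = 0      # state of a cell that has no class label


def init_automaton(train_cells):
    """Initialize temperatures and states from (cell, label) pairs.

    Only non-empty cells are stored; a cell missing from the dicts is
    empty, with EMPTY_TEMP and EMPTY_STATE. If two training points share
    a cell, the later label overwrites the earlier one (a case the
    description does not cover).
    """
    temperature, state = {}, {}
    for cell, label in train_cells:
        temperature[cell] = SOURCE_TEMP
        state[cell] = label  # labels are assumed to be nonzero integers
    sources = set(temperature)  # these temperatures never change
    return temperature, state, sources
```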

Two different rules are used in the process of spreading the class labels in the automaton. The first rule spreads the heat energy produced by the heat sources through the automaton; the second rule changes the state values of the heated empty cells. When these two rules are repeatedly applied to randomly selected cells, the state values (class labels) of the cells containing data points begin to spread to the empty cells of the automaton, which makes it possible to categorize the data space represented by the CA. When this process is run on the automaton shown in Figure 1, the automaton evolves to the final configuration shown in Figure 2. As seen in Figure 2, after a certain period of time all of the cells have been heated and have acquired a state value of 1 or 2; that is to say, the data space represented by the automaton is categorized into two classes.

The first rule (heat transfer) used in spreading the class labels is given in Algorithm 1. In Algorithm 1, the input cell C is randomly selected from a selection list, which initially comprises the cells containing data points and the neighbors of these cells. The algorithm then determines the neighbors of the selected cell. If a neighbor cell is not present in the selection list, this cell is generated and added to the list so that it can be selected for the heat transfer process in subsequent steps. Next, the average temperature of the randomly selected cell and its neighbors is calculated. As mentioned before, if the selected cell contains a data sample, its temperature is fixed at 100°. If the selected cell is empty, however, its new temperature is set to the calculated average. Likewise, the temperatures of the empty neighbors of the selected cell are updated to the calculated average.

Algorithm 1: Heat Transfer Process in CA

procedure HEAT-PROPAGATION(CELL C)
    N = getNeighbor(C)
    AverageTemperature = calculateAverageTemperature(C, N)
    if empty(C) then
        C.temperature = AverageTemperature
    end if
    for each Cell K ∈ N do
        if empty(K) then
            K.temperature = AverageTemperature
        end if
    end for
end procedure
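A minimal Python sketch of Algorithm 1 follows, assuming a sparse dictionary of temperatures (absent cells are empty and at 0°) and a Moore neighborhood (all 3^d - 1 adjacent cells); the description does not state which neighborhood is used. Following the prose description, the sketch also adds newly reached neighbors to the selection list. All names are hypothetical.

```python
import itertools


def neighbors(cell):
    """Yield the Moore neighborhood of a cell (all 3**d - 1 adjacent cells).

    Assumption: the description does not name the neighborhood type.
    """
    for offset in itertools.product((-1, 0, 1), repeat=len(cell)):
        if any(offset):  # skip the all-zero offset, i.e. the cell itself
            yield tuple(c + o for c, o in zip(cell, offset))


def heat_propagation(cell, temperature, sources, selection, in_selection):
    """Apply the heat transfer rule (Algorithm 1) to one selected cell.

    temperature:  dict cell -> temperature; absent cells are empty and at 0.
    sources:      cells holding training points; their temperature is fixed.
    selection:    the selection list; in_selection mirrors it as a set.
    """
    nbrs = list(neighbors(cell))
    # Neighbors not yet in the selection list are generated and added so
    # that they can be selected in later steps.
    for k in nbrs:
        if k not in in_selection:
            in_selection.add(k)
            selection.append(k)
    group = [cell] + nbrs
    average = sum(temperature.get(k, 0.0) for k in group) / len(group)
    # Empty cells (the selected cell and its neighbors) take the average;
    # heat sources keep their constant temperature.
    for k in group:
        if k not in sources:
            temperature[k] = average
```

Applied once to an isolated heat source at 100° in two dimensions, it sets each of the eight neighbors to 100/9 ≈ 11.1° while the source keeps its temperature.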

The method presented in Algorithm 1 tends to equalize the temperatures of all cells. However, the heat sources (the cells containing the data points in the training set) constantly provide heat to the system and their temperatures do not change. As a result, the regions of the automaton that contain more data points become warmer than the other regions. Figure 3(a) presents an example two-dimensional data set, and Figure 3(b) shows the temperature values of the CA after the above rule has been applied to randomly selected cells for a certain period of time. The dark tones in Figure 3(b) denote high temperature values.

As mentioned in the previous paragraphs, different classes in the data set are denoted by different state values in the CA. Initially, the cells containing data points have the state values denoting the class labels of the data points they contain. The state values of these cells are constant and are not changed by the state updating rule. The states of the empty cells, however, may change depending on their temperatures: if an empty cell is heated enough, it may move to the state of a neighbor cell. The state updating process is given in Algorithm 2. As in the heat transfer process, the cell whose state will be updated is randomly chosen from the selection list. If a cell and its neighbor are both sufficiently heated, the cell may transfer its state to that neighbor. As seen in the algorithm, if the temperature of the selected cell C exceeds the threshold value (e.g. 30°) and its state value is nonzero, the neighbors of C are determined. All of the neighbor cells that are in state zero and have a temperature above the threshold then acquire the state value of C, and the process is called recursively for these neighbor cells. Thus, a class label represented by a certain state value spreads rapidly through the heated regions of the automaton.

Algorithm 2: State Transfer in CA

procedure STATE-TRANSFER(CELL C)
    if ((C.state != 0) and (C.temperature > threshold)) then
        N = getNeighbor(C)
        for each cell K ∈ N do
            if ((K.state == 0) and (K.temperature > threshold)) then
                K.state = C.state
                STATE-TRANSFER(K)
            end if
        end for
    end if
end procedure
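Algorithm 2 can be sketched in Python as below, assuming sparse dictionaries for temperatures and states (absent cells are empty, at 0° and in state 0), a Moore neighborhood, and the example threshold of 30°; all names are hypothetical. One deliberate change: the recursive call is replaced by an explicit stack, which relabels the same connected region of heated empty cells but avoids Python's recursion limit on large regions.

```python
import itertools

THRESHOLD = 30.0  # example threshold value from the description


def neighbors(cell):
    """Yield the Moore neighborhood of a cell (assumed neighborhood type)."""
    for offset in itertools.product((-1, 0, 1), repeat=len(cell)):
        if any(offset):
            yield tuple(c + o for c, o in zip(cell, offset))


def state_transfer(cell, temperature, state, threshold=THRESHOLD):
    """State transfer rule (Algorithm 2) using an explicit stack.

    Every empty (state 0) neighbor hotter than the threshold takes the
    label of the cell that reached it, and the rule is then applied to
    that neighbor in turn, as the recursive call would do.
    """
    def hot(k):
        return temperature.get(k, 0.0) > threshold

    if state.get(cell, 0) == 0 or not hot(cell):
        return  # only labeled, sufficiently heated cells spread a label
    stack = [cell]
    while stack:
        current = stack.pop()
        for k in neighbors(current):
            if state.get(k, 0) == 0 and hot(k):
                state[k] = state[current]
                stack.append(k)
```

For example, in one dimension with temperatures 100°, 50°, 40° and 10° in cells 0 to 3 and a labeled source in cell 0, the label spreads to cells 1 and 2 and stops before cell 3, which is below the 30° threshold.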

Algorithm 3 shows how the heat transfer and state updating rules described above are used on the CA within a general framework.

Algorithm 3: General Algorithm

procedure SCA-CLASSIFY(CELLULAR AUTOMATON CA, DATASET D)
    MAPDATA(CA, D_train)
    int continue = 1
    int iteration = 0
    while (continue) do
        Cell C = getCellFromSelectionSet(CA)
        HEAT-PROPAGATION(C)
        C = getCellFromSelectionSet(CA)
        STATE-TRANSFER(C)
        if iteration % 1000 == 0 then
            continue = ControlValidationPoints(CA, D_validation)
        end if
        iteration++
    end while
    N = GetNotAssignedClassList(CA, D_validation)
    for each cell K ∈ N do
        DETERMINE-LABEL(K)
    end for
end procedure

As seen in Algorithm 3, the data points in the training set are first assigned to the CA. A loop then starts in which the heat transfer and state updating processes are applied to randomly chosen cells, and every 1000 iterations the validation points are checked to decide whether the loop should continue.
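The control flow of Algorithm 3 can be sketched as a small driver that receives the two rules and the validation check as functions. This is an illustrative assumption rather than the inventors' code: the callables `heat_step`, `state_step` and `coverage_reached` are hypothetical, and the `max_iterations` cap is an added safeguard that the pseudocode does not have.

```python
import random


def sca_train(selection, heat_step, state_step, coverage_reached,
              check_every=1000, max_iterations=10_000_000, seed=None):
    """Run the training loop of Algorithm 3 and return the last iteration.

    heat_step(cell) and state_step(cell) apply the two rules to a cell;
    coverage_reached() returns True once enough validation points lie in
    labeled cells. max_iterations is an added safeguard.
    """
    rng = random.Random(seed)
    iteration = 0
    while iteration < max_iterations:
        heat_step(rng.choice(selection))   # heat rule on one random cell
        state_step(rng.choice(selection))  # state rule on another random cell
        # The validation check runs only every check_every iterations,
        # starting at iteration 0 as in the pseudocode.
        if iteration % check_every == 0 and coverage_reached():
            break
        iteration += 1
    return iteration
```

Passing the rules in as functions keeps the loop independent of how the automaton is stored, and `rng.choice(selection)` sees any cells that the heat rule appends to the selection list.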

The validation set is used to determine the stopping criterion of the above algorithm, during which the class labels are spread to the empty cells via the state updating process. However, the number of cells in the CA grows exponentially with the number of dimensions, so for high-dimensional data sets it is not possible to generate all of the cells of the CA and assign a class value to each of them. Therefore, at the beginning of the process, a data structure named the selection list is formed, which includes the cells containing the data points in the training set and their neighbor cells.
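The sketch below builds the initial selection list from the training cells and their neighbors, assuming a Moore neighborhood, in which every cell has 3^d - 1 neighbors. Under that assumption the neighborhood alone grows exponentially with d (3^7 - 1 = 2186 neighbors for seven attributes), which illustrates why only the cells that are actually reached are generated. The function names are hypothetical.

```python
import itertools


def neighbors(cell):
    """Yield the Moore neighborhood of a cell (all 3**d - 1 adjacent cells).

    Assumption: the description does not name the neighborhood type.
    """
    for offset in itertools.product((-1, 0, 1), repeat=len(cell)):
        if any(offset):
            yield tuple(c + o for c, o in zip(cell, offset))


def initial_selection_list(source_cells):
    """Return the training cells plus their neighbors, without duplicates.

    The list supports uniform random selection; the set gives fast
    membership tests when new cells are generated later.
    """
    selection, seen = [], set()
    for cell in source_cells:
        for candidate in (cell, *neighbors(cell)):
            if candidate not in seen:
                seen.add(candidate)
                selection.append(candidate)
    return selection, seen
```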

The cells initially in the selection list are selected for the heat transfer process. If a selected cell propagates heat to a cell that is not yet in the data structure, the newly heated cell is generated and added to the selection list. The process stops when the class labels in the CA have spread to a certain extent. The amount of spreading is measured using the data points in the validation set, which was determined randomly at the beginning of the process. For validation, the data points in this set are assigned to CA cells, and it is determined whether the cells to which these data points are assigned have received a class label during training. The algorithm is stopped when a certain portion (e.g. 95%) of the CA cells to which the validation data points are assigned have received a class label through the state updating process. When the process is stopped, each validation cell that has not been assigned a class label (at most 5% of the validation set) is assigned the class label of the closest cell that has a class label. If more than one class label is eligible for a cell (for example, if two different neighbors of the cell have different class labels), one of these labels is selected randomly. This assignment is represented by the DETERMINE-LABEL(K) method in the algorithm. In short, training stops when the class labels spreading through the CA cover the majority (95%) of the validation set.
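The stopping check and the DETERMINE-LABEL step can be sketched as follows. The sketch assumes that "closest" means the Euclidean distance between cell coordinates, which the description does not specify, and uses a brute-force search over labeled cells for clarity; `coverage` and `determine_label` are hypothetical names. Training would stop once `coverage(...)` reaches the chosen portion, e.g. 0.95.

```python
import math
import random


def coverage(validation_cells, state):
    """Fraction of validation cells that already carry a class label."""
    if not validation_cells:
        return 1.0
    labeled = sum(1 for cell in validation_cells if state.get(cell, 0) != 0)
    return labeled / len(validation_cells)


def determine_label(cell, state, rng=random):
    """Return the label of the nearest labeled cell (0 if none exist).

    Assumptions: distance is Euclidean over cell coordinates, and ties
    between different labels are broken uniformly at random.
    """
    best, candidates = math.inf, set()
    for other, label in state.items():
        if label == 0:
            continue  # state 0 means "no class label"
        dist = math.dist(cell, other)
        if dist < best:
            best, candidates = dist, {label}
        elif dist == best:
            candidates.add(label)
    return rng.choice(sorted(candidates)) if candidates else 0
```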

In the testing process, the class labels of the data points in the test set are estimated using the categorization represented by the CA. Each data point in the test set is assigned to a CA cell according to its attribute values. If a class label was assigned during training to the cell containing a test data point, that label is compared with the label of the test data point; if the labels match, the class of the test data point is considered correctly estimated. If the labels do not match, or if no class label was assigned during training to the cell containing the test data point, the estimate is considered unsuccessful. The ratio of the number of data points whose labels are correctly estimated to the size of the test set is the success rate (accuracy) of the system. In the following section, the success of the method of the present invention is measured on different data sets and compared with the Naive Bayes method, one of the basic classification algorithms.

The following additional steps are used for determining the success rate:

m) assigning each data point in the test set to the cells of the cellular automaton,

n) assigning a state value (class label) according to the attribute values of the data points assigned to the said cells,

o) comparing the state value (class label) assigned in steps (b) - (i) to the cell, to which each data point in the test set is assigned, with the state value (class label) assigned to the data point in the test set,

p) determining the success rate based on the number of cells having the same state value (class label) and the total number of cells to which the data points in the test set are assigned.
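Steps (m) to (p) amount to an accuracy computation, sketched below under the assumption that each test point has already been mapped to its cell and that class labels are nonzero; `success_rate` is a hypothetical name. Test points that fall into unlabeled cells count as failures, as stated above.

```python
def success_rate(test_points, state):
    """Accuracy over the test set, following steps (m) to (p).

    test_points: list of (cell, true_label) pairs.
    state:       dict cell -> class label learned during training.
    A test point whose cell has no label counts as a failure, because
    state.get returns None for it and None never equals a label.
    """
    if not test_points:
        return 0.0
    correct = sum(1 for cell, label in test_points if state.get(cell) == label)
    return correct / len(test_points)
```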

Tests were carried out on different data sets to determine the success of the system. The data sets used were obtained from the UCI data repository [9]. Table 1 shows the features of these data sets; as can be seen, data sets with different numbers of attributes and classes were selected.

Table 1 - The data sets used in the experiments

Prior to the classification process, the "Correlation Based Feature Selection" method [10] was applied to the data sets to reduce their dimensionality. For the Iris data set, "petallength" and "petalwidth" were selected, while for the Banknote data set the first and second attributes were selected. For the Glass data set, only the attribute "Si" was removed. The Heart data set originally has 13 attributes, of which 7 were selected by this method: "chest", "resting electrocardiographic result", "maximum heart rate achieved", "exercise induced angina", "oldpeak", "number of major vessels" and "thal". The 14 attributes of the Australian data set were reduced to 5, namely attributes 5, 8, 10, 13 and 14. "Year of operation" and "the number of positive nodes" are the attributes selected for the Haberman data set. Finally, the Breast-wisc data set originally has 9 attributes, none of which were removed during the reduction process. Table 1 gives the number of attributes obtained by the attribute reduction process.

Table 2 gives the performance percentages of the SCA-classification method on the described data sets. These results are the accuracy values obtained on randomly selected test sets; the average performance is computed over 10 different trials, and the best result among these trials is also given in the table.

Table 2 - Success rate of SCA - Classification method

Table 3 compares the results obtained by the SCA-classification method with those of the Naive Bayes classification method, which is accepted as a basic classification algorithm. Naive Bayes has been applied to these data sets in different studies, and the references of these studies are also given in the table. As seen in the table, SCA-classification performs very similarly to this basic approach: the performance difference between the two approaches is less than 5% on all of the data sets used except the Haberman data set.

Table 3 - Comparison of SCA-classification with the Naive Bayes classification method

In conclusion, a new classification method based on cellular automata is presented in this invention. The approach disclosed in Turkish patent application No. 2016/19702, titled "A Method for Solving the Clustering Problem Using Cellular Automata Based on Heat Transfer Process", is adapted in this study so that it also solves the classification problem. Using a cellular automaton, the two rules in Algorithms 1 and 2 and the general method in Algorithm 3 reveal the classes in the data through a process similar to the propagation of heat in a medium.