Comparison of correlated algorithm accuracy Naive Bayes Classifier and Naive Bayes Classifier for heart failure classification

Heart failure (HF) is a health problem with relatively high mortality and morbidity rates in both developed and developing countries, including Indonesia. In 2016, WHO stated that 17.5 million people died from cardiovascular disease, while in 2008, HF represented 31% of patient deaths worldwide. One recent breakthrough for early diagnosis is the use of data mining techniques. In this study, the Correlated Naive Bayes Classifier (C-NBC) and Naive Bayes Classifier (NBC) algorithms were compared to determine which gives the best accuracy on the Heart Failure dataset. The tests showed that the C-NBC algorithm, with an accuracy of 80.6%, outperformed the NBC algorithm, which reached only 67.5%. Based on these results, the C-NBC algorithm can be used to diagnose patients with potential heart failure because it has a high level of accuracy and is categorized as Good Classification.


Introduction
Heart failure (HF) is a major health issue with relatively high mortality and morbidity rates in both developed and developing countries, including Indonesia [1]. In addition, HF is the leading cause of hospital admission, especially among the elderly [2], [3]. Based on diagnosis results, coronary heart disease and heart failure are relatively common in the population aged 15-24 years. Deaths due to cardiovascular disease, especially heart failure, stand at 27%. About 3-20 out of every 1000 people experience this disease. The incidence of heart failure increases with age (100 out of every 1000 people over the age of 60) [4].
Based on the 2013 Health Research of the Ministry of Health of the Republic of Indonesia, the number of heart failure patients based on a doctor's diagnosis was estimated at 0.13%, or 229,696 people, with the largest estimated number of sufferers in East Java Province at 0.19%, or around 54,826 people. Meanwhile, based on the diagnosis of symptoms, the number was estimated at 0.3%, or 530,068 people, with the largest estimated number of sufferers in West Java Province at 0.3%, or 96,487 people [5].
Given the number of patients with heart failure and the importance of a vital organ such as the heart, predicting heart failure has become a priority for doctors. In practice, however, predictions of heart failure-related issues have usually failed to achieve high accuracy [6]. Rapid and accurate prediction of death for people with heart failure is very important for improving patient care and preventing early death [7]. Therefore, a new approach to processing such data is needed in order to produce a health plan that prioritizes disease management approaches to reduce the mortality rate of heart failure patients.
Several related studies have been conducted. First, a study of tuberculosis used the Naive Bayes algorithm based on particle swarm optimization and obtained an accuracy of 92.69% [8]. Second, a study applied the Correlated Naive Bayes Classifier (C-NBC) algorithm to several datasets, including the Balance-Scale, Iris, Haberman, and Servo datasets, and reported an increase in accuracy of 13.3% [9]. In addition, a study on a dataset of Covid-19 patients used various Naive Bayes techniques and reported an average accuracy of 87% [10].

Research Article
Open Access (CC-BY-SA)

Based on the problems described above, one way to reduce the number of deaths caused by heart failure is to make a correct early diagnosis. One method that can be used is data mining. Diagnosing heart failure requires a procedure with the best possible level of accuracy, so this study compares two data mining classification methods, namely the Correlated Naive Bayes Classifier and Naive Bayes Classifier algorithms, to obtain the best accuracy so that they can be used to diagnose potential heart failure effectively and optimally.

Method
The stages of the proposed method for comparing the accuracy of the Correlated Naive Bayes Classifier and Naive Bayes Classifier algorithms for heart failure classification are shown in Figure 1. The dataset in this study was collected at the Krembil Research Institute, Toronto, Canada, and was taken from the UCI Repository. It has 13 attributes, two classes, and 299 records [11].

Pre-Processing Stage
The pre-processing stage is the first data mining process, carried out to obtain good-quality data before classification; one of its steps is data transformation. Data transformation is an essential part of pre-processing for health datasets [12].

Use of Classification Method
One of the tasks of data mining is classification, which is the most widely used data mining technique [13]. In addition, classification can be improved by increasing the number of features in data mining [14]. This study uses the Correlated Naive Bayes Classifier (C-NBC) and Naive Bayes Classifier (NBC) algorithms. The results of the algorithms were evaluated with a confusion matrix, from which the precision, recall, and F-Measure values were calculated [12], as shown in Table 2.

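Table 2. Confusion Matrix
Correct Classification | Classified as + | Classified as -
+ | True positives (TP) | False negatives (FN)
- | False positives (FP) | True negatives (TN)

Precision, recall, and F-Measure are calculated from the confusion matrix following the general formulas in [13].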
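
The precision, recall, and F-Measure values can be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, assuming the standard textbook definitions; the function name `classification_metrics` and the example counts are illustrative and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts only; these are NOT the study's results.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Zero denominators return 0.0 rather than raising an error, which is one common convention; other tools may report an undefined value instead.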

Validation and Evaluation
The validation and evaluation stage measured the accuracy of the models' results using the confusion matrix and the K-fold cross-validation technique.
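
K-fold cross-validation can be sketched in plain Python as follows. This is an illustrative outline, not the procedure run in the Weka application: the helper names `k_fold_indices` and `cross_validated_accuracy`, the `fit` and `predict` callables, and the seed handling are all assumptions, and the sketch assumes `k` does not exceed the number of samples.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for one shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first n % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Average test-fold accuracy; fit and predict are caller-supplied callables."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Fold sizes for a dataset of 299 records split into 10 folds.
folds = list(k_fold_indices(299, k=10))
print(sorted(len(test_idx) for _, test_idx in folds))
```

Calling `cross_validated_accuracy` with different `seed` values and averaging the results gives a repeated, re-randomized evaluation.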

Withdrawal of Conclusion
This stage drew conclusions from the results obtained with the Correlated Naive Bayes Classifier (C-NBC) algorithm and the Naive Bayes Classifier (NBC) algorithm, identifying which provides the more accurate results for classifying heart failure based on the precision, recall, and F-Measure values of each algorithm, with the classification level assigned according to the categories in [15].

A. Pre-Processing Stage
The pre-processing stage consisted of data transformation, which aimed to simplify the calculation of values between the class attributes at the classification stage. The conversion of non-numeric data into numeric data is shown in Table 3.
Table 3. Data transformation of heart failure dataset class
No. | Non-Numeric Data | Numerical Data
    | Survived | 0

At this pre-processing stage, attributes were then identified and adjusted and the heart failure dataset was selected, so that the resulting data were ready for use in the next stage. The heart failure dataset with attributes adjusted for the Weka application is shown in Table 4.
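
The class transformation in Table 3 amounts to a label-to-number mapping. The sketch below is illustrative: only the pair "Survived" = 0 appears in the text, so the second entry ("Died" = 1) and the names `CLASS_CODES` and `encode_class` are assumptions.

```python
# "Survived" -> 0 is taken from Table 3.
# "Died" -> 1 is an ASSUMED label and code for the second class.
CLASS_CODES = {"Survived": 0, "Died": 1}


def encode_class(labels, codes=CLASS_CODES):
    """Map non-numeric class labels to numeric codes, failing loudly on unknown labels."""
    try:
        return [codes[label] for label in labels]
    except KeyError as err:
        raise ValueError(f"Unknown class label: {err.args[0]!r}") from None


encoded = encode_class(["Survived", "Died", "Survived"])
print(encoded)
```

Rejecting unknown labels instead of silently assigning a new code keeps typos in the raw data from creating an unintended third class.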

B. Use of Classification Method
After the pre-processing stage, the study continued to the classification stage, which aimed to produce an accuracy value from the confusion matrix. The Correlated Naive Bayes Classifier (C-NBC) algorithm and the Naive Bayes Classifier (NBC) algorithm were tested on the dataset using 10-fold cross-validation with 20 randomizations. The accuracy results of the algorithms on the heart failure dataset are shown in Table 5. Figure 2 visualizes the comparison of the accuracy of the C-NBC and NBC algorithms on the heart failure dataset based on the test results.

The average accuracy of the NBC algorithm was 76.5%. Of the two algorithms, C-NBC obtained the higher accuracy because it takes into account the R-Square (correlation value) of each dataset attribute with respect to its class. This is in line with the finding that adding correlation parameters to the Naive Bayes Classifier calculation can increase accuracy [16].
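
The per-attribute R-Square mentioned above can be illustrated with a short sketch. It assumes that R-Square means the squared Pearson correlation between a numeric attribute and the numeric class; how C-NBC then combines these values with the Naive Bayes probabilities is not described here, so the sketch stops at computing them. The function names are hypothetical.

```python
def r_squared(values, classes):
    """Squared Pearson correlation between one numeric attribute and a numeric class."""
    n = len(values)
    mean_x = sum(values) / n
    mean_y = sum(classes) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(values, classes))
    var_x = sum((x - mean_x) ** 2 for x in values)
    var_y = sum((y - mean_y) ** 2 for y in classes)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov * cov / (var_x * var_y)


def attribute_r_squared(rows, classes):
    """R-Square of every attribute (column) with respect to the class."""
    return [r_squared(list(column), classes) for column in zip(*rows)]


# Toy data only: two attributes, four records, binary class.
rows = [[1, 5], [2, 5], [3, 5], [4, 5]]
classes = [0, 0, 1, 1]
print(attribute_r_squared(rows, classes))
```

In the toy data the first attribute rises with the class while the second is constant, so the first receives a high R-Square and the second receives zero.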

C. Validation and Evaluation
In this stage, the Correlated Naive Bayes Classifier (C-NBC) algorithm and the Naive Bayes Classifier (NBC) algorithm were validated and evaluated using a confusion matrix and 10-fold cross-validation. After the classification methods were tested, the average accuracy of the two algorithms was obtained, as shown in Table 6. The accuracy results in Table 6 were derived from the confusion matrices produced by testing the heart failure dataset with the C-NBC and NBC algorithms, as shown in Tables 7 and 8.

Conclusion
This study compared the accuracy of the Correlated Naive Bayes Classifier (C-NBC) algorithm and the Naive Bayes Classifier (NBC) algorithm on the Heart Failure Dataset obtained from the UCI Machine Learning Repository. The C-NBC algorithm obtained the better accuracy, 80.6%, compared with the NBC algorithm. Based on these results, the C-NBC algorithm can be used for the diagnosis of heart failure patients because it has a good level of accuracy and is categorized as Good Classification.

Figure 1. Research Stages

The research stages shown in Figure 1 are explained as follows:

1. Data Collection
The dataset in this study was collected from the Krembil Research Institute, Toronto, Canada, and taken from the UCI Repository. The dataset has 13 attributes, two classes, and 299 records [11].

Table 1. Heart Failure Dataset Attributes
No | Attribute Name | Description

Classification levels:
a. Excellent classification = 0.90 - 1.00
b. Good classification = 0.80 - 0.90
c. Fair classification = 0.70 - 0.80
d. Poor classification = 0.60 - 0.70
e. Failure = 0.50 - 0.60
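
The classification levels listed above (Excellent 0.90 - 1.00, Good 0.80 - 0.90, Fair 0.70 - 0.80, Poor 0.60 - 0.70, Failure 0.50 - 0.60) can be expressed as a simple lookup. This is a sketch under two assumptions that the list does not settle: a value on a shared boundary (for example 0.80) is assigned to the higher category, and values below 0.50 have no category. The function name `classification_level` is hypothetical.

```python
def classification_level(accuracy):
    """Return the category name for an accuracy in [0, 1], or None below 0.50."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Lower bounds, checked from the highest category down (ASSUMED boundary rule).
    levels = [
        (0.90, "Excellent classification"),
        (0.80, "Good classification"),
        (0.70, "Fair classification"),
        (0.60, "Poor classification"),
        (0.50, "Failure"),
    ]
    for lower_bound, name in levels:
        if accuracy >= lower_bound:
            return name
    return None  # the listed scale does not cover values below 0.50


print(classification_level(0.806))
```

Under this reading, the reported C-NBC accuracy of 80.6% falls in the Good classification band, consistent with the category stated in the article.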

Figure 2. Comparison of Average Accuracy of C-NBC and NBC Algorithms

Based on Figure 2, the highest average accuracy on the heart failure dataset was obtained with the Correlated Naive Bayes Classifier (C-NBC) algorithm, at 80.6%.


Table 4. Data Pre-processing

Subarkah, et al. (Comparison of correlated algorithm accuracy Naive Bayes Classifier and Naive Bayes Classifier for heart failure classification)

Table 5. Heart Failure Dataset Accuracy Results

Table 6. Correlated Naive Bayes Classifier (C-NBC) Algorithm Accuracy Results and Correlated Naive Bayes

Table 7. Confusion Matrix of the Correlated Naive Bayes Classifier (C-NBC) Algorithm

Table 8 shows the confusion matrix of the Naive Bayes Classifier (NBC) algorithm.