I am getting almost 100% accuracy with my Random Forest classifier

Issue

This content is from Stack Overflow. Question asked by Aditya.

I am building a project that predicts anemia using machine learning. I am using a dataset from Kaggle; after removing duplicates and null values, 534 entries remain. When I train the model and check its accuracy, the Random Forest classifier reports 100% accuracy, which I find unusual.

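import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# df is the cleaned DataFrame (534 rows); "Result" is the target column
y = df['Result']
y.shape
# Use every remaining column as a feature
x = df.drop("Result", axis=1)
x.head()
# Hold out 30% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30)

# Fit a Random Forest with default settings and predict on the test split
random = RandomForestClassifier()
random.fit(x_train, y_train)
ran_pred = random.predict(x_test)
print(ran_pred)
accuracy_scorerf = accuracy_score(y_test, ran_pred)
print("Random Forest :", accuracy_scorerf)

# Plot the confusion matrix (rows: actual values, columns: predictions)
cm = confusion_matrix(y_test, ran_pred)
f, ax = plt.subplots(figsize=(5, 5))
sns.heatmap(cm, fmt=".0f", annot=True, linewidths=0.2, linecolor="purple", ax=ax)
plt.xlabel("Model Predicted")
plt.ylabel("Actual values")
plt.show()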

Output –
[1 0 1 0 0 1 1 1 0 0 1 0 0 0 0 1 1 0 0 0 1 1 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0
1 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0
0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 1 1 1 1 1 1 0 1 1 0 1 0 0 0 1 0 1 0 1
0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1 0 1 0
0 0 1 1 0 0 0 0 0 0 1 1 0]

Random Forest : 1.0

Confusion Matrix Image
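
A single 70/30 split can be lucky, so one way to check whether the 1.0 score is stable is to repeat the evaluation with stratified cross-validation. The following is only a minimal sketch: the data is a synthetic stand-in generated with make_classification (an assumption, since the Kaggle file is not shown here), and x_demo and y_demo are hypothetical names that would be replaced by the real x and y.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the cleaned anemia data (assumption: the real
# x and y from the question would be passed in instead).
x_demo, y_demo = make_classification(
    n_samples=534, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Five stratified folds keep the class ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(random_state=0), x_demo, y_demo, cv=cv, scoring="accuracy"
)

print("Fold accuracies:", np.round(scores, 3))
print("Mean accuracy  :", round(scores.mean(), 3))
```

If every fold on the real data also scores at or near 1.0, the result is probably not an accident of one particular split, and the features themselves deserve a closer look.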

I have checked the dataset, and it is roughly balanced: 247 entries with anemia and 287 without.
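
To put the class counts above in context, the short calculation below works out the accuracy that a model would reach by always predicting the larger class. It uses only the two counts stated in the question; the variable names are illustrative.

```python
# Class counts reported in the question.
n_anemic = 247
n_not_anemic = 287
n_total = n_anemic + n_not_anemic  # 534 rows after cleaning

# Accuracy of a model that always predicts the larger class.
baseline_accuracy = max(n_anemic, n_not_anemic) / n_total
print(f"Majority-class baseline: {baseline_accuracy:.3f}")  # prints 0.537
```

Since this baseline is only about 54%, class imbalance alone cannot explain a score of 1.0.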

I cannot figure out why I am getting such unusually high accuracy.
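
One common cause of a perfect score is a label that was derived from one of the input columns, so that a single threshold on that column reproduces it. Whether that applies to this dataset is not known from the question. The sketch below shows how such a check might look: it scores each column on its own with a depth-1 decision tree. The DataFrame demo and the columns feature_a and feature_b are hypothetical and built so that "Result" depends only on feature_a; with the real data, the question's x and y would take their place.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: "Result" is defined directly from feature_a, which
# mimics a label that was derived from one of the input columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(13.0, 2.0, size=534),
    "feature_b": rng.normal(0.0, 1.0, size=534),
})
demo["Result"] = (demo["feature_a"] < 12.0).astype(int)

features = demo.drop(columns="Result")
target = demo["Result"]

# Score each column on its own with a depth-1 tree (a single threshold).
single_feature_scores = {}
for column in features.columns:
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    scores = cross_val_score(stump, features[[column]], target, cv=5)
    single_feature_scores[column] = scores.mean()

# List the columns from most to least predictive on their own.
for column, score in sorted(single_feature_scores.items(), key=lambda kv: -kv[1]):
    print(f"{column}: {score:.3f}")
```

A column that reaches near-perfect accuracy by itself would suggest the task is close to a lookup of that column, in which case 100% accuracy from a Random Forest would be expected rather than a bug.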



Solution

This question has not been answered yet. Be the first to answer in the comments; the confirmed answer will later be published as the solution.

This question and answer were collected from Stack Overflow and tested by the JTuto community. The content is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
