!pip install pycaret
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv("/content/Wine.csv")
df.head()
Here the first column is the target and the remaining columns are independent variables. At a glance, all of the variables appear to be numeric; let's verify that in the next code block.
#-------Check_for_dataset_information_about_each_column------
df.info()
The output confirms our guess: all the independent variables are numeric. Now let's move on to the second check, missing values. In the next block of code, we will check whether the dataset contains any missing values.
df.isnull().sum()
The above output shows that there are no missing values in the dataset, so no missing-value treatment is required here. Let's move on to the next check, class imbalance. Many machine learning models, linear models in particular, are sensitive to imbalanced datasets and perform poorly on them. To check for class imbalance we will use "countplot" from the seaborn library.
sns.countplot(x='1', data=df)
Looking at the above graph we can conclude that there is no meaningful imbalance in the dataset. The three classes are not exactly the same size, but the difference is small enough that the dataset does not count as imbalanced in the machine learning sense.
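As an optional sanity check that is not part of the original walkthrough, we can also quantify the class sizes directly instead of reading them off the plot; pandas' value_counts does this in one line.
#-------Optional:_quantify_class_counts------
# value_counts() gives the number of examples per class label in the target column
df['1'].value_counts()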
#-------Check_distribution_of_each_independant_variable----------
sns.pairplot(df.iloc[:,0:5],hue = '1')
Looking at the above pair plot we can conclude that the feature distributions differ between examples belonging to different classes. For many of the features, the class-wise centroids are clearly separated from one another, which is a sign that the classes are distinguishable in feature space.
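To back up the centroid observation with numbers, one optional sketch (not a step from the original tutorial) is to group the data by the target column and compare the per-class feature means.
#-------Optional:_compare_per-class_feature_means------
# grouping by the target column gives one row of mean feature values per class
df.groupby('1').mean()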
#-------Lets_build_pycaret_model_to_get_idea_about_ML_model------------
from pycaret.classification import *
model = setup(df,target = "1")
#--------------Compare_models_using_pycaret------------
compare_models()
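Note that compare_models() also returns the top-ranked model object, so if you prefer you can capture it directly rather than only reading the leaderboard; the variable name best below is just an illustrative choice.
#--------------Optional:_capture_the_model_returned_by_compare_models------------
# compare_models() returns the top-ranked estimator; 'best' is an illustrative name
best = compare_models()
print(best)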
target = df['1']
df = df.drop('1',axis=1)
#-----------Split_entire_dataset_into_train_and_validation_sets--------
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(df,target,test_size = 0.20,random_state = 42)
print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)
#---------pickup_best_performing_model_to_build_ml_model-----------
from sklearn.ensemble import ExtraTreesClassifier
model = ExtraTreesClassifier(verbose=1)
model.fit(X_train,y_train)
#--------Implement_model_on_validation_dataset------------------
predictions = model.predict(X_test)
#---------------Check_model_performance--------------------
from sklearn.metrics import classification_report
print(classification_report(y_test,predictions))
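As an optional complement to the classification report (not in the original write-up), a confusion matrix makes the per-class errors easier to see; ConfusionMatrixDisplay comes from scikit-learn's metrics module.
#---------------Optional:_plot_a_confusion_matrix--------------------
from sklearn.metrics import ConfusionMatrixDisplay
# rows are true labels, columns are predicted labels
ConfusionMatrixDisplay.from_predictions(y_test, predictions)
plt.show()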