Problem Statement

The Drug Classification Analysis studies the effect of a particular drug based on a set of patient parameters (Age, Sex, BP, Cholesterol, Na_to_K). The goal is to find an effective model that relates strongly to these parameters and predicts which drug is consumed.

Dataset

The dataset used is the Drug Classification With Different Algorithms from Kaggle.

The dataset has 6 columns. The 5 input features are:

Age: Age of the person (int64).
Sex: Sex of the person (object, categorical: Male or Female).
Cholesterol: Cholesterol level of the person (object, categorical: High or Normal).
Na_to_K: Sodium-to-potassium ratio in the body (float64).
BP: Blood pressure of the person (object, categorical: High, Low, or Normal).

Target Variable:

Drug (object or categorical)

Drug refers to the type of drug consumed (through medication or direct injection).

Type:

A,B,C,X,Y
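As a rough illustration of how the columns above could be turned into numeric feature vectors, here is a minimal sketch using only the standard library. The rows are synthetic and are not taken from the Kaggle file. The exact spellings of the category values and the integer codes in `LEVELS` are assumptions made for this example, and `load_rows` is a hypothetical helper name.

```python
import csv
import io

# Synthetic rows for illustration only; NOT taken from the Kaggle dataset.
SAMPLE_CSV = """Age,Sex,BP,Cholesterol,Na_to_K,Drug
34,F,HIGH,NORMAL,18.2,Y
51,M,NORMAL,HIGH,11.4,C
27,M,HIGH,HIGH,9.7,X
"""

# Assumed integer codes for the categorical columns (a hypothetical choice).
LEVELS = {
    "Sex": {"F": 0, "M": 1},
    "BP": {"LOW": 0, "NORMAL": 1, "HIGH": 2},
    "Cholesterol": {"LOW": 0, "NORMAL": 1, "HIGH": 2},
}


def encode(column, value):
    """Map a categorical value to its assumed integer code."""
    return float(LEVELS[column][value.strip().upper()])


def load_rows(text):
    """Parse CSV text into (feature_vector, drug_label) pairs."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        features = [
            float(record["Age"]),
            encode("Sex", record["Sex"]),
            encode("BP", record["BP"]),
            encode("Cholesterol", record["Cholesterol"]),
            float(record["Na_to_K"]),
        ]
        rows.append((features, record["Drug"]))
    return rows


rows = load_rows(SAMPLE_CSV)
for features, label in rows:
    print(features, label)
```

Integer codes impose an ordering on the categories. That is reasonable for levels such as low, normal, and high, but one-hot encoding may be a better fit for columns without a natural order.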

Model(s) Used

KNN Classifier

This kernel describes the parameters of the KNN algorithm and observes their effect on the result. A first prediction is made with the default parameters and serves as the baseline for comparison. The best value of each parameter is then found individually and its effect on the result is discussed. Finally, a grid search is used to find the best combination of parameter values, so the results can be compared with each other in the conclusion.

KNN classifies a sample in three steps:

i) Calculate distance

ii) Find closest neighbors

iii) Vote for labels
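The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration and not the implementation used in the kernel: it assumes Euclidean distance and an unweighted majority vote, and the training points and the names `euclidean` and `knn_predict` are made up for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict a label for `query` from (features, label) training pairs."""
    # i) Calculate the distance from the query to every training point.
    distances = [(euclidean(features, query), label) for features, label in train]
    # ii) Find the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # iii) Vote: the most common label among the neighbours wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic two-feature training set (illustrative only).
train = [
    ([1.0, 1.0], "X"),
    ([1.2, 0.8], "X"),
    ([0.9, 1.1], "X"),
    ([5.0, 5.0], "Y"),
    ([5.2, 4.9], "Y"),
    ([4.8, 5.1], "Y"),
]

prediction = knn_predict(train, [1.1, 1.0], k=3)
print(prediction)
```

Because the distance is computed on raw feature values, features with large ranges dominate the result, which is one reason scaling matters for KNN.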

Refer

Random Forest

Random forest, or random decision forest, is a supervised machine learning algorithm used for classification, regression, and other tasks. The random forest classifier builds a set of decision trees (DT), each from a randomly selected subset of the training set, and then collects the votes of the individual trees to decide the final prediction.

Within each tree, splits are chosen to reduce the entropy of the system, which leads to the best classification. (MSE is the corresponding split criterion for regression trees.)
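To make the entropy reduction and the voting step more concrete, here is a small sketch in plain Python. It only shows how one candidate split is scored by information gain and how tree votes are combined. It does not build trees or sample subsets, and the function names and labels are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def forest_vote(tree_predictions):
    """Final prediction: the label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Synthetic labels at one tree node (illustrative only).
parent = ["X", "X", "Y", "Y"]
gain_perfect = information_gain(parent, ["X", "X"], ["Y", "Y"])
gain_useless = information_gain(parent, ["X", "Y"], ["X", "Y"])
print(gain_perfect, gain_useless)

# Three hypothetical trees vote on one sample.
print(forest_vote(["A", "B", "A"]))
```

A split that separates the classes completely removes all of the parent's entropy, while a split that leaves both children as mixed as the parent gains nothing.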

Refer

SVM Classifier

Support Vector Machines

Support Vector Machines (SVM) are generally considered a classification approach, but they can be employed in both classification and regression problems. They can handle multiple continuous and categorical variables, once the categorical ones are encoded numerically. SVM constructs a hyperplane in multidimensional space to separate the classes. The optimal hyperplane is found iteratively by minimizing an error. The core idea of SVM is to find the maximum marginal hyperplane (MMH) that best divides the dataset into classes.

i) Generate hyperplanes that segregate the classes in the best way. The left-hand figure shows three hyperplanes: black, blue, and orange. The blue and orange hyperplanes have a higher classification error, while the black one separates the two classes correctly.

ii) Select the hyperplane with the maximum segregation from the nearest data points on either side, as shown in the right-hand figure.
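The selection rule in the two steps above can be illustrated by scoring a few hand-picked candidate hyperplanes by their margin. This is only a sketch of the idea. A real SVM solves an optimization problem instead of comparing a fixed list, and the points, the candidate hyperplanes, and the function names here are all made up for the example.

```python
import math


def signed_distance(w, b, point):
    """Signed distance from `point` to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def margin(w, b, points, labels):
    """Smallest label-signed distance; negative if any point is misclassified."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))


# Synthetic, linearly separable data with labels -1 and +1 (illustrative only).
points = [[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]]
labels = [-1, -1, 1, 1]

# Hypothetical candidate hyperplanes given as (w, b).
candidates = {
    "diagonal": ([1.0, 1.0], -5.5),    # separates the classes with a wide margin
    "horizontal": ([0.0, 1.0], -1.5),  # separates the classes with a narrow margin
    "vertical": ([1.0, 0.0], -4.5),    # misclassifies the point (4, 4)
}

for name, (w, b) in candidates.items():
    print(name, round(margin(w, b, points, labels), 3))

# Keep the candidate whose nearest point is farthest away.
best = max(candidates, key=lambda name: margin(*candidates[name], points, labels))
print(best)
```

The points that sit closest to the chosen hyperplane are the ones that determine the margin, which is why they are called support vectors.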

Refer

Future Work

Improve the data cleaning methods by applying standardised scaling to the non-object (numeric) variables.
Merge in more variables for analysis (e.g. medication consumption rate, other mineral components consumed).
Check for multicollinearity between parameters to assess their significance.
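As a possible starting point for the scaling and multicollinearity items above, the sketch below standardises a numeric column and computes the Pearson correlation between two columns in plain Python. The values are synthetic, the function names are illustrative, and a pairwise correlation is only a first check (a fuller analysis might use variance inflation factors).

```python
import math


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation.

    Assumes the values are not all identical, otherwise the standard
    deviation is zero and the division fails.
    """
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def pearson(xs, ys):
    """Pearson correlation: the mean product of the standardised values."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Synthetic values for two numeric columns (illustrative only).
ages = [23.0, 47.0, 31.0, 60.0, 38.0]
na_to_k = [25.1, 13.0, 18.4, 9.8, 15.2]

scaled_ages = standardize(ages)
r = pearson(ages, na_to_k)
print([round(v, 3) for v in scaled_ages])
print(round(r, 3))
```

A correlation close to +1 or -1 between two input features suggests that they carry overlapping information, which is the situation a multicollinearity check is meant to flag.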

Tashmoy966/Task1_Classification_Model_Drug_Classification
