Building a Random Forest Classifier using Scikit-learn in Python.

Aug 2, 2020

A simple machine learning algorithm.

A random forest classifier builds a set of decision trees, each trained on a randomly selected subset of the training set. It then aggregates the votes of the individual trees to decide the final class of a test object.
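To make the idea concrete, here is a minimal from-scratch sketch, assuming NumPy and scikit-learn's `DecisionTreeClassifier`. Each tree is trained on a bootstrap sample (rows drawn at random with replacement) and the trees' predictions are combined by a simple majority vote. The helper names `fit_bagged_trees` and `predict_by_vote` are hypothetical, and `max_features="sqrt"` is an added assumption that mimics how random forests also randomize the features considered at each split. Note that scikit-learn's own `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Hypothetical helper: train n_trees trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: n_samples row indices drawn with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        # Assumption: max_features="sqrt" so each split sees a random subset of features.
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_by_vote(trees, X):
    """Hypothetical helper: each tree casts one vote per sample; the most frequent class wins."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    return np.array([np.bincount(column).argmax() for column in votes.T])


X, y = load_iris(return_X_y=True)
trees = fit_bagged_trees(X, y)
pred = predict_by_vote(trees, X)
print("training accuracy:", (pred == y).mean())
```

The vote is a plain majority count because that is what the paragraph describes; swapping it for an average of `predict_proba` outputs would match scikit-learn's behaviour instead.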

In this example, we use the Iris flower dataset to train and test the model. The dataset ships with the scikit-learn library, so there is nothing to download.
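A quick look at the data, as a sketch using the same `load_iris` loader the code below relies on: the dataset has 150 flowers, four measurements per flower, and 50 flowers of each species.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
print(X.shape)         # (150, 4): 150 flowers, 4 measurements each
print(np.bincount(y))  # [50 50 50]: 50 flowers per species
```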

Code is as follows:

#importing libraries

#importing Scikit-learn library and datasets package
from sklearn import datasets

#Loading the iris plants dataset (classification)
iris = datasets.load_iris()

#checking the class (species) names and the feature names
print(iris.target_names)

#OUTPUT: ['setosa' 'versicolor' 'virginica']

print(iris.feature_names)

#OUTPUT: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

#loading the data as plain arrays: X holds the four measurements, y the species labels (0, 1, 2)
X, y = datasets.load_iris(return_X_y=True)

#splitting the arrays into random train and test subsets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.70)

#(test_size=0.70 means 30% of the data is used for training and 70% for testing)
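Since `test_size=0.70` sends 70% of the rows to the test set, the model above learns from only 45 of the 150 flowers. The sketch below prints those split sizes and, for comparison, a more conventional 70/30 split with `stratify=y` so each species keeps its one-third share in both parts. The `random_state=0` argument is an addition, not in the original code, used only to make the split repeatable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# The split used in the code above: 70% of the rows go to the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.70, random_state=0)
print(len(X_tr), len(X_te))  # 45 105

# A conventional 70/30 split, stratified so every species is equally represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
print(len(X_train), len(X_test))  # 105 45
print(np.bincount(y_test))        # [15 15 15]
```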

#importing required libraries

#importing the random forest classifier from the ensemble module
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

#putting the data in a DataFrame for inspection (the model itself is trained on the arrays above)
data = pd.DataFrame({'sepallength': iris.data[:, 0], 'sepalwidth': iris.data[:, 1], 'petallength': iris.data[:, 2], 'petalwidth': iris.data[:, 3], 'species': iris.target})

#checking the first 5 rows of the iris data

print(data.head())
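The `species` column holds integer codes (0, 1, 2). As a sketch, the codes can be mapped back to names by indexing `iris.target_names` with `iris.target`. The `species_name` column is an illustrative addition, and the DataFrame is rebuilt here from `iris.data` so the snippet runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
data = pd.DataFrame(iris.data,
                    columns=["sepallength", "sepalwidth", "petallength", "petalwidth"])
data["species"] = iris.target
# NumPy fancy indexing: each code 0/1/2 picks the matching entry of target_names.
data["species_name"] = iris.target_names[iris.target]

# Average of each measurement per species.
print(data.groupby("species_name").mean())
```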

#creating a random forest (RF) classifier with 100 trees

clf = RandomForestClassifier(n_estimators=100)

#training the model: fit() builds the trees from the training features and labels
clf.fit(X_train, y_train)
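After `fit`, the forest exposes the trained trees and a per-feature importance score. The sketch below fits on the full dataset with an added `random_state=0` (neither is in the original code) purely to show these fitted attributes: `estimators_` holds the 100 trees and `feature_importances_` holds impurity-based importances that sum to 1.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
# random_state=0 is an assumption added so the results repeat between runs.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

print(len(clf.estimators_))  # 100 fitted decision trees
# One importance score per feature; larger means the feature drove more of the splits.
for name, score in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```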

#performing predictions on the test dataset
y_pred = clf.predict(X_test)
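How `predict` combines the trees: in scikit-learn, the forest averages the class probabilities from its trees (`predict_proba`) and returns the class with the highest average. The sketch below reproduces that average by hand from `clf.estimators_`; the 70/30 split and `random_state=0` are assumptions for the example, not the original code's settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Average the per-tree class probabilities by hand.
per_tree = np.stack([tree.predict_proba(X_test) for tree in clf.estimators_])
avg_proba = per_tree.mean(axis=0)  # shape (n_test_samples, n_classes)
print(np.allclose(avg_proba, clf.predict_proba(X_test)))  # True

# predict() picks the class with the highest averaged probability.
manual_pred = clf.classes_[clf.predict_proba(X_test).argmax(axis=1)]
print(np.array_equal(manual_pred, clf.predict(X_test)))  # True
```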

print()

#using the metrics module to compute accuracy (the fraction of test samples classified correctly)

from sklearn import metrics
print("ACCURACY OF THE MODEL:", metrics.accuracy_score(y_test, y_pred))

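#OUTPUT: ACCURACY OF THE MODEL: 0.9238095238095239 (the exact value varies between runs, since neither the split nor the forest uses a fixed random_state)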
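With 105 flowers in the test set, the accuracy shown corresponds to 97 correct predictions (97/105 ≈ 0.9238). Because the split and the forest are both random, a single run is a noisy estimate. The sketch below first checks that accuracy is simply the fraction of matching labels, then uses 5-fold cross-validation (`cross_val_score`) to score the same kind of model on five different partitions of the data; `random_state=0` is an added assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

# Accuracy = number of matching labels / number of labels.
y_true = np.array([0, 1, 2, 2])
y_hat = np.array([0, 2, 2, 2])
print(accuracy_score(y_true, y_hat), np.mean(y_true == y_hat))  # 0.75 0.75

# 5-fold cross-validation: train and score on 5 different train/test partitions.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(scores.round(3), scores.mean().round(3))
```

Reporting the mean (and spread) of the five scores gives a steadier picture of the model than the single number from one random split.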


Written by Amandeep