Machine Learning: Classification using Python and Oracle ATP

Continuing from the last article, in which we created a Jupyter Notebook and used Python to connect to an Oracle Autonomous Transaction Processing (ATP) Database instance, it's now time to run a classification using the Scikit-Learn machine learning library.

This is a simple demonstration using the Iris dataset; in the near future I intend to show a more realistic use case.

You can download the notebook here: https://github.com/waslleysouza/oracle_autonomous_jupyter/blob/master/atp_classification.ipynb.

Start Jupyter Notebook and open the downloaded notebook.
First, install the required Python libraries for machine learning.

# Install required ML libraries
!pip install scikit-learn xgboost

Import the required Python libraries for machine learning.

# Import required ML libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn import tree
import xgboost as xgb

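Separate features and labels into different variables, and print the first three samples of each. Here df is assumed to be the pandas DataFrame holding the Iris data loaded from the database in the previous article, with the four feature columns first and the label in the fifth column.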

X = df.iloc[:, 0:4]
y = df.iloc[:, 4]

# A notebook cell only displays its last expression, so print both explicitly
print(X.head(3))
print(y.head(3))
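If you want to try the remaining steps without a database connection, the sketch below builds a stand-in DataFrame from the Iris copy bundled with Scikit-Learn. The name df_demo and the column names are assumptions made for illustration; the table loaded from ATP in the previous article may use different names, but the positional slicing works the same way as long as the label is the fifth column.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for the DataFrame loaded from the database (hypothetical column names)
iris = load_iris()
df_demo = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
df_demo['species'] = [iris.target_names[i] for i in iris.target]

# Same positional split as in the article: first four columns are features,
# the fifth column is the label
X_demo = df_demo.iloc[:, 0:4]
y_demo = df_demo.iloc[:, 4]

print(X_demo.shape, y_demo.shape)
print(y_demo.value_counts())
```

Assigning df_demo to df would let you follow the rest of the article without the database.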

Split the data into training and test datasets.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42) 
print('Training set = {} samples, Test set = {} samples'.format(X_train.shape[0], X_test.shape[0]))
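As an optional variation, train_test_split accepts a stratify argument that keeps the class proportions the same in both subsets. The sketch below uses the Iris copy bundled with Scikit-Learn rather than the database table, so the variable names ending in _demo are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled Iris data: 150 samples, 50 per class, integer labels 0..2
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo keeps the per-class ratio identical in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print('Training set = {} samples, Test set = {} samples'.format(X_tr.shape[0], X_te.shape[0]))
print('Test samples per class:', np.bincount(y_te))
```

With a small dataset such as Iris, stratifying avoids a test set in which one species happens to be under-represented.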

Scale the data.

sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
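To make the scaling step concrete, here is a small sketch on synthetic data (not the Iris table) showing what StandardScaler does: it learns the mean and standard deviation of each column from the training set only, then applies those same statistics to the test set. All names ending in _demo are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 4 features, roughly mean 5 and standard deviation 2
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(30, 4))

# Fit on the training data only, then transform both sets
sc_demo = StandardScaler().fit(X_train_demo)
X_train_demo_std = sc_demo.transform(X_train_demo)
X_test_demo_std = sc_demo.transform(X_test_demo)

# The same transformation written by hand, using the training statistics
manual = (X_test_demo - X_train_demo.mean(axis=0)) / X_train_demo.std(axis=0)

print(np.allclose(X_train_demo_std.mean(axis=0), 0.0))  # training columns are centered
print(np.allclose(X_train_demo_std.std(axis=0), 1.0))   # and have unit variance
print(np.allclose(manual, X_test_demo_std))             # test set reuses training statistics
```

Fitting the scaler on the training set alone keeps information from the test set out of the model. Tree-based models such as the ones used below are largely insensitive to feature scaling, so this step matters more if you later swap in a distance-based or linear classifier.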

Now it's time for the fun part!
There are many machine learning algorithms that we can use for classification. One example is the Decision Tree.

# Create object
decision_tree = tree.DecisionTreeClassifier(criterion='gini')

# Train DT based on scaled training set
decision_tree.fit(X_train_std, y_train)

# Print performance
print('The accuracy of the Decision Tree classifier on training data is {:.2f}'.format(decision_tree.score(X_train_std, y_train)))
print('The accuracy of the Decision Tree classifier on test data is {:.2f}'.format(decision_tree.score(X_test_std, y_test)))
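A single train/test split can give an optimistic or pessimistic score depending on which samples land in the test set. The cross_val_score function imported earlier offers a more stable estimate. The following is a minimal sketch, assuming the Iris copy bundled with Scikit-Learn instead of the database table; wrapping the scaler and the classifier in a pipeline is a design choice made here so that scaling is refit inside each fold.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled Iris data used as a stand-in for the table loaded from the database
X_demo, y_demo = load_iris(return_X_y=True)

# The pipeline refits the scaler on the training part of every fold
pipeline = make_pipeline(
    StandardScaler(),
    tree.DecisionTreeClassifier(criterion='gini', random_state=42),
)

# 5-fold cross-validation; each entry is the accuracy on one held-out fold
scores = cross_val_score(pipeline, X_demo, y_demo, cv=5)

print('Fold accuracies:', scores)
print('Mean accuracy: {:.2f} (+/- {:.2f})'.format(scores.mean(), scores.std()))
```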

Or XGBoost.

# Create object
xgb_clf = xgb.XGBClassifier()

# Train XGBoost based on scaled training set
xgb_clf.fit(X_train_std, y_train)

# Print performance 
print('The accuracy of the XGBoost classifier on training data is {:.2f}'.format(xgb_clf.score(X_train_std, y_train)))
print('The accuracy of the XGBoost classifier on test data is {:.2f}'.format(xgb_clf.score(X_test_std, y_test)))
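One caveat, depending on your XGBoost version: recent releases expect class labels to be integers starting at 0, so fitting may fail if the label column holds species names as strings. The sketch below shows one way to handle that with Scikit-Learn's LabelEncoder. The sample labels and the names y_demo and le are made up for illustration; whether your label column needs this depends on how it is stored in the database.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels, as they might come from a species column
y_demo = ['setosa', 'versicolor', 'virginica', 'setosa']

# Map each distinct label to an integer (classes are sorted alphabetically)
le = LabelEncoder()
y_encoded = le.fit_transform(y_demo)

print(y_encoded.tolist())
print(le.classes_.tolist())

# Convert integer predictions back to the original names
print(le.inverse_transform([2, 0]).tolist())
```

In the notebook, you would fit the classifier on the encoded training labels and call inverse_transform on its predictions to recover the species names.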

Good job!
In this article, you learned how to use machine learning algorithms to classify data from an Oracle Autonomous Transaction Processing Database instance through Jupyter Notebook.

Have a good time!

Author: Waslley Souza

Oracle consultant focused on Oracle Fusion Middleware and SOA technologies. Certified in Oracle WebCenter Portal, Oracle ADF, and Java.

One thought on “Machine Learning: Classification using Python and Oracle ATP”

  1. Great Notebooks, showing how to define, connect, populate and retrieve data from the Autonomous DB… Instead of defaulting to using SQLite !
    Thanks for sharing !
    JP
