Continuing the last article, in which we created a Jupyter Notebook and used Python to connect to an Oracle Autonomous Transaction Processing Database instance, it's now time to run a classification using the machine learning library Scikit-Learn.
This is a simple demonstration using the Iris dataset; in the near future I intend to show a more realistic use case.
You can download the notebook here: https://github.com/waslleysouza/oracle_autonomous_jupyter/blob/master/atp_classification.ipynb.
Start the Jupyter Notebook and open the notebook.
First, install the required Python libraries for machine learning.
# Install required ML libraries
!pip install scikit-learn xgboost
Import the required Python libraries for machine learning.
# Import required ML libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn import tree
import xgboost as xgb
Separate features and labels into different variables, and print the first three samples of each.
X = df.iloc[:, 0:4]
y = df.iloc[:, 4]
X.head(3)
y.head(3)
Split the data into training and test datasets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print('Training set = {} samples, Test set = {} samples'.format(X_train.shape[0], X_test.shape[0]))
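For a small dataset like Iris, it can also help to preserve the class balance between the two splits; train_test_split accepts a stratify argument for that. A minimal sketch, using sklearn's bundled Iris data rather than the ATP-backed df as an assumption, so it runs on its own:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the same class proportions in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
print('Training set = {} samples, Test set = {} samples'.format(
    X_train.shape[0], X_test.shape[0]))
```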
Scale the data.
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
Now is the time for fun!
There are many machine learning algorithms we can use for classification, such as the Decision Tree.
# Create object
decision_tree = tree.DecisionTreeClassifier(criterion='gini')

# Train DT based on scaled training set
decision_tree.fit(X_train_std, y_train)

# Print performance
print('The accuracy of the Decision Tree classifier on training data is {:.2f}'.format(decision_tree.score(X_train_std, y_train)))
print('The accuracy of the Decision Tree classifier on test data is {:.2f}'.format(decision_tree.score(X_test_std, y_test)))
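Once fitted, the classifier can also label new measurements with predict. A minimal sketch of that step, using sklearn's bundled Iris data instead of the ATP-backed df as an assumption, so the example is self-contained:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import tree

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit the scaler on the training set only, as in the article
sc = StandardScaler().fit(X_train)
clf = tree.DecisionTreeClassifier(criterion='gini')
clf.fit(sc.transform(X_train), y_train)

# Predict the classes of the first three test samples (scale them first)
print(clf.predict(sc.transform(X_test[:3])))
```

Remember that anything passed to predict must go through the same scaler that transformed the training data.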
Or XGBoost.
# Create object
xgb_clf = xgb.XGBClassifier()

# Train XGBoost based on scaled training set
xgb_clf = xgb_clf.fit(X_train_std, y_train)

# Print performance
print('The accuracy of the XGBoost classifier on training data is {:.2f}'.format(xgb_clf.score(X_train_std, y_train)))
print('The accuracy of the XGBoost classifier on test data is {:.2f}'.format(xgb_clf.score(X_test_std, y_test)))
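The cross_val_score helper imported earlier is not used above; it gives a less split-dependent accuracy estimate than a single train/test split. A quick sketch of how it could be applied, shown here on sklearn's bundled Iris data rather than the ATP-backed df as an assumption:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn import tree

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: train and score on 5 different splits
clf = tree.DecisionTreeClassifier(criterion='gini', random_state=42)
scores = cross_val_score(clf, X, y, cv=5)
print('Cross-validation accuracy: {:.2f} (+/- {:.2f})'.format(
    scores.mean(), scores.std()))
```

The mean and standard deviation across folds give a better feel for how stable the model's accuracy is than the single test-set score printed above.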
Good job!
In this article, you learned how to use some machine learning algorithms to classify data from an Oracle Autonomous Transaction Processing Database instance through a Jupyter Notebook.
Have a good time!