Saving Machine Learning model and containerizing it to predict future events

Abhishek Sharma

4 min read · May 27, 2021

MLOps Task 01

Task Description 📄

👉 Pull the CentOS container image from Docker Hub and create a new container.

👉 Install Python on top of the Docker container.

👉 In the container, copy or create the machine learning model that you built in Jupyter Notebook.

In this task, I will be using the ‘SalaryData.csv’ dataset to train my machine learning model. This dataset contains 30 records and 2 fields: ‘YearsExperience’ and ‘Salary’.

I will create a machine learning model that predicts salary based on years of experience. Then I will save this model in the ‘salary.pk1’ file, so that whenever anyone needs to predict a salary in the future, they don’t need to train the model again. They just need to load the ‘salary.pk1’ file.

After saving the model, I will deploy it in a Docker container and predict the salary of a person based on years of experience.

So, let’s get started with the task.

First, I will install Anaconda, one of the most popular Python distributions, on my RHEL8 virtual machine. (Refer to this link to install it on RHEL8.)

After Anaconda is installed successfully, run the command given below to launch Jupyter Notebook.

jupyter notebook --allow-root

After that, I created a new Python notebook to create and train my ML model.

First, I imported pandas to read the ‘SalaryData.csv’ file and stored it in the ‘dataset’ variable.

dataset.head( ): Returns the first 5 records.

Here ‘YearsExperience’ is our independent feature, so I stored it in the ‘x’ variable.

‘Salary’ is our dependent variable, the target we want to predict, so I stored it in the ‘y’ variable.
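The loading steps described above can be sketched roughly as follows. Since ‘SalaryData.csv’ itself is not reproduced in this post, the sketch reads a few made-up rows from an in-memory string that only mimics the file’s two columns. The values are placeholders, not the real data.

```python
import io

import pandas as pd

# Stand-in for SalaryData.csv: made-up rows with the same two column names.
# In the notebook this would be: dataset = pd.read_csv("SalaryData.csv")
csv_text = """YearsExperience,Salary
1.0,40000
2.0,45000
3.0,50000
4.0,55000
5.0,60000
6.0,65000
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 records by default.
print(dataset.head())

x = dataset["YearsExperience"]  # independent feature (a 1D Series)
y = dataset["Salary"]           # dependent variable, the target to predict
```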

As ‘y’ is continuous, I imported LinearRegression from the sklearn.linear_model module to train my model.

model = LinearRegression( ): Creates an empty, untrained model.

Before fitting the model, ‘x’ must be a 2D array, but in our case it is a 1D array. I used reshape( ) to convert it into a 2D array.
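To make the reshape step concrete, here is a small sketch with a made-up 1D NumPy array. The call reshape(-1, 1) means “as many rows as needed, exactly one column”. With a pandas Series, the same idea would be applied to its underlying array, for example x.values.reshape(-1, 1).

```python
import numpy as np

# Made-up years-of-experience values, stored as a 1D array.
x = np.array([1.0, 2.0, 3.0, 4.0])
print(x.shape)     # (4,)

# scikit-learn expects features shaped as (n_samples, n_features).
x_2d = x.reshape(-1, 1)
print(x_2d.shape)  # (4, 1)
```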

model.fit(x, y): Trains the model on the past data.

model.predict( ): Predicts the salary of a person based on years of experience.
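A minimal sketch of the fit and predict calls is shown below. The numbers are made up and lie on a perfectly straight line (a 5000 increase per year), so the fitted model is easy to reason about. The real dataset will give different coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: salary rises by 5000 per year of experience.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)  # 2D: (n_samples, 1)
y = np.array([40000.0, 45000.0, 50000.0, 55000.0, 60000.0])

model = LinearRegression()  # untrained model
model.fit(x, y)             # learn slope and intercept from the data

# predict() also expects a 2D input: one row per person, one column per feature.
predicted = model.predict([[7.0]])
print(predicted)
```

Because the placeholder data is exactly linear, the prediction for 7 years should come out very close to 70000.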

Finally, I imported the joblib module and used dump( ) to store my model in the ‘salary.pk1’ file, so that in the future I don’t need to train the model again to predict salary.
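Saving and reloading could look roughly like this. The sketch assumes the standalone joblib package is installed (older scikit-learn releases also shipped it as sklearn.externals.joblib), trains on made-up numbers, and writes to a temporary directory rather than the notebook’s working directory.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Small model trained on made-up data, just to have something to save.
model = LinearRegression().fit(
    np.array([[1.0], [2.0], [3.0]]),
    np.array([40000.0, 45000.0, 50000.0]),
)

# Write the model to disk; a temp directory keeps the sketch self-contained.
model_path = os.path.join(tempfile.mkdtemp(), "salary.pk1")
joblib.dump(model, model_path)

# Later (or on another machine), load it back without retraining.
restored = joblib.load(model_path)
same = np.allclose(model.predict([[4.0]]), restored.predict([[4.0]]))
print(same)
```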

The second half of the task is to containerize the saved model and create a Python file to predict salary. For this, I will be using Docker as my container engine.

To check whether Docker is installed on the system, run:

docker info

Since no error came up, Docker is installed on my system.

To check whether Docker is running on the system, use:

systemctl status docker

If you find Docker is inactive, run:

systemctl enable --now docker

The ‘--now’ flag starts the Docker engine immediately, and ‘enable’ makes it start automatically on every boot.

To download the latest version of the CentOS image from hub.docker.com and store it on your system, use:

docker pull centos:latest

In my case, I had already downloaded the image, so it shows ‘image is up to date’.

To launch a CentOS Docker container named ‘ML_prediction’ with an interactive terminal, run:

docker run -it --name=ML_prediction centos:latest

I have successfully launched my container and I am now inside it.

With Python 3 installed in the container, I installed scikit-learn using pip3. I pinned the version (pip3 install scikit-learn==0.19.2) because my scikit-learn version while training the model was 0.19.2. If I install any other version, a version error comes up when the model is loaded.
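One way to catch a version mismatch early is to compare the installed scikit-learn version with the one used for training before loading the model. This is only a sketch: the TRAINED_WITH constant is a name introduced here for illustration, set to the version mentioned above.

```python
import sklearn

# Version the model was trained with, as stated in this post.
TRAINED_WITH = "0.19.2"

installed = sklearn.__version__
if installed != TRAINED_WITH:
    print(
        f"Warning: model was trained with scikit-learn {TRAINED_WITH}, "
        f"but {installed} is installed"
    )
else:
    print("scikit-learn version matches the training environment")
```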

Now I will transfer ‘salary.pk1’ from my host to the container. The command has this form:

docker cp /hostfile (container_id):/(to_the_place_you_want_the_file_to_be)

Copying ‘salary.pk1’ from the host to the container.

Checking whether the file was copied successfully.

I created a Python file named ‘pred.py’ to predict salary.
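The exact contents of ‘pred.py’ are not reproduced here, so the following is only a sketch of what such a script might look like. The predict_salary helper and the command-line argument handling are assumptions made for illustration, and the script assumes the standalone joblib package is available in the container.

```python
import sys

import joblib


def predict_salary(model_path, years):
    """Load the saved model and return the predicted salary for `years`."""
    model = joblib.load(model_path)
    # The model expects 2D input: one row (person), one column (feature).
    return float(model.predict([[years]])[0])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage inside the container: python3 pred.py 4.5
    years = float(sys.argv[1])
    print("Predicted salary:", predict_salary("salary.pk1", years))
```

Keeping the loading and prediction in one small function makes it easy to reuse the same logic from another script instead of retraining.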

That’s it. I have successfully completed my MLOps task 1. Hope you enjoyed reading the blog.

Thank you.
