Machine Learning on Amazon Machine Learning

By Jugal Gandhi

Amazon Machine Learning is a fully automated, easy-to-use service. It automatically chooses an appropriate machine learning algorithm for you and trains the model. It places minimal responsibility in the user’s hands and returns results quickly, without much hassle.

Amazon Machine Learning is a good starting point for beginners who want to learn how machine learning works. Machine learning problems fall into two main types: unsupervised and supervised. Amazon Machine Learning only offers solutions for supervised problems, which are the ones where each row in the dataset has a target label to predict.

Go here and create your free AWS account. Amazon Machine Learning is one of the many services that AWS provides.

To get started with Amazon Machine Learning, go through this video.

In this blog post, we shall learn how to perform linear regression on a house price prediction dataset using the Amazon Machine Learning service.

Exploring Amazon Machine Learning Service

First, log in to your account.

Once you log in, go to the top black bar, click Services, and type Machine Learning. In the drop-down, select Machine Learning. This opens the Amazon Machine Learning console, which looks like this:

intro_items

As you can see, there are four objects:

  1. Datasource: Your input data.
  2. ML Model: Machine learning model chosen by Amazon. This machine learning model is trained using a training datasource.
  3. Evaluation: Uses your test data to gauge the performance of your machine learning model.
  4. Batch Prediction: Predicts values for a batch of data based on the rules the model learned from the training datasource.

But before we do any of this, we have to upload our dataset to our AWS account.

Upload your dataset

As mentioned in the Amazon Machine Learning documentation, Amazon Machine Learning can read your dataset from three different sources: (a) one or more files in Amazon S3, (b) the results of an Amazon Redshift query, or (c) the results of a query on an Amazon Relational Database Service (RDS) database.

In our case, we will upload our dataset to our Amazon S3 bucket.

In this exercise, we are going to load a public dataset hosted by Kaggle. You can find and download this dataset here. Download it and store it on your local drive. This dataset contains house sale prices for King County. It consists of 21 columns and 21613 rows. Of these 21 columns, the ‘price’ column is the label (the one we are trying to predict), and the remaining 20 columns are the features (independent predictors). Each row in the dataset is called an observation or data point.
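
To make the label/feature split concrete, here is a minimal Python sketch using only the standard library. The two sample rows are made up for illustration and show only a few columns; the column names other than `price` are my assumption about the file's header.

```python
import csv
import io

LABEL = "price"  # the target column named in the text


def split_features_and_label(csv_file, label=LABEL):
    """Return (feature_rows, labels) from an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    feature_rows, labels = [], []
    for row in reader:
        labels.append(float(row.pop(label)))  # the value we want to predict
        feature_rows.append(row)              # everything else is a feature
    return feature_rows, labels


# Made-up two-row sample with only a few columns, for illustration only.
sample = io.StringIO(
    "id,price,bedrooms,zipcode\n"
    "1,300000,3,98001\n"
    "2,450000,4,98002\n"
)
features, labels = split_features_and_label(sample)
print(len(features), "observations,", len(features[0]), "features each")
```

With the downloaded file, you would pass `open(path, newline="")` instead of the in-memory sample, where `path` is wherever you saved the .csv.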

Once you have the dataset on your local drive, go back to your AWS console and follow these steps:

  • STEP 1: Go to the top black bar, click Services, and type S3. In the drop-down, select S3. This opens the S3 console. S3 is a simple cloud storage service offered by Amazon.

    s3_service

  • STEP 2: Click the Create button. Give your bucket a name and choose your nearest hosting region. Remember that bucket names must be globally unique, so you have to choose a different name than the one shown below. At the end, click Next.

    create_bucket

  • STEP 3: Keep clicking Next and, at the end, click Create Bucket. For now, we are not going to set any properties or permissions for the bucket; we shall keep the default settings. This creates a new bucket with the name you specified, and you will find it in your list of buckets.

    bucket_list

  • STEP 4: Now click your bucket, click the Upload button, and in the Select Files window click Add Files to upload the downloaded .csv file from your local drive to Amazon S3. Don’t change any default properties; keep clicking Next until the end.

    upload_file
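
The same upload can also be scripted instead of clicked through. The sketch below is one possible way to do it with the boto3 library, assuming boto3 is installed, AWS credentials are configured, and the bucket already exists. The bucket and file names are placeholders, not values from this post.

```python
def s3_uri(bucket, key):
    """Build the s3:// location string for an object."""
    return "s3://{}/{}".format(bucket, key)


def upload_dataset(s3_client, local_path, bucket, key):
    """Upload one local file to an existing bucket and return its S3 URI."""
    s3_client.upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)


def main():
    # Requires `pip install boto3` and configured AWS credentials.
    import boto3

    s3 = boto3.client("s3")
    # Hypothetical names: replace with your own file and bucket.
    location = upload_dataset(
        s3, "house_data.csv", "my-unique-bucket-name", "house_data.csv"
    )
    print("Uploaded to", location)
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at a real account. Call `main()` yourself once the names are filled in.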

Create Data Source

Now we have our dataset uploaded to Amazon S3. Go back to your AWS Machine Learning Console. We can now create a new datasource with the following steps:

  • STEP 1: Click the Create New… drop-down menu and select Datasource. This opens the Create Datasource window.

  • STEP 2: Input Data pane. Under location select S3. Specify your bucket location and give your own Datasource name. Then click Verify.

    input_data

    This verifies a few permissions. If asked to grant any permission, click Yes. Validation is performed next. Once the validation is successful, click Continue.

  • STEP 3: Schema pane. At “Does the first line in your CSV contain the column names?” choose “Yes”. You can now see your columns listed. Here, each row represents one column of your dataset, so we have 21 rows representing the 21 columns of the dataset. You can also view each column’s data type along with its name. Change the data type of ‘grade, waterfront, view, bedrooms, condition and zipcode’ to categorical.

    schema

  • STEP 4: Target pane. At “Do you plan to use this dataset to create or evaluate an ML model?” choose “Yes”. We do this because we are going to use the same dataset to evaluate the trained machine learning model. Next, you will be asked to choose a target. Under the Target column, select price, as price is what we are trying to predict here.

    target

    Once you select price, Amazon automatically detects that price is a numeric attribute and shows you that this is a linear regression problem. Cool! Automation! Now click Continue.

  • STEP 5: Row pane. At “Does your data contain an identifier?” choose “No”.

  • STEP 6: Review pane. In this pane, you review all the changes made to your datasource. Glance through them and then click Create Datasource. This opens a new window as shown below:

    new_datasource
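
The choices made in the Schema and Target panes can also be written down as a schema document. The sketch below builds one in the JSON layout that, to my understanding, Amazon Machine Learning uses for data schemas. It treats every column that is not in the categorical set as NUMERIC, which is a simplification: the types the console infers (for a date column, for example) may differ. The short column list in the demo is an assumption, not the full header.

```python
import json

# Columns the Schema step switches to categorical (names assumed from the header).
CATEGORICAL = {"grade", "waterfront", "view", "bedrooms", "condition", "zipcode"}


def build_schema(column_names, target="price", categorical=CATEGORICAL):
    """Return a schema dict: categorical columns as listed, the rest NUMERIC."""
    attributes = []
    for name in column_names:
        kind = "CATEGORICAL" if name in categorical else "NUMERIC"
        attributes.append({"attributeName": name, "attributeType": kind})
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,  # "Yes" in the Schema pane
        "attributes": attributes,
    }


# Demo with a shortened, assumed column list.
schema = build_schema(["price", "bedrooms", "sqft_living", "zipcode"])
print(json.dumps(schema, indent=2))
```

As far as I know, a JSON string like this is what the `DataSchema` field of boto3's `create_data_source_from_s3` call accepts, so the same choices could be reused when creating the datasource from a script.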


Create ML Model and Train the model

Now, the next step is to create an ML (machine learning) model. Go back to your AWS Machine Learning Console and follow these steps:

  • STEP 1: Click the Create New… drop-down menu and select ML model. This opens the Create ML model window.

  • STEP 2: Input Data pane. Choose the “I already created a datasource pointing to my S3 data” option; the datasource that you created in the earlier step will be listed. Click your datasource.

    ml_input_data

    Now click Continue.

  • STEP 3: ML model settings pane. Give a name to your ML model and a name to your evaluation. Here, we are using the default recipe to train our ML model.

    ml_model_settings

    Now click Review.

  • STEP 4: Review pane. Review your changes here and then click Create ML Model. Once you create your ML model, a new pane shall open which shows you the status of your ML model. This is shown in the figure below:

    ml_model_status

    The status will first show “Pending”, but after a few minutes it should change to “Completed”. Once the ML model training is done, the evaluation is performed automatically.
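
For readers who prefer scripting, the steps above roughly correspond to a few calls on boto3's `machinelearning` client. This is only a sketch under assumptions: all IDs are hypothetical, and it presumes separate training and evaluation datasources already exist, whereas the console wizard sets these up for you.

```python
import time


def train_and_evaluate(ml_client, model_id, train_ds_id, eval_id, eval_ds_id):
    """Request a regression model and an evaluation of that model."""
    ml_client.create_ml_model(
        MLModelId=model_id,
        MLModelType="REGRESSION",  # numeric target, as with price
        TrainingDataSourceId=train_ds_id,
    )
    ml_client.create_evaluation(
        EvaluationId=eval_id,
        MLModelId=model_id,
        EvaluationDataSourceId=eval_ds_id,
    )


def wait_for_model(ml_client, model_id, poll_seconds=30, max_polls=60):
    """Poll until the model leaves PENDING/INPROGRESS; return the last status seen."""
    status = "PENDING"
    for _ in range(max_polls):
        status = ml_client.get_ml_model(MLModelId=model_id)["Status"]
        if status not in ("PENDING", "INPROGRESS"):
            break
        time.sleep(poll_seconds)
    return status
```

With a real account, `ml_client` would be `boto3.client("machinelearning")`. The polling loop mirrors refreshing the console until the status reads “Completed”.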

To explore the training process, select Download log under the ML model summary block. This log traces the entire process executed to train your machine learning model. For linear regression, Amazon Machine Learning uses stochastic gradient descent with a squared loss function. The log shows how the linear regression model tries different learning rates to find the weights that give the least error. The end of the log shows the parameters chosen to train the model, as shown in the figure below.

download_log

As seen, the learning rate chosen is 1.0 and the RMSE on the training dataset is 141915.4004.
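
To get a feel for what stochastic gradient descent with a squared loss does, here is a toy sketch in plain Python. It is not Amazon's implementation: it fits a single feature on made-up data that lies exactly on the line y = 2x + 1, tries a few learning rates, and keeps the one with the lowest training RMSE, loosely mimicking the search visible in the log.

```python
import math
import random


def sgd_linear_regression(xs, ys, learning_rate, epochs=200, seed=0):
    """Fit y = w*x + b by SGD on the squared loss 0.5 * (prediction - y)**2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observations in random order
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            w -= learning_rate * error * xs[i]  # gradient of the loss w.r.t. w
            b -= learning_rate * error          # gradient of the loss w.r.t. b
    return w, b


def rmse(xs, ys, w, b):
    """Root mean squared error of the line w*x + b on the given data."""
    squared = [((w * x + b) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data on the line y = 2x + 1 (made up for illustration).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2 * x + 1 for x in xs]

# Try several learning rates and keep the one with the lowest training RMSE.
best_rmse, best_rate = min(
    (rmse(xs, ys, *sgd_linear_regression(xs, ys, rate)), rate)
    for rate in (0.01, 0.1, 0.5)
)
print("best learning rate:", best_rate, "training RMSE:", best_rmse)
```

The real service works with many features and its own data preparation, so the numbers here are not comparable to the RMSE reported in the log.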


Evaluate the model

Now there should be five ML objects in your Machine Learning dashboard, as shown below. You can check the status of all your ML objects here, and you can watch their progress by refreshing your AWS Machine Learning Console. Wait for the status to show “Completed”.

dashboard

Now, select “Evaluation: ML Model” to look at the evaluation summary.

ml_model_performance

You can also click “Explore model performance” to visualize the histogram of errors.

histogram_errors

The green dotted line in the above figure marks the zero point on the x-axis. You can change the size of the bin intervals by choosing different options.
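
A histogram of errors like this can be reproduced by hand. The sketch below assumes the error is the residual, actual minus predicted, and groups residuals into bins of a chosen width. The prices are made up for illustration.

```python
import math
from collections import Counter


def residual_histogram(actual, predicted, bin_width):
    """Count residuals (actual - predicted) per bin, keyed by the bin's lower edge."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        residual = a - p
        bin_start = math.floor(residual / bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Made-up actual and predicted prices, for illustration only.
actual = [300000, 450000, 500000, 250000]
predicted = [320000, 400000, 510000, 250000]

for bin_start, count in residual_histogram(actual, predicted, 25000).items():
    print(bin_start, "to", bin_start + 25000, ":", count)
```

Bins to the left of zero hold observations where the model predicted too high, and bins to the right hold those where it predicted too low. Changing `bin_width` plays the same role as choosing a different bin size in the console.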


Generate predictions

Amazon Machine Learning allows you to do both batch predictions and real-time predictions.
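
As a rough sketch of how the two modes differ in code, the helpers below wrap boto3's `predict` and `create_batch_prediction` calls on the `machinelearning` client. All IDs, the endpoint URL, and the output location are hypothetical, and a real-time endpoint has to be created for the model before `predict` will work.

```python
def predict_price(ml_client, model_id, endpoint_url, record):
    """Ask a real-time endpoint for one prediction and return the predicted value."""
    response = ml_client.predict(
        MLModelId=model_id,
        # The API expects every feature value as a string.
        Record={name: str(value) for name, value in record.items()},
        PredictEndpoint=endpoint_url,
    )
    return response["Prediction"]["predictedValue"]


def request_batch_prediction(ml_client, batch_id, model_id, datasource_id, output_uri):
    """Queue predictions for every row of a datasource; results land in S3."""
    ml_client.create_batch_prediction(
        BatchPredictionId=batch_id,
        MLModelId=model_id,
        BatchPredictionDataSourceId=datasource_id,
        OutputUri=output_uri,
    )
```

Real-time prediction answers one record at a time with low latency, while a batch prediction runs asynchronously over a whole datasource and writes its output to the S3 location you give it.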


Summary

