Simple Housing Price Prediction Using Neural Networks with TensorFlow
Neural Networks are easy to get started with.
Most of the time, the confusion is around choices: which algorithm to use, which library or framework to pick, and so on.
In this simple example, we will train a model to predict housing prices. Our training data consists of 14 variables: 13 predictors and one target. It comes from the Boston Housing Price Prediction dataset, which is hosted by Kaggle. Information is available here.
You will need to download train.csv and store it somewhere accessible. Let’s start by importing what we need and reading in our data.
The above code will print the first five rows of your imported data. You will see the column names printed along with the data in a tabular format. Our target variable is called medv, so we store it.
If you take a look at the data, you will see that the different columns have very different ranges. That is a problem for gradient descent, which converges poorly when features are on wildly different scales. We need the columns to range between 0 and 1, and we use a MinMaxScaler from scikit-learn for that.
Because of the scaling, the model's predictions will also come out on the scaled range, so we will need to undo the transformation before we can read them as prices. Let's keep track of the scaler's column minimums and ranges for that.
Scaling produces a NumPy array, so we convert it back into a DataFrame.
We are now ready to start building our Neural Network. We will make use of a Sequential model.
We can now add fully connected layers to our model using model.add(). The first call creates two layers, an input layer and the first hidden layer, because it also declares the shape of the input; each subsequent call adds one layer. For every layer we specify the number of neurons it outputs, and we also specify its activation function. In this case, we use the relu activation function.
Notice that the final layer outputs one value. That is because we are predicting a continuous variable. For the same reason, we do not specify an activation.
Next, we need to compile our model. We do this by specifying our loss function and our optimizer. More on these two in another article.
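The model definition and compilation steps might look like the sketch below. The hidden-layer width of 20 neurons is an assumption, as is the choice of mean squared error and Adam, though both are conventional for regression.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# The first call also declares the 13-feature input shape,
# so it creates an input layer plus the first hidden layer.
model.add(Dense(20, input_shape=(13,), activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(20, activation='relu'))
# Single output and no activation: we are predicting a continuous value.
model.add(Dense(1))

# Mean squared error is a standard regression loss; Adam is a common optimizer.
model.compile(loss='mean_squared_error', optimizer='adam')
```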
We are now ready to train our model. Before we do that, we need to get our training dataset ready. We will leave out the first ten rows of our data so we can use them for validation. We will separate our predictors into X, and our target into Y.
We will train our model by passing in our training dataset. We also need to specify the number of passes we would like to make over the training data; each full pass is called an epoch.
At this point we are ready to make a prediction.
What did your model predict? Now, go take a look at the medv of the first row in your training data. Are the values close?
I hope you learnt something useful here. You can find a notebook with the code here.