This is a project deliverable for the Udacity Machine Learning Nanodegree. Please use it in accordance with the Udacity honor code.
The goal of this project is to apply basic machine learning concepts to housing price data collected in the Boston, Massachusetts area in order to predict the selling price of a new home. The project involves the following steps:
- Explore the data to obtain important features and descriptive statistics about the dataset.
- Split the data into testing and training subsets.
- Determine a suitable performance metric for this problem.
- Analyze performance graphs for a learning algorithm with varying parameters and training set sizes.
- Pick the optimal model that best generalizes to unseen data.
- Finally, test this optimal model on a new sample and compare the predicted selling price to the descriptive statistics of the dataset.
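The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook. It makes several assumptions: it runs on synthetic data rather than the housing dataset, it uses a decision tree regressor tuned over `max_depth` as a stand-in for the learning algorithm, and it uses mean squared error as the performance metric. All variable names are hypothetical.

```python
import numpy as np
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption): 200 samples, 3 features.
rng = np.random.RandomState(0)
X = rng.uniform(0.0, 10.0, size=(200, 3))
y = 5.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 1.0, size=200)

# 1. Descriptive statistics of the target values.
stats = {
    "min": y.min(),
    "max": y.max(),
    "mean": y.mean(),
    "median": np.median(y),
    "std": y.std(),
}

# 2. Split the data into training and testing subsets (75% / 25%).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 3. Performance metric (assumption): mean squared error, lower is better.
scorer = make_scorer(mean_squared_error, greater_is_better=False)

# 4. Search over model complexity with 5-fold cross-validation
#    and keep the model that scores best on the held-out folds.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    scoring=scorer,
    cv=5,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# 5. Evaluate on the test subset and predict a new (hypothetical) sample.
test_mse = mean_squared_error(y_test, best_model.predict(X_test))
new_sample = np.array([[5.0, 5.0, 5.0]])
predicted = best_model.predict(new_sample)[0]

print("Best max_depth:", search.best_params_["max_depth"])
print("Test MSE:", test_mse)
print("Prediction:", predicted, "target mean:", stats["mean"])
```

The sketch imports from `sklearn.model_selection`, which requires scikit-learn 0.18 or later. Older releases expose similar helpers under `sklearn.cross_validation` and `sklearn.grid_search`.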
The following software was used in the first part of the project:
- Python 2.7
- NumPy
- scikit-learn
In the last part of this project, R was used as an EDA tool:
- R 3.2.3
- corrplot
- GGally
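The EDA in this project was done in R. For readers who prefer to stay in Python, a correlation matrix comparable to the input of a correlation plot can be computed with NumPy. The sketch below is only an analogue under stated assumptions: the columns `rooms`, `price`, and `noise` are hypothetical synthetic data, not columns taken from the housing dataset.

```python
import numpy as np

# Hypothetical synthetic columns (assumption): not the real housing features.
rng = np.random.RandomState(0)
rooms = rng.normal(6.0, 1.0, size=100)
price = 10.0 * rooms + rng.normal(0.0, 2.0, size=100)
noise = rng.normal(0.0, 1.0, size=100)

# Stack the columns so each row is a sample and each column is a variable.
data = np.column_stack([rooms, price, noise])

# Pearson correlation matrix; rowvar=False treats columns as variables.
corr = np.corrcoef(data, rowvar=False)

print(np.round(corr, 2))
```

Entries close to 1 or -1 indicate strongly related variables, which is the kind of relationship a correlation plot makes visible at a glance.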
The data set can be found at: https://archive.ics.uci.edu/ml/datasets/Housing
The final report and IPython notebook are included in this repository. The IPython notebook is straightforward to use; please refer to http://cs231n.github.io/ipython-tutorial/ for a quick tutorial.