Using Linear Regression to predict Diamond Prices
Posted by Diego on November 10, 2014
This post is a follow-up to the “Statistical Regression Models Examples” lecture from the Coursera Data Science Specialization by Johns Hopkins University.
I wrote it as I was attempting to understand this particular lesson.
The idea is to use linear regression to predict the price of diamonds with the “diamond” dataset from the UsingR library. The dataset covers 48 diamond rings and records each ring's price in Singapore dollars and the size of its diamond in carats.
> library(UsingR); data(diamond)
> library(ggplot2)
> g = ggplot(diamond, aes(x = carat, y = price))
> g = g + geom_point(size = 6, colour = "black", alpha = 0.2)
> g = g + geom_point(size = 5, colour = "blue", alpha = 0.2)
> g = g + geom_smooth(method = "lm", colour = "black")
> g
Create the linear regression:
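The fitting step can be sketched as follows. This is a minimal sketch, assuming the UsingR package is installed; the object name `fit` is an illustrative choice.

```r
# Load the data set used in this post (assumes the UsingR package is installed)
library(UsingR)
data(diamond)

# Fit price as a linear function of carat; `fit` is an illustrative name
fit <- lm(price ~ carat, data = diamond)

# Intercept and slope of the fitted line
coef(fit)
```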
· That means the linear regression function is:
o y = 3721.02x – 259.63, where y is the price in Singapore dollars and x is the size in carats
· In other words:
o We estimate an expected increase of 3721.02 Singapore dollars in price for every one-carat increase in the mass of the diamond.
o The intercept, -259.63, is the expected price of a 0 carat diamond. A negative price makes no sense, so the intercept is hard to interpret as it stands.
Getting a more interpretable intercept:
In order to get a more meaningful result, we center the “carat” variable:
(Intercept) I(carat - mean(carat))
      500.1                3721.02
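The centered coefficients can be reproduced with a sketch like the one below. It assumes the UsingR package is installed, and the name `fit2` is an illustrative choice.

```r
# Load the data set used in this post (assumes the UsingR package is installed)
library(UsingR)
data(diamond)

# I() makes R treat the subtraction as arithmetic instead of formula syntax,
# so the predictor becomes "carats above or below the average carat size"
fit2 <- lm(price ~ I(carat - mean(carat)), data = diamond)

# The intercept is now the expected price at the average carat size
coef(fit2)
```

Centering only shifts the predictor, so the slope stays the same as in the uncentered fit and only the intercept changes.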
That means that 500.1 Singapore dollars is the expected price for the average-sized diamond in the data (0.2042 carats). We can verify this by substituting 0.2042 into the first function:
o 3721 * 0.2042 – 259.63 ≈ 500.2, which agrees with 500.1 up to the rounding of the coefficients and the mean
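The same check can be done in R without rounding. This is a sketch under the assumption that UsingR is installed; `fit`, `avg_carat`, and `price_at_avg` are illustrative names.

```r
# Load the data set used in this post (assumes the UsingR package is installed)
library(UsingR)
data(diamond)

# Uncentered fit, as in the first regression
fit <- lm(price ~ carat, data = diamond)

# Plug the average carat size into the fitted line
avg_carat <- mean(diamond$carat)
price_at_avg <- unname(coef(fit)[1] + coef(fit)[2] * avg_carat)
price_at_avg

# A least-squares line with an intercept passes through the point of means,
# so the value above should also equal the average price in the data
all.equal(price_at_avg, mean(diamond$price))
```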
Predicting the price of a diamond
To predict the price of a diamond, we can substitute its carat value into the linear regression formula:
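A minimal sketch of the manual substitution, assuming UsingR is installed. The carat sizes in `newx` are hypothetical values chosen for illustration, and `fit` and `manual_pred` are illustrative names.

```r
# Load the data set used in this post (assumes the UsingR package is installed)
library(UsingR)
data(diamond)

fit <- lm(price ~ carat, data = diamond)

# Hypothetical carat sizes we want prices for (illustrative values)
newx <- c(0.16, 0.27, 0.34)

# intercept + slope * carat, computed for each size at once
manual_pred <- unname(coef(fit)[1] + coef(fit)[2] * newx)
manual_pred
```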
Or we can use R’s predict function:
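A sketch of the `predict` call, assuming UsingR is installed. The carat sizes are hypothetical, and `fit`, `newx`, and `pred` are illustrative names.

```r
# Load the data set used in this post (assumes the UsingR package is installed)
library(UsingR)
data(diamond)

fit <- lm(price ~ carat, data = diamond)

# Hypothetical carat sizes (illustrative values)
newx <- c(0.16, 0.27, 0.34)

# newdata must be a data frame whose column name matches the predictor
# used in the model formula, here `carat`
pred <- predict(fit, newdata = data.frame(carat = newx))
pred
```

Without `newdata`, `predict` returns the fitted values for the rows the model was trained on.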