Um blog sobre nada

A collection of useless things that may turn out to be useful

Using Linear Regression to predict Diamond Prices

Posted by Diego on November 10, 2014


 

This post is a follow-up to the “Statistical Regression Models Examples” lecture from the Coursera Data Science Specialization by Johns Hopkins University.
I wrote it while working through that particular lesson.

The idea is to use linear regression to predict the price of diamonds in the “diamond” dataset from the UsingR library. The dataset covers 48 diamond rings, with the price in Singapore dollars and the size of the diamond in carats.

 

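library(UsingR)  # provides the diamond data set

# Scatter plot of price against mass, with the least-squares line on top
plot(diamond$carat, diamond$price,
     xlab = "Mass (carats)",
     ylab = "Price (SIN$)",
     bg = "lightblue",
     col = "black", cex = 1.1, pch = 21, frame = FALSE)
abline(lm(price ~ carat, data = diamond), lwd = 2)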

 

[Figure: scatter plot of price (SIN$) against mass (carats) with the fitted regression line]

 

Using ggplot2:

> library(ggplot2)
> g = ggplot(diamond, aes(x = carat, y = price))
> g = g + geom_point(size = 6, colour = "black", alpha = 0.2)
> g = g + geom_point(size = 5, colour = "blue", alpha = 0.2)
> g = g + geom_smooth(method = "lm", colour = "black")
> g
[Figure: the same scatter plot drawn with ggplot2, with the fitted regression line]
 
 

Create the linear regression:

fit<-lm(price~carat,data=diamond)

coef(fit)

(Intercept)       carat 
  -259.6259   3721.0249 

 

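·   That means the fitted regression line is:

o   y = 3721x - 259.63 (y is the price in Singapore dollars, x is the mass in carats)

·   In other words:

o   We estimate an expected increase of 3721.02 Singapore dollars in price for every one-carat increase in the mass of the diamond.

o   The intercept, -259.63, is the expected price of a 0-carat diamond. That is not a meaningful quantity, since the price comes out negative.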

 

Getting a more interpretable intercept:

 

To get a more meaningful intercept, we center the “carat” variable by subtracting its mean:

 

> fit2<-lm(price~I(carat-mean(carat)),data=diamond)
> coef(fit2)
 
           (Intercept) I(carat - mean(carat)) 
              500.0833              3721.0249 

 

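That means that about $500.1 (Singapore dollars) is the expected price for the average-sized diamond in the data (0.2042 carats). We can check this by substituting 0.2042 into the first equation, which gives roughly 500:

o   3721 * 0.2042 - 259.63 ≈ 500.2

The small difference from 500.1 comes from rounding the coefficients and the mean mass.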
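The same check can be sketched in R using only the coefficients printed by coef(fit) earlier. The variable names are my own, and 0.2042 is the rounded mean quoted above, so the result is only approximate.

```r
# Coefficients as printed by coef(fit) earlier in the post
b0 <- -259.6259   # intercept
b1 <- 3721.0249   # slope (SIN$ per carat)

# Rounded mean mass quoted in the post (an approximation of mean(diamond$carat))
mean_carat <- 0.2042

# Centering moves the intercept to the fitted price at the mean mass:
# price = b0 + b1 * carat = (b0 + b1 * mean_carat) + b1 * (carat - mean_carat)
centered_intercept <- b0 + b1 * mean_carat
print(round(centered_intercept, 1))   # about 500.2 with the rounded mean
```

The slope is unchanged by centering; only the intercept moves, which is why both fits report 3721.0249 for the carat term.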

 

Graphs:

clip_image003

 

Data: 

clip_image004

 

Predicting the price of a diamond

To predict the price of a diamond, we can substitute its mass in carats into the regression equation:

[Screenshot: R console computing the prediction from the regression equation]
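A minimal sketch of that substitution, using the coefficients printed by coef(fit) earlier. The three masses in newx are illustrative values I picked, not something read from the data.

```r
# Coefficients as printed by coef(fit) earlier in the post
b0 <- -259.6259   # intercept
b1 <- 3721.0249   # slope (SIN$ per carat)

# Hypothetical diamond masses (in carats) to price
newx <- c(0.16, 0.27, 0.34)

# Plug each mass into price = b0 + b1 * carat
pred <- b0 + b1 * newx
print(round(pred, 1))
```

With these inputs the predictions come out at roughly 335.7, 745.1 and 1005.5 Singapore dollars.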

 

Or we can use R’s predict function:

[Screenshot: R console calling predict on the fitted model]
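A sketch of what the predict call might look like, assuming the UsingR package is installed. The masses in newx are hypothetical values chosen for illustration.

```r
library(UsingR)   # provides the diamond data set

# Same model as earlier in the post
fit <- lm(price ~ carat, data = diamond)

# Hypothetical diamond masses (in carats) to price
newx <- c(0.16, 0.27, 0.34)

# newdata must be a data frame whose column name matches the predictor in the formula
pred <- predict(fit, newdata = data.frame(carat = newx))
print(pred)
```

The column in newdata has to be called carat, because that is the name used in the model formula; predict then applies the fitted intercept and slope to each row, so the result agrees with the manual substitution.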

 

 

 
