Um blog sobre nada

A collection of useless things that may turn out to be useful

Basic Statistics in R

Posted by Diego on December 11, 2014


 

library(UsingR)

·         First, let’s take a look at the data (the built-in airquality data set):

[Image: preview of the airquality data]
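For a text view of the same data, here is a minimal sketch. It assumes only the airquality data frame that ships with base R (package datasets); the name na_counts is mine, not from the original post.

```r
# airquality is part of the 'datasets' package, which R loads by default
head(airquality)   # first six rows
str(airquality)    # column names and types

# Count missing values per column before computing any statistics
na_counts <- colSums(is.na(airquality))
na_counts
```

Temp has no missing values, which is why the functions used in this post work without na.rm = TRUE.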
 

·         We’ll be playing with the temperature variable so, to make things easier, let’s load it into a variable:

temp <- airquality[, 4]   # column 4 is Temp

 

·         Then let’s do a simple plot sorted by the temperature:

plot(sort(temp), xlab = "Index", ylab = "Temperature")

[Image: plot of the sorted temperatures]

·         We can check the minimum, maximum, median, and first and third quartiles by creating a box plot:

boxplot(temp)

[Image: box plot of temp]

Or simply by running:

 

summary(temp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  56.00   72.00   79.00   77.88   85.00   97.00 
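The same five numbers can also be pulled out as plain vectors, which is handy when they feed into later calculations. A small sketch; the names q, five and box are illustrative and not from the original post:

```r
temp <- airquality[, 4]

q    <- quantile(temp)             # 0%, 25%, 50%, 75%, 100%
five <- fivenum(temp)              # Tukey's five-number summary
box  <- boxplot.stats(temp)$stats  # the values boxplot() draws

q
five
box
```

For temp all three agree with the summary output. On other data they can differ slightly, because fivenum uses hinges rather than quantiles and the box plot whiskers stop at the most extreme points that are not flagged as outliers.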

 

 

·         A feature that I like a lot is adding a line that connects the points on the plot. Since we have a lot of points, I’ll plot only the first 10, using temp[1:10], so the result is easier to see. The line is added by passing the type = "b" argument (“b” stands for both points and lines):

plot(sort(temp[1:10]), xlab = "Index", ylab = "Temperature", type = "b")



[Image: plot of the first 10 temperatures, sorted, with the points connected by lines]
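Other values of type work the same way. As a rough sketch (the three-panel layout and the names first10, old_par and plot_type are my own choices, not part of the original post), this draws the same ten points as points only, lines only, and both:

```r
temp <- airquality[, 4]
first10 <- sort(temp[1:10])

old_par <- par(mfrow = c(1, 3))   # three plots side by side
for (plot_type in c("p", "l", "b")) {
  plot(first10, type = plot_type,
       xlab = "Index", ylab = "Temperature",
       main = paste0("type = '", plot_type, "'"))
}
par(old_par)                      # restore the previous layout
```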

·         The mean can be calculated with the mean function. But let’s say we have an outlier in our dataset (this doesn’t make much sense for temperature, but it makes a lot of sense for salaries, for example, where one person can earn far more than everyone else): a value so large that it distorts the mean, so we want to leave it out of the calculation. To simulate that, I’ll sort the temp data and append an outlier with the c function:

temp2 <- sort(temp)       # sort the data
temp2 <- c(temp2, 5000)   # append an outlier as the last value

 

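Then we can use the trim argument of mean, which drops a fraction of the observations from each end of the sorted data before averaging. With trim=0.05 that is 5% from the bottom and 5% from the top (here, 7 of the 154 values at each end):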

mean(temp2)
[1] 109.8442
mean(temp2, trim=0.05)
[1] 78.19286
mean(temp)
[1] 77.88235

 

 

We can see that the trimmed mean of the data with the outlier (78.19) is much closer to the mean of the original data (77.88) than the untrimmed mean (109.84) is.
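To make the trimming step concrete, here is a sketch that reproduces the trimmed mean by hand. The names n, k, kept and manual are illustrative; the logic follows the documented behaviour of trim (drop floor(n * trim) observations from each end of the sorted data).

```r
temp  <- airquality[, 4]
temp2 <- c(sort(temp), 5000)          # same data with the simulated outlier

n <- length(temp2)                    # 154 observations
k <- floor(n * 0.05)                  # number dropped at each end
kept <- sort(temp2)[(k + 1):(n - k)]  # discard the k smallest and k largest

manual <- mean(kept)
all.equal(manual, mean(temp2, trim = 0.05))   # TRUE
```

Because the outlier is the largest value, it is among the observations discarded from the top, which is why the trimmed mean lands back near the original one.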

 

·         A few other useful functions:

o   Range:

range(temp)
[1] 56 97

max(temp) - min(temp)
[1] 41

o   IQR (Interquartile range): difference between the third quartile and the first quartile

IQR(temp)
[1] 13

 
o   Variance:
 
var(temp)
[1] 89.59133

o   Standard Deviation:

sd(temp)
[1] 9.46527
sqrt(var(temp))
[1] 9.46527
round(sd(temp),1)
[1] 9.5
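Each of these measures can be rebuilt from simpler pieces, which helps show what the functions compute. A minimal sketch; the names spread, iqr_manual, var_manual and sd_manual are mine, and the variance uses the n - 1 (sample) denominator, which is what var uses:

```r
temp <- airquality[, 4]
n <- length(temp)

# Range as a single number: max - min
spread <- diff(range(temp))

# IQR: third quartile minus first quartile
iqr_manual <- unname(diff(quantile(temp, c(0.25, 0.75))))

# Sample variance: squared deviations from the mean, divided by n - 1
var_manual <- sum((temp - mean(temp))^2) / (n - 1)

# Standard deviation: square root of the variance
sd_manual <- sqrt(var_manual)

c(spread = spread, iqr = iqr_manual, var = var_manual, sd = sd_manual)
```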

 
