Lets assume that
set.seed(44)
deaths<- 10:1 + sample.int(3, 10, replace = T)
and
spent<- seq(100, 550, by = 50 )
The very first thing you want to do when you get your data is literally to look at it. This can be done relatively painlessly with
plot(spent, deaths)
which yields
So it looks like the more we spend, the less deaths there are. That makes sense. But how can we quantify that statement. Using cor()
will give us the correlation between the two variables spent
and deaths
.
cor(spent, deaths)
# [1] -0.9809581
So it looks like they are very strong (and negatively correlated.) One other simple method (that is closely related to cor()
) is to fit a linear model.
model<- lm(deaths~spent)
The summary()
call yields a lot of useful information about the model you just fit, the interpretation of which is beyond the scope of this post, but can be readily found with some quick Googling.
summary(model)
#Call:
#lm(formula = deaths ~ spent)
#Residuals:
# Min 1Q Median 3Q Max
#-0.89697 -0.51515 -0.05758 0.46364 1.01818
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 14.151515 0.539649 26.22 4.80e-09 ***
#spent -0.021697 0.001519 -14.29 5.62e-07 ***
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 0.6898 on 8 degrees of freedom
#Multiple R-squared: 0.9623, Adjusted R-squared: 0.9576
#F-statistic: 204.1 on 1 and 8 DF, p-value: 5.622e-07
solved Determine if data is related in R [closed]