hw3-5.1

Plots Section

The following plots explore the distribution of the Crime variable and help us spot potential outliers visually.

boxplot(x= uscrime_data$Crime)

The boxplot shows a few observations above the upper whisker, that is, more than 1.5 times the interquartile range above Q3, which the plot flags as potential outliers (the two most extreme lie close to a Crime value of 2000). No observations fall below the lower whisker, so there are no apparent outliers at the low end.
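The points drawn beyond the whiskers can also be listed numerically. The sketch below uses base R's boxplot.stats(), which applies the same 1.5 × IQR rule as boxplot(). The vector x is a small made-up sample used only for illustration, not the Crime data.

```r
# Hypothetical sample (NOT the Crime data): one value sits far above the rest.
x <- c(10, 12, 13, 14, 15, 16, 18, 40)

# boxplot.stats() returns the five-number summary and the flagged points.
stats <- boxplot.stats(x, coef = 1.5)

hinges   <- stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
flagged  <- stats$out     # observations beyond the whiskers

print(hinges)
print(flagged)
```

Applied to the report's data, the same call would be boxplot.stats(uscrime_data$Crime)$out, assuming uscrime_data is loaded as in the rest of this report.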

hist(uscrime_data$Crime)

The histogram shows a right-skewed distribution. The Grubbs test assumes that the data are approximately normally distributed. The middle portion of the histogram looks roughly normal, so the long right tail may reflect outlying observations rather than a non-normal population. We continue with the Grubbs test to investigate further.
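Since the Grubbs test relies on approximate normality, the visual impression could be backed up with a formal check. The sketch below is one possible approach, not part of the original analysis: it runs base R's shapiro.test() and draws a normal Q-Q plot on a simulated right-skewed sample. The simulated vector x is an assumption made purely for illustration.

```r
# Simulated right-skewed sample (hypothetical, NOT the Crime data).
set.seed(1)
x <- rexp(50)

# Shapiro-Wilk test: a small p-value suggests a departure from normality.
sw <- shapiro.test(x)
print(sw$p.value)

# Normal Q-Q plot: points bending away from the line indicate non-normal tails.
qqnorm(x)
qqline(x)
```

With the report's data, x would be replaced by uscrime_data$Crime.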

Grubbs Test Section

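To test both the maximum and the minimum observations, we run the Grubbs test twice and use the function's “opposite” parameter. With opposite = FALSE the test checks the value farthest from the mean (here the maximum), and with opposite = TRUE it checks the value at the other extreme (here the minimum).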

library(outliers)  # provides grubbs.test()
grubbs.test(x = uscrime_data$Crime, type = 10, opposite = FALSE)

## 
##  Grubbs test for one outlier
## 
## data:  uscrime_data$Crime
## G = 2.81287, U = 0.82426, p-value = 0.07887
## alternative hypothesis: highest value 1993 is an outlier

grubbs.test(x = uscrime_data$Crime, type = 10, opposite = TRUE)

## 
##  Grubbs test for one outlier
## 
## data:  uscrime_data$Crime
## G = 1.45589, U = 0.95292, p-value = 1
## alternative hypothesis: lowest value 342 is an outlier

The first output gives a p-value of 0.079 for the highest observation, 1993, the point closest to 2000 on the plots. This is significant at the 0.10 level but not at the conventional 0.05 level, so the test provides only marginal evidence that this value is an outlier. That is consistent with the boxplot, which also flagged it.

The second output gives a p-value of 1 for the lowest value of the Crime variable, 342, so the test provides no evidence that it is an outlier.
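The G values in the outputs measure how far the tested extreme lies from the sample mean, in units of the sample standard deviation. The sketch below computes G for both extremes by hand in base R and compares the larger one with the standard one-sided critical value based on the t distribution. The vector x and the choice alpha = 0.05 are assumptions for illustration, and this is not necessarily how grubbs.test() computes its p-value internally.

```r
# Hypothetical sample (NOT the Crime data).
x <- c(10, 12, 13, 14, 15, 16, 18, 40)
n <- length(x)

# Grubbs statistic for each extreme: distance from the mean in SD units.
G_high <- (max(x) - mean(x)) / sd(x)   # tests the maximum
G_low  <- (mean(x) - min(x)) / sd(x)   # tests the minimum

# One-sided critical value at significance level alpha (assumed 0.05 here).
alpha  <- 0.05
t_crit <- qt(1 - alpha / n, df = n - 2)
G_crit <- ((n - 1) / sqrt(n)) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# An extreme is flagged when its G exceeds the critical value.
print(c(G_high = G_high, G_low = G_low, G_crit = G_crit))
print(G_high > G_crit)
print(G_low > G_crit)
```

In this made-up sample the maximum is far from the mean and the minimum is not, which mirrors the pattern seen in the two test outputs.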

Mark Pearl

29/01/2020
