Files
louiscklaw 9035c1312b update,
2025-02-01 02:09:32 +08:00

49 lines
2.0 KiB
Plaintext

---
title: "hw3-5.1"
author: "Mark Pearl"
date: "9/11/2019"
output:
html_document: default
pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(outliers)
```
```{r uscrime_data with sample}
uscrime_data <- read.table('C:/Users/mjpearl/Desktop/omsa/ISYE_6501/hw3/data/uscrime.txt',header = TRUE, stringsAsFactors = FALSE)
head(uscrime_data)
```
## Plots Section
The following plots will conduct exploratory analysis on the data to get a sense of the data's distribution and to see if we can spot any outliers with a visual representation.
```{r boxplot}
boxplot(x= uscrime_data$Crime)
```
From the boxplot we can see that there are a few observations above the whisker which indicates values past Q3 are outliers (2 observations closest to a Crime value of 2000). There does not seem to be any observations in the lower quartiles that indicate any outliers.
```{r uscrime_data histogram}
hist(uscrime_data$Crime)
```
The result of histogram indicates a skewed distribution for the right tail. For a grubbs test to be effective it is implied that the data follows a normal distribution. However our data does follow a normal distribution towards the middle portion of the graph, so it could mean that we have outlying data. We will continue with conducting the grubbs test for further investigation.
## Grub Test Section
To ensure we test observations represented by the minimum and maximum values on the graph, we will use the "opposite" parameter of the grubbs test function.
```{r grubtest1}
grubbs.test(x=uscrime_data$Crime, type = 10, opposite = F)
```
```{r grubtest2}
grubbs.test(x=uscrime_data$Crime, type = 10, opposite = T)
```
The first output indicates that the highest or maximum observation closest to 2000 on the graph can be deemed an outlier due to the significantly low p-value of 0.07. This holds true to what was also determined in the boxplot output.
The second output indicates that the lowest value of the crime feature is with a high certainty not an outlier as we retrieved a p-value of 1.