Files
tunmnlu/task_2/others-answer/omsa-main/ISYE-6501-OAN/hw4/hw3/hw3-6.1to6.2.Rmd
louiscklaw 9035c1312b update,
2025-02-01 02:09:32 +08:00

92 lines
4.7 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: "hw3-6.1-2"
author: "Mark Pearl"
date: "9/11/2019"
output:
pdf_document: default
html_document: default
---
6.1
At work we have a number of clients in the oil & gas industry who need to have sophistication for the area of preventative maintenance with sensor data. Near-real time streaming data feeds into an enterprise data platform to conduct time-series analytics on the health of various machines across different oil sites. CUSUM is an applicable approach for solving this problem as the health of the machine needs to stay below a set threshold set from maintenance staff.
We want our approach to be conservative enough so we don't measure false alarms but tolerant enough to prevent major breakdowns. This would allude to use choosing a moderate level for our threshold.
https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/CUSUM_Charts.pdf
Based on the following resource we can see that suitable values for T is 5 and the value for c is usually set to 0.5. If downtimes were extremely critical then we would likely choose a lower threshold.
##6.2.1 Cusum Calculation:
In this section we'll be calculating the cusum by calculating the mean for each day across all years from 1996 to 2015. We'll then iterate across each year and subtract the x of t and c from the mean as we're looking to detect change for a decrease in temperature.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r uscrime_data with sample}
temps_data <- read.table('C:/Users/mjpearl/Desktop/omsa/ISYE_6501/hw3/data/temps.txt',header = TRUE, stringsAsFactors = FALSE)
head(temps_data)
```
```{r cusum function}
#Calculate the average temperate for each across day across all years from 1995 to 2015
x_of_t <- rowMeans(temps_data[c(2:length(temps_data))], dims=1, na.rm=T)
#Compute the mean for all values of x of t to get mu
mu <- mean(x_of_t)
#Set our value of C to help moderate our values to ensure we don't go over the threshold
C <- 4
#Now to compute S of t to determine if we're experiencing a change below our threshold we use the formula (mu - x_of_t - C) instead of using (x_of_t - mu - C) which is detecting an increase
s_of_t <- x_of_t - mu - C
#Craete an empty vector with an issue value of 0 but will be appended the value of cusum in a loop for each observation
precusum <- 0 * s_of_t
cusum <- append(precusum, 0)
#Loop through all observations to update the cusum value and check to #see if it's negative. If it's negative then the value cusum value is #set to 0, otherwise initialize it with the value and keep looping
for (i in 1:length(s_of_t))
{
cusum_value <- cusum[i] + s_of_t[i]
ifelse(cusum_value > 0, cusum[i+1] <- cusum_value, cusum[i+1] <- 0)
}
plot(cusum)
```
```{r check temp drop}
which(cusum > 85)
temps_data[56, 1]
```
From our results we can determine that there is a temperate change starting from the 56th obversation/row in the dataset or August 25th onwards.
##6.2.2 Cusum calculation:
Use a CUSUM approach to make a judgment of whether Atlantas summer climate has gotten warmer in that time (and if so, when)
```{r cusum2 function}
#Calculate the average temperate for each year up until we see a drop in the change of weather measured at August 26th. We could calculate the exact date for each year, but instead we're using a simpler approach of just taking a sample of the primary dataset up until that observation. We'll need to use colMeans instead of rowMeans for this portion
temps_subset <- temps_data[1:56,]
x_of_t2 <- colMeans(temps_subset[c(2:length(temps_subset))], dims=1, na.rm=T)
#Compute the mean for all values of x of t to get mu
mu2 <- mean(x_of_t2)
#Set our value of C to help moderate our values to ensure we don't go over the threshold
C <- 2
#Now to compute S of t to determine if we're experiencing a change below our threshold we use the formula (mu - x_of_t - C) instead of using (x_of_t - mu - C) which is detecting an increase
s_of_t2 <- x_of_t2 - mu2 - C
#Craete an empty vector with an issue value of 0 but will be appended the value of cusum in a loop for each observation
precusum2 <- 0 * s_of_t2
cusum2 <- append(precusum2, 0)
#Loop through all observations to update the cusum value and check to #see if it's negative. If it's negative then the value cusum value is #set to 0, otherwise initialize it with the value and keep looping
for (i in 1:length(s_of_t2))
{
cusum_value2 <- cusum2[i] + s_of_t2[i]
ifelse(cusum_value2 > 0, cusum2[i+1] <- cusum_value2, cusum2[i+1] <- 0)
}
plot(cusum2)
```
From this result we can conclude that we are detecting changes above the threshold on several occasions throughout this time interval. More so in the later years. However, it doesn't seem to follow a consistent trend and the results are inconclusive of whether the climate has got warmer.