This function marks outliers in the input vector.

detect_outliers(
  x,
  apriori,
  ...,
  plot = FALSE,
  verbose = FALSE,
  title = NULL,
  timestamps = NULL
)

# S4 method for numeric,missing
detect_outliers(x, plot, verbose, title, timestamps)

# S4 method for numeric,Apriori
detect_outliers(x, apriori, plot, verbose, title, timestamps)
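The two usage lines above are S4 methods that dispatch on the class of apriori. When apriori is omitted, R selects the numeric,missing method; when an Apriori object is passed, it selects the numeric,Apriori method. The following minimal sketch reproduces that dispatch pattern with base R's methods package. The names detect_outliers_sketch and AprioriSketch, and the type slot, are hypothetical and are not part of this package.

```r
library(methods)

# Hypothetical stand-in for the package's Apriori class
setClass("AprioriSketch", representation(type = "character"))

# Generic that dispatches on x and apriori ("..." is excluded from dispatch)
setGeneric("detect_outliers_sketch",
  function(x, apriori, ...) standardGeneric("detect_outliers_sketch"))

# Chosen when apriori is not supplied at all
setMethod("detect_outliers_sketch", signature("numeric", "missing"),
  function(x, apriori, ...) "no a-priori information")

# Chosen when apriori is an AprioriSketch object
setMethod("detect_outliers_sketch", signature("numeric", "AprioriSketch"),
  function(x, apriori, ...) paste("a-priori type:", apriori@type))

detect_outliers_sketch(c(1, 2, 3))
detect_outliers_sketch(c(1, 2, 3), new("AprioriSketch", type = "hydrostatic"))
```

Dispatching on the class "missing" is the standard S4 way to give a method a different behaviour when an argument is left out.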

Arguments

x

numeric vector of values

apriori

object of class Apriori

...

optional parameters, depending on the method signature:

plot

if TRUE, prints detailed plots

verbose

if TRUE, prints detailed information

title

title to add to the plot

timestamps

timestamp vector. For air pressure, timestamps are irrelevant except for aesthetics in the scatter plot when plot = TRUE. If they contain duplicates or NA values, only warnings are raised, since these may indicate a problem with x. For hydrostatic pressure, timestamps are essential, so an error is raised if timestamps are not supplied or if any of them are NA or duplicated.
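The timestamp rules described above can be sketched as a small validation helper. This is a minimal sketch, assuming a hypothetical function check_timestamps() whose strict argument stands in for the hydrostatic-pressure case (errors) versus the air-pressure case (warnings only); it is not the package's internal code.

```r
# Hypothetical helper: strict = TRUE mimics hydrostatic pressure (errors),
# strict = FALSE mimics air pressure (warnings only).
check_timestamps <- function(timestamps, strict) {
  # Pick the reporting function once: stop() for errors, warning() otherwise
  report <- if (strict) stop else warning
  if (is.null(timestamps)) {
    # Missing timestamps only matter in the strict case
    if (strict) stop("timestamps must be supplied")
    return(invisible(TRUE))
  }
  if (anyNA(timestamps)) report("timestamps contain NA values")
  if (anyDuplicated(timestamps) > 0) report("timestamps contain duplicates")
  invisible(TRUE)
}

ts <- as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 01:00:00",
                   "2020-01-01 01:00:00"), tz = "UTC")
check_timestamps(ts[1:2], strict = TRUE)   # passes silently
check_timestamps(ts, strict = FALSE)       # warning: duplicates
```

anyDuplicated() returns the index of the first duplicate (0 if none), which makes it a cheap check for this purpose.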

Value

Logical vector of the same length as x, with TRUE marking an outlier.
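Because the result lines up element by element with x, it can be used directly for indexing. The sketch below takes the logical vector shown in the Examples for x <- c(1:9, 100) as given (hard-coded as is_outlier) rather than calling the package.

```r
x <- c(1:9, 100)
# Hard-coded copy of the result shown in the Examples for this x
is_outlier <- c(rep(FALSE, 9), TRUE)

which(is_outlier)             # positions of the outliers
x[!is_outlier]                # x with outliers removed
replace(x, is_outlier, NA)    # x with outliers masked as NA
```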

Methods (by class)

  • x = numeric,apriori = missing: Considers only x, without any a-priori information. A normal distribution is assumed, with its mean and variance estimated robustly by the median and the MAD (median absolute deviation), as described in Leys et al., 2013.

  • x = numeric,apriori = Apriori: Takes a-priori information about x into consideration.
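The median/MAD rule of Leys et al. (2013) used by the numeric,missing method can be sketched in a few lines of base R. This is a minimal sketch, not the package's implementation: the function name mad_outliers() is hypothetical, and the cut-off of 2.5 is an assumption (the moderately conservative value recommended by Leys et al.); the package may use a different threshold.

```r
# Hypothetical median/MAD outlier rule; threshold = 2.5 is an assumption
mad_outliers <- function(x, threshold = 2.5) {
  centre <- median(x, na.rm = TRUE)
  # stats::mad() scales by 1.4826 by default, so it estimates the
  # standard deviation of a normal distribution
  spread <- mad(x, na.rm = TRUE)
  # Flag values whose robust z-score exceeds the threshold (NA stays NA)
  abs(x - centre) / spread > threshold
}

mad_outliers(c(1:9, 100))
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```

Here the median is 5.5 and the scaled MAD is 1.4826 * 2.5, about 3.71, so only 100 lies more than 2.5 robust standard deviations from the centre. Note that a constant vector gives a MAD of 0, which this sketch does not guard against.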

Examples

# In case of a vector:
x <- c(1:9, 100)
detect_outliers(x)
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

# In case of a data frame, select the column:
df <- data.frame('x' = x)
detect_outliers(df$x)
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

# Or use the tidyverse approach:
library(magrittr)
df %>% dplyr::mutate("outlier" = detect_outliers(x))
#>      x outlier
#> 1    1   FALSE
#> 2    2   FALSE
#> 3    3   FALSE
#> 4    4   FALSE
#> 5    5   FALSE
#> 6    6   FALSE
#> 7    7   FALSE
#> 8    8   FALSE
#> 9    9   FALSE
#> 10 100    TRUE