---
title: "assertable Template"
author: "Grant Nguyen"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{assertable template}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo=FALSE, results='asis'}
library(assertable); library(data.table)
```

## Data

We will use the CO2 dataset, which has 84 rows and 5 columns of data from an experiment related to the cold tolerances of plants.

First, we take the CO2 dataset and save the whole dataset three times into three separate csv files as data/file_#.csv, each with a unique id_var.

```{r, results='asis', eval=FALSE}
for(i in 1:3) {
  data <- CO2
  data$id_var <- i  # tag each copy with its replication number
  write.csv(data, file=paste0("../data/file_",i,".csv"), row.names=FALSE)
}
```

## File Check and Import

First, use check_files to make sure the files exist. We can use the system.file command to locate them within the assertable package. Then, run import_files to bring them in. We'll call the combined data object master_data.

```{r, results='markup'}
filenames <- paste0("file_",c(1:3),".csv")
filenames <- system.file("extdata", filenames, package = "assertable")
filenames
```

```{r, results='markup'}
check_files(filenames)
```

```{r, results='markup'}
master_data <- import_files(filenames,FUN=fread)
head(master_data)
```

## Checking Dimensions

This dataset should have 84 * 3 rows and six columns: Plant, Type, Treatment, conc, uptake, and id_var.

```{r, results='markup', error=TRUE}
assert_nrows(master_data,(84*3))
assert_colnames(master_data,c("plant","type","treatment","conc","uptake","id_var"))
```

Oops, we forgot to capitalize the column names. Let's try again.

```{r, results='markup'}
assert_nrows(master_data,(84*3))
assert_colnames(master_data,c("Plant","Type","Treatment","conc","uptake","id_var"))
```

## Checking IDs

We believe the dataset should be unique by Plant, conc, and id_var (where id_var just represents the replication number of the dataset). Let's check this.

```{r, results='markup'}
plants <- unique(master_data$Plant)
concs <- unique(master_data$conc)
id_vars <- unique(master_data$id_var)
id_list <- list(Plant=plants, conc=concs, id_var=id_vars)
assert_ids(master_data,id_list)
```

Now, let's make sure that there are only two values in Type: Quebec and Mississippi. Let's also make sure that uptake and conc are more than 0 and less than 1500.

```{r, results='markup'}
assert_values(master_data, colnames = "Type", test="in", test_val = c("Quebec","Mississippi"))
assert_values(master_data, colnames = c("uptake","conc"), test="gt", test_val = 0)
assert_values(master_data, colnames = c("uptake","conc"), test="lt", test_val = 1500)
```

Finally, let's assert that every value of conc is more than 6 times the corresponding value of uptake.

```{r, results='markup', error=TRUE}
assert_values(master_data, colnames = "conc", test="gt", test_val = master_data$uptake * 6)
```

Bummer. Let's finally do some subsetting of our data.

```{r, results='markup'}
new_data <- master_data[master_data$Type == "Quebec" & master_data$Plant %in% c("Qn2","Qn3") & master_data$uptake > 20,]
```

Now, let's see if Plant and conc alone can uniquely identify our observations.

```{r, results='markup', error=TRUE}
assert_ids(new_data, list(Plant=c("Qn2","Qn3"), conc=concs))
```

Rough. Let's take 95 out of our conc levels and try again.

```{r, results='markup', error=TRUE}
new_concs <- c(175,250,350,500,675,1000)
assert_ids(new_data, list(Plant=c("Qn2","Qn3"),conc=new_concs))
```

Let's first get the actual rows and look at them.

```{r, results='markup', error=TRUE}
new_concs <- c(175,250,350,500,675,1000)
vetting_data <- assert_ids(new_data, list(Plant=c("Qn2","Qn3"),conc=new_concs), ids_only=FALSE, warn_only=TRUE)
print(head(vetting_data))
```

Hmm, we forgot to include id_var in the id_vars argument. Let's try one last time with the id_var values included.

```{r, results='markup', error=TRUE}
new_concs <- c(175,250,350,500,675,1000)
assert_ids(new_data, list(Plant=c("Qn2","Qn3"), conc=new_concs, id_var=id_vars))
```

Awesome! Now you're a data wizard!
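
For intuition about why leaving out id_var made the ID checks fail, here is a minimal base-R sketch of the kind of test an ID assertion performs: build every expected combination of ID values, then look for combinations that are missing or that appear more than once. `check_ids_sketch` is a hypothetical helper written for this illustration. It is not part of assertable and is not meant to mirror how assert_ids is implemented. The sketch rebuilds the three stacked copies of CO2 in memory rather than reading the csv files.

```r
# Hypothetical helper (not part of assertable): a base-R sketch of an ID check.
check_ids_sketch <- function(data, id_list) {
  id_names <- names(id_list)
  # Every combination of ID values that should appear exactly once
  expected <- expand.grid(id_list, stringsAsFactors = FALSE)
  # Collapse each row's ID values into a single string key for comparison
  make_key <- function(df) {
    do.call(paste, c(unname(lapply(df[id_names], as.character)), sep = "|"))
  }
  observed_keys <- make_key(as.data.frame(data))
  expected_keys <- make_key(expected)
  list(
    # Expected combinations that never occur in the data
    missing = expected[!expected_keys %in% observed_keys, , drop = FALSE],
    # Number of rows that repeat an ID combination already seen
    n_duplicated = sum(duplicated(observed_keys))
  )
}

# Three stacked copies of CO2, each tagged with its own id_var
stacked <- do.call(rbind, lapply(1:3, function(i) {
  d <- as.data.frame(CO2)
  d$id_var <- i
  d
}))

plants <- as.character(unique(CO2$Plant))
concs <- unique(CO2$conc)

# Without id_var, each Plant/conc pair shows up three times
without_id <- check_ids_sketch(stacked, list(Plant = plants, conc = concs))
without_id$n_duplicated

# With id_var included, every combination appears exactly once
with_id <- check_ids_sketch(stacked, list(Plant = plants, conc = concs, id_var = 1:3))
with_id$n_duplicated
nrow(with_id$missing)
```

The same pattern explains the earlier failure on the subset: filtering rows can remove expected combinations (which would show up in `missing`), while leaving an ID variable out of the list produces duplicates.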