\(^1\)Department of General Psychology, University of Padova, Italy
\(^2\)Department of Philosophy, Sociology, Pedagogy and Applied Psychology, University of Padova, Italy
\(^3\)Department of Communication Sciences, Humanities and International Studies, University of Urbino Carlo Bo, Italy
The present document describes the pre-processing steps used to read the different types of data collected over three days from a sample of 105 healthy adults:
PrelQS = demographic & retrospective self-report data collected with the preliminary questionnaire (Google Forms)
ESM = experience sampling measures collected with the Sensus Mobile app (Xiong et al., 2016)
HRV = 2-min segments of blood volume pulse data recorded with the E4 wristband (Empatica, Milan, Italy)
GNG = Go/No-Go behavioral data recorded with E-Prime 2.0.10 (Psychology Software Tools, Inc., Sharpsburg, PA)
For each data type, the following pre-processing steps are implemented to generate the wide and long datasets to be used for subsequent data analysis (see analytical reports 1. Psychometric and descriptive analyses and 2. Regression analyses):
Raw data reading & merging
Raw data recoding & processing
Data filtering based on inclusion criteria and data quality
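As a toy illustration of the wide (one row per survey, one column per item) and long (one row per item response) formats mentioned above, built with base R's reshape(), which is also used within the reading functions below (a sketch, not part of the pipeline):
# toy example: two surveys with two items in wide format
toy <- data.frame(ID = "HRV001", within.day = 1:2, v1 = c(5,4), t1 = c(6,5))
# reshaping to long format: one row per item response
long <- reshape(toy, varying = c("v1","t1"), v.names = "score", timevar = "item",
                times = c("v1","t1"), idvar = c("ID","within.day"), direction = "long")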
Here, we remove all objects from the R global environment, and we set the system time zone for better temporal synchronization across data files.
# removing all objects from the workspace
rm(list=ls())
# setting system time zone to GMT (for consistent temporal synchronization)
Sys.setenv(tz="GMT")
The following R packages are used in this document (see References section):
# required packages
packages <- c("ggplot2","gridExtra","jsonlite","plyr","lubridate","scales","knitr","reshape2","careless",
"shiny","pracma","lattice")
# generate packages references
knitr::write_bib(c(.packages(), packages),"packagesProc.bib")
# # run to install missing packages
# xfun::pkg_attach2(packages, message = FALSE); rm(list=ls())
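Since the installation line above is commented out, a base-R alternative that installs only the missing packages might look as follows (a sketch, not part of the original pipeline):
# sketch: install only the packages that are not installed yet
to_install <- setdiff(packages, rownames(installed.packages()))
if(length(to_install) > 0) install.packages(to_install)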
Here, the retrospective self-report data collected with the preliminary questionnaire are read and saved in the PrelQS dataset, recoded, and filtered based on the inclusion criteria.
library(ggplot2);library(gridExtra) # loading required packages
First, we read the Preliminary_qs.csv data file exported from Google Forms.
PrelQS <- read.csv("qs preliminare/Preliminary_qs.csv")
Second, we recode the PrelQS variables to be used for the analyses.
As a first step, we recode participants’ identification codes ID (currently corresponding to their e-mail addresses) using the “HRVXXX” format (e.g., from ‘john.smith@gmail.com’ to ‘HRV001’). This is done with the prel.qs_IDrecode() function. Since the participants’ e-mail addresses are confidential, neither the function nor the original dataset is shown.
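A minimal sketch of what such a recoding function might look like, assuming a hypothetical confidential lookup table with columns email and code (the actual function is not shown):
# minimal sketch (NOT the actual confidential function): recoding e-mail IDs
# into "HRVXXX"/"OUTXXX" codes via a hypothetical lookup table (columns: email, code)
IDrecode_sketch <- function(data, lookup){
  data$ID <- lookup$code[match(data$ID, lookup$email)]
  return(data) }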
# renaming the first two variables
colnames(PrelQS)[1:2] <- c("timestamp","ID")
# loading and using the function to recode the ID variable
source("prel.qs_IDrecode.R")
(PrelQS <- prel.qs_IDrecode(PrelQS))[1:3,] # showing first three rows
Second, we rename the other variables, remove those not considered for the present study, and recode the considered variables to be used in the analyses.
# renaming all variables
colnames(PrelQS) <- c("time","ID", # submission timestamp & participants ID
"consent", # consent to participate (all 'yes')
"age","gender","weight","height","itaMT", # demographics
# inclusion criteria
"drugs","drugs.which","cvDysf","cvDysf.which",
"otherDysf","otherDysf.which","phone","phone.which",
# retrospective scales (* = not considered in this study)
paste("DASS21",1:21,sep=""), # Depression, Anxiety, and Stress Scale*
paste("PANAS",1:20,sep=""), # Positive and Negative Affective Schedule*
paste("DERS",1:36,sep=""), # Difficulties in Emotion Regulation Scale*
paste("BIS11",c(1:15,17:20,22,23,25:30),sep=""), # Barratt Impulsiveness Scale-11 (3 items out)
paste("PSI",1:17,sep="")) # Physical Symptoms Inventory*
# keeping only considered variables (demographics and inclusion criteria)
PrelQS <- PrelQS[,c("ID","time",colnames(PrelQS)[which(colnames(PrelQS)=="age"):which(colnames(PrelQS)=="height")],
colnames(PrelQS)[which(colnames(PrelQS)=="drugs"):which(colnames(PrelQS)=="phone.which")])]
# recoding variables
PrelQS[,c("ID","gender")] <- lapply(PrelQS[,c("ID","gender")],as.factor) # ID and gender as factor
PrelQS$time <- as.POSIXct(PrelQS$time,format="%Y/%m/%d %I:%M:%S %p") # time as POSIXct
PrelQS[PrelQS$height<3,"height"] <- PrelQS[PrelQS$height<3,"height"]*100 # correcting heights reported in meters
PrelQS <- cbind(PrelQS[,1:4],BMI=PrelQS$weight/(PrelQS$height/100)^2, # computing BMI, removing weight & height
PrelQS[,7:ncol(PrelQS)])
for(i in which(colnames(PrelQS)%in%c("drugs","cvDysf","otherDysf","phone"))){ PrelQS[,i] <- gsub("Sì","Yes",PrelQS[,i]) }
PrelQS[,which(colnames(PrelQS)=="drugs"):which(colnames(PrelQS)=="phone.which")] <- # inclusion criteria as factors
lapply(PrelQS[,which(colnames(PrelQS)=="drugs"):which(colnames(PrelQS)=="phone.which")],as.factor)
# sorting dataset by ID and time
PrelQS <- PrelQS[order(PrelQS$ID,PrelQS$time),]
Third, we clean and filter the data based on the inclusion criteria, focusing on the participants who were actually invited to take part in the daily protocol (N = 105).
We start by inspecting the original number of cases in the PrelQS dataset. The questionnaire was completed by a total of 164 respondents, of whom a subsample was invited to participate in the daily protocol.
cat("Original No. of responses to the PrelQS =",nrow(PrelQS))
## Original No. of responses to the PrelQS = 164
First, we inspect and remove cases of double responses (N = 3 pairs of responses with the same ID value). In these cases, only the first response (i.e., the one with the earliest timestamp) is included.
# detecting double responses
cat(nrow(PrelQS[duplicated(PrelQS$ID),]),"cases of double responses (same ID)")
## 3 cases of double responses (same ID)
# removing double responses
memory <- PrelQS
PrelQS <- PrelQS[!duplicated(PrelQS$ID),]
cat("Removed",nrow(memory)-nrow(PrelQS),"double responses")
## Removed 3 double responses
Second, we take a look at the variables concerning the inclusion criteria of the study:
Not suffering from dysfunctions (e.g., anxiety disorder) or taking medications affecting the nervous system (e.g., antidepressants)
Not suffering from dysfunctions (e.g., hypertension) or taking medications affecting the cardiovascular system (e.g., beta blockers)
Owning an Android or iOS smartphone
In section 1.2.1, we recoded participants’ identification codes and marked the cases of participants not meeting the inclusion criteria, as well as the cases of participants who were not invited to take part in the EMA protocol for other reasons, using the “OUT” label in the ID variable.
Of the 161 total responses to the preliminary questionnaire, we can see that 105 participants (65%) were actually invited to participate in the daily protocol.
cat("Total No. of responses to the PrelQS =",nrow(PrelQS),"\n -",
nrow(PrelQS[substr(PrelQS$ID,1,3)!="OUT",]),"invited\n -",nrow(PrelQS[substr(PrelQS$ID,1,3)=="OUT",]),"not invited")
## Total No. of responses to the PrelQS = 161
## - 105 invited
## - 56 not invited
Here, we better evaluate the reasons for the exclusion of the remaining 56 participants:
9 participants were not invited because they reported suffering from cardiovascular dysfunctions (i.e., premature heartbeat, problems with the mitral valve) or other dysfunctions (i.e., thalassemia, anxiety, Crohn’s disease), and/or taking antidepressants
1 participant was not invited because she did not own a smartphone
A further 46 participants were not invited to join the EMA protocol for other reasons (e.g., calendar incompatibilities, end of the data collection)
Here, we compare the demographic and retrospective variables between included and excluded participants. Most participants who were not invited to take part in the daily protocol were women (likely due to the higher number of women among psychology undergraduates, who made up most of the sample, whereas invitations aimed to balance the sample by gender), with no marked differences in terms of age or BMI.
# preparing data
PrelQS$excl <- "IN" # creating excl variable (factor)
PrelQS[substr(PrelQS$ID,1,3)=="OUT","excl"] <- "OUT"
PrelQS$excl <- as.factor(PrelQS$excl)
# plotting
grid.arrange(ggplot(PrelQS,aes(x=excl,fill=gender)) + geom_bar(),
ggplot(PrelQS,aes(x=excl,y=age,fill=excl)) + geom_violin(),
ggplot(PrelQS,aes(x=excl,y=BMI,fill=excl)) + geom_violin(),nrow=1)
# removing excl
PrelQS$excl <- NULL
Here, we show the No. of participants not invited to take part in the EMA protocol due to dysfunctions affecting either the cardiovascular or the nervous system. We can see that 4 females were excluded since they reported suffering from a cardiovascular dysfunction, including premature heartbeat (027, 055) and problems with the mitral valve (033, 053). 3 further females (4.61%) were excluded since they reported suffering from other dysfunctions, including thalassemia (009), anxiety (020), and Crohn’s disease (049).
# inclusion criteria variables
ic <- colnames(PrelQS)[which(colnames(PrelQS)=="drugs"):which(colnames(PrelQS)=="phone.which")]
# cardiovascular dysfunctions
PrelQS[substr(PrelQS$ID,1,3)=="OUT" & PrelQS$cvDysf=="Yes", c("ID","gender",ic)]
# other dysfunctions
PrelQS[substr(PrelQS$ID,1,3)=="OUT" & PrelQS$otherDysf=="Yes", c("ID","gender",ic)]
Here, we show the No. of participants not invited to take part in the EMA protocol due to medications affecting either the cardiovascular or the nervous system. 3 participants (017, F; 029, M; 033, M) were excluded since they took antidepressants (Laroxyl, Venlafaxina, Remeron), one of whom also suffered from a cardiovascular dysfunction.
# medications (antidepressants)
PrelQS[substr(PrelQS$ID,1,3)=="OUT" & PrelQS$drugs=="Yes", c("ID","gender",ic)]
Here, we show the No. of participants not invited to take part in the EMA protocol because they did not own a compatible smartphone. Only 1 female (051) was excluded since she reported not owning a personal smartphone.
# no smartphone
PrelQS[substr(PrelQS$ID,1,3)=="OUT" & PrelQS$phone=="No", c("ID","gender",ic)]
The remaining 46 participants marked with "OUT" were not invited for other reasons (e.g., calendar incompatibility, end of the study). Here, we inspect the data of these participants and remove them from the dataset. We can see that they were mainly females (probably due to the higher number of female undergraduates taking part in psychological studies, combined with our goal of a gender-balanced sample), with no other peculiarities compared to the rest of the sample.
# showing participants that were not invited for other reasons
PrelQS[substr(PrelQS$ID,1,3)=="OUT" & !(PrelQS$ID%in%c("OUT033","OUT053","OUT027","OUT055","OUT009","OUT020",
"OUT049","OUT017","OUT029","OUT051")),]
Third, we remove all participants who were not invited to the daily protocol (marked as “OUT”) from the PrelQS dataset.
new <- PrelQS[!substr(PrelQS$ID,1,3)=="OUT",]
cat("Filtered",nrow(PrelQS)-nrow(new),"cases; new number of cases =",nrow(new))
## Filtered 56 cases; new number of cases = 105
PrelQS <- new
Fourth, we flag further participants that are not excluded but whose inclusion might bias some of the results.
First, some of the included participants reported suffering from dysfunctions or taking medications that were considered irrelevant for the current study. Here, we better inspect those conditions (N = 9 and 13, respectively):
6 included participants (4 females, 2 males) reported suffering from cardiovascular dysfunctions: arrhythmia, tachycardia episodes, and bicuspid aortic valve
3 included participants (2 females, 1 male) reported suffering from other dysfunctions: hypothyroidism, asthma, and allergy
6 females reported taking hormonal contraceptives
8 participants took other medications including antihistamines (2 males, 5 females), and thyroid hormones (EUTIROX, 1 female)
# cardiovascular dysfunctions
PrelQS[PrelQS$cvDysf=="Yes", c("ID","gender",ic)]
# other dysfunctions
PrelQS[PrelQS$otherDysf=="Yes", c("ID","gender",ic)]
# drugs
PrelQS[PrelQS$drugs=="Yes", c("ID","gender",ic)]
Although we considered such conditions as potentially irrelevant for the present study, some of them (especially the cardiovascular and other dysfunctions) might play a role. Thus, we create the dysfun variable to flag cases of included participants with cardiovascular and/or other dysfunctions.
# recoding dysfun variable (accounting for both cardiovascular and other dysfunctions)
PrelQS$dysfun <- 0
PrelQS[PrelQS$cvDysf=="Yes" | PrelQS$otherDysf=="Yes","dysfun"] <- 1
# recoding drugs variable
PrelQS$drugs <- gsub("Yes","1",gsub("No","0",PrelQS$drugs))
# removing unnecessary variables
PrelQS[,c("cvDysf","otherDysf","cvDysf.which","otherDysf.which","drugs.which","phone","phone.which")] <- NULL
# summary of dysfun and drugs
PrelQS[,c("dysfun","drugs")] <- lapply(PrelQS[,c("dysfun","drugs")],as.factor) # both as factor
summary(PrelQS[substr(PrelQS$ID,1,3)=="HRV",c("dysfun","drugs")])
## dysfun drugs
## 0:96 0:92
## 1: 9 1:13
Second, four of the included participants (4.1%) lost one protocol day (i.e., day 3 for HRV013, HRV021, and HRV091; day 1 for HRV067), which was postponed to a following week. Here, we mark these participants with postponed = 1.
PrelQS$postponed <- 0
PrelQS[PrelQS$ID %in% c("HRV013","HRV021","HRV067","HRV091"),"postponed"] <- 1
Finally, 8 further participants were excluded due to dropping out during the EMA protocol (009, F; 059, M), poor quality of physiological data (015, F; 046, F; 066, M), or technical problems with the mobile app (011, M; 027, M; 029, M). Here, we mark these participants as ‘OUT’ and inspect again the differences between included and excluded participants. We can note that the excluded participants do not substantially deviate from the distributions of the included ones. Then, we remove them from the PrelQS dataset.
# marking 8 excluded participants as "OUT"
memory <- PrelQS
PrelQS$ID <- as.character(PrelQS$ID)
excl <- c("HRV009","HRV011","HRV015","HRV027","HRV029","HRV046","HRV059","HRV066")
PrelQS[PrelQS$ID%in%excl,"ID"] <- gsub("HRV","OUT",PrelQS[PrelQS$ID%in%excl,"ID"]) # marking excluded participants as "OUTXXX"
# plotting
PrelQS$excl <- "IN" # creating excl variable (factor)
PrelQS[substr(PrelQS$ID,1,3)=="OUT","excl"] <- "OUT"
PrelQS$excl <- as.factor(PrelQS$excl)
grid.arrange(ggplot(PrelQS,aes(x=excl,fill=gender)) + geom_bar(),
ggplot(PrelQS,aes(x=excl,y=age,fill=excl)) + geom_violin(),
ggplot(PrelQS,aes(x=excl,y=BMI,fill=excl)) + geom_violin(),nrow=1)
# updating number of included participants
cat(nrow(PrelQS[substr(PrelQS$ID,1,3)=="HRV",]),"included participants out of",
nrow(memory[substr(memory$ID,1,3)=="HRV",]),"invited participants")
## 97 included participants out of 105 invited participants
# removing excluded participants
PrelQS <- PrelQS[substr(PrelQS$ID,1,3)=="HRV",]
PrelQS$ID <- as.factor(PrelQS$ID) # ID as factor
PrelQS <- PrelQS[order(PrelQS$ID),] # sorting by ID
PrelQS$excl <- NULL # removing excl variable
Here, the raw data collected with the experience sampling method (ESM) are read and saved in the ESM dataset, recoded, and filtered based on response rate and inclusion criteria.
First, we read the JSON data files exported from the Sensus Mobile app (Xiong et al., 2016).
Specifically, the Protocols created with Sensus were configured to store the recorded .JSON data files in our private AWS S3 bucket, from which they were downloaded and stored in the ESM/data folder. From the Sensus app, we also exported the Probe Definition files, which we use to replace the input IDs (i.e., by default, each item is associated with an alphanumeric code) with the corresponding input names (i.e., the item labels that we set in the protocols). These files were saved in the ESM/probe folder.
The readSurveyData function is used to optimize the ESM data reading.
readSurveyData
#' @title Read JSON data exported from the Sensus mobile app
#' @param data.path = character string indicating the full path to the folder where the JSON raw data are stored.
#' @param probe.definition = character string indicating the full path to the folder where the Probe Definition JSON file(s) are stored (from the Sensus mobile app: Protocol > Probe > Scripted Interaction > Share definition).
#' @param messages = logical value indicating whether the processing steps should be printed (default: TRUE).
readSurveyData <- function(data.path,probe.definition,messages=TRUE){ require(jsonlite)
# 1. Reading data
# .......................................
options(digits.secs=3) # setting No. of digits
paths = list.files(data.path,recursive=TRUE,full.names=TRUE,include.dirs=FALSE) # listing files in data path
if(messages==TRUE){ cat("\n\nReading",length(paths),"data files from",data.path,"...")}
var.names <- c("ParticipantId","Timestamp","InputId","Response","RunTimestamp","SubmissionTimestamp",
"ScriptName","ProtocolId","$type") # selecting variables of interest
data <- as.data.frame(matrix(nrow=0,ncol=9)) # empty dataframe creation and population
colnames(data) <- var.names
for(path in paths){
if(file.info(path)$size>0){ # only reading Datum files (i.e., containing ScriptDatum, which is sized > 0 Kb)
new.data <- read_json(path,simplifyDataFrame=TRUE)
if(class(new.data)=="data.frame" & !is.null(new.data$Response)){ # only keeping files with Response data
if(class(new.data$Response)=="data.frame"){ # sometimes the Response column is read as dataframe
new.data$Response <- as.character(new.data$Response$`$values`)}
data <- rbind(data,new.data[var.names]) }}}
data <- data[!is.na(data$ParticipantId),] # only keeping data with ParticipantId information (i.e., participant identification)
data <- data[!is.na(data$InputId),] # only keeping data with InputId information (i.e., item identification)
names(data)[9] <- "os" # $type as OS (android or iOS) # mobile OS information
data[,9] <- gsub("Sensus.Probes.User.Scripts.ScriptDatum, Sensus","",data[,9]) # removing unuseful information
# 2. From Response Ids to Item labels (based on Probe Definition)
# ...............................................................
if(!is.na(probe.definition)){ if(messages==TRUE){ cat("\n\nConverting ResponseId into InputId based on Probe Definition...")}
# function to read Probe Definition files
readProbe <- function(path){ probedefinition <- read_json(path,simplifyDataFrame=TRUE) # first probe definition file
# reading input labels of the first inputGroup
inputs <- probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`[[1]]$Inputs$`$values`[[1]]$Name
infos <- probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`[[1]] # input labels
PROTOCOL <- data.frame(protocolName=probedefinition$Protocol$Name,protocolId=probedefinition$Protocol$Id, # protocol name
scriptName=infos$Name,inputName=inputs,inputId=infos$Inputs$`$values`[[1]]$Id) # script name
if(length(probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`)>1){ # reading the following inputGroups
for(i in 2:length(probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`)){
inputs <- probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`[[i]]$Inputs$`$values`[[1]]$Name
infos <- probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`[[i]]
PROTOCOL <- rbind(PROTOCOL,data.frame(protocolName=probedefinition$Protocol$Name,
protocolId=probedefinition$Protocol$Id,
scriptName=infos$Name,inputName=inputs,
inputId=infos$Inputs$`$values`[[1]]$Id)) }}
return(PROTOCOL) }
paths = list.files(probe.definition,recursive=TRUE,full.names=TRUE,include.dirs=FALSE) # listing files in probe.definition path
PROTOCOL <- readProbe(paths[1]) # reading the first Probe Definition file
if(length(list.files(probe.definition))>1){ # reading the following Probe Definition files when > 1
for(path in paths[2:length(paths)]){ PROTOCOL <- rbind(PROTOCOL,readProbe(path)) }}
for(i in 1:nrow(data)){
for(j in 1:nrow(PROTOCOL)){ if(!is.na(data[i,3]) & data[i,3]==PROTOCOL[j,5]){ data[i,3] <- as.character(PROTOCOL[j,4]) }}}}
# 3. Cleaning and unlisting Response data
# ...............................................................
if(messages==TRUE){ cat("\n\nUnlisting Response data and removing system information...")}
data$Response <- gsub("list","",data$Response) # cleaning categorical items from Sensus system info
data$Response <- gsub(paste("c","\\(|\\)",sep=""),"",data$Response)
data$Response <- gsub("\\(|\\)","",data$Response)
data$Response <- gsub("\\[|\\]","",data$Response)
data$Response <- gsub("\\$type` = \"System.Collections.Generic.List`1System.Object, mscorlib, mscorlib\", ","",data$Response)
data$Response <- gsub("\\$values","",data$Response)
data$Response <- gsub('``` = ', "",data$Response)
data$Response <- gsub('\ ', "",data$Response)
data$Response <- gsub('\"', "",data$Response)
data$Response <- gsub('\"No\"', "No",data$Response)
data$Response <- gsub('\"Sì\"', "Si",data$Response)
if(class(data$Response)=="data.frame"){ data$Response <- as.character(data$Response$`$values`[[1]]) # unlisting Response column
} else { data$Response <- as.character(data$Response) }
# 4. Encoding time information
# ...............................................................
if(messages==TRUE){ cat("\n\nConverting Timestamp variables as POSIXct...")}
timestamps <- c("Timestamp","RunTimestamp","SubmissionTimestamp") # timestamps variables
data[,timestamps] <- lapply(data[,timestamps],function(x) as.POSIXct(x,format="%Y-%m-%dT%H:%M:%OS",tz="GMT"))
# 5. Cleaning incomplete double responses
# ...............................................................
# in some cases, multiple responses are recorded with the same ParticipantId, RunTimeStamp, and InputId
if(messages==TRUE){ cat("\n\nLooking for double responses...\n")}
data$id.r <- paste(data$ParticipantId,data$RunTimestamp,data$InputId,sep="_") # ID-time-item identifier
data$id.t <- paste(data$ParticipantId,data$RunTimestamp,sep="_") # ID-time identifier
doubles <- levels(as.factor(data[duplicated(data$id.r),"id.t"])) # double responses (same ID-time-item identifier)
ndoubles <- rep(length(doubles),2) # No. of double responses
for(double in doubles){ subId <- levels(as.factor(data[data$id.t==double,"SubmissionTimestamp"])) # submission time
if(length(subId)>1){ ndoubles[1] <- ndoubles[1] - 1
maxDouble <- max(as.POSIXct(subId)) # when the SubmissionTimestamp is different -> splitting responses
data[data$id.t==double,"RunTimestamp"] <- maxDouble - 61.20 # 2nd RunTimeStamp = Submission - 61.20 sec (median resp time)
} else { ndoubles[2] <- ndoubles[2] - 1
data <- rbind(data[data$id.t!=double,], # when even the SubmissionTimestamp is the same -> removing double resp.
data[data$id.t==double & !duplicated(data$id.r),]) }}
cat(" - Removed",ndoubles[1],"double responses (same ID, InputName, RunTimestamp, & SubmissionTimestamp)\n",
"- Splitted",ndoubles[2],"double responses with different SubmissionTimestamp\n",
" (in these cases, RunTimestamp is recomputed as SubmissionTimestamp - 61.20 sec (average response length)")
# 6. Sorting columns and Reshaping
# ...............................................................
if(messages==TRUE){ cat("\n\nSorting columns and reshaping (one row per data entry)...")}
colnames(data)[1] <- "ID" # selecting and sorting columns
data <- data[,c("ID","os","ProtocolId","ScriptName","RunTimestamp","SubmissionTimestamp","InputId","Response","Timestamp")]
data <- data[order(data$ID,data$Timestamp),] # sorting rows
wide <- reshape(data[,1:(ncol(data)-1)],v.names=c("Response"),timevar=c("InputId"), # reshaping from long to wide (resp-by-resp)
idvar=c("RunTimestamp","SubmissionTimestamp"),direction=c("wide"),sep="")
colnames(wide) <- gsub("Response","",gsub("\\-",".",colnames(wide))) # removing label "Response" from ResponseId
wide <- wide[order(wide$ID,wide$RunTimestamp),] # sorting rows by participant ID and RunTimestamp
# wide <- data.frame(cbind(wide[1:3],wide[ncol(data)],wide[5:ncol(data)-1])) # final sorting of rows and columns
wide <- wide[order(wide$ID,wide$RunTimestamp),]
row.names(wide) <- as.character(1:nrow(wide))
if(messages==TRUE){ cat("\n\nRead",nrow(wide),"responses from",nlevels(as.factor(data$ID)),"participants.") }
# 7. Processing response times
# ...............................................................
data$RT <- as.numeric(NA) # computing item response times (i.e., seconds between each response and the previous one)
for(i in 2:nrow(data)){ if(data[i,"ID"]==data[i-1,"ID"] & data[i,"RunTimestamp"]==data[i-1,"RunTimestamp"]){
data[i,"RT"] <- difftime(data[i,"Timestamp"],data[i-1,"Timestamp"],units="secs")}}
data$RT.tot <- as.numeric(NA) # computing survey total response time (i.e., last - first response of each survey)
data$survey <- as.factor(paste(data$ID,data$RunTimestamp,sep="_"))
for(survey in levels(data$survey)){ data[data$survey==survey,"RT.tot"] <-
difftime(data[data$survey==survey,"SubmissionTimestamp"][1],
data[data$survey==survey,"Timestamp"][1],units="secs")}
rt <- reshape(data[,c("ID","RunTimestamp","SubmissionTimestamp","RT.tot","InputId","RT")], # reshaping
v.names=c("RT"),timevar=c("InputId"),idvar=c("RunTimestamp","SubmissionTimestamp"),direction=c("wide"),sep="")
colnames(rt)[5:ncol(rt)] <- gsub("RT","",colnames(rt)[5:ncol(rt)]) # removing label "RT" from the response-time columns
# 8. Returning processed dataset and response times
# ...............................................................
return(list(wide,rt))}
Here, we use the readSurveyData function (shown above) to read and preliminarily recode the .JSON data files, and to merge them into the ESM dataset. In one single case, the JSON string contains an (illegal) UTF8 byte-order-mark.
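Note that jsonlite only warns about the byte-order-mark and still parses the file. Should a stricter parser refuse it, the BOM could be stripped manually along these lines (a sketch; path is a placeholder for the affected file):
# sketch: manually stripping a UTF-8 byte-order-mark before parsing (path = placeholder)
txt <- readLines(path, encoding = "UTF-8", warn = FALSE)
txt[1] <- sub("^\ufeff", "", txt[1]) # removing the leading BOM, if any
new.data <- jsonlite::fromJSON(paste(txt, collapse = "\n"))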
# data reading, encoding and saving
ESM <- readSurveyData(data.path="ESM/data",probe.definition="ESM/probe")
## Loading required package: jsonlite
##
##
## Reading 3200 data files from ESM/data ...
## Warning: JSON string contains (illegal) UTF8 byte-order-mark!
##
##
## Converting ResponseId into InputId based on Probe Definition...
##
## Unlisting Response data and removing system information...
##
## Converting Timestamp variables as POSIXct...
##
## Looking for double responses...
## - Removed 2 double responses (same ID, InputName, RunTimestamp, & SubmissionTimestamp)
## - Splitted 10 double responses with different SubmissionTimestamp
## (in these cases, RunTimestamp is recomputed as SubmissionTimestamp - 61.20 sec (average response length)
##
## Sorting columns and reshaping (one row per data entry)...
##
## Read 1392 responses from 104 participants.
ESM.RT <- ESM[[2]] # dataset with response times
ESM <- ESM[[1]] # dataset with participant responses
We also integrate the ESM dataset with 4 additional responses that were sent via instant messages (i.e., screenshots of the responses) due to technical problems. These were stored in the screenshotX4.RData file.
load("ESM/screenshot_surevyMalfunctioning/screenshotX4.RData") # loading additiona responses
cat("Adding",nrow(screenshotX4),"additional responses sent via instant messages") # N = 4
## Adding 4 additional responses sent via instant messages
ESM <- rbind(ESM,screenshotX4) # adding additional responses to the ESM dataset
Finally, we import the “baseline surveys” data (i.e., ESM data collected in the lab at the beginning of each day, using Google Forms), exported from Google Forms as a .CSV file. Some recoding procedures will be necessary before merging the baseline and the ESM datasets.
# reading raw dataset exported from Google Forms
baseline <- read.csv("baseline/Baseline.csv",header=TRUE)
cat("Reading",nrow(baseline),"responses to the baseline questionnaire from",
nlevels(as.factor(baseline$Nome.utente)),"participants")
## Reading 311 responses to the baseline questionnaire from 108 participants
Second, we recode the ESM and baseline variables to be used for the analyses, and we merge the two datasets.
The following packages and functions are used to optimize the ESM data recoding:
library(lubridate); library(plyr); library(scales)
within.day.adjust
within.day.adjust <- function(data){
  # creating within.day indicator (default = NA)
  data$within.day <- NA
  # scheduled temporal windows (scheduled times ± 10 min device error, + 20 min expiration):
  # survey 1: 11:20 - 11:40 (up to 12:00) -> window 11:10 - 12:10
  # survey 2: 12:30 - 12:50 (up to 13:10) -> window 12:20 - 13:20
  # survey 3: 13:40 - 14:00 (up to 14:20) -> window 13:30 - 14:30
  # survey 4: 14:50 - 15:10 (up to 15:30) -> window 14:40 - 15:40
  # survey 5: 16:00 - 16:20 (up to 16:40) -> window 15:50 - 16:50
  lower <- c("11:10:00","12:20:00","13:30:00","14:40:00","15:50:00")
  upper <- c("12:10:00","13:20:00","14:30:00","15:40:00","16:50:00")
  # adjusting within.day based on the scheduled times ("HH:MM:SS" strings compare lexicographically)
  times <- strftime(data$RunTimestamp,format="%H:%M:%S")
  for(w in 1:5){ data[times > lower[w] & times < upper[w],"within.day"] <- w }
  # sorting dataset based on ID, day and within.day
  data <- data[order(data$ID,data$day,data$within.day),]
  data$within.day <- as.factor(data$within.day)
  rownames(data) <- 1:nrow(data)
  cat("Creating/Adjusting within.day variable...\n Printing summary of cases:\n")
  print(summary(data$within.day))
  return(data) }
Creates or adjusts the within.day variable based on the scheduled temporal windows used in the current study.
varsRecoding
varsRecoding <- function(data){
# ID and os as factors
data[,c("ID","os")] <- lapply(data[,c("ID","os")],as.factor)
# Recoding item scores and reversed items
ratings <- colnames(data)[c(which(colnames(data)=="t3.nervoso.tranquillo"), # recoding ratings as integer
which(colnames(data)=="e3.affaticato.fresco"):which(colnames(data)=="intensity.NE"),
which(colnames(data)=="v1.male.bene"):which(colnames(data)=="v3.positivo.negativo"))]
data[,ratings] <- lapply(data[,ratings],as.integer)
data$v3.positivo.negativo <- 8 - data$v3.positivo.negativo # HEDONIC TONE = positive
colnames(data)[which(colnames(data)=="v2.soddisfatto.insoddisfatto")] <- "v2.insoddisfatto.soddisfatto" # changing wrong label
data$t1.rilassato.teso <- 8 - data$t1.rilassato.teso # CALMNESS = positive
data$e2.pieno.privodenergia <- 8 - data$e2.pieno.privodenergia # ENERGETIC AROUSAL = positive
data[!is.na(data$intensity.NE) & data$intensity.NE > 7,"intensity.NE"] <- 7 # correcting 2 wrongly recorded items (score > 7)
# Recoding confounders
data$alcohol <- as.factor(gsub("No","0",gsub("Sì","1",data$alcohol))) # alcohol (from yes/no --> 1/0)
data$Activity <- as.factor(gsub("Leggerasedutiocoricatiperlamaggiorpartedeltempo","0", # physical activity level (0 = seated)
gsub("Moderatacamminata,farelescale","1", # (1 = moderate)
gsub("Intensaattivitàsportiva,palestra,corsaodiverserampediscale","2", # (2 = vigorous)
data$Activity))))
data$People.rec <- as.factor(gsub("No,erodasola","0",gsub("No,erodasolo","0", # (0=alone)
gsub("Sì,manonsonostatadisturbata","1",gsub("Sì,manonsonostatadisturbato","1", # (1=not interact)
gsub("Sì,esonostataunpo'disturbata","2",gsub("Sì,esonostataunpo'disturbato","2", # (2=disturb)
gsub("Sì,esonostatainterrotta","3",gsub("Sì,esonostatainterrotto","3", #(3 = interrupted)
data$People.rec)))))))))
data$ProtocolId <- as.factor(gsub("ProtocolStudent","",data$ProtocolId)) # from protocol ID to gender
colnames(data)[which(colnames(data)=="ProtocolId")] <- "gender"
data[,c("smoke","coffe")] <- lapply(data[,c("smoke","coffe")],as.factor) # remaining confounders as factor 0/1
# Sorting and renaming columns
data <- data[,c("ID","os","gender","day","within.day","RunTimestamp", # response information
"v1.male.bene","v2.insoddisfatto.soddisfatto","v3.positivo.negativo", # hedonic tone
"t1.rilassato.teso","t2.agitato.calmo","t3.nervoso.tranquillo", # calmness
"e1.stanco.sveglio","e2.pieno.privodenergia","e3.affaticato.fresco", # energetic arousal
"PE","intensity.PE","NE","intensity.NE", # emotional events
"smoke","coffe","alcohol","Activity","People.rec")] # confounders
colnames(data)[7:ncol(data)] <- c("v1","v2","v3","t1","t2","t3","f1","f2","f3", # renaming columns
"PE","PE.int","NE","NE.int","smoke","coffe","alcohol","activity","people")
# final sorting and returning data
data[,which(colnames(data)=="v1"):which(colnames(data)=="NE.int")] <-
lapply(data[,which(colnames(data)=="v1"):which(colnames(data)=="NE.int")],as.integer) # slider ratings as integer
data$sort <- as.numeric(substr(data$ID,4,6))
data <- data[order(data$sort,data$RunTimestamp),] # sorting by ID and timestamp
data$sort <- NULL # removing sort column
return(data) }
Recodes slider and multiple-choice responses collected with the Sensus protocols used in the study, and sorts and renames the columns (daylight saving time shifts are corrected in the timestamp-processing step).
bslRecoding
bslRecoding <- function(data){
# selecting included variables, renaming and sorting columns
colnames(data) <- c("RunTimestamp","ID","v1","t1","f1","v2","t2","f2","v3","t3","f3","PE","PE.int","NE","NE.int",
"Sleep1","Sleep2","Sleep3", # baseline sleep ratings are not considered
"daybefore.ex","drugs","drugs.which",
"day","within.day")
data <- data[,c("ID","day","within.day","RunTimestamp",
"v1","v2","v3","t1","t2","t3","f1","f2","f3", # renaming columns
"PE","PE.int","NE","NE.int","daybefore.ex","drugs","drugs.which")]
# ID as factor
data$ID <- as.factor(data$ID)
# recoding item scores
data$v2 <- 8 - data$v2 # hedonic tone = positive
data$v3 <- 8 - data$v3
data$t1 <- 8 - data$t1 # calmness = positive
data$f2 <- 8 - data$f2 # e.arousal = positive
# other confounding variables
data$drugs <- as.factor(gsub("Sì","1",gsub("No","0",data$drugs))) # drugs
data$daybefore.ex <- as.factor(gsub("Sì","1",gsub("No","0",data$daybefore.ex))) # intense exercise on the day before
# sorting by timestamp
data <- data[order(data$ID,data$RunTimestamp),]
return(data)}
Recodes the variables collected with the baseline surveys.
First, we recode participants’ identification codes ID (currently corresponding to their e-mail addresses) using the “HRVXXX” format (e.g., from ‘john.smith@gmail.com’ to ‘HRV001’). This is done with the participantID_recoding() and participantID_recoding_baseline() functions. Again, since the participants’ e-mail addresses are confidential, neither the functions nor the original dataset are shown.
# loading and using the function to recode the ID variable
source("participantID_recoding.R"); source("participantID_recoding_baseline.R")
(ESM <- participantID_recoding(ESM))[1:3,] # recoding ESM IDs and showing first three rows
colnames(baseline)[2] <- "ID"
(baseline <- participantID_recoding_baseline(baseline))[1:3,] # recoding baseline IDs and showing first three rows
Second, we process the timestamps (i.e., temporal variables) in both the ESM and the baseline datasets.
Here, we convert the included timestamps from character to POSIXct (the R format for dates and times), using the “GMT” time zone (note that data were collected in the CET/CEST region, but GMT is easier to handle).
# converting temporal variables as POSIXct in the ESM dataset
ESM[,c("RunTimestamp","SubmissionTimestamp")] <- # no need to specify format since they already are POSIXct
lapply(ESM[,c("RunTimestamp","SubmissionTimestamp")],function(x) as.POSIXct(x,tz="GMT"))
# converting baseline variables as POSIXct in the baseline dataset
colnames(baseline)[which(colnames(baseline)=="Informazioni.cronologiche")] <- "RunTimestamp"
baseline$RunTimestamp <- as.POSIXct(baseline$RunTimestamp,format="%Y/%m/%d %I:%M:%S %p",tz="GMT")
Then, we inspect the data for cases of wrongly encoded timestamps, considering that data collection started on November 6th, 2018 and ended on November 21st, 2019. Only two cases are detected, whose RunTimestamp was recorded with the wrong year.
# min and max RunTimestamp
c(min(ESM$RunTimestamp),max(ESM$RunTimestamp)) # ESM: unexpected values < November 2018
## [1] "2018-01-24 16:40:00.000 GMT" "2019-11-21 15:11:17.835 GMT"
c(min(baseline$RunTimestamp),max(baseline$RunTimestamp)) # baseline: ok
## [1] "2018-11-06 10:12:16 GMT" "2019-11-21 10:45:48 GMT"
# only two cases with RunTimestamp < November 2018 and no SubmissionTimestamp
ESM[ESM$RunTimestamp < as.POSIXct("2018-11-06") | ESM$RunTimestamp > as.POSIXct("2019-11-22"),
c("ID","RunTimestamp","SubmissionTimestamp")]
# plotting all timestamps from these two participants: isolated point recorded as it was collected one year before
grid.arrange(ggplot(ESM[ESM$ID=="HRV046",],aes(RunTimestamp)) + geom_histogram(bins=30) + ggtitle("HRV046"),
ggplot(ESM[ESM$ID=="HRV048",],aes(RunTimestamp)) + geom_histogram(bins=30) + ggtitle("HRV048"))
# correcting wrongly encoded RunTimestamps (adding 1year) and SubmissionTimestamps (RunTimestamp + 61.20 sec)
ESM[substr(ESM$RunTimestamp,1,7)=="2018-01","SubmissionTimestamp"] <-
ESM[substr(ESM$RunTimestamp,1,7)=="2018-01","RunTimestamp"] + 61.20
ESM[substr(ESM$RunTimestamp,1,7)=="2018-01",c("RunTimestamp","SubmissionTimestamp")] <-
ESM[substr(ESM$RunTimestamp,1,7)=="2018-01",c("RunTimestamp","SubmissionTimestamp")] + 24*60*60*365
c(min(ESM$RunTimestamp),max(ESM$RunTimestamp)) # sanity check: min and max matching with the data collection period
## [1] "2018-11-06 10:18:46.547 GMT" "2019-11-21 15:11:17.835 GMT"
# 2 further cases with missing SubmissionTimestamp (replacing with RunTimestamp + 61.20 sec)
ESM[is.na(ESM$SubmissionTimestamp),c("ID","RunTimestamp","SubmissionTimestamp")]
ESM[is.na(ESM$SubmissionTimestamp),"SubmissionTimestamp"] <- ESM[is.na(ESM$SubmissionTimestamp),"RunTimestamp"] + 61.20
Next, we adjust the temporal coordinates to account for daylight saving time. Indeed, whereas timestamps were recorded based on the internal clock of participants’ mobile phones, which automatically adjusts for time changes, such time shifts got lost in the data export, resulting in systematic shifts relative to local time. In Italy, during the data collection period, daylight saving time was applied from March 31st to October 27th, 2019. Here, we visually inspect the timestamp of the first response in both the ESM and the baseline datasets.
We can note that, in the ESM dataset, the responses recorded during standard time were encoded with one hour less than the actual time (i.e., the first questionnaire was scheduled at 11:30, whereas most responses were recorded at 10:30), whereas those recorded during daylight saving time show one additional hour less than the actual time (i.e., most responses recorded at 9:30). In the baseline dataset, the responses recorded during daylight saving time were recorded with one hour less than the actual time (i.e., baseline questionnaires were administered around 10:30, whereas most responses are around 9:30), whereas those recorded during standard time look fine.
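The shifts observed in the ESM dataset match a plain local-time-to-GMT conversion: Italy uses CET (GMT+1) during standard time and CEST (GMT+2) during daylight saving time. A minimal illustration with toy timestamps (not from the data):
# toy timestamps: a survey actually taken at 11:30 local time in Italy
winter <- as.POSIXct("2019-01-15 11:30:00", tz = "Europe/Rome") # standard time (CET = GMT+1)
summer <- as.POSIXct("2019-07-15 11:30:00", tz = "Europe/Rome") # daylight saving time (CEST = GMT+2)
format(winter, tz = "GMT") # "2019-01-15 10:30:00" -> one hour behind local time
format(summer, tz = "GMT") # "2019-07-15 09:30:00" -> two hours behind local time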
# ESM: computing and plotting time without date
ESM$daytime <- as.POSIXct(paste(hour(ESM$RunTimestamp),minute(ESM$RunTimestamp)),format="%H %M",tz="GMT")
ESM$daylight <- "standard"
ESM[ESM$RunTimestamp > as.POSIXct("2019-04-01") & ESM$RunTimestamp < as.POSIXct("2019-10-27"),"daylight"] <- "daylight"
ggplot(ESM[!duplicated(ESM$ID),],aes(y=daytime,fill=daylight)) + geom_bar(position=position_dodge(),width=50) +
scale_y_datetime(labels = date_format("%H:%M"),date_breaks="60 min") +
ggtitle("ESM$RunTimestamp of the first response for each participant")
# baseline: computing and plotting time without date
baseline$daytime <- as.POSIXct(paste(hour(baseline$RunTimestamp),minute(baseline$RunTimestamp)),format="%H %M",tz="GMT")
baseline$daylight <- "standard"
baseline[baseline$RunTimestamp>as.POSIXct("2019-04-01") & baseline$RunTimestamp<as.POSIXct("2019-10-27"),"daylight"]<-"daylight"
ggplot(baseline[,],aes(y=daytime,fill=daylight)) + geom_bar(position=position_dodge(),width=50) +
scale_y_datetime(labels = date_format("%H:%M"),date_breaks="60 min") +
ggtitle("baseline$RunTimestamp")
Here, we adjust the temporal coordinates in both datasets according to the observed time shifts.
# adding 1 hour to all ESM timestamps
ESM[,c("RunTimestamp","SubmissionTimestamp")] <- ESM[,c("RunTimestamp","SubmissionTimestamp")] + 1*60*60
# adding 1 further hour to ESM timestamps during daylight saving time
ESM[ESM$daylight=="daylight",c("RunTimestamp","SubmissionTimestamp")] <-
ESM[ESM$daylight=="daylight",c("RunTimestamp","SubmissionTimestamp")] + 1*60*60
# adding 1 hour to baseline timestamps during daylight saving time
baseline[baseline$daylight=="daylight","RunTimestamp"] <- baseline[baseline$daylight=="daylight","RunTimestamp"] + 1*60*60
# ESM: computing and plotting time without date (NOW OK)
ESM$daytime <- as.POSIXct(paste(hour(ESM$RunTimestamp),minute(ESM$RunTimestamp)),format="%H %M",tz="GMT")
ggplot(ESM[!duplicated(ESM$ID),],aes(y=daytime,fill=daylight)) + geom_bar(position=position_dodge(),width=50) +
scale_y_datetime(labels = date_format("%H:%M"),date_breaks="60 min") +
ggtitle("ESM$RunTimestamp of the first response for each participant")
# baseline: computing and plotting time without date (NOW OK)
baseline$daytime <- as.POSIXct(paste(hour(baseline$RunTimestamp),minute(baseline$RunTimestamp)),format="%H %M",tz="GMT")
ggplot(baseline[,],aes(y=daytime,fill=daylight)) + geom_bar(position=position_dodge(),width=50) +
scale_y_datetime(labels = date_format("%H:%M"),date_breaks="60 min") +
ggtitle("baseline$RunTimestamp of the first response for each participant")
# removing daytime and daylight columns
ESM$daytime <- ESM$daylight <- baseline$daytime <- baseline$daylight <- NULL
Finally, we create two variables to better organize the temporal structure of the datasets:
day = weekday indicator (i.e., 1 = Tuesday, 2 = Wednesday, 3 = Thursday)
within.day = row identifier within each day (for baseline responses, we set within.day = 0)
# creating indicator for the weekday
ESM$day <- as.POSIXlt(ESM$RunTimestamp)$wday - 1
baseline$day <- as.POSIXlt(baseline$RunTimestamp)$wday - 1
levels(as.factor(c(levels(as.factor(ESM$day)),levels(as.factor(baseline$day))))) # sanity check: ok
## [1] "1" "2" "3"
# creating row identifier within each day
ESM <- ddply(ESM,c("ID","day"),transform,within.day=seq_along(day))
baseline$within.day <- 0 # assigning zero to the baseline questionnaires
# showing data structure
ESM[,c("ID","day","within.day","RunTimestamp")]
However, the method used above to encode the within.day variable counts the surveys as they were received (i.e., first, second, third), and not as they were scheduled (i.e., based on the RunTimestamp variable), neglecting missing responses. Here, we use the within.day.adjust() function shown at the beginning of section 2.2 to recode the within.day variable, accounting for the following scheduled temporal windows and the expiration time (i.e., after 20 minutes), with 10 extra minutes added before and after the following time limits to account for the variability among devices:
0 = 09:45 ~ 11:00 (baseline survey)
1 = 11:20 - 11:40 + 20 min (up to 12:00)
2 = 12:30 - 12:50 + 20 min (up to 13:10)
3 = 13:40 - 14:00 + 20 min (up to 14:20)
4 = 14:50 - 15:10 + 20 min (up to 15:30)
5 = 16:00 - 16:20 + 20 min (up to 16:40)
Here, we apply the function and correct one case of missing within.day (i.e., RunTimestamp = 17:40, recoded as within.day = 5).
# adjusting within.day based on scheduled time windows
ESM <- within.day.adjust(ESM)
## Creating/Adjusting within.day variable...
## Printing summary of cases:
## 1 2 3 4 5 NA's
## 275 280 288 267 285 1
# correcting 1 missing case (17:40 -> within.day = 5)
ESM[is.na(ESM$within.day),"within.day"] <- 5
# showing data structure
ESM[,c("ID","day","within.day","RunTimestamp")]
Finally, we use the varsRecoding() and bslRecoding() functions shown at the beginning of section 2.2 to recode the remaining variables (i.e., renaming and reversing item scores, recoding categorical variables, sorting and renaming columns) in the ESM and baseline datasets, respectively.
# recoding ESM dataset
ESM <- varsRecoding(ESM)
# recoding baseline dataset
baseline <- bslRecoding(baseline)
Here, we conduct a series of sanity checks by inspecting the summary and plots of each variable. In the ESM dataset, we can see that 11 cases from the same participant HRV099 have both smoke and coffe = "NULL", whereas the values of the other confounders look fine. Since this is likely due to items left unanswered when the condition was not met, we recode those cases as smoke and coffe = 0. Moreover, we can note that in some cases participants selected more than one option for the people variable. In these cases, we recode the score in a conservative way, that is, by assigning the higher selected score.
# inspecting summary of ESM variables
summary(ESM)
## ID os gender day within.day
## HRV083 : 19 Android:1110 F:702 Min. :1.00 1:275
## HRV032 : 17 iOS : 286 M:694 1st Qu.:1.00 2:280
## HRV036 : 16 Median :2.00 3:288
## HRV073 : 16 Mean :1.99 4:267
## HRV095 : 16 3rd Qu.:3.00 5:286
## HRV003 : 15 Max. :3.00
## (Other):1297
## RunTimestamp v1 v2
## Min. :2018-11-06 11:18:46.54 Min. :1.000 Min. :1.000
## 1st Qu.:2018-12-18 13:54:35.94 1st Qu.:4.000 1st Qu.:4.000
## Median :2019-03-13 16:15:10.50 Median :5.000 Median :5.000
## Mean :2019-03-22 16:02:26.94 Mean :4.981 Mean :4.569
## 3rd Qu.:2019-05-21 15:05:04.84 3rd Qu.:6.000 3rd Qu.:5.000
## Max. :2019-11-21 16:11:17.83 Max. :7.000 Max. :7.000
## NA's :13
## v3 t1 t2 t3
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:4.000
## Median :5.000 Median :5.000 Median :5.000 Median :5.000
## Mean :4.734 Mean :4.787 Mean :4.827 Mean :4.853
## 3rd Qu.:6.000 3rd Qu.:6.000 3rd Qu.:6.000 3rd Qu.:6.000
## Max. :7.000 Max. :7.000 Max. :7.000 Max. :7.000
## NA's :14 NA's :8 NA's :13 NA's :23
## f1 f2 f3 PE
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:1.000
## Median :4.000 Median :4.000 Median :4.000 Median :1.000
## Mean :4.255 Mean :4.239 Mean :4.167 Mean :1.856
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:2.000
## Max. :7.000 Max. :7.000 Max. :7.000 Max. :7.000
## NA's :10 NA's :14 NA's :23 NA's :24
## PE.int NE NE.int smoke coffe
## Min. :1.000 Min. :1.000 Min. :1.000 0 :1167 0 :1101
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1 : 157 1 : 221
## Median :1.000 Median :1.000 Median :1.000 2 : 10 2 : 8
## Mean :1.798 Mean :1.393 Mean :1.447 5 : 1 NULL: 11
## 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.000 NULL: 11 NA's: 55
## Max. :7.000 Max. :7.000 Max. :7.000 NA's: 50
## NA's :27 NA's :36 NA's :42
## alcohol activity people
## 0 :1329 0 :921 1 :698
## 1 : 9 1 :388 0 :345
## NA's: 58 2 : 29 2 :249
## NA's: 58 3 : 35
## 1,0 : 2
## (Other): 7
## NA's : 60
# cases with smoke or coffe = "NULL"
ESM[!is.na(ESM$coffe) & ESM$coffe=="NULL",c(1,20:ncol(ESM))]
ESM[!is.na(ESM$coffe) & ESM$coffe=="NULL",c("smoke","coffe")] <- "0"
ESM[,c("smoke","coffe")] <- lapply(lapply(ESM[,c("smoke","coffe")],as.character),as.factor)
# inspecting summary of people
summary(ESM$people)
## 0 0,1 0,1,2 0,2 1 1,0 2 2,1 2,3 3 3,0 3,1 NA's
## 345 1 1 1 698 2 249 1 1 35 1 1 60
ESM$people <- as.factor(gsub("0,1","0", # replacing multi-option responses with the higher selected score
gsub("0,1,2","2",
gsub("0,2","2",
gsub("1,0","1",
gsub("2,1","2",
gsub("2,3","3",
gsub("3,0","3",
gsub("3,1","3",ESM$people)))))))))
# re-inspecting recoded variables (no more "NULL" and multi-choice responses)
summary(ESM[,c("smoke","coffe","people")])
## smoke coffe people
## 0 :1178 0 :1112 0 :346
## 1 : 157 1 : 221 1 :700
## 2 : 10 2 : 8 2 :252
## 5 : 1 NA's: 55 3 : 38
## NA's: 50 NA's: 60
# inspecting summary of baseline variables (all look fine)
summary(baseline)
## ID day within.day RunTimestamp
## HRV001 : 3 Min. :1.000 Min. :0 Min. :2018-11-06 10:12:16.0
## HRV002 : 3 1st Qu.:1.000 1st Qu.:0 1st Qu.:2018-12-18 10:19:24.5
## HRV003 : 3 Median :2.000 Median :0 Median :2019-03-13 10:27:06.0
## HRV004 : 3 Mean :1.997 Mean :0 Mean :2019-03-22 08:11:48.5
## HRV005 : 3 3rd Qu.:3.000 3rd Qu.:0 3rd Qu.:2019-05-21 10:33:18.5
## HRV006 : 3 Max. :3.000 Max. :0 Max. :2019-11-21 10:45:48.0
## (Other):293
## v1 v2 v3 t1 t2
## Min. :2.00 Min. :2.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:4.00 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:4.000
## Median :5.00 Median :4.000 Median :5.000 Median :5.000 Median :5.000
## Mean :4.91 Mean :4.466 Mean :4.746 Mean :4.749 Mean :4.907
## 3rd Qu.:6.00 3rd Qu.:5.000 3rd Qu.:6.000 3rd Qu.:6.000 3rd Qu.:6.000
## Max. :7.00 Max. :7.000 Max. :7.000 Max. :7.000 Max. :7.000
##
## t3 f1 f2 f3 PE
## Min. :1.000 Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:4.000 1st Qu.:3.00 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:1.000
## Median :5.000 Median :5.00 Median :5.000 Median :4.000 Median :2.000
## Mean :5.013 Mean :4.28 Mean :4.347 Mean :4.212 Mean :2.199
## 3rd Qu.:6.000 3rd Qu.:5.00 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:3.000
## Max. :7.000 Max. :7.00 Max. :7.000 Max. :7.000 Max. :6.000
##
## PE.int NE NE.int daybefore.ex drugs
## Min. :1.000 Min. :1.000 Min. :1.000 0:242 0:297
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1: 69 1: 14
## Median :1.000 Median :1.000 Median :1.000
## Mean :1.929 Mean :1.707 Mean :1.712
## 3rd Qu.:3.000 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :6.000 Max. :6.000 Max. :6.000
## NA's :2
## drugs.which
## Length:311
## Class :character
## Mode :character
##
##
##
##
From the summary above, we can see that some categorical confounders have only a very small number of ratings for certain levels (i.e., smoke has only 11 cases rated as “2” or higher, coffe has only 8 cases rated as “2”, activity has only 29 cases rated as “2”, and people has only 38 cases rated as “3”). Here, we simplify such variables by dichotomizing each of them (i.e., 0/1).
# plotting frequency of cases in each level of each confounder
par(mfrow=c(1,4))
for(conf in c("smoke","coffe","activity","people")){ plot(ESM[,conf],main=conf) }
# smoke: 2 and 5 --> 1 (i.e., "1+ cigarettes")
ESM$smoke <- as.factor(gsub("2","1",gsub("5","1",ESM$smoke)))
# coffe: 2 --> 1 (i.e., "1+ coffees")
ESM$coffe <- as.factor(gsub("2","1",ESM$coffe))
# activity: 2 --> 1 (i.e., "moderate/intense physical activity")
ESM$activity <- as.factor(gsub("2","1",ESM$activity))
# people: 1 --> 0 (i.e., "alone or with some people not disturbing"); 2 and 3 --> 1 (i.e., "someone disturbing or interrupting the recording")
ESM$people <- as.factor(gsub("3","1",gsub("2","1",gsub("1","0",ESM$people))))
# re-plotting
par(mfrow=c(1,4))
for(conf in c("smoke","coffe","activity","people")){ plot(ESM[,conf],main=conf) }
# re-summarizing
summary(ESM[,c("smoke","coffe","activity","people")])
## smoke coffe activity people
## 0 :1178 0 :1112 0 :921 0 :1046
## 1 : 168 1 : 229 1 :417 1 : 290
## NA's: 50 NA's: 55 NA's: 58 NA's: 60
Finally, we can join the baseline dataset with the ESM dataset.
# merging and fixing the last issues
ESM$within.day <- as.integer(as.character(ESM$within.day)) # within.day as integer for compatibility
new <- rbind.fill(ESM,baseline) # merging datasets
cat("sanity check:",nrow(new)==(nrow(ESM)+nrow(baseline))) # sanity check
## sanity check: TRUE
ESM <- new[order(new$ID,new$day,new$within.day),] # sorting by ID, day, and within.day
rownames(ESM) <- 1:nrow(ESM)
# showing merged dataset (everything looks fine)
ESM
Third, we clean and filter the data based on the inclusion criteria and data quality. The following packages and functions are used to optimize data filtering:
library(plyr); library(knitr); library(reshape2); library(ggplot2); library(gridExtra); library(careless)
respRate
respRate <- function(data){
# scheduled No. of surveys (1 baseline + 5 ordinary X 3 days)
tot <- (1 + 5) * 3
bsl <- 1 * 3
ord <- 5 * 3
# computing No. of responses and response rate
data$ID <- as.factor(as.character(data$ID))
rr <- data.frame(ID=levels(data$ID),tot=as.numeric(table(data$ID))) # No. of responses per participant
rr$tot.RR <- round(100*rr$tot/tot) # total response rate
rr$bsl <- as.numeric(table(data[data$within.day==0,"ID"])) # No. of responses to the baseline surveys
rr$bsl.RR <- round(100*rr$bsl/bsl) # baseline response rate
rr$ord <- as.numeric(table(data[data$within.day!=0,"ID"])) # No. of responses to the ordinary surveys
rr$ord.RR <- round(100*rr$ord/ord) # ordinary response rate
# print response rate info
cat(nrow(rr),"included participants",
"\n -",sum(rr$tot),"responses, from",min(rr$tot),"to",max(rr$tot),
", mean =",round(mean(rr$tot),2),"sd =",round(sd(rr$tot),2),
", mean perc =",round(mean(rr$tot.RR),2),"% sd =",round(sd(rr$tot.RR),2),"%",
"\n -",sum(rr$bsl),"baseline, from",min(rr$bsl),"to",max(rr$bsl),
", mean =",round(mean(rr$bsl),2),"sd =",round(sd(rr$bsl),2),
", mean perc =",round(mean(rr$bsl.RR),2),"% sd =",round(sd(rr$bsl.RR),2),"%",
"\n -",sum(rr$ord),"ordinary, from",min(rr$ord),"to",max(rr$ord),
", mean =",round(mean(rr$ord),2),"sd =",round(sd(rr$ord),2),
", mean perc =",round(mean(rr$ord.RR),2),"% sd =",round(sd(rr$ord.RR),2),"%")
return(rr)} # returning response rate dataset
Computes a data frame of response rate values and prints summary information on response rate.
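A usage sketch (illustrative, not verbatim from the report; output omitted):
# sketch: computing per-participant response counts and rates on the merged ESM dataset
rr <- respRate(ESM)
head(rr)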
plotResp
plotResp <- function(data,id,items){
  long <- melt(data[,c(id,items)],id) # one row per item
  long <- long[order(long[,id],long$variable),]
  ggplot(long,aes(x=value,y=variable)) + geom_vline(xintercept=4,lwd=0.5,lty=2) + geom_point() + xlim(1,7) +
    facet_wrap(id) + theme(text=element_text(size=8)) }
Plots the pattern of responses to mood items for a subsample of responses.
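A hypothetical usage sketch (not from the original report), assuming a column id identifying each plotted response:
# sketch: plotting the mood-item response patterns of the first 9 ESM responses
sub <- ESM[1:9,]
sub$id <- factor(1:nrow(sub)) # row identifier expected by plotResp()
plotResp(sub, id = "id", items = c("v1","v2","v3","t1","t2","t3","f1","f2","f3"))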
We start by inspecting the original number of respondents and observations in the ESM dataset. We can see that the questionnaires were completed by a total of 104 respondents, one less than the 105 participants who were actually invited to take part in the daily protocol. This is because participant HRV009 dropped out without activating the app.
cat("Original No. of responses to the ESM/baseline surveys =",nrow(ESM),"from",nlevels(ESM$ID),"participants")
## Original No. of responses to the ESM/baseline surveys = 1707 from 104 participants
First, we inspect and remove cases of double responses with the same ID and RunTimestamp values (N = 10). In these cases, only the first response is included, with the exception of HRV007 on day 1 and HRV073 on day 2, because of missing values in the first but not in the second response.
# detecting double responses
ESM$idday <- paste(ESM$ID,ESM$RunTimestamp,sep="_")
cat(nrow(ESM[duplicated(ESM$idday),]),"cases of double responses (same ID and RunTimestamp)")
## 10 cases of double responses (same ID and RunTimestamp)
kable(ESM[ESM$idday%in%ESM[duplicated(ESM$idday),"idday"],
c("ID","day","within.day","RunTimestamp","v1","v2","v3","t1","t2","t3","NE.int","smoke","coffe","alcohol","activity")])
| | ID | day | within.day | RunTimestamp | v1 | v2 | v3 | t1 | t2 | t3 | NE.int | smoke | coffe | alcohol | activity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 108 | HRV007 | 1 | 5 | 2018-11-13 16:21:31.569 | 4 | 4 | 4 | 5 | 5 | 5 | 3 | 0 | 0 | 0 | 0 |
| 109 | HRV007 | 1 | 5 | 2018-11-13 16:21:31.569 | 3 | 3 | 3 | 4 | 3 | 3 | NA | NA | NA | NA | NA |
| 121 | HRV007 | 3 | 5 | 2018-11-15 16:21:49.293 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 0 | 0 | 0 | 0 |
| 122 | HRV007 | 3 | 5 | 2018-11-15 16:21:49.293 | 3 | 2 | 3 | 3 | 3 | 2 | 4 | 0 | 0 | 0 | 0 |
| 492 | HRV032 | 1 | 2 | 2018-12-18 12:54:50.792 | 5 | 5 | 6 | 5 | 5 | 4 | 1 | 1 | 0 | 0 | 1 |
| 493 | HRV032 | 1 | 2 | 2018-12-18 12:54:50.792 | 1 | 1 | 7 | 7 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| 566 | HRV036 | 2 | 1 | 2019-01-09 11:33:48.782 | 7 | 5 | 7 | 7 | 7 | 7 | 1 | 0 | 0 | 0 | 0 |
| 567 | HRV036 | 2 | 1 | 2019-01-09 11:33:48.782 | 7 | 5 | 7 | 7 | 7 | 7 | 1 | 0 | 0 | 0 | 0 |
| 1169 | HRV073 | 2 | 3 | 2019-05-08 13:45:50.606 | 5 | 5 | 5 | 4 | 4 | 5 | 1 | 0 | 0 | NA | NA |
| 1170 | HRV073 | 2 | 3 | 2019-05-08 13:45:50.606 | 5 | 4 | 5 | 3 | 4 | 4 | 2 | 0 | 1 | 0 | 0 |
| 1252 | HRV078 | 2 | 2 | 2019-05-15 12:40:54.675 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 0 | 1 | 0 | 1 |
| 1253 | HRV078 | 2 | 2 | 2019-05-15 12:40:54.675 | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 0 | 0 | 0 | 1 |
| 1329 | HRV083 | 1 | 2 | 2019-05-21 12:49:08.996 | 6 | 5 | 4 | 6 | 6 | 6 | 1 | 1 | 0 | 0 | 0 |
| 1330 | HRV083 | 1 | 2 | 2019-05-21 12:49:08.996 | 6 | 4 | 4 | 6 | 6 | 6 | 1 | 1 | 0 | 0 | 0 |
| 1337 | HRV083 | 2 | 3 | 2019-05-22 14:10:05.861 | 4 | 4 | 4 | 5 | 5 | 5 | 2 | 1 | 1 | 0 | 0 |
| 1338 | HRV083 | 2 | 3 | 2019-05-22 14:10:05.861 | 4 | 4 | 4 | 3 | 5 | 5 | 2 | 1 | 1 | 0 | 0 |
| 1342 | HRV083 | 3 | 1 | 2019-05-23 11:41:29.187 | 4 | 4 | 4 | 5 | 5 | 6 | 1 | 0 | 0 | 0 | 0 |
| 1343 | HRV083 | 3 | 1 | 2019-05-23 11:41:29.187 | 4 | 4 | 4 | 6 | 6 | 6 | 1 | 0 | 0 | 0 | 0 |
| 1347 | HRV083 | 3 | 5 | 2019-05-23 16:04:47.322 | 4 | 4 | 4 | 5 | 5 | 6 | 1 | 1 | 0 | 0 | 0 |
| 1348 | HRV083 | 3 | 5 | 2019-05-23 16:04:47.322 | 4 | 4 | 4 | 5 | 5 | 6 | 1 | 1 | 0 | 0 | 0 |
# excluding second response for HRV007 (day 1) and HRV073 (day 2)
memory <- ESM
ESM <- ESM[!(ESM$idday=="HRV007_2018-11-13 16:18:32.769" & is.na(ESM$activity)),]
ESM <- ESM[!(ESM$idday=="HRV073_2019-05-08 13:42:51.806" & is.na(ESM$activity)),]
# excluding all remaining double responses
ESM <- ESM[!duplicated(ESM$idday),]
# removing double responses
cat("Removed",nrow(memory)-nrow(ESM),"double responses")
## Removed 10 double responses
Then, we use the same procedure to inspect and remove cases of double responses with the same ID, day, and within.day values. We can see 10 further cases of double responses. Since no missing data are included in any of these cases, we simply retain the first response of each pair and discard the later one.
# detecting double responses
ESM$idday <- paste(ESM$ID,ESM$day,ESM$within.day,sep="_")
cat(nrow(ESM[duplicated(ESM$idday),]),"cases of double responses (same ID, day and within.day)")
## 10 cases of double responses (same ID, day and within.day)
kable(ESM[ESM$idday%in%ESM[duplicated(ESM$idday),"idday"],
c("ID","day","within.day","RunTimestamp","v1","v2","v3","t1","t2","t3","NE.int","smoke","coffe","alcohol","activity")])
 | ID | day | within.day | RunTimestamp | v1 | v2 | v3 | t1 | t2 | t3 | NE.int | smoke | coffe | alcohol | activity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
491 | HRV032 | 1 | 2 | 2018-12-18 12:47:00.000 | 5 | 4 | 5 | 5 | 5 | 6 | 1 | 1 | 0 | 0 | 0 |
492 | HRV032 | 1 | 2 | 2018-12-18 12:54:50.792 | 5 | 5 | 6 | 5 | 5 | 4 | 1 | 1 | 0 | 0 | 1 |
502 | HRV032 | 2 | 5 | 2018-12-19 16:08:00.000 | 3 | 4 | 3 | 3 | 2 | 2 | 4 | 0 | 0 | 0 | 0 |
503 | HRV032 | 2 | 5 | 2018-12-19 16:09:17.886 | 5 | 5 | 5 | 6 | 3 | 5 | 1 | 1 | 0 | 0 | 0 |
739 | HRV046 | 3 | 5 | 2019-01-24 16:01:42.284 | 2 | 2 | 3 | 2 | 2 | 2 | 4 | 1 | 0 | 0 | 1 |
740 | HRV046 | 3 | 5 | 2019-01-24 17:40:00.000 | 4 | 3 | 4 | 2 | 2 | 4 | 1 | 1 | 1 | 0 | 1 |
773 | HRV048 | 3 | 3 | 2019-01-31 13:40:00.000 | 4 | 2 | 3 | 3 | 4 | 5 | 1 | 0 | 0 | 0 | 0 |
774 | HRV048 | 3 | 3 | 2019-01-31 13:51:13.250 | 7 | 7 | 7 | 3 | 2 | 2 | 1 | 0 | 0 | 0 | 1 |
1085 | HRV068 | 2 | 2 | 2019-04-10 12:33:54.719 | 7 | 7 | 7 | 6 | 7 | 7 | 2 | 0 | 0 | 0 | 1 |
1086 | HRV068 | 2 | 2 | 2019-04-10 12:43:54.737 | 7 | 7 | 7 | 6 | 7 | 7 | 1 | 0 | 0 | 0 | 0 |
1169 | HRV073 | 2 | 3 | 2019-05-08 13:45:50.606 | 5 | 5 | 5 | 4 | 4 | 5 | 1 | 0 | 0 | NA | NA |
1171 | HRV073 | 2 | 3 | 2019-05-08 14:03:54.629 | 4 | 3 | 4 | 5 | 5 | 4 | 3 | 0 | 0 | 0 | 0 |
1525 | HRV094 | 3 | 3 | 2019-10-17 13:49:35.021 | 5 | 2 | 4 | 4 | 4 | 3 | 2 | 0 | 0 | 0 | 1 |
1526 | HRV094 | 3 | 3 | 2019-10-17 13:52:44.679 | 5 | 3 | 4 | 3 | 4 | 3 | 1 | 0 | 0 | 0 | 0 |
1527 | HRV094 | 3 | 4 | 2019-10-17 14:51:03.452 | 4 | 4 | 4 | 3 | 3 | 3 | 1 | 0 | 0 | 0 | 0 |
1528 | HRV094 | 3 | 4 | 2019-10-17 15:11:23.876 | 4 | 2 | 3 | 3 | 3 | 3 | 2 | 0 | 0 | 0 | 0 |
1531 | HRV095 | 1 | 2 | 2019-10-15 12:37:19.946 | 5 | 4 | 5 | 5 | 3 | 5 | 1 | 0 | 0 | 0 | 1 |
1532 | HRV095 | 1 | 2 | 2019-10-15 12:37:21.780 | 5 | 4 | 5 | 5 | 3 | 5 | 1 | 0 | 0 | 0 | 1 |
1533 | HRV095 | 1 | 2 | 2019-10-15 12:47:56.838 | 5 | 4 | 4 | 5 | 5 | 5 | 1 | 0 | 0 | 0 | 1 |
# excluding double responses
memory2 <- ESM
ESM <- ESM[!duplicated(ESM$idday),]
# removing double responses
cat("Removed further",nrow(memory2)-nrow(ESM),"double responses")
## Removed further 10 double responses
Thus, we removed a total of 20 double responses.
# removing double responses
cat("Removed a total of",nrow(memory)-nrow(ESM),"double responses; new number of responses =",nrow(ESM))
## Removed a total of 20 double responses; new number of responses = 1687
Second, we inspect the drugs and drugs.which variables (i.e., any drugs/medications, other than those usually taken, that were taken that morning), considering the inclusion criteria of the study (i.e., not taking medications affecting the nervous or the cardiovascular system).
Only 14 cases are present in the dataset, of which 7 (4 participants) concerned anti-inflammatory medications ("Antiinfiammatorio", "moment", "buscofen"), 1 case concerned antibiotics ("fosfomicina"), 1 case concerned analgesics ("paracetamolo"), and 3 cases (1 participant) concerned cough syrup ("sciroppo per la tosse"). None of these cases conflicts with our inclusion criteria, but they will be taken into account later, as they can influence some self-reported and physiological variables.
ESM[substr(ESM$ID,1,3)=="HRV" & ESM$within.day==0 & ESM$drugs==1,c("ID","drugs.which")]
Third, we mark the 8 participants that were excluded for the reasons specified in section 1.3.4. As in that section, such participants are marked as 'OUT'. As noted above, participant HRV009 was not even included in the ESM dataset because she dropped out before activating the app. Thus, of the initial 104 participants, 7 are excluded (6.73%), whereas 97 are included (93.27%). Again, we can note that the variables measured from these participants do not substantially deviate from the distributions of the included participants, and we remove them from the ESM dataset.
# marking 8 excluded participants as "OUT"
memory <- ESM
ESM$ID <- as.character(ESM$ID)
excl <- c("HRV009","HRV011","HRV015","HRV027","HRV029","HRV046","HRV059","HRV066")
ESM[ESM$ID%in%excl,"ID"] <- gsub("HRV","OUT",ESM[ESM$ID%in%excl,"ID"]) # marking excluded participants as "OUTXXX"
# plotting
ESM$excl <- "IN" # creating excl variable (factor)
ESM[substr(ESM$ID,1,3)=="OUT","excl"] <- "OUT"
ESM$excl <- as.factor(ESM$excl)
grid.arrange(ggplot(ESM[!is.na(ESM$v1),],aes(x=excl,y=v1,fill=excl)) + geom_violin(),
ggplot(ESM[!is.na(ESM$t1),],aes(x=excl,y=t1,fill=excl)) + geom_violin(),
ggplot(ESM[!is.na(ESM$f1),],aes(x=excl,y=f1,fill=excl)) + geom_violin(),
ggplot(ESM[!is.na(ESM$PE),],aes(x=excl,y=PE,fill=excl)) + geom_violin(),
ggplot(ESM[!is.na(ESM$NE),],aes(x=excl,y=NE,fill=excl)) + geom_violin(),
ggplot(ESM[!is.na(ESM$smoke),],aes(x=excl,fill=smoke)) + geom_bar(),
ggplot(ESM[!is.na(ESM$activity),],aes(x=excl,fill=activity)) + geom_bar(),
ggplot(ESM[!is.na(ESM$people),],aes(x=excl,fill=people)) + geom_bar(),nrow=2)
# updating number of included participants
cat(nrow(ESM[substr(ESM$ID,1,3)=="HRV",]),"observations from",
nlevels(as.factor(as.character(ESM[substr(ESM$ID,1,3)=="HRV","ID"]))),"included participants out of",
nrow(memory),"total observations from",nrow(memory[!duplicated(memory$ID),]),"invited participants")
## 1607 observations from 97 included participants out of 1687 total observations from 104 invited participants
# removing excluded participants
ESM <- ESM[substr(ESM$ID,1,3)=="HRV",]
ESM$ID <- as.factor(ESM$ID) # ID as factor
ESM <- ESM[order(ESM$ID,ESM$RunTimestamp),] # sorting by ID & RunTimestamp
ESM$excl <- NULL # removing excl variable
Finally, although we did not apply inclusion criteria beyond the variables specified in section 1.3.2, here we consider the response rate and the coherence of the responses as further criteria to flag cases that might bias the results.
First, we inspect the response rate by using the respRate function shown at the beginning of section 2.3. We can note that all 'baseline' surveys (i.e., those administered in the lab) were completed each day by each participant, whereas some missing data were present in 'ordinary' surveys (i.e., those administered with the mobile app). Overall, the mean response rate to 'ordinary' surveys was relatively high (90%), with no participant showing a response rate < 60%. Thus, we see no reason for excluding or flagging further participants based on response rate.
rrate <- respRate(ESM)
## 97 included participants
## - 1607 responses, from 12 to 18 , mean = 16.57 sd = 1.41 , mean perc = 91.91 % sd = 7.79 %
## - 291 baseline, from 3 to 3 , mean = 3 sd = 0 , mean perc = 100 % sd = 0 %
## - 1316 ordinary, from 9 to 15 , mean = 13.57 sd = 1.41 , mean perc = 90.41 % sd = 9.32 %
In addition to counting the number of received surveys, we must also consider missing responses within each survey. Indeed, a number of surveys were incomplete due to technical problems with the app. We can notice that missing responses mainly concern items on confounder variables (i.e., activity, alcohol, coffee, smoke, and people), which were administered in the last part of each survey and showed from 31 to 39 missing values (about 3%). In contrast, missing data concern only 1-2% of the items measuring stressful events (from 8 to 24 missing values), and 1% or less of the items measuring mood (from 0 to 7 missing values).
miss <- sapply(ESM[ESM$within.day!=0,1:which(colnames(ESM)=="people")], function(x) sum(is.na(x))) # counting missing
miss <- data.frame(item=names(miss),nMiss=as.numeric(miss), # missing rate
percMiss=round(100*as.numeric(miss)/nrow(ESM[ESM$within.day!=0,])))
kable(miss) # showing No. and percentage of missing cases
item | nMiss | percMiss |
---|---|---|
ID | 0 | 0 |
os | 0 | 0 |
gender | 0 | 0 |
day | 0 | 0 |
within.day | 0 | 0 |
RunTimestamp | 0 | 0 |
v1 | 0 | 0 |
v2 | 6 | 0 |
v3 | 7 | 1 |
t1 | 2 | 0 |
t2 | 6 | 0 |
t3 | 7 | 1 |
f1 | 4 | 0 |
f2 | 7 | 1 |
f3 | 7 | 1 |
PE | 8 | 1 |
PE.int | 10 | 1 |
NE | 18 | 1 |
NE.int | 24 | 2 |
smoke | 31 | 2 |
coffe | 35 | 3 |
alcohol | 38 | 3 |
activity | 38 | 3 |
people | 39 | 3 |
Here, we remove a further 7 surveys (0.4%) due to missing data in almost all items (i.e., those data entries with missing values already in the first items).
# showing 7 cases with missing response to item v3
ESM[is.na(ESM$v3),c(1,which(colnames(ESM)=="v1"):ncol(ESM)),]
# excluding 7 cases with missing response to item v3
ESM <- ESM[!is.na(ESM$v3),]
Second, we flag participants and/or single responses that might be considered careless respondents/responses.
Following Curran (2016), we consider multiple indicators of potentially careless responses including:
Response times (both item-specific and total)
Response consistency (long string proportions)
Multivariate outlier patterns (Mahalanobis distance)
Response inconsistency (psychometric synonyms)
For each indicator, the numbers in brackets indicate the corresponding number of flagged responses/participants.
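As a roadmap, the three response-pattern indicators are computed with the careless package, as in the following minimal sketch (here, ratings is a hypothetical numeric data frame with one survey per row and one mood item per column; response times are instead derived from the item timestamps encoded in section 2.1):
library(careless) # loading required package
# longest and average string of identical consecutive responses (response consistency)
ls <- longstring(ratings, avg = TRUE)
# squared Mahalanobis distance with flags at the 99% confidence level (multivariate outliers)
md <- mahad(ratings, flag = TRUE, confidence = 0.99)
# within-person correlation among item pairs correlating at r >= .60 (psychometric synonyms)
ps <- psychsyn(ratings, critval = 0.60, diag = TRUE, resample_na = TRUE)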
First, we consider item-by-item and overall survey response times in the 'ordinary' ESM questionnaires. This information was encoded in the ESM.RT dataset in section 2.1. Thus, as a first step, we process those data by cleaning and filtering the same cases cleaned and filtered in the ESM dataset.
# recoding IDs
# ...................................................
ESM.RT <- participantID_recoding(ESM.RT)
# recoding time
# ...................................................
# converting temporal variables as POSIXct in the ESM dataset
ESM.RT[,c("RunTimestamp","SubmissionTimestamp")] <- # no need to specify format since they already are POSIXct
lapply(ESM.RT[,c("RunTimestamp","SubmissionTimestamp")],function(x) as.POSIXct(x,tz="GMT"))
# correcting wrongly encoded RunTimestamps (adding 1year) and SubmissionTimestamps (RunTimestamp + 61.20 sec)
ESM.RT[substr(ESM.RT$RunTimestamp,1,7)=="2018-01","SubmissionTimestamp"] <-
ESM.RT[substr(ESM.RT$RunTimestamp,1,7)=="2018-01","RunTimestamp"] + 61.20
ESM.RT[substr(ESM.RT$RunTimestamp,1,7)=="2018-01",c("RunTimestamp","SubmissionTimestamp")] <-
ESM.RT[substr(ESM.RT$RunTimestamp,1,7)=="2018-01",c("RunTimestamp","SubmissionTimestamp")] + 24*60*60*365
# sanity check: min and max matching with the data collection period
c(min(ESM.RT$RunTimestamp),max(ESM.RT$RunTimestamp))
## [1] "2018-11-06 10:18:46.547 GMT" "2019-11-21 15:11:17.835 GMT"
# adding 1 hour to all ESM timestamps
ESM.RT[,c("RunTimestamp","SubmissionTimestamp")] <- ESM.RT[,c("RunTimestamp","SubmissionTimestamp")] + 1*60*60
# adding 1 further hour to ESM timestamps during daylight saving time
ESM.RT$daylight <- "standard"
ESM.RT[ESM.RT$RunTimestamp > as.POSIXct("2019-04-01") & ESM.RT$RunTimestamp < as.POSIXct("2019-10-27"),"daylight"] <- "daylight"
ESM.RT[ESM.RT$daylight=="daylight",c("RunTimestamp","SubmissionTimestamp")] <-
ESM.RT[ESM.RT$daylight=="daylight",c("RunTimestamp","SubmissionTimestamp")] + 1*60*60
# creating day
ESM.RT$day <- as.POSIXlt(ESM.RT$RunTimestamp)$wday - 1
# creating within.day
ESM.RT <- ddply(ESM.RT,c("ID","day"),transform,within.day=seq_along(day))
ESM.RT <- within.day.adjust(ESM.RT)
## Creating/Adjusting within.day variable...
## Printing summary of cases:
## 1 2 3 4 5
## 275 279 287 267 284
# recoding other variables
# ....................................................
# ID as factor
ESM.RT$ID <- as.factor(ESM.RT$ID)
# Sorting and renaming columns
ESM.RT <- ESM.RT[,c("ID","day","within.day","RunTimestamp","SubmissionTimestamp","RT.tot",
"v1.male.bene","v2.soddisfatto.insoddisfatto","v3.positivo.negativo", # hedonic tone
"t1.rilassato.teso","t2.agitato.calmo","t3.nervoso.tranquillo", # calmness
"e1.stanco.sveglio","e2.pieno.privodenergia","e3.affaticato.fresco", # energetic arousal
"PE","intensity.PE","NE","intensity.PE", # emotional events
"smoke","coffe","alcohol","Activity","People.rec")] # confounders
colnames(ESM.RT)[7:ncol(ESM.RT)] <- c("v1","v2","v3","t1","t2","t3","f1","f2","f3", # renaming columns
"PE","PE.int","NE","NE.int","smoke","coffe","alcohol","activity","people")
# filtering double responses
# ..................................................
# by RunTimestamp
ESM.RT$idday <- paste(ESM.RT$ID,ESM.RT$RunTimestamp,sep="_")
cat(nrow(ESM.RT[duplicated(ESM.RT$idday),]),"cases of double responses (same ID and RunTimestamp)") # 10 (= ESM)
## 10 cases of double responses (same ID and RunTimestamp)
ESM.RT <- ESM.RT[!(ESM.RT$idday=="HRV007_2018-11-13 16:18:32.769" & is.na(ESM.RT$activity)),] # excl 2nd resp for HRV007 & HRV073
ESM.RT <- ESM.RT[!(ESM.RT$idday=="HRV073_2019-05-08 13:42:51.806" & is.na(ESM.RT$activity)),]
ESM.RT <- ESM.RT[!duplicated(ESM.RT$idday),] # excluding all remaining double responses
# by day and within.day
ESM.RT$idday <- paste(ESM.RT$ID,ESM.RT$day,ESM.RT$within.day,sep="_")
cat(nrow(ESM.RT[duplicated(ESM.RT$idday),]),"cases of double responses (same ID, day and within.day)") # 6 (4 fewer than in ESM)
## 6 cases of double responses (same ID, day and within.day)
ESM.RT <- ESM.RT[!duplicated(ESM.RT$idday),] # excluding all double responses
# marking further excluded
# ..................................................
# excluding further 8 participants as "OUT"
ESM.RT$ID <- as.character(ESM.RT$ID)
excl <- c("HRV009","HRV011","HRV015","HRV027","HRV029","HRV046","HRV059","HRV066")
ESM.RT <- ESM.RT[!(ESM.RT$ID%in%excl),]
ESM.RT$ID <- as.factor(as.character(ESM.RT$ID))
ESM.RT <- ESM.RT[order(ESM.RT$ID,ESM.RT$RunTimestamp),]
# excluding cases with missing response to most items
# .................................................
ESM.RT <- ESM.RT[!is.na(ESM.RT$v3),]
# sanity checks
# .................................................
# same number of participants?
nlevels(ESM.RT$ID) == nlevels(ESM$ID)
## [1] TRUE
# same number of included participants?
nlevels(as.factor(as.character(ESM.RT[substr(ESM.RT$ID,1,3)=="HRV","ID"]))) ==
nlevels(as.factor(as.character(ESM[substr(ESM$ID,1,3)=="HRV","ID"])))
## [1] TRUE
# same number of observations?
nrow(ESM.RT) == nrow(ESM[ESM$within.day!=0,])
## [1] TRUE
Now we can inspect the item-by-item response times (in seconds) in the ESM.RT dataset, computed as the difference between the timestamp of each response and that of the previous one (note: the first response of each questionnaire does not have an associated response time).
We can see that responding to a single item required about 2-3 seconds in most cases (median response time = 2.69 sec), regardless of the type of item. A small number of items were answered in more than 10 seconds (4.4%), with very few cases requiring more than 50 seconds (N = 39, 0.18%). Likewise, a small number of items were answered in less than 1 second (2.3%), with very few cases requiring less than 0.65 seconds (N = 98, 0.44%).
# item-by-item long dataset
rt <- melt(ESM.RT[,c("ID","RunTimestamp",colnames(ESM.RT)[which(colnames(ESM.RT)=="v1"):which(colnames(ESM.RT)=="people")])],
id=c("ID","RunTimestamp"))
rt <- rt[!is.na(rt$value),] # removing missing values
rt$items <- "mood" # categorizing variables based on the item contents
rt[rt$variable%in%c("PE","PE.int","NE","NE.int"),"items"] <- "event"
rt[rt$variable%in%c("smoke","coffe","alcohol","activity","people"),"items"] <- "conf"
# summarizing response times
summary(rt$value)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.942 2.693 4.140 4.131 4378.792
# plotting response times
grid.arrange(ggplot(rt,aes(value))+geom_histogram(bins=300)+ggtitle("Response times (sec)"),
ggplot(rt,aes(value))+geom_histogram(bins=300)+xlim(0,100)+ggtitle("zooming 0 - 100 sec"),
ggplot(rt,aes(value))+geom_histogram(bins=300)+xlim(0,10)+ggtitle("zooming 0 - 10 sec"),
ggplot(rt[rt$items=="mood",],aes(value))+geom_histogram(bins=300,alpha=0.5)+xlim(0,10) + ggtitle("Mood items"),
ggplot(rt[rt$items=="event",],aes(value))+geom_histogram(bins=300,alpha=0.5)+xlim(0,10) + ggtitle("Event items"),
ggplot(rt[rt$items=="conf",],aes(value))+geom_histogram(bins=300,alpha=0.5)+xlim(0,10) + ggtitle("Confounders items"),
nrow=2)
# No. of response times higher than X
for(value in c(10,20,25,50,100,500,1000,2000,4000)){ cat("\n-",nrow(rt[rt$value>value,]),"RTs >",value,
"s (",round(100*nrow(rt[rt$value>value,])/nrow(rt),2),"% )") }
##
## - 967 RTs > 10 s ( 4.38 % )
## - 245 RTs > 20 s ( 1.11 % )
## - 151 RTs > 25 s ( 0.68 % )
## - 39 RTs > 50 s ( 0.18 % )
## - 16 RTs > 100 s ( 0.07 % )
## - 2 RTs > 500 s ( 0.01 % )
## - 2 RTs > 1000 s ( 0.01 % )
## - 1 RTs > 2000 s ( 0 % )
## - 1 RTs > 4000 s ( 0 % )
# No. of response times lower than X
for(value in c(2,1.5,1,0.65,0.5,0.25,0.1)){ cat("\n-",nrow(rt[rt$value<value,]),"RTs <",value,
"s (",round(100*nrow(rt[rt$value<value,])/nrow(rt),2),"% )") }
##
## - 5997 RTs < 2 s ( 27.15 % )
## - 2465 RTs < 1.5 s ( 11.16 % )
## - 509 RTs < 1 s ( 2.3 % )
## - 98 RTs < 0.65 s ( 0.44 % )
## - 36 RTs < 0.5 s ( 0.16 % )
## - 12 RTs < 0.25 s ( 0.05 % )
## - 12 RTs < 0.1 s ( 0.05 % )
Here, we flag all cases with at least one item answered in more than 50 seconds with iRT.slow = 1, and all those with at least one item answered in less than 650 ms with iRT.fast = 1. While the first threshold is essentially based on the observed RT distribution, the second criterion was also recommended by Geeraerts (2020). We also compute the number of items meeting each condition for each survey.
We can see that most flagged cases have only one slow-response-time item (N = 26), with only 6 surveys showing 2 or more items that took more than 50 sec. In contrast, there is a similar proportion of surveys with either 1 (N = 34) or 2 items (N = 32) answered in less than 650 ms. In terms of careless participants, we flag slow and fast respondents as those with 3+ slow responses (N = 2) and those with 3+ fast responses (N = 8), respectively. Particular attention should be paid to participant HRV099, showing fast response times in 11 out of 14 responses.
ESM.RT$idday <- as.factor(paste(ESM.RT$ID,ESM.RT$RunTimestamp,sep="_"))
rt$idday <- as.factor(paste(rt$ID,rt$RunTimestamp,sep="_"))
ESM.RT$iRT.slow <- ESM.RT$NiRT.slow <- ESM.RT$iRT.fast <- ESM.RT$NiRT.fast <- ESM.RT$slowResp <- ESM.RT$fastResp <- 0
for(idday in levels(ESM.RT$idday)){
if(any(rt[rt$idday==idday,"value"]>50)){ ESM.RT[ESM.RT$idday==idday,"iRT.slow"] <- 1
ESM.RT[ESM.RT$idday==idday,"NiRT.slow"] <- nrow(rt[rt$idday==idday & !is.na(rt$value) & rt$value>50,]) }
if(any(rt[rt$idday==idday,"value"]<0.65)){ ESM.RT[ESM.RT$idday==idday,"iRT.fast"] <- 1
ESM.RT[ESM.RT$idday==idday,"NiRT.fast"] <- nrow(rt[rt$idday==idday & !is.na(rt$value) & rt$value<0.65,]) }}
# slow response times (N = 32)
summary(as.factor(ESM.RT$NiRT.slow)) # summary of number of slow response times
## 0 1 2 3
## 1277 26 5 1
(s <- summary(as.factor(as.character(ESM.RT[ESM.RT$iRT.slow==1,"ID"])))) # summary of slow respondents
## HRV001 HRV003 HRV013 HRV014 HRV042 HRV050 HRV051 HRV054 HRV056 HRV060 HRV063
## 1 1 1 1 1 1 3 1 2 1 1
## HRV067 HRV070 HRV073 HRV078 HRV079 HRV080 HRV085 HRV088 HRV089 HRV092 HRV094
## 2 1 1 1 1 1 1 1 1 1 3
## HRV099 HRV104
## 2 2
ESM.RT[ESM.RT$ID %in% names(s[s >= 3]),"slowResp"] <- 1 # flagging slow respondents (3+ responses with 1+ items > 50 sec)
summary(as.factor(ESM.RT[!duplicated(ESM.RT$ID),"slowResp"])) # No. of slow respondents (N = 2)
## 0 1
## 95 2
# fast response times (N = 66)
summary(as.factor(ESM.RT$NiRT.fast)) # summary of number of fast response times
## 0 1 2
## 1243 34 32
(s <- summary(as.factor(as.character(ESM.RT[ESM.RT$iRT.fast==1,"ID"])))) # summary of fast respondents
## HRV006 HRV008 HRV013 HRV014 HRV020 HRV022 HRV024 HRV039 HRV041 HRV044 HRV049
## 1 4 3 2 1 6 2 1 1 1 1
## HRV050 HRV056 HRV061 HRV065 HRV072 HRV073 HRV082 HRV087 HRV088 HRV089 HRV093
## 1 1 1 4 1 2 4 2 1 1 3
## HRV098 HRV099 HRV100 HRV102 HRV103
## 2 11 5 2 2
ESM.RT[ESM.RT$ID %in% names(s[s >= 3]),"fastResp"] <- 1 # flagging fast respondents (i.e., 3+ responses with 1+ items > 50 sec)
summary(as.factor(ESM.RT[!duplicated(ESM.RT$ID),"fastResp"])) # No. of fast respondents (N = 8 + 1 excluded)
## 0 1
## 89 8
Second, the same procedures used with item response times are applied to the total response time of each survey RT.tot, computed in section 2.1 as the seconds between the first response and the SubmissionTimestamp of each survey. Note that here we do not consider those cases with missing responses to the last items (i.e., from PE onward).
We can see that responding to the whole questionnaire required about 1 min in most cases (median = 61.3 sec). A small number of questionnaires were completed in more than 2 minutes (7.4%), with very few cases requiring more than 3 min (N = 27, 2.06%, flagged as tRT.slow = 1). Likewise, a small number of questionnaires were completed in less than 40 seconds (7.8%), with very few cases requiring less than 35 seconds (N = 23, 1.8%, flagged as tRT.fast = 1). We also flag careless participants as those with 3+ responses that took more than 3 min (N = 1, flagged as slowResp.tot = 1) and those with 3+ responses that took less than 35 seconds (N = 2, flagged as fastResp.tot = 1).
# summarizing total response times (in both seconds and minutes)
ESM.RT$RT.tot.min <- ESM.RT$RT.tot/60
summary(ESM.RT$RT.tot); summary(ESM.RT$RT.tot.min)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30.05 48.17 61.31 75.33 81.81 4412.12
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5008 0.8028 1.0219 1.2556 1.3635 73.5354
# plotting response times
grid.arrange(ggplot(ESM.RT,aes(RT.tot.min))+geom_histogram(bins=300)+ggtitle("Total response times (min)"),
ggplot(ESM.RT,aes(RT.tot.min))+geom_histogram(bins=300)+xlim(0,10)+ggtitle("zooming 0 - 10 min"),
ggplot(ESM.RT,aes(RT.tot.min))+geom_histogram(bins=300)+xlim(0,5)+ggtitle("zooming 0 - 5 min"),
nrow=1)
# No. of total response times higher than X minutes
for(value in c(2,3,5,10,20,50)){ cat("\n-",nrow(ESM.RT[ESM.RT$RT.tot.min>value,]),"total RTs >",value,
"min (",round(100*nrow(ESM.RT[ESM.RT$RT.tot.min>value,])/
nrow(ESM.RT),2),"% )") }
##
## - 97 total RTs > 2 min ( 7.41 % )
## - 27 total RTs > 3 min ( 2.06 % )
## - 7 total RTs > 5 min ( 0.53 % )
## - 2 total RTs > 10 min ( 0.15 % )
## - 2 total RTs > 20 min ( 0.15 % )
## - 1 total RTs > 50 min ( 0.08 % )
# No. of total response times lower than X seconds
for(value in c(40,35,30,20)){ cat("\n-",nrow(ESM.RT[!is.na(ESM.RT$PE) & ESM.RT$RT.tot<value,]),"total RTs <",value,
"sec (",round(100*nrow(ESM.RT[!is.na(ESM.RT$PE) &
ESM.RT$RT.tot<value,])/
nrow(ESM.RT[!is.na(ESM.RT$PE),]),2),"% )") }
##
## - 102 total RTs < 40 sec ( 7.8 % )
## - 23 total RTs < 35 sec ( 1.76 % )
## - 0 total RTs < 30 sec ( 0 % )
## - 0 total RTs < 20 sec ( 0 % )
# flagging slow total response times > 3 min (N = 27) and fast total response times < 35 sec (N = 23)
ESM.RT$tRT.slow <- ESM.RT$tRT.fast <- ESM.RT$slowResp.tot <- ESM.RT$fastResp.tot <- 0
ESM.RT[ESM.RT$RT.tot.min > 3,"tRT.slow"] <- 1
ESM.RT[!is.na(ESM.RT$PE) & ESM.RT$RT.tot < 35,"tRT.fast"] <- 1
# flagging slow and fast respondents
(s <- summary(as.factor(as.character(ESM.RT[ESM.RT$tRT.slow==1,"ID"])))) # summary of slow respondents
## HRV001 HRV003 HRV013 HRV050 HRV051 HRV054 HRV056 HRV063 HRV067 HRV070 HRV073
## 1 1 1 1 2 1 1 1 1 1 1
## HRV075 HRV078 HRV079 HRV080 HRV085 HRV088 HRV089 HRV092 HRV094 HRV099 HRV104
## 1 1 1 1 1 1 1 1 4 2 1
ESM.RT[ESM.RT$ID %in% names(s[s >= 3]),"slowResp.tot"] <- 1 # flagging slow respondents (3+ responses > 5 min)
summary(as.factor(ESM.RT[!duplicated(ESM.RT$ID),"slowResp.tot"])) # No. of slow respondents (N = 1)
## 0 1
## 96 1
(s <- summary(as.factor(as.character(ESM.RT[ESM.RT$tRT.fast==1,"ID"])))) # summary of fast respondents
## HRV008 HRV018 HRV019 HRV022 HRV036 HRV043 HRV049 HRV061 HRV063 HRV065 HRV071
## 2 1 5 1 1 1 1 1 1 2 1
## HRV084 HRV087 HRV099
## 2 1 3
ESM.RT[ESM.RT$ID %in% names(s[s >= 3]),"fastResp.tot"] <- 1 # flagging fast respondents (3+ responses < 35 sec)
summary(as.factor(ESM.RT[!duplicated(ESM.RT$ID),"fastResp.tot"])) # No. of slow respondents (N = 2)
## 0 1
## 95 2
Third, we examine those cases with long strings of identical responses to mood items, that is, extremely consistent responses entering the same value in consecutive items using the same response scale (see Curran, 2016). This is done using the careless package. Note that in section 2.2.3 we recoded mood ratings to have the same polarity. Event items are not considered because most respondents entered no events (i.e., 1, 1).
We can see that in most cases (about 68%) the same rating was entered for up to 3 consecutive items, with 16.1% of the surveys showing strings of identical responses repeated more than 4 times (i.e., the rule of thumb proposed by Curran (2016) of identical string length > scale range / 2). In a minority of cases (N = 66, 4.1%), respondents entered the same value 7+ times in a row, and we mark these cases with ls = 1. As done previously, we also flag careless participants (N = 6) as those with 3+ responses with long identical strings (flagged as lsResp = 1).
# computing the longest string frequency and the average length of identical consecutive responses
mood <- colnames(ESM)[which(colnames(ESM)=="v1"):which(colnames(ESM)=="f3")]
ESM.LS <- cbind(ESM[,c("ID","RunTimestamp","within.day")],longstring(ESM[,mood],avg=TRUE))
# plotting longest identical string and average identical string length
grid.arrange(ggplot(ESM.LS,aes(longstr))+geom_histogram(bins=30)+ggtitle("Longest identical string in each survey"),
ggplot(ESM.LS,aes(avgstr))+geom_histogram(bins=30)+ggtitle("Average identical string length"),
nrow=1)
# No. of longest identical strings longer than X
for(value in 3:6){ cat("\n-",nrow(ESM.LS[ESM.LS$longstr>value,]),"longest identical string >",value,
"times (",round(100*nrow(ESM.LS[ESM.LS$longstr>value,])/nrow(ESM.LS),2),"% )") }
##
## - 516 longest identical string > 3 times ( 32.25 % )
## - 258 longest identical string > 4 times ( 16.12 % )
## - 166 longest identical string > 5 times ( 10.38 % )
## - 66 longest identical string > 6 times ( 4.12 % )
# flagging cases with identical strings > 6 (N = 66)
ESM.LS$lsResp <- ESM.LS$ls <- 0
ESM.LS[ESM.LS$longstr > 6,"ls"] <- 1
summary(as.factor(ESM.LS$ls)) # No. of cases with long string
## 0 1
## 1534 66
# flagging long string respondents (N = 6)
(s <- summary(as.factor(as.character(ESM.LS[ESM.LS$ls == 1,"ID"])))) # No. of long string responses per respondent
## HRV004 HRV005 HRV006 HRV007 HRV008 HRV017 HRV021 HRV022 HRV023 HRV030 HRV033
## 1 4 1 3 3 1 1 1 1 1 2
## HRV034 HRV036 HRV038 HRV039 HRV040 HRV045 HRV050 HRV054 HRV062 HRV065 HRV067
## 2 2 3 2 1 1 1 1 1 2 1
## HRV068 HRV070 HRV075 HRV078 HRV080 HRV081 HRV085 HRV086 HRV088 HRV090 HRV091
## 1 2 1 2 1 1 1 2 2 1 6
## HRV098 HRV099 HRV100 HRV102 HRV104 HRV105
## 1 1 2 1 3 2
ESM.LS[ESM.LS$ID %in% names(s[s >= 3]),"lsResp"] <- 1 # flagging long string respondents (i.e., 3+ identical strings > 6)
summary(as.factor(ESM.LS[!duplicated(ESM.LS$ID),"lsResp"]))
## 0 1
## 91 6
Fourth, we examine those cases with unlikely response patterns by performing a multivariate outlier analysis based on the Mahalanobis distance. We can see that 83 responses (5.2%) were flagged as multivariate outliers (md = 1) with a confidence level of 99%, with 8 participants showing 3+ flagged responses and thus marked with mdResp = 1.
# computing Mahalanobis distance in mood ratings
ESM.LS <- cbind(ESM.LS,mahad(ESM[,mood],flag=TRUE))
# plotting Mahalanobis distance and marking flagged cases (N = 83)
ESM.LS$md <- as.numeric(gsub("FALSE","0",gsub("TRUE","1",ESM.LS$flagged)))
summary(as.factor(ESM.LS$md)) # No. flagged responses (N = 83)
## 0 1
## 1517 83
ggplot(ESM.LS,aes(d_sq,fill=as.factor(md)))+geom_histogram(bins=30)+ggtitle("Mahalanobis distance")
# flagging outlier respondents (N = 8)
(s <- summary(as.factor(as.character(ESM.LS[ESM.LS$md == 1,"ID"])))) # No. of flagged responses per respondent
## HRV001 HRV002 HRV004 HRV005 HRV008 HRV010 HRV013 HRV014 HRV020 HRV024 HRV026
## 1 2 1 1 1 1 2 2 1 2 2
## HRV030 HRV031 HRV036 HRV037 HRV043 HRV048 HRV049 HRV051 HRV053 HRV055 HRV056
## 1 5 4 1 1 3 1 7 1 1 1
## HRV060 HRV061 HRV062 HRV063 HRV068 HRV069 HRV071 HRV076 HRV079 HRV082 HRV084
## 1 2 1 1 2 1 1 1 2 1 2
## HRV086 HRV087 HRV088 HRV089 HRV093 HRV094 HRV095 HRV096 HRV098 HRV100 HRV102
## 4 1 1 1 1 7 1 3 3 2 1
## HRV103
## 1
ESM.LS$mdResp <- 0
ESM.LS[ESM.LS$ID %in% names(s[s >= 3]),"mdResp"] <- 1 # flagging outlier respondents (i.e., 3+ extreme Mahalanobis dist)
summary(as.factor(ESM.LS[!duplicated(ESM.LS$ID),"mdResp"])) # No. flagged respondents (N = 6)
## 0 1
## 89 8
Fifth, we examine those cases with negative correlations among psychometric synonyms, that is, cases where the intra-response correlation between pairs of items that correlate at r = .60 or stronger is negative, and specifically lower than -.30. We can see that 100 responses (6.3%) were flagged with psychsyn = 1, with 9 participants showing 3+ flagged responses and thus marked with psychsynResp = 1.
# computing within-person correlations among psychometric synonyms in mood ratings
ESM.LS <- cbind(ESM.LS,psychsyn(ESM[,mood],diag=TRUE,critval=0.60,resample_na=TRUE))
# note that some cases marked with long identical string have SD = 0 so the intra-response correlation cannot be computed
head(cbind(ESM.LS[is.na(ESM.LS$cor),c("ID","longstr","avgstr","ls","lsResp")],ESM[is.na(ESM.LS$cor),mood]))
# flagging and plotting correlations < -.30 (N = 100)
ESM.LS$psychsyn <- 0
ESM.LS[!is.na(ESM.LS$cor) & ESM.LS$cor < (-.30),"psychsyn"] <- 1
summary(as.factor(ESM.LS$psychsyn)) # No. flagged responses (N = 100)
## 0 1
## 1500 100
ggplot(ESM.LS,aes(cor,fill=as.factor(psychsyn)))+geom_histogram(bins=100)+ggtitle("Individual corr. among psychometric synonyms")
# flagging psychsyn respondents (N = 9)
(s <- summary(as.factor(as.character(ESM.LS[ESM.LS$psychsyn == 1,"ID"])))) # No. of flagged responses per respondent
## HRV001 HRV002 HRV004 HRV005 HRV006 HRV008 HRV010 HRV013 HRV014 HRV017 HRV019
## 2 1 1 2 1 2 1 1 1 1 1
## HRV020 HRV022 HRV024 HRV028 HRV030 HRV032 HRV033 HRV034 HRV035 HRV037 HRV041
## 2 1 2 3 1 2 1 1 4 3 1
## HRV048 HRV051 HRV054 HRV056 HRV058 HRV061 HRV062 HRV064 HRV065 HRV067 HRV069
## 2 2 2 1 1 1 1 4 1 1 3
## HRV070 HRV072 HRV073 HRV074 HRV076 HRV079 HRV080 HRV081 HRV082 HRV085 HRV086
## 1 1 3 1 2 2 1 3 3 2 3
## HRV088 HRV089 HRV090 HRV092 HRV093 HRV094 HRV095 HRV096 HRV100 HRV101 HRV102
## 2 1 2 2 2 2 2 1 2 2 2
## HRV103 HRV104 HRV105
## 1 2 1
ESM.LS$psychsynResp <- 0
ESM.LS[ESM.LS$ID %in% names(s[s >= 3]),"psychsynResp"] <- 1 # flagging inconsistent respondents (i.e., 3+ flagged responses)
summary(as.factor(ESM.LS[!duplicated(ESM.LS$ID),"psychsynResp"])) # No. flagged participants (N = 9)
## 0 1
## 88 9
Here, we summarize the number of potentially careless responses and participants for each criterion, and we compute an aggregated variable indicating the number of flagging criteria met (out of the 7 listed above) by each response and participant. Then, we flag all responses classified as potentially careless based on 1+ criteria (N = 334) with cLess1 = 1 and those based on 2+ criteria (N = 55) with cLess2 = 1, respectively. Similarly, we flag all participants classified as potentially careless based on 1+ criteria (N = 29) with cLessResp1 = 1, and those based on 2+ criteria (N = 6) with cLessResp2 = 1.
# joining careless criteria
ESM.LS$idday <- as.factor(paste(ESM.LS$ID,ESM.LS$RunTimestamp,sep="_"))
ESM.careless <- join(ESM.LS[,c("ID","within.day","idday","ls","md","psychsyn","lsResp","mdResp","psychsynResp")],
ESM.RT[,c("idday","iRT.fast","tRT.fast","iRT.slow","tRT.slow",
"fastResp","fastResp.tot","slowResp","slowResp.tot")],by="idday",type="left")
# computing aggregate careless scores
ESM.careless$cLessResp2 <- ESM.careless$cLessResp1 <- ESM.careless$cLess2 <- ESM.careless$cLess1 <- 0
cr <- c("ls","md","psychsyn","iRT.fast","tRT.fast","iRT.slow","tRT.slow")
ESM.careless$carelessN <- rowSums(ESM.careless[,cr],na.rm=TRUE)
ESM.careless[ESM.careless$carelessN>=1,"cLess1"] <- 1 # careless response if meeting 1+ criteria
ESM.careless[ESM.careless$carelessN>=2,"cLess2"] <- 1 # careless response if meeting 2+ criteria
crResp <- c("lsResp","mdResp","psychsynResp","fastResp","fastResp.tot","slowResp","slowResp.tot")
ESM.careless$cLessRespN <- rowSums(ESM.careless[,crResp],na.rm=TRUE)
ESM.clResp <- ESM.careless[ESM.careless$within.day!=0,]
ESM.clResp <- ESM.clResp[!duplicated(ESM.clResp$ID),]
ESM.clResp[ESM.clResp$cLessRespN>=1,"cLessResp1"] <- 1 # careless respondent if meeting 1+ criteria
ESM.clResp[ESM.clResp$cLessRespN>=2,"cLessResp2"] <- 1 # careless respondent if meeting 2+ criteria
# summarizing criteria of potentially careless responses (N = 334/55)
summary(as.data.frame(lapply(ESM.careless[,c(cr,"carelessN","cLess1","cLess2")],as.factor)))
## ls md psychsyn iRT.fast tRT.fast iRT.slow tRT.slow
## 0:1534 0:1517 0:1500 0 :1240 0 :1283 0 :1274 0 :1279
## 1: 66 1: 83 1: 100 1 : 66 1 : 23 1 : 32 1 : 27
## NA's: 294 NA's: 294 NA's: 294 NA's: 294
##
##
## carelessN cLess1 cLess2
## 0:1266 0:1266 0:1545
## 1: 279 1: 334 1: 55
## 2: 49
## 3: 4
## 4: 2
# summarizing criteria of potentially careless respondents (N = 29/6)
summary(as.data.frame(lapply(ESM.clResp[,c(crResp,"cLessRespN","cLessResp1","cLessResp2")],as.factor)))
## lsResp mdResp psychsynResp fastResp fastResp.tot slowResp slowResp.tot
## 0:91 0:89 0:88 0:89 0:95 0:95 0:96
## 1: 6 1: 8 1: 9 1: 8 1: 2 1: 2 1: 1
##
##
## cLessRespN cLessResp1 cLessResp2
## 0:68 0:68 0:91
## 1:23 1:29 1: 6
## 2: 5
## 3: 1
# joining careless scores to the ESM dataset
ESM$idday <- as.factor(paste(ESM$ID,ESM$RunTimestamp,sep="_"))
ESM <- join(ESM,ESM.careless[,c("idday","cLess1","cLess2")],by="idday",type="left")
ESM <- join(ESM,ESM.clResp[,c("ID","cLessResp1","cLessResp2")],by="ID",type="left")
Here, we visually inspect all responses marked with cLess2 = 1. We can note several cases of long identical strings (e.g., HRV008_3_2) as well as cases of inconsistent responses (e.g., HRV060_1_0 for items t2 and t3).
ESM$id <- as.factor(paste(ESM$ID,ESM$day,ESM$within.day,sep="_"))
plotResp(ESM[ESM$cLess2==1,],"id",mood)
Here, we visually inspect all responses from participants marked with cLessResp2 = 1. Also in this case, we can note several cases of long identical strings (e.g., HRV051_2_4) and especially cases of inconsistent responses (e.g., HRV051_1_0 for items f2 and f3).
plotResp(ESM[ESM$cLessResp2==1,],"id",mood)
Here, we describe the steps implemented to pre-process the pulse interval data recorded with the E4 wristband (Empatica, Milan, Italy). Then, the pre-processed pulse intervals are read and used to compute HR and HRV metrics, which are subsequently recoded and filtered based on data quality and inclusion criteria.
The following signal pre-processing steps were implemented to derive pulse intervals from the raw blood volume pulse (BVP):
We selected the recording segments between each pair of markers signaled by participants by focusing on the last 2 minutes of each 3-minute resting interval, and we used consecutive pulse wave foot points in the BVP signal to automatically detect pulse intervals. This was done with the Pulse Intervals Detection App (PIDApp), based on the pracma and shiny R packages.
Artefacts were removed based on visual inspection of both the BVP signal and the pulse interval time series, and recording segments with artefacts in more than 25% of the pulse intervals were excluded from the analyses (marked as out). Specifically, a given pulse interval was classified as an artefact and removed if it met at least three of the following five criteria (the three quantitative criteria are sketched in the code after this list):
clear noise in the related raw BVP signal
clear rise in the amplitude of the corresponding acceleration signal
excessive deviation (i.e., > 300 ms) from the mean pulse interval length of the surrounding 10 pulse intervals
excessive deviation (i.e., > 10 bpm) from the instantaneous HR values preceding and following the pulse interval
excessive deviation (i.e., > 1 ms) between the RMSSD value computed with and without considering the pulse interval
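For illustration only, the three quantitative criteria (3-5) can be sketched as follows for the i-th interval of a hypothetical vector ibi of pulse intervals (ms); this is not the code actually used, since classification also relied on the visual criteria (1-2) and required agreement on at least three of the five:
# illustrative sketch: quantitative artefact criteria 3-5 for pulse interval i
checkPI <- function(ibi, i){
# criterion 3: > 300 ms deviation from the mean of the 10 surrounding intervals
surr <- setdiff(max(1, i - 5):min(length(ibi), i + 5), i)
c3 <- abs(ibi[i] - mean(ibi[surr])) > 300
# criterion 4: > 10 bpm deviation from both adjacent instantaneous HR values
hr <- 60000/ibi
c4 <- FALSE
if(i > 1 & i < length(ibi)){ c4 <- abs(hr[i] - hr[i - 1]) > 10 & abs(hr[i] - hr[i + 1]) > 10 }
# criterion 5: > 1 ms difference between RMSSD computed with vs. without interval i
rmssd <- function(x) sqrt(mean(diff(x)^2))
c5 <- abs(rmssd(ibi) - rmssd(ibi[-i])) > 1
c(crit3 = c3, crit4 = c4, crit5 = c5) }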
The resulting pulse interval time series were automatically processed using the freely available ARTiiFACT software (version 2.13; Kaufmann et al., 2011). Specifically, we spline-interpolated the pulse intervals identified as artefacts based on the median absolute deviation method.
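Conceptually, the median absolute deviation correction can be sketched as follows (a toy illustration assuming a threshold of 3 MADs; the exact ARTiiFACT implementation is described in Kaufmann et al., 2011):
# illustrative sketch: MAD-based artefact flagging with cubic spline interpolation
madCorrect <- function(ibi, k = 3){
flags <- abs(ibi - median(ibi)) > k * mad(ibi) # flagging deviant pulse intervals
idx <- seq_along(ibi)
ibi[flags] <- spline(idx[!flags], ibi[!flags], xout = idx[flags])$y # spline-interpolating flagged values
return(ibi) }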
A final visual inspection was performed based on the code shown below (not run), and the pre-processed data were saved in the HRV/FinalProcessing folder.
# Before executing the following code, pulse intervals were pre-processed as described in steps 1-3.
# Each pre-processed segment was named as "subject_day_interval_Number of artefacts".
# Intervals discarded due to too many artefacts were marked with the label "out" in place of the number of artefacts.
# Missing intervals (i.e., not signaled by participants) were marked as "miss"
# ..............................................
# 4.1. DATA VISUALIZATION AND COMPARISON
# ..............................................
# Here, we show the code (not run) used for visually inspecting the pulse intervals time series processed with ARTiiFACT.
# Automatically processed pulse intervals obtained in step 3 were saved in the "HRV/ARTIIFACTcorrected" folder.
# Automatically processed pulse intervals are also compared with the original (i.e., manually processed) data.
# Manually processed pulse intervals obtained in step 2 were saved in the "HRV/clean_noOUT" folder.
# reading file names
data.path <- "HRV/ARTIIFACTcorrected" # files processed with ARTiiFACT
paths = list.files(data.path, recursive = TRUE,full.names = TRUE, include.dirs = FALSE, pattern = ".txt")
data.path.original <- "HRV/clean_noOUT" # original data
paths.original = list.files(data.path.original, recursive = TRUE,full.names = TRUE, include.dirs = FALSE,pattern=".csv")
# isolating ID x day couples to be plotted
IDs <- strsplit(gsub(".txt","",gsub("_ArtifactCorrectedWithMethod_1.txt","",gsub("/","",gsub(data.path,"",paths)))),"_")
IDdays <- levels(as.factor(paste(rep(levels(as.factor(unlist(IDs)[grep("HRV",unlist(IDs))])),3),
c(rep(1,length(IDs)),rep(2,length(IDs)),rep(3,length(IDs))),sep="_")))
# reading, processing, plotting, and saving pulse intervals
for(IDday in IDdays[260:length(IDdays)]){
paths.cor <- grep(IDday,paths,value=TRUE)
paths.or <- grep(IDday,paths.original,value=TRUE)
# setting plot interface
par(mfrow=c(2,4),mar=c(2,2,3,2))
# dataset of HR and HRV
Summ <- data.frame(interval=NA,RMSSD.cor=NA,RMSSD.or=NA,HR.cor=NA,HR.or=NA)
for(i in 1:length(paths.cor)){
# matching processed and original data
ID <- gsub(".csv","",gsub("_ArtifactCorrectedWithMethod_1.txt","",gsub("/","",gsub(data.path,"",paths.cor[i]))))
new.data <- read.csv(paths.cor[i],header=FALSE)
colnames(new.data)[1] <- "IBI"
original.data <- read.csv(paths.or[i])
colnames(original.data)[1] <- "time"
# adding info to dataset
Summ <- rbind(Summ,data.frame(interval=as.integer(strsplit(ID,"_")[[1]][3]),
RMSSD.cor=round(sqrt(mean(diff(new.data$IBI)^2)),2),
RMSSD.or=round(sqrt(mean(diff(original.data$IBI)^2)),2),
HR.cor=round(mean(60000/new.data$IBI),2),HR.or=round(mean(60000/original.data$IBI),2)))
# plotting corrected over original IBIs
plot(original.data[1:nrow(original.data),"time"],60000/original.data[1:nrow(original.data),"IBI"],
type="o",pch=20,cex=1.5,cex.main=1.2,xlab=NA,ylab=NA,
main=paste(ID,"\nRMSSD =",Summ[i+1,3],"-->",Summ[i+1,2],"- HR =",Summ[i+1,5],"-->",Summ[i+1,4]))
lines(original.data[1:nrow(original.data),"time"],60000/new.data$IBI,
type="o",pch=20,cex=1.5,col="red")
# saving corrected data with time information
write.csv(data.frame(time=original.data$time,IBI=new.data$IBI),gsub("clean_noOUT","SanityCheck",paths.or[i])) }
# plotting RMSSD by interval
plot(Summ[2:nrow(Summ),"interval"],Summ[2:nrow(Summ),"RMSSD.or"],main=paste("RMSSD (ms)",IDday),type="o")
lines(Summ[2:nrow(Summ),"interval"],Summ[2:nrow(Summ),"RMSSD.cor"],type="o",col="red")
# plotting HR by interval
plot(Summ[2:nrow(Summ),"interval"],Summ[2:nrow(Summ),"HR.or"],main=paste("HR (bpm)",IDday),type="o")
lines(Summ[2:nrow(Summ),"interval"],Summ[2:nrow(Summ),"HR.cor"],type="o",col="red")
readline(prompt=cat(IDday,"\nPress [enter]")) }
# Based on this visual inspection, some signal segments were manually re-processed by interpolating artefacted pulse intervals.
# Specifically, we only kept corrections leading to substantial absolute change (i.e., > 1 ms) in the RMSSD of each segment.
# Manually re-processed files were saved in the "HRV/SanityCheck2" folder.
# ..............................................
# 4.2. FINAL VISUAL INSPECTION AND DATA EXPORT
# ..............................................
# As a final step, we conducted a last visual inspection and we saved the final data in the "HRV/FinalProcessing" folder.
# reading file names (i.e., "subject_day_interval_Number of artefacts")
data.path <- "HRV/SanityCheck2" # automatically processed & checked data
paths = list.files(data.path, recursive = TRUE,
full.names = TRUE, include.dirs = FALSE, pattern = ".csv")
# reading files, recoding, plotting, and saving
for(IDday in IDdays[201:length(IDdays)]){
# setting plot interface
par(mfrow=c(2,4),mar=c(2,2,3,2))
# dataset of HR and HRV
Summ <- data.frame(interval=NA,RMSSD=NA,HR=NA)
# reading each interval in a given IDday
paths.IDday <- grep(IDday,paths,value=TRUE)
for(i in 1:length(paths.IDday)){
# changing participants' ID
ID <- strsplit(gsub(" ","",(gsub("/","",gsub(".csv","",(gsub(data.path,"",paths.IDday[i])))))),split = "_")
if(nchar(ID[[1]][1])==5){
ID[[1]][1] <- paste("HRV0",substr(ID[[1]][1],4,5),sep="")
ID <- paste(ID[[1]][1],ID[[1]][2],ID[[1]][3],ID[[1]][4],sep="_")
} else{ ID <- gsub("/","",gsub(".csv","",(gsub(data.path,"",paths.IDday[i])))) }
if(strsplit(ID,"_")[[1]][4]!="out" & strsplit(ID,"_")[[1]][4]!="miss"){
# changing data format
new.data <- read.csv(paths.IDday[i])
if(colnames(new.data)[1]=="X"){ new.data <- new.data[,2:3] }
if(colnames(new.data)[1]=="time"){colnames(new.data)[1]="t"}
# adding info to dataset
Summ <- rbind(Summ,data.frame(interval=as.integer(strsplit(ID,"_")[[1]][3]),
RMSSD=round(sqrt(mean(diff(new.data$IBI)^2)),2),
HR=round(mean(60000/new.data$IBI),2)))
# plotting corrected over original IBIs
plot(new.data$t,60000/new.data$IBI,
type="o",pch=20,cex=1.5,cex.main=1.2,xlab=NA,ylab=NA,
main=paste(ID,"\nRMSSD =",Summ[!is.na(Summ$interval)&Summ$interval==i,2],
", HR =",Summ[!is.na(Summ$interval)&Summ$interval==i,3]))
# saving corrected data with time information
write.csv(new.data,paste("HRV/FinalProcessing/",ID,".csv",sep=""),row.names=FALSE)
} else{ # cases marked as "out" or "miss"
plot(1,main=paste(ID,"\nRMSSD =",Summ[i+1,2],"- HR =",Summ[i+1,3])) }} # empty plot
# plotting RMSSD by interval
plot(Summ[2:nrow(Summ),"interval"],Summ[2:nrow(Summ),"RMSSD"],main=paste("RMSSD (ms)",IDday),type="b",pch=20,cex=5)
plot(Summ[2:nrow(Summ),"interval"],Summ[2:nrow(Summ),"HR"],main=paste("HR (bpm)",IDday),type="b",pch=20,cex=5)
readline(prompt=cat(IDday,"\nPress [enter]")) }
Here, we read the pre-processed pulse interval time series and we compute the following metrics for each two-minute recording segment (a worked example follows the list):
Mean pulse interval PI (ms), computed as the mean of the pulse intervals in a given recording segment
Mean heart rate HR (bpm), computed as the mean of 60000/IBI over the pulse intervals in a given segment
Root mean square of successive differences in pulse intervals RMSSD (ms), computed as the square root of the mean squared differences between consecutive pulse intervals
RMSSD coefficient of variation RMSSDcv, computed as 100 * RMSSD / PI, following the recommendation of de Geus et al. (2019) to test and report on how adjusting HRV metrics for chronotropic state affects primary study outcomes
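As a worked example, the four metrics can be computed on a toy vector of pulse intervals (values made up for illustration), mirroring the computations in the readHRV function below:
ibi <- c(820, 810, 845, 830, 800, 815, 840) # toy pulse intervals (ms)
PI <- mean(ibi) # mean pulse interval (ms)
HR <- mean(60000/ibi) # mean heart rate (bpm)
RMSSD <- sqrt(mean(diff(ibi)^2)) # root mean square of successive differences (ms)
RMSSDcv <- 100 * RMSSD / PI # RMSSD scaled by the mean pulse interval
round(c(PI = PI, HR = HR, RMSSD = RMSSD, RMSSDcv = RMSSDcv), 2)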
The readHRV function is used to process the pulse intervals and merge them into the HRV dataset.
readHRV
readHRV <- function(data.path){
# empty data.frame to be filled
HRVdata <- data.frame(ID=NA,day=NA,within.day=NA,PI=NA,HR=NA,RMSSD=NA,RMSSDcv=NA,artefacts=NA,nPI=NA)
paths = list.files(data.path, recursive = TRUE, full.names = TRUE, include.dirs = FALSE)
# data reading
for(path in paths){
ID <- gsub("/","",gsub(data.path,"",gsub(".csv","",path)))
ID <- strsplit(ID,"_")[[1]]
if(ID[4]=="out"|ID[4]=="miss"){ # missing or excluded data
HRVdata <- rbind(HRVdata,
data.frame(ID=ID[1],day=ID[2],within.day=ID[3],
PI=NA,HR=NA,RMSSD=NA,RMSSDcv=NA,
artefacts=ID[4],nPI=NA))
}else{ # complete data
new.data <- read.csv(path)
HRVdata <- rbind(HRVdata,
# computing HRV metrics
data.frame(ID=ID[1],day=ID[2],within.day=ID[3],
PI=mean(new.data$IBI), # mean pulse interval
HR=mean(60000/new.data$IBI), # mean heart rate
RMSSD=sqrt(mean(diff(new.data$IBI)^2)), # RMSSD
RMSSDcv= 100 * sqrt(mean(diff(new.data$IBI)^2)) / mean(new.data$IBI), # RMSSDcv
artefacts=ID[4],
nPI=nrow(new.data)))}}
# adjusting data
HRVdata <- HRVdata[2:nrow(HRVdata),] # removing first empty row
HRVdata$ID <- as.factor(HRVdata$ID) # setting class for ID, day and within.day
HRVdata[,c("within.day","day")] <- lapply(HRVdata[,c("within.day","day")],as.numeric)
return(HRVdata)}
(HRV <- readHRV(data.path="HRV/FinalProcessing"))[1:3,] # reading data and showing first three rows
HRV data were already cleaned and filtered in the signal pre-processing step (see section 3.1). Here, we report the details on data quality and we consider some data quality indicators as further criteria to flag those cases that might bias the results. The following packages and functions are used to optimize data filtering:
library(plyr) # loading required packages
complRate
complRate <- function(data,what=c("compliance","inclusion")){
# scheduled No. of recordings ((1 lab + 5 ambulatory) X 3 days)
tot <- (1 + 5) * 3
amb <- 5 * 3
lab <- 1 * 3
# subsetting data
data.ok <- data[!(data$artefacts%in%c("out","miss")),] # clean segments
data.nmiss <- data[data$artefacts!="miss",] # nonmissing segments
data.miss <- data[data$artefacts=="miss",] # missing segments
data.out <- data[data$artefacts=="out",] # excluded segments
# creating dataset of compliance and inclusion rates
cr <- data.frame(ID=levels(data$ID),tot.nmiss=as.numeric(table(data.nmiss$ID))) # No. of nonmissing segments per participant
cr$lab.nmiss <- as.numeric(table(data.nmiss[data.nmiss$within.day==6,"ID"])) # No. nonmissing segments in lab
cr$amb.nmiss <- as.numeric(table(data.nmiss[data.nmiss$within.day!=6,"ID"])) # No. nonmissing segments outside lab
# compliance: nonmissing and missing over scheduled
if("compliance" %in% what){
# computing compliance rate
cr$tot.CR <- round(100*cr$tot.nmiss/tot,1) # compliance rate
cr$lab.CR <- round(100*cr$lab.nmiss/lab,1) # compliance rate in lab
cr$amb.CR <- round(100*cr$amb.nmiss/amb,1) # compliance rate outside lab
cat("\n\nNonmissing segments:",
"\n -",sum(cr$tot.nmiss),"nonmissing segments (",
round(100*sum(cr$tot.nmiss)/(tot*nlevels(data$ID)),1),"% of tot scheduled ), from",min(cr$tot.nmiss),"to",max(cr$tot.nmiss),
", mean =",round(mean(cr$tot.nmiss),1),"sd =",round(sd(cr$tot.nmiss),1),
", mean perc =",round(mean(cr$tot.CR),1),"% sd =",round(sd(cr$tot.CR),1),"%",
"\n -",sum(cr$amb.nmiss),"from outside lab, from",min(cr$amb.nmiss),"to",max(cr$amb.nmiss),
", mean =",round(mean(cr$amb.nmiss),1),"sd =",round(sd(cr$amb.nmiss),1),
", mean perc =",round(mean(cr$amb.CR),1),"% sd =",round(sd(cr$amb.CR),1),"%",
"\n -",sum(cr$lab.nmiss),"from lab, from",min(cr$lab.nmiss),"to",max(cr$lab.nmiss),
", mean =",round(mean(cr$lab.nmiss),1),"sd =",round(sd(cr$lab.nmiss),1),
", mean perc =",round(mean(cr$lab.CR),1),"% sd =",round(sd(cr$lab.CR),1),"%")
# computing No. of missing segments and missing rate
cr$tot.miss <- as.numeric(table(data.miss$ID)) # No. of missing segments per participant
cr$tot.MR <- round(100*cr$tot.miss/tot,1) # missing rate
cr$lab.miss <- as.numeric(table(data.miss[data.miss$within.day==6,"ID"])) # No. missing segments in lab
cr$lab.MR <- round(100*cr$lab.miss/lab,1) # missing rate in lab
cr$amb.miss <- as.numeric(table(data.miss[data.miss$within.day!=6,"ID"])) # No. missing segments outside lab
cr$amb.MR <- round(100*cr$amb.miss/amb,1) # missing rate outside lab
cat("\n\nMissing segments:",
"\n -",sum(cr$tot.miss),"missing segments (",
round(100*sum(cr$tot.miss)/(tot*nlevels(data$ID)),1),"% of tot scheduled ), from",min(cr$tot.miss),"to",max(cr$tot.miss),
", mean =",round(mean(cr$tot.miss),1),"sd =",round(sd(cr$tot.miss),1),
", mean perc =",round(mean(cr$tot.MR),1),"% sd =",round(sd(cr$tot.MR),1),"%",
"\n -",sum(cr$amb.miss),"from outside lab, from",min(cr$amb.miss),"to",max(cr$amb.miss),
", mean =",round(mean(cr$amb.miss),1),"sd =",round(sd(cr$amb.miss),1),
", mean perc =",round(mean(cr$amb.MR),1),"% sd =",round(sd(cr$amb.MR),1),"%",
"\n -",sum(cr$lab.miss),"from lab, from",min(cr$lab.miss),"to",max(cr$lab.miss),
", mean =",round(mean(cr$lab.miss),1),"sd =",round(sd(cr$lab.miss),1),
", mean perc =",round(mean(cr$lab.MR),1),"% sd =",round(sd(cr$lab.MR),1),"%") }
# inclusion: excluded and included over nonmissing
if("inclusion" %in% what){
# computing No. of excluded segments and exclusion rate
cr$tot.out <- as.numeric(table(data.out$ID)) # No. of excluded segments per participant
cr$tot.ER <- round(100*cr$tot.out/cr$tot.nmiss,1) # exclusion rate
cr$lab.out <- as.numeric(table(data.out[data.out$within.day==6,"ID"])) # No. excluded segments in lab
cr$lab.ER <- round(100*cr$lab.out/cr$lab.nmiss,1) # exclusion rate in lab
cr$amb.out <- as.numeric(table(data.out[data.out$within.day!=6,"ID"])) # No. excluded segments outside lab
cr$amb.ER <- round(100*cr$amb.out/cr$amb.nmiss,1) # exclusion rate outside lab
cat("\n\nExcluded segments:",
"\n -",sum(cr$tot.out),"excluded segments (",
round(100*sum(cr$tot.out)/sum(cr$tot.nmiss),1),"% of nonmissing segments), from",min(cr$tot.out),"to",max(cr$tot.out),
", mean =",round(mean(cr$tot.out),1),"sd =",round(sd(cr$tot.out),1),
", mean perc =",round(mean(cr$tot.ER),1),"% sd =",round(sd(cr$tot.ER),1),"%",
"\n -",sum(cr$amb.out),"from outside lab, from",min(cr$amb.out),"to",max(cr$amb.out),
", mean =",round(mean(cr$amb.out),1),"sd =",round(sd(cr$amb.out),1),
", mean perc =",round(mean(cr$amb.ER)),"% sd =",round(sd(cr$amb.ER)),"%",
"\n -",sum(cr$lab.out),"from lab, from",min(cr$lab.out),"to",max(cr$lab.out),
", mean =",round(mean(cr$lab.out),1),"sd =",round(sd(cr$lab.out),1),
", mean perc =",round(mean(cr$lab.ER),1),"% sd =",round(sd(cr$lab.ER),1),"%")
# computing No. of included segments and inclusion rate
cr$tot.ok <- as.numeric(table(data.ok$ID)) # No. of included segments per participant
cr$tot.IR <- round(100*cr$tot.ok/cr$tot.nmiss,1) # inclusion rate
cr$lab.ok <- as.numeric(table(data.ok[data.ok$within.day==6,"ID"])) # No. included segments in lab
cr$lab.IR <- round(100*cr$lab.ok/cr$lab.nmiss,1) # inclusion rate in lab
cr$amb.ok <- as.numeric(table(data.ok[data.ok$within.day!=6,"ID"])) # No. included segments outside lab
cr$amb.IR <- round(100*cr$amb.ok/cr$amb.nmiss,1) # inclusion rate outside lab
cat("\n\nIncluded segments:",
"\n -",sum(cr$tot.ok),"included segments (",
round(100*sum(cr$tot.ok)/sum(cr$tot.nmiss),1),"% of nonmissing segments), from",min(cr$tot.ok),"to",max(cr$tot.ok),
", mean =",round(mean(cr$tot.ok),1),"sd =",round(sd(cr$tot.ok),1),
", mean perc =",round(mean(cr$tot.IR),1),"% sd =",round(sd(cr$tot.IR),1),"%",
"\n -",sum(cr$amb.ok),"from outside lab, from",min(cr$amb.ok),"to",max(cr$amb.ok),
", mean =",round(mean(cr$amb.ok),1),"sd =",round(sd(cr$amb.ok),1),
", mean perc =",round(mean(cr$amb.IR),1),"% sd =",round(sd(cr$amb.IR),1),"%",
"\n -",sum(cr$lab.ok),"from lab, from",min(cr$lab.ok),"to",max(cr$lab.ok),
", mean =",round(mean(cr$lab.ok),1),"sd =",round(sd(cr$lab.ok),1),
", mean perc =",round(mean(cr$lab.IR),1),"% sd =",round(sd(cr$lab.IR),1),"%") }
return(cr)}
Computes a data frame of per-participant compliance and inclusion rates, and prints summary information.
plotPI
plotPI <- function(data.path,IDs){
paths = list.files(data.path, recursive = TRUE, full.names = TRUE, include.dirs = FALSE)
for(ID in IDs){ pathIDs <- paths[substr(paths,21,26)==ID]
par(mfrow=c(2,4))
for(pathID in pathIDs){ id <- strsplit(gsub("/","",gsub(pathID,"",gsub(".csv","",pathID))),split="_")[[1]][4]
if(id!="out" & id!="miss"){ pi <- read.csv(pathID)
plot(pi$t,60000/pi$IBI,type="o",pch=20,cex=.5,lwd=.5,cex.main=1.2,xlab="time (sec)",ylab="HR (bpm)",
main=gsub(data.path,"",pathID),ylim=c(40,120))}}}}
Reads and plots the pulse intervals time series from the specified subsample of participants.
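For instance, a hypothetical call to visually inspect the final segments of two participants would be:
# plotPI(data.path = "HRV/FinalProcessing", IDs = c("HRV001", "HRV002")) # not run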
We start by inspecting the original number of participants and observations in the HRV dataset, which corresponds to the total number of scheduled recordings (i.e., No. of participants x 3 days x 6 recordings), that is, 1,746. We can see that the dataset includes a total of 97 participants, eight fewer than the 105 participants that were actually invited to participate in the daily protocol. This is because the eight excluded participants mentioned in sections 1.3.5 and 2.3.3 were either not pre-processed or their pre-processing was interrupted after noticing too low signal quality.
cat("Total No. of scheduled HRV recordings =",nrow(HRV),"from",nlevels(HRV$ID),"participants")
## Total No. of scheduled HRV recordings = 1746 from 97 participants
First, we inspect the data for cases of double recording segments, that is, segments with the same ID, day, and within.day values. We can see that no double recordings are included in the HRV dataset.
cat(nrow(HRV[duplicated(paste(HRV$ID,HRV$day,HRV$within.day)),]),"double recordings")
## 0 double recordings
Second, we inspect the compliance rates and the data quality as further criteria to flag those cases that might bias the results.
First, we inspect the compliance rate, that is, the number of recordings that were not marked as "miss" in the file name over the total number of scheduled recordings, by using the complRate function shown above.
We can see that all lab-based segments were recorded each day for each participant (compliance = 100%), whereas some missing data were present in ambulatory segments (i.e., those recorded outside the lab). Overall, the mean compliance rate for ambulatory recordings was relatively high (92%), with no participant showing a compliance rate < 60%. Thus, we see no reason for excluding or flagging further participants based on compliance rate.
crate <- complRate(HRV,what="compliance")
##
##
## Nonmissing segments:
## - 1629 nonmissing segments ( 93.3 % of tot scheduled ), from 13 to 18 , mean = 16.8 sd = 1.4 , mean perc = 93.3 % sd = 7.5 %
## - 1338 from outside lab, from 10 to 15 , mean = 13.8 sd = 1.4 , mean perc = 92 % sd = 9 %
## - 291 from lab, from 3 to 3 , mean = 3 sd = 0 , mean perc = 100 % sd = 0 %
##
## Missing segments:
## - 117 missing segments ( 6.7 % of tot scheduled ), from 0 to 5 , mean = 1.2 sd = 1.4 , mean perc = 6.7 % sd = 7.5 %
## - 117 from outside lab, from 0 to 5 , mean = 1.2 sd = 1.4 , mean perc = 8 % sd = 9 %
## - 0 from lab, from 0 to 0 , mean = 0 sd = 0 , mean perc = 0 % sd = 0 %
Second, we inspect the rate of pulse intervals identified as artefacts in the included segments. Note that recording segments with artefacts in more than 25% of the pulse intervals were already excluded and marked as "out".
We can note that a few included segments (N = 9, 0.7%) still include artefacts in more than 25% of the pulse intervals, and thus we exclude them. Without these cases, the average No. of removed artefacts per segment is 7.34 (5.2%). Here, we also consider a more restrictive criterion by flagging segments with artefacts in more than 10% of the pulse intervals (N = 234, 17%, flagged with art10 = 1).
# separating "miss" and "out" from the No. of artefacts
HRV$out <- HRV$miss <- 0
HRV[HRV$artefacts=="miss","miss"] <- 1
HRV[HRV$artefacts=="out","out"] <- 1
HRV$nArtefacts <- HRV$artefacts
HRV[HRV$artefacts %in% c("miss","out"),"nArtefacts"] <- NA
HRV$nArtefacts <- as.integer(HRV$nArtefacts) # No. of artefacts
HRV$pArtefacts <- 100*HRV$nArtefacts/HRV$nPI # artefact rate
# checking for included cases with artefacts in more than the 25%
cat("Excluding further",nrow(HRV[!is.na(HRV$pArtefacts) & HRV$pArtefacts>25,]),"segments with 25%+ artefacts")
## Excluding further 9 segments with 25%+ artefacts
HRV[!is.na(HRV$pArtefacts) & HRV$pArtefacts>25,"out"] <- 1 # excluding these cases
HRV[!is.na(HRV$pArtefacts) & HRV$pArtefacts>25,"artefacts"] <- "out"
HRV[!is.na(HRV$pArtefacts) & HRV$pArtefacts>25,c("PI","HR","RMSSD","RMSSDcv","nPI","nArtefacts","pArtefacts")] <- NA
cat(nrow(HRV[!is.na(HRV$pArtefacts) & HRV$pArtefacts>25,]),"remaining segments with 25%+ artefacts")
## 0 remaining segments with 25%+ artefacts
# summarizing and plotting No. of artefacts per segment
cat(sum(HRV$nArtefacts,na.rm=TRUE),"artefacts removed from included segments (",
round(100*sum(HRV$nArtefacts,na.rm=TRUE)/sum(HRV$nPI,na.rm=TRUE),1),"% of pulse intervals )",
"\n- No. of artefacts per recording ranging from",min(HRV$nArtefacts,na.rm=TRUE),"to",max(HRV$nArtefacts,na.rm=TRUE),
", mean =",round(mean(HRV$nArtefacts,na.rm=TRUE),2),"sd =",round(sd(HRV$nArtefacts,na.rm=TRUE),2),
"\n- Perc. artefacts per recording ranging from",
min(100*HRV$nArtefacts/HRV$nPI,na.rm=TRUE),"% to",max(100*HRV$nArtefacts/HRV$nPI,na.rm=TRUE),"%, mean =",
round(mean(100*HRV$nArtefacts/HRV$nPI,na.rm=TRUE),1),"% sd =",round(sd(100*HRV$nArtefacts/HRV$nPI,na.rm=TRUE),1),"%")
## 9906 artefacts removed from included segments ( 4.9 % of pulse intervals )
## - No. of artefacts per recording ranging from 0 to 29 , mean = 7.34 sd = 7.61
## - Perc. artefacts per recording ranging from 0 % to 25 %, mean = 5.2 % sd = 5.5 %
par(mfrow=c(1,2)); hist(HRV$nArtefacts,main="No. removed artefacts"); hist(100*HRV$nArtefacts/HRV$nPI,main="% removed artefacts")
# flagging cases with artefacts in more than the 10%
HRV[!is.na(HRV$pArtefacts),"art10"] <- 0
HRV[!is.na(HRV$pArtefacts) & HRV$pArtefacts>10,"art10"] <- 1
summary(as.factor(HRV$art10))
## 0 1 NA's
## 1115 234 397
Third, we inspect the quality of the recorded signal in terms of the number of segments excluded due to excessive artefacts (i.e., segments with artefacts in more than 25% of the pulse intervals were marked as “out” in the file name) over the number of nonmissing segments. This is done with the same complRate function shown above.
We can see that only a few lab-based segments (N = 30 out of 291, 10.3%) were excluded due to excessive artefacts, whereas a higher percentage of ambulatory segments (N = 250 out of 1,338, 18.7%) were excluded for the same reason. Overall, 280 segments (17.2% of the nonmissing segments) were excluded due to excessive artefacts, resulting in a total of 1,349 included segments. Here, a number of participants (N = 14, 14.4%) showed inclusion rates < 60% and are flagged with inclRate60 = 1.
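Analogously to the compliance sketch above, the inclusion-rate logic can be sketched as the percentage of nonmissing segments that were not marked as “out” (again, only an illustration of the logic implemented by complRate()):
# minimal sketch of the inclusion-rate logic (illustrative)
nonmiss <- HRV[HRV$artefacts!="miss",] # nonmissing segments only
incl <- aggregate(artefacts ~ ID, data=nonmiss, FUN=function(x) 100*mean(x!="out")) # % of nonmissing segments included
colnames(incl)[2] <- "inclusion"
head(incl,3)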
# computing inclusion rate
irate <- complRate(HRV,what="inclusion")
##
##
## Excluded segments:
## - 280 excluded segments ( 17.2 % of nonmissing segments), from 0 to 12 , mean = 2.9 sd = 3.1 , mean perc = 17.2 % sd = 18.1 %
## - 250 from outside lab, from 0 to 10 , mean = 2.6 sd = 2.7 , mean perc = 19 % sd = 20 %
## - 30 from lab, from 0 to 2 , mean = 0.3 sd = 0.6 , mean perc = 10.3 % sd = 19.5 %
##
## Included segments:
## - 1349 included segments ( 82.8 % of nonmissing segments), from 6 to 18 , mean = 13.9 sd = 3.2 , mean perc = 82.8 % sd = 18.1 %
## - 1088 from outside lab, from 4 to 15 , mean = 11.2 sd = 2.9 , mean perc = 81.3 % sd = 19.6 %
## - 261 from lab, from 1 to 3 , mean = 2.7 sd = 0.6 , mean perc = 89.7 % sd = 19.5 %
# flagging participants with IR < 60% (N = 14)
irate$inclRate60 <- 0
irate[irate$tot.IR < 60,"inclRate60"] <- 1
HRV <- join(HRV,irate[,c("ID","inclRate60")],by="ID",type="left")
summary(as.factor(irate$inclRate60))
## 0 1
## 83 14
Finally, we inspect the data of 10 included participants (10.3%) whose signal was overall classified as ‘noisy’ due to the higher uncertainty of pulse interval detection. The plotPI function shown at the beginning of section 3.3 is used to visualize the pulse interval time series of these participants and to compare them with a random subsample of 10 ‘clean-signal’ participants.
We can note that, compared to ‘clean-signal’ participants, flagged participants tended to show less variable pulse intervals. Some cases (e.g., HRV032) tend to show ‘fixed’ pulse interval values (i.e., certain values were highly frequent), which is partially due to the sampling frequency of the E4 PPG sensor and can also be seen for some unflagged participants. Other cases (e.g., HRV054) show frequent steep changes of about 10-20 bpm.
Here, we flag the 10 participants showing a ‘noisy’ signal with noisyPI = 1. We can note that the noisyPI variable looks quite independent from the previously created art10 and inclRate60 variables.
noisy <- c("HRV018","HRV020","HRV021","HRV032","HRV033","HRV050","HRV054","HRV063","HRV070","HRV076")
HRV$noisyPI <- 0
HRV[HRV$ID %in% noisy,"noisyPI"] <- 1
summary(as.factor(HRV[!duplicated(HRV$ID),"noisyPI"]))
## 0 1
## 87 10
# crossing noisyPI cases with art10
table(HRV[,c("noisyPI","art10")])
## art10
## noisyPI 0 1
## 0 1021 199
## 1 94 35
# crossing noisyPI participants with inclRate60
table(HRV[!duplicated(HRV$ID),c("noisyPI","inclRate60")])
## inclRate60
## noisyPI 0 1
## 0 75 12
## 1 8 2
Here, we visualize the pulse intervals of all segments recorded by participants flagged with noisyPI = 1.
plotPI(data.path="HRV/FinalProcessing",IDs=noisy)
Here, we visualize the pulse intervals of all segments recorded by a random subsample of 10 participants not flagged with noisyPI = 1.
plotPI(data.path="HRV/FinalProcessing",IDs=sample(levels(HRV$ID)[!(levels(HRV$ID) %in% noisy)],size=10))
Here, the raw behavioral data from the Go/No-Go task exported from E-Prime (version 2.0.10) (Psychology Software Tools, Inc., Sharpsburg, PA) are read and saved in the GNG dataset, recoded, and filtered based on the inclusion criteria.
First, we read the CSV data files exported from the E-Prime software. Specifically, we used E-Merge to join the data from all the 105 participants invited to participate in the daily protocol into a single data file Merged_105_final.csv. Here, we read that long-form trial-by-trial dataset.
# reading data
GNG <- read.csv2("GONOGO/Merged_105_final.csv",header=TRUE)
Then, we recode the GNG variables to be used for the analyses.
The following functions are used to optimize the GNG data recoding:
GNGrecoding
GNGrecoding <- function(data){
# VARIABLE RECODING
# .............................................................
# keeping only columns of interest
data <- data[c("ID","Session","Procedure.Block.","SessionDate","SessionTime","TrialType","Image","Jitter", # session info
paste("Stimulus",rep(1:6,each=2),".",rep(c("ACC","RT"),6),sep=""))] # accuracy and reaction times
# recoding block information
colnames(data)[which(names(data)=="Procedure.Block.")] <-"block"
data <- data[data$block!="TRAINING",] # excluding training data
data$block <- as.integer(substr(data$block,6,6)) # block as integer from 1 to 6
# merging performance data into single variables (RT and ACC)
data[is.na(data)] <- 0 # missing data (NA) as 0 to be summed
data$ACC <- rowSums(data[,which(substr(colnames(data),11,13)=="ACC")]) # accuracy
data$RT <- rowSums(data[,which(substr(colnames(data),11,12)=="RT")]) # reaction time
data[,paste("Stimulus",rep(1:6,each=2),".",rep(c("ACC","RT"),6),sep="")] <- NULL # removing stimulus-related columns
# letter stimulus
data$Image <- as.factor(gsub(".png","",data$Image))
colnames(data)[which(names(data)=="Image")] <-"letter"
# TrialType as factor
data$TrialType <- as.factor(data$TrialType)
# FIXING PROBLEMS WITH SINGLE PARTICIPANTS
# .............................................................
# # checks cases with No. of trials per block != 108
# for(ID in levels(as.factor(paste(data$ID,data$block)))){
# if(nrow(data[paste(data$ID,data$block)==ID,])!=108){
# print(paste(ID,nrow(data[paste(data$ID,data$block)==ID,]))) }}
# blocks 5 and 6 of HRV001 were saved as HRV003
data[data$ID=="HRV003" & data$block==5,][109:216,"ID"] <- "HRV001"
data[data$ID=="HRV003" & data$block==6,][109:216,"ID"] <- "HRV001"
# HRV031 performed block 6 before block 5, then again block 6 (invalid), then block 5 twice (second invalid)
data <- rbind(data[!(data$ID=="HRV031" & data$block==5),],
data[data$ID=="HRV031" & data$block==5,][1:108,]) # taking only 1st
data <- rbind(data[!(data$ID=="HRV031" & data$block==6),],
data[data$ID=="HRV031" & data$block==6,][1:108,]) # taking only 1st
data[data$ID=="HRV031" & data$block==5, "block"] <- 9 # inverting 5 and 6
data[data$ID=="HRV031" & data$block==6, "block"] <- 5
data[data$ID=="HRV031" & data$block==9, "block"] <- 6
# HRV042 made block 6 twice (first time invalid)
data <- rbind(data[!(data$ID=="HRV042" & data$block==6),],
data[data$ID=="HRV042" & data$block==6,][109:216,]) # taking only 2nd
# HRV060 made block 4 twice (first time invalid)
data <- rbind(data[!(data$ID=="HRV060" & data$block==4),],
data[data$ID=="HRV060" & data$block==4,][109:216,]) # taking only 2nd
# Block 1 is invalid for HRV066 (technical problem) --> marking everything as NA
data[data$ID=="HRV066" & data$block==1,c("ACC","RT")] <- NA
# HRV086 made block 5 twice (first time invalid)
data <- rbind(data[!(data$ID=="HRV086" & data$block==5),],
data[data$ID=="HRV086" & data$block==5,][109:216,]) # taking only 2nd
# FINAL RECODING AND SAVING
# .............................................................
# renaming Session as "day" and recoding block as either 1 (i.e., 1, 3, and 5) or 2 (i.e., 2, 4, and 6)
colnames(data)[which(names(data)=="Session")] <-"day"
data[data$block==3|data$block==5,"block"] <- 1
data[data$block==4|data$block==6,"block"] <- 2
return(data[order(data$ID,data$SessionDate),])
}
Recodes the behavioral data exported from E-Prime, renames and sorts the columns, and fixes problems with specific participants.
trials2blocks
trials2blocks <- function(data){
# aggregating values by block (mean)
block <- cbind(aggregate(ACC ~ ID*day*block, # mean accuracy to Go trials
data=data[data$TrialType=="Go",],FUN="mean",na.action=na.pass),
aggregate(ACC ~ ID*day*block, # mean accuracy to No-Go trials
data=data[data$TrialType=="No-go",],FUN="mean",na.action=na.pass)[,4],
aggregate(RT ~ ID*day*block, # mean RT to accurate Go trials
data=data[data$TrialType=="Go" & (data$ACC==1 | is.na(data$ACC)),],FUN="mean",na.action=na.pass)[,4],
aggregate(RT ~ ID*day*block, # sd RT to accurate Go trials
data=data[data$TrialType=="Go" & (data$ACC==1 | is.na(data$ACC)),],FUN="sd",na.action=na.pass)[,4])
# sorting dataset, renaming columns and rows
block <- block[order(block$ID,block$day,block$block),]
colnames(block)[4:ncol(block)] <- c("Go.ACC","NoGo.ACC","RT","RT.sd")
rownames(block) <- 1:nrow(block)
return(block) }
Aggregates the long-form measures into a block-by-block dataset.
First, we recode participants’ identification codes ID (currently corresponding to their e-mail addresses) using the “HRVXXX” format (e.g., from ‘john.smith@gmail.com’ to ‘HRV001’). This is done with the participantID_recoding_GONOGO() function. Again, since the participants’ e-mail addresses are confidential, both the function and the original dataset are not shown.
# loading and using the function to recode the ID variable
source("participantID_recoding_GONOGO.R")
(GNG <- participantID_recoding_GONOGO(GNG))[1:3,] # recoding GNG IDs and showing first three rows
Second, we recode the SessionDate and SessionTime variables so that they indicate, respectively, the date and time and the time within day of each session. We can note that all sessions were recorded within the data collection period (i.e., between November 2018 and November 2019). In terms of time within day, we can note that most sessions were recorded between 16:15 and 18:15, with only three sessions recorded before 15:30.
# SessionDate as date and time
GNG$sd <- as.POSIXct(NA)
GNG[substr(GNG$SessionDate,3,3)=="/","sd"] <- # SessionDate format as m/d/y
as.POSIXct(paste(GNG[substr(GNG$SessionDate,3,3)=="/","SessionDate"],
substr(GNG[substr(GNG$SessionDate,3,3)=="/","SessionTime"],12,19)),format="%m/%d/%Y %H:%M:%S",tz="GMT")
GNG[substr(GNG$SessionDate,3,3)=="-","sd"] <- # SessionDate format as m-d-y
as.POSIXct(paste(GNG[substr(GNG$SessionDate,3,3)=="-","SessionDate"],
substr(GNG[substr(GNG$SessionDate,3,3)=="-","SessionTime"],12,19)),format="%m-%d-%Y %H:%M:%S",tz="GMT")
GNG$SessionDate <- GNG$sd
# plotting SessionDate and SessionTime
par(mfrow=c(1,2))
hist(GNG[!duplicated(paste(GNG$ID,GNG$Session)),"SessionDate"],breaks="months",freq=TRUE,main="SessionDate")
hist(GNG[!duplicated(paste(GNG$ID,GNG$Session)),"SessionTime"],breaks="min",freq=TRUE,main="SessionTime")
Third, we use the GNGrecoding() function shown above to recode the other variables in the GNG dataset, to remove unneeded columns, and to correct some issues with specific participants (see function details at the beginning of section 4.2).
(GNG <- GNGrecoding(GNG))[1:3,] # recoding GNG and showing first three rows
Fourth, we create the wide-form block-level BLOCK dataset by using the trials2blocks function to average accuracy ACC and reaction times RT across blocks. Indeed, each participant performed the task once per day, for three consecutive days, and each day the session was divided into two 108-trial blocks with identical stimuli but a different No-Go stimulus. Thus, whereas the GNG data has one row per trial (i.e., 108 × 6 rows per participant), the BLOCK dataset has one row per block (i.e., 6 rows per participant). Note that mean RT is computed by only considering “Go” trials with ACC = 1.
(BLOCK <- trials2blocks(GNG))[1:3,] # recoding and showing the first three rows
Finally, we use the variables computed in the BLOCK dataset to compute the intra-individual coefficient of variation ICV as the ratio between the standard deviation RT.sd and the mean reaction time RT in each block. Again, this is done by only considering “Go” trials with ACC = 1.
BLOCK$ICV <- BLOCK$RT.sd/BLOCK$RT # ICV = RT.sd / mean RT
Here, we clean and filter the data based on the inclusion criteria.
First, we inspect the original number of respondents and observations in the GNG dataset. We can note that the Go/No-Go task was performed by the whole sample of 105 participants that were actually invited to participate in the daily protocol. We can also note that the number of trials and blocks does not match the protocol for two participants (see section 4.3.3).
cat("Original No. of trial-by-trial data =",nrow(GNG),"in",nrow(BLOCK),"blocks from",nlevels(GNG$ID),"participants")
## Original No. of trial-by-trial data = 67392 in 624 blocks from 105 participants
# participants with less than 648 trials
for(ID in levels(GNG$ID)){ if(nrow(GNG[GNG$ID==ID,])!=648){ print(paste(ID,nrow(GNG[GNG$ID==ID,]),"trials"))} }
## [1] "HRV009 216 trials"
## [1] "HRV059 432 trials"
# participants with less than 6 blocks
for(ID in levels(BLOCK$ID)){ if(nrow(BLOCK[BLOCK$ID==ID,])!=6){ print(paste(ID,nrow(BLOCK[BLOCK$ID==ID,]),"blocks"))} }
## [1] "HRV009 2 blocks"
## [1] "HRV059 4 blocks"
First, we inspect the data for cases of double trials and blocks, that is, blocks with the same ID, day, and block values, or with more than 108 trials. We can see that no double blocks or double trials are included in the GNG and BLOCK datasets.
# double blocks
cat(nrow(BLOCK[duplicated(paste(BLOCK$ID,BLOCK$day,BLOCK$block)),]),"double blocks")
## 0 double blocks
# double trials
BLOCK$nTrials <- as.integer(table(as.factor(paste(GNG$ID,GNG$day,GNG$block))))
cat(nrow(BLOCK[BLOCK$nTrials > 108,]),"blocks with double trials")
## 0 blocks with double trials
Second, we apply our inclusion criteria to the task performance, that is, we consider reaction times to “Go” trials < 150 ms as anticipations, and we recode them by setting ACC = 0.
# detecting and filtering anticipations
cat("Recoding",nrow(GNG[GNG$TrialType=="Go" & !is.na(GNG$ACC) & GNG$RT > 0 & GNG$RT < 150,]),"anticipations (",
round(100*nrow(GNG[GNG$TrialType=="Go" & !is.na(GNG$ACC) & GNG$RT > 0 & GNG$RT < 150,])/
nrow(GNG[GNG$TrialType=="Go" & !is.na(GNG$ACC) & GNG$ACC==1,]),2),"% ) by setting ACC = 0")
## Recoding 470 anticipations ( 0.96 % ) by setting ACC = 0
GNG[GNG$TrialType=="Go" & !is.na(GNG$ACC) & GNG$RT > 0 & GNG$RT < 150,"ACC"] <- 0
# updating BLOCK dataset
BLOCK <- trials2blocks(GNG)
BLOCK$ICV <- BLOCK$RT.sd/BLOCK$RT # ICV = RT.sd / mean RT
Third, we mark the 8 participants that were excluded for the reasons specified in section 1.3.4. As in that section, such participants are marked as ‘OUT’. Thus, from the initial 105 participants, the number of excluded participants is 8 (7.62%), whereas the total number of included participants is 97 (92.38%). From the BLOCK data, we can note that excluded participants performed slightly differently than included participants (i.e., showing lower ACC to No-Go trials and lower ICV, but also lower RT). Then, we remove the excluded participants from the GNG and BLOCK datasets.
# marking 8 excluded participants as "OUT"
memory <- GNG
GNG$ID <- as.character(GNG$ID)
excl <- c("HRV009","HRV011","HRV015","HRV027","HRV029","HRV046","HRV059","HRV066")
GNG[GNG$ID%in%excl,"ID"] <- gsub("HRV","OUT",GNG[GNG$ID%in%excl,"ID"]) # marking excluded participants as "OUTXXX"
# plotting
GNG$excl <- "IN" # creating excl variable (factor)
GNG[substr(GNG$ID,1,3)=="OUT","excl"] <- "OUT"
GNG$excl <- as.factor(GNG$excl)
grid.arrange(ggplot(GNG[GNG$TrialType=="Go" & GNG$ACC==1 & !is.na(GNG$RT),],aes(x=excl,y=RT,fill=excl)) + geom_violin(),
ggplot(GNG[GNG$TrialType=="Go",],aes(x=excl,y=ACC,fill=excl)) + geom_violin(),
ggplot(GNG[GNG$TrialType=="No-go",],aes(x=excl,y=ACC,fill=excl)) + geom_violin(),nrow=1)
# same thing with the BLOCK dataset
memory.BLOCK <- BLOCK
BLOCK$ID <- as.character(BLOCK$ID)
BLOCK[BLOCK$ID%in%excl,"ID"] <- gsub("HRV","OUT",BLOCK[BLOCK$ID%in%excl,"ID"]) # marking excluded participants as "OUTXXX"
BLOCK$excl <- "IN" # creating excl variable (factor)
BLOCK[substr(BLOCK$ID,1,3)=="OUT","excl"] <- "OUT"
BLOCK$excl <- as.factor(BLOCK$excl)
grid.arrange(ggplot(BLOCK,aes(x=excl,y=RT,fill=excl)) + geom_violin(),
ggplot(BLOCK,aes(x=excl,y=RT.sd,fill=excl)) + geom_violin(),
ggplot(BLOCK,aes(x=excl,y=ICV,fill=excl)) + geom_violin(),
ggplot(BLOCK,aes(x=excl,y=Go.ACC,fill=excl)) + geom_violin(),
ggplot(BLOCK,aes(x=excl,y=NoGo.ACC,fill=excl)) + geom_violin(),nrow=1)
# updating number of included participants
cat(nrow(GNG[substr(GNG$ID,1,3)=="HRV",]),"included observations in",nrow(BLOCK[substr(BLOCK$ID,1,3)=="HRV",]),"blocks from",
nlevels(as.factor(as.character(GNG[substr(GNG$ID,1,3)=="HRV","ID"]))),"included participants \nout of",
nrow(memory),"total observations in",nrow(memory.BLOCK),"blocks from",nrow(memory[!duplicated(memory$ID),]),
"invited participants")
## 62856 included observations in 582 blocks from 97 included participants
## out of 67392 total observations in 624 blocks from 105 invited participants
# removing excluded participants
GNG <- GNG[substr(GNG$ID,1,3)=="HRV",]
GNG$ID <- as.factor(GNG$ID) # ID as factor
GNG <- GNG[order(GNG$ID,GNG$SessionDate),] # sorting by ID and SessionDate
BLOCK <- BLOCK[substr(BLOCK$ID,1,3)=="HRV",]
BLOCK$ID <- as.factor(BLOCK$ID)
BLOCK <- BLOCK[order(BLOCK$ID,BLOCK$day,BLOCK$block),]
GNG$excl <- BLOCK$excl <- NULL # removing excl variable
No further criteria are considered to flag potentially confounding cases in relation to the Go/NoGo data. No missing responses are shown by included participants, and no particular cases are highlighted.
summary(GNG) # summary of GNG (no missing data)
## ID day block SessionDate
## HRV001 : 648 Min. :1 Min. :1.0 Min. :2018-11-06 16:17:55.00
## HRV002 : 648 1st Qu.:1 1st Qu.:1.0 1st Qu.:2018-12-19 17:22:41.00
## HRV003 : 648 Median :2 Median :1.5 Median :2019-03-19 16:29:10.00
## HRV004 : 648 Mean :2 Mean :1.5 Mean :2019-03-27 15:46:28.82
## HRV005 : 648 3rd Qu.:3 3rd Qu.:2.0 3rd Qu.:2019-05-22 17:09:35.00
## HRV006 : 648 Max. :3 Max. :2.0 Max. :2019-11-21 17:18:17.00
## (Other):58968
## SessionTime TrialType letter
## Min. :1970-01-01 14:32:43.0000 Go :47142 E : 5238
## 1st Qu.:1970-01-01 16:31:50.0000 No-go:15714 F : 5238
## Median :1970-01-01 16:53:50.5000 G : 5238
## Mean :1970-01-01 16:55:45.5343 L : 5238
## 3rd Qu.:1970-01-01 17:18:24.0000 M : 5238
## Max. :1970-01-01 18:24:44.0000 N : 5238
## (Other):31428
## Jitter ACC RT
## Min. : 0.0 Min. :0.0000 Min. : 0.0
## 1st Qu.: 50.0 1st Qu.:1.0000 1st Qu.:220.0
## Median :100.0 Median :1.0000 Median :295.0
## Mean :100.2 Mean :0.9089 Mean :249.2
## 3rd Qu.:150.0 3rd Qu.:1.0000 3rd Qu.:341.0
## Max. :200.0 Max. :1.0000 Max. :499.0
##
summary(BLOCK) # summary of BLOCK (no missing data)
## ID day block Go.ACC NoGo.ACC
## HRV001 : 6 Min. :1 Min. :1.0 Min. :0.7778 Min. :0.2963
## HRV002 : 6 1st Qu.:1 1st Qu.:1.0 1st Qu.:0.9506 1st Qu.:0.6667
## HRV003 : 6 Median :2 Median :1.5 Median :0.9753 Median :0.7778
## HRV004 : 6 Mean :2 Mean :1.5 Mean :0.9637 Mean :0.7445
## HRV005 : 6 3rd Qu.:3 3rd Qu.:2.0 3rd Qu.:1.0000 3rd Qu.:0.8519
## HRV006 : 6 Max. :3 Max. :2.0 Max. :1.0000 Max. :1.0000
## (Other):546
## RT RT.sd ICV
## Min. :234.6 Min. :25.96 Min. :0.08186
## 1st Qu.:299.3 1st Qu.:47.48 1st Qu.:0.14692
## Median :318.7 Median :53.21 Median :0.17017
## Mean :318.1 Mean :53.56 Mean :0.16961
## 3rd Qu.:338.3 3rd Qu.:59.57 3rd Qu.:0.19136
## Max. :390.1 Max. :87.04 Max. :0.26160
##
Now that we have read, recoded, and filtered the four datasets collected in the study (all including the data from the 97 included participants), we can merge them into the final datasets to be used for the analyses. Specifically, we create four final datasets:
ema = ESM, HRV, and PrelQS data, with one row per measurement of either HRV (5 ambulatory + 1 lab) or ESM ratings (1 ‘baseline’ + 5 ambulatory); PrelQS information is identically repeated in each row of a given participant
between = all the collected variables, with one row per participant, including the participant averages (e.g., mean, median, SD) of each variable
gng = all the collected variables, with one row per trial of the Go/No-Go task, only including the values corresponding to the last daily measurement from the ESM and HRV datasets
block = all the collected variables, with one row per block of the Go/No-Go task, only including the values corresponding to the last daily measurement from the ESM and HRV datasets
The following packages are used to optimize data merging.
library(plyr)
First, we merge the HRV and the ESM datasets by the ID, day, and within.day variables. Note that the within.day variable ranges from 0 to 5 for ESM data (0 = baseline, 1-5 = ordinary questionnaires) and from 1 to 6 for HRV data (1-5 = ambulatory recordings, 6 = laboratory recording), with only those measurements associated with within.day values from 1 to 5 being matched between the two datasets.
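To make the alignment explicit, the following illustrative snippet (not part of the pipeline) lays out which dataset contributes data at each within.day occasion:
# illustrative overview of the within.day alignment between ESM and HRV
data.frame(within.day = 0:6,
           ESM = c("baseline", rep("ambulatory",5), NA),
           HRV = c(NA, rep("ambulatory",5), "laboratory"))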
# merging HRV and ESM
ESM[,c("id","idday")] <- HRV$artefacts <- NULL # removing unnecessary variables
HRVESM <- join(HRV,ESM,by=c("ID","day","within.day"),type="full") # merging datasets
HRVESM <- HRVESM[order(HRVESM$ID,HRVESM$day,HRVESM$within.day),] # sorting
row.names(HRVESM) <- 1:nrow(HRVESM) # row names
# sanity check: no HRV when within.day=0, no ESM when within.day=6
cat("Sanity check:",nrow(HRVESM[(HRVESM$within.day==0 & !is.na(HRVESM$RMSSD)) |
(HRVESM$within.day==6 & !is.na(HRVESM$v1)),]) == 0)
## Sanity check: TRUE
# sanity check: 97 included participants x 3 days x 7 occasions
cat("Sanity check:",nrow(HRVESM)==97*3*7)
## Sanity check: TRUE
# showing first three lines
HRVESM[1:3,]
Here, we inspect the number of cases from included participants with within.day ranging from 1 to 5 but with missing HRV (N = 7) or ESM values (N = 36).
# missing HRV, nonmissing ESM
nrow(HRVESM[HRVESM$within.day %in% 1:5 & HRVESM$miss==1 & !is.na(HRVESM$v1),]) # 7
## [1] 7
# missing ESM, nonmissing HRV
nrow(HRVESM[HRVESM$within.day %in% 1:5 & HRVESM$miss!=1 & is.na(HRVESM$v1),]) # 36
## [1] 36
# both missing
nrow(HRVESM[HRVESM$within.day %in% 1:5 & HRVESM$miss==1 & is.na(HRVESM$v1),]) # 110
## [1] 110
Second, we merge the HRVESM dataset created above with the PrelQS dataset.
The first merged dataset ema has a long form with one row per measurement, so that PrelQS information is identically repeated over all rows associated with a given participant.
# merging HRVESM and PrelQS
colnames(HRVESM)[which(colnames(HRVESM)%in%c("gender","drugs"))] <- c("gender.ESM","drugs.ESM") # renaming matching variables
ema <- join(HRVESM,PrelQS,by="ID",type="left")
ema <- ema[order(ema$ID,ema$day,ema$within.day),] # sorting rows
row.names(ema) <- 1:nrow(ema) # row names
# sanity check: same nrow and No. of participants as HRVESM
cat("Sanity check:",nrow(HRVESM) == nrow(ema))
## Sanity check: TRUE
cat("Sanity check:",nlevels(HRVESM$ID) == nlevels(ema$ID))
## Sanity check: TRUE
# sorting and showing first three lines
(ema <- ema[order(ema$ID,ema$day,ema$within.day),])[1:3,]
The second merged dataset between has a wide form with one row per participant, so that the aggregated values of the HRVESM variables that are meaningful at the between level are joined to the PrelQS data. Specifically, we compute the mean of each continuous variable, renaming it as “varName.cm”, whereas we compute both the mean and the median for RMSSD and RMSSDcv due to their skewed distributions. Raw ESM ratings are not averaged since we first plan to compute the aggregated scores based on the assessment of their psychometric properties (see the related report). As a sanity check, we can see that only 1 participant chose a Sensus protocol gender different from that reported in the preliminary questionnaire (likely due to a mistake in selecting the Sensus protocol).
# computing mean and/or median for quantitative variables
between <- PrelQS
Vars <- c("PI","HR","RMSSD","RMSSDcv","nArtefacts")
for(Var in Vars){ colnames(ema)[which(colnames(ema)==Var)] <- "Var" # temporarily renaming the variable to "Var" for the aggregate() formula
between <- join(between,aggregate(Var ~ ID,data=ema,FUN="mean"),by="ID",type="left") # mean of all vars
colnames(between)[ncol(between)] <- paste(Var,"cm",sep=".")
if(Var %in% c("RMSSD","RMSSDcv")){
between <- join(between,aggregate(Var ~ ID,data=ema,FUN="median",na.action=na.omit),by="ID",type="left") # median for skewed
colnames(between)[ncol(between)] <- paste(Var,"cMed",sep=".") }
colnames(ema)[which(colnames(ema)=="Var")] <- Var }
# joining categorical variables (i.e., flag criteria)
between <- join(between,HRV[!duplicated(HRV$ID),c("ID","inclRate60","noisyPI")],by="ID",type="left") # HRV
between <- join(between,ESM[!duplicated(ESM$ID),c("ID","cLessResp1","cLessResp2")],by="ID",type="left") # ESM
between$smoker <- 0
for(ID in levels(between$ID)){
if(nrow(ESM[ESM$ID==ID,])>0){
between[between$ID==ID,"os"] <- ESM[ESM$ID==ID & !is.na(ESM$os),"os"][1] # OS
between[between$ID==ID,"gender.ESM"] <- ESM[ESM$ID==ID & !is.na(ESM$gender),"gender"][1] # gender
if(any(ESM[ESM$ID==ID & !is.na(ESM$smoke),"smoke"]=="1")){ between[between$ID==ID,"smoker"] <- 1 } # smokers
}}
# sanity check:
cat("No. cases with PrelQS gender != ESM gender =",nrow(between[between$gender!=between$gender.ESM,]))
## No. cases with PrelQS gender != ESM gender = 1
between$gender.ESM <- ema$gender.ESM <- NULL # removing gender.ESM (trusting PrelQS)
# sorting and showing first three lines
(between <- between[order(between$ID),])[1:3,]
Third, we integrate the ema information (including HRV, ESM, and PrelQS variables) into the GNG dataset.
The first merged dataset gng has a long form with one row for each trial of the Go/No-Go task, so that between-level information is identically repeated over all rows associated with the same participant. Similarly, HRV and ESM values are identically repeated over all gng rows associated with the same day, but here we only consider the values of the last daily recording (i.e., within.day = 6 for HRV and within.day = 5 for ESM). Again, raw ESM item scores are not included now but will be added after the evaluation of their psychometric properties (see the related report).
# merging GNG and HRV (last measurement)
gng <- join(GNG,HRV[HRV$within.day==6,],by=c("ID","day"),type="left")
colnames(gng)[which(colnames(gng)=="within.day")] <- "time.HRV"
# merging gng and ESM (last measurement)
gng <- join(gng,ESM[ESM$within.day==5,],by=c("ID","day"),type="left")
colnames(gng)[which(colnames(gng)=="within.day")] <- "time.ESM"
# merging gng and PrelQS
gng <- join(gng,PrelQS,by="ID",type="left")
# sanity check: time.HRV can only be = 6 and time.ESM can only be = 5
summary(as.factor(gng$time.HRV))
## 6
## 62856
summary(as.factor(gng$time.ESM))
## 5 NA's
## 57888 4968
gng$time.ESM <- gng$time.HRV <- NULL
cat("Sanity check:",nrow(gng)==nrow(GNG)) # same nrow than GNG
## Sanity check: TRUE
# sorting and showing first three lines
(gng <- gng[order(gng$ID,gng$day,gng$block),])[1:3,]
The second merged dataset is just the same wide-form between dataset created above, integrated with the aggregated GNG data.
# computing aggregated values
between <- join(between,aggregate(RT ~ ID,data=GNG[GNG$TrialType=="Go" & GNG$ACC==1,],FUN="mean"), # mean RT to Go trials
by="ID",type="left")
between <- join(between,aggregate(ACC ~ ID,data=GNG[GNG$TrialType=="Go",],FUN="mean"), # mean ACC to Go trials
by="ID",type="left")
between <- join(between,aggregate(ACC ~ ID,data=GNG[GNG$TrialType=="No-go",],FUN="mean"), # mean ACC to No-go trials
by="ID",type="left")
colnames(between)[(ncol(between)-2):ncol(between)] <- paste(c("RT","Go.ACC","NoGo.ACC"),"cm",sep=".") # renaming columns
# sorting and showing first three lines
(between <- between[order(between$ID),])[1:3,]
Finally, we integrate the ema information (including HRV, ESM, and PrelQS variables) into the BLOCK dataset.
The first merged dataset block has a long form with one row for each block of the Go/No-Go task, so that between-level information is identically repeated over all rows associated with the same participant. Similarly, HRV and ESM values are identically repeated over all block rows associated with the same day, but here we only consider the values of the last daily recording (i.e., within.day = 6 for HRV and within.day = 5 for ESM). Again, raw ESM item scores are not included now but will be added after the evaluation of their psychometric properties (see the related report).
# merging BLOCK and HRV (last measurement)
block <- join(BLOCK,HRV[HRV$within.day==6,],by=c("ID","day"),type="left")
colnames(block)[which(colnames(block)=="within.day")] <- "time.HRV"
# merging block and ESM (last measurement)
block <- join(block,ESM[ESM$within.day==5,],by=c("ID","day"),type="left")
colnames(block)[which(colnames(block)=="within.day")] <- "time.ESM"
# merging block and PrelQS
block <- join(block,PrelQS,by="ID",type="left")
# sanity check: time.HRV can only be = 6 and time.ESM can only be = 5
summary(as.factor(block$time.HRV))
## 6
## 582
summary(as.factor(block$time.ESM))
## 5 NA's
## 536 46
cat("Sanity check:",nrow(block)==nrow(BLOCK)) # same nrow than GNG
## Sanity check: TRUE
# sorting and showing first three lines
(block <- block[order(block$ID,block$day,block$block),])[1:3,]
Finally, we aggregate the RT.sd and ICV data from the BLOCK dataset by computing their mean by participant, and we integrate this information into the between dataset.
# computing aggregated values
between <- join(between,aggregate(RT.sd ~ ID,data=BLOCK,FUN="mean"), # mean RT SD to Go trials
by="ID",type="left")
between <- join(between,aggregate(ICV ~ ID,data=BLOCK,FUN="mean"), # mean ICV to Go trials
by="ID",type="left")
colnames(between)[(ncol(between)-1):ncol(between)] <- paste(c("RT.sd","ICV"),"cm",sep=".") # renaming columns
# sorting and showing first three lines
(between <- between[order(between$ID),])[1:3,]
Here, we show the structure of each of the final pre-processed datasets and we provide a data dictionary for each of them.
The ema dataset includes HRV, ESM, and PrelQS information, with one row for each measurement of either HRV (5 ambulatory + 1 lab) or ESM (1 ‘baseline’ + 5 ambulatory).
str(ema)
## 'data.frame': 2037 obs. of 49 variables:
## $ ID : Factor w/ 97 levels "HRV001","HRV002",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ day : num 1 1 1 1 1 1 1 2 2 2 ...
## $ within.day : num 0 1 2 3 4 5 6 0 1 2 ...
## $ PI : num NA 649 655 750 657 ...
## $ HR : num NA 92.9 91.9 80.2 91.5 ...
## $ RMSSD : num NA 25.8 23.3 28.8 22.7 ...
## $ RMSSDcv : num NA 3.97 3.55 3.84 3.46 ...
## $ nPI : int NA 183 175 160 180 163 149 NA 197 163 ...
## $ miss : num NA 0 0 0 0 0 0 NA 0 0 ...
## $ out : num NA 0 0 0 0 0 0 NA 0 0 ...
## $ nArtefacts : int NA 8 5 0 7 6 0 NA 0 0 ...
## $ pArtefacts : num NA 4.37 2.86 0 3.89 ...
## $ art10 : num NA 0 0 0 0 0 0 NA 0 0 ...
## $ inclRate60 : num NA 0 0 0 0 0 0 NA 0 0 ...
## $ noisyPI : num NA 0 0 0 0 0 0 NA 0 0 ...
## $ os : Factor w/ 2 levels "Android","iOS": NA 1 1 1 1 1 NA NA 1 1 ...
## $ RunTimestamp: POSIXct, format: "2018-11-06 10:12:16.000" "2018-11-06 11:38:33.269" ...
## $ v1 : int 5 5 4 4 4 3 NA 4 4 4 ...
## $ v2 : num 3 4 4 4 4 4 NA 3 5 3 ...
## $ v3 : num 4 4 3 3 4 2 NA 5 4 4 ...
## $ t1 : num 3 3 2 2 4 3 NA 5 2 3 ...
## $ t2 : int 5 3 3 3 5 4 NA 5 3 4 ...
## $ t3 : int 6 3 3 2 5 6 NA 3 5 5 ...
## $ f1 : int 6 6 5 6 3 3 NA 5 4 3 ...
## $ f2 : num 6 3 3 4 3 3 NA 4 3 3 ...
## $ f3 : int 6 6 5 5 5 2 NA 6 4 4 ...
## $ PE : int 3 1 2 1 2 2 NA 2 1 2 ...
## $ PE.int : int 2 1 2 1 6 3 NA 2 1 5 ...
## $ NE : int 1 2 1 2 1 4 NA 2 1 1 ...
## $ NE.int : int 1 4 1 4 1 4 NA 4 1 1 ...
## $ smoke : Factor w/ 2 levels "0","1": NA 1 1 1 1 1 NA NA 1 1 ...
## $ coffe : Factor w/ 2 levels "0","1": NA 1 1 1 1 1 NA NA 1 1 ...
## $ alcohol : Factor w/ 2 levels "0","1": NA 1 1 1 1 1 NA NA 1 1 ...
## $ activity : Factor w/ 2 levels "0","1": NA 1 2 1 1 2 NA NA 1 2 ...
## $ people : Factor w/ 2 levels "0","1": NA 1 1 1 1 2 NA NA 1 1 ...
## $ daybefore.ex: Factor w/ 2 levels "0","1": 1 NA NA NA NA NA NA 1 NA NA ...
## $ drugs.ESM : Factor w/ 2 levels "0","1": 1 NA NA NA NA NA NA 1 NA NA ...
## $ drugs.which : chr "" NA NA NA ...
## $ cLess1 : num 0 0 1 0 0 1 NA 1 0 0 ...
## $ cLess2 : num 0 0 1 0 0 0 NA 0 0 0 ...
## $ cLessResp1 : num 0 0 0 0 0 0 NA 0 0 0 ...
## $ cLessResp2 : num 0 0 0 0 0 0 NA 0 0 0 ...
## $ time : POSIXct, format: "2018-11-07 11:32:45" "2018-11-07 11:32:45" ...
## $ age : num 25 25 25 25 25 25 25 25 25 25 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ BMI : num 19.2 19.2 19.2 19.2 19.2 ...
## $ drugs : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ dysfun : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ postponed : num 0 0 0 0 0 0 0 0 0 0 ...
ID = participant identification code
day = day of assessment (1, 2, or 3)
within.day = scheduled recording number (0 = first ESM laboratory measurement; 1-5 = ambulatory measurements; 6 = last HRV laboratory measurement)
PI = mean pulse interval (ms), computed as the mean of the PIs in a given recording segment
HR = mean heart rate (bpm), computed as 60,000/PI
RMSSD = root mean square of successive differences in normal-to-normal intervals (ms)
RMSSDcv = RMSSD coefficient of variation, computed as 100 × RMSSD / PI following de Geus et al. (2019)
nPI = number of pulse intervals considered for each segment
miss = flagging missing (i.e., not recorded) segments (1 = missing, 0 = nonmissing)
out = flagging segments excluded due to artefacts in more than 25% of the pulse intervals (1 = excluded, 0 = included)
nArtefacts = number of artefacts
pArtefacts = percentage of artefacts, computed as 100 × nArtefacts / nPI
art10 = flagging segments with pArtefacts > 10% (0 = no, 1 = yes)
inclRate60 = flagging participants with a rate of included segments < 60% of the total number of scheduled segments (0 = no, 1 = yes)
noisyPI = flagging participants with a relatively ‘noisy’ BVP or PI signal (0 = no, 1 = yes)
os = operating system (either Android or iOS)
RunTimestamp = ESM questionnaire opening timestamp (yyyy-mm-dd hh:mm:ss)
v1, …, f3 = ratings on the Multidimensional Mood Questionnaire items (1-7)
PE, …, NE.int = event ratings (1-7)
smoke = cigarette smoking over the 20 min before the recording (0 = no, 1 = yes)
coffe = coffee drinking over the 60 min before the recording (0 = no, 1 = yes)
alcohol = alcohol drinking over the 60 min before the recording (0 = no, 1 = yes)
activity = activity intensity before the recording (0 = low/inactivity, 1 = moderate or intense)
people = interferences during the recording (0 = alone or with someone not disturbing, 1 = people disturbing or interrupting)
daybefore.ex = intense exercise during the day before (0 = no, 1 = yes)
drugs.ESM = drugs consumed on that day (0 = no, 1 = yes)
drugs.which = which drugs have been used on that day (open-ended responses)
cLess1 = flagging potentially careless responses based on 1+ criteria (0 = no, 1 = yes)
cLess2 = flagging potentially careless responses based on 2+ criteria (0 = no, 1 = yes)
cLessResp1 = flagging potentially careless respondents based on 1+ criteria (0 = no, 1 = yes)
cLessResp2 = flagging potentially careless respondents based on 2+ criteria (0 = no, 1 = yes)
time = preliminary questionnaire submission timestamp (yyyy-mm-dd hh:mm:ss)
age = participants’ age (years)
gender = participants’ sex (“F” or “M”)
BMI = participants’ body mass index (kg/m²)
drugs = flagging participants reporting taking some medication (0 = no, 1 = yes)
dysfun = flagging participants reporting minor cardiovascular and/or other dysfunctions (0 = no, 1 = yes)
postponed = flagging participants that postponed one protocol day
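To make the formulas above concrete, here is a minimal sketch deriving the segment-level metrics from a vector of pulse intervals; the vector pi.ms is a made-up example, not study data:
# minimal sketch: deriving PI, HR, RMSSD, and RMSSDcv from example pulse intervals
pi.ms <- c(830, 815, 842, 828, 810, 835) # hypothetical pulse intervals (ms)
PI <- mean(pi.ms) # mean pulse interval (ms)
HR <- 60000/PI # mean heart rate (bpm)
RMSSD <- sqrt(mean(diff(pi.ms)^2)) # root mean square of successive differences (ms)
RMSSDcv <- 100*RMSSD/PI # RMSSD coefficient of variation (de Geus et al., 2019)
round(c(PI=PI,HR=HR,RMSSD=RMSSD,RMSSDcv=RMSSDcv),2)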
The between dataset includes all collected variables, with one row for each participant. HRV, ESM, GNG, and BLOCK data are averaged by participant.
str(between)
## 'data.frame': 97 obs. of 26 variables:
## $ ID : Factor w/ 97 levels "HRV001","HRV002",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ time : POSIXct, format: "2018-11-07 11:32:45" "2018-11-08 00:52:05" ...
## $ age : num 25 28 22 22 25 21 20 20 20 22 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 1 1 2 2 1 1 1 2 ...
## $ BMI : num 19.2 22.7 26.6 20.2 23.2 ...
## $ drugs : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 1 1 ...
## $ dysfun : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ postponed : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PI.cm : num 675 687 799 668 891 ...
## $ HR.cm : num 90.2 88.2 76.7 90.5 69 ...
## $ RMSSD.cm : num 30.6 32.7 51.7 33.2 70.9 ...
## $ RMSSD.cMed : num 29.1 33.6 54.1 31.4 68.6 ...
## $ RMSSDcv.cm : num 4.46 4.76 6.33 4.93 8.07 ...
## $ RMSSDcv.cMed : num 4.36 4.66 6.44 4.54 7.23 ...
## $ nArtefacts.cm: num 2.83 9.8 11.2 12.12 7.43 ...
## $ inclRate60 : num 0 0 1 0 1 0 0 0 0 1 ...
## $ noisyPI : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cLessResp1 : num 0 0 0 0 1 0 1 1 0 0 ...
## $ cLessResp2 : num 0 0 0 0 0 0 0 1 0 0 ...
## $ smoker : num 0 1 0 1 1 0 0 0 1 0 ...
## $ os : Factor w/ 2 levels "Android","iOS": 1 1 1 1 1 1 1 1 2 1 ...
## $ RT.cm : num 314 319 314 364 331 ...
## $ Go.ACC.cm : num 0.99 0.965 0.957 0.889 0.994 ...
## $ NoGo.ACC.cm : num 0.809 0.71 0.599 0.87 0.833 ...
## $ RT.sd.cm : num 38 57.8 54.1 63 33.4 ...
## $ ICV.cm : num 0.121 0.182 0.173 0.173 0.101 ...
ID = participant identification code
time = preliminary questionnaire submission timestamp (yyyy-mm-dd hh:mm:ss)
age = participants’ age (years)
gender = participants’ sex (“F” or “M”)
BMI = participants’ body mass index (kg/m²)
drugs = flagging participants reporting taking some medication (0 = no, 1 = yes)
dysfun = flagging participants reporting minor cardiovascular and/or other dysfunctions (0 = no, 1 = yes)
postponed = flagging participants that postponed one protocol day
PI.cm = participant’s mean pulse interval (ms), computed by averaging the segment-level PI values
HR.cm = participant’s mean heart rate (bpm), computed as 60,000/PI
RMSSD.cm = participant’s mean root mean square of successive differences in normal-to-normal intervals (ms)
RMSSD.cMed = participant’s median root mean square of successive differences in normal-to-normal intervals (ms)
RMSSDcv.cm = participant’s mean RMSSD coefficient of variation, computed as 100 × RMSSD / PI following de Geus et al. (2019)
RMSSDcv.cMed = participant’s median RMSSD coefficient of variation, computed as 100 × RMSSD / PI following de Geus et al. (2019)
nArtefacts.cm = participant’s mean number of artefacts
inclRate60 = flagging participants with a rate of included segments < 60% of the total number of scheduled segments (0 = no, 1 = yes)
noisyPI = flagging participants with a relatively ‘noisy’ BVP or PI signal (0 = no, 1 = yes)
cLessResp1 = flagging potentially careless respondents based on 1+ criteria (1 = yes, 0 = no)
cLessResp2 = flagging potentially careless respondents based on 2+ criteria (1 = yes, 0 = no)
smoker = flagging smokers (1 = yes, 0 = no)
os = operating system (either Android or iOS)
RT.cm = participant’s mean reaction time (ms) to accurate “Go” trials
Go.ACC.cm = participant’s mean accuracy (0-1) to “Go” trials
NoGo.ACC.cm = participant’s mean accuracy (0-1) to “No-go” trials
RT.sd.cm = participant’s mean standard deviation of reaction times (ms) to accurate “Go” trials
ICV.cm = participant’s mean intraindividual coefficient of variation in accurate “Go” trials
The gng dataset includes all collected variables, with one row for each trial of the Go/No-Go task. HRV data are included only for the segments recorded in the lab (within.day = 6), whereas ESM information is included only for the last daily ESM entry (within.day = 5).
str(gng)
## 'data.frame': 62856 obs. of 57 variables:
## $ ID : Factor w/ 97 levels "HRV001","HRV002",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ day : int 1 1 1 1 1 1 1 1 1 1 ...
## $ block : num 1 1 1 1 1 1 1 1 1 1 ...
## $ SessionDate : POSIXct, format: "2018-11-06 16:17:55" "2018-11-06 16:17:55" ...
## $ SessionTime : POSIXct, format: "1970-01-01 16:17:55" "1970-01-01 16:17:55" ...
## $ TrialType : Factor w/ 2 levels "Go","No-go": 1 2 1 1 1 2 1 1 1 1 ...
## $ letter : Factor w/ 12 levels "E","F","G","L",..: 4 10 4 7 1 10 1 4 4 1 ...
## $ Jitter : int 200 200 0 150 0 50 150 0 50 50 ...
## $ ACC : num 1 1 1 1 1 1 1 1 1 1 ...
## $ RT : num 343 0 328 360 339 0 306 332 404 355 ...
## $ PI : num 831 831 831 831 831 ...
## $ HR : num 72.5 72.5 72.5 72.5 72.5 ...
## $ RMSSD : num 49.3 49.3 49.3 49.3 49.3 ...
## $ RMSSDcv : num 5.93 5.93 5.93 5.93 5.93 ...
## $ nPI : int 149 149 149 149 149 149 149 149 149 149 ...
## $ miss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ out : num 0 0 0 0 0 0 0 0 0 0 ...
## $ nArtefacts : int 0 0 0 0 0 0 0 0 0 0 ...
## $ pArtefacts : num 0 0 0 0 0 0 0 0 0 0 ...
## $ art10 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ inclRate60 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ noisyPI : num 0 0 0 0 0 0 0 0 0 0 ...
## $ os : Factor w/ 2 levels "Android","iOS": 1 1 1 1 1 1 1 1 1 1 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ RunTimestamp: POSIXct, format: "2018-11-06 16:09:16.716" "2018-11-06 16:09:16.716" ...
## $ v1 : int 3 3 3 3 3 3 3 3 3 3 ...
## $ v2 : num 4 4 4 4 4 4 4 4 4 4 ...
## $ v3 : num 2 2 2 2 2 2 2 2 2 2 ...
## $ t1 : num 3 3 3 3 3 3 3 3 3 3 ...
## $ t2 : int 4 4 4 4 4 4 4 4 4 4 ...
## $ t3 : int 6 6 6 6 6 6 6 6 6 6 ...
## $ f1 : int 3 3 3 3 3 3 3 3 3 3 ...
## $ f2 : num 3 3 3 3 3 3 3 3 3 3 ...
## $ f3 : int 2 2 2 2 2 2 2 2 2 2 ...
## $ PE : int 2 2 2 2 2 2 2 2 2 2 ...
## $ PE.int : int 3 3 3 3 3 3 3 3 3 3 ...
## $ NE : int 4 4 4 4 4 4 4 4 4 4 ...
## $ NE.int : int 4 4 4 4 4 4 4 4 4 4 ...
## $ smoke : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ coffe : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ alcohol : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ activity : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ people : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ daybefore.ex: Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
## $ drugs : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
## $ drugs.which : chr NA NA NA NA ...
## $ cLess1 : num 1 1 1 1 1 1 1 1 1 1 ...
## $ cLess2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cLessResp1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cLessResp2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ time : POSIXct, format: "2018-11-07 11:32:45" "2018-11-07 11:32:45" ...
## $ age : num 25 25 25 25 25 25 25 25 25 25 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ BMI : num 19.2 19.2 19.2 19.2 19.2 ...
## $ drugs : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ dysfun : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ postponed : num 0 0 0 0 0 0 0 0 0 0 ...
ID = participant identification code
day = day of assessment (1, 2, or 3)
block = daily task block (1 or 2)
SessionDate = date and time of the task session (yyyy-mm-dd hh:mm:ss)
SessionTime = time within day of the task session (hh:mm:ss)
TrialType = trial type (Go = trials requiring a response, No-go = trials requiring the response to be withheld)
letter = trial letter stimulus
Jitter = ms randomly added to the inter-stimulus interval (ISI) of each trial (base ISI = 400 ms)
ACC = trial accuracy (Go trials: 1 = response, 0 = omission; No-go trials: 1 = inhibition, 0 = commission)
RT = trial reaction time (ms) (i.e., RT = 0 when ACC = 0)
In addition, the gng dataset includes the same variables as the ema dataset, but only those referring to the last HRV recording segment (i.e., within.day = 6) and to the last ESM measurement (i.e., within.day = 5), as well as the same participant-level variables included in the ema and between datasets.
The block dataset includes all collected variables, with one row for each block of the Go/No-Go task. Again, HRV data are included only for the segments recorded in the lab (within.day = 6), whereas ESM information is included only for the last daily ESM entry (within.day = 5).
str(block)
## 'data.frame': 582 obs. of 57 variables:
## $ ID : Factor w/ 97 levels "HRV001","HRV002",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ day : int 1 1 2 2 3 3 1 1 2 2 ...
## $ block : num 1 2 1 2 1 2 1 2 1 2 ...
## $ Go.ACC : num 0.988 0.988 1 1 1 ...
## $ NoGo.ACC : num 0.889 0.778 0.778 0.852 0.741 ...
## $ RT : num 334 311 319 301 313 ...
## $ RT.sd : num 41.4 35.9 42.4 36.2 37.5 ...
## $ ICV : num 0.124 0.116 0.133 0.12 0.12 ...
## $ time.HRV : num 6 6 6 6 6 6 6 6 6 6 ...
## $ PI : num 831 831 651 651 826 ...
## $ HR : num 72.5 72.5 92.4 92.4 73 ...
## $ RMSSD : num 49.3 49.3 29.9 29.9 54.5 ...
## $ RMSSDcv : num 5.93 5.93 4.59 4.59 6.6 ...
## $ nPI : int 149 149 188 188 145 145 178 178 179 179 ...
## $ miss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ out : num 0 0 0 0 0 0 0 0 0 0 ...
## $ nArtefacts : int 0 0 1 1 0 0 25 25 0 0 ...
## $ pArtefacts : num 0 0 0.532 0.532 0 ...
## $ art10 : num 0 0 0 0 0 0 1 1 0 0 ...
## $ inclRate60 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ noisyPI : num 0 0 0 0 0 0 0 0 0 0 ...
## $ os : Factor w/ 2 levels "Android","iOS": 1 1 1 1 1 1 1 1 1 1 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ time.ESM : num 5 5 5 5 5 5 5 5 5 5 ...
## $ RunTimestamp: POSIXct, format: "2018-11-06 16:09:16.716" "2018-11-06 16:09:16.716" ...
## $ v1 : int 3 3 5 5 4 4 4 4 5 5 ...
## $ v2 : num 4 4 5 5 4 4 3 3 5 5 ...
## $ v3 : num 2 2 4 4 4 4 4 4 4 4 ...
## $ t1 : num 3 3 5 5 4 4 5 5 5 5 ...
## $ t2 : int 4 4 5 5 3 3 5 5 5 5 ...
## $ t3 : int 6 6 6 6 4 4 3 3 5 5 ...
## $ f1 : int 3 3 3 3 3 3 2 2 4 4 ...
## $ f2 : num 3 3 3 3 3 3 2 2 5 5 ...
## $ f3 : int 2 2 5 5 3 3 2 2 3 3 ...
## $ PE : int 2 2 1 1 NA NA 1 1 1 1 ...
## $ PE.int : int 3 3 1 1 NA NA 1 1 2 2 ...
## $ NE : int 4 4 2 2 NA NA 1 1 1 1 ...
## $ NE.int : int 4 4 5 5 NA NA 1 1 1 1 ...
## $ smoke : Factor w/ 2 levels "0","1": 1 1 1 1 NA NA 2 2 2 2 ...
## $ coffe : Factor w/ 2 levels "0","1": 1 1 1 1 NA NA 1 1 1 1 ...
## $ alcohol : Factor w/ 2 levels "0","1": 1 1 1 1 NA NA 1 1 1 1 ...
## $ activity : Factor w/ 2 levels "0","1": 2 2 2 2 NA NA 1 1 2 2 ...
## $ people : Factor w/ 2 levels "0","1": 2 2 1 1 NA NA 1 1 1 1 ...
## $ daybefore.ex: Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
## $ drugs : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
## $ drugs.which : chr NA NA NA NA ...
## $ cLess1 : num 1 1 0 0 0 0 0 0 0 0 ...
## $ cLess2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cLessResp1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cLessResp2 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ time : POSIXct, format: "2018-11-07 11:32:45" "2018-11-07 11:32:45" ...
## $ age : num 25 25 25 25 25 25 28 28 28 28 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ BMI : num 19.2 19.2 19.2 19.2 19.2 ...
## $ drugs : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ dysfun : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ postponed : num 0 0 0 0 0 0 0 0 0 0 ...
ID = participant identification code
day = day of assessment (1, 2, or 3)
block = daily task block (1 or 2)
Go.ACC = block-level mean accuracy (0-1) to Go trials
NoGo.ACC = block-level mean accuracy (0-1) to No-go trials
RT = block-level mean reaction time (ms) to accurate Go trials
RT.sd = block-level standard deviation of reaction times (ms) to accurate Go trials
ICV = block-level intraindividual coefficient of variation, computed as RT.sd / RT
In addition, the block dataset includes the same variables as the ema dataset, but only those referring to the last HRV recording segment (i.e., within.day = 6) and to the last ESM measurement (i.e., within.day = 5), as well as the same participant-level variables included in the ema and between datasets.
Finally, we export the final pre-processed datasets to be used in the subsequent analyses. Each dataset is saved both in the RData and the CSV file format.
# saving as RData
save(ema,file="3data/ema_preProcessed.RData")
save(between,file="3data/between_preProcessed.RData")
save(gng,file="3data/gng_preProcessed.RData")
save(block,file="3data/block_preProcessed.RData")
# saving as CSV
write.csv(ema,file="3data/ema_preProcessed.CSV",row.names=FALSE)
write.csv(between,file="3data/between_preProcessed.CSV",row.names=FALSE)
write.csv(gng,file="3data/gng_preProcessed.CSV",row.names=FALSE)
write.csv(block,file="3data/block_preProcessed.CSV",row.names=FALSE)
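For the subsequent analytical reports, the exported files can be read back in either format, for instance:
# reloading the pre-processed datasets (the RData files restore the objects by name)
load("3data/ema_preProcessed.RData") # restores the `ema` object
between <- read.csv("3data/between_preProcessed.CSV") # CSV alternative (factors read back as character/numeric)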
Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4-19. https://doi.org/10.1016/j.jesp.2015.07.006
de Geus, E. J., Gianaros, P. J., Brindle, R. C., Jennings, J. R., & Berntson, G. G. (2019). Should heart rate variability be “corrected” for heart rate? Biological, quantitative, and interpretive considerations. Psychophysiology, 56(2), e13287. https://doi.org/10.1111/psyp.13287
Geeraerts, J. (2020, May 20). Investigating careless responding detection techniques in experience sampling methods. Retrieved from https://osf.io/d798j/
Kaufmann, T., Sütterlin, S., Schulz, S. M., & Vögele, C. (2011). ARTiiFACT: A tool for heart rate artifact processing and heart rate variability analysis. Behavior Research Methods, 43(4), 1161–1170. https://doi.org/10.3758/s13428-011-0107-7
Xiong, H., Huang, Y., Barnes, L. E., & Gerber, M. S. (2016). Sensus: a cross-platform, general-purpose system for mobile crowdsensing in human-subject studies. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 415–426. https://doi.org/10.1145/2971648.2971711