Luca Menghini,\(^1\) Massimiliano Pastore,\(^2\) & Cristian Balducci\(^1\)
\(^1\)Department of Psychology, University of Bologna, Bologna, Italy
\(^2\)Department of Developmental and Social Psychology, University of Padova, Padua, Italy
This document includes the R code used to pre-process the raw data collected with the Sensus Mobile app (Xiong et al., 2016) and the Typeform (Barcelona, Spain) platform for the study “Workplace stress in real time: Three parsimonious scales for the experience sampling measurement of stressors and strain at work”. Specifically, it covers the reading and integration of the single raw data files, the recoding of the measured variables, and the anonymization of the sample to produce the GDPR-compliant data files used in the main analyses, accompanied by a data dictionary.
The following R packages are used in this document (see References section):
# required packages
packages <- c("jsonlite","tcltk","mgsub","birk","dplyr","tidyr","labourR","data.table","magrittr","plyr")
# generate packages references
knitr::write_bib(c(.packages(), packages),"packagesDataProc.bib")
# # run to install missing packages
# xfun::pkg_attach2(packages, message = FALSE); rm(list=ls())
First, we read the raw data files obtained with the experience sampling method (ESMdata) and the preliminary questionnaire (RETROdata).
# removing all objects from the workspace
rm(list=ls())
# setting system time zone to GMT (for consistent temporal synchronization)
Sys.setenv(TZ="GMT") # the variable name is case-sensitive on Unix-like systems
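As a minimal sketch of why a fixed time zone matters, the snippet below parses one timestamp in the ISO-like format used by the raw files and applies the same one-hour shift later used in readSurveyData(). The timestamp value is made up for illustration, and passing tz explicitly to as.POSIXct() is an assumption of this sketch rather than part of the original pipeline.

```r
# hypothetical raw timestamp in the "%Y-%m-%dT%H:%M:%OS" format read by readSurveyData()
raw <- "2018-11-16T13:50:23.035"
# parsing with an explicit time zone, so that the result does not depend on local settings
parsed <- as.POSIXct(raw, format = "%Y-%m-%dT%H:%M:%OS", tz = "GMT")
# adding 1 hour (expressed in seconds)
shifted <- parsed + 1 * 60 * 60
# printing the shifted timestamp without fractional seconds
print(format(shifted, "%Y-%m-%d %H:%M:%S", tz = "GMT"))
```

With an explicit tz, the parsed value is the same regardless of the time zone of the machine running the code.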
Here, the readSurveyData() function is used to read the raw JSON data saved by the Sensus Mobile app and downloaded from our private AWS S3 bucket to the data.path directory. The probe.definition argument is used to read the Probe Definition files downloaded from the Sensus Mobile app (Protocol -> Probe -> Scripted Interaction -> Share definition) to couple input IDs with input names (i.e., item labels).
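The coupling of input IDs with input names can be illustrated with a small vectorized sketch based on match(). The two toy data frames below are hypothetical: they only mimic the InputId column of the raw data and the inputId/inputName columns of the PROTOCOL table built from the Probe Definition files. This is an alternative formulation, not the code used in the function.

```r
# toy long-format responses; IDs are made up for illustration
responses <- data.frame(InputId = c("id-a", "id-b", "id-x", NA),
                        Response = c("3", "5", "2", "4"),
                        stringsAsFactors = FALSE)
# toy lookup table mimicking the inputId and inputName columns of PROTOCOL
lookup <- data.frame(inputId = c("id-a", "id-b"),
                     inputName = c("d1.da.fare", "d2.veloce"),
                     stringsAsFactors = FALSE)
# position of each InputId in the lookup table (NA when not found)
idx <- match(responses$InputId, lookup$inputId)
# replacing only the matched IDs, leaving unknown or missing IDs untouched
found <- !is.na(idx)
responses$InputId[found] <- lookup$inputName[idx[found]]
print(responses)
```

Unmatched IDs are kept as they are, which mirrors what happens when a Probe Definition file does not list a given input.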
readSurveyData()
readSurveyData <- function(data.path,probe.definition){ require(jsonlite); require(tcltk); options(digits.secs=3)
# 1. Reading data
# .......................................
# listing files in data path
paths = list.files(data.path,recursive=TRUE,full.names=TRUE,include.dirs=FALSE)
# taking only variables of interest
var.names <- c("ParticipantId","Timestamp","InputId","Response","RunTimestamp","SubmissionTimestamp",
"ScriptName","ProtocolId","$type")
# dataframe creation and population
data <- as.data.frame(matrix(nrow=0,ncol=9))
colnames(data) <- var.names
ScheduledTimestamp <- vector()
pb <- tkProgressBar("(1/2) Data reading:", "Data reading %",0, 100, 0) # progress bar
for(path in paths){ info <- sprintf("%d%% done", round(which(paths==path)/length(paths)*100))
setTkProgressBar(pb, round(which(paths==path)/length(paths)*100), title=paste("(1/2) Data reading:",info),info)
if(file.info(path)$size>0){ # read only Datum files (i.e., containing ScriptDatum, if > 0 Kb)
new.data <- read_json(path,simplifyDataFrame=TRUE)
if(is.data.frame(new.data) && !is.null(new.data$Response)){ # keep only files with information
if(is.data.frame(new.data$Response)){ # sometimes responses are read as dataframe
new.data$Response <- as.character(new.data$Response$`$values`)}
data <- rbind(data,new.data[var.names])}}}
close(pb)
# no responses are saved when participant or input ID is not showed (those rows are removed)
data <- data[!is.na(data$ParticipantId),]
data <- data[!is.na(data$InputId),]
# some other minor settings
row.names(data) <- as.character(1:nrow(data))
data$Timestamp <- as.POSIXct(data$Timestamp,format="%Y-%m-%dT%H:%M:%S")
names(data)[9] <- "os" # $type as OS (android or iOS)
data[,9] <- gsub("Sensus.Probes.User.Scripts.ScriptDatum, Sensus","",data[,9])
# 2. Response Ids as Item labels (reported in Probe Definition file)
# ...............................................................
if(!is.na(probe.definition)){
readProbe <- function(path){ # function to read Probe Definition files
probedefinition <- read_json(path,simplifyDataFrame=TRUE) # first probe definition file
# reading input labels of the first inputGroup
inputs <- probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`[[1]]$Inputs$`$values`[[1]]$Name
# other protocol information
infos <- probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`[[1]]
PROTOCOL <- data.frame(protocolName=probedefinition$Protocol$Name,protocolId=probedefinition$Protocol$Id,
scriptName=infos$Name,inputName=inputs,inputId=infos$Inputs$`$values`[[1]]$Id)
# adding other InputGroups when more than one
if(length(probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`)>1){
for(i in 2:length(probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`)){
inputs <- probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`[[i]]$Inputs$`$values`[[1]]$Name
infos <- probedefinition$ScriptRunners$`$values`$Script$InputGroups$`$values`[[i]]
PROTOCOL <- rbind(PROTOCOL,data.frame(protocolName=probedefinition$Protocol$Name,
protocolId=probedefinition$Protocol$Id,
scriptName=infos$Name,inputName=inputs,
inputId=infos$Inputs$`$values`[[1]]$Id)) }}
return(PROTOCOL) }
# listing files in probe.definition path
paths = list.files(probe.definition,recursive=TRUE,full.names=TRUE,include.dirs=FALSE)
PROTOCOL <- readProbe(paths[1])
# adding other Probe Definition files when more than one
if(length(paths)>1){
for(path in paths[2:length(paths)]){
PROTOCOL2 <- readProbe(path)
PROTOCOL <- rbind(PROTOCOL,PROTOCOL2) }}
# using Probe Definition info to convert inputID to inputName
pb <- tkProgressBar("(2/2) Data processing:", "Data processing %",0, 100, 0) # progress bar
for(i in 1:nrow(data)){ info <- sprintf("%d%% done", round(i/nrow(data)*100))
setTkProgressBar(pb, round(i/nrow(data)*100), title=paste("(2/2) Converting InputIDs to InputNames", info), info)
for(j in 1:nrow(PROTOCOL)){ if(!is.na(data[i,3]) & data[i,3]==PROTOCOL[j,5]){
data[i,3] <- as.character(PROTOCOL[j,4]) }}}
close(pb) }
# 3. Cleaning and unlisting Response data
# ...............................................................
# cleaning categorical items from Sensus system info
data$Response <- gsub("list","",data$Response)
data$Response <- gsub(paste("c","\\(|\\)",sep=""),"",data$Response)
data$Response <- gsub("\\(|\\)","",data$Response)
data$Response <- gsub("\\[|\\]","",data$Response)
data$Response <- gsub("\\$type` = \"System.Collections.Generic.List`1System.Object, mscorlib, mscorlib\", ",
"", data$Response)
data$Response <- gsub("\\$values","",data$Response)
data$Response <- gsub('``` = ', "",data$Response)
data$Response <- gsub(' ', "",data$Response) # removing blank spaces
data$Response <- gsub('\"', "",data$Response)
data$Response <- gsub('\"No\"', "No",data$Response)
data$Response <- gsub('\"Sì\"', "Si",data$Response)
# unlisting Response column
if(class(data$Response)=="data.frame"){ data$Response <- as.character(data$Response$`$values`[[1]])
} else { data$Response <- as.character(data$Response) }
# 4. Encoding time information
# ...............................................................
# TIMESTAMP variables
data[,c("Timestamp","RunTimestamp",
"SubmissionTimestamp")] <- lapply(data[,c("Timestamp","RunTimestamp",
"SubmissionTimestamp")],function(x)
as.POSIXct(x,format="%Y-%m-%dT%H:%M:%OS")+1*60*60) # adding 1h
# Create indicator for the week day (e.g. Monday=1)
data$day.of.week <- as.POSIXlt(data$RunTimestamp)$wday
# 5. Sorting columns and Reshaping
# ...............................................................
colnames(data)[1] <- "ID"
data <- data[,c("ID","os","ProtocolId","ScriptName","day.of.week","RunTimestamp",
"SubmissionTimestamp","InputId","Response")]
# reshaping
data <- reshape(data,v.names=c("Response"),timevar=c("InputId"),idvar=c("RunTimestamp","SubmissionTimestamp"),
direction=c("wide"),sep="")
colnames(data) <- gsub("Response","",colnames(data)) # removing label "Response" from ResponseId
# sorting by ID and RunTimestamp
data <- data[order(data$ID,data$RunTimestamp),]
# Create row identifier within each day (within.day)
data <- plyr::ddply(data,c("ID","day.of.week"),transform,within.day=seq_along(day.of.week))
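# moving within.day (last column) to the 4th position
# (the original index 5:ncol(data)-1 is evaluated as 4:(ncol(data)-1), made explicit here)
data <- data.frame(cbind(data[1:3],data[ncol(data)],data[4:(ncol(data)-1)]))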
data <- data[order(data$ID,data$RunTimestamp),]
# 6. Correcting wrongly encoded item labels
# ...............................................................
data[data$ID=="LRSM1963"&is.na(data$OCS),"v1.male.bene"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X17758111.14e5.4f30.a113.06e2a08468ed"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"t1.rilassato.teso"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X17758111.14e5.4f30.a113.06e2a08468ed"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"e1.stanco.sveglio"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X3edca97c.e6a6.4146.9158.b35088f48033"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"v2.soddisfatto.insoddisfatto"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"eaa55e1f.26ec.43e0.9e18.ee853ed0acfb"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"t2.agitato.calmo"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X15fff3cc.3d17.483b.ae5f.fee562a6a916"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"e2.pieno.privodenergia"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X86eb9c9c.1426.4698.a3c4.7f0d0a449bf3"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"v3.positivo.negativo"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X1c5d39de.1bb6.4499.8b2d.7799342cc5d2"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"t3.nervoso.tranquillo"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"c70b6861.e839.4926.ba3b.2627ab964df2"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"e3.affaticato.fresco"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X8b9d46c9.d406.44d7.9784.6a8542b3da3c"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"WHAT"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"e869f6f8.6c65.4d73.921e.6aeeddf4f452"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"HOW"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"c9bf9854.adeb.4138.ace0.2507a25304d4"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"nPEOPLE"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X814adbd8.c4f3.4948.83f3.77d119226bf5"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"WHOM"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"b2f5e3a2.0f98.49dd.9934.b851b36b8407"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"d1.da.fare"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"ffa2f10f.838b.48e3.87af.357a412c3eca"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"d2.veloce"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X1939307c.ef3b.41b2.870a.d3feecc6f189"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"d3.multitask"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"f1ff6f5e.1861.419a.9be1.e22266c0d264"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"d4.intensa"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X68909230.3374.4f21.a47f.d4816048a96c"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"c1.cambiare"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"dc871f0d.2da0.4f31.ab8d.5372f5acf5eb"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"c2.come"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X4ab74ec5.8585.4154.bb01.f049f66ebb56"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"c3.tempo"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"X83c7368b.fce3.406c.804e.760af23cf6a6"]
data[data$ID=="LRSM1963"&is.na(data$OCS),"OCS"] <-
data[data$ID=="LRSM1963"&is.na(data$OCS),"e3f552b1.d076.43ee.ade9.eee2c6d26316"]
data <- data[,1:35]
# ID as factor
data$ID <- as.factor(as.character(data$ID))
return(data) }
# ESM data reading & encoding
ESMdata <- readSurveyData(data.path="data",probe.definition="probe")
# sanity check (2192, 176)
cat("Read",nrow(ESMdata),"observations from",nlevels(ESMdata$ID),"participants")
## Read 2192 observations from 176 participants
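The long-to-wide reshaping performed in step 5 of readSurveyData() can be sketched on a toy data set. The IDs, timestamps, and item labels below are hypothetical; only the reshape() call and the removal of the "Response" prefix follow the function above.

```r
# toy long-format data: one row per participant, survey, and item
long <- data.frame(ID = c("P1", "P1", "P2", "P2"),
                   RunTimestamp = c("t1", "t1", "t2", "t2"),
                   InputId = c("d1", "c1", "d1", "c1"),
                   Response = c("3", "5", "2", "6"),
                   stringsAsFactors = FALSE)
# one row per survey occasion, one column per item
wide <- reshape(long, v.names = "Response", timevar = "InputId",
                idvar = c("ID", "RunTimestamp"), direction = "wide", sep = "")
# removing the "Response" prefix from the item columns
colnames(wide) <- gsub("Response", "", colnames(wide))
print(wide)
```

Each combination of the idvar columns becomes one row, so two occasions sharing the same identifiers would be collapsed into one.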
Here, we read the data obtained with the retrospective preliminary questionnaire and exported as a CSV file from Typeform. The number of rows is already higher than the number of participants’ identification codes, implying some double responses.
# retrospective data reading
RETROdata <- read.csv("responses.csv")
# sanity check (204, 201)
cat("Read",nrow(RETROdata),"observations from",
nlevels(as.factor(as.character(RETROdata$Inserisca.il.suo..CODICE.PERSONALE.))),"participants")
## Read 204 observations from 201 participants
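A quick way to see which identification codes account for the extra rows is sketched below on a toy data frame. The column names and values are hypothetical and do not come from the Typeform export.

```r
# toy questionnaire data with one participant responding twice
retro_toy <- data.frame(code = c("AAAA1950", "BBBB1960", "AAAA1950", "CCCC1970"),
                        score = c(3, 4, 5, 2),
                        stringsAsFactors = FALSE)
# number of rows and number of distinct identification codes
n_rows <- nrow(retro_toy)
n_codes <- length(unique(retro_toy$code))
# codes appearing more than once
dup_codes <- unique(retro_toy$code[duplicated(retro_toy$code)])
cat("Rows:", n_rows, "- unique codes:", n_codes, "- duplicated codes:", dup_codes, "\n")
```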
Second, we recode the two datasets by removing unneeded columns, resetting variable labels and classes, recoding variables, and renaming relevant columns. We also create some variables to encode participants’ compliance, and we remove double responses.
Here, we recode ESMdata.
We start by renaming mood items and by recoding them to express negative mood dimensions.
# converting numeric responses to numeric
nums <- c("d1.da.fare","d2.veloce","d3.multitask","d4.intensa","c1.cambiare","c2.come","c3.tempo","OCS",
"t1.rilassato.teso","e1.stanco.sveglio","v2.soddisfatto.insoddisfatto","t2.agitato.calmo",
"e2.pieno.privodenergia","v3.positivo.negativo","t3.nervoso.tranquillo","e3.affaticato.fresco",
"v1.male.bene","event1.negativi","event2.intensity.n","event3.positivi","event4.intensity.p","nPEOPLE")
ESMdata[,nums] <- lapply(ESMdata[,nums],as.numeric)
# HEDONIC TONE = Negative Valence (NA)
ESMdata$v1.male.bene <- 8 - ESMdata$v1.male.bene
ESMdata$v3.positivo.negativo <- 8 - ESMdata$v3.positivo.negativo
colnames(ESMdata)[which(colnames(ESMdata)=="v3.positivo.negativo")] <- "v3.negativo.positivo" # correcting incorrect label
# NA items were differently labeled in survey 1 ("Survey Mattina") and in all other surveys
# item v2 was "Sat. - Unsat." in survey 1 and "Unsat. - Sat." in all other surveys
# item v3 was "Posit. - Neg." in survey 1 and "Neg. - Posit." in all other surveys
for(i in 1:nrow(ESMdata)){
if(ESMdata[i,"ScriptName"]=="Survey Mattina"&!is.na(ESMdata[i,"v2.soddisfatto.insoddisfatto"])){
ESMdata[i,"v2.soddisfatto.insoddisfatto"] <- 8 - ESMdata[i,"v2.soddisfatto.insoddisfatto"]}
if(ESMdata[i,"ScriptName"]=="Survey Mattina"&!is.na(ESMdata[i,"v3.negativo.positivo"])){
ESMdata[i,"v3.negativo.positivo"] <- 8 - ESMdata[i,"v3.negativo.positivo"]}}
# TENSE AROUSAL (TA)
ESMdata$t2.agitato.calmo <- 8 - ESMdata$t2.agitato.calmo
ESMdata$t3.nervoso.tranquillo <- 8 - ESMdata$t3.nervoso.tranquillo
# ENERGETIC AROUSAL = Fatigue (FA)
ESMdata$e1.stanco.sveglio <- 8 - ESMdata$e1.stanco.sveglio
ESMdata[!is.na(ESMdata$e3.affaticato.fresco)&ESMdata$e3.affaticato.fresco=="NULL","e3.affaticato.fresco"] <- NA
ESMdata$e3.affaticato.fresco <- 8 - as.numeric(as.character(ESMdata$e3.affaticato.fresco))
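The reverse scoring above relies on the transformation 8 - x, which maps 1 to 7, 2 to 6, and so on (assuming 1-7 response scales, as implied by the recoding). A vectorized sketch of the survey-specific recoding is shown below on toy data. The survey name "Survey 2" and the response values are hypothetical, and this is an alternative formulation of the row-wise loop rather than the code used here.

```r
# toy data: the morning survey used the reversed item wording
mood <- data.frame(ScriptName = c("Survey Mattina", "Survey 2", "Survey Mattina"),
                   v2 = c(1, 6, NA),
                   stringsAsFactors = FALSE)
# rows from the morning survey with a non-missing response
rows <- which(mood$ScriptName == "Survey Mattina" & !is.na(mood$v2))
# reversing only those rows
mood$v2[rows] <- 8 - mood$v2[rows]
print(mood)
```

Using which() drops rows where the condition is NA, so missing survey names do not produce an error.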
Then, we sort the ESMdata columns to reflect the item order in the ESM forms, we select the considered variables, and we rename the columns in a simpler way.
# selecting and sorting columns
ESMdata <- cbind(ESMdata[,1:8], #......................................................... Participant and occasion info
ESMdata$v1.male.bene,ESMdata$v2.soddisfatto.insoddisfatto,ESMdata$v3.negativo.positivo, #.... Strain (Mood)
ESMdata$t1.rilassato.teso,ESMdata$t2.agitato.calmo,ESMdata$t3.nervoso.tranquillo,
ESMdata$e1.stanco.sveglio,ESMdata$e2.pieno.privodenergia,ESMdata$e3.affaticato.fresco,
ESMdata$WHAT,ESMdata$HOW,ESMdata$WHOM,ESMdata$nPEOPLE, #..................................... Work sampling
ESMdata$d1.da.fare,ESMdata$d2.veloce,ESMdata$d3.multitask,ESMdata$d4.intensa, #.. Stressors (demand & ctrl)
ESMdata$c1.cambiare,ESMdata$c2.come,ESMdata$c3.tempo)
# renaming columns
colnames(ESMdata)[9:ncol(ESMdata)] <- c("v1","v2","v3","t1","t2","t3","f1","f2","f3",
"WHAT","HOW","WHOM","nPeople","d1","d2","d3","d4","c1","c2","c3")
Finally, we recode the remaining categorical variables, and we translate the categories of the work sampling items into English.
# from ProtocolId (different protocols per gender) to "gender"
colnames(ESMdata)[which(colnames(ESMdata)=="ProtocolId")] <- "gender"
ESMdata$gender <- gsub("ProtocolWork","",ESMdata$gender)
ESMdata$gender <- as.factor(gsub("6de7da11-4919-4fc3-a420-9d9b42024526","M",ESMdata$gender))
# WORK SAMPLING: KNOWLEDGE WORK ACTIVITIES (what)
ESMdata$WHAT <- gsub("ANALISIesame/elaborazionediinformazioniqualitativeoquantitativepermigliorarnelacomprensione",
"ANALYSIS",ESMdata$WHAT)
ESMdata$WHAT <- gsub("RICERCAOACQUISIZIONEINFORMAZIONIconsultazionedifontielettroniche/cartacee,studiooapprendimentopersviluppareconoscenzepersonali,progetti,prodottioservizi",
"ACQUISITION",ESMdata$WHAT)
ESMdata$WHAT <- gsub("AUTHORINGcreazione/composizionedicontenutitestualiomultimediali",
"AUTHORING",ESMdata$WHAT)
ESMdata$WHAT <- gsub("NETWORKINGinterazioneconpersone/entiperraccogliere/scambiareinformazioniofarecontatti",
"NETWORKING",ESMdata$WHAT)
ESMdata$WHAT <- gsub("DIVULGAZIONEinsegnamento,presentazioneocondivisionediinformazioni",
"DISSEMINATION",ESMdata$WHAT)
ESMdata$WHAT <- gsub("ATTIVITÀAMMINISTRSTIVEpraticheburocraticheroutinarie",
"ADMINISTRATIVE",ESMdata$WHAT)
ESMdata$WHAT <- gsub("PAUSA-->indicaancheL'ATTIVITÀSVOLTAPRIMAdellapausaeriferiscitiaquestaperleprossimedomande",
"BREAK",ESMdata$WHAT)
ESMdata$WHAT <- gsub("ALTRO","OTHER",ESMdata$WHAT)
ESMdata$WHAT <- gsub("OTHER,","",ESMdata$WHAT) # when OTHER and another activity, only the second activity is reported
ESMdata$WHAT <- gsub("\n","",ESMdata$WHAT)
# WORK SAMPLING: MEANS OF WORK (how)
ESMdata$HOW <- gsub("Alcomputer","PC",ESMdata$HOW)
ESMdata$HOW <- gsub("Facciaafaccia/oralmente","FACE2FACE",ESMdata$HOW)
ESMdata$HOW <- gsub("Condocumenticartacei","PAPER",ESMdata$HOW)
ESMdata$HOW <- gsub("Altelefono","PHONE",ESMdata$HOW)
ESMdata$HOW <- gsub("Videoconferenzaes.Skype","SKYPE",ESMdata$HOW)
ESMdata$HOW <- gsub("Consmarphone/tablet","SMARTPHONE",ESMdata$HOW)
ESMdata$HOW <- gsub("Altro","OTHER",ESMdata$HOW)
# WORK SAMPLING: PEOPLE INVOLVED IN THE TASK (whom)
ESMdata$WHOM <- gsub("Nessuno","ALONE",ESMdata$WHOM)
ESMdata$WHOM <- gsub("Colleghi","COLL",ESMdata$WHOM)
ESMdata$WHOM <- gsub("Sottoposti","UNDER",ESMdata$WHOM)
ESMdata$WHOM <- gsub("Superiori","OVER",ESMdata$WHOM)
ESMdata$WHOM <- gsub("Fornitorioaltricollaboratoriesterni","EXTERNAL",ESMdata$WHOM)
ESMdata$WHOM <- gsub("Clienti/utentidelservizio","CUSTOMER",ESMdata$WHOM)
ESMdata$WHOM <- gsub("Familiari/amici","FAMILY",ESMdata$WHOM)
ESMdata$WHOM <- gsub("Altro","OTHER",ESMdata$WHOM)
# categorical variables as factor
ESMdata[,c("ID","os","WHAT","HOW","WHOM")] <- lapply(ESMdata[,c("ID","os","WHAT","HOW","WHOM")],as.factor)
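The repeated gsub() calls above can also be written as a loop over a named lookup vector. The sketch below uses a subset of the HOW categories and toy responses; the comma-separated multiple response is an assumption made for illustration, and fixed = TRUE is used so that the Italian labels are matched literally rather than as regular expressions.

```r
# lookup vector: names are the original (Italian) labels, values the English codes
how_map <- c("Alcomputer" = "PC", "Altelefono" = "PHONE", "Altro" = "OTHER")
# toy responses, including a multiple response and a missing value
how <- c("Alcomputer", "Altelefono,Alcomputer", "Altro", NA)
# replacing each label in turn
for (pattern in names(how_map)) {
  how <- gsub(pattern, how_map[[pattern]], how, fixed = TRUE)
}
print(how)
```

Keeping the labels in a single vector makes it easier to check that every category has a translation.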
Then, we use the time.AND.double function for checking and adjusting time-related variables (i.e., within.day and day.of.week), fixing daylight saving time (i.e., between March 29th and October 27th, 2019), and removing double responses.
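The daylight saving adjustment can be sketched as a small helper that shifts only the timestamps falling inside a given window. The function name shift_in_window() and the toy timestamps are hypothetical; the window limits are the ones stated above. Note that this sketch drops fractional seconds when formatting the result.

```r
# shifting timestamps that fall strictly between 'from' and 'to' by 'shift_sec' seconds
shift_in_window <- function(x, from, to, shift_sec, tz = "GMT") {
  t <- as.POSIXct(x, tz = tz)
  in_window <- !is.na(t) & t > as.POSIXct(from, tz = tz) & t < as.POSIXct(to, tz = tz)
  t[in_window] <- t[in_window] + shift_sec
  format(t, "%Y-%m-%d %H:%M:%S", tz = tz)
}
# toy timestamps: before, inside, and after the window
stamps <- c("2019-02-18 09:00:00", "2019-06-24 09:00:00", "2019-11-04 16:38:53")
shifted <- shift_in_window(stamps, "2019-03-29 00:00:00", "2019-10-27 00:00:00", 60 * 60)
print(shifted)
```

Wrapping the shift in a function avoids repeating the same logical condition on both sides of the assignment for each timestamp column.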
time.AND.double <- function(data,doubleSurvey.exclude=TRUE,doubleProtocol.exclude=TRUE){ require(mgsub)
# 1) Fixing double surveys
# ..................................................................
data$ID <- as.character(data$ID)
# 1.1. participants who re-ran the protocol and changed their id
data[data$ID=="Livio",1] <- "Gftg1945"
# 1.2. participants with the same id (siblings?)
# LFAI1940 (1 male, 1 female) --> LFAI19402 = male
data[data$ID=="LFAI1940"&data$gender=="M","ID"] <- "LFAI19402"
# MCVD1959 (2 males) --> MCVD19592 started later
data[(data$ID=="MCVD1959"&as.POSIXct(as.character(data$RunTimestamp))>as.POSIXct("2019-06-24 08:15:00 GMT")),"ID"] <- "MCVD19592"
# saving sample information
N2.original <- length(levels(as.factor(data$ID))) # 177
N1.original <- nrow(data) # 2192
# 1.3. excluding double response to the same survey (doubleSurvey.exclude)
# ..................................................................
if(doubleSurvey.exclude==TRUE){
# RSCC1961 on FRY sent twice responses to survey5 -> take the 2nd one bcs SC is missing in the 1st one
data <- data[!(data$ID=="RSCC1961" & as.character(data$RunTimestamp)=="2018-11-16 14:50:23.035"),]
# MFPW1957 on FRY sent twice responses to survey3 -> take the 2nd one bcs SC is missing in the 1st one
data <- data[!(data$ID=="MFPW1957" & as.character(data$RunTimestamp)=="2018-12-07 13:21:24.265"),]
# VSLF1952 on MON sent three times responses to survey 6 -> take the 2nd bcs data are missing in the 1st and 3rd
data <- data[!(data$ID=="VSLF1952" & as.character(data$RunTimestamp)=="2019-11-04 16:38:53.588" &
is.na(data$c3)),]
data <- data[!(data$ID=="VSLF1952" & as.character(data$RunTimestamp)=="2019-11-04 16:44:32.546"),]}
# saving sample information
N1.doubleSurvey <- N1.original - nrow(data) # 2
# 1.4. excluding double protocols (i.e., participants who re-ran the protocol) (doubleProtocol.exclude)
# ..................................................................
if(doubleProtocol.exclude==TRUE){
# 1.4.1. re-ran the protocol because of technical problems on the first time
# ..............................................................................
# MFPW1957 repeated the protocol (technical problems on Monday)
data <- data[!(data$ID=="MFPW1957"&substr(data$RunTimestamp,start=6,stop=10)=="12-10"),]
# 05101985 repeated the protocol one day (technical problems)
data <- data[!(data$ID=="05101985" & data$day.of.week==5),]
data[data$ID=="Nico",1] <- "05101985"
# OQAB1946 repeated the protocol (technical problems on Monday)
data <- data[!(data$ID=="OQAB1946"&substr(data$RunTimestamp,start=6,stop=10)=="12-05"),]
# PFCZ1960 repeated the protocol (technical problems on Wednesday and Friday)
data <- data[!(data$ID=="PFCZ1960"&substr(data$RunTimestamp,start=6,stop=10)=="11-28"),]
data <- data[!(data$ID=="PFCZ1960"&substr(data$RunTimestamp,start=6,stop=10)=="11-30"),]
data <- data[!(data$ID=="PFCZ1960"&substr(data$RunTimestamp,start=6,stop=10)=="12-19"),]
data <- data[!(data$ID=="PFCZ1960"&substr(data$RunTimestamp,start=6,stop=10)=="12-21"),]
# saving sample information
N1.doubleProtocol.tech <- N1.original - nrow(data) - N1.doubleSurvey # 17
# 1.4.2. re-ran the protocol because of few surveys on the first time
# ..............................................................................
# EDLF1948 re-ran the protocol on Monday (few surveys) and changed ID into EDLF1946
data <- data[!(data$ID=="EDLF1948"&data$day.of.week==1),]
data[(data$ID=="EDLF1946"&data$day.of.week==1),1] <- "EDLF1948"
# Gcsb1961 repeated the protocol on Monday (few surveys)
data <- data[!(data$ID=="Gcsb1961" & substr(data$RunTimestamp,start=6,stop=10)=="02-18"),]
# APRT54 repeated the protocol on Monday (only 2 surveys)
data <- data[!(data$ID=="APRT54" & substr(data$RunTimestamp,start=6,stop=10)=="12-05"),]
# MMCP1956 repeated the protocol on Monday (few surveys)
data <- data[!(data$ID=="MMCP1956" & substr(data$RunTimestamp,start=6,stop=10)=="02-11"),]
# PSIS1945 repeated the protocol on Friday (few surveys)
data <- data[!(data$ID=="PSIS1945" & substr(data$RunTimestamp,start=6,stop=10)=="03-08"),]
# Ugiila17L repeated the protocol on Monday (only 1 survey)
data <- data[!(data$ID=="Ugiila17L" & substr(data$RunTimestamp,start=6,stop=10)=="03-18"),]
# FFCR1933 repeated the protocol on Monday (only 2 surveys)
data <- data[!(data$ID=="FFCR1933" & substr(data$RunTimestamp,start=6,stop=10)=="03-22"),]
# FS960214 repeated the protocol on Monday (only 1 survey)
data <- data[!(data$ID=="FS960214" & substr(data$RunTimestamp,start=6,stop=10)=="03-22"),]
# MAGV1960 repeated the protocol on Friday (only 1 survey)
data <- data[!(data$ID=="MAGV1960" & substr(data$RunTimestamp,start=6,stop=10)=="03-29"),]
# Gcrb1950 repeated the protocol on Monday (only 2 surveys)
data <- data[!(data$ID=="Gcrb1950" & substr(data$RunTimestamp,start=6,stop=10)=="04-15"),]
# Adms1967 repeated the protocol on Monday (only 2 surveys)
data <- data[!(data$ID=="Adms1967" & substr(data$RunTimestamp,start=6,stop=10)=="05-08"),]
# CCMC1967 repeated the protocol on Wednesday (2 + 2, taking first)
data <- data[!(data$ID=="CCMC1967" & substr(data$RunTimestamp,start=6,stop=10)=="11-06"),]
# MGRB1964 repeated the protocol on Wednesday (1 + 4, taking 4)
data <- data[!(data$ID=="MGRB1964" & substr(data$RunTimestamp,start=6,stop=10)=="04-03"),]
# saving sample information
N1.doubleProtocol.few <- N1.original - nrow(data) - N1.doubleSurvey - N1.doubleProtocol.tech # 21
# 1.4.3. re-ran the protocol on their own initiative (some forgot to quit the app)
# ..............................................................................
# ABLM1923 repeated the protocol (Monday twice: 5 + 1) --> taking 5
data <- data[!(data$ID=="ABLM1923" & substr(data$RunTimestamp,start=6,stop=10)=="12-10"),]
# ETSF1950 repeated the protocol (Friday twice: 6 + 5, Wednesday twice: 4 + 5, Monday: 4 + 5) --> taking the earliest but on Wed
data <- data[!(data$ID=="ETSF1950" & substr(data$RunTimestamp,start=6,stop=10)=="01-21"),]
data <- data[!(data$ID=="ETSF1950" & substr(data$RunTimestamp,start=6,stop=10)=="01-23"),]
data <- data[!(data$ID=="ETSF1950" & substr(data$RunTimestamp,start=6,stop=10)=="01-25"),]
# data <- data[!(data$ID=="ETSF1950" & substr(data$RunTimestamp,start=6,stop=10)=="01-30"),]
# BDRB1955 repeated the protocol (Friday twice: 4 + 4) --> taking 4 (First)
data <- data[!(data$ID=="BDRB1955" & substr(data$RunTimestamp,start=6,stop=10)=="01-25"),]
# Edmf1950 repeated the protocol (Friday twice: 3 + 4) --> taking 3 (First)
data <- data[!(data$ID=="Edmf1950" & substr(data$RunTimestamp,start=6,stop=10)=="01-25"),]
# LCPP1945 repeated the protocol several times (Friday twice: 5 + 4) --> taking 5 (Second*)
data <- data[!(data$ID=="LCPP1945" & substr(data$RunTimestamp,start=6,stop=10)=="01-23"),]
data <- data[!(data$ID=="LCPP1945" & substr(data$RunTimestamp,start=6,stop=10)=="01-25"),]
data <- data[!(data$ID=="LCPP1945" & substr(data$RunTimestamp,start=6,stop=10)=="01-28"),]
data <- data[!(data$ID=="LCPP1945" & substr(data$RunTimestamp,start=6,stop=10)=="01-30"),]
data <- data[!(data$ID=="LCPP1945" & substr(data$RunTimestamp,start=6,stop=10)=="02-01"),]
data <- data[!(data$ID=="LCPP1945" & substr(data$RunTimestamp,start=6,stop=10)=="02-06"),]
# ANMA1938 repeated the protocol (Monday twice: 5 + 2) --> taking 5, (Wednesday twice: 7 + 4) --> taking 7
data <- data[!(data$ID=="ANMA1938" & substr(data$RunTimestamp,start=6,stop=10)=="01-25"),]
data <- data[!(data$ID=="ANMA1938" & substr(data$RunTimestamp,start=6,stop=10)=="01-28"),]
data <- data[!(data$ID=="ANMA1938" & substr(data$RunTimestamp,start=6,stop=10)=="01-30"),]
data <- data[!(data$ID=="ANMA1938" & substr(data$RunTimestamp,start=6,stop=10)=="02-01"),]
# CLMB1961 repeated the protocol (Monday three times: 5 + 4 + 3) --> taking 5,
# (Wednesday three times: 4 + 4 + 5) --> taking 4,
# (Friday twice: 6 + 3) --> taking 6
data <- data[!(data$ID=="CLMB1961" & substr(data$RunTimestamp,start=6,stop=10)=="01-28"),]
data <- data[!(data$ID=="CLMB1961" & substr(data$RunTimestamp,start=6,stop=10)=="02-04"),]
data <- data[!(data$ID=="CLMB1961" & substr(data$RunTimestamp,start=6,stop=10)=="01-30"),]
data <- data[!(data$ID=="CLMB1961" & substr(data$RunTimestamp,start=6,stop=10)=="02-06"),]
data <- data[!(data$ID=="CLMB1961" & substr(data$RunTimestamp,start=6,stop=10)=="02-01"),]
# PVCR1961 repeated the protocol (Friday twice: 2 + 4) --> taking 4
data <- data[!(data$ID=="PVCR1961" & substr(data$RunTimestamp,start=6,stop=10)=="03-15"),]
# ANMA1938 repeated the protocol (Monday twice: 5 + 2) --> taking 5, (wednesday twice: 7 + 4) --> taking 7
data <- data[!(data$ID=="ANMA1938" & substr(data$RunTimestamp,start=6,stop=10)=="01-25"),]
data <- data[!(data$ID=="ANMA1938" & substr(data$RunTimestamp,start=6,stop=10)=="01-28"),]
# GMPM66 repeated the protocol (Wednesday twice: 5 + 3) --> taking 5
data <- data[!(data$ID=="GMPM66" & substr(data$RunTimestamp,start=6,stop=10)=="03-18"),]
# FS960214 repeated the protocol (Wednesday twice: 4 + 2) --> taking 4
data <- data[!(data$ID=="FS960214" & substr(data$RunTimestamp,start=6,stop=10)=="03-27"),]
# Gspt1958 repeated the protocol (Wednesday twice: 2 + 1) --> taking 2
data <- data[!(data$ID=="Gspt1958" & substr(data$RunTimestamp,start=6,stop=10)=="03-27"),]
# Reve1933 repeated the protocol (Friday twice: 6 + 4) --> taking 6
data <- data[!(data$ID=="Reve1933" & substr(data$RunTimestamp,start=6,stop=10)=="03-22"),]
# pdbr1957 repeated the protocol (Friday twice: 3 + 1, Monday twice: 3 + 2) --> taking 3
data <- data[!(data$ID=="pdbr1957" & substr(data$RunTimestamp,start=6,stop=10)=="03-29"),]
data <- data[!(data$ID=="pdbr1957" & substr(data$RunTimestamp,start=6,stop=10)=="04-01"),]
# ZFBR50 repeated the protocol (Monday twice: 1 + 3) --> taking 3 , (Wednesday twice: 3+1 --> taking 3)
data <- data[!(data$ID=="ZFBR50" & substr(data$RunTimestamp,start=6,stop=10)=="03-11"),]
data <- data[!(data$ID=="ZFBR50" & substr(data$RunTimestamp,start=6,stop=10)=="03-27"),]
# AGPP1942 repeated the protocol (Friday 3 times: 4 + 1 + 5) --> taking 4
data <- data[!(data$ID=="AGPP1942" & substr(data$RunTimestamp,start=6,stop=10)=="03-22"),]
data <- data[!(data$ID=="AGPP1942" & substr(data$RunTimestamp,start=6,stop=10)=="03-29"),] # stop after April 2nd
data <- data[!(data$ID=="AGPP1942" & as.POSIXct(as.character(data$RunTimestamp))>as.POSIXct("2019-04-02 08:15:00 GMT")),]
# AMMP1969 repeated the protocol (Friday twice: 4 + 2) --> taking 4, (Monday twice: 4 + 1) --> taking 4
data <- data[!(data$ID=="AMMP1969" & substr(data$RunTimestamp,start=6,stop=10)=="03-22"),]
data <- data[!(data$ID=="AMMP1969" & substr(data$RunTimestamp,start=6,stop=10)=="03-25"),]
# LRSM1963 repeated the protocol (Wednesday twice: 3 + 3) --> taking first
data <- data[!(data$ID=="LRSM1963" & substr(data$RunTimestamp,start=6,stop=10)=="04-17"),]
# Mr1960 repeated the protocol (Monday twice: 2 + 2) --> taking first
data <- data[!(data$ID=="Mr1960" & substr(data$RunTimestamp,start=6,stop=10)=="05-13"),]
# MSRL1936 repeated the protocol (Monday twice: 2 + 2) --> taking first
data <- data[!(data$ID=="MSRL1936" & substr(data$RunTimestamp,start=6,stop=10)=="09-16"),]
# MPPA1962 repeated the protocol (Monday twice: 1 + 2) --> taking second
data <- data[!(data$ID=="MPPA1962" & substr(data$RunTimestamp,start=6,stop=10)=="10-07"),]
# PZSP1951 repeated the protocol (Monday twice: 1 + 5) --> taking second
data <- data[!(data$ID=="PZSP1951" & substr(data$RunTimestamp,start=6,stop=10)=="10-18"),]
# saving sample information
N1.doubleProtocol.their <- N1.original - nrow(data) - N1.doubleSurvey - N1.doubleProtocol.tech -
N1.doubleProtocol.few # 114
# 1.4.4. re-ran the protocol because of sickness absence or other reasons
# ..............................................................................
# MGEB1960 repeated the protocol on Friday because of sickness absence for half of the day
data <- data[!(data$ID=="MGEB1960" & substr(data$RunTimestamp,start=6,stop=10)=="03-01"),]
# GMPM66 repeated the protocol because of sickness absence (Monday twice: 2 + 4) --> taking 4
data <- data[!(data$ID=="GMPM66" & substr(data$RunTimestamp,start=6,stop=10)=="03-18"),]
# saving sample information
N1.doubleProtocol.other <- N1.original - nrow(data) - N1.doubleSurvey - N1.doubleProtocol.tech -
N1.doubleProtocol.few - N1.doubleProtocol.their} # 8
# saving sample information
N1.doubleProtocol <- N1.original - nrow(data) - N1.doubleSurvey # 160
N2.doubleProtocol <- N2.original - nlevels(as.factor(as.character(data$ID)))
# 2) correcting timestamp issues
# ..............................................................................
data$RunTimestamp <- as.character(data$RunTimestamp)
data$SubmissionTimestamp <- as.character(data$SubmissionTimestamp)
# 2.1. Daylight saving time (adding 1h between March 29th and October 27th, 2019)
# ..................................................................
data[as.POSIXct(data$RunTimestamp) >
as.POSIXct("2019-03-29 00:00:00") &
as.POSIXct(data$RunTimestamp) <
as.POSIXct("2019-10-27 00:00:00"),
"RunTimestamp"] <- as.character(as.POSIXct(as.character(data[as.POSIXct(data$RunTimestamp) >
as.POSIXct("2019-03-29 00:00:00") &
as.POSIXct(data$RunTimestamp) <
as.POSIXct("2019-10-27 00:00:00"),
"RunTimestamp"]))+1*60*60)
data[as.POSIXct(data$SubmissionTimestamp) >
as.POSIXct("2019-03-29 00:00:00") &
as.POSIXct(data$SubmissionTimestamp) <
as.POSIXct("2019-10-27 00:00:00"),
"SubmissionTimestamp"] <- as.character(as.POSIXct(as.character(data[as.POSIXct(data$SubmissionTimestamp) >
as.POSIXct("2019-03-29 00:00:00") &
as.POSIXct(data$SubmissionTimestamp) <
as.POSIXct("2019-10-27 00:00:00"),
"SubmissionTimestamp"]))+1*60*60)
# recoding participants with updated time
data[data$ID=="ATEC1963" & substr(data$RunTimestamp,6,10)=="03-29",
"RunTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="ATEC1963" & substr(data$RunTimestamp,6,10)=="03-29",
"RunTimestamp"]))-1*60*60)
data[data$ID=="ATEC1963" & substr(data$SubmissionTimestamp,6,10)=="03-29",
"SubmissionTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="ATEC1963" &
substr(data$SubmissionTimestamp,6,10)=="03-29",
"SubmissionTimestamp"]))-1*60*60)
data[data$ID=="FS960214" & substr(data$RunTimestamp,6,10)=="03-29",
"RunTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="FS960214" & substr(data$RunTimestamp,6,10)=="03-29",
"RunTimestamp"]))-1*60*60)
data[data$ID=="FS960214" & substr(data$SubmissionTimestamp,6,10)=="03-29",
"SubmissionTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="FS960214" &
substr(data$SubmissionTimestamp,6,10)=="03-29",
"SubmissionTimestamp"]))-1*60*60)
data[data$ID=="RMCS1952" & substr(data$RunTimestamp,6,10)=="03-29",
"RunTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="RMCS1952" & substr(data$RunTimestamp,6,10)=="03-29",
"RunTimestamp"]))-1*60*60)
data[data$ID=="RMCS1952" & substr(data$SubmissionTimestamp,6,10)=="03-29",
"SubmissionTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="RMCS1952" &
substr(data$SubmissionTimestamp,6,10)=="03-29",
"SubmissionTimestamp"]))-1*60*60)
# 2.2. Different time zones
# ..................................................................
# aaaaaa89's timestamps are one hour shifted (working abroad ?)
data[data$ID=="aaaaaa89",
"RunTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="aaaaaa89",
"RunTimestamp"]))-1*60*60)
data[data$ID=="aaaaaa89",
"SubmissionTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="aaaaaa89",
"SubmissionTimestamp"]))-1*60*60)
# MMCP1956's timestamps are one hour shifted (working abroad ?)
data[data$ID=="MMCP1956",
"RunTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="MMCP1956",
"RunTimestamp"]))-1*60*60)
data[data$ID=="MMCP1956",
"SubmissionTimestamp"] <- as.character(as.POSIXct(as.character(data[data$ID=="MMCP1956",
"SubmissionTimestamp"]))-1*60*60)
cat("*RECODING TIME DATA*",
"\n\nOriginal Sample size = ",N2.original," participants (",N1.original," surveys).",
"\n\nExcluding ",N1.doubleSurvey," surveys due to repeated surveys.",
"\nExcluding ",N1.doubleProtocol," surveys due to repeated protocol, of which: \n- ",
N1.doubleProtocol.tech," surveys due to technical problems, \n- ",
N1.doubleProtocol.few," surveys due to too few responses on the first time,\n- ",
N1.doubleProtocol.their," surveys repeated on their initiative.",
"\nRecoding ",N2.doubleProtocol," participants.",
"\n\nCurrent Sample size = ",nlevels(as.factor(data$ID))," participants (",nrow(data)," surveys).",sep="")
return(data)}
# processing data
ESMdata <- time.AND.double(ESMdata)
## *RECODING TIME DATA*
##
## Original Sample size = 178 participants (2192 surveys).
##
## Excluding 4 surveys due to repeated surveys.
## Excluding 156 surveys due to repeated protocol, of which:
## - 17 surveys due to technical problems,
## - 21 surveys due to too few responses on the first time,
## - 116 surveys repeated on their initiative.
## Recoding 2 participants.
##
## Current Sample size = 176 participants (2032 surveys).
# sanity check (2032, 176)
cat(nrow(ESMdata),"observations from",nlevels(as.factor(as.character(ESMdata$ID))),"participants")
## 2032 observations from 176 participants
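The one-hour shift applied in step 2.1 of time.AND.double() can be sketched in isolation as follows. This is only an illustration: shift_between() is a hypothetical helper that does not appear in the original script, the three timestamps are invented, and, unlike the original code, the sketch drops fractional seconds when converting back to character.

```r
# Hypothetical helper (not part of the original script): adds `hours` to every
# timestamp that falls strictly inside the (from, to) interval.
shift_between <- function(x, from, to, hours = 1, tz = "GMT") {
  stamp <- as.POSIXct(x, tz = tz)
  inside <- !is.na(stamp) &
    stamp > as.POSIXct(from, tz = tz) &
    stamp < as.POSIXct(to, tz = tz)
  stamp[inside] <- stamp[inside] + hours * 60 * 60
  # back to character, as in the original code (milliseconds are dropped here)
  format(stamp, "%Y-%m-%d %H:%M:%S")
}

# invented timestamps: before, inside, and after the daylight saving interval
stamps <- c("2019-03-18 10:30:00", "2019-06-28 10:41:12", "2019-11-04 09:15:00")
shifted <- shift_between(stamps, "2019-03-29 00:00:00", "2019-10-27 00:00:00")
print(shifted)
```

Computing the logical index once and reusing it on both sides of the assignment avoids repeating the same date condition, which is where copy-and-paste errors tend to creep in.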
In ESM surveys, the variable within.day currently counts surveys in the order in which they were received (1st, 2nd, 3rd, etc.), and not as they were scheduled, based on the RunTimestamp variable. To recode within.day, the within.day.adjust() function is used, accounting for both the scheduled temporal window and the 20-min interval between the survey notification (beep) and its expiration.
1 = 9:15 - 10:15 + 20 min (up to 10:35), ‘baseline’ survey (SurveyType = “baseline”)
2 = 10:20 - 10:40 + 20 min (up to 11:00), ‘work’ survey (SurveyType = “work”)
3 = 11:50 - 12:10 + 20 min (up to 12:30)
4 = 13:20 - 13:40 + 20 min (up to 14:00)
5 = 14:50 - 15:10 + 20 min (up to 15:30)
6 = 16:20 - 16:40 + 20 min (up to 17:00)
7 = 17:50 - 18:10 + 20 min (up to 18:30)
Moreover, to account for the variability between devices, 20 extra minutes are subtracted and added to the lower and the higher limit of each window, respectively.
Finally, the variable day.of.week (currently indexing the day of the week such that Monday = 1, Tuesday = 2, etc.) is recoded to the variable day, indexing the day of the protocol (i.e., Day 1, Day 2, Day 3).
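The window logic described above can be sketched compactly as follows. This is a simplified illustration rather than the function used in this document: window_of() is a hypothetical helper, it takes a clock time as an "HH:MM" string, and it applies the 20-min tolerance symmetrically to every work-survey window, whereas within.day.adjust() leaves the second window open at the lower end and the seventh window open at the upper end.

```r
# Hypothetical helper: maps a clock time ("HH:MM") to a scheduled work-survey
# window (2 to 7), or NA when the time falls between two windows.
window_of <- function(hhmm, tol = 20) {
  parts <- as.integer(strsplit(hhmm, ":", fixed = TRUE)[[1]])
  mins <- parts[1] * 60 + parts[2]  # minutes since midnight
  # scheduled window starts and survey expiration times (minutes since midnight)
  start  <- c(10 * 60 + 20, 11 * 60 + 50, 13 * 60 + 20,
              14 * 60 + 50, 16 * 60 + 20, 17 * 60 + 50)
  expiry <- c(11 * 60, 12 * 60 + 30, 14 * 60,
              15 * 60 + 30, 17 * 60, 18 * 60 + 30)
  hit <- which(mins > start - tol & mins < expiry + tol)
  if (length(hit) == 1) hit + 1L else NA_integer_
}

w_a <- window_of("10:30")  # inside the second window
w_b <- window_of("11:25")  # between the second and third windows
w_c <- window_of("18:05")  # inside the seventh window
print(c(w_a, w_b, w_c))
```

Times that return NA here correspond to the cases that the function below reassigns to the closest scheduled survey.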
within.day.adjust()
within.day.adjust <- function(data){ require(birk)
# recoding ScriptName as SurveyType
colnames(data)[which(colnames(data)=="ScriptName")] <- "SurveyType"
data$SurveyType <- gsub("Survey Lavoro","work",data$SurveyType)
data$SurveyType <- as.factor(gsub("Survey Mattina","baseline",data$SurveyType))
# time as POSIXct
data$RunTimestamp <- as.POSIXct(as.character(data$RunTimestamp))
data$SubmissionTimestamp <- as.POSIXct(as.character(data$SubmissionTimestamp))
# converting within.day
for(i in 1:nrow(data)){
# survey 1 (9:15 - 10:15, up to 10:35) is identified by SurveyType = "baseline"
if(data[i,"SurveyType"]=="baseline"){ data[i,"within.day"] = 1 } else {
# survey 2 = 10:20 up to 11:00 (+ 20min error), no lower bound applied
if(strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")<strftime("1970-01-01 11:20:00",format="%H:%M:%S")){
data[i,"within.day"] = 2 }
# survey 3 = 11:50 (- 20min error) up to 12:30 (+ 20min error)
else if(strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")>strftime("1970-01-01 11:30:00",
format="%H:%M:%S") & strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")<strftime("1970-01-01 12:50:00",
format="%H:%M:%S")){
data[i,"within.day"] = 3}
# survey 4 = 13:20 (- 20min error) up to 14:00 (+ 20min error)
else if(strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")>strftime("1970-01-01 13:00:00",
format="%H:%M:%S") & strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")<strftime("1970-01-01 14:20:00",
format="%H:%M:%S")){
data[i,"within.day"] = 4}
# survey 5 = 14:50 (- 20min error) up to 15:30 (+ 20min error)
else if(strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")>strftime("1970-01-01 14:30:00",
format="%H:%M:%S") & strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")<strftime("1970-01-01 15:50:00",
format="%H:%M:%S")){
data[i,"within.day"] = 5}
# survey 6 = 16:20 (- 20min error) up to 17:00 (+ 20min error)
else if(strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")>strftime("1970-01-01 16:00:00",
format="%H:%M:%S") & strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")<strftime("1970-01-01 17:20:00",
format="%H:%M:%S")){
data[i,"within.day"] = 6}
# survey 7 = from 17:50 (- 20min error), no upper bound applied
else if(strftime(data[i,"RunTimestamp",],
format="%H:%M:%S")>strftime("1970-01-01 17:30:00",
format="%H:%M:%S")){ data[i,"within.day"] = 7
} else { data[i,"within.day"] = NA }}}
# sanity check
miss <- nrow(data[is.na(data$within.day),]) # 9 cases
cat("Adjusting ",miss,"cases with RunTimestamp out of scheduled range")
times <- data.frame(within.day=2:7,timestamps=c("1970-01-01 10:30:00","1970-01-01 12:00:00","1970-01-01 13:30:00",
"1970-01-01 15:00:00","1970-01-01 16:30:00","1970-01-01 18:00:00"))
times$timestamps <- as.POSIXct(as.character(times$timestamps))
for(i in 1:nrow(data)){ if(is.na(data[i,"within.day"])){
data[i,"within.day"] <- times[which.closest(as.numeric(times$timestamps),
as.numeric(as.POSIXct(paste("1970-01-01",
substr(as.character(data[i,"RunTimestamp"]),
12,19))))),"within.day"] }}
# creating day variable
data <- data[order(data$ID,data$RunTimestamp),]
data$day <- 1
for(i in 2:nrow(data)){ if(data[i,"ID"] != data[i-1,"ID"]){ data[i,"day"] <- 1 }
else if(data[i,"ID"] == data[i-1,"ID"] & data[i,"day.of.week"] != data[i-1,"day.of.week"]){
data[i,"day"] <- data[i-1,"day"] + 1}else{ data[i,"day"] <- data[i-1,"day"] }}
rownames(data) <- 1:nrow(data)
data <- data[order(data$ID,data$day,data$within.day),]
return(data[,c("ID","gender","os","day","day.of.week","within.day","SurveyType",
colnames(data)[7:(ncol(data)-1)])])}
# processing data
ESMdata <- within.day.adjust(ESMdata)
## Adjusting 9 cases with RunTimestamp out of scheduled range
# sanity check (2,032, 176)
cat(nrow(ESMdata),"observations from",nlevels(as.factor(as.character(ESMdata$ID))),"participants")
## 2032 observations from 176 participants
Comments:
In 9 cases, the variable within.day could not be correctly encoded because the RunTimestamp value fell outside the scheduled ranges. In these cases, the value of within.day was assigned based on which scheduled survey’s timestamp was the closest to the RunTimestamp.
Such cases are likely due to specific smartphone time settings (as we already corrected for daylight saving time and time zones). For instance, in 5 cases RunTimestamp is later than 18:50:
ESMdata[strftime(ESMdata[,"RunTimestamp",],format="%H:%M:%S")>strftime("1970-01-01 18:50:00",format="%H:%M:%S"),
c(1,3:8)]
As a final check, we look again for double surveys (i.e., surveys that were sent twice due to a malfunction of the mobile app) and double protocols (i.e., when day > 3). For double surveys, only the first survey is retained (the second one is removed), which removes 14 surveys; surveys with day > 3 are then excluded.
n = 0
new.data <- ESMdata[1,]
for(i in 2:nrow(ESMdata)){ # checking double responses (same ID, day and within.day)
if(ESMdata[i,"ID"] == ESMdata[i-1,"ID"] & ESMdata[i,"day"] == ESMdata[i-1,"day",] &
ESMdata[i,"within.day"] == ESMdata[i-1,"within.day"]){
n <- n + 1
cat("\n",as.character(ESMdata[i,"ID"]),ESMdata[i,"day"],ESMdata[i,"within.day"],
as.character(ESMdata[i-1,"RunTimestamp"]),as.character(ESMdata[i,"RunTimestamp"]))
}else{ new.data <- rbind(new.data,ESMdata[i,])}}
##
## ADMV1965 3 2 2019-06-28 10:41:12.026 2019-06-28 10:41:12.026
## CSNR1962 2 7 2019-03-18 18:51:28.055 2019-03-18 18:51:28.055
## GPLB1954 3 6 2019-01-18 16:26:49.950 2019-01-18 16:26:49.950
## GPLB1954 3 7 2019-01-18 18:05:40.134 2019-01-18 18:05:40.134
## GPLB1954 3 7 2019-01-18 18:05:40.134 2019-01-18 18:05:40.134
## LCNSRD94 1 6 2019-03-13 16:27:23.25 2019-03-13 16:27:23.25
## LRSM1963 1 5 2019-04-10 15:09:23.443 2019-04-10 15:09:24.749
## LRSM1963 1 6 2019-04-10 16:35:00.558 2019-04-10 16:37:25.062
## LRSM1963 1 7 2019-04-10 17:55:07.742 2019-04-10 18:15:06.838
## LRSM1963 2 3 2019-04-12 12:03:32.398 2019-04-12 12:03:34.049
## LRSM1963 2 6 2019-04-12 16:35:29.993 2019-04-12 16:35:31.94
## MAGG1948 1 1 2019-01-18 09:15:00.398 2019-01-25 09:15:01.463
## MAGG1948 1 5 2019-01-18 15:02:55.905 2019-01-25 15:07:59.237
## PZSP1951 1 7 2019-10-25 18:07:43.970 2019-10-25 18:09:10.993
cat("Excluding",n,"double responses") # number of double responses (14)
## Excluding 14 double responses
ESMdata <- new.data # excluding double responses
# sanity check
n = 0
for(i in 2:nrow(ESMdata)){ if(ESMdata[i,"ID"] == ESMdata[i-1,"ID"] & ESMdata[i,"day"] == ESMdata[i-1,"day",] &
ESMdata[i,"within.day"] == ESMdata[i-1,"within.day"]){ n <- n + 1 }}
cat(n,"double responses") # no more double responses (OK)
## 0 double responses
# printing and excluding double protocols
cat("Excluding",nrow(ESMdata[as.numeric(ESMdata$day)>3,]),"cases of double protocols") # double protocols (25)
## Excluding 25 cases of double protocols
ESMdata <- ESMdata[as.numeric(ESMdata$day)<4,] # excluding double protocols
# sanity check (1,993, 176)
ESMdata$ID <- as.factor(as.character(ESMdata$ID)) # updating ID values
cat(nrow(ESMdata),"observations from",nlevels(as.factor(as.character(ESMdata$ID))),"participants")
## 1993 observations from 176 participants
Comments:
199 surveys (9.08%) were excluded due to double responses or repeated protocols (i.e., because of technical problems or failure to stop the mobile application, some participants repeated one or more protocol days)
The recoded dataset includes 1,993 responses from 176 participants
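For reference, the same "keep the first of each ID, day, and within.day combination" rule can be sketched with base R's duplicated(). The toy data frame below is invented for illustration, and this vectorized form is an alternative to the row-by-row loop used above, not the code that produced the reported counts.

```r
# Invented toy data: participant A has a repeated survey on day 1, occasion 2
toy <- data.frame(ID         = c("A", "A", "A", "B"),
                  day        = c(1, 1, 1, 1),
                  within.day = c(2, 2, 3, 2),
                  stringsAsFactors = FALSE)

# duplicated() flags every repeat of a key after its first occurrence
is_dup  <- duplicated(toy[, c("ID", "day", "within.day")])
deduped <- toy[!is_dup, ]

cat("Excluding", sum(is_dup), "double responses\n")
print(deduped)
```

Because duplicated() compares whole key combinations rather than adjacent rows, it does not depend on the data being sorted beforehand.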
Then, we take a look at the number of missing responses in each item. Indeed, a number of surveys were incomplete due to technical problems with the app.
library(dplyr); library(tidyr)
missing.all <- ESMdata %>%
select(v1:f3) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Missing=length(which(is.na(value))),
'% Missing'=round(100*length(which(is.na(value)))/n(),2))
missing.work <- ESMdata[ESMdata$SurveyType=="work",] %>%
select(WHAT:c3) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Missing=length(which(is.na(value))),
'% Missing'=round(100*length(which(is.na(value)))/n(),2))
detach("package:dplyr", unload=TRUE);detach("package:tidyr", unload=TRUE)
missing <- rbind(as.data.frame(missing.all),as.data.frame(missing.work))
missing$Variable <- factor(missing$Variable,
levels=colnames(ESMdata)[which(colnames(ESMdata)=="v1"):which(colnames(ESMdata)=="c3")])
(missing <- missing[order(missing$Variable),]) # sorting by item order
Comments:
We can notice that missing responses mainly concern the last items of work surveys, with Situational Stressors items (d1, d2, d3, d4, and especially c1, c2, and c3) showing 17 to 64 missing data (0.99 - 3.74%)
In contrast, missing data are < 1% for most items measuring Mood and Work Sampling variables
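The per-item count and percentage of missing responses can also be obtained in base R, without attaching and detaching dplyr and tidyr. The sketch below uses an invented toy data frame with three item columns; the column names only mimic those of ESMdata.

```r
# Invented toy responses with some missing values
toy <- data.frame(v1 = c(1, NA, 3, 4),
                  v2 = c(NA, NA, 2, 5),
                  c3 = c(1, 2, 3, NA))

# is.na() returns a logical matrix: column sums give counts, column means give proportions
missing_toy <- data.frame(Variable    = names(toy),
                          Missing     = unname(colSums(is.na(toy))),
                          PercMissing = unname(round(100 * colMeans(is.na(toy)), 2)),
                          row.names   = NULL)
print(missing_toy)
```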
Here, we remove a further 14 surveys (0.69%) due to missing data in almost all items (i.e., those data entries with missing data in the first items)
n <- nrow(ESMdata)
ESMdata <- ESMdata[!(is.na(ESMdata$v2)&is.na(ESMdata$v3)),]
cat(n - nrow(ESMdata),"removed surveys due to incomplete responses") # removed surveys (14)
## 14 removed surveys due to incomplete responses
library(dplyr); library(tidyr)
missing.all <- ESMdata %>%
select(v1:f3) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Missing=length(which(is.na(value))),
'% Missing'=round(100*length(which(is.na(value)))/n(),2))
missing.work <- ESMdata[ESMdata$SurveyType=="work",] %>%
select(WHAT:c3) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Missing=length(which(is.na(value))),
'% Missing'=round(100*length(which(is.na(value)))/n(),2))
detach("package:dplyr", unload=TRUE);detach("package:tidyr", unload=TRUE)
missing <- rbind(as.data.frame(missing.all),as.data.frame(missing.work))
missing$Variable <- factor(missing$Variable,
levels=colnames(ESMdata)[which(colnames(ESMdata)=="v1"):which(colnames(ESMdata)=="c3")])
(missing <- missing[order(missing$Variable),]) # sorting by item order
# sanity check (1979, 175)
ESMdata$ID <- as.factor(as.character(ESMdata$ID)) # updating ID values
cat(nrow(ESMdata),"observations from",nlevels(as.factor(as.character(ESMdata$ID))),"participants")
## 1979 observations from 175 participants
Comments:
As noted above, missing responses mainly concern the last items of work surveys, with Situational Stressors items (d1, d2, d3, d4, and especially c1, c2, and c3) showing 8 to 54 missing data (0.47 - 3.18%)
In contrast, missing data are < 0.3% for most items measuring Mood and Work Sampling variables
The recoded dataset includes 1,979 responses from 175 participants
Finally, we check the time required to fill the ESM questionnaires based on timestamps of running and submitting.
time2submit <- difftime(ESMdata$SubmissionTimestamp,ESMdata$RunTimestamp,units="secs") # response time in seconds
length(time2submit[time2submit>900])
## [1] 216
time2submit <- time2submit[time2submit<900] # excluding from computation 216 extreme cases (11%) taking more than 15 min
mean(as.numeric(time2submit))/60; sd(time2submit)/60
## [1] 3.971841
## [1] 3.600663
hist(as.numeric(time2submit)/60,breaks=20,main="Time to submit ESM Questionnaire",xlab="Response time (min)")
Comments:
In a number of cases (N = 216), participants took more than 15 min to fill in the questionnaire, probably because they were doing something else and interrupted the data entry
Excluding those cases, the average time to fill in the questionnaire was about 4 min (SD = 3.60 min), with most participants responding in 2 min or less
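The response-time computation can be illustrated on a few invented timestamps. The sketch requests seconds explicitly through the units argument of difftime(), since the default ("auto") picks the unit from the data and the 900-second threshold only makes sense when the differences are expressed in seconds.

```r
# Invented run and submission times for three surveys
run    <- as.POSIXct(c("2019-04-10 15:09:23", "2019-04-10 16:35:00",
                       "2019-04-10 17:55:07"), tz = "GMT")
submit <- as.POSIXct(c("2019-04-10 15:11:23", "2019-04-10 16:39:00",
                       "2019-04-10 18:15:07"), tz = "GMT")

# differences in seconds, regardless of how large or small they are
secs <- as.numeric(difftime(submit, run, units = "secs"))

# dropping responses that took more than 15 min (900 s) before averaging
keep <- secs[secs <= 900]
mean_min <- mean(keep) / 60
cat("Mean response time:", mean_min, "min\n")
```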
Here, we recode RETROdata.
We start by selecting and renaming data columns.
# removing unneeded columns
RETROdata[,c("X.","grazie","Network.ID")] <- NULL
# renaming columns
colnames(RETROdata) <- c("gender","age","job",
"position", # not used in this work
"job.sector",
"instr",paste("home",1:5,sep=""), # not used in this work
"work.hours",
paste("phone",1:6,sep=""), # not used in this work
paste("JAWS",1:12,sep=""),
paste("CBI",1:7,sep=""),
paste("PSI",1:18,sep=""), # not used in this work
paste("PSIm",1:18,sep=""), # not used in this work
paste("d",1:5,sep=""),
paste("OCS",1:6,sep=""), # not used in this work
paste("c",1:5,sep=""),
paste("DWAS",1:10,sep=""), # not used in this work
"ID","OS","START","SUBMIT")
# selecting considered variables
RETROdata <-
RETROdata[,c("ID","OS","START","SUBMIT","gender","age","job","job.sector","work.hours", # participant info & demos
paste("JAWS",1:12,sep=""),paste("CBI",1:7,sep=""), # job strain (job-related aff. wellb. & burnout)
paste("d",1:5,sep=""),paste("c",1:5,sep=""))] # job stressors (demand & control)
Then, we recode all categorical variables.
# OS (iOS, Android, other)
RETROdata[RETROdata$ID=="Clmr1958" | RETROdata$ID=="ATLG1958" | RETROdata$ID=="DVMC1950" |
RETROdata$ID=="PRIZZY88", "OS"] <- levels(as.factor(RETROdata$OS))[3] # filling empty values based on ESMdata
RETROdata[RETROdata$ID=="SBAT1949", "OS"] <- levels(as.factor(RETROdata$OS))[4]
RETROdata$OS <- as.factor(gsub("Con sistema ANDROID Samsung, HUAWEI, ASUS, Xiaomi ecc.","Android", # recoding levels
gsub("Con sistema iOS iPhone","iOS",
gsub("Altro es. Microsoft phone -> VEDERE NOTA","other",
gsub("[()]","",RETROdata$OS)))))
# gender (F, M)
RETROdata$gender <- as.factor(substr(RETROdata$gender,1,1))
# job sector (Private, Public)
RETROdata$job.sector <- as.factor(gsub("Privato","Private",gsub("Pubblico","Public",RETROdata$job.sector)))
# categorical variables as factor
RETROdata[,c("ID","OS","gender","job.sector")] <- lapply(RETROdata[,c("ID","OS","gender","job.sector")],as.factor)
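As an alternative to nested gsub() calls, categorical labels can be recoded with a named lookup vector. The sketch below is a minimal illustration on invented responses; the lookup object and its name are assumptions made for this example and are not used elsewhere in this document.

```r
# Invented raw responses (Italian labels, one missing value)
sector_raw <- c("Privato", "Pubblico", "Privato", NA)

# named vector: names are the original labels, values are the recoded ones
lookup <- c(Privato = "Private", Pubblico = "Public")

# indexing by name returns the recoded label; unmatched or missing values stay NA
sector <- factor(unname(lookup[sector_raw]), levels = c("Private", "Public"))
print(table(sector, useNA = "ifany"))
```

A lookup vector matches whole labels only, so a response that merely contains one of the labels as a substring is left as NA instead of being partially rewritten.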
Next, we use the RETRO.compl function to add information on participants’ compliance (encoded in the "Compliance.csv" file), to remove double responses, and to recode wrongly encoded participants’ ID values.
RETRO.compl <- function(data,compliance){ require(plyr)
# ID recoding
cat("Excluding 2 pilot responses") # removing pilot responses
data <- data[!(data$ID=="Prova_ValentinaRossi"),]
data <- data[!(data$ID=="provaBianca"),]
data$ID <- gsub("Magg1948","MAGG1948",data$ID) # incorrectly encoded IDs
data$ID <- gsub("05101985","5101985",data$ID)
data[!is.na(data$ID)&data$ID=="ANMA1938"&data$job=="Psicologa","ID"] <- "ANIMA1938"
data[!is.na(data$ID)&data$ID=="MCVD1959"&as.character(data$START)=="2019-05-21 16:12:01","ID"] <- "MCVD19592"
data[!is.na(data$ID)&data$ID=="mrlv1950","ID"] <- "mrlv19502"
data$ID <- gsub(" ","",data$ID)
data$ID <- gsub("Ciao","",data$ID)
cat("\n\nRecoding 2 participant with identical ID (siblings?)") # siblings
data[!is.na(data$ID) & data$ID=="LFAI1940"&data$gender=="M","ID"] <- "LFAI19402"
# merging with compliance.file
compliance$ID <- gsub("mag-48","MAGG1948",compliance$CODICE) # fixing incorrect ID
data <- plyr::join(data,compliance,type="full",by="ID")
data <- data[order(data$ID),]
rownames(data) <- 1:(nrow(data))
# fixing respRate variable
data$respRate <- NA
data[,c("X1survey","X1day","X3days")] <- lapply(data[,c("X1survey","X1day","X3days")],as.character)
for(i in 1:nrow(data)){
if(is.na(data[i,"X1survey"]) | (!is.na(data[i,"X1survey"]) & data[i,"X1survey"] == "")){ data[i,"X1survey"] <- 0
} else { data[i,"X1survey"] <- 1 }
if(is.na(data[i,"X1day"]) | (!is.na(data[i,"X1day"]) & data[i,"X1day"] == "")){ data[i,"X1day"] <- 0
} else { data[i,"X1day"] <- 1 }
if(is.na(data[i,"X3days"]) | (!is.na(data[i,"X3days"]) & data[i,"X3days"] == "")){ data[i,"X3days"] <- 0
} else { data[i,"X3days"] <- 1 }
if(is.na(data[i,"noQs"]) | (!is.na(data[i,"noQs"]) & data[i,"noQs"] == "")){ data[i,"noQs"] <- 0 }
data[i,"respRate"] <- sum(as.numeric(data[i,c("X1survey","X1day","X3days")]))}
# data[,c("ID","respRate","X1survey","X1day","X3days")] # sanity check
data$X1day <- data$X1survey <- data$X3days <- data$N <- data$CODICE <- NULL # removing unneeded variables
data$respRate <- as.factor(data$respRate)
# excluding 3 participants with no responses to both questionnaires
cat("\n\nExcluding 3 participant with no responses to both questionnaires")
data <- data[!(data$noQs==1&data$respRate==0),]
# printing compliance information
cat("\n\nTotal No. of participants = ",nrow(data),", of which:\n - ",
nrow(data[data$noQs==0&as.numeric(data$respRate)>0,])," (",
round(100*nrow(data[data$noQs==0&as.numeric(data$respRate)>0,])/nrow(data),2),
"%) responded to BOTH RETROdata & at least 1 ESMdata\n - ",
nrow(data[data$noQs==0&data$respRate==0,])," (",
round(100*nrow(data[data$noQs==0&data$respRate==0,])/nrow(data),2),
"%) responded to RETROdata BUT NOT to any ESMdata\n - ",
nrow(data[data$noQs==1&as.numeric(data$respRate)>0,])," (",
round(100*nrow(data[data$noQs==1&as.numeric(data$respRate)>0,])/nrow(data),2),
"%) responded to at least 1 ESMdata BUT NOT to RETROdata\n\nAmong the first ",
nrow(data[data$noQs==0&as.numeric(data$respRate)>0,])," participants:\n- ",
nrow(data[data$noQs==0&as.numeric(data$respRate)>1,])," (",
round(100*nrow(data[data$noQs==0&as.numeric(data$respRate)>1,])/nrow(data),2),
"%) responded to BOTH RETROdata & at least 1 ESMdata per day\n- ",
nrow(data[data$noQs==0&data$respRate==3,])," (",
round(100*nrow(data[data$noQs==0&data$respRate==3,])/nrow(data),2),
"%) responded to BOTH RETROdata & at least 3 ESMdata per day\n\n",
sep="")
# updating ID levels
data$ID <- as.factor(as.character(data$ID))
return(data[,c("ID","gender","age","OS","respRate","noQs","START","SUBMIT",
colnames(data)[7:(ncol(data)-2)])])}
# processing data
RETROdata <- RETRO.compl(RETROdata,compliance=read.csv2("S5_Compliance.csv"))
## Excluding 2 pilot responses
##
## Recoding 2 participant with identical ID (siblings?)
##
## Excluding 3 participant with no responses to both questionnaires
##
## Total No. of participants = 211, of which:
## - 202 (95.73%) responded to BOTH RETROdata & at least 1 ESMdata
## - 36 (17.06%) responded to RETROdata BUT NOT to any ESMdata
## - 9 (4.27%) responded to at least 1 ESMdata BUT NOT to RETROdata
##
## Among the first 202 participants:
## - 166 (78.67%) responded to BOTH RETROdata & at least 1 ESMdata per day
## - 114 (54.03%) responded to BOTH RETROdata & at least 3 ESMdata per day
RETROdata$noQs.1 <- NULL # double column
# sanity check (211, 211)
cat(nrow(RETROdata),"observations from",nlevels(RETROdata$ID),"participants")
## 211 observations from 211 participants
Comments:
Among 215 recruited participants, three responded to neither the preliminary questionnaire nor any of the scheduled ESM questionnaires, and were excluded
Moreover, one participant was encoded twice with a wrong ID
The resulting sample is composed of 211 participants
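When reading the compliance counts printed above, it helps to recall how R converts factors to numbers: as.numeric() applied to a factor returns the internal level codes (starting from 1) rather than the labels. Since respRate is stored as a factor, a condition such as as.numeric(respRate) > 0 holds for every row, which may be why the first three percentages do not add up to 100%. The toy vector below is invented and only illustrates the conversion.

```r
# Invented response-rate scores stored as a factor, as respRate is
resp <- factor(c(0, 1, 3, 0))

codes  <- as.numeric(resp)                # internal level codes
values <- as.numeric(as.character(resp))  # original numeric labels

print(codes)
print(values)

# the two conversions disagree on how many participants have a score above 0
n_codes  <- sum(codes > 0)
n_values <- sum(values > 0)
cat(n_codes, "vs.", n_values, "cases above zero\n")
```

Converting through as.character() first is the usual way to recover the original numbers from a factor.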
Here, the job.recode() function is used to recode the open-ended job item responses using the ISCO-08 classification of occupations (level 2) (Ganzeboom, 2010).
job.recode()
job.recode <- function(data){ require(labourR); require(data.table); require(magrittr)
# creating corpus data
corpus <- data.table(id=data$ID,text=data$job,language="it")
corpus$text <- gsub(" presso la segreteria didattica del Dipartimento di Psicologia Generale","", # remove sensitive info
gsub("POSTE ITALIANE","",gsub("CMP di FIUMICINO","",corpus$text)))
languages <- unique(corpus$language) # language classes
# first screening based on the labourR::classify_occupation() function
suggestions <- lapply(languages, function(lang) {
classify_occupation(corpus=corpus[corpus$language==lang],lang=lang,isco_level=2,num_leaves=10)
}) %>% rbindlist
corpus <- plyr::join(corpus,suggestions,by="id",type="left")
# adjusting automatic classification based on manual screening
corpus[corpus$text%in%c("impiegato pubblico","Impiegato","impiegato","impiegata","Impiegata","Impiegato ","IMPIEGATA",
"Impiegata ","Impiegata d’ufficio, receptionist ","Impiegato tecnico","Segretaria d'azienda ",
"Impiegata presso con mansioni operative al ","impegato","Impiegatizio"),"preferredLabel"] <-
"General and keyboard clerks"
corpus[corpus$text%in%c("SERVIZI","customer care back office",
"lavori di segreteria, archiviazioni pratica, ricevimento clienti etc\\."),"preferredLabel"] <-
"Customer services clerks"
corpus[grep("assegnista",tolower(corpus$text)),"preferredLabel"] <-
"Science and engineering professionals"
corpus[corpus$text%in%c("Docente universitario","Data scientist","Assegno di ricerca ","assistente ricercatore",
"Postdoc ","Professore Associato - Università di Padova","Ricercatore, Docente, Psicoterapeuta",
"Professore universitario","Ricerca accademica","Prof ass dpss","Docente Universitario",
"Ricercatore universitario RTDA","Assegno di ricerca",
"Post doc in laboratorio di microbiologia","docente universitario"),"preferredLabel"] <-
"Science and engineering professionals"
corpus[corpus$text%in%c("Programmatore"),"preferredLabel"] <-
"Information and communications technology professionals"
corpus[corpus$text%in%c("Responsabile comunicazione digitale/press","Addetto alla formazione",
"Commerciale in un'azienda in campo energetico","Tax advisor","Digital Marketer",
"Responsabile di selezione del personale in agenzia per il lavoro","HR SPECIALIST",
"Ufficio Risorse Umane","Recruiter",
"Responsabile contatti in una società di logistica","Digital Marketer freelance e Youth Worker",
"Impiegata e operatore del mercato del lavoro, gestisco e metto in atto azioni di orientamento professionale per disoccupati","Consulente di orientamento professionale","Addetta alla selezione del personale",
"Progettazione spazi adibiti a retail e visual merchandising strategico",
"grafica pubblicitaria","Responsabile comunicazione digitale e grafica",
"Graphic Designer","commercialista"),"preferredLabel"] <-
"Business and administration professionals"
corpus[corpus$text%in%c("impiegato amministrativo","Lavoro d’ufficio in Banca","Contabile ammintrativo",
"Impiegata amministrativa presso Rai","Impiegata settore fiscale",
"Impiegato Amministrativo ","Impiegato Amministrativo",
"Impiegato digital marketing","assistente commerciale - customer service",
"Lavoro impiegatizio presso la pubblica amministrazione",
"lavoro impiegatizio di carattere commerciale, lavoro su progetto",
"Sono impiegata amministrativa nella segreteria di un Istituto di riabilitazione",
"impiegato amministrativo-contabile "),
"preferredLabel"] <-
"Business and administration associate professionals"
corpus[corpus$text%in%c("attività operativa di cantiere"),"preferredLabel"] <-
"Building and related trades workers, excluding electricians"
corpus[corpus$text%in%c("attività editoriale","Coordinatrice Culturale",
"Archivista. Attività di censimento documentazione enti pubblici. Coordinamento altri operatori"),"preferredLabel"] <-
"Legal, social, cultural and related associate professionals"
corpus[corpus$text%in%c("Praticante in uno studio legale, tirocinante in un ufficio giudiziario"),"preferredLabel"] <-
"Legal, social, cultural and related associate professionals"
corpus[corpus$text%in%c("Responsabile di selezione del personale in agenzia per il lavoro",
"Risorse umane","responsabile"),"preferredLabel"] <-
"Administrative and commercial managers"
corpus[corpus$text%in%c("psicologo","Psicologa","Assistente disabili",
"Supporto alla didattica, Psicologo",
"Maestro di laboratorio presso un'istituzione psico-pedagogico"),"preferredLabel"] <-
"Social and religious professionals"
corpus[corpus$text%in%c("direzione","Direttore Generale","Dirigente"),"preferredLabel"] <-
"Chief executives, senior officials and legislators"
corpus[corpus$text%in%c("impiegato, coordinatore preparazione logistico spedizione macchine per imballo",
"manager di un team di 8 persone che si occupa della gestione di dati del sottosuolo per l'industria petrolifera"),"preferredLabel"] <-
"Production and specialised services managers"
corpus[corpus$text%in%c("Coordinatore Terapisti","Operatore socio sanitario","Infermiere\n",
"Coordinatore infermieristico"),"preferredLabel"] <-
"Health professionals"
corpus[corpus$text%in%c("Maestro di laboratorio ",
"Maestro di laboratorio presso un'istituzione psico-pedagogico"),"preferredLabel"] <-
"Teaching professionals"
# merging corpus with data
corpus$ID <- as.factor(corpus$id)
data <- plyr::join(data,corpus[,c("ID","preferredLabel")],by="ID",type="left")
# marking excluded jobs as jobOut = TRUE
data$jobOut <- FALSE
data[data$job%in%c("attività operativa di cantiere","Maestro di laboratorio ","Infermiere professionale\n",
"CUSTOMER SERVICE","Assistente disabili","Operatore socio sanitario",
"Infermiere\n","Fotografo\nFarmacista"),"jobOut"] <- TRUE
# replacing original job with recoded job categories
data$job <- as.factor(data$preferredLabel)
data$preferredLabel <- NULL # removing preferredLabel
# summarizing info
cat("Recoded job variable into",nlevels(data$job),"categories:\n")
print(summary(data[!is.na(data$job),"job"]))
cat("\n\n",nrow(data[data$jobOut==TRUE,]),"cases marked as jobOut (incompatible jobs)")
return(data) }
RETROdata <- job.recode(RETROdata)
## Recoded job variable into 19 categories:
## Administrative and commercial managers
## 12
## Building and related trades workers, excluding electricians
## 1
## Business and administration associate professionals
## 32
## Business and administration professionals
## 39
## Chief executives, senior officials and legislators
## 2
## Customer services clerks
## 2
## General and keyboard clerks
## 20
## Health professionals
## 8
## Information and communications technology professionals
## 8
## Legal, social and cultural professionals
## 4
## Legal, social, cultural and related associate professionals
## 5
## Numerical and material recording clerks
## 1
## Personal service workers
## 1
## Production and specialised services managers
## 8
## Science and engineering associate professionals
## 10
## Science and engineering professionals
## 38
## Social and religious professionals
## 4
## Stationary plant and machine operators
## 3
## Teaching professionals
## 4
##
##
## 8 cases marked as jobOut (incompatible jobs)
# sanity check (211, 211)
cat(nrow(RETROdata),"observations from",nlevels(RETROdata$ID),"participants")
## 211 observations from 211 participants
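Many of the manual adjustments in job.recode() list the same job title with different capitalization or trailing spaces. One way to shorten such lists, sketched here on invented responses, is to normalize the text before matching. This is an illustrative alternative and not the procedure used to produce the categories reported above.

```r
# Invented open-ended responses with inconsistent case and whitespace
jobs <- c("Impiegato", "impiegato ", "IMPIEGATA", "Programmatore")

# lower-casing and trimming collapses the spelling variants
norm <- trimws(tolower(jobs))

# a single normalized list replaces the case and whitespace variants
clerk_variants <- c("impiegato", "impiegata")
label <- ifelse(norm %in% clerk_variants, "General and keyboard clerks", NA_character_)
print(data.frame(jobs, label))
```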
Finally, we check the time required to fill in the preliminary questionnaire based on the timestamp variables.
# START and SUBMIT as POSIXct
RETROdata$START <- as.POSIXct(as.character(RETROdata$START))
RETROdata$SUBMIT <- as.POSIXct(as.character(RETROdata$SUBMIT))
# computing response times (minutes)
time2submit <- difftime(RETROdata$SUBMIT,RETROdata$START,units="mins")
time2submit[!is.na(time2submit)&time2submit>40] # extreme cases
## Time differences in mins
## [1] 58.95000 69.60000 136.83333 53.98333 95.95000 70.00000
## [7] 320.10000 40.58333 146.43333 58.03333 41.55000 227.08333
## [13] 42.71667 154.11667 79.58333 66.33333 72.40000 44.68333
## [19] 51.45000 84.68333 1848.28333 55.38333
time2submit <- time2submit[!is.na(time2submit)&time2submit<40] # excluding 22 extreme cases from the computation
mean(time2submit); sd(time2submit)
## Time difference of 17.62167 mins
## [1] 7.706317
hist(as.numeric(time2submit),breaks=20,main="Time to submit Preliminary Questionnaire",xlab="Response time (min)")
Comments:
A number of participants (N = 22) took more than 40 min to fill in the questionnaire, probably because they were doing something else and interrupted the administration
After the exclusion of those participants, the average time to fill in the questionnaire was 17.62 min (SD = 7.71 min), with most participants responding in 15 min or less
Here, we merge the ESMdata and RETROdata datasets to be used for data analysis.
First, we use the IDrecode() function to correct ID values that were entered incorrectly in ESMdata (i.e., cases in which most but not all characters matched between ESMdata and RETROdata).
IDrecode <- function(data){
# correcting wrongly reported IDs
data$ID <- gsub("05101985","5101985",data$ID)
data$ID <- gsub("ACLS1955 ","ACLS1955",data$ID)
data$ID <- gsub("BAFC1922 ","BAFC1922",data$ID)
data$ID <- gsub("Adfcr49","ADFCR1949",data$ID)
data$ID <- gsub("Asst42","ASST1945",data$ID)
data$ID <- gsub("LBMM1958","BLMM1958",data$ID)
data$ID <- gsub("CRG16","CGT16",data$ID)
data$ID <- gsub("SGAMOR51","CSAM1951",data$ID)
data$ID <- gsub("ZFBR50","FZBR50",data$ID)
data$ID <- gsub("GBPR1944","GBRP1944",data$ID)
data$ID <- gsub("LCPP1945","LCPP1944",data$ID)
data$ID <- gsub("LFMI1965","LFMDI1965",data$ID)
data$ID <- gsub("MGRB1964","MRLV19502",data$ID)
data$ID <- gsub("Ugiila17L","UCGZ1956",data$ID)
data$ID <- gsub("Andrea89","VTGF1966",data$ID)
data$ID <- gsub("LFPT1955","LFPT54",data$ID)
data$ID <- gsub("GNCP1974","GMCP74",data$ID)
data$ID <- gsub("PCAN1953","PCAN1935",data$ID)
data$ID <- gsub("PPCG1961","PPGC1961",data$ID)
return(data) }
ESMdata <- IDrecode(ESMdata)
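As a possible alternative to a chain of gsub() calls, the same kind of recoding can be sketched with a lookup table. Note that gsub() replaces any matching substring, whereas the sketch below only replaces IDs that match an entry exactly. The objects `idMap` and `toy` are hypothetical; the two ID pairs are copied from IDrecode() for illustration, and the third ID is made up.

```r
# hypothetical lookup table: names = wrongly reported IDs, values = corrected IDs
idMap <- c("LBMM1958" = "BLMM1958", "ZFBR50" = "FZBR50")

# toy data (the third ID is made up and should stay unchanged)
toy <- data.frame(ID = c("LBMM1958", "ZFBR50", "ABCD1960"),
                  stringsAsFactors = FALSE)

# replacing only the IDs that exactly match a name in the lookup table
hit <- toy$ID %in% names(idMap)
toy$ID[hit] <- unname(idMap[toy$ID[hit]])
toy$ID
```

With this layout, each correction is one entry in `idMap`, and IDs that are not listed are left untouched.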
Then, we check for differences in terms of participants’ ID between the two datasets, and we rename variables with the same label.
# checking differences
RETROid <- toupper(levels(as.factor(as.character(RETROdata[RETROdata$respRate!=0,
"ID"])))) # selecting those that responded to 1+ ESM form
ESMid <- toupper(levels(as.factor(as.character(ESMdata$ID)))) # selecting everybody
for(i in 1:length(RETROid)){ if(RETROid[i] %in% ESMid) next
else print(RETROid[i])} # showing cases with RETROdata but not ESMdata ID (0)
for(i in 1:length(ESMid)){ if(ESMid[i] %in% RETROid) next
else print(ESMid[i])} # showing cases with ESM but not RETROdata ID (0)
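The same comparison can be sketched with setdiff(), which returns the elements of its first argument that are not in the second. The vectors `retroToy` and `esmToy` below are made-up values, not study IDs.

```r
# toy ID vectors (made-up values), converted to upper case as above
retroToy <- toupper(c("abcd1960", "EFGH1955", "ILMN1970"))
esmToy   <- toupper(c("ABCD1960", "efgh1955", "OPQR1980"))

onlyRetro <- setdiff(retroToy, esmToy) # in the preliminary questionnaire but not in ESM data
onlyESM   <- setdiff(esmToy, retroToy) # in ESM data but not in the preliminary questionnaire
onlyRetro; onlyESM
```

An empty result (`character(0)`) in both directions would correspond to the "(0)" noted in the comments above.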
# renaming variables
colnames(RETROdata)[which(colnames(RETROdata)=="gender")] <- "gender.RETRO" # changing label for sanity check
# IDs in capital letters
RETROdata$ID <- as.factor(toupper(RETROdata$ID)) # only those who answered to at least 1
ESMdata$ID <- as.factor(toupper(ESMdata$ID))
Finally, we can merge the two datasets.
# merging
ESMdata <- plyr::join(ESMdata,RETROdata[,c(1:which(colnames(RETROdata)=="work.hours"),ncol(RETROdata))],by="ID",type="full")
# sanity check (different gender between prelQS and ESM)
levels(as.factor(as.character(ESMdata[!is.na(ESMdata$gender)&
!is.na(ESMdata$gender.RETRO)&ESMdata$gender!=ESMdata$gender.RETRO,"ID"]))) # 2 cases
## [1] "BDRB1955" "MFPW1957"
# sanity check (different OS between prelQS and ESM)
levels(as.factor(as.character(ESMdata[!is.na(ESMdata$os)&!is.na(ESMdata$OS)&
as.character(ESMdata$os)!=as.character(ESMdata$OS),c("ID")]))) # 2 cases
## [1] "LPTB1929" "SOMC57"
ESMdata$gender <- ESMdata$gender.RETRO # keeping only RETRO gender
ESMdata$gender.RETRO <- NULL
colnames(RETROdata)[2] <- "gender"
ESMdata$OS <- NULL # keeping only ESM os
Comments:
Now, the ESMdata dataset includes the demographic and occupational information collected with the preliminary questionnaire, and the information on participants’ compliance. In both cases, the variables assume identical values in each row corresponding to a given participant.
In only two cases, participants selected the protocol corresponding to a different gender than the one they indicated in the preliminary questionnaire (we trust the latter).
In only two cases, participants reported the wrong OS in the preliminary questionnaire.
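The logic of the merge and of the gender sanity check can be sketched on toy data. Here base merge() with `all = TRUE` is used as a stand-in for plyr::join(type = "full"); this is an assumption of the sketch, and the two functions may order the resulting rows differently. All values and the object names `esmToy`, `retroToy`, `joined`, and `mismatch` are made up.

```r
# toy datasets (made-up values)
esmToy   <- data.frame(ID = c("A", "A", "B"), gender = c("F", "F", "M"),
                       stringsAsFactors = FALSE)
retroToy <- data.frame(ID = c("A", "B", "C"), gender.RETRO = c("F", "F", "M"),
                       stringsAsFactors = FALSE)

# full join by ID: participant "C" (no ESM responses) is kept with NA in gender
joined <- merge(esmToy, retroToy, by = "ID", all = TRUE)

# IDs whose gender differs between the two sources (missing values ignored)
mismatch <- !is.na(joined$gender) & !is.na(joined$gender.RETRO) &
  joined$gender != joined$gender.RETRO
unique(joined$ID[mismatch])
```

In this toy example only participant "B" is flagged, because the gender recorded in the two sources differs.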
Although data were collected anonymously, an identification code ID was self-created by the participants to link the responses between the preliminary questionnaire and the ESM forms. Since this code was created based on personal information (e.g., mother’s year of birth), here we recode ID values as SXXX so that such information will not be available for future users of our data, in compliance with the GDPR.
# saving IDs
IDs <- data.frame(ID=unique(c(levels(ESMdata$ID),levels(RETROdata$ID))))
# creating new fully anonymized values
IDs$anID <- NA
for(i in 1:nrow(IDs)){ id <- paste("S",i,sep="")
if(nchar(id)>2){ if(nchar(id)>3){ IDs[i,"anID"] <- id } else { IDs[i,"anID"] <- gsub("S","S0",id) }
} else { IDs[i,"anID"] <- gsub("S","S00",id) }}
IDs$anID <- as.factor(IDs$anID)
head(IDs$anID) # showing examples
## [1] S001 S002 S003 S004 S005 S006
## 211 Levels: S001 S002 S003 S004 S005 S006 S007 S008 S009 S010 S011 S012 ... S211
# replacing ID values with anID values
ESMdata <- plyr::join(ESMdata,IDs,by="ID",type="left")
ESMdata$ID <- ESMdata$anID
RETROdata <- plyr::join(RETROdata,IDs,by="ID",type="left")
RETROdata$ID <- RETROdata$anID
ESMdata$anID <- RETROdata$anID <- NULL
# sanity check (2015, 211)
cat("ESMdata:",nrow(ESMdata),"observations from",nlevels(ESMdata$ID),"participants")
## ESMdata: 2015 observations from 211 participants
cat("RETROdata:",nrow(RETROdata),"observations from",nlevels(RETROdata$ID),"participants")
## RETROdata: 211 observations from 211 participants
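The zero-padded SXXX codes created by the loop above can also be sketched in a single call with sprintf(), where the format `%03d` pads an integer with leading zeros to at least three digits. The data frame `toyIDs` contains made-up IDs and is only meant as an illustration.

```r
# toy IDs (made-up values)
toyIDs <- data.frame(ID = c("ABCD1960", "EFGH1955", "ILMN1970"),
                     stringsAsFactors = FALSE)

# "S" followed by the row number padded with zeros to three digits
toyIDs$anID <- sprintf("S%03d", seq_len(nrow(toyIDs)))
toyIDs$anID
```

As in the loop, numbers with more than three digits are left unpadded (e.g., row 1000 would give "S1000").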
Here, we sort the columns and provide a data dictionary for the processed ESMdata and RETROdata datasets.
# selecting and sorting columns
ESMdata <- ESMdata[,c("ID",colnames(ESMdata)[3:which(colnames(ESMdata)=="SubmissionTimestamp")],
colnames(ESMdata)[which(colnames(ESMdata)=="v1"):which(colnames(ESMdata)=="c3")],# ESM variables
"gender","age","job","jobOut","job.sector","work.hours", # demographic variables
"noQs","respRate")] # response rate info
str(ESMdata)
## 'data.frame': 2015 obs. of 36 variables:
## $ ID : Factor w/ 211 levels "S001","S002",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ os : Factor w/ 2 levels "Android","iOS": 2 2 2 2 2 2 2 2 2 2 ...
## $ day : num 1 1 1 1 2 2 2 3 3 3 ...
## $ day.of.week : int 1 1 1 1 3 3 3 5 5 5 ...
## $ within.day : num 2 3 4 7 3 6 7 3 4 6 ...
## $ SurveyType : Factor w/ 2 levels "baseline","work": 2 2 2 2 2 2 2 2 2 2 ...
## $ RunTimestamp : POSIXct, format: "2018-12-03 10:22:29.818" "2018-12-03 12:05:17.028" ...
## $ SubmissionTimestamp: POSIXct, format: "2018-12-03 10:24:34.655" "2018-12-03 12:07:35.108" ...
## $ v1 : num 4 3 2 2 3 3 3 2 2 3 ...
## $ v2 : num 3 4 4 4 3 4 4 3 4 4 ...
## $ v3 : num 5 4 2 4 3 5 3 2 4 3 ...
## $ t1 : num 6 2 2 3 4 3 4 3 3 3 ...
## $ t2 : num 1 2 2 5 2 4 2 6 4 2 ...
## $ t3 : num 2 3 2 3 3 1 2 2 1 2 ...
## $ f1 : num 7 5 3 7 4 6 6 3 4 6 ...
## $ f2 : num 4 5 4 6 3 6 5 3 3 7 ...
## $ f3 : num 6 6 4 6 3 6 5 3 3 6 ...
## $ WHAT : Factor w/ 118 levels "ACQUISITION",..: 47 105 91 77 108 105 118 9 88 9 ...
## $ HOW : Factor w/ 70 levels "FACE2FACE","FACE2FACE,OTHER",..: 33 1 1 33 16 6 34 34 40 33 ...
## $ WHOM : Factor w/ 51 levels "ALONE","ALONE,COLL",..: 8 17 15 1 26 17 8 1 1 1 ...
## $ nPeople : num 1 5 5 0 50 30 1 0 0 0 ...
## $ d1 : num 5 5 5 5 3 1 4 5 5 6 ...
## $ d2 : num 4 2 3 5 2 1 5 3 2 6 ...
## $ d3 : num 3 4 2 3 2 2 3 6 5 3 ...
## $ d4 : num 3 3 2 5 3 2 3 4 5 6 ...
## $ c1 : num 4 1 1 3 1 1 7 6 7 7 ...
## $ c2 : num 7 1 2 7 1 1 7 7 6 7 ...
## $ c3 : num 4 3 2 7 1 1 7 7 6 7 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ age : int 33 33 33 33 33 33 33 33 33 33 ...
## $ job : Factor w/ 19 levels "Administrative and commercial managers",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ jobOut : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ job.sector : Factor w/ 2 levels "Private","Public": 2 2 2 2 2 2 2 2 2 2 ...
## $ work.hours : int 50 50 50 50 50 50 50 50 50 50 ...
## $ noQs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ respRate : Factor w/ 4 levels "0","1","2","3": 4 4 4 4 4 4 4 4 4 4 ...
Data structure
ID = participant’s anonymized identification code
os = participant’s phone operating system (iOS or Android)
day = day of participation (1, 2 or 3)
day.of.week = weekday (1 = Monday, 3 = Wednesday, 5 = Friday)
within.day = scheduled questionnaire within day (from 1 to 7)
SurveyType = type of ESM questionnaire (“baseline” or “work”)
RunTimestamp and SubmissionTimestamp = date and time of ESM questionnaire initiation and submission
ESM ratings
v1 - f3 = Multidimensional Mood Questionnaire (MDMQ) items measuring Negative Valence, Tense Arousal, and Fatigue
WHAT = work sampling item asking to indicate the type of work task performed in the last 10 min
HOW = work sampling item asking to indicate the means of work used in the last 10 min
WHOM = work sampling item asking to indicate the people involved in the task
nPeople = work sampling item asking to indicate the total number of people present during the task
d1 - d4 = Task Demand Scale (TDS) items
c1 - c3 = Task Control Scale (TCS) items
Demographics (also included in RETROdata)
gender = participant’s gender (M or F)
age = participant’s age (years)
job = participant’s job recoded by using ISCO-08 categories
jobOut = logical variable equal to TRUE for those participants with a job not compatible with our inclusion criteria
job.sector = participant’s job sector (Private or Public)
work.hours = participant’s weekly work hours
Inclusion criteria (also included in RETROdata)
noQs = indicating if the participant filled in the preliminary questionnaire (0) or not (1)
respRate = participant’s response rate
# selecting and sorting columns
RETROdata <- RETROdata[,c("ID","gender","age","job","jobOut","job.sector","work.hours",
paste("JAWS",1:12,sep=""),paste("CBI",1:7,sep=""),
paste("d",1:5,sep=""),paste("c",1:5,sep=""),"noQs","respRate")]
str(RETROdata)
## 'data.frame': 211 obs. of 38 variables:
## $ ID : Factor w/ 211 levels "S001","S002",..: 1 2 176 3 4 5 6 7 177 8 ...
## $ gender : Factor w/ 2 levels "F","M": 2 2 2 1 1 2 1 2 1 1 ...
## $ age : int 33 29 44 42 40 43 59 41 33 33 ...
## $ job : Factor w/ 19 levels "Administrative and commercial managers",..: 16 16 14 7 3 16 4 2 1 1 ...
## $ jobOut : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ job.sector: Factor w/ 2 levels "Private","Public": 2 2 1 1 2 2 1 1 1 1 ...
## $ work.hours: int 50 50 60 40 50 50 46 55 50 40 ...
## $ JAWS1 : int 1 5 3 3 3 3 4 3 3 2 ...
## $ JAWS2 : int 3 5 3 2 2 3 4 3 5 1 ...
## $ JAWS3 : int 1 2 2 2 3 2 4 2 2 1 ...
## $ JAWS4 : int 1 3 2 2 4 2 5 3 3 1 ...
## $ JAWS5 : int 4 3 1 3 2 2 4 3 3 1 ...
## $ JAWS6 : int 4 5 2 2 3 4 3 3 4 2 ...
## $ JAWS7 : int 3 3 4 2 4 4 2 4 2 3 ...
## $ JAWS8 : int 5 3 4 3 4 3 2 4 3 2 ...
## $ JAWS9 : int 4 4 4 2 4 3 3 4 3 3 ...
## $ JAWS10 : int 5 3 4 3 4 4 3 5 3 4 ...
## $ JAWS11 : int 5 3 4 3 4 3 3 4 3 4 ...
## $ JAWS12 : int 1 1 2 3 4 2 1 4 2 4 ...
## $ CBI1 : int 4 5 3 2 3 4 4 3 5 3 ...
## $ CBI2 : int 1 2 1 2 3 3 4 2 4 2 ...
## $ CBI3 : int 2 2 1 2 3 3 4 1 3 2 ...
## $ CBI4 : int 3 3 4 4 4 3 2 5 3 4 ...
## $ CBI5 : int 2 4 4 2 2 3 5 4 4 1 ...
## $ CBI6 : int 2 4 1 2 2 3 4 3 4 1 ...
## $ CBI7 : int 2 3 1 2 3 2 4 1 4 2 ...
## $ d1 : int 4 2 5 3 4 3 4 5 5 2 ...
## $ d2 : int 4 5 4 3 4 4 5 4 5 2 ...
## $ d3 : int 5 4 4 2 3 3 4 4 5 1 ...
## $ d4 : int 5 5 5 3 4 4 4 4 5 2 ...
## $ d5 : int 5 5 3 3 4 4 4 4 5 2 ...
## $ c1 : int 4 4 4 3 4 4 3 5 3 4 ...
## $ c2 : int 4 5 4 3 4 3 3 5 4 4 ...
## $ c3 : int 2 1 2 3 2 3 4 1 3 3 ...
## $ c4 : int 4 5 4 3 4 4 2 5 4 4 ...
## $ c5 : int 5 5 4 3 4 3 2 5 3 4 ...
## $ noQs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ respRate : Factor w/ 4 levels "0","1","2","3": 4 4 1 3 4 4 4 3 1 3 ...
Data structure
ID = participant’s anonymized identification code
Demographics (also included in ESMdata)
gender = participant’s gender (M or F)
age = participant’s age (years)
job = participant’s job recoded by using ISCO-08 categories
jobOut = logical variable equal to TRUE for those participants with a job not compatible with our inclusion criteria
job.sector = participant’s job sector (Private or Public)
work.hours = participant’s weekly work hours
Retrospective ratings
JAWS1 - JAWS12 = Job-related Affective Wellbeing Scale item responses
CBI1 - CBI7 = Copenhagen Burnout Inventory (work-related burnout dimension) item scores
d1 - d5 = Quantitative Workload Inventory item scores
c1 - c5 = Job Control item scores
Inclusion criteria (also included in ESMdata)
noQs = indicating if the participant filled in the preliminary questionnaire (0) or not (1)
respRate = participant’s response rate
Finally, we export the two processed datasets in both .RData and CSV format to be used in the main analyses.
# exporting processed ESMdata
save(ESMdata,file="S5_processedData/ESM_processed.RData")
write.csv(ESMdata,"S5_processedData/ESM_processed.csv")
# exporting processed RETROdata
save(RETROdata,file="S5_processedData/RETRO_processed.RData")
write.csv(RETROdata,"S5_processedData/RETRO_processed.csv")
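A quick way to check an exported CSV file is to read it back and compare it with the original object. The sketch below uses a made-up data frame and a temporary file instead of the S5_processedData folder. It also sets `row.names = FALSE`, which is an assumption of the sketch: with the default settings used above, write.csv() adds the row names as an extra first column.

```r
# toy dataset (made-up values) written to a temporary file
toy <- data.frame(ID = c("S001", "S002"), age = c(33L, 29L),
                  stringsAsFactors = FALSE)
csvPath <- tempfile(fileext = ".csv")
write.csv(toy, csvPath, row.names = FALSE)

# reading the file back and comparing it with the original object
back <- read.csv(csvPath, stringsAsFactors = FALSE)
identical(colnames(back), colnames(toy))
all.equal(back, toy)
```

Note that column classes such as factors and POSIXct time stamps are not preserved in the CSV format, whereas they are in the .RData files.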
Ganzeboom, H. B. (2010). International standard classification of occupations ISCO-08 with ISEI-08 scores. Version of July 27, 2010.
Xiong, H., Huang, Y., Barnes, L. E., & Gerber, M. S. (2016). Sensus: a cross-platform, general-purpose system for mobile crowdsensing in human-subject studies. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 415–426. https://doi.org/10.1145/2971648.2971711