In Take-home Exercise 4, we examine the daily routines of two selected participants in the city of Engagement, Ohio, USA.
Before we get started, we need to ensure that the required R packages are installed. Packages that are already installed are simply loaded; any that are missing are installed first and then loaded into the R environment.
The code chunk below does this.
packages <- c('lubridate', 'patchwork',
              'gridExtra', 'tidyverse', 'knitr',
              'data.table', 'clock')
for (p in packages) {
  # require() returns FALSE if the package cannot be loaded, so install it first
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The TravelJournal and CheckinJournal datasets were obtained from the Journals folder of the data retrieved from the VAST Challenge 2022 website.
The code chunk below imports TravelJournal.csv and CheckinJournal.csv from the data folder into R using read_csv() from the readr package, and saves them as tibble data frames called travel and venue, respectively.
travel <- read_csv("data/TravelJournal.csv")
venue <- read_csv("data/CheckinJournal.csv")
As the end location of each trip is not readily identifiable in the travel dataset, we map travelEndLocationId to the venue type obtained from the venue dataset.
The code below extracts the distinct venueId values and their corresponding venueType.
venue_map <- venue %>% select(venueId, venueType) %>% distinct()
We then left-join the travel dataset with venue_map to identify the type of each end location.
location <- merge(travel, venue_map, by.x = 'travelEndLocationId', by.y = 'venueId', all.x = TRUE)
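To see what the left join does, the toy example below uses made-up travel and venue rows (not drawn from the dataset) together with dplyr's left_join(), the tidyverse counterpart of merge(..., all.x = TRUE). Every travel row is kept, and an end location with no match in venue_map gets venueType = NA. One difference worth knowing: left_join() keeps the row order of the left table, whereas merge() sorts the result by the key columns by default.

```r
library(dplyr)

# Made-up stand-ins for `travel` and `venue_map`
toy_travel <- tibble(
  participantId       = c(100, 100, 200),
  travelEndLocationId = c(1, 2, 3)
)
toy_venue_map <- tibble(
  venueId   = c(1, 2),
  venueType = c("Workplace", "Pub")
)

# Keep every travel row; location 3 has no venue, so its venueType is NA
toy_joined <- left_join(toy_travel, toy_venue_map,
                        by = c("travelEndLocationId" = "venueId"))
toy_joined
```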
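As each observation has two sets of start and end times, one for the trip and one for the stay at the destination, we first need to separate the intervals when participants are travelling from the time they spend at a location, and then stack both into a single activity table.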
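The code that builds place and transport does not appear in this section. The sketch below is one possible way to construct them, not necessarily the author's code. It assumes that the travel journal stores the trip in travelStartTime/travelEndTime and the stay at the destination in checkInTime/checkOutTime (these column names are an assumption), and that both tables share the columns participantId, startTime, endTime and activity used by the plots. make_place() and make_transport() are hypothetical helper names, and the rows are made-up toy data.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: time spent at the destination venue
# (assumes columns checkInTime, checkOutTime and venueType exist)
make_place <- function(df) {
  df %>%
    transmute(participantId,
              startTime = checkInTime,
              endTime   = checkOutTime,
              activity  = venueType)
}

# Hypothetical helper: time spent travelling to that venue
# (assumes columns travelStartTime and travelEndTime exist)
make_transport <- function(df) {
  df %>%
    transmute(participantId,
              startTime = travelStartTime,
              endTime   = travelEndTime,
              activity  = "Transport")
}

# Made-up rows standing in for `location`
toy_location <- tibble(
  participantId   = c(100, 100),
  venueType       = c("Workplace", "Restaurant"),
  travelStartTime = as_datetime(c("2022-03-08 07:40:00", "2022-03-08 12:00:00")),
  travelEndTime   = as_datetime(c("2022-03-08 08:00:00", "2022-03-08 12:10:00")),
  checkInTime     = as_datetime(c("2022-03-08 08:00:00", "2022-03-08 12:10:00")),
  checkOutTime    = as_datetime(c("2022-03-08 12:00:00", "2022-03-08 13:00:00"))
)

# Stack both kinds of interval into one long table, as rbind(place, transport) does
toy_activity <- rbind(make_place(toy_location), make_transport(toy_location))
toy_activity
```

On the real data the helpers would be called as place <- make_place(location) and transport <- make_transport(location), provided the assumed column names match.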
activity <- rbind(place, transport)
Participants 100 and 200 are selected for comparison. We compare their weekly patterns from 7 March 2022 to 13 March 2022.
To examine the pattern of each participant's activities and the duration spent at each location, we use geom_segment() from the ggplot2 package.
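# Keep participants 100 and 200, and every interval that overlaps 7-13 March 2022
# (compare datetimes directly instead of their character representations)
pattern <- activity %>%
  filter(participantId %in% c(100, 200),
         endTime > as_datetime("2022-03-07 00:00:00"),
         startTime < as_datetime("2022-03-14 00:00:00"))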
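The end-time and start-time conditions above keep every activity that overlaps the week, not only those lying entirely inside it: an interval overlaps the window when it ends after the window opens and starts before the window closes. The toy intervals below are made up to show which rows survive that test.

```r
library(dplyr)
library(lubridate)

week_start <- as_datetime("2022-03-07 00:00:00")
week_end   <- as_datetime("2022-03-14 00:00:00")

# Made-up intervals: 1 ends before the week, 2 straddles its start,
# 3 lies inside it, 4 starts after it
toy_intervals <- tibble(
  id        = 1:4,
  startTime = as_datetime(c("2022-03-06 10:00:00", "2022-03-06 22:00:00",
                            "2022-03-09 09:00:00", "2022-03-14 08:00:00")),
  endTime   = as_datetime(c("2022-03-06 12:00:00", "2022-03-07 07:00:00",
                            "2022-03-09 17:00:00", "2022-03-14 09:00:00"))
)

# Overlap test: ends after the window opens AND starts before it closes
in_week <- toy_intervals %>%
  filter(endTime > week_start, startTime < week_end)
in_week$id
```

Interval 2 is kept even though it begins on 6 March, which is why the weekly chart can show bars starting before the first dashed midnight line.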
pl <- ggplot(pattern) +
  # Draw each activity interval as a thick segment along the time axis
  # (`linewidth` replaces the deprecated `size` for lines in ggplot2 >= 3.4.0)
  geom_segment(aes(x = activity, xend = activity, y = startTime, yend = endTime,
                   color = activity), linewidth = 10) +
  # Dashed reference line at midnight of each day from 7 to 13 March 2022
  geom_hline(yintercept = seq(as_datetime("2022-03-07 00:00:00"),
                              as_datetime("2022-03-13 00:00:00"), by = "day"),
             color = "black", linetype = 'dashed', linewidth = 1, alpha = 0.5) +
  theme_bw() +
  facet_grid(participantId ~ .) +
  labs(x = 'Location', y = "Date Time",
       title = "Weekly routine of participantId 100 and 200") +
  # Put time on the horizontal axis and locations on the vertical axis
  coord_flip()
pl
# Keep participants 100 and 200, and intervals ending between
# 8 March 00:00 and 9 March 08:00 (datetimes compared directly)
pattern2 <- activity %>%
  filter(participantId %in% c(100, 200),
         endTime > as_datetime("2022-03-08 00:00:00"),
         endTime < as_datetime("2022-03-09 08:00:00"))
pl2 <- ggplot(pattern2) +
  # Draw each activity interval as a thick segment along the time axis
  # (`linewidth` replaces the deprecated `size` for lines in ggplot2 >= 3.4.0)
  geom_segment(aes(x = activity, xend = activity, y = startTime, yend = endTime,
                   color = activity), linewidth = 10) +
  # Dashed reference line at midnight of each day from 7 to 10 March 2022
  geom_hline(yintercept = seq(as_datetime("2022-03-07 00:00:00"),
                              as_datetime("2022-03-10 00:00:00"), by = "day"),
             color = "black", linetype = 'dashed', linewidth = 1, alpha = 0.5) +
  theme_bw() +
  facet_grid(participantId ~ .) +
  labs(x = 'Location', y = "Date Time",
       title = "Routine of participantId 100 and 200 on 8 March 2022") +
  # Put time on the horizontal axis and locations on the vertical axis
  coord_flip()
pl2
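The charts show durations visually. To compare them numerically, one option (not part of the original analysis) is to total the hours each participant spends on each activity. The sketch below runs on made-up rows; on the real data it would be applied to pattern, assuming the columns participantId, activity, startTime and endTime used in the plots.

```r
library(dplyr)
library(lubridate)

# Made-up rows standing in for `pattern`
toy_pattern <- tibble(
  participantId = c(100, 100, 200),
  activity      = c("Workplace", "Transport", "Workplace"),
  startTime     = as_datetime(c("2022-03-08 09:00:00", "2022-03-08 08:30:00",
                                "2022-03-08 08:00:00")),
  endTime       = as_datetime(c("2022-03-08 17:00:00", "2022-03-08 09:00:00",
                                "2022-03-08 16:00:00"))
)

# Length of each interval in hours, then summed per participant and activity
hours_by_activity <- toy_pattern %>%
  mutate(hours = as.numeric(difftime(endTime, startTime, units = "hours"))) %>%
  group_by(participantId, activity) %>%
  summarise(total_hours = sum(hours), .groups = "drop")
hours_by_activity
```

Intervals that cross the edges of the week are counted in full here; clipping startTime and endTime to the week before summing would give tighter totals.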
Based on the charts above, we note the following similarities and differences between participants 100 and 200:
● Similarities: Both participants 100 and 200 work 5 days a week.
● Differences: Participant 100 works from Tuesday to Saturday, whereas participant 200 works from Monday to Friday. Participant 100 also starts and ends work slightly later than participant 200.
● Differences: Participant 200 goes for lunch much later than participant 100, and takes less time to travel from work to the eating place.
● Similarities: The commute home from work takes roughly the same time for both participants.
● Differences: As participant 200 ends work slightly earlier, participant 200 also reaches home slightly earlier than participant 100.
● Similarities: Neither participant goes to a pub on working days.
● Differences: In this particular week, participant 200 does not go to a pub at all, whereas participant 100 goes on non-working days.
● Differences: Participant 200 spends the weekend at home, whereas participant 100 goes out to restaurants and pubs.