Take-home Exercise 4

In this take-home exercise 4, we will examine the daily routines of two selected participant of the city of Engagement, Ohio USA.

Hulwana Saifulzaman https://www.linkedin.com/in/hulwana-saifulzaman/ (SMU, Master of IT in Business)https://scis.smu.edu.sg/master-it-business
2022-05-23

1. Overview

In this take-home exercise 4, we will examine the daily routines of two selected participant of the city of Engagement, Ohio USA.

2. Getting Started

Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code below will do the trick.

packages = c( 'lubridate', 'patchwork', 
             'gridExtra', 'tidyverse','knitr',
             'data.table', 'clock')

for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

3. Dataset

The TravelJournal and CheckinJournal dataset has been obtained from the Journals folder retrieved from Vast Challenge 2022 website.

3.1. Importing Data

The code chunk below import TravelJournal.csv and CheckinJournal.csv from the data folder by using read_csv() of readr into R and save it as an tibble data frame called travel and venue.

travel <- read_csv("data/TravelJournal.csv")

venue <- read_csv("data/CheckinJournal.csv")

3.2. Data Preparation

3.2.1. Extract the venue type from venue dataset

As the end location of the participants are not readily identifiable in the travel dataset, we will mapped the travelEndLocationId to the venue type obtained from the venue dataset.

Below is the code to get the distinct venueId and its corresonpding venueType.

venue_map <- venue %>% select(venueId, venueType) %>% distinct()

3.2.2. Join the 2 datasets

Left join the travel dataset with venue_map to identify the end location.

location <- merge(travel, venue_map, by.x = 'travelEndLocationId', by.y = 'venueId', all.x = TRUE)

3.2.3. Identify the times where participants are travelling

As there are 2 sets of start and end time for each observation, we will first need to get the interval where participants are travelling.

transport <- location %>%
  select(participantId, travelStartTime, travelEndTime) %>%
  mutate(activity = "Transport")

names(transport)[2] <- "startTime"
names(transport)[3] <- "endTime"

3.2.4. Identify the times where participants are at a certain loacation

place <- location %>%
  select(participantId, checkInTime, checkOutTime, venueType)

names(place)[2] <- "startTime"
names(place)[3] <- "endTime"
names(place)[4] <- "activity"

3.2.5. Merge both transport and place

activity <- rbind(place, transport)

4. Comparing the daily pattern of 2 participants

4.1 Weekly pattern for participantId 100 and 200

Participant with ID 100 and 200 are selected for comparison. We will compare their weekly pattern between 7 March 2022 - 13 March 2022.

To examine the pattern and duration spent by each participant at the respective location, we will use geom_segment() from the ggplot2 package.

4.1.1. Extract weekly data for participant 100 and 200

pattern <- activity %>% filter(participantId == 100 | participantId == 200) %>%
  filter(as.character(endTime) > "2022-03-07 00:00:00") %>%
  filter(as.character(startTime) < "2022-03-14 00:00:00")

4.1.2. Plot the weekly routine for participant 100 and 200

pl <- ggplot(pattern) +
geom_segment(aes(x=activity, xend=activity, y=startTime, yend=endTime, color = activity), size = 10) +
geom_hline(yintercept = as_datetime("2022-03-07 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
geom_hline(yintercept = as_datetime("2022-03-08 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
geom_hline(yintercept = as_datetime("2022-03-09 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
geom_hline(yintercept = as_datetime("2022-03-10 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
geom_hline(yintercept = as_datetime("2022-03-11 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
geom_hline(yintercept = as_datetime("2022-03-12 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
geom_hline(yintercept = as_datetime("2022-03-13 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
theme_bw() +
facet_grid(participantId ~ .)+
labs(x = 'Location', y = "Date Time", title = "Weekly routine of participantId 100 and 200" ) +
coord_flip() 

pl

4.2.1. Extract 8 March 2022 data for participant 100 and 200

pattern2 <- activity %>% filter(participantId == 100 | participantId == 200) %>%
  filter(as.character(endTime) > "2022-03-08 00:00:00") %>%
  filter(as.character(endTime) < "2022-03-09 08:00:00")

4.2.2. Plot Tuesday, 8 March 2022 routine for both participants

pl2 <- ggplot(pattern2) +
geom_segment(aes(x=activity, xend=activity, y=startTime, yend=endTime, 
                             color = activity), size = 10) +
geom_hline(yintercept = as_datetime("2022-03-07 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
geom_hline(yintercept = as_datetime("2022-03-08 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
geom_hline(yintercept = as_datetime("2022-03-09 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
geom_hline(yintercept = as_datetime("2022-03-10 00:00:00"), color = "black", 
           linetype = 'dashed', size = 1, alpha = 0.5) +
theme_bw() +
facet_grid(participantId ~ .)+
labs(x = 'Location', y = "Date Time", title = "Routine of participantId 100 and 200 on 8 March 2022" ) +
coord_flip() 

pl2

Based on the chart plotted above, we notice the following similarities and differences between participantId 100 and 200:

Work

● Similarities : Both participant 100 and 200 work 5 days a week. ● Differences : Participant 100 works from Tuesday - Saturday whereas participant 200 works from Monday - Friday. Participant 100 starts and end work slightly later as compared to participant 200.

Lunch

● Differences : Participant 200 goes for lunch at a much later timing as compared to participant 100. Participant 200 takes lesser time to travel from work to the eating place as compares to participant 100.

Commute home from work

● Similarities : The commute home from work is roughly the same for both participants.

Going Home

● Differences : As participant 200 ends work slightly earlier, participant 200 also reaches home slighlty earlier than participant 100.

Recreation

● Similarities : Both participants do not go to pubs on working days. ● Differences : For this particular week, participant 200 does not go to the pub whereas particpant 100 goes on non-working days.

Weekends

● Differences : Participant 200 spent the weekend at home whereas participant 100 went out to the restaurant and pubs.