Get Google Drive ID for latest or archive DoH Data Drop folders

The DoH Data Drop is distributed through Google Drive. Each release of the latest data is published in a new Google Drive folder, while older releases are archived in a single persistent Google Drive folder.

datadrop_id_latest(verbose = TRUE)

datadrop_id_archive(verbose = TRUE, .date = NULL)

datadrop_id(verbose = TRUE, version = c("latest", "archive"), .date = NULL)

datadrop_id_file(tbl, fn)

Arguments

verbose	Logical. Should messages on operation progress be shown? Default is TRUE.
.date	A character value for a date in YYYY-MM-DD format. This is the date of the archive DoH Data Drop for which an ID is to be returned. Should be specified when using `datadrop_id_archive()`. For `datadrop_id()`, only used when `version` is set to `archive`; otherwise ignored.
version	A character value specifying whether to get the latest available DoH Data Drop (`latest`) or a DoH Data Drop archive (`archive`). Defaults to `latest`.
tbl	A tibble, as produced by `datadrop_ls()`, listing the files within a particular DoH Data Drop Google Drive folder.
fn	A character string of one or more words used to match the name of a file within the DoH Data Drop Google Drive folder listed in `tbl`.

Value

For datadrop_id_latest(), datadrop_id_archive() and datadrop_id(): a 33-character string giving the Google Drive ID of the latest DoH Data Drop or of the archive DoH Data Drop.

For datadrop_id_file(): a 33-character string giving the Google Drive ID of the specified DoH Data Drop file. If fn matches more than one file, a vector of 33-character strings giving the Google Drive IDs of all matching files.
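The matching behaviour described above (one ID for a single match, a vector of IDs for several) can be illustrated with a minimal sketch. It assumes that `tbl` has `name` and `id` columns, as the dribble returned by googledrive::drive_ls() does, and that matching is a plain substring match; the helper `match_file_id()` and the file names below are hypothetical, not the package's actual implementation.

```r
# Hypothetical helper (not part of the package): return the IDs of every
# file whose name contains `fn`, assuming `tbl` has `name` and `id` columns.
match_file_id <- function(tbl, fn) {
  # Plain substring match (no regular expressions) on the file names
  hits <- grepl(fn, tbl$name, fixed = TRUE)
  # One ID for a single match, a vector of IDs for several matches,
  # and a zero-length vector when nothing matches
  tbl$id[hits]
}

# Made-up listing standing in for datadrop_ls() output
files <- data.frame(
  name = c("Case Information.csv", "Testing Aggregates.csv",
           "Case Information (October).csv"),
  id = c("id_a", "id_b", "id_c"),
  stringsAsFactors = FALSE
)

match_file_id(files, "Case Information")
```

A substring match keeps the interface forgiving: a short phrase such as "Case Information" is enough, at the cost of possibly returning several IDs when file names overlap.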

Details

The Philippines Department of Health (DoH) currently distributes the latest Data Drop via a fixed shortened URL (bit.ly/DataDropPH). This URL points to a new Google Drive endpoint every day, or whenever the daily updated Data Drop becomes available. The endpoint is a README document in portable document format (PDF) containing a privacy and confidentiality statement, technical notes on the latest data, technical notes on previous (archive) data, and two shortened URLs: one linking to the Google Drive folder that contains all the latest officially released datasets, and the other linking to the previously released datasets (archives). Of these, the shortened URL for the latest datasets changes with every release and can only be obtained from the README document released for that day.
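Pulling the shortened URLs out of the README could look roughly like the sketch below. It assumes the PDF text has already been extracted as a character vector (for example with pdftools::pdf_text(), one element per page) and that the links appear as bit.ly URLs; the helper `extract_short_urls()` and the sample text are hypothetical.

```r
# Hypothetical helper (not part of the package): find bit.ly links in text
# already extracted from the README PDF (one element per page).
extract_short_urls <- function(txt) {
  # Join all pages into a single string so one search covers everything
  txt <- paste(txt, collapse = "\n")
  # Match bit.ly links, with or without an http(s):// prefix
  pattern <- "(https?://)?bit\\.ly/[A-Za-z0-9_-]+"
  urls <- regmatches(txt, gregexpr(pattern, txt))[[1]]
  # Drop repeated mentions of the same link
  unique(urls)
}

# Made-up README text; the first short code is invented for illustration
readme_txt <- c("Link to latest data: bit.ly/exampleLatest123",
                "Link to archives: https://bit.ly/DataDropArchives")
extract_short_urls(readme_txt)
```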

The function datadrop_id_latest() reads the README PDF file, extracts the shortened URL for the latest officially released datasets, expands that shortened URL and then extracts the unique Google Drive ID for the latest officially released datasets. Other functions can then use this Google Drive ID to retrieve information and data from the corresponding Google Drive folder.
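The final step, taking the Google Drive ID out of an expanded URL, can be sketched as a regular expression match. This assumes the shortened URL expands to a folder URL of the form https://drive.google.com/drive/folders/<ID> and that the ID is 33 characters long; the helper `extract_drive_id()` and the example ID are hypothetical, and the expansion of the shortened URL itself is not shown.

```r
# Hypothetical helper (not part of the package): extract the 33-character
# Google Drive folder ID from an expanded folder URL.
extract_drive_id <- function(url) {
  # Capture the 33 URL-safe characters that follow "folders/"
  m <- regmatches(url, regexec("folders/([A-Za-z0-9_-]{33})", url))
  # Return the captured group, or NA when a URL does not match
  vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_,
         character(1))
}

# Made-up expanded URL with an invented 33-character ID
url <- "https://drive.google.com/drive/folders/1AbCdEfGhIjKlMnOpQrStUvWxYz012345?usp=sharing"
extract_drive_id(url)
```

The same extraction would apply to the archive folder, since its fixed shortened URL also resolves to a Google Drive folder URL.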

The DoH Data Drop archives, on the other hand, are distributed via a fixed shortened URL (bit.ly/DataDropArchives) that points to a Google Drive folder containing previous DoH Data Drop releases.

The function datadrop_id_archive() expands that shortened URL and then extracts the unique Google Drive ID for the DoH Data Drop archives folder. Other functions can then use this Google Drive ID to retrieve information and data from that Google Drive folder.

Author

Ernest Guevarra

Examples

if (FALSE) {
  library(googledrive)

  ## Deauthorise
  googledrive::drive_deauth()

  ## Two ways to get the Google Drive ID of the latest DoH Data Drop
  datadrop_id_latest()
  datadrop_id()

  ## Two ways to get the Google Drive ID of the archive DoH Data Drop for
  ## 1 November 2020
  datadrop_id_archive(.date = "2020-11-01")
  datadrop_id(version = "archive", .date = "2020-11-01")
}

if (FALSE) {
  library(googledrive)

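  ## Configure an API key, read here from the GOOGLEDRIVE_TOKEN
  ## environment variable
  googledrive::drive_auth_configure(api_key = Sys.getenv("GOOGLEDRIVE_TOKEN"))

  ## Deauthorise so that requests are sent with the API key rather than
  ## a user token
  googledrive::drive_deauth()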

  ## Typical workflow
  tbl <- datadrop_ls(id = datadrop_id())
  datadrop_id_file(tbl = tbl, fn = "Case Information")

  ## Piped workflow using magrittr %>%
  library(magrittr)

  ## Get the id for the latest Case Information file
  datadrop_id() %>%
    datadrop_ls() %>%
    datadrop_id_file(fn = "Case Information")
}