R

Using R and RStudio to Access BurstIQ APIs

With R 3.2 or later, you can exchange data between applications or systems using web technologies such as HTTP and machine-readable file formats such as XML and JSON. Representational State Transfer (REST) is one of the most widely used architectures for implementing and accessing web services. Web services built on the REST architecture are called RESTful web services.

In this paper, we provide the basic steps to connect R to the BurstIQ REST Application Programming Interfaces (APIs) on the BurstIQ platform. The paper includes R code that you can adapt and use immediately in your own scripts.

GETTING STARTED

Prerequisites:

Basic working knowledge of R and comfort with scripting in RStudio or working in the RStudio console.
Internet connection.
A recent version of R (3.2 or newer) installed on your computer.
The httr package. If it is not already installed, install it by copying and pasting the following line into the R console:

install.packages("httr")

httr makes requesting data from just about any API easier by building your GET and POST requests with the proper headers and authentication.

The jsonlite package. If it is not already installed, install it by copying and pasting the following line into the R console:

install.packages("jsonlite")

Many APIs return data in JSON format. To make the data easier to work with, you'll want to convert the JSON from its native nested form to a flat table such as a data frame. The *jsonlite* package makes this easy.
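To see what flattening means in practice, the sketch below parses a small, made-up JSON array with and without `flatten = TRUE`. The field names (`date`, `counts`, `deaths`, `recovered`) are illustrative only and are not the BurstIQ schema. By default, a nested object becomes a data frame stored inside a column of the outer data frame; flattening turns it into ordinary columns named `parent.child`.

```r
library(jsonlite)

# Illustrative JSON: an array of records, each with a nested "counts" object.
# The field names are made up for this example.
json_text <- '[
  {"date": "2020-03-01", "counts": {"deaths": 1, "recovered": 0}},
  {"date": "2020-03-02", "counts": {"deaths": 3, "recovered": 1}}
]'

# Default parsing: "counts" becomes a data frame nested inside the outer one
nested <- fromJSON(json_text)
str(nested)

# flatten = TRUE turns nested fields into ordinary columns named parent.child
flat <- fromJSON(json_text, flatten = TRUE)
names(flat)   # "date" "counts.deaths" "counts.recovered"
```

This is why the get_data() function later in this paper parses the API response with `flatten = TRUE`.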

The getPass package, which prompts for your password without displaying it on screen. Install it by copying and pasting the following line into the R console:

install.packages("getPass")

The tidyverse collection of packages is recommended, but not required. It includes ggplot2, which helps with plots.

install.packages("tidyverse")

CONNECTING TO THE BURSTIQ APIS

The following code is broken into steps so that you can see where to substitute information for your specific needs. The code is ultimately designed so that you can copy all sections into one R script file and run it in a single pass with RStudio's Source button.

Load the Necessary R Packages

#
# Ensure that necessary packages are loaded
#
library("httr")
library("jsonlite")
library("getPass")
library("tidyverse") # optional; delete this line if tidyverse is not installed

You may see messages about conflicts (for example, that dplyr::filter() masks stats::filter()). This is normal.

Prepare and Enter Constants

Enter the base URL of the website, the API path, and the client ID you will use for access. The URL (technically, the address of the API's “endpoint”) tells R where to send requests. Your administrator can provide the information unique to your team's access.

The following values are what you need to access any data set within Research Foundry, regardless of your affiliation or whether your company has its own secure zone.

#
# CONSTANTS/Global variables
#
base_url <- "https://researchfoundry.burstiq.com" # desired URL

path <- "api/burstchain"

client_id <- "8e2ddf65-7f19-457c-9817-59ba9c5d4b36"
# This is the ID you should use to access Research Foundry data sets.
# If you wish to access your own private data set, add your own client_id
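As a quick check of these constants, the lines below repeat the way the get_data() function defined later joins them into the full request URL. The chain name used here is the COVID-19 example that appears later in this paper; substitute your own.

```r
# Same values as the constants above
base_url  <- "https://researchfoundry.burstiq.com"
path      <- "api/burstchain"
client_id <- "8e2ddf65-7f19-457c-9817-59ba9c5d4b36"

# Example chain name (the COVID-19 daily reports data set)
chain_name <- "csse_covid_19_daily_reports"

# get_data() joins the pieces with "/" and appends the "interrogate" action
url <- paste(base_url, path, client_id, chain_name, "interrogate", sep = "/")
print(url)
```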

Specify Your Access Credentials

To access the BurstIQ APIs from R, you need the credentials you created when you first set up your account.

This script prompts you to enter your username and password. You can hard-code your username by replacing the readline() call on the username line with your username in quotes, but you must enter your password at the pop-up prompt so that it is never stored in the script.

# 3 Specify Your User Credentials
#
username <- readline(prompt = "Enter username/email:")
password <- getPass::getPass(msg = "Enter password: ")

Be aware that your username and password will appear as values in the Global Environment. To remove them when you are finished, run rm(username, password).
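If you would rather not type your credentials on every run, one option is to keep them in environment variables (for example, in your ~/.Renviron file) and fall back to the prompts only when they are missing. This is a sketch: the helper read_credential() and the variable names BURSTIQ_USERNAME and BURSTIQ_PASSWORD are hypothetical and are not part of any BurstIQ tooling.

```r
# Hypothetical helper: return the value of an environment variable, or ask
# for it with prompt_fun() if the variable is unset or empty.
read_credential <- function(var, prompt_fun) {
  value <- Sys.getenv(var)   # returns "" when the variable is not set
  if (!nzchar(value)) {
    value <- prompt_fun()
  }
  value
}

# Usage (BURSTIQ_USERNAME / BURSTIQ_PASSWORD are hypothetical names;
# set them in ~/.Renviron or with Sys.setenv()):
# username <- read_credential("BURSTIQ_USERNAME",
#                             function() readline(prompt = "Enter username/email:"))
# password <- read_credential("BURSTIQ_PASSWORD",
#                             function() getPass::getPass(msg = "Enter password: "))
```

Note that a password stored in ~/.Renviron is kept in plain text on disk, so this trades some security for convenience.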

Specify the Data Sets You are Trying to Access

The “chain name” identifies the data set you are trying to access. You can add the chain name directly to the code or enter it at the prompt.

#4 Specify the Data Sets You are Trying to Access
# Enter the chain name, or name of dataset you are requesting

chain_name <- readline(prompt = "Enter the chain name: ")

#Example: csse_covid_19_daily_reports
#

For example, if you wish to access the daily COVID-19 counts, you would enter “csse_covid_19_daily_reports” as the chain name.

Connect R to the BurstIQ Application Programming Interfaces

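Headers are often used to negotiate other parameters that enable the application to communicate with the API successfully; for example, they may describe the format of the data. The get_data() function below sends your query as a POST request to the chain's interrogate endpoint using HTTP basic authentication, with headers stating that the request body is JSON and that JSON is expected in return.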

# 5 Connect R to the BurstIQ Application Programming Interfaces
# Perform a query against Research Foundry
#

get_data <- function(flow, username, password, client_id, chain_name)
{
   # Build the endpoint: <base_url>/<path>/<client_id>/<chain_name>/interrogate
   url <- paste(base_url, path, client_id, chain_name, "interrogate", sep="/")

   print(url)

   # Send the query as JSON in the request body, using HTTP basic authentication
   resp <- POST(
      url = url,
      body = list("queryTqlFlow" = flow),
      authenticate(username, password),
      content_type("application/json"),
      accept("application/json"),
      encode = "json"
   )

   print("Got past the POST call")

   #
   # Return the records, or NULL on failure.
   # Error handling is a simple message to the console (enhance as needed).
   resp_text <- content(resp, as = "text", encoding = "UTF-8")

   if (status_code(resp) == 200) {
      if (length(resp_text) == 0 || !nzchar(resp_text)) {
         print("The data returned is empty!")

         return(NULL)
      }

      data_json <- jsonlite::fromJSON(resp_text, flatten = TRUE)
      # print(data_json)  # uncomment to inspect the full response

      return(data_json$records)
   } else {
      print("Received error!")

      print(status_code(resp))

      print(resp_text)

      return(NULL)
   }
}

#

The status code deserves some special attention. Returned with every response and read here with httr's status_code() function, it is a number that tells the program whether the request succeeded and, if it failed, can indicate why. The number 200 corresponds to a successful request, and get_data() returns records only in that case.
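The query itself travels as the JSON body of the POST request, in a single field named queryTqlFlow. Because get_data() passes `encode = "json"`, httr converts the R list to JSON with jsonlite, writing length-one values as scalars. The sketch below approximates that conversion so you can see roughly what the server receives; the two-line query text is illustrative only, not a complete or validated query, and the exact bytes httr sends may differ slightly.

```r
library(jsonlite)

# Illustrative two-line query, joined with a real newline character
flow <- paste(c("SELECT asset.deaths AS deaths", "ORDER BY deaths"), collapse = "\n")

# Approximate the body built from list("queryTqlFlow" = flow)
body_json <- toJSON(list(queryTqlFlow = flow), auto_unbox = TRUE)
cat(body_json, "\n")
# {"queryTqlFlow":"SELECT asset.deaths AS deaths\nORDER BY deaths"}
```

The newline between query lines is escaped as `\n` inside the JSON string, so a multi-line query arrives intact.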

Specify Variables in the variables.flow File

You can specify the individual variables within the R script or in a separate file. For ease of description, this example lists them in a separate file named variables.flow, written in a SQL-like query language. Save the file in R's working directory (shown by getwd()), because the script reads it by name.

The query below selects variables specific to the Research Foundry example. Make sure to specify your own desired variables in the variables.flow file, not in the R code.

SELECT
   IF(asset.confirmed ISNULL THEN 0 ELSE asset.confirmed) AS confirmed,
   IF(asset.deaths ISNULL THEN 0 ELSE asset.deaths) AS deaths,
   IF(asset.recovered ISNULL THEN 0 ELSE asset.recovered) AS recovered,
   DATEFROMPARTS(
      YEAR(asset.date),
      MONTH:MONTH(asset.date),
      DAY:DAYOFMONTH(asset.date)
   ) AS d

GROUP BY d COMBINE
   FIRST(d) AS d,
   SUM(confirmed) AS total_confirmed,
   SUM(recovered) AS total_recovered,
   SUM(deaths) AS total_deaths

ORDER BY d

GROUP BY ALL COMBINE
   PUSH(d) AS d,
   PUSH(total_confirmed) AS total_confirmed,
   PUSH(total_recovered) AS total_recovered,
   PUSH(total_deaths) AS total_deaths

/*
FINALIZE function(doc) {
   if (doc.d) doc.d = doc.d.getTime();

   return doc;
}
*/
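Because the script reads variables.flow by name from R's working directory, a misplaced file is a common source of errors. A small helper such as the hypothetical read_flow() below (not part of any package) checks that the file exists, reports where R looked, and returns the query as a single string with real newline characters between lines.

```r
# Hypothetical helper: read a query file into one newline-separated string.
read_flow <- function(file = "variables.flow") {
  if (!file.exists(file)) {
    stop("Cannot find '", file, "' in the working directory (", getwd(),
         "). Save the file there or pass its full path.")
  }
  # readLines() returns one element per line; collapse = "\n" rejoins them
  paste(readLines(file, warn = FALSE), collapse = "\n")
}

# Usage with the file name used in this paper:
# query <- read_flow("variables.flow")
```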

Pull Desired Variables into Memory

This code reads the query from variables.flow, prints it so you can check it, sends it to the API with get_data(), and stores the returned records in memory. The request may take a few seconds.

# 7 Pull Desired Variables into Memory
#
print("Run the query to evaluate the variables")
print("SQL Command being requested:")

#
# Check the contents of the variables.flow file
tmp_query <- readLines("variables.flow", warn = FALSE)
print(tmp_query)

#
# Join the lines into a single query string, separated by newlines
query <- paste(tmp_query, collapse = "\n")

print("Getting data")

#
records <- get_data(query, username, password, client_id, chain_name)

#
# If there was a problem, stop the script with a message
if (is.null(records)) {
   stop("The get_data() function failed: the returned data is NULL. Aborting program.")
}

#If everything was fine, specify that it was a successful data pull
print("The data were obtained successfully.")

#

When running the code, you may see the message “No encoding supplied: defaulting to UTF-8,” which is harmless. What matters is whether the script reports that the data were obtained successfully.

Create the Data Frame

This code “unpacks” the variables named in the variables.flow file. Remember to substitute your own variable names.

# 8 Create the Data Frame
#

y <- unlist(records$total_deaths)  # total deaths, one value per date
x <- seq_along(y)                   # day index: 1, 2, 3, ...

#

The next script takes the columns and variable information and creates a flat data table.

# Create a simple data frame of day index (x) and total deaths (y)
deaths_by_day <- data.frame(x, y)

#
# Obtain names of all columns

names(records)

#
# Extract the date column (d) from the records
d <- unlist(records$d)

#
# Because one variable is a date, convert date from number of seconds to year/m/d

dts <- as.POSIXct(d, origin = "1970-01-01")

#
total_confirmed <- unlist(records$total_confirmed)
total_recovered <- unlist(records$total_recovered)
total_deaths <- unlist(records$total_deaths)

frame <- data.frame(dts, total_confirmed, total_recovered, total_deaths)

#
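Because the exact shape of records depends on your query, it can help to test the unpacking steps against a small stand-in object before calling the API. The sketch below wraps the steps above in a hypothetical helper, records_to_frame(), and runs it on a made-up list whose fields mirror the names in variables.flow. The numbers are not real data, the one-record-of-vectors shape is an assumption about what the PUSH() aggregation returns, and the dates are assumed to be seconds since 1970-01-01, as the comment in the script states.

```r
# Hypothetical helper mirroring the steps above: unlist each field and
# convert the date column from seconds since 1970-01-01 to POSIXct.
records_to_frame <- function(records, tz = "") {
  data.frame(
    dts             = as.POSIXct(unlist(records$d), origin = "1970-01-01", tz = tz),
    total_confirmed = unlist(records$total_confirmed),
    total_recovered = unlist(records$total_recovered),
    total_deaths    = unlist(records$total_deaths)
  )
}

# Made-up stand-in for the API result (not real data)
mock_records <- list(
  d               = list(c(1583020800, 1583107200)),  # 2020-03-01 and 2020-03-02, UTC
  total_confirmed = list(c(10, 15)),
  total_recovered = list(c(0, 1)),
  total_deaths    = list(c(1, 3))
)

mock_frame <- records_to_frame(mock_records, tz = "UTC")
print(mock_frame)
```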

View the Data Table and Summary

This script allows you to view the newly created data table and summary statistics. Remember to specify your own variable names.

# 9 View the Data Table and Summary
# Obtain some information about the dataset
#
View(frame) # to visualize the data table (base R; does not need tidyverse)
dim(frame) # number of rows and columns
str(frame) # describe the structure of the data set
summary(frame) # shows the minimum, 1st quartile, median, mean, 3rd quartile, and maximum

#
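The tidyverse is recommended mainly for plotting. As a sketch, the hypothetical helper below draws total deaths over time with ggplot2 when it is installed and falls back to base R graphics otherwise. It assumes a data frame with the dts and total_deaths columns created above.

```r
# Hypothetical helper: line plot of total deaths over time.
# Expects a data frame with columns dts (POSIXct) and total_deaths.
plot_total_deaths <- function(frame) {
  if (requireNamespace("ggplot2", quietly = TRUE)) {
    p <- ggplot2::ggplot(frame, ggplot2::aes(x = dts, y = total_deaths)) +
      ggplot2::geom_line() +
      ggplot2::labs(x = "Date", y = "Total deaths")
    print(p)   # ggplot objects are drawn when printed
    return(invisible(p))
  }
  # Base R fallback if ggplot2 is not installed
  plot(frame$dts, frame$total_deaths, type = "l",
       xlab = "Date", ylab = "Total deaths")
  invisible(NULL)
}

# Usage after building 'frame' above:
# plot_total_deaths(frame)
```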

TROUBLESHOOTING

If you cannot connect, the cause is almost always one of the following:

You are not using a recent enough version of R or RStudio. The version of R must be 3.2 or newer.
You have not been granted access to the desired data set. If you have not been granted access, or your access has expired, you will not be able to retrieve data.
You cannot reach the web service through a corporate firewall. A firewall or security setting may be blocking access through your network gateway; contact your IT administrator if the path is blocked.
SSL support is not installed or configured (required for HTTPS access). This is rarely an issue on Windows systems, but on UNIX systems an administrator may need to configure SSL certificates.
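The checks below are a sketch for working through this list from the R console. They report the R version and the SSL library that the curl package (which httr uses) was built with, and define a hypothetical check_endpoint() helper that returns either an HTTP status code or the error message from a connection attempt. Any status code, even 401 or 403, usually means the network path works and the problem lies with credentials or access; an error message such as a timeout points toward firewall, proxy, or SSL problems.

```r
# 1. R version (the scripts in this paper need 3.2 or newer)
r_ok <- getRversion() >= "3.2.0"
cat("R version:", as.character(getRversion()), if (r_ok) "(OK)" else "(too old)", "\n")

# 2. SSL/TLS library used by curl, which httr relies on for HTTPS
if (requireNamespace("curl", quietly = TRUE)) {
  cat("SSL library:", curl::curl_version()$ssl_version, "\n")
} else {
  cat("The curl package is not installed; installing httr also installs it.\n")
}

# 3. Hypothetical helper: status code on success, error message on failure
check_endpoint <- function(url, seconds = 10) {
  tryCatch(
    httr::status_code(httr::GET(url, httr::timeout(seconds))),
    error = function(e) conditionMessage(e)
  )
}

# Usage:
# check_endpoint("https://researchfoundry.burstiq.com")
```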

ACKNOWLEDGEMENTS

We are grateful to Hiral Desai, PhD, for her kind assistance in reviewing and improving the instructions for this sample code.

TERMINOLOGY

API

An Application Programming Interface, or API, is an interface between two or more applications: a set of rules that allows multiple applications to communicate with each other. An API can be as simple as returning data from a database, or it can perform complex calculations and return the results. Applications can connect only to the endpoints the API exposes for posting or reading data, which gives a controlled and more secure way for applications to interoperate.

Authentication

Authentication is the process of identifying the client making a request. HTTP supports multiple authentication schemes, such as anonymous authentication and basic authentication.

In basic authentication, passwords are encoded but not encrypted, so basic authentication on its own is not considered secure. In combination with HTTPS it may be enough for internal applications, but very few public APIs rely on basic authentication. Most use the anonymous scheme at the HTTP level and handle authentication at the application level.
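To see why basic authentication is only encoded, the sketch below builds the Authorization header by hand with jsonlite::base64_enc(), using the sample credentials from the example in RFC 7617 (the Basic authentication specification). The authenticate() call in get_data() asks curl to send the same kind of header, which is why the connection must use HTTPS.

```r
library(jsonlite)

# Sample credentials from the RFC 7617 example (not real credentials)
user <- "Aladdin"
pass <- "open sesame"

# Basic authentication: base64("username:password"), prefixed with "Basic "
header_value <- paste0("Basic ", base64_enc(paste0(user, ":", pass)))
header_value   # "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="

# Anyone who sees the header can reverse the encoding
rawToChar(base64_dec(sub("^Basic ", "", header_value)))   # "Aladdin:open sesame"
```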

Endpoint

The endpoint is the internet address where the web service can be accessed. It is a Uniform Resource Locator (URL) and typically has the following format:

https://api.example.url/users/memberships?type=free&sort=lastname

In the above example, the root endpoint is https://api.example.url, and /users/memberships is the path to a specific web service. The final part of the endpoint, ?type=free&sort=lastname, is optional. This is the query string, which can be used to pass parameters to the web service.
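The httr package can assemble and take apart URLs like this one. The sketch below builds the example endpoint from its root, path, and query parameters with modify_url(), then splits it again with parse_url(); api.example.url is the placeholder host from the example above, and no request is sent.

```r
library(httr)

# Build the example endpoint from its parts
u <- modify_url("https://api.example.url",
                path  = "users/memberships",
                query = list(type = "free", sort = "lastname"))
u   # "https://api.example.url/users/memberships?type=free&sort=lastname"

# Split it back into root host, path, and query parameters
parts <- parse_url(u)
parts$hostname     # "api.example.url"
parts$path         # "users/memberships"
parts$query$type   # "free"
```

Passing query parameters as a list lets httr handle URL escaping, which is safer than pasting strings together when values contain spaces or special characters.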

Header

Headers carry metadata about a request or response. For example, the following Content-Type header tells the server to expect data in JSON format:

Content-Type: application/json

HTTP(S)

HTTP stands for HyperText Transfer Protocol. It is a client-server protocol that is the foundation of any data exchange on the web, and web services also rely on HTTP to exchange data between client and server. HTTP sends information in plain text and is not secure; HTTPS is the secure variant that encrypts data in transit. R supports both HTTP and HTTPS (for example, through the httr package), making it possible to access web-based data using R.

HTTP Status Codes

The status codes are part of the HTTP protocol and can be used to determine quickly if a request has been completed successfully or failed and why. The status codes are grouped in five classes:

Informational responses (100–199)
Successful responses (200–299)
Redirects (300–399)
Client errors (400–499)
Server errors (500–599)

Most status codes are defined in the HTTP/1.1 standard (RFC 7231, since superseded by RFC 9110), but servers can return non-standard codes. If a code is not standard, the client can still determine the general type of response from its class, that is, from its first digit.
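Classifying a code by its first digit takes only integer division. The hypothetical helper below (not part of httr) labels codes with the five classes listed above; a non-standard code such as 299 still lands in the right class, and anything outside 100 to 599 is reported as "Unknown".

```r
# Map HTTP status codes to the five classes, using the first digit.
status_class <- function(code) {
  classes <- c("Informational", "Successful", "Redirect",
               "Client error", "Server error")
  idx <- code %/% 100                       # e.g. 404 %/% 100 is 4
  out <- rep("Unknown", length(code))
  ok <- !is.na(idx) & idx >= 1 & idx <= 5
  out[ok] <- classes[idx[ok]]
  out
}

status_class(c(200, 404, 503, 299, 799))
# returns "Successful", "Client error", "Server error", "Successful", "Unknown"
```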

JSON

JSON (JavaScript Object Notation) is an open-standard file and data interchange format that uses human-readable text to transmit or store data objects consisting of attribute–value pairs and arrays. It is lightweight and the most common data format used by REST web services; the rise of REST went hand in hand with the adoption of JSON for data exchange.
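As a small illustration of attribute–value pairs and arrays, the sketch below parses a made-up JSON object with jsonlite and converts it back. The object becomes a named R list, and the array becomes an R vector.

```r
library(jsonlite)

# Made-up JSON object: two attribute-value pairs, one of which holds an array
txt <- '{"chain": "csse_covid_19_daily_reports", "total_deaths": [1, 3, 6]}'

obj <- fromJSON(txt)
obj$chain          # a length-one character vector
obj$total_deaths   # the array becomes a vector: 1 3 6

# And back again; auto_unbox = TRUE writes length-one vectors as scalars
toJSON(obj, auto_unbox = TRUE)
# {"chain":"csse_covid_19_daily_reports","total_deaths":[1,3,6]}
```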

Request

A request consists of 4 elements:

endpoint
header
method
data (optional)

Response

The response consists of a header and data. Each response also has a status code indicating how the request was handled.

REST

REST determines what the API looks like. REST stands for Representational State Transfer, and it is one of the most widely used API standards for web applications. The REST architecture is based on a client/server model, and a stateless protocol is used for communication between client and server. Accessing a REST web service is called a request; the data returned by the web service is the response.
