Chris Bail, PhD
Duke University
www.chrisbail.net
github.com/cbail
twitter.com/chris_bail

Introduction

Application Programming Interfaces, or APIs, have become one of the most important ways to access and transfer data online, and increasingly APIs can analyze your data as well. Compared to screen-scraping, which is often illegal, logistically difficult, or both, APIs are a useful tool for making custom requests for data in a manner that is well structured and considerably easier to work with than the HTML or XML data described in my previous tutorials on screen-scraping. This tutorial assumes basic knowledge of R and the other skills described in the previous tutorials at the link above.

What Is an Application Programming Interface?

APIs are tools for building apps or other forms of software that help people access certain parts of large databases. Software developers can combine these tools in various ways, or combine them with tools from other APIs, to build even more useful tools. Most of us use such apps every day. For example, if you install the Spotify app within your Facebook page to share music with your friends, this app extracts data from Spotify’s API and then posts it to your Facebook page by communicating with Facebook’s API. There are countless examples of this on the internet today, thanks in large part to the advent of Web 2.0, the historical moment when websites became much more intertwined and interdependent.

The number of publicly available APIs has expanded dramatically over the past decade, as the figure below shows. At the time of this writing, the website Programmable Web lists 19,638 APIs from sites as diverse as Google, Amazon, YouTube, the New York Times, del.icio.us, LinkedIn, and many others. Though the core function of most APIs is to provide software developers with access to data, many APIs now analyze data as well. Examples include facial recognition APIs, voice-to-text APIs, APIs that produce data visualizations, and so on.


How Does an API Work?

To illustrate how an API works, it will be useful to start with a very simple one. Suppose we want to use the Google Maps API to geocode a named entity, that is, to tag the name of a place with its latitude and longitude coordinates. To do this, we write a URL that a) names the API and b) includes the text of the query we want to make. If we searched Google for “Google Maps API Geocode,” we would eventually find the documentation for that API and learn that its base URL is https://maps.googleapis.com. We want to use the geocoding function of this API, so we need a URL that points to this more specific part of the API: https://maps.googleapis.com/maps/api/geocode/json?address=. We can then add a named entity such as “Duke” to the end of the URL as follows: https://maps.googleapis.com/maps/api/geocode/json?address=Duke. This link (with some additional text that I will describe below) produces this output in a web browser:
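Before looking at that output, here is a minimal sketch of how the same URL could be assembled in R instead of being typed by hand. The helper `build_geocode_url()` is hypothetical and written only for illustration; it uses base R's `paste0()` and `utils::URLencode()` to percent-encode the place name, so that a multi-word entity such as “Duke University” still yields a valid query string. It only builds the URL and sends no request; a working request also needs the additional text described below.

```r
# A minimal sketch: build a Google Maps Geocoding API request URL in R.
# build_geocode_url() is a hypothetical helper, not part of any package.
# It only constructs the URL; it does not contact the API, and a real
# request also needs the additional parameters described later on.
build_geocode_url <- function(place,
                              base = "https://maps.googleapis.com/maps/api/geocode/json") {
  # Percent-encode the place name: a space becomes "%20", and with
  # reserved = TRUE characters such as "," or "&" are encoded too, so they
  # cannot be mistaken for query-string syntax.
  encoded <- utils::URLencode(place, reserved = TRUE)
  # Attach the encoded name as the value of the "address" query parameter
  paste0(base, "?address=", encoded)
}

build_geocode_url("Duke")
#> [1] "https://maps.googleapis.com/maps/api/geocode/json?address=Duke"
build_geocode_url("Duke University")
#> [1] "https://maps.googleapis.com/maps/api/geocode/json?address=Duke%20University"
```

Wrapping the URL construction in a function makes it easy to geocode many place names in one pass, for example with `vapply(c("Duke", "Durham, NC"), build_geocode_url, character(1))`. Returning to the `Duke` query, the browser output looks like this: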