
Pulling Twitter Engagements Using the v2 API as Well as rtweet

Tags: rjson httr jsonlite dplyr purrr lubridate rtweet tidyr glue rstudioapi fs readr tidyverse

This is a follow-up to a short post I wrote on R Access to Twitter’s v2 API. In this post I’ll walk through a few more examples of pulling data from Twitter using a mix of Twitter’s v2 API and the {rtweet} package[1].

I’ll pull all Twitter users that I (brshallo) have recently been engaged by (e.g. they like my tweet) or engaged with (e.g. I like their tweet). I’ll lean towards using {rtweet}[2] but will use {httr} in cases where it’s more convenient to use Twitter’s v2 API[3].

For this post I’m not really worried about optimizing my queries, minimizing API hits, etc. E.g. when using {rtweet} I should authenticate through my project app, which has higher rate limits (see Authentication options), but instead I just use the default {rtweet} user authentication. Note also that the default {rtweet} authentication only works when running scripts interactively[4].

See the prior post for links on authentication mechanisms. I’m assuming you have “TWITTER_BEARER”[5] as well as “TWITTER_PAT”[6] in your .Renviron file.

library(rjson)
library(httr)
library(jsonlite)
library(dplyr)
library(purrr)
library(lubridate)
library(rtweet)
library(tidyr)

# bearer_token only used when using httr and twitter v2 API
bearer_token <- Sys.getenv("TWITTER_BEARER")
headers <- c(`Authorization` = sprintf('Bearer %s', bearer_token))
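Every {httr} call below depends on the bearer token, so it can help to confirm that both variables are actually set before going further. A minimal sketch in base R; `missing_env_vars()` is a hypothetical helper, not part of {httr} or {rtweet}:

```r
# Hypothetical helper (not from the original workflow): return the names of
# environment variables that are unset or empty.
missing_env_vars <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[unname(values) == ""]
}

missing <- missing_env_vars(c("TWITTER_BEARER", "TWITTER_PAT"))

if (length(missing) > 0) {
  message("Missing from .Renviron: ", paste(missing, collapse = ", "))
}
```

If a variable was only just added to .Renviron, R needs to be restarted before `Sys.getenv()` will see it.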

GETting all engagements

In each sub-section I’ll pull a different kind of engagement.

  1. GET favorited users
  2. GET all tweets from user – starting point for most of the following sections
  3. From initial query GET references in those tweets
  4. Filter to only tweets with likes, GET favoriters
  5. Filter to only tweets with quotes, search URL’s to GET quoters
  6. Filter to only tweets with retweets, GET retweeters
  7. GET repliers and mentions

I’ll finish by Putting them together into a function. Note that not all queries are perfect at pulling all engagements[7].

GET favorited users

It’s often easiest to just let {rtweet} do the work.

# Twitter id for brshallo
user_id <- "307012324"

favorites <- rtweet::get_favorites(user = user_id)

GET all tweets from user

This pulls up to 100 of a user’s most recent tweets[8].

url_handle <- glue::glue("https://api.twitter.com/2/users/{user_id}/tweets?max_results=100", user_id = user_id)

params <- list(tweet.fields = "public_metrics,created_at,in_reply_to_user_id,referenced_tweets")

response <- httr::GET(url = url_handle,
                     httr::add_headers(.headers = headers),
                     query = params)

obj <- httr::content(response, as = "text")

json_data <- jsonlite::fromJSON(obj, flatten = TRUE)$data %>% 
  as_tibble()
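The filters in the later sections refer to columns such as `public_metrics.like_count`. Those names come from `flatten = TRUE`, which spreads the nested `public_metrics` object into dot-separated columns. A small offline sketch on a made-up payload (the ids, text and counts are invented for illustration):

```r
library(jsonlite)

# Made-up response body shaped like the v2 tweets payload (values are invented)
mock_body <- '{
  "data": [
    {"id": "1", "text": "first",  "public_metrics": {"like_count": 2, "retweet_count": 0}},
    {"id": "2", "text": "second", "public_metrics": {"like_count": 0, "retweet_count": 1}}
  ]
}'

mock_tweets <- jsonlite::fromJSON(mock_body, flatten = TRUE)$data

# Nested fields become columns such as public_metrics.like_count
names(mock_tweets)
```

Without `flatten = TRUE`, `public_metrics` would instead be a single data-frame column and the `filter()` calls would need to reach inside it.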

GET references

statuses_referenced <- bind_rows(json_data$referenced_tweets) %>% 
  rename(status_id = id)

users_referenced <- rtweet::lookup_tweets(statuses_referenced$status_id)
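`referenced_tweets` is a list column: one small data frame per tweet and, presumably, `NULL` for tweets that reference nothing. `bind_rows()` skips the `NULL` entries and stacks the rest, which is what makes the one-liner above work. A sketch with a made-up stand-in for that column (the ids are invented, and the `NULL` shape is an assumption):

```r
library(dplyr)

# Made-up stand-in for json_data$referenced_tweets (ids are invented)
mock_referenced <- list(
  data.frame(type = "replied_to", id = "101", stringsAsFactors = FALSE),
  NULL,  # assumed shape for a tweet that references nothing
  data.frame(type = c("quoted", "replied_to"), id = c("102", "103"),
             stringsAsFactors = FALSE)
)

mock_statuses <- bind_rows(mock_referenced) %>%
  rename(status_id = id)

mock_statuses
```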

GET favoriters

Filter initial query of tweets to only those with more than 0 likes.

liked_tweets <- json_data %>% 
  filter(public_metrics.like_count > 0)

Wrap the approach for getting favoriters from the prior post, R Access to Twitter’s v2 API, in a function and map the tweet ids through it.

tweet_ids <- liked_tweets$id

get_favoriters <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/{status_id}/liking_users", status_id = tweet_id)
  
  response <- httr::GET(url = url_handle,
                       httr::add_headers(.headers = headers))
  
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  
  x$data %>% 
    map_dfr(as_tibble)
}

tweet_favoriters <-
  map_dfr(tweet_ids, ~ bind_cols(tibble(liked_status_id = .x),
                                get_favoriters(.x))) %>%
  rename(user_id = id)
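The parsing step inside `get_favoriters()` can be tried offline. The sketch below runs the same `rjson::fromJSON()` plus `map_dfr(as_tibble)` combination on made-up response bodies; `parse_users()` is a hypothetical name, the user values are invented, and the shape of the empty response is an assumption rather than something confirmed against the API:

```r
library(dplyr)
library(purrr)

# Made-up body shaped like a liking_users response (values are invented)
mock_body <- paste0(
  '{"data":[',
  '{"id":"11","name":"User One","username":"user_one"},',
  '{"id":"22","name":"User Two","username":"user_two"}',
  ']}'
)

# Hypothetical helper mirroring the parsing step of get_favoriters()
parse_users <- function(body) {
  x <- rjson::fromJSON(body)
  # Each element of x$data is a named list; as_tibble() turns it into one row
  x$data %>%
    map_dfr(as_tibble)
}

parse_users(mock_body)

# Assumed shape when nobody liked the tweet: no "data" element at all.
# x$data is then NULL and the result is an empty tibble rather than an error.
parse_users('{"meta":{"result_count":0}}')
```

This only covers well-formed bodies; checking the response with `httr::http_error()` before parsing would be a sensible addition for failed or rate-limited requests.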

GET quoters

Filter to only posts with quotes.

tweet_ids_quoters <- json_data %>% 
  filter(public_metrics.quote_count > 0) %>%
  pull(id)

However, I am not positive the approach below actually picks up all quotes[9]. I also reviewed some other approaches[10].

search_tweets_urls <- function(tweet_id){
  rtweet::search_tweets(
    glue::glue("url:{tweet_id}", 
               tweet_id = tweet_id)
    )
} 

quoters <- map_dfr(tweet_ids_quoters, search_tweets_urls) %>% 
  filter(is_quote) %>% 
  as_tibble()
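One thing to watch for: if every search comes back empty, the combined data frame may have no `is_quote` column, in which case `filter(is_quote)` would fail. That is an assumption about what {rtweet} returns for an empty search, not something verified here. A defensive sketch, with `keep_quotes()` as a hypothetical helper:

```r
library(dplyr)

# Hypothetical helper: keep quote tweets, tolerating a result with no columns
keep_quotes <- function(tweets) {
  if (!"is_quote" %in% names(tweets)) {
    return(tibble())
  }
  tweets %>%
    filter(is_quote) %>%
    as_tibble()
}

# An empty result passes through instead of raising an error
keep_quotes(tibble())

# Made-up rows: only the quote tweet is kept
keep_quotes(tibble(status_id = c("1", "2"), is_quote = c(TRUE, FALSE)))
```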

GET retweeters

Filter to only posts that were retweeted.

tweet_ids_rt <- json_data %>% 
  filter(public_metrics.retweet_count > 0) %>%
  select(status_id = id)

I use a slightly different approach in this section than in other similar sections[11].

retweeters <- tweet_ids_rt %>% 
  mutate(retweeters = map(status_id, rtweet::get_retweeters)) %>% 
  unnest(retweeters)
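The `mutate()` + `map()` + `unnest()` pattern here and the `pull()` + `map_dfr()` pattern in the other sections arrive at the same table. A sketch comparing the two on a stand-in for `rtweet::get_retweeters()` (the function `mock_get_retweeters()` and all ids are invented):

```r
library(dplyr)
library(purrr)
library(tidyr)

# Stand-in for rtweet::get_retweeters(); ids are invented
mock_get_retweeters <- function(status_id) {
  retweets <- list("111" = c("u1", "u2"), "222" = "u3")
  tibble(user_id = retweets[[status_id]])
}

ids <- c("111", "222")

# Pattern used in this section: nest the results, then unnest
via_unnest <- tibble(status_id = ids) %>%
  mutate(retweeters = map(status_id, mock_get_retweeters)) %>%
  unnest(retweeters)

# Pattern used in the other sections: map over ids and bind the rows
via_map_dfr <- map_dfr(ids, ~ bind_cols(tibble(status_id = .x),
                                        mock_get_retweeters(.x)))

# Compare the two results column by column
all.equal(as.data.frame(via_unnest), as.data.frame(via_map_dfr))
```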

GET repliers and mentions

Alternatively you might just use rtweet::get_mentions(), but this only pulls mentions of the currently authenticated user. I also tried other approaches here[12].

get_mentions_v2 <- function(user_id){
  url_handle <- glue::glue("https://api.twitter.com/2/users/{user_id}/mentions", user_id = user_id)
  
  response <- httr::GET(url = url_handle,
                        httr::add_headers(.headers = headers))
  
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  
  x$data %>% 
    map_dfr(as_tibble)
}

tweets_mentions <- get_mentions_v2(user_id)

repliers_mentions <- rtweet::lookup_tweets(tweets_mentions$id)
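The looked-up tweets mix direct replies with plain mentions. One way to tell them apart, sketched on made-up rows, is the `reply_to_user_id` column that {rtweet} returns. The classification rule is an illustration only, not something the workflow above does:

```r
library(dplyr)

my_id <- "307012324"  # brshallo, as above

# Made-up rows using column names from {rtweet}'s output (user ids are invented)
mock_mentions <- tibble(
  user_id          = c("u1", "u2", "u3"),
  reply_to_user_id = c("307012324", NA, "999")
)

classified <- mock_mentions %>%
  mutate(
    # A reply is a tweet addressed directly to my_id; anything else is a mention
    engagement = if_else(
      !is.na(reply_to_user_id) & reply_to_user_id == my_id,
      "reply",
      "mention"
    )
  )

classified
```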

Putting them together into a function

The function at this gist returns the output from each of the above sections as a list.

# Twitter id for brshallo
user_id <- "307012324"

# load function get_engagements()
source("https://gist.githubusercontent.com/brshallo/119d6a1f858e0e5c20d77212dee8891a/raw/751d022c7bc2e2148292bb78a5178737d9914024/get-engagements.R")

brshallo_engagements <- get_engagements(user_id)

brshallo_engagements
## $favorites
## # A tibble: 10 x 91
##    user_id             status_id  created_at          screen_name text    source
##  * <chr>               <chr>      <dttm>              <chr>       <chr>   <chr> 
##  1 248350998           151302361~ 2022-04-10 05:18:34 BuildABarr  "Drop ~ Twitt~
##  2 368551889           151263551~ 2022-04-09 03:36:23 IsabellaGh~ "@elli~ Twitt~
##  3 1469531055736590337 151242047~ 2022-04-08 13:21:54 emkayco     "Have ~ Twitt~
##  4 35794978            151196918~ 2022-04-07 07:28:38 _lionelhen~ "@brsh~ Twitt~
##  5 29916355            151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
##  6 29916355            151195192~ 2022-04-07 06:20:03 jimjam_slam "@brsh~ Twitt~
##  7 29916355            151189984~ 2022-04-07 02:53:06 jimjam_slam "@mdne~ Twitt~
##  8 3089027769          151189179~ 2022-04-07 02:21:09 gyp_casino  "@mdne~ Twitt~
##  9 15772978            151132777~ 2022-04-05 12:59:55 jessicagar~ "@brsh~ Twitt~
## 10 144592995           151129000~ 2022-04-05 10:29:49 Rbloggers   "R Acc~ r-blo~
## # ... with 85 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
## 
## $favoriters
## # A tibble: 90 x 4
##    liked_status_id     user_id            name           username       
##    <chr>               <chr>              <chr>          <chr>          
##  1 1512295676004093955 117241741          Brett J. Gall  brettjgall     
##  2 1512295676004093955 2724597409         Peter Ellis    ellis2013nz    
##  3 1512294950905409543 274123666          Kristen Downs  KristenDDowns  
##  4 1512293864517750790 3656879234         张亮           psychelzh      
##  5 1512293864517750790 703843771419484160 Ayush Patel    ayushbipinpatel
##  6 1512293864517750790 419185498          Kevin Gilds    Kevin_Gilds    
##  7 1512293864517750790 127357236          Juan LB        Juan_FLB       
##  8 1512293864517750790 49451947           Luis Remiro    LuisMRemiro    
##  9 1512293864517750790 253175044          Nicholas Viau  nicholasviau   
## 10 1512293864517750790 2202983986         Stefania Klayn Ettti_20       
## # ... with 80 more rows
## 
## $references
## # A tibble: 12 x 90
##    user_id            status_id  created_at          screen_name  text    source
##    <chr>              <chr>      <dttm>              <chr>        <chr>   <chr> 
##  1 307012324          151115943~ 2022-04-05 01:50:59 brshallo     "As an~ Twitt~
##  2 307012324          151229344~ 2022-04-08 04:57:09 brshallo     "@mdne~ Twitt~
##  3 307012324          150969487~ 2022-04-01 00:51:20 brshallo     "It al~ Twitt~
##  4 307012324          151229386~ 2022-04-08 04:58:49 brshallo     "@mdne~ Twitt~
##  5 307012324          147233714~ 2021-12-18 22:45:04 brshallo     "First~ Twitt~
##  6 29916355           151189984~ 2022-04-07 02:53:06 jimjam_slam  "@mdne~ Twitt~
##  7 29916355           151194957~ 2022-04-07 06:10:44 jimjam_slam  "@brsh~ Twitt~
##  8 144592995          151129000~ 2022-04-05 10:29:49 Rbloggers    "R Acc~ r-blo~
##  9 248350998          151302361~ 2022-04-10 05:18:34 BuildABarr   "Drop ~ Twitt~
## 10 3146735425         151226195~ 2022-04-08 02:52:00 mdneuzerling "Lovel~ Twitt~
## 11 983470194982088704 151182189~ 2022-04-06 21:43:22 R4DScommuni~ "The n~ Zapie~
## 12 2724597409         151226515~ 2022-04-08 03:04:44 ellis2013nz  "@mdne~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
## 
## $quoters
## NULL
## 
## $retweeters
## # A tibble: 11 x 2
##    status_id           user_id            
##    <chr>               <chr>              
##  1 1512293864517750790 296222670          
##  2 1512293864517750790 307012324          
##  3 1511869112401596423 4034079677         
##  4 1511869112401596423 1306626901432324097
##  5 1511869112401596423 1011817655957893120
##  6 1511469730892156928 1011817655957893120
##  7 1511469730892156928 1306626901432324097
##  8 1511159434717761539 1448348827979747333
##  9 1511159434717761539 15772978           
## 10 1511159434717761539 1011817655957893120
## 11 1511159434717761539 1306626901432324097
## 
## $referencers
## # A tibble: 10 x 90
##    user_id             status_id  created_at          screen_name text    source
##    <chr>               <chr>      <dttm>              <chr>       <chr>   <chr> 
##  1 61542689            150992063~ 2022-04-01 15:48:26 twelvespot  "@brsh~ Twitt~
##  2 61542689            150994022~ 2022-04-01 17:06:17 twelvespot  "@brsh~ Twitt~
##  3 18433005            151007180~ 2022-04-02 01:49:09 rcrdleitao  "@brsh~ Twitt~
##  4 35794978            151196918~ 2022-04-07 07:28:38 _lionelhen~ "@brsh~ Twitt~
##  5 1346474633520824320 150985661~ 2022-04-01 11:34:03 markjrieke  "@brsh~ Twitt~
##  6 29916355            151195192~ 2022-04-07 06:20:03 jimjam_slam "@brsh~ Twitt~
##  7 29916355            151195162~ 2022-04-07 06:18:51 jimjam_slam "@brsh~ Twitt~
##  8 29916355            151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
##  9 15772978            151132777~ 2022-04-05 12:59:55 jessicagar~ "@brsh~ Twitt~
## 10 15772978            151117782~ 2022-04-05 03:04:04 jessicagar~ "@brsh~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
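Since the goal is a picture of which users engage most, the list can be collapsed into one tally per user. A sketch on a made-up stand-in for the returned list, assuming each non-NULL element has a `user_id` column as in the printed output above (all ids and names are invented):

```r
library(dplyr)
library(purrr)

# Made-up stand-in for the list returned by get_engagements()
mock_engagements <- list(
  favoriters = tibble(user_id = c("u1", "u2"), username = c("one", "two")),
  retweeters = tibble(status_id = c("s1", "s2"), user_id = c("u2", "u3")),
  quoters    = NULL
)

engagement_counts <- mock_engagements %>%
  compact() %>%                      # drop NULL elements such as $quoters
  map(~ select(.x, user_id)) %>%     # keep only the shared column
  bind_rows(.id = "engagement") %>%  # list names become an "engagement" column
  count(user_id, sort = TRUE)        # engagements per user, most active first

engagement_counts
```

Counting on `user_id, engagement` instead would break the tally down by engagement type.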

  1. Which as of this writing uses the 1.1 API.↩︎

  2. As it takes less code.↩︎

  3. Or in cases where the field isn’t available in {rtweet}. V2 is not yet supported by {rtweet} but is actively being worked on so this post may have a short shelf-life.↩︎

  4. You’ll need to authenticate with keys from a Twitter developer portal app if you want to run those sections automatically. You’ll notice that in creating this script I actually don’t evaluate most of the sections and then use some hidden code chunks to return output.↩︎

  5. For the sections where I use {httr} in this post.↩︎

  6. For the sections where I use {rtweet}. This should be set-up through the default {rtweet} set-up.↩︎

  7. This seemed to particularly be the case when it came to seeing all quotes and mentions.↩︎

  8. The reason I’m using {httr} and v2 instead of {rtweet} for this is that the 1.1 API (which {rtweet} currently uses) doesn’t pull quote count unless you have a premium or enterprise account (see rtweet#640).↩︎

  9. Thread here seemed to suggest that just searching the url was the way to go.↩︎

  10. This also seems to be a way to see quoters: https://twittercommunity.com/t/how-we-can-get-list-of-replies-on-a-tweet-or-reply-to-a-tweet-in-twitter-api/144958/7

    get_quoters <- function(tweet_id){
      url_handle <- glue::glue("https://api.twitter.com/2/tweets/search/recent?tweet.fields=author_id&query=url:{status_id}", status_id = tweet_id)
    
      response <- httr::GET(url = url_handle,
                           httr::add_headers(.headers = headers))
                           # query = params)
    
      obj <- httr::content(response, as = "text")
      x <- rjson::fromJSON(obj)
    
      x$data %>% 
        map_dfr(as_tibble)
    }
    
    quoters <- map(tweet_ids_quoters, get_quoters)
    ↩︎
  11. rtweet::get_retweeters() returns far fewer columns than rtweet::search_tweets(), which is why I use select() above rather than the method in the sections before and after this one, where I use pull() and pass the ids directly to purrr::map*() calls instead of wrapping them in a mutate() verb (which would have worked just as well). The structure of the manipulation is nearly the same, so I probably should have stayed consistent and written a function to make the shared pattern clear. C’est la vie.↩︎

  12. Another simple approach would be to just try: rtweet::search_tweets("@brshallo"). I tried the approach below, but it really didn’t seem to work quite as expected…

    tweet_ids_repliers <- json_data %>% 
      filter(public_metrics.reply_count > 0) %>%
      pull(id)
    
    # pulled from here: https://twittercommunity.com/t/how-to-fetch-retweets-and-quote-tweets-from-the-twitter-v2-search-api/156573 but didn't really work as expected...
    get_replies <- function(tweet_id){
      url_handle <- glue::glue("https://api.twitter.com/2/tweets/search/recent?tweet.fields=author_id&query=conversation_id:{status_id}", status_id = tweet_id)
    
      response <- httr::GET(url = url_handle,
                           httr::add_headers(.headers = headers))
    
      obj <- httr::content(response, as = "text")
      x <- rjson::fromJSON(obj)
    
      x$data %>% 
        map_dfr(as_tibble)
    }
    
    repliers <- map(tweet_ids_repliers, get_replies)
    
    repliers <- bind_rows(repliers)
    ↩︎