Network Visualizations of Code Collections (funspotr part 3)

Tags: dplyr funspotr readr

In previous posts and threads I’ve alluded to the potential utility of visualizing the relationships between parsed functions/packages and files as a network plot.

I added the function network_plot() to funspotr. In this post I’ll simply output the network plots of the parsed-out packages from the code collections discussed in the prior two posts:

library(dplyr)
library(funspotr)

Interactive network plots

The network plots show files as squares and packages as circles, edges represent cases where a package is used in a given file1.

Julia Silge Blog

readr::read_csv("https://raw.githubusercontent.com/brshallo/funspotr-examples/main/data/funs/jsilge-blog-funs-20220114.csv") %>% 
  # not including base R or any custom functions or packages I don't have installed
  filter(!is.na(pkgs), !(pkgs %in% c("base", "(unknown)"))) %>% 
  network_plot(to = pkgs)
  • tidymodels and tidyverse packages are both central to Julia’s posts. The cluster of tidymodels packages show-up (for the most part) just to the right of the cluster of core tidyverse packages.

David Robinson Tidy Tuesday

readr::read_csv("https://raw.githubusercontent.com/brshallo/funspotr-examples/main/data/funs/drob-tidy-tuesdays-funs-20220114.csv") %>% 
  filter(!is.na(pkgs), !(pkgs %in% c("base", "(unknown)"))) %>% 
  network_plot(to = pkgs)
  • Similar to Julia’s posts, tidyverse packages are central to David’s Tidy Tuesday files.. However the tidymodels packages are less central and can be seen in a cluster at the bottom of the plot.
  • In both plots we see {broom} not showing-up by the other tidymodels packages. This is unsurprising for while broom is in the tidymodels ecosystem it has many common uses outside of predictive modeling and has a longer legacy than most tidymodels packages.

R for Data Science Chapters

readr::read_csv("https://raw.githubusercontent.com/brshallo/funspotr-examples/main/data/funs/r4ds-chapter-files-funs-20220117.csv") %>% 
  filter(!is.na(pkgs), !(pkgs %in% c("base", "(unknown)"))) %>% 
  network_plot(to = pkgs)

My blog

readr::read_csv("https://raw.githubusercontent.com/brshallo/funspotr-examples/main/data/funs/brshallo-blog-funs-20220114.csv") %>% 
  filter(!is.na(pkgs), !(pkgs %in% c("base", "(unknown)"))) %>% 
  network_plot(to = pkgs)

My gists

readr::read_csv("https://raw.githubusercontent.com/brshallo/brshallo/master/content/post/2022-02-07-identifying-r-functions-packages-in-your-github-gists/data/brshallo-gists-20220314.csv") %>% 
  filter(!is.na(pkgs), !(pkgs %in% c("base", "(unknown)"))) %>% 
  network_plot(to = pkgs)
  • This figure is a bit different than the graph shown in my tweet above as it includes more of my gists and uses a different algorithm to construct the network.
  • dplyr, purrr, and tidyr are the three packages at the center

  1. With all of these I think more time could go into tailoring the network plot. It would also be interesting to look into measures of network relatedness between the files… maybe in a future post…↩︎