Macros in the Shell: Integrating That Spreadsheet From Finance Into a Data Pipeline 2021-05-10 There is many a data science meme degrading Excel. (Google Sheets seems to have escaped most of the memes here.) While I no longer use it regularly for the purposes of analysis, I will always have a …

Quantile Regression Forests for Prediction Intervals 2021-04-21 In this post I will build prediction intervals using quantile regression, more specifically, quantile regression forests. This is my third post on prediction intervals. Prior posts: […] This …

Simulating Prediction Intervals 2021-04-05 Part 1 of my series of posts on building prediction intervals used data held-out from model training to evaluate the characteristics of prediction intervals. In this post I will use hold-out data to …

Understanding Prediction Intervals 2021-03-18 Prediction intervals provide a measure of uncertainty for predictions on individual observations. This post… […] This is the first of three posts on prediction intervals (Part 2 employs …

Basics of Data on People Experiencing Homelessness 2021-01-11 This write-up provides a broad overview of data sources and reports relevant for an independent researcher or analyst new to exploring data on people experiencing homelessness. The section on HMIS …

Weighting Confusion Matrices by Outcomes and Observations 2020-12-08 Weighting in predictive modeling may take multiple forms and occur at different steps in the model building process. […] The focus of this post is on the last stage. I will describe two types …

Undersampling Will Change the Base Rates of Your Model's Predictions 2020-11-23 TLDR: In classification problems, under- and over-sampling techniques shift the distribution of predicted probabilities towards the minority class. If your problem requires accurate probabilities you …

Influencing Distributions with Tiered Incentives 2020-11-02 In this post I will use incentives for sales representatives in pricing to provide examples of factors to consider when attempting to influence an existing distribution. For instance, if you have a …

Gambling Where the House Almost Always Loses... but Still Wins 2020-10-28 In this post, I will describe an example of a game that produces many small wins for the player and occasional large wins for the house. Such a game could take advantage of psychological biases of …

Should You Use an Assignment as Part of Your Hiring Process for a Data Scientist? 2020-10-27 A version of this question was asked on my alumni Slack channel. There were some excellent points brought up by those answering the question in the negative, including that… […] I think each of …

Feature Engineering with Sliding Windows and Lagged Inputs 2020-10-12 The new rsample::sliding_*() functions bring the windowing approaches used in slider to the sampling procedures used in the tidymodels framework. These functions make evaluation of models with …

A National Popular Vote Weighted by the Electoral College 2020-09-11 TLDR: In this post I discuss using a national popular vote weighted by the electoral college to elect the president. This approach would empower voters by expanding political influence outside of …

Linear Regression in Pricing Analysis, Essential Things to Know 2020-08-17 Pricing is hard. […] Price is Right Contestant… struggling […] This is particularly true with large, complicated products, common in business-to-business (B2B) sales. B2B sellers may lack …

Animate interactive objects with Face Detection, JavaScript and Chrome Browser 2020-07-20 We spend the majority of our time in front of screens. It’s mostly one of computer/tablet/phone/TV. These are largely platforms the user owns or controls. I’m surprised we don’t yet have more …

Short Examples of Best Practices When Writing Functions That Call dplyr Verbs 2020-06-25 dplyr, the foundational tidyverse package, makes a trade-off: it is easy to code in interactively, at the expense of being more difficult to create functions with. The source of the trade-off is …

Use Flipbooks to Explain Your Code and Thought Process 2020-06-24 Using the pipe operator (%>%) is one of my favorite things about coding in R and the tidyverse. However, when it was first shown to me, I couldn’t understand what the #rstats nut describing it was …

Tidy Pairwise Operations 2020-06-03 Say you want to map an operation or list of operations across all two-way combinations of a set of variables/columns in a dataframe. For example, you may be doing feature engineering and want to …

Riddler Solutions: Pedestrian Puzzles 2020-03-04 This post contains solutions to FiveThirtyEight’s two riddles released 2020-02-14, Riddler Express and Riddler Classic. I created a toy package animatrixr to help with some of the visualizations and …

animatrixr & Visualizing Matrix Transformations pt. 2 2020-02-24 This post is a continuation of my post from last week on Visualizing Matrix Transformations with gganimate. Both posts are largely inspired by Grant Sanderson’s beautiful video series The Essence of …

Visualizing Matrix Transformations 2020-02-20 I highly recommend the fantastic video series Essence of Linear Algebra by Grant Sanderson. In this post I’ll walk through how you can use gganimate and the tidyverse to (very loosely) recreate some …

Riddler Solutions: Palindrome Dates & Ambiguous Absolute Value Bars 2020-02-13 This post contains solutions to FiveThirtyEight’s two riddles released 2020-02-07, Riddler Express and Riddler Classic. Code for figures and solutions can be found on my github page. […] The …

Riddler Solutions: Perfect Bowl & Magnetic Volume 2020-02-06 This post contains solutions to FiveThirtyEight’s two riddles released 2020-01-31, Riddler Express and Riddler Classic. Code for figures and solutions can be found on my github page. […] The …

Iceland Day 6: Perlan & Departure 2020-01-01 I did my best to convince Britney and my parents that we should start the morning with a ‘polar bear plunge’ in the ocean but was unsuccessful in convincing anyone (including myself) to participate. …

Iceland Day 5: Blue Lagoon & New Year’s Eve 2019-12-31 We were out the door by 7:45AM and headed for the Blue Lagoon (where my parents had offered to treat us for the day). The regular Blue Lagoon pool had sold out of tickets. Instead, we were ‘forced’ to …

Iceland Day 4: Southeast Coast & Diamond Beach 2019-12-30 Britney and I awoke several times in the night to the frigid cold. We would run across the street to a patch of trees where we could relieve ourselves and get away from the streetlights. I looked up …

Iceland Day 3: Thingvellier & Disaster 2019-12-29 We slept in a little later this morning (9:30AM), had Skyr parfaits with mom and dad for breakfast and more croissants from Brauð & Co. We drove back to the UNESCO World Heritage Site, …

Iceland Day 2: Golden Circle & Snowmobiling 2019-12-28 I grabbed an assortment of croissants from Brauð & Co, half a block from our apartment. We were on the road headed East by 7:05AM. The morning was strikingly dark. Clouds obscured any starlight. A …

Iceland Day 1: Landing & City Tour 2019-12-27 My parents, Britney and I landed in Reykjavik at 6:30AM. We’d taken an eight-and-a-half-hour overnight flight from Seattle. It was dark when we stepped off the plane and would remain dark until 11AM …