The project questionnaire: from ideas to action

March 16, 2024

Over the past 5 years—while helping create projects at Code for Philly and sitting on organizing committees for events—I’ve often reached for the Code for Philly Project Questionnaire. In this post, I want to discuss what makes the questionnaire so useful, and the different situations I’ve ended up using it in. Here are some quick numbers on project questionnaires filled out over the years: Code for Philly projects: 25 questionnaires for projects related to bail funds, covid dashboards, etc… (See this article discussing the projects we focused on in 2020). ... Read more

Pandas has a hard job (and does it well)

May 26, 2020

I’ve had to dive into pandas’ code base over the last year for a project (siuba), and my attitude has shifted dramatically. Old attitude: why does pandas have to make things so hard? New attitude: pandas has a crazy difficult job. I think this is most apparent in the functions that decide what dtype a Block (the most basic thing that stores data in pandas) should be. For the ubiquitous object dtype, pandas often figures out which of the many possible more specific types to cast it to. ... Read more

Single dispatch for democratizing data science tools

February 24, 2020

Imagine you had to implement some action across classes in 60 packages. You know what result you want, but may need to handle each class in a specific way. For example, Jupyter notebooks need to represent python classes as html. The broom package in R uses its tidy() function to summarize different statistical models. In this post I will discuss two approaches you could take to do this. Class focused: have people define a specific method name on their classes. ... Read more
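As a rough illustration of the contrast, here is a minimal Python sketch. It assumes the second approach is the single dispatch named in the title, since the excerpt is cut off before describing it. LinearModel and this Python tidy() are hypothetical names made up for the example; _repr_html_ is the method name Jupyter looks for, and functools.singledispatch is in the standard library.

```python
from functools import singledispatch


class LinearModel:
    """Hypothetical model class, used only for illustration."""

    def __init__(self, slope):
        self.slope = slope

    # Class-focused approach: the class itself defines an agreed-upon
    # method name. Jupyter looks for _repr_html_ when displaying an object.
    def _repr_html_(self):
        return f"<b>LinearModel</b> slope={self.slope}"


# Single dispatch approach: one generic function, with implementations
# registered per class from outside the class definition.
@singledispatch
def tidy(model):
    # Fallback for classes that have no registered implementation.
    raise NotImplementedError(f"no tidy() registered for {type(model).__name__}")


@tidy.register(LinearModel)
def _tidy_linear_model(model):
    # Hypothetical summary format, loosely in the spirit of broom's tidy().
    return {"term": "slope", "estimate": model.slope}
```

With the second style, a third party can register tidy() for a class they do not own, without editing that class.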

What would it take to recreate dplyr in python?

February 11, 2020

Recently, I left my job as a data scientist at DataCamp to focus full time on two areas: co-directing the non-profit Code for Philly, and bringing the magic of dplyr to python. In order to do the second part, I’ve worked over the past year on a data analysis library called siuba. As part of this work, I’ve found myself often discussing siuba’s hardest job: making grouped operations a delight. In this post I’ll provide a high-level overview of three key challenges for porting dplyr to python. ... Read more

Using R and the A* Algorithm: Cruising Around Minecraft

March 18, 2019

(This article is the last in a series on using the A* algorithm in R. See the first and second posts for more.) Last year at the NYC R conference, I had the chance to see David Smith demonstrate building and navigating a Minecraft maze, using the miner package. It was really cool! At the end of the talk, as we stepped out of the maze, my gaze turned to the lofty Minecraft peaks in the distance. ... Read more

Using R and the A* Algorithm: Animated Pathfinding with gganimate

February 27, 2019

This post is the second part of a series on using the A* algorithm in R. While my previous post introduced the machow/astar-r library and how it works, in this one I’ll focus on visualizing it finding a solution with gganimate. Below is an outline of what I’ll cover: manually define a maze and plot it with ggplot; use an example class from the astar library to navigate it; add a bonus picture of a gnome to the maze; use a single line of gganimate to animate the A* search. Drawing the maze: First, we’ll load in the necessary libraries, and create a simple maze to navigate. ... Read more

Using R and the A* Algorithm: Turning Cats into Dogs

January 21, 2019

Recently, I’ve come across 3 problems that were solved quickly using the A* algorithm: splitting Cantonese sentences into words (e.g. 我好肚餓 -> 我 - 好 - 肚餓); comparing how similar sounding two English words are; and cruising around Minecraft. Since I started on these problems using python, the python-astar package got me up and running quickly. However, when switching to R I wasn’t able to find it in any libraries, like igraph. ... Read more

Teaching Data Science to High Schoolers

April 5, 2018

Over the past year I’ve worked on the tools to execute and grade code behind the scenes at DataCamp. This work has ranged from expanding our open source tools for grading R and Python code, to running SQL and bash exercises. However, while helping scale up data science education to thousands of students is something I’ve wanted to do since helping teach statistics in grad school, there is a certain sanity in being in a room with a handful of students. ... Read more
