
Nonparametric Classification with Bayes' Theorem

A quick explanation of the basics of classification using a kernel density estimate and Bayes' Theorem. [2016-09-11]

Using Prime Decompositions to Understand Machine Learning Algorithms

Let's play with prime numbers and machine learning classifiers! [2016-08-08] [Notebook]

Modeling Mauna Loa CO2 data

Monthly CO2 measurements from the Mauna Loa observatory show an increasing mean over time as well as a seasonal trend. In this notebook we'll see how to analyze and model this data using autoregressive models (ARMA and ARIMA). [2016-07-26]

Cognitive Biases Frequently Encountered by the Practicing Data Scientist

Most people are not trained in statistics or strategic decision making, and as a data scientist you frequently encounter cognitive biases when explaining results and making recommendations. This article describes several that I have encountered. [2016-07-25]

Big O Isn't Everything: Breadth-First versus Depth-First Search

A breadth-first search always finds a shortest path between two vertices in a graph, but is the run time always better than depth-first search? No -- in fact, in some cases BFS can be arbitrarily worse in run time than DFS. [2016-07-21]
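As a rough illustration of that claim (a sketch of my own, not necessarily the example used in the article), consider a complete binary tree where the target is the leftmost leaf. The helper names `bfs_visits`, `dfs_visits`, and `binary_tree_children` are hypothetical, and the result depends on the assumption that DFS happens to try the left child first.

```python
from collections import deque

def binary_tree_children(depth):
    """Children function for a complete binary tree with heap-style labels 1..2**(depth+1)-1."""
    max_node = 2 ** (depth + 1) - 1
    def children(node):
        return [c for c in (2 * node, 2 * node + 1) if c <= max_node]
    return children

def bfs_visits(children, start, target):
    """Count nodes dequeued by breadth-first search before reaching target."""
    seen, queue, visits = {start}, deque([start]), 0
    while queue:
        node = queue.popleft()
        visits += 1
        if node == target:
            return visits
        for child in children(node):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return None

def dfs_visits(children, start, target):
    """Count nodes popped by iterative depth-first search before reaching target."""
    seen, stack, visits = {start}, [start], 0
    while stack:
        node = stack.pop()
        visits += 1
        if node == target:
            return visits
        # Push in reverse so the first-listed child is explored first (an assumption).
        for child in reversed(children(node)):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return None

depth = 10
children = binary_tree_children(depth)
leftmost_leaf = 2 ** depth
print(bfs_visits(children, 1, leftmost_leaf))  # grows like 2**depth
print(dfs_visits(children, 1, leftmost_leaf))  # grows like depth + 1
```

Increasing `depth` widens the gap without bound in this setup. With a different neighbor ordering, DFS could just as easily be the slower of the two.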

Clustering with Scikit-Learn

A practice notebook for clustering with scikit-learn: k-Means, DBSCAN, and the silhouette metrics. (Practice for students in my data science course.) [2016-07-11]

Logistic Regression with Scikit-Learn

A comparison of linear and logistic models. [2016-06-27]

When the Best Model isn't the Best Model

A simple example of when a less accurate model fits the context better. [2016-06-27]

k-Nearest Neighbors Classification with Scikit-Learn

A quick look at some decision boundaries and classification examples with k-NN, with comparisons to decision tree and linear support vector machine classifiers. [2016-06-22]

Lasso, Ridge, and Gradient Descent Examples with Scikit-Learn

Also: plotting a loss function to see the landscape that gradient descent optimizes over, and the effect that regularization has on the landscape. [2016-06-21]

Linear Regression with Statsmodels and Scikit-Learn

There are many ways to fit a linear regression, and in Python I commonly find myself using both scikit-learn and statsmodels. This notebook includes examples from both libraries, along with polynomial fits, dummy variables, and other common tasks. [2016-06-14]

The Central Limit Theorem and Sampling Distributions

A simple Jupyter notebook exploring the central limit theorem. [2016-06-05]

The Moran Process with the Axelrod Library (Jupyter Notebook)

Implementing the Moran process, a popular population model of selection, is easy with the Axelrod library. [2016-03-21]

Classification and Estimation with Scikit-learn, Pandas, Matplotlib, Seaborn, and Jupyter

A quick tutorial on how to visualize, estimate, and classify with the Abalone dataset. [2015-01-24]

Gun Deaths and Gun Ownership -- What does the data say?

Do mass shootings, gun murders, and gun deaths occur in the states with higher gun ownership? [2015-12-04]

An Analysis of the Iterated Prisoner's Dilemma, Part II

Following the exploratory analysis in the last article, let's take what we learned and build predictive models for both tournament scores and wins. As it turns out, we can find two very good models using multiple linear regression. [2015-11-17]

An Analysis of the Iterated Prisoner's Dilemma, Part I

This is an exploratory data analysis of a collection of 100 iterated prisoner's dilemma strategies, their depth of memory, how they act in various contexts, and how they perform in round-robin tournaments. Part 1 in a series. [2015-11-16]

Inferring Memory Depth of Iterated Prisoner's Dilemma Strategies

Last time I talked about a series of metrics for measuring how cooperative a prisoner's dilemma strategy is based on PageRank-like eigenvector methods. In this post we will attempt to infer the memory depth of various strategies. [2015-10-20]

Morality Metrics for the Iterated Prisoner's Dilemma

In 2014, Tyler Singer-Clark defined several morality metrics that evaluate strategies for iterated prisoner's dilemma tournaments. There is a good summary on Scott Aaronson's blog and Singer-Clark's manuscript [pdf] is quite readable. The Axelrod library is now capable of reproducing Singer-Clark's results as well as extending them to the study of all the strategies in the library (currently about 80 ordinary strategies have been implemented). [2015-09-05]

Bayesian Inference for Bernoulli processes: Is that coin fair?

Bayesian inference is the use of Bayes' theorem to develop probability distributions for various phenomena and can be used to estimate parameters for various probability distributions. This article has a companion widget that you should open in a new window or tab. [2015-08-24]
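For a flavor of the idea, here is a minimal sketch of the standard conjugate update for a coin, assuming a Beta prior on the probability of heads. This is a textbook construction rather than a reproduction of the article or its widget, and the function names `update_beta` and `beta_mean` are hypothetical.

```python
def update_beta(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing coin flips.

    With a Beta(alpha, beta) prior on P(heads) and Bernoulli flips,
    the posterior is Beta(alpha + heads, beta + tails).
    """
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1), then observe 7 heads and 3 tails.
post_alpha, post_beta = update_beta(1, 1, heads=7, tails=3)
print(post_alpha, post_beta)             # posterior parameters
print(beta_mean(post_alpha, post_beta))  # posterior estimate of P(heads)
```

The posterior mean moves from the prior's 0.5 toward the observed frequency as more flips arrive. Judging whether the coin is fair would then mean asking how much posterior mass sits near 0.5.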

Computing Many Totients Quickly

For Project Euler Problem 70 I needed to compute a lot of Euler Totients quickly. [2015-08-07]
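One common way to compute many totients at once is a sieve in the style of Eratosthenes. The sketch below shows that general technique and may differ from the approach in the article; the function name `totients_up_to` is hypothetical.

```python
def totients_up_to(n):
    """Return a list phi where phi[k] is Euler's totient of k for 0 <= k <= n."""
    phi = list(range(n + 1))  # start with phi[k] = k
    for p in range(2, n + 1):
        if phi[p] == p:  # untouched by any smaller prime, so p is prime
            for multiple in range(p, n + 1, p):
                # Multiply phi[multiple] by (1 - 1/p) using integer arithmetic.
                phi[multiple] -= phi[multiple] // p
    return phi

phi = totients_up_to(12)
print(phi[1:])
```

Each prime touches only its own multiples, so the whole table costs roughly O(n log log n) operations, instead of factoring every number separately.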

The Birthday Problem in Jury Selection and How I Nearly Served on a Jury

Recently I was called in for jury duty, my first time in a courtroom as a potential juror, and the first time in a court in many years. It was an enlightening if somewhat disheartening experience. [2015-08-04]