Missing data is one of the most common problems a data scientist encounters in data analysis. A couple of quick solutions for dealing with missing values are “remove the observations with missing values from the dataset” and “fill in the missing values with the mean, median, or mode.” However, how good are these quick fixes? Can we do better? In this article, I am going to (1) give a quick introduction to the different types of missing values, (2) visualize missing values, (3) implement multivariate imputation with scikit-learn, (4) test the imputed datasets, and (5) draw conclusions.
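The sketch below compares the two quick fixes with a multivariate alternative. It is not the author's code. The small array is made-up toy data, and scikit-learn's `IterativeImputer` with default settings stands in for the multivariate imputation step; the article may configure it differently.

```python
import numpy as np
from sklearn.impute import SimpleImputer
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy data: the second column is twice the first; np.nan marks missing entries
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [np.nan, 8.0],
    [5.0, 10.0],
])

# Quick fix 1: drop every observation (row) that contains a missing value
X_dropped = X[~np.isnan(X).any(axis=1)]

# Quick fix 2: fill each missing value with its column mean
# (strategy="median" or "most_frequent" gives the median or mode variants)
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# Multivariate imputation: each feature with missing values is modeled
# as a function of the other features, refined over several rounds
X_iter = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)

print(X_dropped.shape)  # (3, 2)
print(X_mean)
print(X_iter)
```

On this toy data, dropping rows discards two of the five observations, and mean imputation fills the missing first-column value with 2.75. The iterative imputer can use the linear relationship between the two columns, so its estimate for that entry should land close to 4.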

Categorizing missing data

Missing values can be separated into…

image by author


Sara Zong

Sara has always been curious about finding patterns in data generated in real-world situations and using those findings to guide decision-making.
