
Session 1

We will use COVID-19 vaccination data for India as the example dataset for today’s session. It has been compiled, cleaned, and made publicly available by the Development Data Lab. I cannot emphasize enough the awesome work being done by this team; go check out the datasets they have painstakingly cleaned and linked to each other! Anyway, the dataset we will focus on is called covid_vaccination.dta here.

I have added a few variables to this dataset and saved a copy on my drive. We will use this version.

Before delving into any kind of cleaning or analysis, it is important to understand the dataset’s structure. I’m going to give you a spoiler for now, but you will learn commands today that can help you work this out yourself. The dataset has vaccination estimates at the district-date level. What does district-date level mean? Each row represents a unique combination of a district and a day. So, say, for Karnal (Haryana), we will have multiple observations, each pertaining to a different date. Another way to put it: the data contains daily vaccination estimates at the district level.
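To make "unique combination of a district and a day" concrete, here is a minimal sketch in plain Python. It does not read the real file: the rows, the second district, the dates, the counts, and the column names (`district`, `date`, `vaccinated`) are all made up for illustration, and `duplicate_keys` is a hypothetical helper, not a command from this session.

```python
from collections import Counter

# Made-up rows for illustration only. Column names, dates, counts, and the
# second district are assumptions, not taken from covid_vaccination.dta.
rows = [
    {"district": "Karnal", "date": "2021-05-01", "vaccinated": 100},
    {"district": "Karnal", "date": "2021-05-02", "vaccinated": 150},
    {"district": "Ambala", "date": "2021-05-01", "vaccinated": 80},
    {"district": "Ambala", "date": "2021-05-02", "vaccinated": 95},
]


def duplicate_keys(rows, keys=("district", "date")):
    """Return the key combinations that appear in more than one row."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n > 1]


# An empty list means every row is a distinct district-date pair.
dupes = duplicate_keys(rows)
print("unique at district-date level:", not dupes)

# District alone does not identify a row: each district repeats across dates.
print("districts with several rows:", duplicate_keys(rows, keys=("district",)))
```

If the check on `("district", "date")` comes back empty while the check on `("district",)` alone does not, the data is at the district-date level in the sense described above.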


Table of contents