Use the folder structure from here. Write your answers in a do-file and submit the whole folder (rename it suitably: name_session3assignment). Do not delete the empty sub-folders. At the top of the do-file, mention your name, date, and purpose of the do-file. Follow the do-file system from here.
For each question below, first write a comment giving the question number, then a very brief description of what you are going to do, then the code, and finally the answer (if the question asks for one, such as the number of observations).
We continue with the vaccination data available here. Remember to save it in the `01_raw_data` folder!
1. Keep the following variables: `lgd_state_name`, `lgd_district_name`, `date_day`, `date_month`, `total_covaxin`, and `total_covishield`.
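As a starting point, loading the raw file and keeping these variables might look like the sketch below. The file name `vaccination_data.dta` is hypothetical, and the sketch assumes the raw data is a Stata `.dta` file and that the working directory is the assignment folder.

```stata
* Hypothetical file name; replace it with the file you saved in 01_raw_data
use "01_raw_data/vaccination_data.dta", clear

* Keep only the six variables listed above
keep lgd_state_name lgd_district_name date_day date_month total_covaxin total_covishield

* Quick check that only the expected variables remain
describe
```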
2. The data is currently at the district-day level. We want to transform it to the state-day level. Follow these steps:
   a. Add up the values of `total_covaxin` and `total_covishield` at the state-day level. That is, for each state and each day, sum the respective vaccination variable across districts to get the state-level figure. Use `sort` + `by`, or the `bysort` command, and then the `total()` function of the `egen` command.
   b. The resultant data will have duplicates in the state and date variables. Drop duplicates using the `duplicates drop` command.
   c. Save the dataset (name it appropriately) in the `final_data` folder.
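One possible way to carry out steps a to c is sketched here, assuming the six variables from step 1 are in memory. The names `state_covaxin`, `state_covishield`, and the output file name are placeholders, not names required by the assignment. Note that `total()` treats missing values as zero, and `double` is requested because `egen` otherwise creates a `float`, which can round large counts.

```stata
* a. State-day totals: egen's total() returns the group sum on every row of the group
bysort lgd_state_name date_day date_month: egen double state_covaxin = total(total_covaxin)
bysort lgd_state_name date_day date_month: egen double state_covishield = total(total_covishield)

* Replace the district-level variables with their state-level totals
drop total_covaxin total_covishield
rename state_covaxin total_covaxin
rename state_covishield total_covishield

* b. Keep one row per state-day; force is required when a varlist is given
duplicates drop lgd_state_name date_day date_month, force
drop lgd_district_name

* c. Save under a hypothetical file name
save "final_data/state_day_vaccinations.dta", replace
```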
3. Start over from the raw dataset. Repeat step 2, but now expand the list of variables to `total_covaxin`, `total_covishield`, `male_vac`, `female_vac`, and `trans_vac`. Try using a for-loop in step 2.a so that you don't have to write the `bysort` and `egen` chunk repeatedly for each variable. Save the resultant dataset (name it differently from what you did in 2.c) in the `final_data` folder.
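A loop version could be sketched with `foreach`, as below. This is only one way to structure it: the raw and output file names are hypothetical, and the temporary `state_` prefix is an arbitrary choice.

```stata
* Hypothetical raw file name; adjust to the file saved in 01_raw_data
use "01_raw_data/vaccination_data.dta", clear
keep lgd_state_name lgd_district_name date_day date_month ///
    total_covaxin total_covishield male_vac female_vac trans_vac

* Loop over the variables so the bysort + egen chunk is written only once
foreach var of varlist total_covaxin total_covishield male_vac female_vac trans_vac {
    * State-day total, stored as double to avoid float rounding of large counts
    bysort lgd_state_name date_day date_month: egen double state_`var' = total(`var')
    * Replace the district-level variable with its state-level total
    drop `var'
    rename state_`var' `var'
}

* After dropping the district name, rows within a state-day are identical
drop lgd_district_name
duplicates drop

* Save under a different (hypothetical) name from the one used in 2.c
save "final_data/state_day_vaccinations_extended.dta", replace
```

Inside the loop, `` `var' `` expands to the current variable name, so each pass creates, for example, `state_male_vac` and then renames it back to `male_vac`.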
4. Let's use the dataset from step 2.c. It contains the number of Covaxin and Covishield shots administered on each day in different states. A metric we would be interested in is the cumulative number of vaccinations: on a given date, it equals the sum of vaccinations administered up to and including that date. The `sum()` function, used with the `generate` command, does this. Type `help sum()` to see its help file. Create a variable called `total_covaxin_cumm` that is a cumulative sum of `total_covaxin` over time for each state.
   a. Sort the dataset on `lgd_state_name`, `date_day`, and `date_month`. After sorting, browse the data to ensure it looks alright.
   b. Use `by` along with `generate` (and its `sum()` function) to create the variable.
   c. Refer to this related problem for hints on how to do this using `bysort`. Adapt it to our context and create `total_covaxin_cumm2` using `bysort`.
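The two approaches in b and c can be compared with a sketch like the following. It assumes the step 2.c dataset was saved under the hypothetical name used here, that it has one row per state-day, and that the sort order given in step a is chronological within each state. If `date_day` holds only the day of the month, `date_month` would need to come first in the sort.

```stata
* Hypothetical file name for the dataset saved in step 2.c
use "final_data/state_day_vaccinations.dta", clear

* a. Sort as described in the step, then inspect the data
sort lgd_state_name date_day date_month
browse

* b. by + generate: sum() returns the running sum, restarting for each state
by lgd_state_name: generate double total_covaxin_cumm = sum(total_covaxin)

* c. bysort: variables in parentheses set the order within state but do not define groups
bysort lgd_state_name (date_day date_month): generate double total_covaxin_cumm2 = sum(total_covaxin)

* The two variables should agree; assert stops with an error if they do not
assert total_covaxin_cumm == total_covaxin_cumm2
```

Like `total()`, the running `sum()` treats missing values as zero, so a day with a missing count leaves the cumulative figure unchanged.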