Use the folder structure from here. Write your answers in a do-file and submit the whole folder, renamed suitably (for example, name_session3assignment). Do not delete the empty sub-folders. At the top of the do-file, state your name, the date, and the purpose of the do-file. Follow the do-file system from here.
For each question below, first write a comment giving the question number, then a very brief description of what you are going to do, then the code, and finally the answer (if the question asks for one, such as the number of observations).
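As a rough illustration of the layout described above, a do-file could begin like the sketch below. The header fields and question descriptions are placeholders, not a prescribed template; adapt them to the do-file system referred to above.

```stata
* Name:    <your name>
* Date:    <date of writing>
* Purpose: Session 3 assignment, state-level vaccination data

clear all

* Question 1
* Keep only the variables needed for the later questions.
* (code for question 1 goes here)

* Question 2
* Aggregate the district-day data to the state-day level.
* (code for question 2 goes here)
* Answer: <final answer, if the question asks for one>
```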
We continue with the vaccination data available here. Remember to save it in the 01_raw_data folder!
1. Keep the following variables: lgd_state_name, lgd_district_name, date_day, date_month, total_covaxin, and total_covishield.
2. The data is currently at the district-day level. We want to transform it to the state-day level. Follow these steps:
   a. Add up the values of total_covaxin and total_covishield at the state-day level. That is, for each state and each day, add up the respective vaccination variable across districts to get the state-level figure. Use sort + by or the bysort command, and then the total function of the egen command.
   b. The resulting data will have duplicates in the state and date variables. Drop them using the duplicates drop command.
   c. Save the dataset (name it appropriately) in the final_data folder.
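The mechanics of steps 2.a to 2.c can be tried out on a small made-up dataset before running them on the real data. The sketch below is only one possible approach: the toy values, the temporary variable names covaxin_state and covishield_state, and the file name in the commented-out save line are all assumptions made for illustration.

```stata
* Toy district-day data (made-up values, for illustration only)
clear
input str6 lgd_state_name str6 lgd_district_name date_day date_month total_covaxin total_covishield
"A" "a1" 1 3 10 100
"A" "a2" 1 3  5  50
"A" "a1" 2 3  7  70
"A" "a2" 2 3  3  30
"B" "b1" 1 3  4  40
"B" "b1" 2 3  6  60
end

* 2.a: state-day totals; double avoids rounding when totals get large
bysort lgd_state_name date_day date_month: egen double covaxin_state = total(total_covaxin)
bysort lgd_state_name date_day date_month: egen double covishield_state = total(total_covishield)

* The district-level columns are no longer needed
drop lgd_district_name total_covaxin total_covishield
rename covaxin_state total_covaxin
rename covishield_state total_covishield

* 2.b: keep one row per state-day (force is required when a varlist is given)
duplicates drop lgd_state_name date_day date_month, force

* Two states times two days should leave four rows
assert _N == 4
list, sepby(lgd_state_name)

* 2.c: hypothetical file name; uncomment once the folder exists
* save "final_data/state_day_two_vaccines.dta", replace
```

On the toy data, state A should show totals of 15 and 150 on day 1 and 10 and 100 on day 2.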
3. Start over from the raw dataset. Repeat step 2, but now expand the list of variables to total_covaxin, total_covishield, male_vac, female_vac, and trans_vac. Try using a for-loop in step 2.a so that you don't have to write the bysort and egen chunk repeatedly for each variable. Save the resulting dataset (under a different name from the one you used in 2.c) in the final_data folder.
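A foreach loop is one way to write the loop mentioned above. The sketch below uses a made-up toy dataset; the local macro name vacvars and the _state suffix are assumptions chosen for illustration, not names required by the assignment.

```stata
* Toy district-day data (made-up values, for illustration only)
clear
input str6 lgd_state_name str6 lgd_district_name date_day date_month total_covaxin total_covishield male_vac female_vac trans_vac
"A" "a1" 1 3 10 100 60 49 1
"A" "a2" 1 3  5  50 30 25 0
"B" "b1" 1 3  4  40 20 23 1
end

* Variables to aggregate, stored once in a local macro
local vacvars total_covaxin total_covishield male_vac female_vac trans_vac

foreach v of local vacvars {
    * State-day total for the current variable
    bysort lgd_state_name date_day date_month: egen double `v'_state = total(`v')
    * Replace the district-level variable with its state-level total
    drop `v'
    rename `v'_state `v'
}

drop lgd_district_name
duplicates drop lgd_state_name date_day date_month, force

* One row each for state A and state B on the single day in the toy data
assert _N == 2
list
```

Keeping the variable list in a local macro means a further variable can be added in one place without touching the loop body.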
4. Let's use the dataset from step 2.c. It contains the number of covaxin and covishield shots administered on each day in different states. A metric we would be interested in looking at is the cumulative number of vaccinations. On a given date, such a variable will equal the sum of vaccinations administered until then. The sum function of the generate command does this. Type help sum() to see its help file. Create a variable called total_covaxin_cumm that is a cumulative sum of total_covaxin over time for each state.
   a. Sort the dataset on lgd_state_name, date_day, and date_month. After sorting, browse the data to ensure it looks alright.
   b. Use by along with generate (and its sum function) to create the variable.
   c. Refer to this related problem for hints on how to do this using bysort. Adapt it to our context and create total_covaxin_cumm2 using bysort.
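The two routes to a running sum can be compared on a small made-up state-day dataset, as in the sketch below. It assumes that the sort order given in step a is chronological within each state; if date_day is a day of the month and the data span several months, date_month would need to come before date_day. The list command stands in for browse so that the sketch also runs non-interactively.

```stata
* Toy state-day data (made-up values, for illustration only)
clear
input str6 lgd_state_name date_day date_month total_covaxin
"A" 1 3 15
"A" 2 3 10
"A" 3 3  5
"B" 1 3  4
"B" 2 3  6
end

* a: sort first, so that by can be used without its own sort
sort lgd_state_name date_day date_month

* b: running sum within each state; double avoids rounding in large totals
by lgd_state_name: generate double total_covaxin_cumm = sum(total_covaxin)

* c: same result in one line; variables in parentheses set the order
*    within each state but do not define the groups
bysort lgd_state_name (date_day date_month): generate double total_covaxin_cumm2 = sum(total_covaxin)

* Both versions should agree row by row
assert total_covaxin_cumm == total_covaxin_cumm2
list, sepby(lgd_state_name)
```

On the toy data, the running sum for state A should read 15, 25, 30, and for state B 4, 10.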