I have mentioned ‘sorting’ before but haven’t defined it. It’s not rocket science: sorting means arranging observations in a particular order. In most contexts, that order is either ascending or descending and is based on the values of one or more variables that you choose.
Stata has two commands for sorting: `sort` and `gsort`. The key difference between them is that `sort` only sorts in ascending order, while `gsort` can sort in both ascending and descending order.
For example, suppose you want to sort your dataset by state and district names in ascending order (i.e., a comes before b, and so on). The two statements below are equivalent:
```stata
sort lgd_state_name lgd_district_name
gsort +lgd_state_name +lgd_district_name
```
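If you want to try this without a real dataset, here is a minimal sketch that builds a small hypothetical dataset with `input`, using the variable names from this post and made-up values like those in the tables below. Note that `clear` drops whatever data is currently in memory.

```stata
* Hypothetical toy data; values are made up for illustration
clear
input str6 lgd_state_name str3 lgd_district_name
"state2" "bbb"
"state1" "bcc"
"state1" "aaa"
"state2" "aac"
"state1" "aab"
end

* Ascending on state, then ascending on district within each state
sort lgd_state_name lgd_district_name
list, sepby(lgd_state_name)

* The gsort version produces the same order
gsort +lgd_state_name +lgd_district_name
list, sepby(lgd_state_name)
```

The `sepby()` option of `list` just draws a separator line whenever the state changes, which makes the "within each state" grouping easy to see.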
The syntax is the same for both: first the command, then the variables on which we want to sort. In `gsort`, a `+` in front of a variable name means we want to sort on that variable in ascending order; as you might expect, a `-` means descending order. A variable with no sign is sorted in ascending order, so the `+` is optional. The following statement, then, sorts the observations in ascending order of state name but in descending order of district name:
```stata
gsort +lgd_state_name -lgd_district_name
```
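Here is a minimal sketch of the mixed ascending/descending sort on a small hypothetical dataset (made-up values; `clear` wipes the data in memory). The second `gsort` leaves out the `+` to show that a variable without a sign is sorted in ascending order.

```stata
* Hypothetical toy data; values are made up for illustration
clear
input str6 lgd_state_name str3 lgd_district_name
"state2" "bbb"
"state1" "bcc"
"state1" "aaa"
"state2" "aac"
"state1" "aab"
end

* States ascending, districts descending within each state
gsort +lgd_state_name -lgd_district_name
list, sepby(lgd_state_name)

* No sign means ascending, so this gives the same order
gsort lgd_state_name -lgd_district_name
list, sepby(lgd_state_name)
```

Within `state1` the districts now run `bcc`, `aab`, `aaa`, and within `state2` they run `bbb`, `aac`.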
The order of the variables in the list matters. With `sort lgd_state_name lgd_district_name`, Stata first sorts on state names and then, within each state, sorts on district names. So the sorted data look like:
lgd_state_name | lgd_district_name |
---|---|
state1 | aaa |
state1 | aab |
state1 | bcc |
… | … |
state2 | aac |
state2 | bbb |
… | … |
However, if we were to flip the order of the variables and run `sort lgd_district_name lgd_state_name`, Stata would first sort on district names and then, within each district, sort on state names. The sorted data would then look like:
lgd_state_name | lgd_district_name |
---|---|
state1 | aaa |
state1 | aab |
state2 | aac |
state2 | bbb |
state1 | bcc |
… | … |
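To see the "within each district" part in action, the sketch below uses hypothetical data in which one district name, `aaa`, appears in two states (an assumption added for illustration; the table above has no repeated district names). Sorting district-first puts both `aaa` rows together, and the state name only breaks the tie between them.

```stata
* Hypothetical toy data; "aaa" deliberately appears in two states
clear
input str6 lgd_state_name str3 lgd_district_name
"state2" "bbb"
"state1" "bcc"
"state2" "aaa"
"state1" "aaa"
"state2" "aac"
"state1" "aab"
end

* District first; state only orders rows that share a district name
sort lgd_district_name lgd_state_name
list, sepby(lgd_district_name)
```

The two `aaa` rows come first, with `state1` before `state2`; every other district appears once, so its state has no ordering role.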
So, be careful what you ask for, haha! It’s a good idea to quickly `browse` the data after sorting to confirm they are in the order you intended.
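Browsing works well for small datasets. For larger ones, a minimal sketch of an automated check is to `assert` that each row is in order relative to the next, using Stata's explicit subscripts (`[_n+1]` refers to the next observation). The data below are hypothetical, and the check assumes the ascending state-then-district sort from earlier; `assert` stops with an error if any observation fails the condition.

```stata
* Hypothetical toy data; values are made up for illustration
clear
input str6 lgd_state_name str3 lgd_district_name
"state2" "bbb"
"state1" "bcc"
"state1" "aaa"
"state2" "aac"
"state1" "aab"
end

sort lgd_state_name lgd_district_name

* Eyeball the first few rows (browse would open the Data Editor instead)
list in 1/5

* State names should never decrease from one row to the next
assert lgd_state_name <= lgd_state_name[_n+1] if _n < _N

* Within the same state, district names should never decrease either
assert lgd_district_name <= lgd_district_name[_n+1] if _n < _N & lgd_state_name == lgd_state_name[_n+1]
```

For a descending key you would flip the comparison (`>=`) for that variable.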