
I have mentioned ‘sorting’ before but haven’t defined what it means. Of course, it’s not rocket science; sorting means arranging observations in a particular order. In most contexts, this order is either ascending or descending and is defined on the values of a set of variables that you choose.

Stata has two commands for sorting: sort and gsort. The main difference between the two is that sort sorts in ascending order only, while gsort can sort in either ascending or descending order.

For example, suppose you want to sort your dataset by state and district names in an ascending fashion (i.e., a comes before b, and so on). The statements below are equivalent:

sort lgd_state_name lgd_district_name
gsort +lgd_state_name +lgd_district_name

The syntax is similar: first the command, then the variables to sort on. In gsort, a + in front of a variable name requests ascending order on that variable (this is also the default when no sign is given), and a - requests descending order. So, the following statement sorts the observations in ascending order of state name and, within each state, in descending order of district name.

gsort +lgd_state_name -lgd_district_name
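
To see the mixed ordering in action, here is a minimal sketch on a made-up dataset. The five observations are invented for illustration (they are not real LGD data); only the variable names come from the example above.

```stata
* Toy data, typed in deliberately unsorted (values are made up)
clear
input str10 lgd_state_name str10 lgd_district_name
"state2" "aac"
"state1" "aab"
"state1" "bcc"
"state2" "bbb"
"state1" "aaa"
end

* Ascending on state, descending on district within each state
gsort +lgd_state_name -lgd_district_name

* Show the result, with a separator line between states
list, noobs sepby(lgd_state_name)
```

After this, state1 should come first with its districts in the order bcc, aab, aaa, followed by state2 with bbb, aac.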

The order of the variables in the variable list matters. In sort lgd_state_name lgd_district_name, Stata first sorts on state names and then, within each state, sorts on district names. So, the sorted data looks like:

lgd_state_name lgd_district_name
state1 aaa
state1 aab
state1 bcc
state2 aac
state2 bbb

However, if we were to flip the order of the variables in that list, i.e., we were to run sort lgd_district_name lgd_state_name, then Stata would first sort on district names and within each district, would sort on state names. So, the sorted data would look like:

lgd_state_name lgd_district_name
state1 aaa
state1 aab
state2 aac
state2 bbb
state1 bcc
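
The two orderings can be reproduced with a short sketch. It assumes a toy dataset holding the same five made-up rows as the tables above, entered in scrambled order so that the sort actually has something to do.

```stata
* Toy data matching the five rows shown above, entered out of order
clear
input str10 lgd_state_name str10 lgd_district_name
"state2" "bbb"
"state1" "bcc"
"state1" "aaa"
"state2" "aac"
"state1" "aab"
end

* State first, then district within each state
sort lgd_state_name lgd_district_name
list, noobs

* District first, then state within each district
sort lgd_district_name lgd_state_name
list, noobs
```

The first list should match the first table, and the second list should match the second table, with state1 bcc moving to the last row.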

So, be careful about the order in which you list the variables, haha! It’s helpful to quickly browse the data after sorting to confirm that it is ordered the way we intended.
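
A quick check along these lines might look like the sketch below. The toy rows are again made up for illustration. It relies on describe reporting a "Sorted by:" line and on list accepting an in range.

```stata
* Toy data (made-up values), sorted by state and then district
clear
input str10 lgd_state_name str10 lgd_district_name
"state2" "bbb"
"state1" "bcc"
"state1" "aaa"
"state2" "aac"
"state1" "aab"
end
sort lgd_state_name lgd_district_name

* The "Sorted by:" line shows the variables Stata has the data sorted on
describe, short

* Eyeball the first few observations without leaving the Results window
list lgd_state_name lgd_district_name in 1/3, noobs

* Or open the Data Browser on just the sort variables
browse lgd_state_name lgd_district_name
```

In batch mode, where there is no Data Browser, list is the option to go for.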