Specifying order in categorical variables
It is common to have a categorical column whose values have a clear order. Shirt sizes are a typical example: among “S”, “M”, “L” and “XL”, the category “L” is greater than “M”.
To give a categorical variable this contextual order, we need to open the Recipe and write a step for it. Currently, there is no way to do it through the interface, though we plan to add this functionality soon.
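To see why the explicit order matters, here is a minimal sketch in plain Python. It is an illustration only, not Recipe syntax, and the sample values and variable names are made up for the example. Without ordering information, text values can only be sorted alphabetically, which puts “L” before “S”; an ascending list of categories gives each value a rank to sort by instead.

```python
# Illustration only: plain Python, not Recipe syntax.
sizes = ["M", "XL", "S", "L", "M"]  # hypothetical sample of a size column

# Without ordering information, strings sort alphabetically.
alphabetical = sorted(sizes)

# An explicit ascending category list gives each value a rank.
categories = ["S", "M", "L", "XL"]
rank = {category: position for position, category in enumerate(categories)}
ordered = sorted(sizes, key=rank.__getitem__)

print(alphabetical)  # ['L', 'M', 'M', 'S', 'XL']
print(ordered)       # ['S', 'M', 'M', 'L', 'XL']
```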
Intro and Resources
To do this, we are going to use the order_categories step. You can follow this example.
A video tutorial also walks through this exact process on the Titanic dataset. You can watch it here if that makes it easier to follow.
Sample of the Clothing Size Prediction with ordering applied in the Size column
Open the Recipe
Click the small scroll icon in the top right corner.
Enable Code Mode
This will allow us to see and edit the code.
Search for the order_categories step
Start writing ord, and the list will show the order_categories step.
Press enter, tab or click on the suggestion to accept it.
This operation will write this piece of code on the same line:
Edit the step with your column name
Because the column we want to take the information from is called size (careful, it’s case sensitive!), we change the code so it looks like this:
This gives an error because we are trying to create a column that already exists, size. We can either create a new column with the ordering applied, or overwrite the existing column to include the ordering information.
To create a new column, simply change the output from size to any other name, for example:
To overwrite the column, we use the overwrite operator, like this:
Apply the ordering
Let’s apply the ordering. To do this, edit the step like this:
This involves creating an options object with {} that has a field categories. This field expects a list of all the categories present in this column in ascending order, with “smaller” elements first.
You can copy this piece of code, replace “size” with the name of your column, and replace the categories list with your own categories, written just like here: each one in quotation marks and separated by commas.
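Since the list should contain every category present in the column, it can help to compare it against the column's distinct values before running the step. The sketch below is plain Python, not Recipe code; the helper check_categories and the sample data are hypothetical, and it assumes values are compared exactly, so “l” and “L” count as different categories. How the step itself treats unlisted values is not covered here.

```python
# Illustration only: plain Python, not Recipe syntax.
def check_categories(column_values, categories):
    """Compare a categories list with the distinct values of a column.

    Returns (missing, unused): values found in the data but not listed,
    and listed categories that never occur in the data.
    """
    present = set(column_values)
    listed = set(categories)
    missing = sorted(present - listed)
    unused = sorted(listed - present)
    return missing, unused


column = ["S", "M", "l", "XL", "M"]  # hypothetical data with a lowercase "l"
missing, unused = check_categories(column, ["S", "M", "L", "XL"])

print(missing)  # ['l']
print(unused)   # ['L']
```

An empty missing list means every value in the column has a place in the ordering.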
Done!
Now, we click “Run” in the lower right corner to save. We can now see that our variable Size has a sorting icon next to it, allowing us to sort the whole table.