It is common to have a categorical column whose values have a clear order. Shirt sizes are a typical example: “S”, “M”, “L” and “XL”, where the category “L” is greater than “M”.

To give a categorical variable this contextual order, we need to open the Recipe and write a step for it. Currently, there is no way to do this through the interface, though we plan to add this functionality soon.

Intro and Resources

To do this, we are going to use the order_categories step. You can follow this example.

A video tutorial covering this exact process on the Titanic dataset is also available. You can watch it here if that makes it easier to follow.

Sample of the Clothing Size Prediction with ordering applied in the Size column

Open the Recipe

Click the small scroll icon in the top right corner.

Recipe icon

Enable Code Mode

This will allow us to see and edit the code.

Code Mode in recipe

Search for the order_categories step

Start typing ord, and the list will show the order_categories step. Press Enter or Tab, or click the suggestion, to accept it.

Order Categories suggestion

Accepting the suggestion inserts this piece of code on the same line:

order_categories(ds.input) -> (ds.output)

Edit the step with your column name

Because the column we want to order is called size (careful, column names are case sensitive!), we change the code so it looks like this:

order_categories(ds.size) -> (ds.size)

This gives an error because we are trying to create a column, size, that already exists. We can either create a new column with the ordering applied, or overwrite the existing column so that it includes the ordering information.

To create a new column, simply change the output from size to any other name, for example:

order_categories(ds.size) -> (ds.size_ordered)

To overwrite the column, we use the overwrite operator, like this:

order_categories(ds.size) => (ds.size)
                          ^ this changed from - to =

Apply the ordering

Let’s apply the ordering. To do this, edit the step like this:

order_categories(ds.size,
    {"categories": ["XXXL", "XXL", "L", "M", "S", "XXS"]}
) => (ds.size)

All we did was add an options object, written with {}, that has a categories field. This field expects a list of the categories present in our data, in the order we want: the greatest category first, followed by the rest in decreasing order.
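To see why the position of each category in the list matters, here is a small conceptual sketch in plain Python. It is not Recipe code and does not show how the step works internally. It only illustrates the idea that a category's position in the list acts as its rank. The sample sizes values are made up for the illustration.

```python
# Conceptual illustration only: plain Python, not the Recipe language.
# The list mirrors the "categories" option: greatest first, decreasing order.
categories = ["XXXL", "XXL", "L", "M", "S", "XXS"]

# Map each category to its position in the list (0 = greatest).
rank = {name: position for position, name in enumerate(categories)}

# Made-up sample values, standing in for a size column.
sizes = ["M", "XXS", "L", "XXXL", "S"]

# Alphabetical sorting ignores what the sizes actually mean.
alphabetical = sorted(sizes)

# Sorting by rank follows the declared order instead.
by_declared_order = sorted(sizes, key=rank.__getitem__)

print(alphabetical)       # ['L', 'M', 'S', 'XXS', 'XXXL']
print(by_declared_order)  # ['XXXL', 'L', 'M', 'S', 'XXS']
```

Without a declared order, a text column can only be sorted alphabetically, which puts “XXS” next to “XXXL”. With the declared order, the values sort from largest to smallest size.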

You can copy this piece of code, replacing size with the name of your column and the categories list with all the different categories in your data, each one enclosed in quotation marks and separated by commas, just like here.
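If you have many columns to order, you could generate the step text instead of editing it by hand. The helper below is a hypothetical Python sketch, not part of the product. The function name build_order_step is invented for this example, and it assumes the column name is a simple identifier like size. It only reproduces the overwrite form of the step shown above, which you would then paste into the Recipe.

```python
import json


def build_order_step(column, categories):
    """Return the text of an order_categories step (hypothetical helper).

    `categories` should list the values greatest first, in decreasing order.
    """
    # json.dumps produces the quoted, comma-separated list the step expects.
    options = json.dumps({"categories": list(categories)}, ensure_ascii=False)
    # "=>" is the overwrite operator, so the same column is reused as output.
    return (
        f"order_categories(ds.{column},\n"
        f"    {options}\n"
        f") => (ds.{column})"
    )


step = build_order_step("size", ["XXXL", "XXL", "L", "M", "S", "XXS"])
print(step)
```

For the size column, this prints the same three lines as the step written by hand above.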

Done!

Now, we click “Run” in the lower right corner to save. The Size variable now has a sorting icon next to it, which lets us sort the whole table by that column.

Size sortable