Using Python3 SortedDict Effectively (Floor, Ceillings, Minimum, Maximum, etc.)

5092

Introduction:
Coming from an Java background, I found it difficult to use sorted containers, specifically sorted dictionaries, to do boundary queries since Python3's SortedDict API documentation is not quite as transparent and out-of-the-box as Java's TreeMap per se. However, after doing a few research myself and consulting other folks, the Pythonic logic to go about accomplishing the aforementioned capabilities isn't as complex as I initially thought it would be. I'm sure people transitioning from other languages to Python3 would ran into this issue eventually so the goal of this post is to save them time and effort and provide clear illustrations of the aforementioned features and other effective SortedDict usages.

Credits:
Special thanks to @apoi2333 for his post in leetcode-cn and @anhpp for Q&A.

Finding the minimum key and corresponding item:
Let's start with an simple problem. Suppose we already have bunch of keys in our SortedDict map, how do we go about retrieving the smallest key and its corresponding value? Turns out SortedDict's peekitem() method's support for indexing allows us to do just that.

sorted_dict = SortedDict({1: 1, 2: 2, 4: 3})
print(sorted_dict.peekitem(0)) #prints "(1, 1)"

Finding the maximum key and corresponding item:
Now that we have the minimum key out of the way, let's move on the finding the maximum key. We use the same peekitem() method as before. This time we don't pass any arguments so it will peek the last item by default.

sorted_dict = SortedDict({1: 1, 2: 2, 4: 3})
print(sorted_dict.peekitem()) #prints "(4, 3)"

Finding the floor key and corresponding item for a key that may or may not exist in the SortedDict:
Now things get little bit tricky. In this scenario, we want to find the largest key that is smaller than or equal to the target_key. For this, we need to use both the peekitem and the bisect_left methods. bisect_left basically finds the first position to insert the target key so that all the keys are still in order. Therefore, we need to consider two scenarios:

Target key is not amongst the current key set
- In this scenario, we just simply call the bisect_left method with the target key and then subtract 1. However, we do need to take care of the edge case where all the keys are less than the target.
```
sorted_dict = SortedDict({1: 1, 2: 2, 4: 3})
target = 3
print(d.peekitem(sorted_dict.bisect_left(target) - 1)) #prints "(2, 2)"
```

Target key is amongst the current key set

In this scenario, we don't need to subtract 1 since it is already in the key set.

sorted_dict = SortedDict({1: 1, 3: 2, 4: 3})
target = 3
print(sorted_dict.peekitem(sorted_dict.bisect_left(target))) #prints "(3, 2)"

Finding the ceiling key and corresponding item for a key that may or may not exist in the SortedDict:
Ceiling key is defined as the smallest key that is greater than or equal to the target key. Like with finding the floor key, we also need to consider both cases when the target key is in our key set and not in our key set.

Target key is not amongst the current key set
- When the target key is not in our key set, simply calling the bisect method will do. Likewise, we also need to consider the edge where all the keys are less than the target key and handle it accordingly.
```
sorted_dict = SortedDict({1: 1, 2: 2, 4: 3})
target = 3
print(sorted_dict.peekitem(sorted_dict.bisect(target))) #prints "(4, 3)"
```
Target key is amongst the current key set
- In contrast to finding the floor key, it is in the 2nd scenario in which we need to subtract 1 to get to the right position.
```
sorted_dict = SortedDict({1: 1, 3: 2, 4: 3})
target = 3
print(sorted_dict.peekitem(sorted_dict.bisect(target) - 1)) #prints "(3, 2)"
```

Getting a small portion of the SortedDict based on key range:
In Java, this operation is called subMap. Basically, we want to get the portion of the dictionary given some starting and ending range for keys. In Python3, we can accomplish this via slicing on the items view collection object; more specifically, SortedItemsView. You might ask: isn't slicing O(n) time? Not quite in the case with SortedItemsView since the underlying implementation of SortedDict is an self-balancing tree. So it is really O(log n) on average. Refer to this documentation for more information: http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html#sortedcontainers.SortedItemsView

sorted_dict.items()[start:end + 1]

If you want to use this view to initialize a new SortedDict, you can simply just pass it into the constructor like below:

sorted_dict2 = SortedDict(sorted_dict.items()[start:end + 1])

Removing a small portion of the SortedDict based on key range:
Like obtaining the portion, removing a portion of the sorted dictionary based on range keys is also straightforward and involves slicing as well. You can definitely call the popitem method one at a time using for loop but I prefer to use Python3's built-in map transformation function so the code shorter. Note that you will need to enclose the transformation function call inside an zero length queue via consume recipe since the map function does lazy evaluation.

deque(map(sorted_dict.popitem, [start] * (end - start + 1)), maxlen=0)

Here's two alternative ways to achieve range deletion according to a discussion with sortedcontainers' auther here.

Efficient Approach 1: O(k + log n) where k is the range of the keys and n is the number of keys in the map.
For this approach, we popped the keys k times from its underlying dictionary. Then, we can call the underlying sorted list slice range deletion optimized function to delete that sublist in O(log n) time.

low = sorted_dict.bisect_left(start)
high = sorted_dict.bisect_left(end)
for key in sorted_dict._list[low:high + 1]:
   super(SortedDict, sorted_dict).pop(key)
del sorted_dict._list[low:high + 1]

Efficient Approach 2: O(k log n) where k is the range of the keys and n is the number of keys in the map.
This approach was suggested by the author of the library. Although asymptotically spearking, it is less efficient as approach #1 but because irange is extremely fast and more efficient than bisect calls, there will be a lot of times where this approach is faster than approach #1.

iterator = sorted_dict.irange(start, end)
keys = list(iterator)
for key in keys:
   del sorted_dict[key]

Afterword:
If you have any questions, suggestions, comments, or concerns, please don't hesitate to leave them in the comment section below as I hope this will be a good learning experience and reference and therefore continuous dialogues are encouraged. Happy coding!

Comments (9)