In this post, we’ll discuss the underrated Python collections package, which is part of the standard library. Collections allows you to utilize several data structures beyond base Python.
How to get a count of all the elements in a list
One very useful function in collections is the Counter method, which you can use to return a count of all the elements in a list.
nums = [3, 3, 4, 1, 10, 10, 10, 10, 5]
collections.Counter(nums)
The Counter object that gets returned is also modifiable. Let’s define a variable equal to the result above.
counts = collections.Counter(nums)
counts[20] += 1
Notice how we can add the number 20 to our Counter object without having to initialize it with a 0 value.
Counter can also be used if you have a list of tuples. This can be handy if you want to count pairs of numbers or other types of objects.
temp = [(2, 3), (2, 3), (4, 5), (2, 3), (1, 1), (4, 5)]
collections.Counter(temp)
How to create default values for Python dictionaries
In a previous post, we discussed Python dictionaries. defaultdict provides a modified dictionary structure that lets you assign a default value for any key. This means that if you try referencing a key that you haven’t defined, defaultdict will return some default value rather than a “KeyError”. The default value can also be a data structure. For example, suppose you want to create a dictionary where you expect each element to be a set. defaultdict can automatically create this for you.
sample_map = collections.defaultdict(set)
sample_map["a"].add(5)
sample_map["a"].add(10)
sample_map["b"].add(2)
sample_map
If you try to reference a non-existent key, Python will return an empty set since that’s our default value.
sample_map["x"] # returns empty set
Also, note how we don’t need to initialize sample_map[“a”] to an empty set since defaultdict takes care of it for us. This works for other data types as well. For example, the code below will create a default dictionary structure for lists.
list_container = collections.defaultdict(list)
list_container["x"].append(5)
list_container["x"].append(3)
list_container["y"].append(1)
Using defaultdict is one way of creating adjacency lists in Python to represent graphs. It can be useful in situations where you need to build out the adjacency list based off some prior logic (like running DFS, for example).
In addition to sets and lists, defaultdict also supports ints (integers) as a default value.
collections.defaultdict(int)
# create a dictionary where the default value is 5
def_five = collections.defaultdict(lambda: 5)
def_five["a"] = 10
# "b" is not a key, but running this line
# will return the default value of 5
def_five["b"]
deque
The deque data structure in collections is a double-ended queue. deque is pronounced like “deck”. A deque works similarly to a list, but is more optimized for appending or popping off elements from either the front of the queue or the end. For instance, if you have a very large list in Python and you want to append an element to the start of the list, this operation takes O(n) time since each element in the list needs to be shifted over by one. In other words, if your list has 100,000 elements and you append an element to the front of the list, this requires 100,000 shift operations to move each element over by one in the list. This could become computationally expensive, especially if you need to do this within a loop, for example.
# create a list with 100,000 elements
nums = list(range(100000))
# append -10 to the front of the list
# note that this takes O(n) time, or 100,000 operations since
# each number if shifted over by one in the list
nums.insert(0, -10)
Doubled-ended queues, on the other hand, can append elements to the front or back in approximately O(1) time. Appending elements to the front of the queue can be done using the appendleft method.
queue = collections.deque()
queue.appendleft(5)
queue.appendleft(4)
queue.appendleft(3)
Similarly, to pop elements off from the left side, we can use the popleft method. Again, normally if you were to pop an element off from the left side of of a list, it would take O(n) operations, but with a doubled-ended queue, you can do this in O(1) time.
queue.popleft()
Double-ended queues also comes with operations to append multiple elements at once to either the front or back of the queue. For example, Python lists have an extend method that adds a list of other elements to the current list.
nums = [2, 3, 4]
nums.extend([5, 6, 7])
# nums = [2, 3, 4, 5, 6, 7]
Double-ended queues have an extendleft method and an extend method to be able to do this for the left and right sides of the list, respectively.
queue.extendleft([1, 2, 3, 4, 5])
Named tuples
Another data structure available is the named tuple. If you’ve used R, you may be familiar with the concept of named vectors where you can reference elements in a vector not only by index, but only by a name. Similarly, named tuples offered by collections let you reference tuple elements by name or index.
Info = collections.namedtuple("info", ["a", "b", "c"])
test = Info(a = 10, b = 20, c = 30)
# reference elements by index
test[0] + test[1]
# reference elements by name
test.a + test.b
Named tuples can also be useful when working with sqlite, as seen here.
Visit TheAutomatic.net to learn more about this topic: http://theautomatic.net/2021/04/14/python-collections-tutorial/
Disclosure: Interactive Brokers
Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from TheAutomatic.net and is being posted with its permission. The views expressed in this material are solely those of the author and/or TheAutomatic.net and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.