How do you perform a group-by operation in Python?
Given a dataset of (value, type) pairs like this:
input = [
    ('11013331', 'KAT'),
    ('9085267', 'NOT'),
    ('5238761', 'ETH'),
    ('5349618', 'ETH'),
    ('11788544', 'NOT'),
    ('962142', 'ETH'),
    ('7795297', 'ETH'),
    ('7341464', 'ETH'),
    ('9843236', 'KAT'),
    ('5594916', 'ETH'),
    ('1550003', 'ETH')
]
You want to group the values by type and produce the following output:
result = [
    {
        'type': 'KAT',
        'items': ['11013331', '9843236']
    },
    {
        'type': 'NOT',
        'items': ['9085267', '11788544']
    },
    {
        'type': 'ETH',
        'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003']
    }
]
Hey everyone, I'd like to share a solution that uses collections.defaultdict to group the data by type. Here's how it works:
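from collections import defaultdict

def group_by_type(pairs):
    # defaultdict(list) creates an empty list the first time a new type is seen
    grouped = defaultdict(list)
    for value, type_ in pairs:
        grouped[type_].append(value)
    # Convert the mapping into the requested list-of-dicts shape
    return [{'type': k, 'items': v} for k, v in grouped.items()]

# Result
result = group_by_type(input)
print(result)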
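This approach is efficient (a single O(n) pass over the data) because defaultdict automatically creates an empty list the first time a key is accessed, so you don't need to check whether the key exists before appending the value. That keeps the loop short and free of extra conditions. And because dictionaries preserve insertion order (Python 3.7+), the groups come out in the order each type first appears (KAT, NOT, ETH), which matches the expected output.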
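To see the automatic initialization in isolation, here's a small sketch using a few values from the sample data. It also shows one caveat worth knowing: simply reading a missing key from a defaultdict inserts it, even if you never append anything.

```python
from collections import defaultdict

# The factory (list) is called to build a value the first time a missing key is accessed.
grouped = defaultdict(list)
grouped["KAT"].append("11013331")  # "KAT" is missing, so an empty list is created first
grouped["KAT"].append("9843236")   # "KAT" already exists, so the value is just appended
grouped["NOT"].append("9085267")

print(dict(grouped))
# {'KAT': ['11013331', '9843236'], 'NOT': ['9085267']}

# Caveat: a plain lookup of a missing key also inserts it (with an empty list).
_ = grouped["ETH"]
print("ETH" in grouped)
# True
```

If you only want to test for a key without creating it, use `key in grouped` or `grouped.get(key)`, neither of which triggers the factory.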
I see the value in using defaultdict, but another approach you could try is a plain dictionary. Here's how to achieve the same result, though it's a little more verbose:
def group_by_type(pairs):
    grouped = {}
    for value, type_ in pairs:
        # Create the list explicitly the first time this type appears
        if type_ not in grouped:
            grouped[type_] = []
        grouped[type_].append(value)
    return [{'type': k, 'items': v} for k, v in grouped.items()]

# Result
result = group_by_type(input)
print(result)
This method checks whether the key exists in the dictionary and, if not, initializes it manually. It's more explicit, but less concise than the defaultdict solution. It's all about trade-offs: explicitness versus succinctness, depending on your preference.
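If you like a plain dict but not the separate `if` check, `dict.setdefault` is a middle ground: it returns the existing list for a key, or inserts the default you pass and returns that. A minimal sketch; the name `group_by_type_setdefault` and the trimmed sample data are just for illustration:

```python
def group_by_type_setdefault(pairs):
    grouped = {}
    for value, type_ in pairs:
        # setdefault returns grouped[type_] if present; otherwise it stores
        # the empty list and returns it, so append always has a list to use.
        grouped.setdefault(type_, []).append(value)
    # Dicts keep insertion order (Python 3.7+), so types appear in first-seen order.
    return [{"type": k, "items": v} for k, v in grouped.items()]


data = [("11013331", "KAT"), ("9085267", "NOT"), ("5238761", "ETH"), ("9843236", "KAT")]
print(group_by_type_setdefault(data))
# [{'type': 'KAT', 'items': ['11013331', '9843236']}, {'type': 'NOT', 'items': ['9085267']}, {'type': 'ETH', 'items': ['5238761']}]
```

One trade-off: the `[]` argument is evaluated on every iteration, even when the key already exists, which is one reason some people still prefer defaultdict here.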
Both of your solutions are great! Another way to approach this is itertools.groupby. It works a bit differently: it only groups consecutive items that share a key, so the input has to be sorted by type first:
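from itertools import groupby

def group_by_type(pairs):
    # groupby only merges consecutive items, so sort by type first.
    # sorted() returns a new list and leaves the caller's data unchanged
    # (list.sort() would reorder the original input in place).
    sorted_pairs = sorted(pairs, key=lambda x: x[1])
    grouped = [
        {'type': key, 'items': [item[0] for item in group]}
        for key, group in groupby(sorted_pairs, key=lambda x: x[1])
    ]
    return grouped

# Result
result = group_by_type(input)
print(result)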
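In this case, we use groupby from the itertools module, which collects consecutive items that share a key in a single pass. The sort costs O(n log n), so this approach is most efficient when the data is already sorted by the key. Note that the groups come out in sorted order (ETH, KAT, NOT) rather than in order of first appearance as in the expected output; within each group the original order is preserved because Python's sort is stable. It's concise and leverages Python's built-in functionality, but remember, sorting is a crucial step here: without it, groupby starts a new group every time the type changes.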
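To see why the sort matters, here's a small sketch with three of the sample pairs. `operator.itemgetter(1)` is used as an equivalent of `lambda x: x[1]`; the variable names are just for illustration.

```python
from itertools import groupby
from operator import itemgetter

pairs = [("11013331", "KAT"), ("9085267", "NOT"), ("9843236", "KAT")]

# Unsorted: groupby starts a new group whenever the key changes,
# so "KAT" is split into two separate groups.
unsorted_groups = [(k, [v for v, _ in g]) for k, g in groupby(pairs, key=itemgetter(1))]
print(unsorted_groups)
# [('KAT', ['11013331']), ('NOT', ['9085267']), ('KAT', ['9843236'])]

# Sorted by type first: each type forms exactly one group.
sorted_groups = [
    (k, [v for v, _ in g])
    for k, g in groupby(sorted(pairs, key=itemgetter(1)), key=itemgetter(1))
]
print(sorted_groups)
# [('KAT', ['11013331', '9843236']), ('NOT', ['9085267'])]
```

Each `group` is a lazy iterator tied to the underlying groupby object, so it has to be consumed (here, by the inner list comprehension) before moving on to the next key.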