Posts tagged ‘aggregate’

Daten aggregieren mit pandas

I recently came across a „challenge“ where I needed to combine various rows. Each row was identified by Key1 and Key2 and had two interesting columns, Foo and Bar. For each Key1 there may be a few Key2, for each Key2 n Foo/Bar entries. While all Foos are distinct per Key1 and Key2 the Bar column may appear j times.

The goal was to get a list of unique Bar items for each Key1/Key2 combination.

Key1 Key2 Foo Bar
0 C1 T1 a1 rc-1
1 C1 T1 a2 rc-1
2 C1 T1 a3 rc-1
3 C1 T1 a4 rc-1
4 C2 T2 b1 rc-1
5 C2 T2 b2 rc-2
6 C3 T3 c1 rc-3
7 C4 T4 d1 rc-4
8 C4 T4 d2 rc-5
9 C4 T4 d3 rc-4

The following Python code nicely did the job, thanks to http://stackoverflow.com/questions/17841149/pandas-groupby-how-to-get-a-union-of-strings

# -*- coding: utf-8 -*-
import pandas as pd
 
def unique(liste):
    """ takes a list of elements, separated by comma and returns sorted string of unique items separated by comma """
    a = liste.split(',')
    b = sorted(set(a))
    return ','.join(b)
 
df = pd.read_excel('groupb_Beispiel.xlsx')
print(df)
 
grouped = df.groupby(['Key1','Key2'],as_index=False)['Bar'].agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
 
grouped['Unique'] = grouped['Bar'].apply(unique)
 
print(grouped)
 
grouped.to_excel('result.xlsx')
Key1 Key2 Bar Unique
0 C1 T1 rc-1,rc-1,rc-1,rc-1 rc-1
1 C2 T2 rc-1,rc-2 rc-1,rc-2
2 C3 T3 rc-3 rc-3
3 C4 T4 rc-4,rc-5,rc-4 rc-4,rc-5

Uwe

Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined. Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.

More Posts - Website