Archive for the ‘Python / SciPy / pandas’ Category.
2016-09-28, 20:14
A short summary on Python’s timestamps:
import datetime
now = datetime.datetime.now()
print(now.strftime('%Y-%m-%d %H:%M'))
print(now.isoformat()) |
import datetime
now = datetime.datetime.now()
print(now.strftime('%Y-%m-%d %H:%M'))
print(now.isoformat())
From the module’s documentation:
Directive |
Meaning |
%a |
Locale’s abbreviated weekday name. |
%A |
Locale’s full weekday name. |
%b |
Locale’s abbreviated month name. |
%B |
Locale’s full month name. |
%c |
Locale’s appropriate date and time
representation. |
%d |
Day of the month as a decimal number [01,31]. |
%H |
Hour (24-hour clock) as a decimal number
[00,23]. |
%I |
Hour (12-hour clock) as a decimal number
[01,12]. |
%j |
Day of the year as a decimal number [001,366]. |
%m |
Month as a decimal number [01,12]. |
%M |
Minute as a decimal number [00,59]. |
%p |
Locale’s equivalent of either AM or PM. |
%S |
Second as a decimal number [00,61]. |
%U |
Week number of the year (Sunday as the first
day of the week) as a decimal number [00,53].
All days in a new year preceding the first
Sunday are considered to be in week 0. |
%w |
Weekday as a decimal number [0(Sunday),6]. |
%W |
Week number of the year (Monday as the first
day of the week) as a decimal number [00,53].
All days in a new year preceding the first
Monday are considered to be in week 0. |
%x |
Locale’s appropriate date representation. |
%X |
Locale’s appropriate time representation. |
%y |
Year without century as a decimal number
[00,99]. |
%Y |
Year with century as a decimal number. |
%z |
Time zone offset indicating a positive or
negative time difference from UTC/GMT of the
form +HHMM or -HHMM, where H represents decimal
hour digits and M represents decimal minute
digits [-23:59, +23:59]. |
%Z |
Time zone name (no characters if no time zone
exists). |
%% |
A literal '%' character. |
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website
2016-08-24, 21:27
Here are my slides from the Froscon 2016 presentation „Using Python for Scientific Research“.
Slides: Froscon_Slides_2016
Video: Video Recording (The screen was flickering most of the time, pretty annoying and distracting)
I will continously update and expand this presentation during the next months, if you want to receive updates follow the GitHub repository: https://github.com/UweZiegenhagen/2016-Python-Data-Analysis-Slides/
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website
2016-08-17, 11:13
Here’s some experimental (alpha) code to parse Emacs Orgmode files. It’s far from complete, I only aim at parsing basic TODO strings with level (**), status (TODO, DONE), priority (#A, #B, #C), task and tags.
2016-09-03: It takes my actual orgmode file, so it’s working fine.
2016-09-04: I created a github repo, code updates will be added there, only: https://github.com/UweZiegenhagen/python-orgmode-parser
# -*- coding: utf-8 -*-
import re
def parseEmaceOrgmode(s):
r = '^([\*]+)?\s?(TODO|PROGRESSING|FEEDBACK|VERIFY|POSTPONED|DELEGATED|CANCELLED|DONE)?\s?(\[#[A|B|C]\])?\s?(.*?)\s*(:(.*):)?$'
m = re.search(r,s)
level = m.group(1)
if (level is not None):
level = len(level)
prio = m.group(3)
if (prio is not None):
prio = prio[2:3]
tags = []
a = m.group(5)
if a != None:
b = len(a)-1
a= a[1:b]
a = a.split(':')
tags.append(a)
return(level, m.group(2), prio, m.group(4), tags)
with open("../orgmode.org", "r") as ins:
for line in ins:
level, status, priority, task, tags = parseEmaceOrgmode(line)
if level is not None:
print('Level:', level)
print('Status:', status)
print('Priority:', priority)
print('Task:', task)
print('Tags:',tags,'\n\n') |
# -*- coding: utf-8 -*-
import re
def parseEmaceOrgmode(s):
r = '^([\*]+)?\s?(TODO|PROGRESSING|FEEDBACK|VERIFY|POSTPONED|DELEGATED|CANCELLED|DONE)?\s?(\[#[A|B|C]\])?\s?(.*?)\s*(:(.*):)?$'
m = re.search(r,s)
level = m.group(1)
if (level is not None):
level = len(level)
prio = m.group(3)
if (prio is not None):
prio = prio[2:3]
tags = []
a = m.group(5)
if a != None:
b = len(a)-1
a= a[1:b]
a = a.split(':')
tags.append(a)
return(level, m.group(2), prio, m.group(4), tags)
with open("../orgmode.org", "r") as ins:
for line in ins:
level, status, priority, task, tags = parseEmaceOrgmode(line)
if level is not None:
print('Level:', level)
print('Status:', status)
print('Priority:', priority)
print('Task:', task)
print('Tags:',tags,'\n\n')
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website
2016-08-01, 00:11
Hier noch ein einfaches Python-Beispiel für Klassen und Funktionen, das ich vor ein paar Tagen geschrieben habe. Die Punkt-Klasse erhält eine entsprechende Funktion, um die Euklidische Distanz zu einem anderen Punkt zu bestimmen.
# -*- coding: utf-8 -*-
import math as m
class Point:
def __init__(self,x,y):
self.x = x
self.y = y
def calcEuclidDistanceToPoint(self,x,y):
return m.sqrt(m.pow(self.x-x,2) + m.pow(self.y-y,2))
p1 = Point(0,0)
p2 = Point(1,1)
print(p2.calcEuclidDistanceToPoint(p1.x,p1.y)) |
# -*- coding: utf-8 -*-
import math as m
class Point:
def __init__(self,x,y):
self.x = x
self.y = y
def calcEuclidDistanceToPoint(self,x,y):
return m.sqrt(m.pow(self.x-x,2) + m.pow(self.y-y,2))
p1 = Point(0,0)
p2 = Point(1,1)
print(p2.calcEuclidDistanceToPoint(p1.x,p1.y))
runfile('euclidDistance.py', wdir='E:/Python')
1.4142135623730951
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website
2016-08-01, 00:03
Hier ein kurzes Beispiel aus der numpy-Dokumentation, wie man mit Hilfe von numpy lineare Gleichungssysteme lösen kann:
Zu lösen sind folgende Gleichungen:
- 3 * x0 + 1 * x1 = 9
- 1 * x0 + 2 * x1 = 8
Die Koeffizienten kommen in die entsprechenden numpy-Arrays, dann ruft man linalg.solve
auf:
import numpy as np
a = np.array([[3,1], [1,2]])
b = np.array([9,8])
x = np.linalg.solve(a, b)
print(x) # gibt [ 2. 3.] |
import numpy as np
a = np.array([[3,1], [1,2]])
b = np.array([9,8])
x = np.linalg.solve(a, b)
print(x) # gibt [ 2. 3.]
Den Plot habe ich mit LaTeX erstellt, siehe http://uweziegenhagen.de/?p=3516.
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website
2016-07-07, 21:52
Based on an example from stackexchange I have created a small example on parallel TeX compilation.
# -*- coding: utf-8 -*-
"""
Created on 2016-07-06
Uwe Ziegenhagen
based on http://stackoverflow.com/questions/16181121/python-very-simple-multithreading-parallel-url-fetching-without-queue
"""
from multiprocessing.pool import ThreadPool
from time import time as timer
import os
files = ['test-01.tex','test-02.tex','test-03.tex','test-04.tex','test-05.tex',
'test-06.tex','test-07.tex','test-08.tex','test-09.tex','test-10.tex']
def compile_file(cfile):
try:
result = os.system('pdflatex -interaction=batchmode ' + cfile)
return cfile, None
except Exception as e:
return cfile, e
start = timer()
results = ThreadPool(8).imap_unordered(compile_file, files)
for cfile, error in results:
if error is None:
print("%r compiled in %ss" % (cfile, timer() - start))
else:
print("Error compiling %r: %s" % (cfile, error))
print("Elapsed Time: %s" % (timer() - start,))
print('Gesamtzeit',timer() - start) |
# -*- coding: utf-8 -*-
"""
Created on 2016-07-06
Uwe Ziegenhagen
based on http://stackoverflow.com/questions/16181121/python-very-simple-multithreading-parallel-url-fetching-without-queue
"""
from multiprocessing.pool import ThreadPool
from time import time as timer
import os
files = ['test-01.tex','test-02.tex','test-03.tex','test-04.tex','test-05.tex',
'test-06.tex','test-07.tex','test-08.tex','test-09.tex','test-10.tex']
def compile_file(cfile):
try:
result = os.system('pdflatex -interaction=batchmode ' + cfile)
return cfile, None
except Exception as e:
return cfile, e
start = timer()
results = ThreadPool(8).imap_unordered(compile_file, files)
for cfile, error in results:
if error is None:
print("%r compiled in %ss" % (cfile, timer() - start))
else:
print("Error compiling %r: %s" % (cfile, error))
print("Elapsed Time: %s" % (timer() - start,))
print('Gesamtzeit',timer() - start)
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website
2016-04-24, 18:56
Hier ein Quick & Dirty Code, um eine Spalte aus einer Text-Datei zu extrahieren. Geht auch mit AWK, aber wenn man nur Python hat…
def splitFileOneColumn(inputFile,outputFile,columnSeparator,column):
with open(inputFile, 'r') as infile:
with open(outputFile, 'w') as outfile:
for line in infile:
s = line.split(columnSeparator)
outfile.write(s[column]+os.linesep) # '\r\n' on Windows, '\n' on Unix/Linux/Mac
outfile.close()
infile.close() |
def splitFileOneColumn(inputFile,outputFile,columnSeparator,column):
with open(inputFile, 'r') as infile:
with open(outputFile, 'w') as outfile:
for line in infile:
s = line.split(columnSeparator)
outfile.write(s[column]+os.linesep) # '\r\n' on Windows, '\n' on Unix/Linux/Mac
outfile.close()
infile.close()
Bei Gelegenheit muss ich das mal um die Möglichkeit erweitern, n Spalten zu extrahieren.
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website
2016-02-28, 21:29
I recently came across a „challenge“ where I needed to combine various rows. Each row was identified by Key1
and Key2
and had two interesting columns, Foo
and Bar
. For each Key1
there may be a few Key2
, for each Key2
n Foo/Bar entries. While all Foos are distinct per Key1
and Key2
the Bar
column may appear j times.
The goal was to get a list of unique Bar items for each Key1/Key2 combination.
|
Key1 |
Key2 |
Foo |
Bar |
0 |
C1 |
T1 |
a1 |
rc-1 |
1 |
C1 |
T1 |
a2 |
rc-1 |
2 |
C1 |
T1 |
a3 |
rc-1 |
3 |
C1 |
T1 |
a4 |
rc-1 |
4 |
C2 |
T2 |
b1 |
rc-1 |
5 |
C2 |
T2 |
b2 |
rc-2 |
6 |
C3 |
T3 |
c1 |
rc-3 |
7 |
C4 |
T4 |
d1 |
rc-4 |
8 |
C4 |
T4 |
d2 |
rc-5 |
9 |
C4 |
T4 |
d3 |
rc-4 |
The following Python code nicely did the job, thanks to http://stackoverflow.com/questions/17841149/pandas-groupby-how-to-get-a-union-of-strings
# -*- coding: utf-8 -*-
import pandas as pd
def unique(liste):
""" takes a list of elements, separated by comma and returns sorted string of unique items separated by comma """
a = liste.split(',')
b = sorted(set(a))
return ','.join(b)
df = pd.read_excel('groupb_Beispiel.xlsx')
print(df)
grouped = df.groupby(['Key1','Key2'],as_index=False)['Bar'].agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
grouped['Unique'] = grouped['Bar'].apply(unique)
print(grouped)
grouped.to_excel('result.xlsx') |
# -*- coding: utf-8 -*-
import pandas as pd
def unique(liste):
""" takes a list of elements, separated by comma and returns sorted string of unique items separated by comma """
a = liste.split(',')
b = sorted(set(a))
return ','.join(b)
df = pd.read_excel('groupb_Beispiel.xlsx')
print(df)
grouped = df.groupby(['Key1','Key2'],as_index=False)['Bar'].agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
grouped['Unique'] = grouped['Bar'].apply(unique)
print(grouped)
grouped.to_excel('result.xlsx')
|
Key1 |
Key2 |
Bar |
Unique |
0 |
C1 |
T1 |
rc-1,rc-1,rc-1,rc-1 |
rc-1 |
1 |
C2 |
T2 |
rc-1,rc-2 |
rc-1,rc-2 |
2 |
C3 |
T3 |
rc-3 |
rc-3 |
3 |
C4 |
T4 |
rc-4,rc-5,rc-4 |
rc-4,rc-5 |
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website
2016-02-03, 22:02
Hier meine Folien zum Data Science Meetup vom 29.01.2016.
Pandas (PDF)
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website
2016-01-27, 22:10
Vor einiger Zeit hatte ich eine Excel-Datei zu bearbeiten, in der in einer Spalte die Spaltennamen, in einer anderen die korrespondieren Werte standen. Immer drei Zeilen bildeten den eigentlichen Datensatz. Mit wenigen Zeilen Pandas und cleverer Adressierung der Ergebnis-Zelle.
Spaltenname |
Wert |
ColA |
Andi |
ColB |
Berni |
ColC |
Cesar |
ColA |
Dorian |
ColB |
Ernest |
ColC |
Frank |
import pandas as pd
# Lade die Daten
daten = pd.read_excel('combine.xlsx')
# Erstelle leeren Dataframe mit den Spaltennamen aus den Excelzeilen
verarbeitet = pd.DataFrame(columns=['ColA','ColB','ColC'])
# Iteriere über die Daten
for i, row in daten.iterrows():
# ganzzahliges Teilen, um die Zeile zu bestimmen
# in die die Zelle gehört, Spalte ergibt sich aus dem Wert in 'Spalte'
verarbeitet.loc[i // 3,row['Spalte']] = row['Wert']
print(verarbeitet) |
import pandas as pd
# Lade die Daten
daten = pd.read_excel('combine.xlsx')
# Erstelle leeren Dataframe mit den Spaltennamen aus den Excelzeilen
verarbeitet = pd.DataFrame(columns=['ColA','ColB','ColC'])
# Iteriere über die Daten
for i, row in daten.iterrows():
# ganzzahliges Teilen, um die Zeile zu bestimmen
# in die die Zelle gehört, Spalte ergibt sich aus dem Wert in 'Spalte'
verarbeitet.loc[i // 3,row['Spalte']] = row['Wert']
print(verarbeitet)
|
ColA |
ColB |
ColC |
0 |
Andi |
Berni |
Cesar |
1 |
Dorian |
Ernest |
Frank |
Nachtrag: Stephan vom Kölner Data Science Meetup hat mir noch einen alternativen Weg gezeigt:
import pandas as pd
data = {'A': ["cola", "colb", "colc", "cola", "colb", "colc"], "B": [1, 2, 3, 4, 5, 6]}
data = pd.DataFrame(data)
gb = data.groupby('A')
res = pd.DataFrame()
for key in gb.groups:
res[key] = gb.get_group(key)['B'].values.flatten()
print(res) |
import pandas as pd
data = {'A': ["cola", "colb", "colc", "cola", "colb", "colc"], "B": [1, 2, 3, 4, 5, 6]}
data = pd.DataFrame(data)
gb = data.groupby('A')
res = pd.DataFrame()
for key in gb.groups:
res[key] = gb.get_group(key)['B'].values.flatten()
print(res)
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.
Do you like my content and would like to thank me for it? Consider making a small donation to my local fablab, the Dingfabrik Köln. Details on how to donate can be found here Spenden für die Dingfabrik.
More Posts - Website