Pandas Tips① ~How to add rows by splitting data using a separator~
For data munging, calculating statistics, or anything else, the pandas DataFrame in Python is really convenient and efficient, and I'm very fond of using pandas. Nevertheless, I tend to forget some useful techniques, so I've decided to write them down here. This first post covers how to add rows by splitting data on a separator. I hope you enjoy it :)
For instance, let us assume we have the data below.
import pandas as pd
purchase_data = pd.DataFrame([['john','apple|orange'],
                              ['robert','apple|orange|blueberry'],
                              ['david','banana']],columns=['name','fruit'])
purchase_data.head()
However, in some cases you need a dataframe with one row per fruit. You can create such a dataframe as below.
column_list = ['name','fruit']
# Prepare empty data frame
purchase_data_new = pd.DataFrame(columns=column_list)
for index, row in purchase_data.iterrows():
    # When creating a dataframe from a dictionary, column order may not be
    # guaranteed on older Python/pandas versions.
    # Hence I specify the "columns" parameter.
    purchase_data_temp = pd.DataFrame({row.index[0]: row.values[0],
                                       row.index[1]: row.values[1].split('|')},
                                      columns=column_list)
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    purchase_data_new = pd.concat([purchase_data_new, purchase_data_temp])
purchase_data_new
In case you have questions about the part that creates a dataframe from a dictionary, I'll explain a little :)
Basically, you can create a dataframe from a dictionary as follows. (By the way, depending on your Python and pandas versions, this method may not preserve the order of the columns.)
df_temp = pd.DataFrame({'name':['hiroshi'],'fruit':['kiwi']})
df_temp.head()
When you give a dictionary value as a string rather than a list, it is treated as a constant that is repeated for every row.
df_temp2 = pd.DataFrame({'name':'hiroshi','fruit':['kiwi','apple','banana']})
df_temp2.head()
The point, then, is that the "split" method returns a "list". That is why david's row in the example above causes no error :)
# Unfortunately, this raises an error, because every value is a scalar and no index is given...
# df_temp3 = pd.DataFrame({'name':'hiroshi','fruit':'kiwi'})
df_temp3 = pd.DataFrame({'name':'hiroshi','fruit':'kiwi'.split('|')})
df_temp3.head()
You can see that even though there is no '|' in the fruit value, "split" still returns a "list", as below.
'kiwi'.split('|')
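To make this concrete, here is a small check that runs "split" over the three fruit strings from the sample data. The names samples and split_results are only for illustration.

```python
# str.split always returns a list, even when the separator is absent.
samples = ['apple|orange', 'apple|orange|blueberry', 'banana']
split_results = [s.split('|') for s in samples]

for s, parts in zip(samples, split_results):
    print(s, '->', parts)
```

The last entry has no '|' at all, yet it still comes back as a one-element list, which is what lets the dictionary-based constructor accept it.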
If anyone knows a more efficient way, I'd be really grateful if you let me know! :)
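One candidate for a more compact approach, sketched here under the assumption that pandas 0.25 or newer is installed, is to split the column with the "str.split" accessor and then call "DataFrame.explode". This is a different technique from the loop above, not a rewrite of it, and the name purchase_data_exploded is only for illustration.

```python
import pandas as pd

purchase_data = pd.DataFrame([['john', 'apple|orange'],
                              ['robert', 'apple|orange|blueberry'],
                              ['david', 'banana']],
                             columns=['name', 'fruit'])

# Turn each 'fruit' string into a list, then emit one row per list element.
# Assumption: pandas 0.25 or newer, where DataFrame.explode is available.
purchase_data_exploded = (
    purchase_data
    .assign(fruit=purchase_data['fruit'].str.split('|'))
    .explode('fruit')
    .reset_index(drop=True)  # renumber rows 0..n-1 after exploding
)
print(purchase_data_exploded)
```

Since no Python-level loop over rows is involved, this is likely to scale better on large dataframes, though I have not measured it here.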