Issue
This Content is from Stack Overflow. Question asked by Sazzadur Rahman
Let’s say we have the following dataframe
handle image_src image_position image_src_1 image_src_2 image_src_3
0 a img1 1 None None None
1 b img2 2 None None None
2 c img3 3 None None None
How can we convert this datafame into this?
handle image_src image_position image_src_1 image_src_2 image_src_3
0 a img1 1 img1 None None
1 b img2 2 None img2 None
2 c img3 3 None None img3
Basically I want to fill the image_src_{image_position}
column based on the value of image_position
of each row.
I come up with a solution which involves manually iterating over the whole dataframe, like this
data = [
{"handle": "a","image_src": "img1","image_position": 1,"image_src_1": None,"image_src_2": None,"image_src_3": None},
{"handle": "b","image_src": "img2","image_position": 2,"image_src_1": None,"image_src_2": None,"image_src_3": None,
{"handle": "c","image_src": "img3","image_position": 3,"image_src_1": None,"image_src_2": None,"image_src_3": None}
]
df = pd.DataFrame(data)
for index in range(0, len(df)):
row = df.iloc[index]
position = row["image_position"].astype("int64")
df.loc[index, f"image_src_{position}"] = row["image_src"]
But as iterating over the whole dataframe is a bad thing, how can I imporove this?
Solution
First that came to my mind was .pivot()
function. Though this might not be the exact thing you wanted, — the original image_src
column would be dropped, the column naming would slightly differ and nan
might need handling, — still it’s rather a short solution.
(df.pivot(index=['handle', 'image_position'], # to keep these columns intact
columns='image_src',
values='image_src')
.add_prefix('image_src_')
.reset_index())
What I got:
handle | image_position | image_src_img1 | image_src_img2 | image_src_img3 | |
---|---|---|---|---|---|
0 | a | 1 | img1 | nan | nan |
1 | b | 2 | nan | img2 | nan |
2 | c | 3 | nan | nan | img3 |
Another take is to use image_position
column as an index for the new columns, thus it would be dropped too. Also you may want to replace nan
with None
:
import numpy as np
(df.pivot(index='handle',
columns='image_position',
values='image_src')
.add_prefix('image_src_')
.replace({np.nan: None})
.reset_index())
Goes like this:
handle | image_src_1 | image_src_2 | image_src_3 | |
---|---|---|---|---|
0 | a | img1 | None | None |
1 | b | None | img2 | None |
2 | c | None | None | img3 |
This Question was asked in StackOverflow by Sazzadur Rahman and Answered by n.shabankin It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.