[SOLVED] Pandas: Dynamically deciede which column to update based on another column’s value

Issue

This Content is from Stack Overflow. Question asked by Sazzadur Rahman

Let’s say we have the following dataframe

    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         None        None        None
1        b       img2                2         None        None        None
2        c       img3                3         None        None        None

How can we convert this datafame into this?

    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         img1        None        None
1        b       img2                2         None        img2        None
2        c       img3                3         None        None        img3

Basically I want to fill the image_src_{image_position} column based on the value of image_position of each row.

I come up with a solution which involves manually iterating over the whole dataframe, like this

data = [
    {"handle": "a","image_src": "img1","image_position": 1,"image_src_1": None,"image_src_2": None,"image_src_3": None},
    {"handle": "b","image_src": "img2","image_position": 2,"image_src_1": None,"image_src_2": None,"image_src_3": None,
    {"handle": "c","image_src": "img3","image_position": 3,"image_src_1": None,"image_src_2": None,"image_src_3": None}
]

df = pd.DataFrame(data)

for index in range(0, len(df)):
    row = df.iloc[index]
    
    position = row["image_position"].astype("int64")
    
    df.loc[index, f"image_src_{position}"] = row["image_src"]

But as iterating over the whole dataframe is a bad thing, how can I imporove this?



Solution

First that came to my mind was .pivot() function. Though this might not be the exact thing you wanted, — the original image_src column would be dropped, the column naming would slightly differ and nan might need handling, — still it’s rather a short solution.

(df.pivot(index=['handle', 'image_position'],  # to keep these columns intact
          columns='image_src',
          values='image_src')
   .add_prefix('image_src_')
   .reset_index())

What I got:

handleimage_positionimage_src_img1image_src_img2image_src_img3
0a1img1nannan
1b2nanimg2nan
2c3nannanimg3

Another take is to use image_position column as an index for the new columns, thus it would be dropped too. Also you may want to replace nan with None:

import numpy as np

(df.pivot(index='handle',
          columns='image_position',
          values='image_src')
   .add_prefix('image_src_')
   .replace({np.nan: None})
   .reset_index())

Goes like this:

handleimage_src_1image_src_2image_src_3
0aimg1NoneNone
1bNoneimg2None
2cNoneNoneimg3


This Question was asked in StackOverflow by Sazzadur Rahman and Answered by n.shabankin It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.

people found this article helpful. What about you?