[SOLVED] How can Pandas flatten a dict with dynamic keys?

Issue

This content is from Stack Overflow. Question asked by HapiDaze.

Suppose a dict looks like this:

example_1 = {
    'products': [
        {
            'p_code': 'AP001',
            'description': 'Product 1',
            'dimensions': {
                'height': 20,
                'width': 30
            }
        },
        {
            'p_code': 'AP002',
            'description': 'Product 2',
            'dimensions': {
                'height': 15,
                'width': 25
            }
        }
    ]
}

It would be fairly trivial to flatten it like this:

df_ex_1 = pd.json_normalize(example_1, record_path=['products'])

Result:

  p_code description  dimensions.height  dimensions.width
0  AP001   Product 1                 20                30
1  AP002   Product 2                 15                25

But what if the dict looks like this:

example_2 = {
    'AP001': {
        'description': 'Product 1',
        'dimensions': {
            'height': 20,
            'width': 30
        }        
    },
    'AP002': {
        'description': 'Product 2',
        'dimensions': {
            'height': 15,
            'width': 25
        }        
    }
}

Is there a way to specify that record_path needs to drop down into each root-level key that it finds at runtime?

I’m not wedded to using json_normalize, so any other approach that efficiently achieves the same result would be fine.

I’ve tried:

pd.DataFrame.from_dict(example_2, orient='index')

Result:

      description                    dimensions
AP001   Product 1   {'height': 20, 'width': 30}
AP002   Product 2   {'height': 15, 'width': 25}

but this doesn’t flatten the data nested in dimensions.

If necessary, I can write code to make a transformed version of the input dict, before I pull it into Pandas, but I suspect Pandas can already do what I want, and I just don’t know how.

Any ideas please?



Solution

Using pandas.concat and a list comprehension, you can call pandas.json_normalize on each root-level value and combine the resulting one-row DataFrame objects with something like this:

df = pd.concat([pd.json_normalize(example_2[product]) for product in example_2], keys=example_2.keys())

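The index this produces is not exactly what you expected: passing keys to pandas.concat yields a MultiIndex of (key, 0) pairs rather than the plain product codes. To get the index you want, assign the keys directly:

df.index = example_2.keys()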
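
For reference, here is a self-contained sketch of the approach described in this answer, run against example_2 from the question. The second half is an alternative that is not part of the original answer: it assumes the root-level values are all flat-or-nested dicts with the same shape, and passes them to a single json_normalize call as a list. The names frames and df_alt are made up for illustration.

```python
import pandas as pd

example_2 = {
    'AP001': {
        'description': 'Product 1',
        'dimensions': {'height': 20, 'width': 30},
    },
    'AP002': {
        'description': 'Product 2',
        'dimensions': {'height': 15, 'width': 25},
    },
}

# Approach from the answer: normalize each root-level value on its own,
# then stack the resulting one-row frames.
frames = [pd.json_normalize(record) for record in example_2.values()]
df = pd.concat(frames, keys=list(example_2))

# concat with keys builds a MultiIndex of (key, 0) pairs;
# overwrite it with the plain root-level keys.
df.index = list(example_2)

# Alternative (an assumption, not from the answer): json_normalize also
# accepts a list of dicts, so the values can be flattened in one call.
df_alt = pd.json_normalize(list(example_2.values()))
df_alt.index = list(example_2)

print(df)
print(df_alt)
```

Both frames should end up with the product codes as the index and the columns description, dimensions.height and dimensions.width. The single-call variant avoids building one DataFrame per key, which may matter for large inputs, but that has not been measured here.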


This question was asked on Stack Overflow by HapiDaze and answered by Bijay Regmi. It is licensed under the terms of CC BY-SA 2.5 - CC BY-SA 3.0 - CC BY-SA 4.0.
