Issue
This content is from Stack Overflow; the question was asked by HapiDaze.
Supposing a dict looked like this:
example_1 = {
    'products': [
        {
            'p_code': 'AP001',
            'description': 'Product 1',
            'dimensions': {
                'height': 20,
                'width': 30
            }
        },
        {
            'p_code': 'AP002',
            'description': 'Product 2',
            'dimensions': {
                'height': 15,
                'width': 25
            }
        }
    ]
}
It would be fairly trivial to flatten it like this:
df_ex_1 = pd.json_normalize(example_1, record_path=['products'])
Result:
p_code description dimensions.height dimensions.width
0 AP001 Product 1 20 30
1 AP002 Product 2 15 25
But what if the dict looks like this:
example_2 = {
    'AP001': {
        'description': 'Product 1',
        'dimensions': {
            'height': 20,
            'width': 30
        }
    },
    'AP002': {
        'description': 'Product 2',
        'dimensions': {
            'height': 15,
            'width': 25
        }
    }
}
Is there a way to specify that record_path should descend into each root-level key it finds at runtime?
I’m not wedded to using json_normalize, so any other approach that efficiently achieves the same result would be fine.
I’ve tried:
pd.DataFrame.from_dict(example_2, orient='index')
Result:
description dimensions
AP001 Product 1 {'height': 20, 'width': 30}
AP002 Product 2 {'height': 15, 'width': 25}
but this doesn’t flatten the data nested in dimensions.
If necessary, I can write code to make a transformed version of the input dict, before I pull it into Pandas, but I suspect Pandas can already do what I want, and I just don’t know how.
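For illustration, the pre-transform described above could be sketched like this (the 'p_code' column name is an assumption, borrowed from example_1):

```python
import pandas as pd

example_2 = {
    'AP001': {'description': 'Product 1',
              'dimensions': {'height': 20, 'width': 30}},
    'AP002': {'description': 'Product 2',
              'dimensions': {'height': 15, 'width': 25}},
}

# Rebuild the keyed dict as a list of records, matching example_1's shape,
# so a single json_normalize call can flatten everything at once.
records = [{'p_code': code, **attrs} for code, attrs in example_2.items()]
df = pd.json_normalize(records)
```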
Any ideas please?
Solution
Using pandas.concat and a list comprehension, you can combine the DataFrame objects created by pandas.json_normalize like this:
df = pd.concat([pd.json_normalize(example_2[product]) for product in example_2], keys=example_2.keys())
The index produced by this call is a MultiIndex (product code plus a per-frame integer), which is probably not what you expected; you can do
df.index = example_2.keys()
to obtain the plain product-code index you want.
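Putting the pieces together, a complete runnable sketch of this approach (here df.index.droplevel(1) is used as an equivalent way to discard the per-frame integer level):

```python
import pandas as pd

example_2 = {
    'AP001': {'description': 'Product 1',
              'dimensions': {'height': 20, 'width': 30}},
    'AP002': {'description': 'Product 2',
              'dimensions': {'height': 15, 'width': 25}},
}

# Normalize each product's nested dict separately, then stack the frames;
# keys= labels each frame with its product code.
df = pd.concat([pd.json_normalize(example_2[p]) for p in example_2],
               keys=example_2.keys())

# concat produces a MultiIndex like ('AP001', 0); drop the integer level
# so only the product codes remain as the index.
df.index = df.index.droplevel(1)
```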
This question was asked on Stack Overflow by HapiDaze and answered by Bijay Regmi. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.