[SOLVED] Get stderr from inside Python code (stderr provided by pd.read_csv with on_bad_lines='warn')


This Content is from Stack Overflow. Question asked by user3313834

I'm loading a pandas dataframe from a very large and poorly formed CSV file.

Toy dataset to demonstrate (lines 3 and 4 have more than 2 columns):

$ cat data.csv

Once loaded, I get most of the lines in the dataframe and, on stderr, the list of lines with a mismatched number of columns (saved as err.log).

$ cat load.py 
import pandas as pd
df = pd.read_csv('data.csv', dtype=str, sep='|', on_bad_lines='warn')
print(df)
$ python load.py 2> err.log 
   n car
0  7   u
1  9   z
2  2   t

$ cat err.log
b'Skipping line 3: expected 2 fields, saw 4\nSkipping line 4: expected 2 fields, saw 3\n'

Then I use the line numbers provided by err.log to get the raw lines with errors for further human analysis.
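The lookup from reported line numbers back to raw lines could be sketched roughly as below. The helper names bad_line_numbers and raw_lines are hypothetical, and the sketch assumes that the reported numbers are 1-based physical line numbers that count the header, and that no record spans several lines.

```python
import re

# Matches the "Skipping line N:" prefix used in the messages shown in err.log.
SKIP_RE = re.compile(r"Skipping line (\d+):")


def bad_line_numbers(message):
    """Return the line numbers mentioned in a 'Skipping line N: ...' message."""
    return [int(n) for n in SKIP_RE.findall(message)]


def raw_lines(path, numbers):
    """Return {line_number: raw_text} for the requested 1-based line numbers."""
    wanted = set(numbers)
    found = {}
    with open(path, encoding="utf-8", newline="") as handle:
        for lineno, text in enumerate(handle, start=1):
            if lineno in wanted:
                # Drop only the line terminator, keep the rest untouched.
                found[lineno] = text.rstrip("\r\n")
    return found
```

Reading the file a second time keeps the sketch simple; for a very big file it only streams through the lines once and stores just the ones that were reported.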

Is it possible to access these error lines from inside the Python code, without needing 2> err.log in the calling command? (I do not have direct access to the execution, so I cannot add the 2> err.log myself.)


You can redirect stderr from inside Python to an in-memory buffer as follows. Note that you won't see any error on your screen if one occurs, so if you want to see errors, you must catch the exception, point stderr back at the real stream, and then raise the error again.

import sys
import io
buff = io.StringIO()
sys.stderr = buff

import pandas as pd
df = pd.read_csv('file.txt', dtype=str, sep='|', on_bad_lines='warn')


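print(buff.getvalue())
# Strip the leading b' and the trailing \n' plus newline, then split on the literal \n sequences
lines = buff.getvalue()[2:-4].split('\\n')
for line in lines:
    print(line)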

b'Skipping line 3: expected 2 fields, saw 4\nSkipping line 4: expected 2 fields, saw 3\n'

Skipping line 3: expected 2 fields, saw 4
Skipping line 4: expected 2 fields, saw 3
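A variation on the redirection above, sketched with the standard library's contextlib.redirect_stderr so that sys.stderr is restored automatically, even if the reader raises. The helper name capture_diagnostics is hypothetical. It also records Python warnings, on the assumption (worth checking against your installed version) that newer pandas releases report skipped lines as a ParserWarning rather than writing to stderr.

```python
import contextlib
import io
import sys
import warnings


def capture_diagnostics(func, *args, **kwargs):
    """Call func and return (result, stderr_text, warning_messages)."""
    buff = io.StringIO()
    # record=True collects warnings in a list instead of displaying them.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # sys.stderr is swapped for buff only inside this block.
        with contextlib.redirect_stderr(buff):
            result = func(*args, **kwargs)
    messages = [str(w.message) for w in caught]
    return result, buff.getvalue(), messages
```

Called as capture_diagnostics(pd.read_csv, 'data.csv', dtype=str, sep='|', on_bad_lines='warn'), it would return the dataframe together with whatever was written to stderr and any warning messages, so the calling code does not depend on which of the two channels was used.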

This question was asked on Stack Overflow by user3313834 and answered by Ahmed AEK. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, CC BY-SA 4.0.
