I want to read a dataframe from a fixed width flat file. This is

Asked: June 10, 2026

I want to read a dataframe from a fixed-width flat file. This is a somewhat performance-sensitive operation.

I would like all leading and trailing whitespace to be stripped from each column value. After that whitespace is stripped, I want blank strings to be converted to NaN or None values. Here are the two ideas I had:

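```python
import pandas as pd

def null_convert(value):
    # Strip surrounding whitespace; return None if nothing is left
    value = value.strip()
    if value == "":
        return None
    else:
        return value

def create_convert_dict(columns):
    # Use the same converter for every column
    convert_dict = {}
    for col in columns:
        convert_dict[col] = null_convert
    return convert_dict  # return after the loop so every column is included

# path, markers (the colspecs) and columns are defined elsewhere
df = pd.read_fwf(path, colspecs=markers, names=columns,
                 converters=create_convert_dict(columns))
```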

or:

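```python
import pandas as pd

def col_strip(value):
    # Strip surrounding whitespace only; empty strings are left to na_values
    return value.strip()

def create_convert_dict(columns):
    convert_dict = {}
    for col in columns:
        convert_dict[col] = col_strip
    return convert_dict

# path, markers (the colspecs) and columns are defined elsewhere
df = pd.read_fwf(path, colspecs=markers, names=columns, na_values='',
                 converters=create_convert_dict(columns))
```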

The second option depends on the converter (which strips whitespace) being evaluated before na_values is applied.

I was wondering whether the second one would work. I am curious because it seems better to keep NaN as the null value, as opposed to None.
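One way to check both ideas without the real file is to feed `read_fwf` a tiny in-memory file. This is a minimal sketch: the sample data, `COLSPECS` and `COLUMNS` are hypothetical stand-ins for `path`, `markers` and `columns`, and `read_with` is a hypothetical helper; the two converters are the ones from the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: a 4-character "code" field and a 6-character "name"
# field; the second row has a blank name.
DATA = (
    "  1  foo \n"
    "  2      \n"
    "  3  bar \n"
)
COLSPECS = [(0, 4), (4, 10)]
COLUMNS = ["code", "name"]


def null_convert(value):
    # Option 1: strip, then map empty strings to None
    value = value.strip()
    return None if value == "" else value


def col_strip(value):
    # Option 2: strip only, and let na_values catch the empty string
    return value.strip()


def read_with(converter, **kwargs):
    # Apply the same converter to every column of the in-memory file
    converters = {col: converter for col in COLUMNS}
    return pd.read_fwf(StringIO(DATA), colspecs=COLSPECS, names=COLUMNS,
                       converters=converters, **kwargs)


df1 = read_with(null_convert)
df2 = read_with(col_strip, na_values="")

print(df1)
print(df2)
```

Both frames should report the blank `name` as missing (`pd.isna` is true for NaN and None alike); printing `df1.loc[1, "name"]` and `df2.loc[1, "name"]` shows which of the two each option actually produces. It is also worth checking `df1.dtypes`, since a column that goes through a converter may keep the converter's string output instead of being parsed as a number.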

I am also open to any other suggestions for how I might perform this operation (stripping whitespace and then converting blank strings to NaN).
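Another option is to do the cleanup after reading, with vectorized string methods instead of a Python function called per cell. A minimal sketch, assuming every column was read as text (for example with `dtype=str`); the helper name `strip_and_blank_to_nan` is hypothetical.

```python
import pandas as pd


def strip_and_blank_to_nan(df):
    """Hypothetical helper: strip whitespace and turn empty strings into NaN.

    Assumes every column holds strings (or NaN), e.g. a frame read with
    dtype=str; .str methods would turn non-string values into NaN.
    """
    out = df.copy()
    for col in out.columns:
        stripped = out[col].str.strip()
        # mask() replaces values where the condition is True with NaN
        out[col] = stripped.mask(stripped == "")
    return out


raw = pd.DataFrame({"code": [" 1 ", "2", "  "], "name": ["foo ", "   ", " bar"]})
clean = strip_and_blank_to_nan(raw)
print(clean)
```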

I do not have access to a computer with pandas installed at the moment, which is why I cannot test this myself.

1 Answer

Editorial Team · Answer 1 · 2026-06-10T09:13:11+00:00

For a fixed-width file, there is no need to do anything special to strip whitespace or handle missing fields: `read_fwf` strips the padding around each field, and a field that is empty after stripping is read as NaN. Below is a small example of a fixed-width file with three columns, each 5 characters wide, containing leading and trailing whitespace as well as missing data (the session assumes `import pandas` and `from io import StringIO` were run beforehand).

```
In [57]: data = """\
A    B     C     
 0    foo       
3    bar     2.0
  1        3.0
"""

In [58]: df = pandas.read_fwf(StringIO(data), widths=[5, 5, 5])

In [59]: df
Out[59]: 
   A    B   C
0  0  foo NaN
1  3  bar   2
2  1  NaN   3

In [60]: df.dtypes
Out[60]: 
A      int64
B     object
C    float64
```
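Applied to the call in the question, this suggests dropping the converters and passing `colspecs` and `names` directly. A minimal sketch, with a hypothetical in-memory file standing in for `path` and hypothetical values for `markers` and `columns`:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-ins for the question's path, markers and columns
data = StringIO(
    "  1  foo   2.5\n"
    "  2        \n"
    "  3  bar   4.0\n"
)
markers = [(0, 4), (4, 10), (10, 15)]
columns = ["code", "name", "value"]

# No converters: read_fwf strips the padding and treats empty fields as NaN,
# and numeric columns are still type-inferred
df = pd.read_fwf(data, colspecs=markers, names=columns)
print(df)
print(df.dtypes)
```

Since no extra Python function runs for every value, this is also likely to be cheaper than either converter version.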
