I have a TSV file that consists of integers, along with some invalid data that could be anything, such as floats or characters.
The idea is to read the contents of the file and find out which lines are bad (i.e. contain data other than integers).
Each line can be read using the readline() method once the file has been opened for reading. Of course, readline() returns each line as a string, not as its constituent data types. My understanding is that I could somehow use the pickle module to retain the original data types, by storing a serialized version of the data with pickle's dump() and load() functions.
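To make the starting point concrete, here is a small sketch that uses io.StringIO as a stand-in for the opened TSV file (the sample contents are made up). It shows that readline() gives back a str, and that splitting on tabs only yields more strings:

```python
import io

# Stand-in for an opened TSV file; the contents are hypothetical sample data.
tsv_file = io.StringIO("1\t2\t3\n4\t5.5\tx\n")

line = tsv_file.readline()
print(repr(line))   # '1\t2\t3\n'
print(type(line))   # <class 'str'>

# Splitting on tabs does not change the type: every field is still a str.
fields = line.rstrip("\n").split("\t")
print(fields)       # ['1', '2', '3']
print(all(isinstance(f, str) for f in fields))  # True
```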
The question is, how do I do this?
Reading each line and pickling it would not help, since readline() always returns the line as a string. Pickling it just serializes a string, and unpickling gives back that same string. So the actual data in the line, whether integers or characters, is represented as a string regardless.
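A quick way to confirm this: a pickle round trip preserves exactly the object you gave it, so a line that was read as a str comes back as a str. A minimal sketch (the line content is made up):

```python
import pickle

line = "1\t2\t3\n"  # what readline() hands back: a str

data = pickle.dumps(line)      # serialize the str to bytes
restored = pickle.loads(data)  # deserialize: the same str comes back

print(type(restored))    # <class 'str'>
print(restored == line)  # True
```

Pickle can only preserve types that the data already had in memory; it cannot recover integers from text that was never parsed.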
So I suppose the question is: how do I pickle things the right way, OR how do I process each line of a file while preserving its data types?
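Even if you are getting the string back from pickle, just split the string on '\t' and then check each field with str.isdigit().
There is also a numeric check for Unicode strings, isnumeric(), so just use these checks to find the bad fields
and convert the good ones back to their proper data types.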
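Put together, the approach above might look like the following sketch. It assumes a tab-separated file where a line is "good" only if every field is made of digits; find_bad_lines is a hypothetical helper name, and io.StringIO stands in for the real file:

```python
import io

def find_bad_lines(tsv_file):
    """Hypothetical helper: return (line_number, line) for lines with a non-digit field."""
    bad = []
    for lineno, line in enumerate(tsv_file, start=1):
        fields = line.rstrip("\n").split("\t")
        # str.isdigit() is True only for non-empty strings made entirely of digits,
        # so '5.5', 'x' and '' all fail the check.
        if not all(field.isdigit() for field in fields):
            bad.append((lineno, line.rstrip("\n")))
    return bad

sample = io.StringIO("1\t2\t3\n4\t5.5\t6\n7\tx\t9\n10\t11\t12\n")
print(find_bad_lines(sample))
# [(2, '4\t5.5\t6'), (3, '7\tx\t9')]
```

Note that isdigit() rejects a leading sign, so "-3" counts as bad; if negative numbers are valid in your data, a try/except around int(field) is a more permissive check. For lines that pass with ordinary ASCII digits, [int(f) for f in fields] gives you the actual integer values.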