If I read a file with default column names, how do I refer to the columns afterwards?
df[1] seems to work almost all of the time. However, it complains about types when writing conditions like:
In [60]: cond = ((df[1] != node) & (df[2] != deco))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/ferreirafm/work/colab/SNP/rawdata/<ipython-input-60-513a433bfeb5> in <module>()
----> 1 cond = ((df[1] != node) & (df[2] != deco))
/usr/lib64/python2.7/site-packages/pandas/core/series.pyc in wrapper(self, other)
140 if np.isscalar(res):
141 raise TypeError('Could not compare %s type with Series'
--> 142 % type(other))
143 return Series(na_op(values, other),
144 index=self.index, name=self.name)
TypeError: Could not compare <type 'str'> type with Series
Referring to DataFrame columns by their default names is more convenient for my application.
It seems that you are comparing a Series of numbers with a string.
Note that pandas can hold both strings and numbers in a Series, but it does not really make sense to compare strings with numbers, so the error message is useful.
However, pandas should perhaps give a more detailed error message.
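A minimal sketch of the type mismatch, using hypothetical data in which column 2 holds integers while the value it is compared with (deco) is a string. It assumes a recent pandas version: there, == and != against a mismatched type no longer raise TypeError as in the traceback above, but silently return all False or all True, so checking the column dtypes first is worthwhile.

```python
import pandas as pd

# Hypothetical data built from a list, so the column labels are 0, 1, 2;
# column 1 holds strings and column 2 holds integers
df = pd.DataFrame([[10, "a", 1], [20, "b", 2], [30, "c", 3]])
print(df.dtypes)

deco = "2"  # a string, e.g. parsed from text input (assumption)

# int64 column vs str: recent pandas does not raise for !=, but no element
# can ever be equal to the string, so the result is True everywhere
mismatched = df[2] != deco
print(mismatched.tolist())

# Converting the value to the column's type gives the intended comparison
fixed = df[2] != int(deco)
print(fixed.tolist())
```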
If the value you compare column 2 with were a number, the condition would work.
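The same condition as in the question, sketched with hypothetical values whose types match the columns (node is a string like column 1, deco is an integer like column 2):

```python
import pandas as pd

# Hypothetical DataFrame built from a list: integer column labels 0, 1, 2
df = pd.DataFrame([[10, "a", 1], [20, "b", 2], [30, "c", 3]])

node = "b"  # same type as column 1 (strings)
deco = 3    # same type as column 2 (integers)

# Both comparisons match the column types, so the condition is well defined
cond = (df[1] != node) & (df[2] != deco)
print(df[cond])
```

Only the first row survives: the second row fails on column 1 and the third on column 2.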
Some comments:
Maybe some of your confusion is due to a design decision in pandas:
If you read data from a file with read_csv, the default column names of the resulting DataFrame are set to X.1 to X.N (and to X1 to XN for versions >= 0.9), which are strings.
If you create a DataFrame from existing arrays, lists or similar, the column names default to 0 to N-1 and are integers.
I opened a ticket to discuss this.
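A quick way to see which kind of labels you actually got is to print the columns. The sketch below assumes a recent pandas version, where read_csv with header=None also produces integer labels, so the X.1/X1 naming described above applies only to the old versions. Without header=None, read_csv would use the first line of the file as column names. The io.StringIO text stands in for a real file.

```python
import io

import numpy as np
import pandas as pd

# A DataFrame built from an array gets integer column labels 0..N-1
arr = np.arange(6).reshape(2, 3)
df_arr = pd.DataFrame(arr)
print(list(df_arr.columns))

# read_csv on data without a header row (stand-in for a real file)
csv_text = "10,a,1\n20,b,2\n"
df_csv = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df_csv.columns))
```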
So your condition, cond = ((df[1] != node) & (df[2] != deco)), should work for a DataFrame created from an array or similar, provided that the types of df[1] and df[2] match the types of node and deco.
If you have read the file with read_csv, refer to the columns by their string names instead: X.1 to X.N with versions < 0.9, and X1 to XN with versions >= 0.9.
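If you prefer to keep writing df[1] regardless of how a given pandas version names the columns, one option (a sketch, not the only approach) is to overwrite the labels with integer positions right after reading. Positional selection with iloc avoids depending on the labels at all. The data and the values of node and deco are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a real file without a header row
csv_text = "10,a,1\n20,b,2\n30,c,3\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Whatever default labels were produced (X.1, X1 or 0), replace them with
# integer positions so that df[1] always means the second column
df.columns = range(df.shape[1])

node, deco = "b", 3  # hypothetical values, typed to match the columns
cond = (df[1] != node) & (df[2] != deco)
print(df[cond])

# Alternative: select by position without relying on the labels
cond_pos = (df.iloc[:, 1] != node) & (df.iloc[:, 2] != deco)
```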