I’d like to use pandas (along with numpy) for all my analysis, but use rpy2 for plotting my data. I want to do all analyses using pandas DataFrames and then use the full plotting capabilities of R via rpy2 to plot them. I am using ipython to plot. What’s the correct way to do this?
Nearly all commands I try fail. For example:
- I’m trying to plot a scatter between two columns of a pandas DataFrame df. I’d like the labels of df to be used on the x/y axes, just as they would be if it were an R data frame. Is there a way to do this? When I try to do it with r.plot, I get this gibberish plot:
In: r.plot(df.a, df.b) # df is pandas DataFrame
yields:
Out: rpy2.rinterface.NULL
resulting in the plot:

As you can see, the axes labels are messed up and it’s not reading the axes labels from the DataFrame like it should (the X axis is column a of df and the Y axis is column b).
- If I try to make a histogram with r.hist, it doesn’t work at all, yielding the error:
In: r.hist(df.a)
Out:
...
vectors.pyc in <genexpr>((x,))
    293 if l < 7:
    294     s = '[' + \
--> 295         ', '.join((p_str(x, max_width = math.floor(52 / l)) for x in self[ : 8])) +\
    296         ']'
    297 else:
vectors.pyc in p_str(x, max_width)
    287     res = x
    288 else:
--> 289     res = "%s..." % (str(x[ : (max_width - 3)]))
    290 return res
    291
TypeError: slice indices must be integers or None or have an __index__ method
And resulting in this plot:

Any idea what the error means? And again here, the axes are all messed up and littered with gibberish data.
EDIT: This error occurs only when using ipython. When I run the command from a script, it still produces the problematic plot, but at least runs with no errors. It must be something wrong with calling these commands from ipython.
- I also tried to convert the pandas DataFrame df to an R DataFrame as recommended by the poster below, but that fails too with this error:
com.convert_to_r_dataframe(mydf) # mydf is a pandas DataFrame
----> 1 com.convert_to_r_dataframe(mydf)
in convert_to_r_dataframe(df, strings_as_factors)
    275 # FIXME: This doesn't handle MultiIndex
    276
--> 277 for column in df:
    278     value = df[column]
    279     value_type = value.dtype.type
TypeError: iteration over non-sequence
How can I get these basic plotting features to work on Pandas DataFrame (with labels of plots read from the labels of the Pandas DataFrame), and also get the conversion between a Pandas DF to an R DF to work?
EDIT2: Here is a complete example of a csv file “test.txt” (http://pastebin.ca/2311928) and my code to answer @dale’s comment:
import rpy2
from rpy2.robjects import r
import rpy2.robjects.numpy2ri
import pandas.rpy.common as com
from rpy2.robjects.packages import importr
from rpy2.robjects.lib import grid
from rpy2.robjects.lib import ggplot2
rpy2.robjects.numpy2ri.activate()
from numpy import *
import scipy
# load up pandas df
import pandas
data = pandas.read_table("./test.txt")
# plotting a column fails
print "data.c2: ", data.c2
r.plot(data.c2)
# Conversion and then plotting also fails
r_df = com.convert_to_r_dataframe(data)
r.plot(r_df)
The call to plot the column of “data.c2” fails, even though data.c2 is a column of a pandas df and therefore for all intents and purposes should be a numpy array. I use the activate() call so I thought it would handle this column as a numpy array and plot it.
The second call to plot the dataframe data after conversion to an R dataframe also fails. Why is that? If I load up test.txt from R as a dataframe, I’m able to plot() it and since my dataframe was converted from pandas to R, it seems like it should work here too.
When I do try rmagic in ipython, it does not fire up a plot window for some reason, though it does not error. I.e. if I do:
In [12]: X = np.array([0,1,2,3,4])
In [13]: Y = np.array([3,5,4,6,7])
In [14]: import rpy2
In [15]: from rpy2.robjects import r
In [16]: import rpy2.robjects.numpy2ri
In [17]: import pandas.rpy.common as com
In [18]: from rpy2.robjects.packages import importr
In [19]: from rpy2.robjects.lib import grid
In [20]: from rpy2.robjects.lib import ggplot2
In [21]: rpy2.robjects.numpy2ri.activate()
In [22]: from numpy import *
In [23]: import scipy
In [24]: r.assign("x", X)
Out[24]:
<Array - Python:0x592ad88 / R:0x6110850>
[ 0, 1, 2, 3, 4]
In [25]: r.assign("y", Y)
<Array - Python:0x592f5f0 / R:0x61109b8>
[ 3, 5, 4, 6, 7]
In [27]: %R plot(x,y)
There’s no error, but no plot window either. In any case, I’d like to stick to rpy2 and not rely on rmagic if possible.
Thanks.
[note: Your code in “edit 2” is working here (Python 2.7, rpy2-2.3.2, R-1.15.2).]
As @dale mentions, whenever R objects are anonymous (that is, no R symbol exists for the object), R’s deparse(substitute()) will end up returning the structure() of the R object. A possible fix is to specify the “xlab” and “ylab” parameters; for some plots you’ll also have to specify main (the title).
Another way to work around that is to use R’s formulas and feed the data frame (more below, after we work out the conversion part).
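To illustrate the “xlab”/“ylab” fix, here is a minimal sketch rather than a definitive implementation. The helper names label_kwargs and labelled_scatter are hypothetical (they are not part of rpy2 or pandas), and the sketch assumes the columns are numeric so they can be copied into rpy2 FloatVector objects.

```python
def label_kwargs(x_name, y_name, title=None):
    """Build the label arguments that R's plot() accepts (xlab, ylab, main)."""
    kwargs = {"xlab": x_name, "ylab": y_name}
    if title is not None:
        kwargs["main"] = title
    return kwargs


def labelled_scatter(df, x_name, y_name, title=None):
    """Scatter plot of two numeric DataFrame columns with explicit axis labels.

    Hypothetical helper: assumes rpy2 and R are installed and that both
    columns hold numbers.
    """
    # Imported lazily so that label_kwargs() can be used without R installed.
    from rpy2.robjects import r, FloatVector

    x = FloatVector([float(v) for v in df[x_name]])
    y = FloatVector([float(v) for v in df[y_name]])
    # Passing xlab/ylab keeps R from deparsing the anonymous vectors into labels.
    return r.plot(x, y, **label_kwargs(x_name, y_name, title))
```

With the question’s data this would be called as labelled_scatter(df, "a", "b"), so the column names become the axis labels.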
Forget about what is in pandas.rpy. It is both broken and seems to ignore features available in rpy2.
An earlier quick fix to conversion with ipython can be turned into a proper conversion rather easily. I am considering adding one to the rpy2 codebase (with more bells and whistles), but in the meantime just add the following snippet after all your imports in your code examples. It will transparently convert pandas’ DataFrame objects into rpy2’s DataFrame whenever an R call is made.
Now the following code will “just work”:
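As a rough illustration of the conversion idea (not the original snippet, and explicit rather than transparent), a manual conversion could look like the sketch below. The names vector_kind, pandas_to_r and formula_plot are hypothetical helpers. The sketch assumes simple columns of booleans, integers, floats or strings, and it stringifies anything else, including missing values in object columns.

```python
def vector_kind(values):
    """Pick an R vector kind ("bool", "int", "float" or "str") for a column."""
    kinds = {type(v) for v in values}
    if kinds <= {bool}:
        return "bool"
    if kinds <= {int}:
        return "int"
    if kinds <= {int, float}:
        return "float"
    return "str"


def pandas_to_r(df):
    """Copy a pandas DataFrame column by column into an rpy2 DataFrame.

    Hypothetical helper: assumes rpy2 and R are installed.
    """
    import rpy2.robjects as ro  # lazy import: vector_kind() needs no R

    makers = {"bool": ro.BoolVector, "int": ro.IntVector,
              "float": ro.FloatVector, "str": ro.StrVector}
    casts = {"bool": bool, "int": int, "float": float, "str": str}
    columns = {}
    for name in df.columns:
        values = df[name].tolist()  # native Python scalars
        kind = vector_kind(values)
        columns[str(name)] = makers[kind]([casts[kind](v) for v in values])
    return ro.DataFrame(columns)


def formula_plot(df, x_name, y_name):
    """Plot y against x with an R formula, so R labels the axes by column name."""
    import rpy2.robjects as ro

    formula = ro.Formula("%s ~ %s" % (y_name, x_name))
    return ro.r.plot(formula, data=pandas_to_r(df))
```

Because the formula refers to the columns by name, formula_plot(df, "a", "b") should give axes labelled a and b without passing “xlab” or “ylab”. More recent rpy2 releases ship an rpy2.robjects.pandas2ri module, which is probably the better choice when it is available.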
I also note that you are importing ggplot2 without using it. Currently the conversion will have to be explicitly requested. For example:
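A minimal sketch of a ggplot2 scatter plot through rpy2, assuming the data has already been converted explicitly to an rpy2 DataFrame and that the R package ggplot2 is installed. The function name ggplot_scatter is hypothetical; ggplot, aes_string and geom_point come from the rpy2.robjects.lib.ggplot2 wrapper.

```python
def ggplot_scatter(r_df, x_name, y_name):
    """Draw a ggplot2 scatter plot of two columns of an rpy2 DataFrame.

    Hypothetical helper: r_df must already be an R data frame, because the
    pandas-to-R conversion is assumed to have been requested explicitly.
    """
    # Imported lazily; this import fails if R or the ggplot2 package is missing.
    from rpy2.robjects.lib import ggplot2

    plot = (ggplot2.ggplot(r_df)
            + ggplot2.aes_string(x=x_name, y=y_name)
            + ggplot2.geom_point())
    plot.plot()  # render on the current R graphics device
    return plot
```

Since the aesthetics are given as column names, ggplot2 takes the axis labels from the data frame itself.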