I created a binary classification tree using ctree. I would like each terminal node to contain the row names associated with that node. How can I accomplish this?
For example, for the dataset below, I would like the leftmost node to list the names of everyone aged 23 or under (Abner to Abudemio), and the rightmost node to list Abundiantus to Acelin.
    names        age  height  young
1   Abner         18    76.1  yes
2   Abraham       19    77.0  yes
3   Abram         20    78.1  yes
4   Abrasha       21    78.2  yes
5   Absalom       22    78.8  yes
6   Abudemio      23    79.7  yes
7   Abundiantus   24    79.9  no
8   Acacio        25    81.1  no
9   Acario        26    81.2  no
10  Accursius     27    81.8  no
11  Ace           28    82.8  no
12  Acelin        29    83.5  no
Here is one hacky solution. It requires only a small modification to the source code of the plotting functions in the party package. Reading the source code, I noticed that there is a terminal_panel which calls node_barplot when the outcome is a factor. (Everything is located in the file R/plot.R, if you have the source package installed.) We can modify the latter to display custom labels in the default bar chart.
Just issue the following command at the R prompt:
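One command that does this, assuming the party package is installed, is fixInNamespace() from utils: it opens the function in an editor and writes the saved result back into the package namespace for the current session only. This is a sketch rather than a guaranteed reproduction of the original command, and the interactive() guard is only there so the snippet is harmless when run from a script.

```r
# Open node_barplot in an editor; the edited copy replaces the one in the
# party namespace for the current R session only (nothing on disk changes).
if (interactive() && requireNamespace("party", quietly = TRUE)) {
  library(party)
  fixInNamespace("node_barplot", pos = "package:party")
}
```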
Then start adding what we want. First, add labels = NULL, gp = NULL to the existing list of arguments for that function. Second, near the end of the function body, after grid.rect(gp = gpar(fill = "transparent")), add a few lines that display the labels.
Here, the key idea is to select the labels belonging to the current node (node$nodeID). We can get this information from the where slot of the ctree object, which is a vector indicating in which node each case ended up. The if test in those lines just ensures that the function can still be used as originally written. The gp argument can be used to change the font size or color.
A typical call to the function would now be:
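The call is easiest to read in a small reproducible sketch. The one below rebuilds the question's data and fits the tree; the object name cfit and the ctree_control() settings are my assumptions (the defaults, minsplit = 20 and minbucket = 7 if I remember correctly, would not split only 12 rows). The labels argument exists only once node_barplot has been edited as described, hence the check before plotting.

```r
# Data from the question
dfrm <- data.frame(
  names  = c("Abner", "Abraham", "Abram", "Abrasha", "Absalom", "Abudemio",
             "Abundiantus", "Acacio", "Acario", "Accursius", "Ace", "Acelin"),
  age    = 18:29,
  height = c(76.1, 77.0, 78.1, 78.2, 78.8, 79.7,
             79.9, 81.1, 81.2, 81.8, 82.8, 83.5),
  young  = factor(rep(c("yes", "no"), each = 6))
)

if (requireNamespace("party", quietly = TRUE)) {
  library(party)
  # Assumed control settings, lowered so that 12 rows can be split at all
  cfit <- ctree(young ~ age, data = dfrm,
                controls = ctree_control(minsplit = 4, minbucket = 2))
  # Only pass labels if the namespace copy of node_barplot has been edited
  nb <- getFromNamespace("node_barplot", "party")
  if ("labels" %in% names(formals(nb))) {
    plot(cfit, tp_args = list(labels = dfrm$names))
  }
}
```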
where dfrm$names is a column of labels from a data frame named dfrm. Here is an illustration with your data:
(I have also tested this with the online example using the iris dataset.)
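As a rough sketch of what the lines added to node_barplot do, here is a stand-alone version that uses only grid. The helper draw_node_labels() is hypothetical, the where vector is a hand-made stand-in for the where slot of the fitted tree, and the node IDs 2 and 3 are assumed (a tree with a single split). It is meant to show the selection and drawing logic, not to reproduce the exact lines.

```r
library(grid)

# Labels from the question and a stand-in for ctreeobj@where
# (assumed node IDs: 2 = left terminal node, 3 = right terminal node)
labels <- c("Abner", "Abraham", "Abram", "Abrasha", "Absalom", "Abudemio",
            "Abundiantus", "Acacio", "Acario", "Accursius", "Ace", "Acelin")
where  <- rep(c(2L, 3L), each = 6)

# Hypothetical helper: select the labels of one node and stack them
# vertically in the current viewport. Returns the selected labels invisibly.
draw_node_labels <- function(labels, where, node_id, gp = NULL) {
  # With labels = NULL nothing is drawn, so the original behaviour is kept
  if (is.null(labels)) return(invisible(character(0)))
  labs <- as.character(labels[where == node_id])
  n <- length(labs)
  if (n > 0) {
    y <- unit(1 - (seq_len(n) - 0.5) / n, "npc")  # evenly spaced, top to bottom
    grid.text(labs, x = unit(0.5, "npc"), y = y, just = "center",
              gp = if (is.null(gp)) gpar() else gp)
  }
  invisible(labs)
}

# Draw the two terminal nodes side by side on a blank page
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))
for (k in 1:2) {
  pushViewport(viewport(layout.pos.col = k))
  grid.rect(gp = gpar(fill = "transparent"))
  draw_node_labels(labels, where, node_id = k + 1L, gp = gpar(fontsize = 9))
  popViewport()
}
popViewport()
```

Inside node_barplot the same selection would use ctreeobj@where and node$nodeID instead of the hand-made where vector and node_id.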