I’ve been looking for a C++ implementation of the C4.5 algorithm, but I haven’t been able to find one yet. I found Quinlan’s C4.5 Release 8, but it’s written in C. Has anybody seen an open source C++ implementation of the C4.5 algorithm?
If I can’t find one, I’m thinking about porting the J48 source code (Weka’s Java implementation of C4.5) or simply writing a wrapper around the C version, but I hope I don’t have to do that! Please let me know if you have come across a C++ implementation of the algorithm.
Update
I’ve been considering writing a thin C++ wrapper around the C implementation of the C5.0 algorithm (C5.0 is the successor to C4.5). I downloaded and compiled the C implementation, but it doesn’t look like it’s easily portable to C++. The C code uses a lot of global variables, so a thin C++ wrapper around the C functions will not give me an object-oriented design: every class instance would be reading and modifying the same global parameters. In other words, I would have no encapsulation, and that’s a pretty basic thing that I need.
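To make the global-variable problem concrete, here is a minimal sketch. The names `g_min_cases`, `c_set_min_cases`, `c_get_min_cases` and `ThinWrapper` are hypothetical stand-ins that I made up for illustration; they are not taken from the actual C5.0 source.

```cpp
// Hypothetical stand-in for a C library that keeps a setting in a global.
static int g_min_cases = 2;
void c_set_min_cases(int n) { g_min_cases = n; }
int c_get_min_cases() { return g_min_cases; }

// Thin wrapper: it looks like an object, but it owns no state of its own.
class ThinWrapper {
public:
    void set_min_cases(int n) { c_set_min_cases(n); }
    int min_cases() const { return c_get_min_cases(); }
};

// Returns true if configuring one wrapper silently changed the other.
bool wrappers_share_state() {
    ThinWrapper a, b;
    a.set_min_cases(5);
    b.set_min_cases(10);          // overwrites the value set through a
    return a.min_cases() == 10;   // a now reports b's setting
}
```

Both instances end up with the same setting because the value lives in the library, not in the object. Calling into such a library from several threads would also be a data race unless every call is serialized.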
To get encapsulation I would need to do a full-blown port of the C code to C++, which is about as much work as porting the Java version (J48) to C++.
Update 2
Here are some specific requirements:
- Each classifier instance must encapsulate its own data (i.e. no global variables aside from constant ones).
- Classifiers must support concurrent training and concurrent evaluation.
Here is a good scenario: suppose I’m doing 10-fold cross-validation. I would like to train 10 decision trees concurrently, each on its own slice of the training set. If I just ran the C program for each slice, I would have to run 10 processes, which is not horrible. However, if I need to classify thousands of data samples in real time, I would have to start a new process for each sample I want to classify, and that’s not very efficient.
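Here is a rough sketch of the kind of usage I have in mind for the cross-validation scenario. `Sample`, `Classifier` and `train_folds` are hypothetical names, and the "model" is just a majority-class placeholder for binary labels, not C4.5. The only point is that each instance owns its state, so the folds can be trained in parallel with `std::async`.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical sample type: a feature vector plus a class label (0 or 1).
struct Sample {
    std::vector<double> features;
    int label;
};

// Hypothetical classifier. Placeholder model (majority class), NOT C4.5.
// All state lives in the instance, so instances never interfere.
class Classifier {
public:
    void train(const std::vector<Sample>& data) {
        std::size_t positives = 0;
        for (const Sample& s : data) positives += (s.label == 1);
        majority_ = (2 * positives >= data.size()) ? 1 : 0;
    }
    int classify(const Sample&) const { return majority_; }
private:
    int majority_ = 0;
};

// Train one classifier per fold concurrently. Samples whose index is
// congruent to the fold number (mod k) are held out of that fold's training set.
std::vector<Classifier> train_folds(const std::vector<Sample>& data,
                                    std::size_t k) {
    std::vector<std::future<Classifier>> jobs;
    for (std::size_t fold = 0; fold < k; ++fold) {
        jobs.push_back(std::async(std::launch::async, [&data, fold, k] {
            std::vector<Sample> train_set;
            for (std::size_t i = 0; i < data.size(); ++i)
                if (i % k != fold) train_set.push_back(data[i]);
            Classifier c;
            c.train(train_set);
            return c;
        }));
    }
    // All futures are joined here, before the reference to data goes away.
    std::vector<Classifier> models;
    for (auto& job : jobs) models.push_back(job.get());
    return models;
}
```

Because `classify` is `const` and touches only instance data, a trained model could also be queried from several threads at once without spawning a process per sample.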
I may have found a possible C++ “implementation” of C5.0 (See5.0), but I haven’t dug into the source code enough to determine whether it really works as advertised.
To reiterate my original concerns, the author of the port states the following about the C5.0 algorithm:
I will update my answer as soon as I get some time to look into the source code.
Update
It’s looking pretty good, here is the C++ interface:
I would say that this is the best alternative I’ve found so far.