Right, I am studying CS and have been working through a lab in C. It has all gone pretty well, but I’ve now hit a bonus task that seems impossible to solve. I’ve been on it for a day or two and simply cannot figure out a correct way of completing it.
Introduction
A summary of the tasks so far:
Build the game of pangolins, where the computer stores the questions and objects in a tree structure. The game works as follows: You think of an object and the computer tries to guess what you are thinking of by asking you a series of ‘yes/no’ questions. If the computer guesses correctly, it wins. If you manage to fool it, you win, but you must then provide the computer with a question that would allow it to guess correctly next time round. That’s all there is to it.
An example of the tree structure you end up with (from left->right):
            Tree
           /
        Is it made of wood?
           \
            Grass
       /
Is it green?
       \
            Pangolin
           /
        Does it have a legs?
           \
                    Computer
                   /
                Is it larger than a microwave?
                   \
                    Laptop
               /
            Does it have a keyboard?
               \
                Desk
Each node of the tree is an instance of the following struct:
typedef struct node {
    char *object_name;     // object name (may be NULL)
    char *question;        // question text (may be NULL)
    struct node *yesNode;  // node to follow on a yes answer (may be NULL)
    struct node *noNode;   // node to follow on a no answer (may be NULL)
} node;
The tree above is stored as the following file, in pre-order with each question’s ‘no’ branch listed before its ‘yes’ branch:
Is it green?
Does it have a legs?
Does it have a keyboard?
Desk
Is it larger than a microwave?
Laptop
Computer
Pangolin
Is it made of wood?
Grass
Tree
The Problem
The bonus is as follows:
Make your program take an arbitrary number of input files, and graft
them together to make a huge tree. Swap your files with your friends
and get a huge collection of pointless questions. To do this properly
you need to scan for duplicate entries (you won’t be able to do this
perfectly, but it should at least weed out obvious duplicates and
graft the trees together in sensible ways.)
Or summarised:
Take your two tree structs (or files representing tree structs) and merge them together to produce a larger tree, while removing duplicates.
My problem:
- How is it possible to remove duplicates from two unsorted trees?
- Is it even possible to remove duplicates without manually checking every node?
- How do you even walk down to the bottom of every branch and compare what you find there against the other tree?
My best idea:
- Search the second tree for objects that are also in the first tree
- Replace those objects with a pangolin object
- Find all occurrences of pangolin in the first tree
- Point the parent of each occurrence to the top of the second tree instead
But this breaks the tree structure and could break the game. Say the top question in the first tree was ‘Is it red?’ and the second tree was attached on the no node: what happens if all the items in the second tree are red?
My main questions:
- Do you have any idea as to how you would solve this? Or a hack that would kind of work?
- How do you even find duplicates in two trees?
- How do you attach the trees together?
- How do you loop through the bottom of each tree?
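On the last two questions, a recursive walk is one common way to reach every leaf and compare it against another tree. What follows is only a minimal sketch: it assumes that a node with a non-NULL object_name is always a leaf, and the helper names for_each_leaf, find_object and count_shared_objects are hypothetical, not part of the lab.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct node {
    char *object_name;
    char *question;
    struct node *yesNode;
    struct node *noNode;
} node;

/* Hypothetical callback type: called once for every object (leaf) node. */
typedef void (*leaf_fn)(const node *leaf, void *ctx);

/* Visit every leaf below n, taking the no branch before the yes branch. */
void for_each_leaf(const node *n, leaf_fn visit, void *ctx)
{
    assert(visit != NULL);
    if (n == NULL)
        return;
    if (n->object_name != NULL) {   /* assumption: object nodes are leaves */
        visit(n, ctx);
        return;
    }
    for_each_leaf(n->noNode, visit, ctx);
    for_each_leaf(n->yesNode, visit, ctx);
}

/* Return the leaf below n whose object_name equals name, or NULL. */
const node *find_object(const node *n, const char *name)
{
    if (n == NULL)
        return NULL;
    if (n->object_name != NULL)
        return strcmp(n->object_name, name) == 0 ? n : NULL;
    const node *hit = find_object(n->noNode, name);
    return hit != NULL ? hit : find_object(n->yesNode, name);
}

/* Context passed through for_each_leaf while counting duplicates. */
struct dup_ctx {
    const node *other;  /* tree to search in */
    size_t count;       /* objects found in both trees so far */
};

static void count_if_duplicate(const node *leaf, void *ctx)
{
    struct dup_ctx *d = ctx;
    if (find_object(d->other, leaf->object_name) != NULL)
        d->count++;
}

/* Count the objects of tree a that also appear (exact match) in tree b. */
size_t count_shared_objects(const node *a, const node *b)
{
    struct dup_ctx d = { b, 0 };
    for_each_leaf(a, count_if_duplicate, &d);
    return d.count;
}
```

This compares every leaf of one tree against every leaf of the other, so it is quadratic in the number of objects. That is probably fine for hand-written game files, and the exact strcmp match could be swapped for a looser comparison.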
I think the easiest way to join all your trees is to rebuild a single minimum-entropy decision tree from everything they contain.
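To make the rebuild idea concrete, here is a rough sketch of one greedy way to do it, which is not necessarily the exact procedure intended. It assumes the input trees have already been flattened into a hypothetical facts table giving each object's answer to each question; the names facts, best_question and build_tree are illustrative only. Questions with an unknown answer for any remaining object are simply skipped, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct node {
    char *object_name;
    char *question;
    struct node *yesNode;
    struct node *noNode;
} node;

/* Hypothetical fact table gathered from all input trees.
 * answers[o * n_questions + q] is 1 (yes), 0 (no) or -1 (unknown). */
typedef struct {
    char **objects;
    char **questions;
    const signed char *answers;
    size_t n_objects, n_questions;
} facts;

/* Pick the question that splits the n objects in set[] most evenly.
 * When every remaining object has a known answer and all objects are
 * equally likely, the most even split is also the one with the lowest
 * expected remaining entropy, so no logarithms are needed.
 * Returns n_questions if no question separates the objects. */
static size_t best_question(const facts *f, const size_t *set, size_t n)
{
    size_t best = f->n_questions, best_gap = n;
    for (size_t q = 0; q < f->n_questions; q++) {
        size_t yes = 0, no = 0;
        for (size_t i = 0; i < n; i++) {
            signed char a = f->answers[set[i] * f->n_questions + q];
            if (a == 1) yes++;
            else if (a == 0) no++;
        }
        if (yes + no != n || yes == 0 || no == 0)
            continue;               /* unknown answers, or no split at all */
        size_t gap = yes > no ? yes - no : no - yes;
        if (gap < best_gap) {
            best_gap = gap;
            best = q;
        }
    }
    return best;
}

/* Build a tree over the n object indices in set[] (reordered in place).
 * Nodes point at the strings in f; they do not copy them. */
node *build_tree(const facts *f, size_t *set, size_t n)
{
    assert(n > 0);
    node *nd = calloc(1, sizeof *nd);
    assert(nd != NULL);
    size_t q = best_question(f, set, n);
    if (q == f->n_questions) {
        /* Nothing separates these objects: keep the first one only.
         * A real merge would have to ask for a new question here. */
        nd->object_name = f->objects[set[0]];
        return nd;
    }
    size_t n_no = 0;                /* move the "no" objects to the front */
    for (size_t i = 0; i < n; i++) {
        if (f->answers[set[i] * f->n_questions + q] == 0) {
            size_t t = set[i];
            set[i] = set[n_no];
            set[n_no++] = t;
        }
    }
    nd->question = f->questions[q];
    nd->noNode = build_tree(f, set, n_no);
    nd->yesNode = build_tree(f, set + n_no, n - n_no);
    return nd;
}
```

A caller would fill set[] with 0 .. n_objects - 1 and pass it to build_tree along with the table. Duplicate objects and questions should be collapsed while the table is being filled, so that each appears only once before the tree is rebuilt.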
When it comes to associating logically identical but linguistically different questions… that’s an extremely difficult natural language problem. Without doing any language analysis you could do something simple like compare the question strings ignoring case, punctuation and spacing.
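A minimal sketch of that simple comparison, assuming plain C strings and treating every character that is not a letter or digit as ignorable; same_question and skip_ignored are hypothetical helper names:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Advance past characters that should not affect the comparison:
 * anything that is not a letter or a digit. */
static const char *skip_ignored(const char *s)
{
    while (*s != '\0' && !isalnum((unsigned char)*s))
        s++;
    return s;
}

/* Return true if a and b are equal once case, punctuation and
 * spacing are ignored. */
bool same_question(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        a = skip_ignored(a);
        b = skip_ignored(b);
        if (*a == '\0' || *b == '\0')
            return *a == *b;        /* equal only if both ended together */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
}
```

With this, "Is it green?" and "is it  GREEN" compare equal. Because spacing is dropped entirely, word boundaries are ignored as well, so a stricter variant might collapse runs of spaces to a single space instead.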