Suppose we want to represent molecules as graphs, where each node is an atom and each edge is a connection between atoms. What would be an algorithm to decide whether two graphs (representing molecules) are equivalent? Since molecules are being represented, each node would need an attribute defining which element it is (Carbon, Nitrogen, Oxygen, etc.).
To make it easier, suppose each graph branches off from the same root atom, Nitrogen, which we can use as the starting node of our algorithm.
E.g. N-X, N-Y, N-Z, where N is the root Nitrogen node and X, Y, Z are the rest of the graph.
I agree with Joachim Isaksson’s answer that the general case — heck, even a less-than-general case — is hard to solve. But I’d like to propose a strategy for solving the relatively narrow case of a molecular tree graph with a specified starting element. [Note that this is the equivalent of Peter de Rivaz’s answer which was posted while I was working on this answer.]
First let’s define a form or a language to describe a molecular graph that is unique to that graph — only one string can be formed by the graph. This will allow us to compare two strings to determine whether two graphs are the same so that your problem is reduced down to creating two correct strings to compare. (This approach also has the benefit of being easier to debug visually than an outright graph-comparing algorithm.) I usually see molecules described in forms like H2O and H2SO4, but this approach doesn’t preserve the graphy-ness of the molecules and so can’t be used for comparison (H2O could be water or some other very weird arrangement of its elements). So let’s use something Lisp-like to describe the molecule according to these rules:
- A graph (or subgraph) starts with ( and ends with ).
- The first element inside the parentheses is the root of that graph; its children follow it.
- A child with no children is listed in its parent as-is; a child with children starts a subgraph of its own.

With these starting rules, we can now describe H2O in more graph-like terms:

1. O is the root of the graph, so it starts a new graph: (O)
2. H is the first child of O, but it has no children, so it's listed in its parent as-is, not as the start of a subgraph: (OH)
3. H is the second child of O, but it has no children, so it's listed in its parent as-is, not as the start of a subgraph: (OHH)

So H2O becomes (OHH). Ordering in this case doesn't matter, since the two Hs are childless and elementally equivalent.

Let's try a weird, irrational form of H2O to test out this approach:
1. O is the root of the graph, so it starts a new graph: (O)
2. H is the first child of O. It has a child, so it starts a new graph: (O(H))
3. H is the first child of H, but it has no children, so it's listed in its parent as-is, not as the start of a subgraph: (O(HH))

We know that our approach so far can handle simple cases like H2O where ordering isn't a concern, but H2SO4 won't work without consistently ordering the O elements coming off of the S. It's not possible to give a meaningful order to a child until its subgraph (if it has one) has been calculated, so we'll add a final rule to execute:

- A graph (or subgraph) starts with ( and ends with ).
- Once all of a node's children have been processed, sort them alphabetically.

Revisiting H2O with this new rule produces the same output because the two Hs are alphabetically equivalent and they have no children. So let's try H2SO4:

1. S is the root of the graph, so it starts a new graph: (S)
2. O is the first child of S. It has children, so it starts a new graph: (S[unsorted:](O))
   - H is the first child of O. It has no children. There are no other children to process, so sorting at this level isn't needed: (S[unsorted:](OH))
3. O is the second child of S. This one has no children: (S[unsorted:](OH)O)
4. O is the third child of S. It has children, so it starts a new graph: (S[unsorted:](OH)O(O))
   - H is the first child of O. It has no children. There are no other children to process, so sorting at this level isn't needed: (S[unsorted:](OH)O(OH))
5. O is the fourth child of S. This one has no children: (S[unsorted:](OH)O(OH)O)
6. Sort the children of S alphabetically: (S(OH)(OH)OO) (Note that I'm giving subgraphs special treatment in alphabetical comparison, but that's not a requirement.)

The final result is (S(OH)(OH)OO).

Let's try a variation of H2SO4 to see what it produces. Please note that this isn't a proof that the approach is good, just a demonstration of how a variation in the graph produces a different result.
1. S is the root of the graph, so it starts a new graph: (S)
2. O is the first child of S. It has children, so it starts a new graph: (S[unsorted:](O))
   - O is the first child. It has no children. (S[unsorted:](O[unsorted:]O))
   - H is the second child, with no children. (S[unsorted:](O[unsorted:]OH))
   - Sort the children of O: (S[unsorted:](OHO))
3. O is the second child of S. It has children, so it starts a new graph: (S[unsorted:](OHO)(O))
   - H is the first child. It has no children. (S[unsorted:](OHO)(O[unsorted:]H))
   - O is the second child, with no children. (S[unsorted:](OHO)(O[unsorted:]HO))
   - Sort the children of O: (S[unsorted:](OHO)(OHO))
4. Sort the children of S alphabetically: (S(OHO)(OHO))

This H2SO4, (S(OHO)(OHO)), is different from the previous one, (S(OH)(OH)OO).

I made no attempt to formally prove that this approach is guaranteed to work or even to describe it formally, or to account for the broad range of molecular minutiae out there, like bond counts. At the very least, though, I hope this encourages you to try solving your graph comparison problem. I think it's doable.
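For anyone who wants to experiment, here is a minimal Python sketch of the string-building idea, assuming the molecule is a tree hanging off a known root. The `Atom` class and the `canonical` function are names I made up for illustration. Children are ordered by plain string comparison of their own canonical strings, which happens to put subgraphs (they start with `(`) ahead of bare atoms, matching the examples above. It ignores bond counts and rings, so treat it as a starting point rather than a finished solution.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """Hypothetical tree node: an element symbol plus child atoms."""
    element: str
    children: list["Atom"] = field(default_factory=list)


def canonical(atom: Atom, is_root: bool = True) -> str:
    """Build the Lisp-like string for the tree rooted at `atom`.

    Each child's string is computed first, then the children are
    sorted, so the result does not depend on the order in which the
    children were listed.
    """
    parts = sorted(canonical(child, is_root=False) for child in atom.children)
    body = atom.element + "".join(parts)
    # The root and any atom with children start a (sub)graph;
    # a childless non-root atom is listed in its parent as-is.
    if is_root or atom.children:
        return f"({body})"
    return body


def same_molecule(a: Atom, b: Atom) -> bool:
    """Two rooted trees are treated as equivalent if their strings match."""
    return canonical(a) == canonical(b)


# H2SO4 as in the first walkthrough, with children in "arrival" order.
h2so4 = Atom("S", [
    Atom("O", [Atom("H")]),
    Atom("O"),
    Atom("O", [Atom("H")]),
    Atom("O"),
])
print(canonical(h2so4))
```

Comparing two molecules is then just `same_molecule(a, b)`, and printing the two strings side by side makes mismatches easy to spot by eye.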