Suppose we want to represent molecules in terms of graphs, where each node is

Question

Asked: June 12, 2026 (2026-06-12T13:03:37+00:00)

Suppose we want to represent molecules as graphs, where each node is an atom and each edge is a bond between atoms. What would be an algorithm to decide whether two graphs (representing molecules) are equivalent? Since molecules are being represented, each node would need an attribute defining which element it is (Carbon, Nitrogen, Oxygen, etc.).

To make it easier, suppose each graph branches off from the same root atom, Nitrogen, which we can use as the starting node of our algorithm.

E.g. N-X, N-Y, N-Z, where N is the root Nitrogen node and X, Y, Z are the rest of the graph.

1 Answer

Editorial Team · Answer 1 · 2026-06-12T13:03:38+00:00

I agree with Joachim Isaksson’s answer that the general case — heck, even a less-than-general case — is hard to solve. But I’d like to propose a strategy for solving the relatively narrow case of a molecular tree graph with a specified starting element. [Note that this is the equivalent of Peter de Rivaz’s answer which was posted while I was working on this answer.]

First let’s define a form or a language to describe a molecular graph that is unique to that graph: each graph produces exactly one string, and different graphs produce different strings. This will allow us to compare two strings to determine whether two graphs are the same, so your problem is reduced to creating two correct strings to compare. (This approach also has the benefit of being easier to debug visually than an outright graph-comparing algorithm.) I usually see molecules described in forms like H₂O and H₂SO₄, but this notation doesn’t preserve the graphy-ness of the molecules and so can’t be used for comparison (H₂O could be water or some other very weird arrangement of its elements). So let’s use something Lisp-like to describe the molecule according to these rules:

1. Each graph (including subgraphs) begins with ( and ends with )
2. Any node A that has children is the first node listed in a new graph, and this graph is listed in its parent graph in a particular order (to be determined later)
3. Any node A that has no children is listed in its parent graph in a particular order (to be determined later)

With these starting rules, we can now describe H₂O in more graph-like terms:

O is the root of the graph so it starts a new graph: (O)
H is the first child of O, but it has no children, so it’s listed in its parent as-is, not as the start of a subgraph: (OH)
H is the second child of O, but it has no children, so it’s listed in its parent as-is, not as the start of a subgraph: (OHH)

So the graph of H₂O becomes (OHH).

Ordering in this case doesn’t matter, since the two Hs are childless and elementally equivalent.

Let’s try a weird, irrational form of H₂O to test out this approach:

[Figure: graph of O-H-H]

O is the root of the graph so it starts a new graph: (O)
H is the first child of O. It has a child so it starts a new graph: (O(H))
H is the first child of H, but it has no children, so it’s listed in its parent as-is, not as the start of a subgraph: (O(HH))

We know that our approach so far can handle simple cases like H₂O where ordering isn’t a concern, but H₂SO₄ won’t work without consistently ordering the O elements coming off of the S. It’s not possible to give a meaningful order to a child until its subgraph (if it has one) has been calculated, so we’ll add a final rule to execute:

1. Each graph (including subgraphs) begins with ( and ends with )
2. Any node A that has children is the first node listed in a new graph, and this graph is listed in its parent graph in a particular order (see step 4)
3. Any node A that has no children is listed in its parent graph in a particular order (see step 4)
4. After all child nodes have been visited and their subgraphs (if any) have been created, arrange the child nodes / child subgraphs in the parent node alphabetically
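The four rules can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a tested chemistry tool: the `(element, children)` tuple representation and the name `canonical` are hypothetical, and plain string sorting stands in for "alphabetically".

```python
def canonical(node, is_root=True):
    """Return the Lisp-like string for a rooted molecular tree.

    `node` is a hypothetical (element, children) tuple, where `children`
    is a list of nodes in the same form.
    """
    element, children = node
    # Rule 4: build every child's string first, then sort the strings.
    # Plain string sorting puts "(" before letters in ASCII, so subgraphs
    # end up ahead of childless atoms.
    parts = sorted(canonical(child, is_root=False) for child in children)
    # Rule 3: a childless, non-root node is listed as-is.
    if not parts and not is_root:
        return element
    # Rules 1 and 2: a node with children (or the root) opens a new graph.
    return "(" + element + "".join(parts) + ")"


water = ("O", [("H", []), ("H", [])])
chain = ("O", [("H", [("H", [])])])

print(canonical(water))  # (OHH)
print(canonical(chain))  # (O(HH))
```

Because the children are sorted after their own strings are built, the order in which they appear in the input list does not affect the result.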

Revisiting H₂O with this new rule produces the same output because the two Hs are alphabetically equivalent and they have no children. So let’s try H₂SO₄:

[Figure: graph of H₂SO₄]

S is the root of the graph so it starts a new graph: (S)
O is the first child of S. It has children, so it starts a new graph: (S[unsorted:](O))
H is the first child of O. It has no children. There are no other children to process, so sorting at this level isn’t needed: (S[unsorted:](OH))
O is the second child of S. This one has no children: (S[unsorted:](OH)O)
O is the third child of S. It has children, so it starts a new graph: (S[unsorted:](OH)O(O))
H is the first child of O. It has no children. There are no other children to process, so sorting at this level isn’t needed: (S[unsorted:](OH)O(OH))
O is the fourth child of S. This one has no children: (S[unsorted:](OH)O(OH)O)
Finally, sort the children of S alphabetically: (S(OH)(OH)OO) (Note that I’m giving subgraphs special treatment in alphabetical comparison, but that’s not a requirement.)

The final result is (S(OH)(OH)OO)

Let’s try a variation of H₂SO₄ to see what it produces. Please note that this isn’t a proof that the approach is good, just a demonstration of how a variation in the graph produces a different result.

[Figure: graph of a weird H₂SO₄ variant]

S is the root of the graph so it starts a new graph: (S)
O is the first child of S. It has children, so it starts a new graph: (S[unsorted:](O))
O is the first child of that O. It has no children: (S[unsorted:](O[unsorted:]O))
H is the second child of that O, with no children: (S[unsorted:](O[unsorted:]OH))
Now sort the children of O: (S[unsorted:](OHO))
O is the second child of S. It has children, so it starts a new graph: (S[unsorted:](OHO)(O))
H is the first child of that O. It has no children: (S[unsorted:](OHO)(O[unsorted:]H))
O is the second child of that O, with no children: (S[unsorted:](OHO)(O[unsorted:]HO))
Now sort the children of O: (S[unsorted:](OHO)(OHO))
Finally, sort the children of S alphabetically: (S(OHO)(OHO))

This H₂SO₄, (S(OHO)(OHO)), is different from the previous one, (S(OH)(OH)OO).
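To reproduce this comparison starting from a plain list of bonds rather than a ready-made tree, here is a rough sketch. It assumes the molecule is acyclic (rings would make the walk loop forever) and that the root atom is known in advance, as in the question. The atom ids, the `elements` and `bonds` layout, and the function name `canonical_from` are all hypothetical choices of mine.

```python
def canonical_from(root, elements, bonds):
    """Build the canonical string for an acyclic molecule rooted at `root`.

    elements: dict mapping an atom id to its element symbol.
    bonds:    list of (atom_id, atom_id) pairs; direction does not matter.
    """
    neighbours = {atom: [] for atom in elements}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    def walk(atom, parent):
        # Children are all neighbours except the atom we arrived from.
        parts = sorted(walk(n, atom) for n in neighbours[atom] if n != parent)
        if not parts and parent is not None:
            return elements[atom]  # childless, non-root atom
        return "(" + elements[atom] + "".join(parts) + ")"

    return walk(root, None)


# Ordinary H2SO4: S bonded to four O atoms, two of which carry an H.
elements = {0: "S", 1: "O", 2: "O", 3: "O", 4: "O", 5: "H", 6: "H"}
normal = canonical_from(0, elements, [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (3, 6)])

# The weird variant: S bonded to two O atoms, each carrying an O and an H.
weird = canonical_from(0, elements, [(0, 1), (0, 2), (1, 3), (1, 5), (2, 4), (2, 6)])

print(normal)           # (S(OH)(OH)OO)
print(weird)            # (S(OHO)(OHO))
print(normal == weird)  # False
```

Deciding whether two molecules match then comes down to comparing the two strings for equality, and renumbering the atoms or listing the bonds in a different order should not change either string.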

I made no attempt to formally prove that this approach is guaranteed to work or even to describe it formally, or to account for the broad range of molecular minutiae out there, like bond counts and the like. At the very least, though, I hope this encourages you to try solving your graph comparison problem. I think it’s doable.
