I have the following problem. I have a set of brain regions and the correlations between them, and I know the distances between the regions. We expect correlation to decrease with distance: as the distance between two regions grows, their correlation should fall toward zero. The expectation is that it falls off as 1/D^2.
I want to visualize my correlation matrix to check for abnormalities. I already have some other visualizations, such as Taiyun’s correlation matrix visualization and a simple 2D scatterplot with the 1/D^2 curve as a blue line.
Next I want to have something based on correlation circles.
I have created a Node class for the brain regions, so each region is a node.
I model the correlations as Edges. Each Edge has a sourceNode and a destinationNode, plus a correlation and a distance, so I can couple it to the correct Nodes. The distance and correlation are needed for table lookups (mapping back to regionID, regionName, etc.).
Now what I want is to place all nodes on a circle so that nodes which are a small distance from each other are placed close together, and nodes which are far from each other are placed further apart. This way the strong edges (which are drawn thick) stay short and run between neighbouring nodes. A very strong edge crossing the circle then looks awkward, and the eye spots it easily. Of course I am looking for an optimum; as pointed out below, a single exact answer does not exist.
I have been searching Google, but since I do not have a clue what to search for I have found no results. I suspect there is a name for a standard algorithm for this, but I do not know it. A link to such an algorithm is okay too.
The thing I have come up with so far is to arrange the nodes on the circle in such a way that the SUM of all distances is smallest. But for this I need some sort of point system, so that regions which are close to each other and placed close together earn points, while regions which are close to each other but placed far apart lose points. Then I optimize the arrangement to get the highest score.
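To make the point system concrete, here is a minimal sketch of one possible reading of it. Everything in it is an assumption for illustration: the names slot_gap, ordering_penalty and best_ordering are made up, the nodes are assumed to sit in equally spaced slots on the circle, and "closeness" is taken as 1/distance (so all distances must be positive). It tries every ordering, which is only feasible for a handful of nodes.

```python
from itertools import permutations

def slot_gap(i, j, n):
    """Number of steps between slots i and j going the short way round the circle."""
    d = abs(i - j)
    return min(d, n - d)

def ordering_penalty(order, dist):
    """Lower is better: pairs that are close in reality (small dist) are
    penalised heavily when they end up many slots apart on the circle."""
    n = len(order)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += slot_gap(i, j, n) / dist[order[i]][order[j]]
    return total

def best_ordering(dist):
    """Brute force over all orderings, with node 0 fixed in slot 0
    so that rotations of the same arrangement are not tried twice."""
    n = len(dist)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    return min(candidates, key=lambda order: ordering_penalty(order, dist))

# Four regions at the corners of a unit square, labelled so that the true
# ring is 0-2-1-3 (regions 0 and 1 sit on one diagonal, 2 and 3 on the other).
s = 2 ** 0.5
dist = [
    [0, s, 1, 1],
    [s, 0, 1, 1],
    [1, 1, 0, s],
    [1, 1, s, 0],
]
print(best_ordering(dist))
```

For more than about ten nodes the number of orderings explodes, so the brute-force search would have to be replaced by some heuristic.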
Any tips on this matter? My math is not that great ;). I’m currently googling on circles, nodes, weights..
Note
If you have any other bright ideas to visualize the matrix be sure to PM me about it, or comment here :).
The general problem that you describe doesn’t have a solution, because you’re trying to make a map from a 2D surface to a 1D line that preserves all distances, and this isn’t possible. If there were a particular region that you wanted to compare to all the others, you could put the others around a circle so that their distances match the distance to this region (but then the distances between these other regions would be distorted).
But you can certainly do better than random in approximating the distances. Here’s an approach: The first step would be to do multiple random arrangements and then pick the best of these. The next improvement would be to optimize each of these arrangements against some cost function by moving the regions around in small steps until they settle into a local minimum, and then pick the best of these local minima. The results of this are shown in the plots below, and the Python code is further down.
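As a rough illustration of the single-reference-region case, here is a small sketch. It assumes that "distance on the circle" means the straight-line chord between two points, and the helper names (angle_for_distance, place_around_reference) and region labels are made up for this example. A chord of length d on a circle of radius R spans an angle of 2*asin(d/(2R)), so each region can be placed at exactly the right distance from the reference.

```python
import math

def angle_for_distance(d, radius=1.0):
    """Angle (radians) from the reference at which the chord length equals d.
    Distances larger than the diameter cannot be represented and are clamped."""
    ratio = min(d / (2.0 * radius), 1.0)
    return 2.0 * math.asin(ratio)

def place_around_reference(dist_to_ref, radius=1.0):
    """Reference region sits at angle 0; the other regions are placed at the
    angle matching their distance, alternating sides to reduce crowding."""
    angles = {}
    ordered = sorted(dist_to_ref.items(), key=lambda item: item[1])
    for k, (name, d) in enumerate(ordered):
        side = 1 if k % 2 == 0 else -1
        angles[name] = side * angle_for_distance(d, radius)
    return angles

# Hypothetical distances from the reference region to three other regions.
layout = place_around_reference({"A": 0.5, "B": 1.0, "C": 1.8})
for name, theta in layout.items():
    print(name, round(math.degrees(theta), 1))
```

Only the distances to the reference are respected here; the distances among A, B and C come out however they happen to fall.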
Interestingly, this seems to have resulted in far things being far-ish, and near things being near-ish, but with the mid-range orders messed up, so maybe this will do what you want? (This also illustrates the fundamental problem in going from 2D to 1D. For example, on the circle the 4 wants to be further from the 9, but it can’t do that without getting closer to the other numbers, whereas in 2D it could just go out to the side.)
You’ll probably want to modify cost_fnc, which specifies the penalty for having the distances between points on the circle not match the distances from the 2D arrangement. Changing it to increase the cost of large errors (say, a quadratic), or to emphasise getting the large distances right, say d_target*(abs(d_actual-d_target)), etc., might help.
Also, changing the size of the circle relative to the size of the 2D data will change the look of this quite a lot, and you will probably want a circle somewhat smaller than the data, as I’ve done here, as this will spread the points around the circle more. (Here the circle has R=1, so just scale the data appropriately.) Also note that this makes a quantitative assessment of the cost not very meaningful: the best arrangements never reach a very low cost, since some regions can never be as far apart on the circle as they are in the 2D data.
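To show what such variations might look like, here is a sketch of three per-pair penalties. The function names and the (d_actual, d_target) signature are assumptions made for illustration, not necessarily how cost_fnc is written; the third one is the d_target*(abs(d_actual-d_target)) form mentioned above.

```python
def cost_abs(d_actual, d_target):
    """Plain absolute error: every unit of mismatch costs the same."""
    return abs(d_actual - d_target)

def cost_quadratic(d_actual, d_target):
    """Squared error: large mismatches are punished disproportionately."""
    return (d_actual - d_target) ** 2

def cost_far_weighted(d_actual, d_target):
    """Absolute error scaled by the target distance, so getting the
    large distances right matters more than getting the small ones right."""
    return d_target * abs(d_actual - d_target)

# Same mismatch (1.5) seen through each penalty, for a pair that should be 2.0 apart.
for fn in (cost_abs, cost_quadratic, cost_far_weighted):
    print(fn.__name__, fn(0.5, 2.0))
```

Which of these gives the most readable picture probably depends on the data, so it seems worth trying more than one.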
The point of running multiple random starts is that the evolving arrangement can get stuck in local minima. This technique seems to be useful: settling helps in getting the distance right and the costs down (plot #3, blue dots=initial random, diamonds=local minimum) and it helps some initial arrangements much more than others, so it’s good to try multiple initial arrangements. Also, since a number of these seem to settle to around 15 it gives some confidence that this arrangement might be representative.
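For readers who want the gist of random starts plus settling in one place, here is a compact standalone sketch. It is a simplified stand-in under stated assumptions: distances on the circle are measured as chords on a circle of R=1, the penalty is the plain absolute error, and the names chord, total_cost, settle and best_layout are made up for this example.

```python
import math
import random

def chord(a, b, radius=1.0):
    """Straight-line distance between two points at angles a and b on the circle."""
    return 2.0 * radius * abs(math.sin((a - b) / 2.0))

def total_cost(angles, target):
    """Sum over all pairs of |distance on the circle - target distance|."""
    n = len(angles)
    return sum(abs(chord(angles[i], angles[j]) - target[i][j])
               for i in range(n) for j in range(i + 1, n))

def settle(angles, target, rng, step=0.2, n_iter=500):
    """Hill climbing: nudge one node at a time and keep the move only if the
    total cost drops, so the arrangement slides into a local minimum."""
    angles = list(angles)
    best = total_cost(angles, target)
    for _ in range(n_iter):
        i = rng.randrange(len(angles))
        old = angles[i]
        angles[i] = old + rng.gauss(0.0, step)
        new = total_cost(angles, target)
        if new < best:
            best = new
        else:
            angles[i] = old  # the move made things worse (or no better): undo it
    return angles, best

def best_layout(target, n_starts=5, seed=0):
    """Settle several random starts and return the (angles, cost) with lowest cost."""
    rng = random.Random(seed)
    n = len(target)
    runs = [settle([rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)], target, rng)
            for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])

# Made-up 2D positions standing in for the regions; target holds their pairwise distances.
demo_rng = random.Random(1)
points = [(demo_rng.random(), demo_rng.random()) for _ in range(6)]
target = [[math.dist(p, q) for q in points] for p in points]
angles, cost = best_layout(target)
print(round(cost, 3))
```

Comparing the final costs of the individual starts, rather than keeping only the minimum, gives the same kind of check as described above: if several starts settle to a similar cost, the arrangement is less likely to be a fluke.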