I am wondering if it is possible to remove any shape that is fully covered by other shapes?
I often generate scatter plots of particles, some of which lie close to each other, and since the number of particles can easily reach 100k, these plots become quite overwhelming.
Consider the following simple example:
import matplotlib.pyplot as plt
import numpy as np
N = 10000
x = np.random.randn(N)
y = np.random.randn(N)
plt.scatter(x,y)
plt.savefig('unseen.pdf')
When N is greater than 10000, the large majority of the circles lie underneath other circles and cannot be seen. However, when the resulting PDF file is opened, all circles are still drawn, so the time to open the file increases even though the number of visible circles stays almost the same.
Time to open figure in pdf-viewer (doesn’t matter which):
N=10000 > 5s (2.4MB)
N=20000 > 10s (4.8MB)
N=40000 > 20s (9.5MB)
A linear increase in both time and file size, just as expected when increasing the number of circles.
Does anyone have an idea on how one could get around this?
I think you should save the plot as a raster image and then embed it into a PDF (the cairo module works great). In my experience, most people won't zoom far enough into a PDF to tell a vector drawing from an image. Besides, your vector output is heavy enough to justify using a higher-DPI image without increasing the file size.
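If you would rather stay inside matplotlib, a related option (not the cairo route described above) is to rasterize only the markers and keep the axes as vector graphics. The sketch below is one way to do that, assuming the same random data as in the question; the function name save_rasterized_scatter, the seed and the dpi value of 300 are my own choices for illustration, not something from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def save_rasterized_scatter(path, n=10000, dpi=300, seed=0):
    """Save a scatter plot whose markers are embedded as one bitmap.

    The function name, seed and default dpi are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)

    fig, ax = plt.subplots()
    # rasterized=True renders only this artist to a bitmap; the axes,
    # ticks and labels stay as vector graphics in the PDF.
    ax.scatter(x, y, rasterized=True)
    # For rasterized artists, dpi sets the resolution of the embedded image.
    fig.savefig(path, dpi=dpi)
    plt.close(fig)
    return path


if __name__ == "__main__":
    save_rasterized_scatter("unseen_raster.pdf")
```

With this approach the marker layer is stored as a fixed-resolution image, so its cost in the file should depend mainly on the dpi rather than on N. I have not timed it against the numbers quoted in the question.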
Also, a good tip is to plot transparent circles without borders, using the ms (markersize), mew (marker edge width) and alpha keyword parameters. The visual effect is stunning. Instead of the plain plt.scatter(x,y) call, you can draw the points with plt.plot and these keywords.
Try it and see!
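A minimal sketch of the borderless, transparent markers, assuming the same random data as in the question. The exact values used here (ms=4, alpha=0.1), the seed and the function name plot_transparent_points are my own guesses for illustration, so tune them to your data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def plot_transparent_points(n=10000, seed=0):
    """Draw n points as small, borderless, semi-transparent circles.

    The function name and the marker values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)

    fig, ax = plt.subplots()
    # "o" selects circle markers with no connecting line.
    # ms    : marker size in points (assumed value)
    # mew   : marker edge width; 0 removes the border
    # alpha : opacity; overlapping points add up to darker regions
    (line,) = ax.plot(x, y, "o", ms=4, mew=0, alpha=0.1)
    return fig, line


if __name__ == "__main__":
    fig, _ = plot_transparent_points()
    fig.savefig("transparent.png", dpi=150)
```

Note that this changes how the plot looks, not how many circles are stored: saved as a PDF, every point is still written to the file, so it works best together with a raster output format.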
Hope this helps!