I’m working on an NDB-based Google App Engine application that needs to keep track of the day/night cycle of a large number (~2000) of fixed locations. Because the latitude and longitude never change, I can precompute the sunrise and sunset times ahead of time using something like PyEphem. As I see it, the possible strategies are:
- Precompute a year’s worth of sunrises into datetime objects, put them into a list, pickle the list and put it into a PickleProperty
- As above, but put the list into a JsonProperty
- Go with DateTimeProperty and set repeated=True
Now, I’d like the very next sunrise/sunset property to be indexed, but that can be popped from the list and placed into its own DateTimeProperty, so that I can periodically use a query to determine which locations have changed to a different part of the cycle. The whole list does not need to be indexed.
Does anyone know the relative effort, in terms of indexing and CPU load, for these three approaches? Does repeated=True have an effect on the indexing?
Thanks,
Dave
The answers that suggest “just calculate them when instance starts” or “precompute those structures and output them into hardcoded python structures” appear to be ignoring the times-365 multiplier entailed by storing a year’s worth of sunrises, or the times-2000 multiplier if computing is done when an instance starts. Using PyEphem, 2000 sunrises and sunsets take more than two seconds to compute. Storing a year of sunrises and sunsets for 2000 locations in source code might use upwards of 20 megabytes. Even if the numbers are efficiently pickled at 8 bytes each, 2*365*2000*8 = 11,680,000 bytes are needed.
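As a quick check of the arithmetic above, here is a minimal sketch that recomputes the packed size. The function and constant names are mine, and it assumes every sunrise or sunset is stored as one 8-byte double with no pickle or container overhead.

```python
import struct

LOCATIONS = 2000
DAYS = 365
EVENTS_PER_DAY = 2  # one sunrise and one sunset

# Assumption: each time is stored as a standard 8-byte IEEE double.
BYTES_PER_VALUE = struct.calcsize("<d")

def packed_table_bytes(locations=LOCATIONS, days=DAYS, events=EVENTS_PER_DAY):
    """Bytes for a dense table holding one double per event, day and location."""
    return events * days * locations * BYTES_PER_VALUE

print(packed_table_bytes())
```

Real pickles or datastore entities would add overhead on top of this figure, so it is best read as a lower bound.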
An approach that works faster and better is to set up a least-squares model for the times at one location in terms of those at others. This allows a roughly 70-fold reduction in total space used, as described below.
First, if points A and B are at the same latitude and have similar altitude and horizon parameters, then sunrise at A occurs at a constant time offset vs sunrise at B. For example, if A is 15 degrees west of B, sunrise occurs an hour later at A than at B. Second, if points A, B, C are at the same longitude and at low latitudes, the sunrise times at one point can be computed fairly accurately as a linear combination of the other two. At high latitudes or for better accuracy, linear combinations of several time curves can be used. Third, time of sunrise at point A on 20 March, the day of the spring equinox, can be used as a normalization point, so all calculations can be normalized to the same latitude.
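The first two observations can be sketched in plain Python. This is not the program described in this answer: to stay self-contained it swaps PyEphem for the textbook sunrise equation (no refraction, no equation of time) and swaps scipy.optimize.leastsq for a small Gram-Schmidt least-squares solver. The cardinal latitudes, the target latitude, REF_DAY and all function names are assumptions chosen for illustration.

```python
import math

REF_DAY = 79  # assumption: roughly 20 March in a non-leap year

def longitude_offset_hours(degrees_west):
    """Sunrise delay for a point this far west at the same latitude."""
    return degrees_west / 15.0  # the Earth turns 15 degrees per hour

def sunrise_hours(lat_deg, day):
    """Approximate local solar time of sunrise (stand-in for PyEphem)."""
    decl = math.radians(-23.44) * math.cos(2.0 * math.pi * (day + 10) / 365.0)
    hour_angle = math.acos(-math.tan(math.radians(lat_deg)) * math.tan(decl))
    return 12.0 - math.degrees(hour_angle) / 15.0

def time_curve(lat_deg, days):
    """Sunrise time on each day minus the sunrise time on REF_DAY."""
    ref = sunrise_hours(lat_deg, REF_DAY)
    return [sunrise_hours(lat_deg, d) - ref for d in days]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_coefficients(columns, target):
    """Least-squares coefficients via modified Gram-Schmidt QR."""
    n = len(columns)
    q = [list(col) for col in columns]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(dot(q[j], q[j]))
        q[j] = [v / r[j][j] for v in q[j]]
        for k in range(j + 1, n):
            r[j][k] = dot(q[j], q[k])
            q[k] = [b - r[j][k] * a for a, b in zip(q[j], q[k])]
    # Project the target onto the orthonormal columns one at a time.
    rest = list(target)
    z = []
    for j in range(n):
        z.append(dot(q[j], rest))
        rest = [b - z[j] * a for a, b in zip(q[j], rest)]
    # Back-substitute R c = z.
    c = [0.0] * n
    for j in reversed(range(n)):
        tail = sum(r[j][k] * c[k] for k in range(j + 1, n))
        c[j] = (z[j] - tail) / r[j][j]
    return c

days = range(1, 366)
cardinal_lats = [10.0, 30.0, 45.0, 55.0]  # hypothetical cardinal curves
cardinals = [time_curve(lat, days) for lat in cardinal_lats]
target = time_curve(40.0, days)           # hypothetical target latitude

coeffs = fit_coefficients(cardinals, target)
model = [sum(c * col[i] for c, col in zip(coeffs, cardinals))
         for i in range(len(target))]
max_error_seconds = 3600.0 * max(abs(m - t) for m, t in zip(model, target))
print(coeffs, max_error_seconds)
```

Because the stand-in sunrise formula is much simpler than what PyEphem computes, the residual printed here says nothing about the accuracy of the real program; it only shows the shape of the fit.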
The following table shows what sort of accuracy results from using linear combinations of four time curves. For latitudes up to 46° away from the equator, results stay within about half a second. For 48° to 60°, results stay within 5 seconds. At 64°, results may be up to two minutes in error, and at 65°, up to about six minutes. But these times are probably good enough for most practical purposes. Note that at 66° the program shown below breaks down because it does not handle an exception PyEphem throws: “AlwaysUpError: ‘Sun’ is still above the horizon at 2013/6/14 07:20:15” occurs, even though 66° is below the Arctic Circle, 66.5622° N.
It is easy to modify the program so that it uses as many time curves as desired (see the various lata = ... statements in the program), giving whatever accuracy is desired but at the cost of storing more curves and more coefficients. Of course the model can be varied to use subsets of time curves; for example, 10 curves could be stored and calculations done based on the 4 nearest in latitude to any given target latitude. However, for this demo program such refinements are not in place.
Using the approach outlined above, for each of the 2000 locations, you need to store five floating point numbers: time of sunrise on 20 March, and four multiplier coefficients for four time curves. (The 70-fold reduction mentioned earlier is from storing 5 numbers per location, rather than 365 numbers.) For each time curve, 365 numbers are stored, with entry i being the sunrise time difference vs that on 20 March. Storing four time curves uses 1/500 as much space as storing 2000 of them, so total storage is dominated by the per-location multiplier coefficients rather than by the curves.
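To make the storage layout concrete, here is a minimal sketch of how a sunrise time could be rebuilt from the stored numbers and how many values are kept in total. The function names and the toy data are hypothetical, and times are assumed to be in any one consistent unit (hours, say).

```python
def stored_value_count(locations=2000, curves=4, days=365):
    """Numbers kept: per-location values plus the shared time curves."""
    per_location = 1 + curves  # 20 March sunrise time + one coefficient per curve
    return locations * per_location + curves * days

def sunrise_on_day(equinox_time, coeffs, curves, day_index):
    """20 March sunrise plus the weighted sum of the curves' day offsets."""
    offset = sum(c * curve[day_index] for c, curve in zip(coeffs, curves))
    return equinox_time + offset

# Toy data: two curves covering two days, purely for illustration.
toy_curves = [[0.0, 1.0], [0.0, 3.0]]
print(sunrise_on_day(6.0, [0.5, 0.5], toy_curves, 1))
print(stored_value_count(), 2000 * 365)
```

Counting the four shared curves as well, the total is 2000*5 + 4*365 values, against 2000*365 for a full table of sunrises.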
Before I give the program that uses scipy.optimize.leastsq to solve for coefficients, here are two code snippets that can be used, in the ipython interpreter, to make accuracy tables and to draw plots for visualizing errors.
The above produces most of the error table shown earlier. The third parameter of lsr is called daySkip, and the value 4 makes lsr work with every fourth day (i.e. only about 90 days of the year) for faster testing. Using sr.lsr(lat, -110, 2013, 1) produces similar results but takes four times as long.
The above tells sunrise.plotData to plot everything (the sunrise data to be approximated; the model’s resulting approximation; the residuals, scaled to be in seconds; and the cardinal curves).
The program is shown below. Note that it has been tested mostly for Northern hemisphere latitudes. If time curves are symmetric enough, the program as-is will handle Southern hemisphere latitudes; if errors are too large, Southern hemisphere latitudes can be added to the cardinal curves, or the model can be changed to use a separate set of curves south of the equator. Note that sunsets aren’t calculated in this program. For sunsets, add next_setting(ephem.Sun()) calls analogous to the previous_rising(ephem.Sun()) calls, and store an additional four time curves.