I have this query SELECT zip, ( 3959 * acos( cos( radians(34.12520) ) *


Asked: June 5, 2026


I have this query:

SELECT zip,
( 3959 * acos( cos( radians(34.12520) ) * cos( radians( zip_info.latitude ) ) * cos(radians( zip_info.longitude ) - radians(-118.29200) ) + sin( radians(34.12520) ) * sin( radians( zip_info.latitude ) ) ) ) AS distance,
user_info.*, office_locations.*
FROM zip_info
RIGHT JOIN office_locations ON office_locations.zipcode = zip_info.zip
RIGHT JOIN user_info ON office_locations.doctor_id = user_info.id
WHERE user_info.status='yes'
HAVING distance < 50 ORDER BY distance ASC

It outputs:

distance | doctor_id | etc.
7        | 5         | etc.
8        | 4         | etc.
34       | 4         | etc.
49       | 5         | etc.

When I limit the distance to 30 or less, it returns just the top two rows, which is good.

The problem: I do not want to show more than one result per doctor_id, so I add GROUP BY user_info.doctor_id. With that in place, the query returns no results when distance is less than 50. For some reason it seems to need all the rows before it can group them; otherwise it won’t work. Any tips? Is there anything else you need to help me out?

So what I want is:

distance | doctor_id | etc.
7        | 5         | etc.
8        | 4         | etc.

Even though the query returns all 4 rows, I just want to group them so that only the row with the smallest distance for each unique user_info.doctor_id shows up. Keep in mind that distance is a calculated (virtual) column, not a real column in any table.

Based on llion’s query, here are the results:

(concat(user_info.id)) | zip  | distance  | id
1                      | NULL | 6.6643992 | 1

It returns only one row, and to get it to work at all I had to change the AND back to HAVING distance.


1 Answer

Editorial Team · Answer 1 · 2026-06-05T21:15:13+00:00

I don’t believe a GROUP BY is going to give you the result you want. And unfortunately, MySQL versions before 8.0 do not support analytic (window) functions, which is how we would solve this problem in Oracle or SQL Server. (MySQL 8.0 and later do support them, so on those versions ROW_NUMBER() can be used directly.)

It’s possible to emulate some rudimentary analytic functions by making use of user-defined variables.

In this case, we want to emulate:

ROW_NUMBER() OVER(PARTITION BY doctor_id ORDER BY distance ASC) AS seq
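On an engine that does support window functions, the expression above can be used as-is. The sketch below runs it against an in-memory SQLite database (window functions need SQLite 3.25 or newer) through Python’s standard sqlite3 module. The table name candidates and its two columns are hypothetical stand-ins for the joined result, and the rows are the ones from the question’s example output.

```python
import sqlite3

# Hypothetical stand-in for the joined result set: (doctor_id, distance),
# using the values from the question's example output.
ROWS = [(5, 7.0), (4, 8.0), (4, 34.0), (5, 49.0)]


def closest_per_doctor(rows):
    """Return (doctor_id, distance) for the closest row of each doctor_id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE candidates (doctor_id INTEGER, distance REAL)")
    con.executemany("INSERT INTO candidates VALUES (?, ?)", rows)
    # Number each doctor's rows by ascending distance, then keep only row 1.
    query = """
        SELECT doctor_id, distance
          FROM (SELECT doctor_id, distance,
                       ROW_NUMBER() OVER (PARTITION BY doctor_id
                                          ORDER BY distance ASC) AS seq
                  FROM candidates) ranked
         WHERE seq = 1
         ORDER BY distance
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(closest_per_doctor(ROWS))  # [(5, 7.0), (4, 8.0)]
```

Because seq is computed in a subquery, the outer query can filter on it with a plain WHERE instead of HAVING.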

So, starting with the original query, I changed the ORDER BY so that it sorts on doctor_id first, and then on the calculated distance. (Until we know those distances, we don’t know which one is “closest”.)

With this sorted result, we basically “number” the rows for each doctor_id, the closest one as 1, the second closest as 2, and so on. When we get a new doctor_id, we start again with the closest as 1.

To accomplish this, we make use of two user-defined variables. @i holds the row number, which is returned under the alias seq. @prev_doctor_id “remembers” the doctor_id from the previous row, so we can detect a “break” in doctor_id and know when to restart the row numbering at 1.
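The bookkeeping the two variables perform can be traced in plain Python. This is a sketch of the logic only, not MySQL code. It assumes the rows already arrive sorted by doctor_id and then distance, which is exactly why the inner query is ordered that way. The sample (doctor_id, distance) pairs come from the question’s example output.

```python
def number_rows(rows):
    """Emulate the @i / @prev_doctor_id trick over rows that are already
    sorted by (doctor_id, distance). Returns (doctor_id, distance, seq)."""
    i = 0                  # plays the role of @i
    prev_doctor_id = None  # plays the role of @prev_doctor_id (starts as NULL)
    numbered = []
    for doctor_id, distance in rows:
        # Same doctor as the previous row: keep counting; new doctor: restart at 1.
        i = i + 1 if doctor_id == prev_doctor_id else 1
        prev_doctor_id = doctor_id
        numbered.append((doctor_id, distance, i))
    return numbered


# ORDER BY doctor_id, distance
rows = sorted([(5, 7.0), (4, 8.0), (4, 34.0), (5, 49.0)])
numbered = number_rows(rows)
# HAVING seq = 1 ORDER BY distance
closest = sorted((r for r in numbered if r[2] == 1), key=lambda r: r[1])
print(closest)  # [(5, 7.0, 1), (4, 8.0, 1)]
```

If the rows were not sorted first, a doctor’s rows could be interleaved with another doctor’s, and the counter would restart at 1 more than once per doctor.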

Here’s the query:

SELECT z.*
, @i := CASE WHEN z.doctor_id = @prev_doctor_id THEN @i + 1 ELSE 1 END AS seq
, @prev_doctor_id := z.doctor_id AS prev_doctor_id
FROM
(

  /* original query, ordered by doctor_id and then by distance */
  SELECT zip, 
  ( 3959 * acos( cos( radians(34.12520) ) * cos( radians( zip_info.latitude ) ) * cos(radians( zip_info.longitude ) - radians(-118.29200) ) + sin( radians(34.12520) ) * sin( radians( zip_info.latitude ) ) ) ) AS distance, 
  user_info.*, office_locations.* 
  FROM zip_info 
  RIGHT JOIN office_locations ON office_locations.zipcode = zip_info.zip 
  RIGHT JOIN user_info ON office_locations.doctor_id = user_info.id 
  WHERE user_info.status='yes' 
  ORDER BY office_locations.doctor_id ASC, distance ASC

) z JOIN (SELECT @i := 0, @prev_doctor_id := NULL) i
HAVING seq = 1 ORDER BY z.distance

I’m making an assumption that the original query is returning the result set you need, it just has too many rows, and you want to eliminate all but the “closest” (the row with the minimum value of distance) for each doctor_id.

I’ve wrapped your original query in another query. The only changes I made to the original query were to order the results by doctor_id and then by distance, and to remove the HAVING distance < 50 clause. (If you only want to return distances less than 50, go ahead and leave that clause there. It wasn’t clear whether that was your intent, or whether it was specified in an attempt to limit rows to one per doctor_id.)

A couple of issues to note:

The replacement query returns two additional columns. These aren’t really needed in the result set, except as a means to generate it. (It’s possible to wrap this whole SELECT in yet another SELECT to omit those columns, but that is messier than it’s worth. I would just retrieve the columns and know that I can ignore them.)

The other issue is that the use of .* in the inner query is a bit dangerous, because we really need to guarantee that the column names returned by that query are unique. Even if the column names are distinct right now, adding a column to one of those tables could introduce a duplicate column name, which MySQL rejects in a derived table. It’s best to avoid that, and it’s easily addressed by replacing the .* with the list of columns to be returned, and specifying an alias for any “duplicate” column name. (The use of z.* in the outer query is not a concern, as long as we are in control of the columns returned by z.)

Addendum:

I noted that a GROUP BY wasn’t going to give you the result set you needed. While it would be possible to get the correct result set with a query using GROUP BY, such a statement would be tedious. You could specify MIN(distance) ... GROUP BY doctor_id, and that would get you the smallest distance, BUT there is no guarantee that the other non-aggregate expressions in the SELECT list would come from the row with the minimum distance rather than from some other row. (MySQL is dangerously liberal with regard to GROUP BY and aggregates. To make the MySQL engine more cautious, and in line with other relational database engines, use SET sql_mode = ONLY_FULL_GROUP_BY. That mode is enabled by default as of MySQL 5.7.5.)
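One common GROUP BY formulation that does return the correct rows is the “groupwise minimum” pattern. It works in two steps: compute MIN(distance) per doctor_id in a derived table, then join back to pick up the other columns from the row that has that minimum. The sketch below runs it in an in-memory SQLite database via Python’s sqlite3 module. The candidates table and the office_id column are hypothetical stand-ins for the joined result.

```python
import sqlite3

# Hypothetical rows: (doctor_id, office_id, distance).
SAMPLE = [(5, 10, 7.0), (4, 11, 8.0), (4, 12, 34.0), (5, 13, 49.0)]


def closest_via_group_by(rows):
    """Groupwise minimum: MIN(distance) per doctor_id, joined back to the
    base rows so every other column comes from the minimum-distance row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE candidates (doctor_id INTEGER, office_id INTEGER, distance REAL)"
    )
    con.executemany("INSERT INTO candidates VALUES (?, ?, ?)", rows)
    query = """
        SELECT c.doctor_id, c.office_id, c.distance
          FROM candidates c
          JOIN (SELECT doctor_id, MIN(distance) AS min_distance
                  FROM candidates
                 GROUP BY doctor_id) m
            ON m.doctor_id = c.doctor_id
           AND m.min_distance = c.distance
         ORDER BY c.distance
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(closest_via_group_by(SAMPLE))  # [(5, 10, 7.0), (4, 11, 8.0)]
```

One caveat: if a doctor has two offices at exactly the same minimum distance, this pattern returns both rows. The ROW_NUMBER() approach always returns exactly one row per doctor.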

Addendum 2:

Performance issue reported by Darious: “some queries take 7 seconds.”

To speed things up, you probably want to cache the results of the distance function; basically, build a lookup table. For example:

CREATE TABLE office_location_distance
( office_location_id INT UNSIGNED NOT NULL COMMENT 'PK, FK to office_location.id'
, zipcode_id         INT UNSIGNED NOT NULL COMMENT 'PK, FK to zipcode.id'
, gc_distance        DECIMAL(18,2)         COMMENT 'calculated gc distance, in miles'
, PRIMARY KEY (office_location_id, zipcode_id)
, KEY (zipcode_id, gc_distance, office_location_id)
, CONSTRAINT distance_lookup_office_FK
  FOREIGN KEY (office_location_id) REFERENCES office_location(id)
  ON UPDATE CASCADE ON DELETE CASCADE
, CONSTRAINT distance_lookup_zipcode_FK
  FOREIGN KEY (zipcode_id) REFERENCES zipcode(id)
  ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB

That’s just an idea. I expect that you are searching for office_location distances from a particular zipcode, so the index on (zipcode_id, gc_distance, office_location_id) is the covering index your query would need. (I would avoid storing the calculated distance as a FLOAT, due to poor query performance with the FLOAT datatype.)

INSERT INTO office_location_distance (office_location_id, zipcode_id, gc_distance)
SELECT d.office_location_id
     , d.zipcode_id
     , d.gc_distance
  FROM (
         SELECT l.id AS office_location_id
              , z.id AS zipcode_id
              , ROUND( <glorious_great_circle_calculation> ,2) AS gc_distance
           FROM office_location l
          CROSS
           JOIN zipcode z
          ORDER BY 1,3
       ) d
ON DUPLICATE KEY UPDATE gc_distance = VALUES(gc_distance)

With the function results cached and indexed, your queries should be much faster.

SELECT d.gc_distance, o.*
  FROM office_location o
  JOIN office_location_distance d ON d.office_location_id = o.id
 WHERE d.zipcode_id = 63101
   AND d.gc_distance <= 100.00
 ORDER BY d.zipcode_id, d.gc_distance
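The cache-then-lookup idea can be traced in Python. gc_distance below uses the same spherical-law-of-cosines formula and 3959-mile Earth radius as the question’s SQL. build_cache mimics the CROSS JOIN maintenance query by storing one rounded distance per (office_location_id, zipcode_id) pair. offices_near mimics the lookup query. All names, and the office coordinates, are hypothetical. The zipcode_id 63101 is reused from the query above.

```python
import math

EARTH_RADIUS_MILES = 3959  # the constant used in the question's query


def gc_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (spherical law of cosines)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2) - math.radians(lon1)
    c = math.cos(p1) * math.cos(p2) * math.cos(dlon) + math.sin(p1) * math.sin(p2)
    # Floating-point error can push c slightly outside [-1, 1]; clamp it so
    # acos does not raise (MySQL's ACOS would return NULL instead).
    return EARTH_RADIUS_MILES * math.acos(max(-1.0, min(1.0, c)))


def build_cache(offices, zipcodes):
    """offices/zipcodes: {id: (lat, lon)}. Returns one entry per pair, like
    the CROSS JOIN: {(office_location_id, zipcode_id): distance, 2 decimals}."""
    return {
        (o_id, z_id): round(gc_distance(*o_pos, *z_pos), 2)
        for o_id, o_pos in offices.items()
        for z_id, z_pos in zipcodes.items()
    }


def offices_near(cache, zipcode_id, max_miles):
    """Like the lookup query: (distance, office_location_id), nearest first."""
    hits = [(d, o_id) for (o_id, z_id), d in cache.items()
            if z_id == zipcode_id and d <= max_miles]
    return sorted(hits)


offices = {1: (34.12520, -118.29200), 2: (34.0522, -118.2437)}  # hypothetical
zipcodes = {63101: (34.12520, -118.29200)}                      # hypothetical
cache = build_cache(offices, zipcodes)
print(offices_near(cache, 63101, 100.0))
```

The expensive trigonometry runs once per pair when the cache is built. Each later search is just a filter on zipcode_id and gc_distance, which is what the (zipcode_id, gc_distance, office_location_id) index serves in MySQL.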

I am hesitant to add a HAVING predicate to the INSERT/UPDATE that maintains the cache table. Suppose you had a wrong latitude/longitude and had calculated an erroneous distance under 100 miles. On a subsequent run, after the lat/long is fixed, the distance works out to 1000 miles. If that row is excluded from the query, the existing row in the cache table won’t get updated. (You could clear the cache table first, but that’s not really necessary; it’s just a lot of extra work for the database and logs.) If the result set of the maintenance query is too large, it could be broken down to run iteratively for each zipcode, or for each office_location.

On the other hand, if you aren’t interested in any distances over a certain value, you could add a HAVING gc_distance < predicate with that cutoff value, and cut down the size of the cache table considerably.
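The stale-row trade-off can be modeled with a Python dict standing in for the cache table. Here the upsert (INSERT ... ON DUPLICATE KEY UPDATE) is modeled as plain dictionary assignment, and the optional cutoff plays the role of the HAVING gc_distance < predicate. The key (7, 63101) and the distances are hypothetical.

```python
def refresh_cache(cache, fresh_distances, cutoff=None):
    """Upsert freshly computed distances into the cache. With a cutoff
    (the HAVING predicate), pairs whose new distance is not below it are
    skipped, so any existing cached value for them is left untouched."""
    for key, distance in fresh_distances.items():
        if cutoff is not None and distance >= cutoff:
            continue  # filtered out: the old cache row is NOT updated
        cache[key] = distance
    return cache


# A distance cached from a wrong lat/long, then recomputed after the fix.
fresh = {(7, 63101): 1000.0}
with_cutoff = refresh_cache({(7, 63101): 42.0}, fresh, cutoff=100.0)
without_cutoff = refresh_cache({(7, 63101): 42.0}, fresh)
print(with_cutoff[(7, 63101)], without_cutoff[(7, 63101)])  # 42.0 1000.0
```

With the cutoff, the stale 42.0 survives and would still turn up in a 100-mile search. Without it, the corrected 1000.0 replaces it, at the cost of a larger cache table.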
