Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9241761
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 18, 20262026-06-18T08:26:01+00:00 2026-06-18T08:26:01+00:00

Im trying to analyse data from cycle accidents in the UK to find statistical

  • 0

Im trying to analyse data from cycle accidents in the UK to find statistical black spots. Here is the example of the data from another website. http://www.cycleinjury.co.uk/map

I am currently using SQLite to ~100k store lat / lon locations. I want to group nearby locations together. This task is called cluster analysis.

I would like simplify the dataset by ignoring isolated incidents and instead only showing the origin of clusters where more than one accident have taken place in a small area.

There are 3 problems I need to overcome.

  1. Performance – How do I ensure finding nearby points is quick. Should I use SQLite’s implementation of an R-Tree for example?

  2. Chains – How do I avoid picking up chains of nearby points?

  3. Density – How to take cycle population density into account? There is a far greater population density of cyclist in london then say Bristol, therefore there appears to be a greater number of backstops in London.

I would like to avoid ‘chain’ scenarios like this:

enter image description here

Instead I would like to find clusters:

enter image description here

London screenshot (I hand drew some clusters)…

enter image description here

Bristol screenshot – Much lower density – the same program ran over this area might not find any blackspots if relative density was not taken into account.

enter image description here

Any pointers would be great!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-18T08:26:02+00:00Added an answer on June 18, 2026 at 8:26 am

    Well, your problem description reads exactly like the DBSCAN clustering algorithm (Wikipedia). It avoids chain effects in the sense that it requires them to be at least minPts objects.

    As for the differences in densities across, that is what OPTICS (Wikipedia) is supposed do solve. You may need to use a different way of extracting clusters though.

    Well, ok, maybe not 100% – you maybe want to have single hotspots, not areas that are “density connected”. When thinking of an OPTICS plot, I figure you are only interested in small but deep valleys, not in large valleys. You could probably use the OPTICS plot an scan for local minima of “at least 10 accidents”.

    Update: Thanks for the pointer to the data set. It’s really interesting. So I did not filter it down to cyclists, but right now I’m using all 1.2 million records with coordinates. I’ve fed them into ELKI for analysis, because it’s really fast, and it actually can use the geodetic distance (i.e. on latitude and longitude) instead of Euclidean distance, to avoid bias. I’ve enabled the R*-tree index with STR bulk loading, because that is supposed to help to get the runtime down a lot. I’m running OPTICS with Xi=.1, epsilon=1 (km) and minPts=100 (looking for large clusters only). Runtime was around 11 Minutes, not too bad. The OPTICS plot of course would be 1.2 million pixels wide, so it’s not really good for full visualization anymore. Given the huge threshold, it identified 18 clusters with 100-200 instances each. I’ll try to visualize these clusters next. But definitely try a lower minPts for your experiments.

    So here are the major clusters found:

    • 51.690713 -0.045545 a crossing on A10 north of London just past M25
    • 51.477804 -0.404462 “Waggoners Roundabout”
    • 51.690713 -0.045545 “Halton Cross Roundabout” or the crossing south of it
    • 51.436707 -0.499702 Fork of A30 and A308 Staines By-Pass
    • 53.556186 -2.489059 M61 exit to A58, North-West of Manchester
    • 55.170139 -1.532917 A189, North Seaton Roundabout
    • 55.067229 -1.577334 A189 and A19, just south of this, a four lane roundabout.
    • 51.570594 -0.096159 Manour House, Picadilly Line
    • 53.477601 -1.152863 M18 and A1(M)
    • 53.091369 -0.789684 A1, A17 and A46, a complex construct with roundabouts on both sides of A1.
    • 52.949281 -0.97896 A52 and A46
    • 50.659544 -1.15251 Isle of Wight, Sandown.
    • …

    Note, these are just random points taken from the clusters. It may be sensible to compute e.g. cluster center and radius instead, but I didn’t do that. I just wanted to get a glimpse of that data set, and it looks interesting.

    Here are some screenshots, with minPts=50, epsilon=0.1, xi=0.02:

    Some clusters in central London

    Notice that with OPTICS, clusters can be hierarchical. Here is a detail:

    Nested clusters - OPTICS produces a hierarchy

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm trying to access FFT data from the analyser node using analyser.getByteFrequencyData(array) but it
I'm trying to load some data from a plist file. It contains an NSArray
I'm trying to analyze data sets that are obtained from CSV files. After the
I am trying to get data from two different table and put into on
I'm trying to analyze data from the University of Minnesota IPUMS dataset for the
I am trying to analyse the IIS log and would like to look at
I am attempting to analyse a time series with spectral analysis. I am trying
I'm running python2.5 and trying to use the astLib library to analyse WCS information
I am trying to analyze my agent results from a collection of 20 txt
I'm trying to find any research/academic/journal papers/articles that analyze recent versions of Qt and

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.