
The Archive Base

Editorial Team
Asked: May 16, 2026

I have an application which scrapes soccer results from different sources on the web.

I have an application which scrapes soccer results from different sources on the web. Team names are not consistent across websites: e.g. Manchester United might be called ‘Man Utd’ on one site, ‘Man United’ on a second and ‘Manchester United FC’ on a third. I need to map all possible variants back to a single name (‘Manchester United’), and repeat the process for each of the 20 teams in the league (Arsenal, Liverpool, Man City, etc.). Obviously I don’t want any bad matches (e.g. ‘Man City’ being mapped to ‘Manchester United’).

Right now I specify regexes for all the possible combinations, e.g. ‘Manchester United’ would be ‘man(chester)?(u|(utd)|(united))(fc)?’. This is fine for a couple of sites but is getting increasingly unwieldy. I’m looking for a solution which avoids having to specify these regexes. There must be a way to ‘score’ ‘Man Utd’ so that it gets a high score against ‘Manchester United’ but a low or zero score against, say, ‘Liverpool’; I’d then test the sample text against every possible team name and pick the one with the highest score.
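
The score-and-pick-best idea can be tried directly with Python's standard-library `difflib.SequenceMatcher`, whose `ratio()` method returns a similarity between 0 and 1. This is a minimal sketch, not a tuned solution: the four-team `CANONICAL_TEAMS` tuple and the helpers `score` and `best_match` are hypothetical, and `SequenceMatcher` is just one of many possible scoring functions.

```python
from collections.abc import Sequence
from difflib import SequenceMatcher

# Hypothetical candidate list; a real one would hold all 20 league teams.
CANONICAL_TEAMS = ("Manchester United", "Manchester City", "Arsenal", "Liverpool")


def score(sample: str, candidate: str) -> float:
    """Similarity in [0, 1] between two lowercased names (1.0 = identical)."""
    return SequenceMatcher(None, sample.lower(), candidate.lower()).ratio()


def best_match(sample: str, candidates: Sequence[str] = CANONICAL_TEAMS) -> tuple[str, float]:
    """Score the sample against every candidate and return the best (name, score) pair."""
    return max(((c, score(sample, c)) for c in candidates), key=lambda pair: pair[1])


for raw in ("Man Utd", "Man City", "Liverpool FC"):
    team, s = best_match(raw)
    print(f"{raw!r} -> {team!r} (score {s:.2f})")
```

Raw scores for abbreviations can be fairly low even when the right team wins, so in practice the input usually needs some normalization before scoring.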

My sense is that the solution may be similar to the classic example of a neural net being trained to recognise handwriting (i.e. there is a fixed set of possible outcomes, and a degree of noise in the input samples).
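
A neural net is one way to exploit that framing, but the same "fixed set of labels, noisy inputs" setup can be tried with a much simpler classifier first. The sketch below swaps in 1-nearest-neighbour classification rather than a neural net: each new name gets the label of the most similar hand-labelled example, with similarity measured as the Jaccard overlap of character bigrams. The `TRAINING` examples and all function names are hypothetical illustrations.

```python
def char_bigrams(text: str) -> set[str]:
    """Lowercased character bigrams, padded with spaces so word edges count."""
    padded = f" {text.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection size divided by union size (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical labelled examples, e.g. names already mapped by hand.
TRAINING = [
    ("Manchester United", "Manchester United"),
    ("Man Utd", "Manchester United"),
    ("Manchester United FC", "Manchester United"),
    ("Manchester City", "Manchester City"),
    ("Man City", "Manchester City"),
    ("Arsenal", "Arsenal"),
    ("Liverpool", "Liverpool"),
]


def classify(sample: str) -> str:
    """1-nearest-neighbour: return the label of the most similar training example."""
    grams = char_bigrams(sample)
    _, label = max(TRAINING, key=lambda example: jaccard(grams, char_bigrams(example[0])))
    return label


for raw in ("Man United", "Man City FC"):
    print(f"{raw!r} -> {classify(raw)!r}")
```

Adding a newly seen variant is then just a matter of appending one labelled pair to the training list, rather than editing a regex.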

Does anyone have any ideas?

Thanks.

1 Answer

Editorial Team
Answered: May 16, 2026 at 2:59 am

You could use a similarity metric on the strings involved together with a hand-tuned threshold. Alternatively, the threshold could be learned with a machine learning approach. Which similarity metric works best depends on the kind of strings you want to match. You might also need to pre-process the strings before applying a metric to them (e.g. remove noise characters such as spaces, normalize capitalization, resolve common abbreviations known in advance, and so on).
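
A minimal sketch of that recipe (pre-process, score, then accept the best candidate only above a threshold), again using the standard library's `difflib.SequenceMatcher` as the metric. The abbreviation and noise tables, the candidate list and the 0.8 threshold are all hypothetical values chosen for illustration; in practice the threshold would be hand-tuned or learned as described above.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables: known abbreviations and tokens that carry no identity.
ABBREVIATIONS = {"man": "manchester", "utd": "united", "u": "united"}
NOISE_TOKENS = {"fc", "afc"}
CANONICAL_TEAMS = ("Manchester United", "Manchester City", "Arsenal", "Liverpool")
THRESHOLD = 0.8  # hand-tuned assumption; could also be learned from labelled data


def normalize(name: str) -> str:
    """Lowercase, split on non-alphanumerics, drop noise tokens, expand abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens if t not in NOISE_TOKENS)


def match_team(name: str, candidates=CANONICAL_TEAMS, threshold: float = THRESHOLD):
    """Return the best-scoring canonical name, or None if nothing clears the threshold."""
    target = normalize(name)
    best, best_score = None, 0.0
    for candidate in candidates:
        s = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if s > best_score:
            best, best_score = candidate, s
    return best if best_score >= threshold else None


for raw in ("Man Utd", "Manchester United FC", "Man City", "Chelsea"):
    print(f"{raw!r} -> {match_team(raw)!r}")
```

Returning `None` below the threshold is what guards against bad matches: an unknown name is flagged for manual review instead of being forced onto the nearest existing team.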

For a fairly comprehensive overview of different string similarity metrics, along with a Java library implementing them, see http://www.dcs.shef.ac.uk/~sam/stringmetrics.html
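
Levenshtein (edit) distance is one of the classic string similarity metrics. Below is a plain-Python dynamic-programming sketch of it, plus a hypothetical `levenshtein_similarity` helper that rescales the distance into a 0-to-1 score so it could be dropped into a score-and-threshold scheme like the one described above. It is an illustration only, not the API of the Java library linked above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical strings, falling toward 0.0 as edits grow."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("man utd", "man united"))
print(levenshtein_similarity("man utd", "man united"))
```

Only two rows of the distance table are kept at a time, so memory grows with the length of `b` rather than with the product of both lengths.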

Related Questions

I'm writing an aggregation application which scrapes data from a couple of web sources
I have an application which extracts data from an XML file using XPath. If
I have an application which takes a string value of the form %programfiles%\directory\tool.exe from
I have a console application which screen scrapes some data, and now I need
I have a scraper, which queries different websites. Some of them varyingly use Content-Encoding.
i have application in which i have a web server api .this is my
I have an application which I am creating for Ubuntu. I come from a
I have an application which connects to a database, retrieves a username from a
I have java application which I am running on Unix from the command prompt.
I have application which needs to use a dll (also written by me) which

© 2021 The Archive Base. All Rights Reserved