The Archive Base


Editorial Team
Asked: May 14, 2026


This is the use case I’m trying to figure out.

I have a list of spam subscriptions to a service, and they are killing conversion rate and distorting other usability studies.

The emails inserted look like the following:

rogerep_dyeepvu@hotmail.com

rogeram_ingramameb@hotmail.com

rogerew_jonesewct@hotmail.com

roger[…]_surname[…]@hotmail.com


What would be your suggestions on spotting these entries by using an automated script? It feels a little more complicated than it actually looks.

Help would be very much appreciated!


1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 8:26 pm

    I don’t think you can easily check for this. It’s unlikely to be a simple string-matching problem that you can throw a regular expression at, because I would guess that ‘Roger’ was just an example and that any number of names can appear in that position. You could run one of the regular expressions supplied by the other posters, parameterising it with every combination of common first and last names, but this will probably take somewhere between “too long” and “forever”, and will flag up plenty of false positives.

    Another approach, which works with the pattern you posted above, would be to take the last 4 letters of the username and check whether they look random. You can tell random characters from ones arranged sensibly (for a given language) by training a Markov chain on legitimate text, which lets you calculate the probability of a given 4 letters appearing in that order in that language. For random letters, this probability will typically come out far lower than for a legitimate name (although if there are special characters or digits in there, all bets are off).
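
    As a rough sketch of the Markov idea, assuming a first-order (letter-to-letter) chain with add-one smoothing: the training corpus below is a hypothetical toy list of surnames, and the function names are my own. In practice you would train on a much larger set of real names or text.

```python
import math
import string
from collections import defaultdict

LETTERS = set(string.ascii_lowercase)

def train_bigram_model(words):
    """Count letter-to-letter transitions in a list of legitimate words."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        letters = [c for c in word.lower() if c in LETTERS]
        for prev, nxt in zip(letters, letters[1:]):
            counts[prev][nxt] += 1
    return counts

def avg_log_prob(model, text, alphabet_size=26):
    """Average log-probability per transition, using add-one smoothing."""
    letters = [c for c in text.lower() if c in LETTERS]
    pairs = list(zip(letters, letters[1:]))
    if not pairs:
        return float("-inf")  # nothing to score (digits, symbols, 1 letter)
    total = 0.0
    for prev, nxt in pairs:
        row = model.get(prev, {})
        row_total = sum(row.values())
        total += math.log((row.get(nxt, 0) + 1) / (row_total + alphabet_size))
    return total / len(pairs)

# Hypothetical toy corpus; far too small for real use.
corpus = ["jones", "ingram", "smith", "johnson", "williams", "brown",
          "miller", "davis", "anderson", "thomas", "martin", "roberts"]
model = train_bigram_model(corpus)

# Compare the last 4 letters of a plausible name with a random-looking tail.
print(avg_log_prob(model, "ones"), avg_log_prob(model, "ewct"))
```

    A lower score means the letters are less likely under the trained model. Where to put the cut-off is something you would have to tune against addresses you already know to be good or bad.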

    Another way might be to use a Bayesian filter (e.g. something like Reverend in Python, though there are others) trained on the last 4 letters of legitimate email addresses. This would probably spot 95% of the ones which were just random, provided you made the data usable, e.g. by submitting not just the 4 letters but each of the 2-letter and 3-letter substrings inside it, to capture the context of each letter. I don’t think this would work as well as the Markov-style method though.
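
    To illustrate the substring idea, here is a hand-rolled naive Bayes sketch. It is not Reverend's API, just a stand-in written for this answer. The class name, labels, and the tiny training lists are all assumptions: the "spam" tails come from the addresses in the question, and the "ham" tails are made up.

```python
import math
from collections import Counter

def substrings(tail):
    """Return the tail plus every 2- and 3-letter substring inside it."""
    tail = tail.lower()
    feats = [tail]
    for n in (2, 3):
        feats.extend(tail[i:i + n] for i in range(len(tail) - n + 1))
    return feats

class TinyBayes:
    """Two-class naive Bayes over substring features (illustrative only)."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.docs = Counter()

    def train(self, label, tail):
        self.counts[label].update(substrings(tail))
        self.docs[label] += 1

    def spam_probability(self, tail):
        # Needs at least one trained example of each class.
        vocab = len(set(self.counts["ham"]) | set(self.counts["spam"]))
        log_odds = math.log(self.docs["spam"] / self.docs["ham"])
        for feat in substrings(tail):
            for label, sign in (("spam", 1), ("ham", -1)):
                total = sum(self.counts[label].values())
                # Add-one smoothing so unseen features never give log(0).
                p = (self.counts[label][feat] + 1) / (total + vocab + 1)
                log_odds += sign * math.log(p)
        return 1 / (1 + math.exp(-log_odds))

clf = TinyBayes()
for tail in ["ones", "gram", "mith", "nson", "iams", "rown"]:  # made-up ham
    clf.train("ham", tail)
for tail in ["epvu", "ameb", "ewct"]:  # tails from the question's examples
    clf.train("spam", tail)

print(substrings("ewct"))
print(clf.spam_probability("ones"), clf.spam_probability("ewct"))
```

    With this little data it only really recognises what it has already seen, so treat it as a demonstration of the feature extraction rather than a working filter.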

    Whatever check you do, you can cut false positives by only submitting certain email addresses to it (e.g. only those at webmail domains that contain an underscore, with at least 3 characters before the underscore and 5 characters after it).
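
    That pre-filter could look something like the following. The domain list and the function name are assumptions for illustration. Use whichever webmail providers actually show up in your data.

```python
# Assumed list of webmail domains; extend to match your own data.
WEBMAIL_DOMAINS = {"hotmail.com", "gmail.com", "yahoo.com", "outlook.com"}

def is_candidate(email):
    """True if the address matches the shape worth running a check on."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or domain not in WEBMAIL_DOMAINS:
        return False
    # Split on the first underscore in the part before the @.
    before, underscore, after = local.partition("_")
    return bool(underscore) and len(before) >= 3 and len(after) >= 5

for address in ["rogerep_dyeepvu@hotmail.com", "bob@hotmail.com",
                "jo_smith@hotmail.com", "rogerep_dyeepvu@example.org"]:
    print(address, is_candidate(address))
```

    Only addresses that pass this test would then go on to the Markov or Bayesian check.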

    But ultimately, you can never know for sure whether it’s a spam address or a real one until it gets used for one purpose or the other. So if possible I’d suggest giving up on analysing the content and fixing the problem somewhere else. In what way are they killing conversion rate? If you’re counting these dummy accounts in some sort of metric, you’d be best off adding a verification stage first and only caring about metrics for accounts that pass verification. Some people really do have addresses like rogerep_dyeepvu@hotmail.com, after all.

© 2021 The Archive Base. All Rights Reserved