Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7662837
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 31, 20262026-05-31T13:54:13+00:00 2026-05-31T13:54:13+00:00

I would like to create algorithm to distinguish the persons writing on forum under

  • 0

I would like to create algorithm to distinguish the persons writing on forum under different nicknames.

The goal is to discover people registring new account to flame forum anonymously, not under their main account.

Basicaly I was thinking about stemming words they use and compare users according to similarities or these words.

Users using words

As shown on the picture there is user3 and user4 who uses same words. It means there is probably one person behind the computer.

Its clear that there are lot of common words which are being used by all users. So I should focus on “user specific” words.

Input is (related to the image above):

<word1, user1>
<word2, user1>
<word2, user2>
<word3, user2>
<word4, user2>
<word5, user3>
<word5, user4>
... etc. The order doesnt matter

Output should be:

user1
user2
user3 = user4

I am doing this in Java but I want this question to be language independent.

Any ideas how to do it?

1) how to store words/users? What data structures?

2) how to get rid of common words everybody use? I have to somehow ignore them among user specific words. Maybe I could just ignore them because they get lost. I am afraid that they will hide significant difference of “user specific words”

3) how to recognize same users? – somehow count same words between each user?

I am very thankful for every advice in advance.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-31T13:54:15+00:00Added an answer on May 31, 2026 at 1:54 pm

    I recommend a language modelling approach. You can train a language model (unigram, bigram, parsimonious, …) on each of your user accounts’ words. That gives you a mapping from words to probabilities, i.e. numbers between 0 and 1 (inclusive) expressing how likely it is that a user uses each of the words you encountered in the complete training set. Language models can be stored as arrays of pairs, hash tables or sparse vectors. There are plenty of libraries on the web for fitting LMs.

    Such a mapping can be considered a high-dimensional vector, in the same way documents are considered as vector in the vector space model of information retrieval. You can then compare these vectors by using KL-divergence or any of the popular distance metrics: Euclidean distance, cosine distance, etc. A strong similarity/small distance between two users’ vectors might then indicate that they belong to one and the same user.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I would like create my own collection that has all the attributes of python
I would like create a web service in ASP.Net 2.0 that will supports JSON.
i would like create a array of structure which have a dynamic array :
Would like to create a strong password in C++. Any suggestions? I assume it
I would like to create a database backed interactive AJAX webapp which has a
I would like to create an SSL connection for generic TCP communication. I think
I would like to create a file format for my app like Quake, OO,
I would like to create a stored procedure in MySQL that took a list
I would like to create an application wide keyboard shortcut for a Java Swing
I would like to create events for certain resources that are used across various

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.