The Archive Base

Editorial Team
Asked: May 15, 2026


I am building a website in python/django and want to predict whether a user submission is valid or whether it is spam.

Users have an accept rate on their submissions, like this website has.

Users can moderate other users’ submissions, and these moderations are later metamoderated by an admin.

Given this:

  • the registered user A, whose submission accept rate is 60%, submits something.
  • user B moderates A’s post as a valid submission. However, user B is wrong 70% of the time.
  • user C moderates A’s post as spam. User C is usually right: if user C says something is spam/not spam, this will be correct 80% of the time.

How can I predict the chance of A’s post being spam?

Edit: I made a python script simulating this scenario:

#!/usr/bin/env python3

import random

def submit(p):
    """Return 'ham' with (p*100)% probability, otherwise 'spam'."""
    return 'ham' if random.random() < p else 'spam'

def moderate(p, ham_or_spam):
    """Label ham as ham and spam as spam with (p*100)% probability;
    otherwise return the wrong label."""
    if random.random() < p:
        return ham_or_spam
    return 'ham' if ham_or_spam == 'spam' else 'spam'

NUMBER_OF_SUBMISSIONS = 100000
USER_A_HAM_RATIO = 0.6  # Will submit 60% ham
USER_B_ACCURACY = 0.3   # Will moderate a submission correctly 30% of the time
USER_C_ACCURACY = 0.8   # Will moderate a submission correctly 80% of the time

user_a_submissions = [submit(USER_A_HAM_RATIO)
                      for _ in range(NUMBER_OF_SUBMISSIONS)]

print("User A has made %d submissions. %d of them are 'ham'."
      % (len(user_a_submissions), user_a_submissions.count('ham')))

user_b_moderations = [moderate(USER_B_ACCURACY, ham_or_spam)
                      for ham_or_spam in user_a_submissions]

user_b_moderations_which_are_correct = \
    [i for i, j in zip(user_a_submissions, user_b_moderations) if i == j]

print("User B has correctly moderated %d submissions."
      % len(user_b_moderations_which_are_correct))

user_c_moderations = [moderate(USER_C_ACCURACY, ham_or_spam)
                      for ham_or_spam in user_a_submissions]

user_c_moderations_which_are_correct = \
    [i for i, j in zip(user_a_submissions, user_c_moderations) if i == j]

print("User C has correctly moderated %d submissions."
      % len(user_c_moderations_which_are_correct))

i = 0  # submissions that B labelled 'spam' and C labelled 'ham'
j = 0  # ... of which were actually spam
k = 0  # ... of which were actually ham
for a, b, c in zip(user_a_submissions, user_b_moderations, user_c_moderations):
    if b == 'spam' and c == 'ham':
        i += 1
        if a == 'spam':
            j += 1
        else:
            k += 1

print("'spam' was identified as 'spam' by user B and 'ham' by user C %d times." % j)
print("'ham' was identified as 'spam' by user B and 'ham' by user C %d times." % k)
if i:  # avoid dividing by zero if the combination never occurred
    print("If user B says it's spam and user C says it's ham, it will be spam "
          "%.2f percent of the time, and ham %.2f percent of the time."
          % (j / i * 100, k / i * 100))

Running the script gives me this output:

  • User A has made 100000 submissions. 60194 of them are ‘ham’.
  • User B has correctly moderated 29864 submissions.
  • User C has correctly moderated 79990 submissions.
  • ‘spam’ was identified as ‘spam’ by user B and ‘ham’ by user C 2346 times.
  • ‘ham’ was identified as ‘spam’ by user B and ‘ham’ by user C 33634 times.
  • If user B says it’s spam and user C says it’s ham, it will be spam 6.52 percent of the time, and ham 93.48 percent of the time.

Is the probability here reasonable? Would this be the correct way to simulate the scenario?
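For what it's worth, the script above counts the posts that B labels 'spam' and C labels 'ham', which is the reverse of the scenario described in the question (B accepts the post, C flags it as spam). For the case the script does count, the exact value is 0.4*0.3*0.2 / (0.4*0.3*0.2 + 0.6*0.7*0.8) = 0.024/0.36 ≈ 6.67%, which the simulated 6.52% matches up to sampling noise. Below is a minimal Python 3 sketch that conditions on the described scenario instead. Like the original script, it assumes each moderator is equally accurate on spam and ham and that B's and C's verdicts are drawn independently given the true label; the function names, seed and sample size are arbitrary choices.

```python
import random

P_HAM = 0.6       # user A submits ham 60% of the time
B_ACCURACY = 0.3  # B labels a post correctly 30% of the time
C_ACCURACY = 0.8  # C labels a post correctly 80% of the time


def verdict(accuracy, truth, rng):
    """Return the true label with probability `accuracy`, else the wrong one."""
    if rng.random() < accuracy:
        return truth
    return 'ham' if truth == 'spam' else 'spam'


def estimate_p_spam(n=200_000, seed=1):
    """Estimate P(spam | B says 'ham' and C says 'spam') by simulation."""
    rng = random.Random(seed)
    matches = 0       # posts that B accepted and C flagged as spam
    spam_matches = 0  # ... of which were actually spam
    for _ in range(n):
        truth = 'ham' if rng.random() < P_HAM else 'spam'
        b = verdict(B_ACCURACY, truth, rng)
        c = verdict(C_ACCURACY, truth, rng)
        if b == 'ham' and c == 'spam':
            matches += 1
            if truth == 'spam':
                spam_matches += 1
    if matches == 0:
        raise ValueError("scenario never occurred; increase n")
    return spam_matches / matches


print(f"Estimated P(spam | B accepts, C rejects) = {estimate_p_spam():.3f}")
```

Under these assumptions the estimate should settle near the exact Bayesian value 0.224/0.26 ≈ 0.86 (0.224 = 0.4*0.7*0.8 for spam, 0.036 = 0.6*0.3*0.2 for ham).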

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 4:08 am

    Bayes’ Theorem tells us:

    P(A|B) = P(B|A) P(A) / P(B)

    Let’s change the letters for the events A and B to X and Y respectively, because you’re using A, B and C to stand for people and it would make things confusing:

    P(X|Y) = P(Y|X) P(X) / P(Y)
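In code the formula is a one-liner. This is only a sketch: the helper names `posterior` and `total_probability` are hypothetical, and the hard part in practice is supplying a P(Y) that is consistent with P(Y|X) and P(X), usually via the law of total probability, P(Y) = P(Y|X)P(X) + P(Y|not X)P(not X).

```python
def posterior(p_y_given_x: float, p_x: float, p_y: float) -> float:
    """Bayes' theorem: return P(X|Y) = P(Y|X) * P(X) / P(Y)."""
    if not 0.0 < p_y <= 1.0:
        raise ValueError("P(Y) must be in the interval (0, 1]")
    return p_y_given_x * p_x / p_y


def total_probability(p_y_given_x: float, p_y_given_not_x: float,
                      p_x: float) -> float:
    """Law of total probability: P(Y) = P(Y|X)P(X) + P(Y|not X)(1 - P(X))."""
    return p_y_given_x * p_x + p_y_given_not_x * (1.0 - p_x)


# Example with made-up numbers: P(Y|X)=0.9, P(Y|not X)=0.1, P(X)=0.5
p_y = total_probability(0.9, 0.1, 0.5)
print(posterior(0.9, 0.5, p_y))
```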
    

    Edit: the following is slightly wrong because X should be “this post _by A_ is spam”, not just “this post is spam” (and thus Y should just be “B accepts A’s post, C rejects it”). I’m not redoing the math right here because the numbers have changed anyway; see the other edit below for the right number and correct arithmetic.

    You want X to mean “this post is spam”, and Y to stand for the combination of circumstances: A has posted it, B approved it, C rejected it (and let’s assume conditional independence of the circumstances in question).

    We need P(X), the a priori probability that any post (no matter who makes it or approves it) is spam; P(Y), the a priori probability that a post would be made by A, approved by B, rejected by C (whether it’s spam or not); and P(Y | X), same as the latter but with the post being spam.

    As you may be noticing, you haven’t really given us all the bits and pieces we need for the computation. Your three points tell us: a given post by A is spam with a probability of 0.4 (that seems to be how the first point reads); B’s acceptance probability is 0.3 but we have no idea how that differs for spam and non-spam, except that there should be “little” difference (low accuracy); C’s is 0.8 and again we don’t know how that’s influenced by spam vs non-spam, except that there should be “large” difference (high accuracy).

    So we need some more numbers! The fact that C has high accuracy while accepting 80% of the posts tells us that overall spam must be surprisingly low: if spam overall were the same 40% as for A, then C would have to accept half of it (even if he were perfect at always accepting non-spam) to reach the overall 80% accept rate, and that would hardly be “high accuracy”. So say spam overall is just 20% and C only accepts 1/4 of it (and rejects 1/16 of non-spam): pretty good accuracy indeed, and overall matching the numbers you’re giving.

    Guessing for B, who accepts at 30% overall, and now “knowing” that spam overall is 20%, we could guess that B accepts 1/4 of the spam and only 5/16 of non-spam.
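The guessed rates can be checked against the overall acceptance figures with the law of total probability, P(accept) = P(accept|spam)P(spam) + P(accept|non-spam)P(non-spam). A sketch using exact fractions; the per-class rates are the guesses from the two paragraphs above, not data, and the helper name is hypothetical:

```python
from fractions import Fraction

P_SPAM = Fraction(1, 5)  # guessed overall spam rate (20%)

# Guessed per-class acceptance rates (assumptions, not measurements)
C_ACCEPTS_SPAM = Fraction(1, 4)
C_ACCEPTS_HAM = 1 - Fraction(1, 16)  # C rejects 1/16 of non-spam
B_ACCEPTS_SPAM = Fraction(1, 4)
B_ACCEPTS_HAM = Fraction(5, 16)


def overall_acceptance(accepts_spam, accepts_ham, p_spam=P_SPAM):
    """P(accept) = P(accept|spam) P(spam) + P(accept|ham) P(ham)."""
    return accepts_spam * p_spam + accepts_ham * (1 - p_spam)


print(overall_acceptance(C_ACCEPTS_SPAM, C_ACCEPTS_HAM))  # 4/5
print(overall_acceptance(B_ACCEPTS_SPAM, B_ACCEPTS_HAM))  # 3/10
```

Both guesses reproduce the stated 80% and 30% overall rates exactly, which is all these two constraints can confirm.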

    So: P(X)=0.2; P(Y)=0.3*0.2=0.06 (B’s overall acceptance times C’s rejection prob); P(Y|X)=0.4*0.25*0.75=0.075 (A’s prob of spamming times B’s prob of accepting spam times C’s prob of rejecting spam).

    So P(X|Y)=0.075*0.2/0.06=0.25. Unless I’ve made some arithmetic error (quite possible; the point is mostly to show you how one can reason in such cases ;-), the probability of this particular post being spam is 0.25: a bit higher than the probability of any random post being spam, lower than the probability of a random post by A being spam.

    But of course (even under the simplifying hypothesis of conditional independence all over the place ;-) this computation is highly sensitive to my guesses/hypotheses about the ratios of false positives vs false negatives for B and C, and the overall spam ratio. There are five numbers of this kind involved (overall spam prob, conditional prob for each of B and C for spam and non-spam) and you only give us two relevant (linear) constraints (unconditional prob of acceptance for B and C) and two vague “handwaving” statements (about low and high accuracy), so there are plenty of degrees of freedom there.
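To get a feel for that sensitivity, the sketch below puts the five numbers into one function (assuming, as above, that B's and C's verdicts are conditionally independent given whether the post is spam) and varies only one of them. For illustration it uses the rates from the question read as equally accurate on both classes (A spams 40% of the time, B accepts spam 70% and non-spam 30% of the time, C rejects spam 80% of the time) and sweeps how often C rejects non-spam, which the question only pins down under that symmetry assumption. The function name and the swept values are arbitrary.

```python
def p_spam_given_verdicts(p_spam, b_accepts_spam, b_accepts_ham,
                          c_rejects_spam, c_rejects_ham):
    """P(spam | B accepts and C rejects), assuming B's and C's verdicts
    are conditionally independent given whether the post is spam."""
    spam = p_spam * b_accepts_spam * c_rejects_spam     # P(Y and spam)
    ham = (1 - p_spam) * b_accepts_ham * c_rejects_ham  # P(Y and not spam)
    return spam / (spam + ham)


# Keep every other number fixed and vary how often C rejects non-spam.
for c_rejects_ham in (0.1, 0.2, 0.3):
    p = p_spam_given_verdicts(p_spam=0.4, b_accepts_spam=0.7, b_accepts_ham=0.3,
                              c_rejects_spam=0.8, c_rejects_ham=c_rejects_ham)
    print(f"C rejects {c_rejects_ham:.0%} of non-spam -> P(spam) = {p:.2f}")
```

With these inputs the result moves from roughly 0.93 down to roughly 0.81 as C's false-positive rate goes from 10% to 30%, so pinning that single number down matters.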

    If you can better estimate the five key numbers, the computation can be made more precise.

    And, BTW, Python (and a fortiori Django) has absolutely nothing to do with the problem; I recommend you remove those irrelevant tags to get a broader range of responses!

    Edit: the user clarifies (in a comment; he should really edit his question!):

    “When I said ‘B’s moderations’ accept rate is a mere 30%’ I mean that for every ten times B moderates something spam/no spam he makes the wrong decision 7 times. So there is a 70% chance he will tag something spam/no spam when it is not. For user C, ‘His moderations’ accept rate is 80%’ means that if C says something is spam or no spam, he is right 80% of the time. Overall chance of a registered user spamming is 20%.”

    …and asks me to redo the math (I’m assuming false positives and negatives are equally likely for each of B and C). Note that B is an excellent “contrarian indicator”, since he’s wrong 70% of the time!-).

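    Anyway: B’s overall acceptance rate of A’s posts must then be 0.6*0.3 (for when he accepts A’s non-spam) + 0.4*0.7 (for when he accepts A’s spam) = 0.18 + 0.28 = 0.46; C’s overall rejection rate of A’s posts is 0.4*0.8 (for when he rejects A’s spam) + 0.6*0.2 (for when he rejects A’s non-spam) = 0.32 + 0.12 = 0.44, so he accepts the other 0.56. So we have…: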

    P(X)=0.4 (I had it wrong at 0.2 before since I was ignoring the fact that A’s probability of spamming is 0.4; the overall prob of spam is not relevant, since we know that the post is A’s!); P(Y|X)=0.7*0.8=0.56 (B’s prob of accepting spam times C’s prob of rejecting spam); P(Y)=P(Y|X)P(X)+P(Y|not X)P(not X)=0.56*0.4+0.3*0.2*0.6=0.224+0.036=0.26 (the second term is B accepting non-spam times C rejecting non-spam). Note that P(Y) is not simply B’s overall acceptance rate times C’s overall rejection rate (0.46*0.44): the two verdicts are only independent once we know whether the post is spam, so their marginals can’t just be multiplied.

    So P(X|Y)=0.56*0.4/0.26≈0.86 (rounding). IOW: while a priori the probability that a post of A’s is spam is 0.4, both B’s acceptance and C’s rejection heighten it, so this specific post of A’s has about an 86% chance of being spam.
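Working the numbers with exact fractions is a useful cross-check. This sketch assumes, as above, that each moderator is equally accurate on spam and non-spam and that B's and C's verdicts are independent once the true label is known. It computes the marginal rates, then builds P(Y) label by label; multiplying the two marginals instead gives 0.2024 rather than 0.26, because the verdicts are correlated through the post's true label. Variable names are illustrative.

```python
from fractions import Fraction

P_SPAM = Fraction(2, 5)    # A's posts are spam 40% of the time
P_HAM = 1 - P_SPAM
B_RIGHT = Fraction(3, 10)  # B is right 30% of the time, on spam and ham alike
C_RIGHT = Fraction(4, 5)   # C is right 80% of the time, on spam and ham alike

# Marginal rates over A's posts (law of total probability)
b_accepts = P_HAM * B_RIGHT + P_SPAM * (1 - B_RIGHT)
c_rejects = P_SPAM * C_RIGHT + P_HAM * (1 - C_RIGHT)

# P(B accepts and C rejects): condition on the true label first, then sum
y_and_spam = P_SPAM * (1 - B_RIGHT) * C_RIGHT
y_and_ham = P_HAM * B_RIGHT * (1 - C_RIGHT)
p_y = y_and_spam + y_and_ham

print("B accepts:", float(b_accepts), " C rejects:", float(c_rejects))
print("product of marginals:", float(b_accepts * c_rejects))
print("P(Y):", float(p_y), " P(spam | Y):", float(y_and_spam / p_y))
```

The exact posterior comes out as 56/65, a little over 0.86.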

© 2021 The Archive Base. All Rights Reserved