The Archive Base

Asked by Editorial Team on May 19, 2026

I have documents which can belong to several classes and can contain several tokens

I have documents which can belong to several classes and can contain several tokens (words):

create table Tokens (
    Id INT not null,
    Text NVARCHAR(255) null,
    primary key (Id)
)

create table DocumentClassTokens (
    Id INT not null,
    DocumentFk INT null,
    ClassFk INT null,
    TokenFk INT null,
    primary key (Id)
)

I would like to determine these stats (for all tokens given the class):

  • A = number of distinct documents which contain token and belong to class
  • B = number of distinct documents which contain token and do not belong to class
  • C = number of distinct documents which do not contain token and belong to class
  • D = number of distinct documents which do not contain token and do not belong to class

I am using this at the moment but it does not look right (I am pretty sure that the computation of A and B is correct):

declare @class int;

select @class = id from dbo.Classes where text = 'bla'

;with A as
(
    select
        a.text as token,
        count(distinct DocumentFk) as A
    from dbo.Tokens as a
    inner join dbo.DocumentClassTokens as b on a.id = b.TokenFk and b.ClassFk = @class
    group by a.text
)
,B as
(
    select
        a.text as token,
        count(distinct DocumentFk) as B
    from dbo.Tokens as a
    inner join dbo.DocumentClassTokens as b on a.id = b.TokenFk and b.ClassFk != @class
    group by a.text
)
,C as
(
    select
        a.text as token,
        count(distinct DocumentFk) as C
    from dbo.Tokens as a
    inner join dbo.DocumentClassTokens as b on a.id != b.TokenFk and b.ClassFk = @class
    group by a.text
)
,D as
(
    select
        a.text as token,
        count(distinct DocumentFk) as D
    from dbo.Tokens as a
    inner join dbo.DocumentClassTokens as b on a.id != b.TokenFk and b.ClassFk != @class
    group by a.text
)
select 
    case when A is null then 0 else A end as A,
    case when B is null then 0 else B end as B,
    case when C is null then 0 else C end as C,
    case when D is null then 0 else D end as D,
    t.Text,
    t.id
from dbo.Tokens as t
left outer join A as a on t.text = a.token
left outer join B as b on t.text = b.token
left outer join C as c on t.text = c.token
left outer join D as d on t.text = d.token
order by t.text

Any feedback would be very much appreciated. Many thanks!

Best wishes,

Christian

PS:

Some test data:

use play;

drop table tokens
create table Tokens 
(
   Id INT not null,
   Text NVARCHAR(255) null,
   primary key (Id)
)

insert into Tokens (id, text) values (1,'1')
insert into Tokens (id, text) values (2,'2')

drop table DocumentClassTokens
create table DocumentClassTokens (
    Id INT not null,
    DocumentFk INT null,
    ClassFk INT null,
    TokenFk INT null,
    primary key (Id)
)

insert into DocumentClassTokens (Id,documentfk,ClassFk,TokenFk) values (1,1,1,1) 
insert into DocumentClassTokens (Id,documentfk,ClassFk,TokenFk) values (2,1,1,2) 
insert into DocumentClassTokens (Id,documentfk,ClassFk,TokenFk) values (3,2,1,1) 
insert into DocumentClassTokens (Id,documentfk,ClassFk,TokenFk) values (4,2,2,1) 
insert into DocumentClassTokens (Id,documentfk,ClassFk,TokenFk) values (5,3,2,1) 
insert into DocumentClassTokens (Id,documentfk,ClassFk,TokenFk) values (6,3,2,3)  
1 Answer

    Editorial Team
    Added an answer on May 19, 2026 at 10:38 pm

    Your question is much clearer now. Unless I have overlooked something, here is a query you can try against your data.

    DECLARE @class int;
    SET @class = 1;
    
    SELECT
      TokenFk,
      TokenClassDocs                        AS A,
      TokenNonClassDocs                     AS B,
      TotalClassDocs    - TokenClassDocs    AS C,
      TotalNonClassDocs - TokenNonClassDocs AS D
    FROM (
      SELECT
        TokenFk,
        COUNT(DISTINCT CASE ClassFk WHEN @class THEN DocumentFk ELSE NULL END) AS TokenClassDocs,
        COUNT(DISTINCT CASE ClassFk WHEN @class THEN NULL ELSE DocumentFk END) AS TokenNonClassDocs
      FROM DocumentClassTokens dct
      GROUP BY dct.TokenFk
    ) AS bytoken
      CROSS JOIN (
        SELECT
          COUNT(DISTINCT CASE ClassFk WHEN @class THEN DocumentFk ELSE NULL END) AS TotalClassDocs,
          COUNT(DISTINCT CASE ClassFk WHEN @class THEN NULL ELSE DocumentFk END) AS TotalNonClassDocs
        FROM DocumentClassTokens
      ) AS totals
    

    Please let us know whether it gives the right results.


    EDIT

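    The above solution is wrong: a document can have rows in several classes, so testing ClassFk row by row counts a document that belongs to both @class and another class as a class document and as a non-class document at the same time. Here is the fixed version, which first works out which documents belong to @class. It seems correct, although I do not like it as much as the wrong version (what an irony…).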

    DECLARE @class int;
    SET @class = 1;
    
    SELECT
      TokenFk,
      TokenClassDocs                        AS A,
      TokenNonClassDocs                     AS B,
      TotalClassDocs    - TokenClassDocs    AS C,
      TotalNonClassDocs - TokenNonClassDocs AS D
    FROM (
      SELECT
        TokenFk,
        COUNT(DISTINCT cls.DocumentFk) AS TokenClassDocs,
        COUNT(DISTINCT CASE WHEN cls.DocumentFk IS NULL THEN dct.DocumentFk END) AS TokenNonClassDocs
      FROM DocumentClassTokens dct
        LEFT JOIN (
          SELECT DISTINCT DocumentFk
          FROM DocumentClassTokens
          WHERE ClassFk = @class
        ) cls ON dct.DocumentFk = cls.DocumentFk
      GROUP BY dct.TokenFk
    ) AS bytoken
      CROSS JOIN (
        SELECT
          COUNT(DISTINCT cls.DocumentFk) AS TotalClassDocs,
          COUNT(DISTINCT CASE WHEN cls.DocumentFk IS NULL THEN dct.DocumentFk END) AS TotalNonClassDocs
        FROM DocumentClassTokens dct
          LEFT JOIN (
            SELECT DISTINCT DocumentFk
            FROM DocumentClassTokens
            WHERE ClassFk = @class
          ) cls ON dct.DocumentFk = cls.DocumentFk
      ) AS totals
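
    If the repeated cls subquery is what makes the fixed version unappealing, it can probably be factored out with a common table expression. The following is only a sketch under that assumption, not a tested replacement: DocFlags and InClass are names invented here for illustration, and it needs SQL Server 2005 or later for the WITH clause.

    ```sql
    DECLARE @class int;
    SET @class = 1;

    -- DocFlags and InClass are hypothetical names, not part of the original schema.
    WITH DocFlags AS (
      -- Every row of DocumentClassTokens, flagged 1 when its document
      -- has at least one row in @class, 0 otherwise.
      SELECT
        dct.DocumentFk,
        dct.TokenFk,
        CASE WHEN cls.DocumentFk IS NULL THEN 0 ELSE 1 END AS InClass
      FROM DocumentClassTokens dct
        LEFT JOIN (
          SELECT DISTINCT DocumentFk
          FROM DocumentClassTokens
          WHERE ClassFk = @class
        ) cls ON dct.DocumentFk = cls.DocumentFk
    )
    SELECT
      bytoken.TokenFk,
      bytoken.TokenClassDocs                               AS A,
      bytoken.TokenNonClassDocs                            AS B,
      totals.TotalClassDocs    - bytoken.TokenClassDocs    AS C,
      totals.TotalNonClassDocs - bytoken.TokenNonClassDocs AS D
    FROM (
      -- Per-token counts of distinct class and non-class documents.
      SELECT
        TokenFk,
        COUNT(DISTINCT CASE WHEN InClass = 1 THEN DocumentFk END) AS TokenClassDocs,
        COUNT(DISTINCT CASE WHEN InClass = 0 THEN DocumentFk END) AS TokenNonClassDocs
      FROM DocFlags
      GROUP BY TokenFk
    ) AS bytoken
      CROSS JOIN (
        -- Overall counts of distinct class and non-class documents.
        SELECT
          COUNT(DISTINCT CASE WHEN InClass = 1 THEN DocumentFk END) AS TotalClassDocs,
          COUNT(DISTINCT CASE WHEN InClass = 0 THEN DocumentFk END) AS TotalNonClassDocs
        FROM DocFlags
      ) AS totals;
    ```

    The logic is the same as in the fixed query: class membership is decided once per document, and both the per-token and the overall counts read from that single definition.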
    

    Note: there is a simple way to check whether the figures are wrong. The sum of A, B, C and D in every row (i.e. for every token) must equal the total number of distinct documents, because every document satisfies exactly one of the four cases being explored. If a row's sum differs from the total document count, some figures in that row are certainly wrong.
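
    The row-sum check can be run as a query too. This is a minimal sketch: #TokenStats is a hypothetical temp table holding one row of A, B, C, D per token, filled here with the figures the fixed query yields for @class = 1 on the test data from the question. In practice it would be filled from whichever query is being checked.

    ```sql
    -- #TokenStats is a hypothetical name used only for this check.
    CREATE TABLE #TokenStats (TokenFk int, A int, B int, C int, D int);

    -- Figures for @class = 1 on the question's test data.
    INSERT INTO #TokenStats (TokenFk, A, B, C, D) VALUES (1, 2, 1, 0, 0);
    INSERT INTO #TokenStats (TokenFk, A, B, C, D) VALUES (2, 1, 0, 1, 1);
    INSERT INTO #TokenStats (TokenFk, A, B, C, D) VALUES (3, 0, 1, 2, 0);

    -- List every token whose row sum differs from the number of distinct documents.
    SELECT s.TokenFk, s.A + s.B + s.C + s.D AS RowSum, t.TotalDocs
    FROM #TokenStats s
      CROSS JOIN (
        SELECT COUNT(DISTINCT DocumentFk) AS TotalDocs
        FROM DocumentClassTokens
      ) AS t
    WHERE s.A + s.B + s.C + s.D <> t.TotalDocs;

    DROP TABLE #TokenStats;
    ```

    An empty result means every row passes. On the same test data the first query gives A = 2, B = 2, C = 0, D = 0 for token 1, a row sum of 4 against 3 distinct documents, which is how the double counting shows up.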
