The Archive Base

Editorial Team
Asked: May 15, 2026


I have this parent-child relationship

Paragraph
---------
ParagraphID   PK
// other attributes ...


Sentence
--------
SentenceID    PK
ParagraphID   FK -> Paragraph.ParagraphID
Text         nvarchar(4000)
Offset       int
Score        int
// other attributes ...

I’d like to find paragraphs that are equivalent, that is, paragraphs that contain the same set of sentences. Two sentences are considered the same if they have the same Text, Offset and Score (SentenceID and ParagraphID are not part of the comparison), and two paragraphs are equivalent if they contain equal sets of sentences.

Could someone show me what a query to find equal paragraphs would look like?

EDIT: There are ca. 150K paragraphs, and 1.5M sentences. The output should include the ParagraphID, and the lowest paragraph ID that is equivalent to this one. E.g. if paragraph1 and paragraph2 are equal, then output would be

ParagraphID  EquivParagraphID
1            1
2            1
1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 10:49 pm

    In short, you need a signature for each paragraph and then compare the signatures. You did not mention the nature of the output itself. Here, I'm returning a row of comma-delimited ParagraphId values for each distinct paragraph signature.

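    With ParagraphSigs As
        (
        -- One SHA1 signature per paragraph, built from its concatenated sentences
        Select P.ParagraphId
            , Hashbytes('SHA1'
                    ,   (
                        Select '|' + S1.Text
                            + '|' + Cast(S1.Offset As varchar(10))
                            + '|' + Cast(S1.Score As varchar(10))
                        From Sentence As S1
                        Where S1.ParagraphId = P.ParagraphId
                        -- Order by the compared columns, not SentenceId, so that
                        -- equal sets of sentences always concatenate identically
                        Order By S1.Offset, S1.Score, S1.Text
                        For Xml Path('')
                        )) As Signature
        From Paragraph As P
        )
    -- One row per distinct signature, listing the ParagraphIds that share it
    Select Stuff(
                (
                Select ', ' + Cast(PS1.ParagraphId As varchar(10))
                From ParagraphSigs As PS1
                Where PS1.Signature = PS.Signature
                For Xml Path('')
                ), 1, 2, '') As Paragraph
    From ParagraphSigs As PS
    Group By PS.Signature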
    

    Given your addition about the desired output, you can change the query like so:

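    With ParagraphSigs As
        (
        -- One SHA1 signature per paragraph, built from its concatenated sentences
        Select P.ParagraphId
            , Hashbytes('SHA1'
                    ,   (
                        Select '|' + S1.Text
                            + '|' + Cast(S1.Offset As varchar(10))
                            + '|' + Cast(S1.Score As varchar(10))
                        From Sentence As S1
                        Where S1.ParagraphId = P.ParagraphId
                        -- Order by the compared columns, not SentenceId, so that
                        -- equal sets of sentences always concatenate identically
                        Order By S1.Offset, S1.Score, S1.Text
                        For Xml Path('')
                        )) As Signature
        From Paragraph As P
        )
    -- Pair each paragraph with every other paragraph that has the same signature;
    -- paragraphs without a match come back with a NULL EquivParagraphId
    Select P1.ParagraphId, P2.ParagraphId As EquivParagraphId
    From ParagraphSigs As P1
        Left Join ParagraphSigs As P2
            On P2.Signature = P1.Signature
                And P2.ParagraphId <> P1.ParagraphId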
    

    Obviously, three or four paragraphs might share the same signature, so be warned that the above query returns every ordered pair of matching paragraphs (e.g. (P1,P2), (P1,P3), (P2,P1), (P2,P3), (P3,P1), (P3,P2)).
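
    If you want the output from the question's edit instead (each ParagraphID next to the lowest equivalent ParagraphID), one possible sketch is to take the minimum ParagraphId per signature with a windowed Min rather than a self-join. This assumes the Paragraph and Sentence tables from the question and SQL Server 2005 or later. Treat it as an untested starting point, not a tuned solution.

```sql
-- Sketch: map every paragraph to the lowest ParagraphId sharing its signature.
-- Assumes the Paragraph and Sentence tables described in the question.
With ParagraphSigs As
    (
    Select P.ParagraphId
        , Hashbytes('SHA1'
                ,   (
                    -- varchar(11) fits any int value, including a minus sign
                    Select '|' + S1.Text
                        + '|' + Cast(S1.Offset As varchar(11))
                        + '|' + Cast(S1.Score As varchar(11))
                    From Sentence As S1
                    Where S1.ParagraphId = P.ParagraphId
                    -- Order by content so equal sets concatenate identically
                    Order By S1.Offset, S1.Score, S1.Text
                    For Xml Path('')
                    )) As Signature
    From Paragraph As P
    )
Select PS.ParagraphId
    -- Smallest ParagraphId within the group of identical signatures
    , Min(PS.ParagraphId) Over (Partition By PS.Signature) As EquivParagraphId
From ParagraphSigs As PS
Order By PS.ParagraphId;
```

    A paragraph with no equivalent simply maps to itself. Two caveats: equal SHA1 values make equality extremely likely but do not strictly prove it, and on SQL Server 2014 and earlier Hashbytes accepts at most 8000 bytes of input, so very long paragraphs may need a different kind of signature.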

    In the comments you asked about comparing on the sentence text last, for efficiency. Since you have two other columns to compare, you could reduce the number of signatures generated by comparing on the two int columns first:

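    With ParagraphsNeedingSigs As
        (
        -- Only paragraphs that share Offset and Score with some other paragraph.
        -- Assumes Paragraph exposes Offset and Score columns; in the schema shown
        -- in the question they live on Sentence and would need aggregating first.
        Select P1.ParagraphId
        From Paragraph As P1
        Where Exists    (
                        Select 1
                        From Paragraph As P2
                        Where P2.ParagraphId <> P1.ParagraphId
                            And P2.Offset = P1.Offset
                            And P2.Score = P1.Score
                        )
        )
        , ParagraphSigs As
        (
        -- Signatures are computed only for the pre-filtered paragraphs
        Select P.ParagraphId
            , Hashbytes('SHA1'
                    ,   (
                        Select '|' + S1.Text
                            + '|' + Cast(S1.Offset As varchar(10))
                            + '|' + Cast(S1.Score As varchar(10))
                        From Sentence As S1
                        Where S1.ParagraphId = P.ParagraphId
                        -- Order by the compared columns, not SentenceId, so that
                        -- equal sets of sentences always concatenate identically
                        Order By S1.Offset, S1.Score, S1.Text
                        For Xml Path('')
                        )) As Signature
        From ParagraphsNeedingSigs As P
        )
    Select P.ParagraphId, P2.ParagraphId As EquivParagraphId
    From Paragraph As P
        Left Join ParagraphSigs As P1
            On P1.ParagraphId = P.ParagraphId
        Left Join ParagraphSigs As P2
            On P2.Signature = P1.Signature
                And P2.ParagraphId <> P1.ParagraphId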
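
    The filter above reads Offset and Score from Paragraph, while in the question's schema those columns belong to Sentence. If that is your situation, one way to apply the same idea is to pre-aggregate the int columns per paragraph and only build signatures for paragraphs whose aggregates collide with another paragraph's. A rough, untested sketch follows; ParagraphStats, SentenceCount, OffsetSum and ScoreSum are names I made up for illustration, and it assumes Offset and Score are never NULL.

```sql
-- Sketch: cheap pre-filter using only the int columns of Sentence.
-- ParagraphStats and its column names are hypothetical, not from the question.
With ParagraphStats As
    (
    -- Per-paragraph fingerprint: sentence count plus sums of the int columns
    Select S.ParagraphId
        , Count(*) As SentenceCount
        -- Cast to bigint so the sums cannot overflow an int
        , Sum(Cast(S.Offset As bigint)) As OffsetSum
        , Sum(Cast(S.Score As bigint)) As ScoreSum
    From Sentence As S
    Group By S.ParagraphId
    )
    , ParagraphsNeedingSigs As
    (
    -- Keep paragraphs whose fingerprint matches at least one other paragraph
    Select PS1.ParagraphId
    From ParagraphStats As PS1
    Where Exists    (
                    Select 1
                    From ParagraphStats As PS2
                    Where PS2.ParagraphId <> PS1.ParagraphId
                        And PS2.SentenceCount = PS1.SentenceCount
                        And PS2.OffsetSum = PS1.OffsetSum
                        And PS2.ScoreSum = PS1.ScoreSum
                    )
    )
-- Candidates that still need a full signature comparison
Select ParagraphId
From ParagraphsNeedingSigs;
```

    Equal sets of sentences necessarily have equal counts and sums, so this filter should never discard a true match; it only removes paragraphs that cannot be equivalent to any other. The two CTEs could stand in for ParagraphsNeedingSigs in the query above, leaving the signature step unchanged.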
    