The Archive Base

Editorial Team
Asked: May 26, 2026 (2026-05-26T09:43:38+00:00)

Let’s say I have a text transcript of a dialogue over a period of

Let’s say I have a text transcript of a dialogue spanning approx. 1 hour. I want to know which words occur in close proximity to one another. What type of statistical technique would I use to determine which words are clustered together and how close they are to one another?

I suspect some sort of cluster analysis or PCA.


1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 9:43 am

    To determine word proximity, you will have to build a graph:

    1. each distinct word is a vertex (or “node”), and
    2. each word is joined by an edge to the word on its left and the word on its right

    So “I like dogs” would have 2 edges and 3 vertices.
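A minimal Python sketch of this graph, assuming simple lowercase word tokens and ignoring sentence boundaries. The names `tokenize` and `build_adjacency_graph` are illustrative and do not come from any library. Edges are stored undirected, with a count, so that repeated adjacencies can later serve as a measure of closeness.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def build_adjacency_graph(text):
    """Return (vertex_counts, edge_counts) for the word-adjacency graph.

    vertex_counts maps each word to how often it occurs.
    edge_counts maps each undirected pair of neighboring words, stored as
    an alphabetically sorted tuple, to how often the two words are adjacent.
    """
    words = tokenize(text)
    vertex_counts = Counter(words)
    edge_counts = Counter()
    for left, right in zip(words, words[1:]):
        if left != right:  # assumption: a word repeated twice is not an edge
            edge_counts[tuple(sorted((left, right)))] += 1
    return vertex_counts, edge_counts


vertices, edges = build_adjacency_graph("I like dogs")
print(len(vertices), "vertices,", len(edges), "edges")
```

For a one-hour transcript you would pass the full text instead of a single sentence; widening the notion of neighbor beyond the immediate left and right word is a design choice this sketch does not make.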

    Now, the next step will be to decide based on this model what your definition of “close” is.

    This is where the statistics comes in.

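    To determine “groups” of correlated words, you have several options:

    1. MCL (Markov Cluster) clustering – this partitions the graph into clusters of words that have high odds of being seen together. You do not fix the number of clusters in advance.

    2. k-means clustering – this will give you exactly “k” groups of words, where you choose k up front. Each word first has to be represented as a numeric vector, for example its counts of edges to every other word.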

    3. Thresholding – this is the most reliable and intuitive method. Take a small subset of data that you understand (for example, a paragraph from a news clip or article you have read), run your method on it to generate a graph, and visualize the graph using a tool such as Graphviz or Cytoscape. Once you can see the relatedness, count how many edges are generally found between words that clearly cluster together. You might find, for example, that two words that cluster together share one edge for every 5 instances of the word. Use this as a cutoff and write your own graph analysis script that outputs the word pairs with at least 1 edge for every 5 instances of the word in your graph.

      1. Evaluating method 3 with ROC curves. You can raise the cutoff step by step until very few “clusters” remain. If you run your algorithm at each cutoff against a paragraph with known, expected results (created by a human who already knows which words should be reported as correlated), you can evaluate its accuracy with a receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate of the correlated-words output, measured against that precalculated gold standard.
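The thresholding method and its evaluation could be sketched roughly as follows, in plain Python. Several things here are assumptions rather than part of the answer: the function names are made up for illustration, “instances of the word” is read as the count of the more frequent word in the pair, and only pairs that are adjacent at least once are treated as candidates. The gold standard is assumed to be a set of alphabetically sorted word pairs prepared by hand.

```python
import re
from collections import Counter


def cooccurrence_counts(text):
    """Count words and undirected adjacent-word pairs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    vertex_counts = Counter(words)
    edge_counts = Counter(
        tuple(sorted(pair)) for pair in zip(words, words[1:]) if pair[0] != pair[1]
    )
    return vertex_counts, edge_counts


def edge_ratio(pair, vertex_counts, edge_counts):
    """Edges between the two words per instance of the more frequent word.

    Assumption: normalizing by the more frequent word is one possible
    reading of "1 edge for every 5 instances of the word".
    """
    a, b = pair
    return edge_counts[pair] / max(vertex_counts[a], vertex_counts[b])


def related_pairs(vertex_counts, edge_counts, cutoff=0.2):
    """Return the pairs whose edge ratio reaches the cutoff (0.2 = 1 in 5)."""
    return {
        pair
        for pair in edge_counts
        if edge_ratio(pair, vertex_counts, edge_counts) >= cutoff
    }


def roc_points(vertex_counts, edge_counts, gold_pairs, cutoffs):
    """Return (cutoff, false_positive_rate, true_positive_rate) per cutoff.

    Only observed pairs are scored, so gold pairs that are never adjacent
    in the text are left out of the true positive rate.
    """
    candidates = set(edge_counts)
    positives = candidates & gold_pairs
    negatives = candidates - gold_pairs
    points = []
    for cutoff in cutoffs:
        predicted = related_pairs(vertex_counts, edge_counts, cutoff)
        tpr = len(predicted & positives) / len(positives) if positives else 0.0
        fpr = len(predicted & negatives) / len(negatives) if negatives else 0.0
        points.append((cutoff, fpr, tpr))
    return points


counts, pairs = cooccurrence_counts("a b a b a c")
gold = {("a", "b")}  # hypothetical hand-made gold standard
for cutoff, fpr, tpr in roc_points(counts, pairs, gold, [0.2, 0.5, 2.0]):
    print(f"cutoff={cutoff}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Plotting the false positive rate against the true positive rate over a finer grid of cutoffs gives the ROC curve, and the cutoff nearest the top-left corner is a reasonable starting point.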
© 2021 The Archive Base. All Rights Reserved