Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7523717
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 30, 20262026-05-30T02:53:03+00:00 2026-05-30T02:53:03+00:00

I am trying to come up with a very fast algorithm for calculating this

  • 0

I am trying to come up with a very fast algorithm for calculating this very interesting statistic that takes full advantage of the capabilities of a powerful GPU. Ideally I will do this in Matlab using Jacket, but other ideas in CUDA code or OpenCL would also be much appreciated. Basically, I want to crowd-source lots of clever ideas that I can try to put together, and then I will attempt to open-source the result so others can use it.

Despite the power of this dependency coefficient (it is capable of detecting even “one-to-many” dependency relationships), there is almost nothing about it online, except for two sources: SAS statistical software, and the excellent R package Hmisc by Frank Harrell. You can read a description of the algorithm here:

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_corr_sect016.htm

And here is Harrell’s code in Fortran (which is surprisingly easy to follow if you understand the calculation already):

http://hmisc.sourcearchive.com/documentation/3.7-0/hoeffd_8f-source.html

(also, page 128 of the pdf documentation for Hmisc has some more details.)

This is a very computationally demanding algorithm– if you wanted to apply it to a data set that is thousands of rows and a few thousand columns, even with the fast Fortran implementation, you would be waiting many days for the result– even on a new machine. My hope is that using an Nvidia GTX 580 level card, or better yet, a Tesla, would bring that down to a couple hours. In which case, the combination would be an analytical force to be reckoned with, whether identifying genes or finding correlations in experimental data.

Anyway, I look forward to peoples’ responses and hope we can make a fast, GPU based algorithm for Hoeffding’s D a reality.

Thanks in advance for any ideas– and please don’t hesitate to give partial or half-baked ideas!

Update: So, Jascha has generously provided a working implementation of Hoeffding’s D in Matlab, which fulfilled one of my objectives. The other was to radically enhance the speed by using a GPU, preferably using Jacket. Does anyone see a path or strategy to do this in a clever way on the GPU?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-30T02:53:04+00:00Added an answer on May 30, 2026 at 2:53 am

    Below is a code example with a simple implementation of the Hoeffding’s D measure of dependency in MATLAB. This is NOT GPU-ized, but may prove useful to people who want to calculate this statistic and are not using Fortran, or as a starting point for putting it on the GPU. (The extended header illustrates Hoeffding’s D values for several types of joint distributions.)

    function [ D ] = hoeffdingsD( x, y )
    %Compute's Hoeffding's D measure of dependence between x and y
    %   inputs x and y are both N x 1 arrays
    %   output D is a scalar
    %     The formula for Hoeffding's D is taken from
    %     http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_corr_sect016.htm
    % Below is demonstration code for several types of dependencies.
    %
    % % this case should be 0 - there are no dependencies
    % x = randn(1000,1);
    % y = randn(1000,1);
    % D = hoeffdingsD( x, y );
    % desc = 'x = N( 0, 1 ), y and x independent';
    % desc = sprintf( '%s, Hoeffding''s D = %f', desc, D );
    % fprintf( '%s\n', desc );
    % figure; plot(x,y,'.'); title( desc );
    % 
    % % the rest of these cases have dependencies of different types
    % x = randn(1000,1);
    % y = x;
    % D = hoeffdingsD( x, y );
    % desc = 'x = N( 0, 1 ), y = x';
    % desc = sprintf( '%s, Hoeffding''s D = %f', desc, D );
    % fprintf( '%s\n', desc );
    % figure; plot(x,y,'.'); title( desc );
    % 
    % x = randn(1000,1);
    % y = cos(10*x);
    % D = hoeffdingsD( x, y );
    % desc = 'x = N( 0, 1 ), y = cos(10x)';
    % desc = sprintf( '%s, Hoeffding''s D = %f', desc, D );
    % fprintf( '%s\n', desc );
    % figure; plot(x,y,'.'); title( desc );
    % 
    % x = randn(1000,1);
    % y = x.^2;
    % D = hoeffdingsD( x, y );
    % desc = 'x = N( 0, 1 ), y = x^2';
    % desc = sprintf( '%s, Hoeffding''s D = %f', desc, D );
    % fprintf( '%s\n', desc );
    % figure; plot(x,y,'.'); title( desc );
    % 
    % x = randn(1000,1);
    % y = x.^2 + randn(1000,1);
    % D = hoeffdingsD( x, y );
    % desc = 'x = N( 0, 1 ), y = x^2 + N( 0, 1 )';
    % desc = sprintf( '%s, Hoeffding''s D = %f', desc, D );
    % fprintf( '%s\n', desc );
    % figure; plot(x,y,'.'); title( desc );
    % 
    % x = rand(1000,1);
    % y = rand(1000,1);
    % z = sign(randn(1000,1));
    % x = z.*x; y = z.*y;
    % D = hoeffdingsD( x, y );
    % desc = 'x = z U( [0, 1) ), y = z U( [0, 1) ), z = U( {-1,1} )';
    % desc = sprintf( '%s, Hoeffding''s D = %f', desc, D );
    % fprintf( '%s\n', desc );
    % figure; plot(x,y,'.'); title( desc );
    % 
    % x = rand(1000,1);
    % y = rand(1000,1);
    % z = sign(randn(1000,1));
    % x = z.*x; y = -z.*y;
    % D = hoeffdingsD( x, y );
    % desc = 'x = z U( [0, 1) ), y = -z U( [0, 1) ), z = U( {-1,1} )';
    % desc = sprintf( '%s, Hoeffding''s D = %f', desc, D );
    % fprintf( '%s\n', desc );
    % figure; plot(x,y,'.'); title( desc );
    
    N = size(x,1);
    
    R = tiedrank( x );
    S = tiedrank( y );
    
    Q = zeros(N,1);
    parfor i = 1:N
        Q(i) = 1 + sum( R < R(i) & S < S(i) );
        % and deal with cases where one or both values are ties, which contribute less
        Q(i) = Q(i) + 1/4 * (sum( R == R(i) & S == S(i) ) - 1); % both indices tie.  -1 because we know point i matches
        Q(i) = Q(i) + 1/2 * sum( R == R(i) & S < S(i) ); % one index ties.
        Q(i) = Q(i) + 1/2 * sum( R < R(i) & S == S(i) ); % one index ties.
    end
    
    D1 = sum( (Q-1).*(Q-2) );
    D2 = sum( (R-1).*(R-2).*(S-1).*(S-2) );
    D3 = sum( (R-2).*(S-2).*(Q-1) );
    
    D = 30*((N-2)*(N-3)*D1 + D2 - 2*(N-2)*D3) / (N*(N-1)*(N-2)*(N-3)*(N-4));
    
    end
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

This might come across as a very silly question. I am trying to place
I'm trying to come up with a Java regex that will match a filename
I am trying to come up with such a solution that the user is
i'm trying to come up with a very simple way to store a boolean
Currently trying to come up with a best practice for a ZF site that
I am trying to come up with an algorithm to asymmetrically match a first
I have come very close to finding a solution to this one; just missing
I have come across Adobe MXML that looks very similar to Microsoft's XAML. Who
I have come across a very odd problem in C that I have never
I'm very new to MySQL and I'm trying to come up with a query

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.