Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6907295
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 27, 20262026-05-27T08:25:01+00:00 2026-05-27T08:25:01+00:00

[Frontmatter] (skip this if you just want the question) : I’m currently looking at

  • 0

[Frontmatter] (skip this if you just want the question):

I’m currently looking at using Shannon-Weaver Mutual Information and normalized redundancy to measure the degree of information masking between bags of discrete and continuous feature values, organized by feature. Using this method, it is my goal to construct an algorithm that looks very similar to ID3, but instead of using Shannon entropy, the algorithm will seek (as a loop constraint) to maximize or minimize shared information between a single feature and a collection of features based on the complete input feature space, adding new features to the latter collection if (and only if) they increase or decrease mutual information, respectively. This, in effect, moves ID3’s decision algorithm into pairspace, stapling an ensemble approach to it with all of the expected time and space complexities of both methods.

[/Frontmatter]

On to the question: I’m trying to get a continuous integrator working in Python using SciPy. Because I’m working with comparisons of discrete and continuous variables, my current strategy for each comparison for feature-feature pairs is as follows:

  • Discrete feature versus discrete feature: use the discrete form of mutual information. This results in a double summation of the probabilities, which my code handles without issue.

  • All other cases (discrete versus continuous, the inverse, and continuous versus continuous): use the continuous form, using a Gaussian estimator to smooth out the probability density functions.

It is possible for me to perform some kind of discretization for the latter cases, but because my input data sets are not inherently linear, this is potentially needlessly complex.

Here’s the salient code:

import math
import numpy
import scipy
from scipy.stats import gaussian_kde
from scipy.integrate import dblquad

# Constants
MIN_DOUBLE = 4.9406564584124654e-324 
                    # The minimum size of a Float64; used here to prevent the
                    #  logarithmic function from hitting its undefined region
                    #  at its asymptote of 0.
INF = float('inf')  # The floating-point representation for "infinity"

# x and y are previously defined as collections of 
# floating point values with the same length

# Kernel estimation
gkde_x = gaussian_kde(x)
gkde_y = gaussian_kde(y)

if len(binned_x) != len(binned_y) and len(binned_x) != len(x):
    x.append(x[0])
    y.append(y[0])

gkde_xy = gaussian_kde([x,y])
mutual_info = lambda a,b: gkde_xy([a,b]) * \
           math.log((gkde_xy([a,b]) / (gkde_x(a) * gkde_y(b))) + MIN_DOUBLE)

# Compute MI(X,Y)
(minfo_xy, err_xy) = \
    dblquad(mutual_info, -INF, INF, lambda a: 0, lambda a: INF)

print 'minfo_xy = ', minfo_xy

Note that overcounting exactly one point is done deliberately to prevent a singularity in SciPy’s gaussian_kde class. As the size of x and y mutually approach infinity, this effect becomes negligible.

My current snag is in trying to get multiple integration working against a Gaussian kernel density estimate in SciPy. I’ve been trying to use SciPy’s dblquad to perform the integration, but in the latter case, I receive an astounding spew of the following messages.

When I set numpy.seterr ( all='ignore' ):

Warning: The ocurrence of roundoff error is detected, which prevents
the requested tolerance from being achieved. The error may be
underestimated.

And when I set it to 'call' using an error handler:

Floating point error (underflow), with flag 4

Floating point error (invalid value), with flag 8

Pretty easy to figure out what’s going on, right? Well, almost: IEEE 754-2008 and SciPy only tell me what’s going on here, not why or how to work around it.

The upshot: minfo_xy generally resolves to nan; its sampling is insufficient to prevent information from becoming lost or invalid when performing Float64 math.

Is there a general workaround for this problem when using SciPy?

Even better: if there is a robust, canned implementation of continuous mutual information for Python with an interface that takes two collections of floating point values or a merged collection of pairs, it would resolve this complete problem. Please link it if you know of one that exists.

Thank you in advance.


Edit: this resolves the nan propagation issue in the example above:

mutual_info = lambda a,b: gkde_xy([a,b]) * \
    math.log((gkde_xy([a,b]) / ((gkde_x(a) * gkde_y(b)) + MIN_DOUBLE)) \
        + MIN_DOUBLE)

However, the question of roundoff correction remains, as does the request for a more robust implementation. Any help in either domain would be greatly appreciated.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-27T08:25:02+00:00Added an answer on May 27, 2026 at 8:25 am

    Before trying more radical solutions like reframing the problem or using different integration tools, see if this helps. Replace INF=float('INF') with INF=1E12 or some other large number — that may eliminate NaN results created by simple arithmetic operations on the input variables.

    No promises on this one, but it is sometimes helpful to try a quick fix before engaging in a significant algorithmic rewrite or substitution of alternate tools.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I've been looking for the answer to this question for a while now but
I'm using Jekyll for my blog, and I'd like the ability to use unique
I am using gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf -f front-matter.pdf fulltext-0.pdf fulltext-1.pdf back-matter.pdf to
I'm having trouble crafting a regex to match YAML Front Matter This is the
I am using the standard jekyll installation to maintain a blog, everything is going
I'm using Jekyll for one of my projects and it really seems a very
I have a latex document. I am using hyperref, makeidx and glossary packages for
I'm creating a site using jekyll.rb . I have a page called about.html: <div
I'm writing a site in Jekyll, which uses Liquid. I have front matter for
I'm not the biggest fan of WordPress, but I think the one thing that

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.