
The Archive Base


Editorial Team
Asked: May 14, 2026


Is it a good practice to use sigmoid or tanh output layers in Neural networks directly to estimate probabilities?

i.e., the probability of a given input occurring is the output of the sigmoid function in the NN.

EDIT
I wanted to use a neural network to learn and predict the probability of a given input occurring.
You may consider the input as a State1-Action-State2 tuple.
Hence the output of the NN is the probability that State2 happens when applying Action to State1.

I hope that clears things up.

EDIT
When training the NN, I perform a random Action on State1 and observe the resultant State2; I then teach the NN that the input State1-Action-State2 should result in an output of 1.0.


1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 9:00 pm

    First, just a couple of small points on the conventional MLP lexicon (might help for internet searches, etc.): ‘sigmoid’ and ‘tanh’ are not ‘output layers’ but functions, usually referred to as “activation functions”. The return value of the activation function is indeed the output from each layer, but they are not the output layer themselves (nor do they calculate probabilities).

    Additionally, your question presents a choice between two “alternatives” (sigmoid and tanh), but they are not really alternatives: ‘sigmoidal function’ is a generic, informal term for a class of functions, and that class includes the hyperbolic tangent (‘tanh’) you refer to.

    The term ‘sigmoidal’ is probably due to the characteristic S shape of the function: the return (y) values are constrained between two asymptotic values regardless of the x value. The function output is usually normalized so that these two values are -1 and 1 (or 0 and 1). (This output behavior, by the way, is loosely inspired by the biological neuron, which either fires (+1) or doesn’t (-1).) Looking at the key properties of sigmoidal functions, you can see why they are well suited as activation functions in feed-forward neural networks trained with backpropagation: (i) they are real-valued and differentiable, (ii) they have exactly one inflection point, and (iii) they have a pair of horizontal asymptotes.
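
As a quick numerical illustration of these properties, the sketch below evaluates the logistic function and the hyperbolic tangent over a range of inputs and checks that each stays between its asymptotes. It also checks the identity tanh(x) = 2 * logistic(2x) - 1, which shows that the two curves are rescaled versions of each other. The helper name `logistic` and the sample range are my own choices, not something from the question.

```python
import math

def logistic(x):
    # Logistic function: horizontal asymptotes at 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

xs = [i / 10.0 for i in range(-100, 101)]  # sample points from -10.0 to 10.0

# Both functions stay strictly between their asymptotes on this range.
logistic_in_range = all(0.0 < logistic(x) < 1.0 for x in xs)
tanh_in_range = all(-1.0 < math.tanh(x) < 1.0 for x in xs)

# tanh is a shifted and rescaled logistic: tanh(x) = 2 * logistic(2x) - 1.
max_gap = max(abs(math.tanh(x) - (2.0 * logistic(2.0 * x) - 1.0)) for x in xs)

print(logistic_in_range, tanh_in_range, max_gap < 1e-12)
```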

    In turn, the sigmoidal function is one category of functions used as the activation function (aka “squashing function”) in FF neural networks trained using backprop. During training or prediction, the weighted sum of the inputs (for a given layer, one layer at a time) is passed in as an argument to the activation function, which returns the output for that layer. Another group of functions used as activation functions is the piecewise linear functions (PLFs). The step function is the binary variant of a PLF:

    def step_fn(x):
      # Binary step: 0 for non-positive input, 1 otherwise.
      if x <= 0:
        return 0
      return 1
    

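    (On practical grounds, I doubt the step function is a plausible choice for the activation function, since its derivative is zero everywhere it is defined and backpropagation needs a usable derivative, but perhaps it helps in understanding the purpose of the activation function in NN operation.)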
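
To make the role of the activation function concrete, here is a minimal sketch of one layer's computation as described above: the weighted sum of the inputs is passed to the activation function, which returns the layer's output. The names `layer_output`, `weights`, and `biases` are hypothetical, and the bias term is an assumption of mine that the text above does not mention.

```python
import math

def layer_output(inputs, weights, biases, activation):
    # One output per neuron: activation(weighted sum of inputs + bias).
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(weighted_sum))
    return outputs

def step(x):
    # Binary step activation: 0 for non-positive input, 1 otherwise.
    return 0 if x <= 0 else 1

inputs = [0.5, -1.0]
weights = [[1.0, 2.0], [-1.0, 0.5]]  # two neurons, two inputs each
biases = [0.0, 2.0]                  # assumed bias terms (illustrative)

step_out = layer_output(inputs, weights, biases, step)
tanh_out = layer_output(inputs, weights, biases, math.tanh)
print(step_out)
print(tanh_out)
```

Swapping `step` for `math.tanh` changes only the last stage of the computation; the weighted sums are identical in both calls.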

    I suppose there are an unlimited number of possible activation functions, but in practice you only see a handful; in fact, just two account for the overwhelming majority of cases (both are sigmoidal). Here they are (in Python) so you can experiment for yourself, given that the primary selection criterion is a practical one:

    import math

    # logistic function: maps any real x into (0, 1)
    def sigmoid2(x):
      return 1 / (1 + math.exp(-x))

    # hyperbolic tangent: maps any real x into (-1, 1)
    def sigmoid1(x):
      return math.tanh(x)
    

    What are the factors to consider in selecting an activation function?

    First, the function has to give the desired behavior (arising from, or as evidenced by, the sigmoidal shape). Second, the function must be differentiable. This is a requirement for backpropagation, which is the technique used during training to work out how the weights feeding the hidden layers should be adjusted.

    For instance, the derivative of the hyperbolic tangent is (in terms of the output, which is how it is usually written):

    def dsigmoid(y) :
      return 1.0 - y**2
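
A quick way to convince yourself that this output form is right is to compare it with a finite-difference estimate of the slope. The sketch below does that for tanh and, for comparison, for the logistic function, whose derivative in terms of its output y is y * (1 - y). The function names and the step size `h` are my own choices.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def dtanh_from_output(y):
    # Derivative of tanh written in terms of its output y = tanh(x).
    return 1.0 - y ** 2

def dlogistic_from_output(y):
    # Derivative of the logistic function in terms of its output y.
    return y * (1.0 - y)

def numeric_slope(f, x, h=1e-6):
    # Central finite-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 0.7):
    tanh_gap = abs(dtanh_from_output(math.tanh(x)) - numeric_slope(math.tanh, x))
    logistic_gap = abs(dlogistic_from_output(logistic(x)) - numeric_slope(logistic, x))
    print(x, tanh_gap < 1e-6, logistic_gap < 1e-6)
```

Writing the derivative in terms of the output is convenient because the output has already been computed during the forward pass.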
    

    Beyond those two requirements, what makes one function better than another is how efficiently it trains the network, i.e., which one causes convergence (reaching a local minimum of the error) in the fewest epochs.

    #——– Edit (see OP’s comment below) ———#

    I am not quite sure I understood (sometimes it’s difficult to communicate the details of an NN without the code), so I should probably just say that it’s fine, subject to this proviso: what you want the NN to predict must be the same as the dependent variable used during training. So, for instance, if you train your NN using two states (e.g., 0, 1) as the single dependent variable (which is obviously missing from your testing/production data), then that’s what your NN will return when run in “prediction mode” (post-training, or with a competent weight matrix).
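
To illustrate that proviso in the setting of the question, here is a deliberately tiny sketch: a single logistic output unit with only a bias and no inputs, trained by gradient descent on cross-entropy loss against 0/1 targets. Its output settles at the fraction of 1s among the training targets, which is the sense in which a sigmoid output can be read as a probability. The loss, learning rate, epoch count, and data are all assumptions of mine, not something the question specifies. The second run suggests that training only on targets of 1.0 would simply push the output toward 1.0.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bias(targets, learning_rate=0.5, epochs=2000):
    # Single logistic unit with one bias parameter and no inputs.
    bias = 0.0
    for _ in range(epochs):
        output = logistic(bias)
        # Gradient of the mean cross-entropy loss with respect to the bias.
        gradient = sum(output - t for t in targets) / len(targets)
        bias -= learning_rate * gradient
    return logistic(bias)

# Hypothetical data: the outcome was observed 7 times out of 10.
mixed_targets = [1.0] * 7 + [0.0] * 3
p_mixed = train_bias(mixed_targets)

# Hypothetical data: every example labeled 1.0.
only_ones = [1.0] * 10
p_ones = train_bias(only_ones)

print(round(p_mixed, 3), round(p_ones, 3))
```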



© 2021 The Archive Base. All Rights Reserved