The Archive Base Latest Questions

Editorial Team
Asked: May 31, 2026 (2026-05-31T13:52:41+00:00)

I am reading this document, and they stated that the weight adjustment formula is this:

new weight = old weight + learning rate * delta * df(e)/de * input

The df(e)/de part is the derivative of the activation function, which is usually a sigmoid function like tanh.

  • What is this actually for?
  • Why are we even multiplying with that?
  • Why isn’t just learning rate * delta * input enough?

This question came after this one and is closely related to it: Why must a nonlinear activation function be used in a backpropagation neural network?


1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 1:52 pm (2026-05-31T13:52:42+00:00)

    Training a neural network means finding values for every entry in the weight matrices (there are two for a NN with one hidden layer) such that the sum of squared differences between the observed and predicted data is minimized. In practice, the individual weights in the two weight matrices are adjusted at each iteration (their initial values are often random). Adjusting the weights after every training pattern is called online training, as opposed to batch training, where the weights are adjusted once after a full pass over the training patterns.
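The difference between online and batch updates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a hypothetical single tanh neuron with one weight, a made-up three-pattern data set generated with a true weight of 0.8, and function names (`train_online`, `train_batch`) that are not from the original answer.

```python
import math

def predict(w, x):
    # Single tanh neuron with one weight (illustrative assumption).
    return math.tanh(w * x)

def grad(w, x, t):
    # Derivative of 0.5 * (t - y)^2 with respect to w, where y = tanh(w * x).
    y = predict(w, x)
    return -(t - y) * (1.0 - y * y) * x

def train_online(w, data, lr, epochs):
    for _ in range(epochs):
        for x, t in data:
            w -= lr * grad(w, x, t)  # adjust after every pattern
    return w

def train_batch(w, data, lr, epochs):
    for _ in range(epochs):
        total = sum(grad(w, x, t) for x, t in data)
        w -= lr * total  # adjust once per full pass over the data
    return w

# Made-up training patterns (input, target) produced by a true weight of 0.8.
data = [(0.5, math.tanh(0.4)), (1.0, math.tanh(0.8)), (-1.5, math.tanh(-1.2))]

w_online = train_online(0.0, data, lr=0.1, epochs=500)
w_batch = train_batch(0.0, data, lr=0.1, epochs=500)
```

On this toy problem both variants should settle close to the same weight; they differ only in when the accumulated gradient is applied.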

    But how should the weights be adjusted–i.e., which direction +/-? And by how much?

    That’s where the derivative comes in. A large value for the derivative results in a large adjustment to the corresponding weight. This makes sense because a large derivative means the error surface is steep at that point, which usually means you are still some distance from a minimum. Put another way, weights are adjusted at each iteration in the direction of steepest descent (opposite to the gradient) on the cost function’s surface defined by the total error (observed versus predicted).
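To make the role of the df(e)/de factor from the question concrete, here is a small sketch assuming tanh as the activation function. The helper names (`tanh_prime`, `weight_update`) and the numeric values are illustrative assumptions, not part of the original document.

```python
import math

def tanh_prime(e):
    # Derivative of tanh(e) with respect to e.
    return 1.0 - math.tanh(e) ** 2

def weight_update(lr, delta, e, x):
    # learning rate * delta * df(e)/de * input, as in the question's formula.
    return lr * delta * tanh_prime(e) * x

# Same error signal and input, but different net input e to the unit.
near_zero = weight_update(lr=0.1, delta=0.5, e=0.0, x=1.0)  # steep part of tanh
saturated = weight_update(lr=0.1, delta=0.5, e=3.0, x=1.0)  # flat part of tanh
```

Where tanh is steep (e near 0), a change in the weight changes the output noticeably, so the adjustment is large. Where tanh is saturated, changing the weight barely moves the output, and the derivative factor scales the adjustment down accordingly.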

    After the error on each pattern is computed (subtracting the actual value of the response variable or output vector from the value predicted by the NN during that iteration), each weight in the weight matrices is adjusted in proportion to the calculated error gradient.
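The error gradient mentioned above is where the df(e)/de factor in the question's formula comes from. The following derivation is a sketch for a single output unit, assuming a squared-error loss; the symbols t (target), y (output), η (learning rate), and E (error) are introduced here for illustration. For hidden units, δ would instead be the error signal propagated back from the following layer.

```latex
\begin{align*}
e &= \sum_i w_i x_i, \qquad y = f(e), \qquad E = \tfrac{1}{2}(t - y)^2 \\
\frac{\partial E}{\partial w_i}
  &= \frac{\partial E}{\partial y}\,\frac{dy}{de}\,\frac{\partial e}{\partial w_i}
   = -(t - y)\,\frac{df(e)}{de}\,x_i \\
w_i^{\text{new}} &= w_i^{\text{old}} - \eta\,\frac{\partial E}{\partial w_i}
   = w_i^{\text{old}} + \eta\,\delta\,\frac{df(e)}{de}\,x_i, \qquad \delta = t - y
\end{align*}
```

Under these assumptions, the chain rule produces the activation derivative automatically. Dropping it and using only learning rate * delta * input would follow the gradient of a network with a linear (identity) activation rather than the network actually being trained.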

    Because the error calculation begins at the end of the NN (i.e., at the output layer, by subtracting observed from predicted) and proceeds to the front, it is called backpropagation (backprop).
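A minimal sketch of that backward order, assuming a hypothetical network with one input, one tanh hidden unit, and one tanh output unit (the weights `w1`, `w2` and the sample values are made up). The analytic gradients are compared against finite differences as a sanity check.

```python
import math

def forward(w1, w2, x):
    h = math.tanh(w1 * x)  # hidden activation
    y = math.tanh(w2 * h)  # output activation
    return h, y

def loss(w1, w2, x, t):
    _, y = forward(w1, w2, x)
    return 0.5 * (t - y) ** 2

def backprop(w1, w2, x, t):
    h, y = forward(w1, w2, x)
    # Start at the output: error times the derivative of tanh at the output.
    d_out = -(t - y) * (1.0 - y * y)
    g_w2 = d_out * h
    # Then pass the signal back through w2 to the hidden unit.
    d_hid = d_out * w2 * (1.0 - h * h)
    g_w1 = d_hid * x
    return g_w1, g_w2

def numeric_grad(w1, w2, x, t, eps=1e-6):
    # Central finite differences, used only to check the analytic result.
    g1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
    return g1, g2

analytic = backprop(0.3, -0.7, 1.2, 0.5)
numeric = numeric_grad(0.3, -0.7, 1.2, 0.5)
```

Note that the hidden-layer gradient reuses `d_out`, which is why the computation has to run from the output layer toward the input.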


    More generally, the derivative (or the gradient, for multivariable problems) is used by the optimization technique (for backprop, usually gradient descent or a variant of it; conjugate gradient is another option) to locate minima of the objective (aka loss) function.

    It works this way:

    The first derivative of a function at a point is the slope of the line tangent to the curve at that point. At a minimum of a smooth function, that slope is 0.

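    So if you are walking around a 3D surface defined by the objective function and you walk to a point where slope = 0, you may be at the bottom, i.e., you have found a minimum (whether global or local) of the function. A slope of 0 also occurs at maxima and saddle points, but moving downhill at every step steers you toward a minimum.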

    But the first derivative is more important than that. It also tells you if you are going in the right direction to reach the function minimum.

    It’s easy to see why this is so if you think about what happens to the slope of the tangent line as the point on the curve/surface is moved down toward the function minimum.

    The steepness of the slope (hence the magnitude of the derivative of the function at that point) gradually decreases. In other words, to minimize a function, follow the derivative: step in the direction opposite to its sign, and if its magnitude is shrinking toward 0 then you are moving in the correct direction.
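The idea can be tried on a one-dimensional example. The sketch below assumes a made-up objective f(x) = (x - 3)^2 with its minimum at x = 3; the function names and step size are illustrative choices.

```python
def f(x):
    # Toy objective with its minimum at x = 3 (illustrative assumption).
    return (x - 3.0) ** 2

def df(x):
    # First derivative of f: the slope of the tangent line at x.
    return 2.0 * (x - 3.0)

def gradient_descent(x, lr=0.1, steps=50):
    slopes = []
    for _ in range(steps):
        slope = df(x)
        slopes.append(slope)
        x -= lr * slope  # step against the sign of the derivative
    return x, slopes

x_min, slopes = gradient_descent(x=10.0)
```

Each recorded slope is smaller in magnitude than the one before it, which is the signal described above that the walk is heading toward the minimum.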

© 2021 The Archive Base. All Rights Reserved