According to Wikipedia, with the delta rule we adjust the weight by: dw

Question

Editorial Team

Asked: June 17, 2026

According to Wikipedia, with the delta rule we adjust the weight by:

dw = alpha * (t_i - y_i) * g'(h_j) * x_i

where alpha is the learning constant, t_i is the true answer, y_i is the perceptron’s guess, g' is the derivative of the activation function g with respect to the weighted sum h_j of the perceptron’s inputs, and x_i is the input.

The part that I don’t understand in this formula is the multiplication by the derivative g'. Let g = sign(h_j) (the sign of the weighted sum). Then g' is always 0, so dw = 0. However, in code examples I saw on the internet, the writers just omitted the g' and used the formula:

dw = alpha * (t_i - y_i) * h_j * x_i

I will be glad to read a proper explanation!

Thank you in advance.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T13:05:46+00:00

You’re correct that if you use a step function for your activation function g, the gradient is zero everywhere (and undefined at 0), so the delta rule (aka gradient descent) does nothing (dw = 0). This is why a step-function perceptron doesn’t work with gradient descent. 🙂
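As a quick check of the dw = 0 claim, here is a minimal Python sketch of a single delta-rule update with a sign activation. The function names and the sample numbers are made up for illustration, and the derivative at exactly 0 is simply treated as 0 here, which is an assumption.

```python
def sign(h):
    """Step activation: +1 for non-negative input, -1 otherwise."""
    return 1.0 if h >= 0 else -1.0


def sign_prime(h):
    """Derivative of sign: 0 away from h == 0 (treated as 0 at h == 0 too, by assumption)."""
    return 0.0


def delta_rule_update(alpha, t, x, w, g, g_prime):
    """Return dw_i = alpha * (t - y) * g'(h) * x_i for every input x_i."""
    h = sum(wi * xi for wi, xi in zip(w, x))  # weighted sum of the inputs
    y = g(h)                                  # the perceptron's guess
    return [alpha * (t - y) * g_prime(h) * xi for xi in x]


# The guess is wrong here (y = -1 but t = +1), yet the update is still zero.
dw = delta_rule_update(0.1, 1.0, [1.0, 2.0], [-0.5, -0.5], sign, sign_prime)
print(dw)
```

Even though the error (t - y) is nonzero, every component of dw is zero because of the g' factor, so the weights never move.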

For a linear perceptron, you’d have g'(x) = 1, for dw = alpha * (t_i - y_i) * x_i.
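The linear case can be sketched in a few lines of Python. This is only an illustration under assumptions: the data set (targets generated by t = 1 + 2 * x, with a constant 1 as bias input), the learning constant, and the epoch count are all made up.

```python
def train_linear(samples, alpha=0.1, epochs=500):
    """Delta rule with a linear activation g(h) = h, so g'(h) = 1."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, t in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # y = g(h) = h
            # dw_i = alpha * (t - y) * x_i, since g'(h) = 1
            w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]
    return w


# Each input is [bias, x]; targets follow t = 1 + 2 * x (hypothetical data).
samples = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
w = train_linear(samples)
print(w)
```

With these settings the weights should approach [1, 2]. Note that the classic perceptron learning rule has the same shape, dw = alpha * (t_i - y_i) * x_i, with y_i computed by the step function. That rule is not derived from gradient descent, which may be why some code appears to omit g'.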

You’ve seen code that uses dw = alpha * (t_i - y_i) * h_j * x_i. We can reverse-engineer what’s going on here. Taken at face value, that formula implies g'(h_j) = h_j, which (remembering our calculus) means g(x) = x^2/2 + constant, a quadratic activation function. If the extra factor is really the neuron’s output y_i = g(h_j) rather than the weighted sum, then g' = g and we must have g(x) = c * e^x, so the code sample you found would be using an exponential activation function.

An exponential activation means the neuron outputs are constrained to be on (0, infinity) for c > 0, while a quadratic one g(x) = x^2/2 + a constrains them to [a, infinity). I haven’t run into an exponential activation before, but I see some references online. Logistic or tanh activations are more common for bounded outputs (either classification or regression with known bounds).
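For comparison, here is a minimal sketch of the delta rule with a logistic activation, where g'(h) = g(h) * (1 - g(h)) is nonzero everywhere. The OR data set, the learning constant, and the epoch count are assumptions chosen purely for illustration.

```python
import math


def logistic(h):
    """Logistic activation, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-h))


def logistic_prime(h):
    """Derivative of the logistic function: g(h) * (1 - g(h))."""
    s = logistic(h)
    return s * (1.0 - s)


def weighted_sum(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic(samples, alpha=0.5, epochs=5000):
    """Delta rule: dw_i = alpha * (t - y) * g'(h) * x_i."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, t in samples:
            h = weighted_sum(w, x)
            y = logistic(h)
            w = [wi + alpha * (t - y) * logistic_prime(h) * xi
                 for wi, xi in zip(w, x)]
    return w


# Logical OR; the first input of each sample is a constant bias of 1.
or_samples = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 1.0),
              ([1.0, 1.0, 0.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
w = train_logistic(or_samples)
preds = [1 if logistic(weighted_sum(w, x)) >= 0.5 else 0 for x, _ in or_samples]
print(preds)
```

Because the derivative never vanishes completely, every misclassified sample nudges the weights, which is exactly what the step function fails to provide.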
