I’ve seen some machine learning questions on here so I figured I would post

Question

Asked: June 14, 2026 (2026-06-14T21:56:32+00:00)

I’ve seen some machine learning questions on here so I figured I would post a related question:

Suppose I have a dataset where athletes participate in running competitions of 10 km and 20 km on hilly courses, i.e. every competition has its own difficulty.

The finishing times from users are almost inverse normally distributed for every competition.

One can write this problem as a matrix:

       Comp1  Comp2  Comp3
User1  20min  ??     10min
User2  25min  20min  12min
User3  30min  25min  ??
User4  30min  ??     ??

I would like to complete the matrix above which has the size 1000×20 and a sparseness of 8 % (!).

There should be a very easy way to complete this matrix, since I can calculate parameters for every user (ability) and parameters for every competition (mu, lambda of the distributions). Moreover, the correlations between the competitions are very high.

I can take advantage of the rankings User1 < User2 < User3 and Item3 << Item2 < Item1

Could you maybe give me a hint which methods I could use?


1 Answer

Editorial Team · Answer 1 · 2026-06-14T21:56:34+00:00

Your astute observation that this is a matrix completion problem gets
you most of the way to the solution. I’ll codify your intuition that
the combination of ability of a user and difficulty of the course
yields the time of a race, then present various algorithms.

Model

Let the vector u denote the speed of the users so that u_i is user i’s
speed. Let the vector v denote the difficulty of the courses so
that v_j is course j’s difficulty; in the multiplicative model below, a larger v_j means a faster (easier) course. Also, when available, let t_ij be user i’s time on
course j, and define y_ij = 1/t_ij, user i’s speed on course j.
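As a minimal sketch of this setup, the toy table from the question can be turned into a matrix of speeds. The names T, Y and observed are my own choices for illustration, and NaN is assumed as the marker for a missing result:

```python
import numpy as np

# Finishing times in minutes from the question's toy table.
# NaN marks a race the user did not run (assumed encoding).
T = np.array([
    [20.0, np.nan, 10.0],    # User1
    [25.0, 20.0, 12.0],      # User2
    [30.0, 25.0, np.nan],    # User3
    [30.0, np.nan, np.nan],  # User4
])

observed = ~np.isnan(T)          # True where a time is known
Y = np.full_like(T, np.nan)      # speeds, still NaN where unknown
Y[observed] = 1.0 / T[observed]  # y_ij = 1 / t_ij
```

Keeping the boolean mask next to Y makes it easy to restrict every later sum to the observed entries.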

Since you say the times are inverse Gaussian distributed, a sensible
model for the observations is

y_ij = u_i * v_j + e_ij,

where e_ij is a zero-mean Gaussian random variable.

To fit this model, we search for vectors u and v that minimize the
prediction error among the observed speeds:

f(u,v) = sum_ij (u_i * v_j - y_ij)^2,

where the sum runs over the observed pairs (i, j) only.
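Here is one way the cost could be written with NumPy. It is a sketch, assuming missing speeds are stored as NaN and a boolean mask (here called observed, a name I made up) selects the known entries:

```python
import numpy as np

def cost(u, v, Y, observed):
    """Squared error of the rank-one model u_i * v_j, observed entries only."""
    residual = np.outer(u, v) - Y               # NaN wherever Y is missing
    return float(np.sum(residual[observed] ** 2))  # the mask drops the NaNs

# Tiny check: a matrix that is exactly rank one, with one entry hidden.
u_demo = np.array([1.0, 2.0])
v_demo = np.array([0.5, 1.0, 2.0])
Y_demo = np.outer(u_demo, v_demo)
Y_demo[0, 1] = np.nan
observed_demo = ~np.isnan(Y_demo)

exact = cost(u_demo, v_demo, Y_demo, observed_demo)
perturbed = cost(np.array([2.0, 2.0]), v_demo, Y_demo, observed_demo)
```

The cost is zero at the vectors that generated the data and grows as soon as u or v moves away from them.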

Algorithm 1: missing value Singular Value Decomposition

This is the classical Hebbian algorithm. It
minimizes the above cost function by gradient descent. The gradients of
f with respect to u and v are

df/du_i = 2 * sum_j (u_i * v_j - y_ij) v_j
df/dv_j = 2 * sum_i (u_i * v_j - y_ij) u_i

where each sum runs over the observed entries only.

Plug these gradients into a Conjugate Gradient solver or BFGS
optimizer, like MATLAB’s fminunc or scipy’s optimize.fmin_ncg or
optimize.fmin_bfgs. Don’t roll your own gradient descent unless you’re willing to implement a very good line search algorithm.
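A rough sketch of this recipe, using scipy.optimize.minimize with method="BFGS" (the newer interface to the same BFGS routine). The helper names, the random starting point, the gtol setting and the synthetic demo data are all my assumptions, not part of the original recipe:

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(params, Y, observed, n_users):
    """Return f(u, v) and its gradient, stacked as [df/du, df/dv]."""
    u, v = params[:n_users], params[n_users:]
    # Residuals on observed entries, zero elsewhere so they drop out of sums.
    R = np.where(observed, np.outer(u, v) - Y, 0.0)
    f = np.sum(R ** 2)
    grad_u = 2.0 * R @ v    # factor 2 comes from differentiating the square
    grad_v = 2.0 * R.T @ u
    return f, np.concatenate([grad_u, grad_v])

def fit_rank_one(Y, observed, seed=0):
    n_users, n_courses = Y.shape
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(0.5, 1.5, size=n_users + n_courses)  # arbitrary positive start
    result = minimize(cost_and_grad, x0, args=(Y, observed, n_users),
                      jac=True, method="BFGS", options={"gtol": 1e-8})
    return result.x[:n_users], result.x[n_users:], result

# Synthetic, noise-free demo with the same missing pattern as the question's table.
u_true = np.array([1.2, 1.0, 0.8, 0.7])
v_true = np.array([1.0, 1.3, 2.0])
observed = np.array([[1, 0, 1], [1, 1, 1], [1, 1, 0], [1, 0, 0]], dtype=bool)
Y = np.where(observed, np.outer(u_true, v_true), np.nan)

u_hat, v_hat, result = fit_rank_one(Y, observed)
pred = np.outer(u_hat, v_hat)   # completed matrix of speeds
times = 1.0 / pred              # back to finishing times
```

Only the product u_i * v_j is identified (u can be scaled up and v down by the same factor), so compare predictions rather than the vectors themselves. Real speeds in 1/minutes are small numbers, so it may help to rescale them before optimizing, since the optimizer's gradient tolerance is absolute.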

Algorithm 2: matrix factorization with a trace norm penalty

Recently, simple convex relaxations to this problem have been
proposed. The resulting algorithms are just as simple to code up and seem to
work very well. Check out, for example, "Collaborative Filtering in a Non-Uniform World:
Learning with the Weighted Trace Norm". These methods minimize

f(m) = sum_ij (m_ij - y_ij)^2 + ||m||_*,

where the sum again runs over the observed entries and ||.||_* is the so-called nuclear norm (also called the trace norm) of the matrix m, i.e. the sum of its singular values. Implementations will end up again computing gradients with respect to u and v and relying on a nonlinear optimizer.
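One common way to get there, sketched below, is to write m = U V^T with a few columns and replace the nuclear norm by (||U||_F^2 + ||V||_F^2) / 2, which is an upper bound on ||U V^T||_*. This factored form is my choice for illustration and not necessarily what the cited paper does; the weight lam, the rank k, the L-BFGS-B optimizer and the demo data are likewise assumptions (lam = 1 corresponds to the formula as written above):

```python
import numpy as np
from scipy.optimize import minimize

def trace_norm_objective(M, Y, observed, lam=1.0):
    """Squared error on observed entries plus lam times the nuclear norm of M."""
    fit = np.sum((M - Y)[observed] ** 2)
    return float(fit + lam * np.linalg.norm(M, ord="nuc"))

def factored_cost_and_grad(params, Y, observed, k, lam):
    """Surrogate objective in U (n x k) and V (m x k), with its gradient."""
    n, m = Y.shape
    U = params[: n * k].reshape(n, k)
    V = params[n * k:].reshape(m, k)
    R = np.where(observed, U @ V.T - Y, 0.0)  # residuals, zero where unobserved
    f = np.sum(R ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
    grad_U = 2.0 * R @ V + lam * U
    grad_V = 2.0 * R.T @ U + lam * V
    return f, np.concatenate([grad_U.ravel(), grad_V.ravel()])

def complete_trace_norm(Y, observed, k=2, lam=0.01, seed=0):
    n, m = Y.shape
    rng = np.random.default_rng(seed)
    x0 = 0.5 * rng.standard_normal((n + m) * k)  # arbitrary random start
    res = minimize(factored_cost_and_grad, x0, args=(Y, observed, k, lam),
                   jac=True, method="L-BFGS-B")
    U = res.x[: n * k].reshape(n, k)
    V = res.x[n * k:].reshape(m, k)
    return U @ V.T, res

# Synthetic demo: rank-one speeds with the question's missing pattern.
observed = np.array([[1, 0, 1], [1, 1, 1], [1, 1, 0], [1, 0, 0]], dtype=bool)
Y_full = np.outer([1.2, 1.0, 0.8, 0.7], [1.0, 1.3, 2.0])
Y = np.where(observed, Y_full, np.nan)

M_hat, res = complete_trace_norm(Y, observed, k=2, lam=0.01)
surrogate_value = float(res.fun)
true_value = trace_norm_objective(M_hat, Y, observed, lam=0.01)
```

Larger lam pushes the solution toward lower rank at the price of shrinking the fitted speeds, so it would normally be chosen by holding out some observed entries.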
