I am working on Histogram of Oriented Gradients (HOG) features and am trying to implement the trilinear interpolation of histogram bins described in Dalal’s PhD thesis. He explains the interpolation process as quoted below:
EDIT: Roughly speaking, HOG features are extracted from a 64×128 pixel window, which is divided into blocks. Each block consists of 2×2 cells, and a cell is an 8×8 pixel area. Extraction starts by calculating the first-order derivatives of the image; the orientation and magnitude at each pixel are then calculated. For each 8×8 pixel cell within a block, an orientation histogram is computed: each pixel votes with its magnitude according to its orientation, and the vote is interpolated between the neighbouring bin centres in both orientation and position. The histogram contains 9 bins covering 0-180 degrees, with a bin width of 20 degrees. An overall depiction of the algorithm can be seen here: http://4.bp.blogspot.com/_7NBDeKCsVHg/TKBbldI8GmI/AAAAAAAAAG0/G-OXUz1ouPQ/s1600/a1.bmp
We first describe linear interpolation in a one dimension space and then extend it to 3-D. Let h be a histogram with inter-bin distance (bandwidth) b. h(x) denotes the value of the histogram for the bin centred at x. Assume that we want to interpolate a weight w at point x into the histogram. Let x1 and x2 be the two nearest neighbouring bins of the point x such that x1 ≤ x < x2. Linear interpolation distributes the weight w into two nearest neighbours as follows:
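The equation that followed this sentence in the thesis is not reproduced here. A form consistent with the definitions above, splitting w between the two bins in proportion to how close x is to each centre, would be:

```latex
\begin{aligned}
h(x_1) &\leftarrow h(x_1) + w\left(1 - \frac{x - x_1}{b}\right) \\
h(x_2) &\leftarrow h(x_2) + w\left(\frac{x - x_1}{b}\right)
\end{aligned}
```

Because x2 = x1 + b, the two fractions sum to 1, so the whole weight w is always distributed.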
Let w at the 3-D point x = [x, y, z] be the weight to be interpolated. Let x1 and x2 be the two corner vectors of the histogram cube containing x, where in each component x1 ≤ x < x2. Assume that the bandwidth of the histogram along the x, y and z axis is given by b = [bx, by, bz]. Trilinear interpolation distributes the weight w to the 8 surrounding bin centres as follows:
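The eight update equations from the thesis are also missing here. Applying the one-dimensional rule independently along each axis and multiplying the three factors gives what is presumably the intended form:

```latex
\begin{aligned}
h(x_1, y_1, z_1) &\leftarrow h(x_1, y_1, z_1) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_1, y_1, z_2) &\leftarrow h(x_1, y_1, z_2) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right) \\
h(x_1, y_2, z_1) &\leftarrow h(x_1, y_2, z_1) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_2, y_1, z_1) &\leftarrow h(x_2, y_1, z_1) + w\left(\frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_1, y_2, z_2) &\leftarrow h(x_1, y_2, z_2) + w\left(1 - \frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right) \\
h(x_2, y_1, z_2) &\leftarrow h(x_2, y_1, z_2) + w\left(\frac{x - x_1}{b_x}\right)\left(1 - \frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right) \\
h(x_2, y_2, z_1) &\leftarrow h(x_2, y_2, z_1) + w\left(\frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(1 - \frac{z - z_1}{b_z}\right) \\
h(x_2, y_2, z_2) &\leftarrow h(x_2, y_2, z_2) + w\left(\frac{x - x_1}{b_x}\right)\left(\frac{y - y_1}{b_y}\right)\left(\frac{z - z_1}{b_z}\right)
\end{aligned}
```

The eight weights sum to w, since each axis contributes a pair of factors that sum to 1.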
We compute a histogram for each cell, and every pixel contributes its magnitude value to the histogram. What I understand from the formulation is that x and y represent the location of the cells in the detection window and z is the bin number. In a 64×128 detection window there are 8×16 cells and 9 orientation bins, so our histogram is represented as h(8,16,9). If the above statements are correct, do (x1,y1) and (x2,y2) represent the previous and next cells, respectively? Do z1 and z2 mean the previous and next orientation bins? What about the bandwidth b=[bx, by, bz]?
I would really appreciate it if someone could clarify these issues.
Thanks.


Think of (x1, y1, z1) and (x2, y2, z2) as two points spanning a cube that surrounds the point (x,y,z) for which you want to interpolate a value of h.
The set of eight points (x1, y1, z1), (x2, y1, z1), (x1, y2, z1), (x1, y1, z2), (x2, y2, z1), (x2, y1, z2), (x1, y2, z2), (x2, y2, z2) forms the complete cube. So trilinear interpolation between (x1, y1, z1) and (x2, y2, z2) actually means interpolation between the 8 points in the 3D histogram space surrounding the point you are interested in! Now to your questions:
(x1, y1) and (x2, y2), together with (x1, y2) and (x2, y1), represent the centers of bins in the (x,y) plane. In your case these would be the centers of the four cells nearest to the pixel.
z1 and z2 represent two bin levels in the orientation direction, as you say. Combined with the four points in the image plane this gives you a total of 8 bins.
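A minimal Python sketch of how one pixel's vote could be spread over those 8 bins. The function name `trilinear_vote`, the default bandwidths (8, 8, 20) taken from the numbers in the question, the convention that bin i is centred at (i + 0.5) times the bandwidth, and the handling of the borders (spatial votes outside the grid are dropped, orientation wraps around) are all assumptions for illustration, not something the thesis quote specifies.

```python
import math


def trilinear_vote(hist, x, y, z, w, b=(8.0, 8.0, 20.0)):
    """Distribute weight w at point (x, y, z) over the 8 surrounding bins.

    hist is a nested list indexed as hist[ix][iy][iz].
    Assumption: bin i along an axis is centred at (i + 0.5) * bandwidth.
    """
    nx, ny, nz = len(hist), len(hist[0]), len(hist[0][0])

    # Position in units of bins, measured from the first bin centre.
    fx = x / b[0] - 0.5
    fy = y / b[1] - 0.5
    fz = z / b[2] - 0.5

    # Index of the lower corner (x1, y1, z1) of the enclosing cube.
    ix, iy, iz = math.floor(fx), math.floor(fy), math.floor(fz)

    # Normalised offsets, i.e. (x - x1) / bx and so on, each in [0, 1).
    dx, dy, dz = fx - ix, fy - iy, fz - iz

    for ox, wx in ((0, 1.0 - dx), (1, dx)):
        for oy, wy in ((0, 1.0 - dy), (1, dy)):
            for oz, wz in ((0, 1.0 - dz), (1, dz)):
                cx, cy = ix + ox, iy + oy
                # Assumption: drop votes that fall outside the cell grid.
                if 0 <= cx < nx and 0 <= cy < ny:
                    # Assumption: orientation is cyclic, so wrap the bin index.
                    hist[cx][cy][(iz + oz) % nz] += w * wx * wy * wz


# An 8 x 16 x 9 histogram, as in the question.
hist = [[[0.0] * 9 for _ in range(16)] for _ in range(8)]
trilinear_vote(hist, x=8.0, y=8.0, z=20.0, w=1.0)
```

With the point placed exactly halfway between bin centres on every axis, as in the call above, each of the 8 surrounding bins receives an equal share of the weight.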
The bandwidth b=[bx, by, bz] is basically the distance between the centers of neighbouring bins in the x, y and z direction. In your case, with 8 bins and 64 pixels in the x direction, and 16 bins and 128 pixels in the y direction: bx = 64 / 8 = 8 pixels and by = 128 / 16 = 8 pixels.
This leaves bz, for which I need the full range of your gradient orientation (i.e. lowest to highest possible value). If that range is rg, then bz = rg / 9; with the 0-180 degree range given in the question's edit, bz = 180 / 9 = 20 degrees. In general, the bandwidth in any direction equals the full available range in that direction divided by the number of bins in that direction.
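As a small illustration of the general rule, here is a Python sketch that computes the bandwidths for the numbers in the question and finds the two neighbouring bin centres x1 and x2 for a given coordinate. The helper names `bandwidth` and `neighbour_centres` are hypothetical, and placing bin i's centre at (i + 0.5) times the bandwidth is an assumption.

```python
import math


def bandwidth(full_range, n_bins):
    """Distance between neighbouring bin centres along one axis."""
    return full_range / n_bins


def neighbour_centres(x, b):
    """Return (x1, x2) with x1 <= x < x2, the two nearest bin centres.

    Assumption: bin i is centred at (i + 0.5) * b.
    """
    i = math.floor(x / b - 0.5)
    x1 = (i + 0.5) * b
    return x1, x1 + b


bx = bandwidth(64, 8)    # pixels per cell along x
by = bandwidth(128, 16)  # pixels per cell along y
bz = bandwidth(180, 9)   # degrees per orientation bin

# A pixel at x = 10 lies between the centres of the first two cells.
x1, x2 = neighbour_centres(10, bx)
```

A coordinate that lies before the first bin centre yields a negative x1, so a real implementation still has to decide how to treat the borders.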
For a good explanation of trilinear interpolation with pictures look at the link in whoplisp’s answer.