I’m reading Andrew Ng’s Machine Learning notes, but the definition of the functional margin confused me:

I understand that the geometric margin is the distance from x to the hyperplane, but how should I understand the functional margin? And why is its formula defined that way?
Think of it like this: w^T x_i + b is the model’s raw score for the i-th data point, and the sign of that score is the predicted class. y_i is its label (+1 or -1), and gamma_i = y_i (w^T x_i + b) is the functional margin of that point. If the score and the ground truth have the same sign, then gamma_i will be positive. The further “inside” the class boundary this instance is, the bigger gamma_i will be; that is better because larger margins across all i mean greater separation between your classes. If the score and the label don’t agree in sign, then this quantity will be negative (an incorrect decision by the predictor), which will reduce your margin, and it will be reduced more the more incorrect you are (analogous to slack variables).
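If it helps to see numbers, here is a minimal sketch in plain Python. The weights, bias, and the three data points are made up for illustration, and `functional_margin` is just a helper name I chose, not something from the notes. It computes gamma_i = y_i (w^T x_i + b) for each point, then takes the smallest value, which is how the notes define the functional margin of the whole training set.

```python
def functional_margin(w, b, x, y):
    """Return y * (w . x + b) for one example; y is expected to be +1 or -1."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return y * score


# Hypothetical 2-D toy problem (values chosen only for illustration).
w = [1.0, -1.0]
b = 0.0
points = [
    ([3.0, 1.0], +1),  # score  2.0, label +1: correct and far from the boundary
    ([1.0, 2.0], -1),  # score -1.0, label -1: correct, closer to the boundary
    ([0.5, 1.0], +1),  # score -0.5, label +1: wrong side, so the margin is negative
]

margins = [functional_margin(w, b, x, y) for x, y in points]
dataset_margin = min(margins)  # margin of the whole set is the worst single point

print(margins)         # [2.0, 1.0, -0.5]
print(dataset_margin)  # -0.5
```

One thing worth trying: replace `w` and `b` with `[2.0, -2.0]` and `0.0`. The decision boundary does not move, but every functional margin doubles. That scale dependence is why the geometric margin divides by the norm of w.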