I’d like to detect my hand in a live video stream and create a mask of it. However, I’m getting quite poor results, as you can see from the picture.
My goal is to track the hand’s movement. I convert the video stream from BGR to the HSV color space, threshold the image to isolate the color of my hand, and then find the contours of my hand. The final result isn’t what I wanted to achieve, though.
How could I improve the end result?
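    import cv2
    import numpy as np

    cam = cv2.VideoCapture(1)
    cam.set(3, 640)  # property 3: frame width
    cam.set(4, 480)  # property 4: frame height

    # HSV bounds used to isolate the skin color
    skin_min = np.array([0, 40, 150], np.uint8)
    skin_max = np.array([20, 150, 255], np.uint8)

    while True:
        ret, image = cam.read()
        if not ret:
            break  # no frame available

        gaussian_blur = cv2.GaussianBlur(image, (5, 5), 0)
        blur_hsv = cv2.cvtColor(gaussian_blur, cv2.COLOR_BGR2HSV)

        # threshold using the min and max values
        tre_green = cv2.inRange(blur_hsv, skin_min, skin_max)

        # find the contours of the thresholded regions; pass a copy because
        # older OpenCV versions modify the input, and take [-2:] because the
        # number of return values differs between OpenCV versions
        contours, hierarchy = cv2.findContours(
            tre_green.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]

        # draw the contours on the original frame
        cv2.drawContours(image, contours, -1, (0, 255, 0), 3)

        cv2.imshow('real', image)
        cv2.imshow('tre_green', tre_green)

        key = cv2.waitKey(10)
        if key == 27:  # Esc
            break

    cam.release()
    cv2.destroyAllWindows()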
Here is the link to the pictures: https://picasaweb.google.com/103610822612915300423/February7201303
New link with the image plus contours, the mask, and the original:
https://picasaweb.google.com/103610822612915300423/February7201304
And here’s a sample picture from above:

There are many ways to perform a pixel-wise threshold that separates “skin pixels” from “non-skin pixels”, and there are papers based on virtually any colorspace (even RGB). My answer is simply based on the paper Face Segmentation Using Skin-Color Map in Videophone Applications by Chai and Ngan. They worked with the YCbCr colorspace and got quite nice results; the paper also mentions thresholds on the Cb and Cr channels that worked well for them.
The thresholds for the Y channel are not specified, but there are papers that mention Y > 80. For your single image, Y in the whole range is fine, i.e. it doesn’t matter for actually distinguishing skin.
Here is the input, the binary image according to the thresholds mentioned, and the resulting image after discarding small components.
Lastly, there are quite a few papers that do not rely on fixed pixel-wise thresholds for this task. Instead, they start from a set of labeled images that are known to contain either skin pixels or non-skin pixels. From that they train, for example, an SVM, and then classify other inputs with this classifier.