I have a small problem using the video creation capability of OpenCV.
For the same images, I get a weird output depending on the output size I want.
Here is an example of the results I can get.
http://www.youtube.com/watch?v=1wm8VjyfdyA&feature=youtu.be
I tried with several different sets of images, and on different computers.
It seems to run fine on Windows, but I have problems with the OpenCV that ships in the Ubuntu packages (currently 2.3.1-7).
As the problem is not reproducible on my Windows machine, I guess it was either fixed in 2.4 or is specific to Linux.
Here is some (Python) test code that highlights the problem:
import os
import cv

in_dir = "../data/inputs/sample-test"
out = "output.avi"

# Output frame size as (width, height)
frameSize = (652, 498)
#frameSize = (453, 325)

fourcc = cv.CV_FOURCC('F', 'M', 'P', '4')
my_video = cv.CreateVideoWriter(out,
                                fourcc,
                                15,
                                frameSize,
                                1)

# Resize every image found under in_dir and append it to the video
for root, _, files in os.walk(in_dir):
    for a_file in files:
        # Join with root (not in_dir) so files in sub-directories resolve too
        guy_source = os.path.join(root, a_file)
        print guy_source
        image = cv.LoadImage(guy_source)
        small_im = cv.CreateImage(frameSize,
                                  image.depth,
                                  image.nChannels)
        cv.Resize(image, small_im, cv.CV_INTER_LINEAR)
        cv.WriteFrame(my_video, small_im)
print "Finished !"
My concern is that the video is fine for some output sizes (652x498 is OK, for example) and distorted for others.
The behaviour is the same whatever codec I use.
If there is no fix, I'd like some more information about the reason for this bug.
As I want to ship for Ubuntu, I'd rather use their packaging system and stay on 2.3 for some time.
So I would like to know how I can work around the problem sensibly, by making an educated choice of sizes.
Any information is welcome
Thx !
This is a common problem in video coding. As you can see, each row of the image is shifted to the left by a small amount relative to the previous one.
As you may know, the image is stored as a long row of chars: BGRBGRBGR….
It is also defined by its width and height, and by its step: the distance, in bytes, between the starts of two consecutive rows. A naive assumption is that the step is 3 (channels) * width. But for memory alignment reasons, the image rows are padded with some extra bytes so that the step value is a multiple of 4 (usually) or 16. The reason is that hardware codec acceleration works with aligned data: 32-bit architectures read 32 bits at once, and for SIMD processing, aligned data is loaded faster.
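To make the arithmetic concrete, here is a minimal sketch that computes the padded step for the two frame widths from the question. `padded_step` is a hypothetical helper written for illustration, not an OpenCV function, and the alignment of 4 is the "usual" value mentioned above rather than something checked against a particular OpenCV build.

```python
def padded_step(width, channels=3, align=4):
    """Bytes per row once the row is padded up to a multiple of `align`."""
    raw = width * channels                  # unpadded row size in bytes
    return (raw + align - 1) // align * align


for width in (652, 453):
    raw = width * 3
    step = padded_step(width)
    print("width={0}: raw row={1} bytes, step={2} bytes, padding={3}".format(
        width, raw, step, step - raw))
```

Under these assumptions a width of 652 needs no padding (1956 is already a multiple of 4), while a width of 453 gets one padding byte per row (1359 is rounded up to 1360).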
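So, for example, in a 3-channel image that is 2 pixels wide, each row holds 6 bytes of pixel data followed by 2 padding bytes, giving a step of 8.
Now, if a codec does not know of this padding, it will read the width of the image as 2, assume that each row is 6 bytes long, and treat the padding bytes at the end of one row as the beginning of the next. The offset grows with every row, which is the skew you see in the video.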
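The misreading described above can be reproduced with plain strings. This is only a sketch: the 2-pixel width, the step of 8, and the `.` filler character are assumptions chosen for illustration, and `pad_rows` / `read_rows` are hypothetical helpers, not part of OpenCV.

```python
def pad_rows(rows, step, filler="."):
    """Concatenate rows into one buffer, padding each row to `step` chars."""
    return "".join(row.ljust(step, filler) for row in rows)


def read_rows(buffer, row_bytes, step, count):
    """Read `count` rows of `row_bytes` chars, advancing `step` chars per row."""
    return [buffer[i * step:i * step + row_bytes] for i in range(count)]


rows = ["BGRBGR"] * 4                    # 2 pixels x 3 channels per row
buffer = pad_rows(rows, step=8)          # each row padded from 6 to 8 bytes

# A reader that knows the real step recovers the original rows.
print(read_rows(buffer, row_bytes=6, step=8, count=4))

# A reader that assumes step == 3 * width drifts further on every row.
print(read_rows(buffer, row_bytes=6, step=6, count=4))
```

The second reader picks up the padding characters as pixel data, and each row starts two bytes further off than the one before it.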
To make sure you do not experience this issue, you should select the image width in such a way that the unpadded row size (channels*width) is a multiple of four, so that no padding is needed. All of the standard resolutions have this property, and this is one of the reasons they were chosen.
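One way to apply this rule is to round a requested width down to the nearest value that needs no padding. The helper below is a hypothetical sketch, assuming 3 channels and an alignment of 4. Whether this is enough to avoid the problem with the Ubuntu 2.3.1 package is an assumption based on the explanation above, not something verified here.

```python
def nearest_safe_width(width, channels=3, align=4):
    """Largest width <= `width` whose unpadded row size is a multiple of `align`."""
    while width > 0 and (width * channels) % align != 0:
        width -= 1
    return width


for requested in (652, 453, 640, 1280, 1920):
    print("{0} -> {1}".format(requested, nearest_safe_width(requested)))
```

With these defaults, 652 is kept as it is and 453 becomes 452. The height does not enter into the step, so it can be left unchanged or adjusted separately to preserve the aspect ratio.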