

![A (quasi-) real-time video processing on iOS
In previous posts, I showed you how to create a custom camera using AVFoundation and how to process an image with the Accelerate framework. Let’s now combine both results to build a (quasi-) real-time video processing pipeline (I’ll explain later what I mean by quasi).
Custom camera preview
To appreciate what we are going to do, we need to build a custom camera preview. If we want to process a video buffer and show the result in real-time, we cannot use the AVCaptureVideoPreviewLayer as shown in this post, because that camera preview renders the signal directly and does not offer any way to process it before rendering. To make this possible, you need to take the video buffer, process it and then render it on a custom CALayer. Let’s see how to do that.
As I already demonstrated here, setting up the AVFoundation stack is quite straightforward (thank you, Apple): you need to create a capture session (AVCaptureSession), then a capture device (AVCaptureDevice) and add it to the session as a device input (AVCaptureDeviceInput). Translated into source code, this becomes:
// Create the capture session
AVCaptureSession *captureSession = [AVCaptureSession new];
[captureSession setSessionPreset:AVCaptureSessionPresetLow];
// Capture device
AVCaptureDevice *captureDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
// Device input
AVCaptureDeviceInput *deviceInput = [AVCaptureDeviceInput deviceInputWithDevice:captureDevice error:nil];
if ( [captureSession canAddInput:deviceInput] )
[captureSession addInput:deviceInput];
The output buffer
Up to here, nothing is new with respect to the previous post. This is where the new stuff comes in. First of all, we need to define a video data output (AVCaptureVideoDataOutput) and add it to the session:
AVCaptureVideoDataOutput *dataOutput = [AVCaptureVideoDataOutput new];
dataOutput.videoSettings = [NSDictionary dictionaryWithObject:[NSNumber numberWithUnsignedInt:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange] forKey:(NSString *)kCVPixelBufferPixelFormatTypeKey];
[dataOutput setAlwaysDiscardsLateVideoFrames:YES];
if ( [captureSession canAddOutput:dataOutput] )
[captureSession addOutput:dataOutput];
Here, I defined the output format as YUV (YpCbCr 4:2:0, bi-planar). If you don’t know what I am talking about, I suggest you take a look at this article. YUV or, more correctly, YCbCr is a very common video format. I use it here because, unless the color carries some useful information, you usually work on gray-level images for image processing. The YUV format provides a signal with the intensity component (the Y) and 2 chromatic components (the U and the V), so the gray-level image comes for free.
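To make the bi-planar 4:2:0 layout a bit more concrete, here is a small plain-C sketch (it also compiles as Objective-C) that computes how many bytes each plane of a frame occupies. The names PlaneSizes and planeSizes420 are mine, and I assume tightly packed rows with even dimensions, which a real pixel buffer does not guarantee:

```c
#include <assert.h>
#include <stddef.h>

/* Byte sizes of the two planes of a bi-planar 4:2:0 frame.
   Assumption: rows are tightly packed (no padding bytes). */
typedef struct {
    size_t lumaBytes;    /* plane 0: one Y byte per pixel */
    size_t chromaBytes;  /* plane 1: one interleaved Cb,Cr pair per 2x2 pixel block */
} PlaneSizes;

PlaneSizes planeSizes420(size_t width, size_t height)
{
    /* Even dimensions keep the sketch simple */
    assert(width % 2 == 0 && height % 2 == 0);

    PlaneSizes sizes;
    sizes.lumaBytes   = width * height;
    sizes.chromaBytes = (width / 2) * (height / 2) * 2;
    return sizes;
}
```

For a 640x480 frame this gives 307200 bytes of luma and 153600 bytes of chroma: the gray-level image is two thirds of the data, and it sits alone in plane 0.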
The destination layer
Additionally, we need to create a new layer and use it as our rendering destination:
CALayer *customPreviewLayer = [CALayer layer];
customPreviewLayer.bounds = CGRectMake(0, 0, self.view.frame.size.height, self.view.frame.size.width);
customPreviewLayer.position = CGPointMake(self.view.frame.size.width/2., self.view.frame.size.height/2.);
customPreviewLayer.affineTransform = CGAffineTransformMakeRotation(M_PI/2);
We can add this layer to any other layer. I’m going to add it to my view controller view layer:
[self.view.layer addSublayer:customPreviewLayer];
Let’s go
The last step of the initial configuration is to create a GCD queue that is going to manage the video buffer and set our class as delegate of the video data output sample buffer (AVCaptureVideoDataOutputSampleBufferDelegate):
dispatch_queue_t queue = dispatch_queue_create("VideoQueue", DISPATCH_QUEUE_SERIAL);
[dataOutput setSampleBufferDelegate:self queue:queue];
Final setup
Now, remember to add the following frameworks to your project (plus Accelerate, which the image processing step further down relies on):
AVFoundation
CoreMedia
CoreVideo
CoreGraphics
Video rendering
Since the view controller is now the delegate of the capture video data output, you can implement the following callback:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
fromConnection:(AVCaptureConnection *)connection
AVFoundation fires this delegate method as soon as it has a data buffer available. So, you can use it to collect the video buffer frames, process them and render them on the layer that we previously created. For the moment, let’s collect the video buffer frames and render them on the layer. Later, we’ll take a look at the image processing.
The previous delegate method returns the sampleBuffer of type CMSampleBufferRef. This is a Core Media object we can bring into Core Video:
CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
Let’s lock the buffer base address:
CVPixelBufferLockBaseAddress(imageBuffer, 0);
Then, let’s extract some useful image information:
size_t width = CVPixelBufferGetWidthOfPlane(imageBuffer, 0);
size_t height = CVPixelBufferGetHeightOfPlane(imageBuffer, 0);
size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0);
Remember the video buffer is in YUV format, so I extract the luma component from the buffer in this way:
Pixel_8 *lumaBuffer = CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
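A detail that is easy to miss: bytesPerRow can be larger than width, because rows may be padded for alignment. The following plain-C sketch shows how a pixel is addressed in such a plane and how the plane could be copied into a tightly packed buffer. The function names lumaAt and packPlane are hypothetical helpers of mine, not Core Video calls:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the luma sample at column x, row y. Consecutive rows are
   bytesPerRow bytes apart, which may be more than width. */
uint8_t lumaAt(const uint8_t *base, size_t bytesPerRow, size_t x, size_t y)
{
    return base[y * bytesPerRow + x];
}

/* Copy a possibly padded plane into a tightly packed buffer of
   width * height bytes, dropping the padding at the end of each row. */
void packPlane(const uint8_t *src, size_t width, size_t height,
               size_t bytesPerRow, uint8_t *dst)
{
    assert(bytesPerRow >= width);
    for (size_t y = 0; y < height; y++) {
        memcpy(dst + y * width, src + y * bytesPerRow, width);
    }
}
```

In the post I never copy the plane, I just pass bytesPerRow along to Core Graphics and vImage, which both understand padded rows.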
Now, let’s render this buffer on the layer. To do so, we need to use Core Graphics: create a color space, create a graphic context and render the buffer onto the graphic context using the created color space:
CGColorSpaceRef grayColorSpace = CGColorSpaceCreateDeviceGray();
CGContextRef context = CGBitmapContextCreate(lumaBuffer, width, height, 8, bytesPerRow, grayColorSpace, kCGImageAlphaNone);
CGImageRef dstImage = CGBitmapContextCreateImage(context);
So, the dstImage is a Core Graphics image (CGImage), created from the captured buffer. Finally, we render this image on the layer, changing its contents. We do that on the main queue:
dispatch_sync(dispatch_get_main_queue(), ^{
customPreviewLayer.contents = (__bridge id)dstImage;
});
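Now, let’s do some clean-up (we are good citizens, right?). Besides releasing the Core Graphics objects, we must unlock the base address we locked earlier:
CGImageRelease(dstImage);
CGContextRelease(context);
CGColorSpaceRelease(grayColorSpace);
CVPixelBufferUnlockBaseAddress(imageBuffer, 0);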
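If you build and run (remember that the session only delivers frames after you call [captureSession startRunning]), you’ll see the camera feed rendered in gray levels on your custom preview.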
Image processing
Now, let’s have some fun and process the buffer before rendering it. For this, I am going to use the Accelerate framework. The Pixel_8 *lumaBuffer is the input of my algorithm. I need to wrap this buffer in a vImage_Buffer and prepare a second vImage_Buffer for the output of the image processing algorithm.
Add this code after the line generating the lumaBuffer:
...
Pixel_8 *lumaBuffer = CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
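const vImage_Buffer inImage = { lumaBuffer, height, width, bytesPerRow };
// Size the output by bytesPerRow, not width: rows may be padded
Pixel_8 *outBuffer = (Pixel_8 *)calloc(bytesPerRow * height, sizeof(Pixel_8));
const vImage_Buffer outImage = { outBuffer, height, width, bytesPerRow };
[self maxFromImage:inImage toImage:outImage];
// Remember to free(outBuffer) once the frame has been rendered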
...
The -maxFromImage:toImage: method does all the work. Just for fun, I process the input image with a morphological operator. Despite the method name, it is a minimum filter: each output pixel is the smallest value found in a 7x7 neighborhood of the input. Here it is:
- (void)maxFromImage:(const vImage_Buffer)src toImage:(const vImage_Buffer)dst
{
int kernelSize = 7;
vImageMin_Planar8(&src, &dst, NULL, 0, 0, kernelSize, kernelSize, kvImageDoNotTile);
}
If you now run it, rendering the outImage on the custom preview, you should obtain something like this:
You can download the example from here.
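If you wonder what the minimum filter actually computes, here is a plain-C sketch of the same idea, without any hardware acceleration. The function minFilter8 is my own illustration and not part of Accelerate. I assume that window positions falling outside the image are simply skipped, which may differ from how vImage treats the borders:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Morphological minimum over a k x k window (k odd) on an 8-bit gray image.
   Assumption: window positions outside the image are ignored. */
void minFilter8(const uint8_t *src, uint8_t *dst,
                size_t width, size_t height, size_t rowBytes, int k)
{
    assert(k > 0 && k % 2 == 1);
    const long radius = k / 2;

    for (long y = 0; y < (long)height; y++) {
        for (long x = 0; x < (long)width; x++) {
            uint8_t minimum = 255;
            for (long dy = -radius; dy <= radius; dy++) {
                for (long dx = -radius; dx <= radius; dx++) {
                    const long yy = y + dy;
                    const long xx = x + dx;
                    if (yy < 0 || yy >= (long)height || xx < 0 || xx >= (long)width) {
                        continue; /* outside the image */
                    }
                    const uint8_t value = src[(size_t)yy * rowBytes + (size_t)xx];
                    if (value < minimum) {
                        minimum = value;
                    }
                }
            }
            dst[(size_t)y * rowBytes + (size_t)x] = minimum;
        }
    }
}
```

A single dark pixel grows into a dark k x k square, which is why the processed preview looks darker and blockier than the plain one. This naive version does k*k reads per pixel, so it only serves to explain the result, not to replace the vImage call.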
Final considerations
As I mentioned at the beginning of this post, this processing is done in quasi- real-time. The limitation derives from the Accelerate framework. This framework is optimized for the CPU, which is a limited resource anyway. Depending on the final application, this limitation might not matter. However, if you start to add more processing before the rendering, you will see what I mean: frames start to arrive late and get dropped. Again, the result really depends on the application, but if you really want to process and display the processed results in real-time, maybe you should think of using the GPU… but this is something for a future post.
Geppy](http://25.media.tumblr.com/31000623e7521ea0dd63d96a17e3d950/tumblr_m4391x5J8H1qae4fpo1_r1_1280.png)
![iOS image processing with the accelerate framework
Some time ago, my friend John Fox asked me how to reproduce a blurring image effect in the iOS SDK. Core Image on iOS does not provide that effect. You can find on the Internet a couple of solutions for iOS that perform the convolution as a matrix multiplication. That’s an OK approach, but it does not take advantage of the hardware acceleration.
I show here briefly how to apply a blurring filter to any image using the Accelerate framework and vImage.
Before iOS 5, image processing on iOS was hard. You were required to build your own set of basic tools to perform convolution, FFT, DCT, scaling, rotation, histogram equalization and so on. That was amazingly difficult and time consuming, especially because you had to keep in mind the hardware limitations of your device. Many developers opted to send an image to a remote server, process it and send the result back to the iPhone. Obviously, that solution was only suitable for special and limited cases. If you wanted real-time processing, then you had to really fight hard against the clock cycles.
Nowadays, iOS 5 allows you to do image processing operations directly on your device. You can achieve this using either the Core Image or the Accelerate framework. Both frameworks were already available on the Mac, but they only recently shipped with iOS.
Core Image offers a set of predefined image processing filters. Unfortunately, unlike the Mac version, the iOS version does not offer the possibility to build custom filters. Additionally, it does not provide a blurring filter (and that’s what my friend John was looking for). So, the alternative is to use the Accelerate framework. Let’s see how to do it.
vImage
The basic data structure used by the Accelerate framework to process images is the vImage buffer (vImage_Buffer), which I will simply call a vImage. It’s essentially a C structure containing four elements: a pointer to the image buffer (it can hold either the image intensity or the values of the red, green, blue and alpha channels), the image height, the image width and the number of bytes in each image row.
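To fix the idea, here is a portable stand-in for that structure written in plain C. PlanarBuffer and planarBufferCreate are hypothetical names used only for this sketch. The real type is vImage_Buffer, whose fields are data, height, width and rowBytes in that order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for vImage_Buffer: same four fields, same order.
   Note that height comes before width. */
typedef struct {
    void   *data;      /* pointer to the first pixel */
    size_t  height;    /* number of rows */
    size_t  width;     /* number of pixels per row */
    size_t  rowBytes;  /* bytes between the start of two consecutive rows */
} PlanarBuffer;

/* Allocate a zeroed 8-bit gray buffer with tightly packed rows.
   The caller owns the memory and must free(buffer.data). */
PlanarBuffer planarBufferCreate(size_t width, size_t height)
{
    PlanarBuffer buffer = { calloc(width * height, 1), height, width, width };
    assert(buffer.data != NULL);
    return buffer;
}
```

The order matters when you use a brace initializer, as I do in the rest of this post: swapping height and width compiles fine and silently produces a skewed image.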
Every vImage function takes one or more of these structures as input and/or output. So, before applying any processing to an image, you have to create a vImage. Core Graphics can help with that. The quickest way is to start from a CGImage:
CGImageRef image = [[UIImage imageNamed:@"input.png"] CGImage];
Now, you need to extract from this CGImage the pixels that will constitute the buffer of our vImage. You can do that in different ways. In this case, I am showing you the simplest approach, i.e. extracting the pixel intensities from the image (you can also apply a similar approach to colored images).
Here is how to do it:
// Compute the image size
size_t width = CGImageGetWidth(image);
size_t height = CGImageGetHeight(image);
// create a reference to a color space
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceGray();
// allocate some memory for the bitmap buffer
Pixel_8 *bitmap = (Pixel_8 *)malloc(width * height * sizeof(Pixel_8));
long bytesPerPixel = 1;
long bytesPerRow = bytesPerPixel * width;
long bitsPerComponent = 8;
// create a context
CGContextRef context = CGBitmapContextCreate(bitmap,
width,
height,
bitsPerComponent,
bytesPerRow,
colorSpace,
kCGImageAlphaNone);
// draw the image on the context
CGContextDrawImage(context, CGRectMake(0, 0, width, height), image);
// create the vImage buffer
const vImage_Buffer srcBuffer = { bitmap, height, width, bytesPerRow };
// release the memory
CGColorSpaceRelease(colorSpace);
CGContextRelease(context);
Now, the srcBuffer is our vImage and it’s ready to be processed.
The convolution can be performed using one of the convolution functions offered by the Accelerate framework. Since we are using a gray level image, I will use here the vImageConvolve_Planar8 function. Here, Planar8 means that the image is treated as a simple matrix with each element representing a pixel intensity.
Before the convolution, you need to prepare (allocate) some memory space for the final result:
size_t width = srcBuffer.width;
size_t height = srcBuffer.height;
size_t bytesPerRow = srcBuffer.rowBytes;
Pixel_8 *outData = (Pixel_8 *)malloc( bytesPerRow * height );
const vImage_Buffer dstBuffer = { outData, height, width, bytesPerRow };
Then, you need to create a blurring filter:
const int size = 5; // kernel side length (example value; vImage requires an odd size)
int16_t *kernel = (int16_t *)malloc(size * size * sizeof(int16_t));
int16_t *tempKernel = kernel;
for (int i = 0; i < (size*size); i++) {
*tempKernel++ = 1;
}
And finally, convolve the image and the filter:
vImageConvolve_Planar8(&srcBuffer,
&dstBuffer,
NULL,
0,
0,
kernel,
size,
size,
size*size,
0,
kvImageBackgroundColorFill);
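To see why the divisor is size*size, here is a plain-C sketch of the same box blur: every pixel in the window is weighted by 1, so dividing the sum by the number of kernel elements gives the average and keeps the overall brightness. The function boxBlur8 is my own illustration. The truncating division is an assumption, since vImage may round differently, and pixels outside the image take the background value, as with kvImageBackgroundColorFill:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Box blur on an 8-bit gray image with a size x size kernel of ones.
   Pixels outside the image are replaced by `background`.
   Assumption: the division truncates. */
void boxBlur8(const uint8_t *src, uint8_t *dst,
              size_t width, size_t height, size_t rowBytes,
              int size, uint8_t background)
{
    assert(size > 0 && size % 2 == 1);
    const long radius = size / 2;
    const long divisor = (long)size * size;

    for (long y = 0; y < (long)height; y++) {
        for (long x = 0; x < (long)width; x++) {
            long sum = 0;
            for (long dy = -radius; dy <= radius; dy++) {
                for (long dx = -radius; dx <= radius; dx++) {
                    const long yy = y + dy;
                    const long xx = x + dx;
                    const int inside = yy >= 0 && yy < (long)height &&
                                       xx >= 0 && xx < (long)width;
                    sum += inside ? src[(size_t)yy * rowBytes + (size_t)xx]
                                  : background;
                }
            }
            /* sum <= 255 * divisor, so the quotient always fits in 8 bits */
            dst[(size_t)y * rowBytes + (size_t)x] = (uint8_t)(sum / divisor);
        }
    }
}
```

With a black background (the 0 passed before the flags in the call above), a uniform image stays unchanged in the middle but gets darker along the borders, because part of the window falls outside the image.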
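I suppose that you need to display the result somewhere. So, you need to convert the resulting dstBuffer to a UIImage. The route mirrors the one used to build srcBuffer: create a gray bitmap context on top of outData, get a CGImage from it with CGBitmapContextCreateImage and wrap that in a UIImage with +imageWithCGImage:. Don’t forget to free the kernel and the pixel buffers when you are done.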
I am attaching here a simple project. I load an image and apply a blurring filter with different kernel sizes.
I hope my friend John (and you) enjoyed this post.](http://25.media.tumblr.com/40b508b6eed9c606deca14f4a58ff9a9/tumblr_lz0gys47mV1qae4fpo1_r1_1280.png)