How to build a real-time face-intelligence camera in iOS

Demo of the iOS app with facial intelligence

This app leverages Apple's Vision framework, which lets you apply high-performance image analysis to detect, sort, and classify images and video, letting developers take their imagination to the next level.

What is Vision?

Vision is a framework that lets you apply high-performance image analysis and computer vision technology to images and videos. It can automatically identify faces, detect facial features, classify scenes, perform saliency detection, detect barcodes and text, measure image similarity, classify styles, and track objects, making it an incredible tool for sorting and filtering large numbers of image files and videos.

All Vision framework APIs use three constructs:

  1. Request: The request defines the type of thing you want to detect and a completion handler that will process the results. This is a subclass of VNRequest.
  2. Request handler: The request handler performs the request on the provided pixel buffer (think: image). This will be either a VNImageRequestHandler for single, one-off detections or a VNSequenceRequestHandler to process a series of images.
  3. Results: The results are attached to the original request and passed to the completion handler defined when creating the request. They are subclasses of VNObservation.
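
Here is a minimal sketch of the three constructs working together on a single still image. The helper name detectFaces(in:) is hypothetical, not part of Vision:

```swift
import UIKit
import Vision

// Hypothetical helper: runs a one-off face-detection request on a UIImage.
func detectFaces(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }

    // 1. Request: defines what to detect and a completion handler for the results.
    let request = VNDetectFaceRectanglesRequest { request, error in
        // 3. Results: VNObservation subclasses attached to the original request.
        guard let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            print("Face at \(face.boundingBox)") // normalized coordinates
        }
    }

    // 2. Request handler: performs the request on a single image.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Vision request failed: \(error)")
    }
}
```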

Getting started with Vision: now, let's get into the code.

The setupCamera() method uses AVFoundation to discover the built-in front camera. If a camera is found, we add it to our AVCaptureSession, which manages the stream coming from the camera source (front or back) and gives us the frame buffer. In setupPreview() we create a preview layer that renders the feed from the camera and add it to the current view controller's view.
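
A minimal sketch of what those two methods might look like; the method names mirror the article, while the class name FaceCameraViewController is an assumption and the exact implementation may differ:

```swift
import AVFoundation
import UIKit

// Hypothetical view controller hosting the camera session and preview.
class FaceCameraViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {

    private let session = AVCaptureSession()
    private let videoOutput = AVCaptureVideoDataOutput()
    private var previewLayer: AVCaptureVideoPreviewLayer!

    func setupCamera() {
        // Discover the built-in front camera.
        let discovery = AVCaptureDevice.DiscoverySession(
            deviceTypes: [.builtInWideAngleCamera],
            mediaType: .video,
            position: .front)
        guard let camera = discovery.devices.first,
              let input = try? AVCaptureDeviceInput(device: camera) else { return }

        // Add the camera input and a video data output to the session.
        if session.canAddInput(input) { session.addInput(input) }
        videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "video.queue"))
        if session.canAddOutput(videoOutput) { session.addOutput(videoOutput) }
    }

    func setupPreview() {
        // Render the camera feed in a preview layer on this view controller.
        previewLayer = AVCaptureVideoPreviewLayer(session: session)
        previewLayer.videoGravity = .resizeAspectFill
        previewLayer.frame = view.bounds
        view.layer.addSublayer(previewLayer)
        session.startRunning()
    }
}
```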

We use the captured output to feed our VNImageRequestHandler, the object that processes one or more image analysis requests on a single image. The captureOutput(_:didOutput:from:) method is called every time a frame is received from the buffer, and from each frame we can also extract face landmarks. The image request handler's orientation is set to .leftMirrored because we are using the front camera. Using the coordinates given by the detection, we can process the image as we please; in my case, I overlay different emojis on the detected faces.
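
Continuing the sketch above, the delegate callback could look like this, assuming the hypothetical FaceCameraViewController from the previous snippet:

```swift
import AVFoundation
import Vision

extension FaceCameraViewController {
    // Called for every frame the video data output delivers.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Detect face landmarks in this frame.
        let request = VNDetectFaceLandmarksRequest { request, _ in
            guard let faces = request.results as? [VNFaceObservation] else { return }
            for face in faces {
                // boundingBox is normalized; convert to view coordinates
                // before drawing an overlay (e.g. an emoji) on the face.
                print("Face: \(face.boundingBox)")
            }
        }

        // .leftMirrored because the frames come from the front camera.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                            orientation: .leftMirrored,
                                            options: [:])
        try? handler.perform([request])
    }
}
```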

The full code can be found on my GitHub page.
