Leveraging the Vision API: Vision is an Apple framework that lets you apply high-performance image analysis to detect, sort, and classify images and video, letting developers take their imagination to the next level.
What is Vision?
Vision is a framework that lets you apply high-performance image analysis and computer vision technology to images and videos. It can automatically identify faces, detect facial features, classify scenes, detect saliency, read barcodes and text, compute image similarity, classify style, and track objects, making it an incredible tool for sorting and filtering large numbers of image files and videos.
All Vision framework APIs use three constructs:
- Request: The request defines the type of thing you want to detect and a completion handler that will process the results. This is a subclass of VNRequest.
- Request handler: The request handler performs the request on the provided pixel buffer (think: image). This will be either a VNImageRequestHandler for single, one-off detections or a VNSequenceRequestHandler to process a series of images.
- Results: The results will be attached to the original request and passed to the completion handler defined when creating the request. They are subclasses of VNObservation.
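The three constructs above can be sketched together on a single still image. This is a minimal example, assuming you already have a CGImage in hand; the function name is hypothetical:

```swift
import Vision

// A minimal sketch of request, request handler, and results on one image.
func detectFaces(in cgImage: CGImage) {
    // 1. Request: what to detect, plus a completion handler for the results.
    let request = VNDetectFaceRectanglesRequest { request, error in
        guard error == nil else { return }
        // 3. Results: VNObservation subclasses attached to the request.
        let faces = request.results as? [VNFaceObservation] ?? []
        for face in faces {
            // boundingBox is in normalized coordinates (origin at bottom-left).
            print("Face at \(face.boundingBox)")
        }
    }

    // 2. Request handler: performs the request on the provided image.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Vision request failed: \(error)")
    }
}
```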
Getting Started with Vision
Now, let's get into the code.
The setupCamera() method uses AVFoundation to discover the built-in front camera. If a camera is found, we add it to an AVCaptureSession, which manages the stream from the source (front or back camera) and gives us access to the frame buffer. In setupPreview() we create a preview layer, which we use to render the feed from the camera, and add it to the current view controller.
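The camera setup described above might look roughly like this. This is a sketch under assumptions, not the author's exact code; the class name CameraViewController is hypothetical:

```swift
import AVFoundation
import UIKit

final class CameraViewController: UIViewController {
    private let session = AVCaptureSession()

    // Discover the built-in front camera and add it to the capture session.
    private func setupCamera() {
        let discovery = AVCaptureDevice.DiscoverySession(
            deviceTypes: [.builtInWideAngleCamera],
            mediaType: .video,
            position: .front)
        guard let camera = discovery.devices.first,
              let input = try? AVCaptureDeviceInput(device: camera),
              session.canAddInput(input) else { return }
        session.addInput(input)
    }

    // Create a preview layer that renders the camera feed
    // and attach it to this view controller's view.
    private func setupPreview() {
        let previewLayer = AVCaptureVideoPreviewLayer(session: session)
        previewLayer.videoGravity = .resizeAspectFill
        previewLayer.frame = view.bounds
        view.layer.addSublayer(previewLayer)
        session.startRunning()
    }
}
```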
We use the captured output as the input for our VNImageRequestHandler, the object that processes one or more image analysis requests pertaining to a single image. The captureOutput(_:didOutput:from:) delegate method is called every time a frame is received from the buffer. We can also get the face landmarks from the same request. The image request handler's orientation is set to .leftMirrored because we are using the front camera. Using the coordinates given by the detection, we can process the image as we please; in my case, I overlay different emojis on the detected faces.
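The frame-handling step described above can be sketched as follows. This is a hedged, self-contained example assuming a delegate object registered via setSampleBufferDelegate on an AVCaptureVideoDataOutput; the FaceDetector class name is hypothetical:

```swift
import AVFoundation
import Vision

final class FaceDetector: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Called every time a new frame arrives from the buffer.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        let request = VNDetectFaceLandmarksRequest { request, error in
            guard error == nil,
                  let faces = request.results as? [VNFaceObservation] else { return }
            for face in faces {
                // boundingBox is normalized; convert it to view coordinates
                // before overlaying an emoji on the face.
                print("Face detected at \(face.boundingBox)")
            }
        }

        // .leftMirrored because the feed comes from the front camera.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                            orientation: .leftMirrored,
                                            options: [:])
        try? handler.perform([request])
    }
}
```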
The full code can be found on my GitHub page.