Augmented reality & computer vision

tl;dr: Media capture, object detection and WebGL can be combined to hack a simple augmented reality experience. See my demo called we are all Saints.

To simplify things, augmented reality is based mainly on 2 technologies:

  • Sensors to track the position and orientation of the user field of view (FOV)
  • Computer vision to detect objects in the user FOV

Of course, modern AR headsets like HoloLens combine both to create the best experience possible. I explored using common devices with web technologies to create a simple AR experience. I decided to go for computer vision applied to media capture.

Screenshot of the demo

Object detection

There are a few different libraries to detect objects in an image. Most of them seem to be based on OpenCV, ported to JavaScript via emscripten. I didn't spend too much time looking for a library and quickly settled on js-objectdetect. It's hand-written (as opposed to converted via emscripten), which makes it easy to read, understand and debug if needed. It can detect different types of objects but I used human faces here.

Once set up properly, js-objectdetect accepts a video element as an input, so I just pass it the one that displays the camera feed I got from getUserMedia.

It returns the coordinates in pixels of the faces detected (left, top, width and height).

Recreating a virtual 3D space

The next step is to place the faces detected in the image in a virtual 3D space by estimating their respective positions. I used A-Frame to create the 3D world because this framework is easy to use.

Positioning an element on the x and y axes is really easy. We need to convert the position to the (-1, 1) range (more about that later). For example, a point centred in the image will have both its x and y values set to 0. Knowing the position in pixels and the size of the video, the values are easy to get (note that the y axis points in opposite directions on the web and in WebGL). For the x axis, half the width needs to be added so that the element is horizontally centred on the face.
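
Here's a minimal sketch of that conversion. The function name and the shape of the face object ({ left, top, width, height }) are assumptions made for illustration; they are not necessarily the exact format js-objectdetect returns, nor the code used in the demo.

```javascript
// Hypothetical helper: convert a detected face rectangle (in pixels) to
// x and y values in the (-1, 1) range.
// `face` is assumed to look like { left, top, width, height }.
function faceToNormalizedXY(face, videoWidth, videoHeight) {
  // Add half the width so the point is horizontally centred on the face.
  const centreX = face.left + face.width / 2;
  // Map [0, videoWidth] to [-1, 1].
  const x = (centreX / videoWidth) * 2 - 1;
  // Map [0, videoHeight] to [1, -1]: pixel rows grow downwards, while the
  // y axis points upwards in WebGL, so the direction is flipped.
  const y = 1 - (face.top / videoHeight) * 2;
  return { x, y };
}

// Usage: a face horizontally centred in a 640x480 video, with its top edge
// at mid-height.
const centred = faceToNormalizedXY(
  { left: 270, top: 240, width: 100, height: 100 }, 640, 480);
console.log(centred);
```

The y value here is taken from the top edge of the face rather than its centre, since the virtual element ends up sitting above the face anyway.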

The z axis is a bit trickier. It needs to be estimated and calibrated. I used the height value. I noticed that, once detected, my face takes at most 80% of the image height when I stand about 50cm from the camera. As I step back to about 2m, my face gets smaller and takes about 30% of the image height.

I used these distances as the values for the camera frustum near and far clipping planes (near and far attributes on the <a-camera> element).

Then I only need to convert and clamp the height of the face detected between the % values of image height that I determined above. Once converted to the (-1, 1) range, they'll vary proportionally between 50cm and 2m. That gives me a good enough approximation of where my face is located with respect to the camera.

I also used the detected height to position the virtual element a few apparent centimetres above the faces.
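
As a rough sketch, the clamp-and-convert step could look like the following. The function name and the default ratios (0.3 and 0.8, taken from the calibration described above) are assumptions for illustration. It also assumes that -1 corresponds to the near clipping plane and 1 to the far one, and it uses a simple linear mapping, which is only an approximation of the real relationship between apparent size and distance.

```javascript
// Hypothetical helper: estimate a z value in the (-1, 1) range from the
// detected face height, relative to the video height.
// Assumption: -1 is the near clipping plane and 1 is the far clipping plane.
function faceHeightToNormalizedZ(faceHeight, videoHeight,
                                 minRatio = 0.3, maxRatio = 0.8) {
  // Fraction of the image height taken by the face.
  const ratio = faceHeight / videoHeight;
  // Clamp to the calibrated range so z never leaves (-1, 1).
  const clamped = Math.min(Math.max(ratio, minRatio), maxRatio);
  // 0 when the face is smallest (far away), 1 when it is largest (close).
  const t = (clamped - minRatio) / (maxRatio - minRatio);
  // Big face -> near plane (-1), small face -> far plane (1).
  return 1 - 2 * t;
}

// Usage: a face taking 55% of a 480px-high video sits roughly mid-range.
console.log(faceHeightToNormalizedZ(264, 480));
```

Clamping keeps the estimate inside the calibrated range, so a face that fills the whole frame is simply treated as being on the near plane.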


A-Frame uses Three.js under the hood. That's what I'm using to perform computations.

Now that I've got the x, y and z position in the (-1, 1) range I need to unproject this vector along the active camera. That is not as complicated as it sounds:

const pos = new THREE.Vector3(x, y, z).unproject(camera);

This returns a vector corresponding to the position in space of the object given the current FOV (i.e. camera).

Finally, I can set the position of the A-Frame element that I want to superimpose over the faces in the virtual space:

element.setAttribute('position', pos);


This experiment wasn't as hard as it sounded at first and the result is quite convincing. I learnt a lot about projection and unprojection in the process. As usual the code is on GitHub as all-saints-ar because we are all saints!

Experimenting with VR

I just finished a short series of experiments with virtual reality called VR Diary. Here's a brief summary of why and how.

VR, but what for?

Creating content for virtual reality is easy. A-Frame is probably the easiest way to get started. It's entirely free and open-source and only requires a basic knowledge of HTML. On other projects, I used the Google VR SDK for Unity, and the graphical UI of Unity allows you to create content without typing a single line of code.

So now the real question is what are we going to use VR for? VR is just a medium and in itself has no value.

This is what I explored in this project. VR is an immersive and engaging platform so I decided to use it to share personal experiences with the readers.

It took a direction towards philosophical questioning and reality challenging that I hadn't expected. But overall it's been successful from a personal standpoint: the experiments are consistent (except maybe the first one) and I've been able to go through the 6 days.

Working as a hobby

An essential aspect of this experiment was being on holiday with a lot of free time. I dedicated about 2 or 3 hours a day to actively working on the VR experiment and writing the post. The rest of the day was spent thinking passively about what to do next.

This liberty was essential to generate new ideas to experiment with. Creative people know you can't just sit at a desk and expect to get new ideas. It's when you go out for a walk, sit in the grass, or spend some time in a zoo or a museum that you release most of your creative potential.

I thoroughly enjoy this lifestyle of working actively for a few hours in the morning and passively for the rest of the day. Maybe that's how life should be: dedicating most of your time to creativity and self-expression and working only for as long as necessary. This is something I want to explore more in the future.

Mozilla IOT Meetup London October 2016

On Wednesday, 12 October (2 days ago), the first Mozilla IOT London meetup took place at the Mozilla Common Space near London Bridge.

A bit of history

The meetup was rebranded from the previous Firefox OS meetup group. A lot happened at Mozilla this year. The Firefox OS project was turned into a community project and the Connected Devices team was created.

Now that things have become more stable, we decided it was a good time to resume a series of meetups.

This was the first session so we're still learning and experimenting, but we're planning on doing monthly or bimonthly meetups. This first session was not recorded, but the plan is to record or stream the next ones.

The first session

Meetups are and should always be about meeting people, so we want to emphasise the social aspect and make this a place where ideas are shared and discussions happen naturally.

Francisco speaking about the creation of the Connected Devices team.

For this first session, we had a nice lineup of speakers:

In addition to the amazing speakers above, I want to thank Mandy for the coordination, Dietrich and the devrel team for sponsoring the meetup, and all the attendees.

What's next?

We're super happy to receive feedback from the participants. If you happen to be in London during one of our next sessions, make sure you attend and let us know what you would like this meetup to be.

The talented Becky Jones delivering an inspiring talk.

See you soon!