Jarvis
Project, 2023
Timeline
2 Weeks, Dec 2023
Tools
OpenCV
Python
Mediapipe
Flask
Next.js
TypeScript
Three.js
Cloudflare R2
Overview
An interactive, gesture-controlled 3D hologram that uses voice commands and hand movements to manipulate objects in a 3D projection.

We built Jarvis for our design project course, SE101, at Waterloo. Inspired by Iron Man, it combines computer vision and optical illusions into a gesture-controlled 3D hologram. Building it was a fun exercise in turning a sci-fi idea into a real prototype very quickly, and interacting with genuinely real-looking 3D holograms using just hand movements was super cool.

Building Jarvis

We used a Flask backend that runs a WebSocket server and an OpenCV thread in parallel, connected to a Next.js frontend that renders with Three.js. Here's the architecture diagram:

[Figure: Jarvis architecture diagram]
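The parallel setup described above (a camera thread producing results while the socket side sends them) can be sketched with plain `threading`. This is a minimal illustration, not the project's code: Flask, the WebSocket library, OpenCV, and Mediapipe are all left out, and `LatestValue`, `capture_loop`, and `broadcast_once` are hypothetical names. The capture thread publishes fake results so the example runs on its own.

```python
import json
import threading
import time


class LatestValue:
    """Thread-safe holder for the most recent hand-tracking result (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value


def capture_loop(latest, stop, fake_frames):
    # Stand-in for the OpenCV thread: the real app would read camera frames
    # and run Mediapipe here; this version publishes fake results instead.
    for i in range(fake_frames):
        if stop.is_set():
            break
        latest.set({"frame": i, "x": 0.1 * i, "y": 0.0, "z": 0.5, "gesture": "open"})
        time.sleep(0.001)


def broadcast_once(latest, send):
    # Stand-in for the WebSocket side: serialize the newest result and send it.
    # `send` is any callable taking a string (e.g. a socket's send method).
    value = latest.get()
    if value is not None:
        send(json.dumps(value))
    return value


if __name__ == "__main__":
    latest = LatestValue()
    stop = threading.Event()
    worker = threading.Thread(target=capture_loop, args=(latest, stop, 50), daemon=True)
    worker.start()
    worker.join()
    sent = []
    broadcast_once(latest, sent.append)
    print(sent)
```

Keeping only the newest result, rather than a queue of every frame, is one way to stop latency from building up when the client reads more slowly than the camera produces frames; whether Jarvis did this is not stated.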

Backend

Our goal was accurate 3D hand tracking and gesture recognition. Stereo imaging and depth-sensing cameras are common ways to get depth, but given our project constraints we took a simpler route.

We computed the distance between two hand landmarks (indices 0 and 5, the wrist and the base of the index finger) whose separation stays relatively constant across gestures. By mapping that distance to depth on a linear scale, we achieved surprisingly accurate depth perception.
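A minimal sketch of this landmark-distance depth estimate follows, assuming landmarks arrive as Mediapipe-style normalized `(x, y)` pairs (scaled to [0, 1] by image width and height). The two-point calibration in `fit_linear_depth` and all numbers in the usage are hypothetical; the source only says a linear scale was used.

```python
import math


def landmark_distance_px(lm_a, lm_b, width, height):
    """Pixel distance between two normalized (x, y) landmarks.

    Scaling back to pixels first keeps the measurement independent of the
    image's aspect ratio.
    """
    dx = (lm_a[0] - lm_b[0]) * width
    dy = (lm_a[1] - lm_b[1]) * height
    return math.hypot(dx, dy)


def fit_linear_depth(d_near, z_near, d_far, z_far):
    """Fit z = slope * d + intercept from two calibration samples.

    Each sample pairs a measured wrist-to-index-base pixel distance with a
    known hand depth. This calibration scheme is an assumption.
    """
    slope = (z_far - z_near) / (d_far - d_near)
    intercept = z_near - slope * d_near
    return slope, intercept


def estimate_depth(landmarks, width, height, slope, intercept):
    # Landmark 0 is the wrist; landmark 5 is the base of the index finger.
    d = landmark_distance_px(landmarks[0], landmarks[5], width, height)
    return slope * d + intercept


# Hypothetical usage: 200 px apart at 30 cm, 100 px apart at 60 cm.
slope, intercept = fit_linear_depth(200.0, 30.0, 100.0, 60.0)
```

Under a pinhole camera model the apparent size of the hand falls off roughly as 1/depth, so a straight-line fit like this is a local approximation that works best over a limited working range.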

An important intermediate step was correcting camera distortion, which we did by calibrating the camera using a chessboard pattern. OpenCV provides useful functions for this, such as findChessboardCorners, calibrateCamera, getOptimalNewCameraMatrix, and undistort.
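To show what correcting distortion involves, here is a small pure-Python sketch of the radial part of the distortion model whose coefficients `calibrateCamera` estimates. It deliberately does not call OpenCV. Only the radial terms `k1` and `k2` are modeled (tangential terms and `k3` are omitted), the point is inverted by fixed-point iteration, and all coefficient and camera-matrix values are made up for illustration.

```python
def distort(x, y, k1, k2):
    """Apply a radial-only lens distortion to normalized image coordinates."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration.

    Start from the distorted point and repeatedly divide by the distortion
    factor evaluated at the current estimate; this converges quickly for
    moderate distortion.
    """
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y


def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2):
    """Undistort one pixel using camera-matrix values (fx, fy, cx, cy)."""
    # Pixel -> normalized coordinates, undistort, then back to pixels.
    xd, yd = (u - cx) / fx, (v - cy) / fy
    x, y = undistort(xd, yd, k1, k2)
    return x * fx + cx, y * fy + cy


# Hypothetical values: a 640x480 camera with mild barrel distortion.
print(undistort_pixel(600.0, 400.0, 600.0, 600.0, 320.0, 240.0, -0.25, 0.05))
```

In practice OpenCV handles all of this, including the tangential terms, once the chessboard calibration has produced the camera matrix and distortion coefficients.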

The full data flow continuously captures camera frames, extracts 3D hand coordinates and recognized gestures from Mediapipe models in real time, and sends the results over our WebSocket connection. All of this runs in a multithreaded Flask server.
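The last step of that flow, turning one frame's tracking output into a message for the socket, might look like the sketch below. The JSON field names are a hypothetical schema, the wrist is used as the hand's anchor point by assumption, and the gesture string is whatever label the recognizer returns (`"Open_Palm"` is one of Mediapipe's built-in gesture names). Image coordinates have y pointing down, while Three.js scenes are y-up by default, so the sketch recenters and flips y.

```python
import json


def to_scene_coords(x_norm, y_norm, depth, flip_x=True):
    """Map normalized image coords (origin top-left, y down) to a centered, y-up frame.

    flip_x assumes an unmirrored camera facing the user, so moving your hand
    to your right moves the object to the right on screen.
    """
    x = x_norm - 0.5
    y = 0.5 - y_norm
    if flip_x:
        x = -x
    return x, y, depth


def build_message(landmarks, depth, gesture):
    """Pack one frame's hand state into a JSON string (hypothetical schema)."""
    # Anchor the hand at the wrist (landmark 0).
    x, y, z = to_scene_coords(landmarks[0][0], landmarks[0][1], depth)
    return json.dumps({
        "x": round(x, 4),
        "y": round(y, 4),
        "z": round(z, 4),
        "gesture": gesture,
    })


# Hypothetical frame: wrist in the lower-left quadrant of the image.
print(build_message([(0.25, 0.75)] + [(0.0, 0.0)] * 20, 45.0, "Open_Palm"))
```

Rounding keeps messages small at a high frame rate; the frontend would then scale these values into its own scene units.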

Frontend

Our Next.js app maps the position and gesture data from our backend onto 3D models. It uses Three.js with various pmndrs libraries, and 3D models from Sketchfab. We also used the Web Speech API and ElevenLabs for voice control and text-to-speech.

Projecting this onto the hologram hardware worked because of the black background: black pixels emit almost no light, so nothing reflects off the glass except the UI elements and 3D models, which appear to float in mid-air.

Hardware

The key to achieving our hologram was the Pepper's Ghost illusion. By reflecting a downward-facing monitor off a piece of glass angled at 45 degrees, we created the effect of a floating holographic image!

Results

Check out the final demo video:

The final working hologram was amazing to see in person, and the project overall was super fun and rewarding.

I gave a talk on OpenCV Live, hosted with the CEO of OpenCV, about the process of building the project. You should give it a listen if you're interested in learning more!

Ishaan Dey
Updated 1/1/2025