In each frame, the upper left shows the input; the upper right shows the geometric tracking, illustrated by an overlaid yellow face mesh; the bottom shows the recognized expression, visualized by training images.