This is a very broad question, and a hard task.
I think the easiest approach is to extract one frame per second if the source is a video stream.
Then use OpenCV to detect faces in each frame.
Once you have the faces, feed them to a neural network for recognition.
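The first two steps could be sketched roughly like this in Python. This is only a minimal sketch, assuming the opencv-python package and its bundled Haar cascade for frontal faces. The names `sampling_step` and `detect_faces_per_second` are made up for the example, and the 25 FPS fallback is an arbitrary assumption for streams that do not report a frame rate. The recognition network itself is left out.

```python
def sampling_step(fps, every_seconds=1.0):
    """Number of frames to advance so that about one frame per interval is kept."""
    return max(1, int(round(fps * every_seconds)))


def detect_faces_per_second(video_path):
    """Yield (frame_index, face_crops) for roughly one frame per second."""
    # Imported here so that sampling_step works without OpenCV installed.
    import cv2

    # Haar cascade file shipped with opencv-python (assumed to be available).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    capture = cv2.VideoCapture(video_path)
    # Some streams report 0 FPS; 25.0 is an assumed fallback, not a measured value.
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0
    step = sampling_step(fps)
    index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # end of stream or read error
            if index % step == 0:
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
                # Crop each detected face out of the original colour frame.
                crops = [frame[y:y + h, x:x + w] for (x, y, w, h) in boxes]
                yield index, crops
            index += 1
    finally:
        capture.release()
```

Each crop returned by `detect_faces_per_second("video.mp4")` is a NumPy image that you would then resize and pass to whatever recognition network you train. The path is a placeholder.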
Some links on face recognition with deep learning:
https://aboveintelligent.com/face-recognition-with-keras-and-opencv-2baf2a83b799
https://github.com/rajathkumarmp/FaceRecog-Keras/blob/master/faceRecog.ipynb