The training procedure will attempt to access all URLs used in previous tags.save calls for the passed UIDs. Make sure every URL from those prior tags.save calls remains publicly accessible until the training process has completed.
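Before kicking off training, it can be worth verifying that the saved URLs are still reachable. A minimal sketch of such a pre-flight check (this helper is not part of the Face.com API; it just issues an HTTP HEAD per URL, and the `fetch` parameter is an assumption added so the logic can be exercised without a network):

```python
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError


def find_unreachable_urls(urls, fetch=None):
    """Return the subset of `urls` that no longer answer with a 2xx status.

    `fetch` is an injectable callable (url -> bool) for testing; by default
    an HTTP HEAD request is sent and any error or non-2xx counts as
    unreachable.
    """
    def default_fetch(url):
        try:
            with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
                return 200 <= resp.status < 300
        except (URLError, HTTPError, ValueError):
            return False

    fetch = fetch or default_fetch
    return [u for u in urls if not fetch(u)]
```

Running this list against the URLs you passed to tags.save, and re-hosting anything it flags, avoids silent gaps in the training set.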
For good results, the minimum image size should keep the distance between the eyes larger than 16 pixels. So it depends not only on the image size but also on the face size. Of course, a larger image will usually contain bigger (pixel-wise) faces.
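If your pipeline already has eye coordinates (e.g. from a previous detection pass), the guideline above reduces to a distance check. A small sketch, where the function name and the 16-pixel default are taken from the guideline, not from any official SDK:

```python
import math


def eye_distance_ok(left_eye, right_eye, min_distance=16.0):
    """Check the guideline that the inter-eye distance should exceed
    ~16 pixels for reliable recognition. Eyes are (x, y) pixel tuples."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.hypot(dx, dy) > min_distance
```

Faces failing this check are unlikely to train or recognize well, regardless of the overall image resolution.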
When multiple faces appear in the same photo, they are assumed to belong to different people. This is done to reduce noise in the results, and the API will only return the most likely match among all the faces in the photo. Since your example has Gates in multiple shots, you only receive a single recognition. Currently the only way to work around this is to send individual face photos (via single or multiple calls).
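One way to build those individual face photos is to crop each detected face, with some margin, and submit the crops separately. A sketch of the crop-box computation (the helper and its `margin` parameter are assumptions for illustration; any image library can then perform the actual crop):

```python
def face_crop_box(center_x, center_y, face_w, face_h,
                  image_w, image_h, margin=0.5):
    """Compute a (left, top, right, bottom) crop rectangle around one
    detected face, padded by `margin` of the face size on each side and
    clamped to the image bounds, so each face can be sent as its own photo."""
    pad_w = face_w * margin
    pad_h = face_h * margin
    left = max(0, int(center_x - face_w / 2 - pad_w))
    top = max(0, int(center_y - face_h / 2 - pad_h))
    right = min(image_w, int(center_x + face_w / 2 + pad_w))
    bottom = min(image_h, int(center_y + face_h / 2 + pad_h))
    return left, top, right, bottom
```

Submitting one crop per call restores a per-face recognition result instead of the single most-likely match for the whole photo.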
For performance reasons, we only look for faces that are within about +/- 45 degrees of upright. The reason is that in almost all photos faces are upright, and it is not worth the effort to always search other orientations, both in terms of CPU utilization and in terms of false positives (you would get more errors on regular photos). If you really want to find faces in all orientations, you can run the detector 4 times on different rotations of the photo and combine the results. Another alternative, which may be applicable to your use case, is to use the JPEG EXIF orientation data generated by most cameras to rotate the image to the correct orientation before sending it over.
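If you take the run-it-4-times route, the main bookkeeping is mapping detections found in a rotated copy back into the original image's coordinate frame before merging. A sketch of that mapping for 90-degree steps, using integer pixel indices (the helper is illustrative, not part of the API):

```python
def map_back_to_original(point, k, rot_w, rot_h):
    """Map an (x, y) pixel found in an image that was rotated k * 90
    degrees clockwise back into the original image's coordinate frame.

    rot_w, rot_h are the dimensions of the rotated image the point was
    detected in. Each loop iteration undoes one 90-degree CW rotation.
    """
    x, y = point
    w = rot_w
    for _ in range(k % 4):
        # Undo one CW rotation: (x, y) in a w-wide frame came from
        # (y, w - 1 - x) in the previous frame, whose width was the
        # current height (dimensions swap on every 90-degree turn).
        x, y = y, w - 1 - x
        w = rot_h if w == rot_w else rot_w
    return x, y
```

Detections from the 0/90/180/270-degree passes can then be merged in a single coordinate frame, e.g. by suppressing near-duplicate boxes.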
You don't really have to call faces.train; it just gets things working faster. If you don't call train, face.com will pick up the saved tags within the next few minutes or so, depending on the load on the system, and will train/re-train automatically.