After Apple decided to allow its researchers to publicly share their findings, its first academic paper was published at the end of last year. Now, that research has just won a “Best Paper Award” at a prestigious machine learning and computer vision conference.
The first academic paper published in connection with Apple was “Learning from Simulated and Unsupervised Images through Adversarial Training” by Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, and Russ Webb of Apple Inc. The full document can be found here.
This AI research was submitted to CVPR (Conference on Computer Vision & Pattern Recognition), which is regarded as one of the most distinguished and influential conferences in the field.
Keep in mind that this was Apple’s first published research. It was one of over 2,600 submissions to CVPR 2017, and it won a Best Paper Award (along with one other submission). Quite an impressive accomplishment!
Last month we saw Apple further its efforts to publish its research with the launch of its Apple Machine Learning Journal. We also saw three new journal posts that will be presented at Interspeech 2017 in Stockholm this week. One particularly interesting part is an audio sample comparison of Siri from iOS 9, 10, and 11 (found at the very bottom of Vol. 4).
If you’re curious about Apple’s award-winning research paper but don’t want to dive into the whole thing, here is the abstract:
With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulator’s output using unlabeled real data, while preserving the annotation information from the simulator. We develop a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors. We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts, and stabilize training: (i) a ‘self-regularization’ term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images. We show that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study. We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data.
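For readers who want a feel for item (iii) in the abstract, here is a rough Python sketch of what “updating the discriminator using a history of refined images” could look like. This is an illustration only, not Apple’s code: the class name `RefinedImageHistory`, the half-and-half split between fresh and stored images, and the random-replacement rule are all assumptions made for the example, and plain strings stand in for images.

```python
import random


class RefinedImageHistory:
    """Fixed-size pool of previously refined images (hypothetical helper)."""

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.images = []
        self.rng = random.Random(seed)

    def mix(self, fresh_batch):
        """Build a discriminator batch from fresh and stored refined images.

        Assumption: up to half of the batch is swapped for images drawn
        from the history, so the discriminator keeps seeing older outputs.
        """
        half = len(fresh_batch) // 2
        n_old = min(half, len(self.images))
        old = self.rng.sample(self.images, n_old)
        batch = list(fresh_batch[:len(fresh_batch) - n_old]) + old
        # Remember half of the fresh images for later batches.
        self._store(fresh_batch[:half])
        return batch

    def _store(self, images):
        for img in images:
            if len(self.images) < self.capacity:
                self.images.append(img)
            else:
                # Pool is full: overwrite a randomly chosen old entry.
                self.images[self.rng.randrange(self.capacity)] = img


history = RefinedImageHistory(capacity=4, seed=0)
first = history.mix(["a", "b", "c", "d"])   # history is empty, so all fresh
second = history.mix(["e", "f", "g", "h"])  # half fresh, half from history
```

The idea behind a buffer like this is that the discriminator is not trained only on the refiner’s latest outputs, which is one plausible way to get the training stability the abstract mentions.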