Third-party integration into Siri remains at the top of many of our TUAW wish lists. Imagine being able to say “Play something from Queen on Spotify” or even “I want to hear a local police scanner.” And Siri replying, “OK, you have two apps that have local police scanners. Do you want ScannerPro or Wunder Radio?”
So why doesn’t Siri do that?
Well, first of all, there are no third party APIs. Second, it’s a challenging problem to implement. And third, it could open Siri to a lot of potential exploitation (think of an app that opens every time you say “Wake me up tomorrow at 7:00 AM” instead of deferring to the built-in timer).
That’s why we sat down and brainstormed how Apple might accomplish all of this safely using technologies already in-use. What follows is our thought experiment of how Apple could add these APIs into the iOS ecosystem and really allow Siri to explode with possibility.
Ace Object Schema. For anyone who thinks I just sneezed while typing that, please let me explain. Ace objects are the assistant command requests used by the underlying iOS frameworks to represent user utterances and their semantic meanings. They offer a context for describing what users have said and what the OS needs to do in response.
The APIs for these are private, but they seem to consist of property dictionaries, similar to property lists used throughout all of Apple’s OS X and iOS systems. It wouldn’t be hard to declare support for Ace Object commands in an application Info.plist property lists, just as developers now specify what kinds of file types they open and what kind of URL addresses they respond to.
Practicality. If you think of Siri support as a kind of extended URL scheme with a larger vocabulary and some grammatical elements, developers could tie into standard command structures (with full strings files localizations of course, for international deployment).
Leaving the request style of these commands to Apple would limit the kinds of requests initially rolled out to devs but it would maintain the highly flexible way Siri users can communicate with the technology.
There’s no reason for devs to have to think of a hundred ways to say “Please play” and “I want to hear”. Let Apple handle that — just as it handled the initial multitasking rollout with a limited task set — and let devs tie onto it, with the understanding that these items will grow over time and that devs could eventually supply specific localized phonemes that are critical to their tasks.
Handling. Each kind of command would be delineated by reverse domain notation, e.g. com.spotify.client.play-request. When matched to a user utterance, iOS could then launch the app and include the Ace dictionary as a standard payload. Developers are already well-acquainted in responding to external launches through local and remote notifications, through URL requests, through “Open file in” events, and more. Building onto these lets Siri activations use the same APIs and approaches that devs already handle.
Security. I’d imagine that Apple should treat Siri enhancement requests from apps the same way it currently works with in-app purchases. Developers would submit individual requests for each identified command (again, e.g. com.spotify.client.play-request) along with a description of the feature, the Siri specifications — XML or plist, and so forth. The commands could then be tested directly by review team members or be automated for compliance checks.
In-App Use. What all of this adds up to is an iterative way to grow third party involvement into the standard Siri voice assistant using current technologies. But that’s not the end of the story. The suggestions you just read through leave a big hole in the Siri/Dictation story: in-app use of the technology.
For that, hopefully Apple will allow more flexible tie-ins to dictation features outside of the standard keyboard, with app-specific parsing of any results. Imagine a button with the Siri microphone that developers could add directly, no keyboard involved.
I presented a simple dictation-only demonstration of those possibilities late last year. To do so, I had to hack my way into the commands that started and stopped dictation. It would be incredibly easy for Apple to expand that kind of interaction option so that spoken in-app commands were not limited to text-field and text-view entry, but could be used in place of touch driven interaction as well.