In Apple’s latest Machine Learning Journal entry, the Siri Speech Recognition Team shares an overview of the work behind improving Siri’s understanding of names of regional points-of-interest by incorporating the user’s location.
Drawing in part on data from the U.S. Census Bureau, Apple has tuned Siri to better understand users based on where they are and which POIs they’re more likely to ask about.
Apple says machine learning on its own has helped improve automatic speech recognition for general language over the years, but “recognizing named entities, like small local businesses” has proved a performance bottleneck.
We decided to improve Siri’s ability to recognize names of local POIs by incorporating knowledge of the user’s location into our speech recognition system.
That’s done partly by relying on data collected by the U.S. Census Bureau:
We define geo regions based on the combined statistical areas (CSAs) from the U.S. Census Bureau. The CSAs consist of adjacent metropolitan areas that are economically and socially linked, as measured by commuting patterns. There are 169 CSAs covering 80% of the population of the United States. We build a dedicated Geo-LM for each CSA, with a single global Geo-LM to cover all areas not defined by a CSA.
To efficiently search the CSA for a user, we store a latitude and longitude lookup table derived from the rasterized cartographic boundary (or shapefile) provided by the U.S. Census Bureau. At runtime, the complexity of geolocation lookup is O(1).
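To make the quoted approach more concrete, here is a minimal sketch, not Apple’s actual implementation, of how a rasterized boundary file could back a constant-time latitude/longitude lookup that selects a per-CSA Geo-LM with a global fallback. The grid resolution, cell values, CSA IDs, and model file names below are all assumptions for illustration.

```python
# Minimal sketch of the geo-lookup idea described above (not Apple's code).
# Assumptions: a precomputed grid mapping quantized (lat, lon) cells to CSA IDs,
# and per-CSA language models keyed by those IDs; all names are hypothetical.

GRID_RESOLUTION = 0.05  # degrees per grid cell; an assumed rasterization step

# Offline step: rasterize the Census Bureau shapefile into a cell -> CSA table.
# Stubbed here with a couple of example cells.
CELL_TO_CSA = {
    (754, -2445): "CSA-488",   # a cell that falls inside the San Jose-San Francisco CSA
    (815, -1475): "CSA-408",   # a cell that falls inside the New York-Newark CSA
}

# One Geo-LM per CSA, plus a single global model covering everything else.
GEO_LMS = {
    "CSA-488": "geo_lm_488.model",
    "CSA-408": "geo_lm_408.model",
}
GLOBAL_GEO_LM = "geo_lm_global.model"


def cell_for(lat: float, lon: float) -> tuple[int, int]:
    """Quantize coordinates to a grid cell; a dict lookup on the result is O(1)."""
    return (int(lat // GRID_RESOLUTION), int(lon // GRID_RESOLUTION))


def select_geo_lm(lat: float, lon: float) -> str:
    """Pick the CSA-specific Geo-LM for the user's location, or the global fallback."""
    csa_id = CELL_TO_CSA.get(cell_for(lat, lon))
    return GEO_LMS.get(csa_id, GLOBAL_GEO_LM)


if __name__ == "__main__":
    # A Bay Area coordinate resolves to the San Jose-San Francisco CSA model;
    # a point in open ocean falls back to the global Geo-LM.
    print(select_geo_lm(37.70, -122.22))   # geo_lm_488.model
    print(select_geo_lm(0.0, -150.0))      # geo_lm_global.model
```

In this sketch the expensive work of testing points against CSA polygons happens once, offline, when the grid is built; at request time the lookup is a single dictionary access, which matches the O(1) runtime behavior the team describes.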
The entry goes on to detail the mechanics of identifying the correct points-of-interest from speech based on location. The Siri Speech Recognition Team says the approach is also language-independent, so it can be applied to locales beyond U.S. English.
Read the full entry on Apple’s Machine Learning Journal for a behind-the-scenes look at some of the expertise required for improving Siri.