The Design of SIRI

(This article has been sitting on my desktop for ages; I hadn’t found the right place to post it. Now I have.)

When the iPhone 4S launched in October, I expected a complete makeover for the new iPhone. Many techies had been following various sites to glean as much as possible about its look before the unveiling, and when Apple proudly displayed its new technology at the launch event, there were mixed feelings about the iPhone 4S.

A look at Apple’s page on Siri, with a tagline that reads “Your wish is its command”, is enough to pique anyone’s curiosity.

Apple proudly says: “Siri on iPhone 4S lets you use your voice to send messages, schedule meetings, place phone calls, and more. Ask Siri to do things just by talking the way you talk. Siri understands what you say, knows what you mean, and even talks back. Siri is so easy to use and does so much, you’ll keep finding more and more ways to use it.”

Many of us already know about the record-breaking sales of more than 4 million units that Apple reported for the iPhone 4S in its first weekend.

So, besides the sales and statistics, how does Siri work?

Was there a computer that was on par with Siri’s standards?

The answer to the second question is a qualified ‘Yes’, and no, it’s not ‘Chitti’ from the movie Robo. In early 2011, IBM introduced Watson to the general public: an AI computer system capable of answering questions asked in natural language, and one that IBM had been working on for several years. IBM describes Watson as “an application of advanced natural language processing, information retrieval, knowledge representation and reasoning, and machine learning technologies to the field of open domain question answering”. The system is built on IBM’s DeepQA technology for hypothesis generation, massive evidence gathering, analysis and scoring. For more on Watson and DeepQA, see the AI Magazine Fall 2010 issue on Question Answering, in which Watson was featured. Back to Siri.

Just like Watson, Siri understands natural language and relies on information retrieval (IR) technology. To work, it has to reach out to the web for any information that lies outside the iPhone’s own memory. For example, if you want Siri to text someone, it uses the information already on the phone, whereas if you want to find a restaurant, Siri uses IR to fetch data from the web. A simple form of IR that we use every day is the ordinary web search engine.

Siri is based on CALO, short for “Cognitive Assistant that Learns and Organizes”, and DARPA’s PAL (Perceptive/Personalized Assistant that Learns). If we dig deeper, we can see that every iPhone 4S’s Siri is different: Siri learns from its user, learns how they interact with the virtual world, and gives results suited to that particular user.

Siri also uses active ontologies alongside CALO and PAL. When one of my friends posted a video in which he talked to Siri in a thick Indian accent, Siri understood him perfectly and responded with accurate results; I was amazed. The ontologies used in designing such an AI service are truly remarkable. Siri is a speech interpreter: once a request is placed, it works out the intent using Active Ontologies, analyzes the request, and then calls the relevant partner API to gather suggestions. For example, a request such as “suggest me some good Indian restaurant for dinner” is interpreted using Siri’s domain ontology for restaurants, which comprises a particular set of rules such as domain-specific vocabulary, rules of interaction, and reviews from other APIs; it might even take our current location into account to show relevant results.

The best part is that Siri is proactive: it takes control of the request and questions the user back with seemingly open-ended questions that traverse its set of ontological rules, and it keeps doing so until it has the exact objects it needs to make the relevant API calls. However, Siri is not a search engine. It does use information retrieval technology, but that field is far too vast to be reduced to the basic search engine. Siri is an answer engine, meant to make interactions with a system inherently logical, and hence more personal and meaningful. That is one of the reasons why each and every iPhone 4S’s Siri is different.
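
To make the flow described above a little more concrete, here is a toy sketch in Python of the same idea: pick a domain from its vocabulary, fill in what is already known (from the words spoken and from context such as location), ask follow-up questions until nothing is missing, and only then call a partner API. This is not how Apple actually implements Siri. Every name here (Domain, DOMAINS, KNOWN_CUISINES, handle, call_partner_api) is made up for illustration, the "ontology" is a hand-written word list, and the partner API is a stand-in that just returns a string.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """A hypothetical, hand-written stand-in for a domain ontology."""
    name: str
    vocabulary: set        # words that hint at this domain
    required_slots: dict   # slot name -> follow-up question if it is missing


# Assumed example domains; a real system would have far richer rules.
DOMAINS = [
    Domain(
        name="restaurant",
        vocabulary={"restaurant", "dinner", "lunch", "eat", "food"},
        required_slots={
            "cuisine": "What kind of food would you like?",
            "location": "Where should I look?",
        },
    ),
    Domain(
        name="weather",
        vocabulary={"weather", "rain", "forecast", "temperature"},
        required_slots={"location": "Which city?"},
    ),
]

KNOWN_CUISINES = {"indian", "italian", "chinese", "thai"}


def tokenize(utterance):
    """Lowercase the request and strip simple punctuation from each word."""
    return [w.strip(".,?!").lower() for w in utterance.split()]


def detect_domain(utterance):
    """Pick the domain whose vocabulary overlaps most with the request."""
    words = set(tokenize(utterance))
    best = max(DOMAINS, key=lambda d: len(d.vocabulary & words))
    return best if best.vocabulary & words else None


def extract_slots(utterance, domain, context):
    """Fill slots from the spoken words and from device context."""
    slots = {}
    if "cuisine" in domain.required_slots:
        for word in tokenize(utterance):
            if word in KNOWN_CUISINES:
                slots["cuisine"] = word
    if "location" in domain.required_slots and "location" in context:
        slots["location"] = context["location"]
    return slots


def next_question(domain, slots):
    """Return (slot, question) for the first missing slot, or None."""
    for slot, question in domain.required_slots.items():
        if slot not in slots:
            return slot, question
    return None


def call_partner_api(domain, slots):
    """Stand-in for a real partner API call; it only describes the query."""
    return f"{domain.name} search with {sorted(slots.items())}"


def handle(utterance, context, ask):
    """Interpret a request, asking follow-ups via ask() until it is complete."""
    domain = detect_domain(utterance)
    if domain is None:
        return "Sorry, I can't help with that yet."
    slots = extract_slots(utterance, domain, context)
    pending = next_question(domain, slots)
    while pending is not None:
        slot, question = pending
        slots[slot] = ask(question)  # the assistant questions the user back
        pending = next_question(domain, slots)
    return call_partner_api(domain, slots)
```

With the restaurant request from earlier and a known location, handle("suggest me some good Indian restaurant for dinner", {"location": "Chennai"}, ask=input) has everything it needs and asks nothing. A vaguer request like "Where can I eat tonight?" with no location makes it ask for the cuisine first and then the place before it calls the stand-in API.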

For now, the ontologies in the innards of Siri cover just restaurants, weather, sports, and travel, which have been integrated with partner APIs like Yelp or Zagat. These help users with regular tasks, nothing fancy. This is the first version of Siri; some may call it a beta. Yet, given Apple’s history of technologically advanced products, this technology, which Apple acquired last year, holds great promise for expanding into various domains and encompassing a wide variety of APIs.

© 2012 Ajan Kancharla