DETAILED ACTION
This Office Action is in response to the correspondence filed by the applicant on 7/18/2022.
The Amendment filed on 7/18/2022 has been entered.  
Claims 1, 3-8, 10, 12-13, 15-16, and 18-19 have been amended by Applicant.
Claims 1-20 remain pending in the application of which Claims 1, 8, and 15 are independent.  
Applicant’s amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection that were necessitated by the amendments to the Claims.   

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The Information Disclosure Statements (IDS) filed on 8/7/2019 have been accepted and considered in this Office Action and are in compliance with the provisions of 37 CFR 1.97.

Examiner Notes
Claims 8-14 recite, “A computer program product comprising: one or more computer-readable storage media …”  The specification describes the computer-readable storage medium, which specifically excludes transitory propagating signals ([0059]).  Thus, the claims are not directed to non-statutory subject matter.


Response to Arguments
Regarding the 112(b) rejections, Applicant has amended the claims, and the rejections are now withdrawn.

Regarding the 103 rejections, Applicant’s arguments on pages 10-13 of the Remarks (7/18/2022) have been fully considered and are moot upon further consideration in view of the new ground(s) of rejection made under AIA 35 U.S.C. 103 as being unpatentable over PATEL (US 2014/0122059 A1), and further in view of DEVRIES (US 10,832,668 B1).
The Applicant asserts that PATEL fails to teach “wherein the local database stores a set of user queries and commands, a set of intentions, and a set of responses, wherein each user query and command of the set of user queries and commands has at least one corresponding intention of the set of intentions, and wherein each intention of the set of intentions has at least one corresponding response of the set of responses.”  In reply, PATEL teaches interpreting a user’s commands/requests locally and executing the commands/requests locally when the commands/requests are found in the database (i.e., a device lexicon cache 112).  The Examiner reviewed the specification for a definition of “intent”; however, the term is not defined in the specification.  The commands/requests of PATEL have their corresponding user intents (Par 77 – “In another embodiment, an action associated with a cache entry may be presented for user confirmation prior to performing the action to ensure that the user intended to execute the identified action.”; Par 84 – “In response to the displayed cache entries portions, a user may select a particular cache entry portion that mostly corresponds to the media query the user intended to request.”).  Under the broadest reasonable interpretation, “play my favorite show” (see PATEL Par 77) is a query that corresponds to “a device action” to play the user’s favorite show (i.e., the user’s intention to control the device to play a show), whereas “get me the movie jaws” (see Par 81) is a query that corresponds to a “media search query” to search for a movie titled Jaws (i.e., the user’s intention to search for a movie titled Jaws instead of playing the movie Jaws).  Based on the user’s requests/commands, the associated intents are found (i.e., what machine-understandable commands/requests are to be executed), and the associated responses are generated (e.g., the search results are displayed, etc.).
Thus, PATEL’s database stores the commands/requests, their associated intents, and the responses.  
However, for clarity of the rejections, the Examiner also relies on DEVRIES, which explicitly discloses the limitations at issue.  DEVRIES clearly discloses that a device database stores the commands, intents, and responses.  Please see the rejections below for more details.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 5-6, 8, 12-13, 15, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over PATEL (US 2014/0122059 A1), and further in view of DEVRIES (US 10,832,668 B1).

REGARDING CLAIM 1, PATEL discloses a computer-implemented method comprising: 
receiving, by one or more processors of a computing device with a local database (PATEL Fig. 1A – “Media Device 110; Lexicon Cache 112; Natural Language Processing Cache 114”), an audio input (PATEL Fig. 2 – “Receiving voice input 202”; Fig. 1A – “Voice Input Device 102” and “Media Device 110”; Par 41 – “In an embodiment, voice input device 102 generally represents one or more microphones or other voice recognition devices that can be used to receive voice input from one or more users. In an embodiment, a microphone may be a device separate from media device 110, integrated as part of a media device 110, or part of another device (e.g., a remote control, a phone, a tablet, a keyboard, etc.) that is communicatively coupled with the media device 110.”), wherein the local database stores a set of user queries and commands (PATEL Par 77 – “In an embodiment, the textual representation of voice input received from a user is compared against a set of reserved input text strings or sampled voice data stored in a device lexicon cache 112 in Step 208. A device lexicon cache is a repository of sampled voice data and/or words and word phrases that are mapped to one or more device actions, media search queries, or other commands related to applications running on a media device 110.”),  a set of intentions (PATEL Par 77 – “In an embodiment, the textual representation of voice input received from a user is compared against a set of reserved input text strings or sampled voice data stored in a device lexicon cache 112 in Step 208. A device lexicon cache is a repository of sampled voice data and/or words and word phrases that are mapped to one or more device actions, media search queries, or other commands related to applications running on a media device 110. For example, entries in a device lexicon cache 112 may include frequently used commands and phrases including “pause,” “live TV,” “volume up,” “play my favorite show,” etc. 
In an embodiment, if a lexicon cache 112 includes a cache entry corresponding to the textual representation of voice input received from a user, then the action or media search query stored in association with the cache entry may be processed automatically by media device 110.”), and a set of responses (PATEL Par 44 – “Media search service 108 may additionally include one or more Internet search engines. In an embodiment, some search results may be cached on media device 110 from data from the media search service 108 so that searches may be performed at the client when a connection to media search service 108 is unavailable.”; Par 34 – “The natural language processing cache may store one or more mappings between text signatures and one or more device actions and/or media search queries. In response to detecting a natural language processing cache entry corresponding to the signature, the media device may perform the one or more associated actions and/or send an associated media search query to a media search service.”), wherein each user query and command of the set of user queries and commands has at least one corresponding intention of the set of intentions (PATEL Par 77 – “In an embodiment, the textual representation of voice input received from a user is compared against a set of reserved input text strings or sampled voice data stored in a device lexicon cache 112 in Step 208. A device lexicon cache is a repository of sampled voice data and/or words and word phrases that are mapped to one or more device actions, media search queries, or other commands related to applications running on a media device 110. For example, entries in a device lexicon cache 112 may include frequently used commands and phrases including “pause,” “live TV,” “volume up,” “play my favorite show,” etc. 
In an embodiment, if a lexicon cache 112 includes a cache entry corresponding to the textual representation of voice input received from a user, then the action or media search query stored in association with the cache entry may be processed automatically by media device 110.”; Par 81 – “For example, a user may specify voice input corresponding to the word phrase “get me the movie jaws” in order to search for a movie titled “Jaws.” Natural language processing techniques may be used to recognize in the context of a request for media content that the words “get,” “me,” “the,” and “movie” are extraneous in the example user's command for the purposes of a media search query and may translate the user's command into a modified textual representation including only the word “jaws.””; In other words, the sampled voice data and/or words and word phrases correspond to “actions,” “media search queries”, or “other commands related to applications.” For example, “play my favorite show” is a query that corresponds to “a device action” to play the user’s favorite show (i.e., the user’s intention to control the device to play a show), whereas “get me the movie jaws” is a query that corresponds to a “media search query” to search for a movie titled Jaws (i.e., the user’s intention to search for a movie titled Jaws instead of playing the movie Jaws).), and wherein each intention of the set of intentions has at least one corresponding response of the set of responses (PATEL Fig. 2 – “Performing an action/search based on result text 214”; Par 77 – “In an embodiment, if a lexicon cache 112 includes a cache entry corresponding to the textual representation of voice input received from a user, then the action or media search query stored in association with the cache entry may be processed automatically by media device 110.”; Par 78 – “For example, a user may have a favorite television show and may desire a mapping in the device lexicon cache 112 so that in response to the user speaking the command “play my favorite show,” the media device causes the most recent recording of the favorite television show to be played.”; Par 81 – “For example, a user may specify voice input corresponding to the word phrase “get me the movie jaws” in order to search for a movie titled “Jaws.””); 
transcribing, by the one or more processors, the audio input to text (PATEL Par 71 – “In an embodiment, a speech-to-text service 104 translates voice input data into a textual representation of the voice input data.”; Par 73 – “In an embodiment, the textual representation of voice input received from speech-to-text service 104 may formatted as plain text, formatted to indicate a combination of one or more of phonemes, chronemes and minimal pairs associated with the voice input, or any other representation format suitable for further textual analysis.”); 
comparing, by one or more processors, the text to the set of user queries and commands using a phonetic algorithm (PATEL Par 77 – “In an embodiment, the textual representation of voice input received from a user is compared against a set of reserved input text strings or sampled voice data stored in a device lexicon cache 112 in Step 208. A device lexicon cache is a repository of sampled voice data and/or words and word phrases that are mapped to one or more device actions, media search queries, or other commands related to applications running on a media device 110. For example, entries in a device lexicon cache 112 may include frequently used commands and phrases including “pause,” “live TV,” “volume up,” “play my favorite show,” etc. In an embodiment, if a lexicon cache 112 includes a cache entry corresponding to the textual representation of voice input received from a user, then the action or media search query stored in association with the cache entry may be processed automatically by media device 110.”); 
determining, by the one or more processors, whether a user query or command of the set of user queries and commands meets a pre-defined [threshold of similarity to the text] condition (PATEL Fig. 2 – “Translated Text Exists in Lexicon Cache? 208->Yes or No”; Par 32 – “In response to detecting a device lexicon cache entry corresponding to the textual representation of the voice input, the media device may cause the one or more actions associated with the cache entry to be performed.”); 
responsive to determining that the user query or command meets the pre-defined [threshold of similarity] condition (PATEL Fig. 2 – “Translated Text Exists in Lexicon Cache? 208->Yes or No”; Par 32 – “In response to detecting a device lexicon cache entry corresponding to the textual representation of the voice input, the media device may cause the one or more actions associated with the cache entry to be performed.”), identifying, by the one or more processors, an intention of the set of intentions that corresponds to the user query or command (PATEL Par 32 – “In an embodiment, the textual representation of the voice input is used to search a device lexicon cache storing one or more mappings between textual representations of media device commands and one or more device actions. For example, device actions may include actions performed locally on a media device such as changing the channel, scheduling a media content recording, listing recorded content, etc., or the device actions may be requests transmitted to other services.”; Par 77 – “A device lexicon cache is a repository of sampled voice data and/or words and word phrases that are mapped to one or more device actions, media search queries, or other commands related to applications running on a media device 110. For example, entries in a device lexicon cache 112 may include frequently used commands and phrases including “pause,” “live TV,” “volume up,” “play my favorite show,” etc. In an embodiment, if a lexicon cache 112 includes a cache entry corresponding to the textual representation of voice input received from a user, then the action or media search query stored in association with the cache entry may be processed automatically by media device 110. In another embodiment, an action associated with a cache entry may be presented for user confirmation prior to performing the action to ensure that the user intended to execute the identified action.”); 
identifying, by the one or more processors, a response of the set of responses (PATEL Fig. 2 – “Performing an action/search based on result text 214”; Par 77 – “In an embodiment, if a lexicon cache 112 includes a cache entry corresponding to the textual representation of voice input received from a user, then the action or media search query stored in association with the cache entry may be processed automatically by media device 110.”; Par 87 – “In an embodiment, search queries generated by media device 110 may be used to search for media content item results and associated information including, but not limited to, media content program titles, media content scheduling information, media device application content, or tags associated with media content.”; Par 90 – “Search results may include one or more content item listings and any other additional data associated with media content represented by the content item listings. For example, the search results may include information associated with one or more content item listings including title information, synopsis, scheduling information, actor or actress names, etc.”; Par 44 – “Media search service 108 may additionally include one or more Internet search engines. In an embodiment, some search results may be cached on media device 110 from data from the media search service 108 so that searches may be performed at the client when a connection to media search service 108 is unavailable.”) that corresponds to the identified intention (PATEL Par 77 – “In an embodiment, if a lexicon cache 112 includes a cache entry corresponding to the textual representation of voice input received from a user, then the action or media search query stored in association with the cache entry may be processed automatically by media device 110.”); and 
outputting, by the one or more processors, the response audibly (PATEL Par 95 – “In an embodiment, after media device 110 applies any filters, weighting, or other modifications to the list of content item results received from media search service 108, the content items results may be displayed to a user in Step 306.”; Par 39 – “The media device 110 may present media content by playing the media content (e.g., audio and/or visual media content), displaying the media content (e.g., still images), or by any other suitable means.”).
PATEL does not explicitly teach the [square-bracketed] limitations, and teaches the underlined features instead.

DEVRIES discloses a computer-implemented method comprising: 
receiving, by one or more processors of a computing device with a local database (DEVRIES Fig. 9), an audio input (DEVRIES Fig. 10A – “Receive input audio corresponding to an utterance 140”), wherein the local database stores a set of user queries and commands (DEVRIES Fig. 9 – “Speech Recognition 902”; Col 20:53-21:6 – “The device 110 may further include a speech recognition component 902. …. the speech recognition component 902 may be configured only with respect to the frequently input commands or other commands to be processed by the local device 110. … the device 110 may store speech recognition data such as acoustic models and language models specific to the commands (such as frequently input commands) to be handled by the local device 110 and/or specific to user(s) associated with the local device 110.”), a set of intentions (DEVRIES Fig. 9 – “Natural Language 904”; Col 21:7-28 – “The device 110 may additionally include a natural language component 904. … the natural language component 904 may be configured only with respect to the frequently input commands or other commands to be processed by the local device 110. The natural language component 904 may include recognizers specific to the domains associated with the frequently input commands. Each recognizer of the natural language component 904 may include an NER component and an IC (intent classification) component. … the device 110 may include a natural language storage and/or an entity library storage that include speech recognition data such as device domains, domain grammars, domain intents, and gazetteers specific to the commands (such as frequently input commands) to be handled by the local device 110 and/or specific to user(s) associated with the local device 110.”), and a set of responses (DEVRIES Fig. 9 – “Text-to-Speech 906”; Col 21:29-55 – “The device 110 may additionally include a text-to-speech (TTS) component 906. 
… the TTS component 906 may be configured only with respect to the frequently input commands or other commands to be processed by the local device 110. … the device 110 may include a voice unit storage and TTS storage that include speech recognition data such as voice inventories/unit databases, parametric synthesis configuration data, or other TTS data specific to the commands (such as frequently input commands) to be handled by the local device 110 and/or specific to user(s) associated with the local device 110. The TTS component 906 may also include a selection of pre-stored TTS output (such as already selected units, already synthesized speech, or the like) that correspond to canned responses to commands that may be processed locally. For example, if a local device is configured to handle utterances related to a local thermostat, the TTS component 906 may include pre-synthesized speech along the lines of “your thermostat has been changed,” thus allowing the local device to output a TTS acknowledgement of the locally-handled utterance without needing to communicate with the server for TTS purposes.”), wherein each user query and command of the set of user queries and commands has at least one corresponding intention of the set of intentions (DEVRIES Col 11:14-29 – “The IC component 464 may communicate with a database 278 of words linked to intents. For example, a music intent database may link words and phrases such as “quiet,” “volume off,” and “mute” to a “mute” intent. The IC component 464 identifies potential intents by comparing words in the textual interpretation to the words and phrases in an intents database 278 associated with the domain that is associated with the recognizer 463 implementing the IC component 464.”), and wherein each intention of the set of intentions has at least one corresponding response of the set of responses (DEVRIES Col 21:29-55 – “For example, if a local device is configured to handle utterances related to a local thermostat, the TTS component 906 may include pre-synthesized speech along the lines of “your thermostat has been changed,” thus allowing the local device to output a TTS acknowledgement of the locally-handled utterance without needing to communicate with the server for TTS purposes.”); 
determining, by the one or more processors, whether a user query or command of the set of user queries and commands meets a pre-defined [threshold of similarity to the text] (DEVRIES Fig. 10A – “Perform speech recognition processing on the input audio data using speech recognition processing data specific to frequently input commands 1002 -> ASR score threshold satisfied? 1004”; Col 22:17-37 – “The device 110 performs (1002) speech recognition processing on the input audio data using speech recognition processing data specific to frequently input commands. In performing speech recognition processing, the device 110 assigns a score to each determined textual interpretation potentially corresponding to the utterance. The device 110 determines (1004) whether a generated speech recognition score satisfies (e.g., meets or exceeds) a threshold. The threshold represents a system confidence that textual interpretations associated with scores satisfying the threshold in fact correspond to the input utterance.”; Col 5:13-38– “The speech recognition component 250 interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models. For example, the speech recognition component 250 may compare the audio data 211 with models for sounds (e.g., subword units or phonemes) and sequences of sounds to identify words that match the sequence of sounds spoken in the utterance represented in the audio data 211.”; Col 7:56-8:10 – “The confidence score may be based on a number of factors including, for example, a similarity of the sound in the utterance to models for language sounds (e.g., an acoustic model 353 stored in the speech recognition models storage 352), and a likelihood that a particular word that matches the sound would be included in the sentence at the specific location (e.g., using a language model 354 stored in the speech recognition models storage 352).”); 
responsive to determining that the user query or command meets the pre-defined [threshold of similarity] (DEVRIES Fig. 10A – “Perform speech recognition processing on the input audio data using speech recognition processing data specific to frequently input commands 1002 -> ASR score threshold satisfied? 1004”), identifying, by the one or more processors, an intention of the set of intentions that corresponds to the user query or command (DEVRIES Fig. 10A – “Perform natural language processing on input text data using natural language processing data specific to frequently input commands to determine a command 1020”; Col 14:64-15:10 – “Following final ranking, the natural language component 260 may output natural language output data 585. The natural language component 260 may be sent to the orchestrator component 230, which sends the natural language output data 585 to an appropriate application 290 (e.g., one configured to execute a command based on the textual interpretation represented in the natural language output data 585). The natural language output data 585 may include an indicator of the intent of the textual interpretation along with data associated with the intent, for example an indication that the intent is <PlayMusic> and the music to be played is “Adele.” Multiple instances of natural language output data (e.g., 585 a-585 n) may be output for a given set of text data input into the natural language component 260.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of PATEL to include a threshold, as taught by DEVRIES.
One of ordinary skill would have been motivated to include a threshold in order to provide top scoring results (DEVRIES Col 5).
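
For illustration only, the combination discussed for Claim 1 (a local store mapping queries/commands to intentions and intentions to responses, consulted only when a similarity threshold is met) can be sketched as below. This is a minimal sketch and is not taken from PATEL, DEVRIES, or the application. The table contents, the names `QUERY_TO_INTENT`, `INTENT_TO_RESPONSE`, `best_match`, and `respond_locally`, and the 0.8 threshold are all hypothetical. The standard-library `difflib.SequenceMatcher` is used as a plain string-similarity stand-in; it is not a phonetic algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical local database: query/command -> intention -> response.
QUERY_TO_INTENT = {
    "volume up": "raise_volume",
    "pause": "pause_playback",
    "play my favorite show": "play_favorite",
}
INTENT_TO_RESPONSE = {
    "raise_volume": "Volume raised.",
    "pause_playback": "Playback paused.",
    "play_favorite": "Playing your favorite show.",
}

SIMILARITY_THRESHOLD = 0.8  # assumed value, chosen only for this example


def best_match(text):
    """Return (stored query, similarity score) for the closest stored query."""
    scored = [
        (query, SequenceMatcher(None, text.lower(), query).ratio())
        for query in QUERY_TO_INTENT
    ]
    return max(scored, key=lambda pair: pair[1])


def respond_locally(text):
    """Return the stored response if the best match meets the threshold, else None."""
    query, score = best_match(text)
    if score < SIMILARITY_THRESHOLD:
        return None  # no stored query is similar enough
    intent = QUERY_TO_INTENT[query]
    return INTENT_TO_RESPONSE[intent]
```

In this sketch, a transcription that exactly matches a stored command resolves to its stored response, while an unrelated transcription yields `None`, which corresponds to the "threshold not met" branch.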

REGARDING CLAIM 5, PATEL in view of DEVRIES discloses the computer-implemented method of claim 1, further comprising: 
responsive to determining that the user query or response does not meet the pre-defined [threshold of similarity] condition (PATEL Fig. 2 – “Translated text exists in lexicon cache? 208 -> No … ->”), sending, by the one or more processors, the transcription of the audio input [the audio input] to a natural language processing engine (PATEL Fig. 2 – “Transmitting translated text natural language processing service 212”; Par 85 – “In an embodiment, the textual representation of voice input received by a media device 110 may be processed using one or more natural language processing techniques in Step 212. In general, using natural language processing techniques to process the textual representation of voice input involves parsing the textual representations into word or word phrase tokens and categorizing the parsed tokens into one or more natural language component categories.”; Par 86 – “In an embodiment, the textual representation of the voice input is transmitted to a natural language processing service 106.”); 
receiving, by one or more processors, a transcription of the audio input from the natural language processing engine in text form (PATEL Fig. 2 – “Transmitting translated text natural language processing service 212”; Par 85 – “In general, using natural language processing techniques to process the textual representation of voice input involves parsing the textual representations into word or word phrase tokens and categorizing the parsed tokens into one or more natural language component categories. For example, in an embodiment, natural language processing may include categorizing the text into one or more natural language components including noun and noun phrases, verb and verb phrases, pronouns, prepositions, etc. In an embodiment, based on the parsed and categorized representation of the textual representation of voice input, particular words or word phrases may be filtered out in order to formulate a more focused media content search query.”; Par 86 – “Natural language processing service 106 processes the textual representation using one or more of the natural language processing techniques described above and returns a version of the textual representation that may include one or more modifications. In an embodiment, the modified textual representation and any other metadata associated with the natural language processing process may be stored in natural language processing cache 114 in association with the input textual representation.”); and 
determining, by the one or more processors, whether there is a corresponding intention (PATEL Par 32 – “In an embodiment, the textual representation of the voice input is used to search a device lexicon cache storing one or more mappings between textual representations of media device commands and one or more device actions. For example, device actions may include actions performed locally on a media device such as changing the channel, scheduling a media content recording, listing recorded content, etc., or the device actions may be requests transmitted to other services.”; Par 77 – “A device lexicon cache is a repository of sampled voice data and/or words and word phrases that are mapped to one or more device actions, media search queries, or other commands related to applications running on a media device 110. For example, entries in a device lexicon cache 112 may include frequently used commands and phrases including “pause,” “live TV,” “volume up,” “play my favorite show,” etc. In an embodiment, if a lexicon cache 112 includes a cache entry corresponding to the textual representation of voice input received from a user, then the action or media search query stored in association with the cache entry may be processed automatically by media device 110. 
In another embodiment, an action associated with a cache entry may be presented for user confirmation prior to performing the action to ensure that the user intended to execute the identified action.”) and a corresponding response (PATEL Par 87 – “In an embodiment, search queries generated by media device 110 may be used to search for media content item results and associated information including, but not limited to, media content program titles, media content scheduling information, media device application content, or tags associated with media content.”; Par 90 – “Search results may include one or more content item listings and any other additional data associated with media content represented by the content item listings. For example, the search results may include information associated with one or more content item listings including title information, synopsis, scheduling information, actor or actress names, etc.”) in the local database to transcription (PATEL Par 44 – “Media search service 108 may additionally include one or more Internet search engines. In an embodiment, some search results may be cached on media device 110 from data from the media search service 108 so that searches may be performed at the client when a connection to media search service 108 is unavailable.”).
PATEL does not explicitly teach the [square-bracketed] limitations, and teaches the underlined features instead.

DEVRIES discloses the [square-bracketed] limitations.  DEVRIES discloses a method/system for speech processing comprising: responsive to determining that the user query or response does not meet the pre-defined [threshold of similarity] (DEVRIES Figs. 10A-10B – “ASR score threshold satisfied 1004? … NLU score threshold satisfied 1021? …”; Col 22:39-58 – “If the device 110 determines one or more of the generated speech recognition scores satisfy the threshold (1004: Yes) (e.g., representing that the input utterance corresponds to a frequently input command), the device 110 performs (1020) natural language processing on the input text data using natural language processing specific to the frequently input commands to determine a command corresponding to the input utterance.”; Col 23:11-17 – “If the device 110 determines none of the generated speech recognition scores satisfy the ASR threshold (1004: No) (e.g., representing that the input utterance does not correspond to a frequently input command), or none of the generated NLU scores satisfy the NLU threshold (1021: No) the device 110 (referring to FIG. 10B) sends (1006) the input audio data to the server(s) 120.”), sending, by the one or more processors, [the audio input] to a natural language processing engine (DEVRIES Col 23:11-17 – “If the device 110 determines none of the generated speech recognition scores satisfy the ASR threshold (1004: No) (e.g., representing that the input utterance does not correspond to a frequently input command), or none of the generated NLU scores satisfy the NLU threshold (1021: No) the device 110 (referring to FIG. 10B) sends (1006) the input audio data to the server(s) 120.”; Col 23:18-34 – “The server(s) 120 performs (1008) speech recognition processing on the input audio data using speech recognition data associated with all system inputtable commands to generate input text data. The server(s) 120 performs (1010) natural language processing on the input text data to determine a command corresponding to the utterance.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of PATEL to include sending the audio input to a natural language processing engine, as taught by DEVRIES.
One of ordinary skill would have been motivated to include sending the audio input to a natural language processing engine, in order to accurately interpret a user utterance.
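For illustration only, the combination relied upon above (a local lookup as in PATEL, with a fallback to a natural language processing engine when no stored entry meets the threshold, as in DEVRIES) can be sketched as follows. This is a minimal sketch and not the implementation of either reference or of the claimed invention: the names `LOCAL_DB`, `handle_utterance`, and `remote_nlp`, the threshold value 0.85, and the use of Python's `difflib.SequenceMatcher` as a stand-in similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical local database: stored query/command -> (intention, response).
LOCAL_DB = {
    "pause": ("device_action:pause", "Pausing playback."),
    "volume up": ("device_action:volume_up", "Turning the volume up."),
}


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the metric of any cited reference."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def handle_utterance(transcription, audio, remote_nlp, threshold=0.85):
    """Answer locally when a stored entry is similar enough, else fall back."""
    best = max(LOCAL_DB, key=lambda entry: similarity(transcription, entry))
    if similarity(transcription, best) >= threshold:
        intent, response = LOCAL_DB[best]
        return ("local", intent, response)
    # No stored entry meets the threshold: send the audio input onward.
    intent, response = remote_nlp(audio)
    return ("remote", intent, response)


def fake_remote_nlp(audio):
    """Hypothetical placeholder for a remote natural language processing engine."""
    return ("search_query", "Here is what I found.")
```

Under these assumptions, `handle_utterance("pause", b"", fake_remote_nlp)` is resolved from the local entries, while an utterance with no close stored entry is forwarded together with its audio input.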


REGARDING CLAIM 6, PATEL in view of DEVRIES discloses the computer-implemented method of claim 5, further comprising: 
responsive to determining that the transcription has a corresponding response of the set of responses (PATEL Par 87 – “In an embodiment, search queries generated by media device 110 may be used to search for media content item results and associated information including, but not limited to, media content program titles, media content scheduling information, media device application content, or tags associated with media content.”; Par 90 – “Search results may include one or more content item listings and any other additional data associated with media content represented by the content item listings. For example, the search results may include information associated with one or more content item listings including title information, synopsis, scheduling information, actor or actress names, etc.”) in the local database (PATEL Par 44 – “Media search service 108 may additionally include one or more Internet search engines. In an embodiment, some search results may be cached on media device 110 from data from the media search service 108 so that searches may be performed at the client when a connection to media search service 108 is unavailable.”), outputting, by the one or more processors, the corresponding response audibly (PATEL Par 95 – “In an embodiment, after media device 110 applies any filters, weighting, or other modifications to the list of content item results received from media search service 108, the content items results may be displayed to a user in Step 306.”; Par 39 – “The media device 110 may present media content by playing the media content (e.g., audio and/or visual media content), displaying the media content (e.g., still images), or by any other suitable means.”).


REGARDING CLAIM 8, PATEL in view of DEVRIES discloses a computer program product comprising: one or more computer-readable storage media and program instructions stored on the one or more computer-readable storage media, the program instructions (PATEL Par 102 – “Computer system 400 also includes a main memory 406, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404. Main memory 406 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404.”) comprising: 
program instructions to perform the steps of Claim 1; thus, it is rejected under the same rationale.

CLAIM 12 is the computer program product similar to the method of Claim 5; thus, it is rejected under the same rationale.

CLAIM 13 is the computer program product similar to the method of Claim 6; thus, it is rejected under the same rationale.


REGARDING CLAIM 15, PATEL in view of DEVRIES discloses a computer system comprising: one or more computer processors; one or more computer-readable storage media; program instructions stored on the computer-readable storage media for execution by at least one of the one or more processors, the program instructions (PATEL Par 102 – “Computer system 400 also includes a main memory 406, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404. Main memory 406 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404.”) comprising: 
program instructions to perform the steps of Claim 1; thus, it is rejected under the same rationale.

CLAIM 18 is the computer system similar to the method of Claim 5; thus, it is rejected under the same rationale.

CLAIM 19 is the computer system similar to the method of Claim 6; thus, it is rejected under the same rationale.



Claims 2-4, 9-11, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over PATEL (US 2014/0122059 A1) in view of DEVRIES (US 10,832,668 B1), and further in view of PRYAKHIN (US 2012/0215806 A1).

REGARDING CLAIM 2, PATEL in view of DEVRIES discloses the computer-implemented method of claim 1.
PATEL in view of DEVRIES does not explicitly teach a Jaro-Winkler distance algorithm.
PRYAKHIN further discloses wherein the phonetic algorithm is Jaro-Winkler distance algorithm (PRYAKHIN Par 12 – “The distance metric for objects which are strings may be selected from any one of a plurality of string metrics available. For example, the distance metric may be based on a Levenshtein distance. The distance function may also be any one of a Damerau-Levenshtein distance, a Jaro-Winkler distance, a Hamming distance, a distance determined in accordance with the Soundex distance metric, a Needleman-Wunsch distance, a Gotoh distance, a Smith-Waterman-Gotoh distance, a Lp distance with p≧1 or any other string metric which complies with the postulates of reflexivity, symmetry and triangle inequality, or any other distance metric”).
It is always desirable to find the most appropriate algorithm. There are a finite number of phonetic distance algorithms, and it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to try a Jaro-Winkler distance algorithm and incorporate it into the method/system of PATEL, since there are a finite number of identified, predictable potential solutions (i.e., types of distance algorithms in PRYAKHIN Par 12) to the recognized need (measuring a phonetic distance between two strings) and one of ordinary skill in the art could have pursued the known potential solutions with a reasonable expectation of success.


REGARDING CLAIM 3, PATEL in view of DEVRIES and PRYAKHIN discloses the computer-implemented method of claim 2, wherein comparing the text to the set of user queries and commands using the phonetic algorithm (PATEL Par 77 – “In an embodiment, the textual representation of voice input received from a user is compared against a set of reserved input text strings or sampled voice data stored in a device lexicon cache 112 in Step 208. A device lexicon cache is a repository of sampled voice data and/or words and word phrases that are mapped to one or more device actions, media search queries, or other commands related to applications running on a media device 110. For example, entries in a device lexicon cache 112 may include frequently used commands and phrases including “pause,” “live TV,” “volume up,” “play my favorite show,” etc. In an embodiment, if a lexicon cache 112 includes a cache entry corresponding to the textual representation of voice input received from a user, then the action or media search query stored in association with the cache entry may be processed automatically by media device 110.”) comprises: 
calculating, by the one or more processors, a set of Jaro-Winkler distances (PRYAKHIN Par 12 – “The distance metric for objects which are strings may be selected from any one of a plurality of string metrics available. For example, the distance metric may be based on a Levenshtein distance. The distance function may also be any one of a Damerau-Levenshtein distance, a Jaro-Winkler distance, a Hamming distance, a distance determined in accordance with the Soundex distance metric, a Needleman-Wunsch distance, a Gotoh distance, a Smith-Waterman-Gotoh distance, a Lp distance with p≧1 or any other string metric which complies with the postulates of reflexivity, symmetry and triangle inequality, or any other distance metric”), wherein the set of Jaro-Winkler distances comprises a Jaro-Winkler distance between the text and each user query or command of the set of user queries and commands (PRYAKHIN Par 82 – “To perform the similarity search, distances between the query object and objects in the index structure are determined according to the distance metric. At step 37, the most relevant objects found in the search are output. The most relevant objects may be the k>1 nearest neighbours of the query object, i.e. the k objects having the smallest distances from the query object among the indexed objects, the distances being determined according to the distance metric. The most relevant objects may alternatively be all objects in the index structure which have a distance from the query object which is less than a predetermined threshold.”).
It is always desirable to find the most appropriate algorithm. There are a finite number of phonetic distance algorithms, and it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to try a Jaro-Winkler distance algorithm and incorporate it into the method/system of PATEL, since there are a finite number of identified, predictable potential solutions (i.e., types of distance algorithms in PRYAKHIN Par 12) to the recognized need (measuring a phonetic distance between two strings) and one of ordinary skill in the art could have pursued the known potential solutions with a reasonable expectation of success.

REGARDING CLAIM 4, PATEL in view of DEVRIES and PRYAKHIN discloses the computer-implemented method of claim 3, wherein determining whether the user query or command meets the pre-defined threshold of similarity to the text (PRYAKHIN Fig. 5 – “Receive input 34-> Conversion to phoneme string 35 -> Perform similarity search 36 …”; Par 82 – “The index structure may be configured as described with reference to FIGS. 2-4 above. To perform the similarity search, distances between the query object and objects in the index structure are determined according to the distance metric. At step 37, the most relevant objects found in the search are output. The most relevant objects may be the k>1 nearest neighbours of the query object, i.e. the k objects having the smallest distances from the query object among the indexed objects, the distances being determined according to the distance metric. The most relevant objects may alternatively be all objects in the index structure which have a distance from the query object which is less than a predetermined threshold.”; Par 9 – “Examples include the search for objects having a dissimilarity, measured as distance according to the distance metric, which is less than a fixed threshold, or the search for objects having the smallest dissimilarity, measured as distance according to the distance metric, from the query object among the indexed objects.”) comprises: 
determining, by the one or more processors, whether at least one of the set of Jaro-Winkler distances meets the pre-defined threshold of similarity (PRYAKHIN Par 12 – “The distance metric for objects which are strings may be selected from any one of a plurality of string metrics available. For example, the distance metric may be based on a Levenshtein distance. The distance function may also be any one of a Damerau-Levenshtein distance, a Jaro-Winkler distance, a Hamming distance, a distance determined in accordance with the Soundex distance metric, a Needleman-Wunsch distance, a Gotoh distance, a Smith-Waterman-Gotoh distance, a Lp distance with p≧1 or any other string metric which complies with the postulates of reflexivity, symmetry and triangle inequality, or any other distance metric”; Par 82 – “To perform the similarity search, distances between the query object and objects in the index structure are determined according to the distance metric. At step 37, the most relevant objects found in the search are output. The most relevant objects may be the k>1 nearest neighbours of the query object, i.e. the k objects having the smallest distances from the query object among the indexed objects, the distances being determined according to the distance metric. The most relevant objects may alternatively be all objects in the index structure which have a distance from the query object which is less than a predetermined threshold.”).
It is always desirable to find the most appropriate algorithm. There are a finite number of phonetic distance algorithms, and it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to try a Jaro-Winkler distance algorithm and incorporate it into the method/system of PATEL, since there are a finite number of identified, predictable potential solutions (i.e., types of distance algorithms in PRYAKHIN Par 12) to the recognized need (measuring a phonetic distance between two strings) and one of ordinary skill in the art could have pursued the known potential solutions with a reasonable expectation of success.
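For illustration only, the limitations of Claims 2-4 as mapped above (calculating a Jaro-Winkler value between the text and each stored query or command, then testing the values against a pre-defined threshold) can be sketched as follows. This is a minimal sketch of the textbook Jaro-Winkler similarity and is not taken from PRYAKHIN, PATEL, or the application; the function names, the scaling factor 0.1, the prefix cap of 4, and the threshold 0.9 are assumptions. The example commands are those quoted from PATEL Par 77. The corresponding distance is one minus the similarity.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)


def best_match(text, commands, threshold=0.9):
    """Return the stored command most similar to text, or None if below threshold."""
    scores = {c: jaro_winkler(text.lower(), c.lower()) for c in commands}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


COMMANDS = ["pause", "live TV", "volume up", "play my favorite show"]
```

Under these assumptions, a slightly mis-transcribed input such as "volume upp" still resolves to the stored entry "volume up", whereas an unrelated utterance yields no entry meeting the threshold.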

CLAIM 9 is the computer program product similar to the method of Claim 2; thus, it is rejected under the same rationale.

CLAIM 10 is the computer program product similar to the method of Claim 3; thus, it is rejected under the same rationale.

CLAIM 11 is the computer program product similar to the method of Claim 4; thus, it is rejected under the same rationale.

CLAIM 16 is the computer system similar to the method of Claim 3; thus, it is rejected under the same rationale.

CLAIM 17 is the computer system similar to the method of Claim 4; thus, it is rejected under the same rationale.



Claims 7, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over PATEL (US 2014/0122059 A1) in view of DEVRIES (US 10,832,668 B1), and further in view of SHTILMAN (US 2014/0250097 A1).

REGARDING CLAIM 7, PATEL in view of DEVRIES discloses the computer-implemented method of claim 1, further comprising: 
updating, by one or more processors, one or more files of a set of files in the local database of the computing device with at least one of new text and audio inputs (PATEL Par 78 – “In an embodiment, device lexicon cache entries may be manually added and modified by a user in order to express personalized voice input commands. In another embodiment, one or more device lexicon cache entries may be created based on monitoring usage of a media device and automatically adding frequently used voice input/device action associations.”; Par 79 – “An imported device lexicon cache may either supplement an existing device lexicon cache or replace an existing device lexicon cache entirely. In another embodiment, user-specific device lexicon caches may be shared between different users on the same media device.”) [on a pre-configured time interval defined by the user].
PATEL in view of DEVRIES does not explicitly teach the [square-bracketed] limitations.

SHTILMAN discloses a method/system for indexing and searching reporting data comprising:
updating, by one or more processors, one or more files of a set of files in the local database of the computing device [on a pre-configured time interval defined by the user] (SHTILMAN Par 55 – “In an embodiment of the present invention, the indexing server 208 combines the reporting data together with dictionary items of a dictionary database 210. The dictionary items may include business definitions of reporting entities in an embodiment of the present invention. The reporting entities may include, but is not restricted to, element types (e.g., agent, skill etc.), index type (e.g., intraday, daily, weekly or monthly), reporting statistics (e.g., ACD calls, ACD abandoned calls etc.), time stamps (e.g., date and time), and mathematical operators (e.g., bigger than, smaller than etc.). In an embodiment of the present invention, the dictionary database 210 may get updated at predefined time interval that is configured based on predefined configurations of a user.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of PATEL in view of DEVRIES to include periodically updating files, as taught by SHTILMAN.
One of ordinary skill would have been motivated to include periodically updating files, in order to obtain up-to-date data.

CLAIM 14 is the computer program product similar to the method of Claim 7; thus, it is rejected under the same rationale.

CLAIM 20 is the computer system similar to the method of Claim 7; thus, it is rejected under the same rationale.


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 



Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C KIM whose telephone number is (571)272-3327. The examiner can normally be reached Monday to Friday 8:00 AM thru 4:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C Flanders can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JONATHAN C KIM/Primary Examiner, Art Unit 2655