Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. GB1819658.4, filed on 12/03/2018.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 11/25/2019 and 12/04/2020 are being considered by the examiner.
Drawings
The drawings submitted on 11/25/2019 are being considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-9 and 13-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Hoover (US 11,016,968 B1).

Regarding Claims 1 and 18, Hoover teaches: A content playback system comprising: a playback device, the playback device being configured to detect a voice command from a user and to play content; wherein the system is configured to (Col 2, lines 24-30, A speech-controlled computing system may answer user commands requesting the output of content. For example, a user may say "Computer, what is the weather?" In response, the system may output weather information. For further example, a user may say "Computer, play music from the 90's." In response, the system may output music from the 1990's. Col 5, lines 54-61, As shown in FIG. 1, the system 100 includes a computing device 110, a speech processing system 120, a skill 170, and a context aggregator system 138. Col 7, lines 16-21, In FIG. 1, device 110 may receive audio including a spoken utterance of a user via a microphone (or array of microphones) of the device 110. The device 110 may generate input audio data corresponding to the audio, and may send the input audio data to the speech processing system 120 for processing. Col 8, lines 45-48, A user command may correspond to a user request for the system to output content to the user. The requested content to be output may correspond to music, video, search results, weather information, etc.): analyse a voice command detected by the playback device to determine a user intent (command or intent) (Col 7, lines 27-48, The speech processing system 120 receives input data from a device 110. If the input data is the input audio data from the device 110, the speech processing system 120 performs speech recognition processing (e.g., ASR) on the input audio data to generate input text data. The speech processing system 120 performs natural language processing on input text data (either received from the device 110 or generated from the input audio data received from the device 110) to determine a user command.
A user command may correspond to a user request for the system to output content to the user. The requested content to be output may correspond to music, video, search results, weather information, etc. Col 9, lines 35-55, Results of speech recognition processing (e.g., text data representing speech) are processed by a natural language component 259 of the speech-processing system 120. The natural language component 259 interprets a text string to derive an intent or a desired action from the user as well as the pertinent pieces of information in the text data that allow a device (e.g., the device 110, the speech processing system 120, etc.) to complete that action.); analyse the voice command to extract one or more entities from the voice command, wherein each of the extracted entities is of a type (e.g., a framework for an intent of "play" may specify a list of slots/fields applicable to play the identified "object" and any object modifier (e.g., a prepositional phrase), such as [Artist Name], [Album Name], [Song name], etc.) associated with the determined user intent; select an entity from the one or more extracted entities based on a set of conflict resolution rules; and controlling the playback device based on the selected entity (Col 10, lines 37-40, In various examples, the natural language component 259 may include a recognizer that includes a named entity resolution (NER) component configured to parse and tag to annotate text as part of natural language processing. Col 11, lines 35-46, A downstream process called named entity resolution may link a text portion to an actual specific entity known to the system. To perform named entity resolution, the system may utilize gazetteer information stored in an entity library storage. The gazetteer information may be used for entity resolution, for example matching speech recognition results with different entities (e.g., song titles, contact names, etc.).
Col 11, line 64 to Col 12, line 1, In order to generate a particular interpreted response, the NER component applies the grammar models and lexical information associated with the respective recognizer to recognize a mention of one or more entities in the text represented in the text data. Col 12, lines 26-35, For example, the NER component may parse the text data to identify words as subject, object, verb, preposition, etc., based on grammar rules and/or models, prior to recognizing named entities. The identified verb may be used by the IC component to identify intent, which is then used by the NER component to identify frameworks. A framework for an intent of "play" may specify a list of slots/fields applicable to play the identified "object" and any object modifier (e.g., a prepositional phrase), such as [Artist Name], [Album Name], [Song name], etc. Col 13, lines 16-23, So a framework for "play music intent" might indicate to attempt to resolve the identified object based on [Artist Name], [Album Name], and [Song name], and another framework for the same intent might indicate to attempt to resolve the object modifier based on [Artist Name], and resolve the object based on [Album Name] and [Song Name] linked to the identified [Artist Name]. Col 13, lines 52-57, For example, if the natural language processing results include a command to play music, the destination application 290 may be a music playing application, such as one located on the device 110 or in a music playing appliance, configured to execute a music playing command.).

Regarding Claims 2 and 19, Hoover teaches:  A content playback system according to claim 1, wherein the system is further configured to store a plurality of predefined user intents, and wherein the user intent is determined from the plurality of predefined user intents (Col 11, lines 55-63, Each recognizer is associated with a database of words linked to intents. For example, a music intent database may link words and phrases such as "quiet," "volume off," and "mute" to a "mute" intent. The IC component identifies potential intents by comparing words in the text data to the words and phrases in the intents database. Traditionally, the IC component determines using a set of rules or templates that are processed against the incoming text data to identify a matching intent. Col 12, lines 26-35, For example, the NER component may parse the text data to identify words as subject, object, verb, preposition, etc., based on grammar rules and/or models, prior to recognizing named entities. The identified verb may be used by the IC component to identify intent, which is then used by the NER component to identify frameworks. A framework for an intent of "play" may specify a list of slots/fields applicable to play the identified "object" and any object modifier (e.g., a prepositional phrase), such as [Artist Name], [Album Name], [Song name], etc.).

Regarding Claims 3 and 20, Hoover teaches: A content playback system according to claim 2, wherein the system is further configured to: store, for each of the plurality of predefined user intents, a pattern (framework) associated with that predefined user intent; and determine that a predefined user intent is the user intent if a phrase in the voice command matches the pattern associated with that predefined user intent (Col 11, lines 55-63, Each recognizer is associated with a database of words linked to intents. For example, a music intent database may link words and phrases such as "quiet," "volume off," and "mute" to a "mute" intent. The IC component identifies potential intents by comparing words in the text data to the words and phrases in the intents database. Traditionally, the IC component determines using a set of rules or templates that are processed against the incoming text data to identify a matching intent. Col 12, lines 26-44, For example, the NER component may parse the text data to identify words as subject, object, verb, preposition, etc., based on grammar rules and/or models, prior to recognizing named entities. The identified verb may be used by the IC component to identify intent, which is then used by the NER component to identify frameworks. A framework for an intent of "play" may specify a list of slots/fields applicable to play the identified "object" and any object modifier (e.g., a prepositional phrase), such as [Artist Name], [Album Name], [Song name], etc. The NER component then searches the corresponding fields in the domain-specific and personalized lexicon(s), attempting to match words and phrases in the text data tagged as a grammatical object or object modifier with those identified in the database(s).
As used herein, "intent data" may correspond to the intent itself, framework(s) for the intent, slot(s)/field(s) corresponding to the intent, object modifier(s), any information associated with the intent/framework(s)/slot(s), or any combination thereof without departing from the disclosure.).

Regarding Claim 4, Hoover teaches: A content playback system according to claim 2, wherein the plurality of predefined user intents is stored as an ordered list, and wherein the system is configured to iteratively determine whether a predefined user intent is the user intent in the order in which the plurality of predefined user intents is stored (Col 11, lines 55-63, Each recognizer is associated with a database of words linked to intents. For example, a music intent database may link words and phrases such as "quiet," "volume off," and "mute" to a "mute" intent. The IC component identifies potential intents by comparing words in the text data to the words and phrases in the intents database. Traditionally, the IC component determines using a set of rules or templates that are processed against the incoming text data to identify a matching intent. Col 12, lines 14-22, The intents identified by the IC component are linked to domain-specific grammar frameworks with "slots" or "fields" to be filled. Each slot/field corresponds to a portion of the text data that the system believes corresponds to an entity. For example, if "play music" is an identified intent, a grammar framework(s) may correspond to sentence structures such as "Play [Artist Name]," "Play [Album Name]," "Play [Song name]," "Play [Song name] by [Artist Name]," etc. Col 12, lines 26-44, The NER component then searches the corresponding fields in the domain-specific and personalized lexicon(s), attempting to match words and phrases in the text data tagged as a grammatical object or object modifier with those identified in the database(s).
Col 13, lines 16-26, So a framework for "play music intent" might indicate to attempt to resolve the identified object based on [Artist Name], [Album Name], and [Song name], and another framework for the same intent might indicate to attempt to resolve the object modifier based on [Artist Name], and resolve the object based on [Album Name] and [Song Name] linked to the identified [Artist Name]. If the search of the gazetteer does not resolve the slot/field using gazetteer information, the NER component may search a database of generic words associated with the domain. Col 13, lines 34-42, The results of natural language processing may be tagged to attribute meaning to the text data. So, for instance, "play mother's little helper by the rolling stones" might produce a result of: [domain] Music, [intent] Play Music, [artist name] "rolling stones," [media type] SONG, and [song title] "mother's little helper." As another example, "play songs by the rolling stones" might produce: [domain] Music, [intent] Play Music, [artist name] "rolling stones," and [media type] SONG.).
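For illustration only, storing a pattern for each predefined intent and testing the intents in their stored order (claims 3 and 4) can be sketched in Python as below. The intent names and regular expressions are hypothetical (the "mute" words are borrowed from Hoover's example at Col 11, lines 55-63), and the sketch assumes one pattern per intent; it is not asserted to be Hoover's IC component or the applicant's implementation.

```python
from __future__ import annotations

import re

# Hypothetical ordered list of (intent, pattern) pairs; list order is the
# order in which the predefined intents are tested.
PREDEFINED_INTENTS: list[tuple[str, re.Pattern[str]]] = [
    ("mute", re.compile(r"\b(?:quiet|volume off|mute)\b")),
    ("play_music", re.compile(r"^play\b")),
]


def determine_intent(command: str) -> str | None:
    """Return the first stored intent whose pattern matches the command."""
    text = command.lower().strip()
    for intent, pattern in PREDEFINED_INTENTS:
        if pattern.search(text):
            return intent  # earlier entries take precedence over later ones
    return None
```

Because the list is scanned in order, a command such as "play quiet songs" resolves to the earlier "mute" entry even though it also matches the later "play_music" pattern.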

Regarding Claim 5, Hoover teaches: A content playback system according to claim 2, wherein the system is further configured to store, for each of the plurality of predefined user intents, one or more entity types associated with that predefined user intent (Col 11, lines 35-46, A downstream process called named entity resolution may link a text portion to an actual specific entity known to the system. To perform named entity resolution, the system may utilize gazetteer information stored in an entity library storage. The gazetteer information may be used for entity resolution, for example matching speech recognition results with different entities (e.g., song titles, contact names, etc.). Col 12, lines 6-13, Each grammar model includes the names of entities (i.e., nouns) commonly found in speech about the particular domain (i.e., generic terms), whereas the lexical information from the gazetteer is personalized to the user(s) and/or the device. For instance, a grammar model associated with the shopping domain may include a database of words commonly used when people discuss shopping. Col 13, lines 34-42, The results of natural language processing may be tagged to attribute meaning to the text data. So, for instance, "play mother's little helper by the rolling stones" might produce a result of: [domain] Music, [intent] Play Music, [artist name] "rolling stones," [media type] SONG, and [song title] "mother's little helper." As another example, "play songs by the rolling stones" might produce: [domain] Music, [intent] Play Music, [artist name] "rolling stones," and [media type] SONG.).

Regarding Claim 6, Hoover teaches: A content playback system according to claim 5, wherein the system is further configured to: for a first entity type stored by the system, store a plurality of regular expressions associated with the first entity type; and extract an entity of the first entity type from the voice command by matching a phrase in the voice command with one of the plurality of regular expressions associated with the first entity type (Col 12, lines 14-22, The intents identified by the IC component are linked to domain-specific grammar frameworks with "slots" or "fields" to be filled. Each slot/field corresponds to a portion of the text data that the system believes corresponds to an entity. For example, if "play music" is an identified intent, a grammar framework(s) may correspond to sentence structures such as "Play [Artist Name]," "Play [Album Name]," "Play [Song name]," "Play [Song name] by [Artist Name]," etc. Col 12, lines 26-44, For example, the NER component may parse the text data to identify words as subject, object, verb, preposition, etc., based on grammar rules and/or models, prior to recognizing named entities. The identified verb may be used by the IC component to identify intent, which is then used by the NER component to identify frameworks. A framework for an intent of "play" may specify a list of slots/fields applicable to play the identified "object" and any object modifier (e.g., a prepositional phrase), such as [Artist Name], [Album Name], [Song name], etc. The NER component then searches the corresponding fields in the domain-specific and personalized lexicon(s), attempting to match words and phrases in the text data tagged as a grammatical object or object modifier with those identified in the database(s).
Col 13, lines 13-23, The frameworks linked to the intent are then used to determine what database fields should be searched to determine the meaning of these phrases, such as searching a user's gazette for similarity with the framework slots. So a framework for "play music intent" might indicate to attempt to resolve the identified object based on [Artist Name], [Album Name], and [Song name], and another framework for the same intent might indicate to attempt to resolve the object modifier based on [Artist Name], and resolve the object based on [Album Name] and [Song Name] linked to the identified [Artist Name]. Col 13, lines 34-42, The results of natural language processing may be tagged to attribute meaning to the text data. So, for instance, "play mother's little helper by the rolling stones" might produce a result of: [domain] Music, [intent] Play Music, [artist name] "rolling stones," [media type] SONG, and [song title] "mother's little helper." As another example, "play songs by the rolling stones" might produce: [domain] Music, [intent] Play Music, [artist name] "rolling stones," and [media type] SONG.).
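As an illustration of claim 6 only, a first entity type associated with a plurality of regular expressions, any one of which may match a phrase in the voice command, can be sketched as follows. The entity type "decade" and both expressions are hypothetical, chosen to fit Hoover's "play music from the 90's" example (Col 2, lines 24-30); the sketch returns the first match found and is not presented as Hoover's or the applicant's code.

```python
from __future__ import annotations

import re

# Hypothetical table: each entity type maps to several regular expressions,
# each exposing the entity value through a named group called "value".
ENTITY_REGEXES: dict[str, list[re.Pattern[str]]] = {
    "decade": [
        re.compile(r"\bfrom the (?P<value>\d0)'?s\b"),  # e.g. "from the 90's"
        re.compile(r"\b(?P<value>(?:19|20)\d0)s\b"),    # e.g. "1990s"
    ],
}


def extract_entity(command: str, entity_type: str) -> str | None:
    """Return the first value matched by any regex stored for the entity type."""
    text = command.lower()
    for pattern in ENTITY_REGEXES.get(entity_type, []):
        match = pattern.search(text)
        if match:
            return match.group("value")
    return None
```

For example, "Computer, play music from the 90's." yields "90" through the first expression, while "play hits of the 1990s" yields "1990" through the second.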

Regarding Claim 7, Hoover teaches: A content playback system according to claim 5, wherein the system is further configured to: for a second entity type (e.g., "rolling stones") stored by the system, store a phrase structure associated with the second entity type; and extract an entity of the second type from the voice command by matching a phrase in the voice command with the phrase structure associated with the second entity type (See rejection of claim 6).

Regarding Claim 8, Hoover teaches: A content playback system according to claim 7, wherein the phrase structure has a song field and an artist field (See rejection of claim 6).

Regarding Claim 9, Hoover teaches: A content playback system according to claim 8, wherein the system is further configured to: determine an artist name from the phrase in the voice command, the artist name being a string in the phrase that is at a position corresponding to the artist field in the phrase structure; obtain from a database a list of song names associated with the artist name; and extract the entity of the second type by matching a song name in the list of song names with a string in the phrase that is at a position corresponding to the song field (See rejection of claim 6, specifically Col 13, lines 13-23, The frameworks linked to the intent are then used to determine what database fields should be searched to determine the meaning of these phrases, such as searching a user's gazette for similarity with the framework slots. So a framework for "play music intent" might indicate to attempt to resolve the identified object based on [Artist Name], [Album Name], and [Song name], and another framework for the same intent might indicate to attempt to resolve the object modifier based on [Artist Name], and resolve the object based on [Album Name] and [Song Name] linked to the identified [Artist Name]. Col 13, lines 34-42, The results of natural language processing may be tagged to attribute meaning to the text data. So, for instance, "play mother's little helper by the rolling stones" might produce a result of: [domain] Music, [intent] Play Music, [artist name] "rolling stones," [media type] SONG, and [song title] "mother's little helper." As another example, "play songs by the rolling stones" might produce: [domain] Music, [intent] Play Music, [artist name] "rolling stones," and [media type] SONG.).
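For illustration of the limitations of claims 7-9 only, a phrase structure with a song field and an artist field, where the artist string is used to fetch that artist's song list and a listed song name is then matched against the string in the song field, can be sketched as below. The pattern, the in-memory SONGS_BY_ARTIST table standing in for the claimed database, and the function name are hypothetical, and the sketch uses exact string matching only.

```python
from __future__ import annotations

import re

# Hypothetical phrase structure with a song field and an artist field.
PHRASE_STRUCTURE = re.compile(r"^play (?P<song>.+) by (?P<artist>.+)$")

# Hypothetical stand-in for a database of song names keyed by artist name.
SONGS_BY_ARTIST: dict[str, list[str]] = {
    "the rolling stones": ["mother's little helper", "paint it black"],
}


def extract_song(command: str) -> str | None:
    """Return the song entity if the command fits 'play <song> by <artist>'."""
    match = PHRASE_STRUCTURE.match(command.lower().strip())
    if match is None:
        return None
    # 1. The string at the artist-field position gives the artist name.
    artist = match.group("artist")
    # 2. Obtain the list of song names associated with that artist.
    songs = SONGS_BY_ARTIST.get(artist, [])
    # 3. Match a listed song name against the string at the song-field position.
    candidate = match.group("song")
    return candidate if candidate in songs else None
```

With Hoover's example utterance, "play mother's little helper by the rolling stones" yields "mother's little helper"; an artist absent from the table yields no entity.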

Regarding Claim 13, Hoover teaches: A content playback system according to claim 1, wherein controlling the playback device includes playing content via the playback device (Col 2, lines 24-30, A speech-controlled computing system may answer user commands requesting the output of content. For example, a user may say "Computer, what is the weather?" In response, the system may output weather information. For further example, a user may say "Computer, play music from the 90's." In response, the system may output music from the 1990's.).

Regarding Claim 14, Hoover teaches: A content playback system according to claim 1, wherein controlling the playback device includes generating or adapting a content playlist, and playing the content playlist via the playback device (Col 2, lines 24-30, A speech-controlled computing system may answer user commands requesting the output of content. For example, a user may say "Computer, what is the weather?" In response, the system may output weather information. For further example, a user may say "Computer, play music from the 90's." In response, the system may output music from the 1990's. Col 4, lines 4-11, In addition to various speech processing components using contextual data, various speech processing components may generate contextual data. For example, a user may utter a spoken request that a particular song be added to a playlist. A music skill may add the song to the playlist. In various examples, an identifier for the song added to the playlist may represent contextual data for the device ID, account ID, IP address, or other entity. Col 11, lines 4-12, For example, a music gazetteer may include one or more long vectors, each representing a particular group of musical items (such as albums, songs, artists, etc.) where the vector includes positive bit values for musical items that belong in the user's approved music list. Thus, for a song gazetteer, each bit may be associated with a particular song, and for a particular user's song gazetteer the bit value may be 1 if the song is in the particular user's music list.).

Regarding Claim 15, Hoover teaches: A content playback system according to claim 1, further comprising a content server configured to store content that is playable by the playback device (See rejection of claim 14).

Regarding Claim 16, Hoover teaches: A content playback system according to claim 1, further comprising a controller separate from the playback device, the controller (Fig. 1, speech processing system 120) being configured to control the playback device (Col 2, lines 24-30, A speech-controlled computing system may answer user commands requesting the output of content. For example, a user may say "Computer, what is the weather?" In response, the system may output weather information. For further example, a user may say "Computer, play music from the 90's." In response, the system may output music from the 1990's. Col 8, lines 14-21, The speech processing system 120 sends back to the initiating device (110) output data including the output content responsive to the user command. The device (110) may emit the output data as audio, present the output data on a display, or perform some other operation responsive to the user command.).

Regarding Claim 17, Hoover teaches: A content playback system according to claim 1, wherein the playback device comprises a speaker for playing audio content (Col 2, lines 24-30, A speech-controlled computing system may answer user commands requesting the output of content. For example, a user may say "Computer, what is the weather?" In response, the system may output weather information. For further example, a user may say "Computer, play music from the 90's." In response, the system may output music from the 1990's. Col 3, lines 20-33, The invocation of a skill by a user's utterance may include a request that an action be taken. That request can be transmitted to a control system that will cause that action to be executed. For example, the user's utterance may be, "Computer, turn on the living room lights." In response, instructions may be sent to a "smart home" system to turn on the lights in the user's living room. Examples of skills include voice-enabled applications invoked by the Siri virtual personal assistant from Apple Inc. of Cupertino, Calif., voice-enabled actions invoked by the Google Assistant virtual personal assistant from Google LLC of Mountain View, Calif., or voice-enabled skills invoked by the Alexa virtual personal assistant from Amazon.com, Inc. of Seattle, Wash. Col 8, lines 16-19, The device (110) may emit the output data as audio, present the output data on a display, or perform some other operation responsive to the user command. Col 18, lines 57-59, The solicitation may take the form of text output via a display of a user device or audio output by a speaker of a user device.).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Hoover.

Regarding Claim 10, Hoover teaches:  Col 11, lines 35-41, A downstream process called named entity resolution may link a text portion to an actual specific entity known to the system. To perform named entity resolution, the system may utilize gazetteer information stored in an entity library storage. The gazetteer information may be used for entity resolution, for example matching speech recognition results with different entities (e.g., song titles, contact names, etc.). Col 11, line 64 to Col 12, line 8, In order to generate a particular interpreted response, the NER component applies the grammar models and lexical information associated with the respective recognizer to recognize a mention of one or more entities in the text represented in the text data. In this manner the NER component identifies "slots" (i.e., particular words in text data) that may be needed for later command processing. Depending on the complexity of the NER component, it may also label each slot with a type (e.g., noun, place, city, artist name, song name, or the like). Each grammar model includes the names of entities (i.e., nouns) commonly found in speech about the particular domain (i.e., generic terms), whereas the lexical information from the gazetteer is personalized to the user(s) and/or the device. Col 12, lines 26-39, For example, the NER component may parse the text data to identify words as subject, object, verb, preposition, etc., based on grammar rules and/or models, prior to recognizing named entities. The identified verb may be used by the IC component to identify intent, which is then used by the NER component to identify frameworks. A framework for an intent of "play" may specify a list of slots/fields applicable to play the identified "object" and any object modifier (e.g., a prepositional phrase), such as [Artist Name], [Album Name], [Song name], etc. 
The NER component then searches the corresponding fields in the domain-specific and personalized lexicon(s), attempting to match words and phrases in the text data tagged as a grammatical object or object modifier with those identified in the database(s). As used herein, "intent data" may correspond to the intent itself, framework(s) for the intent, slot(s)/field(s) corresponding to the intent, object modifier(s), any information associated with the intent/framework(s)/slot(s), or any combination thereof without departing from the disclosure. Col 12, line 63 to Col 13, line 12, For instance, a query of "play mother's little helper by the rolling stones" might be parsed and tagged as [Verb]: "Play," [Object]: "mother's little helper," [Object Preposition]: "by," and [Object Modifier]: "the rolling stones." At this point in the process, "Play" is identified as a verb based on a word database associated with the music domain, which the IC component will determine corresponds to the "play music" intent. Additionally, in at least some examples, probability data generated by shortlister component 241 may indicate a high likelihood that the "play music" intent is appropriate as the highest probability applications for the user utterance correspond to music applications. At this stage, no determination has been made as to the meaning of "mother's little helper" and "the rolling stones," but based on grammar rules and models, it is determined that the text of these phrases relate to the grammatical object (i.e., entity) of the text data. Col 13, lines 13-23, So a framework for "play music intent" might indicate to attempt to resolve the identified object based on [Artist Name], [Album Name], and [Song name], and another framework for the same intent might indicate to attempt to resolve the object modifier based on [Artist Name], and resolve the object based on [Album Name] and [Song Name] linked to the identified [Artist Name].  
Col 13, lines 34-42, The results of natural language processing may be tagged to attribute meaning to the text data. So, for instance, "play mother's little helper by the rolling stones" might produce a result of: [domain] Music, [intent] Play Music, [artist name] "rolling stones," [media type] SONG, and [song title] "mother's little helper." As another example, "play songs by the rolling stones" might produce: [domain] Music, [intent] Play Music, [artist name] "rolling stones," and [media type] SONG.
However, Hoover does not specifically teach “wherein the set of conflict resolution rules are set such that, when the one or more extracted entities includes two or more overlapping entities, the longest entity of the overlapping entities is selected.”
The applicant's specification, at page 14, line 29 to page 15, line 4, explains the selection of the longest of the overlapping entities as follows: “The set of conflict resolution rules may be set such that, when the extracted list of entities includes two or more overlapping entities, the longest entity of the overlapping entities is selected. The system may thus be configured to select the overlapping entity having the most characters. Thus, in the example mentioned above, where the extracted list of entities includes "The Moody Blues" and "blues", the conflict resolution rules may cause the system to select the entity "The Moody Blues", as it is the longer of the two. This may enable accurate interpretation of the user’s voice command.”
However, the claimed conflict resolution rule, under which “when the one or more extracted entities includes two or more overlapping entities, the longest entity of the overlapping entities is selected,” applied for the purpose of interpreting an appropriate entity from the voice command, would have been obvious in view of Hoover's teaching, since Hoover's named entity resolution (NER) component may “parse the text data to identify words as subject, object, verb, preposition, etc., based on grammar rules and/or models, prior to recognizing named entities” and, based on grammar rules and models, determines that the text of these phrases relates to the grammatical object (i.e., entity) of the text data.
Accordingly, because the parsing by the NER component is based on grammar rules and/or models, as discussed above, in order to identify an appropriate entity from the utterance (e.g., the meaning of "mother's little helper" and "the rolling stones"), such parsing would obviously yield “the longest entity of the overlapping entities” when the words of the longest entity are together required to identify an entity.
Therefore it is a simple substitution of one known element for another to obtain predictable results of identifying and selecting an appropriate entity when extracted list of entities include multiple overlapping entity having longest character in a command and/or utterance.
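For illustration only, the claimed longest-entity rule, as explained in the quoted passage of the specification, may be sketched as follows. The sketch is not taken from Hoover or from the applicant's disclosure and rests on stated assumptions: each extracted entity is represented by a hypothetical `Entity` character span over the transcribed command, two entities "overlap" when their spans share at least one character, and "longest" means most characters.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    """Hypothetical extracted entity, represented as a half-open character span."""
    start: int  # offset of the first character in the command text
    end: int    # offset one past the last character
    text: str


def overlaps(a: Entity, b: Entity) -> bool:
    # Two half-open spans overlap if each one starts before the other ends.
    return a.start < b.end and b.start < a.end


def select_longest(entities: List[Entity]) -> List[Entity]:
    """Among overlapping entities, keep the one with the most characters."""
    selected: List[Entity] = []
    # Visit longer entities first so that they win every overlap conflict.
    for ent in sorted(entities, key=lambda e: e.end - e.start, reverse=True):
        if not any(overlaps(ent, kept) for kept in selected):
            selected.append(ent)
    # Return the survivors in their original left-to-right order.
    return sorted(selected, key=lambda e: e.start)


# Spans refer to the command "play the moody blues".
candidates = [
    Entity(5, 20, "the moody blues"),
    Entity(15, 20, "blues"),
]
print([e.text for e in select_longest(candidates)])
# prints ['the moody blues']
```

Entities that do not overlap with any other entity are unaffected by this rule; only conflicting spans are reduced to the longest one.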

Regarding Claim 11, Hoover teaches: Col 9, lines 14-41, As noted above, in traditional natural language processing, text data may be processed applying the rules, models, and information applicable to each identified domain. For example, if text represented in text data potentially implicates both communications and music, the text data may, substantially in parallel, be natural language processed using the grammar models and lexical information for communications, and natural language processed using the grammar models and lexical information for music. The responses based on the text data produced by each set of models is scored, with the overall highest ranked result from all applied domains being ordinarily selected to be the correct result. A downstream process called named entity resolution may link a text portion to an actual specific entity known to the system. To perform named entity resolution, the system may utilize gazetteer information stored in an entity library storage. The gazetteer information may be used for entity resolution, for example matching speech recognition results with different entities (e.g., song titles, contact names, etc.). Col 11, line 64 to Col 12, line 10, In order to generate a particular interpreted response, the NER component applies the grammar models and lexical information associated with the respective recognizer to recognize a mention of one or more entities in the text represented in the text data. In this manner the NER component identifies "slots" (i.e., particular words in text data) that may be needed for later command processing. Depending on the complexity of the NER component, it may also label each slot with a type (e.g., noun, place, city, artist name, song name, or the like). 
Each grammar model includes the names of entities (i.e., nouns) commonly found in speech about the particular domain (i.e., generic terms), whereas the lexical information from the gazetteer is personalized to the user(s) and/or the device. Col 12, lines 26-29, For example, the NER component may parse the text data to identify words as subject, object, verb, preposition, etc., based on grammar rules and/or models, prior to recognizing named entities.
Hoover does not specifically teach: wherein the set of conflict resolution rules are set such that, when the one or more extracted entities includes two or more overlapping entities, one of the overlapping entities having a prioritized entity type is selected.
However, in view of Hoover's teaching, the limitation “when the one or more extracted entities includes two or more overlapping entities, one of the overlapping entities having a prioritized entity type is selected” would have been obvious. Hoover's NER component applies the grammar models and lexical information associated with the respective recognizer to recognize a mention of one or more entities in the text represented in the text data, and each grammar model includes the names of entities (i.e., nouns) commonly found in speech about a particular domain. The responses produced by each set of models are scored, with the overall highest-ranked result from all applied domains ordinarily being selected as the correct result. Named entity resolution is further performed by matching speech recognition results with gazetteer information, stored in an entity library storage, that identifies different entities (e.g., song titles, contact names, etc.).
Therefore, the claimed rule is a simple substitution of one known element for another to obtain the predictable result of identifying and selecting an appropriate prioritized entity when the extracted list of entities includes multiple overlapping entities.
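For illustration only, the claimed prioritized-entity-type rule may be sketched as follows. This is a minimal sketch, not Hoover's or the applicant's implementation. It assumes a hypothetical `TypedEntity` character span carrying an entity type, and a hypothetical priority ordering `PRIORITY` (artist before album, song, and genre); neither the names nor the ordering appear in the record.

```python
from typing import Dict, List, NamedTuple


class TypedEntity(NamedTuple):
    """Hypothetical extracted entity: a half-open character span plus an entity type."""
    start: int
    end: int
    text: str
    etype: str


# Hypothetical priority ordering; a lower rank means a higher priority.
PRIORITY: Dict[str, int] = {"artist": 0, "album": 1, "song": 2, "genre": 3}


def overlaps(a: TypedEntity, b: TypedEntity) -> bool:
    # Two half-open spans overlap if each one starts before the other ends.
    return a.start < b.end and b.start < a.end


def select_by_type(entities: List[TypedEntity]) -> List[TypedEntity]:
    """Among overlapping entities, keep the one whose type has the highest priority."""
    lowest = len(PRIORITY)  # types not listed rank after every listed type
    ranked = sorted(entities, key=lambda e: PRIORITY.get(e.etype, lowest))
    selected: List[TypedEntity] = []
    for ent in ranked:
        if not any(overlaps(ent, kept) for kept in selected):
            selected.append(ent)
    return sorted(selected, key=lambda e: e.start)


# Spans refer to the command "play the moody blues".
candidates = [
    TypedEntity(15, 20, "blues", "genre"),
    TypedEntity(5, 20, "the moody blues", "artist"),
]
print([(e.text, e.etype) for e in select_by_type(candidates)])
# prints [('the moody blues', 'artist')]
```

Unlike the longest-entity rule, the outcome here depends only on the assumed type ordering, so a shorter entity of a higher-priority type would prevail over a longer one.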

Regarding Claim 12, Hoover teaches: A content playback system according to claim 1, wherein selecting an entity includes identifying word boundaries (Slots/Fields) in the voice command (Col 11, lines 58-63, The IC component identifies potential intents by comparing words in the text data to the words and phrases in the intents database. Traditionally, the IC component determines using a set of rules or templates that are processed against the incoming text data to identify a matching intent. Col 12, lines 14-39, The intents identified by the IC component are linked to domain-specific grammar frameworks with "slots" or "fields" to be filled. Each slot/field corresponds to a portion of the text data that the system believes corresponds to an entity. For example, if "play music" is an identified intent, a grammar framework(s) may correspond to sentence structures such as "Play [Artist Name]," "Play [Album Name]," "Play [Song name]," "Play [Song name] by [Artist Name]," etc. However, to make resolution more flexible, these frameworks would ordinarily not be structured as sentences, but rather based on associating slots with grammatical tags. For example, the NER component may parse the text data to identify words as subject, object, verb, preposition, etc., based on grammar rules and/or models, prior to recognizing named entities. The identified verb may be used by the IC component to identify intent, which is then used by the NER component to identify frameworks. A framework for an intent of "play" may specify a list of slots/fields applicable to play the identified "object" and any object modifier (e.g., a prepositional phrase), such as [Artist Name], [Album Name], [Song name], etc. The NER component then searches the corresponding fields in the domain-specific and personalized lexicon(s), attempting to match words and phrases in the text data tagged as a grammatical object or object modifier with those identified in the database(s). 
Col 13, lines 26-30, “For example, if the text data corresponds to "play songs by the rolling stones," after failing to determine an album name or song name called "songs" by "the rolling stones," the NER component may search the domain vocabulary for the word "songs.").
Hoover does not specifically teach “discarding entities which do not start and end at identified word boundaries.”
However, the limitation would have been obvious in view of Hoover's teaching discussed above, in which the NER component performs a different slot/field search when a slot/field corresponding to an entity does not match the database slots/fields associated with an entity (Col 13, lines 26-30, “For example, if the text data corresponds to "play songs by the rolling stones," after failing to determine an album name or song name called "songs" by "the rolling stones," the NER component may search the domain vocabulary for the word "songs.").
Therefore, the claimed limitation is a simple substitution of one known element for another to obtain the predictable result of discarding entities which do not start and end at identified word boundaries.
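For illustration only, the claimed step of discarding entities which do not start and end at identified word boundaries may be sketched as follows, using Hoover's example command "play songs by the rolling stones." The sketch is not taken from either reference. It assumes that word boundaries are the start and end offsets of whitespace-delimited tokens and that candidate entities are hypothetical `Entity` character spans; a real system could define word boundaries differently (e.g., by punctuation or a tokenizer).

```python
import re
from typing import List, NamedTuple, Set, Tuple


class Entity(NamedTuple):
    """Hypothetical extracted entity, represented as a half-open character span."""
    start: int
    end: int
    text: str


def word_boundaries(command: str) -> Tuple[Set[int], Set[int]]:
    """Return the start and end offsets of whitespace-delimited words."""
    starts: Set[int] = set()
    ends: Set[int] = set()
    for match in re.finditer(r"\S+", command):
        starts.add(match.start())
        ends.add(match.end())
    return starts, ends


def discard_off_boundary(command: str, entities: List[Entity]) -> List[Entity]:
    """Keep only entities that start and end at identified word boundaries."""
    starts, ends = word_boundaries(command)
    return [e for e in entities if e.start in starts and e.end in ends]


command = "play songs by the rolling stones"
candidates = [
    Entity(14, 32, "the rolling stones"),
    Entity(5, 9, "song"),  # ends inside the word "songs", so it is discarded
]
print([e.text for e in discard_off_boundary(command, candidates)])
# prints ['the rolling stones']
```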

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Elangovan et al. (US 10089981 B1) teach: (Abstract) Methods and systems for performing contact resolution are described herein. When initiating a communications session using a voice activated electronic device, a contact name may be resolved to determine an appropriate contact with which the communications session may be directed to. Contacts from an individual's contact list may be queried to determine a listing of probable contacts associated with the contact name, and contact identifiers associated with the contact may be determined. Using one or more rules for disambiguating between similar contact names, a single contact may be identified, and a communications session with that contact may be initiated.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached on Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656