DETAILED ACTION
This office action is in response to the above identified application filed on July 18, 2022. The application contains claims 1-50:
Claims 21-50 were previously cancelled in the preliminary amendment
Claims 2, 9, 10, 12, 19, and 20 are cancelled
Claims 1 and 11 are amended
Claims 1, 3-8, 11, and 13-18 are pending

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on July 18, 2022. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
Applicant's arguments and amendments filed on July 18, 2022 have been fully considered and the objections and rejections are updated accordingly. 

Non-Statutory Double Patenting
	The nonstatutory double patenting rejection is maintained (not held in abeyance). As pointed out in MPEP 804(I)(B)(1), “A complete response to a nonstatutory double patenting (NSDP) rejection is either a reply by applicant showing that the claims subject to the rejection are patentably distinct from the reference claims or the filing of a terminal disclaimer”. Applicant should respond in that manner only.

Claim Rejections - 35 USC § 103
Applicant’s arguments with respect to the new limitations introduced with the amendments are addressed with new rationale. 
Please refer to the updated 35 U.S.C. 103 rejections as set forth below for details.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 3-8, 11, and 13-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
The 2019 PEG guidance for subject matter eligibility is applied in the following analyses:

At Step 1
The inventions of claims 1, 3-8, 11, and 13-18 are directed to the statutory categories of a process (claims 1 and 3-8) and machine (claims 11 and 13-18). Thus, the claimed invention is directed to statutory subject matter.

At Step 2A, Prong One
The claimed invention is directed to mental processes without significantly more. Claims 1 and 11 recite abstract ideas in the following limitations:
“extracting … one or more keywords from the voice query” recites a mental process as an evaluation or judgement of the important (i.e. key) words in audio. One listening to speech or audio can mentally evaluate that certain words heard are “important”, consistent with the spec at [0051].
“determining … pronunciation information for the one or more keywords, wherein the pronunciation information comprises a phonetic spelling of each of the one or more keywords” recites a mental process as an evaluation or judgement of the phonetic representation or alternative spellings of the keywords. One hearing a key word can mentally (or mentally with pen and paper) evaluate the word as having a phonetic or alternative spelling that explains how the word is pronounced, consistent with the spec at [0014] and [0053]. This would include, for instance, the mental process taught in elementary school to determine whether a heard word has a long or short vowel sound.
“generating … a text query comprising the one or more keywords and the pronunciation information” recites a mental process as an evaluation or judgement as to how to use the keyword and pronunciation information as query text. One can mentally evaluate or judge that the spoken word and pronunciation information relate to a specific spelling and determine the query text comprising the pronunciation. Consistent with the spec at [0019], [0044], and [0054], this would include, for example, determining that an audio keyword pronounced ‘loo-ihs’ should be searched as the text query “Lewis” for a certain query context.
“identifying an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag comprising a phonetic spelling for the entity” recites a mental process as an evaluation (i.e. comparison) of the text query and pronunciation information to stored data and pronunciation tags. Consistent with the spec at [0055]-[0057], one can mentally evaluate/compare that an entity matches the text of the query and further compare/evaluate whether the pronunciation information also matches. Humans regularly perform this mentally when searching content by name, knowing that specific individuals pronounce their names differently, to distinguish which content is most relevant.

At Step 2A, Prong Two 
This judicial exception is not integrated into a practical application because the claims recite the additional elements of:
“an audio interface” and “control circuitry” constitute a high-level recitation of generic computer components and represent mere instructions to apply the exception on a computer, see MPEP 2106.05(f).
“receiving a voice query” constitutes preliminary data gathering, see MPEP 2106.05(g).
“retrieving a content item associated with the entity” constitutes preliminary data gathering, see MPEP 2106.05(g), or a mere instruction to ‘apply it’ under MPEP 2106.05(f).

Even when viewed in combination, these additional elements do not integrate the recited judicial exception into a practical application and the claim is directed to the judicial exception.

At Step 2B
Claims 1 and 11 do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, as discussed above, the additional elements constitute a high-level recitation of generic computer components, which represent mere instructions to apply the exception on a computer, and preliminary data gathering. As identified by the courts, retrieving and receiving data are well-understood, routine, and conventional activities, see MPEP 2106.05(d). [Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93].
Even when considered in combination, these additional elements do not provide an inventive concept or significantly more.
Therefore, claims 1 and 11 are rejected under 35 USC 101 as being directed to an abstract idea without significantly more.

Dependent claims 3-8 and 13-18 each recite abstract ideas elaborating on the further details of the “identifying” in the independent claims 1 and 11 that are still mentally performable.
Therefore, dependent claims 3-8 and 13-18 are also rejected under 35 USC 101 as being directed to an abstract idea without significantly more.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-8, 11, and 13-18 are rejected under 35 U.S.C. 103 as being unpatentable over Jang (US 20140359523 A1), in view of Ramos et al. (US 11157696 B1).

With regard to claim 1,
	Jang teaches
a method for responding to voice queries (Fig. 11; [0165]-[0169]; Fig. 12; [0170]-[0173]), the method comprising: 
receiving a voice query at an audio interface ([0166]; Fig. 1, microphone 122; [0047]: receive a voice query through an audio input component such as a microphone, wherein the microphone corresponds to “an audio interface”); 
extracting, using control circuitry (Fig. 1; [0073]-[0076]: controller 180 corresponds to “control circuitry”), one or more keywords from the voice query ([0170]: identify a query term of the voice query, wherein a query term corresponds to “one or more keywords from the voice query”); 
determining, using the control circuitry, pronunciation information for the one or more keywords, wherein the pronunciation information comprises a phonetic spelling of each of the one or more keywords (Fig. 12; [0170]: determine pronunciation information of individual query terms by identifying phonemes from an audio input containing the voice query and matching the identified phonemes to the pronunciation expression of various query terms of the voice query, wherein pronunciation expression reads on “phonetic spelling”); 
generating, using the control circuitry, a text query comprising the one or more keywords and the pronunciation information (Fig. 11; [0167]-[0168]; Fig. 12; [0170]: convert the voice query to a text query, which identifies query terms from the voice query, determines pronunciation information for each query term, and converts each query term into a typical text query term using a voice query term database that links a range of pronunciation of terms to a typical query term. As a result, a text query is generated comprising the query terms and their pronunciations); 
	Jang does not explicitly teach
identifying an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag comprising a phonetic spelling for the entity; and 
retrieving a content item associated with the entity.
Ramos teaches
identifying an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag comprising a phonetic spelling for the entity (Fig. 1, 138-144; Col. 3, lines 61-67; Col. 4, lines 1-56: perform entity resolution based on the tagged portion of text data and the portion of audio data corresponding to the tagged portion of text by comparing the portion of audio data against audio data representing entities known to the system, wherein performing entity resolution corresponds to “identifying an entity”, the tagged portion of text data corresponds to “the text query”, and audio data representing entities known to the system corresponds to “a pronunciation tag” comprised in the “stored metadata for the entity”. Fig. 6; Col. 16, lines 42-67; Col. 17, lines 1-18: audio data comprises phonetic representation of text data, wherein the phonetic representation reads on "phonetic spelling". Fig. 6; Col. 16, lines 7-13; Fig. 8; Col. 19, lines 6-41: entity storage (608/706) corresponds to “a database” that contains “a plurality of entities” known to the system); and 
retrieving a content item associated with the entity (Fig. 1, 146; Col. 4, lines 57-62; Col. 2, lines 10-18: use the resolved entity to perform downstream processes. For example, for the user input of "Alexa, play Adele music," a system may output music sung by Adele, wherein output indicates “retrieving”, and music sung by Adele corresponds to “a content item associated with the entity” Adele).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jang to incorporate the teachings of Ramos to identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag, and to retrieve a content item associated with the entity. Doing so would improve text-based entity resolution by providing language-agnostic phonetic searching as part of entity resolution when text-based entity resolution may be unsuccessful, or successful only to a degree below a requisite threshold confidence, as taught by Ramos (Col. 2, lines 48-67).
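For illustration only, the sequence of limitations mapped above for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Jang, Ramos, or the present application: the `Entity` class, the `STOP_WORDS` set, the function names, and the representation of a recognized voice query as (word, heard phonetic spelling) pairs are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical filler words that are not treated as keywords (assumption).
STOP_WORDS = {"play", "by", "the", "a", "songs", "music"}


@dataclass
class Entity:
    name: str
    pronunciation_tag: str  # phonetic spelling stored as metadata (assumption)
    content_items: list = field(default_factory=list)


def extract_keywords(recognized):
    """Keep (word, heard_phonetic_spelling) pairs that are not stop words.

    `recognized` stands in for speech-recognizer output; its format is assumed.
    """
    return [(w.lower(), p) for w, p in recognized if w.lower() not in STOP_WORDS]


def generate_text_query(keywords):
    """Build a text query holding the keywords and their pronunciations."""
    return {
        "keywords": [w for w, _ in keywords],
        "pronunciations": dict(keywords),
    }


def identify_entity(text_query, database):
    """Return the first entity whose name and pronunciation tag both match."""
    for entity in database:
        name_words = entity.name.lower().split()
        for word in text_query["keywords"]:
            heard = text_query["pronunciations"][word]
            if word in name_words and heard == entity.pronunciation_tag:
                return entity
    return None


def retrieve_content(entity):
    """Return one content item associated with the entity, if any."""
    if entity is None or not entity.content_items:
        return None
    return entity.content_items[0]
```

In this sketch, two database entries that share the spelling “Louis” but carry different pronunciation tags would be distinguished only by the heard phonetic spelling carried in the text query.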

With regard to claim 3,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein identifying the entity is further based on user profile information (Col. 20, lines 13-19; Col. 7, lines 61-66: the phonetic entity resolution component 802 may consider user preferences, wherein user preferences corresponds to “user profile information”).

With regard to claim 4,
	As discussed in claim 3, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 3, wherein the identifying the entity is based on a previously identified entity from a previous voice query (Col. 20, lines 13-19: the phonetic entity resolution component 802 may consider the user’s system usage history, wherein the user’s system usage history includes information on “a previously identified entity from a previous voice query”).

With regard to claim 5,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein identifying the entity is further based on popularity information associated with the entity (Col. 20, lines 13-19: the phonetic entity resolution component 802 may consider popularity of known entities).

With regard to claim 6,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein identifying the entity comprises: 
identifying the plurality of entities, wherein respective metadata is stored for each entity of the plurality of entities (Fig. 8; Col. 19, lines 27-41: generate an N-best list of known entities by performing phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706), wherein generating an N-best list of known entities corresponds to “identifying the plurality of entities” and audio data stored for each known entity on the N-best list corresponds to “respective metadata” stored for “each entity of the plurality of entities”), 
determining a respective score for each respective entity of the plurality of entities based on comparing the respective pronunciation tag with the text query (Fig. 8; Col. 19, lines 27-41: perform phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706) and associate with each known entity a confidence value representing the data catalog component 804's confidence that the known entity corresponds to the entity in the user input, wherein confidence value corresponds to “a respective score” and phonetic matching corresponds to “comparing”. The audio data representing the entity to be resolved is metadata stored for the text query, in other words, a part of the text query, hence “comparing … with the text query”); and 
selecting the entity by determining a maximum score (Col. 20, lines 20-41: the N-best list may not include any more than a maximum number of top scoring known entities, and resolve previously unresolved entities using one or more known entities represented in the N-best list output by the phonetic entity resolution component 802, wherein in the case a maximum number of top scoring known entities is one, the top scoring known entity on the N-best list will be the entity determined to have "a maximum score" that will be selected for entity resolution).
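For illustration only, the scoring and maximum-score selection mapped above for claim 6 can be sketched as follows. This is a minimal sketch under stated assumptions: Ramos is cited for phonetic matching of audio data, whereas this sketch substitutes a plain string similarity (`difflib.SequenceMatcher`) between phonetic spellings, and the function names and the dictionary of pronunciation tags are hypothetical.

```python
from difflib import SequenceMatcher


def score_entities(query_pronunciation, pronunciation_tags):
    """Score each entity by string similarity between phonetic spellings.

    `pronunciation_tags` maps an entity identifier to its stored phonetic
    spelling (assumed format). Scores fall in the range [0, 1].
    """
    return {
        entity: SequenceMatcher(None, query_pronunciation, tag).ratio()
        for entity, tag in pronunciation_tags.items()
    }


def n_best(scores, n):
    """Return up to n (entity, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


def select_entity(scores):
    """Select the entity having the maximum score, or None if no scores."""
    if not scores:
        return None
    return max(scores, key=scores.get)
```

When the N-best list is capped at one entry, the single remaining entity is the one that `select_entity` returns, which mirrors the mapping relied on above for “determining a maximum score”.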

With regard to claim 7,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein the entity is a first entity, further comprising identifying a second entity among the plurality of entities based on the text query and second metadata for the second entity, and wherein the content item is associated with the first entity and the second entity (Col. 14, lines 10-32: a framework for a <PlayMusic> intent might indicate to attempt to resolve an object modifier based on [Album Name] and [Song Name] linked to an identified [Artist Name], wherein [Artist Name] corresponds to “a first entity” and [Album Name] and [Song Name] correspond to “a second entity”. For example, if the text data includes "play songs by the rolling stones," either "songs" or "the rolling stones" corresponds to “a first entity”, and the other corresponds to “a second entity”; both entities are resolved in accordance with Fig. 1, and the content item retrieved will be songs by the rolling stones that are associated with both entities. “identifying a second entity …based on the text query and second metadata…” is taught in the same manner as discussed in the parent claim with respect to “an entity”).

With regard to claim 8,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein identifying the entity among a plurality of entities of the database comprises comparing at least a portion of the text query to tags of the stored metadata to identify a match (Fig. 1, 138; Col. 3, lines 61-67; Col. 4, lines 1-4: perform entity resolution by comparing the tagged portion of text data against text data representing entities known to the system, wherein the tagged portion of text data corresponds to “a portion of the text query” and text data representing entities known to the system corresponds to a tag of the “stored metadata”).

With regard to claim 11,
	Jang teaches
a system for responding to voice queries (Fig. 11; [0165]-[0169]; Fig. 12; [0170]-[0173]), the system comprising: 
an audio interface for receiving a voice query ([0166]; Fig. 1, microphone 122; [0047]: receive a voice query through an audio input component such as a microphone, wherein the microphone corresponds to “an audio interface”); and 
control circuitry coupled to the audio interface (Fig. 1; [0073]-[0076]: controller 180 corresponds to “control circuitry”), the control circuitry configured to: 
extract one or more keywords from the voice query ([0170]: identify a query term of the voice query, wherein a query term corresponds to “one or more keywords from the voice query”); 
determine pronunciation information for the one or more keywords, wherein the pronunciation information comprises a phonetic spelling of each of the one or more keywords (Fig. 12; [0170]: determine pronunciation information of individual query terms by identifying phonemes from an audio input containing the voice query and matching the identified phonemes to the pronunciation expression of various query terms of the voice query, wherein pronunciation expression reads on “phonetic spelling”); 
generate a text query comprising the one or more keywords and the pronunciation information (Fig. 11; [0167]-[0168]; Fig. 12; [0170]: convert the voice query to a text query, which identifies query terms from the voice query, determines pronunciation information for each query term, and converts each query term into a typical text query term using a voice query term database that links a range of pronunciation of terms to a typical query term. As a result, a text query is generated comprising the query terms and their pronunciations); 
	Jang does not explicitly teach
identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag comprising a phonetic spelling for the entity; and 
retrieve a content item associated with the entity.
Ramos teaches
identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag comprising a phonetic spelling for the entity (Fig. 1, 138-144; Col. 3, lines 61-67; Col. 4, lines 1-56: perform entity resolution based on the tagged portion of text data and the portion of audio data corresponding to the tagged portion of text by comparing the portion of audio data against audio data representing entities known to the system, wherein performing entity resolution corresponds to “identifying an entity”, the tagged portion of text data corresponds to “the text query”, and audio data representing entities known to the system corresponds to “a pronunciation tag” comprised in the “stored metadata for the entity”. Fig. 6; Col. 16, lines 42-67; Col. 17, lines 1-18: audio data comprises phonetic representation of text data, wherein the phonetic representation reads on "phonetic spelling". Fig. 6; Col. 16, lines 7-13; Fig. 8; Col. 19, lines 6-41: entity storage (608/706) corresponds to “a database” that contains “a plurality of entities” known to the system); and 
retrieve a content item associated with the entity (Fig. 1, 146; Col. 4, lines 57-62; Col. 2, lines 10-18: use the resolved entity to perform downstream processes. For example, for the user input of "Alexa, play Adele music," a system may output music sung by Adele, wherein output indicates “retrieving”, and music sung by Adele corresponds to “a content item associated with the entity” Adele).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jang to incorporate the teachings of Ramos to identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag, and to retrieve a content item associated with the entity. Doing so would improve text-based entity resolution by providing language-agnostic phonetic searching as part of entity resolution when text-based entity resolution may be unsuccessful, or successful only to a degree below a requisite threshold confidence, as taught by Ramos (Col. 2, lines 48-67).

With regard to claim 13,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the control circuitry is further configured to identify the entity based on user profile information (Col. 20, lines 13-19; Col. 7, lines 61-66: the phonetic entity resolution component 802 may consider user preferences, wherein user preferences corresponds to “user profile information”).

With regard to claim 14,
	As discussed in claim 13, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 13, wherein the control circuitry is further configured to identify the entity based on a previously identified entity from a previous voice query (Col. 20, lines 13-19: the phonetic entity resolution component 802 may consider the user’s system usage history, wherein the user’s system usage history includes information on “a previously identified entity from a previous voice query”).

With regard to claim 15,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the control circuitry is further configured to identify the entity based on popularity information associated with the entity (Col. 20, lines 13-19: the phonetic entity resolution component 802 may consider popularity of known entities).

With regard to claim 16,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the control circuitry is further configured to identify the entity by: 
identifying the plurality of entities, wherein respective metadata is stored for each entity of the plurality of entities (Fig. 8; Col. 19, lines 27-41: generate an N-best list of known entities by performing phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706), wherein generating an N-best list of known entities corresponds to “identifying the plurality of entities” and audio data stored for each known entity on the N-best list corresponds to “respective metadata” stored for “each entity of the plurality of entities”), 
determining a respective score for each respective entity of the plurality of entities based on comparing the respective pronunciation tag with the text query (Fig. 8; Col. 19, lines 27-41: perform phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706) and associate with each known entity a confidence value representing the data catalog component 804's confidence that the known entity corresponds to the entity in the user input, wherein confidence value corresponds to “a respective score” and phonetic matching corresponds to “comparing”. The audio data representing the entity to be resolved is metadata stored for the text query, in other words, a part of the text query, hence “comparing … with the text query”); and 
selecting the entity by determining a maximum score (Col. 20, lines 20-41: the N-best list may not include any more than a maximum number of top scoring known entities, and resolve previously unresolved entities using one or more known entities represented in the N-best list output by the phonetic entity resolution component 802, wherein in the case a maximum number of top scoring known entities is one, the top scoring known entity on the N-best list will be the entity determined to have "a maximum score" that will be selected for entity resolution).

With regard to claim 17,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the entity is a first entity, wherein the control circuitry is further configured to identify a second entity among the plurality of entities based on the text query and second metadata for the second entity, and wherein the content item is associated with the first entity and the second entity (Col. 14, lines 10-32: a framework for a <PlayMusic> intent might indicate to attempt to resolve an object modifier based on [Album Name] and [Song Name] linked to an identified [Artist Name], wherein [Artist Name] corresponds to “a first entity” and [Album Name] and [Song Name] correspond to “a second entity”. For example, if the text data includes "play songs by the rolling stones," either "songs" or "the rolling stones" corresponds to “a first entity”, and the other corresponds to “a second entity”; both entities are resolved in accordance with Fig. 1, and the content item retrieved will be songs by the rolling stones that are associated with both entities. “identifying a second entity …based on the text query and second metadata…” is taught in the same manner as discussed in the parent claim with respect to “an entity”).

With regard to claim 18,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the control circuitry is further configured to identify the entity among a plurality of entities of the database by comparing at least a portion of the text query to tags of the stored metadata to identify a match (Fig. 1, 138; Col. 3, lines 61-67; Col. 4, lines 1-4: perform entity resolution by comparing the tagged portion of text data against text data representing entities known to the system, wherein the tagged portion of text data corresponds to “a portion of the text query” and text data representing entities known to the system corresponds to a tag of the “stored metadata”).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAOQIN HU whose telephone number is (571) 272-1792. The examiner can normally be reached on Monday-Friday 7:00am-3:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached on (571) 272-4034. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XIAOQIN HU/Examiner, Art Unit 2168

/IRETE F EHICHIOYA/Supervisory Patent Examiner, Art Unit 2168