DETAILED ACTION
This office action is in response to the above-identified application, originally filed on July 31, 2019, with a preliminary amendment filed on October 15, 2019.
The application contains claims 1-50:
Claims 21-50 are cancelled in the preliminary amendment
Claims 1-20 are pending

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on October 15, 2019, November 19, 2020, December 10, 2020, June 28, 2021, November 01, 2021, and January 06, 2022. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 3, 5, 11, 13, and 15 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 6, 7, 11, 16, and 17 of copending Application No. 16/528,541 in view of Jang (US 20140359523 A1) and Ramos et al. (US 11157696 B1). 
This is a provisional nonstatutory double patenting rejection.

| Present Application | Copending Application No. 16/528,541 |
| --- | --- |
| A method for responding to voice queries, the method comprising: | A method for responding to voice queries, the method comprising: |
| receiving a voice query at an audio interface; | receiving a voice query at an audio interface; |
| extracting, using control circuitry, one or more keywords from the voice query; | extracting, using control circuitry, one or more keywords from the voice query; |
| determining, using the control circuitry, pronunciation information for the one or more keywords; | (no corresponding limitation) |
| generating, using the control circuitry, a text query based on the one or more keywords and the pronunciation information; | generating, using the control circuitry, a text query based on the one or more keywords; |
| identifying an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag; and | identifying an entity based on the text query and metadata for the entity, wherein the metadata comprises one or more alternate text representations of the entity based on pronunciation of an identifier associated with the entity; and |
| retrieving a content item associated with the entity. | retrieving a content item associated with the entity. |


| Present Application | Copending Application No. 16/528,541 |
| --- | --- |
| 11. A system for responding to voice queries, the system comprising: | 11. A system for responding to voice queries, the system comprising: |
| an audio interface for receiving a voice query; and | an audio interface for receiving a voice query; |
| control circuitry coupled to the audio interface, the control circuitry configured to: | control circuitry configured to: |
| extract one or more keywords from the voice query; | extract one or more keywords from the voice query; |
| determine pronunciation information for the one or more keywords; | (no corresponding limitation) |
| generate a text query based on the one or more keywords and the pronunciation information; | generate a text query based on the one or more keywords; |
| identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag; and | identify an entity based on the text query and metadata for the entity, wherein the metadata comprises one or more alternate text representations of the entity based on pronunciation of an identifier associated with the entity; and |
| retrieve a content item associated with the entity. | retrieve a content item associated with the entity. |


	Claims 1 and 11 of the present application recite similar limitations. Because claim 11 differs from the corresponding claim of copending Application No. 16/528,541 in one more respect than claim 1 does, as shown in the two comparison tables above, claim 11 is used to address the obviousness-type double patenting rejection as follows:
	Jang (US 20140359523 A1) teaches 
	control circuitry coupled to the audio interface (Fig. 1; [0073]-[0076]: controller 180 corresponds to “control circuitry”, microphone 122 corresponds to “the audio interface”, and they are coupled to each other as shown in Fig. 1),
	determine pronunciation information for the one or more keywords (Fig. 12; [0170]: identify phonemes from an audio input containing the voice query and map identified phonemes to individual query terms according to the pronunciation model 237. As a result, it determines pronunciation information for each query term);
	generate a text query based on the one or more keywords and the pronunciation information (Fig. 11; [0167]-[0168]; Fig. 12; [0170]: convert the voice query to a text query, which identifies query terms from the voice query, determines pronunciation information for each query term, and converts each query term into a typical text query term using a voice query term database that links a range of pronunciation of terms to a typical query term. As a result, a text query is generated based on the query terms and their pronunciations);
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified copending Application No. 16/528,541 to incorporate the teachings of Jang to determine pronunciation information for the one or more keywords and generate a text query based on the one or more keywords and the pronunciation information. Doing so would facilitate identifying terms in the voice query despite pronunciation variation, as taught by Jang ([0168]).
Copending Application No. 16/528,541 in view of Jang does not teach the following limitation:
identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag;
Ramos et al. (US 11157696 B1) teaches
identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag (Fig. 1, 138-144; Col. 3, lines 61-67; Col. 4, lines 1-56: perform entity resolution based on the tagged portion of text data; when it fails, perform entity resolution using the portion of audio data corresponding to the tagged portion of text by comparing the portion of audio data against audio data representing entities known to the system, wherein performing entity resolution corresponds to “identifying an entity”, the tagged portion of text data corresponds to “the text query”, and audio data representing entities known to the system corresponds to “a pronunciation tag” comprised in the “stored metadata for the entity”. Fig. 6; Col. 16, lines 7-13; Fig. 8; Col. 19, lines 6-41: entity storage (608/706) corresponds to “a database” that contains “a plurality of entities” known to the system); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified copending Application No. 16/528,541 in view of Jang to incorporate the teachings of Ramos to identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag, and retrieve a content item associated with the entity. Doing so would improve text-based entity resolution by providing language-agnostic phonetic searching as part of entity resolution when text-based entity resolution may be unsuccessful, or successful only to a degree below a requisite confidence threshold, as taught by Ramos (Col. 2, lines 48-67).
Therefore, claim 11 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 11 of copending Application No. 16/528,541. 
Claim 1 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of copending Application No. 16/528,541 by a similar rationale.
Dependent claims 13 and 15, which are dependent on claim 11, are also rejected on the ground of nonstatutory double patenting as being unpatentable over claims 16 and 17 of copending Application No. 16/528,541, respectively. The rationale is similar to that of claim 11.
Dependent claims 3 and 5, which are dependent on claim 1, are also rejected on the ground of nonstatutory double patenting as being unpatentable over claims 6 and 7 of copending Application No. 16/528,541, respectively. The rationale is similar to that of claim 1.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jang (US 20140359523 A1), in view of Ramos et al. (US 11157696 B1).

With regard to claim 1,
	Jang teaches
a method for responding to voice queries (Fig. 11; [0165]-[0169]; Fig. 12; [0170]-[0173]), the method comprising: 
receiving a voice query at an audio interface ([0166]; Fig. 1, microphone 122; [0047]: receive a voice query through an audio input component such as a microphone, wherein the microphone corresponds to “an audio interface”); 
extracting, using control circuitry (Fig. 1; [0073]-[0076]: controller 180 corresponds to “control circuitry”), one or more keywords from the voice query ([0170]: identify a query term of the voice query, wherein a query term corresponds to “one or more keywords from the voice query”); 
determining, using the control circuitry, pronunciation information for the one or more keywords (Fig. 12; [0170]: identify phonemes from an audio input containing the voice query and map identified phonemes to individual query terms according to the pronunciation model 237. As a result, it determines pronunciation information for each query term); 
generating, using the control circuitry, a text query based on the one or more keywords and the pronunciation information (Fig. 11; [0167]-[0168]; Fig. 12; [0170]: convert the voice query to a text query, which identifies query terms from the voice query, determines pronunciation information for each query term, and converts each query term into a typical text query term using a voice query term database that links a range of pronunciation of terms to a typical query term. As a result, a text query is generated based on the query terms and their pronunciations); 
	Jang does not explicitly teach
identifying an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag; and 
retrieving a content item associated with the entity.
Ramos teaches
identifying an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag (Fig. 1, 138-144; Col. 3, lines 61-67; Col. 4, lines 1-56: perform entity resolution based on the tagged portion of text data; when it fails, perform entity resolution using the portion of audio data corresponding to the tagged portion of text by comparing the portion of audio data against audio data representing entities known to the system, wherein performing entity resolution corresponds to “identifying an entity”, the tagged portion of text data corresponds to “the text query”, and audio data representing entities known to the system corresponds to “a pronunciation tag” comprised in the “stored metadata for the entity”. Fig. 6; Col. 16, lines 7-13; Fig. 8; Col. 19, lines 6-41: entity storage (608/706) corresponds to “a database” that contains “a plurality of entities” known to the system); and 
retrieving a content item associated with the entity (Fig. 1, 146; Col. 4, lines 57-62; Col. 2, lines 10-18: use the resolved entity to perform downstream processes. For example, for the user input of "Alexa, play Adele music," a system may output music sung by Adele, wherein output indicates “retrieving”, and music sung by Adele corresponds to “a content item associated with the entity” Adele).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jang to incorporate the teachings of Ramos to identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag, and retrieve a content item associated with the entity. Doing so would improve text-based entity resolution by providing language-agnostic phonetic searching as part of entity resolution when text-based entity resolution may be unsuccessful, or successful only to a degree below a requisite confidence threshold, as taught by Ramos (Col. 2, lines 48-67).

With regard to claim 2,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Jang further teaches
the method of claim 1, wherein the pronunciation information comprises a phoneme of one of the one or more keywords (Fig. 12; [0170]: match phonemes identified by the audio model 235 to individual query terms according to the pronunciation model 237, i.e., a query term’s pronunciation comprises a phoneme).

With regard to claim 3,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein identifying the entity is further based on user profile information (Col. 20, lines 13-19; Col. 7, lines 61-66: the phonetic entity resolution component 802 may consider user preferences, wherein user preferences corresponds to “user profile information”).

With regard to claim 4,
	As discussed in claim 3, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 3, wherein the identifying the entity is based on a previously identified entity from a previous voice query (Col. 20, lines 13-19: the phonetic entity resolution component 802 may consider the user’s system usage history, wherein the user’s system usage history includes information on “a previously identified entity from a previous voice query”).

With regard to claim 5,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein identifying the entity is further based on popularity information associated with the entity (Col. 20, lines 13-19: the phonetic entity resolution component 802 may consider popularity of known entities).

With regard to claim 6,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein identifying the entity comprises: 
identifying the plurality of entities, wherein respective metadata is stored for each entity of the plurality of entities (Fig. 8; Col. 19, lines 27-41: generate an N-best list of known entities by performing phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706), wherein generating an N-best list of known entities corresponds to “identifying the plurality of entities” and audio data stored for each known entity on the N-best list corresponds to “respective metadata” stored for “each entity of the plurality of entities”), 
determining a respective score for each respective entity of the plurality of entities based on comparing the respective pronunciation tag with the text query (Fig. 8; Col. 19, lines 27-41: perform phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706) and associate with each known entity a confidence value representing the data catalog component 804's confidence that the known entity corresponds to the entity in the user input, wherein confidence value corresponds to “a respective score”, phonetic matching corresponds to “comparing”. The audio data representing the entity to be resolved is metadata stored for the text query, in other words, a part of the text query, hence “comparing … with the text query”); and 
selecting the entity by determining a maximum score (Col. 20, lines 20-41: the N-best list may not include any more than a maximum number of top scoring known entities, and resolve previously unresolved entities using one or more known entities represented in the N-best list output by the phonetic entity resolution component 802, wherein in the case a maximum number of top scoring known entities is one, the top scoring known entity on the N-best list will be the entity determined to have "a maximum score" that will be selected for entity resolution).
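
For illustration only, the score-and-select steps recited in claim 6 can be sketched in Python as follows. The similarity measure (difflib.SequenceMatcher over phoneme tokens) is an assumption chosen for brevity and is not the confidence computation described in Ramos. The function names and the example entities are hypothetical, and the sketch assumes a non-empty set of entities, each with at least one pronunciation tag.

```python
from difflib import SequenceMatcher


def score(tag: str, query_phonemes: str) -> float:
    """Similarity in [0, 1] between a pronunciation tag and the query phonemes."""
    return SequenceMatcher(None, tag.split(), query_phonemes.split()).ratio()


def select_entity(query_phonemes: str, entities: dict[str, list[str]]) -> tuple[str, dict[str, float]]:
    """Score every entity by its best-matching tag, then pick the maximum score."""
    scores = {
        name: max(score(tag, query_phonemes) for tag in tags)
        for name, tags in entities.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Called with the query phonemes "AH D EH L" and the hypothetical entities {"Adele": ["AH D EH L"], "Adam": ["AE D AH M"]}, select_entity scores both entities and selects "Adele", whose tag matches the query exactly.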

With regard to claim 7,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein the entity is a first entity, further comprising identifying a second entity among the plurality of entities based on the text query and second metadata for the second entity, and wherein the content item is associated with the first entity and the second entity (Col. 14, lines 10-32: a framework for a <PlayMusic> intent might indicate to attempt to resolve an object modifier based on [Album Name] and [Song Name] linked to an identified [Artist Name], wherein [Artist Name] corresponds to “a first entity” and [Album Name] and [Song Name] correspond to “a second entity”. For example, if the text data includes "play songs by the rolling stones," either "songs" or "the rolling stones" corresponds to “a first entity”, and the other corresponds to “a second entity”; both entities are resolved in accordance with Figure 1, and the content item retrieved will be songs by the rolling stones that is associated with both entities. “identifying a second entity …based on the text query and second metadata…” is taught in the same manner as discussed in the parent claim with respect to “an entity”).

With regard to claim 8,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein identifying the entity among a plurality of entities of the database comprises comparing at least a portion of the text query to tags of the stored metadata to identify a match (Fig. 1, 138; Col. 3, lines 61-67; Col. 4, lines 1-4: perform entity resolution by comparing the tagged portion of text data against text data representing entities known to the system, wherein the tagged portion of text data corresponds to “a portion of the text query” and text data representing entities known to the system corresponds to a tag of the “stored metadata”).

With regard to claim 9,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein a first keyword of the one or more keywords is associated with more than one pronunciation of the first keyword (Col. 16, lines 64-67; Col. 17, lines 1-4: different pronunciations are associated with a single entity (e.g., male pronunciation, female pronunciation, and different accents (e.g., a Japanese user speaking English)). As discussed in the parent claim, the fact that the tagged portion of the text query that includes “a first keyword” matches with an entity with more than one pronunciation indicates the first keyword is associated with more than one pronunciation as well).

With regard to claim 10,
	As discussed in claim 1, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the method of claim 1, wherein the pronunciation information comprises a phonetic representation of a first keyword of the one or more keywords (Col. 19, lines 27-41: performing phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706) indicates the pronunciation information of a first keyword is “a phonetic representation” because the matching would not otherwise be possible).

With regard to claim 11,
	Jang teaches
a system for responding to voice queries (Fig. 11; [0165]-[0169]; Fig. 12; [0170]-[0173]), the system comprising: 
an audio interface for receiving a voice query ([0166]; Fig. 1, microphone 122; [0047]: receive a voice query through an audio input component such as a microphone, wherein the microphone corresponds to “an audio interface”); and 
control circuitry coupled to the audio interface (Fig. 1; [0073]-[0076]: controller 180 corresponds to “control circuitry”), the control circuitry configured to: 
extract one or more keywords from the voice query ([0170]: identify a query term of the voice query, wherein a query term corresponds to “one or more keywords from the voice query”); 
determine pronunciation information for the one or more keywords (Fig. 12; [0170]: identify phonemes from an audio input containing the voice query and map identified phonemes to individual query terms according to the pronunciation model 237. As a result, it determines pronunciation information for each query term); 
generate a text query based on the one or more keywords and the pronunciation information (Fig. 11; [0167]-[0168]; Fig. 12; [0170]: convert the voice query to a text query, which identifies query terms from the voice query, determines pronunciation information for each query term, and converts each query term into a typical text query term using a voice query term database that links a range of pronunciation of terms to a typical query term. As a result, a text query is generated based on the query terms and their pronunciations); 
	Jang does not explicitly teach
identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag; and 
retrieve a content item associated with the entity.
Ramos teaches
identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag (Fig. 1, 138-144; Col. 3, lines 61-67; Col. 4, lines 1-56: perform entity resolution based on the tagged portion of text data; when it fails, perform entity resolution using the portion of audio data corresponding to the tagged portion of text by comparing the portion of audio data against audio data representing entities known to the system, wherein performing entity resolution corresponds to “identifying an entity”, the tagged portion of text data corresponds to “the text query”, and audio data representing entities known to the system corresponds to “a pronunciation tag” comprised in the “stored metadata for the entity”. Fig. 6; Col. 16, lines 7-13; Fig. 8; Col. 19, lines 6-41: entity storage (608/706) corresponds to “a database” that contains “a plurality of entities” known to the system); and 
retrieve a content item associated with the entity (Fig. 1, 146; Col. 4, lines 57-62; Col. 2, lines 10-18: use the resolved entity to perform downstream processes. For example, for the user input of "Alexa, play Adele music," a system may output music sung by Adele, wherein output indicates “retrieving”, and music sung by Adele corresponds to “a content item associated with the entity” Adele).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jang to incorporate the teachings of Ramos to identify an entity among a plurality of entities of a database based on the text query and stored metadata for the entity, wherein the metadata comprises a pronunciation tag, and retrieve a content item associated with the entity. Doing so would improve text-based entity resolution by providing language-agnostic phonetic searching as part of entity resolution when text-based entity resolution may be unsuccessful, or successful only to a degree below a requisite confidence threshold, as taught by Ramos (Col. 2, lines 48-67).

With regard to claim 12,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Jang further teaches
the system of claim 11, wherein the pronunciation information comprises a phoneme of one of the one or more keywords (Fig. 12; [0170]: match phonemes identified by the audio model 235 to individual query terms according to the pronunciation model 237, i.e., a query term’s pronunciation comprises a phoneme).

With regard to claim 13,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the control circuitry is further configured to identify the entity based on user profile information (Col. 20, lines 13-19; Col. 7, lines 61-66: the phonetic entity resolution component 802 may consider user preferences, wherein user preferences corresponds to “user profile information”).

With regard to claim 14,
	As discussed in claim 13, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 13, wherein the control circuitry is further configured to identify the entity based on a previously identified entity from a previous voice query (Col. 20, lines 13-19: the phonetic entity resolution component 802 may consider the user’s system usage history, wherein the user’s system usage history includes information on “a previously identified entity from a previous voice query”).

With regard to claim 15,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the control circuitry is further configured to identify the entity based on popularity information associated with the entity (Col. 20, lines 13-19: the phonetic entity resolution component 802 may consider popularity of known entities).

With regard to claim 16,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the control circuitry is further configured to identify the entity by: 
identifying the plurality of entities, wherein respective metadata is stored for each entity of the plurality of entities (Fig. 8; Col. 19, lines 27-41: generate an N-best list of known entities by performing phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706), wherein generating an N-best list of known entities corresponds to “identifying the plurality of entities” and audio data stored for each known entity on the N-best list corresponds to “respective metadata” stored for “each entity of the plurality of entities”), 
determining a respective score for each respective entity of the plurality of entities based on comparing the respective pronunciation tag with the text query (Fig. 8; Col. 19, lines 27-41: perform phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706) and associate with each known entity a confidence value representing the data catalog component 804's confidence that the known entity corresponds to the entity in the user input, wherein confidence value corresponds to “a respective score”, phonetic matching corresponds to “comparing”. The audio data representing the entity to be resolved is metadata stored for the text query, in other words, a part of the text query, hence “comparing … with the text query”); and 
selecting the entity by determining a maximum score (Col. 20, lines 20-41: the N-best list may not include any more than a maximum number of top scoring known entities, and resolve previously unresolved entities using one or more known entities represented in the N-best list output by the phonetic entity resolution component 802, wherein in the case a maximum number of top scoring known entities is one, the top scoring known entity on the N-best list will be the entity determined to have "a maximum score" that will be selected for entity resolution).

With regard to claim 17,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the entity is a first entity, wherein the control circuitry is further configured to identify a second entity among the plurality of entities based on the text query and second metadata for the second entity, and wherein the content item is associated with the first entity and the second entity (Col. 14, lines 10-32: a framework for a <PlayMusic> intent might indicate to attempt to resolve an object modifier based on [Album Name] and [Song Name] linked to an identified [Artist Name], wherein [Artist Name] corresponds to “a first entity” and [Album Name] and [Song Name] correspond to “a second entity”. For example, if the text data includes "play songs by the rolling stones," either "songs" or "the rolling stones" corresponds to “a first entity”, and the other corresponds to “a second entity”; both entities are resolved in accordance with Figure 1, and the content item retrieved will be songs by the rolling stones that is associated with both entities. “identifying a second entity …based on the text query and second metadata…” is taught in the same manner as discussed in the parent claim with respect to “an entity”).

With regard to claim 18,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the control circuitry is further configured to identify the entity among a plurality of entities of the database by comparing at least a portion of the text query to tags of the stored metadata to identify a match (Fig. 1, 138; Col. 3, lines 61-67; Col. 4, lines 1-4: perform entity resolution by comparing the tagged portion of text data against text data representing entities known to the system, wherein the tagged portion of text data corresponds to “a portion of the text query” and text data representing entities known to the system corresponds to a tag of the “stored metadata”).

With regard to claim 19,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein a first keyword of the one or more keywords is associated with more than one pronunciation of the first keyword (Col. 16, lines 64-67; Col. 17, lines 1-4: different pronunciations are associated with a single entity (e.g., male pronunciation, female pronunciation, and different accents (e.g., a Japanese user speaking English)). As discussed in the parent claim, the fact that the tagged portion of the text query that includes “a first keyword” matches with an entity with more than one pronunciation indicates the first keyword is associated with more than one pronunciation as well).

With regard to claim 20,
	As discussed in claim 11, Jang and Ramos teach all the limitations therein.
	Ramos further teaches
the system of claim 11, wherein the pronunciation information comprises a phonetic representation of a first keyword of the one or more keywords (Col. 19, lines 27-41: performing phonetic matching of the audio data (representing the entity to be resolved) to audio data stored in the entity storage (608/706) indicates the pronunciation information of a first keyword is “a phonetic representation” because the matching would not otherwise be possible).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAOQIN HU whose telephone number is (571)272-1792.  The examiner can normally be reached on Monday-Friday 7:00am-3:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached on (571) 272-4034.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XIAOQIN HU/Examiner, Art Unit 2168