DETAILED ACTION

Response to amendments

1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. A response was filed in this application on 03/05/2021 after the non-final rejection of 12/08/2020. In this submission, claims 1 and 11 were amended while claims 21-50 were previously cancelled. Thus, claims 1-20 are currently pending for reconsideration by the Examiner and are examined below.

Response to arguments

2.	The Applicant’s arguments have been fully considered but are moot in light of new grounds of rejection as necessitated by amendments presented in this latest submission pertaining to receiving a search query comprising a plurality of entities and identifying an entity of the plurality of entities. 

Claim Rejections - 35 USC § 103

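In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.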

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1-6 and 11-16 are rejected under 35 U.S.C. 103 as being unpatentable over Raedel (U.S. Patent Application Publication # 2015/0371636 A1) in view of Aher (U.S. Patent Application Publication # 2021/0026901 A1).

With regards to claim 1, Raedel teaches a method for generating entity metadata for voice queries, the method comprising generating, using a text-to-speech module, an audio file based on a first text string and at least one speech criterion, wherein the first text string describes the entity (Para 28, further Para 33 and figure 2, further teach that the processing module 205 may cause a processing of the textual input received from the one or more user via input module 201 to cause a text analysis. The text analysis process involves the processing module 205 identifying phonemes, pronunciation, pitches, and other information associated with the at least one textual input);

generating, using a speech-to-text module, a second text string based on the audio file (Para 28, further teaches that then, the content creation platform 115 may run these audio phrases through a speech-to-text processing); 

comparing the second text string to the first text string (Para 28, further teaches that subsequently, the content creation platform 115 may compare the results to the original typed text);

and storing, in metadata associated with the entity, the second text string if it is not identical to the first text string (Para 28, further teaches that the results of the above comparison are used to further refine the speech creation process on a per user basis. The content creation platform 115 may save the one or more corrections to the database 211. The content creation platform 115 may also capture words and/or phrases directly, wherein the words and/or phrases are processed to further enhance to a user's phoneme descriptors, speech cadence and tone).

However, Raedel may not explicitly detail the limitations of receiving a search query comprising a plurality of entities and identifying an entity of the plurality of entities. This is taught by Aher (Figure 1 along with para 17, shows an illustrative diagram 100 of metadata for a plurality of entities of a search query, e.g., a primary search term "Warrior" 102 is identified from a search query by a processing engine. The processing engine may receive the metadata of a plurality of entities from memory. The table 104 lists the entity and other categories such as title, sub-entity, important members, release year, domain, and genres. Many other fields may be applicable for any type of media asset or digital information. The processing engine may determine a metadata identifier for the respective entity. A metadata identifier is unique among each of the plurality of entities other than the respective entity, e.g., the release year has value "1982" for entity "Music band." This year "1982" is unique among all entities that matched the primary search term "Warrior." The processing engine may generate a suggested search string comprising a plurality of search string elements 106, wherein the plurality of search string elements includes the primary search term, the entity, and the metadata identifier, e.g., a generated search string includes "The Golden State Warriors basketball" having the primary search term "Warrior," the entity "Team," and a metadata identifier "basketball").

Aher, para 6). 
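
For illustration only, the sequence of steps mapped above for claim 1 (text-to-speech, speech-to-text, comparison, and conditional storage in entity metadata) can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from Raedel, Aher, or the application: the function name `add_voice_variant`, the dictionary layout of the metadata, and the stand-in functions `fake_tts` and `fake_stt` are all hypothetical.

```python
from typing import Callable, Dict, List


def add_voice_variant(
    metadata: Dict[str, List[str]],
    entity: str,
    first_text: str,
    tts: Callable[[str], bytes],
    stt: Callable[[bytes], str],
) -> Dict[str, List[str]]:
    """Round-trip first_text through TTS and STT; keep the result if it differs."""
    audio = tts(first_text)        # generate the audio file from the first text string
    second_text = stt(audio)       # generate the second text string from the audio
    if second_text != first_text:  # store only when the strings are not identical
        variants = metadata.setdefault(entity, [])
        if second_text not in variants:
            variants.append(second_text)
    return metadata


# Hypothetical stand-ins for real text-to-speech and speech-to-text modules.
def fake_tts(text: str) -> bytes:
    return text.lower().encode("utf-8")


def fake_stt(audio: bytes) -> str:
    # Simulates a recognizer that transcribes "ph" as "f".
    return audio.decode("utf-8").replace("ph", "f")


metadata = add_voice_variant({}, "entity-1", "Phish", fake_tts, fake_stt)
```

With these stand-ins, the string "Phish" comes back as "fish" and is stored under the hypothetical key "entity-1"; a string that survives the round trip unchanged leaves the metadata untouched.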

With regards to claim 2, Raedel teaches the method of claim 1, wherein the at least one speech criterion comprises a pronunciation setting (Para 33 and figure 2, further teach that the text analysis process involves the processing module 205 identifying phonemes, pronunciation, pitches, and other information associated with the textual input).

With regards to claim 3, Raedel teaches the method of claim 1, wherein the at least one speech criterion comprises a language setting (Para 33 and figure 2, further teach that the text analysis process involves the processing module 205 identifying phonemes, pronunciation, pitches, and other information associated with the textual input).

With regards to claim 4, Raedel teaches the method of claim 1, wherein the at least one speech criterion comprises a plurality of speech criteria (Para 33 and figure 2, further teach that the text analysis process involves the processing module 205 identifying phonemes, pronunciation, pitches, and other information associated with the textual input);

the method further comprising generating, using the text-to-speech module, a respective audio file based on a first text string and a respective speech criterion (Para 28, further teaches that when a user uses a text function, the content creation platform 115 may know the exact text of a phrase, and may also know the assembled text-to-speech algorithm process. The content creation platform 115 may then recreate the phrases. Para 33 and figure 2, further teach that the processing module 205 may cause a processing of the textual input received from the one or more user via input module 201 to cause a text analysis. The text analysis process involves the processing module 205 identifying phonemes, pronunciation, pitches, and other information associated with the at least one textual input);

generating, using the speech-to-text module, a respective second text string based on the respective audio file (Para 28, further teaches that then, the content creation platform 115 may run these audio phrases through a speech-to-text processing);

Application No.: 16/528,550; Docket No.: 003597-2280-103; Preliminary Amendment dated October 15, 2019

comparing the respective second text string to the first text string (Para 28, further teaches that subsequently, the content creation platform 115 may compare the results to the original typed text);

and storing, in metadata associated with the entity, the respective second text string if it is not identical to the first text string (Para 28, further teaches that the results of the above comparison are used to further refine the speech creation process on a per user basis. The content creation platform 115 may save the one or more corrections to the database 211. The content creation platform 115 may also capture words and/or phrases directly, wherein the words and/or phrases are processed to further enhance to a user's phoneme descriptors, speech cadence and tone).
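
For illustration only, the per-criterion repetition recited in claim 4 can be sketched as a loop over several speech criteria, each producing its own audio file and its own second text string. This is a sketch under stated assumptions rather than a method drawn from the cited references: `collect_variants`, the locale-style criterion labels, the `HYPOTHETICAL_TRANSCRIPTS` table, and the stand-in functions are all invented for the example.

```python
from typing import Callable, Iterable, List


def collect_variants(
    first_text: str,
    criteria: Iterable[str],
    tts: Callable[[str, str], bytes],
    stt: Callable[[bytes], str],
) -> List[str]:
    """Run one TTS/STT round trip per speech criterion and keep differing results."""
    variants: List[str] = []
    for criterion in criteria:
        audio = tts(first_text, criterion)  # respective audio file
        second_text = stt(audio)            # respective second text string
        if second_text != first_text and second_text not in variants:
            variants.append(second_text)
    return variants


# Hypothetical transcriptions keyed by speech criterion (invented data).
HYPOTHETICAL_TRANSCRIPTS = {"en-GB": "Worrier", "de-DE": "Varior"}


def fake_tts(text: str, criterion: str) -> bytes:
    # Stand-in: tags the "audio" with the criterion used to synthesize it.
    return f"{criterion}|{text}".encode("utf-8")


def fake_stt(audio: bytes) -> str:
    # Stand-in: returns an invented mis-transcription for some criteria.
    criterion, _, text = audio.decode("utf-8").partition("|")
    return HYPOTHETICAL_TRANSCRIPTS.get(criterion, text)


variants = collect_variants("Warrior", ["en-US", "en-GB", "de-DE"], fake_tts, fake_stt)
```

In this sketch only the second text strings that differ from the first text string are retained, which matches the storing condition stated in the claim.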

With regards to claim 5, Raedel teaches the method of claim 1, further comprising updating the metadata based on one or more text queries (Para 28, further teaches that the content creation platform 115 may also capture words and/or phrases directly, wherein the words and/or phrases are processed to further enhance to a user's phoneme descriptors, speech cadence and tone. Para 51, further teaches that automatic processing by content creation platform may cause a storing of new sound updates, and may implement various mechanisms for correct application of the new sounds).

With regards to claim 6, Raedel teaches the method of claim 1, further comprising storing, in metadata associated with the entity, a phonetic representation of the first text string (Para 28, further teaches that the content creation platform 115 may also capture words and/or phrases directly, wherein the words and/or phrases are processed to further enhance to a user's phoneme descriptors, speech cadence and tone).

With regards to claims 11-16, these are system claims for the corresponding method claims 1-6. These two sets of claims are related as method and apparatus of using the same, with each claimed system element's function corresponding to the claimed method step. Accordingly, claims 11-16 are similarly rejected under the same rationale as applied above with respect to method claims 1-6.

4.	Claims 7-10 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Raedel in view of Aher and further in view of Moore (U.S. Patent Application Publication # 2019/0149987 A1).

With regards to claim 7, Raedel teaches the method of claim 1, wherein generating the audio file based on the first text string comprises converting the first text string to a first audio signal and processing the audio signal to generate the audio file (Para 28, further teaches that when a user uses a text function, the content creation platform 115 may know the exact text of a phrase, and may also know the assembled text-to-speech algorithm process. The content creation platform 115 may then recreate the phrases);

However, Raedel and Aher may not explicitly detail generating speech at a speaker. This is taught by Moore (Para 33 and figure 1B, teach that after successfully setting up the second device, the second device may emit a first text-to-speech output from a speaker of the second device, e.g., "Thanks for setting me up!");

Moore also teaches detecting the speech using a microphone to generate a second audio signal (Para 33 and figure 1B, further teach that a microphone of the first device detects this first TTS output due to the inclusion of a predefined "wake word" in the first TTS output and responds by emitting a second TTS output from a speaker of the first device, e.g., "No problem!").

Raedel, Aher and Moore can be considered as analogous art as they belong to a similar field of endeavor in speech processing. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Moore (Use of a speaker and microphone to output and detect respectively, the text-to-speech output) with those of Raedel and Aher (Use of a text-to-speech system in combination with a speech-to-text system) so as to mitigate cumbersome, complicated steps and prevent user frustration during user device interaction (Moore, para 2). 

With regards to claim 8, Raedel and Aher may not explicitly detail that generating the speech at the speaker is further based on at least one speech setting of the text-to-speech module. Moore once again teaches this (Para 33 and figure 1B, teach that the speech is generated at the speaker of the second device by text-to-speech including a wake word in the speech output according to the settings).

Raedel, Aher and Moore can be considered as analogous art as they belong to a similar field of endeavor in speech processing. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Moore (generating the speech at the speaker based on a speech setting of the text-to-speech module) with those of Raedel and Aher (Use of a text-to-speech system in combination with a speech-to-text system) so as to mitigate cumbersome, complicated steps and prevent user frustration during user device interaction (Moore, para 2). 

With regards to claim 9, Raedel teaches converting the audio signal to the second text string by identifying one or more words (Para 28, further teaches that then, the content creation platform 115 runs audio phrases through a speech-to-text processing. The content creation platform 115 captures words and/or phrases directly, wherein the words and/or phrases are processed to further enhance to a user's phoneme descriptors, speech cadence and tone);

However, Raedel and Aher may not explicitly detail that generating the second text string based on the audio file comprises generating a playback of the audio file at a speaker. This is taught by Moore (Para 33 and figure 1B, teach that after successfully setting up the second device, the second device may emit a first text-to-speech output from a speaker of the second device, e.g., "Thanks for setting me up!");

Moore also teaches detecting the playback using a microphone to generate an audio signal (Para 33 and figure 1B, further teach that a microphone of the first device detects this first TTS output due to the inclusion of a predefined "wake word" in the first TTS output and responds by emitting a second TTS output from a speaker of the first device, e.g., "No problem!").

Raedel, Aher and Moore can be considered as analogous art as they belong to a similar field of endeavor in speech processing. It would thus have been obvious to one having ordinary skill in the art to advantageously combine the teachings of Moore (Use of a speaker and microphone to output and detect respectively, the text-to-speech output) with those of Raedel and Aher (Use of a text-to-speech system in combination with a speech-to-text system) so as to mitigate cumbersome, complicated steps and prevent user frustration during user device interaction (Moore, para 2). 

With regards to claim 10, Raedel teaches the method of claim 9, wherein converting the audio signal to the second text string is based on at least one text setting of the speech-to-text module (Para 28, further teaches that then, the content creation platform 115 may run these audio phrases through a speech-to-text processing).

With regards to claims 17-20, these are system claims for the corresponding method claims 7-10. These two sets of claims are related as method and apparatus of using the same, with each claimed system element's function corresponding to the claimed method step. Accordingly, claims 17-20 are similarly rejected under the same rationale as applied above with respect to method claims 7-10.

Conclusion

5.	Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  The following prior art, made of record but not relied upon, is considered pertinent to applicant's disclosure: Flanagan (U.S. Patent # 5737485 A), Boxwell (U.S. Patent Application Publication # 2021/0011934 A1). These references are also included in the PTO-892 form attached with this office action.



Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. If you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). In case you would like assistance from a USPTO Customer Service Representative or 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEERAJ SHARMA whose contact information is given below.  The examiner can normally be reached on Monday to Friday 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Louis-Desir can be reached on 571-272-7799 (Direct Phone).  The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

/NEERAJ SHARMA/
Primary Examiner, Art Unit 2659
571-270-5487 (Direct Phone)
571-270-6487 (Direct Fax)
neeraj.sharma@uspto.gov (Direct Email)