Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

1.	Applicant’s amendment filed 10/31/2022 is entered. Claims 1, 7, and 20 are amended. Claims 13-20 are withdrawn claims. Claims 1-12 are pending for examination.
This action is a Final Rejection.

Response to Arguments
2.1.	Applicant’s arguments, see pages 8-10, filed 10/31/2022, with respect to the rejection of claims 1-12, and of claim 9 separately, under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, have been fully considered and are persuasive. The rejection of claims 1-12, and of claim 9 separately, under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, has been withdrawn.

2.2.	Applicant’s arguments, see pages 10-12, filed 10/31/2022, with respect to the rejection of claim 1 under 35 USC have been fully considered and are persuasive in view of the amendments made to independent claim 1. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of reference Mahaffy et al. [US 20030018531 A1], hereinafter Mahaffy.

Claim Rejections - 35 USC § 103

3.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
3.1.	Claims 1-9 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Zeitlin in view of Mahaffy.

Regarding claim 1, Zeitlin teaches a non-transitory computer-readable medium [see Fig. 7B, wherein the “Digital Assistant” represents the claimed non-transitory computer-readable medium] configured to store instructions executable by one or more processors [see Fig. 7B, wherein “STT processing module 730” includes one or more ASR systems] to perform operations comprising:
independently processing, by each of a first system and a second system, audio data relating to one or more orders by a customer for food or beverage;
determining whether the first system has supplied a first indication relating to an intention of the customer associated with performing first language processing of the audio data, the first system being configured to perform one or more of the first language processing, first transcription processing, or first entity recognition processing for menu items in the menu; and selecting a result of second language processing of the audio data from the second system in response to determining that the first system has not supplied the first indication, the second system being configured to perform one or more of second language processing of higher quality than the first language processing, second transcription processing of lower quality than the first transcription processing, or second entity recognition processing for menu items in the menu,  see para 0205 [ “STT processing module 730 includes one or more ASR systems. The one or more ASR systems can process the speech input that is received through I/O processing module 728 to produce a recognition result. Each ASR system includes a front-end speech pre-processor. The front-end speech pre-processor extracts representative features from the speech input. For example, the front-end speech pre-processor performs a Fourier transform on the speech input to extract spectral features that characterize the speech input as a sequence of representative multi-dimensional vectors. Further, each ASR system includes one or more speech recognition models (e.g., acoustic models and/or language models) and implements one or more speech recognition engines. Examples of speech recognition models include Hidden Markov Models, Gaussian-Mixture Models, Deep Neural Network Models, n-gram language models, and other statistical models. Examples of speech recognition engines include the dynamic time warping based engines and weighted finite-state transducers (WFST) based engines. 
The one or more speech recognition models and the one or more speech recognition engines are used to process the extracted representative features of the front-end speech pre-processor to produce intermediate recognitions results (e.g., phonemes, phonemic strings, and sub-words), and ultimately, text recognition results (e.g., words, word strings, or sequence of tokens). In some examples, the speech input is processed at least partially by a third-party service or on the user's device (e.g., device 104, 200, 400, or 600) to produce the recognition result. Once STT processing module 730 produces recognition results containing a text string (e.g., words, or sequence of words, or sequence of tokens), the recognition result is passed to natural language processing module 732 for intent deduction. In some examples, STT processing module 730 produces multiple candidate text representations of the speech input. Each candidate text representation is a sequence of words or tokens corresponding to the speech input. In some examples, each candidate text representation is associated with a speech recognition confidence score. Based on the speech recognition confidence scores, STT processing module 730 ranks the candidate text representations and provides the n-best (e.g., n highest ranked) candidate text representation(s) to natural language processing module 732 for intent deduction, where n is a predetermined integer greater than zero. For example, in one example, only the highest ranked (n=1) candidate text representation is passed to natural language processing module 732 for intent deduction. In another example, the five highest ranked (n=5) candidate text representations are passed to natural language processing module 732 for intent deduction.”]. 
Zeitlin describes a plurality of ASR systems, each ASR system including one or more speech recognition models, that receive audio data. The audio data is recognized by the ASR systems, and the recognition results are converted into text and passed to natural language processing module 732 for intent deduction of the speech inputs received [which correspond to audio data related to one or more orders]. Each text conversion is ranked based on speech recognition confidence scores [the text conversions represent the claimed first language processing and second language processing of the audio data (speech inputs) received; the higher rankings relate to the claimed higher quality of the second language processing of the audio data, and the lower rankings relate to the claimed lower quality of the first language processing of the audio data]. The intent deductions for each of the speech inputs correspond to the claimed first entry and second entry for the first and second speech inputs, as further elaborated in paras 0220 and 0216, where the intent deductions could relate to speech inputs for orders for multiple entities including “food”, “drinks”, “pizza”, “fast food”, etc., such as restaurant reservation, date/time, cuisine, price range, etc.
Zeitlin fails to teach that the multiple entities, including food, to be ordered are from menu items in a menu. Mahaffy, in the same field of placing verbal orders for foods from menu items in a menu, teaches receiving audio data related to one or more fast food orders from menu items in a menu [see para 0017, “The artificial intelligence routines of the computer system are preferably adapted to process the verbal orders such that a complete fast food order (menu item selection, special preparation requests, eat in or take out, etc.) can be processed.”; see also paras 0041 and 0042, which teach that utterances related to verbal orders are processed by the CIT system using a Natural Language Processing technique].
Therefore, in view of Mahaffy, it would be obvious to one of ordinary skill in the art to have modified Zeitlin to incorporate the concept of placing fast food orders in which the multiple entities, such as food, pizza, or drinks, are menu items in a menu of a restaurant, because food items such as pizza, drinks, or “fast food” are required to be selected from a menu including a plurality of menu items.
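For illustration only, the n-best ranking described in the quoted passage of Zeitlin para 0205 can be sketched as follows. This is a minimal sketch and not Zeitlin's implementation: the names `Candidate` and `n_best` are hypothetical, and the confidence scores are assumed to be plain floating-point values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical container for one candidate text representation."""
    text: str          # sequence of words or tokens for the speech input
    confidence: float  # speech recognition confidence score (assumed float)


def n_best(candidates: List[Candidate], n: int = 1) -> List[Candidate]:
    """Rank candidates by confidence score and return the n highest ranked."""
    if n <= 0:
        # Para 0205 describes n as a predetermined integer greater than zero.
        raise ValueError("n must be an integer greater than zero")
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    return ranked[:n]
```

With n=1 only the highest-ranked candidate would be passed on for intent deduction; with n=5 up to five candidates would be passed on, matching the two examples in the quoted passage.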

Regarding claim 2, the limitations, “The non-transitory computer-readable medium of claim 1, wherein the first entity recognition processing comprises recognizing at least a portion of the audio data as relating to multiple entities”, are already covered in the analysis of claim 1 [see Zeitlin paras 0220 and 0216].

Regarding claim 3, the limitations, “The non-transitory computer-readable medium of claim 1, wherein the second entity recognition processing comprises recognizing at least a portion of the audio data as relating to a single entity”, are already covered as analyzed for claim 1, wherein the second or third entities, comprising date/time and party size, relate to the single entity that is the restaurant reservation [see Zeitlin paras 0205 and 0216].

Regarding claim 4, Zeitlin teaches the non-transitory computer-readable medium of claim 1, wherein the operations further comprise: comparing the result of the second language processing to a result of the first language processing. See para 0205 and the analysis of claim 1 above, under which the results of the recognition of speech inputs are compared and then ranked based on confidence scores. See also para 0221, which describes that the selection of results is based upon the combination of the number and the importance of the triggered nodes: “Based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 selects one of the actionable intents as the task that the user intended the digital assistant to perform. In some examples, the domain that has the most “triggered” nodes is selected. In some examples, the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) is selected. In some examples, the domain is selected based on a combination of the number and the importance of the triggered nodes. In some examples, additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user.”

Regarding claim 5, Zeitlin teaches the non-transitory computer-readable medium of claim 4, wherein the comparison comprises evaluating the result of the second language processing and the result of the first language processing against a plurality of criteria [see para 0208, wherein the ranking of the results of the intent deductions, that is, the entries, is done on a plurality of bases such as pronunciation, geographical origin, nationality, or ethnicity; see also para 0221, which describes that the NLU (natural language processing module 732) determines a combination of criteria such as the number of triggered nodes (intent nodes, see para 0216) and their relative importance].

Regarding claim 6, Zeitlin teaches the non-transitory computer-readable medium of claim 1, wherein the operations further comprise selecting the result of the second language processing when (i) the first system is not determined to have supplied the first indication associated with performing the first language processing and the second system is determined to have supplied a second indication associated with performing the second language processing, see para 0205, “In some examples, each candidate text representation is associated with a speech recognition confidence score. Based on the speech recognition confidence scores, STT processing module 730 ranks the candidate text representations and provides the n-best (e.g., n highest ranked) candidate text representation(s) to natural language processing module 732 for intent deduction, where n is a predetermined integer greater than zero. For example, in one example, only the highest ranked (n=1) candidate text representation is passed to natural language processing module 732 for intent deduction. In another example, the five highest ranked (n=5) candidate text representations are passed to natural language processing module 732 for intent deduction”. Zeitlin describes ranking the recognition results from different speech inputs so that, if the second speech input recognition result is of higher quality, it is ranked higher than the other recognition result [from the first language processing], and the higher-ranking result is taken to have supplied a second indication associated with performing the second language processing. The limitation “or (ii) an identifier associated with the first indication and an identifier associated with the second indication are different”, which is recited in the alternative, is not considered.

Regarding claim 7, Zeitlin teaches the non-transitory computer-readable medium of claim 1, wherein the operations further comprise selecting a result of the first language processing in response to determining that the second system has not supplied a second indication associated with performing the second language processing, see para 0205, “In some examples, each candidate text representation is associated with a speech recognition confidence score. Based on the speech recognition confidence scores, STT processing module 730 ranks the candidate text representations and provides the n-best (e.g., n highest ranked) candidate text representation(s) to natural language processing module 732 for intent deduction, where n is a predetermined integer greater than zero. For example, in one example, only the highest ranked (n=1) candidate text representation is passed to natural language processing module 732 for intent deduction. In another example, the five highest ranked (n=5) candidate text representations are passed to natural language processing module 732 for intent deduction”. Zeitlin describes ranking the recognition results from different speech inputs so that, if the first speech input recognition result is of higher quality, it is ranked higher than the other recognition result [from the second language processing], and the higher-ranking result of the first language processing is selected.
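For illustration only, the selection conditions recited in claims 1 and 7 (select the other system's result when one system has not supplied its indication) can be sketched as follows. This is a minimal sketch of the claim language as quoted above, not of any disclosed implementation: `SystemOutput` and `select_result` are hypothetical names, and a missing indication is assumed to be represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemOutput:
    """Hypothetical output of one system for a single piece of audio data."""
    result: str                # result of that system's language processing
    indication: Optional[str]  # indication of customer intent; None if not supplied


def select_result(first: SystemOutput, second: SystemOutput) -> Optional[str]:
    """Select a result based on which system supplied an indication."""
    if first.indication is None:
        # Claim 1: the first system has not supplied the first indication.
        return second.result
    if second.indication is None:
        # Claim 7: the second system has not supplied the second indication.
        return first.result
    # Both systems supplied indications; a further comparison (for example,
    # the comparison of claim 4) would be needed and is not modeled here.
    return None
```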

Regarding claim 8, the limitations, “The non-transitory computer-readable medium of claim 2, wherein the operations further comprise: selecting a result of the first language processing in response to recognizing, by the first entity recognition processing, a plurality of entities”, are discussed in the analysis of claim 1, wherein a plurality of entities related to the first entity, namely the restaurant, could be recognized [see Zeitlin paras 0205, 0216, and 0220].

Regarding claim 9, Zeitlin teaches [see para 0205] that the recognition results are ranked based on speech recognition confidence scores and that the highest-ranked results, up to a certain threshold such as the five highest-ranked results, are further processed by natural language processing module 732. These teachings read on the limitations of claim 9, namely the non-transitory computer-readable medium of claim 3, wherein the operations further comprise: selecting a result of the first language processing in response to detecting, by the second entity recognition processing, a number of entities above a threshold.

Regarding claim 11, Zeitlin teaches the non-transitory computer-readable medium of claim 1, wherein the operations further comprise: detecting, by each of the first entity recognition processing and the second entity recognition processing, a number of entities in the audio data; and selecting a result of the first language processing in response to the first entity recognition processing detecting a higher number of entities than the second entity recognition processing or selecting the result of the second language processing in response to the second entity recognition processing detecting a higher number of entities than the first entity recognition processing [see paras 0205 and 0216, which teach detecting entities from the first language processing and the second language processing (based on speech inputs) and, based on the highest rankings of the speech recognition results, selecting the result of whichever language processing detects the higher number of entities related to restaurant reservation requirements].
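For illustration only, the entity-count comparison recited in claim 11 can be sketched as follows. This is a minimal sketch of the claim language as quoted above: `select_by_entity_count` is a hypothetical name, and returning `None` on a tie is an assumption, since the quoted claim language does not address equal counts.

```python
from typing import List, Optional


def select_by_entity_count(
    first_entities: List[str],
    second_entities: List[str],
    first_result: str,
    second_result: str,
) -> Optional[str]:
    """Select the result of the processing that detected more entities."""
    if len(first_entities) > len(second_entities):
        return first_result
    if len(second_entities) > len(first_entities):
        return second_result
    # Equal counts: not addressed by the quoted claim language (assumption).
    return None
```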

Regarding claim 12, Zeitlin teaches all the limitations of claim 1, as analyzed above, including that the computer-readable medium includes first and second ASR systems [the Digital Assistant, see Fig. 7B, as analyzed above]. Since Zeitlin does not specifically teach that different providers own the ASR systems, the Zeitlin disclosure is interpreted as relating to a same provider of both ASR systems, who owns the Digital Assistant. Further, who owns the systems is not an inventive concept, because a system must belong to somebody, whether one owner or more than one owner.

3.2.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Zeitlin in view of Mahaffy and further in view of Chellapilla et al. [US 20070230791 A1], hereinafter Chellapilla.

Regarding claim 10, Zeitlin teaches all the limitations of claims 1 and 5, as analyzed above, including evaluating the results of the first and second language processing based on a plurality of criteria, and further teaches that recognition results contain text strings including sequences of words which are passed to natural language processing module 732 for intent deduction [see para 0205], but fails to disclose specifically evaluating the criteria sequentially and terminating the evaluation upon selection of the result of the first language processing or the result of the second language processing. The concept of sequentially evaluating data and terminating the evaluation upon selection of a result is an old and well-known concept, as disclosed in Chellapilla [see para 0006, “All of the above approaches for ink retrieval rely on a linear scan through the database for each query which tends to be slow. Sequential evaluation combined with early termination is commonly employed while computing match scores to avoid long query times.”]. Therefore, in view of the teachings of Chellapilla, it would be obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zeitlin, as applied to claim 5, to incorporate the concept of using sequential evaluation of the results of the first language processing and the second language processing and terminating the evaluation upon selection of the very first high-ranked result based on recognition confidence scores, because, as shown in Chellapilla, doing so avoids long query times in evaluating the results.
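For illustration only, sequential evaluation combined with early termination, as the concept is characterized above, can be sketched as follows. This is a minimal sketch and is not taken from Chellapilla or Zeitlin: `select_sequentially` and the `Criterion` type are hypothetical, and each criterion is assumed to return either the selected result or `None` when it cannot decide.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical criterion: returns the selected result, or None if undecided.
Criterion = Callable[[str, str], Optional[str]]


def select_sequentially(
    first_result: str,
    second_result: str,
    criteria: Sequence[Criterion],
) -> Tuple[Optional[str], int]:
    """Evaluate criteria in order and stop as soon as one selects a result.

    Returns the selected result (or None) and the number of criteria evaluated.
    """
    evaluated = 0
    for criterion in criteria:
        evaluated += 1
        choice = criterion(first_result, second_result)
        if choice is not None:
            # Early termination: remaining criteria are never evaluated.
            return choice, evaluated
    return None, evaluated
```

The second element of the returned tuple makes the early termination observable: when an earlier criterion selects a result, the count is smaller than the number of criteria supplied.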


Conclusion

4.	Final Rejection
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

5.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	(i)	Gella et al. [US Patent 10089983 B1; see col. 10, line 45 - col. 11, line 14] discloses that an ASR [Automatic Speech Recognition] system receives audio data, performs speech-to-text processing on the first audio data to generate first text data representing the first audio data, and determines an intent of the utterance associated with a first application. After the first text data is generated, the text data may be provided to a natural language understanding (“NLU”) system to perform NLU processing on the text data. …… The NLU system may determine one or more domains, which may also be referred to as categories, that may be capable of handling the intent of the utterance: “For example, utterance 4, “Order a pizza from ‘Pizza Application’,” may be identified by a Food domain as possibly being able to handle the corresponding request. For instance, the NLU system may identify that the word “order” may be a recognized intent as being an invocation word associated with the food domain, and may use various sample utterances and invocation phrases associated with the food domain to determine an intent of the utterance. In some embodiments, the NLU system may determine that the intent of utterance is for placing an order with an application (e.g., {Intent}: “Order Item”), where the item to be ordered is a pizza (e.g., {Item To Be Ordered}: “Pizza”), and that a particular application to be used to order that item (e.g., {Skill/Application}: “Pizza Application”).”

	Foreign reference:
	(ii)	WO 02/21090 A1 discloses using artificial intelligence of a computer system for processing verbal orders for fast food from a menu including a plurality of menu items [see page 3, lines 27-29].

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOGESH C GARG whose telephone number is (571)272-6756. The examiner can normally be reached Max-Flex.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jeffrey A Smith can be reached on 571-272-6763. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YOGESH C GARG/Primary Examiner, Art Unit 3625