Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions

1.1	Applicant's election with traverse of Group I, including claims 1-12, in the reply filed on 05/24/2022 is acknowledged. The traversal is on the ground(s) that non-elected Groups II and III are in the same class as elected Group I. This is not found persuasive because inventions classified in the same class may nonetheless be different and distinct; for example, class G06Q 30/0635 alone currently contains 2440 patents, and class G06Q 30/0641 currently contains 4988 US patents. The reasons provided by the Examiner in the Requirement for Restriction/Election mailed 03/24/2022 (see pages 2-3) are sustainable because Inventions I, II, and III are related as subcombinations disclosed as usable together in a single combination. The subcombinations are distinct if they do not overlap in scope and are not obvious variants, and if it is shown that at least one subcombination is separately usable. In the instant case, subcombination II, as described in Invention II, has separate utility, such as a POS terminal in wireless communication with a cloud computing system to process received audio data for placing one or more orders and processing the order, which is not required for Inventions I and III, as claimed. Subcombination I has separate utility, such as a process in which, in response to determining that the first system has not supplied the first indication, the second system is configured to perform one or more of second language processing of higher quality than the first language processing, second transcription processing of lower quality than the first transcription processing, or second entity recognition processing, which is not required for Inventions II and III, as claimed. See MPEP § 806.05(d). Therefore, the requirement is still deemed proper and is therefore made FINAL.
	Since Applicant has elected Group I, claims 1-12 will be examined on the merits, while claims 13-30, falling within non-elected Groups II and III, are withdrawn.
	
1.2.	Applicant’s arguments, “the Office Action contains at least one error with respect to the requirement for election of species, which is understood to apply only if Group III is elected
for Group II, the species are defined in terms of claims, yet ‘[c]laims themselves are never species’” (citing MPEP § 806.04(e)), are not persuasive. First, the statement that Group II includes species defined in terms of claims was a typographical error; the species requirement applies to Group III, not Group II. Second, the MPEP states: “Where there is no disclosure of a relationship between species (see MPEP § 806.04(b)), they are independent inventions. A requirement for restriction is permissible if there is a patentable difference between the species as claimed and there would be a serious burden on the examiner if restriction is not required. See MPEP § 803 and § 808.02.” Species 1, 2, and 3, as explained on pages 4 and 5, are distinct and independent and require three different strategies: Species 1, claims 21 and 24, directed to reducing noise from the audio data using a neural network de-noiser; Species 2, claims 23 and 28, directed to distinguishing between speakers of audio data; and Species 3, claims 26 and 29, directed to recognizing synonyms. Therefore, the requirement for election of species for non-elected Group III is still deemed proper and is therefore made FINAL.
	Claims 1-12 of elected Group I are pending and will be examined further on the merits.


Claim Rejections - 35 USC § 112
2.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


2.1.	Claims 1-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1-3, 8-9, and 11 recite the terms “first entity” and “second entity,” which are broad and cover different aspects. See Specification para 0067, “where each entity is the subject of an intent. The customer 101 may use at least one utterance, e.g., a spoken or written phrase or sentence, to express or contextualize an intent. For example, the intent of the utterance “I would like one coffee” is “add item,” and the entities of the utterance are “one” and “coffee,” i.e., a quantity and a particular item offered by the store 109”; para 0077, “The assistant may then identify the general entity (falafel) but request further information by asking “Please specify what you are referring to? Falafel platter, falafel sandwich, falafel appetizer or falafel side.” The customer 101 may then clarify “Falafel appetizer.””; para 0115, “The entity refers to either the menu item itself (e.g., an ice cream cone) or a modifier (a topping)”; and para 0117, “the process continues with a determination of whether the first engine has detected an intention relating an entity quantity (such as an intention to order various quantities of different items). When the intent relating to the entity quantity is determined by the first engine, the process S200 involves determining whether the second engine has detected a quantity greater than a threshold quantity (e.g., a quantity greater than 10) (S213)”. Because the terms “first entity” and “second entity” are broad and cover different aspects, they render the scope of claims 1-3, 8-9, and 11 unclear and indefinite and take away the inventive character of the claims. It is also unclear how the computer-readable medium and the one or more processors can be applied technically. Since claims 4-7, 10, and 12 depend from claim 1, they inherit the deficiency of claim 1 and are rejected for the same reasons.

2.2.	Claim 9 recites: “The non-transitory computer-readable medium of claim 3, wherein the operations further comprise: selecting a result of the first language processing in response to detecting, by the second entity recognition processing, a number of entities above a threshold.” The terms “entities” and “threshold” are broad, as recited. It is unclear what the term “entities” represents and what threshold is to be exceeded by a second entity, thereby rendering the scope of claim 9 indefinite and unclear.


3.	Patent eligibility analysis of claims 1-12 per “PEG 2019”:
	Step 1: Claims 1-12 are directed to a manufacture, which, as recited, is statutory.
	Step 2A, Prong 1 analysis: Claims 1-12 recite an abstract idea:
		Claim 1 recites:
1. A non-transitory computer-readable medium configured to store instructions executable by one or more processors to perform operations comprising:
independently processing, by each of a first system and a second system, audio data relating to one or more orders;
determining whether the first system has supplied a first indication associated with performing first language processing of the audio data, the first system being configured to perform one or more of the first language processing, first transcription processing, or first entity recognition processing; and
selecting a result of second language processing of the audio data from the second system in response to determining that the first system has not supplied the first indication, the second system being configured to perform one or more of second language processing of higher quality than the first language processing, second transcription processing of lower quality than the first transcription processing, or second entity recognition processing.
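
For illustration only, the selecting step recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not Applicant's implementation: the names `SystemOutput` and `select_language_result` are hypothetical, a missing first indication is modeled as `None`, and the fallback to the first system's result when the indication is present is an assumption that claim 1 itself does not recite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemOutput:
    """Hypothetical container for what one system returns for the audio data."""
    indication: Optional[str]   # None models "has not supplied an indication"
    language_result: str        # result of that system's language processing


def select_language_result(first: SystemOutput, second: SystemOutput) -> str:
    """Select the second system's result when the first supplied no indication."""
    if first.indication is None:
        return second.language_result
    # Assumption: claim 1 does not recite this branch; the first result is kept.
    return first.language_result


first = SystemOutput(indication=None, language_result="won coffee")
second = SystemOutput(indication="add item", language_result="one coffee")
selected = select_language_result(first, second)
```

In this sketch both outputs are assumed to have been produced independently before the selection is made, mirroring the order of the recited operations.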

Claim 1 recites the limitation of independently processing audio data relating to one or more orders, which, as drafted, is a simple process of handling received verbal orders and a commercial activity (e.g., restaurants receiving verbal telephone orders for food), but for the language “by each of a first system and a second system,” which relates to a computer medium. That is, other than reciting “by each of a first system and a second system,” nothing in the claim elements precludes the step from practically being performed manually, e.g., by restaurants receiving verbal telephone orders for food. The mere nominal recitation of “by each of a first system and a second system” does not take the claim limitation, which can be performed manually as a commercial activity, out of the “Certain Methods of Organizing Human Activity” grouping of abstract ideas. Claim 1 therefore recites an abstract idea.
Step 2A, Prong 2 analysis: Claim 1 recites a practical application and is patent eligible:
Claim 1 recites the additional elements of “determining whether the first system has supplied a first indication associated with performing first language processing of the audio data, the first system being configured to perform one or more of the first language processing, first transcription processing, or first entity recognition processing; and selecting a result of second language processing of the audio data from the second system in response to determining that the first system has not supplied the first indication, the second system being configured to perform one or more of second language processing of higher quality than the first language processing, second transcription processing of lower quality than the first transcription processing, or second entity recognition processing”, which are not insignificant steps but provide a practical application: processing the language of the audio data in order to process the orders, by determining whether the language of the received audio data provides indications for processing. Thus, the claimed invention is directed to a practical application and is patent eligible.
No: Claim 1 is not directed to an abstract idea and, together with its dependent claims 2-12, is patent eligible.
Claims 1-12 are patent eligible.

Claim Rejections - 35 USC § 102
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-9 and 11-12 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Zeitlin et al., cited in the IDS filed 10/08/2020, hereinafter Zeitlin.

Regarding claim 1, Zeitlin teaches a non-transitory computer-readable medium [see Fig. 7B, wherein the “Digital Assistant” represents the claimed non-transitory computer-readable medium] configured to store instructions executable by one or more processors [Fig. 7B, wherein the “730 STT processing Module” includes one or more ASR systems] to perform operations comprising:
independently processing, by each of a first system and a second system, audio data relating to one or more orders;
determining whether the first system has supplied a first indication associated with performing first language processing of the audio data, the first system being configured to perform one or more of the first language processing, first transcription processing, or first entity recognition processing; and selecting a result of second language processing of the audio data from the second system in response to determining that the first system has not supplied the first indication, the second system being configured to perform one or more of second language processing of higher quality than the first language processing, second transcription processing of lower quality than the first transcription processing, or second entity recognition processing,  see para 0205 [ “STT processing module 730 includes one or more ASR systems. The one or more ASR systems can process the speech input that is received through I/O processing module 728 to produce a recognition result. Each ASR system includes a front-end speech pre-processor. The front-end speech pre-processor extracts representative features from the speech input. For example, the front-end speech pre-processor performs a Fourier transform on the speech input to extract spectral features that characterize the speech input as a sequence of representative multi-dimensional vectors. Further, each ASR system includes one or more speech recognition models (e.g., acoustic models and/or language models) and implements one or more speech recognition engines. Examples of speech recognition models include Hidden Markov Models, Gaussian-Mixture Models, Deep Neural Network Models, n-gram language models, and other statistical models. Examples of speech recognition engines include the dynamic time warping based engines and weighted finite-state transducers (WFST) based engines. 
The one or more speech recognition models and the one or more speech recognition engines are used to process the extracted representative features of the front-end speech pre-processor to produce intermediate recognitions results (e.g., phonemes, phonemic strings, and sub-words), and ultimately, text recognition results (e.g., words, word strings, or sequence of tokens). In some examples, the speech input is processed at least partially by a third-party service or on the user's device (e.g., device 104, 200, 400, or 600) to produce the recognition result. Once STT processing module 730 produces recognition results containing a text string (e.g., words, or sequence of words, or sequence of tokens), the recognition result is passed to natural language processing module 732 for intent deduction. In some examples, STT processing module 730 produces multiple candidate text representations of the speech input. Each candidate text representation is a sequence of words or tokens corresponding to the speech input. In some examples, each candidate text representation is associated with a speech recognition confidence score. Based on the speech recognition confidence scores, STT processing module 730 ranks the candidate text representations and provides the n-best (e.g., n highest ranked) candidate text representation(s) to natural language processing module 732 for intent deduction, where n is a predetermined integer greater than zero. For example, in one example, only the highest ranked (n=1) candidate text representation is passed to natural language processing module 732 for intent deduction. In another example, the five highest ranked (n=5) candidate text representations are passed to natural language processing module 732 for intent deduction.”]. 
Zeitlin describes a plurality of ASR systems, each ASR system including one or more speech recognition models and engines, that receive audio data (the speech inputs, which correspond to the claimed audio data relating to one or more orders). The ASR systems recognize the audio data and convert it into text, and the resulting text is passed to natural language processing module 732 for intent deduction. Each candidate text representation is ranked based on its speech recognition confidence score [the candidate text representations represent the claimed first language processing and second language processing of the received audio data (speech inputs); the higher rankings relate to the claimed higher-quality second language processing of the audio data, and the lower rankings relate to the claimed lower-quality first language processing of the audio data]. The intent deductions for each of the speech inputs correspond to the claimed first entity and second entity for the first and second speech inputs, as further elaborated in paras 0216 and 0220, where the intent deductions could relate to speech inputs for orders involving multiple entities such as restaurant reservation, date/time, cuisine, price range, etc.
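
For illustration only, the n-best ranking quoted from Zeitlin para 0205 (candidate text representations ranked by speech recognition confidence score, with the n highest ranked passed on for intent deduction) can be sketched as follows. This is a minimal sketch, not Zeitlin's implementation; the names `Candidate` and `n_best`, the sample texts, and the sample scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate text representation with its confidence score."""
    text: str
    confidence: float


def n_best(candidates: List[Candidate], n: int = 1) -> List[Candidate]:
    """Return the n highest-ranked candidates, where n is an integer > 0."""
    if n < 1:
        raise ValueError("n must be an integer greater than zero")
    # Rank by confidence score, highest first, then keep the top n.
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    return ranked[:n]


candidates = [
    Candidate("table for too at seven", 0.42),
    Candidate("table for two at seven", 0.91),
    Candidate("cable for two at seven", 0.17),
]
top_one = n_best(candidates, n=1)   # only the highest-ranked candidate
top_two = n_best(candidates, n=2)   # the two highest-ranked candidates
```

With n=1 only the highest-ranked candidate is forwarded, which corresponds to the n=1 example in the quoted passage; larger values of n forward more candidates.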

Regarding claim 2, the limitations, “The non-transitory computer-readable medium of claim 1, wherein the first entity recognition processing comprises recognizing at least a portion of the audio data as relating to multiple entities”, are already covered in the analysis of claim 1 [see paras 0220 and 0216].

Regarding claim 3, the limitations, “The non-transitory computer-readable medium of claim 1, wherein the second entity recognition processing comprises recognizing at least a portion of the audio data as relating to a single entity”, are already covered in the analysis of claim 1, wherein the second or third entities, comprising date/time and party size, relate to the single entity, namely the restaurant reservation [see paras 0205 and 0216].

Regarding claim 4, Zeitlin teaches the non-transitory computer-readable medium of claim 1, wherein the operations further comprise: comparing the result of the second language processing to a result of the first language processing. See para 0205 and the analysis of claim 1 above, in which the results of the recognition of speech inputs are compared and then ranked based on confidence scores, and see para 0221, which describes that the selection of results is based upon the combination of the number and the importance of the triggered nodes: “Based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 selects one of the actionable intents as the task that the user intended the digital assistant to perform. In some examples, the domain that has the most “triggered” nodes is selected. In some examples, the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) is selected. In some examples, the domain is selected based on a combination of the number and the importance of the triggered nodes. In some examples, additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user.”

Regarding claim 5, Zeitlin teaches the non-transitory computer-readable medium of claim 4, wherein the comparison comprises evaluating the result of the second language processing and the result of the first language processing against a plurality of criteria [see para 0208, wherein the results of the intent deductions, that is, the entities, are ranked based on a plurality of criteria such as pronunciation, geographical origin, nationality, or ethnicity; see also para 0221, which describes that the NLU (natural language processing module 732) determines a combination of criteria such as the number of triggered nodes (intent nodes, see para 0216) and their relative importance].

Regarding claim 6, Zeitlin teaches the non-transitory computer-readable medium of claim 1, wherein the operations further comprise selecting the result of the second language processing when (i) the first system is not determined to have supplied the first indication associated with performing the first language processing and the second system is determined to have supplied a second indication associated with performing the second language processing. See para 0205: “In some examples, each candidate text representation is associated with a speech recognition confidence score. Based on the speech recognition confidence scores, STT processing module 730 ranks the candidate text representations and provides the n-best (e.g., n highest ranked) candidate text representation(s) to natural language processing module 732 for intent deduction, where n is a predetermined integer greater than zero. For example, in one example, only the highest ranked (n=1) candidate text representation is passed to natural language processing module 732 for intent deduction. In another example, the five highest ranked (n=5) candidate text representations are passed to natural language processing module 732 for intent deduction”. Zeitlin describes ranking the recognition results from different speech inputs so that, if the second speech input recognition result is of higher quality, it is ranked better than the other recognition result [from the first language processing], and the higher-ranking result is taken to have supplied a second indication associated with performing the second language processing. The limitation “or (ii) an identifier associated with the first indication and an identifier associated with the second indication are different”, which is recited in the alternative, is not considered.

Regarding claim 7, Zeitlin teaches the non-transitory computer-readable medium of claim 1, wherein the operations further comprise selecting a result of the first language processing in response to determining that the second system has not supplied a second indication associated with performing the second language processing. See para 0205: “In some examples, each candidate text representation is associated with a speech recognition confidence score. Based on the speech recognition confidence scores, STT processing module 730 ranks the candidate text representations and provides the n-best (e.g., n highest ranked) candidate text representation(s) to natural language processing module 732 for intent deduction, where n is a predetermined integer greater than zero. For example, in one example, only the highest ranked (n=1) candidate text representation is passed to natural language processing module 732 for intent deduction. In another example, the five highest ranked (n=5) candidate text representations are passed to natural language processing module 732 for intent deduction”. Zeitlin describes ranking the recognition results from different speech inputs so that, if the first speech input recognition result is of higher quality, it is ranked better than the other recognition result [from the second language processing], and the higher-ranking result of the first language processing is selected.

Regarding claim 8, the limitations, “The non-transitory computer-readable medium of claim 2, wherein the operations further comprise: selecting a result of the first language processing in response to recognizing, by the first entity recognition processing, a plurality of entities”, are discussed and analyzed in the analysis of claim 1, wherein a plurality of entities related to the first entity, namely the restaurant, could be recognized [paras 0205, 0216, and 0220].

Regarding claim 9, Zeitlin's teachings (see para 0205) that the recognition results are ranked based on recognition confidence scores, and that the results with the highest-ranked scores, which may reach a certain threshold such as the five highest-ranked results, are further processed by natural language processing module 732, read on the limitations of claim 9: the non-transitory computer-readable medium of claim 3, wherein the operations further comprise: selecting a result of the first language processing in response to detecting, by the second entity recognition processing, a number of entities above a threshold.

Regarding claim 11, Zeitlin teaches the non-transitory computer-readable medium of claim 1, wherein the operations further comprise: detecting, by each of the first entity recognition processing and the second entity recognition processing, a number of entities in the audio data; and selecting a result of the first language processing in response to the first entity recognition processing detecting a higher number of entities than the second entity recognition processing or selecting the result of the second language processing in response to the second entity recognition processing detecting a higher number of entities than the first entity recognition processing [see paras 0205 and 0216, which teach detecting entities by the first language processing and the second language processing (based on speech inputs), and detecting a higher number of entities related to restaurant reservation requirements than the second entity recognition processing, or selecting the result of the second language processing in response to the second entity recognition processing detecting a higher number of entities than the first entity recognition processing, based on the highest rankings of the speech recognition results].
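
For illustration only, the selection recited in claim 11 (choose the result of whichever entity recognition processing detected the higher number of entities) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `select_by_entity_count` is hypothetical, entities are modeled as lists of strings, and returning `None` on a tie is an assumption because the claim language does not address equal counts.

```python
from typing import List, Optional


def select_by_entity_count(
    first_result: str,
    first_entities: List[str],
    second_result: str,
    second_entities: List[str],
) -> Optional[str]:
    """Pick the result whose entity recognition detected more entities."""
    if len(first_entities) > len(second_entities):
        return first_result
    if len(second_entities) > len(first_entities):
        return second_result
    # Assumption: the claim is silent on ties, so no selection is made here.
    return None


chosen = select_by_entity_count(
    "one coffee", ["one", "coffee"],   # first processing: two entities
    "won coffee", ["coffee"],          # second processing: one entity
)
```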

Regarding claim 12, Zeitlin teaches all the limitations of claim 1, as analyzed above, including that the computer-readable medium includes first and second ASR systems [the Digital Assistant, see Fig. 7B, as analyzed above]. Since Zeitlin does not specifically teach that different providers own the ASR systems, Zeitlin's disclosure is interpreted as relating to a single provider of both ASR systems who owns the Digital Assistant. Further, who owns the systems is not an inventive concept, because a system must belong to somebody, whether one owner or more than one owner.


Claim Rejections - 35 USC § 103

5.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
5.1.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Zeitlin in view of Chellapilla et al. [US 20070230791 A1], hereinafter Chellapilla.

Regarding claim 10, Zeitlin teaches all the limitations of claims 1 and 5, as analyzed above, including evaluating the results of the first and second language processing based on a plurality of criteria, but fails to specifically disclose evaluating the criteria sequentially and terminating the evaluation upon selection of the result of the first language processing or the result of the second language processing. The concept of sequentially evaluating data and terminating the evaluation upon selection of a result is old and well known, as disclosed in Chellapilla [see para 0006, “All of the above approaches for ink retrieval rely on a linear scan through the database for each query which tends to be slow. Sequential evaluation combined with early termination is commonly employed while computing match scores to avoid long query times.”]. Therefore, in view of the teachings of Chellapilla, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated sequential evaluation of the results of the first language processing and the second language processing, terminating the evaluation upon selection of the very first high-ranked result based on recognition confidence scores, because, as shown in Chellapilla, doing so avoids long query times in evaluating the results.
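
For illustration only, sequential evaluation with early termination, as discussed above, can be sketched as follows. This is a minimal sketch and not the method of Zeitlin, Chellapilla, or Applicant: the function names, the dictionary layout of a result, and the two example criteria are all hypothetical. Each criterion either selects one result or returns `None` to defer to the next criterion, and evaluation stops at the first selection.

```python
from typing import Callable, Dict, List, Optional, Tuple

Result = Dict[str, object]
Criterion = Callable[[Result, Result], Optional[Result]]


def prefer_more_entities(first: Result, second: Result) -> Optional[Result]:
    """Hypothetical criterion: prefer the result with more detected entities."""
    a, b = len(first["entities"]), len(second["entities"])
    if a == b:
        return None  # undecided, defer to the next criterion
    return first if a > b else second


def prefer_higher_confidence(first: Result, second: Result) -> Optional[Result]:
    """Hypothetical criterion: prefer the result with the higher confidence."""
    a, b = first["confidence"], second["confidence"]
    if a == b:
        return None
    return first if a > b else second


def select_sequentially(
    first: Result, second: Result, criteria: List[Criterion]
) -> Tuple[Optional[Result], int]:
    """Evaluate criteria in order; stop as soon as one selects a result."""
    evaluated = 0
    for criterion in criteria:
        evaluated += 1
        choice = criterion(first, second)
        if choice is not None:
            return choice, evaluated  # early termination
    return None, evaluated


first = {"text": "one coffee", "entities": ["one", "coffee"], "confidence": 0.6}
second = {"text": "won coffee", "entities": ["coffee"], "confidence": 0.8}
choice, evaluated = select_sequentially(
    first, second, [prefer_more_entities, prefer_higher_confidence]
)
```

In this example the first criterion already selects a result, so the second criterion is never evaluated; the returned count makes the early termination visible.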

Conclusion
6.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	(i)	Agarwal [US Patent 10360265 B1; see col. 8, lines 40-61] discloses performing a content search in response to receiving a search query from a speaker via a speech processing service associated with the voice communications device 104, or receiving audio input data in the search request from the voice communications device 104, in which case automatic speech recognition (ASR) and/or natural language processing (NLP) may be performed to identify the search query from the audio input data, and a search module 241 processes the audio input data.
	(ii)	Johnson et al. [US 20190080685 A1; see para 114 and Fig. 3] discloses a process 300 comprising: receiving [at step 302], by language processing system 200, first audio data from an electronic device 10 representing a request; determining a user’s account [step 304] associated with the electronic device 10; ASR processing [step 306] to generate first text data representing the audio data; and NLU processing [step 308] to generate intent data representing the text data. The intent data may contain specific information related to the request and correspond to a domain associated with the request. For example, a user may request to order a pepperoni pizza from a particular restaurant. The intent data may include the domain information (e.g., a “Food Ordering” domain, a “Restaurant Delivery” domain, etc.), as well as the specific parameters related to the user's request (e.g., pepperoni, pizza, from a specific restaurant).
	(iii)	Pollock et al. [US 20080222004 A1; see para 0016] discloses a remote order processing system 100 to facilitate order taking and remote order taking for a restaurant or other sales and service supplier.
	(iv)	Garber, Amy, “Quick-service leaders eye life in the fast lane,” Nation's Restaurant News 39.50: 1, 83-84, 86, Lebhar-Friedman, Inc. (Dec 12, 2005), retrieved from Dialog on 06/19/2022, discloses use of a remote order-processing center that is linked to multiple restaurants' kitchens.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOGESH C GARG whose telephone number is (571)272-6756. The examiner can normally be reached Max-Flex.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jeffrey A Smith can be reached on 571-272-6763. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YOGESH C GARG/Primary Examiner, Art Unit 3625