DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This office action is in reply to Applicant’s Response dated 05/23/2022. No claim is amended. Claims 1-20 remain pending in the application.
	
Response to Arguments
The Applicant argues (see pages 9-10), with respect to the claim interpretation under 35 U.S.C. 112(f), that the "natural language unit" limitation is not a generic placeholder but instead refers to structure understood by one skilled in the art. See Fig. 9, item 903 (the "natural language unit (external)"); Fig. 10, items 903A and 903B (the "natural language unit") and related discussions (paras 0068, 0069, 0070, 0074, 0076, 0077, 0080 as well as others). 
In response to the Applicant’s argument, the Examiner respectfully disagrees. Nothing in Fig. 10 discloses the corresponding structure for the "natural language unit". Instead, Fig. 10 merely shows that the "natural language unit" interacts with bot 208. Paragraphs 0068, 0069, 0070, 0074, 0076, 0077 and 0080 disclose that the natural language unit 903 is remote, that the group messaging service 104 sends the text 912 to the natural language unit 903, and that one natural language unit 903 may be used for restaurant or food-related messages, another natural language unit 903 may be used for document or publication-related messages, and another natural language unit 903 may be used for scheduling-related messages. However, nothing in paragraphs 0068, 0069, 0070, 0074, 0076, 0077 and 0080 discloses the corresponding structure for the "natural language unit". Instead, these paragraphs merely disclose the function or use of the "natural language unit", not its structure. Accordingly, the claim interpretation under 35 U.S.C. 112(f) is applicable and is maintained.

The Applicant argues (see page 12), with respect to the rejection of claims 1, 10 and 17 under 35 U.S.C. 103, that the combination of references does not suggest the claimed steps as performed by a group messaging service configured to manage audio messaging between a plurality of users.
In response, the Examiner respectfully disagrees. Brown teaches that each virtual assistant of the virtual assistant team 102 may be configured for multi-modal input/output (e.g., receive and/or respond in audio or speech, text, touch, gesture, etc.). Brown further teaches that if a user desires to schedule a meeting with another user, the user may communicate this desire to a virtual assistant and the virtual assistant may communicate with the other user's virtual assistant to schedule the meeting and that upon identifying that a user has arrived at an office building for a meeting with another user (e.g., based on location information), the user's virtual assistant may communicate with a virtual assistant of the other user to let the other user know that the user has arrived for the meeting. This clearly shows the virtual assistant relays a user’s message to the other user. In other words, a user communicates with another user by way of virtual assistants. Thus, Brown teaches the claimed steps as performed by a group messaging service configured to manage audio messaging between a plurality of users (Brown, see figs. 1, 2, 4 and 17; paragraphs 0029, 0035, 0051, 0064-0066 and 0111).

The Applicant argues (see page 12), with respect to the rejection of claims 1, 10 and 17 under 35 U.S.C. 103, that the "group" in applicant's claims is a group of a plurality of user nodes, and the element discusses an audio message from a user in the group to "a bot member node of the group."  Conversely, Brown discusses a system whereby a user interacts directly with a virtual assistant, and not communications in a group of a plurality of user nodes (see, e.g., FIG. 11-12). Therefore, Brown does not suggest any "determining" step that a message is directed to a bot, because that is a given in Brown - the interface is specifically designed for communications between a user and a bot, and no determination step is necessary.
In response, the Examiner respectfully disagrees. Brown teaches that each virtual assistant of the virtual assistant team 102 may be configured for multi-modal input/output (e.g., receive and/or respond in audio or speech, text, touch, gesture, etc.). Brown further teaches that if a user desires to schedule a meeting with another user, the user may communicate this desire to a virtual assistant and the virtual assistant may communicate with the other user's virtual assistant to schedule the meeting. Clearly, it is determined here that the user’s desire (bot command) to schedule a meeting is directed to the user’s virtual assistant (bot member node of the group) and the other user’s virtual assistant (bot member node of the group). The group here is the user’s device, the other user’s device, the user’s virtual assistant and the other user’s virtual assistant. Thus, Brown teaches the "determining" step, i.e., determining that a message is directed to a bot (Brown, see figs. 1, 2, 4 and 17; paragraphs 0029, 0035, 0051, 0064-0066 and 0111).

The Applicant argues (see page 13), with respect to the rejection of claims 1, 10 and 17 under 35 U.S.C. 103, that Brown does not suggest, "accessing a data structure for the group via the group messaging service to determine, from a plurality of voice libraries, a selected voice library associated with the bot member node". Brown does not teach accessing a data structure for the group [of user nodes]; it does not teach determining a selected voice library from a plurality of voice libraries based on that data structure; and it does not teach that the selected voice is associated with the bot member node. 
The Examiner respectfully disagrees. The term “data structure” is not defined in the claims. Brown teaches that the input that is received from the user 106 during a conversation with a virtual assistant may be sent to the input processing module 208 for processing. If the input is speech input, the input processing module 208 may perform speech recognition techniques to convert the input into a format that is understandable by a computing device, such as text. Additionally, or alternatively, the input processing module 208 may utilize Natural Language Processing (NLP) to interpret or derive a meaning and/or concept of the input. The speech recognition and/or NLP techniques may include known or new techniques (Brown, see paragraph 0051).
Note that, as disclosed above by Brown, the NLP and the speech recognition are two separate libraries because the NLP can be used in addition to, or as an alternative to, the speech recognition techniques. Thus, Brown teaches “a plurality of voice libraries”. Brown teaches that a determination is made as to whether the input is speech. In other words, Brown teaches accessing the input, or the structure of the input, that has been received to determine whether it is speech (speech data structure). If the structure of the input is speech, then Brown’s system selects the speech recognition technique, the NLP, or the speech recognition in addition to the NLP to process the input. Hence, Brown teaches "accessing a data structure for the group (structure of the input for the group) via the group messaging service (virtual assistant team/service) to determine, from a plurality of voice libraries (speech recognition and NLP), a selected voice library (speech recognition technique or the NLP) associated with the bot member node (virtual assistant node)". 

The Applicant argues (see page 13) that the "speech units" of Basye do not correspond to "a speech-to-text engine and a natural language unit configured to convert a received message into enhanced text in a format suited to processing by the bot member node" and that the speech units are not "a specific combination of a voice-to-text engine and a natural language unit configured to convert a received message into enhanced text in a format suited to processing by the bot member node" as in Applicant's claim. Further, the Applicant argues that the speech units of Basye are not "a selected voice library associated with the bot member node", determined from a plurality of voice libraries.
In response to the Applicant’s argument, the Examiner respectfully disagrees. Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. This is because the Applicant merely describes what Basye teaches. Applicant fails to describe how the specific limitations in the claims differ from Basye’s teaching. Nothing in the Applicant’s argument, or in the specific limitations for which Basye is relied upon, clarifies what the speech-to-text engine and the natural language unit are, how these two elements differ from each other, or what “a specific combination” is. 
Basye teaches that devices performing the ASR process 250 may include an acoustic front end (AFE) 256 and a speech recognition engine 258. The acoustic front end (AFE) 256 transforms the audio data from the microphone into data for processing by the speech recognition engine. The speech recognition engine 258 compares the speech recognition data with acoustic models 253, language models 254, and other data models and information for recognizing the speech conveyed in the audio data. Basye further teaches that the system may perform (146) ASR to determine utterance text and that ASR results in the form of a single textual representation of the speech, an N-best list including multiple hypotheses and respective scores, lattice, etc. may be sent to a server, such as server 120, for natural language understanding (NLU) processing, such as conversion of the text into commands for execution, either by the device 110 (Basye, see fig. 1; see paragraphs 0024-0025, 0028 and 0081).
Clearly, Basye teaches “a voice library (Basye’s System in Fig. 1) including a specific combination of a speech-to-text engine and a natural language unit (combination of speech recognition engine, acoustic front end (AFE) and speech unit database) configured to convert a received message into enhanced text (perform (146) ASR to determine utterance text) in a format (ASR results in the form of a single textual representation of the speech) suited to processing by the bot member node (server 120)” (Basye, see fig. 1; see paragraphs 0024-0025, 0028 and 0081).

The Applicant argues (see page 14) that the combination of references does not teach or suggest, "providing the enhanced text to the bot member node". According to the Applicant, the text messages, as received from the users, are sent to the translation bots; there is no "enhanced text", converted from a received message into a format suited to processing by the bot member node. 
In response to the Applicant’s argument, the Examiner respectfully disagrees. Montet teaches that ChatBot No. 1 and ChatBot No. 2 each automatically accept their respective requests-to-join the chat room, enter the chat room and begin message translations on the chat channel. Communications on the chat channel shown in FIG. 2 are managed by the server 34. The server 34 connects to the IM clients 32 (i.e., Amy and Bob), the IM ChatBots 36, and the UnivBot 18. In this way, Amy and Bob receive and send messages, ChatBot No. 1 and ChatBot No. 2 receive messages and send translation messages, and the UnivBot 18 receives feedback and sends commands (Montet, see figs 3 and 25-28; see paragraph 0034). In this case, the translation messages sent to/from the ChatBots are enhanced text. It is clear here that Montet teaches “providing the enhanced text to the bot member node” (Montet, see figs 3 and 25-28; see paragraph 0034).
With respect to the Applicant’s argument that there is no "enhanced text", converted from a received message into a format suited to processing by the bot member node, Montet’s messages are converted or translated into enhanced text (translation messages) and sent to the receiver node (bot member node) for processing or displaying to the user. 
Additionally, as shown above, it is Basye, not Montet, that is relied upon to teach “…convert a received message into enhanced text in a format suited to processing by the bot member node”. 

The Applicant argues (see pages 14-15) that a PHOSITA would not be motivated to select these references, and selectively cull and modify elements to combine in such a way as to arrive at Applicant's claims. The rejection appears to be motivated by impermissible hindsight reasoning, using Applicant's own claims and specification as a guide to piece together the teachings of the prior art. And even if the references were combined as asserted, there is no explanation of how such a combination would work.
In response to applicant's argument that the examiner's conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning.  But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper.  See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).
In this case, as the Examiner indicated above, all the features of the claims are taught by the combination of the references. The motivations to combine the references are clearly taken from the references themselves (see pages 15 and 16 of the Non-Final Office Action dated 03/08/2022) and not from the Applicant’s disclosure. Thus, the examiner's conclusion of obviousness is not based upon improper hindsight reasoning.

The Applicant argues (see page 16) that the OA makes no explanation of how these different systems would be combined and function to arrive at Applicant's claims. 
In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art.  See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007).  
In this case, Brown is directed to implementing a team of virtual assistants. Each virtual assistant of the virtual assistant team is configured for multi-modal input/output (e.g., receive and/or respond in audio or speech, text, touch, gesture, etc.) and multi-language communication (e.g., receive and/or respond according to any type of human language). Brown further teaches that if the input is speech input, the input processing module 208 may perform speech recognition techniques to convert the input into a format that is understandable by a computing device, such as text (Brown, see abstract, paragraphs 0035 and 0051).
Basye teaches a system that receives audio data and performs ASR to determine utterance text (Basye, see fig. 1). Montet teaches a system in which a UnivBot 18 sends room information and requests-to-join to the appropriate ChatBots 36 (i.e., ChatBot No. 1--French-to-English--for Amy's speech and ChatBot No. 2--English-to-French--for Bob's speech) (Montet, see figs. 1-3; see paragraph 0033).
Therefore, all three references, Brown, Basye and Montet as well as the Applicant’s claimed invention concern the same problem of processing or translating user input messages and providing the processed or translated messages to another node. The motivations to combine the references are set forth on pages 15 and 16 of the Non-Final Office Action dated 03/08/2022 and also, again, below in the “Claim Rejections - 35 USC § 103” section of this Office Action. As such, the combination of the references is proper.

Allowable Subject Matter
Claims 4, 13 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “natural language unit” in claims 1, 6, 8, 10, 14, 17 and 20.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph limitation: “processor 1204 and applications 1212 may be configured in any allowable hardware/software configuration, including pure hardware configurations implemented in ASIC or FPGA forms” (see paragraph 0095 of the specification as filed).
	If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-3, 5-7, 9-12, 14-18 and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-5, 9-10, 12 and 15-18 of U.S. Patent No. 10897433. Although the claims at issue are not identical, they are not patentably distinct from each other.

Regarding claims 1, 10 and 17, claims 1, 10 and 17 of Application No. 17152393 correspond to claims 1, 10 and 17 of U.S. Patent No. 10897433. See the table below regarding claim 1. Claims 10 and 17 of Application No. 17152393 recite similar features to claim 1 of Application No. 17152393. Likewise, claims 10 and 17 of U.S. Patent No. 10897433 recite similar features to claim 1 of U.S. Patent No. 10897433. Therefore, the table below also applies to claims 10 and 17 of Application No. 17152393 and claims 10 and 17 of U.S. Patent No. 10897433.

Application No. 17152393
Claims 1, 10 and 17. A method comprising: receiving, at a group messaging service configured to manage audio messaging between a plurality of user nodes in a group, a recorded audio message from a first user node in the group; determining, at the group messaging service, that the recorded audio message includes a bot command directed to a bot member node of the group; 

executing the bot command, including: accessing a data structure for the group via the group messaging service to determine, from a plurality of voice libraries, a selected voice library associated with the bot member node to process the recorded audio, a voice library including a specific combination of a speech-to-text engine and a natural language unit configured to convert a received message into enhanced text in a format suited to processing by the bot member node; 

processing the recorded audio message via the selected voice library; and providing the enhanced text to the bot member node.

U.S. Patent No. 10897433
Claims 1, 10 and 17. A method comprising: receiving, by a group messaging service configured to manage audio messaging between a plurality of user nodes in a group comprising at least a user node, a second user node, and a bot software application member node, a message from the user node comprising recorded audio and including a request, a user node identifier that identifies the user node, and a group identifier that identifies the group;

selecting a selected voice library from a plurality of voice libraries to process the recorded audio, a voice library including both a speech-to-text engine and a natural language unit configured to convert a received message into enhanced text in a format suited to processing by the bot; processing, by the selected voice library, the recorded audio to produce the enhanced text comprising the request;

sending, by the group messaging service, the enhanced text to the bot; receiving, at the group messaging service, a reply from the bot, the reply comprising information indicating completion of the request; and sending, to the user node and the second user node, a group reply indicating completion of the request.


Regarding claims 2-3, 5-7, 9, 11-12, 14-16, 18 and 20, claims 2-3, 5-7, 9, 11-12, 14-16, 18 and 20 of Application No. 17152393 correspond respectively to claims 1, 4, 2-3, 5, 9, 10, 15, 12, 5, 16, 17 and 18 of U.S. Patent No. 10897433.

Claims 1, 5-7, 10, 14-15, 17 and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 7, 10, 13 and 15-17 of U.S. Patent No. 11127636. Although the claims at issue are not identical, they are not patentably distinct from each other.

Regarding claims 1, 10 and 17, claims 1, 10 and 17 of Application No. 17152393 correspond to claims 1, 10 and 16 of U.S. Patent No. 11127636. See the table below regarding claim 1. Claims 10 and 17 of Application No. 17152393 recite similar features to claim 1 of Application No. 17152393. Likewise, claims 10 and 16 of U.S. Patent No. 11127636 recite similar features to claim 1 of U.S. Patent No. 11127636. Therefore, the table below also applies to claims 10 and 17 of Application No. 17152393 and claims 10 and 16 of U.S. Patent No. 11127636.

Application No. 17152393
Claims 1, 10 and 17. A method comprising: receiving, at a group messaging service configured to manage audio messaging between a plurality of user nodes in a group, a recorded audio message from a first user node in the group; determining, at the group messaging service, that the recorded audio message includes a bot command directed to a bot member node of the group; 

executing the bot command, including: accessing a data structure for the group via the group messaging service to determine, from a plurality of voice libraries, a selected voice library associated with the bot member node to process the recorded audio, a voice library including a specific combination of a speech-to-text engine and a natural language unit configured to convert a received message into enhanced text in a format suited to processing by the bot member node; 

processing the recorded audio message via the selected voice library; and providing the enhanced text to the bot member node.

U.S. Patent No. 11127636
Claims 1, 10 and 16. A method comprising: receiving, by a group messaging service configured to manage messaging between a plurality of user nodes in a group, a message comprising recorded audio and a bot identifier for a bot member of the group, a bot comprising a software application for performing a task over the internet; in response to receiving the message, searching a data structure of the group, maintained by the group messaging service, based on the bot identifier to determine that the bot member is a member of the group; 

in response to determining the bot member is a member of the group, accessing a bot entry in the data structure corresponding to the bot identifier, the bot entry including an indicator of a voice library corresponding to the bot member, voice libraries including a set of processing elements configured to convert an audio message into a target format; selecting which of a plurality of available voice libraries to use to process the recorded audio based on the indicator in the bot entry;

processing, by a selected voice library, the recorded audio to produce a modified message in the target format suited to the bot member; and sending, by the group messaging service, the modified message to the bot member.



Regarding claims 5-7, 14-15 and 20, claims 5-7, 14-15 and 20 of Application No. 17152393 correspond respectively to claims 1, 7, 2, 15, 13 and 17 of U.S. Patent No. 11127636.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 6, 8, 10-12, 14, 17-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. (U.S. PGPub 2015/0185996) in view of Basye et al. (U.S. PGPub 2016/0379638) further in view of Montet et al. (U.S. PGPub 2003/0220972).

Regarding claims 1, 10 and 17, Brown teaches a method comprising: receiving, at a group messaging service configured to manage audio messaging between a plurality of user nodes in a group, a recorded audio message from a first user node in the group; (Brown, see figs. 1, 2, 4 and 17; see paragraph 0035 where Each virtual assistant of the virtual assistant team 102 is configured for multi-modal input/output (e.g., receive and/or respond in audio or speech...; see paragraphs 0064-0066 where Message information describing a message that has been sent via a messaging service (e.g., a text message, an email, an instant messaging message, a telephone call (audio), etc.). The messaging information may identify the content of the message, who the message was sent to, from whom the message was sent, etc....; see paragraph 0111 where if a user desires to schedule a meeting with another user (users in a group), the user may communicate this desire to a virtual assistant and the virtual assistant may communicate with the other user's virtual assistant to schedule the meeting...; see also paragraph 0029; see paragraph 0051 where process input received from a user. For instance, input that is received from the user 106 during a conversation with a virtual assistant may be sent to the input processing module 208 for processing...)
determining, at the group messaging service, that the recorded audio message includes a bot command directed to a bot member node of the group; (Brown, see figs. 8-11 virtual assistant groups; see paragraph 0035 where Each virtual assistant of the virtual assistant team 102 is configured for multi-modal input/output (e.g., receive and/or respond in audio or speech...; see paragraph 0111 where if a user desires to schedule a meeting with another user (users in a group), the user may communicate this desire to a virtual assistant and the virtual assistant may communicate with the other user's virtual assistant to schedule the meeting...; see paragraph 0140 where the user requests "Do I need to pay any bills?,"...; see paragraph 0051)
executing the bot command, including: accessing a data structure for the group via the group messaging service to determine, from a plurality of voice libraries, a selected voice library associated with the bot member node to process the recorded audio, (Brown, see paragraph 0051 where If the input is speech input, the input processing module 208 may perform speech recognition techniques to convert the input into a format that is understandable by a computing device, such as text. Additionally, or alternatively, the input processing module 208 may utilize Natural Language Processing (NLP) (selected voice library) to interpret or derive a meaning and/or concept of the input. The speech recognition and/or NLP techniques may include known or new techniques...)
However, Brown does not explicitly teach a voice library including a specific combination of a speech-to-text engine and a natural language unit configured to convert a received message into enhanced text in a format suited to processing by the bot member node;
processing the recorded audio message via the selected voice library; and
Basye teaches a voice library including a specific combination of a speech-to-text engine and a natural language unit configured to convert a received message into enhanced text in a format suited to processing by the bot member node; (Basye, see fig. 1; see paragraphs 0024-0025 where arrive at the server encoded, in which case they may be decoded prior to processing by the processor executing the speech recognition engine 258…; see paragraph 0081 where Each voice corpus may include a speech unit database. The speech unit database may be stored in TTS storage 320, in storage 312, or in another storage component. For example, different unit selection databases may be stored in TTS voice unit storage 372. Each speech unit database includes recorded speech utterances with the utterances' corresponding text aligned to the utterances. A speech unit database may include many hours of recorded speech (in the form of audio waveforms, feature vectors, or other formats), which may occupy a significant amount of storage. The unit samples in the speech unit database may be classified in a variety of ways including by phonetic unit (phoneme, diphone, word, etc.)...)
processing the recorded audio message via the selected voice library; and (Basye, see fig. 1; see paragraph 0025 where arrive at the server encoded, in which case they may be decoded prior to processing by the processor executing the speech recognition engine 258…; see paragraph 0081 where Each voice corpus may include a speech unit database. The speech unit database may be stored in TTS storage 320, in storage 312, or in another storage component. For example, different unit selection databases may be stored in TTS voice unit storage 372. Each speech unit database includes recorded speech utterances with the utterances' corresponding text aligned to the utterances. A speech unit database may include many hours of recorded speech (in the form of audio waveforms, feature vectors, or other formats), which may occupy a significant amount of storage. The unit samples in the speech unit database may be classified in a variety of ways including by phonetic unit (phoneme, diphone, word, etc.)...; see fig. 6A; see paragraph 0090 where processing on the input audio to determine utterance text. The system may also perform NLU or further processing on the utterance text. The system may send (612) the utterance text, semantic notes, and/or other data to a command processor for execution. The system may then execute (614) a command associated with the utterance using the utterance text and the indicator of speech quality, thus customizing the output based on the input speech)
It would have been obvious to one of ordinary skill in the art, at the time the invention was filed, to combine Brown and Basye to provide the technique of a voice library including a specific combination of a speech-to-text engine and a natural language unit configured to convert a received message into enhanced text in a format suited to processing by the bot member node and processing the recorded audio message via the selected voice library of Basye in the system of Brown in order to improve a user experience for human-device interactions to mimic human-human interactions where possible (Basye, see paragraph 0014).
However, Brown-Basye does not explicitly teach providing the enhanced text to the bot member node.
Montet teaches providing the enhanced text to the bot member node. (Montet, see figs 3 and 25-28; see paragraph 0034 where The server 34 connects to the IM clients 32 (i.e., Amy and Bob), the IM ChatBots 36, and the UnivBot 18. In this way the Amy and Bob receive and send messages, ChatBot No. 1 and ChatBot No. 2 receive messages and send translation messages, and the UnivBot 18 receives feedback and sends commands.)
It would have been obvious to one of ordinary skill in the art, at the time the invention was filed, to combine Brown-Basye and Montet to provide the technique of providing the enhanced text to the bot member node of Montet in the system of Brown-Basye in order to provide a flexible way that is easy to use in a chat room environment (Montet, see paragraph 0007).

Regarding claims 2 and 11, Brown-Basye-Montet teaches further comprising: receiving, at the group messaging service, a reply from the bot member node comprising (Brown, see fig. 11; see paragraph 0140 where the user requests "Do I need to pay any bills?," as illustrated by a conversation item 1104. Based on this information, the executive assistant virtual assistant may determine that a finance virtual assistant is needed, since the executive assistant virtual assistant may not have access to any bill information...)
information indicating completion of the bot command; and (Brown, see figs. 13 and 17; see paragraph 0188 where conversation items may be caused to be output via the conversation user interface to represent the communication between the virtual assistants...conversation item may indicate that the task has been completed (e.g., an icon for a calendar event that has been scheduled, an icon for a flight that has been booked, and so on)…)
transmitting a completion message to the group indicating completion of the bot command. (Montet, see fig. 39 Amy (user node), Bob (second user node) and ENG FRE bot (a bot) says Bonjour!; see fig. 40; see paragraph 0178 where ENG-FRE bot and FRA-ENG bot display in the conversation the translation results. FIG. 39 shows Amy and Bob saying "Hello!" ...; this indicates that the result "Bonjour!" is a group reply sent to Amy and Bob indicating that the translation has been completed)
The motivation regarding the obviousness of claims 1, 10 and 17 also applies to claims 2 and 11.

Regarding claims 3 and 12, Brown-Basye-Montet teaches further comprising: determining a type of bot the bot member node is between: a group bot responsive to any member of the group; a user bot responsive to only the first user node; (Brown, see figs. 4, 11 and 16; see paragraphs 0018-0019 where the trainer may submit a trained version of a virtual assistant to be offered for acquisition to users. Here, the user may submit one of the virtual assistants that is selected below the heading "Your Virtual Assistants in Training." In some instances, after submitting a virtual assistant, the trainer may be directed to a page to obtain compensation (e.g., input bank account routing information, obtain gift cards, etc.)...; see paragraph 0174 where a virtual assistant may be selected that is configured to perform a task that is related to the concept or input of the conversation (e.g., identify a finance virtual assistant when the user mentions "ATM" or other term that is related to banking). In another example, location information may be analyzed to determine a location of a smart device that is used by a user (e.g., current, future/destination, or previous geo-location). Here, a virtual assistant may be selected that is configured to perform a task that is related to the location (e.g., selecting a travel virtual assistant when the user arrives at an airport). In yet another example, content output history describing content that has been output to a user may be analyzed to identify content that has been output...; see paragraph 0179)
transmitting the completion message to a plurality of user nodes in the group when the bot member node is a group bot; and (Brown, see figs. 13 and 17; see paragraph 0188 where conversation items may be caused to be output via the conversation user interface to represent the communication between the virtual assistants...conversation item may indicate that the task has been completed (e.g., an icon for a calendar event that has been scheduled, an icon for a flight that has been booked, and so on)…)
transmitting the completion message to only the first user node when the bot member node is a user bot. (Brown, see figs. 13 and 17; see paragraph 0188 where conversation items may be caused to be output via the conversation user interface to represent the communication between the virtual assistants...conversation item may indicate that the task has been completed (e.g., an icon for a calendar event that has been scheduled, an icon for a flight that has been booked, and so on)…)

Regarding claims 6, 14 and 20, Brown-Basye-Montet teaches further comprising: receiving the recorded audio message as encoded audio at the group messaging service; (Basye, see fig. 1; see paragraph 0025 where arrive at the server encoded, in which case they may be decoded prior to processing by the processor executing the speech recognition engine 258…; see paragraph 0081 where Each voice corpus may include a speech unit database. The speech unit database may be stored in TTS storage 320, in storage 312, or in another storage component. For example, different unit selection databases may be stored in TTS voice unit storage 372. Each speech unit database includes recorded speech utterances with the utterances' corresponding text aligned to the utterances. A speech unit database may include many hours of recorded speech (in the form of audio waveforms, feature vectors, or other formats), which may occupy a significant amount of storage. The unit samples in the speech unit database may be classified in a variety of ways including by phonetic unit (phoneme, diphone, word, etc.)...; see figs. 6A-6B; see paragraph 0090 where processing on the input audio to determine utterance text. The system may also perform NLU or further processing on the utterance text. The system may send (612) the utterance text, semantic notes, and/or other data to a command processor for execution. The system may then execute (614) a command associated with the utterance using the utterance text and the indicator of speech quality, thus customizing the output based on the input speech)
decoding, via the group messaging service, the recorded audio message to obtain decoded audio; (Basye, see fig. 1; see paragraph 0025 where arrive at the server encoded, in which case they may be decoded prior to processing by the processor executing the speech recognition engine 258…; see paragraph 0081 where Each voice corpus may include a speech unit database. The speech unit database may be stored in TTS storage 320, in storage 312, or in another storage component. For example, different unit selection databases may be stored in TTS voice unit storage 372. Each speech unit database includes recorded speech utterances with the utterances' corresponding text aligned to the utterances. A speech unit database may include many hours of recorded speech (in the form of audio waveforms, feature vectors, or other formats), which may occupy a significant amount of storage. The unit samples in the speech unit database may be classified in a variety of ways including by phonetic unit (phoneme, diphone, word, etc.)...; see figs. 6A-6B; see paragraph 0090 where processing on the input audio to determine utterance text. The system may also perform NLU or further processing on the utterance text. The system may send (612) the utterance text, semantic notes, and/or other data to a command processor for execution. The system may then execute (614) a command associated with the utterance using the utterance text and the indicator of speech quality, thus customizing the output based on the input speech)
converting, by the speech-to-text engine of the selected voice library, the decoded audio to decoded text; and (Basye, see fig. 1; see paragraph 0025 where arrive at the server encoded, in which case they may be decoded prior to processing by the processor executing the speech recognition engine 258…; see paragraph 0081 where Each voice corpus may include a speech unit database. The speech unit database may be stored in TTS storage 320, in storage 312, or in another storage component. For example, different unit selection databases may be stored in TTS voice unit storage 372. Each speech unit database includes recorded speech utterances with the utterances' corresponding text aligned to the utterances. A speech unit database may include many hours of recorded speech (in the form of audio waveforms, feature vectors, or other formats), which may occupy a significant amount of storage. The unit samples in the speech unit database may be classified in a variety of ways including by phonetic unit (phoneme, diphone, word, etc.)...; see figs. 6A-6B; see paragraph 0090 where processing on the input audio to determine utterance text. The system may also perform NLU or further processing on the utterance text. The system may send (612) the utterance text, semantic notes, and/or other data to a command processor for execution. The system may then execute (614) a command associated with the utterance using the utterance text and the indicator of speech quality, thus customizing the output based on the input speech)
enhancing, by the natural language unit of the selected voice library, the decoded text to create the enhanced text. (Basye, see fig. 1; see paragraph 0025 where arrive at the server encoded, in which case they may be decoded prior to processing by the processor executing the speech recognition engine 258…; see paragraph 0081 where Each voice corpus may include a speech unit database. The speech unit database may be stored in TTS storage 320, in storage 312, or in another storage component. For example, different unit selection databases may be stored in TTS voice unit storage 372. Each speech unit database includes recorded speech utterances with the utterances' corresponding text aligned to the utterances. A speech unit database may include many hours of recorded speech (in the form of audio waveforms, feature vectors, or other formats), which may occupy a significant amount of storage. The unit samples in the speech unit database may be classified in a variety of ways including by phonetic unit (phoneme, diphone, word, etc.)...; see figs. 6A-6B; see paragraph 0090 where processing on the input audio to determine utterance text. The system may also perform NLU or further processing on the utterance text. The system may send (612) the utterance text, semantic notes, and/or other data to a command processor for execution. The system may then execute (614) a command associated with the utterance using the utterance text and the indicator of speech quality, thus customizing the output based on the input speech)
The motivation regarding the obviousness of claims 1, 10 and 17 also applies to claims 6, 14 and 20.
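For illustration only, the sequence of steps recited in claims 6, 14 and 20 (receive encoded audio, decode, convert to text, enhance) can be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from Basye or from the application: base64 stands in for the audio encoding, and `toy_stt` and `toy_nlu` are hypothetical placeholders for a speech-to-text engine and a natural language unit.

```python
import base64
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceLibrary:
    # A voice library modeled as a pairing of the two components (assumption).
    speech_to_text: Callable[[bytes], str]
    natural_language_unit: Callable[[str], dict]


def process_encoded_audio(encoded: bytes, library: VoiceLibrary) -> dict:
    # Decode the received message to obtain decoded audio.
    decoded_audio = base64.b64decode(encoded)
    # Convert the decoded audio to decoded text.
    decoded_text = library.speech_to_text(decoded_audio)
    # Enhance the decoded text to create the enhanced text.
    return library.natural_language_unit(decoded_text)


def toy_stt(audio: bytes) -> str:
    # Placeholder: treats the audio bytes as UTF-8 text, not real recognition.
    return audio.decode("utf-8")


def toy_nlu(text: str) -> dict:
    # Placeholder: tags a single hypothetical intent by keyword match.
    words = text.lower().split()
    intent = "schedule" if "schedule" in words else "unknown"
    return {"text": text, "intent": intent}
```

The three stages are kept separate in the sketch so that either component could be swapped for a remote service without changing the order of the steps.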

Regarding claim 8, Brown-Basye-Montet teaches wherein: one of the speech-to-text engine and the natural language unit are remote from the group messaging service; and (Basye, see figs. 2 and 3; see paragraph 0022 where An ASR process 250 converts the audio data 111 into text. The ASR transcribes audio data into text data representing the words of the speech contained in the audio data. The text data may then be used by other components for various purposes, such as executing system commands, inputting data, etc. A spoken utterance in the audio data is input to a processor configured to perform ASR which then interprets the utterance based on the similarity between the utterance and pre-established language models 254 stored in an ASR model knowledge base (ASR Models Storage 252)...)
the other of the speech-to-text engine and the natural language unit are located at the group messaging service. (Basye, see figs. 2 and 3; see paragraph 0067 where TTS module/processor 314 includes a TTS front end (TTSFE) 316, a speech synthesis engine 318, and TTS storage 320. The TTSFE 316 transforms input text data (for example from command processor 290) into a symbolic linguistic representation for processing by the speech synthesis engine 318. The speech synthesis engine 318 compares the annotated phonetic units models and information stored in the TTS storage 320 for converting the input text into speech...)
The motivation regarding the obviousness of claims 1, 10 and 17 also applies to claim 8.

Regarding claim 18, Brown-Basye-Montet teaches the processor further configured to: determine a type of bot the bot member node is between: a group bot responsive to any member of the group; a user bot responsive to only the first user node; (Brown, see fig. 11; see paragraph 0140 where the user requests "Do I need to pay any bills?," as illustrated by a conversation item 1104. Based on this information, the executive assistant virtual assistant may determine that a finance virtual assistant is needed, since the executive assistant virtual assistant may not have access to any bill information...)
receive a reply from the bot member node comprising information indicating completion of the bot command; (Brown, see figs. 13 and 17; see paragraph 0188 where conversation items may be caused to be output via the conversation user interface to represent the communication between the virtual assistants...conversation item may indicate that the task has been completed (e.g., an icon for a calendar event that has been scheduled, an icon for a flight that has been booked, and so on)…)
transmit a completion message to a plurality of user nodes in the group when the bot member node is a group bot; and (Montet, see fig. 39 Amy (user node), Bob (second user node) and ENG FRE bot (a bot) says Bonjour!; see fig. 40; see paragraph 0178 where ENG-FRE bot and FRA-ENG bot display in the conversation the translation results. FIG. 39 shows Amy and Bob saying "Hello!" ...; this indicates that the result "Bonjour!" is a group reply sent to Amy and Bob indicating that the translation has been completed)
transmit the completion message to only the first user node when the bot member node is a user bot. (Montet, see fig. 39 Amy (user node), Bob (second user node) and ENG FRE bot (a bot) says Bonjour!; see fig. 40; see paragraph 0178 where ENG-FRE bot and FRA-ENG bot display in the conversation the translation results. FIG. 39 shows Amy and Bob saying "Hello!" ...; this indicates that the result "Bonjour!" is a group reply sent to Amy and Bob indicating that the translation has been completed)
The motivation regarding the obviousness of claims 1, 10 and 17 also applies to claim 18.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Brown-Basye-Montet in view of Typrin (U.S. PGPub 2015/0088514).

Regarding claim 5, Brown-Basye-Montet teaches all the features of claim 1. However, Brown-Basye-Montet does not explicitly teach wherein: the recorded audio message includes a bot identifier that identifies the bot member node; and
the method further comprises determining the selected voice library from the data structure based on the bot identifier.
Typrin teaches wherein: the recorded audio message includes a bot identifier that identifies the bot member node; and (Typrin, see fig. 1; see paragraph 0028 where the user 102(1) states the following at 128(2): "I don't know. Virtual assistant (identifier), what is the temperature supposed to be tomorrow?" In this example, in response to identifying the predefined phrase "virtual assistant", the invocation module 120 invokes the speech-recognition engine 122, which identifies the voice command from 128(2)...)
the method further comprises determining the selected voice library from the data structure based on the bot identifier. (Typrin, see fig. 1; see paragraph 0028 where the user 102(1) states the following at 128(2): "I don't know. Virtual assistant (identifier), what is the temperature supposed to be tomorrow?" In this example, in response to identifying the predefined phrase "virtual assistant", the invocation module 120 invokes the speech-recognition engine 122, which identifies the voice command from 128(2)...; see paragraph 0009)
It would have been obvious to one of ordinary skill in the art, at the time the invention was filed, to combine Brown-Basye-Montet and Typrin to provide the technique of the recorded audio message includes a bot identifier that identifies the bot member node and determining the selected voice library from the data structure based on the bot identifier of Typrin in the system of Brown-Basye-Montet in order to increase connectedness amongst large populations of users (Typrin, see paragraph 0001).

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Brown-Basye-Montet in view of Seshan (U.S. PGPub 2017/0163781).

Regarding claims 7 and 15, Brown-Basye-Montet teaches all the features of claims 1 and 10. However, Brown-Basye-Montet does not explicitly teach further comprising: sending the recorded audio message, by the group messaging service, to a plurality of user nodes in the group besides the first user node.
Seshan teaches further comprising: sending the recorded audio message, by the group messaging service, to a plurality of user nodes in the group besides the first user node. (Seshan, see paragraph 0015 where a user to send a message or recorded voice message to all member call buttons with a single click…)
It would have been obvious to one of ordinary skill in the art, at the time the invention was filed, to combine Brown-Basye-Montet and Seshan to provide the technique of sending the recorded audio message, by the group messaging service, to a plurality of user nodes in the group besides the first user node of Seshan in the system of Brown-Basye-Montet in order to easily notify contacts or user nodes of changes (Seshan, see paragraph 0004).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MENG VANG whose telephone number is (571)270-7023. The examiner can normally be reached Monday - Friday 8:30 AM - 4:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, NICHOLAS TAYLOR can be reached on (571) 272-3889. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MENG VANG/Primary Examiner, Art Unit 2443