DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendments and Arguments
Regarding an objection to the specification (Abstract), applicant did NOT provide any response. The objection has been maintained. 

Regarding a double patenting rejection, applicant stated (Remarks, page 8) that “Applicant requests that the Examiner hold in abeyance the obviousness-type double patenting rejection until allowable subject matter is found. If at any time it is determined that a Terminal Disclaimer will put the application in condition for allowance, the Examiner is invited to contact Applicant's representative”.

Having reviewed the amendment, the examiner believes that the amended claims still have a double patenting issue. The obviousness-type double patenting rejection has been maintained.

Regarding the two rejections under 35 U.S.C. §102, applicant amended independent claims 2 and 13 by adding different limitations. Applicant also added two new claims, 22-23.

Applicant stated (Remarks, pages 8-9) that “As discussed during the interview, the cited references, individually and in combination, fail to teach or suggest at least these elements of Claim 2 in the context of the remaining elements of claim 2” and  “As discussed during the interview, the cited references, individually and in combination, fail to teach or suggest at least these elements of Claim 13 in the context of the remaining elements of Claim 13”.

To make the record clear, the examiner points out that the newly added limitations in independent claims 2 and 13 were not presented during the interview conducted on 08/01/2022. Applicant’s representative proposed a different amendment during the interview. The examiner pointed out that the original disclosure does not have adequate support for the proposed amendment (see the interview summary mailed on 08/05/2022). During the interview, the examiner also explained that the claimed “a first application” or “a second application” could be interpreted as different sections of a computer program (software functions). For example, one section of a computer program is responsible for displaying text and another section of the computer program is responsible for outputting images or outputting audio. In other words, different sections of a computer program (software functions) meet the claimed “a first application” and “a second application” because, in light of the specification, the claimed “a first application” / “a second application” are just different software functions.

Applicant amended claim 2 by adding a limitation: “send the second output content to a second application, wherein the second application causes presentation of the second output content in the second modality”.

Applicant amended claim 13 differently by adding a limitation: “wherein the second output content is not presented to a user in the second modality by the application”

For the first rejection under §102 over Kalns et al. (US PG Pub. 2014/0310001, cited in applicant-submitted IDS), Kalns discloses a user having a dialog with a virtual personal assistant (VPA). The VPA invokes a shopping function to present text information for a product (Fig. 9, #924). In addition, the VPA also invokes an image displaying function for presenting a product image (Fig. 9, #926) and uses a text-to-speech function to describe the product ([0057], using a speech synthesizer to output information). In the Kalns reference, the shopping function corresponds to the claimed “a first application”. The outputted text corresponds to the claimed “a first modality”. The image displaying function (or outputting audio using a text-to-speech function) meets the claimed “a second application”. The image output or speech output corresponds to the claimed “a second modality”. Kalns further meets the newly added limitation in claim 13 because the speech output is generated by a text-to-speech program ([0057]). The speech output is not presented by the shopping software function.

For the second anticipation rejection using the Gao reference (US PG Pub. 2004/0111272), Gao also meets amended claims 2 and 13. Gao discloses multimodal speech-to-speech translation (Abstract). In response to a user’s speech in one language, the speech-to-speech translation system generates outputs in different modalities, including text, images and audio (Fig. 4, [0011], output translations using a text-to-speech synthesizer; [0042-0044], output translations in image and text). As explained above, different software functions are implemented using different program sections. These different program sections with different functions meet the claimed “a first application” or “a second application”. Different output modalities, including text output, image output and audio output, correspond to the claimed “a first modality” and “a second modality”.

	Regarding the newly added claims 22-23, Kalns discloses a dialog manager that keeps track of dialog states (Kalns, [0050]). Although Kalns implicitly discloses the limitations of new claims 22-23, the examiner further cites Ehlen et al. (US PG Pub. 2004/0006480, previously cited in PTO-892 form, but not relied upon in the previous rejection) to meet limitations in the newly added claims 22-23. 

Ehlen discloses a multi-modal dialog by outputting information in different modalities under a control of a dialog manager (Abstract, Fig. 2, #204 a multimodal dialog manager; Fig. 4, outputting information in different modalities including text, speech and images). The examiner combines Ehlen with Kalns to reject the newly added claims 22-23. 

Specification
The abstract of the disclosure is objected to because the current abstract is related to context sharing between dialogs and is not related to the subject matter defined by the independent claims. The claimed invention defined by each of the independent claims in the instant application is related to generating output using multiple modalities (i.e., audio and image). Correction is required. See MPEP § 608.01(b).

Double Patenting
Claims 2-4, 6-11 and 13-23 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-24 of U.S. Pat. 9,754,591, as well as claims 1-18 of U.S. Pat. 10,706,854. Although the conflicting claims are not identical, they are not patentably distinct from each other because the instant claims are broader than the corresponding claims of the ‘591 patent. The instant claims are also broader than the corresponding claims of the ‘854 patent. In other words, the corresponding claims of the grandparent and parent patents anticipate the instant claims. Anticipation is “the ultimate or epitome of obviousness” (In re Kalm, 154 USPQ 10 (CCPA 1967), also In re Dailey, 178 USPQ 293 (CCPA 1973) and In re Pearson, 181 USPQ 641 (CCPA 1974)).

Claim Rejections - 35 USC § 102
Claims 2-4, 6-11 and 13-21 are rejected under 35 U.S.C. 102 (a)(2) as being anticipated by Kalns et al. (US PG Pub. 2014/0310001, applicant submitted IDS, referred to as Kalns).

Kalns discloses that a user has a spoken dialog with a virtual personal assistant (VPA) to do shopping or request certain services. The VPA displays a recommended product in text (claimed “a first modality”) and also shows some product pictures (claimed “a second modality”), as illustrated in Fig. 9, or outputs the information as speech ([0057]).

	Regarding claims 2 and 13, Kalns discloses a system and a method, comprising:
computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to (Fig. 9 and Fig. 11, a computer implemented virtual personal assistant, VPA): 

receive audio data representing at least one utterance (Fig. 4, #410, a user has a spoken dialog with a VPA); 
generate, using natural language understanding ("NLU") processing based at least partly on the audio data, command data that represents a subject of the at least one utterance (Fig. 4, #410, [0019], understanding user’s intent from a request, [0046], natural language understanding component); 
send the command data to an application, wherein the application causes presentation of first output content in a first modality in response to receiving the command data (Fig. 9, #900, [0059], a user has a conversation with VPA to do shopping; [0018], [0057], output results in spoken words or in text (claimed “a first modality”); also see Fig. 4, #410); 
receive, from the application, second output content in a second modality, wherein the second output content is associated with the first output content (Fig. 9, #926, also shows some images of the product, claimed “a second modality”); and 
For a newly added limitation to independent claim 2: send the second output content to a second application, wherein the second application causes presentation of the second output content in the second modality (Fig. 9, #926; [0057], the system outputs in text, image, audio and video; Note, different software functions such as displaying text or outputting audio (TTS) correspond to claimed “a first application” or “a second application”).
	
	For a newly added limitation to independent claim 13: “wherein the second output content is not presented to a user in the second modality by the application” ([0057], outputting information using TTS, not by the shopping function).

	Regarding claims 3 and 14, Kalns further discloses wherein the first modality comprises at least one of a visual mode of presentation or an audio mode of presentation ([0057], using speech synthesizer, i.e., TTS; Fig. 9, #924, #926, showing product pictures; note, a prior art reference need only teach ONE alternative).

	Regarding claims 4 and 15, Kalns further discloses wherein first output content comprises an audio presentation associated with the subject, and wherein the second output content comprises a visual presentation ([0057], using speech synthesizer, i.e., TTS; Fig. 9, #924, #926, showing some product pictures (claimed “visual presentation”)).

	Regarding claim 16, Kalns further discloses wherein the one or more processors are programmed by further executable instructions to manage a multi-turn dialog comprising the at least one utterance, at least a second utterance, and at least one system-generated response (Fig. 4, #410, Fig. 5, #510; multi-turn dialog between a user and VPA).

	Regarding claims 6 and 17, Kalns further discloses wherein the one or more processors are programmed by further executable instructions to determine to send the command data to the application based at least partly on the subject of the at least one utterance ([0059], sent to e-commerce shopping or finance service depending on user’s request).

	Regarding claims 7 and 18, Kalns further discloses using automatic speech recognition ("ASR") processing and the audio data, utterance data representing the at least one utterance, wherein the command data being generated using NLU processing based at least partly on the audio data comprises the command data being generated using NLU processing and the utterance data ([0035-0036], [0039-0041], speech recognition and natural language understanding, determine user’s intent from spoken requests).
	
Regarding claims 8 and 19, Kalns further discloses 
store context data, wherein the context data is generated by the application (Fig. 2, #210, [0036]);
generate, using second NLU processing based at least partly on second audio data, second command data ([0048], [0087], [0097], Fig. 9, #920, continue shopping after completion of the next-to-be processed incomplete intent); 
determine that a second application is to use the context data ([0048]); and 
send the second command data and the context data to the second application ([0095], [0098-0100]).

Regarding claim 9, Kalns further discloses 
generate context data during the NLU processing (Fig. 2, #210, #212); 
store the context data (Fig. 2, #210, #212); and 
generate second command data using second NLU processing based at least partly on second audio data and the context data (Fig. 9, #910, #922, #924; [0087], [0097]).

Regarding claim 10, Kalns further discloses named entity data associated with the at least one utterance, or utterance data representing a plurality of previously processed utterances ([0074], determining user’s intent based on “Thriller”, which is a movie name and “Hitchcock” is a person name; [0018-0019], using dialog history (claimed “previously processed utterances”)).

Regarding claim 11, Kalns further discloses an audio input device configured to generate the audio data based on the at least one utterance ([0057], using TTS to generate audio for product information); and
an audio output device configured to present at least one of the first output content or the second output content ([0057]).

Regarding claim 20, Kalns further discloses receiving the audio data comprises receiving the audio data from a microphone of the computing system ([0093]).

Regarding claim 21, Kalns further discloses receiving the audio data comprises receiving the audio data over a network connection to a computing device (Fig. 11, #1130, [0112]).

Claims 2 and 13 are rejected under 35 U.S.C. 102 (a)(1) as being anticipated by Gao et al. (US PG Pub. 2004/0111272, referred to as Gao).

Many references from different application areas could meet the broad limitations of independent claims 2 and 13. For example, Gao discloses speech-to-speech translation. After a user speaks an utterance in an original language, the utterance is translated into a target language. The translation is outputted in a text format or using audio output (claimed “a first modality”), and some images are also displayed (claimed “a second modality”) to show the meanings of individual words (Fig. 4, [0042-0044]).

	Regarding claims 2 and 13, Gao discloses a system and a method, comprising:
computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to ([0026], [0028], a computer implemented multimodal speech-to-speech translation system): 
receive audio data representing at least one utterance (Fig. 4, #406, [0042], a person asks a question in a source language, e.g., Chinese); 
generate, using natural language understanding ("NLU") processing based at least partly on the audio data, command data that represents a subject of the at least one utterance ([0043]); 
send the command data to an application, wherein the application causes presentation of first output content in a first modality in response to receiving the command data ([0031], Fig. 4, #402, generating translated speech (“a first modality”), e.g., English); 
receive, from the application, second output content in a second modality, wherein the second output content is associated with the first output content ([0042], Fig. 4, #404, outputting images to illustrate meaning of words; image is claimed “a second modality”); and 
cause presentation of the second output content in the second modality (Fig. 4, #404).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Kalns in view of Ehlen et al. (US PG Pub. 2004/0006480, cited in PTO-892 form in an office action mailed on 05/04/2022, referred to as Ehlen). 

Applicant added new claims 22-23, which are related to using a dialog manager application to manage outputs. Although Kalns implicitly discloses the features defined by the newly added claims (Kalns, [0050], a dialog manager keeps track of dialog states), the examiner further cites Ehlen, which was previously cited but not relied upon for the rejection.

Ehlen discloses that a dialog manager controls output in different modalities, including image, text or audio (Ehlen, Abstract, Fig. 2, #204, [0014], [0038], [0049], presenting information in audio and image under the control of a dialog manager).

	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Kalns’s teaching with Ehlen’s teaching to output information in multiple modalities using a dialog manager. One having ordinary skill in the art would have been motivated to make such a modification so that a user could understand information in a noisy environment (Ehlen, [0011]).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359.  The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached on (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIALONG HE/
Primary Examiner, Art Unit 2659