DETAILED ACTION

Notice of Pre-AIA or AIA Status
1. 	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
2.	As per Applicant’s instruction as filed on 02/24/22, claims 1 and 12 have been amended, and claim 20 has been newly added.

Response to Remarks
3.	Applicant’s remarks with respect to the currently amended claims, filed on 02/24/22, have been carefully reviewed and reconsidered, but are moot in view of the following new ground(s) of rejection, which incorporate the previously cited prior art references to further address the limitations/features of the currently amended claims.
In response to Applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
In this case, regarding the claimed features (1)-(5) addressed in Applicant’s remarks (page 2), Tighe et al in combination with Krause et al, Schurter, and Wang et al teaches all of the claimed features (1)-(5).
Please refer to the following new ground(s) of rejection for a detailed discussion.

Claim Rejections - 35 USC § 103
4. 	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

5.	The following is a quotation of (AIA) 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:



A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

6.	Claims 1-4, 8, 12-14, and 20 are rejected under (AIA) 35 U.S.C. 103 as being unpatentable over Tighe et al (2007/0110107 A1) in view of Krause et al (2005/0190794 A1), Schurter (2005/0043951 A1), and Wang et al (2011/0261151 A1).
Regarding claims 1 and 12, Tighe et al discloses a system/method of synthesizing audio/video, the system/method being applied to an application/client server (thus making it obvious to utilize a cloud server), comprising:
Note: Applicant’s arguments rely on language recited solely in the preambles of claims 1 and 12. When the preamble is read in the context of the entire claim, the recitation “cloud server” is not limiting because the body of the claim describes a complete invention and the language recited solely in the preamble does not provide any distinct definition of any of the claimed invention’s limitations. Thus, the preamble of the claim(s) is not considered a limitation and is of no significance to claim construction. See Pitney Bowes, Inc. v. Hewlett-Packard Co., 182 F.3d 1298, 1305, 51 USPQ2d 1161, 1165 (Fed. Cir. 1999). See MPEP § 2111.02;
the data stream synthesis and processing module configured for:
synthesizing a first video stream (para. [0021]),
synthesizing a (first) audio stream (para. [0021]),
the data stream multi-version encoding module configured for:
respectively encoding the first video stream (204b), a second video stream (from 102b), the first audio stream (204a) and the second audio stream (from 102b) to correspondingly obtain a first video encoding stream set (first and second video set or multiple video input streams), a second video encoding stream set, a first audio encoding stream set and a second audio encoding stream set (first and second audio set or multiple audio input streams) (104) (Figs. 1-2; paras. [0021-0024]),
wherein, a video picture of the first video stream includes a video picture of at least one video input stream, wherein, a video picture of the second video stream includes a video picture for each video input stream in addition to the video picture of the first video stream (the first and the second video streams/media each include a video and an audio, wherein the video (in the first and second video streams) inherently includes a video picture/frame typically broken down (divided/separated) into a slice, then an intra/anchor frame, and then a block) (paras. [0017-0018]); and
the data merging output module configured for:
respectively determining a first video encoding stream and/or a first audio encoding stream from the first video encoding stream set and the first audio encoding stream set respectively, and integrate the first video encoding stream and/or the first audio encoding stream into a first output stream (106a) (“N” media streams) that is provided to a user client (via a client-server network) (Fig. 1; para. [0018]), and 
determining a second video encoding stream and/or a second audio encoding stream from the second video encoding stream set and the second audio encoding stream set respectively, and integrate the second video encoding stream and/or the second audio encoding stream into a second output stream (106b) (“N” media streams), that is provided to the client (via the client-server network) (Fig. 1; para. [0018]).
Tighe et al does not seem to particularly disclose:
wherein an instruction control module is configured to:
receive a video synthesis instruction and receive an audio synthesis instruction sent by a broadcast client,
the data stream synthesis and processing module configured for:
synthesizing the first video stream based on the multiple video input streams and synthesizing the second video stream based on the multiple video streams and the first video stream, 
synthesizing the (first) audio stream and a second audio stream based on the multiple audio input streams, and
the data merging output module configured for:
providing the second output stream to the broadcast client.

However, Krause et al teaches a video multiplexer system comprising an instruction control module configured to receive video synthesis instructions (client request) sent by a broadcast client, so that a video server synthesizes, for example, a desired playback effect, creates new video streams as necessary, and achieves low-latency operation by switching and by coordinating video buffers and buffer restoration (abs.; para. [0013]).
Furthermore, Schurter teaches an audio/voice instant messaging system comprising:
an instruction control module configured to receive audio synthesis instructions (a client/user request to hear the audio/voice synthesis) sent by a messaging client, in order to convert text messages to audio/voice using audio/voice synthesis and then broadcast the synthesized audio over the telephony connection as an audio signal, so that the user hears the audio/voice synthesis of the message (abs.; para. [0033]).
Moreover, Wang et al teaches a video/audio processing method, a multipoint control unit, and a videoconference system comprising:
a data stream synthesis and processing module configured for:
synthesizing ‘N’ video streams (including a first video stream and a second video stream), and
synthesizing ‘N’ audio streams (including a first audio stream and a second audio stream) so as to implement interoperability between sites that support different numbers of media streams, thus reducing the construction cost of the entire network (abs.; paras. [0093-0094]).
Therefore, it would have been considered obvious to a person of ordinary skill in the relevant art employing the method of synthesizing audio/video as taught by Tighe et al to incorporate/combine Krause et al, Schurter, and Wang et al’s teachings as above so that,
the instruction control module is configured to:
receive the video synthesis instruction and receive the audio synthesis instruction sent by the broadcast client, 
the data stream synthesis and processing module configured for:
synthesizing the first video stream based on the multiple video input streams and synthesizing the second video stream based on the multiple video streams and the first video stream, 
synthesizing the (first) audio stream and the second audio stream based on the multiple audio input streams, and
the data merging output module configured for:
providing the second output stream to the broadcast client,
so that the video server synthesizes, for example, a desired playback effect, creates new video streams as necessary, and achieves low-latency operation by switching and by coordinating video buffers and buffer restoration; text messages are converted to audio/voice using audio/voice synthesis and the synthesized audio is broadcast over the telephony connection as an audio signal, so that the user hears the audio/voice synthesis of the message; and
interoperability is implemented between sites that support different numbers of media streams, thus reducing the construction cost of the entire network.
Regarding claims 2 and 13, Tighe et al discloses:
receiving multiple audio/video data streams (Fig. 1, 104);
decoding the audio/video data stream into a video data stream (306b) and an audio data stream (306a), and caching/storing the decoded video data stream (in the local decoder’s memory) and audio data stream (in the local decoder’s memory) separately (306a and 306b are separate decoders) (paras. [0043], [0045-0046]), and
correspondingly, reading the multiple video input streams and the multiple audio input streams from caches of the video data stream and the audio data stream respectively (from 308) (paras. [0045-0046]).
Furthermore, Krause et al teaches decoding the video data stream (DATA in) into a decoded video data stream (DATA out), and caching/storing the decoded video data stream (in the local decoder’s memory, 855) (Fig. 8), and
receiving a pull stream instruction from the broadcast client and acquiring synthesized streams as necessary (para. [0013]).
Moreover, Krause et al teaches a video multiplexer system that receives video synthesis instructions (client request) sent by a broadcast client, so that a video server synthesizes, for example, a desired playback effect, creates new video streams as necessary, and achieves low-latency operation by switching and by coordinating video buffers and buffer restoration.
Therefore, it would have been considered obvious to a person of ordinary skill in the relevant art employing all of the teachings as above to realize/recognize receiving the pull stream instruction from the broadcast client and acquiring multiple audio/video data streams, for substantially the same reasons/rationale as discussed above.
Regarding claim 3, Krause et al teaches a video multiplexer system that receives video synthesis instructions (client request) as discussed above.

Furthermore, Tighe et al discloses: 
synthesizing the second video stream based on the multiple video input streams and the first video stream, and integrating the first video encoding stream and/or the first audio encoding stream into a first output stream as discussed above.
Furthermore, Wang et al teaches synthesizing ‘N’ video streams, and synthesizing ‘N’ audio streams as discussed above.
Moreover, Wang et al teaches synthesizing ‘N’ audio streams into one audio stream, and synthesizing ‘N’ video stream/channel information into ‘L’ video stream/channel information, and determining/selecting one or more (target) video streams from multiple video streams (paras. [0093-0094], [0073], [0076-0077]).
Therefore, it would have been considered obvious to a person of ordinary skill in the relevant art employing all of the teachings as above to realize/recognize,
in response to the video synthesis instructions, determining one or more target video input streams from the multiple video input streams and integrating the video pictures of the one or more target video input streams into one video picture, wherein a video stream corresponding to the integrated video picture is used as the first video stream, for substantially the same reasons/rationale as discussed above.
Regarding claims 4 and 14, Tighe et al discloses: 
synthesizing the second video stream based on the multiple video input streams and the first video stream, and integrating the first video encoding stream and/or the first audio encoding stream into a first output stream as discussed above.
Furthermore, Wang et al teaches synthesizing ‘N’ video streams, and synthesizing ‘N’ audio streams as discussed above.
Moreover, Wang et al teaches synthesizing ‘N’ audio streams into one audio stream, and synthesizing ‘N’ video stream/channel information into ‘L’ video stream/channel information, and determining/selecting one or more target video input streams from the multiple video input streams (para. [0093]).
Therefore, it would have been considered obvious to a person of ordinary skill in the relevant art employing all of the teachings as above to realize/recognize,
integrating the video picture of the first video stream and the video pictures of the multiple video input streams into one video picture, wherein a video stream corresponding to the integrated video picture is used as the second video stream, for substantially the same reasons/rationale as discussed above.
Regarding claim 8, Tighe et al discloses sending the first output stream to the broadcast client (from 104 to 106, client-server network) (Fig. 1; para. [0018]).
Furthermore, Schurter teaches receiving the audio synthesis instructions (a client/user request to hear the audio/voice synthesis) sent by the messaging client as discussed above.
Moreover, Wang et al teaches a media switching module (74) for switching the audio streams (Fig. 7; para. [0082]).
Therefore, it would have been considered obvious to a person of ordinary skill in the relevant art employing the method of synthesizing audio/video as taught by Tighe et al to further incorporate/combine Schurter and Wang et al’s teachings as above so as to receive the audio switching instruction sent by the broadcast client and, in response to the audio switching instruction, send the first output stream to the broadcast client, for substantially the same reasons/rationale as discussed above.
Regarding claim 20, Tighe et al discloses:
respectively encoding the first video stream, the second video stream, the first audio stream and the second audio stream in multiple different encoding versions (the audio media stream may be compressed using Global System for Mobile Communication (GSM) (13 kbps), G.729 (8 kbps), G.723.3 (both 6.4 and 5.3 kbps), and a number of proprietary media compression techniques, and the video media stream may be compressed using Moving Picture Experts Group MPEG 1, MPEG 2, MPEG 4, H.261, and H.263) to correspondingly obtain a first video encoding stream set, a second video encoding stream set, a first audio encoding stream set and a second audio encoding stream set (para. [0023]).

7.	Claims 5, 15, and 19 are rejected under (AIA) 35 U.S.C. 103 as being unpatentable over Tighe et al (2007/0110107 A1), Krause et al (2005/0190794 A1), Schurter (2005/0043951 A1), and Wang et al (2011/0261151 A1) as applied to claims 1, 14, and 4 above, respectively, and further in view of SHERAIZIN (WO 02/01886 A1).
Regarding claims 5, 15, and 19, the combination of Tighe et al, Krause et al, Schurter, and Wang et al does not seem to particularly disclose the claimed features.
However, SHERAIZIN teaches an integration parameter determination unit configured to pre-create a background picture (2, BP, using trees as background) matching a resolution of an integrated/combined video picture (10, 14) and to determine integration parameters of each video picture to be integrated/combined, wherein the integration parameters include at least one of a picture size, a location, and an overlay level, and a picture addition unit configured to add each video picture (8, Separated PPI) to be integrated/combined onto the background picture (2) to form the integrated/combined video picture (10) according to the integration parameters, in order to perform picture segmentation and superposition of real time motion pictures in real time with the signals being processed pixel-by-pixel, line-by-line, and frame-by-frame without processing interruptions and video signal loss (abs.; page 10, lines 15-31; page 27, lines 21-25; page 3, lines 3-11).
Therefore, it would have been considered obvious to a person of ordinary skill in the relevant art employing the method of synthesizing audio/video as taught by Tighe et al to further incorporate/combine SHERAIZIN’s teachings as above so that the integration parameter determination unit is configured to pre-create the background picture matching the resolution of the integrated/combined video picture and to determine integration parameters of each video picture to be integrated/combined, wherein the integration parameters include at least one of a picture size, a location, and an overlay level, and the picture addition unit is configured to add each video picture to be integrated/combined onto the background picture to form the integrated/combined video picture according to the integration parameters, in order to perform picture segmentation and superposition of real time motion pictures in real time with the signals being processed pixel-by-pixel, line-by-line, and frame-by-frame without processing interruptions and video signal loss.

8.	Claims 6 and 16 are rejected under (AIA) 35 U.S.C. 103 as being unpatentable over Tighe et al (2007/0110107 A1), Krause et al (2005/0190794 A1), Schurter (2005/0043951 A1), and Wang et al (2011/0261151 A1) as applied to claims 1 and 12 above, respectively, and further in view of Visser et al (2019/0251971 A1) and DOEHLA et al (2017/0154635 A1).
Regarding claims 6 and 16, Tighe et al discloses respectively determining the first/second video encoding stream and/or the first/second audio encoding stream from the first/second video encoding stream set and the first/second audio encoding stream set respectively, and integrating the first/second video encoding stream and/or the first/second audio encoding stream into the first/second output stream (106a) (“N” media streams), which is provided to the user client (via a client-server network) as discussed above.
Tighe et al further discloses an audio synchronization module configured to provide an audio synchronization via the client-server network (paras. [0018], [0023], [0049]).


Furthermore, Krause et al teaches receiving video synthesis instructions (client request) sent by a broadcast client, so that a video server synthesizes, for example, a desired playback effect, creates new video streams as necessary, and achieves low-latency operation by switching and by coordinating video buffers and buffer restoration, as discussed above.
Moreover, Schurter teaches receiving the audio synthesis instructions (a client/user request to hear the audio/voice synthesis) sent by a messaging client, in order to convert text messages to audio/voice using audio/voice synthesis and then broadcast the synthesized audio over the telephony connection as an audio signal, so that the user hears the audio/voice synthesis of the message, as discussed above.
The combination of Tighe et al, Krause et al, Schurter, and Wang et al does not seem to particularly disclose the method further including:
an audio adjustment module configured to receive regulation instructions including audio synthesis parameters sent by the broadcast client, adjusting the second audio stream according to the audio synthesis parameters, and feedbacking the adjusted second audio stream to the broadcast client; and
the audio synchronization module configured to receive an audio synchronization instruction sent by the broadcast client, adjusting the first audio stream according to the audio synthesis parameters, and providing the adjusted first audio stream to the user client.
However, Visser et al teaches enhanced speech generation comprising an audio adjustment module comprising an audio input stream/signal, a synthesized audio input stream/signal, and one or more parameters, wherein the synthesized audio stream/signal is generated based on training data associated with a user, wherein the training data is distinct from the one or more parameters, in order to provide an enhanced speech signal by utilizing automatic speech recognition (ASR) associated with the first audio input stream/signal (abs.; page 18, left col. lines 4-14).
Additionally, DOEHLA et al teaches a concept for switching sampling rates at audio processing devices, comprising an audio adjustment module comprising audio synthesis parameters for the decoded audio frame for one or more of the memories, in order to allow a predictive coding scheme to switch its sampling rate without the need to resample the whole buffers for recomputing the state of its filters, such that, by sampling directly and only the necessitated memory states, low complexity is maintained while a seamless transition is still possible (abs.; paras. [0094], [0011-0012], [0022]).

Therefore, it would have been considered obvious to a person of ordinary skill in the relevant art employing the method of synthesizing audio/video as taught by Tighe et al to incorporate/combine Krause et al, Schurter, and Wang et al’s teachings, and further Visser et al and DOEHLA et al’s teachings, as above so that the method further includes the audio adjustment module configured to receive regulation instructions including audio synthesis parameters sent by the broadcast client, adjust the second audio stream according to the audio synthesis parameters, feed back/provide the adjusted second audio stream to the broadcast client, receive the audio synchronization instruction sent by the broadcast client, adjust the first audio stream according to the audio synthesis parameters, and provide the adjusted first audio stream to the user client, in order to provide the enhanced speech signal by utilizing automatic speech recognition (ASR) associated with the first audio input stream/signal, and to allow a predictive coding scheme to switch its sampling rate without the need to resample the whole buffers for recomputing the state of its filters, such that, by sampling directly and only the necessitated memory states, low complexity is maintained while a seamless transition is still possible.

9.	Claim 7 is rejected under (AIA) 35 U.S.C. 103 as being unpatentable over Tighe et al (2007/0110107 A1), Krause et al (2005/0190794 A1), Schurter (2005/0043951 A1), and Wang et al (2011/0261151 A1) as applied to claim 1 above, and further in view of Nicholls (2014/0344469 A1).
Regarding claim 7, Schurter teaches receiving the audio synthesis instructions (a client/user request to hear the audio/voice synthesis) sent by the messaging client as discussed above.
Furthermore, Tighe et al discloses determining the first/second video encoding stream and/or the first/second audio encoding stream from the second video encoding stream set and the second audio encoding stream set respectively, and integrating the second video encoding stream and/or the second audio encoding stream into a second output stream as discussed above.
The combination of Tighe et al, Krause et al, Schurter, and Wang et al does not seem to particularly disclose determining whether the audio synthesis instructions include an audio copy instruction, and if included, copying the first audio stream, and using the copied data as the second audio stream.
	

However, Nicholls teaches the server processing the audio copy and onward streaming of the data to a client terminal, wherein the frame rate is the same as the size of the audio copy data, which is buffered, so that, if needed, the size of the audio copy data is modified to match the audio capability of the client terminal (abs.; para. [0083]).
Therefore, it would have been considered obvious to a person of ordinary skill in the relevant art employing the method of synthesizing audio/video as taught by Tighe et al to further incorporate/combine Schurter and Nicholls’ teachings as above so as to determine whether the audio synthesis instructions include the audio copy instruction and, if included, copy the first audio stream and use the copied data as the second audio stream, so that, if needed, the size of the copied audio data is modified to match the audio capability of the client terminal.

Allowable Subject Matter
10.	Claims 9-11 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable:
	if either claim 9 or 10 is rewritten in independent form including all of the limitations of the base claim 1 and any intervening claims; and
if either claim 17 or 18 is rewritten in independent form including all of the limitations of the base claim 12 and any intervening claims.
Dependent claims 9-10 and 17-18 each recite novel feature(s); the prior art of record fails to anticipate or render obvious the novel feature(s) as specified in claims 9-10 and 17-18.
Accordingly, if the amendment is made to the claim(s) listed above, and if rejected claims are canceled, the application would be placed in a condition for allowance.

Conclusion
11.	The prior art made of record is considered pertinent to Applicant's disclosure.
A)	CHEN et al (2011/0099594 A1), streaming encoded video data. 

12.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

13.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to Shawn An, whose telephone number is 571-272-7324.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner's supervisor, Joseph Ustaris can be reached on 571-272-7383.

14.	The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

15.	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SHAWN S AN/Primary Examiner, Art Unit 2483