DETAILED ACTION
1.	This communication is in response to the application received 03/10/2021, wherein claims 1-18 are pending and are examined as follows. Note that this application is a continuation of Application No. 15/817,058, now U.S. Patent No. 10,979,721 B2.

Notice of Pre-AIA  or AIA  Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
3.	The information disclosure statements (IDS) were submitted on 03/30/2015, 09/09/2015, and 04/26/2016. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112
4.	The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 4 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 4, claim 4 recites the phrase “artistic intents”; however, it is not readily clear what is meant by this phrase. The specification (e.g., ¶0036) refers to a content creator’s artistic intent (e.g., a director’s intent, a color grader’s intent, etc.); however, this does not further define the meaning of “intent” as used in the claim. Other paragraphs of the specification also do not appear to elaborate on the meaning of the phrase. Moreover, it is unclear how “artistic intents” compare with, for example, AI methods or ML methods with respect to predicting two or more ROIs. For these reasons, the phrase is ambiguous, particularly in light of the other listed methods, which are concrete methods for determining ROIs. Since the claim recites “based on one or more of…”, the Examiner relies on one of the other listed methods for purposes of examination.

Double Patenting
5.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO internet Web site contains terminal disclaimer forms which may be used.  Please visit http://www.uspto.gov/forms/.  The filing date of the application will determine what form should be used.  A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.  For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.  
6.	Claims 1-18 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4, 6-12, 15-17, and 20-23 of U.S. Patent No. 10,979,721 B2 (hereinafter “721”) in view of Kortum et al., US 2003/0194141 A1 (hereinafter “Kortum”). Please refer to the following table for details.
**Note: Items below that are BOLD/UNDERLINED in the Instant Application/U.S. Patent No. 10,979,721 B2 columns, respectively, indicate differences in the claim limitations.

Instant Application 17/198,085


U.S. Patent No. 10,979,721 B2
(Previously 15/817,058)

Instant Application Claim 1
A method for determining regions of interest (ROIs) in video images, comprising: generating user viewing behavior data while rendering a video clip including a video image in a plurality of viewports of a plurality of content viewers, wherein each content viewer in the plurality of content viewers views the video clip including the video image through a respective viewport in the plurality of viewports; determining, from the user viewing behavior data, a plurality of spatial locations, in the video image, to which a plurality of foveal visions of the plurality of content viewers is directed; identifying, based on the plurality of spatial locations in the video image, two or more ROIs in the video image; 
encoding two or more image portions in the two or more ROIs of the video image and one or more other image portions outside the two or more ROIs of the video image into a video stream; wherein the two or more image portions are non-overlapping with the one or more other image portions; wherein the two or more image portions in the two or more ROIs of the video image are encoded in the video stream at spatiotemporal resolutions higher than other spatiotemporal resolutions at which the one or more other image portions outside the two or more ROIs of the video image are encoded in the video stream; transmitting, to a streaming client device, the video stream.
721 Claim 1
1. A method for determining regions of interest (ROIs) in video images, comprising: generating user viewing behavior data while rendering one or more first video images of a first video clip in a plurality of viewports of a plurality of content viewers, wherein each content viewer in the plurality of content viewers views the one or more first video images through a respective viewport in the plurality of viewports; determining, from the user viewing behavior data, a plurality of spatial locations, in the one or more first video images, to which a plurality of foveal visions of the plurality of content viewers is directed; identifying, based on the plurality of spatial locations in the one or more first video images, one or more ROIs in the one or more first video images; 
determining, based at least in part on one or more visual objects recognized with one or more computer vision techniques, whether one or more second video images of a second different video clip and the one or more first video images of the first video clip belong to a common visual scene class, wherein no user viewing behavior data has been generated to determine second spatial locations, in the one or more second video images, to which foveal visions of content viewers are directed; in response to determining, based at least in part on the one or more visual objects recognized with the one or more computer vision techniques, that the one or more second video images and the one or more first video images belong to a common visual scene class, determining, based at least in part on the one or more ROIs in the one or more first video images, one or more second ROIs in the one or more second video images; wherein the method is performed by one or more computing devices.
Instant Application Claim 2
The method of Claim 1, further comprising: compressing pixel values, of the video image, in the two or more image portions in the two or more ROIs with first peak-signal-noise-ratios (PSNRs); compressing pixel values, of the video image, in the one or more other image portions outside the two or more ROIs with second PSNRs lower than the first PSNRs.
721 Claim 2
The method of claim 1, further comprising: compressing pixel values, of the one or more first video images, in the one or more ROIs with first peak-signal-noise-ratios (PSNRs); compressing pixel values, of the one or more first video images, not in the one or more ROIs with second PSNRs lower than the first PSNRs.
Instant Application Claim 3
The method of Claim 1, further comprising: classifying the plurality of foveal visions of the plurality of content viewers into two or more foveal vision clusters; identifying the two or more ROIs from the two or more foveal vision clusters.
721 Claim 3
The method of claim 1, further comprising: classifying the plurality of foveal visions of the plurality of content viewers into two or more foveal vision clusters; identifying two or more ROIs from the two or more foveal vision clusters.
Instant Application Claim 4
The method of Claim 3, wherein the two or more ROIs are predicted based on one or more of: artistic intents, artificial intelligence methods, machine learning methods, k-NN classification algorithms, k-means clustering algorithms, linear discriminant analyses (LDAs), nearest centroid classifiers, nearest prototype classifiers, or Rochhio classifiers.
721 Claim 4
The method of claim 3, wherein the plurality of foveal visions of the plurality of content viewers is classified into the two or more foveal vision clusters based on one or more of: k-NN classification algorithms, k-means clustering algorithms, linear discriminant analyses (LDAs), nearest centroid classifiers, nearest prototype classifiers, or Rochhio classifiers.
Instant Application Claim 5
The method of Claim 1, further comprising: determining one or more spatial locations, represented in the one or more first video images, of sound sources, in spatial audio to be concurrently rendered with the one or more first video images; determining at least one of the one or more ROIs based in part on the one or more spatial locations of the sound sources in the spatial audio.
721 Claim 6
The method of claim 1, further comprising: determining one or more spatial locations, represented in the one or more first video images, of sound sources, in spatial audio to be concurrently rendered with the one or more first video images; determining at least one of the one or more ROIs based in part on the one or more spatial locations of the sound sources in the spatial audio.
Instant Application Claim 6
The method of Claim 1, wherein the plurality of image locations is identified based on first viewing behavior data that comprise the plurality of foveal visions of the plurality of content viewers; wherein the first viewing behavior data are collected from the plurality of content viewers up to a first time point; further comprising: collecting second viewing behavior data that comprise a second plurality of foveal visions of a second plurality of content viewers, wherein the second viewing behavior data are at least partly collected from the plurality of content viewers after the first time point; determining a second plurality of spatial locations, in the video image, to which the second plurality of foveal visions of the plurality of content viewers is directed; identifying, based on the second plurality of spatial locations in the video image, one or more second ROIs in the video image.
721 Claim 7
The method of claim 1, wherein the plurality of image locations is identified based on first viewing behavior data that comprise the plurality of foveal visions of the plurality of content viewers; wherein the first viewing behavior data are collected from the plurality of content viewers up to a first time point; further comprising: collecting second viewing behavior data that comprise a second plurality of foveal visions of a second plurality of content viewers, wherein the second viewing behavior data are at least partly collected from the plurality of content viewers after the first time point; determining a second plurality of spatial locations, in the one or more first video images, to which the second plurality of foveal visions of the plurality of content viewers is directed; identifying, based on the second plurality of spatial locations in the one or more first video images, one or more second regions of interest (ROIs) in the one or more first video images.
Instant Application Claim 7
The method of Claim 1, further comprising: determining a set of user perceivable characteristics in connection with the video image; determining a set of second user perceivable characteristics in connection with a second video image; based on the set of user perceivable characteristics and the second set of user perceivable characteristics, predicting a second ROI, in the second video image, that have one or more second perceivable characteristics correlating with one or more user perceivable characteristics of an ROI in the video image.
721 Claim 8
The method of claim 1, further comprising: determining a set of user perceivable characteristics in connection with the one or more first video images; determining a set of second user perceivable characteristics in connection with one or more second video images; based on the set of user perceivable characteristics and the second set of user perceivable characteristics, predicting a second ROI, in the one or more second video images, that have one or more second perceivable characteristics correlating with one or more user perceivable characteristics of an ROI in the one or more first video images.
Instant Application Claim 8
The method of Claim 7, wherein the second ROI in the second video image is identified from the one or more user perceivable characteristics before user viewing behavior data have been collected from the second video image.
721 Claim 9
The method of claim 8, wherein the second ROI in the one or more second video images is identified from the one or more user perceivable characteristics before user viewing behavior data have been collected from the one or more second video images.
Instant Application Claim 9
The method of Claim 7, wherein the second ROI in the second video image is identified from the one or more user perceivable characteristics after at least a part of user viewing behavior data has been collected from the second video image.
721 Claim 10
The method of claim 8, wherein the second ROI in the one or more second video images is identified from the one or more user perceivable characteristics after at least a part of user viewing behavior data has been collected from the one or more second video images.
Instant Application Claim 10
The method of Claim 7, wherein the one or more user perceivable characteristics comprise at least one of: one or more visual characteristics, one or more audio characteristics, or one or more non-visual non-audio user perceptible characteristics.
721 Claim 11
The method of claim 8, wherein the one or more user perceivable characteristics comprise at least one of: one or more visual characteristics, one or more audio characteristics, or one or more non-visual non-audio user perceptible characteristics.
Instant Application Claim 11
The method of Claim 7, wherein the video image is in a first video clip, and wherein the second video image is in a second different video clip.
721 Claim 12
The method of claim 8, wherein the one or more first video images are in a first video clip, and wherein the one or more second video images are in a second different video clip.
Instant Application Claim 12
The method of Claim 7, wherein objective metrics of computer vision for the video image is different from those for the second video image.
721 Claim 15
The method of claim 13, wherein objective metrics of computer vision for the one or more first video images are different from those for the one or more second video images.
Instant Application Claim 13
The method of Claim 12, wherein the objective metrics of computer vision comprise one or more of: luminance characteristics, luminance distributions, chrominance characteristics, chrominance distributions, or spatial resolutions.
721 Claim 16
The method of claim 15, wherein the objective metrics of computer vision comprise one or more of: luminance characteristics, luminance distributions, chrominance characteristics, chrominance distributions, or spatial resolutions.
Instant Application Claim 14
The method of Claim 7, further comprising: correlating a sound source, in spatial audio for the video image, to a ROI in the two or more ROIs of the video image; determining one or more second sound sources in second spatial audio for the second video image; predicting at least one of the one or more ROIs in the second video image, based at least in part on one or more spatial locations of the one or more second sound sources in the second video image.
721 Claim 17
The method of claim 13, further comprising: correlating one or more sound sources, in spatial audio for the one or more first video images, to the ROI in the one or more first video images; determining one or more second sound sources in second spatial audio for the one or more second video images; predicting the second ROI based at least in part on one or more spatial locations, in the one or more second video images, of the one or more second sound sources.
Instant Application Claim 15
An apparatus performing the method as recited in Claim 1.
721 Claim 20
An apparatus performing the method as recited in claim 1.
Instant Application Claim 16
A system performing the method as recited in Claim 1.
721 Claim 21
A system performing the method as recited in claim 1.
Instant Application Claim 17
A non-transitory computer readable storage medium, storing software instructions, which when executed by one or more processors causes performance of the method recited in Claim 1.
721 Claim 22
A non-transitory computer readable storage medium, storing software instructions, which when executed by one or more processors causes performance of the method recited in claim 1.
Instant Application Claim 18
A computing device comprising one or more processors and one or more non-transitory storage media, storing a set of instructions, which when executed by one or more processors cause performance of the method recited in Claim 1.
721 Claim 23
 A computing device comprising one or more processors and one or more non-transitory storage media, storing a set of instructions, which when executed by one or more processors cause performance of the method recited in claim 1.


7.	Claim 1 is rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claim 1 of 721 in view of Kortum. With the exception of the encoding and transmitting limitations (emphasized in bold in the table above), claim 1 of 721 discloses all of the elements of instant claim 1; the remaining differences are minor and relate primarily to how the video images are identified. See table above.
As to claim 1, the features not disclosed by 721 include “encoding two or more image portions in the two or more ROIs of the video image and one or more other image portions outside the two or more ROIs of the video image into a video stream; wherein the two or more image portions are non-overlapping with the one or more other image portions; wherein the two or more image portions in the two or more ROIs of the video image are encoded in the video stream at spatiotemporal resolutions higher than other spatiotemporal resolutions at which the one or more other image portions outside the two or more ROIs of the video image are encoded in the video stream; transmitting, to a streaming client device, the video stream”.
Kortum, however, teaches the aforementioned limitations, as illustrated, e.g., in Fig. 5 (and associated text), where two ROIs, given by two foveation zones centered on the letters ‘A’ and ‘X’, are considered to be non-overlapping with the other image portions shown. The two foveation zones exhibit the highest degree of resolution, while the other regions have lower, variable degrees of resolution. Note also the compression methodology of Fig. 1, where foveation zones are compressed based on their weighting, which in turn is determined according to the probability of one or more viewers viewing the viewing locations within the image sequence. Lastly, Fig. 1 illustrates that the compressed image sequence can be transmitted to at least one viewer.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of claim 1 of 721 with the teachings of Kortum as above, in order to perform multi-point predictive foveation on a scene of an image sequence in a multi-viewer environment and thereby help reduce the bandwidth of the moving images (Kortum - abstract).


Claim Rejections - 35 USC § 103
8.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kortum et al. (US 2003/0194141 A1), in view of Nguyen et al. (US 2017/0236252 A1), hereinafter referred to as Kortum and Nguyen, respectively.
Regarding claim 1, Kortum discloses “A method for determining regions of interest (ROIs) in video images, comprising: generating user viewing behavior data while rendering a video clip including a video image in a plurality of viewports of a plurality of content viewers [See ¶0015-0017 with respect to using eye-tracking devices to sense where viewers are looking in different scenes of an image sequence, i.e., a plurality of viewports. See also Fig. 2], wherein each content viewer in the plurality of content viewers views the video clip including the video image through a respective viewport in the plurality of viewports [Same as above, where the plurality of viewers view the different scenes in the image sequence at different view locations. This is understood as a plurality of viewports]; determining, from the user viewing behavior data, a plurality of spatial locations, in the video image, to which a plurality of foveal visions of the plurality of content viewers is directed [See ¶0015-0017 regarding data from eye-tracking devices (i.e., user viewing behavior data), which help determine the view locations of the viewers for scenes of said image sequence, i.e., where the viewers’ views are directed. Foveation zones can be determined for each scene based on clusters of view locations]; identifying, based on the plurality of spatial locations in the video image [Based on the determined view locations of the viewers; same citations as above], two or more ROIs in the video image [From the view locations, different objects and scene types, such as people (¶0018), can be recognized; these can be considered ROIs within the view locations. Examples are illustrated in ¶0019-0020 and Fig. 2]; encoding two or more image portions in the two or more ROIs of the video image [The compressor compresses scenes of said image sequence (¶0027) based on the probability of a viewer viewing a corresponding portion of a scene for each foveation zone (¶0022). For example, there can be two foveation zones, i.e., a person and a dog (¶0019), which are compressed with different half-resolution constants. Also see Fig. 5, where two foveation zones are centered at ‘X’ and ‘A’ (2 ROIs)] and one or more other image portions outside the two or more ROIs of the video image into a video stream [In Fig. 5, the other letters are outside the two foveation zones. See Nguyen below for further support]; wherein the two or more image portions are non-overlapping with the one or more other image portions [Per Fig. 5, the letters ‘A’ and ‘X’ do not overlap with the other letters of the image. See Nguyen below for further support]; wherein the two or more image portions in the two or more ROIs of the video image are encoded in the video stream at spatiotemporal resolutions higher than other spatiotemporal resolutions at which the one or more other image portions outside the two or more ROIs of the video image are encoded in the video stream [The foveation zones centered on the letters ‘A’ and ‘X’ are at higher resolutions than the other letters outside the zones. See Fig. 5 and associated text. Note Kortum’s compression methodology in Fig. 1. See Nguyen below for further support]; transmitting, to a streaming client device, the video stream.” [Fig. 1, element 54, shows transmitting a compressed version of the image sequence produced by the method employed]
Although Kortum’s teachings are found to disclose the foregoing limitations, Nguyen, from the same or similar field of endeavor, is relied upon for additional support regarding the rendering of foveated video (Nguyen - abstract). Specifically, Nguyen teaches “encoding two or more image portions in the two or more ROIs of the video image and one or more other image portions outside the two or more ROIs of the video image into a video stream [See, e.g., ¶0107, where portions of the video frame in the regions of interest have relatively high pixel fidelity/resolution and regions outside the regions of interest have relatively low pixel fidelity/resolution. As to encoding, see, e.g., ¶0073-0074 and ¶0082]; wherein the two or more image portions are non-overlapping with the one or more other image portions [See, e.g., Figs. 4 and 7D, where different regions of an image having different qualities are shown. These regions also appear to be non-overlapping]; wherein the two or more image portions in the two or more ROIs of the video image are encoded in the video stream at spatiotemporal resolutions higher than other spatiotemporal resolutions at which the one or more other image portions outside the two or more ROIs of the video image are encoded in the video stream” [See ¶0107 regarding the different regions having different fidelities/resolutions, i.e., portions of the video frame in the regions of interest have relatively high pixel fidelity/resolution, while regions outside the regions of interest have relatively low pixel fidelity/resolution]. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Kortum for performing multi-point predictive foveation with the teachings of Nguyen, in order to render regions of interest in an image at high pixel resolution while reducing the pixel resolution of the remaining portions of the image, such that less data is transmitted by the encoder and decoding performance can be improved (Nguyen - ¶0057).
Regarding claim 3, Kortum and Nguyen teach all the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Kortum further discloses “further comprising: classifying the plurality of foveal visions of the plurality of content viewers into two or more foveal vision clusters [¶0017, where foveation zones are based on clusters of view locations of the viewers]; identifying the two or more ROIs from the two or more foveal vision clusters.” [¶0018 shows that different scene types and objects (i.e., ROIs) can be determined in the scene from the foveation zones. See Fig. 1 and associated text regarding different objects in the foveation zones]
Regarding claim 15, Kortum and Nguyen teach all the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Kortum further discloses “An apparatus performing the method as recited in Claim 1.” [See the hardware of Kortum’s system in Fig. 2 for performing multi-point predictive foveation]
Regarding claim 16, Kortum and Nguyen teach all the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Kortum further discloses “A system performing the method as recited in Claim 1.” [Note the system in Fig. 2 for performing multi-point predictive foveation]
Regarding claim 17, Kortum and Nguyen teach all the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Kortum further discloses “A non-transitory computer readable storage medium, storing software instructions, which when executed by one or more processors causes performance of the method recited in Claim 1.” [See ¶0032 for support]
Regarding claim 18, Kortum and Nguyen teach all the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Kortum further discloses “A computing device comprising one or more processors and one or more non-transitory storage media, storing a set of instructions, which when executed by one or more processors cause performance of the method recited in Claim 1.” [See ¶0032 for support]
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Kortum, in view of Nguyen, and in further view of Gopalan R. “Exploiting Region of Interest for Improved Video Coding”, Thesis, Ohio State University, 2009, hereinafter referred to as Gopalan.
Regarding claim 2, Kortum and Nguyen teach all the limitations of claim 1, and are analyzed as previously discussed with respect to that claim. Although Kortum and Nguyen disclose compression by foveation, they do not explicitly teach employing PSNR as a metric for compression, i.e., “compressing pixel values, of the video image, in the two or more image portions in the two or more ROIs with first peak-signal-noise-ratios (PSNRs); compressing pixel values, of the video image, in the one or more other image portions outside the two or more ROIs with second PSNRs lower than the first PSNRs.” Gopalan, however, from the same or similar field of endeavor, discloses the foregoing. [Gopalan (see pg. 4) discloses allocating more bits to the ROI than to the background during coding of a video sequence. Moreover, the ROI pre-processing steps are evaluated in terms of objective metrics such as PSNR and bit rate. Because allocating more bits to a region reduces its coding distortion, and PSNR increases as distortion decreases, the PSNR of the ROI will be higher than that of the background.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the techniques of Kortum and Nguyen for foveated video with the teachings of Gopalan as above, in order to provide methods for region-of-interest tracking and video pre-processing that ensure higher quality in the regions of interest than in the background (Gopalan - abstract).
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Kortum, in view of Nguyen, and in further view of Santella A. “Robust Clustering of Eye Movement Recordings for Quantification of Visual Interest”, ACM 2004, hereinafter referred to as Santella.
Regarding claim 4, Kortum and Nguyen teach all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim. Kortum and Nguyen, however, do not teach “wherein the two or more ROIs are predicted based on one or more of: artistic intents, artificial intelligence methods, machine learning methods, k-NN classification algorithms, k-means clustering algorithms, linear discriminant analyses (LDAs), nearest centroid classifiers, nearest prototype classifiers, or Rochhio classifiers.” [See Santella (abstract), where visual point-of-regard measurements are clustered into gazes and ROIs using a mean-shift procedure. Santella further describes K-means clustering methods used in other works; see, e.g., pg. 28 with respect to Latimer (1988). This is also addressed later in Santella.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the techniques disclosed by Kortum and Nguyen to add the teachings of Santella as above to provide an automatic and data-driven method for characterizing the location and extent of a viewer’s interest in terms of eye movement recordings, in order to inform a range of investigations in image and scene viewing (Santella - abstract).
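For illustration only, the k-means alternative recited in claim 4 (which Santella attributes to other works, whereas Santella itself uses a mean-shift procedure) can be sketched as clustering recorded gaze points and taking a bounding box around each cluster as a candidate ROI. The gaze coordinates, the initialization from the first k points, the helper names, and the bounding-box ROI definition are assumptions made for this sketch and are not drawn from Santella or from the application.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means on 2-D points, initialised with the first k points."""
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each gaze point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


def rois_from_clusters(points, labels, k):
    """Bounding box (x_min, y_min, x_max, y_max) of each non-empty cluster."""
    boxes = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            xs = [m[0] for m in members]
            ys = [m[1] for m in members]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


# Hypothetical gaze points (pixels) from several viewers, grouped around two objects.
gaze = [(100, 100), (500, 300), (105, 98), (498, 305), (97, 103), (503, 296)]
centroids, labels = kmeans(gaze, k=2)
rois = rois_from_clusters(gaze, labels, k=2)
print(rois)  # → [(97, 98, 105, 103), (498, 296, 503, 305)]
```

In this sketch the number of clusters k must be fixed in advance.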
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Kortum, in view of Nguyen, and in further view of Lee J-S. “Efficient Video Coding in H.264/AVC by using Audio-Visual Information”, IEEE 2009, hereinafter referred to as Lee.
Regarding claim 5, Kortum and Nguyen teach all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim. Kortum and Nguyen, however, do not teach “further comprising: determining one or more spatial locations, represented in the one or more first video images, of sound sources, in spatial audio to be concurrently rendered with the one or more first video images; determining at least one of the one or more ROIs based in part on the one or more spatial locations of the sound sources in the spatial audio.” Lee, on the other hand, from the same or similar field of endeavor, discloses the foregoing. [Lee discloses using audio sources in a video sequence to help identify where an observer’s attention is focused, which in turn helps determine the level of quality in ROI coding (see, e.g., the abstract and the conclusion in Sect. V).] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system disclosed by Kortum and Nguyen to add the teachings of Lee as above to provide an efficient video coding method that utilizes audio-visual information, based on the observation that sound-emitting regions in a video sequence draw the attention of an observer. Advantageously, the foregoing allows different regions in a scene to be encoded with different quality, where regions far from the sound source are coded with lower quality (Lee - abstract).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Kortum, in view of Nguyen, and in further view of Nyström et al. “Deriving and evaluating eye-tracking controlled volumes of interest for variable-resolution video compression”, J. Electronic Imaging 16(1), 013006 (Jan-Mar 2007), hereinafter referred to as Nyström.
Regarding claim 6, Kortum and Nguyen teach all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim. Kortum is further found to disclose “wherein the plurality of image locations is identified based on first viewing behavior data that comprise the plurality of foveal visions of the plurality of content viewers;” [See ¶0015-0017 of Kortum with respect to using eye tracking devices to sense where viewers are looking in different scenes of an image sequence, i.e., a plurality of viewports. Reference Fig. 2.] However, neither Kortum nor Nguyen teaches the remaining limitations of claim 6. Nyström, on the other hand, from the same or similar field of endeavor, is found to disclose the remaining features of claim 6. Specifically, “wherein the first viewing behavior data are collected from the plurality of content viewers up to a first time point [Nyström discloses (Sect. 2 ‘Data Collection’, pg. 2) collecting eye movement data during test sessions, where each session commenced with calibration followed by temporal synchronization. Each session can reasonably be construed as collecting data during that session, i.e., up to a given time. Also refer to pg. 6, Sect. 5.2 ‘Procedure’, regarding subjects watching three different 8-s video clips, which is construed as said test session.]; further comprising: collecting second viewing behavior data that comprise a second plurality of foveal visions of a second plurality of content viewers, wherein the second viewing behavior data are at least partly collected from the plurality of content viewers after the first time point [Same as above, where each test session of Nyström can reasonably be viewed as collecting eye movement data during a given time period. For example, a second session would follow the first session, and a second set of eye movement data would be collected from it.]; determining a second plurality of spatial locations, in the one or more video images, to which the second plurality of foveal visions of the plurality of content viewers is directed [Nyström’s methods, as outlined in Sects. 3.1-3.2, can identify the ROIs based on the determined gaze density function (GDF), which serves as a likelihood estimate of where future viewers will direct their gazes (i.e., spatial locations).]; identifying, based on the second plurality of spatial locations in the one or more video images, one or more second regions of interest (ROIs) in the one or more video images.” [Same as above, where the GDF contains information on where the ROIs are located in the scenes of the video. Also reference the intra-frame ROI function (Eq. 6 on pg. 3) formed by the combined normalized GDFs (Fig. 2).] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the techniques disclosed by Kortum and Nguyen to add the teachings of Nyström as above to provide an off-line foveation technique that uses derived VOI shapes to improve the performance of state-of-the-art video compression, with reported bit savings of 30-54% when the foveation is applied prior to encoding with H.264 (Nyström - abstract).
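For illustration only, a gaze-density map of the general kind discussed above can be sketched as a normalized sum of 2-D Gaussian kernels centred on recorded gaze points, with an ROI taken as the pixels whose density is at least a fraction of the peak. The Gaussian kernel, the kernel width `sigma`, the grid size, the helper names, and the thresholding rule are assumptions for this sketch; it is not a reproduction of Nyström's GDF or of Nyström's Eq. 6.

```python
import math


def gaze_density(gaze_points, width, height, sigma):
    """Normalized sum of isotropic 2-D Gaussians centred on gaze points.

    Returns a height x width grid indexed as grid[y][x] whose entries sum to 1.
    """
    grid = [[0.0] * width for _ in range(height)]
    for gx, gy in gaze_points:
        for y in range(height):
            for x in range(width):
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                grid[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]


def roi_mask(density, fraction):
    """Mark pixels whose density is at least `fraction` of the peak density."""
    peak = max(max(row) for row in density)
    return [[v >= fraction * peak for v in row] for row in density]


# Hypothetical gaze points (x, y) from two viewers on a tiny 5 x 5 frame.
density = gaze_density([(1, 1), (3, 3)], width=5, height=5, sigma=1.0)
mask = roi_mask(density, fraction=0.5)
for row in mask:
    print("".join("#" if inside else "." for inside in row))
```

Because the map is normalized, it can be read as a likelihood of where a further viewer will look, which is the role the rejection attributes to the GDF.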
Claims 7-11 are rejected under 35 U.S.C. 103 as being unpatentable over Kortum, in view of Nguyen, and in further view of Johannesson E. “An eye-tracking based approach to gaze prediction using low-level features”, Master’s Thesis, Spring 2005, Dept. of Cognitive Science, Lund University, Sweden, hereinafter referred to as Johannesson.
Regarding claim 7, Kortum and Nguyen teach all the limitations of claim 1 and are analyzed as previously discussed with respect to that claim. Kortum and Nguyen, however, do not teach the limitations of claim 7. Johannesson, on the other hand, from the same or similar field of endeavor, is found to disclose the features of claim 7, i.e., “further comprising: determining a set of user perceivable characteristics in connection with the video image [Johannesson reveals collecting eye-tracking data of subjects while they watch a video sequence (pg. 19, top paragraph). The data of one subject can be viewed as a first set of user perceivable characteristics. Here the video image could be from video sequence A (pg. 16, Table 1).]; determining a set of second user perceivable characteristics in connection with a second video image [Same as above, where a second or even a third subject can provide second or third sets of user perceivable characteristics, respectively, while watching the video sequence. Here the second video image could be from video sequence B (pg. 17, Table 2).]; based on the set of user perceivable characteristics and the second set of user perceivable characteristics, predicting a second ROI, in the second video image, that have one or more second perceivable characteristics correlating with one or more user perceivable characteristics of an ROI in the video image.” [Johannesson’s prediction model (Fig. 2) can predict where a subject is likely to look while watching a video sequence; i.e., a predicted gaze point is a predicted ROI in an image. The prediction is given by the prediction map output by the model, which is based on previously determined feature density maps that link gaze points with feature values in one or more images. See the top of pg. 12 (lines 1-9).]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the techniques disclosed by Kortum and Nguyen to add the teachings of Johannesson as above, which rely on a gaze prediction model for performing predicted foveation, to provide a cost-efficient way of improving digital video compression (Johannesson - pg. 6, 3rd paragraph).
Regarding claim 8, Kortum, Nguyen, and Johannesson teach all the limitations of claim 7 and are analyzed as previously discussed with respect to that claim. Kortum and Nguyen, however, do not teach the limitation of claim 8. Johannesson, on the other hand, from the same or similar field of endeavor, is found to disclose the features of claim 8, i.e., “wherein the second ROI in the second video image is identified from the one or more user perceivable characteristics before user viewing behavior data have been collected from the second video image.” [The 2-D probability map (i.e., prediction map) output by Johannesson’s model yields the likelihood of a gaze in different regions of the video frames. See the top of pg. 12 (lines 1-9). This map is generated before gaze point information is measured from subjects viewing a video sequence; that measurement is performed subsequently to validate the model results. See Sect. 5.2 (pg. 36), where the prediction maps are validated against eye-tracking data of the subjects.] The motivation for combining Kortum, Nguyen, and Johannesson has been discussed in connection with claim 7, above.
Regarding claim 9, Kortum, Nguyen, and Johannesson teach all the limitations of claim 7 and are analyzed as previously discussed with respect to that claim. Kortum and Nguyen, however, do not teach the limitation of claim 9. Johannesson, on the other hand, from the same or similar field of endeavor, is found to disclose the features of claim 9, i.e., “wherein the second ROI in the second video image is identified from the one or more user perceivable characteristics after at least a part of user viewing behavior data has been collected from the second video image.” [Johannesson discloses that, after eye-tracking data are obtained for a plurality of users watching a video sequence (i.e., user viewing behavior data), the data are used by the model (Fig. 2) to predict gaze points/ROIs in other images of a video sequence (Sect. 5.1, top of pg. 35).] The motivation for combining Kortum, Nguyen, and Johannesson has been discussed in connection with claim 7, above.
Regarding claim 10, Kortum, Nguyen, and Johannesson teach all the limitations of claim 7 and are analyzed as previously discussed with respect to that claim. Kortum and Nguyen, however, do not teach the limitation of claim 10. Johannesson, on the other hand, from the same or similar field of endeavor, is found to disclose the features of claim 10, i.e., “wherein the one or more user perceivable characteristics comprise at least one of: one or more visual characteristics, one or more audio characteristics, or one or more non-visual non-audio user perceptible characteristics.” [Johannesson reveals tracking eye movements of test subjects, where gaze coordinates are collected and processed (top paragraph of pg. 19) to determine the feature density maps (last two paragraphs on pg. 28). These gaze coordinates can be viewed as visual characteristics that facilitate gaze prediction using low-level image features.] The motivation for combining Kortum, Nguyen, and Johannesson has been discussed in connection with claim 7, above.
Regarding claim 11, Kortum, Nguyen, and Johannesson teach all the limitations of claim 7 and are analyzed as previously discussed with respect to that claim. Kortum and Nguyen, however, do not teach the limitation of claim 11. Johannesson, on the other hand, from the same or similar field of endeavor, is found to disclose the features of claim 11, i.e., “wherein the video image is in a first video clip, and wherein the second video image is in a second different video clip.” [See the two video sequences (A and B) taught by Johannesson, as found, e.g., on pgs. 16 (Table 1) and 17 (Table 2).] The motivation for combining Kortum, Nguyen, and Johannesson has been discussed in connection with claim 7, above.
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Kortum, in view of Nguyen, in further view of Johannesson, and in further view of Gopalan.
Regarding claim 12, Kortum, Nguyen, and Johannesson teach all the limitations of claim 7 and are analyzed as previously discussed with respect to that claim. Kortum, Nguyen, and Johannesson, however, do not teach the limitation of claim 12. Gopalan, on the other hand, from the same or similar field of endeavor, is found to disclose the features of claim 12, i.e., “wherein objective metrics of computer vision for the video image is different from those for the second video image.” [In view of the filed specification (¶0149), where objective metrics of computer vision can comprise luminance characteristics, luminance distributions, chrominance characteristics, etc., Gopalan (pg. 12, Sect. 2.2, 1st paragraph) describes regional features such as color, intensity, etc., which can be used for different images in ROI video coding. Also see Gopalan’s use of PSNR (pg. 4), which is in turn a function of pixel intensity.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the techniques of Kortum, Nguyen, and Johannesson to add the teachings of Gopalan as above to provide methods for region of interest tracking and video preprocessing so as to ensure higher quality in the regions of interest than in the background (Gopalan - abstract).
Regarding claim 13, Kortum, Nguyen, Johannesson, and Gopalan teach all the limitations of claim 12 and are analyzed as previously discussed with respect to that claim. Kortum, Nguyen, and Johannesson, however, do not teach the limitation of claim 13. Gopalan, on the other hand, from the same or similar field of endeavor, is found to disclose the features of claim 13, i.e., “wherein the objective metrics of computer vision comprise one or more of: luminance characteristics, luminance distributions, chrominance characteristics, chrominance distributions, or spatial resolutions.” [Same as claim 12 above.] The motivation for combining Kortum, Nguyen, Johannesson, and Gopalan has been discussed in connection with claim 12, above.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Kortum, in view of Nguyen, in further view of Johannesson, and in further view of Lee.
Regarding claim 14, Kortum, Nguyen, and Johannesson teach all the limitations of claim 7 and are analyzed as previously discussed with respect to that claim. Kortum, Nguyen, and Johannesson, however, do not teach the limitation of claim 14. Lee, on the other hand, from the same or similar field of endeavor, discloses “further comprising: correlating a sound source, in spatial audio for the video image, to a ROI in the two or more ROIs of the video image; determining one or more second sound sources in second spatial audio for the second video image; predicting at least one of the one or more ROIs in the second video image, based at least in part on one or more spatial locations of the one or more second sound sources in the second video image.” [Lee discloses using audio sources in a video sequence to help identify where an observer’s attention is focused, which in turn helps determine the level of quality in ROI coding (see, e.g., the abstract and the conclusion in Sect. V). Also see Sect. II, pgs. 1-2, for additional support, where acoustic and visual feature vectors are analyzed over N frames.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system disclosed by Kortum, Nguyen, and Johannesson to add the teachings of Lee as above to provide an efficient video coding method that utilizes audio-visual information, based on the observation that sound-emitting regions in a video sequence draw the attention of an observer. Advantageously, the foregoing allows different regions in a scene to be encoded with different quality, where regions far from the sound source are coded with lower quality (Lee - abstract).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See form PTO-892 for additional references.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHARD A HANSELL JR. whose telephone number is (571)270-0615. The examiner can normally be reached Mon-Fri, 10 am-7 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie Atala, can be reached at 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RICHARD A HANSELL  JR./Primary Examiner, Art Unit 2486