Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
The later-filed application must be an application for a patent for an invention which is also disclosed in the prior application (the parent or original nonprovisional application or provisional application). The disclosure of the invention in the parent application and in the later-filed application must be sufficient to comply with the requirements of 35 U.S.C. 112(a) or the first paragraph of pre-AIA 35 U.S.C. 112, except for the best mode requirement. See Transco Products, Inc. v. Performance Contracting, Inc., 38 F.3d 551, 32 USPQ2d 1077 (Fed. Cir. 1994).
The disclosures of the prior-filed provisional applications 62/645774 and 62/767454 fail to provide adequate support or enablement in the manner required by 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph for one or more claims of this application. Neither of these applications discloses deriving an encoding comparison metric from a test and reference codec or analyzing a baseline model for accuracy.
Therefore, this application is being examined with a priority date of 3/13/2019. To advance this discussion, Applicant should cite the paragraphs of the provisionals that Applicant believes would enable the present claims. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
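A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.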

Claims 1-2, 4-8, 11-12, 14-16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bjontegaard (NPL “Calculation of average PSNR differences between RD-curves,” VCEG-M33, 2001) in view of Huszar (US PG Publication 2018/0240017) and Breiman (NPL “Bagging Predictors,” Machine Learning, 24, 123–140 (1996)).

	Regarding Claim 1, Bjontegaard (NPL “Calculation of average PSNR differences between RD-curves,” VCEG-M33, 2001) discloses a computer-implemented method (method for calculating difference between two RD curves, Section 1 intro), comprising:
	computing a plurality of first quality scores (four data points, Section 1 intro, of SNR, Section 4.1) associated with a test encoding configuration (first simulation of two simulations for video coding, Section 1 Intro, e.g., plot 2, Fig. 3) based on a quality model (interpolated SNR curve from SNR data points Section 4.1, first equation, SNR = a + b*bit + c*bit^2 + d*bit^3, with constants a, b, c, and d determined such that the curve passes through all data points, Section 3, curve fitting);
	computing a plurality of second quality scores (four data points, Section 1 intro, of SNR, Section 4.1) associated with a reference encoding configuration (second simulation of two simulations for video coding, Section 1 Intro, e.g., plot 1, Fig. 3) based on the quality model (interpolated SNR curve from SNR data points Section 4.1, first equation, SNR = a + b*bit + c*bit^2 + d*bit^3, with constants a, b, c, and d determined such that the curve passes through all data points, Section 3, curve fitting);
and generating a value for an encoding comparison metric (Average bitrate difference in %, Section 4.1) based on the first plurality of quality scores (SNR of first simulation, Fig. 3) and the second plurality of quality scores (SNR of second simulation, Fig. 3).
	Bjontegaard does not explicitly disclose, but Huszar (US PG Publication 2018/0240017) teaches wherein the quality model is replaced by a plurality of bootstrap quality models (plurality of committee members, e.g., deep neural network 150_1…150_n [0035]), wherein each bootstrap quality model is trained (each committee member initialized [0035]) based on a different subset (using a different training set of the training sets [0035], different training sets of data from labeled distortion pairs 105 [0034]) of first training data (labeled distortion pairs 105, [0029, 0031-0034, 0035-0038]) included in a training database (stored in memory [0032]), wherein the first training data (labeled distortion pairs 105, [0029, 0031-0034, 0035-0038]) comprises at least a portion of all training data (labeled distortion pairs 105, [0029, 0031-0034, 0035-0038]) included in a training database (stored in memory [0032]);
wherein generating a value is generating a distribution (variance in scores [0036], diversity of scores [0035-0038]) of bootstrap values (each committee member provides a score [0036]);
and determining an accuracy (uncertainty [0036]) of a baseline value (score returned by committee member deep neural network 150_1 [0035]) based on the distribution of bootstrap values (variance in scores [0036], diversity of scores [0035-0038]), wherein the baseline value (score returned by committee member deep neural network 150_1 [0035]) is generated by a baseline quality model (deep neural network 150_1 [0035]).
In addition, Breiman teaches wherein a baseline model (classification tree constructed from L, Page 125 bullets ii…iv; predictor φ(x, L), Introduction) is trained based on the first training data (the learning set L, from which bootstrapped learning sets LB of the same size as L are derived by sampling from L with replacement, Introduction).
	It would have been obvious to one of ordinary skill in the art before the application was filed to replace the PSNR metric of Bjontegaard with the bootstrapped perceptual loss score of Huszar because Huszar teaches that the perceptual loss score fully captures the human quality perception for many different types of distortion, and bootstrapping the machine-learned model generates an accurate predictor via a small set of labeled training data [0010] and can outperform other techniques of measuring video quality loss in breadth and scalability [0015].
It would have been obvious to one of ordinary skill in the art before the application was filed to train one of the neural networks, e.g., 150_1, of Huszar based on the entire set of labeled distortion pairs because Breiman suggests that training a model on the entire training set L and training bootstrapped models on resampled sets of L will provide information on both the stability of the model based on L, and the improvement of the bootstrapped models over the model based on L (Introduction and Tables 2, 3, 5).
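
For illustration only, the combination discussed above for Claim 1 can be sketched as follows. The sketch is not taken from Huszar or Breiman: a least-squares line stands in for a quality model, and the names fit_line and bootstrap_scores, the number of models, and the seed are hypothetical. It trains a baseline model on the full training set L, trains bootstrap models on sets resampled from L with replacement, and returns the baseline value together with the bootstrap values.

```python
import random
import statistics


def fit_line(samples):
    """Least-squares line through (x, y) samples; a stand-in for a quality model."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate resample (a single distinct x): fall back to a constant model.
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def bootstrap_scores(training, x, n_models=50, seed=0):
    """Return (baseline value, list of bootstrap values) for input x.

    The baseline model is fit on the whole training set; each bootstrap
    model is fit on a same-size resample drawn with replacement.
    """
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    baseline_value = fit_line(training)(x)
    bootstrap_values = []
    for _ in range(n_models):
        resample = [rng.choice(training) for _ in training]
        bootstrap_values.append(fit_line(resample)(x))
    return baseline_value, bootstrap_values
```

Under these assumptions, statistics.stdev(bootstrap_values) gives one measure of the spread of the bootstrap values around the baseline value; a larger spread would indicate a less certain baseline value.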

Regarding Claim 2, Bjontegaard discloses the computer-implemented method of claim 1, wherein the encoding comparison metric comprises a Bjontegaard rate difference (BD-rate) (Average bitrate difference in % over the whole range of PSNR, Section 4.1).
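
For illustration only, an average bitrate difference of the kind cited for Claim 2 can be sketched as follows. The sketch follows common BD-rate practice rather than the text of this action: it assumes four (bitrate, PSNR) points per configuration, fits log10(bitrate) as a cubic in PSNR, and averages the gap between the two curves over the overlapping PSNR range. The function names and the choice of log10 are assumptions.

```python
import math


def fit_cubic(xs, ys):
    """Coefficients [a, b, c, d] of a + b*x + c*x^2 + d*x^3 through four points."""
    n = 4
    # Augmented Vandermonde matrix, solved by Gauss-Jordan with partial pivoting.
    m = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def integrate_poly(coeffs, lo, hi):
    """Definite integral of the polynomial with the given coefficients."""
    def antiderivative(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(ref_points, test_points):
    """Average bitrate difference in percent of test relative to reference.

    Each argument is a list of four (bitrate, psnr) pairs.
    """
    def log_rate_curve(points):
        psnr = [q for _, q in points]
        log_rate = [math.log10(r) for r, _ in points]
        return fit_cubic(psnr, log_rate), min(psnr), max(psnr)

    ref_c, ref_lo, ref_hi = log_rate_curve(ref_points)
    test_c, test_lo, test_hi = log_rate_curve(test_points)
    lo, hi = max(ref_lo, test_lo), min(ref_hi, test_hi)  # overlapping PSNR range
    avg_diff = (integrate_poly(test_c, lo, hi) - integrate_poly(ref_c, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100
```

With these assumptions, a test configuration that reaches every PSNR value at half the reference bitrate yields a value of about -50, i.e., a 50% bitrate saving at equal quality.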

Regarding Claim 4, Bjontegaard discloses the computer-implemented method of claim 1, wherein the test encoding configuration is associated with a first coder/decoder (codec) (first of two simulation conditions for video coding, Section 1 introduction), and the reference encoding configuration is associated with a second codec (second of two simulation conditions for video coding, Section 1 introduction).

	Regarding Claim 5, Bjontegaard discloses the computer-implemented method of claim 1, further comprising:
	computing a third plurality of quality scores (SNR plot, Section 1 intro, of SNR, Section 4.1) associated with the test encoding configuration (first simulation of two simulations for video coding, Section 1 Intro, e.g., plot 2, Fig. 3) based on the baseline quality model (interpolated SNR curve from SNR data points Section 4.1, first equation, SNR = a + b*bit + c*bit^2 + d*bit^3, with constants a, b, c, and d determined such that the curve passes through all data points, Section 3, curve fitting);
	computing a fourth plurality of quality scores (SNR plot data points, Section 1 intro, of SNR, Section 4.1) associated with the reference encoding configuration (second simulation of two simulations for video coding, Section 1 Intro, e.g., plot 1, Fig. 3) based on the baseline quality model (interpolated SNR curve from SNR data points Section 4.1, first equation, SNR = a + b*bit + c*bit^2 + d*bit^3, with constants a, b, c, and d determined such that the curve passes through all data points, Section 3, curve fitting);
	and computing the baseline value for the encoding comparison metric (Average bitrate difference in %, Section 4.1) based on the third plurality of quality scores (first simulation of two simulations for video coding, Section 1 Intro, e.g., plot 2, Fig. 3) and the fourth plurality of quality scores (second simulation of two simulations for video coding, Section 1 Intro, e.g., plot 1, Fig. 3).

Regarding Claim 6, Bjontegaard discloses the computer-implemented method of claim 1, wherein computing the first plurality of quality scores comprises: 
performing one or more encoding operations based on the test encoding configuration and a portion of video content, to generate a portion of encoded video content (first of two simulation conditions for video coding, Section 1 introduction);
and computing a first quality score based on the portion of encoded video content and a quality model (interpolated SNR curve from SNR data points Section 4.1, first equation, SNR = a + b*bit + c*bit^2 + d*bit^3, with constants a, b, c, and d determined such that the curve passes through all data points, Section 3, curve fitting).
Bjontegaard does not explicitly disclose, but Huszar (US PG Publication 2018/0240017) teaches wherein the model is a first bootstrap quality model included in the plurality of bootstrap quality models (one of deep neural networks 150_1…150_n, Fig. 1, [0035]; for perceptual loss [0038]). 
	It would have been obvious to one of ordinary skill in the art before the application was filed to replace the PSNR metric of Bjontegaard with the bootstrapped perceptual loss score of Huszar because Huszar teaches that the perceptual loss score fully captures the human quality perception for many different types of distortion, and bootstrapping the machine-learned model generates an accurate predictor via a small set of labeled training data [0010] and can outperform other techniques of measuring video quality loss in breadth and scalability [0015].

	Regarding Claim 7, Bjontegaard discloses the computer-implemented method of claim 1, wherein each quality score included in the first plurality of quality scores is associated with a different combination of a portion of encoded video content (simulation, Section 1 Intro), a bitrate setting (performed at 4 data points, Section 1, e.g., 4 bitrate settings, i.e., the data points of Fig. 3), and a quality model (interpolated SNR curve from SNR data points Section 4.1, first equation, SNR = a + b*bit + c*bit^2 + d*bit^3, with constants a, b, c, and d determined such that the curve passes through all data points, Section 3, curve fitting).
Bjontegaard does not explicitly disclose, but Huszar (US PG Publication 2018/0240017) teaches wherein the model is a bootstrap quality model included in the plurality of bootstrap models (deep neural networks 150_1…150_n, Fig. 1, [0035]). 
	It would have been obvious to one of ordinary skill in the art before the application was filed to replace the PSNR metric of Bjontegaard with the bootstrapped perceptual loss score of Huszar because Huszar teaches that the perceptual loss score fully captures the human quality perception for many different types of distortion, and bootstrapping the machine-learned model generates an accurate predictor via a small set of labeled training data [0010] and can outperform other techniques of measuring video quality loss in breadth and scalability [0015].

Regarding Claim 8, Bjontegaard discloses the computer-implemented method of claim 1.
Bjontegaard does not explicitly disclose, but Huszar teaches wherein the baseline quality model is trained based on the training database (training set generator generates training sets for each deep neural net 150_1…150_n, Fig. 1, [0034]; each NN 150_1..n initialized using different training set [0035]). 
	It would have been obvious to one of ordinary skill in the art before the application was filed to replace the PSNR metric of Bjontegaard with the bootstrapped perceptual loss score of Huszar because Huszar teaches that the perceptual loss score fully captures the human quality perception for many different types of distortion, and bootstrapping the machine-learned model generates an accurate predictor via a small set of labeled training data [0010] and can outperform other techniques of measuring video quality loss in breadth and scalability [0015].

	Regarding Claim 11, Bjontegaard does not explicitly disclose, but Huszar (US PG Publication 2018/0240017) teaches one or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to perform the steps (computer program product embodied on a computer-readable storage device includes instructions [0014]).
	The remainder of Claim 11 is rejected on the grounds provided in Claim 1. 

	Regarding Claim 12, Bjontegaard discloses the one or more non-transitory computer readable media of claim 11, wherein the encoding comparison metric specifies a percentage bitrate change when encoding using the test encoding configuration relative to encoding using the reference encoding configuration while maintaining the same quality score (rejected on the grounds provided in Claim 2. According to Spec filed 3/13/2019 at [0006] this claim is the definition of BD rate, claimed in Claim 2). 

	Regarding Claim 14, Claim 14 is rejected on the grounds provided in Claim 4. 
	Regarding Claim 15, Claim 15 is rejected on the grounds provided in Claim 6. 
	Regarding Claim 16, Claim 16 is rejected on the grounds provided in Claim 7.
Regarding Claim 20, Claim 20 is rejected on the grounds provided in Claim 11. 

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Bjontegaard (NPL “Calculation of average PSNR differences between RD-curves,” VCEG-M33, 2001) in view of Huszar (US PG Publication 2018/0240017) and Netflix (NPL, “Toward a Practical Perceptual Video Quality Metric,” Netflix Tech Blog, June 2016, available at https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652).

	Regarding Claim 3, Bjontegaard discloses the computer-implemented method of claim 1, wherein each quality score included in the first plurality of quality scores (four data points of PSNR at different bitrates, Fig. 3) is a different value for a metric (PSNR, Fig. 3). 
Bjontegaard does not explicitly disclose, but Netflix (NPL, “Toward a Practical Perceptual Video Quality Metric,” Netflix Tech Blog, June 2016, available at https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652) teaches wherein the metric is the Video Multimethod Assessment Fusion (VMAF) metric (Video multimethod assessment fusion, Pages 14-19).
It would have been obvious to one of ordinary skill in the art before the application was filed to replace the PSNR model of Bjontegaard or the perceptual loss model of Huszar with VMAF because Netflix teaches that VMAF is a perceptual quality model that exhibits strong correlation with a mean opinion score, how non-experts would perceive the video quality, and is a better predictor of perceptual quality than the best competing quality model, PSNRHVS (Pages 7, 16-17). 

	Regarding Claim 13, Claim 13 is rejected on the grounds provided in Claim 3. 

Claims 9-10 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Bjontegaard (NPL “Calculation of average PSNR differences between RD-curves,” VCEG-M33, 2001) in view of Huszar (US PG Publication 2018/0240017), Netflix (NPL, “Toward a Practical Perceptual Video Quality Metric,” Netflix Tech Blog, June 2016, available at https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652), and Kirkman (NPL “Kolmogorov-Smirnov Test,” http://www.physics.csbsju.edu/stats/  2007).

	Regarding Claim 9, Bjontegaard discloses the computer-implemented method of claim 1, wherein the configuration is an encoding configuration (H.26L simulation, Section 1 Intro).
Bjontegaard does not explicitly disclose, but Huszar (US PG Publication 2018/0240017) teaches wherein the values are bootstrap values (deep neural networks 150_1…150_n, scores from each committee member [0036], Fig. 1, [0035]).
Bjontegaard does not explicitly disclose, but Netflix teaches wherein the analysis is content analysis (performance of metric, Page 16) for a first type of video content (e.g., high noise video, computer graphics, TV drama, Page 16).
Bjontegaard does not explicitly disclose, but Kirkman (NPL “Kolmogorov-Smirnov Test,” http://www.physics.csbsju.edu/stats/  2007) teaches further comprising performing one or more analysis operations (Kolmogorov-Smirnov Test, Title) based on the distribution of values (two data sets, summary) to quantify a performance of the test configuration (compare treatment group to control group, Introduction).
	It would have been obvious to one of ordinary skill in the art before the application was filed to replace the PSNR metric of Bjontegaard with the bootstrapped perceptual loss score of Huszar because Huszar teaches that the perceptual loss score fully captures the human quality perception for many different types of distortion, and bootstrapping the machine-learned model generates an accurate predictor via a small set of labeled training data [0010] and can outperform other techniques of measuring video quality loss in breadth and scalability [0015].
It would have been obvious to one of ordinary skill in the art before the application was filed to analyze the encoders of Bjontegaard by content types because it was well-known in the art before the application was filed that different image types, for example text, computer graphics, and natural images, do well under different encoding schemes; therefore, comparing encoders across multiple image types would not enable precise analysis of how one encoder performs against another encoder.
It would have been obvious to one of ordinary skill in the art before the application was filed to compare the distribution of BD rates among different encoding schemes and images using the Kolmogorov-Smirnov Test because Kirkman teaches that the Kolmogorov-Smirnov Test makes no assumptions about the underlying distribution of the data, which is very useful in empirical datasets where the data are usually non-normally distributed, making other tests, such as the t-test, unusable. 

	Regarding Claim 10, Claim 10 is rejected on the grounds provided in Claim 19. 
	Regarding Claim 18, Claim 18 is rejected on the grounds provided in Claim 9. 

	Regarding Claim 19, Bjontegaard discloses the one or more non-transitory computer readable media of claim 11, 
wherein the configuration is an encoding configuration (H.26L simulation, Section 1 Intro); and
wherein the performance is an encoding performance (Average bitrate difference in % over the whole range of PSNR, Section 4.1).
	Bjontegaard does not explicitly disclose, but Huszar (US PG Publication 2018/0240017) teaches wherein the values are bootstrap values (deep neural networks 150_1…150_n, scores from each committee member [0036], Fig. 1, [0035]).
Bjontegaard does not explicitly disclose, but Kirkman (NPL “Kolmogorov-Smirnov Test,” http://www.physics.csbsju.edu/stats/  2007) teaches further comprising performing a Kolmogorov Smirnov test (Kolmogorov-Smirnov Test, Title) based on the distribution of values (the data, first of two data sets, summary) and another distribution of values associated with a different configuration (the data, second of two data sets, summary) to compare a performance of the test configuration (treatment group, Introduction) to a performance of the different configuration (control group, Introduction). 
	It would have been obvious to one of ordinary skill in the art before the application was filed to replace the PSNR metric of Bjontegaard with the bootstrapped perceptual loss score of Huszar because Huszar teaches that the perceptual loss score fully captures the human quality perception for many different types of distortion, and bootstrapping the machine-learned model generates an accurate predictor via a small set of labeled training data [0010] and can outperform other techniques of measuring video quality loss in breadth and scalability [0015].
It would have been obvious to one of ordinary skill in the art before the application was filed to compare the distribution of BD rates among different encoding schemes and images using the KS test because Kirkman teaches that the KS test makes no assumptions about the underlying distribution of the data, which is very useful in empirical datasets where the data are usually non-normally distributed, making other tests, such as the t-test, unusable. 
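
For illustration only, the two-sample Kolmogorov-Smirnov statistic referred to above can be sketched as follows. The statistic is the largest gap between the empirical cumulative distribution functions of the two data sets. The function name ks_statistic is hypothetical, and the conversion of the statistic to a p-value is omitted.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is sufficient.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

Under these assumptions, two identical distributions of values give a statistic of 0, and two distributions with no overlap give a statistic of 1.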

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Bjontegaard (NPL “Calculation of average PSNR differences between RD-curves,” VCEG-M33, 2001) in view of Huszar (US PG Publication 2018/0240017), and Minitab (NPL “Understanding Hypothesis Tests: Confidence Intervals and Confidence Levels,” available at https://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests-confidence-intervals-and-confidence-levels 2015).

	Regarding Claim 17, Bjontegaard discloses the one or more non-transitory computer readable media of claim 11.
	Bjontegaard does not explicitly disclose, but Huszar teaches wherein the values are bootstrap values (deep neural networks 150_1…150_n, scores from each committee member [0036], Fig. 1, [0035]).
Bjontegaard does not explicitly disclose, but Minitab (NPL “Understanding Hypothesis Tests: Confidence Intervals and Confidence Levels,” available at https://blog.minitab.com/blog/adventures-in-statistics-2/understanding-hypothesis-tests-confidence-intervals-and-confidence-levels 2015) teaches further comprising generating a confidence interval (e.g., 267 – 394, Figure, Section “confidence interval and margin of error”) based on the distribution of values (distribution figure, Section “confidence interval and margin of error”) and a confidence level (95%, figure, Section “confidence interval and margin of error”).
	It would have been obvious to one of ordinary skill in the art before the application was filed to replace the PSNR metric of Bjontegaard with the bootstrapped perceptual loss score of Huszar because Huszar teaches that the perceptual loss score fully captures the human quality perception for many different types of distortion, and bootstrapping the machine-learned model generates an accurate predictor via a small set of labeled training data [0010] and can outperform other techniques of measuring video quality loss in breadth and scalability [0015].
It would have been obvious to one of ordinary skill in the art before the application was filed to use a confidence interval to measure the divergence of the bootstrapped models of Huszar because Minitab teaches that confidence intervals can be used to assess the precision of the sample estimate. For a specific variable, a narrower confidence interval [90 110] suggests a more precise estimate of the population parameter than a wider confidence interval [50 150] (Section “How to correctly interpret confidence intervals…”).
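
For illustration only, an interval derived from a distribution of values and a confidence level can be sketched as follows. The sketch uses a percentile interval over the values, which is a substituted technique: Minitab describes confidence intervals for a sample estimate, not this particular calculation. The function name percentile_interval and the index rounding are assumptions.

```python
import math


def percentile_interval(values, confidence=0.95):
    """Return (low, high) bounds covering the central share of the values."""
    ordered = sorted(values)
    n = len(ordered)
    tail = (1 - confidence) / 2  # share of values left out on each side
    # Round outward so the interval is never narrower than requested.
    low_index = math.floor(tail * (n - 1))
    high_index = math.ceil((1 - tail) * (n - 1))
    return ordered[low_index], ordered[high_index]
```

Under these assumptions, a tightly clustered distribution of values produces a narrower interval than a widely spread one at the same confidence level.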

Response to Arguments
	Applicant’s remarks filed 7/23/2021 have been considered but are unpersuasive. 
	Applicant argues that the combination of Huszar in view of Breiman does not teach “determining an accuracy of a baseline value for the encoding comparison metric based on the distribution of bootstrap values, wherein the baseline value for the encoding comparison metric is generated by a baseline quality model that is trained based on the first training data.”
	This is unpersuasive in view of the specification’s definition of “determining accuracy.”
	The specification indicates that the uncertainty/accuracy of the baseline model is reflected in the width/breadth of the bootstrap distribution. The specification states, “The confidence interval quantifies the accuracy of the baseline perceptual quality score [0029]. …[S]uppose that the confidence intervals associated with the perceptual quality scores for relatively low resolution encoded source videos are significantly larger than the confidence intervals associated with the perceptual quality scores for relatively high resolution encoded sources videos. The video service provider could add additional low resolution encoded sources to the training encode database. Subsequently, the bootstrapping training engine could retrain the baseline perceptual quality model and the bootstrap perceptual quality model to improve the accuracy of the baseline perceptual quality scores for low resolution encoded source videos [0030].” Thus, the specification indicates that the width or breadth of the confidence interval of the bootstrap distribution reflects the accuracy of the baseline model.
	Huszar also uses the width/breadth of the bootstrap distribution to reflect the uncertainty/accuracy of the classification result. Huszar calls it diversity, variance, disagreement, and uncertainty [0036] and [0049]. Huszar does not specifically use the uncertainty to quantify the accuracy of a baseline model. However, Breiman provides a teaching and motivation to modify Huszar to do so.
Breiman teaches that bootstrapped predictors from resampled sets of a training set can reflect the instability of a baseline predictor trained on the original training set (Introduction). Furthermore, the baseline predictor can be compared against the bootstrapped predictors to indicate how much improvement, if any, the bootstrapped predictors can provide (Section 2).
With this teaching, one of ordinary skill in the art would be motivated to modify Huszar to train one of the predictors 150_1 on the original training set for the benefit of evaluating the stability of the original predictor, and quantifying any improvement the bootstrap aggregated predictor proffers. 
Thus, in light of these teachings, one of ordinary skill in the art would have found it obvious to modify the disclosed references to arrive at the claimed subject matter. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Haug; Peter J. et al.
US 20080133275 A1
SYSTEMS AND METHODS FOR EXPLOITING MISSING CLINICAL DATA


THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHADAN E HAGHANI whose telephone number is (571)270-5631.  The examiner can normally be reached on M-F 8-7.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jay Patel can be reached on 571-272-2988.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SHADAN E HAGHANI/Examiner, Art Unit 2485