DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the remarks and amendment filed on 6/6/2022. Claims 1-6, 9-12 and 15-18 are pending. Claims 7, 8, 13, 14, 19 and 20 were previously cancelled in the amendment filed 5/7/2020. 

Response to Amendment
Applicant’s amendment filed on 6/6/2022 has been entered. 
In the amendment, claims 1, 9 and 15 were amended, and no claims were added or cancelled. As noted above, claims 7, 8, 13, 14, 19 and 20 were previously cancelled. Thus, claims 1-6, 9-12 and 15-18 are pending and have been examined.

Response to Arguments
Applicant's arguments filed 6/6/2022 with respect to the rejections of claims 1-2, 9-10 and 15-16 under 35 U.S.C. 103 over Bilenko in view of Achin, Yang, Abadi and Mellempudi, and the rejections of claims 3-6, 11-12 and 17-18 under 35 U.S.C. 103 over Bilenko in view of Achin, Yang, Abadi, Mellempudi and Yu, have been fully considered, but they are not persuasive.
The examiner disagrees with applicant’s assertions regarding amended claims 1, 9 and 15, and directs applicant to the discussion of Bilenko and Achin below. Applicant’s amendments have necessitated the claim objections and rejections under 35 U.S.C. 103 discussed below.
With reference to claim 1, applicant states “As amended, independent claim 1 is directed to an apparatus and recites elements directed to processing circuitry to: …
generate a fixed-bit size scoring engine which enables deployment of a plurality of versions of the neural network model, wherein the plurality of versions of models comprise low-precision models, to perform simultaneous scoring of the plurality of versions of models to generate a plurality of scoring results; 
test an accuracy of the plurality of versions of the neural network model based on received input data and select a first of the plurality of model versions having a highest accuracy by comparing a minimum and a maximum of the input data with metadata for neural network model; and … provide an aggregation of the plurality of scoring results.” (applicant’s remarks pages 6-7, emphasis added to indicate limitations added in 6/6/2022 amendment). 
Applicant next repeats the entirety of claim 1 and states “As amended, claim 1 clarifies that the apparatus comprises processing circuitry to … perform simultaneous scoring of the plurality of versions of models to generate a plurality of scoring results, test an accuracy of the plurality of versions of the neural network model based on received input data and select a first of the plurality of model versions having a highest accuracy by comparing a minimum and a maximum of the input data with metadata for neural network model; provide a linear combination of outputs of the plurality of versions of the neural network model; and provide an aggregation of the plurality of scoring results.” (applicant’s remarks, page 7, emphasis added to indicate limitations added in the amendment). 
Applicant then again repeats all limitations of claim 1 and generally asserts “The cited references, alone or in combination, neither discloses (nor even suggest) an apparatus comprising processing circuitry to … perform simultaneous scoring of the plurality of versions of models to generate a plurality of scoring results, test an accuracy of the plurality of versions of the neural network model based on received input data and select a first of the plurality of model versions having a highest accuracy by comparing a minimum and a maximum of the input data with metadata for neural network model; provide a linear combination of outputs of the plurality of versions of the neural network model; and provide an aggregation of the plurality of scoring results, as recited in claim 1 as amended herein.” (applicant’s remarks, pages 8-9). 
With regard to amended claims 9 and 15 and the dependent claims, applicant generally asserts “Independent claims 9 and 15 have been amended to recite elements generally similar to those recited in independent claim 1. Accordingly, independent claims 1, 9 and 15 are allowable for at least similar arguments applied to independent claim 1. The remaining dependent claims depend ultimately from one of claims 1, 9 or 15 and are allowable at least by virtue of the dependency on claims 1, 9, or 15, or for the claim elements recited separately therein.” (applicant’s remarks, page 9).
Accordingly, applicant apparently argues that the above-noted claim limitations that were amended in independent claims 1, 9 and 15, i.e., “perform simultaneous scoring of the plurality of versions of models to generate a plurality of scoring results, test an accuracy of the plurality of versions of the neural network model based on received input data and select a first of the plurality of model versions having a highest accuracy by comparing a minimum and a maximum of the input data with metadata for neural network model; provide a linear combination of outputs of the plurality of versions of the neural network model; and provide an aggregation of the plurality of scoring results” are not taught in the portions of the Bilenko, Achin, Abadi, Yang, Mellempudi and Yu references cited in the previous Office Action. The examiner respectfully disagrees and points applicant to the below discussion of Bilenko and Achin, which teach the above-noted limitations added to claims 1, 9 and 15. 
Regarding the other above-noted limitations recited in the independent claims, the examiner further points applicant to the below discussion of Bilenko, Achin and Abadi.
First, as discussed in previous Office Actions and noted below, regarding the “generate a fixed-bit size scoring engine which enables deployment of a plurality of versions of the neural network model” limitation recited in independent claims 1, 9 and 15, paragraph 154 of applicant’s specification discloses that “a 32-bit engine of the model is generated” and “histograms for each tensor in a calibration set of data is generated via the 32-bit engine.” Therefore, a “fixed-bit size engine” of the neural network model, under the broadest reasonable interpretation (BRI), in light of the specification, is any engine or module representation of the network model having a certain bit size (e.g., 32-bit). 
With continued reference to the above-noted “generate a fixed-bit size scoring engine” limitation, the examiner points to paragraphs 55 and 65 of Bilenko, which explicitly disclose “compar[ing] various trained models 110” (i.e., a plurality of versions 110 of the neural network model) and “Statistics may be determined to show the differences in each model. For example … areas of disagreement between two models or algorithms” (i.e., generate a scoring engine for scoring/comparing accuracies of different versions of models 110 to enable deployment of the models 110). The examiner also points to paragraphs 60 and 79 of Yang, which explicitly disclose “a 32-bit floating point representation of the model … a fixed point representation (e.g., a 16-bit fixed point representation) … may be implemented” (i.e., a fixed-bit size representation/engine of the model) and “the neural network may include a floating point representation (e.g., a 32-bit” (i.e., a 32-bit representation of the neural network model).
Second, regarding the “wherein the plurality of versions of models comprise low-precision models, to perform simultaneous scoring of the plurality of versions of models” limitation recited in independent claims 1, 9 and 15, paragraphs 155, 157, 169 and 208 of applicant’s specification state that “input data 701 is fed into scoring engine 613, which utilizes low precision models. Scoring engine 613 enables a deployment of multiple model versions … in order to perform simultaneous scoring of the model versions.”, “metadata about an execution unit in the underlying hardware (e.g., low precision EU)”, “perform computational operations at a range of precisions … suited for machine learning computations. For example, … a subset of the floating point units in each of the compute clusters 906A-H can be configured to perform 16-bit or 32-bit floating point operations, while a different subset of the floating point units can be configured to perform 64-bit floating point operations” and “GPGPU 1506 can support instructions to perform low precision computations such as 8-bit and 4-bit integer vector operations.” The examiner notes that these are the only mentions of any “low-precision” models or operations in applicant’s specification. Therefore, “low-precision models”, under the BRI, are any models capable of performing low precision computations, such as 4, 8, or 16-bit operations. The plain meaning of simultaneous is existing, occurring, or operating at the same time. See https://www.dictionary.com/browse/simultaneously. Further, the plain meaning of simultaneously is at the same time. See https://www.dictionary.com/browse/simultaneously. Therefore, “models, to perform simultaneous scoring of the plurality of versions of models”, under the BRI, in light of the specification, are any models capable of scoring or ranking versions of models simultaneously, at the same time, or in parallel.
With continued reference to “the plurality of versions of models comprise low-precision models” limitation recited in independent claims 1, 9 and 15 and noted in applicant’s remarks, the examiner points to the Abstract and pages 1, 2 and 4 of Mellempudi, which disclose “we have also trained low-precision Resnet-50 with 8-bit activations and ternary weights … Our final quantized model can run on a full 8-bit compute pipeline”, “low-precision alternatives to perform deep learning tasks … trained networks using 16-bit fixed point” [i.e., 8 and 16-bit, low-precision models] and “We trained the low precision ResNet-50 … using 2-bit weights and 8-bit activations by initializing the network with pre-trained full precision model. … We obtained the pre-trained models published by Marcel et al.[10] and fine-tune the parameters of our low-precision network” [i.e., versions of the plurality of pre-trained models include low-precision, 8 and 16-bit models]. Alternatively, the examiner also points to paragraph 60 of Yang, which explicitly discloses that “the model … to implement lower level layers of the distributed neural network may be stored in a 16-bit fixed point, 8-bit fixed point, or quantized representation. Such representations of the model may provide substantial memory storage requirement reductions with similar accuracy (e.g., less than a 1 % accuracy drop for 16-bit fixed point representation) with respect to a 32-bit floating point representation of the model.” [i.e., model versions/representations include low-precision, 8 or 16-bit fixed point models with lower accuracy/lower-precision than 32-bit floating point models].
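For illustration only, the following minimal Python sketch shows one generic way a set of floating point values could be stored in 16-bit and 8-bit fixed point form, and how the resulting rounding error might be measured. It is not taken from Yang, Mellempudi or applicant's specification. The function names, the sample weights and the choice of fractional bits are all hypothetical assumptions.

```python
def quantize_fixed_point(values, total_bits, frac_bits):
    """Round each value to a signed fixed-point grid and clamp to the representable range."""
    scale = 1 << frac_bits                    # steps per unit (assumed format, not from any reference)
    q_min = -(1 << (total_bits - 1))          # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1       # most positive integer code
    codes = [max(q_min, min(q_max, round(v * scale))) for v in values]
    return [c / scale for c in codes]         # values as they would be read back


def max_abs_error(reference, approx):
    """Largest absolute difference between the reference and the approximation."""
    return max(abs(r - a) for r, a in zip(reference, approx))


# Hypothetical weights, chosen only to exercise the sketch.
weights = [0.7312, -0.2048, 0.0153, -0.9871, 0.5]
w16 = quantize_fixed_point(weights, total_bits=16, frac_bits=14)
w8 = quantize_fixed_point(weights, total_bits=8, frac_bits=6)
err16 = max_abs_error(weights, w16)
err8 = max_abs_error(weights, w8)
```

In this sketch the 8-bit version uses fewer fractional bits than the 16-bit version, so its worst-case rounding error is larger, which is the general sense in which such a representation is lower in precision.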
Third, regarding the “models, to perform simultaneous scoring of the plurality of versions of models to generate a plurality of scoring results” limitation added to independent claims 1, 9 and 15, the examiner points to paragraphs 35, 51, 54-55 and 67 of Bilenko, which explicitly disclose that “For the trained model 110, the visualization tool 116 may use a set of model outputs, such as … corresponding probability scores.” [i.e., models output/generate a plurality of scores/scoring results], “Visualizations … allow immediate comparison and contrast of the accuracy of different trained models 110” [i.e., immediate, side-by-side simultaneous comparison of a plurality of accuracy scores of models 110], “allow the user to view the … scores produced by trained models. … visualization of data … corresponding to features, labels, scores and any derived quantities (e.g., derived features or model confidence scores).”, “compare various trained models 110 … to compare the accuracy of the various trained models 110” and “the user may … request a visualization of … the scores produced by the trained model 110.” [i.e., simultaneously comparing the plurality of accuracy scores generated by different versions of models 110].
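For illustration only, the following minimal Python sketch shows one generic way several model versions could be scored over the same input data in overlapping tasks to yield a plurality of scoring results. It is not drawn from Bilenko or from applicant's specification. The threshold classifiers, the data and the names `accuracy` and `score_versions` are hypothetical assumptions, and the standard-library thread pool is used only as a convenient stand-in for any mechanism that runs the scoring tasks at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def accuracy(model, inputs, labels):
    """Fraction of inputs for which the model's prediction equals the label."""
    correct = sum(1 for x, y in zip(inputs, labels) if model(x) == y)
    return correct / len(labels)


def score_versions(versions, inputs, labels):
    """Submit one scoring task per model version and collect the results by name."""
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        futures = {name: pool.submit(accuracy, model, inputs, labels)
                   for name, model in versions.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical model versions: threshold classifiers with different cut-offs.
versions = {
    "v1": lambda x: int(x >= 0.5),
    "v2": lambda x: int(x >= 0.3),
    "v3": lambda x: int(x >= 0.8),
}
inputs = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]
scores = score_versions(versions, inputs, labels)
```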
Regarding the “provide a linear combination of outputs of the plurality of versions of the neural network model” limitation recited in claims 1, 9 and 15 and noted in applicant’s remarks, the examiner points to paragraph 114 of Achin, which explicitly discloses that “Two or more models may be blended by combining the outputs of the constituent models … the blended model may comprise a weighted, linear combination of the outputs of the constituent models” [i.e., produce/provide a linear combination of outputs of versions of the constituent neural network model]. 
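For illustration only, the following minimal Python sketch shows what a weighted, linear combination of the outputs of constituent models can look like in general. It is not Achin's implementation. The function name `blend`, the sample outputs and the equal weights are hypothetical assumptions.

```python
def blend(outputs, weights):
    """Weighted linear combination of per-model output vectors."""
    if len(outputs) != len(weights):
        raise ValueError("one weight per model is required")
    length = len(outputs[0])
    if any(len(o) != length for o in outputs):
        raise ValueError("all models must produce outputs of the same length")
    # For each output position, sum weight * output over the constituent models.
    return [sum(w * o[i] for w, o in zip(weights, outputs)) for i in range(length)]


# Hypothetical outputs of two model versions for three inputs.
model_outputs = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.5],
]
blend_weights = [0.5, 0.5]
blended = blend(model_outputs, blend_weights)
```

With equal weights the blended output at each position is the average of the constituent outputs. Unequal weights would favor one constituent model over another.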
Additionally, regarding the “select a first of the plurality of model versions having a highest accuracy” limitation recited in independent claims 1, 9, and 15 and noted in applicant’s remarks, the examiner points to paragraphs 35, 57 and 66 of Bilenko, which explicitly disclose that “probability scores may indicate a confidence level that the predicted label values are accurate”, “provide threshold impact evaluation … a threshold specifies a confidence level at which a positive prediction is made. For example, the machine learning system may predict that an email is spam at or above a 90% confidence level”, “precision recall curve may show a tradeoff between false positives and false negatives for a given featurized training dataset 106 and set of parameters in the trained model 110” [i.e., a first of the model versions] and “the user may select one or more points on the precision recall curve, corresponding to decision thresholds” [i.e., user may select a most precise/accurate model]. 
Alternatively, the examiner also points to paragraphs 24, 91, 103, 185, 195 and 291 of Achin, which explicitly disclose “selecting, from the generated models, a predictive model for the initial prediction problem based, at least in part, on the score of the selected predictive model”, “selects the modeling procedures with suitability scores within a specified range of the highest suitability score … The range may be absolute (e.g., scores within S points of the highest score)”, “scoring metrics may place different weights on different aspects of a predictive model's performance, including, without limitation, the model's accuracy”, “embodiments … may track different versions of the same logical model”, “the predictive modeling system 100 can fit many different model types, including … neural networks” [i.e., select a neural network model version of the plurality of different model versions] and “evaluate sensitivity of the top models based on their relative predictive accuracy.” [i.e., select a top model having a highest score/predictive accuracy].
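For illustration only, the following minimal Python sketch shows a generic selection of the highest-scoring version and of the versions whose scores fall within an absolute range of the highest score. It is not Achin's or Bilenko's implementation. The version names, the accuracy values and the parameter `s_points` are hypothetical assumptions.

```python
def select_best(scores):
    """Return the name of the version with the highest score (first one wins ties)."""
    return max(scores, key=scores.get)


def within_range_of_best(scores, s_points):
    """Return the versions whose score is within s_points of the highest score."""
    best = max(scores.values())
    return sorted(name for name, value in scores.items() if best - value <= s_points)


# Hypothetical accuracies for three versions of one model.
scores = {"fp32": 0.912, "int16": 0.909, "int8": 0.874}
best = select_best(scores)
shortlist = within_range_of_best(scores, s_points=0.01)
```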
Moreover, regarding the “provide an aggregation of the plurality of scoring results” limitation added to claims 1, 9 and 15, paragraph 156 of applicant’s specification states “result assembly logic 706 provides an aggregation of the scoring results to provide an optimal result.” This is the sole mention of any aggregation of any results, or any other aggregation of anything, in the specification. The plain meaning of aggregation is a group or mass of distinct or varied things or collection into an unorganized whole. See https://www.dictionary.com/browse/aggregation. Therefore, “an aggregation of the plurality of scoring results”, under the BRI, in light of the specification, is any grouping or collection of scoring results/scores into a whole, such as an overall result accuracy.
With continued reference to the above-noted “provide an aggregation of the plurality of scoring results” limitation, the examiner points to paragraphs 54-56 and 64 of Bilenko, which explicitly disclose “allow[ing] the user to view the … scores produced by trained models … allow the user to view 2-D or 3-D graphical visualization of data with dimensions and possibly other instance properties (e.g., color) corresponding to … scores and any derived quantities (e.g., derived features or model confidence scores).” [i.e., provide a view/visualization of the plurality of scoring results], “Visualizations provided … enable the user to compare various trained models 110. For example, a visualization may allow the user to compare the accuracy of the various trained models 110”, “interface 400 may show … overall accuracy” and “visualization area 508B may … show various statistics about the classification accuracy for different models” [i.e., provide a visualization/interface with aggregation of scoring results and accuracies of models 110].
Lastly, regarding the “comparing a minimum and a maximum of the input data with metadata for neural network model” limitation added to independent claims 1, 9 and 15, paragraph 157 of applicant’s specification states “the most accurate output is determined by comparing a minimum and maximum of the input data with the metadata for the model, where the metadata records statistic characteristics of the data … metadata may include information regarding … scoring tasks”. The examiner notes that this is the only mention of any comparison of any “minimum and a maximum of the input data with metadata for the model” in applicant’s specification. Therefore, “comparing a minimum and a maximum of the input data with metadata for [sic – the] neural network model” under the BRI, in light of the specification, is any evaluation or comparison of minimum/lowest and maximum/highest input data values with metadata associated with a neural network model.
With continued reference to the above-noted, newly-added comparing limitation, the examiner points to paragraphs 142, 147, 159, 169, 171, 274 and 318 of Achin, which explicitly disclose that “engine 110 may determine which modeling techniques to eliminate using a suitable technique, including … eliminating those that do not produce models that meet a minimum threshold value of a model fit metric” [i.e., comparing a minimum value in data to neural network model metrics], “automatically producing a wide variety of models and maintaining metadata describing how the candidate models differ”, “modeling methodologies, techniques … edit the metadata attached to a technique.”, “model deployment engine 140 monitors the performance of deployed predictive models, and updates the performance metadata associated with the modeling techniques that generated the deployed models”, “a model's prediction may be generated by applying the pre-processing and modeling steps described in the modeling technique to each instance of new input data.” [i.e., comparing input data with metadata for the neural network model], “encodes this characteristic in the modeling technique's metadata. Datasets themselves may also have time series specific metadata” and “system 100 may prune ‘less important’ features from the dataset … if the predictive value of the feature is less than a threshold value, if the feature has one of the M lowest predictive values among the features in the dataset … the system may create derived features from ‘more important’ features in the dataset … if the feature has one of the N highest predictive values among the features in the dataset” [i.e., comparing lowest/minimum and highest/maximum of input data/dataset with metadata for the model].
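For illustration only, the following minimal Python sketch shows one generic reading of a comparison between the minimum and maximum of input data and range values recorded as metadata for each model version. Neither applicant's specification, as quoted above, nor Achin states how such a comparison is carried out. The metadata keys `min` and `max`, the version names and the data are therefore hypothetical assumptions.

```python
def input_within_recorded_range(input_data, metadata):
    """True when the observed minimum and maximum fall inside the recorded range."""
    return metadata["min"] <= min(input_data) and max(input_data) <= metadata["max"]


def candidate_versions(input_data, versions_metadata):
    """Names of the versions whose recorded range covers the input data."""
    return [name for name, meta in versions_metadata.items()
            if input_within_recorded_range(input_data, meta)]


# Hypothetical per-version metadata recording a range of data values.
versions_metadata = {
    "int8": {"min": -1.0, "max": 1.0},
    "fp32": {"min": -100.0, "max": 100.0},
}
narrow = candidate_versions([-0.5, 0.2, 0.9], versions_metadata)
wide = candidate_versions([-0.5, 0.2, 3.7], versions_metadata)
```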
Also, as detailed below, the combination of Bilenko, Achin, Yang, Abadi and Mellempudi (i.e., Bilenko in view of Achin, Abadi and Yang and further in view of Mellempudi) teaches the limitations of independent claims 1, 9 and 15 and claims 2, 10 and 16. As further discussed in detail below, the combination of Bilenko, Achin, Yang, Abadi, Mellempudi and Yu (i.e., Bilenko in view of Achin, Abadi, Yang and Mellempudi and further in view of Yu) teaches the limitations of dependent claims 3-6, 11-12 and 17-18. 

Claim Objections
Claims 1-6, 9-12 and 15-18 are objected to because of the following informalities: 
In lines 14-15 of amended claims 1 and 15 and lines 13-14 of amended claim 9, the respective recitations of “metadata for neural network model” are grammatically incorrect, and the word “the” appears to be missing between “for” and “neural” in these recitations (see, e.g., line 12 of claim 1 reciting “the neural network model” and lines 11 and 12 of claims 9 and 15 introducing “a neural network model”). Appropriate correction is required.
Also, claims 2-6, 10-12, and 16-18 are objected to based on their dependencies from claims 1, 9, and 15, respectively.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 9-10 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Bilenko et al. (U.S. Patent Application Pub. No. 2012/0158623 A1, hereinafter “Bilenko”) in view of Achin et al. (U.S. Patent Application Pub. No. 2018/0060738 A1, hereinafter “Achin”) and non-patent literature Abadi et al. ("TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems." November 9 2015, pages 1-19, hereinafter “Abadi”) and Yang et al. (U.S. Patent Application Pub. No. 2017/0076195 A1, hereinafter “Yang”) and further in view of non-patent literature Mellempudi et al. ("Mixed low-precision deep learning inference using dynamic fixed point." arXiv preprint arXiv:1701.08978 (February 2017): 1-6, hereinafter “Mellempudi”).
Achin was filed on October 21, 2016 and claims priority to U.S. Provisional Application No. 62/002,469, filed on May 23, 2014 and U.S. Provisional Application No. 62/411,526, filed on Oct. 21, 2016, and all of these dates are before the effective filing date of this application, i.e., April 1, 2017. Therefore, Achin constitutes prior art under 35 U.S.C. 102(a)(2).
Yang was published on March 16, 2017 and filed on September 10, 2015, and both of these dates are before the effective filing date of this application, i.e., April 1, 2017. Therefore, Yang constitutes prior art under 35 U.S.C. 102(a)(1).

Regarding claim 1, Bilenko discloses the invention as claimed including an apparatus to facilitate calibration of a neural network (NN) (see, e.g., paragraphs 19 and 37, “claimed subject matter may be implemented as a[n] … apparatus”, “visualizations may provide insight to issues with the trained model 110. Accordingly, the user may modify the trained model 110 with different parameters, or a different learning algorithm. The learning algorithm may identify model parameters, e.g., connection weights for neural networks.” [i.e., apparatus to facilitate user modification/calibration of parameters/weights for neural networks]), comprising processing circuitry (see, e.g., paragraphs 6 and 97, “a machine learning system … comprises a processing unit and a system memory that comprises code configured to direct the processing unit”, “functions performed by the above described components, devices, circuits” [i.e., processing circuits/circuitry]) to:
generate histograms for … a calibration data set (see, e.g., FIG. 5D and paragraph 67, “visualization area 508D may include a count versus probability graph … [t]he graph may represent a histogram of the results 114. The x-axis may represent ranges (buckets) of probability values, i.e., confidence levels, … [t]he y-axis may represent the number of emails that fall in a given probability bucket … the user may select one or more bars of the graph, and then request a visualization of the corresponding training data 102, featurized training dataset 106, or the scores produced by the trained model 110” [i.e., generate histograms for a calibration data set/dataset 106]);
generate a calibration table (see, e.g., paragraphs 51 and 64, “derived confusion difference matrices … allow immediate comparison and contrast of the accuracy of different trained models 110”, “a confusion difference matrix … may show various statistics about the classification accuracy for different models or algorithms” [i.e., derive/generate confusion difference matrix, a calibration table]) using one or more of the histograms constructed from the calibration data set (see, e.g., FIG. 5D – showing use of Histogram and paragraphs 64-67, “[s]tatistics may be determined to show the differences in each model. For example … areas of disagreement between two models or algorithms” [i.e., generate calibration table/confusion difference matrix], “points 520 may be selected as decision thresholds for visualization of a corresponding confusion matrix”, “visualization … may represent a histogram of the results 114 … a visualization of … featurized training dataset 106.” [i.e., using a histogram constructed from the calibration data set/dataset 106]);
generate a … scoring engine which enables deployment of a plurality of versions of the neural network model (see, e.g., paragraphs 55 and 65, “compare various trained models 110” [i.e., a plurality of versions 110 of the neural network model], “[s]tatistics may be determined to show the differences in each model. For example … areas of disagreement between two models or algorithms” [i.e., generate a scoring engine for scoring/comparing accuracies of different versions of models 110 to enable deployment of the models 110]), wherein the plurality of versions of models comprise … models, to perform simultaneous scoring of the plurality of versions of models to generate a plurality of scoring results (aside from repeating the claim language in paragraphs 155 and 298, which state “simultaneous scoring of the model versions” and “scoring logic to simultaneously test accuracy of a plurality of versions of a NN model”, applicant’s specification does not define “models, to perform simultaneous scoring of the plurality of versions of models”. The plain meaning of simultaneous is existing, occurring, or operating at the same time. See https://www.dictionary.com/browse/simultaneously. Further, the plain meaning of simultaneously is at the same time. See https://www.dictionary.com/browse/simultaneously. Therefore, “models, to perform simultaneous scoring of the plurality of versions of models”, under the BRI, in light of the specification, are any models capable of comparing, scoring or ranking versions of models simultaneously, at the same time, or in parallel) (see, e.g., paragraphs 35, 51, 54-55 and 67, “For the trained model 110, the visualization tool 116 may use a set of model outputs, such as … corresponding probability scores.” [i.e., models output/generate a plurality of scores/scoring results], “Visualizations … allow immediate comparison and contrast of the accuracy of different trained models 110” [i.e., immediate, side-by-side simultaneous comparison of a plurality of accuracy scores of models 110], “allow the user to view the … scores produced by trained models. … visualization of data … corresponding to features, labels, scores and any derived quantities (e.g., derived features or model confidence scores).”, “compare various trained models 110 … to compare the accuracy of the various trained models 110”, “the user may … request a visualization of … the scores produced by the trained model 110.” [i.e., simultaneously comparing the plurality of accuracy scores generated by different versions of models 110]);
test an accuracy of the plurality of versions of the neural network model based on received input data (see, e.g., paragraphs 30-31, 37 and 51, “training data 102 may be some collection of information that is used to train a machine learning system” [i.e., received input data], “[e]ach training instance may include a corresponding label of a target variable, i.e., the value to be predicted … [t]hese labels may be used to determine the accuracy of the machine learning system's predictions”, “modify the trained model 110 with different parameters, or a different learning algorithm … [t]he learning algorithm may identify model parameters, e.g., connection weights for neural networks” [i.e., versions of a NN model/neural networks with different parameters], “[v]isualizations … allow immediate comparison and contrast of the accuracy of different trained models” [i.e., test accuracy of a plurality of different versions of the neural network model]) and select a first of the plurality of model versions having a highest accuracy (see, e.g., paragraphs 35, 57 and 66, “probability scores may indicate a confidence level that the predicted label values are accurate”, “provide threshold impact evaluation … a threshold specifies a confidence level at which a positive prediction is made. For example, the machine learning system may predict that an email is spam at or above a 90% confidence level”, “precision recall curve may show a tradeoff between false positives and false negatives for a given featurized training dataset 106 and set of parameters in the trained model 110” [i.e., a first of the model versions], “the user may select one or more points on the precision recall curve, corresponding to decision thresholds” [i.e., user may select a most precise/accurate model]) … ; and
provide an aggregation of the plurality of scoring results (paragraph 156 of applicant’s specification states “result assembly logic 706 provides an aggregation of the scoring results to provide an optimal result.” This is the sole mention of any aggregation of any results, or any other aggregation of anything, in the specification. The plain meaning of aggregation is a group or mass of distinct or varied things or collection into an unorganized whole. See https://www.dictionary.com/browse/aggregation. Therefore, “an aggregation of the plurality of scoring results”, under the BRI, in light of the specification, is any grouping or collection of scoring results/scores into a whole, such as an overall result accuracy) (see, e.g., paragraphs 54-56 and 64, “allow the user to view the … scores produced by trained models … allow the user to view 2-D or 3-D graphical visualization of data with dimensions and possibly other instance properties (e.g., color) corresponding to … scores and any derived quantities (e.g., derived features or model confidence scores).” [i.e., provide a view/visualization of the plurality of scoring results], “Visualizations provided … enable the user to compare various trained models 110. For example, a visualization may allow the user to compare the accuracy of the various trained models 110”, “interface 400 may show … overall accuracy”, “visualization area 508B may … show various statistics about the classification accuracy for different models” [i.e., provide a visualization/interface with aggregation of scoring results and accuracies of models 110]).
Although Bilenko substantially discloses the claimed invention, Bilenko is not relied on for explicitly disclosing comparing a minimum and a maximum of the input data with metadata for neural network model and
provide a linear combination of outputs of the plurality of versions of the neural network model.
In the same field, analogous art Achin teaches comparing a minimum and a maximum of the input data with metadata for neural network model (paragraph 157 of applicant’s specification states “the most accurate output is determined by comparing a minimum and maximum of the input data with the metadata for the model, where the metadata records statistic characteristics of the data … metadata may include information regarding … scoring tasks”. The examiner notes that this is the only mention of any comparison of any “minimum and a maximum of the input data with metadata for the model” in applicant’s specification. Therefore, “comparing a minimum and a maximum of the input data with metadata for [sic – the] neural network model” under the BRI, in light of the specification, is any evaluation or comparison of minimum/lowest and maximum/highest input data values with metadata associated with a neural network model) (see, e.g., paragraphs 142, 147, 159, 169, 171, 274 and 318, “engine 110 may determine which modeling techniques to eliminate using a suitable technique, including … eliminating those that do not produce models that meet a minimum threshold value of a model fit metric” [i.e., comparing a minimum value in data to neural network model metrics], “automatically producing a wide variety of models and maintaining metadata describing how the candidate models differ”, “modeling methodologies, techniques … edit the metadata attached to a technique.”, “model deployment engine 140 monitors the performance of deployed predictive models, and updates the performance metadata associated with the modeling techniques that generated the deployed models”, “a model's prediction may be generated by applying the pre-processing and modeling steps described in the modeling technique to each instance of new input data.” [i.e., comparing input data with metadata for the neural network model], “encodes this characteristic in the modeling technique's metadata. Datasets themselves may also have time series specific metadata”, “system 100 may prune ‘less important’ features from the dataset … if the predictive value of the feature is less than a threshold value, if the feature has one of the M lowest predictive values among the features in the dataset … the system may create derived features from ‘more important’ features in the dataset … if the feature has one of the N highest predictive values among the features in the dataset” [i.e., comparing lowest/minimum and highest/maximum of input data/dataset with metadata for the model]) and
provide a linear combination of outputs of the plurality of versions of the neural network model (paragraph 156 of applicant’s specification states “logic 706 receives the output from the model versions and assembles the individual scoring” and “logic 706 may provide a linear combination of the output for all of result assembly logic 706. In such an embodiment, result assembly logic 706 provides an aggregation of the scoring results”. The examiner notes that this is the sole mention of any “linear combination” of output in applicant’s specification. Therefore, “a linear combination of outputs” of versions of the neural network model under the BRI, in light of the specification, is a combination, summation or aggregation of outputs, results or scores of versions of a neural network model) (see, e.g., paragraph 114, “Two or more models may be blended by combining the outputs of the constituent models … the blended model may comprise a weighted, linear combination of the outputs of the constituent models” [i.e., produce/provide a linear combination of outputs of versions of the constituent neural network model]). 
Alternatively, Achin also teaches select a first of the plurality of model versions having a highest accuracy (see, e.g., paragraphs 24, 91, 103, 185, 195 and 291, “selecting, from the generated models, a predictive model for the initial prediction problem based, at least in part, on the score of the selected predictive model”, “selects the modeling procedures with suitability scores within a specified range of the highest suitability score … The range may be absolute (e.g., scores within S points of the highest score)”, “scoring metrics may place different weights on different aspects of a predictive model's performance, including, without limitation, the model's accuracy”, “embodiments … may track different versions of the same logical model”, “the predictive modeling system 100 can fit many different model types, including … neural networks” [i.e., select a neural network model version of the plurality of different model versions], “evaluate sensitivity of the top models based on their relative predictive accuracy.” [i.e., select a top model having a highest score/predictive accuracy]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko to incorporate the teachings of Achin to provide a predictive modeling system that implements a search technique (or "modeling methodology") for efficiently exploring a predictive modeling search space (e.g., potential modeling algorithms and versions of models) to generate a predictive modeling solution suitable for a specified prediction problem (See, e.g., Achin paragraph 40). Doing so would have allowed Bilenko to compare the suitability of different modeling solutions (i.e., different versions of a neural network model) for the prediction problem and adapt the modeling methodology/search technique based on results of prior searches to improve the effectiveness of the search technique over time, as suggested by Achin (See, e.g., Achin paragraph 40).
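For illustration only, the “weighted, linear combination of the outputs of the constituent models” quoted above from Achin paragraph 114 can be sketched as follows. This is a minimal sketch and is not code from any cited reference or from applicant’s specification; the function name blend_outputs, the weights and the sample scores are hypothetical.

```python
def blend_outputs(outputs, weights):
    """Return the weighted linear combination of per-model output vectors.

    outputs: list of equal-length score lists, one per model version.
    weights: one coefficient per model version (hypothetical values below).
    """
    if len(outputs) != len(weights):
        raise ValueError("one weight is required per model output")
    length = len(outputs[0])
    if any(len(o) != length for o in outputs):
        raise ValueError("all model outputs must have the same length")
    # Element i of the result is sum_k weights[k] * outputs[k][i].
    return [sum(w * o[i] for w, o in zip(weights, outputs)) for i in range(length)]


# Hypothetical two-class scores produced by three versions of one model.
scores = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]]
blended = blend_outputs(scores, [0.5, 0.25, 0.25])
```

With weights that sum to one, as assumed here, the blended vector remains on the same scale as the individual scoring results.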
Although Bilenko in view of Achin substantially teaches the claimed invention, Bilenko in view of Achin is not relied on to teach receive a network definition of a neural network model; and generate histograms for each tensor in a … data set.
In the same field, analogous art Abadi teaches receive a network definition of a neural network model (see, e.g., page 2, “large-scale training of deep neural networks with hundreds of billions of parameters on hundreds of billions of example records” [i.e., neural network/NN models], “[a] TensorFlow binary defines the sets of operations and kernels available via a registration mechanism, and this set can be extended by linking in additional operation and/or kernel definitions/registrations” [i.e., receive a network definition of a neural network model]); and
generate histograms for each tensor in a … data set (see, e.g., pages 2 and 12-13, “Values that flow along normal edges in the graph (from outputs to inputs) are tensors, arbitrary dimensionality arrays where the underlying element type is specified or inferred at graph-construction time” [i.e., tensors in a set of values/data set], “TensorFlow supports a collection of different Summary operations that can be inserted into the graph including … histogram-based summaries (e.g., the distribution of weight values in a neural network layer)” [i.e., insert/generate histograms for each tensor]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin to incorporate the teachings of Abadi to provide a TensorFlow interface for expressing machine learning algorithms and an implementation for executing such algorithms so that computations expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards (See, e.g., Abadi, page 1, Abstract). Doing so would have allowed Bilenko in view of Achin to use the TensorFlow interface to express a wide variety of algorithms, including training and inference algorithms for deep neural network models (i.e., versions of neural network models) and to improve (i.e., reduce) training time for the models, as suggested by Abadi (See, e.g., Abadi, Abstract, pages 1 and 11). 
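For illustration only, generating a histogram for each tensor in a data set, in the sense of the “histogram-based summaries (e.g., the distribution of weight values in a neural network layer)” quoted above from Abadi, can be sketched in plain Python as follows. This sketch does not use or reproduce the TensorFlow API; the function name tensor_histogram, the tensor names, the bin count and the sample values are hypothetical.

```python
def tensor_histogram(values, num_bins=4):
    """Count how many values fall into each of num_bins equal-width buckets.

    Returns the observed minimum, the observed maximum and the bucket counts.
    """
    lo, hi = min(values), max(values)
    # Fall back to a unit width when every value is identical.
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # The maximum value is clamped into the last bucket.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return lo, hi, counts


# Hypothetical flattened tensors keyed by a layer name.
calibration_tensors = {
    "conv1": [0.0, 0.5, 1.0, 1.5, 2.0],
    "fc1": [-1.0, -1.0, 3.0],
}
histograms = {name: tensor_histogram(vals) for name, vals in calibration_tensors.items()}
```

Each entry records the minimum and maximum observed for a tensor together with its bucket counts.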
Although Bilenko in view of Achin and Abadi substantially teaches the claimed invention, Bilenko in view of Achin and Abadi is not relied on to teach generate a fixed-bit size engine of the neural network model; and generate a fixed-bit size scoring engine which enables deployment of a plurality of versions of the neural network model, wherein … versions of models comprise low-precision models.
In the same field, analogous art Yang teaches generate a fixed-bit size engine of the neural network model (paragraph 154 of applicant’s specification discloses that “a 32-bit engine of the model is generated” and “histograms for each tensor in a calibration set of data is generated via the 32-bit engine.” Therefore, a “fixed-bit size engine” of the neural network model, under the BRI, in light of the specification, is any engine or module representation of the network model having a certain bit size (e.g., 32-bit)) (see, e.g., paragraphs 60 and 79, “a 32-bit floating point representation of the model … a fixed point representation (e.g., a 16-bit fixed point representation) … may be implemented” [i.e., a fixed-bit size representation/engine of the model], “the neural network may include a floating point representation (e.g., a 32-bit” [i.e., a 32-bit representation of the neural network model]); and
generate a fixed-bit size scoring engine which enables deployment of a plurality of versions of the neural network model (paragraphs 153-154 and 157 of applicant’s specification disclose that “accuracy is optimized for the reference data set by a scoring engine 613”, “a scoring engine 613 is generated based on the network definition and the calibration table”, “[i]n one embodiment, scoring engine 613 is an 8-bit scoring engine” and “scoring engine 613 includes result selection logic 705 that classifies an input”. Therefore, a “fixed-bit size scoring engine”, under the BRI, is any engine, module, program or application implementation in a certain bit size (e.g., 8-bit) including logic for classifying input data) (see, e.g., paragraphs 60 and 62, “to implement lower level layers of the distributed neural network may be stored in a 16-bit fixed point, 8-bit fixed point, or quantized representation … representations of the model … (e.g., … 16-bit fixed point representation) … the models … a fixed point representation (e.g., a 16-bit fixed point representation) or a quantized representation may be implemented” [i.e., generate a fixed-bit size engine implementation], “some neural network implementations … may be performed with the fixed point representation model discussed herein (e.g., a 16-bit fixed point representation of the model)” [i.e., enabling distribution/deployment of representations/versions of the neural network model]), wherein … versions of models comprise low-precision models (paragraphs 155, 157, 169 and 208 of applicant’s specification state “input data 701 is fed into scoring engine 613, which utilizes low precision models. Scoring engine 613 enables a deployment of multiple model versions”, “metadata about an execution unit in the underlying hardware (e.g., low precision EU)”, “computational operations at a range of precisions … suited for machine learning computations. For example, … a subset of the floating point units in each of the compute clusters 906A-H can be configured to perform 16-bit or 32-bit floating point operations” and “GPGPU 1506 can support instructions to perform low precision computations such as 8-bit and 4-bit integer vector operations.” As noted above, these are the only references to any “low-precision” models or operations in applicant’s specification. Therefore, “low-precision models”, under the BRI, in light of the specification, are any models capable of performing low precision computations, such as 4, 8, or 16-bit operations) (see, e.g., paragraph 60, “the model … to implement lower level layers of the distributed neural network may be stored in a 16-bit fixed point, 8-bit fixed point, or quantized representation. Such representations of the model may provide substantial memory storage requirement reductions with similar accuracy (e.g., less than a 1 % accuracy drop for 16-bit fixed point representation) with respect to a 32-bit floating point representation of the model.” [i.e., representations/versions of models include low-precision, 8 or 16-bit fixed point models with lower accuracy/lower-precision than 32-bit floating point models]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin and Abadi to incorporate the teachings of Yang to provide a model to implement lower level layers of a distributed neural network that is stored in a 16-bit fixed point or 8-bit fixed point (i.e., fixed-bit size) representation. (See, e.g., Yang, paragraph 60). Doing so would have allowed Bilenko in view of Achin and Abadi to use the fixed-bit size representations of the model to provide substantial memory storage requirement reductions with similar accuracy (e.g., less than a 1% accuracy drop for 16-bit fixed point representation) with respect to a 32-bit floating point representation of the model, as suggested by Yang (See, e.g., Yang, paragraph 60). 
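For illustration only, the trade-off between bit size and accuracy referenced above (16-bit or 8-bit fixed point representations versus a 32-bit floating point representation) can be sketched as follows. This is a minimal sketch assuming symmetric linear scaling to signed integers, which is an assumption made for the example and is not described in Yang; the function names quantize and dequantize and the sample weights are hypothetical.

```python
def quantize(values, bits):
    """Map floats to signed integers of the given bit width (symmetric scaling)."""
    qmax = 2 ** (bits - 1) - 1          # e.g., 127 for 8 bits, 32767 for 16 bits
    # Fall back to a unit scale when every value is zero.
    scale = max(abs(v) for v in values) / qmax or 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer representation."""
    return [q * scale for q in quantized]


# Hypothetical full-precision weights.
weights = [0.5, -1.0, 0.25, 0.75]
max_error = {}
for bits in (16, 8):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    max_error[bits] = max(abs(w - r) for w, r in zip(weights, restored))
```

Under this scheme the round-trip error of each value is bounded by half of the scale, so the bound on the 16-bit error is smaller than the bound on the 8-bit error.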
Although Bilenko in view of Achin, Abadi and Yang substantially teaches the claimed invention, Bilenko in view of Achin, Abadi and Yang is not relied on to teach wherein the plurality of versions of models comprise low-precision models.
In the same field, analogous art Mellempudi teaches wherein the plurality of versions of models comprise low-precision models (paragraphs 155, 157, 169 and 208 of applicant’s specification state that “input data 701 is fed into scoring engine 613, which utilizes low precision models. Scoring engine 613 enables a deployment of multiple model versions … in order to perform simultaneous scoring of the model versions.”, “metadata about an execution unit in the underlying hardware (e.g., low precision EU)”, “perform computational operations at a range of precisions … suited for machine learning computations. For example, … a subset of the floating point units in each of the compute clusters 906A-H can be configured to perform 16-bit or 32-bit floating point operations, while a different subset of the floating point units can be configured to perform 64-bit floating point operations” and “GPGPU 1506 can support instructions to perform low precision computations such as 8-bit and 4-bit integer vector operations.” As noted above, these are the only mentions of any “low-precision” models or operations in applicant’s specification. Therefore, “low-precision models”, under the BRI, are any models capable of performing low precision computations, such as 4, 8, or 16-bit operations) (see, e.g., Abstract and pages 1, 2 and 4, “we have also trained low-precision Resnet-50 with 8-bit activations and ternary weights … Our final quantized model can run on a full 8-bit compute pipeline”, “low-precision alternatives to perform deep learning tasks … trained networks using 16-bit fixed point” [i.e., 8 and 16-bit, low-precision models], “We trained the low precision ResNet-50 … using 2-bit weights and 8-bit activations by initializing the network with pre-trained full precision model. … We obtained the pre-trained models published by Marcel et al.[10] and fine-tune the parameters of our low-precision network” [i.e., versions of the plurality of pre-trained models include low-precision, 8 and 16-bit models]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin, Abadi and Yang to incorporate the teachings of Mellempudi to provide “a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy” by training a “low-precision Resnet-50 with 8-bit activations and ternary weights” (i.e., by training a low-precision neural network model) (See, e.g., Mellempudi, Abstract, page 1). Doing so would have allowed Bilenko in view of Achin, Abadi and Yang to use Mellempudi’s quantization method to produce a “final quantized model [that] can run on a full 8-bit compute pipeline, with a potential 16x improvement in performance compared to baseline full-precision models” where low-precision, quantized models can “be efficiently run on full 8-bit compute pipeline”, as suggested by Mellempudi (See, e.g., Mellempudi, Abstract, pages 1 and 5).

Regarding claim 2, as discussed above, Bilenko in view of Achin, Abadi, Yang and Mellempudi teaches the apparatus of claim 1.
Bilenko further discloses the processing circuitry to test the accuracy of each of the plurality of model versions with reference to a data distribution (see, e.g., paragraphs 51 and 58, “visualizations may be selected from various options. Such options may include, but are not limited, to … predicted value distribution plots for each label”, “Visualizations such as these may allow immediate comparison and contrast of the accuracy of different trained models 110.” [i.e., testing the accuracy of each of the different model versions], “[t]hese visualizations may show distributions of values in the featurized training dataset 106.” [i.e., value/data distributions]). 

Regarding independent claim 9, Bilenko discloses the invention as claimed including a method to facilitate calibration of a neural network (NN) (see, e.g., paragraphs 5, 19 and 37, “method and a system for improving accuracy in a machine learning system”, “claimed subject matter may be implemented as a method”, “visualizations may provide insight to issues with the trained model 110. Accordingly, the user may modify the trained model 110 with different parameters, or a different learning algorithm. The learning algorithm may identify model parameters, e.g., connection weights for neural networks.” [i.e., method to facilitate user modification/calibration of parameters/weights for neural networks]), comprising: 
generating histograms for … a calibration data set (see, e.g., FIG. 5D and paragraph 67, “visualization area 508D may include a count versus probability graph … [t]he graph may represent a histogram of the results 114. The x-axis may represent ranges (buckets) of probability values, i.e., confidence levels, … [t]he y-axis may represent the number of emails that fall in a given probability bucket … the user may select one or more bars of the graph, and then request a visualization of the corresponding training data 102, featurized training dataset 106, or the scores produced by the trained model 110” [i.e., generate histograms for a calibration data set/dataset 106]);
generating a calibration table using one or more of the histograms constructed from the calibration data set (see, e.g., FIG. 5D – showing use of Histogram and paragraphs 51 and 64-67, “derived confusion difference matrices … allow immediate comparison and contrast of the accuracy of different trained models 110”, “a confusion difference matrix … may show various statistics about the classification accuracy for different models or algorithms” [i.e., the confusion difference matrix is a calibration table], “[s]tatistics may be determined to show the differences in each model. For example … areas of disagreement between two models or algorithms” [i.e., generate calibration table/confusion difference matrix], “points 520 may be selected as decision thresholds for visualization of a corresponding confusion matrix”, “visualization … may represent a histogram of the results 114 … a visualization of … featurized training dataset 106.” [i.e., using a histogram constructed from the calibration data set/dataset 106]); 
generating a … scoring engine which enables deployment of a plurality of versions of the neural network model (see, e.g., paragraphs 55 and 65, “compare various trained models 110” [i.e., a plurality of versions 110 of the neural network model], “[s]tatistics may be determined to show the differences in each model. For example … areas of disagreement between two models or algorithms” [i.e., generate a scoring engine for scoring/comparing accuracies of different versions of models 110 to enable deployment of the models 110]), wherein the plurality of versions of models comprise … models, to perform simultaneous scoring of the plurality of versions of models to generate a plurality of scoring results (as indicated above, “models, to perform simultaneous scoring of the plurality of versions of models”, under the BRI, in light of the specification, are any models capable of comparing, scoring or ranking versions of models simultaneously, at the same time, or in parallel) (see, e.g., paragraphs 35, 51, 54-55 and 67, “For the trained model 110, the visualization tool 116 may use a set of model outputs, such as … corresponding probability scores.” [i.e., models output/generate a plurality of scores/scoring results], “Visualizations … allow immediate comparison and contrast of the accuracy of different trained models 110” [i.e., immediate, side-by-side simultaneous comparison of a plurality of accuracy scores of models 110], “allow the user to view the … scores produced by trained models. … visualization of data … corresponding to features, labels, scores and any derived quantities (e.g., derived features or model confidence scores).”, “compare various trained models 110 … to compare the accuracy of the various trained models 110”, “the user may … request a visualization of … the scores produced by the trained model 110.” [i.e., simultaneously comparing the plurality of accuracy scores generated by different versions of models 110]); 
testing an accuracy of the plurality of versions of a neural network model based on received input data (see, e.g., paragraphs 30-31, 37 and 51, “training data 102 may be some collection of information that is used to train a machine learning system” [i.e., received input data], “[e]ach training instance may include a corresponding label of a target variable, i.e., the value to be predicted … [t]hese labels may be used to determine the accuracy of the machine learning system's predictions”, “modify the trained model 110 with different parameters, or a different learning algorithm … [t]he learning algorithm may identify model parameters, e.g., connection weights for neural networks” [i.e., versions of a NN model/neural networks with different parameters], “[v]isualizations … allow immediate comparison and contrast of the accuracy of different trained models” [i.e., testing accuracy of a plurality of different versions of NN model]); and selecting a first of the plurality of model versions having a highest accuracy (see, e.g., paragraphs 35, 57 and 66, “probability scores may indicate a confidence level that the predicted label values are accurate”, “provide threshold impact evaluation … a threshold specifies a confidence level at which a positive prediction is made. For example, the machine learning system may predict that an email is spam at or above a 90% confidence level”, “precision recall curve may show a tradeoff between false positives and false negatives for a given featurized training dataset 106 and set of parameters in the trained model 110” [i.e., a first of the model versions], “the user may select one or more points on the precision recall curve, corresponding to decision thresholds” [i.e., user may select a most precise/accurate model]) … ; and
providing an aggregation of the plurality of scoring results (as indicated above, “an aggregation of the plurality of scoring results”, under the BRI, in light of the specification, is any grouping or collection of scoring results/scores into a whole, such as an overall result accuracy) (see, e.g., paragraphs 54-56 and 64, “allow the user to view the … scores produced by trained models … allow the user to view 2-D or 3-D graphical visualization of data with dimensions and possibly other instance properties (e.g., color) corresponding to … scores and any derived quantities (e.g., derived features or model confidence scores).” [i.e., providing a view/visualization of the plurality of scoring results], “Visualizations provided … enable the user to compare various trained models 110. For example, a visualization may allow the user to compare the accuracy of the various trained models 110”, “interface 400 may show … overall accuracy”, “visualization area 508B may … show various statistics about the classification accuracy for different models” [i.e., provide a visualization/interface with aggregation of scoring results and accuracies of models 110]).
Although Bilenko substantially discloses the claimed invention, Bilenko is not relied on for explicitly disclosing comparing a minimum and a maximum of the input data with metadata for neural network model and
providing a linear combination of outputs of the plurality of versions of the neural network model.
In the same field, analogous art Achin teaches comparing a minimum and a maximum of the input data with metadata for neural network model (as indicated above, “comparing a minimum and a maximum of the input data with metadata for [sic – the] neural network model” under the BRI, in light of the specification, is any evaluation or comparison of minimum/lowest and maximum/highest input data values with metadata associated with a neural network model) (see, e.g., paragraphs 142, 147, 159, 169, 171, 274 and 318, “engine 110 may determine which modeling techniques to eliminate using a suitable technique, including … eliminating those that do not produce models that meet a minimum threshold value of a model fit metric” [i.e., comparing a minimum value in data to neural network model metrics], “automatically producing a wide variety of models and maintaining metadata describing how the candidate models differ”, “modeling methodologies, techniques … edit the metadata attached to a technique.”, “model deployment engine 140 monitors the performance of deployed predictive models, and updates the performance metadata associated with the modeling techniques that generated the deployed models”, “a model's prediction may be generated by applying the pre-processing and modeling steps described in the modeling technique to each instance of new input data.” [i.e., comparing input data with metadata for the neural network model], “encodes this characteristic in the modeling technique's metadata. Datasets themselves may also have time series specific metadata”, “system 100 may prune ‘less important’ features from the dataset … if the predictive value of the feature is less than a threshold value, if the feature has one of the M lowest predictive values among the features in the dataset … the system may create derived features from ‘more important’ features in the dataset … if the feature has one of the N highest predictive values among the features in the dataset” [i.e., comparing lowest/minimum and highest/maximum of input data/dataset with metadata for the model]) and
providing a linear combination of outputs of the plurality of versions of the neural network model (as indicated above, “a linear combination of outputs” of versions of the neural network model under the BRI is a combination, summation or aggregation of outputs, results or scores of versions of a neural network model) (see, e.g., paragraph 114, “Two or more models may be blended by combining the outputs of the constituent models … the blended model may comprise a weighted, linear combination of the outputs of the constituent models” [i.e., produce/provide a linear combination of outputs of versions of the constituent neural network model]). 
Alternatively, Achin also teaches selecting a first of the plurality of model versions having a highest accuracy (see, e.g., paragraphs 24, 91, 103, 185, 195 and 291, “selecting, from the generated models, a predictive model for the initial prediction problem based, at least in part, on the score of the selected predictive model”, “selects the modeling procedures with suitability scores within a specified range of the highest suitability score … The range may be absolute (e.g., scores within S points of the highest score)”, “scoring metrics may place different weights on different aspects of a predictive model's performance, including, without limitation, the model's accuracy”, “embodiments … may track different versions of the same logical model”, “the predictive modeling system 100 can fit many different model types, including … neural networks” [i.e., select a neural network model version of the plurality of different model versions], “evaluate sensitivity of the top models based on their relative predictive accuracy.” [i.e., select a top model having a highest score/predictive accuracy]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko to incorporate the teachings of Achin to provide a predictive modeling system that implements a search technique (or "modeling methodology") for efficiently exploring a predictive modeling search space (e.g., potential modeling algorithms and versions of models) to generate a predictive modeling solution suitable for a specified prediction problem (See, e.g., Achin paragraph 40). Doing so would have allowed Bilenko to compare the suitability of different modeling solutions (i.e., different versions of a neural network model) for the prediction problem and adapt the modeling methodology/search technique based on results of prior searches to improve the effectiveness of the search technique over time, as suggested by Achin (See, e.g., Achin paragraph 40).
Although Bilenko in view of Achin substantially teaches the claimed invention, Bilenko in view of Achin is not relied on to teach receive a network definition of a neural network model; and generate histograms for each tensor in a … data set.
In the same field, analogous art Abadi teaches receive a network definition of a neural network model (see, e.g., page 2, “large-scale training of deep neural networks with hundreds of billions of parameters on hundreds of billions of example records” [i.e., neural network/NN models], “A TensorFlow binary defines the sets of operations and kernels available via a registration mechanism, and this set can be extended by linking in additional operation and/or kernel definitions/registrations” [i.e., receive a network definition of a neural network model]); and
generate histograms for each tensor in a … data set (see, e.g., pages 2 and 12-13, “Values that flow along normal edges in the graph (from outputs to inputs) are tensors, arbitrary dimensionality arrays where the underlying element type is specified or inferred at graph-construction time” [i.e., tensors in a set of values/data set], “TensorFlow supports a collection of different Summary operations that can be inserted into the graph including … histogram-based summaries (e.g., the distribution of weight values in a neural network layer)” [i.e., insert/generate histograms for each tensor]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin to incorporate the teachings of Abadi to provide a TensorFlow interface for expressing machine learning algorithms and an implementation for executing such algorithms so that computations expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards (See, e.g., Abadi, page 1, Abstract). Doing so would have allowed Bilenko in view of Achin to use the TensorFlow interface to express a wide variety of algorithms, including training and inference algorithms for deep neural network models (i.e., versions of neural network models) and to improve (i.e., reduce) training time for the models, as suggested by Abadi (See, e.g., Abadi, Abstract, pages 1 and 11). 
Although Bilenko in view of Achin and Abadi substantially teaches the claimed invention, Bilenko in view of Achin and Abadi is not relied on to teach generate a fixed-bit size engine of the neural network model; and generate a fixed-bit size scoring engine which enables deployment of a plurality of versions of the neural network model, wherein … versions of models comprise low-precision models.
In the same field, analogous art Yang teaches generate a fixed-bit size engine of the neural network model (as indicated above, a “fixed-bit size engine” of the neural network model under the BRI, in light of the specification, is any engine or module representation of the network model having a certain bit size (e.g., 32-bit)) (see, e.g., paragraphs 60 and 79, “a 32-bit floating point representation of the model … a fixed point representation (e.g., a 16-bit fixed point representation) … may be implemented” [i.e., a fixed-bit size representation/engine of the model], “the neural network may include a floating point representation (e.g., a 32-bit” [i.e., a 32-bit representation of the neural network model]); and
generate a fixed-bit size scoring engine which enables deployment of a plurality of versions of the neural network model (paragraphs 153-154 and 157 of applicant’s specification disclose that “accuracy is optimized for the reference data set by a scoring engine 613”, “a scoring engine 613 is generated based on the network definition and the calibration table”, “[i]n one embodiment, scoring engine 613 is an 8-bit scoring engine” and “scoring engine 613 includes result selection logic 705 that classifies an input”. Therefore, a “fixed-bit size scoring engine” under the BRI is any engine, module, program or application implementation in a certain bit size (e.g., 8-bit) including logic for classifying input data) (see, e.g., paragraphs 60 and 62, “to implement lower level layers of the distributed neural network may be stored in a 16-bit fixed point, 8-bit fixed point, or quantized representation … representations of the model … (e.g., … 16-bit fixed point representation) …the models … a fixed point representation (e.g., a 16-bit fixed point representation) or a quantized representation may be implemented” [i.e., generate a fixed-bit size engine implementation], “some neural network implementations … may be performed with the fixed point representation model discussed herein (e.g., a 16-bit fixed point representation of the model)” [i.e., enabling distribution/deployment of representations/versions of the neural network model]), wherein … versions of models comprise low-precision models (as indicated above, “low-precision models”, under the BRI, are any models capable of performing low precision computations, such as 4, 8, or 16-bit operations) (see, e.g., paragraph 60, “the model … to implement lower level layers of the distributed neural network may be stored in a 16-bit fixed point, 8-bit fixed point, or quantized representation. 
Such representations of the model may provide substantial memory storage requirement reductions with similar accuracy (e.g., less than a 1 % accuracy drop for 16-bit fixed point representation) with respect to a 32-bit floating point representation of the model.” [i.e., representations/versions of models include low-precision, 8 or 16-bit fixed point models with lower accuracy/lower-precision than 32-bit floating point models]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin and Abadi to incorporate the teachings of Yang to provide a model to implement lower level layers of a distributed neural network that is stored in a 16-bit fixed point or 8-bit fixed point (i.e., fixed-bit size) representation (See, e.g., Yang, paragraph 60). Doing so would have allowed Bilenko in view of Achin and Abadi to use the fixed-bit size representations of the model to provide substantial memory storage requirement reductions with similar accuracy (e.g., less than a 1% accuracy drop for 16-bit fixed point representation) with respect to a 32-bit floating point representation of the model, as suggested by Yang (See, e.g., Yang, paragraph 60).
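For illustration only, the memory reduction attributed above to a 16-bit fixed point representation, relative to a 32-bit floating point representation, can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `to_fixed_point` and `from_fixed_point`, the choice of 12 fractional bits and the random weights are hypothetical and are not taken from Yang.

```python
import numpy as np

def to_fixed_point(weights, total_bits=16, frac_bits=12):
    """Quantize float weights to signed fixed point (assumed 8-bit or 16-bit storage)."""
    dtypes = {8: np.int8, 16: np.int16}
    if total_bits not in dtypes:
        raise ValueError("this sketch supports only 8-bit or 16-bit storage")
    scale = 2 ** frac_bits
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    # Scale, round to the nearest integer and saturate to the representable range.
    return np.clip(np.round(weights * scale), lo, hi).astype(dtypes[total_bits])

def from_fixed_point(q, frac_bits=12):
    """Recover approximate float values from the fixed-point integers."""
    return q.astype(np.float32) / (2 ** frac_bits)

# Hypothetical 32-bit floating point weights in [-1, 1).
rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=1000).astype(np.float32)

q = to_fixed_point(weights)
max_error = float(np.max(np.abs(from_fixed_point(q) - weights)))
print(weights.nbytes, q.nbytes, max_error)
```

The 16-bit array occupies half the storage of the 32-bit array, and the rounding error per weight is bounded by half of one fixed-point step. The effect of that error on model accuracy depends on the network and is not shown here.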
Although Bilenko in view of Achin, Abadi and Yang substantially teaches the claimed invention, Bilenko in view of Achin, Abadi and Yang is not relied on to teach wherein the plurality of versions of models comprise low-precision models.
In the same field, analogous art Mellempudi teaches wherein the plurality of versions of models comprise low-precision models (paragraphs 155, 157, 169 and 208 of applicant’s specification state that “input data 701 is fed into scoring engine 613, which utilizes low precision models. Scoring engine 613 enables a deployment of multiple model versions … in order to perform simultaneous scoring of the model versions.”, “metadata about an execution unit in the underlying hardware (e.g., low precision EU)”, “perform computational operations at a range of precisions … suited for machine learning computations. For example, … a subset of the floating point units in each of the compute clusters 906A-H can be configured to perform 16-bit or 32-bit floating point operations, while a different subset of the floating point units can be configured to perform 64-bit floating point operations” and “GPGPU 1506 can support instructions to perform low precision computations such as 8-bit and 4-bit integer vector operations.” As noted above, these are the only mentions of any “low-precision” models or operations in applicant’s specification. Therefore, “low-precision models”, under the BRI, are any models capable of performing low precision computations, such as 4, 8, or 16-bit operations) (see, e.g., Abstract and pages 1, 2 and 4, “we have also trained low-precision Resnet-50 with 8-bit activations and ternary weights … Our final quantized model can run on a full 8-bit compute pipeline”, “low-precision alternatives to perform deep learning tasks … trained networks using 16-bit fixed point” [i.e., 8 and 16-bit, low-precision models], “We trained the low precision ResNet-50 … using 2-bit weights and 8-bit activations by initializing the network with pre-trained full precision model. 
… We obtained the pre-trained models published by Marcel et al.[10] and fine-tune the parameters of our low-precision network” [i.e., versions of the plurality of pre-trained models include low-precision, 8 and 16-bit models]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin, Abadi and Yang to incorporate the teachings of Mellempudi to provide “a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy” by training a “low-precision Resnet-50 with 8-bit activations and ternary weights” (i.e., by training a low-precision neural network model) (See, e.g., Mellempudi, Abstract, page 1). Doing so would have allowed Bilenko in view of Achin, Abadi and Yang to use Mellempudi’s quantization method to produce a “final quantized model [that] can run on a full 8-bit compute pipeline, with a potential 16x improvement in performance compared to baseline full-precision models” where low-precision, quantized models can “be efficiently run on full 8-bit compute pipeline”, as suggested by Mellempudi (See, e.g., Mellempudi, Abstract, pages 1 and 5).

Regarding independent claim 15, Bilenko discloses the invention as claimed including at least one non-transitory computer readable medium having instructions, which when executed by one or more processors, cause the processors to (see, e.g., paragraphs 5, 37, 78, 81 and 97, “method and a system for improving accuracy in a machine learning system”, “visualizations may provide insight to issues with the trained model 110. Accordingly, the user may modify the trained model 110 with different parameters, or a different learning algorithm. The learning algorithm may identify model parameters, e.g., connection weights for neural networks”, “system memory 716 is non-transitory computer-readable media”, “computer 712 also includes other non-transitory computer-readable media”, “a computer-readable storage media having computer-executable instructions for performing the acts … of the various methods” [i.e., non-transitory computer-readable medium having instructions to facilitate user modification/calibration of parameters/weights for neural networks]): …
generate histograms for … a calibration data set (see, e.g., FIG. 5D and paragraph 67, “visualization area 508D may include a count versus probability graph … [t]he graph may represent a histogram of the results 114. The x-axis may represent ranges (buckets) of probability values, i.e., confidence levels, … [t]he y-axis may represent the number of emails that fall in a given probability bucket … the user may select one or more bars of the graph, and then request a visualization of the corresponding training data 102, featurized training dataset 106, or the scores produced by the trained model 110” [i.e., generate histograms for a calibration data set/dataset 106]);
generate a calibration table using one or more of the histograms constructed from the calibration data set (see, e.g., FIG. 5D – showing use of Histogram and paragraphs 51 and 64-67, “derived confusion difference matrices … allow immediate comparison and contrast of the accuracy of different trained models 110”, “a confusion difference matrix … may show various statistics about the classification accuracy for different models or algorithms” [i.e., the confusion difference matrix is a calibration table], “[s]tatistics may be determined to show the differences in each model. For example … areas of disagreement between two models or algorithms” [i.e., generate calibration table/confusion difference matrix], “points 520 may be selected as decision thresholds for visualization of a corresponding confusion matrix”, “visualization … may represent a histogram of the results 114 … a visualization of … featurized training dataset 106.” [i.e., using a histogram constructed from the calibration data set/dataset 106]); 
generate a … scoring engine which enables deployment of a plurality of versions of the neural network model (see, e.g., paragraphs 55 and 65, “compare various trained models 110” [i.e., a plurality of versions 110 of the neural network model], “[s]tatistics may be determined to show the differences in each model. For example … areas of disagreement between two models or algorithms” [i.e., generate a scoring engine for scoring/comparing accuracies of different versions of models 110 to enable deployment of the models 110]), wherein the plurality of versions of models comprise … models, to perform simultaneous scoring of the plurality of versions of models to generate a plurality of scoring results (as indicated above, “models, to perform simultaneous scoring of the plurality of versions of models”, under the BRI, in light of the specification, are any models capable of comparing, scoring or ranking versions of models simultaneously, at the same time, or in parallel) (see, e.g., paragraphs 35, 51, 54-55 and 67, “For the trained model 110, the visualization tool 116 may use a set of model outputs, such as … corresponding probability scores.” [i.e., models output/generate a plurality of scores/scoring results], “Visualizations … allow immediate comparison and contrast of the accuracy of different trained models 110” [i.e., immediate, side-by-side simultaneous comparison of a plurality of accuracy scores of models 110], “allow the user to view the … scores produced by trained models. … visualization of data … corresponding to features, labels, scores and any derived quantities (e.g., derived features or model confidence scores).”, “compare various trained models 110 … to compare the accuracy of the various trained models 110”, “the user may … request a visualization of … the scores produced by the trained model 110.” [i.e., simultaneously comparing the plurality of accuracy scores generated by different versions of models 110]);
 test an accuracy of the plurality of versions of a neural network model based on received input data (see, e.g., paragraphs 30-31, 37 and 51, “training data 102 may be some collection of information that is used to train a machine learning system” [i.e., received input data], “[e]ach training instance may include a corresponding label of a target variable, i.e., the value to be predicted … [t]hese labels may be used to determine the accuracy of the machine learning system's predictions”, “modify the trained model 110 with different parameters, or a different learning algorithm … [t]he learning algorithm may identify model parameters, e.g., connection weights for neural networks” [i.e., versions of a NN model/neural networks with different parameters], “[v]isualizations … allow immediate comparison and contrast of the accuracy of different trained models” [i.e., test accuracy of a plurality of different versions of NN model]);
select a first of the plurality of model versions having a highest accuracy (see, e.g., paragraphs 35, 57 and 66, “probability scores may indicate a confidence level that the predicted label values are accurate”, “provide threshold impact evaluation … a threshold specifies a confidence level at which a positive prediction is made. For example, the machine learning system may predict that an email is spam at or above a 90% confidence level”, “precision recall curve may show a tradeoff between false positives and false negatives for a given featurized training dataset 106 and set of parameters in the trained model 110” [i.e., a first of the model versions], “the user may select one or more points on the precision recall curve, corresponding to decision thresholds” [i.e., user may select a most precise/accurate model]) … ; and
provide an aggregation of the plurality of scoring results (as indicated above, “an aggregation of the plurality of scoring results”, under the BRI, in light of the specification, is any grouping or collection of scoring results/scores into a whole, such as an overall result accuracy) (see, e.g., paragraphs 54-56 and 64, “allow the user to view the … scores produced by trained models … allow the user to view 2-D or 3-D graphical visualization of data with dimensions and possibly other instance properties (e.g., color) corresponding to … scores and any derived quantities (e.g., derived features or model confidence scores).” [i.e., providing a view/visualization of the plurality of scoring results], “Visualizations provided … enable the user to compare various trained models 110. For example, a visualization may allow the user to compare the accuracy of the various trained models 110”, “interface 400 may show … overall accuracy”, “visualization area 508B may … show various statistics about the classification accuracy for different models” [i.e., provide a visualization/interface with aggregation of scoring results and accuracies of models 110]).
Although Bilenko substantially discloses the claimed invention, Bilenko is not relied on for explicitly disclosing comparing a minimum and a maximum of the input data with metadata for neural network model and
 provide a linear combination of outputs of the plurality of versions of the neural network model.
In the same field, analogous art Achin teaches comparing a minimum and a maximum of the input data with metadata for neural network model (as indicated above, “comparing a minimum and a maximum of the input data with metadata for [sic – the] neural network model” under the BRI, in light of the specification, is any evaluation or comparison of minimum/lowest and maximum/highest input data values with metadata associated with a neural network model) (see, e.g., paragraphs 142, 147, 159, 169, 171, 274 and 318, “engine 110 may determine which modeling techniques to eliminate using a suitable technique, including … eliminating those that do not produce models that meet a minimum threshold value of a model fit metric” [i.e., comparing a minimum value in data to neural network model metrics], “automatically producing a wide variety of models and maintaining metadata describing how the candidate models differ”, “modeling methodologies, techniques … edit the metadata attached to a technique.”, “model deployment engine 140 monitors the performance of deployed predictive models, and updates the performance metadata associated with the modeling techniques that generated the deployed models”, “a model's prediction may be generated by applying the pre-processing and modeling steps described in the modeling technique to each instance of new input data.” [i.e., comparing input data with metadata for the neural network model], “encodes this characteristic in the modeling technique's metadata. 
Datasets themselves may also have time series specific metadata”, “system 100 may prune ‘less important’ features from the dataset … if the predictive value of the feature is less than a threshold value, if the feature has one of the M lowest predictive values among the features in the dataset … the system may create derived features from ‘more important’ features in the dataset … if the feature has one of the N highest predictive values among the features in the dataset” [i.e., comparing lowest/minimum and highest/maximum of input data/dataset with metadata for the model]) and
provide a linear combination of outputs of the plurality of versions of the neural network model (paragraph 156 of applicant’s specification discloses that “logic 706 receives the output from the model versions and assembles the individual scoring” and “logic 706 may provide a linear combination of the output for all of result assembly logic 706. In such an embodiment, result assembly logic 706 provides an aggregation of the scoring results”. The examiner notes that this is the sole mention of any “linear combination” of output in applicant’s specification. Therefore, “a linear combination of outputs” of versions of the neural network model under the BRI is a combination, summation or aggregation of outputs, results or scores of versions of a neural network model) (see, e.g., paragraph 114, “Two or more models may be blended by combining the outputs of the constituent models … the blended model may comprise a weighted, linear combination of the outputs of the constituent models” [i.e., produce/provide a linear combination of outputs of versions of the constituent neural network model]).
Alternatively, Achin also teaches select a first of the plurality of model versions having a highest accuracy (see, e.g., paragraphs 24, 91, 103, 185, 195 and 291, “selecting, from the generated models, a predictive model for the initial prediction problem based, at least in part, on the score of the selected predictive model”, “selects the modeling procedures with suitability scores within a specified range of the highest suitability score … The range may be absolute (e.g., scores within S points of the highest score)”, “scoring metrics may place different weights on different aspects of a predictive model's performance, including, without limitation, the model's accuracy”, “embodiments … may track different versions of the same logical model”, “the predictive modeling system 100 can fit many different model types, including … neural networks” [i.e., select a neural network model version of the plurality of different model versions], “evaluate sensitivity of the top models based on their relative predictive accuracy.” [i.e., select a top model having a highest score/predictive accuracy]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko to incorporate the teachings of Achin to provide a predictive modeling system that implements a search technique (or "modeling methodology") for efficiently exploring a predictive modeling search space (e.g., potential modeling algorithms and versions of models) to generate a predictive modeling solution suitable for a specified prediction problem (See, e.g., Achin paragraph 40). Doing so would have allowed Bilenko to compare the suitability of different modeling solutions (i.e., different versions of a neural network model) for the prediction problem and adapt the modeling methodology/search technique based on results of prior searches to improve the effectiveness of the search technique over time, as suggested by Achin (See, e.g., Achin paragraph 40).
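For illustration only, the weighted linear combination of constituent model outputs quoted from Achin, together with the selection of a model version based on its accuracy score, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `blend` and `accuracy`, the blending weights, the 0.5 decision threshold and the sample scores are hypothetical and are not taken from Achin or Bilenko.

```python
import numpy as np

def blend(outputs, weights):
    """Weighted linear combination of the outputs of several model versions."""
    outputs = np.asarray(outputs, dtype=float)   # shape (n_models, n_samples)
    weights = np.asarray(weights, dtype=float)   # shape (n_models,)
    return weights @ outputs

def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = np.asarray(scores) >= threshold
    return float(np.mean(predictions == np.asarray(labels, dtype=bool)))

# Hypothetical probability scores from three versions of a model on four inputs.
labels = [1, 0, 1, 1]
outputs = [
    [0.9, 0.2, 0.6, 0.4],   # version 0
    [0.7, 0.6, 0.8, 0.9],   # version 1
    [0.8, 0.1, 0.7, 0.6],   # version 2
]

accuracies = [accuracy(scores, labels) for scores in outputs]
best_version = int(np.argmax(accuracies))        # version with the highest accuracy
blended = blend(outputs, weights=[0.2, 0.3, 0.5])
print(accuracies, best_version, blended.tolist())
```

The blended score for each input is the sum of each version's score multiplied by that version's weight, which is one straightforward reading of a "weighted, linear combination of the outputs of the constituent models."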
Although Bilenko in view of Achin substantially teaches the claimed invention, Bilenko in view of Achin is not relied on to teach receive a network definition of a neural network model; and generate histograms for each tensor in a … data set.
In the same field, analogous art Abadi teaches receive a network definition of a neural network model (see, e.g., page 2, “large-scale training of deep neural networks with hundreds of billions of parameters on hundreds of billions of example records” [i.e., neural network/NN models], “[a] TensorFlow binary defines the sets of operations and kernels available via a registration mechanism, and this set can be extended by linking in additional operation and/or kernel definitions/registrations” [i.e., receive a network definition of a neural network model]); and
generate histograms for each tensor in a … data set (see, e.g., pages 2 and 12-13, “Values that flow along normal edges in the graph (from outputs to inputs) are tensors, arbitrary dimensionality arrays where the underlying element type is specified or inferred at graph-construction time” [i.e., tensors in a set of values/data set], “TensorFlow supports a collection of different Summary operations that can be inserted into the graph including … histogram-based summaries (e.g., the distribution of weight values in a neural network layer)” [i.e., insert/generate histograms for each tensor]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin to incorporate the teachings of Abadi to provide a TensorFlow interface for expressing machine learning algorithms and an implementation for executing such algorithms so that computations expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards (See, e.g., Abadi, page 1, Abstract). Doing so would have allowed Bilenko in view of Achin to use the TensorFlow interface to express a wide variety of algorithms, including training and inference algorithms for deep neural network models (i.e., versions of neural network models) and to improve (i.e., reduce) training time for the models, as suggested by Abadi (See, e.g., Abadi, Abstract, pages 1 and 11). 
Although Bilenko in view of Achin and Abadi substantially teaches the claimed invention, Bilenko in view of Achin and Abadi is not relied on to teach generate a fixed-bit size engine of the neural network model; and generate a fixed-bit size scoring engine which enables deployment of a plurality of versions of the neural network model, wherein … versions of models comprise low-precision models.
In the same field, analogous art Yang teaches generate a fixed-bit size engine of the neural network model (as indicated above, a “fixed-bit size engine” of the neural network model, under the BRI, in light of the specification, is any engine or module representation of the network model having a certain bit size (e.g., 32-bit)) (see, e.g., paragraphs 60 and 79, “a 32-bit floating point representation of the model … a fixed point representation (e.g., a 16-bit fixed point representation) … may be implemented” [i.e., a fixed-bit size representation/engine of the model], “the neural network may include a floating point representation (e.g., a 32-bit” [i.e., a 32-bit representation of the neural network model]); and
generate a fixed-bit size scoring engine which enables deployment of a plurality of versions of the neural network model (paragraphs 153-154 and 157 of applicant’s specification disclose that “accuracy is optimized for the reference data set by a scoring engine 613”, “a scoring engine 613 is generated based on the network definition and the calibration table”, “[i]n one embodiment, scoring engine 613 is an 8-bit scoring engine” and “scoring engine 613 includes result selection logic 705 that classifies an input”. Therefore, a “fixed-bit size scoring engine” under the BRI is any engine, module, program or application implementation in a certain bit size (e.g., 8-bit) including logic for classifying input data) (see, e.g., paragraphs 60 and 62, “to implement lower level layers of the distributed neural network may be stored in a 16-bit fixed point, 8-bit fixed point, or quantized representation … representations of the model … (e.g., … 16-bit fixed point representation) …the models … a fixed point representation (e.g., a 16-bit fixed point representation) or a quantized representation may be implemented” [i.e., generate a fixed-bit size engine implementation], “some neural network implementations … may be performed with the fixed point representation model discussed herein (e.g., a 16-bit fixed point representation of the model)” [i.e., enabling distribution/deployment of representations/versions of the neural network model]), wherein … versions of models comprise low-precision models (as indicated above, “low-precision models”, under the BRI, are any models capable of performing low precision computations, such as 4, 8, or 16-bit operations) (see, e.g., paragraph 60, “the model … to implement lower level layers of the distributed neural network may be stored in a 16-bit fixed point, 8-bit fixed point, or quantized representation. 
Such representations of the model may provide substantial memory storage requirement reductions with similar accuracy (e.g., less than a 1 % accuracy drop for 16-bit fixed point representation) with respect to a 32-bit floating point representation of the model.” [i.e., representations/versions of models include low-precision, 8 or 16-bit fixed point models with lower accuracy/lower-precision than 32-bit floating point models]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin and Abadi to incorporate the teachings of Yang to provide a model to implement lower level layers of a distributed neural network that is stored in a 16-bit fixed point or 8-bit fixed point (i.e., fixed-bit size) representation (See, e.g., Yang, paragraph 60). Doing so would have allowed Bilenko in view of Achin and Abadi to use the fixed-bit size representations of the model to provide substantial memory storage requirement reductions with similar accuracy (e.g., less than a 1% accuracy drop for 16-bit fixed point representation) with respect to a 32-bit floating point representation of the model, as suggested by Yang (See, e.g., Yang, paragraph 60).
Although Bilenko in view of Achin, Abadi and Yang substantially teaches the claimed invention, Bilenko in view of Achin, Abadi and Yang is not relied on to teach wherein the plurality of versions of models comprise low-precision models.
In the same field, analogous art Mellempudi teaches wherein the plurality of versions of models comprise low-precision models (paragraphs 155, 157, 169 and 208 of applicant’s specification state that “input data 701 is fed into scoring engine 613, which utilizes low precision models. Scoring engine 613 enables a deployment of multiple model versions … in order to perform simultaneous scoring of the model versions.”, “metadata about an execution unit in the underlying hardware (e.g., low precision EU)”, “perform computational operations at a range of precisions … suited for machine learning computations. For example, … a subset of the floating point units in each of the compute clusters 906A-H can be configured to perform 16-bit or 32-bit floating point operations, while a different subset of the floating point units can be configured to perform 64-bit floating point operations” and “GPGPU 1506 can support instructions to perform low precision computations such as 8-bit and 4-bit integer vector operations.” As noted above, these are the only mentions of any “low-precision” models or operations in applicant’s specification. Therefore, “low-precision models”, under the BRI, are any models capable of performing low precision computations, such as 4, 8, or 16-bit operations) (see, e.g., Abstract and pages 1, 2 and 4, “we have also trained low-precision Resnet-50 with 8-bit activations and ternary weights … Our final quantized model can run on a full 8-bit compute pipeline”, “low-precision alternatives to perform deep learning tasks … trained networks using 16-bit fixed point” [i.e., 8 and 16-bit, low-precision models], “We trained the low precision ResNet-50 … using 2-bit weights and 8-bit activations by initializing the network with pre-trained full precision model. 
… We obtained the pre-trained models published by Marcel et al.[10] and fine-tune the parameters of our low-precision network” [i.e., versions of the plurality of pre-trained models include low-precision, 8 and 16-bit models]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin, Abadi and Yang to incorporate the teachings of Mellempudi to provide “a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy” by training a “low-precision Resnet-50 with 8-bit activations and ternary weights” (i.e., by training a low-precision neural network model) (See, e.g., Mellempudi, Abstract, page 1). Doing so would have allowed Bilenko in view of Achin, Abadi and Yang to use Mellempudi’s quantization method to produce a “final quantized model [that] can run on a full 8-bit compute pipeline, with a potential 16x improvement in performance compared to baseline full-precision models” where low-precision, quantized models can “be efficiently run on full 8-bit compute pipeline”, as suggested by Mellempudi (See, e.g., Mellempudi, Abstract, pages 1 and 5).

Regarding claims 10 and 16, as discussed above, Bilenko in view of Achin, Abadi, Yang and Mellempudi teaches the method of claim 9 and the computer readable medium of claim 15.
Bilenko further discloses testing the accuracy of each of the plurality of model versions with reference to a data distribution (see, e.g., paragraphs 51 and 58, “visualizations may be selected from various options. Such options may include, but are not limited to, … predicted value distribution plots for each label”, “Visualizations such as these may allow immediate comparison and contrast of the accuracy of different trained models 110.” [i.e., testing the accuracy of each of the different model versions], “[t]hese visualizations may show distributions of values in the featurized training dataset 106.” [i.e., value/data distributions]).

Claims 3-6, 11-12 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Bilenko in view of Achin, Abadi, Yang and Mellempudi as applied to claims 1-2, 9-10 and 15-16 above and further in view of Yu et al. (U.S. Patent Application Pub. No. 2018/0046913 A1, hereinafter “Yu”). 
Yu was filed on August 26, 2016, and this date is before the effective filing date of this application, i.e., April 1, 2017. Therefore, Yu constitutes prior art under 35 U.S.C. 102(a)(2).
Regarding claim 3, as discussed above, Bilenko in view of Achin, Abadi, Yang and Mellempudi teaches the apparatus of claim 2.
Although Bilenko in view of Achin, Abadi, Yang and Mellempudi substantially teaches the claimed invention, Bilenko in view of Achin, Abadi, Yang and Mellempudi is not relied on to teach wherein each model version obtains a minimal accuracy loss upon the input data becoming equivalent to a data distribution for a reference data set for a model version.
In the same field, analogous art Yu teaches wherein each model version obtains a minimal accuracy loss upon the input data becoming equivalent to a data distribution for a reference data set for a model version (see, e.g., paragraphs 57 and 96, “optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer[ing] minimum loss of accuracy”, “intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss.” [i.e., each CNN model version minimizes/reduces accuracy loss upon input data becoming equivalent to intermediate data/data distribution for reference data]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin, Abadi, Yang and Mellempudi to incorporate the teachings of Yu to provide an optimized CNN (convolutional neural network) from an algorithm perspective (i.e., a NN model version) (See, e.g., Yu, paragraph 57). Doing so would have allowed Bilenko in view of Achin, Abadi, Yang and Mellempudi to reduce memory and computational resources required to implement the CNN, while also obtaining a minimal accuracy loss, as suggested by Yu (See, e.g., Yu, paragraph 57). 
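For illustration only, the following minimal sketch shows one way to compare a fixed-point representation against its floating-point reference and keep the setting with the smallest error. It is a simplified assumption and is not Yu's layer-by-layer greedy procedure. The names `to_fixed_point` and `best_frac_bits`, and the 8-bit default width, are hypothetical.

```python
import numpy as np

def to_fixed_point(values, frac_bits, total_bits=8):
    """Round to signed fixed point with frac_bits fractional bits,
    saturating at the representable range, then convert back to float."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (total_bits - 1))
    hi = 2 ** (total_bits - 1) - 1
    v = np.asarray(values, dtype=np.float64)
    return np.clip(np.round(v * scale), lo, hi) / scale

def best_frac_bits(values, total_bits=8):
    """Return the fractional-bit count whose fixed-point version is
    closest (sum of absolute errors) to the floating-point values.
    Ties go to the smallest fractional-bit count."""
    v = np.asarray(values, dtype=np.float64)
    errors = {
        f: float(np.abs(to_fixed_point(v, f, total_bits) - v).sum())
        for f in range(total_bits)
    }
    return min(errors, key=errors.get)
```

Applying such a choice to the intermediate data of each layer in turn would give a per-layer fixed-point format, under the stated assumptions.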

Regarding claim 4, as discussed above, Bilenko in view of Achin, Abadi, Yang, Mellempudi and Yu teaches the apparatus of claim 3.
Although Bilenko substantially discloses the claimed invention, Bilenko is not relied on for explicitly disclosing processing circuitry to select the first model version.
In the same field, analogous art Achin teaches processing circuitry to select the first model version (see, e.g., paragraphs 24, 91, 185 and 342, “selecting, from the generated models, a predictive model for the initial prediction problem based, at least in part, on the score of the selected predictive model”, “selects the modeling procedures with suitability scores within a specified range of the highest suitability score”, “embodiments … may track different versions of the same logical model”, “embodiments may be embodied as … circuit configurations … encoded with one or more programs that, when executed on one or more computers or other processors” [i.e., processing circuitry to select the model version]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko to incorporate the teachings of Achin to provide a predictive modeling system that implements a search technique (or "modeling methodology") for efficiently exploring a predictive modeling search space (e.g., potential modeling algorithms and versions of models) to generate a predictive modeling solution suitable for a specified prediction problem (See, e.g., Achin paragraph 40). Doing so would have allowed Bilenko to compare the suitability of different modeling solutions (i.e., different versions of a neural network model) for the prediction problem and adapt the modeling methodology/search technique based on results of prior searches to improve the effectiveness of the search technique over time, as suggested by Achin (See, e.g., Achin paragraph 40).

Regarding claim 5, as discussed above, Bilenko in view of Achin, Abadi, Yang, Mellempudi and Yu teaches the apparatus of claim 4.
Bilenko further discloses the processing circuitry to classify the input based on the data distributions and determine which of the plurality of model versions has the highest accuracy (see, e.g., paragraphs 51, 58 and 64 “visualizations may be selected from various options. Such options may include, but are not limited, to … predicted value distribution plots for each label”, “Visualizations such as these may allow immediate comparison and contrast of the accuracy of different trained models 110.” [i.e., determine which of the different model versions has highest accuracy], “[t]hese visualizations may show distributions of values in the featurized training dataset 106” [i.e., value/data distributions], “show various statistics about the classification accuracy for different models or algorithms … statistics may show, in raw numbers, true positives 512, true negatives 514, false negatives 516, and false positives” [i.e., classify input based on data distributions]).
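For illustration only, the following minimal sketch shows how the raw counts quoted above (true positives, true negatives, false negatives and false positives) yield a classification accuracy, and how the most accurate of several model versions can then be identified. The function names and the example counts are hypothetical and are not taken from Bilenko.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions: (TP + TN) / all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total

def most_accurate(counts_by_version):
    """Return the name of the model version with the highest accuracy.

    counts_by_version maps a version name to a dict with the keys
    tp, tn, fp and fn (hypothetical layout).
    """
    return max(
        counts_by_version,
        key=lambda name: accuracy(**counts_by_version[name]),
    )

# Hypothetical counts for two versions of the same model.
counts = {
    "v1": {"tp": 40, "tn": 40, "fp": 10, "fn": 10},
    "v2": {"tp": 45, "tn": 45, "fp": 5, "fn": 5},
}
best = most_accurate(counts)
```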

Regarding claim 6, as discussed above, Bilenko in view of Achin, Abadi, Yang, Mellempudi and Yu teaches the apparatus of claim 4.
Bilenko further discloses the processing circuitry to receive output from the plurality of model versions (see, e.g., paragraphs 5 and 35, “receiving a plurality of output results corresponding to predictions made by the machine learning system. The plurality of results corresponds to the plurality of training instances”, “visualization tool 116 may use a set of model outputs, such as predicted values of the target variable (label). The model outputs may also include corresponding probability scores. The probability scores may indicate a confidence level that the predicted label values are accurate.” [i.e., tool 116 receives outputs from the model versions]).

Regarding claims 11 and 17, Bilenko in view of Achin, Abadi, Yang and Mellempudi teaches the method of claim 10 and the computer readable medium of claim 16.
Although Bilenko in view of Achin, Abadi, Yang and Mellempudi substantially teaches the claimed invention, Bilenko in view of Achin, Abadi, Yang and Mellempudi is not relied on to teach wherein each model version obtains a minimal accuracy loss upon the input data becoming equivalent to a data distribution for a reference data set for a model version.
In the same field, analogous art Yu teaches wherein each model version obtains a minimal accuracy loss upon the input data becoming equivalent to a data distribution for a reference data set for a model version (see, e.g., paragraphs 57 and 96, “optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer[ing] minimum loss of accuracy”, “intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss.” [i.e., each CNN model version minimizes/reduces accuracy loss upon input data becoming equivalent to intermediate data/data distribution for reference data]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilenko in view of Achin, Abadi, Yang and Mellempudi to incorporate the teachings of Yu to provide an optimized CNN from an algorithm perspective (i.e., a NN model) (See, e.g., Yu, paragraph 57). Doing so would have allowed Bilenko in view of Achin, Abadi, Yang and Mellempudi to reduce memory and computational resources required to implement the CNN, while also obtaining a minimal accuracy loss, as suggested by Yu (See, e.g., Yu, paragraph 57).

Regarding claims 12 and 18, as discussed above, Bilenko in view of Achin, Abadi, Yang, Mellempudi and Yu teaches the method of claim 11 and the computer readable medium of claim 17.
Bilenko further discloses classifying the input based on the data distributions (see, e.g., paragraphs 58 and 64, “visualizations may show distributions of values in the featurized training dataset 106” [i.e., value/data distributions], “show various statistics about the classification accuracy for different models or algorithms … statistics may show, in raw numbers, true positives 512, true negatives 514, false negatives 516, and false positives” [i.e., classifying the input based on data distributions]); and
determining which of the plurality of model versions has the highest accuracy (see, e.g., paragraph 51, “visualizations may be selected from various options. Such options may include, but are not limited, to … predicted value distribution plots for each label”, “Visualizations such as these may allow immediate comparison and contrast of the accuracy of different trained models 110.” [i.e., determine which of the different model versions has highest accuracy]).

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The prior art made of record, listed on the accompanying PTO-892 Notice of References Cited form, and not relied upon is considered pertinent to applicant's disclosure.
The examiner requests, in response to this office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/R.K.B./Examiner, Art Unit 2125


/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125