Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
	Claims 1-5, 7-17, and 19-20 are pending in the present application. Claims 1, 3, 4, 7, 11-13, 15, and 20 are newly amended. Claims 6 and 18 have been cancelled.

Response to Arguments
Applicant's arguments filed 5/19/2022 have been fully considered but they are not persuasive. 
Per applicant’s arguments regarding:
Claim objections (p.11): 
Objections withdrawn in view of amendments.

Rejections under §101 (p. 11): 
Rejections withdrawn in view of amendments.

Rejections under §112(a) (p. 11-12):
Rejections withdrawn in view of amendments.

Rejections under §112(b) (p. 12): 
Rejections withdrawn in view of amendments.

Rejections under §103 (p.13-17):
In response to applicant’s argument that “"Variational approximation" is a term of art and is not even suggested in any portion of Hase, let alone the cited portion. Variational inference does not equal variational approximation.” (p.14)
The examiner respectfully disagrees and provides the following clarification. Variational approximation, as a term of art, broadly refers to a class of methods that approximate a posterior distribution using a second distribution. The “variational inference” recited by Hase, which is used to construct a surrogate model, creates an approximate distribution over the model parameters (i.e., a second distribution) such that the model represents the actual properties of the compound being analyzed by the system (i.e., a posterior distribution) (Hase p.1135, “In the first step, the surrogate model is constructed by conditioning f on a prior ϕprior(θ) over the functional form, which is described by parameters θ. The parameters θ of the prior distribution are refined based on observations of n pairs $D_n = \{(x_k, f_k)\}_{k=1}^{n}$ of parameter values $x_k$, denoting, for instance, experimental conditions, and corresponding objective function values $f_k = f(x_k)$, denoting the experimental responses such as reaction yield …”). One of ordinary skill in the art would therefore readily recognize the variational inference performed by Hase as a form of variational approximation.

In response to applicant’s argument that “First of all, the cited portion of Hase is directed to experiments… the state of a compound in an experiment is a physical state, and not an objective variable pertaining to a model of the compound.” (p. 14-15)
Hase’s system creates a BNN that attempts to reproduce the physical state of a compound (see p.1135-1136, “The parameters θ of the prior distribution are refined based on observations of n pairs $D_n = \{(x_k, f_k)\}_{k=1}^{n}$ of parameter values $x_k$, denoting, for instance, experimental conditions, and corresponding objective function values $f_k = f(x_k)$, denoting the experimental responses such as reaction yield... With more and more observations $D_n$, the posterior ϕpost yields a better approximation and eventually converges to the objective function in the limit of infinitely many distinct observations, thus perfectly reproducing the experimental response landscape.”). Thus, Hase determines a “reaction yield” (i.e., an “objective variable” that is a “chemical property of the compound”) using the model, which, as a whole, reproduces the experimental response landscape.

In response to applicant’s argument that “The probabilistic prediction model does not equate to a plurality of bins as essentially asserted by the Examiner in relation to Athey” (p. 15)
Athey discloses determining the length of a chemical sequence and placing the chemical compound into a bin based on the determined sequence length. The applicant misconstrues the rejection: the “probabilistic prediction model” is disclosed by Hase, not Athey. When Athey is applied to Hase, the resulting combination would obviously determine the sequence length of the compound being analyzed by Hase’s model and use that sequence length in modeling the compound. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

In response to applicant's argument that “Moreover, the probabilistic prediction model is now recited as being for sequence-to-sequence feature vector copying, a feature absent from all of the cited references.” (p.15), a recitation of the intended use of the claimed invention must result in a structural difference between the claimed invention and the prior art in order to patentably distinguish the claimed invention from the prior art.  If the prior art structure is capable of performing the intended use, then it meets the claim. The combination of Hase and Athey would result in a system capable of sequence-to-sequence feature vector copying. 
Accordingly, the rejections are upheld.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4-5, 7, 11, 13, 16-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over “Phoenics: A Bayesian Optimizer for Chemistry” to Hase et al (hereinafter, Hase) in view of US 20190172584 A1 to Athey et al (hereinafter, Athey), further in view of US 20180071425 A1 to Jin et al (hereinafter, Jin).

As per claim 1, Hase teaches A computer-implemented method for creating a prediction model that predicts chemical properties of a compound from sequence data as a set of feature vectors describing the compound, the sequence data comprising multiple data sequences, the method comprising (p.1134, Abstract, “We report Phoenics, a probabilistic global optimization algorithm identifying the set of conditions of an experimental or computational procedure which satisfies desired targets… We showcase the applicability of Phoenics on the Oregonator, a complex case-study describing a nonlinear chemical reaction network. Despite the large search space, Phoenics quickly identifies the conditions which yield the desired target dynamic behavior.”): 
generating, by a hardware processor, a probabilistic prediction model y* for predicting an objective variable y for sequence-to-sequence feature vector copying and learned using Bayesian criterion and variational approximation (p.1136, right column, “We suggest to use BNNs to estimate the parameter kernel density from the observed parameter points in an autoencoder-like architecture. As such, the BNN is used to nonlinearly estimate the density of the observed parameter points x (see Figure 1b). A particular realization of the BNN represents a map projecting parameter points into the parameter space, i.e, BNN: $\mathbb{R}^d \to \mathbb{R}^d$. Thereby, we can construct an estimate to the parameter kernel density, which corresponds to a particular observed objective function value… In both cases, the model parameters θ of the BNN are trained via variational inference.” p.1136, right column “Examples include the execution of experiments on multiple experimental platforms or the distribution of computational models across multiple processors. Batched Bayesian optimization has been suggested with different assumptions and applicability scenarios. Marmin et. al proposed derivative-based expected improvement criterion for synchronous batch-sequential Bayesian optimization.” Figure 1. Examiner Note: Hase’s BNN is a probabilistic Bayesian model trained using variational inference (i.e., variational approximation) to predict the value of an objective function at a given point (i.e., predicting the value of variable y), and is seen as equivalent to the claimed probabilistic prediction model y*. The examiner further notes that the language “for sequence-to-sequence feature vector copying” is a statement of intended use, and the performance of sequence-to-sequence feature vector copying is not positively recited by the present language of this limitation.);
configuring, by the hardware processor, the probabilistic prediction model y* to (i) assign one of multiple prediction functions for each of the feature vectors extracted from the sequence data, (ii) identify a relationship between a t-th vector in an i-th data and the objective variable y, and (iii) identify similarities of relationships between the feature vectors and the objective variable y (p.1135, right column, “Bayesian optimization is a gradient-free strategy for the global optimization of possibly noisy black-box functions, which we denote with f from hereon. 27−30 It consists of two major steps: (i) construct a surrogate to f and (ii) propose new parameter points for querying f based on this probabilistic approximation. In the first step, the surrogate model is constructed by conditioning f on a prior ϕprior(θ) over the functional form, which is described by parameters θ. The parameters θ of the prior distribution are refined based on observations of n pairs $D_n = \{(x_k, f_k)\}_{k=1}^{n}$ of parameter values $x_k$, denoting, for instance, experimental conditions, and corresponding objective function values $f_k = f(x_k)$, denoting the experimental responses such as reaction yield.” p. 1136, left column, “GPs associate every point in the parameter domain with a normally distributed random variable. These normal distributions are then constructed via a similarity measure between observations given by a kernel function. A GP therefore provides a flexible way of finding analytic approximations to the objective function.” p.1136, right column, “We suggest to use BNNs to estimate the parameter kernel density from the observed parameter points in an autoencoder-like architecture. As such, the BNN is used to nonlinearly estimate the density of the observed parameter points x (see Figure 1b). A particular realization of the BNN represents a map projecting parameter points into the parameter space, i.e, BNN: $\mathbb{R}^d \to \mathbb{R}^d$. Thereby, we can construct an estimate to the parameter kernel density, which corresponds to a particular observed objective function value” Figure 1. Examiner Note: The examiner sees the construction of a surrogate model f as equivalent to the extraction of feature vectors from sequence data, and the selection of (a) parameter point(s) for querying said surrogate model as equivalent to assigning (a) prediction function to that surrogate data. The examiner sees the identification of parameter values with corresponding objective function values as equivalent to identifying a relationship between a data point in a set of data and the objective function. The examiner further sees the estimation of objective function values via a density-based similarity measure as equivalent to identifying a similarity between feature vectors and an objective function.);
predicting, by the hardware processor, the objective variable y as a chemical property of the compound based on the probabilistic prediction model y* (p. 1141, right column, “In particular, we demonstrate how Phoenics can be employed to propose a set of conditions for an experimental procedure. The experimental procedure can then be executed with the proposed conditions, and the results of the procedure are reported back to Phoenics. With this feedback, Phoenics can make more informed decisions and, thus, provides more promising sets of experimental conditions, to eventually result in the discovery of the optimal set of conditions. Most chemical reactions lead to a steady-state, i.e., a state in which the concentrations of involved compounds are constant in time. While chemical reactions described by linear differential equations always feature such a steady-state, more complicated dynamics phenomena can arise for reactions described by sets of nonlinear coupled differential equations. With the right choice of parameters, such differential equations may have a stable limit cycle, leading to periodic oscillations in the concentrations of involved compounds” Examiner Note: The state of a compound under certain conditions is seen as equivalent to an objective variable and chemical property of that compound, and so Hase’s prediction of conditions under which a given compound is in steady-state is seen as equivalent to claimed prediction of a chemical property of a compound.).


Hase discloses a probabilistic prediction model, but does not explicitly teach identifying, by the hardware processor using the probabilistic prediction model y*, a sequence length which is variable between the multiple data sequences.

Athey teaches identifying, by the hardware processor using the probabilistic prediction model y*, a sequence length which is variable between the multiple data sequences ([0104] “This map may be generated with “bins” representing fixed lengths of DNA sequences, or bins representing cutsite increments or collections thereof, or functional elements such as genes, chromatin state segments, loop domains, chromatin domains, TADs, etc. Contacts may be discerned with thresholding in a variety of normalization modes for distances, overall contact propensity, and other elements. For example, in the case of bins which are not fixed in sequence length, and for which therefore the squared genome area described by a pair of bins may be of variable size and shape, normalization methods may be devised to substitute for traditional methods which rely on fixed bins. The density of contacts as a function of distance may be fitted to an integrable function, which may be integrated over the rectangular area of a bin pair to produce an expected value of contacts mapped to this squared genome region” Examiner Note: Hase teaches the probabilistic prediction model y*, as discussed above. Athey teaches a variety of bins representing variable sequence lengths of a given compound, and the placement of a compound in a given bin based on the sequence length of the compound (i.e., identifying the sequence length of a chemical compound). When Athey is applied to Hase, the resulting system would use the probabilistic prediction model y* to identify the variable sequence length of a chemical compound.).

Hase and Athey are analogous art because they are both directed to the application of machine learning to chemistry. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Hase’s chemical property prediction system with Athey’s chemical sequence length measurement. The combination would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention because he/she would have been motivated to more accurately model the properties of a given chemical compound, which can be accomplished by accounting for the sequence length of said compound (Athey, [0110] “More specifically, to filter out a portion of the permissive candidate variants to generate the subset of intermediate candidate variants (block 410), genomic regions around the permissive candidate variants are evaluated for regulatory function (block 408 a), to determine whether their sequence contexts (e.g., alleles) influence the regulatory function (variant dependence) (block 408 b), and to determine their target genes (block 408 c).”).

Hase and Athey do not explicitly teach forming, by a manufacturing system coupled to the hardware processor, a new compound based on the prediction of the objective variable y as a constituent element of the new compound.

Jin teaches forming, by a manufacturing system coupled to the hardware processor, a new compound based on the prediction of the objective variable y as a constituent element of the new compound ([0070] “While many ready-made (pre-prepared) fragrance, or flavor substances (either referred to as “scent”) can be stored for delivery (without blending) to users of devices incorporating the presently disclosed embodiments, an unlimited number of scents can also be rapidly created and generated, on-demand, according to formulae (and other instructions) by blending basic chemical ingredients (incorporating odor generating chemicals), to create scent compounds (the foregoing action referred to as “compounding”).” [0146] “Device software that interprets moving or still image(s) of the user can identify user emotion, behavior, intention, or nervous system state, as well as patterned tendencies of the foregoing, and can be programmed to activate the selection, generation and delivery to the user of (iv) scents from a ready-made scent container in a scent delivery device scent cartridge; or can otherwise can be programmed to (v) select, a labeled formula (containing a list of ingredients, parameters such as molar weights and specifications for blending and for release such as concentration) and instruct the delivery device to selectively blend the formula's compound for release and delivery to deliver to the user. The software may optionally send instructions to trigger the operation of other sensory generating mechanisms (such as heating, cooling, or moisturizing), while delivering the aforesaid selected blended scent compound (or ready-made scent as the case may be), in order to further enhance or modify user experience and perception.” Examiner Note: The combination of Hase and Athey discloses the prediction of the objective variable y as a constituent element of a compound, as detailed above. Jin discloses the creation of instructions for manufacturing a chemical compound based on that compound having a certain scent (i.e., containing an objective variable), and the sending of those instructions to a device that blends the compound (i.e., compounding). When Jin is applied to the combination of Hase and Athey, the resulting combination would create instructions to form a compound based on the prediction of that compound having an objective variable as a constituent element of the compound.).

Hase, Athey, and Jin are analogous art because they are all directed to chemical applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the chemical property prediction system and sequence length measurement of Hase and Athey with Jin’s chemical compounding instructions. The combination would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention because he/she would have been motivated to increase the flexibility of the system, which can be accomplished through on-demand synthesis of a predicted compound (Jin, [0003] “Rapidly switchable, compact blendable fluid control offers the promise for on-demand synthesis of up to thousands of basic ingredients according to prescribed formulae and has application in a broad array of fields as a research and education tool, a manufacturing or processing method, and commercial as well as consumer products and applications. Furthermore, such devices should offer practical, economic, compact, mechanically and electrically reliable, and efficient on-demand control, and precision-timed gas, vapor, mist, liquid, powders or other substance delivery for effective use by individual users or groups.”). 

As per claim 4, the combination of Hase, Athey, and Jin thus far teaches The computer-implemented method of claim 1.
Hase teaches repeating the method to predict another objective variable y' as another property of the compound relative to a different prediction function than that used to predict the objective variable y (p.1138, left column, “As chemical reactions are time-consuming to evaluate, we therefore assess the performance of each of these three algorithms on a set of 15 benchmark functions covering a large range of qualitatively diverse response surfaces for problems in chemistry. The employed functions are well-established benchmarks and include continuous and convex, nonconvex, or discrete functions with possibly multiple global minima. A complete list of the employed objective functions as well as their global minima is provided in the Supporting Information (see Table S.1).” Examiner Note: Hase discloses repeating its method on different objective functions, where each objective function searches for a different objective variable.).

As per claim 5, the combination of Hase, Athey, and Jin thus far teaches The computer-implemented method of claim 1.
Hase teaches wherein the probabilistic model is a Gaussian model (p.1136, left column, “A popular choice for modeling the functional prior ϕprior on the objective function are Gaussian processes (GPs),30,31,50,58 and random forests (RFs).33−35,59 GPs associate every point in the parameter domain with a normally distributed random variable. These normal distributions are then constructed via a similarity measure between observations given by a kernel function. A GP therefore provides a flexible way of finding analytic approximations to the objective function”).


As per claim 7, the combination of Hase, Athey, and Jin thus far teaches The computer-implemented method of claim 1.

Hase teaches further comprising replacing a mixture component of the probabilistic prediction model y* with one or more neural networks (p. 1136, left column, “Recently, Bayesian neural networks (BNNs) have been employed for Bayesian optimization,37,38 retaining the flexibility of GPs at a computational scaling comparable to RFs. In contrast to traditional neural networks, weights and biases for neurons in BNNs are not single numbers but instead sampled from a distribution. BNNs are trained by updating the distributions from which weights and biases are sampled.” p.1137, right column, “We suggest to use BNNs to estimate the parameter kernel density from the observed parameter points in an autoencoder-like architecture. As such, the BNN is used to nonlinearly estimate the density of the observed parameter points x (see Figure 1b).” Examiner Note: Hase’s BNN, used to estimate density, is seen as equivalent to a mixture component of a probabilistic model.). 


As per claim 11, the combination of Hase, Athey, and Jin thus far teaches The computer-implemented method of claim 1.
Hase teaches wherein the hidden variable is provided in a form of $n_{i,t,d}$, where $n_{i,t,d}$ is a binary variable representing the assignation of the d-th function to the t-th feature vector in the i-th data such that $\sum_{d} n_{i,t,d} = 1$ (p.1135, right column, “Bayesian optimization is a gradient-free strategy for the global optimization of possibly noisy black-box functions, which we denote with f from hereon. 27−30 It consists of two major steps: (i) construct a surrogate to f and (ii) propose new parameter points for querying f based on this probabilistic approximation. In the first step, the surrogate model is constructed by conditioning f on a prior ϕprior(θ) over the functional form, which is described by parameters θ. The parameters θ of the prior distribution are refined based on observations of n pairs $D_n = \{(x_k, f_k)\}_{k=1}^{n}$ of parameter values $x_k$, denoting, for instance, experimental conditions, and corresponding objective function values $f_k = f(x_k)$, denoting the experimental responses such as reaction yield.” p. 1136, left column, “GPs associate every point in the parameter domain with a normally distributed random variable. These normal distributions are then constructed via a similarity measure between observations given by a kernel function. A GP therefore provides a flexible way of finding analytic approximations to the objective function.” p.1136, right column, “We suggest to use BNNs to estimate the parameter kernel density from the observed parameter points in an autoencoder-like architecture. As such, the BNN is used to nonlinearly estimate the density of the observed parameter points x (see Figure 1b). A particular realization of the BNN represents a map projecting parameter points into the parameter space, i.e, BNN: $\mathbb{R}^d \to \mathbb{R}^d$. Thereby, we can construct an estimate to the parameter kernel density, which corresponds to a particular observed objective function value” Figure 1. 
Examiner Note: The examiner sees the construction of a surrogate model f as equivalent to the extraction of feature vectors from sequence data, and the selection of (a) parameter point(s) for querying said surrogate model as equivalent to assigning (a) prediction function to that surrogate data. The examiner sees the identification of parameter values with corresponding objective function values as equivalent to identifying a relationship between a data point in a set of data and the objective function. The examiner recognizes that a computer system, in order to assign a relationship between a function and a data point, must include at least one binary variable representing the relationship between that function and data point, which is seen as equivalent to the claimed hidden variable.).

Claim 13 is a program product claim that implements the same features as method claim 1 and is therefore rejected for at least the same reasons therein. Claim 13 requires a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method (Athey [0227], “Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.”). 

Hase and Athey are analogous art because they are both directed to the application of machine learning to chemistry. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Hase’s chemical property prediction system with Athey’s computer readable storage medium. The combination would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention because he/she would have been motivated to increase the flexibility of the method, which can be accomplished by implementing the method on a storage medium (Athey, [0226] “The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.”).

Claim 16 is a program product claim corresponding to method claim 4. Claim 16 is rejected for the same reasons as claim 4.
Claim 17 is a program product claim corresponding to method claim 5. Claim 17 is rejected for the same reasons as claim 5. 
Claim 19 is a program product claim corresponding to method claim 7. Claim 19 is rejected for the same reasons as claim 7.
Claim 20 is a system claim that implements the same features as method claim 1, and is therefore rejected for at least the same reasons as claim 1. Claim 20 requires a memory for storing program code; and a hardware processor for executing the program code to (Athey [0227], “Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.”). Claim 20 is rejected for the same reasons as claim 1.
Hase and Athey are analogous art because they are both directed to the application of machine learning to chemistry. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Hase’s chemical property prediction system with Athey’s computer readable storage medium. The combination would have been obvious to one of ordinary skill in the art before the filing date of the claimed invention because he/she would have been motivated to increase the flexibility of the method, which can be accomplished by applying the method to a storage medium (Athey, [0226] “The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.”).

Claims 2-3 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Hase, Athey, and Jin, further in view of “A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks” to Huang et al (hereinafter, Huang).

As per claim 2, the combination of Hase, Athey, and Jin thus far teaches the computer-implemented method of claim 1. 

The combination of Hase and Athey discloses a probabilistic model learned using Bayesian criteria, but does not explicitly teach wherein the probabilistic prediction model y* is learned using Bayesian criterion as follows: $y^*(X, Y, \{X_i\}_{i=1}^{N}) = \arg\min_{y^*} \int p(y, X, Y, \{X_i\}_{i=1}^{N}, \theta)\,(y^* - y)^2 \, dy \, dX \, dY \, d\{X_i\}_{i=1}^{N} \, d\theta = \int p(y \mid X, Y, \{X_i\}_{i=1}^{N})\, y \, dy$, where X is a set of input sequences in training data, $Y = \{y_i\}_{i=1}^{N}$ is the set of objective variables in the training data, $\{X_i\}_{i=1}^{N}$ is a set of input sequences in the training data, and θ is a set of parameters to be learned.
Huang teaches wherein the probabilistic prediction model y* is learned using Bayesian criterion as follows: $y^*(X, Y, \{X_i\}_{i=1}^{N}) = \arg\min_{y^*} \int p(y, X, Y, \{X_i\}_{i=1}^{N}, \theta)\,(y^* - y)^2 \, dy \, dX \, dY \, d\{X_i\}_{i=1}^{N} \, d\theta = \int p(y \mid X, Y, \{X_i\}_{i=1}^{N})\, y \, dy$, where X is a set of input sequences in training data, $Y = \{y_i\}_{i=1}^{N}$ is the set of objective variables in the training data, $\{X_i\}_{i=1}^{N}$ is a set of input sequences in the training data, and θ is a set of parameters to be learned (Section 3, “Probabilistic Boolean Networks,” page 9, “$p(x_{t+1} \mid x_t, \theta, S) = \prod_{i=1}^{G} \sum_{k \in K_i} p(x_{i,t+1} \mid x_t, \theta_{i,t+1}, f_k^{(i)})\, c_k^{(i)}$” Examiner Note: The model required by the claim is recited as equivalent to “$\int p(y \mid X, Y, \{X_i\}_{i=1}^{N})\, y \, dy$”. The model recited by Huang, “$\prod_{i=1}^{G} \sum_{k \in K_i} p(x_{i,t+1} \mid x_t, \theta_{i,t+1}, f_k^{(i)})\, c_k^{(i)}$”, is a functional equivalent expressed as a sum over discrete values rather than as a continuous integral. Adapting the model to replace the discrete summation with a continuous integral is a mathematical operation that would be obvious to one of ordinary skill in the art. Thus, when Huang is applied to the combination of Hase and Athey, the resulting system would learn the probabilistic model y* according to the above recited equation.).
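For illustration only, the relationship between the argmin and the integral in the claim 2 criterion can be checked numerically. Minimizing the expected squared loss (y* − y)^2 under a posterior over y yields the posterior mean. That mean is the probability-weighted sum of y, which is the discrete analogue of ∫ p(y | X, Y, {X_i}) y dy. The Python sketch below assumes a hypothetical four-point posterior; the numbers are not taken from Hase, Huang, or the application. A brute-force scan confirms that the minimizer coincides with the weighted sum.

```python
import numpy as np

# Hypothetical discrete posterior p(y | X, Y, {X_i}) over four candidate values of y.
y_grid = np.array([0.0, 1.0, 2.0, 3.0])
posterior = np.array([0.1, 0.2, 0.3, 0.4])  # weights sum to 1

def expected_squared_loss(y_star: float) -> float:
    """Discrete analogue of the integral of p(y | ...) * (y* - y)^2 over y."""
    return float(np.sum(posterior * (y_star - y_grid) ** 2))

# Probability-weighted sum: discrete analogue of the integral of p(y | ...) * y dy.
posterior_mean = float(np.sum(posterior * y_grid))

# Brute-force argmin over a fine grid of candidate predictions y*.
candidates = np.linspace(-1.0, 4.0, 5001)
losses = np.array([expected_squared_loss(c) for c in candidates])
best = float(candidates[np.argmin(losses)])

print(round(posterior_mean, 6), round(best, 3))  # 2.0 2.0
```

If the finite grid is replaced with a continuous density, both sums become integrals and the minimizer is still the posterior mean, provided the second moment exists.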

Hase, Athey, Jin, and Huang are analogous art because they are all directed to chemical applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the chemical property prediction system, sequence length measurement, and compound manufacturing of Hase, Athey, and Jin with Huang’s probabilistic model. The combination would have been obvious to one of ordinary skill in the art before the filing date of the claimed invention because he/she would have been motivated to accurately model the chemical compounds of the system, which can be accomplished through use of proper mathematical models (Huang, p. 3-4, “The problem of uncovering GRNs falls within the framework of system identification and is a traditional inverse problem. A difficulty unique to uncovering GRNs is the enormous scale of the problem involving hundreds or even thousands of genes, not to mention the nonlinearity and dynamics of regulation, inherent experimental errors, noisy readouts of expression levels, and many unobserved factors. Obviously, the task calls for proper mathematical models and powerful inference algorithms.”).

As per claim 3, the combination of Hase and Athey teaches the computer-implemented method of claim 1.
The combination of Hase and Athey does not explicitly teach wherein the probabilistic model is as follows: p(y|X, {wd}Dd=0, B, {nt}Tt=0) = Gauss(y|SumdSumtwdnt,dxt, 1/B), p(xt|{wd}Dd=0, {Cd}Dd=0,nt) = SumdGauss(xt|ud, 1/Ed)nt,d p(nt|nt-1, kt,t-1,lamdat,d) = exp(-kt,t-1(1-nTtnt-1) – Sumdlamdat,d(1-nt,d)) p(w) = Automatic Relevance Determination (ARD) prior in Bayesian sparse learning, and p(fi, , K, )-independent Gamma distributions to restrict the set of parameters to be learned to positive values, where X is a set of input sequences in training data, y is an objective variable in the training data, {Xi}N1 is a set of input sequences in the training data, and t denotes the t-th feature vector, ij denotes a binary variable representing assigning a d-th function to the t-th feature vector in the i-th data, and w, B, u, c, k, and lamda are parameters to be learned.

Huang teaches wherein the probabilistic model is as follows: p(y|X, {wd}Dd=0, B, {nt}Tt=0) = Gauss(y|SumdSumtwdnt,dxt, 1/B), p(xt|{wd}Dd=0, {Cd}Dd=0,nt) = SumdGauss(xt|ud, 1/Ed)nt,d p(nt|nt-1, kt,t-1,lamdat,d) = exp(-kt,t-1(1-nTtnt-1) – Sumdlamdat,d(1-nt,d)) p(w) = Automatic Relevance Determination (ARD) prior in Bayesian sparse learning, and p(fi, , K, )-independent Gamma distributions to restrict the set of parameters to be learned to positive values, where X is a set of input sequences in training data, y is an objective variable in the training data, {Xi}N1 is a set of input sequences in the training data, and t denotes the t-th feature vector, ij denotes a binary variable representing assigning a d-th function to the t-th feature vector in the i-th data, and w, B, u, c, k, and lamda are parameters to be learned (p.13, “$p(x_{i,t+1} \mid x_t, \theta, S) = (2\pi\sigma_i^2)^{-1/2} \exp\left(-\frac{1}{2\sigma_i^2} \lvert x_{i,t+1} - \mu(x_t) \rvert^2\right)$, $p(x_{i,t+1}, x_{j,t} \mid \theta, S) = \frac{1}{M_e} \sum_{i}^{M_e} \varsigma^{-2} \, \Upsilon\left(\varsigma^{-1} \lvert z - z^{(m_e)} \rvert\right)$” Examiner Note: The examiner sees the equations recited by Huang as functional equivalents to those required by the claim. The examiner further notes that the combination of Hase and Athey, as recited above, teaches training a model with training data, as well as the objective variable, the set of input sequences, and the binary variable. When Huang is applied to the combination of Hase and Athey, the resulting system would learn the above functions as the probabilistic model.).
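Huang's first p.13 equation is a univariate Gaussian density in x_{i,t+1} with mean μ(x_t) and variance σ_i^2. That is the same Gauss(· | mean, ·) form used in the claim, under the assumption that the claim's second argument 1/B is a variance. The Python sketch below is illustrative only. The function names and numeric values are hypothetical, and reading 1/B as a variance is an assumption about the claim's notation. The sketch evaluates Huang's expression and a generic mean/variance Gaussian, then checks that they agree when 1/B = σ_i^2.

```python
import math

def gauss(x: float, mean: float, variance: float) -> float:
    """Univariate Gaussian density N(x | mean, variance)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def huang_transition_density(x_next: float, mu_of_x_t: float, sigma_i: float) -> float:
    """Huang p.13 form: (2*pi*sigma_i^2)^(-1/2) * exp(-|x_next - mu(x_t)|^2 / (2*sigma_i^2))."""
    return (2.0 * math.pi * sigma_i ** 2) ** -0.5 * math.exp(
        -abs(x_next - mu_of_x_t) ** 2 / (2.0 * sigma_i ** 2)
    )

# Hypothetical values chosen only for illustration.
sigma_i = 0.5
precision_B = 1.0 / sigma_i ** 2  # assumed reading: claim-style 1/B equals sigma_i^2
x_next, mu_of_x_t = 1.2, 1.0

huang_value = huang_transition_density(x_next, mu_of_x_t, sigma_i)
claim_style_value = gauss(x_next, mu_of_x_t, 1.0 / precision_B)
print(math.isclose(huang_value, claim_style_value))  # True
```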

Hase, Athey, Jin, and Huang are analogous art because they are all directed to chemical applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the chemical property prediction system, sequence length measurement, and compound manufacturing of Hase, Athey, and Jin with Huang’s probabilistic model. The combination would have been obvious to one of ordinary skill in the art before the filing date of the claimed invention because he/she would have been motivated to accurately model the chemical compounds of the system, which can be accomplished through use of proper mathematical models (Huang, p. 3-4, “The problem of uncovering GRNs falls within the framework of system identification and is a traditional inverse problem. A difficulty unique to uncovering GRNs is the enormous scale of the problem involving hundreds or even thousands of genes, not to mention the nonlinearity and dynamics of regulation, inherent experimental errors, noisy readouts of expression levels, and many unobserved factors. Obviously, the task calls for proper mathematical models and powerful inference algorithms.”).

Claim 14 is a program product claim that implements the same features as method claim 2 and is therefore rejected for at least the same reasons therein.
Claim 15 is a program product claim that implements the same features as method claim 3 and is therefore rejected for at least the same reasons therein.

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Hase, Athey, and Jin, further in view of US 20150317589 A1 to Anderson et al (hereinafter, Anderson).

As per claim 8, the combination of Hase, Athey, and Jin thus far teaches the computer-implemented method of claim 1.
Neither Hase nor Athey nor Jin explicitly teaches further comprising assigning a prediction function by an estimation of a hidden variable that explicitly represents an assignation of the prediction function from among a plurality of available prediction functions.

Anderson teaches further comprising assigning a prediction function by an estimation of a hidden variable that explicitly represents an assignation of the prediction function from among a plurality of available prediction functions ([0116] “In an exemplary embodiment, MAPE can be used from the previous to select the best model for each day. In an exemplary embodiment, the Machine Learning Forecasting Model 213 can run an ensemble of machine learning and statistical models and select the best performing model to use at each forecasting time interval. In another exemplary embodiment, the Machine Learning Forecasting Model 213 can apply a combining rule, such as a majority rule, to select a model.” Examiner Note: Hase teaches the use of a plurality of prediction functions, but does not specifically disclose the assignation of the prediction function. Anderson teaches the selection of a model from a plurality of models based on an estimation [of performance]. The examiner recognizes that, in order for a computer system to select a model from a plurality of models, the computer system must have at least one binary digit representing the selection of that particular model. When Anderson is applied to the combination of Hase and Athey, the resulting combination would assign (i.e., select) a prediction function using a variable representing the selection of that prediction function from the plurality of available prediction functions.).
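The examiner's reasoning is that selecting one model from several requires the system to hold a variable marking which model was selected. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not Anderson's implementation. The three candidate functions and the validation data are hypothetical. Selection uses MAPE, the criterion named in Anderson [0116]. The selection is encoded as a one-hot binary vector, which is one assumed way such an assignment variable could be represented.

```python
import numpy as np

def mape(actual, predicted) -> float:
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

# Hypothetical plurality of available prediction functions.
candidates = {
    "linear": lambda x: 2.0 * x,
    "quadratic": lambda x: x ** 2,
    "constant": lambda x: np.full_like(x, 3.0),
}

# Hypothetical validation data.
x_val = np.array([1.0, 2.0, 3.0])
y_val = np.array([2.1, 3.9, 6.2])

names = list(candidates)
errors = np.array([mape(y_val, candidates[n](x_val)) for n in names])

# One-hot assignment variable: exactly one entry is 1, marking the selected function.
assignment = np.zeros(len(names), dtype=int)
assignment[np.argmin(errors)] = 1
selected = names[int(np.argmax(assignment))]
print(selected, assignment.tolist())  # linear [1, 0, 0]
```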

The combination of Hase, Athey, and Jin is analogous art to Anderson because they are both directed to machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Hase’s chemical property prediction system with Athey’s chemical sequence length measurement, Jin’s chemical compounding, and Anderson’s model selection. The combination would have been obvious to one of ordinary skill in the art before the filing date of the claimed invention because he/she would have been motivated to more accurately model the properties of a given chemical compound, which can be accomplished by selecting the best prediction function (Anderson, [0116] “In an exemplary embodiment, MAPE can be used from the previous to select the best model for each day. In an exemplary embodiment, the Machine Learning Forecasting Model 213 can run an ensemble of machine learning and statistical models and select the best performing model to use at each forecasting time interval. In another exemplary embodiment, the Machine Learning Forecasting Model 213 can apply a combining rule, such as a majority rule, to select a model.”).

As per claim 9, the combination of Hase, Athey, Jin, and Anderson thus far teaches the computer-implemented method of claim 8.
Hase teaches wherein said predicting step comprises calculating a summation of outputs of the assigned ones of the plurality of available prediction functions (p.1137, column, “We formulate the approximation to the objective function as an ensemble average of the observed objective function values f_k taken over the set of computed kernel densities p_k(x) (see eq 2).[64,65] In this ensemble average, each of the constructed distributions p_k(x) is rescaled by the value of the objective function f_k observed for the parameter point x_k (Figure 1c).” Examiner Note: An ensemble average is computed by summing the rescaled outputs and is therefore considered equivalent to calculating a summation of outputs.).
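To illustrate why an ensemble average involves a summation of outputs, the Python sketch below rescales each kernel density p_k(x) by its observed value f_k and sums the results. Hase's eq 2 is not reproduced in this action. The sketch therefore assumes, for illustration only, a normalized form Σ_k f_k p_k(x) / Σ_k p_k(x). Hase constructs its densities with a Bayesian neural network, so the fixed Gaussian kernels, the bandwidth, and the observed points here are simplifying, hypothetical substitutions.

```python
import numpy as np

def gaussian_kernel(x: float, center: float, bandwidth: float) -> float:
    """Stand-in for a kernel density p_k(x) centered at an observed point x_k."""
    z = (x - center) / bandwidth
    return float(np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi)))

def ensemble_average(x, observed_points, observed_values, bandwidth=0.5) -> float:
    """Density-weighted average of observed objective values f_k at point x."""
    densities = np.array([gaussian_kernel(x, xk, bandwidth) for xk in observed_points])
    values = np.asarray(observed_values, dtype=float)
    weighted_sum = float(np.sum(values * densities))  # summation of f_k * p_k(x) over k
    return weighted_sum / float(np.sum(densities))

# Hypothetical observations (x_k, f_k).
points = [0.0, 1.0]
values = [1.0, 3.0]
print(round(ensemble_average(0.5, points, values), 6))  # 2.0
```

At the midpoint both kernels contribute equally, so the result is the plain mean of the observed values. Closer to one observed point, that point's value dominates the sum.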

As per claim 10, the combination of Hase, Athey, Jin, and Anderson thus far teaches the computer-implemented method of claim 8.
Hase teaches wherein the estimation represents roles of each of the feature vectors in each i-th data (p.1135, right column, “Bayesian optimization is a gradient-free strategy for the global optimization of possibly noisy black-box functions, which we denote with f from hereon.[27−30] It consists of two major steps: (i) construct a surrogate to f and (ii) propose new parameter points for querying f based on this probabilistic approximation. In the first step, the surrogate model is constructed by conditioning f on a prior ϕ_prior(θ) over the functional form, which is described by parameters θ. The parameters θ of the prior distribution are refined based on observations of n pairs D_n = {x_k, f_k}_{k=1}^{n} of parameter values x_k, denoting, for instance, experimental conditions, and corresponding objective function values f_k = f(x_k), denoting the experimental responses such as reaction yield.” p. 1136, left column, “GPs associate every point in the parameter domain with a normally distributed random variable. These normal distributions are then constructed via a similarity measure between observations given by a kernel function. A GP therefore provides a flexible way of finding analytic approximations to the objective function.” p. 1136, right column, “We suggest to use BNNs to estimate the parameter kernel density from the observed parameter points in an autoencoder-like architecture. As such, the BNN is used to nonlinearly estimate the density of the observed parameter points x (see Figure 1b). A particular realization of the BNN represents a map projecting parameter points into the parameter space, i.e, BNN: R^d → R^d. Thereby, we can construct an estimate to the parameter kernel density, which corresponds to a particular observed objective function value” Figure 1. Examiner Note: Hase’s density-based estimation of how each parameter point relates to the corresponding objective function value is seen as equivalent to estimating the role of that parameter in the objective function.).



Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Hase, Athey, and Jin, further in view of “Adversarial Machine Learning” to Huang et al (hereinafter, L. Huang).

As per claim 12, the combination of Hase and Athey thus far teaches the computer-implemented method of claim 1.
Hase does not explicitly disclose further comprising discarding the object on a basis of contamination of the object, responsive to the prediction of the objective variable involving an element unexpected as a part of the object.

L. Huang teaches further comprising discarding a predicted object on a basis of contamination of the predicted object, responsive to the prediction of the objective variable involving an element unexpected as a part of the object (p.51, right column 3.5.1, “For example, for SpamBayes we explored such a sanitization technique called the Reject On Negative Impact (RONI) defense [48], a technique that measures the empirical effect of adding each training instance and discards instances that have a substantial negative impact on classification accuracy” Examiner Note: L. Huang teaches the removal of data based on that data negatively impacting the system. The combination of Hase and Athey teaches the prediction of objective variables in an element. When L. Huang is applied to the combination of Hase and Athey, the resulting system would, based on the prediction of a negatively impacting (i.e., unwanted) element in an object, discard the object.).

The combination of Hase, Athey, and Jin is analogous art to L. Huang because they are both directed to machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Hase’s chemical property prediction system with Athey’s chemical sequence length measurement, Jin’s chemical compounding, and L. Huang’s unwanted element detection. The combination would have been obvious to one of ordinary skill in the art before the filing date of the claimed invention because he/she would have been motivated to improve consistency of the system, which can be accomplished by removing objects that negatively affect it (L. Huang, [0094] “The defender applies both classifiers to a quiz set of instances with known labels and measures the difference in accuracy between the two classifiers. If adding the candidate instance to the training set causes the resulting classifier to produce substantially more classification errors, the defender permanently removes the instance as detrimental in its effect.”).
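The RONI procedure quoted above can be sketched as follows: train with and without each candidate instance, compare accuracy on a labeled quiz set, and discard candidates that reduce accuracy. This Python sketch is a minimal illustration under stated assumptions. A nearest-centroid classifier stands in for the SpamBayes learner, the `tolerance` threshold for a "substantial" drop is a hypothetical parameter, and the toy data are invented for illustration.

```python
import numpy as np

def train_centroids(X, y):
    """Nearest-centroid classifier (stand-in learner): one mean vector per class label."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def accuracy(centroids, X, y) -> float:
    """Fraction of quiz instances whose nearest centroid carries the correct label."""
    labels = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[lab], axis=1) for lab in labels], axis=1)
    preds = np.array(labels)[np.argmin(dists, axis=1)]
    return float(np.mean(preds == y))

def roni_filter(X_base, y_base, X_cand, y_cand, X_quiz, y_quiz, tolerance=0.0):
    """Return indices of candidates whose addition lowers quiz accuracy by at most `tolerance`."""
    baseline = accuracy(train_centroids(X_base, y_base), X_quiz, y_quiz)
    kept = []
    for i in range(len(X_cand)):
        X_aug = np.vstack([X_base, X_cand[i:i + 1]])
        y_aug = np.concatenate([y_base, y_cand[i:i + 1]])
        if baseline - accuracy(train_centroids(X_aug, y_aug), X_quiz, y_quiz) <= tolerance:
            kept.append(i)  # negligible impact: keep the instance
        # otherwise the candidate is discarded as detrimental
    return kept

X_base = np.array([[0.0], [1.0], [9.0], [10.0]])
y_base = np.array([0, 0, 1, 1])
X_quiz = np.array([[2.0], [4.0], [6.0], [8.0]])
y_quiz = np.array([0, 0, 1, 1])
X_cand = np.array([[1.5], [30.0]])  # second candidate is a mislabeled outlier
y_cand = np.array([0, 0])

print(roni_filter(X_base, y_base, X_cand, y_cand, X_quiz, y_quiz))  # [0]
```

The outlier drags the class-0 centroid past the class-1 centroid, so quiz accuracy falls from 1.0 to 0.5 and the instance is rejected.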

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” to Gomez et al, “Constrained Bayesian Optimization for Automatic Chemical Design” to Griffiths and Hernandez, “Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition” to Jaeger et al, and US 20170200265 A1.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL G SMITH whose telephone number is (571)272-9730. The examiner can normally be reached M-F 9:30-18:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Respectfully Submitted,
/P.G.S./Examiner, Art Unit 2126
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145