DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 3/1/2021 has been entered.
 

This action is responsive to the original application filed on 2/14/2020 and the Remarks and Amendments filed on 1/20/2021 and the RCE filed on 3/1/2021.  Acknowledgement is made with regards to priority claimed to Singapore Application No. SG10201904549Q filed on 5/21/2019. 

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1-4, 6-11, 13-18, 20, and 21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.

Regarding independent claims 1, 8, and 15, they recite the limitation “selecting, from the target domain, a set of features with the least training errors as training samples for training each of the at least one additional layer” (emphasis added).  There is insufficient antecedent basis for this limitation in the claims.  For examination purposes, the limitation will be interpreted to read “selecting, from the target domain, a set of features with a least number of training errors as training samples for training each of the at least one additional layer” (emphasis added). Further, independent claims 1, 8, and 15 recite the limitation “determining whether to activate each of the set of nodes based on the weighted sum and a predetermined activation function”.  This limitation is unclear because it is uncertain as to what “set of nodes” the limitation refers to. Is it the set of nodes in the plurality of hidden layers, or is it the set of nodes in the additional layer?  For examination purposes, the limitation will be interpreted to mean that “determining whether to activate each of the set of nodes based on the weighted sum and a predetermined activation function” refers to the set of nodes in each of the plurality of hidden layers.


Regarding dependent claims 2, 9, and 16, they recite the limitation “wherein the at least one additional layer is a feature layer introduced to the pre-trained neural network in parallel with the last hidden layer of the plurality of hidden layers farthest from the input layer, and wherein introducing the at least one additional layer comprises introducing additional trainable parameters for converting the features obtained at the output layer to newly adapted features as an output from the at least one additional layer”.  There is insufficient antecedent basis for this limitation in the claims because it is unclear as to what “the last hidden layer” and “the features obtained at the output layer” refer to. Further, this limitation is unclear.  What does it mean to have “newly” adapted features?  For examination purposes and to correct the antecedent issue in the limitation, the limitation will be interpreted to mean “wherein the at least one additional layer is a feature layer introduced to the pre-trained neural network in parallel with a last hidden layer of the plurality of hidden layers farthest from the input layer, and wherein introducing the at least one additional layer comprises introducing additional trainable parameters for converting [[the]] features obtained at the output layer to adapted features as an output from the at least one additional layer” (emphasis added).  Dependent claims 3, 10, and 17 are rejected under 35 U.S.C. 112(b) as being indefinite by virtue of their dependency on indefinite claims 2, 9, and 16. Appropriate correction is required.  

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1, 4, 8, 11, 15, and 18 are rejected under 35 U.S.C. § 103 as being obvious over Hettinger et al. (Hettinger et al., “Forward Thinking: Building and Training Neural Networks One Layer at a Time”, Jun. 8, 2017, 31st Conference on Neural Information Processing Systems, pp. 1-9, hereinafter “Hettinger”) in view of Gros et al. (US 20190236774 A1, hereinafter “Gros”) and Gunes et al. (US 20190370684 A1, hereinafter “Gunes”).

Regarding claim 1, Hettinger discloses [a] computer-implemented method comprising: (Abstract; “We present a general framework for training deep neural networks without backpropagation”, suggesting a computer implemented method for training DNNs; and Page 7, §4.1; “Our experiments were run on a single desktop with an Intel i5-7400 processor and an Nvidia GeForce GTX 1060 3GB GPU”)
training a pre-trained neural network, wherein the pre-trained neural network comprises: (Page 1, Introduction; “We present a general framework, which we call forward thinking, for training DNNs without doing backpropagation”, which suggests training a pre-trained neural network; and Page 2, ¶1; “This is much faster than traditional backpropagation, and the number of layers can be determined at training time by simply continuing to add and train layers consecutively until performance plateau”, further suggesting training a pre-trained neural network by adding and training new layers until a performance plateau is reached)
an input layer; (Page 2, §2, “The Input Layer”; the section discloses the input layer for the disclosed method)
a plurality of hidden layers, wherein each of the plurality of hidden layers has a set of nodes, wherein each of the set of nodes has an associated weight trained based on data from a source domain; and (Page 2, §2, “The first hidden layer” and “Additional hidden layers”; the sections disclose a plurality of hidden layers with nodes disclosed as “learners”, each node has a weight that is trained based on data from a source domain in the form of training data)
an output layer, (Page 2, §2, “Final Layer”; the section discloses the output or final layer for the disclosed method)
wherein training the pre-trained neural network comprises: introducing at least one additional layer to the plurality of hidden layers, wherein said additional layer has one or more nodes having associated weights;  (Page 2, §2, “Additional hidden layers”;  the section discloses introducing at least one additional layer to the plurality of hidden layers, each additional layer having nodes associated with weights; and Figure 1; “The first three iterations of a fully-connected network built with the forward thinking algorithm. The original data set is represented by an ellipse, fully-connected layers with rectangles, and final (output) layers with triangles. Layers with single blue outlines are trainable, while those with double black outlines have been frozen and thus turned into new data sets”, further suggesting introducing at least one additional layer to the plurality of hidden layers)
keeping weights of the nodes in the plurality of hidden layers of the pre-trained neural network unchanged; (Figure 1; “The first three iterations of a fully-connected network built with the forward thinking algorithm. The original data set is represented by an ellipse, fully-connected layers with rectangles, and final (output) layers with triangles. Layers with single blue outlines are trainable, while those with double black outlines have been frozen and thus turned into new data sets”, suggesting keeping weights of the nodes in the plurality of hidden layers of the pre-trained neural network unchanged by freezing the weights of the nodes in the plurality of hidden layers; and Page 4, §3.2; “Once the first network is trained, the weights coming into the first layer are frozen (and stored)”)
inputting data from a target domain to the input layer; and (Page 4, §3.2; “the training inputs . . .  are pushed through the resulting layer to give new “synthetic” data”)
iteratively adjusting weights of the one or more nodes in the at least one additional layer based on features obtained at the output layer (Page 4, §3.3; “Now insert a new hidden layer between the previously trained layer and the output layer. This layer is trained as a single-hidden-layer network on the new, synthetic data”, which discloses adjusting weights, by training the single-hidden layer, of the one or more nodes in the additional layer based on features obtained at the output layer; and Page 4, §3.4; “The process of freezing old layers and inserting new ones is repeated until additional layers cease to improve performance. This indicates that it’s time to stop adding new layers and consider the network complete”, which further discloses iteratively adjusting weights of the one or more nodes in the at least one additional layer based on features obtained at the output layer based on iteratively training the additional layer).
Hettinger fails to explicitly disclose selecting, from the target domain, a set of features with the least training errors as training samples for training each of the at least one additional layer; processing the data in the plurality of hidden layers based on weights associated with the set of nodes of each of the plurality of hidden layers by calculating a weighted sum based on the weights; determining whether to activate each of the set of nodes based on the weighted sum and a predetermined activation function.
Gros discloses processing the data in the plurality of hidden layers based on weights associated with the set of nodes of each of the plurality of hidden layers by calculating a weighted sum based on the weights; ([0032]; “Each hidden node of the third hidden layer 340 receives the inputs from the second hidden layer 330 and sums them to produce an output, the sums of each node are weighted”, which discloses processing data by calculating a weighted sum based on weights associated with each node of the hidden nodes)
determining whether to activate each of the set of nodes based on the weighted sum and a predetermined activation function ([0032]; “Each hidden node of the third hidden layer 340 receives the inputs from the second hidden layer 330 and sums them to produce an output, the sums of each node are weighted, the sum is passed through a non-linear activation function, and the resulting output is then passed onto each of the nodes 352 of the output layer 350. The nodes 352 of the output layer 350 in turn generate a final output 355 of the deep neural network 300”, which discloses determining whether to activate each of the nodes based on the weighted sum and a predetermined activation function (non-linear activation function)).
Hettinger and Gros are analogous art because both are concerned with neural network computing.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in neural networks to combine the weighted sum and activation function of Gros with the method of Hettinger to yield the predictable result of processing the data in the plurality of hidden layers based on weights associated with the set of nodes of each of the plurality of hidden layers by calculating a weighted sum based on the weights. The motivation for doing so would be to generate deep learning training data with an imaging system (Gros; [0001]).
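For illustration, the weighted-sum and activation computation for which Gros [0032] is cited can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Gros or from the application: the function names, the optional bias term, the choice of ReLU as the "non-linear activation function", and the rule that a node counts as "activated" when its output is positive are all hypothetical.

```python
def weighted_sum(inputs, weights, bias=0.0):
    """Sum of a node's inputs scaled by its weights (bias term is an assumption)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def relu(z):
    """One common non-linear activation, used here only as an example."""
    return max(0.0, z)


def node_is_active(inputs, weights, activation=relu, bias=0.0):
    """Return (output, active); 'active' means a positive output (assumed rule)."""
    out = activation(weighted_sum(inputs, weights, bias))
    return out, out > 0.0


def layer_forward(inputs, weight_rows, activation=relu):
    """Apply the weighted sum and activation to every node of one hidden layer."""
    return [activation(weighted_sum(inputs, row)) for row in weight_rows]
```

Under these assumptions, `layer_forward([1.0, 2.0], [[0.5, -1.0], [0.5, 0.25]])` yields one inactive node and one active node.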
Gunes discloses selecting, from the target domain, a set of features with the least training errors as training samples for training each of the at least one additional layer ([0138]; “The minimum prediction error was selected to identify the "best" pair of the feature set and the hyperparameter configuration. Instead of fitting a gradient boosting model for each combination (100.times.100=10,000), 10 out of the 100 hyperparameter configurations (10% used as percentage of hyperparameter configurations) were selected for each feature set (100% used as percentage of feature sets) resulting in 100.times.10=1,000 trained gradient boosting models. Each gradient boosting models can be trained and validated in parallel, further decreasing the computational cost as discussed previously” (emphasis added), which discloses selecting, from a target domain, a set of features (the features being training samples) with the least training errors as training samples).
Hettinger, Gros, and Gunes are analogous art because all are concerned with the training of machine learning models.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the selecting of features with the least training errors of Gunes with the training of the additional layer (or layer-wise training) as taught by Hettinger to yield the predictable result of selecting, from the target domain, a set of features with the least training errors as training samples for training each of the at least one additional layer. The motivation for doing so would be to decrease computation costs for training a machine learning model (Gunes; [0138]).
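For illustration, the minimum-error selection for which Gunes [0138] is cited can be sketched as follows. This is a minimal sketch under stated assumptions and is not the procedure of Gunes or of the application: the use of mean squared error as the error measure, the function names, and the representation of each candidate feature set by the predictions of a model trained on it are all hypothetical.

```python
def mean_squared_error(predictions, targets):
    """Average squared difference; the error measure itself is an assumption."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def select_feature_set(candidates, targets):
    """Pick the candidate feature set whose model has the lowest error.

    candidates maps a feature-set name to the predictions produced by a
    model trained on that feature set (hypothetical representation).
    """
    errors = {name: mean_squared_error(preds, targets)
              for name, preds in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors
```

The selected feature set would then serve as the training samples for the additional layer; how ties are broken and how errors are measured are left open here.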


Regarding claim 8, Hettinger discloses [a] computer-implemented system for training a neural network, comprising: a processor module; and a memory module including computer program code; the memory module and the computer program code configured to, with the processor module, cause the system at least to: (Abstract; “We present a general framework for training deep neural networks without backpropagation”, suggesting a computer implemented method for training DNNs; and Page 7, §4.1; “Our experiments were run on a single desktop with an Intel i5-7400 processor and an Nvidia GeForce GTX 1060 3GB GPU”)
retrieve, from the memory module, a pre-trained neural network comprising (Page 1, Introduction; “We present a general framework, which we call forward thinking, for training DNNs without doing backpropagation”, which suggests retrieving a pre-trained neural network; and Page 2, ¶1; “This is much faster than traditional backpropagation, and the number of layers can be determined at training time by simply continuing to add and train layers consecutively until performance plateau”, further suggesting training a pre-trained neural network by adding and training new layers until a performance plateau is reached)
an input layer; (Page 2, §2, “The Input Layer”; the section discloses the input layer for the disclosed method)
a plurality of hidden layers, wherein each of the plurality of hidden layers has a set of nodes, wherein each of the set of nodes has an associated weight trained based on data from a source domain; and (Page 2, §2, “The first hidden layer” and “Additional hidden layers”; the sections disclose a plurality of hidden layers with nodes disclosed as “learners”, each node has a weight that is trained based on data from a source domain in the form of training data)
an output layer, (Page 2, §2, “Final Layer”; the section discloses the output or final layer for the disclosed method)
introduce at least one additional layer to the plurality of hidden layers, wherein said additional layer has one or more nodes having associated weights  (Page 2, §2, “Additional hidden layers”;  the section discloses introducing at least one additional layer to the plurality of hidden layers, each additional layer having nodes associated with weights; and Figure 1; “The first three iterations of a fully-connected network built with the forward thinking algorithm. The original data set is represented by an ellipse, fully-connected layers with rectangles, and final (output) layers with triangles. Layers with single blue outlines are trainable, while those with double black outlines have been frozen and thus turned into new data sets”, further suggesting introducing at least one additional layer to the plurality of hidden layers)
keep weights of the nodes in the plurality of hidden layers of the pre-trained neural network unchanged (Figure 1; “The first three iterations of a fully-connected network built with the forward thinking algorithm. The original data set is represented by an ellipse, fully-connected layers with rectangles, and final (output) layers with triangles. Layers with single blue outlines are trainable, while those with double black outlines have been frozen and thus turned into new data sets”, suggesting keeping weights of the nodes in the plurality of hidden layers of the pre-trained neural network unchanged by freezing the weights of the nodes in the plurality of hidden layers; and Page 4, §3.2; “Once the first network is trained, the weights coming into the first layer are frozen (and stored)”)
input data from a target domain to the input layer and (Page 4, §3.2; “the training inputs . . .  are pushed through the resulting layer to give new “synthetic” data”)
iteratively adjust weights of the one or more nodes in the at least one additional layer based on features obtained at the output layer (Page 4, §3.3; “Now insert a new hidden layer between the previously trained layer and the output layer. This layer is trained as a single-hidden-layer network on the new, synthetic data”, which discloses adjusting weights, by training the single-hidden layer, of the one or more nodes in the additional layer based on features obtained at the output layer; and Page 4, §3.4; “The process of freezing old layers and inserting new ones is repeated until additional layers cease to improve performance. This indicates that it’s time to stop adding new layers and consider the network complete”, which further discloses iteratively adjusting weights of the one or more nodes in the at least one additional layer based on features obtained at the output layer based on iteratively training the additional layer).
Hettinger fails to explicitly disclose selecting, from the target domain, a set of features with the least training errors as training samples for training each of the at least one additional layer; processing the data in the plurality of hidden layers based on weights associated with the set of nodes of each of the plurality of hidden layers by calculating a weighted sum based on the weights; determining whether to activate each of the set of nodes based on the weighted sum and a predetermined activation function.
Gros discloses processing the data in the plurality of hidden layers based on weights associated with the set of nodes of each of the plurality of hidden layers by calculating a weighted sum based on the weights; ([0032]; “Each hidden node of the third hidden layer 340 receives the inputs from the second hidden layer 330 and sums them to produce an output, the sums of each node are weighted”, which discloses processing data by calculating a weighted sum based on weights associated with each node of the hidden nodes)
determining whether to activate each of the set of nodes based on the weighted sum and a predetermined activation function ([0032]; “Each hidden node of the third hidden layer 340 receives the inputs from the second hidden layer 330 and sums them to produce an output, the sums of each node are weighted, the sum is passed through a non-linear activation function, and the resulting output is then passed onto each of the nodes 352 of the output layer 350. The nodes 352 of the output layer 350 in turn generate a final output 355 of the deep neural network 300”, which discloses determining whether to activate each of the nodes based on the weighted sum and a predetermined activation function (non-linear activation function)).
The motivation to combine Hettinger and Gros is the same as discussed above with respect to claim 1.
Gunes discloses selecting, from the target domain, a set of features with the least training errors as training samples for training each of the at least one additional layer ([0138]; “The minimum prediction error was selected to identify the "best" pair of the feature set and the hyperparameter configuration. Instead of fitting a gradient boosting model for each combination (100.times.100=10,000), 10 out of the 100 hyperparameter configurations (10% used as percentage of hyperparameter configurations) were selected for each feature set (100% used as percentage of feature sets) resulting in 100.times.10=1,000 trained gradient boosting models. Each gradient boosting models can be trained and validated in parallel, further decreasing the computational cost as discussed previously” (emphasis added), which discloses selecting, from a target domain, a set of features (the features being training samples) with the least training errors as training samples).
The motivation to combine Hettinger, Gros, and Gunes is the same as discussed above with respect to claim 1.


Regarding claim 15, Hettinger discloses [a] non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform one or more operations to index blockchain data for storage, comprising (Abstract; “We present a general framework for training deep neural networks without backpropagation”, suggesting a computer implemented method for training DNNs; and Page 7, §4.1; “Our experiments were run on a single desktop with an Intel i5-7400 processor and an Nvidia GeForce GTX 1060 3GB GPU”; and Page 2, ¶2; the disclosed data “D” is the blockchain data)
training a pre-trained neural network that comprises: (Page 1, Introduction; “We present a general framework, which we call forward thinking, for training DNNs without doing backpropagation”, which suggests training a pre-trained neural network; and Page 2, ¶1; “This is much faster than traditional backpropagation, and the number of layers can be determined at training time by simply continuing to add and train layers consecutively until performance plateau”, further suggesting training a pre-trained neural network by adding and training new layers until a performance plateau is reached)
an input layer; (Page 2, §2, “The Input Layer”; the section discloses the input layer for the disclosed method)
a plurality of hidden layers, wherein each of the plurality of hidden layers has a set of nodes, wherein each of the set of nodes has an associated weight trained based on data from a source domain; and (Page 2, §2, “The first hidden layer” and “Additional hidden layers”; the sections disclose a plurality of hidden layers with nodes disclosed as “learners”, each node has a weight that is trained based on data from a source domain in the form of training data)
an output layer, (Page 2, §2, “Final Layer”; the section discloses the output or final layer for the disclosed method)
introducing at least one additional layer to the plurality of hidden layers, wherein said additional layer has one or more nodes having associated weights  (Page 2, §2, “Additional hidden layers”;  the section discloses introducing at least one additional layer to the plurality of hidden layers, each additional layer having nodes associated with weights; and Figure 1; “The first three iterations of a fully-connected network built with the forward thinking algorithm. The original data set is represented by an ellipse, fully-connected layers with rectangles, and final (output) layers with triangles. Layers with single blue outlines are trainable, while those with double black outlines have been frozen and thus turned into new data sets”, further suggesting introducing at least one additional layer to the plurality of hidden layers)
keeping weights of the nodes in the plurality of hidden layers of the pre-trained neural network unchanged (Figure 1; “The first three iterations of a fully-connected network built with the forward thinking algorithm. The original data set is represented by an ellipse, fully-connected layers with rectangles, and final (output) layers with triangles. Layers with single blue outlines are trainable, while those with double black outlines have been frozen and thus turned into new data sets”, suggesting keeping weights of the nodes in the plurality of hidden layers of the pre-trained neural network unchanged by freezing the weights of the nodes in the plurality of hidden layers; and Page 4, §3.2; “Once the first network is trained, the weights coming into the first layer are frozen (and stored)”)
inputting data from a target domain to the input layer and (Page 4, §3.2; “the training inputs . . .  are pushed through the resulting layer to give new “synthetic” data”)
iteratively adjusting weights of the one or more nodes in the at least one additional layer based on features obtained at the output layer (Page 4, §3.3; “Now insert a new hidden layer between the previously trained layer and the output layer. This layer is trained as a single-hidden-layer network on the new, synthetic data”, which discloses adjusting weights, by training the single-hidden layer, of the one or more nodes in the additional layer based on features obtained at the output layer; and Page 4, §3.4; “The process of freezing old layers and inserting new ones is repeated until additional layers cease to improve performance. This indicates that it’s time to stop adding new layers and consider the network complete”, which further discloses iteratively adjusting weights of the one or more nodes in the at least one additional layer based on features obtained at the output layer based on iteratively training the additional layer).
Hettinger fails to explicitly disclose selecting, from the target domain, a set of features with the least training errors as training samples for training each of the at least one additional layer; processing the data in the plurality of hidden layers based on weights associated with the set of nodes of each of the plurality of hidden layers by calculating a weighted sum based on the weights; determining whether to activate each of the set of nodes based on the weighted sum and a predetermined activation function.
Gros discloses processing the data in the plurality of hidden layers based on weights associated with the set of nodes of each of the plurality of hidden layers by calculating a weighted sum based on the weights; ([0032]; “Each hidden node of the third hidden layer 340 receives the inputs from the second hidden layer 330 and sums them to produce an output, the sums of each node are weighted”, which discloses processing data by calculating a weighted sum based on weights associated with each node of the hidden nodes)
determining whether to activate each of the set of nodes based on the weighted sum and a predetermined activation function ([0032]; “Each hidden node of the third hidden layer 340 receives the inputs from the second hidden layer 330 and sums them to produce an output, the sums of each node are weighted, the sum is passed through a non-linear activation function, and the resulting output is then passed onto each of the nodes 352 of the output layer 350. The nodes 352 of the output layer 350 in turn generate a final output 355 of the deep neural network 300”, which discloses determining whether to activate each of the nodes based on the weighted sum and a predetermined activation function (non-linear activation function)).
The motivation to combine Hettinger and Gros is the same as discussed above with respect to claim 1.
Gunes discloses selecting, from the target domain, a set of features with the least training errors as training samples for training each of the at least one additional layer ([0138]; “The minimum prediction error was selected to identify the "best" pair of the feature set and the hyperparameter configuration. Instead of fitting a gradient boosting model for each combination (100.times.100=10,000), 10 out of the 100 hyperparameter configurations (10% used as percentage of hyperparameter configurations) were selected for each feature set (100% used as percentage of feature sets) resulting in 100.times.10=1,000 trained gradient boosting models. Each gradient boosting models can be trained and validated in parallel, further decreasing the computational cost as discussed previously” (emphasis added), which discloses selecting, from a target domain, a set of features (the features being training samples) with the least training errors as training samples).
The motivation to combine Hettinger, Gros, and Gunes is the same as discussed above with respect to claim 1.

Regarding claims 4, 11, and 18, the rejections of claims 1, 8, and 15 are incorporated and Hettinger further discloses wherein the at least one additional layer is introduced to the pre-trained neural network after the last hidden layer of the pre-trained neural network (Figure 1; the figure discloses introducing the additional layer (C(3)) after the last hidden layer (D(2)) of the pre-trained neural network).
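For illustration, introducing a layer after the last hidden layer while keeping the earlier weights unchanged can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Hettinger or of the application: the added layer is reduced to a single linear node, the update rule is plain stochastic gradient descent on squared error, and every name (`frozen_forward`, `train_added_layer`, `loss`, `lr`, `epochs`) is hypothetical.

```python
def relu(z):
    """Example activation for the frozen hidden layers (an assumption)."""
    return max(0.0, z)


def frozen_forward(x, frozen_layers):
    """Push an input through the frozen hidden layers; weights are only read."""
    h = x
    for layer in frozen_layers:
        h = [relu(sum(w * v for w, v in zip(row, h))) for row in layer]
    return h


def train_added_layer(data, frozen_layers, new_weights, lr=0.05, epochs=200):
    """Iteratively adjust only the weights of the added layer."""
    w = list(new_weights)
    for _ in range(epochs):
        for x, target in data:
            h = frozen_forward(x, frozen_layers)      # features from frozen layers
            err = sum(wi * hi for wi, hi in zip(w, h)) - target
            w = [wi - lr * err * hi for wi, hi in zip(w, h)]
    return w


def loss(data, frozen_layers, w):
    """Mean squared error of the added layer on top of the frozen layers."""
    total = 0.0
    for x, target in data:
        h = frozen_forward(x, frozen_layers)
        total += (sum(wi * hi for wi, hi in zip(w, h)) - target) ** 2
    return total / len(data)
```

In this sketch the frozen layers are never written to, so their weights are identical before and after training; only the list returned by `train_added_layer` changes.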


Claims 2, 3, 9, 10, 16, and 17 are rejected under 35 U.S.C. § 103 as being obvious over Hettinger in view of Gros and Gunes and further in view of Chen et al. (Chen et al., “Net2Net: ACCELERATING LEARNING VIA KNOWLEDGE TRANSFER”, Apr. 23, 2016, Neurocomputing, Vol. 287, pp. 1-12, hereinafter “Chen”).

Regarding claims 2, 9, and 16, the rejections of claims 1, 8, and 15 are incorporated but Hettinger fails to explicitly disclose wherein the at least one additional layer is a feature layer introduced to the pre-trained neural network in parallel with the last hidden layer of the plurality of hidden layers farthest from the input layer, and wherein introducing the at least one additional layer comprises introducing additional trainable parameters for converting the features obtained at the output layer to newly adapted features as an output from the at least one additional layer.
Chen discloses wherein the at least one additional layer is a feature layer introduced to the pre-trained neural network in parallel with the last hidden layer of the plurality of hidden layers farthest from the input layer (Page 4, Figure 2; the figure discloses an additional layer (h[3]) that is introduced to a pre-trained neural network in parallel with the last hidden layer of the pretrained network that is farthest from the input layer)
wherein introducing the at least one additional layer comprises introducing additional trainable parameters for converting the features obtained at the output layer to newly adapted features as an output from the at least one additional layer (Page 4, Figure 2 and Associated Description; the figure discloses, under a broadest reasonable interpretation of the claim language, introducing additional trainable parameters such as weights for converting the features obtained at the output layer “y” to adapted features (as interpreted in the 112(b) rejection above) as an output from the at least one additional layer h[3]. Further, the description of the figure discusses the weights coming out of the additional layer which are the additional trainable parameters).
Hettinger, Gros, Gunes, and Chen are analogous art because all are concerned with training machine learning models.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the parallel additional layer of Chen with the method of Hettinger, Gros, and Gunes to yield the predictable result of wherein the at least one additional layer is a feature layer introduced to the pre-trained neural network in parallel with the last hidden layer of the plurality of hidden layers farthest from the input layer, and wherein introducing the at least one additional layer comprises introducing additional trainable parameters for converting the features obtained at the output layer to newly adapted features as an output from the at least one additional layer. The motivation for doing so would be to accelerate the training of a significantly larger neural net (Chen; Abstract).

Regarding claims 3, 10, and 17, the rejection of claims 1, 2, 8, 9, 15, and 16 is incorporated but Hettinger fails to explicitly disclose performing a concatenation of the output from the at least one additional layer and an output from the last hidden layer of the pre-trained neural network; and passing on the concatenated outputs to the output layer.
Chen discloses performing a concatenation of the output from the at least one additional layer and an output from the last hidden layer of the pre-trained neural network; and passing on the concatenated outputs to the output layer (Page 4, Figure 2; the figure discloses, under a broadest reasonable interpretation of the claim language, performing a concatenation of an output from the at least one additional layer (h[3]) and an output from the last hidden layer of the pre-trained neural network (h[1] or h[2]); and passing on the concatenated outputs to the output layer (y); and Page 4, ¶1; “Another example is concatenation. If we concatenate the output of layer 1 and layer 2, then pass this concatenated output to layer 3, the remapping function for layer 3 needs to take the concatenation into account”).
The motivation to combine Hettinger, Gros, Gunes, and Chen is the same as discussed above with respect to claim 2.
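For illustration only, the arrangement recited in claims 2 and 3 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Chen, Hettinger, or the application: the function names (dense, forward), the tanh activation, and the reading of “in parallel” as the additional layer receiving the same input as the last hidden layer are all hypothetical.

```python
import math

def dense(x, weights, biases):
    """One layer: weighted sum per node, then a tanh activation (assumed)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def forward(x, hidden, parallel, output):
    """hidden is a list of (weights, biases) pairs for the pre-trained layers;
    parallel and output are single (weights, biases) pairs. Hypothetical names."""
    h_prev = x
    for w, b in hidden[:-1]:
        h_prev = dense(h_prev, w, b)
    h_last = dense(h_prev, *hidden[-1])   # last pre-trained hidden layer
    h_new = dense(h_prev, *parallel)      # additional layer fed the same input
    combined = h_last + h_new             # concatenation of the two outputs
    w_out, b_out = output
    # Linear output layer applied to the concatenated features.
    return [sum(w * c for w, c in zip(row, combined)) + b
            for row, b in zip(w_out, b_out)]
```

In this sketch, only the parameters in parallel (and the output-layer weights that read the new features) would be the additional trainable parameters; the pre-trained hidden weights could be left unchanged.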

Claims 6, 13, and 20 are rejected under 35 U.S.C. § 103 as being obvious over Hettinger in view of Gros and Gunes and further in view of Elhatri et al. (Elhatri et al., “Extreme Learning Machine-Based Traffic Incidents Detection with Domain Adaptation Transfer Learning”, Oct. 19, 2016, Journal of Intelligent Systems | Volume 26: Issue 4, pp. 601-612, hereinafter “Elhatri”).
Regarding claims 6, 13, and 20, the rejection of claims 1, 8, and 15 is incorporated and Hettinger discloses the additional layer (Page 2, §2, “Additional hidden layers”; the section discloses introducing at least one additional layer to the plurality of hidden layers, each additional layer having nodes associated with weights).
Hettinger fails to explicitly disclose adjusting weights of the one or more nodes in the at least one additional layer based on the set of features with the least training errors.
Elhatri discloses adjusting weights of the one or more nodes in the at least one additional layer based on the set of features with the least training errors (Page 603, Last paragraph going into Page 604; Equations 4, 5, and Algorithm 1; “ELM tends to reach not only the smallest training error but also the smallest norm of output weights. For feed-forward neural networks reaching a smaller training error, the smaller the norms of weights are, the better generalisation performance the networks tend to have. ELM tends to minimise the training error as well as the norm of the output weight”, which discloses, under a broadest reasonable interpretation of the claim language in conjunction with Equations 4 and 5 and Algorithm 1, adjusting or minimizing weights of the one or more nodes based on a set of features that are associated with a smallest training error; and Page 603, §2.1.2; “It is a necessary and sufficient condition that the feature mapping h(x) is chosen to make h(x)β have the capability of approximating any target continuous function”, which further discloses that, in connection with the ELM having the smallest training error, it chooses a set of features that result in the smallest training error and adjusts weights or the norm of the weights accordingly).
Hettinger, Gros, Gunes, and Elhatri are analogous art because all are concerned with machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in neural networks to combine the weight adjusting of Elhatri with the additional layer of Hettinger to yield the predictable result of adjusting weights of the one or more nodes in the at least one additional layer based on the set of features with the least training errors. The motivation for doing so would be to learn a classifier to decide whether an incident has occurred or not (Elhatri; Conclusion).
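For illustration only, the quoted principle that an ELM minimises both the training error and the norm of the output weights can be sketched with the commonly used regularized least-squares solution β = (I/c + HᵀH)⁻¹Hᵀt. This is a minimal sketch under stated assumptions: the formula is the standard regularized ELM form and may differ from Elhatri's Equations 4 and 5, and the names solve, fit_output_weights, and the constant c are hypothetical.

```python
def solve(a, rhs):
    """Solve a * beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):                   # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta

def fit_output_weights(H, t, c=1e3):
    """Minimise ||H beta - t||^2 + ||beta||^2 / c over beta.
    H: hidden-layer feature rows h(x); t: targets; c: assumed trade-off constant."""
    n = len(H[0])
    gram = [[sum(row[i] * row[j] for row in H) + (1.0 / c if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * ti for row, ti in zip(H, t)) for i in range(n)]
    return solve(gram, rhs)
```

A larger c weights the training error more heavily, while a smaller c favours a smaller norm of the output weights.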

Claims 7, 14, and 21 are rejected under 35 U.S.C. § 103 as being obvious over Hettinger in view of Gros and Gunes and further in view of Zhang et al. (Zhang et al., “Is Tofu the Cheese of Asia?: Searching for Corresponding Objects across Geographical Areas”, Apr. 2017, WWW '17 Companion: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 1033-1042, hereinafter “Zhang”).

Regarding claims 7, 14, and 21, the rejection of claims 1, 8, and 15 is incorporated but Hettinger fails to explicitly disclose wherein the source domain is associated with a first country or geographical region and the target domain is associated with a second country or geographical region that is different from the first.
Zhang discloses wherein the source domain is associated with a first country or geographical region and the target domain is associated with a second country or geographical region that is different from the first (Abstract; “In particular, we focus on geographical domains. We propose to build connections between two different spaces (e.g., USA and Japan) by mapping the distributed word representations in one space with the ones in the other space”, suggesting a different source and target domain based on different geographical regions or countries (USA and Japan); and Page 1034, §3; “Our goal is to compare terms related to disjoint geographical areas and to find matching term pairs (e.g., NASA and JAXA). For this, we propose constructing a mapping function between the base space and the target space”).
Hettinger, Gros, Gunes, and Zhang are analogous art because all are concerned with machine learning.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in machine learning to combine the geographically different domains of Zhang with the method of Hettinger, Gros, and Gunes to yield the predictable result of wherein the source domain is associated with a first country or geographical region and the target domain is associated with a second country or geographical region that is different from the first. The motivation for doing so would be to provide for a query suggestion mechanism based on automatic transformation of concepts from one spatial area to another (Zhang; Conclusion).

Response to Arguments

Applicant’s arguments, filed on 1/20/2021, with respect to the 35 USC § 103 rejection of independent claims 1, 8, and 15 have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection to reject independent claims 1, 8, and 15.  Hettinger, Gros, and Gunes are now being used to render claims 1, 8, and 15 obvious under 35 USC § 103.

Conclusion
 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403.  The examiner can normally be reached on Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2125