Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on April 19, 2022, in which claims 1-19 are amended. Claims 1-19 are currently pending.

Specification
Applicant's amendments to the specification are acknowledged. The Examiner's objections to the specification are hereby withdrawn in view of Applicant's amendments to the specification.

Response to Arguments
The objections to claims 1, 11, and 13 under 35 U.S.C. § 112(f) are hereby withdrawn, as necessitated by applicant's amendments and remarks made to the objections.
With respect to Applicant's argument regarding the 35 U.S.C. 112(b) rejection of claim 4 for circular reasoning: although the Examiner finds Applicant's explanation of the intended meaning persuasive, the claim language itself remains circular and would not be clearly understood by one of ordinary skill in the art. The Examiner recommends amending the claim language to distinguish between the known training labels and the newly generated labels.
The rejections to claims 1-3, 5-7, 10, and 17-18 under 35 U.S.C. § 112(b) are hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejections.
The rejections to claims 12, 15, and 19 under 35 U.S.C. § 112(d) are hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejections.
Applicant’s arguments with respect to the rejection of claims 1-14 and 15-18 under 35 U.S.C. 101, in view of the amendments, have been considered and are persuasive. The Examiner notes, however, that no attempt was made to overcome the signal-per-se rejections of claims 15 and 19, which are directed towards transitory computer readable media.
Applicant’s arguments with respect to the rejection of claims 1-19 under 35 U.S.C. 102/103, in view of the amendments, have been considered but are not persuasive.
With respect to Applicant's argument that Aggarwal does not anticipate the claims, the Examiner respectfully disagrees. As best understood, Applicant's arguments are directed towards the instant specification, the intended use of the invention, and purported advantages of the invention, rather than towards the claim language itself. Regardless of whether the claimed projection layer is capable of performing tasks beyond those described in Aggarwal, the disclosure of Aggarwal fully anticipates claims 1-3, 7, 11, 13-15, 17, and 19 as drafted.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 4-5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 4, the limitation “wherein the training data includes sensor data annotated with one or more labels, the neural network being trained to generate a number of labels given sensor data in accordance with the training data” is circular and therefore indefinite. The claim recites training data that includes sensor data and labels, but also recites that the neural network is trained to generate labels for that same sensor data, which is already recited as having labels. For purposes of further examination, the claim is interpreted as: “The training device according to claim 1, wherein the training data includes sensor data annotated with one or more input labels, the neural network being trained to generate a number of output labels given sensor data in accordance with the training data, wherein the number of output labels equals the summing parameter”. Claim 5 also recites the labels of claim 4 and therefore likewise requires appropriate differentiation.

Claim Rejections - 35 USC § 101
101 Rejection
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



Claims 15 and 19 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding claims 15 and 19: Claims 15 and 19 recite a “transitory computer readable medium”, which encompasses transitory signals. A transitory signal is a “signal per se”, which does not fall within any statutory category [MPEP § 2106.03]. See In re Nuijten, 500 F.3d 1346, 84 USPQ2d 1495 (Fed. Cir. 2007).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-3, 7, 11, 13-15, 17, and 19 are rejected under 35 U.S.C. 102 as being anticipated by Aggarwal (US 2020/0302016 A1).

	Regarding claim 1, Aggarwal teaches A neural network training device for training a neural network for classifying objects, the neural network including a sequence of neural network layers, the device comprising: ([¶0036] "In at least some implementations, the classification of structural features discussed herein is implemented using recurrent neural network (RNN) techniques, such as a long short-term memory (LSTM) machine learning model" [¶0037] "In the following discussion, an example environment is first described that may employ the techniques described herein. Example systems and procedures are then described which may be performed in the example environment as well as other environments. Performance of the example procedures is not limited to the example environment and the example environment is not limited to performance of the example procedures. Finally, an example system and device are described that are representative of one or more computing systems and/or devices that may implement the various techniques described herein." An LSTM network by definition has a sequence of layers. Aggarwal explicitly teaches that the input may be images ([¶0030] "As used herein, the term “digital document” refers to digital collections of digital content, such as digital text and digital images") and that structural features (interpreted as synonymous with objects) may be detected from said images ([¶0031] "As used herein, the term “structural feature” refers to visual elements of digital documents, such as visual structures that make up a digital document"))
	a communication interface for accessing training data comprising object classifications; ([¶0039] "The illustrated environment 100 includes a document analysis system 102 and a client device 104 that are communicatively coupled, one to another, via a network 106." See FIG. 1)
	a non-transitory neural network storage configured to store parameters for multiple layers of the sequence of neural network layers; and ([¶0044] "the document analysis system 102 maintains training data 132 stored on the storage 112. Generally, the training data 132 can be utilized by the analysis manager module 108 to train the character analysis model 126 and the classification model 128 prior to processing the structural features 120. The training data 132, for instance, includes training digital documents (“training documents”) 134, which include tagged structural features (“tagged features”) 136" The tagged features of the training data are interpreted as synonymous with parameters. Training of an LSTM with multiple layers is explicitly taught [¶0075].)
	a processor system configured to apply the sequence of neural network layers to data of the training data, and adjust the stored parameters to train the network, wherein (See FIG. 7 for applying sequence of neural network layers to training data.  [¶0044] "Generally, the training data 132 can be utilized by the analysis manager module 108 to train the character analysis model 126 and the classification model 128 prior to processing the structural features 120." [¶0108] "The example computing device 1302 as illustrated includes a processing system 1304, one or more computer-readable media 1306, and one or more I/O interfaces 1308 that are communicatively coupled, one to another. Although not shown, the computing device 1302 may further include a system bus or other data and command transfer system that couples the various components, one to another.")
	at least one layer of the sequence of neural network layers, both during training and in the trained neural network ([¶0025] "Generally, the context determination model is trained as part of training the classification model" Aggarwal teaches that the projection layer is part of the context determination model; therefore, it is explicitly taught that the projection layer exists in the network both during training and in the trained neural network.)
	is a projection layer, the projection layer being configured for a summing parameter and is configured to project a layer input vector of the projection layer to a layer output vector, ([¶0083] "For instance, consider that j represents each structural feature in the digital document 302. Thus for each feature j, the context determination model 902 generates a context aware representation hj which not only takes into account information about structural features which appear before it in the sequence S but also takes into account information about structural features that occur subsequently in S. Hence, hj = [hj^f, hj^b]," [¶0084] "where j=1, 2, . . . n, with n=number of structural features on a page. f and b denote, respectively, outputs of the forward and backward LSTMs of the context determination model 902." [¶0085] "After the context determination model 902 processes each vcj and generates hj for each vcj, each hj is input into the decoder model 904 which is represented by Decθ. Generally, Decθ sequentially processes each hj with an output projection layer f to generate feature categorizations 908 for each structural feature. In this example, each individual feature categorization 908 is represented by a respective feature category type ct for each hj. The size of f corresponds to a number of defined categories of structural features." This teaches the projection layer; the correlation between the projection layer inputs and the number of structural features, which are interpreted as synonymous with labels; the projection term, here referred to as hj; and projecting the layer input vector to an output vector f, whose size corresponds directly to the number of defined categories of structural features. f is interpreted as synonymous with k, and its size is explicitly taught to be the number of categorizations, i.e., outputs, of the projection layer.)
	The summing parameter being at least 2 ([¶0085] "For instance, based on the structural features 304, f=10")
	the projecting including optimizing a layer loss-function applied to the projection layer output vector subject to the condition that the projection layer output vector sums to the summing parameter, wherein the layer loss-function includes a regulating term and a projection term. (Regarding the loss function, [¶0096] "Generally, the model parameters are optimized to maximize the log likelihood of feature types in the pages of the training documents 134. In at least one implementation, this can be achieved by minimizing the mean (taken over multiple pages of the training documents 134) of cross entropy loss between predicted softmax probability distribution of each structural feature in a page and one-hot vectors corresponding to their actual output class." This teaches optimizing a loss function of the output classification layer, which, in view of the instant specification, is interpreted as the projection layer output vector. The cross entropy loss function is interpreted as explicitly containing a regulating term, which the instant specification describes as an entropy function.).
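For reference, the mapping above relies on two quoted pieces of Aggarwal: the bidirectional representation hj = [hj^f, hj^b] passed through an output projection layer f whose size equals the number of defined categories (¶0083-¶0085), and the mean cross-entropy objective between softmax outputs and one-hot vectors (¶0096-¶0097). The minimal Python sketch below illustrates both under stated assumptions: the helper names, dimensions, weights, and probabilities are hypothetical, the projection is a plain linear map, and Aggarwal's sequential decoder Decθ is not reproduced.

```python
import math

def concat_bidirectional(h_forward, h_backward):
    # h_j = [h_j^f, h_j^b]: forward and backward LSTM outputs side by side
    return list(h_forward) + list(h_backward)

def output_projection(h, weights, bias):
    # One score per defined category; len(weights) plays the role of the size of f
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(weights, bias)]

def softmax(scores):
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def page_loss(probs_per_feature, one_hot_per_feature):
    # Sum over the structural features j of a page of -l_j . log(p_j);
    # only the entry where l_j is 1 contributes, so zero probabilities elsewhere are skipped
    return -sum(
        sum(l * math.log(p) for l, p in zip(l_j, p_j) if l > 0)
        for p_j, l_j in zip(probs_per_feature, one_hot_per_feature)
    )

def mean_loss(pages):
    # Mean of the per-page loss over the N training pages
    return sum(page_loss(p, l) for p, l in pages) / len(pages)

# Hypothetical example: 2 hidden units per direction, 3 categories.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
b = [0.0, 0.0, 0.0]
h_j = concat_bidirectional([0.5, -1.0], [2.0, 0.0])
p_j = softmax(output_projection(h_j, W, b))

# Hypothetical pages: (softmax vectors, one-hot vectors) per structural feature.
pages = [
    ([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [[1, 0, 0], [0, 1, 0]]),
    ([[0.5, 0.5, 0.0]], [[0, 1, 0]]),
]
loss = mean_loss(pages)
```

In this sketch, the projection output is a probability vector over the f categories that sums to 1.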

	Regarding claim 2, Aggarwal teaches The training device according to claim 1, wherein the regulating term is a sum of a function applied to the coefficients of the projection layer output vector, the function being convex, continuous and having a minimum for an input between 0 and 1, and/or ([¶0096] "Generally, the model parameters are optimized to maximize the log likelihood of feature types in the pages of the training documents 134...this can be achieved by minimizing the mean (taken over multiple pages of the training documents 134) of cross entropy loss between predicted softmax probability distribution of each structural feature in a page and one-hot vectors corresponding to their actual output class." See Eqn. in ¶0096. A cross entropy loss function is always convex, and the one-hot encoding ensures that the input is between 0 and 1. The model parameters include those of the projection layer.).
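As a point of reference for the claim 2 limitation, one function with the recited properties is minus the binary entropy (the function named in claim 17), g(t) = t ln t + (1 - t) ln(1 - t), which is convex and continuous on [0, 1] and has its minimum at t = 0.5. The minimal Python sketch below checks these properties numerically on a grid. The grid spacing and tolerance are arbitrary choices, and the sketch illustrates the claim language only, not any passage of Aggarwal.

```python
import math

def neg_binary_entropy(t):
    """g(t) = t*ln(t) + (1-t)*ln(1-t), extended continuously with g(0) = g(1) = 0."""
    if t in (0.0, 1.0):
        return 0.0
    return t * math.log(t) + (1.0 - t) * math.log(1.0 - t)

grid = [i / 100 for i in range(101)]
values = [neg_binary_entropy(t) for t in grid]

# Discrete convexity check: second differences on a uniform grid are non-negative.
second_diffs = [values[i - 1] - 2 * values[i] + values[i + 1] for i in range(1, len(values) - 1)]
is_convex = all(d >= -1e-12 for d in second_diffs)

# Location of the minimum on the grid (expected strictly between 0 and 1).
t_min = grid[values.index(min(values))]
```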

	Regarding claim 3, Aggarwal teaches The training device according to claim 1, wherein the projection layer is a final layer of the sequence of neural network layers. ([¶0085] "Generally, Decθ sequentially processes each hj with an output projection layer f to generate feature categorizations 908 for each structural feature"). 

	Regarding claim 7, Aggarwal teaches The training device according to claim 1, wherein optimizing the layer loss function includes applying an iterated approximation algorithm to obtain the projection layer output vector. ([¶0096] "Generally, the model parameters are optimized to maximize the log likelihood of feature types in the pages of the training documents 134. In at least one implementation, this can be achieved by minimizing the mean (taken over multiple pages of the training documents 134) of cross entropy loss between predicted softmax probability distribution of each structural feature in a page and one-hot vectors corresponding to their actual output class. Hence, the objective loss function becomes:" See Eqn. in ¶0096. [¶0085] "After the context determination model 902 processes each vcj and generates hj for each vcj, each hj is input into the decoder model 904 which is represented by Decθ. Generally, Decθ sequentially processes each hj with an output projection layer f to generate feature categorizations" The equation in ¶0096 contains multiple summations, which are interpreted as iterative steps. The loss function also explicitly makes use of one-hot encoding, which is an approximation method. Therefore, the loss function in Aggarwal is interpreted as an iterated approximation algorithm. See FIG. 9 for the LSTM model and layers, which show how the output vector is obtained.).

	Regarding claims 11, 13-15, and 19, claims 11, 13-15, and 19 are substantially similar to claim 1. Therefore, the rejection applied to claim 1 also applies to claims 11, 13-15, and 19. Claim 11 is directed towards a device with substantially similar attributes. Claims 13-14 are method claims for performing the operations of the device of claim 1. Claims 15 and 19 are directed towards computer readable media having instructions for performing the methods of claim 1.

	Regarding claim 16, claim 16 is substantially similar to claim 4.  Therefore, the rejection applied to claim 4 also applies to claim 16.

	Regarding claim 17, Aggarwal teaches The training device according to claim 1, wherein the function in the regulating term includes minus the binary entropy function, and/or the projection term includes minus the dot-product of the layer input vector and the layer output vector. ([¶0097] See Eqn. "“⋅” is the dot product operation, N is a number of pages in a training document 134, n is a maximum number of structural features in a page of a training document, and the summation of j is performed to account for all structural features in a page. pj i is a softmax probability vector (as predicted by the models) over different possible output categories and lj i is the one-hot vector corresponding to actual class of jth structural feature in ith training document 134" The training document is interpreted as the layer input vector, and the softmax probability vector over different output categories is interpreted as the layer output vector.).
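The two-term layer loss-function recited in claim 17 can be sketched as follows, assuming the regulating term is the sum of minus the binary entropy over the output coefficients and the projection term is minus the dot product of the layer input vector x and the layer output vector y. The input vector and candidate outputs are hypothetical values, chosen only to show how the two terms trade off for outputs that sum to the same summing parameter (here k = 1). This illustrates the claim language; it is not a reproduction of Aggarwal's ¶0097 equation.

```python
import math

def neg_binary_entropy(t):
    # t*ln(t) + (1-t)*ln(1-t), with the continuous extension 0 at t = 0 and t = 1
    if t in (0.0, 1.0):
        return 0.0
    return t * math.log(t) + (1.0 - t) * math.log(1.0 - t)

def layer_loss(x, y):
    # Projection term: minus the dot product of the input and output vectors
    projection = -sum(xi * yi for xi, yi in zip(x, y))
    # Regulating term: sum of minus the binary entropy of each output coefficient
    regulating = sum(neg_binary_entropy(yi) for yi in y)
    return projection + regulating

x = [2.0, 0.0, -1.0]        # hypothetical layer input vector
y_hard = [1.0, 0.0, 0.0]    # one-hot output, sums to k = 1
y_soft = [0.5, 0.5, 0.0]    # softer output, also sums to k = 1
loss_hard = layer_loss(x, y_hard)
loss_soft = layer_loss(x, y_soft)
```

For these values the softer output has the lower loss, because the regulating term is smallest at coefficients of 0.5 and zero at the 0/1 extremes.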

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-6, 12, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal in view of Block (US 2019/0163982 A1).

	Regarding claim 4, Aggarwal teaches The training device according to claim 1, wherein the number of labels equals the summing parameter. ([¶0085] "Generally, Decθ sequentially processes each hj with an output projection layer f to generate feature categorizations 908 for each structural feature" f is interpreted as synonymous with k, which is explicitly taught to be equal to the number of labels or categorizations generated by the neural network.).
	However, Aggarwal does not explicitly teach wherein the training data includes sensor data annotated with one or more labels, the neural network being trained to generate a number of labels given sensor data in accordance with the training data.

Block, in the same field of endeavor, teaches the training data includes sensor data annotated with one or more labels, the neural network being trained to generate a number of labels given sensor data in accordance with the training data ([¶0060] "A node combines sensor input from the data with a set of coefficients, or weights, that either amplify or dampen that input, thereby assigning significance to inputs for the task the algorithm is trying to learn...each layer may provide a useful representation for some aspect of the target class to be labeled." [¶0085] "For current observations of one or more objects within the visual scene of interest, many observation systems may be utilized to obtain data about the objects with respect to various feature types. For example, a GPS location 602 may be obtained for sensors or actuators such as microphones, IR cameras, visible light cameras, and other observation data acquisition systems." A scene including related sensor data is interpreted as synonymous with sensor data annotated with labels. Neural network classification is interpreted as synonymous with generating labels given input data.).

Aggarwal and Block are both directed towards neural networks for object detection. Therefore, Aggarwal and Block are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Aggarwal and Block by using sensor data as an input to the neural network. Aggarwal explicitly teaches that sensors may be used for input ([¶0111]). Block provides a more detailed use of sensor input for detection and provides motivation for the combination ([¶0013] “The feature graphs and developed feature vector spaces may be used to establish semantic associations via a semantic scene graph relating to aspects of a visual scene of interest for identification and determination of associations between objects or other features of the visual scene of interest according to techniques such as the embodiments described”).

	Regarding claim 5, the combination of Aggarwal and Block teaches The training device according to claim 4, wherein for at least part of the sensor data in the training data the number of labels is less than the summing parameter (Aggarwal [¶0089] "Accordingly, the aspects of the systems 800, 900 described above can be utilized for both training the character analysis model 126 and the classification model 128 using the labeled training documents 134, and for classifying structure features of unlabeled digital documents 114." [¶0085] "Generally, Decθ sequentially processes each hj with an output projection layer f to generate feature categorizations 908 for each structural feature. In this example, each individual feature categorization 908 is represented by a respective feature category type" [¶0111] "Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone, a scanner, touch functionality (e.g., capacitive or other sensors" Having a subset of the data correspond to fewer output categories than the total number of output categories would lead to obvious and expected outcomes. Aggarwal explicitly teaches using sensors for input data.).

	Regarding claim 6, Aggarwal teaches the set having a number of elements equal to the summing parameter. ([¶0085] "Generally, Decθ sequentially processes each hj with an output projection layer f to generate feature categorizations 908 for each structural feature" f is interpreted as synonymous with the summing parameter, the set of relationships is interpreted as synonymous with categorizations.).
	However, Aggarwal does not explicitly teach The training device according to claim 1 for use in a scene graph generation task, the training data including sensor data annotated with objects and relationships between the objects, the network being configured to generate a set of relationships for edges of the scene graph.

Block, in the same field of endeavor, teaches The training device according to claim 1 for use in a scene graph generation task, the training data including sensor data annotated with objects and relationships between the objects, the network being configured to generate a set of relationships for edges of the scene graph, ([¶0085] "For current observations of one or more objects within the visual scene of interest, many observation systems may be utilized to obtain data about the objects with respect to various feature types. For example, a GPS location 602 may be obtained for sensors or actuators such as microphones, IR cameras, visible light cameras, and other observation data acquisition systems." A GPS location is explicitly taught as an object with a relationship to the sensor data.  Including related sensor data is interpreted as being synonymous with annotating.). 

Aggarwal and Block are both directed towards neural networks for object detection. Therefore, Aggarwal and Block are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Aggarwal and Block by using sensor data as an input to the neural network. Aggarwal explicitly teaches that sensors may be used for input ([¶0111]). Block provides a more detailed use of sensor input for detection and provides motivation for the combination ([¶0013] “The feature graphs and developed feature vector spaces may be used to establish semantic associations via a semantic scene graph relating to aspects of a visual scene of interest for identification and determination of associations between objects or other features of the visual scene of interest according to techniques such as the embodiments described”).

	Regarding claim 12, claim 12 is substantially similar to claim 4. Therefore, the rejection applied to claim 4 also applies to claim 12. Claim 12 is also specifically directed towards an autonomous vehicle. The claim limitation not explicitly taught by Aggarwal, namely wherein the input data is applied to sensor data of an autonomous device, the neural network being configured to classify objects in the sensor data, an autonomous device control being configured for decision making depending on the classification, is taught by Block ([¶0029] "The computer system 100 can also include one or more buses 108 operable to transmit communications between the various hardware components such as any combination of various input and output (I/O) devices." See FIG. 1 [¶0034] "For example, instructions 124 may execute the scene feature extractor and feature classifier system 132, software agents, or other aspects or components" [¶0060] "A node combines sensor input from the data with a set of coefficients, or weights, that either amplify or dampen that input, thereby assigning significance to inputs for the task the algorithm is trying to learn...each layer may provide a useful representation for some aspect of the target class to be labeled." [¶0085] "For current observations of one or more objects within the visual scene of interest, many observation systems may be utilized to obtain data about the objects with respect to various feature types. For example, a GPS location 602 may be obtained for sensors or actuators such as microphones, IR cameras, visible light cameras, and other observation data acquisition systems."). The combination of Aggarwal and Block set forth for claim 4 also applies to claim 12.

	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal in view of Zeng (US 2019/0050372 A1).

	Regarding claim 8, Aggarwal teaches The training device according to claim 1.
	However, Aggarwal does not explicitly teach wherein optimizing the layer loss function includes optimizing a scalar (v*), wherein the projection layer output vector is computed by applying a function to the layer input vector and the optimized scalar.

Zeng, in the same field of endeavor, teaches The training device according to claim 1, wherein optimizing the layer loss function includes optimizing a scalar (v*), wherein the projection layer output vector is computed by applying a function to the layer input vector and the optimized scalar. (See Eqn. 14 [¶0054] "Formulation (14) is a minimax problem where the primal variables {U, V, Z} and dual variable Λ aim at decreasing and increasing" See also Eqn. 19 [¶0057] "and ignoring the constant term independent on {U,V}, the subproblem of equation (15) can be seen as an equivalent to the following Frobenius norm minimization problem:" the dual variable Λ is interpreted as synonymous with the scalar v*.  The optimized scalar representation is interpreted as a synonymous derivation of Eqn. 19 in Zeng.  The Frobenius norm is well known in the art as a cost function.). 

Zeng is directed towards optimized matrix methods for use in computer systems, and Zeng explicitly mentions their application to machine learning. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the loss-function in Aggarwal with the Frobenius norm derivation in Zeng. The combination would have been obvious because a person of ordinary skill in the art would have recognized the advantages described in Zeng ([¶0076] “the ADMM scheme is superior to the ACO scheme in terms of robust subspace estimation performance and computational complexity… In addition to the aforementioned advantages of lp-PCA techniques, a lp-PCA technique implemented in accordance with the concepts herein provides robust direction-of-arrival (DOA) estimation.” The underlying mathematics and motivation used for approximation are considered relevant to any matrix distance approximation.).
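For context on the scalar (v*) recited in claim 8, one way such a scalar can arise is as the Lagrange multiplier of the sum constraint, assuming the claim 17 loss (minus binary entropy regulating term, minus dot-product projection term) and the sum-to-k constraint of claim 1, with x and y the layer input and output vectors and k the summing parameter. This derivation is an illustrative assumption; it is not taken from Zeng and is not asserted to be the derivation in the instant specification.

```latex
\begin{aligned}
&\min_{y \in [0,1]^n} \; -x^\top y + \sum_{i=1}^{n} \big[y_i \ln y_i + (1-y_i)\ln(1-y_i)\big]
  \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i = k, \\
&\mathcal{L}(y, v) = -x^\top y + \sum_{i=1}^{n} \big[y_i \ln y_i + (1-y_i)\ln(1-y_i)\big] + v\Big(\sum_{i=1}^{n} y_i - k\Big), \\
&\frac{\partial \mathcal{L}}{\partial y_i} = -x_i + \ln\frac{y_i}{1-y_i} + v = 0
  \;\Longrightarrow\; y_i = \sigma(x_i - v), \qquad \sigma(z) = \frac{1}{1+e^{-z}}, \\
&v^* \text{ solves } \sum_{i=1}^{n} \sigma(x_i - v^*) = k, \qquad 0 < k < n.
\end{aligned}
```

Because the sum in the last line is strictly decreasing in v, tending to n as v goes to minus infinity and to 0 as v goes to plus infinity, it has exactly one root v* for 0 < k < n. Under this assumption, the output vector is obtained by applying a function (the sigmoid) to the layer input vector and the optimized scalar.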

	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal in view of Zeng, and further in view of Mahent-Shetti (US5367702A).

	Regarding claim 9, the combination of Aggarwal and Zeng teaches The training device according to claim 8.
However, the combination of Aggarwal and Zeng does not explicitly teach wherein upper and/or lower bounds for the scalar are obtained by sorting the layer input vector, selecting a component and adding and/or subtracting a value.

Mahent-Shetti, in the same field of endeavor, teaches The training device according to claim 8, wherein upper and/or lower bounds for the scalar are obtained by sorting the layer input vector, selecting a component and adding and/or subtracting a value. ("The two outputs of the routing logic 156, x and y, are coupled to subtracter 158 and to first and second multiplexers 160 and 162. The carry bit of subtracter 158 is also connected to the multiplexers 160 and 162, which in combination with subtracter 158, sort the outputs, x and y, of routing logic 156 such that the output of multiplexer 160, n, is the minimum value of x and y, and the output of multiplexer 162, m, is the maximum value of x and y." The minimum and maximum values of x and y are interpreted as lower and upper bounds, respectively, for the scalar (v*), and subtracter 158 subtracts a value corresponding to input vector x.).

Mahent-Shetti is directed towards computer methods of approximating nonlinear functions, which is seen as highly relevant to the field of neural networks, where nonlinear cost functions are optimized. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Aggarwal and Zeng with the teachings of Mahent-Shetti by sorting a vector in a neural network to obtain the upper and lower bounds. The combination would have been obvious because a person of ordinary skill in the art would have recognized the speed and cost advantages described in Mahent-Shetti ([¶Summary] “An important technical advantage of the present invention inheres in the fact that it uses adders, subtracters, multiplexers, and shifters to approximate nonlinear functions. The use of these small and fast circuit components increases the speed and reduces the expense in terms of semiconductor area with which nonlinear functions can be calculated.”).
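A minimal Python sketch of one way upper and lower bounds on a scalar v* can be obtained by sorting the layer input vector, selecting a component, and subtracting a value is given below. It assumes, for illustration only and not as the method of the instant specification or of Mahent-Shetti, that v* is the root of sum_i sigmoid(x_i - v) = k for a summing parameter 0 < k < n. Under that assumption, subtracting logit(k/n) from the smallest and largest sorted components brackets v*; these extreme components play a role analogous to the minimum and maximum outputs of Mahent-Shetti's multiplexers 160 and 162. The function names and input values are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def output_sum(x, v):
    # S(v) = sum_i sigmoid(x_i - v), strictly decreasing in v
    return sum(sigmoid(xi - v) for xi in x)

def scalar_bounds(x, k):
    """Return (lower, upper) with S(lower) >= k >= S(upper), assuming 0 < k < len(x)."""
    n = len(x)
    if not 0 < k < n:
        raise ValueError("summing parameter must satisfy 0 < k < n")
    xs = sorted(x)                  # sort the layer input vector
    shift = math.log(k / (n - k))   # logit(k / n), the value subtracted
    # Every x_i - lower >= shift, so each sigmoid >= k/n and S(lower) >= k;
    # every x_i - upper <= shift, so each sigmoid <= k/n and S(upper) <= k.
    return xs[0] - shift, xs[-1] - shift

x = [2.0, 0.0, -1.0, 0.5]   # hypothetical layer input vector
lower, upper = scalar_bounds(x, 2)
```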

	Claims 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal in view of Zeng, and further in view of Beheshti (“On Interval Weighted Three-layer Neural Networks”, 1998).

	Regarding claim 10, the combination of Aggarwal and Zeng teaches The training device according to claim 8.
	However, the combination of Aggarwal and Zeng does not explicitly teach wherein optimizing the scalar is a bisection method.

Beheshti, in the same field of endeavor, teaches The training device according to claim 8, wherein optimizing the scalar is a bisection method. ([p. 3 §IIIB] "Interval Newton/Generalized Bisection Algorithm...To speed up the linear solver to solve the interval linear system (7), an N×N scalar matrix Y, called a preconditioner, is introduced to form a new system" [p. 4 §IIIC] "We use an example below to explain how to apply interval Newton/generalized bisection method to find weights for a given three-layer neural network").

	Aggarwal and Zeng, as well as Beheshti, are directed towards applications of optimizing nonlinear loss functions in neural networks. Therefore, Aggarwal, Zeng, and Beheshti are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Aggarwal and Zeng with the teachings of Beheshti by using a bisection method to optimize a scalar. The bisection method is a well-known iterative algorithm for solving nonlinear systems, and it would have been obvious to one of ordinary skill in the art to use this method in a neural network. Beheshti further reinforces this ([p. 3 §IIB] “The well known classical Newton method can be used to find solutions for some nonlinear systems of equations numerically. The solution found by traditional Newton method largely depends on the initial guess point if it is successful. It does not mathematically guarantee to find all roots within a given domain either. Finite-precision arithmetic may also cause unreliability both mathematically and computationally. To overcome these problems, extensive studies on interval Newton methods [2, 3, 4, 14] have been done.”).

	Regarding claim 18, the combination of Aggarwal and Zeng teaches The training device according to claim 1.
	However, the combination of Aggarwal and Zeng does not explicitly teach wherein optimizing is a bisection method.

Beheshti, in the same field of endeavor, teaches The training device according to claim 1, wherein optimizing is a bisection method. ([p. 3 §IIIB] "Interval Newton/Generalized Bisection Algorithm...To speed up the linear solver to solve the interval linear system (7), an N×N scalar matrix Y, called a preconditioner, is introduced to form a new system" [p. 4 §IIIC] "We use an example below to explain how to apply interval Newton/generalized bisection method to find weights for a given three-layer neural network").

	Aggarwal and Zeng, as well as Beheshti, are directed towards applications of optimizing nonlinear loss functions in neural networks. Therefore, Aggarwal, Zeng, and Beheshti are analogous art in the same field of endeavor. It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teachings of Aggarwal and Zeng with the teachings of Beheshti by using a bisection method to optimize a scalar. The bisection method is a well-known iterative algorithm for solving nonlinear systems, and it would have been obvious to one of ordinary skill in the art to use this method in a neural network. Beheshti further reinforces this ([p. 3 §IIB] “The well known classical Newton method can be used to find solutions for some nonlinear systems of equations numerically. The solution found by traditional Newton method largely depends on the initial guess point if it is successful. It does not mathematically guarantee to find all roots within a given domain either. Finite-precision arithmetic may also cause unreliability both mathematically and computationally. To overcome these problems, extensive studies on interval Newton methods [2, 3, 4, 14] have been done.”).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SB/
Examiner, Art Unit 2124
/LUIS A SITIRICHE/
Primary Examiner, Art Unit 2126