DETAILED ACTION
This action is in response to communications filed on 09/23/2021 in which claims 1-101 were cancelled; claims 113 and 122 have been amended; claims 104-106 and 115-117 have been cancelled; and claims 102-103, 107-114, and 118-123 are still pending. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings were received on 07/15/2020.  These drawings are acceptable.

Priority
The present application, filed on 07/08/2020, is a continuation of Application No. 16/767,966 (filed 05/28/2020), which is a 371 of PCT/US2019/015389 (filed on 01/28/2019). PCT/US2019/015389 claims priority to U.S. Provisional Application No. 62/647,085 (filed on 03/23/2018) and to U.S. Provisional Application No. 62/623,773 (filed on 01/30/2018); these priority claims are acknowledged by the examiner.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 07/15/2020 and 09/30/2020 have been considered by the examiner.
Response to Arguments
Applicant's arguments filed 09/23/2021 have been fully considered.
Regarding the rejections of claims under 35 U.S.C. §§ 102 and 103, the applicant’s arguments have been fully considered; the rejections of the claims have been updated and new art has been applied.
First, the applicant argues that the Mat reference does not teach training a neural network having nodes that have no connection, as recited by the claim limitation “and wherein there is no direct connection from node A to node B in the neural network after the initial training”. The examiner respectfully disagrees.
The standards for analysis of claims under 35 U.S.C. 103 are provided in MPEP 2141. These guidelines are intended to assist Office personnel in making a proper determination of obviousness under 35 U.S.C. 103, and in providing an appropriate supporting rationale in view of the decision by the Supreme Court in KSR International Co. v. Teleflex Inc. (KSR), 550 U.S. 398, 82 USPQ2d 1385 (2007). Under claim interpretation guidelines, claim terms must be examined under the broadest reasonable interpretation (BRI) in light of applicant’s specification, see MPEP 2111.
Examiner notes the scope/interpretation of “no connection” includes setting the initial weight state to zero, as disclosed by Mat paragraphs 0144-0145. The applicant’s specification discloses a neural network having a plurality of hidden layers (0044) that uses feed-forward computations (which treat a zero weight as a non-connected node) and back-propagation computations, and in which nodes are connected when the connection weight is non-zero (0058). The rejection has been updated to note that Mat teaches the sequential training of an ordered neural network, including determining when to change the connection weight state from zero after training a preceding stage of layers having node B feeding a higher-order layer for sequential pattern recognition. Additional references have been brought in to highlight the use of sequential training using back-propagation and feed-forward computations to a particular node layer.
Second, the applicant argues the disclosure in Mat is directed to adding a new node after the initial training using the sum over multiple factors.
Examiner respectfully disagrees. The rejection has been updated to note that node B can be part of the connection that is associated with a zero weight that is updated using backpropagation, in line with the claim language interpretation noted above. The rejection has been updated to add new art that teaches the details of backpropagation as claimed, see the updated rejection below.
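For illustration only, the interpretation noted above (a connection with a zero weight behaving as no connection in a feed-forward computation) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `weighted_input`, the node labels, and the numeric values are hypothetical and are not taken from Mat or from the applicant's specification.

```python
from typing import Dict


def weighted_input(activations: Dict[str, float], weights: Dict[str, float]) -> float:
    """Feed-forward input to a node: sum of weight * activation over its incoming connections."""
    return sum(w * activations[src] for src, w in weights.items())


# Hypothetical activations of two source nodes feeding node B.
activations = {"A": 0.9, "C": 0.4}

# Case 1: node B has an incoming connection from C only (no A -> B connection).
no_connection = {"C": 0.5}

# Case 2: an A -> B connection exists but its weight is zero.
zero_weight = {"C": 0.5, "A": 0.0}

# Both cases yield the same feed-forward input to node B.
same = weighted_input(activations, no_connection) == weighted_input(activations, zero_weight)
```

In this sketch the two cases are indistinguishable in the forward pass; the sketch does not address how the weight is later updated.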
	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 102-103, 107-114, and 118-123 are rejected under 35 U.S.C. 103 as being unpatentable over Matsugu et al. (US Pub. No. 2005/0283450, hereinafter ‘Mat’) in view of Burr (US Pat. No. 11,074,499) and in further view of Rojas (NPL: “The backpropagation algorithm”).

Regarding independent claim 102 limitations, Mat teaches a method of training a neural network, the method comprising:
(in 0144-0145: The receptor field structure of the neuron in each processing modules of the subsequent feature detection layers ((1,1), (1,2), ...) forms a receptor field structure to detect a feature unique to the recognition target pattern by supervised learning (so-called back propagation learning rule), unlike the processing module formed in advance in the feature detection layer (1,0). The size of the local region where feature detection is executed approaches stepwise the size of the entire recognition target toward the final layer so that the middle- or high-order features are detected geometrically…Each feature detection layer has two processing modules formed in advance in the initial state. The receptor field structure of each neuron in each processing module is given at random. Learning in each processing module of the feature detection layers ((1,1), (1,2), ...) is executed sequentially [claimed initial learning in the first set of sequential layers] in ascending order of layer level for each processing module of each layer…)
wherein the neural network comprises a plurality of layers, including an input layer, an output layer, and at least one hidden layer, wherein each layer comprises at least one node, such that the neural network comprises a plurality of nodes, including a node A and a node B and a node C, wherein node A is not in the output layer and node B is not in the input layer, and wherein there is no direct connection from node A to node B in the neural network after the initial training (claimed neural network as depicted in annotated Fig. 6 below, as descending node order, in 0154-0156: To describe the operation of the processing module addition/deletion control circuit 30, the process of learning operation in the processing module in the feature detection layer which detects a block of a significant partial region (e.g., an eye, nose, or mouth in a face image) in an image will be described. A feature detection processing module (A1) to detect an eye will be described. As shown in FIG. 5A, processing modules to detect local features (e.g., patterns in frames B1 to B4 in FIG. 5A) which are estimated as effective in detecting a partial region (supervisory data: eye) are present in the feature detection layer and feature integration layer under a processing module A1,F in advance, as shown in FIG. 6…Each neuron element of the added processing module forms a connection (to be described later) to each neuron element of the upper processing module (A1). This addition can be implemented by changing the value of the connection coefficient from 0 (no connection state) [claimed wherein node A is not in the output layer and node B is not in the input layer having an initial connection state of zero]...)
and there is direct connection from node C to node B in the neural network after the initial training, and wherein the initial training comprises back-propagating estimates of partial derivatives for each of nodes A, B and C with respect to an objective for the neural network, wherein the objective is the same for nodes A, B and C in the initial training; (claimed connection of C to B node in the set of nodes trained in the initial sequential learning process using claimed backpropagation, in 0144: The receptor field structure of the neuron in each processing modules of the subsequent feature detection layers ((1,1), (1,2), ...) forms a receptor field structure to detect a feature unique to the recognition target pattern by supervised learning (so-called back propagation learning rule) [and wherein the initial training comprises back-propagating estimates of partial derivatives for each of nodes A, B and C with respect to an objective for the neural network], unlike the processing module formed in advance in the feature detection layer (1,0). The size of the local region where feature detection is executed approaches stepwise the size of the entire recognition target toward the final layer [including initial step C to B layer preceding node A layer towards final layer as claimed direct connection from node C to node B in the neural network after the initial training] so that the middle- or high-order features are detected geometrically. For example, for detection/recognition of a face [claimed objective for the neural network as perform pattern recognition/classification task], middle-order (or high-order) features are features at the graphic element level such as eyes, nose, and mouth in the face.)
after the initial training, evaluating, by the computer system, whether to add a direct connection from node A in the neural network to node B in the neural network, wherein evaluating whether to add (evaluation by the detection processing module to detect an error amount at node A1,F, which is the claimed node A, in 0154-0155: …the process of learning operation in the processing module in the feature detection layer which detects a block of a significant partial region (e.g., an eye, nose, or mouth in a face image) in an image will be described... Referring to FIG. 6, the processing modules (B1,S, B2,S, B3,S) indicate processing modules arranged in the feature detection layer. [0155] Assume that in the learning process of a local feature detection processing module A1,F the error amount for a training data set is larger than a predetermined threshold value because of detection error…; where training is used to determine whether to connect all processing modules containing the claimed node B to A1,F, based on synapse weight value and error amount, in 0156-0157: Each neuron element of the added processing module forms a connection (to be described later) to each neuron element of the upper processing module (A1). This addition can be implemented by changing the value of the connection coefficient from 0 (no connection state)… When the error amount (or error value change rate) is equal to or smaller than the threshold value, relative evaluation of the degree of contribution for the feature detection performance in the processing module A1,F between the processing modules (B1,S, B2,S, ...) of the feature integration layer is done on the basis of the maximum value of the synapse weight value in the processing module [error terms used in back propagation at state 0 than when added at state 1]…)
wherein computing the value comprises computing, by the computer system, a sum, over a set of training data, of products of multiple factors, wherein the multiple factors comprise, for each item in the set of training data, an activation value for node A and a partial derivative of an error loss function with respect to node B; (claimed computed sum using backpropagation, in 0145: The receptor field structure of the neuron in each processing modules of the subsequent feature detection layers ((1,1), (1,2), ...) forms a receptor field structure to detect a feature unique to the recognition target pattern by supervised learning (so-called back propagation learning rule), unlike the processing module formed in advance in the feature detection layer (1,0). The size of the local region where feature detection is executed approaches stepwise the size of the entire recognition target toward the final layer so that the middle- or high-order features are detected geometrically…)
and adding, by the computer system, the direct connection from node A to node B upon a determination by the computer system that an outcome of estimating the improvement in the objective of the neural network meets a criterion for adding the direct connection. (claimed adding process as determining whether to connect all processing modules containing the claimed node B to A1,F, based on synapse weight value connection state change based on error evaluation, in 0156-0157: Each neuron element of the added processing module forms a connection (to be described later) to each neuron element of the upper processing module (A1). This addition can be implemented by changing the value of the connection coefficient from 0 (no connection state)… When the error amount (or error value change rate) is equal to or smaller than the threshold value, relative evaluation of the degree of contribution for the feature detection performance [claimed determination by the computer system that an outcome of estimating the improvement in the objective of the neural network meets a criterion for adding the direct connection] in the processing module A1,F between the processing modules (B1,S, B2,S, ...) of the feature integration layer is done on the basis of the maximum value of the synapse weight value in the processing module…)
Examiner notes claimed computer system for performing operations in 0222: In the above embodiments, the pattern recognition apparatus is implemented by dedicated hardware. Instead, the above-described processing executed by the pattern recognition apparatus may be prepared in the form of a program, installed in the memory of a computer such as PC (personal computer) or WS (workstation), and executed by the CPU of the computer so that the computer can execute the processing executed by the pattern recognition apparatus described in the above embodiments.; And in 0225: The functions of the above-described embodiments are also implemented when the program codes read out from the recording medium are written in the memory of a function expansion board inserted into the computer or a function expansion unit connected to the computer, and the CPU of the function expansion board or function expansion unit performs part or all of actual processing on the basis of the instructions of the program codes.
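As a hedged illustration of the recited computation (a sum, over a set of training data, of products of an activation value for node A and a partial derivative of the error loss function with respect to node B), a minimal sketch follows. It assumes the partial derivative is taken with respect to the input of node B, in which case the sum equals the gradient of the loss with respect to a would-be A-to-B weight evaluated at zero, and it assumes the criterion is a simple magnitude threshold. The function names, the threshold, and the data are hypothetical and are not drawn from the claims, Mat, Burr, or Rojas.

```python
from typing import Sequence


def connection_value(act_a: Sequence[float], dloss_db: Sequence[float]) -> float:
    """Sum over training items of (activation of node A) * (dLoss / d input of node B)."""
    if len(act_a) != len(dloss_db):
        raise ValueError("one activation and one derivative are needed per training item")
    return sum(a * g for a, g in zip(act_a, dloss_db))


def meets_criterion(value: float, threshold: float) -> bool:
    """Assumed criterion: add the connection when the estimated value is large in magnitude."""
    return abs(value) >= threshold


# Hypothetical per-item quantities for three training items.
activations_a = [1.0, 0.5, 2.0]
derivatives_b = [0.2, -0.4, 0.1]

value = connection_value(activations_a, derivatives_b)
add_connection = meets_criterion(value, threshold=0.1)
```

A magnitude test is used here because a gradient-descent step on the new weight can reduce the loss whether the estimated gradient is positive or negative; other criteria are equally possible.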
While Mat discloses the use of back-propagation as a supervised learning method for training a neural network as sequential layer groups executed in ascending order, Mat does not expressly recite the learning/training as the adjustment of the weights for producing a pattern recognition output for training a neural network in sequential ascending order as training cycles. Burr expressly teaches learning/training as the adjustment of the weights for producing a pattern recognition output for training a neural network in sequential ascending order as training cycles (in 1:20-32: Artificial Neural Networks (ANNs) are distributed computing systems, which consist of a number of neurons interconnected through connection points called synapses. Each synapse encodes the strength of the connection between the output of one neuron and the input of another. The output of each neuron is determined by the aggregate input received from other neurons that are connected to it, and thus by the outputs of these "upstream" connected neurons and the strength of the connections as determined by the synaptic weights. The ANN is trained to solve a specific problem (e.g., pattern recognition) by adjusting the weights of the synapses such that a particular class of inputs produce a desired output. The weight adjustment procedure is known as "learning."…; and in 4:55-64: …where a plurality of training examples are serially input to the ANN while observing its output, where a backpropagation algorithm updates the synaptic weight in response to a difference between the output from a given layer and a desired output from the given layer, the method comprising: (a) pausing training and measuring conductance across analog memory elements in the ANN network, and computing an original synaptic weight for each synapse by arithmetic contributions from two or more conductance pairs;…)
While Mat and Burr teach the use of backpropagation in training neural networks and updating the synaptic connections, the references do not recite the inherent computations of backpropagation. Rojas expressly teaches the inherent computations of backpropagation (use of activation functions for computing an activation value for nodes in the neural network, in Sec. 7.1 & Sec. 7.1.1: …In this chapter we discuss a popular learning method capable of handling such large learning problems - the backpropagation algorithm… One of the more popular activation functions for backpropagation networks is the sigmoid… Many other kinds of activation functions have been proposed and the backpropagation algorithm is applicable to all of them...; and computing the error for updating the weights using an iterative process using partial derivatives, in Pg. 155 and Pgs. 164-166:

[Image: media_image1.png, excerpt from Rojas]
…
[Image: media_image2.png, excerpt from Rojas]
…)
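For readers without access to the cited excerpts, the general form of the backpropagation computation (a feed-forward pass followed by a backward pass that forms partial derivatives of the error with respect to each weight) can be sketched as below. This is a minimal sketch, not a reproduction of Rojas's text or figures: it assumes a single hidden layer, sigmoid activations, a single output node, and a squared-error loss, and all function names and values are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def forward(x: List[float], w1: List[List[float]], w2: List[float]) -> Tuple[List[float], float]:
    """Feed-forward pass: hidden activations, then the single output activation."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    out = sigmoid(sum(w * h for w, h in zip(w2, hidden)))
    return hidden, out


def loss(x: List[float], t: float, w1: List[List[float]], w2: List[float]) -> float:
    """Squared-error loss for one training item (an assumption of this sketch)."""
    _, out = forward(x, w1, w2)
    return 0.5 * (out - t) ** 2


def backprop(x: List[float], t: float, w1: List[List[float]], w2: List[float]):
    """Backward pass: partial derivatives of the loss with respect to every weight."""
    hidden, out = forward(x, w1, w2)
    # Local gradient at the output node: error times the sigmoid derivative.
    delta_out = (out - t) * out * (1.0 - out)
    grad_w2 = [delta_out * h for h in hidden]
    # Local gradient at each hidden node: back-propagated error times the sigmoid derivative.
    delta_hidden = [h * (1.0 - h) * w * delta_out for h, w in zip(hidden, w2)]
    grad_w1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w1, grad_w2


# Hypothetical inputs, target, and weights.
x, t = [0.5, -1.0], 1.0
w1 = [[0.1, 0.2], [-0.3, 0.4]]
w2 = [0.7, -0.6]
grad_w1, grad_w2 = backprop(x, t, w1, w2)
```

Each weight gradient is the product of the activation feeding the connection and the local gradient of the node receiving it, which is the same product form recited in the claim limitation on computing the value.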

The Mat, Burr, and Rojas references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information processing in artificial neural network systems using back-propagation learning.

One of ordinary skill in the art would have been motivated to combine the disclosed methods of Burr and Mat in order to discover patterns in the artificial network's synaptic weights using backpropagation algorithms for incremental learning (Burr, 1:34-38). Doing so leads to a pattern of synaptic weights that, during the learning process, converges toward an optimal solution of the given problem (Burr, 1:34-38).

In addition, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for updating connection weights using backpropagation as disclosed by Rojas with the method of sequentially training artificial neural network systems using back-propagation learning as collectively disclosed by Mat and Burr.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Rojas, Burr, and Mat in order to facilitate the computational effort needed for finding the correct combination of weights when more parameters and more complicated topologies are considered (Rojas, Abstract). Doing so allows the use of a popular learning method capable of handling such large learning problems, the backpropagation algorithm (Rojas, Sec. 7, 1st para.); the algorithm can be efficiently implemented in computing systems in which only local information can be transported through the network (Rojas, Sec. 7, 2nd para.).

	

Regarding claim 103, the rejection of claim 102 is incorporated and Mat in combination with Burr and Rojas further teaches the method of claim 102,
further comprising, in a subsequent training, training, by the computer system, the neural network after adding the direct connection from node A to node B. (learning after connection as a sequential process to include layers after the node A layer towards the final layer, in 0144: The receptor field structure of the neuron in each processing modules of the subsequent feature detection layers ((1,1), (1,2), ...) forms a receptor field structure to detect a feature unique to the recognition target pattern by supervised learning (so-called back propagation learning rule), unlike the processing module formed in advance in the feature detection layer (1,0). The size of the local region where feature detection is executed approaches stepwise the size of the entire recognition target toward the final layer so that the middle- or high-order features are detected geometrically; and the learning of an upper detection layer, in 0145: …Learning in each processing module of the feature detection layers ((1,1), (1,2), ...) is executed sequentially in ascending order of layer level for each processing module of each layer. Connection between the lower layer (feature integration layer) and each neuron belonging to the processing module which has finished learning is corrected within a predetermined range later by learning in a processing module belonging to an upper feature detection layer…)

Regarding claim 107, the rejection of claim 103 is incorporated and Mat in combination with Burr and Rojas further teaches the method of claim 103,
wherein the multiple factors further comprise a first hyper parameter. (δ in 0162: In addition, δ represents the local gradient. In an output layer L, δ is given by 
[Image: media_image3.png, equation for δ in the output layer]
…)

Regarding claim 108, the rejection of claim 107 is incorporated and Mat in combination with Burr and Rojas further teaches the method of claim 107,
further comprising determining, by a computer-implemented learning coach, the first hyper parameter. (learned as the local gradient during the learning process, in 0161-0162: The updating formula related to a synapse connection used in the learning process is given by…the coefficient of inertia of learning, and ƞji is the learning coefficient of connection from the ith neuron to the jth neuron. In addition, δ represents the local gradient; and in 0170-0171: As a result of supervised learning in the processing module A1, the receptor field structure of the neuron to give the minimum error value in the newly added processing module (B) generally corresponds to an unlearned local feature class contained in the pattern to be detected … In this way, the receptor field structure related to detection of an unlearned feature class as the constituent element of the composite local feature is automatically formed by learning. When the receptor field structure is copied to the receptor field (synapse connection weight data) of each neuron in the same processing module, the newly added processing module (B) is formed; and [learning coach executing back-propagation], in 0174: In supervised learning (back propagation learning) executed next, the learning coefficient related to connection from the modules…; and in 0144: The receptor field structure of the neuron in each processing modules of the subsequent feature detection layers ((1,1), (1,2), ...) forms a receptor field structure to detect a feature unique to the recognition target pattern by supervised learning (so-called back propagation learning rule),…)


Regarding claim 109, Mat in combination with Burr and Rojas further teaches the claim 109 limitation,
wherein the first hyper parameter comprises a data influence weight for each item in the set of training data. (δ for computing the sum of products for the set of k training data for each j neuron, in 0163: by using a differential coefficient φ' related to a neuron internal state v (corresponding to the result of the sum of the products of the neuron output of the preceding layer) of an activation function φ (typically, a logistic function is used) and an error value between the output signal and the supervisor signal. In the intermediate layer (lth layer), δ is given by
[Image: media_image4.png, equation for δ in the intermediate layer]
)

Regarding claim 110, the rejection of claim 107 is incorporated. Mat in combination with Burr and Rojas further teaches the claim 110 limitation,
wherein computing the value of adding the direct connection from node A to node B further comprises adding, by the computer system, a second hyper parameter to the sum. (as second parameter at k > 1, in Figure 6 and in 0163: by using a differential coefficient φ' related to a neuron internal state v (corresponding to the result of the sum of the products of the neuron output of the preceding layer [hyper parameter to the sum]) of an activation function φ (typically, a logistic function is used) and an error value between the output signal and the supervisor signal. In the intermediate layer (lth layer), δ is given by
[Image: media_image4.png, equation for δ in the intermediate layer]
)


Regarding claim 111, Mat in combination with Burr and Rojas further teaches the claim 111 limitation,
wherein the set of training data comprises a batch of training data (in 0118: as supervisory data, the data of a local feature contained in a recognition object as a constituent element) for the neural network. (neural network containing node A, in 0155: Assume that in the learning process of a local feature detection processing module A1,F, the error amount for a training data set is larger than a predetermined threshold value because of detection error…; and in 0173)

Regarding claim 112, the rejection of claim 102 is incorporated. Mat in combination with Burr and Rojas further teaches claim 112 limitation,
wherein the neural network comprises a self-organizing partially ordered network. (in 0193: As in the first embodiment, after supervised learning converges, … After that, self-organizing learning to be described later is executed by a learning control circuit 40. The self-organizing learning promotes detection of a feature class which is present independently of a feature class to be detected by processing modules which have finished learning.)

Regarding independent claim 113 limitations, Mat in combination with Burr and Rojas teaches a computer system comprising:
one or more processor cores; and a memory in communication with the one or more processor cores, wherein the memory stores software that, when executed by the one or more processor cores, cause the one or more processor cores to: (in 0222: the program codes of software to implement the functions of the above-described embodiments to a system or apparatus and causing the computer ( or CPU or MPU) of the system or apparatus to read out and execute the program codes stored in the recording medium)
the additional claim limitations are similar to claim 102 limitations and are rejected under the same rationale.

Regarding claim 114, the rejection of claim 113 is incorporated and Mat in combination with Burr and Rojas teaches the memory stores further software that, when executed by the one or more processors, cause the one or more processors to, in [0222]:
the additional claim limitations are similar to claim 103 limitations, respectively, and are rejected under the same rationale.
	
Regarding claims 118-123, the rejections of the respective claims from which they depend are incorporated. Mat in combination with Burr and Rojas teaches the memory stores further software that, when executed by the one or more processors, cause the one or more processors to, in [0222]:
the additional claim limitations are similar to claims 107-112 limitations, respectively, and are rejected under the same rationale.


Conclusion
The prior art made of record and not relied upon, but considered pertinent to applicant's disclosure, is listed below:
Petrowski, A. "Choosing among several parallel implementations of the backpropagation algorithm.": teaches that feed-forward neural networks are partially ordered networks in which the layers are indexed in a specified order.
Baker (US Pub. No. 2008/0069437): teaches pattern recognition with a plurality of models within a classifier and the use of a model to determine labels associated with a plurality of links.
Chickering et al. (US Pub. No. 2005/0131848): teaches the use of learning techniques for determining link values between node links in Bayesian networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached on Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-






/O.O.A./Examiner, Art Unit 2126   
                                                                                                                                                                                                                                                                                                                                                                                                       
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129