DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Status
Claims 1-17 are pending in this application.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy of Australian application No. AU2016277542, filed on 12/19/2016, has not been filed.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that use the word “means” and are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) couple the word “means” with functional language without reciting sufficient structure to perform the recited function. Such claim limitations are: “means for determining an activation value”, “means for scaling the determined activation values”, and “means for updating weights” in claim 16.

   • “means for determining…” is sufficiently described in [0045] and Fig. 3.
   • “means for scaling the determined activation values…” is sufficiently described in [0116] “The scaled activation value is the product of the node's activation value and the node's assigned scaling factor”.
   • “means for updating weights…” is sufficiently described in [0097] “The edge weights are updated or adjusted using the derivative of the sparsity penalty values determined in the step 504 with respect to the edge weights of the network 600.”; [0043] “The sparsity penalty value may also be referred to as a regularisation cost, a cost function, a loss function, or an objective function. If the node responds sparsely to the training data instances, the sparsity penalty is low. If the node does not respond sparsely to the training data instances, the sparsity penalty is high.”; [0075] “The set of derivative values is determined from the combination of the derivatives of an error value determined by the difference between the activation value of the output nodes and the target for each training instance or training example, and the derivatives of one or more sparsity penalty values determined from the distribution of activation values of nodes in the network.”
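If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.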

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because the claim recites a “computer readable medium.” The broadest reasonable interpretation of a claim drawn to a computer readable medium (also called machine readable medium and other such variations) typically covers forms of non-transitory tangible medium and transitory propagating signals per se in view of the ordinary and customary meaning of computer readable medium. In an effort to assist applicant in overcoming a rejection or potential rejection under 35 U.S.C. 101 in this situation, the USPTO suggests the following approach. A claim drawn to such a computer readable medium that covers both transitory and non-transitory embodiments may be amended to narrow the claim to cover only statutory embodiments to avoid a rejection under 35 U.S.C. 101 by adding the limitation “non-transitory” to the claim, e.g., “non-transitory computer readable medium.”

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 8, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Tadesse et al. (US 2017/0032247 A1, hereinafter Tadesse) in view of Swift et al. (US 2009/0138420 A1, hereinafter Swift).

Regarding claim 1, Tadesse teaches: A method of training an artificial neural network ([0054] e.g., “A DCN may be trained.” [0023] e.g., “FIG. 3B is a block diagram illustrating an exemplary deep convolutional network (DCN) in accordance with aspects of the present disclosure.” Examiner note: the examiner maps a deep convolutional network (DCN) in Tadesse to an artificial neural network in the application.), the method comprising: 
scaling the determined activation values for each of a plurality of the nodes in a portion of the artificial neural network ([0069] e.g., “preprocess 514 the image by scaling”; [0081] e.g., “the scale factor 838 is set to the value”. [0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application, “values” in Tadesse to “activation values” in the application, and “deep convolutional networks” in Tadesse to “artificial neural network” in the application.),
updating weights ([0065] e.g., “Between each layer of the deep convolutional network 350 are weights (not shown) that are to be updated. The output of each layer may serve as an input of a succeeding layer in the deep convolutional network 350 to learn hierarchical feature representations from input data”) associated with each of the plurality of nodes in the portion of the artificial neural network using the determined scaled activation values to train the neural network ([0063] e.g., “The normalization layer may be used to normalize the output of the convolution filters.” [0054] e.g., “A DCN may be trained”. Examiner note: the examiner considers normalized values to be scaled values, and maps “values” in Tadesse to “activation values” in the application and “deep convolutional networks” in Tadesse to “artificial neural network” in the application.).
	However, Tadesse does not explicitly teach: determining an activation value for each node in a set of nodes of the artificial neural network, the activation values being determined by applying training data to the artificial neural network; 
each scaled activation value determined using a scaling factor associated with a corresponding one of the plurality of nodes, each scaling factor being determined based on a rank of the corresponding node.
	Swift teaches: determining an activation value for each node in a set of nodes of the artificial neural network, the activation values being determined by applying training data to the artificial neural network ([0053] e.g., “In the above example in which ten inputs define the input vector, the artificial neural network would correspondingly have ten input nodes with each node receiving a respective input.” [0021] e.g., “the artificial neural network may be trained with a plurality of pre-defined inputs and pre-defined outputs.”); 
each scaled activation value determined using a scaling factor associated with a corresponding one of the plurality of nodes, each scaling factor being determined based on a rank of the corresponding node ([0055] e.g., “A scaling factor is then determined with the scaling factor being representative of the manner in which the output of a respective node must be increased or decreased in order to match the anticipated output.” Examiner note: the examiner maps “output” in Swift to “scaled activation value” in the application because an output from a node is produced after applying a scaling factor to a value (or activation value); a scaled value (or scaled activation value) is generated by applying a scaling factor to a value.).
	Tadesse and Swift are analogous art because they are in the same field of endeavor, namely neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse to incorporate the method for determining scale factors of Swift. The motivation/suggestion for doing this would be for the purpose of reducing the error between the actual and anticipated outputs (Swift [0055] e.g., “In an effort to reduce the error between the actual and anticipated outputs, the method of one embodiment employs back-propagation to initially examine the output nodes 38 and to determine what the output of each node should have been in order to have generated the anticipated output.”).
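
For illustration only, the scaling limitation of claim 1 can be sketched as follows. This is a minimal sketch and not the applicant's or any cited reference's implementation. It follows the description quoted above from [0116] of the specification (the scaled activation value is the product of the node's activation value and its assigned scaling factor). The function names rank_nodes and scale_by_rank, the ranking convention (rank 0 for the highest activation), and the linear factor 1.0 + rank are all hypothetical. The weight-update step is not shown.

```python
def rank_nodes(activations):
    """Return one rank per node: 0 for the highest activation, 1 for the next, and so on.

    The ranking convention is an assumption made for this sketch.
    """
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    ranks = [0] * len(activations)
    for rank, node in enumerate(order):
        ranks[node] = rank
    return ranks


def scale_by_rank(activations, factor_for_rank):
    """Multiply each activation by a scaling factor chosen from the node's rank."""
    ranks = rank_nodes(activations)
    return [a * factor_for_rank(r) for a, r in zip(activations, ranks)]


# Hypothetical activation values for three nodes.
activations = [0.25, 1.0, 0.5]

# Hypothetical rank-to-factor mapping: the factor grows linearly with rank.
scaled = scale_by_rank(activations, lambda rank: 1.0 + rank)
print(scaled)  # [0.75, 1.0, 1.0]
```

In this sketch the node with the highest activation (rank 0) keeps its value, while lower-activation nodes are multiplied by larger factors.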

Regarding claim 8, Tadesse in view of Swift teaches: The method according to claim 1.
Tadesse further teaches: … the plurality of the nodes... ([0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application.).

	Swift teaches: wherein the scaling factor associated with one of the plurality of the nodes is repeated in at least one scaling factor associated with another node in the plurality of the nodes ([0055] e.g., “The foregoing process relating to the determination of a scaling factor for each node is then repeated at each level followed by the assignment of blame for a respective local error to corresponding nodes of a previous level”).  
	Tadesse and Swift are analogous art because they are in the same field of endeavor, namely neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse to incorporate the method for determining scale factors of Swift. The motivation/suggestion for doing this would be for the purpose of reducing the error between the actual and anticipated outputs (Swift [0055] e.g., “In an effort to reduce the error between the actual and anticipated outputs, the method of one embodiment employs back-propagation to initially examine the output nodes 38 and to determine what the output of each should have been in order to have generated the anticipated output.”).

Regarding claim 15, Tadesse teaches: A computer readable medium having a computer program stored thereon for training an artificial neural network ([0110] e.g., “a computer-readable medium having instructions stored” [0054] e.g., “A DCN may be trained” Examiner note: the examiner maps deep convolutional network (DCN) in Tadesse to an artificial neural network in the application.). The remaining limitations of claim 15 correspond to those of claim 1 and are similarly analyzed.

Regarding claim 16, Tadesse teaches: An apparatus for training an artificial neural network ([0092] e.g., “any apparatus configured to perform the functions recited by the aforementioned means”; [0054] e.g., “A DCN may be trained” Examiner note: the examiner maps deep convolutional network (DCN) in Tadesse to an artificial neural network in the application.). The remaining limitations of claim 16 correspond to those of claim 1 and are similarly analyzed.

Regarding claim 17, Tadesse teaches: A system for training an artificial neural network ([0054] e.g., “A DCN may be trained” Examiner note: the examiner maps deep convolutional network (DCN) in Tadesse to an artificial neural network in the application.), the system comprising: 
a memory for storing data and a computer program; a processor coupled to the memory for executing said computer program, said computer program comprising instructions ([0014] e.g., “a memory and at least one processor coupled to the memory” [0110] e.g., “computer program product may comprise a computer-readable medium having instructions”). The remaining limitations of claim 17 correspond to those of claim 1 and are similarly analyzed.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Tadesse in view of Swift, further in view of Madsen et al. (US 2018/0279939 A1, hereinafter Madsen).

Regarding claim 2, Tadesse in view of Swift teaches: The method according to claim 1.
Tadesse further teaches: … the plurality of the nodes... activation value of the node ([0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application and “values” in Tadesse to “activation values” in the application.).
	However, Tadesse in view of Swift does not explicitly teach: wherein the rank of each of the plurality of the nodes is determined according to the activation value of the node.
	Madsen teaches: wherein the rank of each of the plurality of the nodes is determined according to the activation value of the node ([0042] e.g., “the causal nodes can be color-coded to indicate rank ordering of the summed value to distinguish the most influential causal nodes from the least influential nodes”; [0042] e.g., “normalized by its maximum value to scale the values between zero and one”).
	Tadesse in view of Swift and Madsen are analogous art because they are directed to the method of calculating node data. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to (Madsen [0042] e.g., “the computed causal connectivity can be visualized on a schematic map of grid locations”).

Claims 3, 5, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Tadesse in view of Swift, further in view of Wendker et al. (US 2015/0347530 A1, hereinafter Wendker).

Regarding claim 3, Tadesse in view of Swift teaches: The method according to claim 1.
	Tadesse further teaches: the plurality of the nodes... activation value … ([0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application and “values” in Tadesse to “activation values” in the application.).
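	However, Tadesse in view of Swift does not explicitly teach: wherein the plurality of the nodes are ranked from a highest activation value to a lowest activation value, the node having the highest activation value being assigned lowest rank, and the node having the lowest activation value being assigned highest rank.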
	Wendker teaches: wherein the plurality of the nodes are ranked from a highest activation value to a lowest activation value, the node having the highest activation value being assigned lowest rank, and the node having the lowest activation value being assigned highest rank ([0023] e.g., “the values can be ranked from smallest to largest with the smallest being designated as the highest ranking”).
	Tadesse in view of Swift and Wendker are analogous art because they are directed to the method of sorting. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method for sorting list of results of Wendker. The motivation/suggestion for doing this would be for the purpose of sorting conversion results (Wendker [0023] e.g., “The sorted conversion results region 18 can show a sorted list of conversion results for a given input value”).

Regarding claim 5, Tadesse in view of Swift teaches: The method according to claim 1. 
Tadesse further teaches: wherein nodes with activation values between the highest activation value and the lowest activation value have scaling factors between the highest scaling factor and the lowest scaling factor ([0084] e.g., “The sort function 842 sorts the normalized scores”; [0086] e.g., “a value corresponding to a maximum F-score”; [0082] e.g., “defining minimum and maximum scale factors”; [0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application and “values” in Tadesse to “activation values” in the application.).
	However, Tadesse in view of Swift does not explicitly teach: wherein the plurality of the nodes are ranked from a highest activation value to a lowest activation value, the node having the highest activation value being assigned lowest rank, and the node having the lowest activation value being assigned highest rank.
	Wendker teaches: wherein the plurality of the nodes are ranked from a highest activation value to a lowest activation value, the node having the highest activation value being assigned lowest rank, and the node having the lowest activation value being assigned highest rank ([0023] e.g., “the values can be ranked from smallest to largest with the smallest being designated as the highest ranking”).
	Tadesse in view of Swift and Wendker are analogous art because they are directed to the method of sorting. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method for sorting list of results of Wendker. The motivation/suggestion for doing this would be for the purpose of sorting conversion results (Wendker [0023] e.g., “The sorted conversion results region 18 can show a sorted list of conversion results for a given input value”).

Regarding claim 13, Tadesse in view of Swift teaches: The method according to claim 1.
Tadesse further teaches: … the activation values of the plurality of the nodes in the portion of the artificial neural network… ([0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application, “values” in Tadesse to “activation values” in the application, and “deep convolutional networks” in Tadesse to “artificial neural network” in the application.).
	However, Tadesse in view of Swift does not explicitly teach: further comprising sorting the activation values of the plurality of the nodes in the portion of the artificial neural network in order of decreasing value, and determining the rank of each node in an incrementing sequence.
	Wendker teaches: further comprising sorting the activation values of the plurality of the nodes in the portion of the artificial neural network in order of decreasing value, and determining the rank of each node in an incrementing sequence ([0024] e.g., “the sorting calculates a score for each value in the conversion results that is used to rank the conversion results”; [0023] e.g., “the values can be ranked from smallest to largest with the smallest being designated as the highest ranking”).
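	Tadesse in view of Swift and Wendker are analogous art because they are directed to the method of sorting. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method for sorting list of results of Wendker. The motivation/suggestion for doing this would be for the purpose of sorting conversion results (Wendker [0023] e.g., “The sorted conversion results region 18 can show a sorted list of conversion results for a given input value”).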

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Tadesse in view of Swift, further in view of Wendker and Walker (US 8743234 B1, hereinafter Walker).

Regarding claim 4, Tadesse in view of Swift teaches: The method according to claim 1.
Tadesse further teaches: the plurality of the nodes... activation value… ([0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application and “values” in Tadesse to “activation values” in the application.).

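	However, Tadesse in view of Swift does not explicitly teach: wherein the plurality of the nodes are ranked from a highest activation value to a lowest activation value, the node having the highest activation value being assigned lowest rank, and the node having the lowest activation value being assigned highest rank, and wherein the scaling factor for the node with lowest rank is a lowest scaling factor, and the scaling factor for the node with highest rank is a highest scaling factor.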
Wendker teaches: wherein the plurality of the nodes are ranked from a highest activation value to a lowest activation value, the node having the highest activation value being assigned lowest rank, and the node having the lowest activation value being assigned highest rank ([0023] e.g., “the values can be ranked from smallest to largest with the smallest being designated as the highest ranking”).
	Tadesse in view of Swift and Wendker are analogous art because they are directed to the method of sorting. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method for sorting list of results of Wendker. The motivation/suggestion for doing this would be for the purpose of sorting conversion results (Wendker [0023] e.g., “The sorted conversion results region 18 can show a sorted list of conversion results for a given input value”).
	Walker teaches: wherein the scaling factor for the node with lowest rank is a lowest scaling factor, and the scaling factor for the node with highest rank is a highest scaling factor (Col. 1, ll. 51-56 e.g., “progressively scaling the pixel values inversely according to their magnitudes such that pixel values with smaller magnitudes are scaled with larger scale factors and pixel values with larger magnitudes are scaled according to smaller scale factors”).  
	Tadesse in view of Swift and Walker are analogous art because they are directed to the method of scaling data. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method of quantization of Walker. The motivation/suggestion for doing this would be for the purpose of quantizing scaled input values (Walker, Abstract e.g., “The technique includes the acts of scaling the pixel values according to their values, wherein smaller pixel values are scaled with larger scale factors and larger pixel values are scaled with smaller scale factors; quantizing the scaled pixel values; looking up the quantized pixel values in a lookup table to provide lookup values; and scaling the lookup values responsive to their scale factors to provide the gamma-corrected pixel values.”).
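
For illustration only, the inverse relationship quoted from Walker (smaller magnitudes scaled with larger scale factors, larger magnitudes with smaller scale factors) can be sketched as follows. This is a hedged sketch, not Walker's disclosed algorithm: the function name inverse_scale, the power-of-two factors, and the ceiling of 256 are assumptions chosen only to make the relationship concrete.

```python
def inverse_scale(value, ceiling=256):
    """Pick the largest power-of-two factor that keeps value * factor below `ceiling`.

    Smaller values therefore receive larger factors. The power-of-two choice and
    the ceiling are hypothetical and are not taken from any cited reference.
    """
    factor = 1
    # Keep doubling while the next doubling would still stay below the ceiling.
    while value > 0 and value * factor * 2 < ceiling:
        factor *= 2
    return factor


for value in (3, 100, 200):
    factor = inverse_scale(value)
    print(value, factor, value * factor)
# 3 64 192
# 100 2 200
# 200 1 200
```

The factor is non-increasing as the input value grows, which is the property relied upon in the mapping above.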

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Tadesse in view of Swift, further in view of Chen et al. (US 7921106 B2, hereinafter Chen).

Regarding claim 6, Tadesse in view of Swift teaches: The method according to claim 1.
	However, Tadesse in view of Swift does not explicitly teach: wherein the scaling factor varies monotonically according to the rank of the associated node.  
	Chen teaches: wherein the scaling factor varies monotonically according to the rank of the associated node (col. 5, l. 55 – col. 6, l. 8, e.g., “The algorithm for calculating the attribute value rank is referred to as “object ranking,” which means that each attribute value can be treated as an object and, thus, the rank of this object can be calculated. One object ranking algorithm that can be utilized for attribute value ranking is Eq. 1 where:

    [Equation image media_image1.png: Eq. 1 of Chen, not reproduced in this text]

where R1, . . . Rk are dynamic ranks of results which have an attribute value “attr.” The f(Sattr) can be any combination function. For example:

    [Equation image media_image2.png: Chen's example combination function f(Sattr), not reproduced in this text]

where c is a constant float number (e.g., scaling factor) that can be varied to emphasize and/or de-emphasize a ranking value.”).  
	Tadesse in view of Swift and Chen are analogous art because they are directed to the method of data scaling. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method of ranking search results of Chen. The motivation/suggestion for doing this would be for the purpose of utilizing multiple bases of relevancy (Chen, col. 1, ll. 49-51, e.g., “Search results are ranked utilizing multiple bases of relevancy. This allows search result lists to be further refined into relevant groupings.”).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Tadesse in view of Swift, further in view of Walker.

Regarding claim 7, Tadesse in view of Swift teaches: The method according to claim 1.
Tadesse further teaches: the plurality of the nodes ([0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application.).
	However, Tadesse in view of Swift does not explicitly teach: wherein the scaling factor associated with one of the plurality of nodes has a different value to the scaling factors associated with other nodes in the plurality of nodes.  
	Walker teaches: wherein the scaling factor associated with one of the plurality of nodes has a different value to the scaling factors associated with other nodes in the plurality of nodes (Col. 1, ll. 51-56 e.g., “pixel values inversely according to their magnitudes such that pixel values with smaller magnitudes are scaled with larger scale factors and pixel values with larger magnitudes are scaled according to smaller scale factors”).  
	Tadesse in view of Swift and Walker are analogous art because they are directed to the method of scaling data. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method of quantization of Walker. The motivation/suggestion for doing this would be for the purpose of quantizing scaled input values (Walker, Abstract e.g., “The technique includes the acts of scaling the pixel values according to their values, wherein smaller pixel values are scaled with larger scale factors and larger pixel values are scaled with smaller scale factors; quantizing the scaled pixel values; looking up the quantized pixel values in a lookup table to provide lookup values; and scaling the lookup values responsive to their scale factors to provide the gamma-corrected pixel values.”).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Tadesse in view of Swift, further in view of Bao-Liang et al. (US 2003/0050719 A1, hereinafter Bao-Liang).

Regarding claim 9, Tadesse in view of Swift teaches: The method according to claim 1.
	However, Tadesse in view of Swift does not explicitly teach: wherein the training data is additional training data and training the artificial neural network relates to incremental learning by the artificial neural network.
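	Bao-Liang teaches: wherein the training data is additional training data and training the artificial neural network relates to incremental learning by the artificial neural network ([0017] e.g., “the new training data is added”; [0017] e.g., “a pattern classifier capable of incremental learning”; [0044] e.g., “neural network is used”).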
	Tadesse in view of Swift and Bao-Liang are analogous art because they are in the same field of endeavor of neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method of incremental learning of Bao-Liang. The motivation/suggestion for doing this would be for the purpose of being capable of incremental learning (Bao-Liang [0017] e.g., “a pattern classifier capable of incremental learning according to the present invention wherein a multiclass classification problem is divided into two-class classification subproblems”).

Claim(s) 10-11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tadesse in view of Swift, further in view of Majumdar et al. (US 2017/0061328 A1, hereinafter Majumdar).

Regarding claim 10, Tadesse in view of Swift teaches: The method according to claim 1.
Tadesse further teaches: … the activation values of the plurality of the nodes ([0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application and “values” in Tadesse to “activation values” in the application.).
	However, Tadesse in view of Swift does not explicitly teach: further comprising determining a sparsity penalty value based on relative ranking of the activation values of the plurality of the nodes.
	Majumdar teaches: further comprising determining a sparsity penalty value based on relative ranking of the activation values of the plurality of the nodes ([0065] e.g., "This error is the penalty term and one goal is to reduce the error to zero. In accordance with aspects of the present disclosure, a second penalty term may be added. The second penalty term may comprise a norm of the activations of the layer for which sparsity is desired." [0072] e.g., "In a second example, the element values may be quantized. In this example, all “surviving quantities” or the K highest values may be encoded with a 1 and all others with a 0. For instance, if the desired sparsity is 80% and the vector size is 10, then 8 of the lowest quantities may be set to 0 and the two surviving quantities (e.g. highest element values) to 1.").
	Tadesse in view of Swift and Majumdar are analogous art because they are in the same field of endeavor of machine learning. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method of generating a sparse feature vector of Majumdar. The motivation/suggestion for doing this would be for the purpose of minimizing the number of non-zero elements in the feature vector (Majumdar, [0065] e.g., “Because the goal is to minimize the number of non-zero elements in the feature vector, this second penalty term may in some aspects, comprise a count of the number of non-zero terms in that layer.”).
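For illustration only, the two passages of Majumdar quoted for claim 10 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Majumdar's implementation: the function names `top_k_binary` and `l1_penalty`, the choice of the L1 norm as the "norm of the activations", and the sample values are all hypothetical.

```python
def top_k_binary(values, sparsity):
    """Encode the K highest values as 1 and all others as 0.

    `sparsity` is the fraction of entries forced to 0 (0.8 means 80%).
    """
    n = len(values)
    k = n - round(n * sparsity)  # number of surviving entries
    # Indices of the k largest values; ties are broken by position.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(n)]


def l1_penalty(activations):
    """Assumed penalty: the L1 norm (sum of absolute activations)."""
    return sum(abs(a) for a in activations)


# Hypothetical activation vector of size 10, desired sparsity of 80%.
activations = [0.1, 0.9, 0.3, 0.05, 0.7, 0.2, 0.0, 0.4, 0.15, 0.6]
code = top_k_binary(activations, sparsity=0.8)
penalty = l1_penalty(activations)
```

With a vector of size 10 and a sparsity of 80%, the eight lowest entries are set to 0 and the two highest entries (here 0.9 and 0.7) are set to 1, which matches the arithmetic in the quoted passage of [0072].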

Regarding claim 11, Tadesse in view of Swift teaches: The method according to claim 1.
	However, Tadesse in view of Swift does not explicitly teach: further comprising determining a sparsity penalty value using the scaled activation values, and updating the weights relates to the sparsity penalty value.
	Majumdar teaches: further comprising determining a sparsity penalty value using the scaled activation values, and updating the weights relates to the sparsity penalty value ([0055] e.g., “The SceneDetect Backend Engine 512 may be configured to further preprocess 514 the image by scaling 516 and cropping 518.” [0065] e.g., “In a DCN, training progresses by making the weight updates as a function of the error between the predicted label and the actual label. This error is the penalty term and one goal is to reduce the error to zero. In accordance with aspects of the present disclosure, a second penalty term may be added. The second penalty term may comprise a norm of the activations of the layer for which sparsity is desired.”).
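	Tadesse in view of Swift and Majumdar are analogous art because they are in the same field of endeavor of machine learning. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method of generating a sparse feature vector of Majumdar. The motivation/suggestion for doing this would be for the purpose of minimizing the number of non-zero elements in the feature vector (Majumdar, [0065] e.g., “Because the goal is to minimize the number of non-zero elements in the feature vector, this second penalty term may in some aspects, comprise a count of the number of non-zero terms in that layer.”).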

Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tadesse in view of Swift, further in view of Majumdar and Staelin et al. (US 2004/0260662 A1, hereinafter Staelin).

Regarding claim 12, Tadesse in view of Swift teaches: The method according to claim 1.
	However, Tadesse in view of Swift does not explicitly teach: further comprising determining a sparsity penalty value using the scaled activation values, determining a derivative of the sparsity penalty value and updating the weights based on the determined derivative.
	Majumdar teaches: further comprising determining a sparsity penalty value using the scaled activation values ([0055] e.g., “The SceneDetect Backend Engine 512 may be configured to further preprocess 514 the image by scaling 516 and cropping 518.” [0065] e.g., “This error is the penalty term and one goal is to reduce the error to zero. In accordance with aspects of the present disclosure, a second penalty term may be added. The second penalty term may comprise a norm of the activations of the layer for which sparsity is desired.”).
	Tadesse in view of Swift and Majumdar are analogous art because they are in the same field of endeavor of machine learning. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method of generating a sparse feature vector of Majumdar. The motivation/suggestion for doing this would be for the purpose of minimizing the number of non-zero elements in the feature vector (Majumdar, [0065] e.g., “Because the goal is to minimize the number of non-zero elements in the feature vector, this second penalty term may in some aspects, comprise a count of the number of non-zero terms in that layer.”).
	Staelin teaches: determining a derivative of the sparsity penalty value ([0081] e.g., “computing derivatives of the penalized errors”) and updating the weights based on the determined derivative ([0037] e.g., “derivatives are computed from the errors (318), back-propagation is performed (320), and node weights are further adjusted (322)”).
	Tadesse in view of Swift and Staelin are analogous art because they are in the same field of endeavor of neural networks. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method of error measure of Staelin. The motivation/suggestion for doing this would be for the purpose of computing derivatives from the errors for back-propagation (Staelin [0037] e.g., “For each iteration (312-322), an upscaled image is generated from the input image and the adjusted weights (314), errors are computed (316), derivatives are computed from the errors (318), back-propagation is performed (320), and node weights are further adjusted (322).”).
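For illustration only, the combination relied upon for claim 12 (a penalty on scaled activations, a derivative of that penalty, and a weight update based on the derivative) can be sketched for a single linear node. This is a minimal sketch under stated assumptions and is not taken from Majumdar or Staelin: the L1-style penalty, the names `penalty`, `penalty_gradient` and `update`, and the parameters `lam`, `scale` and `lr` are all hypothetical.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def activation(weights, inputs):
    """Activation of one linear node: a = sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def penalty(weights, inputs, lam, scale=1.0):
    """Assumed sparsity penalty: lam * |scale * a|."""
    return lam * abs(scale * activation(weights, inputs))


def penalty_gradient(weights, inputs, lam, scale=1.0):
    """Derivative of the assumed penalty with respect to each weight."""
    s = sign(scale * activation(weights, inputs))
    return [lam * s * scale * x for x in inputs]


def update(weights, grad, lr):
    """One gradient-descent step on the weights."""
    return [w - lr * g for w, g in zip(weights, grad)]


# Hypothetical values chosen only to make the arithmetic easy to follow.
weights = [0.5, 0.25]
inputs = [1.0, 2.0]
before = penalty(weights, inputs, lam=0.1)
grad = penalty_gradient(weights, inputs, lam=0.1)
new_weights = update(weights, grad, lr=0.5)
after = penalty(new_weights, inputs, lam=0.1)
```

In this sketch the activation is 1.0, the derivative with respect to the weights is approximately [0.1, 0.2], and one update step moves the weights to approximately [0.45, 0.15], which lowers the penalty.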

Claim(s) 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tadesse in view of Swift, further in view of Min et al. (US 2014/0309122 A1, hereinafter Min) and Wendker.

Regarding claim 14, Tadesse in view of Swift teaches: The method according to claim 1.
Tadesse further teaches: … activation values of the plurality of the nodes … ([0007] e.g., “Deep learning architectures, such as deep belief networks and deep convolutional networks, are layered neural networks architectures in which the output of a first layer of neurons becomes an input to a second layer of neurons, the output of a second layer of neurons becomes and input to a third layer of neurons, and so on.” Examiner note: the examiner maps “neurons” in Tadesse to “a set of nodes” in the application and “values” in Tadesse to “activation values” in the application.).
	However, Tadesse in view of Swift does not explicitly teach: further comprising normalising the determined activation values of the plurality of the nodes using a mean of the determined activation values of the nodes, and determining the rank of the nodes using the normalised activation values.
	Min teaches: further comprising normalising the determined activation values of the plurality of the nodes using a mean of the determined activation values of the nodes ([0022] e.g., “When computing feature interactions as features, the system can take products of pairwise features first and then the system can perform normalization, which often results in better performance than products of normalized feature values on expression datasets.”; [0042] e.g., “the system performs feature standardization before running Lasso or Group Lasso. Instead of using the original quadratic interactions xjxk between pairwise variables xj and xk, the system standardizes xjxk by g(xjxk) as input feature, where

[equation image media_image3.png omitted: definition of g(xjxk)]

and µ and σ are respectively the mean and standard deviation of feature x.”), and 
Tadesse in view of Swift and Min are analogous art because they are in the same field of endeavor of machine learning. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Tadesse in view of Swift to incorporate the method of sparse learning of Min. The motivation/suggestion for doing this would be for the purpose of reducing feature space (Min, [0019] e.g., “This reduced search space then enables the system to look for combinations of interacting pairs of informative genes in a more practical sparse learning setting.”).
Wendker teaches: determining the rank of the nodes using the normalised activation values ([0023] e.g., “the values can be ranked from smallest to largest with the smallest being designated as the highest ranking”).
	Tadesse in view of Swift and Wendker are analogous art because they are directed to the method of sorting. It would have been obvious to a person having (Wendker [0023] e.g., “The sorted conversion results region 18 can show a sorted list of conversion results for a given input value”).
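For illustration only, the two steps mapped for claim 14 (normalising values using their mean, then ranking the normalised values) can be sketched as follows. This is a minimal sketch under stated assumptions: the standardisation used here is the generic z-score (x − µ)/σ, which is an assumption and not a reconstruction of the equation shown in Min, and the function names and sample values are hypothetical. The ranking follows the convention quoted from Wendker, in which the smallest value receives the highest ranking (rank 1).

```python
from statistics import mean, pstdev


def standardize(values):
    """Generic z-score: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is zero; values are constant")
    return [(v - mu) / sigma for v in values]


def rank_smallest_first(values):
    """Rank 1 goes to the smallest value; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Hypothetical activation values with mean 5 and standard deviation 2.
activations = [5.0, 2.0, 9.0, 4.0, 7.0, 4.0, 5.0, 4.0]
normalised = standardize(activations)
ranks = rank_smallest_first(normalised)
```

Because the z-score is a monotonic transformation, ranking the normalised values gives the same order as ranking the raw values; the normalisation matters only when values from differently scaled sources are compared.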

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAEYONG J PARK whose telephone number is (571) 272-3898. The examiner can normally be reached on M-F 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business 

/JAEYONG J PARK/Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126