DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 8/3/2018.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

The terms “nearest” and “furthest” in claims 1, 8, and 15 are relative terms which render the claims indefinite.  The terms “nearest” and “furthest” are not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  What does it mean for an output to be “nearest” to or “furthest” from a measure of central tendency?  There does not appear to be an ascertainable standard for what these terms mean in the specification.  Is it a numeric difference, or is it something else?  How does one ascertain an output to be “nearest” or “furthest”?  Please explain.  For examination purposes, the terms “nearest” and “furthest” will be interpreted to mean any value for an output that is associated with or subject to a constraint of a measure of central tendency.  Appropriate correction is required.  Dependent claims 2-7, 9-14, and 16-20 are rejected under 35 U.S.C. 112(b) by virtue of their dependency on indefinite claims 1, 8, and 15.

The terms “next nearest” and “next furthest” in claims 5, 12, and 19 are relative terms which render the claims indefinite.  The terms “next nearest” and “next furthest” are not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  What does it mean for an output to be “next nearest” to or “next furthest” from a measure of central tendency?  There does not appear to be an ascertainable standard for what these terms mean in the specification.  Is it a numeric difference, or is it something else?  Please explain.  For examination purposes, the terms “next nearest” and “next furthest” will be interpreted to mean any value for an output that is associated with or subject to a constraint of a measure of central tendency.  Appropriate correction is required.



Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 8-14 are rejected under 35 U.S.C. § 101 because the claimed invention is directed towards non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent-eligible subject matter because the claimed invention is directed towards signals per se.

Looking to the originally filed specification, paragraph [0024] discloses “[t]he term ‘machine-readable medium’ shall also be taken to include any tangible medium that is capable of storing, encoding or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such instructions. The term ‘machine-readable medium’ shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media”. Under a broadest reasonable interpretation of the claim language, the “computer readable medium” of claim 8 and its dependents may encompass transitory signals, because transitory forms of signal communication are not excluded, and the claims are thus directed towards signals per se. Examiner suggests amending claim 8 and its dependents to recite “A non-transitory computer readable medium”.  Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. § 103 as being obvious over Wang et al. (Wang et al., “Deep Growing Learning”, Oct. 2017, 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2831-2839, hereinafter “Wang”) in view of Baker et al. (US 20190095798 A1, hereinafter “Baker”).

Regarding claim 1, Wang discloses [a] process comprising: (Abstract; "To get around this bottleneck, we propose a bio-inspired SSL framework on deep neural network, namely Deep Growing Learning (DGL). Specifically, we formulate the SSL as an EM-like process, where the deep network alternately iterates between automatically growing convolutional layers and selecting reliable pseudo-labeled data for training."; and Page 2832, Column 1, lines 40-41: "Along with the line of third-group method, we propose a novel growing learning framework.")
(a) creating a seed artificial neural network; (Page 2834, Column 1, lines 10-13: "The proposed deep growing learning framework trains a shallow network and selects the confident prediction examples as the next iteration of the loop to train a deeper network."; and Page 2834, Column 1, lines 21-25: "As a start, the growing sub-network contains only one building block. This sub-network serves as a seed that can automatically grows a new building block when the network can not fit the labeled data and the increasing pseudo-labeled data."; and Page 2835, Algorithm 1, numbered step 1: "initialize a shallow net Net")
(b) coupling a temporary classifier to the seed artificial neural network; (Page 2832, Column 2, lines 23-25: "...the ability of the classifier is improved by dynamically growing new layers."; and Page 2833, Column 2, lines 21-22: "Deep growing learning aims to learn an accurate classifier with a small amount of labeled data and a large amount of unlabeled data."; and Page 2834, Column 1, lines 35-39: "Supervised sub-network. The supervised sub-network ("SoftmaxWithloss" layer) computes the multinomial logistic loss for a one-of-many classification task, passing real-valued prediction through a softmax to get a probability distribution over classes."; and Page 2834, Column 1, lines 45-46: "Because of the limited classifying ability of the shallow classifier C..."; and Page 2834, Column 2, lines 22-23: "(ii) The more accurate classifier we have, the more pseudo-labeled data we select;"; and Page 2834, Column 2, line 29: "We first consider a one-layer classifier CL net1 ..."; and Figure 1: Overview)
(c) training the seed artificial neural network and the temporary classifier using all classes in a dataset; (Page 2832, Column 1, lines 48-49: "Second, labeled and pseudo-labeled examples (if any) are used to train the network."; and Page 2833, Column 2, lines 26-29: "...we firstly train a shallow network with labeled data and subsequently feed the unlabeled data to pick up the confident ones as pseudo-labeled data, which is further used to train a deeper network."; and Figure 1: Overview)
(h) for each class in the dataset, adding a first perceptron associated with the first class member and a second perceptron associated with the second class member to a new layer in the seed artificial neural network; (Page 2832, Column 2, lines 23-25: "...the ability of the classifier is improved by dynamically growing new layers"; and Page 2834, Column 1, lines 21-29: "As a start, the growing sub-network contains only one building block. This sub-network serves as a seed that can automatically grows a new building block when the network can not fit the labeled data and the increasing pseudo-labeled data. We copy the parameters of the previous growing sub-network to the new one and then fine-tune the network globally. The sub-network continues growing up before it goes into over-fitting."; and Page 2835, Algorithm 1, numbered step 15: "Grow a new layer;"; and Figure 1: Overview)
(i) for each class in the dataset, connecting inputs of the first perceptron and inputs of the second perceptron to the perceptron outputs in the seed artificial neural network that are determined to have output values that are a threshold variance from the measure of central tendency; (Page 2833, Column 2, lines 26-29: "...we firstly train a shallow network with labeled data and subsequently feed the unlabeled data to pick up the confident ones as pseudo-labeled data, which is further used to train a deeper network."; and Page 2834, Column 1, lines 21-29: "As a start, the growing sub-network contains only one building block. This sub-network serves as a seed that can automatically grows a new building block when the network can not fit the labeled data and the increasing pseudo-labeled data. We copy the parameters of the previous growing sub-network to the new one and then fine-tune the network globally. The sub-network continues growing up before it goes into over-fitting."; and Figure 1, Overview)
(j) adding a classifier to perceptron outputs of the first perceptron and outputs of the second perceptron; and (Page 2834, Column 1, lines 14-17: "Specifically, our architecture is comprised of four sub-networks: growing sub-network, fixed sub-network, supervised sub-network, and selection sub-network (Fig. 1)"; and Page 2834, Column 1, lines 42-45: "Selection sub-network. After renewing the network structure and updating the parameters by K-iteration training, all unlabeled examples are fed into this sub-network ("Softmax" layer) to predict their labels"; and Figure 1, Overview)
(k) training the seed artificial neural network and the new layer on all members in the dataset (Page 2834, Column 1, lines 42-44: "Selection sub-network. After renewing the network structure and updating the parameters by K-iteration training..."; and Page 2835, Column 1, lines 3-6: "The training process then enters a loop according to Eq. 2,3,4. In this way, the DGL model boosts itself up to automatically fit the increasing data.").
Wang fails to explicitly disclose (d) for each class in the dataset, applying all members of each class to the seed artificial neural network; (e) for each class in the dataset, calculating a measure of central tendency of perceptron outputs in the seed artificial neural network; (f) for each class in the dataset, selecting a first class member that generates a first perceptron output nearest to the measure of central tendency and selecting a second class member that generates a second perceptron output furthest from the measure of central tendency; (g) for each class in the dataset, analyzing the perceptron outputs in the seed artificial neural network to determine a measure of variance from the measure of central tendency when the first class member and the second class member are applied to the seed artificial neural network.
Baker discloses (d) for each class in the dataset, applying all members of each class to the seed artificial neural network; ([0016]; “In this embodiment as well as in autoencoders in general, the input 103 is encoded by an encoder network 104 to a reduced representation in a bottleneck layer, herein represented in the form of sample random variables 105”, which suggests applying all data of all classes to the artificial NN or encoder; and Figure 5 and [0013]; the input data (all members of each class of the NN) are applied to the artificial NN)
(e) for each class in the dataset, calculating a measure of central tendency of perceptron outputs in the seed artificial neural network; ([0016]; “Preferably, the parameters of each parametric distribution include a measure of central tendency, such as the mean (122), and a measure of dispersion, such as the standard deviation (123) and, optionally, other parameters (124), all controlled by hyperparameters 121”, which discloses the calculation and consideration of a measure of central tendency or mean of perceptron outputs; and Figure 1, 122 and Figure 2, 222)
(f) for each class in the dataset, selecting a first class member that generates a first perceptron output nearest to the measure of central tendency and selecting a second class member that generates a second perceptron output furthest from the measure of central tendency; ([0022]; “Some embodiments can use a different norm for the means than for the standard deviations. Some embodiments can constrain the means to have a norm less than or equal to the specified constraint, while some embodiments can constrain the means to have a norm equal to the specified value. Some embodiments can constrain the standard deviations to have a norm greater than or equal to the specified value, while some can constrain the standard deviations to have a norm equal to the specified value. The specified value of each norm is controlled by a hyperparameter. Some embodiments have a hyperparameter for each mean and each standard deviation, whereas some embodiments can use a default value, say 1.0, for each norm” (emphasis added), which discloses, in view of the 35 U.S.C. 112(b) indefiniteness rejection above, the use of a hyperparameter to select first and second class members that generate outputs nearest to and furthest from a measure of central tendency or a mean, based on a constrained standard deviation controlled by the hyperparameter; and [0025]; “For example, in some embodiments, each standard deviation may be set to the value 1.0.”)
(g) for each class in the dataset, analyzing the perceptron outputs in the seed artificial neural network to determine a measure of variance from the measure of central tendency when the first class member and the second class member are applied to the seed artificial neural network ([0016]; “Preferably, the parameters of each parametric distribution include a measure of central tendency, such as the mean (122), and a measure of dispersion, such as the standard deviation (123) and, optionally, other parameters (124), all controlled by hyperparameters 121. Means (122) and standard deviations (123) or variances are sufficient parameters . . . The encoder 104 generates the probability distribution parameters 122, 123, 124 from the input data 103 based on the controlling hyperparameters 121. The computer system 400 then generates sample random variables 105 (e.g., through a random number generator program) that adhere to or satisfy the probability distribution parameters 122-124 for input to the decoder 106. FIG. 1 shows that, and the description below assumes that, means (122) and standard deviations 123 are used, but in other embodiments, other statistics of central tendency than means may be used and other dispersion statistics may be used, such as variances in lieu of standard deviations”, which discloses that the parameters of the perceptron outputs are analyzed to determine a variance or standard deviation from a measure of central tendency or mean when the input class members are applied to the artificial NN through an encoder; and Figure 1, Elements 103, 104 and 124; the input 103 or classes in the dataset are applied to the NN encoder 104 to then analyze perceptron output variances at 124).
Wang and Baker are analogous art because both are concerned with neural network analytics.  Before the effective filing date of the claimed invention, it would have been obvious to one skilled in neural network analytics to combine the consideration of perceptron outputs’ central tendencies and measures of variances as taught by Baker with the process and seed network of Wang to yield the predictable result of (d) for each class in the dataset, applying all members of each class to the seed artificial neural network; (e) for each class in the dataset, calculating a measure of central tendency of perceptron outputs in the seed artificial neural network; (f) for each class in the dataset, selecting a first class member that generates a first perceptron output nearest to the measure of central tendency and selecting a second class member that generates a second perceptron output furthest from the measure of central tendency; (g) for each class in the dataset, analyzing the perceptron outputs in the seed artificial neural network to determine a measure of variance from the measure of central tendency when the first class member and the second class member are applied to the seed artificial neural network. The motivation for doing so would be to represent the latent variables that describe a data distribution that can produce or reproduce data examples in high-dimensional space (Baker; [0002]).
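For illustration only, the following sketch shows one possible numeric reading of limitations (d) through (g), in which "nearest" and "furthest" are measured as Euclidean distance between a class member's output vector and the per-class mean. The distance metric, the function name select_nearest_and_furthest, and the data layout are assumptions made for this sketch; they are not taken from the claims, the specification, Wang, or Baker.

```python
import math
from statistics import fmean


def select_nearest_and_furthest(outputs_by_class):
    """Hypothetical helper: for each class, pick the members whose output
    vectors lie nearest to and furthest from the class mean.

    outputs_by_class maps a class label to a list of equal-length output
    vectors, one per class member (step (d) is assumed to have produced them).
    Euclidean distance is an assumption, not a requirement of the claims.
    """
    selections = {}
    for label, outputs in outputs_by_class.items():
        dim = len(outputs[0])
        # Step (e): mean of each perceptron output over the class members.
        mean = [fmean(vec[i] for vec in outputs) for i in range(dim)]
        # Distance of each member's output vector from that mean.
        dists = [math.dist(vec, mean) for vec in outputs]
        # Step (f): indices of the nearest and furthest members.
        nearest = min(range(len(outputs)), key=dists.__getitem__)
        furthest = max(range(len(outputs)), key=dists.__getitem__)
        # Step (g): per-perceptron absolute deviation from the mean
        # for the two selected members.
        deviations = {
            idx: [abs(v - m) for v, m in zip(outputs[idx], mean)]
            for idx in (nearest, furthest)
        }
        selections[label] = {
            "mean": mean,
            "nearest": nearest,
            "furthest": furthest,
            "deviations": deviations,
        }
    return selections
```

Under a different metric (for example, a per-perceptron comparison rather than a vector distance), the selected members could differ, which is the ambiguity noted in the rejection under 35 U.S.C. 112(b) above.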

Regarding claim 8, it is a computer readable media claim that corresponds to the steps of claim 1, and is rejected for the same reasons as claim 1.

Regarding claim 15, it is a system claim that corresponds to the steps of claim 1, and is rejected for the same reasons as claim 1.

Regarding claims 2, 9, and 16, the rejection of claims 1, 8, and 15 is incorporated and Wang further discloses wherein the seed layer comprises a two-layer convolutional neural network (Table 1 and Page 2836, Column 2, lines 24-26; “For the growing neural network, we use a convolutional network with 4 building blocks, each of which consists of two convolutional layers”).

Regarding claims 3, 10, and 17, the rejection of claims 1, 8, and 15 is incorporated and Wang further discloses decoupling the temporary classifier from the seed artificial neural network after the seed artificial neural network and the temporary classifier have been trained (Page 2834, Column 1, lines 25-27; “We copy the parameters of the previous growing sub-network to the new one and then fine-tune the network globally. The sub-network continues growing up before it goes into overfitting”; and Page 2835, Algorithm 1; the algorithm discloses decoupling or ending the training of the temporary classifier from the seed network after training).

Regarding claims 4, 11, and 18, the rejection of claims 1, 8, and 15 is incorporated and Wang further discloses training the seed artificial neural network using all classes in the dataset and training the temporary classifier using output of the seed artificial neural network (Page 2834, Column 2, lines 29-30; “We first consider a one-layer classifier CL net1 , which is trained over the limited labeled data set L. According to Eq. 1, we select a set of pseudo-labeled data U from a set of unlabeled data U by U = {x|p(y = l|x, CL net1 ) > α, x ∈ U} (2) where p(y = l|x, CL net1 ) represents the probability of the candidate pseudo-label given a point x and a one-layer classifier CL net1 . Using the set L U , we then train a more accurate classifier CL U net1 based on Assumption (i). We re-select a new set of pseudo-labeled data U by U = {x|p(y = l|x, CL U net1 ) > α, x ∈ U} (3)”; and Page 2835, Column 1, lines 3-6, “The training process then enters a loop according to Eq. 2, 3, 4. In this way, the DGL model boosts itself up to automatically fit the increasing data”; and Page 2835, Algorithm 1; the algorithm discloses training the seed ANN using all classes in the dataset and training the temporary classifier; and Figure 1).
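For illustration only, the selection rule quoted from Wang as Eq. 2 (keep an unlabeled example when the classifier's probability for its candidate label exceeds α) can be sketched as follows. The function name select_pseudo_labeled, the list-of-probabilities input, and the choice of the most probable class as the candidate label are assumptions made for this sketch and are not taken from Wang.

```python
def select_pseudo_labeled(probabilities, alpha):
    """Hypothetical helper: return (example index, pseudo-label) pairs for
    unlabeled examples whose top class probability exceeds alpha.

    probabilities is a list with one class-probability vector per unlabeled
    example, as a softmax output would provide (an assumed data layout).
    """
    selected = []
    for idx, probs in enumerate(probabilities):
        # Candidate pseudo-label: the most probable class (an assumption).
        label = max(range(len(probs)), key=probs.__getitem__)
        # Keep the example only if the classifier is confident enough.
        if probs[label] > alpha:
            selected.append((idx, label))
    return selected
```

In the loop that the quoted passage describes, the selected examples would be added to the labeled set before the next round of training.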

Regarding claims 5, 12, and 19, the rejection of claims 1, 8, and 15 is incorporated and Wang further discloses analyzing output of the seed artificial neural network and the new layer; (Page 2835, Algorithm 1; the algorithm discloses the analyzing of the output of the seed NN and new layer at steps 10, 13, and 17 of the algorithm).
Wang fails to explicitly disclose selecting a next nearest first class member and a next furthest second class member; and repeating operations (g) - (k) using the next nearest first class member and the next furthest second class member.
Baker discloses selecting a next nearest first class member and a next furthest second class member; and ([0022]; “Some embodiments can use a different norm for the means than for the standard deviations. Some embodiments can constrain the means to have a norm less than or equal to the specified constraint, while some embodiments can constrain the means to have a norm equal to the specified value. Some embodiments can constrain the standard deviations to have a norm greater than or equal to the specified value, while some can constrain the standard deviations to have a norm equal to the specified value. The specified value of each norm is controlled by a hyperparameter. Some embodiments have a hyperparameter for each mean and each standard deviation, whereas some embodiments can use a default value, say 1.0, for each norm” (emphasis added), which discloses, in view of the 35 U.S.C. 112(b) indefiniteness rejection above, the use of a hyperparameter to select first and second class members that generate outputs next nearest to and next furthest from a measure of central tendency or a mean, based on a constrained standard deviation controlled by the hyperparameter; and [0025]; “For example, in some embodiments, each standard deviation may be set to the value 1.0.”)
repeating operations (g) - (k) using the next nearest first class member and the next furthest second class member (See rejection of claim 1 above).
The motivation to combine Wang and Baker is the same as discussed above with respect to claim 1.

Regarding claims 6 and 13, the rejection of claims 1 and 8 is incorporated but Wang fails to explicitly disclose wherein the variance measure comprises a standard deviation and wherein the threshold variance comprises one standard deviation.
Baker discloses wherein the variance measure comprises a standard deviation and ([0016]; “FIG. 1 shows that, and the description below assumes that, means (122) and standard deviations 123 are used, but in other embodiments, other statistics of central tendency than means may be used and other dispersion statistics may be used, such as variances in lieu of standard deviations”, which discloses that the variance measure may comprise a standard deviation)
wherein the threshold variance comprises one standard deviation ([0022]; “Some embodiments can constrain the standard deviations to have a norm greater than or equal to the specified value, while some can constrain the standard deviations to have a norm equal to the specified value. The specified value of each norm is controlled by a hyperparameter. Some embodiments have a hyperparameter for each mean and each standard deviation, whereas some embodiments can use a default value, say 1.0, for each norm” (emphasis added), which discloses a threshold variance or standard deviation that is controlled by a hyperparameter selection such as one standard deviation; and [0025]; “For example, in some embodiments, each standard deviation may be set to the value 1.0.”).
The motivation to combine Wang and Baker is the same as discussed above with respect to claim 1.
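For illustration only, a threshold of one standard deviation from the mean could be applied to a set of perceptron output values as sketched below. Reading "a threshold variance from the measure of central tendency" as "at least k standard deviations from the mean", the use of the population standard deviation, and the function name outputs_beyond_threshold are assumptions made for this sketch; neither the claims nor Baker specify them.

```python
from statistics import fmean, pstdev


def outputs_beyond_threshold(values, k=1.0):
    """Hypothetical helper: return the indices of output values lying at
    least k population standard deviations from the mean of values."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so none deviates from the mean.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) >= k * sigma]
```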

Regarding claims 7 and 14, the rejection of claims 1 and 8 is incorporated but Wang fails to explicitly disclose wherein the measure of central tendency comprises a mean, a median, or a mode.
Baker discloses wherein the measure of central tendency comprises a mean, a median, or a mode ([0016]; “Preferably, the parameters of each parametric distribution include a measure of central tendency, such as the mean (122)”).
The motivation to combine Wang and Baker is the same as discussed above with respect to claim 1.

Regarding claim 20, the rejection of claim 15 is incorporated but Wang fails to explicitly disclose wherein the variance measure comprises a standard deviation; wherein the threshold variance comprises one standard deviation; and wherein the measure of central tendency comprises a mean, a median, or a mode.
Baker discloses wherein the variance measure comprises a standard deviation; ([0016]; “FIG. 1 shows that, and the description below assumes that, means (122) and standard deviations 123 are used, but in other embodiments, other statistics of central tendency than means may be used and other dispersion statistics may be used, such as variances in lieu of standard deviations”, which discloses that the variance measure may comprise a standard deviation)
wherein the threshold variance comprises one standard deviation; and ([0022]; “Some embodiments can constrain the standard deviations to have a norm greater than or equal to the specified value, while some can constrain the standard deviations to have a norm equal to the specified value. The specified value of each norm is controlled by a hyperparameter. Some embodiments have a hyperparameter for each mean and each standard deviation, whereas some embodiments can use a default value, say 1.0, for each norm” (emphasis added), which discloses a threshold variance or standard deviation that is controlled by a hyperparameter selection such as one standard deviation; and [0025]; “For example, in some embodiments, each standard deviation may be set to the value 1.0.”)
wherein the measure of central tendency comprises a mean, a median, or a mode ([0016]; “Preferably, the parameters of each parametric distribution include a measure of central tendency, such as the mean (122)”).
The motivation to combine Wang and Baker is the same as discussed above with respect to claim 1.




Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403.  The examiner can normally be reached on Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
 
/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2125