DETAILED ACTION

This office action is in response to Applicant’s submission filed on 1 March 2019.

Status of Claims

Claims 1-20 are pending.
Claims 1-20 are rejected under 35 U.S.C. 112(b) as indefinite.
Claims 1-20 are rejected under 35 U.S.C. 103 as unpatentable.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

A claim is indefinite if, when read in light of the specification, it fails to inform, with reasonable certainty, those skilled in the art about the scope of the invention. Nautilus, Inc. v. Biosig Instruments, Inc., 110 USPQ2d 1688, U.S. Supreme Court (2014).

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Regarding claims 1, 13, and 17, the limitation “training a neural network having an output layer that outputs continuous values so that the output layer of the neural network will tend to output discrete values” does not make clear whether the output of the output layer is continuous or discrete; the claims are therefore indefinite. For the purpose of applying prior art, this limitation is construed to be “training a neural network having an output layer that outputs discrete values”.
Claims 2-12, 14-16, and 18-20, which depend on rejected claims 1, 13, and 17, respectively, are rejected for the same reason.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 5, 11, 13-14, 17-18  are rejected under 35 U.S.C. 103 as being unpatentable over Oord, et al., “Neural Discrete Representation Learning”, arXiv:1711.00937v2 [cs.LG] 30 May 2018 [hereafter Oord], in view of Schmidt, et al., US-PGPUB NO.2020/0118000A1 [hereafter Schmidt].

With regards to claim 1, Oord teaches 
“A computer-implemented method for neural network training, comprising: training a neural network having an output layer that outputs continuous values so that the output layer of the neural network will tend to output discrete values, wherein the output layer includes a plurality of nodes, each node corresponding to one of a plurality of classes (Oord, p.3, 3.1 Discrete Latent variables, shows determining discrete classes,

[Image: media_image1.png]

); assigning a priority to at least one class of the plurality of classes; and activating the nodes by priority according to the corresponding class of the plurality of classes.”
Oord does not explicitly detail “assigning a priority to at least one class of the plurality of classes; and activating the nodes by priority according to the corresponding class of the plurality of classes”.
However Schmidt teaches “assigning a priority to at least one class of the plurality of classes; and activating the nodes by priority according to the corresponding class of the plurality of classes (Schmidt, FIG.11, [0022], ‘assigning, by the deep learning node, priority levels … and prioritizing, by the deep learning node, transmission among the backpropagation message communications based on the priority levels’)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord and Schmidt before him or her, to modify the Discrete Representation for Neural Network of Oord to include associating priority with nodes as shown in Schmidt.   
The motivation for doing so would have been for training deep learning nodes of neural network (Schmidt, Abstract). 
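
For illustration only, the discretization described in Oord, Section 3.1, in which an encoder output is mapped to the index of its nearest embedding vector and the result is treated as a one-hot class selection, can be sketched as follows. This is a minimal sketch and is not code from Oord; the function name `quantize` and the toy codebook values are hypothetical.

```python
import math

def quantize(z_e, codebook):
    """Return the index k of the codebook vector nearest to z_e (Euclidean distance)."""
    distances = [math.dist(z_e, e_j) for e_j in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)

# Hypothetical codebook with three embedding vectors (one per discrete class).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

# A continuous encoder output is assigned to exactly one class.
k = quantize([0.9, 1.2], codebook)
one_hot = [1 if j == k else 0 for j in range(len(codebook))]
print(k, one_hot)
```

The sketch shows only the nearest-neighbour lookup; it does not address priority among classes, which is the limitation for which Schmidt is relied upon.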

With regards to claim 2, Oord in view of Schmidt teaches 
“The method of Claim 1”
Oord does not explicitly detail “wherein the training further includes regularizing each class by minimizing a network loss function that applies a penalty term associated with the priority of the corresponding class”.
However Schmidt teaches “wherein the training further includes regularizing each class by minimizing a network loss function that applies a penalty term associated with the priority of the corresponding class (Schmidt, FIG.1, [0008], ‘A loss function is defined to evaluate the prediction error, which is then back-propagated through the network’, [0051], ‘chunking the backpropagation messages based on the corresponding one of the layers of the prediction error data associated with each of the backpropagation messages to create message chunks.  The priority levels can be assigned such that a lower layer of the layers has a higher priority level than a higher layer of the layers’)”.
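It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord and Schmidt before him or her, to modify the Discrete Representation for Neural Network of Oord to include a loss function as shown in Schmidt.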
The motivation for doing so would have been for training deep learning nodes of neural network (Schmidt, Abstract). 

With regards to claim 5, Oord in view of Schmidt teaches 
“The method of Claim 2”
Oord does not explicitly detail “wherein the network loss function includes weighted activation levels of the plurality of nodes of all classes, the weighted activation levels being weighted according to the priority of the corresponding class”.
However Schmidt teaches “wherein the network loss function includes weighted activation levels of the plurality of nodes of all classes, the weighted activation levels being weighted according to the priority of the corresponding class (Schmidt, FIG.1, [0022], ‘assigning, by the deep learning node, priority levels … and prioritizing, by the deep learning node, transmission among the backpropagation message communications based on the priority levels’, [0038], ‘transmits a message having information on at least one of a gradient or a weight’, and

[Image: media_image2.png]

shows minimizing loss function calculation related to weights.)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord and Schmidt before him or her, to modify the Discrete Representation for Neural Network of Oord to include minimizing loss function as shown in Schmidt.   
The motivation for doing so would have been for training deep learning nodes of neural network (Schmidt, Abstract). 
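
As a non-authoritative illustration of the claim 5 language, a loss that adds activation levels weighted by the priority of the corresponding class might be sketched as follows. The function names, the `strength` parameter, and the example weights are assumptions introduced here for illustration; they are taken neither from the application nor from Schmidt.

```python
def priority_penalty(activations, class_weights):
    """Sum of activation levels, each weighted by the weight of its class.

    activations[v][c] is the activation of the node for class c in variable v.
    class_weights[c] is a hypothetical weight derived from the priority of class c.
    """
    return sum(w * a for row in activations for w, a in zip(class_weights, row))

def network_loss(base_loss, activations, class_weights, strength=1.0):
    """Base loss plus the priority-weighted activation penalty (assumed form)."""
    return base_loss + strength * priority_penalty(activations, class_weights)

# Two variables, three classes; class 0 is penalized least, class 2 most.
activations = [[0.7, 0.2, 0.1],
               [0.1, 0.1, 0.8]]
class_weights = [0.0, 1.0, 2.0]

penalty = priority_penalty(activations, class_weights)           # about 2.1
loss = network_loss(0.5, activations, class_weights, strength=0.1)  # about 0.71
print(penalty, loss)
```

Under this assumed form, minimizing the loss pushes activation toward the classes with the smallest weights.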

With regards to claim 11, Oord in view of Schmidt teaches 
“The method of Claim 1, wherein the neural network is a Variational Autoencoder (VAE), and the output layer is included in an encoder of the VAE (Oord, Figure 1, p.2, ‘In this work, we present a new way of training variational autoencoders [23,32] with discrete latent variables [27]’.

[Image: media_image3.png]

)”

Claims 13-14, 17-18 are substantially similar to claims 1-2, 5, 11. The arguments as given above for claims 1-2, 5, 11 are applied, mutatis mutandis, to claims 13-14, 17-18; therefore the rejection of claims 1-2, 5, 11 is applied accordingly.

The combined teaching described above will be referred to as Oord + Schmidt hereafter.

Claims 3-4, 15-16, 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Oord, et al., “Neural Discrete Representation Learning”, arXiv:1711.00937v2 [cs.LG] 30 May 2018 [hereafter Oord], in view of Schmidt, et al., US-PGPUB NO.2020/0118000A1 [hereafter Schmidt] and YOO, et al., US-PGPUB NO.2016/0247064A1 [hereafter YOO].

With regards to claim 3, Oord + Schmidt teaches 
“The method of Claim 2”
Oord + Schmidt does not explicitly detail “wherein the network loss function includes activation levels of the plurality of nodes except nodes of one class”.
However YOO teaches excluding node(s) from neural network function calculation (YOO, FIG.7, [0021], ‘excluding a reference hidden node from hidden nodes included in the neural network’

[Image: media_image4.png]

).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord + Schmidt and YOO before him or her, to modify the Discrete Representation for Neural Network of Oord + Schmidt to exclude node(s) from neural network function calculation as shown in YOO.   
The motivation for doing so would have been for training neural networks (YOO, Abstract). 

With regards to claim 4, Oord + Schmidt teaches 
“The method of Claim 2”
Oord + Schmidt does not explicitly detail “wherein the network loss function includes activation levels of the plurality of nodes except nodes of two or more classes”.
However YOO teaches excluding node(s) from neural network function calculation (YOO, FIG.7, [0021], ‘excluding a reference hidden node from hidden nodes included in the neural network’

[Image: media_image4.png]

).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord + Schmidt and YOO before him or her, to modify the Discrete Representation for Neural Network of Oord + Schmidt to exclude node(s) from neural network function calculation as shown in YOO.   
The motivation for doing so would have been for training neural networks (YOO, Abstract).

Claims 15-16, 19-20 are substantially similar to claims 3-4. The arguments as given above for claims 3-4 are applied, mutatis mutandis, to claims 15-16, 19-20; therefore the rejection of claims 3-4 is applied accordingly.

Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Oord, et al., “Neural Discrete Representation Learning”, arXiv:1711.00937v2 [cs.LG] 30 May 2018 [hereafter Oord], in view of Schmidt, et al., US-PGPUB NO.2020/0118000A1 [hereafter Schmidt] and Lee, et al., “Structure Level Adaptation for Artificial Neural Networks”, Springer Science + Business Media, LLC, 1991 [hereafter Lee].

With regards to claim 6, Oord + Schmidt teaches 
“The method of Claim 1, wherein the output layer comprises a plurality of sets of nodes, each set of nodes corresponding to one of a plurality of variables, and each set of the plurality of sets includes one of the plurality of nodes corresponding to each class (Oord, p.3, 3.1 Discrete Latent variables, shows determining discrete classes,

[Image: media_image1.png]

)”
Oord + Schmidt does not explicitly detail “wherein the method further comprises: identifying a variable of the plurality of variables of which only a particular class is activated regardless of input data to the neural network, and deleting a set of nodes corresponding to the identified variable”.
However Lee teaches “wherein the method further comprises: identifying a variable of the plurality of variables of which only a particular class is activated regardless of input data to the neural network, and deleting a set of nodes corresponding to the identified variable (Lee, FIG.3.1, 3.4.2 Neuron Annihilation, p.73, ‘If a neuron has an essentially constant output value, then it can be annihilated’)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord + Schmidt and Lee before him or her, to modify the Discrete Representation for Neural Network of Oord + Schmidt to include the evaluation for deleting nodes based on the teaching shown in Lee.   
The motivation for doing so would have been for improving efficiency of NN computing.


With regards to claim 7, Oord + Schmidt teaches 
“The method of Claim 1, wherein the output layer comprises a plurality of sets of nodes, each set of nodes corresponding to one of a plurality of variables, and each set of the plurality of sets includes one of the plurality of nodes corresponding to each class (Oord, p.3, 3.1 Discrete Latent variables, shows determining discrete classes,

[Image: media_image1.png]

)”
Oord + Schmidt does not explicitly detail “wherein the method further comprises: identifying a class that is not activated throughout the plurality of variables regardless of input data to the neural network, and deleting nodes corresponding to the identified class throughout the plurality of variables”.
However Lee teaches “wherein the method further comprises: identifying a class that is not activated throughout the plurality of variables regardless of input data to the neural network, and deleting nodes corresponding to the identified class throughout the plurality of variables (Lee, FIG.3.1, 3.4.2 Neuron Annihilation, p.73, ‘If a neuron has an essentially constant output value, then it can be annihilated’)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord + Schmidt and Lee before him or her, to modify the Discrete Representation for Neural Network of Oord + Schmidt to include the evaluation for deleting nodes based on the teaching shown in Lee.   
The motivation for doing so would have been for improving efficiency of NN computing. 
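
By way of illustration only, the criterion quoted from Lee (‘If a neuron has an essentially constant output value, then it can be annihilated’) can be sketched as a variance test over a set of input samples. This is a minimal sketch and not Lee's procedure; the function name `constant_nodes`, the tolerance `tol`, and the sample data are hypothetical.

```python
from statistics import pvariance

def constant_nodes(outputs, tol=1e-6):
    """Return indices of nodes whose output variance across samples is at most tol.

    outputs[n][i] is the output of node i for input sample n.
    """
    n_nodes = len(outputs[0])
    return [i for i in range(n_nodes)
            if pvariance([row[i] for row in outputs]) <= tol]

# Three input samples, three nodes: node 0 varies, nodes 1 and 2 do not.
outputs = [[0.9, 0.0, 0.3],
           [0.1, 0.0, 0.3],
           [0.5, 0.0, 0.3]]

removable = constant_nodes(outputs)
print(removable)  # [1, 2]
```

Both a node that is never activated (node 1) and a node that is always at the same level (node 2) are flagged, since each has an essentially constant output regardless of the input.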

The combined teaching described above will be referred to as Oord + Schmidt + Lee hereafter.

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Oord, et al., “Neural Discrete Representation Learning”, arXiv:1711.00937v2 [cs.LG] 30 May 2018 [hereafter Oord], in view of Schmidt, et al., US-PGPUB NO.2020/0118000A1 [hereafter Schmidt], Lee, et al., “Structure Level Adaptation for Artificial Neural Networks”, Springer Science + Business Media, LLC, 1991 [hereafter Lee] and Jang et al., “Categorical Reparameterization with Gumbel-Softmax”, arXiv:1611.01144v5 [stat.ML] 5 Aug 2017 [hereafter Jang].

With regards to claim 8, Oord + Schmidt + Lee teaches 
“The method of Claim 6”
Oord + Schmidt + Lee does not explicitly detail “wherein, during the training of the neural network, the plurality of nodes of each set calculate a softmax value based at least on logit values of outputs from nodes in a previous layer connected to the output layer and a sample of a predetermined distribution”.
However Jang teaches “wherein, during the training of the neural network, the plurality of nodes of each set calculate a softmax value based at least on logit values of outputs from nodes in a previous layer connected to the output layer and a sample of a predetermined distribution (Jang, 2 ‘The Gumbel-Softmax Distribution’, p.2-3, and 4.3 ‘Generative Semi-Supervised Classification’, p.7,

[Image: media_image5.png]

)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord + Schmidt + Lee and Jang before him or her, to modify the Discrete Representation for Neural Network of Oord + Schmidt + Lee to include the softmax calculation based on the teaching shown in Jang.
The motivation for doing so would have been for categorizing latent variables (Jang, Abstract). 
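
For illustration only, the Gumbel-Softmax calculation of Jang, Section 2, in which Gumbel noise is added to the logits and a softmax is taken at a temperature, can be sketched as follows. This is a minimal sketch under stated assumptions and not code from Jang; the function name `gumbel_softmax`, the `rng` parameter, and the example logits are hypothetical.

```python
import math
import random

def gumbel_softmax(logits, temperature, rng=random):
    """Draw one relaxed sample from the given logits.

    Each logit is perturbed with Gumbel(0, 1) noise, g = -log(-log(u)) with
    u ~ Uniform(0, 1), divided by the temperature, and passed through softmax.
    """
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 0.5]
y_warm = gumbel_softmax(logits, temperature=5.0, rng=random.Random(1))
y_cold = gumbel_softmax(logits, temperature=0.1, rng=random.Random(1))
print(y_warm, y_cold)
```

With the same noise draw, lowering the temperature concentrates the output on a single node, so the continuous output approaches a one-hot vector.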

With regards to claim 9, Oord + Schmidt + Lee teaches 
“The method of Claim 6, wherein”
Oord + Schmidt + Lee does not explicitly detail “during the training of the neural network, the plurality of nodes of each set calculate a softmax value based at least on logit values of outputs from nodes of a previous layer connected to the output layer, a sample of Gumbel distribution, and a temperature parameter”.
However Jang teaches “during the training of the neural network, the plurality of nodes of each set calculate a softmax value based at least on logit values of outputs from nodes of a previous layer connected to the output layer, a sample of Gumbel distribution, and a temperature parameter (Jang, 2 ‘The Gumbel-Softmax Distribution’, p.2-3, and 4.3 ‘Generative Semi-Supervised Classification’, p.7,

[Image: media_image5.png]

)”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord + Schmidt + Lee and Jang before him or her, to modify the Discrete Representation for Neural Network of Oord + Schmidt + Lee to include the softmax calculation based on the teaching shown in Jang.
The motivation for doing so would have been for categorizing latent variables (Jang, Abstract).

With regards to claim 10, Oord + Schmidt + Lee teaches 
“The method of Claim 8”
Oord + Schmidt + Lee does not explicitly detail “further comprises: replacing the output layer used at the training with an argmax layer”.
However Jang teaches “further comprises: replacing the output layer used at the training with an argmax layer (Jang, 2.2 ‘Straight-Through Gumbel-Softmax Estimator’, p.3, ‘For scenarios in which we are constrained to sample discrete values (e.g., from a discrete action space for reinforcement learning, or quantized compression), we discretize y using arg max’).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord + Schmidt + Lee and Jang before him or her, to modify the Discrete Representation for Neural Network of Oord + Schmidt + Lee to include evaluation using argmax as  shown in Jang.   
The motivation for doing so would have been for categorizing latent variables (Jang, Abstract). 
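
As a brief illustration of the passage quoted from Jang, Section 2.2 (‘we discretize y using arg max’), the continuous output of a set of nodes can be replaced by a one-hot selection of its largest entry. This is a minimal sketch; the function name `argmax_one_hot` and the example values are hypothetical.

```python
def argmax_one_hot(values):
    """Return a one-hot list marking the position of the largest value."""
    k = max(range(len(values)), key=values.__getitem__)
    return [1 if i == k else 0 for i in range(len(values))]

# A continuous (softmax-like) output becomes a discrete class selection.
discrete = argmax_one_hot([0.1, 0.7, 0.2])
print(discrete)  # [0, 1, 0]
```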

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Oord, et al., “Neural Discrete Representation Learning”, arXiv:1711.00937v2 [cs.LG] 30 May 2018 [hereafter Oord], in view of Schmidt, et al., US-PGPUB .

With regards to claim 12, Oord + Schmidt teaches 
“The method of Claim 1”
Oord + Schmidt does not explicitly detail “wherein output from the encoder is used for input to a problem solver”.
However Schmidhuber teaches “wherein output from the encoder is used for input to a problem solver (Schmidhuber, [0003] ‘To become a general problem solver that is able to run arbitrary problem-solving programs, a control system for a robot or an artificial agent can be implemented as a computer-based artificial recurrent neural network (RNN)’).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Oord + Schmidt and Schmidhuber before him or her, to modify the Discrete Representation for Neural Network of Oord + Schmidt to include applying the trained system as a problem solver as shown in Schmidhuber.
The motivation for doing so would have been to perform problem solving through training.



Additional Relevant Art

The prior art made of record is considered pertinent to applicant’s disclosure and is recorded on Form PTO-892. Applicant is required under 37 C.F.R. § 1.111 (c) to consider these references fully when responding to this action, with particular attention paid to:
Srinivas, et al., “Data-free Parameter Pruning for Deep Neural Networks”, arXiv:1507.06149v1 [cs.CV] 22 Jul 2015 [hereafter Srinivas] teaches neuron removal for deep neural networks.




Examiner's Note

The Examiner respectfully requests of the Applicant in preparing responses, to fully consider the entirety of the reference(s) as potentially teaching all or part of the claimed invention.  It is noted, REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN.  “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned.  They are part of the literature of the art, relevant for all they contain.”  In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).  A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123).  The Examiner has cited particular locations in the reference(s) as applied to the claim(s) above for the convenience of the Applicant.  Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim(s), typically other passages and figures will apply as well. 


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TSU-CHANG LEE whose telephone number is 571-272-3567.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached at 571-272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
 Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TSU-CHANG LEE/
Examiner, Art Unit 2126