DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to the original application filed on 3/30/2019. Acknowledgement is made with respect to a claim of priority to Indian Application IN201941006134 filed on 2/15/2019.

Claim Objections

Claims 1, 11, and 18 and their dependents are objected to because of the following informalities:  
Independent claims 1, 11, and 18 recite the limitation “wherein the plurality of training parameters are derived from training data corpus for training the ANN model, and wherein the plurality of model parameters are associated with the ANN model and are derived based on the plurality of training parameters and input from user with respect to the ANN model” which should read as “wherein the plurality of training parameters are derived from a training data corpus for training the ANN model, and wherein the plurality of model parameters are associated with the ANN model and are derived based on the plurality of training parameters and are input from a user with respect to the ANN model” (emphasis added) for better grammatical clarity.  Appropriate correction is required.

Dependent claim 7 recites “wherein masking the set of model parameters using the set of pre-defined rules comprises at least one of stemming tokens, lemmatizing tokens, de-duplicating tokens, adjusting weights of tokens, converting tokens in hard coded binary numbers, and converting token types in hard coded binary numbers” which should read as “wherein masking the set of model parameters using the set of pre-defined rules comprises at least one of stemming tokens, lemmatizing tokens, de-duplicating tokens, adjusting weights of tokens, converting tokens [[in]] into hard coded binary numbers, and converting token types [[in]] into hard coded binary numbers” (emphasis added) for better grammatical clarity. Appropriate correction is required.

Dependent claim 14 recites “wherein the plurality of model parameters comprise at least one of tokens, intents, named entities, word vectors, part of speech (PoS) tags, input features, input neurons, and output neurons, and wherein masking the set of model parameters using the set of pre-defined rules comprises at least one of stemming tokens, lemmatizing tokens, de-duplicating tokens, adjusting weights of tokens, converting tokens in hard coded binary numbers, and converting token types in hard coded binary numbers” which should read as “wherein the plurality of model parameters comprise at least one of tokens, intents, named entities, word vectors, part of speech (PoS) tags, input features, input neurons, and output neurons, and wherein masking the set of model parameters using the set of pre-defined rules comprises at least one of stemming tokens, lemmatizing tokens, de-duplicating tokens, adjusting weights of tokens, converting tokens [[in]] into hard coded binary numbers, and converting token types [[in]] into hard coded binary numbers” (emphasis added) for better grammatical clarity. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 6-11, 14-18, and 20 are rejected under 35 U.S.C. 103 as being obvious over Chen et al. (US 20180357541 A1, hereinafter “Chen”) in view of Wu et al. (US 20190287685 A1, hereinafter “Wu”).

Regarding claim 1, Chen discloses [a] method of optimizing memory requirement for training an artificial neural network (ANN) model . . . , the method comprising: ([0002]; “a system, a method, and a non-transitory computer readable medium for training task optimization”, which discloses a method for optimization of memory; and [0004]; “calculate a memory distribution for the training task based on memory factors, the training samples and a neural network. The processor is configured to determine a mini-batch size that is fit to the memory distribution. In response to the training environment information, the processor is configured to output the mini-batch size for execution of the training task”)
receiving, by a memory optimization device, a plurality of training parameters and a plurality of model parameters, ([0023]; “The main memory 220, for example, can be a DRAM (Dynamic Random Access Memory) or a SRAM (Static Random Access Memory). When an instance of the training task begins, the latest model parameters (weights) can be pulled from a shared memory (not shown) to the main memory 220. Then, a subset of the training samples can be loaded from the storage 210 to the main memory 220. The CPU 230 can perform a data preparation process to the subset of the training samples for format transformation and augmentation”, which discloses, under a broadest reasonable interpretation of the claim language, receiving from memory training parameters in the form of training samples and a plurality of model parameters such as weights; and [0024]; “the model parameters can be stored in one or more parameter servers”; and Figure 1; the configuration module is the memory optimization device)
wherein the plurality of training parameters are derived from training data corpus for training the ANN model, and ([0026]; “Usually, the training samples of the training task can be fetched in several subsets, and each time a subset being fetched is a mini-batch. The mini-batch size herein refers to an amount of the training samples being fetched in one mini-batch”, the training samples being derived from a training corpus in the form of mini-batches; and [0023]; “training samples can be loaded from the storage”, the storage containing the training data corpus from which the training parameters are derived)
wherein the plurality of model parameters are associated with the ANN model and are derived based on the plurality of training parameters and input from user with respect to the ANN model; ([0032]; “Similar to the embodiment of FIG. 1, in the embodiment, the user can input the training environment information to the application (or the external server for the webpage service) via the interaction interface 300, wherein the training environment information carries factors of the training task (which will be performed by the trainer 400). Contents of the training environment information of the training task can be referred to the embodiment of FIG. 1 . . . in this case, the user can submit training environment information (e.g. neural network, models or parameter servers in use) those has not been checked via the interaction interface 300”, which discloses model parameters that are associated with the ANN/neural network model, are derived from training data, and are input by a user; and Figure 4; the interaction interface allows a user to input model parameters, and model parameters such as the weights of an ANN are derived based on training parameters such as training data/training samples; and [0036]; “the application (or the webpage service) performed by the trainer 400 can still obtain the sizes of the training samples, the memory factors of the trainer 400, and the neural network (model) to be used in training task”; and [0040])
selecting, by the memory optimization device, a set of model parameters from among the plurality of model parameters for training the ANN model based on a characteristic and an architecture of the ANN model; ([0041]; “In some embodiments, the processor 120 can calculate a memory usage that represents spaces allocated to the classification stage. As mentioned, since the full-connected layers in the classification stage are already obtained, the processor 120 can obtain number of neurons and model parameters in the classification stage”, which discloses selecting, by the memory optimization device (processor 120), a set of model parameters (weights) from among the plurality of model parameters for training the ANN/neural network model based on a characteristic and an architecture of the ANN model, namely the number of neurons in its layers; and [0050]; “The processor 120 can select the target mini-batch size as 512 and select the specific combination as an optimal algorithm combination for the training task. It is to say, the processor 120 is aimed at finding an optimal algorithm combination and an optimal mini-batch size simultaneously under the memory factors”, the optimal algorithm combination including model parameters for training the ANN based on a characteristic/architecture/algorithm of the ANN model)
determining, by the memory optimization device, an amount of memory required for training the ANN model [[based on the set of masked model parameters]]; and ([0041]; “According to the number of neurons in each layer of the classification stage, the memory usage of output data in the classification stage can be calculated. According to the number of neurons in each pair of connected layers, the memory usage of model parameters in the classification stage can be calculated”)
providing, by the memory optimization device, the set of [[masked]] model parameters for training the ANN model when the amount of memory required is less than a determined threshold ([0041]; “According to the number of neurons in each pair of connected layers, the memory usage of model parameters in the classification stage can be calculated. Similarly, since a number of the gradients being calculated in the gradient descend computation is corresponding to the number of the model parameters, memory usage of the gradients in the classification stage can be calculated as well”, which discloses calculating or providing the set of model parameters for training the ANN model; and [0064]-[0066], which disclose providing the model parameters for training the ANN model, as implemented by parameter servers and determined by the memory optimization device/processor 120: “N.sub.ps represents a minimum number of parameter servers (i.e. the parameter servers 610-640) required in the training task”, the minimum number of parameter servers defining the determined threshold for the minimum amount of memory required).
Chen fails to explicitly disclose but Wu discloses an artificial neural network (ANN) model employed for natural language processing (NLP) ([0031]; “systems and methods to normalize an input HPI and classify the HPI using a neural network tuned to process HPI information and generate a classification from the HPI information”, which discloses a neural network employed for NLP to process natural language in the form of history of present illness (HPI) records)
masking . . . the set of model parameters in one or more layers of the ANN model based on a set of pre-defined rules to generate a set of masked model parameters; ([0031]; “to normalize an input HPI and classify the HPI using a neural network tuned to process HPI information and generate a classification from the HPI information. In some examples disclosed herein, the HPI is normalized with a natural language processor by tokenizing, lemmatizing, and replacing named entities and medical terms with standardized strings/predefined tags. In some examples disclosed herein, the natural language processor randomly reorganizes the order of each sentence in the input to the HPI. In some examples disclosed herein, the tokens are hashed into integers. In such examples, the integers are representative of an index of a sparse vector where each index represents a distinct word. In examples disclosed herein, the normalized HPI is classified with a neural network. In some examples, the neural network is a three-layer neural network including an embedding layer, recurrent neural network layer, and fully connected layer. In some examples, the recurrent neural network is a long short-term memory (LSTM) network. In some examples, the three-layer neural network outputs a binary output (e.g., a binary classification, either “extended” or “brief” represented as 0 or 1, 1 or 0, etc.) In other examples, the neural network outputs a vector including values corresponding to the presence of each HPI element in an input HPI. 
In some examples, the output of the neural network can also include a determination of which bodily system(s) is/are discussed in the input HPI” (emphasis added), which discloses masking the set of model parameters (tokens) in one or more layers of the ANN based on predefined rules, e.g., by lemmatizing the tokens or by hashing them into integers (i.e., a binary representation), to generate masked model parameters (lemmatized tokens); and [0037]; “The example preprocessor 202 includes an example natural language processor 204 and an example tensor generator 212. The example natural language processor 204 includes an example tokenizer 206, an example lemmatizer 208, an example sentence reorderer 209 and an example named entity recognizer 210. The example neural network 214 includes an embedding layer 216, an example LSTM layer 218 and an example fully connected layer 220”; and [0039]; “Additionally or alternatively, the tokenizer 206 can tokenize short phrases together based on simple rules”).
Chen and Wu are analogous art because both are concerned with machine learning and model optimization. Before the effective filing date of the claimed invention, it would have been obvious to a person having ordinary skill in the art to combine the NLP and masking of Wu with the method of optimizing memory requirement of Chen to yield the predictable result of an artificial neural network (ANN) model employed for natural language processing (NLP) and of masking, by the memory optimization device, the set of model parameters in one or more layers of the ANN model based on a set of pre-defined rules to generate a set of masked model parameters. The motivation for doing so would be to improve operation of healthcare data processors by correctly and efficiently processing a variety of available information and generating a consistent, accurate result (Wu; [0093]).

	Regarding claim 11, it is a system claim corresponding to the steps of claim 1, and is rejected for the same reasons as claim 1.

	Regarding claim 18, it is a non-transitory computer-readable medium claim corresponding to the steps of claim 1, and is rejected for the same reasons as claim 1.

	Regarding claim 2, the rejection of claim 1 is incorporated and Chen further discloses receiving the training data corpus; deriving the plurality of training parameters by processing the training data corpus; and temporarily storing the plurality of training parameters ([0026]; “Usually, the training samples of the training task can be fetched in several subsets, and each time a subset being fetched is a mini-batch. The mini-batch size herein refers to an amount of the training samples being fetched in one mini-batch”, the training samples being derived from a training corpus in the form of mini-batches; and [0023]; “training samples can be loaded from the storage”, the storage containing the training data corpus from which the training parameters are derived).

	Regarding claim 3, the rejection of claim 1 is incorporated and Chen further discloses receiving the input from the user; deriving the plurality of model parameters based on the plurality of training parameters and the input from the user; and deploying the plurality of model parameters ([0032]; “Similar to the embodiment of FIG. 1, in the embodiment, the user can input the training environment information to the application (or the external server for the webpage service) via the interaction interface 300, wherein the training environment information carries factors of the training task (which will be performed by the trainer 400). Contents of the training environment information of the training task can be referred to the embodiment of FIG. 1 . . . in this case, the user can submit training environment information (e.g. neural network, models or parameter servers in use) those has not been checked via the interaction interface 300”, which discloses model parameters that are associated with the ANN/neural network model, are derived from training data, and are input by a user; and Figure 4; the interaction interface allows a user to input model parameters, and model parameters such as the weights of an ANN are derived based on training parameters such as training data/training samples; and [0036]; “the application (or the webpage service) performed by the trainer 400 can still obtain the sizes of the training samples, the memory factors of the trainer 400, and the neural network (model) to be used in training task”; and [0040]).

	Regarding claim 6, the rejection of claim 1 is incorporated and Chen fails to explicitly disclose but Wu discloses wherein the plurality of model parameters comprise at least one of tokens, intents, named entities, word vectors, part of speech (PoS) tags, input features, input neurons, and output neurons ([0031]; “to normalize an input HPI and classify the HPI using a neural network tuned to process HPI information and generate a classification from the HPI information. In some examples disclosed herein, the HPI is normalized with a natural language processor by tokenizing, lemmatizing, and replacing named entities and medical terms with standardized strings/predefined tags. In some examples disclosed herein, the natural language processor randomly reorganizes the order of each sentence in the input to the HPI. In some examples disclosed herein, the tokens are hashed into integers” (emphasis added), which discloses that the model parameters are tokens; and [0039]; “The tokenizer 206 converts each word or group of words of the HPI 108 into a token. In some examples, the tokenizer 206 breaks the input HPI 108 string into individual tokens.”).
	The motivation to combine Chen and Wu is the same as discussed above with respect to claim 1.

	Regarding claim 7, the rejections of claims 1 and 6 are incorporated and Chen fails to explicitly disclose but Wu discloses wherein masking the set of model parameters using the set of pre-defined rules comprises at least one of stemming tokens, lemmatizing tokens, de-duplicating tokens, adjusting weights of tokens, converting tokens in hard coded binary numbers, and converting token types in hard coded binary numbers ([0031]; “to normalize an input HPI and classify the HPI using a neural network tuned to process HPI information and generate a classification from the HPI information. In some examples disclosed herein, the HPI is normalized with a natural language processor by tokenizing, lemmatizing, and replacing named entities and medical terms with standardized strings/predefined tags. In some examples disclosed herein, the natural language processor randomly reorganizes the order of each sentence in the input to the HPI. In some examples disclosed herein, the tokens are hashed into integers. In such examples, the integers are representative of an index of a sparse vector where each index represents a distinct word. In examples disclosed herein, the normalized HPI is classified with a neural network. In some examples, the neural network is a three-layer neural network including an embedding layer, recurrent neural network layer, and fully connected layer. In some examples, the recurrent neural network is a long short-term memory (LSTM) network. In some examples, the three-layer neural network outputs a binary output (e.g., a binary classification, either “extended” or “brief” represented as 0 or 1, 1 or 0, etc.) In other examples, the neural network outputs a vector including values corresponding to the presence of each HPI element in an input HPI. 
In some examples, the output of the neural network can also include a determination of which bodily system(s) is/are discussed in the input HPI” (emphasis added), which discloses masking the set of model parameters (tokens) in one or more layers of the ANN based on predefined rules, e.g., by lemmatizing the tokens or by hashing them into integers (i.e., a binary representation), to generate masked model parameters (lemmatized tokens); and [0037]; “The example preprocessor 202 includes an example natural language processor 204 and an example tensor generator 212. The example natural language processor 204 includes an example tokenizer 206, an example lemmatizer 208, an example sentence reorderer 209 and an example named entity recognizer 210. The example neural network 214 includes an embedding layer 216, an example LSTM layer 218 and an example fully connected layer 220”; and [0039]; “Additionally or alternatively, the tokenizer 206 can tokenize short phrases together based on simple rules”).
The motivation to combine Chen and Wu is the same as discussed above with respect to claim 1.

Regarding claims 8, 15, and 20, the rejections of claims 1, 11, and 18 are incorporated and Chen fails to explicitly disclose but Wu discloses wherein masking the set of model parameters comprise masking the set of model parameters in one or more hidden layers of the ANN model ([0037]; “The example preprocessor 202 includes an example natural language processor 204 and an example tensor generator 212. The example natural language processor 204 includes an example tokenizer 206, an example lemmatizer 208, an example sentence reorderer 209 and an example named entity recognizer 210. The example neural network 214 includes an embedding layer 216, an example LSTM layer 218 and an example fully connected layer 220”; and [0039]; “Additionally or alternatively, the tokenizer 206 can tokenize short phrases together based on simple rules”, the lemmatizing occurring in one of the hidden layers, such as the embedding layer or the LSTM layer; and [0047]; “The LSTM layer 218 leverages history or learned recognition of language, words, phrases, patterns, etc., in the input vectors 217 using information stored in recurrent gates from prior visible and/or hidden cells in the LSTM layer 218 to arrive at the output vector 219 based on the combination of information in the vector(s)”, which discloses that the LSTM layer is a hidden layer and that the masking (lemmatizing) occurs there).
	The motivation to combine Chen and Wu is the same as discussed above with respect to claim 1.

Regarding claims 9 and 16, the rejections of claims 1 and 11 are incorporated and Chen discloses training the ANN model with the set of [[masked]] model parameters; and ([0054]; “the trainer 400 can perform the training task according to the selected mini-batch size and the selected algorithms for each layer”, which discloses training the ANN with the model parameters; and [0066]; “the trainers 200a-200d can perform the training task according to the parameter server employment advice”)
unmasking a resultant by back-propagating using the set of pre-defined rules ([0041]; “Similarly, since a number of the gradients being calculated in the gradient descend computation is corresponding to the number of the model parameters, memory usage of the gradients in the classification stage can be calculated as well”, which, under a broadest reasonable interpretation (BRI), discloses back-propagating using pre-defined rules, the gradient descent computation being the pre-defined rule).
Chen fails to explicitly disclose but Wu discloses masked model parameters ([0031]; and [0037]).
The motivation to combine Chen and Wu is the same as discussed above with respect to claim 1.

Regarding claims 10 and 17, the rejections of claims 1 and 11 are incorporated and Chen discloses iteratively selecting an updated set of model parameters ([0023]; “The updated model parameters generated by the gradient descend computation can be transferred to the main memory 220. The updated model parameters can be transmitted to the shared memory as a replacement of the lasted model parameters. When the model parameters are updated, the instance of the training task is completed. When all the instances of the training task are completed, the training task is done”)
determining the amount of memory required based on an updated set of [[masked]] model parameters until the amount of memory required is less than the determined threshold ([0041]; “According to the number of neurons in each pair of connected layers, the memory usage of model parameters in the classification stage can be calculated. Similarly, since a number of the gradients being calculated in the gradient descend computation is corresponding to the number of the model parameters, memory usage of the gradients in the classification stage can be calculated as well”, which discloses calculating or providing the set of model parameters for training the ANN model; and [0064]-[0066], which disclose providing the model parameters for training the ANN model, as implemented by parameter servers and determined by the memory optimization device/processor 120: “N.sub.ps represents a minimum number of parameter servers (i.e. the parameter servers 610-640) required in the training task”, the minimum number of parameter servers defining the determined threshold for the minimum amount of memory required).
Chen fails to explicitly disclose but Wu discloses masking the updated set of model parameters ([0031]; and [0037]).
The motivation to combine Chen and Wu is the same as discussed above with respect to claim 1.

Regarding claim 14, the rejection of claim 11 is incorporated and Chen fails to explicitly disclose but Wu discloses wherein the plurality of model parameters comprise at least one of tokens, intents, named entities, word vectors, part of speech (PoS) tags, input features, input neurons, and output neurons, and wherein masking the set of model parameters using the set of pre-defined rules comprises at least one of stemming tokens, lemmatizing tokens, de-duplicating tokens, adjusting weights of tokens, converting tokens in hard coded binary numbers, and converting token types in hard coded binary numbers ([0031]; “to normalize an input HPI and classify the HPI using a neural network tuned to process HPI information and generate a classification from the HPI information. In some examples disclosed herein, the HPI is normalized with a natural language processor by tokenizing, lemmatizing, and replacing named entities and medical terms with standardized strings/predefined tags. In some examples disclosed herein, the natural language processor randomly reorganizes the order of each sentence in the input to the HPI. In some examples disclosed herein, the tokens are hashed into integers. In such examples, the integers are representative of an index of a sparse vector where each index represents a distinct word. In examples disclosed herein, the normalized HPI is classified with a neural network. In some examples, the neural network is a three-layer neural network including an embedding layer, recurrent neural network layer, and fully connected layer. In some examples, the recurrent neural network is a long short-term memory (LSTM) network. In some examples, the three-layer neural network outputs a binary output (e.g., a binary classification, either “extended” or “brief” represented as 0 or 1, 1 or 0, etc.) In other examples, the neural network outputs a vector including values corresponding to the presence of each HPI element in an input HPI. 
In some examples, the output of the neural network can also include a determination of which bodily system(s) is/are discussed in the input HPI” (emphasis added), which discloses masking the set of model parameters (tokens) in one or more layers of the ANN based on predefined rules, e.g., by lemmatizing the tokens or by hashing them into integers (i.e., a binary representation), to generate masked model parameters (lemmatized tokens); and [0037]; “The example preprocessor 202 includes an example natural language processor 204 and an example tensor generator 212. The example natural language processor 204 includes an example tokenizer 206, an example lemmatizer 208, an example sentence reorderer 209 and an example named entity recognizer 210. The example neural network 214 includes an embedding layer 216, an example LSTM layer 218 and an example fully connected layer 220”; and [0039]; “Additionally or alternatively, the tokenizer 206 can tokenize short phrases together based on simple rules”).
The motivation to combine Chen and Wu is the same as discussed above with respect to claim 1.

Allowable Subject Matter

Claims 4, 5, 12, 13, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brent Hoover whose telephone number is (303)297-4403. The examiner can normally be reached Monday - Friday 9-5 MST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached on 571-270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRENT JOHNSTON HOOVER/Examiner, Art Unit 2127