Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 07/18/2019. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 5-6, 8-9, 11-14, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Moghadam (US20200334569A1), in view of Hara (US20170228639A1).

Regarding claim 1, Moghadam teaches A system, comprising: a memory that stores computer executable components; and a processor that executes the computer executable components stored in the memory, wherein the computer executable components comprise: a configuration component that defines different hyperparameters of multiple artificial intelligence models ([0054] In step 306, a respective target set of hyperparameter settings is generated. A respective target set of hyperparameter settings is generated for each MML model using a hypertuning algorithm).
an application component that employs the artificial intelligence model to extract one or more entities from a data source based on the target hyperparameters ([0061] Step 312 may finish by invoking the RML corresponding to the algorithm with the highest score to obtain a result. For example, the result may be a classification/recognition of an object within an dataset 110 or a larger dataset. The examiner notes that Moghadam teaches invoking a selected model to classify or identify an object in a dataset).
	However, Moghadam is not relied upon to teach that the configuration component determines target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models. On the other hand, Hara teaches determining target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models ([0076] At S250, the selecting section may select one setting based on performance of neural networks in which training is not terminated. For example, the selecting section may select a neural network that has the best evaluation value among the first neural networks trained at S130 and the second neural networks in which training is not terminated by the terminating section at S210). The examiner notes that Hara [0004] teaches that model hyperparameters are also called model settings. The examiner also notes that Moghadam and Hara are analogous art because both are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Moghadam's training method so that the configuration component determines target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models, as taught by Hara [0076], to predict eventual model performance at an early stage of training in order to automate the tuning of hyperparameters [0004].
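
For illustration only, the selection described in Hara [0076] (keeping the setting whose network has the best evaluation value among the runs whose training was not terminated) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Hara's implementation: the function names, the toy learning curve, and the rule that terminates any run trailing the best early score by more than a fixed margin are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Setting = Dict[str, float]


def select_setting(
    settings: List[Setting],
    evaluate: Callable[[Setting, int], float],
    check_epoch: int,
    total_epochs: int,
    margin: float,
) -> Tuple[Setting, float]:
    """Pick the setting whose model has the best final evaluation value.

    Runs whose evaluation at `check_epoch` trails the best early score by more
    than `margin` are terminated (assumed rule) and excluded from selection.
    """
    # Early checkpoint: evaluate every candidate after partial training.
    early = [evaluate(s, check_epoch) for s in settings]
    threshold = max(early) - margin
    # Training continues only for runs that are not terminated.
    survivors = [s for s, score in zip(settings, early) if score >= threshold]
    # Select the survivor with the best evaluation value after full training.
    finals = [(s, evaluate(s, total_epochs)) for s in survivors]
    return max(finals, key=lambda pair: pair[1])


def toy_evaluate(setting: Setting, epochs: int) -> float:
    """Hypothetical learning curve: improves with epochs, peaks at lr = 0.01."""
    quality = 1.0 - 10.0 * abs(setting["lr"] - 0.01)
    return quality * (1.0 - 0.5 ** epochs)


candidates = [{"lr": 0.001}, {"lr": 0.01}, {"lr": 0.1}]
best, score = select_setting(candidates, toy_evaluate, check_epoch=2, total_epochs=10, margin=0.1)
print(best, round(score, 4))
```

With these values the lr = 0.1 run is terminated at the epoch-2 checkpoint, and lr = 0.01 is selected from the remaining runs.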

Regarding claim 2, Moghadam teaches The system of claim 1, wherein the artificial intelligence model comprises a collection of a plurality of artificial intelligence models trained with various hyperparameters ([0055] In step 308, a respective hyperparameter predictor set is trained that predicts a respective set of hyperparameter settings for the first data set. The hyperparameter predictor set may comprise one or more hyperparameter models. Each hyperparameter model in the hyperparameter predictor set is trained using the first plurality of first meta-feature sets, generated in step 304, and respective target set of hyperparameter settings, generated in step 306).

Regarding claim 3, Moghadam teaches The system of claim 1, wherein at least one of the artificial intelligence model or the multiple artificial intelligence models comprise at least one of a deep neural network model, a recurring neural network model, a long short term memory model, an elastic long short term memory model, or a decoupled elastic long short term memory model ([0097] An RNN has two major internal enhancements over other MLPs. The first is localized memory cells such as LSTM, which involves microscopic details. The other is cross activation of recurrence steps, which is macroscopic (i.e., gross topology). Each step receives two inputs and outputs two outputs. One input is external activation from an item in an input sequence. The other input is an output of the adjacent previous step that may embed details from some or all previous steps, which achieves sequential history (i.e., temporal context). The other output is a predicted next item in the sequence. Example mathematical formulae and techniques for RNNs and LSTM are taught in related U.S. patent application Ser. No. 15/347,501, entitled "MEMORY CELL UNIT AND RECURRENT NEURAL NETWORK INCLUDING MULTIPLE MEMORY CELL UNITS.").
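
The recurrence quoted from Moghadam [0097], in which each step takes an item of the input sequence and the previous step's output and produces a new output plus a predicted next item, can be illustrated with a minimal scalar recurrent step. The scalar weights, the tanh activation, and the function names are assumptions for illustration; this is a plain recurrent step, not the LSTM formulation of the incorporated application.

```python
import math
from typing import List, Tuple


def rnn_step(x: float, h_prev: float, w_x: float, w_h: float,
             w_y: float, b: float) -> Tuple[float, float]:
    """One recurrence step with two inputs and two outputs.

    Inputs: the current sequence item `x` and the previous step's output `h_prev`.
    Outputs: the new state `h` (passed to the next step) and a prediction `y`.
    """
    h = math.tanh(w_x * x + w_h * h_prev + b)
    y = w_y * h
    return h, y


def run_rnn(sequence: List[float], w_x: float = 0.5, w_h: float = 0.8,
            w_y: float = 1.0, b: float = 0.0) -> List[float]:
    """Unroll the step over a sequence and collect one prediction per item."""
    h = 0.0  # no history before the first item
    predictions = []
    for x in sequence:
        h, y = rnn_step(x, h, w_x, w_h, w_y, b)
        predictions.append(y)
    return predictions


# The second predictions differ only because the first inputs differ (temporal context).
print(run_rnn([1.0, 0.0]), run_rnn([0.0, 0.0]))
```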

Regarding claim 5, Moghadam teaches The system of claim 1, wherein the different hyperparameters are defined based on at least one of different content domains, different knowledge sources, different data types, or different applications ([0029] Features of a machine learning algorithm are referred to as hyperparameters. If machine learning algorithm 121 is a support vector machine, then hyperparameters typically include C and gamma. If machine learning algorithm 121 is a neural network, then hyperparameters may include features such as a count of layers and/or a count of neurons per layer. The examiner notes that Moghadam teaches hyperparameters of different data types: C and gamma are real-valued, whereas the count of layers and the count of neurons per layer are integers).

Regarding claim 6, Moghadam teaches The system of claim 1, wherein the computer executable components further comprise: a tuner component that tunes one or more hyperparameters of the multiple artificial intelligence models to define the different hyperparameters of the multiple artificial intelligence models ([0030] Computer 100 creates or obtains hyperparameter predictors 135 for each of machine learning algorithms 121-123 to predict and produce optimal mini-model hyperparameters. The mini-model hyperparameters are optimal because they are tuned and produced by the system per mini-model to improve the accuracy of mini-model scores over previous techniques of using static or standard hyperparameters).
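
Moghadam [0030] describes producing tuned hyperparameters per machine learning algorithm, and Moghadam [0061] describes invoking the algorithm with the highest score. The sketch below illustrates only that general tune-then-select idea and swaps in plain grid search in place of Moghadam's trained hyperparameter predictors; the grids, the scorers, and the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Setting = Dict[str, float]
Grid = Dict[str, List[float]]
Scorer = Callable[[Setting], float]


def tune(grid: Grid, score: Scorer) -> Tuple[Setting, float]:
    """Exhaustively search one algorithm's grid; return the best setting and score."""
    names = sorted(grid)
    best_setting: Optional[Setting] = None
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        setting = dict(zip(names, values))
        s = score(setting)
        if s > best_score:
            best_setting, best_score = setting, s
    if best_setting is None:
        raise ValueError("every hyperparameter needs at least one candidate value")
    return best_setting, best_score


def tune_all(algorithms: Dict[str, Tuple[Grid, Scorer]]) -> Tuple[str, Setting, float]:
    """Tune each algorithm separately, then choose the one with the highest score."""
    tuned = {name: tune(grid, scorer) for name, (grid, scorer) in algorithms.items()}
    winner = max(tuned, key=lambda name: tuned[name][1])
    setting, best = tuned[winner]
    return winner, setting, best


# Hypothetical validation scorers standing in for real training runs.
algorithms = {
    "svm": ({"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]},
            lambda p: 0.9 - 0.01 * abs(p["C"] - 1.0) - abs(p["gamma"] - 0.1)),
    "mlp": ({"layers": [1, 2, 3], "neurons": [16, 32]},
            lambda p: 0.8 + 0.02 * p["layers"] - abs(p["neurons"] - 32) / 1000),
}
winner, setting, score = tune_all(algorithms)
print(winner, setting, score)
```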

Regarding claim 8, Moghadam teaches A computer-implemented method, comprising: defining, by a system operatively coupled to a processor, different hyperparameters of multiple artificial intelligence models ([0054] In step 306, a respective target set of hyperparameter settings is generated. A respective target set of hyperparameter settings is generated for each MML model using a hypertuning algorithm).
employing, by the system, the artificial intelligence model to extract one or more entities from a data source based on the target hyperparameters ([0061] Step 312 may finish by invoking the RML corresponding to the algorithm with the highest score to obtain a result. For example, the result may be a classification/recognition of an object within an dataset 110 or a larger dataset. The examiner notes that Moghadam teaches invoking a selected model to classify or identify an object in a dataset).
However, Moghadam is not relied upon to teach determining, by the system, target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models. On the other hand, Hara teaches determining target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models ([0076] At S250, the selecting section may select one setting based on performance of neural networks in which training is not terminated. For example, the selecting section may select a neural network that has the best evaluation value among the first neural networks trained at S130 and the second neural networks in which training is not terminated by the terminating section at S210). The examiner notes that Hara [0004] teaches that model hyperparameters are also called model settings. The examiner also notes that Moghadam and Hara are analogous art because both are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Moghadam's training method to incorporate determining, by the system, target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models, as taught by Hara [0076], to predict eventual model performance at an early stage of training in order to automate the tuning of hyperparameters [0004].

Regarding claim 9, Moghadam teaches The computer-implemented method of claim 8, wherein at least one of the artificial intelligence model or the multiple artificial intelligence models comprise at least one of a deep neural network model, a recurring neural network model, a long short term memory model, an elastic long short term memory model, or a decoupled elastic long short term memory model ([0097] An RNN has two major internal enhancements over other MLPs. The first is localized memory cells such as LSTM, which involves microscopic details. The other is cross activation of recurrence steps, which is macroscopic (i.e., gross topology). Each step receives two inputs and outputs two outputs. One input is external activation from an item in an input sequence. The other input is an output of the adjacent previous step that may embed details from some or all previous steps, which achieves sequential history (i.e., temporal context). The other output is a predicted next item in the sequence. Example mathematical formulae and techniques for RNNs and LSTM are taught in related U.S. patent application Ser. No. 15/347,501, entitled "MEMORY CELL UNIT AND RECURRENT NEURAL NETWORK INCLUDING MULTIPLE MEMORY CELL UNITS.").

Regarding claim 11, Moghadam teaches The computer-implemented method of claim 8, further comprising: defining, by the system, the different hyperparameters based on at least one of different content domains, different knowledge sources, different data types, or different applications ([0029] Features of a machine learning algorithm are referred to as hyperparameters. If machine learning algorithm 121 is a support vector machine, then hyperparameters typically include C and gamma. If machine learning algorithm 121 is a neural network, then hyperparameters may include features such as a count of layers and/or a count of neurons per layer. The examiner notes that Moghadam teaches hyperparameters of different data types: C and gamma are real-valued, whereas the count of layers and the count of neurons per layer are integers).

Regarding claim 12, Moghadam teaches The computer-implemented method of claim 8, further comprising: tuning, by the system, one or more hyperparameters of the multiple artificial intelligence models to define the different hyperparameters of the multiple artificial intelligence models ([0030] Computer 100 creates or obtains hyperparameter predictors 135 for each of machine learning algorithms 121-123 to predict and produce optimal mini-model hyperparameters. The mini-model hyperparameters are optimal because they are tuned and produced by the system per mini-model to improve the accuracy of mini-model scores over previous techniques of using static or standard hyperparameters).

Regarding claim 13, Moghadam teaches A computer program product facilitating extraction of entities having defined lengths of text spans, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: define, by the processor, different hyperparameters of multiple artificial intelligence models ([0054] In step 306, a respective target set of hyperparameter settings is generated. A respective target set of hyperparameter settings is generated for each MML model using a hypertuning algorithm).
employ, by the processor, the artificial intelligence model to extract one or more entities from a data source based on the target hyperparameters ([0061] Step 312 may finish by invoking the RML corresponding to the algorithm with the highest score to obtain a result. For example, the result may be a classification/recognition of an object within an dataset 110 or a larger dataset. The examiner notes that Moghadam teaches invoking a selected model to classify or identify an object in a dataset).
However, Moghadam is not relied upon to teach program instructions that cause the processor to determine target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models. On the other hand, Hara teaches determining target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models ([0076] At S250, the selecting section may select one setting based on performance of neural networks in which training is not terminated. For example, the selecting section may select a neural network that has the best evaluation value among the first neural networks trained at S130 and the second neural networks in which training is not terminated by the terminating section at S210). The examiner notes that Hara [0004] teaches that model hyperparameters are also called model settings. The examiner also notes that Moghadam and Hara are analogous art because both are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Moghadam's training method to incorporate program instructions that cause the processor to determine target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models, as taught by Hara [0076], to predict eventual model performance at an early stage of training in order to automate the tuning of hyperparameters [0004].

Regarding claim 14, Moghadam teaches The computer program product of claim 13, wherein at least one of the artificial intelligence model or the multiple artificial intelligence models comprise at least one of a deep neural network model, a recurring neural network model, a long short term memory model, an elastic long short term memory model, or a decoupled elastic long short term memory model ([0097] An RNN has two major internal enhancements over other MLPs. The first is localized memory cells such as LSTM, which involves microscopic details. The other is cross activation of recurrence steps, which is macroscopic (i.e., gross topology). Each step receives two inputs and outputs two outputs. One input is external activation from an item in an input sequence. The other input is an output of the adjacent previous step that may embed details from some or all previous steps, which achieves sequential history (i.e., temporal context). The other output is a predicted next item in the sequence. Example mathematical formulae and techniques for RNNs and LSTM are taught in related U.S. patent application Ser. No. 15/347,501, entitled "MEMORY CELL UNIT AND RECURRENT NEURAL NETWORK INCLUDING MULTIPLE MEMORY CELL UNITS.").

Regarding claim 16, Moghadam teaches The computer program product of claim 13, wherein the program instructions are further executable by the processor to cause the processor to: define, by the processor, the different hyperparameters based on at least one of different content domains, different knowledge sources, different data types, or different applications ([0029] Features of a machine learning algorithm are referred to as hyperparameters. If machine learning algorithm 121 is a support vector machine, then hyperparameters typically include C and gamma. If machine learning algorithm 121 is a neural network, then hyperparameters may include features such as a count of layers and/or a count of neurons per layer. The examiner notes that Moghadam teaches hyperparameters of different data types: C and gamma are real-valued, whereas the count of layers and the count of neurons per layer are integers).

Regarding claim 17, Moghadam teaches The computer program product of claim 13, wherein the program instructions are further executable by the processor to cause the processor to: tune, by the processor, one or more hyperparameters of the multiple artificial intelligence models to define the different hyperparameters of the multiple artificial intelligence models, thereby facilitating at least one of improved memory capacity, improved accuracy, or reduced execution cost of the artificial intelligence model ([0030] Computer 100 creates or obtains hyperparameter predictors 135 for each of machine learning algorithms 121-123 to predict and produce optimal mini-model hyperparameters. The mini-model hyperparameters are optimal because they are tuned and produced by the system per mini-model to improve the accuracy of mini-model scores over previous techniques of using static or standard hyperparameters).

Claims 4, 10, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Moghadam (US20200334569A1), in view of Hara (US20170228639A1), further in view of Gers (Learning to Forget: Continual Prediction with LSTM).

Regarding claim 4, Moghadam teaches The system of claim 1. However, Moghadam is not relied upon to teach that the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and that the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models. On the other hand, Gers teaches these features [Page 2469]. The examiner notes that Gers [Page 2469] presents a table listing a learning loop whose parameters include input gate and forget gate biases set to zero. The examiner also notes that Moghadam and Gers are analogous art because both are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Moghadam's training method so that the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models, as taught by Gers [Page 2469], to permit LSTM models to learn local self-resets of memory contents that have become irrelevant [Page 2468, Section 5].

[Image: reproduction of the table from Gers, Page 2469]
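
To illustrate the quantities recited in claim 4, the following minimal sketch implements a single-unit LSTM cell with a forget gate and builds several models that differ only in their forget gate bias and input gate bias. The shared weights, the single unit, the zero output-gate bias, and the class name are simplifying assumptions; this is not the learning loop tabulated in Gers. With zero input and zero previous output, the candidate content tanh(0) is zero, so the new cell state is sigmoid(forget_bias) times the previous cell state, and a larger forget gate bias retains more memory.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class ScalarLSTM:
    """Single-unit LSTM cell with a forget gate; weights are shared for brevity."""
    forget_bias: float
    input_bias: float
    w: float = 0.5  # input weight (assumed, shared by all gates)
    u: float = 0.5  # recurrent weight (assumed, shared by all gates)

    def step(self, x: float, h_prev: float, c_prev: float) -> Tuple[float, float]:
        z = self.w * x + self.u * h_prev
        f = sigmoid(z + self.forget_bias)  # forget gate: share of c_prev kept
        i = sigmoid(z + self.input_bias)   # input gate: share of new content written
        o = sigmoid(z)                     # output gate (bias fixed at 0 here)
        g = math.tanh(z)                   # candidate cell content
        c = f * c_prev + i * g
        h = o * math.tanh(c)
        return h, c


# Multiple models, each defined by a different (forget bias, input bias) pair.
biases = [-2.0, 0.0, 2.0]
models: List[ScalarLSTM] = [ScalarLSTM(fb, ib) for fb, ib in product(biases, biases)]

# Zero input and zero previous output: the new cell state is sigmoid(forget_bias).
for m in models[::3]:
    print(m.forget_bias, round(m.step(0.0, 0.0, 1.0)[1], 3))
```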


Regarding claim 10, Moghadam teaches The computer-implemented method of claim 8. However, Moghadam is not relied upon to teach that the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and that the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models. On the other hand, Gers teaches these features [Page 2469]. The examiner notes that Gers [Page 2469] presents a table listing a learning loop whose parameters include input gate and forget gate biases set to zero. The examiner also notes that Moghadam and Gers are analogous art because both are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Moghadam's training method so that the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models, as taught by Gers [Page 2469], to permit LSTM models to learn local self-resets of memory contents that have become irrelevant [Page 2468, Section 5].

[Image: reproduction of the table from Gers, Page 2469]


Regarding claim 15, Moghadam teaches The computer program product of claim 13. However, Moghadam is not relied upon to teach that the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and that the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models. On the other hand, Gers teaches these features [Page 2469]. The examiner notes that Gers [Page 2469] presents a table listing a learning loop whose parameters include input gate and forget gate biases set to zero. The examiner also notes that Moghadam and Gers are analogous art because both are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Moghadam's training method so that the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models, as taught by Gers [Page 2469], to permit LSTM models to learn local self-resets of memory contents that have become irrelevant [Page 2468, Section 5].

[Image: reproduction of the table from Gers, Page 2469]


Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Moghadam (US20200334569A1), in view of Hara (US20170228639A1), further in view of Ali (Boosting Arabic Named Entity Recognition With Multi-Attention Layer).

Regarding claim 7, Moghadam teaches The system of claim 1. However, Moghadam is not relied upon to teach that the application component employs the artificial intelligence model to extract one or more entities having defined lengths of text spans reflected in at least one of target hypermeter values or target hyperparameter values, thereby facilitating at least one of improved memory capacity, improved accuracy, or reduced execution cost of the artificial intelligence model. On the other hand, Ali teaches these features ([Page 46579, Section C] we ran many trails and test experiments to optimize the hypermeters settings. The maximum sequence length was set to 100, the embedding dimension was fixed to 100, and the hidden-state size was set to 200). The examiner notes that Ali teaches a text sequence length of 100 that is set as a hyperparameter. The examiner also notes that Moghadam and Ali are analogous art because both are in the same field of machine learning.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Moghadam's selection of model parameters and hyperparameters so that the application component employs the artificial intelligence model to extract one or more entities having defined lengths of text spans reflected in at least one of target hypermeter values or target hyperparameter values, thereby facilitating at least one of improved memory capacity, improved accuracy, or reduced execution cost of the artificial intelligence model, as taught by Ali [Page 46579, Section C], to optimize the model hyperparameters and the model performance [Page 46579, Section C].
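
The maximum sequence length quoted from Ali [Page 46579, Section C] acts as a hyperparameter that bounds the text spans a model sees. A minimal sketch of that effect follows, assuming right-padding with a hypothetical "<pad>" token and end-exclusive (start, end) entity spans; neither detail, nor the function names, comes from Ali.

```python
from typing import List, Sequence, Tuple

PAD = "<pad>"  # hypothetical padding token


def fit_to_length(tokens: Sequence[str], max_len: int, pad: str = PAD) -> List[str]:
    """Truncate, or right-pad with `pad`, so the result has exactly max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    clipped = list(tokens[:max_len])
    return clipped + [pad] * (max_len - len(clipped))


def spans_kept(spans: List[Tuple[int, int]], max_len: int) -> List[Tuple[int, int]]:
    """Entity spans (start, end), end exclusive, that survive truncation to max_len."""
    return [(start, end) for start, end in spans if end <= max_len]


MAX_SEQUENCE_LENGTH = 100  # the value reported in the quoted Ali passage
sentence = ["tok"] * 120
print(len(fit_to_length(sentence, MAX_SEQUENCE_LENGTH)))
print(spans_kept([(0, 2), (98, 101)], MAX_SEQUENCE_LENGTH))
```

Under these assumptions, an entity span that ends past the hundredth token is cut off, which is one way the sequence-length hyperparameter constrains which text spans a model can extract.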

Claims 18 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Liang (TRSDL: Tag-Aware Recommender System Based on Deep Learning–Intelligent Computing Systems), in view of Hara (US20170228639A1).

Regarding claim 18, Liang teaches A system, comprising: a memory that stores computer executable components; and a processor that executes the computer executable components stored in the memory, wherein the computer executable components comprise: a trainer component that trains multiple artificial intelligence models to learn entities having different lengths of text spans based on different defined hyperparameters of the multiple artificial intelligence models ([Page 9, Para. 1] The final hyper-parameters we used are listed in Table 3, where L denotes the max sequence length for LSTM, m is the balance factor for the effects of hidden concatenated features. Furthermore, to prevent gradient explosion, we apply gradient clipping to TRSDL, and the max gradient norm is set to 5). The examiner notes that Liang [Page 9, Table 3] teaches a hyperparameter L denoting the maximum sequence length, which allows sequences of up to 20 characters to be processed. The examiner also notes that Liang [Page 8, Section 4.3.1] teaches combining different models into two groups for experimentation purposes, and teaches tuning, for all baselines, the learning rate, the batch size, the regularization strength, and an appropriate latent dimension.

[Image: reproduction of Table 3 (final hyperparameters) from Liang, Page 9]


However, Liang is not relied upon to teach a configuration component that determines target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models. On the other hand, Hara teaches a configuration component that determines target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models ([0076] At S250, the selecting section may select one setting based on performance of neural networks in which training is not terminated. For example, the selecting section may select a neural network that has the best evaluation value among the first neural networks trained at S130 and the second neural networks in which training is not terminated by the terminating section at S210). The examiner notes that Hara [0004] teaches that model hyperparameters are also called model settings. The examiner also notes that Liang and Hara are analogous art because both are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang's training method to incorporate a configuration component that determines target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models, as taught by Hara [0076], to predict eventual model performance at an early stage of training in order to automate the tuning of hyperparameters [0004].
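
Liang [Page 9, Para. 1] states that gradient clipping is applied with a max gradient norm of 5. A minimal sketch of norm-based clipping follows; computing one norm over all gradients jointly (global-norm clipping) is an assumption, since the quoted passage does not say whether the norm is global or per tensor, and the function names are hypothetical.

```python
import math
from typing import List

Tensor = List[float]


def global_norm(grads: List[Tensor]) -> float:
    """L2 norm over all gradient values taken together."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


def clip_by_global_norm(grads: List[Tensor], max_norm: float) -> List[Tensor]:
    """Rescale every gradient by the same factor if the joint norm exceeds max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm:
        return [list(tensor) for tensor in grads]
    scale = max_norm / norm
    return [[g * scale for g in tensor] for tensor in grads]


MAX_GRAD_NORM = 5.0  # the max gradient norm reported in the quoted Liang passage
clipped = clip_by_global_norm([[3.0, 4.0], [0.0, 12.0]], MAX_GRAD_NORM)
print(global_norm(clipped))  # approximately 5.0
```

The example gradients have a joint norm of 13, so every entry is scaled by 5/13; gradients already within the limit are returned unchanged.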

Regarding claim 22, Liang teaches A computer-implemented method, comprising: training, by a system operatively coupled to a processor, multiple artificial intelligence models to learn entities having different lengths of text spans based on different defined hyperparameters of the multiple artificial intelligence models ([Page 9, Para. 1] The final hyper-parameters we used are listed in Table 3, where L denotes the max sequence length for LSTM, m is the balance factor for the effects of hidden concatenated features. Furthermore, to prevent gradient explosion, we apply gradient clipping to TRSDL, and the max gradient norm is set to 5). The examiner notes that Liang [Page 9, Table 3] teaches a hyperparameter L denoting the maximum sequence length, which allows sequences of up to 20 characters to be processed. The examiner also notes that Liang [Page 8, Section 4.3.1] teaches combining different models into two groups for experimentation purposes, and teaches tuning, for all baselines, the learning rate, the batch size, the regularization strength, and an appropriate latent dimension.

[Image: reproduction of Table 3 (final hyperparameters) from Liang, Page 9]


However, Liang is not relied upon to teach determining, by the system, target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models. On the other hand, Hara teaches determining target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models ([0076] At S250, the selecting section may select one setting based on performance of neural networks in which training is not terminated. For example, the selecting section may select a neural network that has the best evaluation value among the first neural networks trained at S130 and the second neural networks in which training is not terminated by the terminating section at S210). The examiner notes that Hara [0004] teaches that model hyperparameters are also called model settings. The examiner also notes that Liang and Hara are analogous art because both are in the same field of machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang's training method to incorporate determining, by the system, target hyperparameters of an artificial intelligence model based on performance of the multiple artificial intelligence models, as taught by Hara [0076], to predict eventual model performance at an early stage of training in order to automate the tuning of hyperparameters [0004].

Claims 19, 21, 23, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Liang (TRSDL: Tag-Aware Recommender System Based on Deep Learning–Intelligent Computing Systems), in view of Hara (US20170228639A1), further in view of Moghadam (US20200334569A1).

Regarding claim 19, Liang teaches The system of claim 18. However, Liang is not relied upon to teach wherein at least one of the artificial intelligence model or the multiple artificial intelligence models comprise at least one of a deep neural network model, a recurring neural network model, a long short term memory model, an elastic long short term memory model, or a decoupled elastic long short term memory model. On the other hand, Moghadam teaches wherein at least one of the artificial intelligence model or the multiple artificial intelligence models comprise at least one of a deep neural network model, a recurring neural network model, a long short term memory model, an elastic long short term memory model, or a decoupled elastic long short term memory model ([0097] An RNN has two major internal enhancements over other MLPs. The first is localized memory cells such as LSTM, which involves microscopic details. The other is cross activation of recurrence steps, which is macroscopic (i.e., gross topology). Each step receives two inputs and outputs two outputs. One input is external activation from an item in an input sequence. The other input is an output of the adjacent previous step that may embed details from some or all previous steps, which achieves sequential history (i.e., temporal context). The other output is a predicted next item in the sequence. Example mathematical formulae and techniques for RNNs and LSTM are taught in related U.S. patent application Ser. No. 15/347,501, entitled "MEMORY CELL UNIT AND RECURRENT NEURAL NETWORK INCLUDING MULTIPLE MEMORY CELL UNITS." The examiner notes that Liang and Moghadam are both considered to be analogous because they are in the same field of machine learning.
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang’s machine learning model to incorporate wherein at least one of the artificial intelligence model or the multiple artificial intelligence models comprise at least one of a deep neural network model, a recurring neural network model, a long short term memory model, an elastic long short term memory model, or a decoupled elastic long short term memory model as taught by Moghadam [0097] to take advantage of the two major internal enhancements that RNNs have over other MLPs: localized memory cells such as LSTM, which involve microscopic details, and cross activation of recurrence steps, which is macroscopic (i.e., gross topology) [0097]).

Regarding claim 21, Liang teaches The system of claim 18. However, Liang is not relied upon to teach wherein the computer executable components further comprise: a tuner component that tunes one or more hyperparameters of the multiple artificial intelligence models to define the different defined hyperparameters of the multiple artificial intelligence models. Liang is also not relied upon to teach wherein the different defined hyperparameters are defined based on at least one of different content domains, different knowledge sources, different data types, or different applications.
On the other hand, Moghadam teaches wherein the computer executable components further comprise: a tuner component that tunes one or more hyperparameters of the multiple artificial intelligence models to define the different defined hyperparameters of the multiple artificial intelligence models ([0030] Computer 100 creates or obtains hyperparameter predictors 135 for each of machine learning algorithms 121-123 to predict and produce optimal mini-model hyperparameters. The mini-model hyperparameters are optimal because they are tuned and produced by the system per mini-model to improve the accuracy of mini-model scores over previous techniques of using static or standard hyperparameters. The examiner notes that Liang and Moghadam are both considered to be analogous because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang’s machine learning model training method to incorporate wherein the computer executable components further comprise: a tuner component that tunes one or more hyperparameters of the multiple artificial intelligence models to define the different defined hyperparameters of the multiple artificial intelligence models as taught by Moghadam [0030] to improve the accuracy of the models [0030]).
Furthermore, Moghadam teaches wherein the different defined hyperparameters are defined based on at least one of different content domains, different knowledge sources, different data types, or different applications ([0029] Features of a machine learning algorithm are referred to as hyperparameters. If machine learning algorithm 121 is a support vector machine, then hyperparameters typically include C and gamma. If machine learning algorithm 121 is a neural network, then hyperparameters may include features such as a count of layers and/or a count of neurons per layer. The examiner notes that Moghadam teaches hyperparameters of different data types: C and gamma take real-number values, whereas the count of layers and the count of neurons take integer values. The examiner notes that Liang and Moghadam are both considered to be analogous because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang’s machine learning model training method to incorporate wherein the different defined hyperparameters are defined based on at least one of different content domains, different knowledge sources, different data types, or different applications as taught by Moghadam [0029] to improve the accuracy of the models [0030]).

Regarding claim 23, Liang teaches The computer-implemented method of claim 22. However, Liang is not relied upon to teach wherein at least one of the artificial intelligence model or the multiple artificial intelligence models comprise at least one of a deep neural network model, a recurring neural network model, a long short term memory model, an elastic long short term memory model, or a decoupled elastic long short term memory model. On the other hand, Moghadam teaches wherein at least one of the artificial intelligence model or the multiple artificial intelligence models comprise at least one of a deep neural network model, a recurring neural network model, a long short term memory model, an elastic long short term memory model, or a decoupled elastic long short term memory model ([0097] An RNN has two major internal enhancements over other MLPs. The first is localized memory cells such as LSTM, which involves microscopic details. The other is cross activation of recurrence steps, which is macroscopic (i.e., gross topology). Each step receives two inputs and outputs two outputs. One input is external activation from an item in an input sequence. The other input is an output of the adjacent previous step that may embed details from some or all previous steps, which achieves sequential history (i.e., temporal context). The other output is a predicted next item in the sequence. Example mathematical formulae and techniques for RNNs and LSTM are taught in related U.S. patent application Ser. No. 15/347,501, entitled "MEMORY CELL UNIT AND RECURRENT NEURAL NETWORK INCLUDING MULTIPLE MEMORY CELL UNITS." The examiner notes that Liang and Moghadam are both considered to be analogous because they are in the same field of machine learning.
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang’s machine learning model to incorporate wherein at least one of the artificial intelligence model or the multiple artificial intelligence models comprise at least one of a deep neural network model, a recurring neural network model, a long short term memory model, an elastic long short term memory model, or a decoupled elastic long short term memory model as taught by Moghadam [0097] to take advantage of the two major internal enhancements that RNNs have over other MLPs: localized memory cells such as LSTM, which involve microscopic details, and cross activation of recurrence steps, which is macroscopic (i.e., gross topology) [0097]).

Regarding claim 25, Liang teaches The computer-implemented method of claim 22. However, Liang is not relied upon to teach further comprising: tuning, by the system, one or more hyperparameters of the multiple artificial intelligence models to define the different defined hyperparameters of the multiple artificial intelligence models. Liang is also not relied upon to teach wherein the different defined hyperparameters are defined based on at least one of different content domains, different knowledge sources, different data types, or different applications.
On the other hand, Moghadam teaches further comprising: tuning, by the system, one or more hyperparameters of the multiple artificial intelligence models to define the different defined hyperparameters of the multiple artificial intelligence models ([0030] Computer 100 creates or obtains hyperparameter predictors 135 for each of machine learning algorithms 121-123 to predict and produce optimal mini-model hyperparameters. The mini-model hyperparameters are optimal because they are tuned and produced by the system per mini-model to improve the accuracy of mini-model scores over previous techniques of using static or standard hyperparameters. The examiner notes that Liang and Moghadam are both considered to be analogous because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang’s machine learning model training method to incorporate further comprising: tuning, by the system, one or more hyperparameters of the multiple artificial intelligence models to define the different defined hyperparameters of the multiple artificial intelligence models as taught by Moghadam [0030] to improve the accuracy of the models [0030]).
Furthermore, Moghadam teaches wherein the different defined hyperparameters are defined based on at least one of different content domains, different knowledge sources, different data types, or different applications ([0029] Features of a machine learning algorithm are referred to as hyperparameters. If machine learning algorithm 121 is a support vector machine, then hyperparameters typically include C and gamma. If machine learning algorithm 121 is a neural network, then hyperparameters may include features such as a count of layers and/or a count of neurons per layer. The examiner notes that Moghadam teaches hyperparameters of different data types: C and gamma take real-number values, whereas the count of layers and the count of neurons take integer values. The examiner notes that Liang and Moghadam are both considered to be analogous because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang’s machine learning model training method to incorporate wherein the different defined hyperparameters are defined based on at least one of different content domains, different knowledge sources, different data types, or different applications as taught by Moghadam [0029] to improve the accuracy of the models [0030]).

Claims 20 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Liang (TRSDL: Tag-Aware Recommender System Based on Deep Learning–Intelligent Computing Systems), in view of Hara (US20170228639A1), further in view of Gers (Learning to Forget: Continual Prediction with LSTM).

Regarding claim 20, Liang teaches The system of claim 18. However, Liang is not relied upon to teach wherein the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and wherein the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models. On the other hand, Gers teaches wherein the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and wherein the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models ([Page 2469] The examiner notes that Gers teaches [Page 2469] a table listing a learning loop that includes parameters such as the input gate and the forget gate, with their biases set to zero. The examiner also notes that Liang and Gers are both considered to be analogous because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang’s training method to incorporate wherein the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and wherein the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models as taught by Gers [Page 2469] to permit LSTM models to learn local self-resets of memory contents that have become irrelevant [Page 2468, section 5]).



Regarding claim 24, Liang teaches The computer-implemented method of claim 22. However, Liang is not relied upon to teach wherein the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and wherein the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models. On the other hand, Gers teaches wherein the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and wherein the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models ([Page 2469] The examiner notes that Gers teaches [Page 2469] a table listing a learning loop that includes parameters such as the input gate and the forget gate, with their biases set to zero. The examiner also notes that Liang and Gers are both considered to be analogous because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liang’s training method to incorporate wherein the target hyperparameters comprise a target forget gate bias and a target input gate bias of the artificial intelligence model, and wherein the different hyperparameters comprise different defined forget gate biases and different defined input gate biases of the multiple artificial intelligence models as taught by Gers [Page 2469] to permit LSTM models to learn local self-resets of memory contents that have become irrelevant [Page 2468, section 5]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Wicaksono (Hyper Parameter Optimization using Genetic Algorithm on Machine Learning Methods for Online News Popularity Prediction)
“Wicaksono teaches optimizing hyperparameters using genetic algorithms to predict news popularity”
Henry (US2018/0189640A1)
“Henry teaches a method to improve performance and efficiency of computations associated with ANNs for the recognition of images and speech”
Nogueira (US2017/0308790A1)
“Nogueira teaches the use of convolutional neural network ranking to classify text”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAMCY ALGHAZZY whose telephone number is (571) 272-8824. The examiner can normally be reached Monday-Friday 7:30am-4:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ RIVAS can be reached on (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAMCY ALGHAZZY/           Examiner, Art Unit 2128       

/OMAR F FERNANDEZ RIVAS/           Supervisory Patent Examiner, Art Unit 2128