DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 09/18/2018.
This action is in response to amendments and remarks filed on 12/01/2021. In the current amendments, claims 1, 13, and 18 are amended. Claims 1-20 are pending and have been examined.
In response to amendments and remarks filed on 12/01/2021, the 35 U.S.C. 101 rejection of claims 1-6, 12-15, and 17-20 made in the previous Office Action has been withdrawn.

Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter. See 37 CFR 1.75(d)(1) and MPEP § 608.01(o). Correction of the following is required: each of claims 1, 13, and 18 recites “wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset”; however, this limitation does not have proper antecedent basis in the Specification (Specification [0078] recites only “the structure of the hierarchical neural network classifier 406 is determined automatically based on a confusion matrix obtained using training dataset”).

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.


The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Each of claims 1, 13, and 18 recites “wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset”; however, the Specification does not provide written description for this amended feature. Specification [0078] recites “the structure of the hierarchical neural network classifier 406 is determined automatically based on a confusion matrix obtained using training dataset” (emphasis added), but the Specification does not describe that the structure of the hierarchical neural network classifier 406 is determined automatically by “performing a first training process using a first training dataset” as required by the claims. Therefore, claims 1, 13, and 18 lack proper written description in the Specification. Each dependent claim is rejected based on the same rationale as the claim from which it depends.





Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6, 13-15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Abdel-Reheem et al. (US 9,971,763 B2) in view of Le et al. (“Application of a Hybrid Bi-LSTM-CRF Model to the Task of Russian Named Entity Recognition”).
Regarding Claim 1,
Abdel-Reheem et al. teaches A method for natural language processing (Col. 1 lines 47-50: “Named entity recognition is described, for example, to detect an instance of a named entity in a web page and classify the named entity as being an organization or other predefined class” teaches a method for named entity recognition (a type of natural language processing)), 
the method comprising: receiving, by one or more processors, an unstructured text input (Fig. 3 and Col. 3 lines 15-19: “In an example, an end user at his personal computer is using a document viewing application to read a document about financial performance of a number of businesses. The named entity extractor 100 detects named entities in the document text with the class "business organization"” teach named entity extractor receiving a document as text input; Col. 3 lines 11-14: “A document is any text item. A non-exhaustive list of examples of documents is: email, blog, search query, letter, memo, report, web page, book, social media post or comment, tweet” teaches that the input can be unstructured text/document such as web page; Fig. 12 element 1202 teaches processor); 
identifying, using an entity classifier, entities in the unstructured text input, wherein the identifying the entities includes (Fig. 3 and Col. 4 lines 37-40: “The named entity extractor 302 is a machine learning component which is trained to label sequences of words and phrases. The labels indicate named entity classes as mentioned above” teach using the entity extractor (entity classifier) to identify entities in the input in which the extractor can be a machine learning component).
Abdel-Reheem et al. does not appear to explicitly teach generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input, wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset; and generating, using a combiner of the hierarchical neural network classifier, one or more higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications; and providing identified entities based on the one or more higher-level entity identifications.
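However, Le et al. teaches generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input (Fig. 1 and pg. 94 Section 2.4: “In the combined model characters of each word in a sentence are fed into a Bi-LSTM network in order to capture character-level features of words. Then these character-level vector representations are concatenated with word embedding vectors and fed into another Bi-LSTM network. This network calculates a sequence of scores that represent likelihoods of tags for each word in the sentence. To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags” teach a Combined Bi-LSTM and CRF Model for the named entity recognition task wherein the combined model analyzes features at different levels (hierarchy), thus rendering the combined model to correspond to a hierarchical neural network classifier of the entity classifier; in the combined model, a plurality of Bi-LSTM networks (plurality of sub-classifiers) are used to generate lower-level entity identifications; for example, the first level of Bi-LSTMs generates character-level features, which are then fed into another level of Bi-LSTMs to produce “likelihoods of tags for each word in the sentence” (more lower-level entity identifications); pg. 96 Section 3.1 teaches the input data can be data from unstructured text input such as documents from web sites), 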
wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset (Fig. 1 and pg. 94 Section 2.4: “Thus, we expected that a combination of CRF model with a Bi-LSTM neural network encoding [2] should increase the accuracy of the tagging decisions. The architecture of the model is presented on the Fig. 1...To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags. Full set of parameters for this model consists of parameters of Bi-LSTM layers (weight matrices, biases, word embedding matrix) and transition matrix of CRF layer. All these parameters are tuned during training stage by back propagation algorithm with stochastic gradient descent. Dropout is applied to avoid over-fitting and improve the system performance” teaches the structure of the Combined Bi-LSTM and CRF Model (corresponds to hierarchical neural network classifier), as represented by its parameters, is tuned (determined) automatically during a training stage by the back propagation algorithm with stochastic gradient descent, which corresponds to performing a first training process using training data (also see pg. 97 Section 3.3)); and 
generating, using a combiner of the hierarchical neural network classifier, one or more higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications (Fig. 1 and pg. 94 Section 2.4: “In the combined model characters of each word in a sentence are fed into a Bi-LSTM network in order to capture character-level features of words. Then these character-level vector representations are concatenated with word embedding vectors and fed into another Bi-LSTM network. This network calculates a sequence of scores that represent likelihoods of tags for each word in the sentence. To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags” teach the Combined Bi-LSTM and CRF Model contains a CRF Model (corresponds to combiner), which generates higher-level entity identifications, based on the lower-level entity identifications from the Bi-LSTMs; pg. 94 first full paragraph: “The CRF model is trained to predict a vector...of tags given a sentence...To do this, a conditional probability is computed” teaches that the CRF model (combiner) generates conditional probability (higher-level entity identification)); and 
providing identified entities based on the one or more higher-level entity identifications (Fig. 1 and caption and pg. 94 first full paragraph: “The CRF model is trained to predict a vector...of tags given a sentence...To do this, a conditional probability is computed” teach providing the tag predictions (identified entities) based on the conditional probability (high-level entity identification)).
Abdel-Reheem et al. and Le et al. are analogous art to the claimed invention because they are directed to named entity recognition.
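It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input, wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset; and generating, using a combiner of the hierarchical neural network classifier, one or more higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications; and providing identified entities based on the one or more higher-level entity identifications as taught by Le et al. to the disclosed invention of Abdel-Reheem et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage a Combined Bi-LSTM and CRF Model because “addition of CRF layer to the Bi-LSTM model significantly increases it’s quality” in the Named Entity Recognition task and because “Bi-directional LSTM encoders have been demonstrated to be useful in many NLP tasks such as machine translation, question answering, and especially for NER problem” (Le et al. pg. 101 Section 5; pg. 93 Section 2.2).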
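For illustration only, and not drawn from either reference: the relationship relied upon above, in which per-word tag scores (the lower-level identifications) are combined with tag-order constraints to yield a final tag sequence (the higher-level identification), can be sketched with a small Viterbi decoder. The tag names, score values, and transition penalty below are hypothetical, and Viterbi decoding is assumed here as a common way of reading out a CRF layer; Le et al. is not cited for this specific procedure.

```python
# Hypothetical tag set and scores; none of these values come from Le et al.
TAGS = ["O", "B-ORG", "I-ORG"]

def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions: one dict per token mapping tag -> per-token score.
    transitions: dict mapping (previous_tag, tag) -> transition score.
    """
    best = [{t: emissions[0][t] for t in tags}]
    back = []
    for scores in emissions[1:]:
        cur, ptr = {}, {}
        for t in tags:
            # Best previous tag for reaching tag t at this position.
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, t)])
            cur[t] = best[-1][prev] + transitions[(prev, t)] + scores[t]
            ptr[t] = prev
        best.append(cur)
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Per-token scores for three tokens, e.g. "visited", "Acme", "Corp".
emissions = [
    {"O": 2.0, "B-ORG": 0.1, "I-ORG": 0.0},
    {"O": 0.4, "B-ORG": 1.0, "I-ORG": 1.2},
    {"O": 0.3, "B-ORG": 0.2, "I-ORG": 1.5},
]
# All transitions are neutral except that I-ORG may not directly follow O.
transitions = {(p, t): 0.0 for p in TAGS for t in TAGS}
transitions[("O", "I-ORG")] = -10.0

# Picking the best tag per token independently ignores the ordering constraint.
greedy = [max(TAGS, key=lambda t: e[t]) for e in emissions]
decoded = viterbi(emissions, transitions, TAGS)
```

In this sketch the independent per-token choice places I-ORG directly after O, whereas the constrained decoder starts the entity with B-ORG, which is the kind of order-dependent correction the quoted passage attributes to the CRF layer.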

Regarding Claim 2,
Abdel-Reheem et al. in view of Le et al. teaches the method of claim 1.
Abdel-Reheem et al. further teaches further comprising: updating a data store based on the identified entities (Col. 7 lines 11-18: “Then the system tags the pages associated with these categories as location pages. The tagged seed pages 704 are used to bootstrap an initial classifier 706 that is used to classify the remainder of the web-based document corpus 708. Then the process selects those pages that were classified with high confidence 710 and uses them to augment the original training data. Retraining 712 is carried out” teaches augmenting the original training data with pages that have been classified (with the entity identified); Col. 9 lines 34-36: “A data store 1208 at memory 1210 may store training data, classifiers, documents, criteria, thresholds, rules and other data” teaches the data store stores training data, therefore when the training data is augmented (updated), the data store is also being updated).
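For illustration only: the reasoning above, that augmenting the training data with confidently classified pages necessarily updates the data store that holds the training data, can be sketched as follows. The function names, the dictionary layout of the data store, and the 0.9 threshold are hypothetical and are not taken from Abdel-Reheem et al.

```python
def select_confident(predictions, threshold=0.9):
    """Keep (document, label) pairs whose confidence meets the threshold."""
    return [(doc, label) for doc, label, conf in predictions if conf >= threshold]

def augment_training_data(data_store, predictions, threshold=0.9):
    """Append confidently labeled documents to the stored training data.

    Returns the number of examples added; the data store is modified in place.
    """
    added = select_confident(predictions, threshold)
    data_store.setdefault("training_data", []).extend(added)
    return len(added)

# Hypothetical data store holding seed training data.
store = {"training_data": [("seed page", "location")]}
# Hypothetical classifier output: (document, predicted label, confidence).
preds = [("page A", "location", 0.97), ("page B", "person", 0.55)]
added = augment_training_data(store, preds)
```

Under these assumptions only "page A" is added, and the stored training data changes as a direct result of the classification output.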
Regarding Claim 3,
Abdel-Reheem et al. in view of Le et al. teaches the method of claim 1.
Abdel-Reheem et al. further teaches wherein each identified entity includes a type of an entity, a value of the entity, and a confidence level in the identification (Col. 8 lines 42-44: “P(Cj|tj) is the probability (confidence score) that was assigned by the decoder to the class label Cj for the jth token” teaches the confidence score (confidence level) in the identification of class label (entity type) for a token in which the confidence score is expressed as a probability value (corresponds to value of entity); Col. 6 line 67 to Col. 7 lines 1-2: “For example, these may be: person, location, organization, movie, music, books, miscellaneous entity class, disambiguation class, non-entity class” teaches various entity types as class labels).
Regarding Claim 4,
This claim is reciting alternatives, and has been interpreted as requiring at least one of the alternatives, and not all the alternatives.
Abdel-Reheem et al. in view of Le et al. teaches the method of claim 3.
Abdel-Reheem et al. further teaches wherein a type of the entity is selected from a group consisting of an organization, a person, a date, a time, a percentage, a monetary value, and a pick list type (Col. 6 line 67 to Col. 7 lines 1-2: “For example, these may be: person, location, organization, movie, music, books, miscellaneous entity class, disambiguation class, non-entity class” teaches various entity types as class labels, including organization and person).
Regarding Claim 6,
Abdel-Reheem et al. in view of Le et al. teaches the method of claim 1.
Le et al. further teaches further comprising: providing, using a base neural network classifier of the entity classifier, context information associated with the unstructured text input; and generating, using the plurality of sub-classifiers of the hierarchical neural network classifier, the plurality of lower-level entity identifications based on the context information (pg. 93 Section 2.2: “Correct recognition of named entity in a sentence depends on the context of the word. Both preceding and following words matter to predict a tag. Bi-directional recurrent neuronal networks [12] were designed to encode every element in a sequence taking into account left and right contexts which makes it one of the best choices for NER task. Bi-directional model calculation consists of two steps: (1) the forward layer computes representation of the left context, and (2) the backward layer computes representation of the right context. Outputs of these steps are then concatenated to produce a complete representation of an element of the input sequence. Bi-directional LSTM encoders have been demonstrated to be useful in many NLP tasks such as machine translation, question answering, and especially for NER problem” teaches that each Bi-LSTM neural network (a specific type of Bi-directional recurrent neural network) provides context information associated with the input; since Fig. 1 shows that there are at least two levels of Bi-LSTMs neural network, the first level of Bi-LSTMs neural network (character level) can be considered a base neural network, and provides output (based on the context information) to the second level of Bi-LSTMs (plurality of sub-classifiers) to generate lower-level entity identifications).
Abdel-Reheem et al. and Le et al. are analogous art to the claimed invention because they are directed to named entity recognition.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate further comprising: providing, using a base neural network classifier of the entity classifier, context information associated with the unstructured text input; and generating, using the plurality of sub-classifiers of the hierarchical neural network classifier, the plurality of lower-level entity identifications based on the context information as taught by Le et al. to the disclosed invention of Abdel-Reheem et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage a Combined Bi-LSTM and CRF Model because “addition of CRF layer to the Bi-LSTM model significantly increases it’s quality” in the Named Entity Recognition task and because “Bi-directional LSTM encoders have been demonstrated to be useful in many NLP tasks such as machine translation, question answering, and especially for NER problem” (Le et al. pg. 101 Section 5; pg. 93 Section 2.2).
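For illustration only: the two-step bi-directional computation quoted from Le et al. Section 2.2 (a forward pass over the left context, a backward pass over the right context, and concatenation of the two outputs) can be sketched with a toy scalar recurrence. The tanh cell and the weights `w` and `u` are hypothetical stand-ins for an LSTM cell and are not taken from the reference.

```python
import math

def run_direction(inputs, w=0.5, u=0.5):
    """Toy recurrent pass: each state depends on the current and all earlier inputs."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states

def bidirectional_encode(inputs):
    """Pair each position's left-context state with its right-context state."""
    forward = run_direction(inputs)
    # Run over the reversed sequence, then restore the original order.
    backward = list(reversed(run_direction(list(reversed(inputs)))))
    return list(zip(forward, backward))

encoded = bidirectional_encode([1.0, 0.0, -1.0])
changed = bidirectional_encode([1.0, 0.0, 5.0])
```

In this sketch, changing only the last input leaves the forward state of the first position unchanged but alters its backward state, which is the sense in which each position's representation reflects both left and right context.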
Regarding Claim 13,
Abdel-Reheem et al. teaches A non-transitory machine-readable medium comprising executable code which when executed by one or more processors associated with a computing device are adapted to cause the one or more processors to perform a method comprising (Col. 1 lines 47-50: “Named entity recognition is described, for example, to detect an instance of a named entity in a web page and classify the named entity as being an organization or other predefined class” teaches a method for named entity recognition (a type of natural language processing); Col. 9 lines 37-60 teaches computing based device, computer-readable media with computer executable instructions; Col. 9 lines 17-24 teaches processor):
receiving an unstructured text input (Fig. 3 and Col. 3 lines 15-19: “In an example, an end user at his personal computer is using a document viewing application to read a document about financial performance of a number of businesses. The named entity extractor 100 detects named entities in the document text with the class "business organization"” teach named entity extractor receiving a document as text input; Col. 3 lines 11-14: “A document is any text item. A non-exhaustive list of examples of documents is: email, blog, search query, letter, memo, report, web page, book, social media post or comment, tweet” teaches that the input can be unstructured text/document such as web page); 
identifying, using an entity classifier, entities in the unstructured text input, wherein the identifying the entities includes (Fig. 3 and Col. 4 lines 37-40: “The named entity extractor 302 is a machine learning component which is trained to label sequences of words and phrases. The labels indicate named entity classes as mentioned above” teach using the entity extractor (entity classifier) to identify entities in the input in which the extractor can be a machine learning component).
Abdel-Reheem et al. does not appear to explicitly teach generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input, wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset; and generating, using a combiner of the hierarchical neural network classifier, one or more higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications; and providing identified entities based on the one or more higher-level entity identifications.
However, Le et al. teaches generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input (Fig. 1 and pg. 94 Section 2.4: “In the combined model characters of each word in a sentence are fed into a Bi-LSTM network in order to capture character-level features of words. Then these character-level vector representations are concatenated with word embedding vectors and fed into another Bi-LSTM network. This network calculates a sequence of scores that represent likelihoods of tags for each word in the sentence. To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags” teach a Combined Bi-LSTM and CRF Model for the named entity recognition task wherein the combined model analyzes features at different levels (hierarchy), thus rendering the combined model to correspond to a hierarchical neural network classifier of the entity classifier; in the combined model, a plurality of Bi-LSTM networks (plurality of sub-classifiers) are used to generate lower-level entity identifications; for example, the first level of Bi-LSTMs generates character-level features, which are then fed into another level of Bi-LSTMs to produce “likelihoods of tags for each word in the sentence” (more lower-level entity identifications); pg. 96 Section 3.1 teaches the input data can be data from unstructured text input such as documents from web sites),
wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset (Fig. 1 and pg. 94 Section 2.4: “Thus, we expected that a combination of CRF model with a Bi-LSTM neural network encoding [2] should increase the accuracy of the tagging decisions. The architecture of the model is presented on the Fig. 1...To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags. Full set of parameters for this model consists of parameters of Bi-LSTM layers (weight matrices, biases, word embedding matrix) and transition matrix of CRF layer. All these parameters are tuned during training stage by back propagation algorithm with stochastic gradient descent. Dropout is applied to avoid over-fitting and improve the system performance” teaches the structure of the Combined Bi-LSTM and CRF Model (corresponds to hierarchical neural network classifier), as represented by its parameters, is tuned (determined) automatically during a training stage by the back propagation algorithm with stochastic gradient descent, which corresponds to performing a first training process using training data (also see pg. 97 Section 3.3)); and 
generating, using a combiner of the hierarchical neural network classifier, one or more higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications (Fig. 1 and pg. 94 Section 2.4: “In the combined model characters of each word in a sentence are fed into a Bi-LSTM network in order to capture character-level features of words. Then these character-level vector representations are concatenated with word embedding vectors and fed into another Bi-LSTM network. This network calculates a sequence of scores that represent likelihoods of tags for each word in the sentence. To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags” teach the Combined Bi-LSTM and CRF Model contains a CRF Model (corresponds to combiner), which generates higher-level entity identifications, based on the lower-level entity identifications from the Bi-LSTMs; pg. 94 first full paragraph: “The CRF model is trained to predict a vector...of tags given a sentence...To do this, a conditional probability is computed” teaches that the CRF model (combiner) generates conditional probability (higher-level entity identification)); and 
providing identified entities based on the one or more higher-level entity identifications (Fig. 1 and caption and pg. 94 first full paragraph: “The CRF model is trained to predict a vector...of tags given a sentence...To do this, a conditional probability is computed” teach providing the tag predictions (identified entities) based on the conditional probability (high-level entity identification)).
Abdel-Reheem et al. and Le et al. are analogous art to the claimed invention because they are directed to named entity recognition.

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input, wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset; and generating, using a combiner of the hierarchical neural network classifier, one or more higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications; and providing identified entities based on the one or more higher-level entity identifications as taught by Le et al. to the disclosed invention of Abdel-Reheem et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage a Combined Bi-LSTM and CRF Model because “addition of CRF layer to the Bi-LSTM model significantly increases it’s quality” in the Named Entity Recognition task and because “Bi-directional LSTM encoders have been demonstrated to be useful in many NLP tasks such as machine translation, question answering, and especially for NER problem” (Le et al. pg. 101 Section 5; pg. 93 Section 2.2).
Regarding Claim 14,
Abdel-Reheem et al. in view of Le et al. teaches the non-transitory machine-readable medium of claim 13.
Abdel-Reheem et al. further teaches wherein each identified entity includes a type of an entity, a value of the entity, and a confidence level in the identification (Col. 8 lines 42-44: “P(Cj|tj) is the probability (confidence score) that was assigned by the decoder to the class label Cj for the jth token” teaches the confidence score (confidence level) in the identification of class label (entity type) for a token in which the confidence score is expressed as a probability value (corresponds to value of entity); Col. 6 line 67 to Col. 7 lines 1-2: “For example, these may be: person, location, organization, movie, music, books, miscellaneous entity class, disambiguation class, non-entity class” teaches various entity types as class labels).
Regarding Claim 15,
Abdel-Reheem et al. in view of Le et al. teaches the non-transitory machine-readable medium of claim 13.
Le et al. further teaches wherein the method further comprises: providing, using a base neural network classifier of the entity classifier, context information associated with the unstructured text input; and generating, using the plurality of sub-classifiers of the hierarchical neural network classifier, the plurality of lower-level entity identifications based on the context information (pg. 93 Section 2.2: “Correct recognition of named entity in a sentence depends on the context of the word. Both preceding and following words matter to predict a tag. Bi-directional recurrent neuronal networks [12] were designed to encode every element in a sequence taking into account left and right contexts which makes it one of the best choices for NER task. Bi-directional model calculation consists of two steps: (1) the forward layer computes representation of the left context, and (2) the backward layer computes representation of the right context. Outputs of these steps are then concatenated to produce a complete representation of an element of the input sequence. Bi-directional LSTM encoders have been demonstrated to be useful in many NLP tasks such as machine translation, question answering, and especially for NER problem” teaches that each Bi-LSTM neural network (a specific type of Bi-directional recurrent neural network) provides context information associated with the input; since Fig. 1 shows that there are at least two levels of Bi-LSTMs neural network, the first level of Bi-LSTMs neural network (character level) can be considered a base neural network, and provides output (based on the context information) to the second level of Bi-LSTMs (plurality of sub-classifiers) to generate lower-level entity identifications).
Abdel-Reheem et al. and Le et al. are analogous art to the claimed invention because they are directed to named entity recognition.

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the method further comprises: providing, using a base neural network classifier of the entity classifier, context information associated with the unstructured text input; and generating, using the plurality of sub-classifiers of the hierarchical neural network classifier, the plurality of lower-level entity identifications based on the context information as taught by Le et al. to the disclosed invention of Abdel-Reheem et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage a Combined Bi-LSTM and CRF Model because “addition of CRF layer to the Bi-LSTM model significantly increases it’s quality” in the Named Entity Recognition task and because “Bi-directional LSTM encoders have been demonstrated to be useful in many NLP tasks such as machine translation, question answering, and especially for NER problem” (Le et al. pg. 101 Section 5; pg. 93 Section 2.2).
Regarding Claim 18,
Abdel-Reheem et al. teaches A computing device comprising: a memory; and one or more processors coupled to the memory; wherein the one or more processors are configured to (Col. 1 lines 47-50: “Named entity recognition is described, for example, to detect an instance of a named entity in a web page and classify the named entity as being an organization or other predefined class” teaches a method for named entity recognition (a type of natural language processing); Col. 9 lines 37-60 teaches computing based device and memory; Col. 9 lines 17-24 teaches processor):
receiving an unstructured text input (Fig. 3 and Col. 3 lines 15-19: “In an example, an end user at his personal computer is using a document viewing application to read a document about financial performance of a number of businesses. The named entity extractor 100 detects named entities in the document text with the class "business organization"” teach named entity extractor receiving a document as text input; Col. 3 lines 11-14: “A document is any text item. A non-exhaustive list of examples of documents is: email, blog, search query, letter, memo, report, web page, book, social media post or comment, tweet” teaches that the input can be unstructured text/document such as web page); 
identifying, using an entity classifier, entities in the unstructured text input, wherein the identifying the entities includes (Fig. 3 and Col. 4 lines 37-40: “The named entity extractor 302 is a machine learning component which is trained to label sequences of words and phrases. The labels indicate named entity classes as mentioned above” teach using the entity extractor (entity classifier) to identify entities in the input in which the extractor can be a machine learning component).
Abdel-Reheem et al. does not appear to explicitly teach generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input, wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset; and generating, using a combiner of the hierarchical neural network classifier, one or more higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications; and providing identified entities based on the one or more higher-level entity identifications.
However, Le et al. teaches generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input (Fig. 1 and pg. 94 Section 2.4: “In the combined model characters of each word in a sentence are fed into a Bi-LSTM network in order to capture character-level features of words. Then these character-level vector representations are concatenated with word embedding vectors and fed into another Bi-LSTM network. This network calculates a sequence of scores that represent likelihoods of tags for each word in the sentence. To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags” teach a Combined Bi-LSTM and CRF Model for the named entity recognition task wherein the combined model analyzes features at different levels (hierarchy), thus rendering the combined model to correspond to a hierarchical neural network classifier of the entity classifier; in the combined model, a plurality of Bi-LSTM networks (plurality of sub-classifiers) are used to generate lower-level entity identifications; for example, the first level of Bi-LSTMs generates character-level features, which are then fed into another level of Bi-LSTMs to produce “likelihoods of tags for each word in the sentence” (more lower-level entity identifications); pg. 96 Section 3.1 teaches the input data can be data from unstructured text input such as documents from web sites), 
wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset (Fig. 1 and pg. 94 Section 2.4: “Thus, we expected that a combination of CRF model with a Bi-LSTM neural network encoding [2] should increase the accuracy of the tagging decisions. The architecture of the model is presented on the Fig. 1...To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags. Full set of parameters for this model consists of parameters of Bi-LSTM layers (weight matrices, biases, word embedding matrix) and transition matrix of CRF layer. All these parameters are tuned during training stage by back propagation algorithm with stochastic gradient descent. Dropout is applied to avoid over-fitting and improve the system performance” teaches the structure of the Combined Bi-LSTM and CRF Model (corresponds to hierarchical neural network classifier), as represented by its parameters, is tuned (determined) automatically during a training stage by the back propagation algorithm with stochastic gradient descent, which corresponds to performing a first training process using training data (also see pg. 97 Section 3.3)); and 
generating, using a combiner of the hierarchical neural network classifier, one or more higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications (Fig. 1 and pg. 94 Section 2.4: “In the combined model characters of each word in a sentence are fed into a Bi-LSTM network in order to capture character-level features of words. Then these character-level vector representations are concatenated with word embedding vectors and fed into another Bi-LSTM network. This network calculates a sequence of scores that represent likelihoods of tags for each word in the sentence. To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags” teach the Combined Bi-LSTM and CRF Model contains a CRF Model (corresponds to combiner), which generates higher-level entity identifications; pg. 94 first full paragraph: “The CRF model is trained to predict a vector...of tags given a sentence...To do this, a conditional probability is computed” teaches that the CRF model (combiner) generates conditional probability (higher-level entity identification)); and 
providing identified entities based on the one or more higher-level entity identifications (Fig. 1 and caption and pg. 94 first full paragraph: “The CRF model is trained to predict a vector...of tags given a sentence...To do this, a conditional probability is computed” teach providing the tag predictions (identified entities) based on the conditional probability (high-level entity identification)).
Abdel-Reheem et al. and Le et al. are analogous art to the claimed invention because they are directed to named entity recognition.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input, wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset; and generating, using a combiner of the hierarchical neural network classifier, one or more higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications; and providing identified entities based on the one or more higher-level entity identifications as taught by Le et al. to the disclosed invention of Abdel-Reheem et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage a Combined Bi-LSTM and CRF Model because “addition of CRF layer to the Bi-LSTM model significantly increases it’s quality” in the Named Entity Recognition task and because “Bi-directional LSTM encoders have been demonstrated to be useful in many NLP tasks such as machine translation, question answering, and especially for NER problem” (Le et al. pg. 101 Section 5; pg. 93 Section 2.2).
Regarding Claim 19,
Abdel-Reheem et al. in view of Le et al. teaches the computing device of claim 18.
Abdel-Reheem et al. further teaches wherein each identified entity includes a type of an entity, a value of the entity, and a confidence level in the identification (Col. 8 lines 42-44: “P(Cj|tj) is the probability (confidence score) that was assigned by the decoder to the class label Cj for the jth token” teaches the confidence score (confidence level) in the identification of class label (entity type) for a token in which the confidence score is expressed as a probability value (corresponds to value of entity); Col. 6 line 67 to Col. 7 lines 1-2: “For example, these may be: person, location, organization, movie, music, books, miscellaneous entity class, disambiguation class, non-entity class” teaches various entity types as class labels).
Regarding Claim 20,
This claim is reciting alternatives, and has been interpreted as requiring at least one of the alternatives, and not all the alternatives.
Abdel-Reheem et al. in view of Le et al. teaches the computing device of claim 19.
Abdel-Reheem et al. further teaches wherein a type of the entity is selected from a group consisting of an organization, a person, a date, a time, a percentage, a monetary value, and a pick list type (Col. 6 line 67 to Col. 7 lines 1-2: “For example, these may be: person, location, organization, movie, music, books, miscellaneous entity class, disambiguation class, non-entity class” teaches various entity types as class labels, including organization and person).



Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Abdel-Reheem et al. (US 9,971,763 B2) in view of Le et al. (“Application of a Hybrid Bi-LSTM-CRF Model to the Task of Russian Named Entity Recognition”) and further in view of Ma et al. (US 2019/0258904 A1).
Regarding Claim 5,
Abdel-Reheem et al. in view of Le et al. teaches the method of claim 1.
Abdel-Reheem et al. in view of Le et al. does not appear to explicitly teach wherein the plurality of sub-classifiers are automatically determined using a confusion matrix.
However, Ma et al. teaches wherein the plurality of sub-classifiers are automatically determined using a confusion matrix (pg. 7 [0070]: “In an operation 220, an eleventh indicator may include specified values for one or more of the hyperparameters and/or specified values for an automatic tuning method (autotune option) associated with each of the plurality of predictive type models...To reduce the effort in adjusting these hyperparameters, an automatic tuning process may be used to identify the best settings for the hyperparameters though the hyperparameters may optionally be selected as an input option by a user. An optimization algorithm (tuner) searches for the best possible combination of values of the hyperparameters while trying to minimize an objective function. The objective function is a validation error estimate (e.g., misclassification error for nominal targets or average square error for interval targets). The tuning process includes multiple iterations with each iteration typically involving multiple objective function evaluations” teaches autotuning a plurality of predictive type models, which corresponds to automatically determining sub-classifiers, based on misclassification error; pg. 2-3 [0031]: “The confusion matrix shows the number (count) of correct and incorrect predictions compared to a ground truth based on input dataset 124, which correctly indicates whether or not the event has or has not occurred” teaches a confusion matrix is used to represent the misclassification error information, thus rendering the determination of the models uses a confusion matrix (also see Table I)).

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the plurality of sub-classifiers are automatically determined using a confusion matrix as taught by Ma et al. to the disclosed invention of Abdel-Reheem et al. in view of Le et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage “an automatic tuning process” of models based on the misclassification error, as represented by a confusion matrix, “[t]o reduce the effort in adjusting these hyperparameters” (Ma et al. pg. 7 [0070]; pg. 2-3 [0031]).

Claims 12 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Abdel-Reheem et al. (US 9,971,763 B2) in view of Le et al. (“Application of a Hybrid Bi-LSTM-CRF Model to the Task of Russian Named Entity Recognition”) and further in view of Kim et al. (“Two-Phase Biomedical Named Entity Recognition Using A Hybrid Method”).
Regarding Claim 12,
Abdel-Reheem et al. in view of Le et al. teaches the method of claim 1.
Abdel-Reheem et al. in view of Le et al. does not appear to explicitly teach further comprising: providing, using a deterministic model, one or more of deterministic entity identifications associated with the unstructured text input; and generating the identified entities using the one or more higher-level entity identifications and one or more deterministic entity identifications.
However, Kim et al. teaches further comprising: providing, using a deterministic model, one or more of deterministic entity identifications associated with the unstructured text input; and generating the identified entities using the one or more higher-level entity identifications and one or more deterministic entity identifications (pg. 656 Section 5 to pg. 657: “We presented a two-phase biomedical NE recognition model, term boundary detection and semantic labeling. We proposed two exponential models for each phase. That is, CRFs are used for term detection phase including Markov process and ME is used for semantic labeling. The benefit of dividing the whole process into two processes is that, by separating the processes with different characteristics, we can select separately the discriminative feature set for each subtask, and moreover measure effectiveness of models at each phase. Furthermore, we use the rule-based method as post-processing to refine the result. The rules are extracted from the GENIA corpus, which is represented by the deterministic FST. The rule-based approach is effective to correct errors by cascading structures of biomedical NEs” teaches using a deterministic FST (finite state transducer) model (corresponds to deterministic model) to refine name entity identification results, which corresponds to providing deterministic entity identifications associated with text input; since the final results are refined by the deterministic FST, the final results correspond to identified entities using higher-level entity identification from the CRFs and ME models, and the deterministic identifications from the deterministic FST; pg. 646 Section 1 teaches that the task of biomedical named entity recognition involves identifying entities in biomedical texts in electronic form (corresponds to unstructured data), also see pg. 653 Section 4.1).
Abdel-Reheem et al., Le et al., and Kim et al. are analogous art to the claimed invention because they are directed to named entity recognition.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate further comprising: providing, using a deterministic model, one or more of deterministic entity identifications associated with the unstructured text input; and generating the identified entities using the one or more higher-level entity identifications and one or more deterministic entity identifications as taught by Kim et al. to the disclosed invention of Abdel-Reheem et al. in view of Le et al.
One of ordinary skill in the arts would have been motivated to make this modification to “use the rule-based method as postprocessing to refine the result” in which the rule-based method is a deterministic FST (finite state transducer) because such a “rule-based approach is effective to correct errors by cascading structures of biomedical NEs” (Kim et al. pg. 656 Section 5 to pg. 657).

Regarding Claim 17,
Abdel-Reheem et al. in view of Le et al. teaches the non-transitory machine-readable medium of claim 13.
Abdel-Reheem et al. in view of Le et al. does not appear to explicitly teach wherein the method further comprises: providing, using a deterministic model, one or more deterministic entity identifications associated with the unstructured text input; and generating the identified entities based on the one or more higher-level entity identifications and one or more deterministic entity identifications.
However, Kim et al. teaches wherein the method further comprises: providing, using a deterministic model, one or more deterministic entity identifications associated with the unstructured text input; and generating the identified entities based on the one or more higher-level entity identifications and one or more deterministic entity identifications (pg. 656 Section 5 to pg. 657: “We presented a two-phase biomedical NE recognition model, term boundary detection and semantic labeling. We proposed two exponential models for each phase. That is, CRFs are used for term detection phase including Markov process and ME is used for semantic labeling. The benefit of dividing the whole process into two processes is that, by separating the processes with different characteristics, we can select separately the discriminative feature set for each subtask, and moreover measure effectiveness of models at each phase. Furthermore, we use the rule-based method as post-processing to refine the result. The rules are extracted from the GENIA corpus, which is represented by the deterministic FST. The rule-based approach is effective to correct errors by cascading structures of biomedical NEs” teaches using a deterministic FST (finite state transducer) model (corresponds to deterministic model) to refine name entity identification results, which corresponds to providing deterministic entity identifications associated with text input; since the final results are refined by the deterministic FST, the final results correspond to identified entities using higher-level entity identification from the CRFs and ME models, and the deterministic identifications from the deterministic FST; pg. 646 Section 1 teaches that the task of biomedical named entity recognition involves identifying entities in biomedical texts in electronic form (corresponds to unstructured data), also see pg. 653 Section 4.1).
Abdel-Reheem et al., Le et al., and Kim et al. are analogous art to the claimed invention because they are directed to named entity recognition.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the method further comprises: providing, using a deterministic model, one or more deterministic entity identifications associated with the unstructured text input; and generating the identified entities based on the one or more higher-level entity identifications and one or more deterministic entity identifications as taught by Kim et al. to the disclosed invention of Abdel-Reheem et al. in view of Le et al.
One of ordinary skill in the arts would have been motivated to make this modification to “use the rule-based method as postprocessing to refine the result” in which the rule-based method is a deterministic FST (finite state transducer) because such a “rule-based approach is effective to correct errors by cascading structures of biomedical NEs” (Kim et al. pg. 656 Section 5 to pg. 657).



Claims 7-10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Abdel-Reheem et al. (US 9,971,763 B2) in view of Le et al. (“Application of a Hybrid Bi-LSTM-CRF Model to the Task of Russian Named Entity Recognition”) in view of Miao et al. (US 2017/0091652 A1) and further in view of Chen et al. (“Attention-based Hierarchical Neural Query Suggestion”).
Regarding Claim 7,
Abdel-Reheem et al. in view of Le et al. teaches the method of claim 6.
Abdel-Reheem et al. in view of Le et al. does not appear to explicitly teach wherein the method further comprises: performing a global training process on the base neural network classifier, for a plurality of users, using a global training dataset.
However, Miao et al. teaches wherein the method further comprises: performing a global training process on the base neural network classifier, for a plurality of users, using a global training dataset (pg. 2 [0023]: “The client may then generate an update to the global version by providing user feedback and/or other input data as training data to the global version. In turn, the clients may transmit updates 114-116 to server 102, and server 102 may merge updates 114-116 into subsequent global versions of statistical model 108. After a new global version of statistical model 108 is created, server 102 may transmit the new global version to the clients to propagate updates” and pg. 4 [0045]: “aggregated input from the other users used to create global versions of statistical model 108” teach that the global version of the statistical model is trained with a global training data set with data coming from multiple users, thus rendering the training process to be a global process; pg. 1-2 [0021]: “Statistical model 108 may be used to perform statistical inference, estimation, classification, clustering, personalization, recommendation, optimization, hypothesis testing, and/or other types of data analysis. For example, statistical model 108 may be a regression model, artificial neural network...hierarchical model, and/or ensemble model. The results of such analysis may be used to discover relationships, patterns, and/or trends in the data; gain insights from the input data; and/or guide decisions or actions related to the data. For example, statistical model 108 may be used to analyze input data related to users, organizations, applications, websites, content, and/or other categories. Statistical model 108 may then be used to output scores, provide recommendations, make predictions, manage relationships, and/or personalize user experiences based on the data” teaches the statistical model 108 can be a neural network and the model can be used to analyze input data related to users to perform classification).
Abdel-Reheem et al., Le et al., and Miao et al. are analogous art to the claimed invention because they are directed to machine learning analytics.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the method further comprises: performing a global training process on the base neural network classifier, for a plurality of users, using a global training dataset as taught by Miao et al. to the disclosed invention of Abdel-Reheem et al. in view of Le et al.
One of ordinary skill in the arts would have been motivated to make this modification because “updates to the global versions are not affected by variations in the processing speed, computational power, and/or network delay of individual clients, statistical model 108 may be updated faster than distributed machine learning techniques that include barriers or locks for synchronizing statistical model updates” (Miao et al. pg. 2 [0026]).
Abdel-Reheem et al. in view of Le et al. in view of Miao et al. does not appear to explicitly teach performing a custom training process on the hierarchical neural network classifier, for a first user of the plurality of users, using a custom training dataset associated with the first user.
However, Chen et al. teaches performing a custom training process on the hierarchical neural network classifier, for a first user of the plurality of users, using a custom training dataset associated with the first user (Fig. 2 teaches the Hierarchical user-session RNN (corresponds to a hierarchical neural network classifier) that contains a user-level RNN and a session-level RNN; pg. 1094 Section 2.1: “As in [10], session-level RNNs are our starting point. Here in the neural based query suggestion model (NQS), queries in the current session are taken as sequential input” and pg. 1094 Section 2.2: “We assume that a user u has Nu query sessions” teach that the Hierarchical user-session RNN is trained with data associated with a first user’s query sessions, thus rendering the training process to be custom to the data associated with user; Table 1 teaches a plurality of users).
Abdel-Reheem et al., Le et al., Miao et al., and Chen et al. are analogous art to the claimed invention because they are directed to machine learning analytics.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate performing a custom training process on the hierarchical neural network classifier, for a first user of the plurality of users, using a custom training dataset associated with the first user as taught by Chen et al. to the disclosed invention of Abdel-Reheem et al. in view of Le et al. in view of Miao et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage a Hierarchical user-session RNN because the “hierarchical structure can model both the user’s short-term and long-term search behavior effectively” (Chen et al. pg. 1096 Section 5).
Regarding Claim 8,
Abdel-Reheem et al. in view of Le et al. in view of Miao et al. in view of Chen et al. teaches the method of claim 7.
Miao et al. further teaches wherein a size of the global training dataset is greater than a size of the custom training dataset (pg. 2 [0023]: “The client may then generate an update to the global version by providing user feedback and/or other input data as training data to the global version. In turn, the clients may transmit updates 114-116 to server 102, and server 102 may merge updates 114-116 into subsequent global versions of statistical model 108. After a new global version of statistical model 108 is created, server 102 may transmit the new global version to the clients to propagate updates” and pg. 4 [0045]: “aggregated input from the other users used to create global versions of statistical model 108” teach the global training data set contains data from multiple users; pg. 5 [0059]: “Each piece of user feedback 314 may be provided as training data that is used to create or update personalized version 306 during user session 310” teaches that the personalized (custom) training set is data from a user’s feedback, thus rendering the personalized version is trained with less data than the global version of the model).
Abdel-Reheem et al., Le et al., and Miao et al. are analogous art to the claimed invention because they are directed to machine learning analytics.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein a size of the global training dataset is greater than a size of the custom training dataset as taught by Miao et al. to the disclosed invention of Abdel-Reheem et al. in view of Le et al.
One of ordinary skill in the arts would have been motivated to make this modification because “updates to the global versions are not affected by variations in the processing speed, computational power, and/or network delay of individual clients, statistical model 108 may be updated faster than distributed machine learning techniques that include barriers or locks for synchronizing statistical model updates” (Miao et al. pg. 2 [0026]).
Regarding Claim 9,
Abdel-Reheem et al. in view of Le et al. in view of Miao et al. in view of Chen et al. teaches the method of claim 7.
Le et al. further teaches wherein the base neural network classifier includes a base neural network model; and wherein each sub-classifier of the hierarchical classifier includes a custom neural network model (Fig. 1 teaches at least one base neural network that is a Bi-LSTM neural network; Fig. 1 further teaches multiple Bi-LSTMs as sub-classifiers; for example, each input ci has its own custom Bi-LSTM neural network).
Abdel-Reheem et al. and Le et al. are analogous art to the claimed invention because they are directed to named entity recognition.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the base neural network classifier includes a base neural network model; and wherein each sub-classifier of the hierarchical classifier includes a custom neural network model as taught by Le et al. to the disclosed invention of Abdel-Reheem et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage a Combined Bi-LSTM and CRF Model because “addition of CRF layer to the Bi-LSTM model significantly increases it’s quality” in the Named Entity Recognition task and because “Bi-directional LSTM encoders have been demonstrated to be useful in many NLP tasks such as machine translation, question answering, and especially for NER problem” (Le et al. pg. 101 Section 5; pg. 93 Section 2.2).
Regarding Claim 10,
Abdel-Reheem et al. in view of Le et al. in view of Miao et al. in view of Chen et al. teaches the method of claim 9. 
Le et al. further teaches wherein each of the base and custom neural network models includes a bidirectional recurrent neural network (BRNN) model (Fig. 1 teaches that each neural network model in the two levels of neural networks (first level being base and second level being custom) includes a Bi-LSTM neural network, which is a specific type of Bi-directional RNN).
Abdel-Reheem et al. and Le et al. are analogous art to the claimed invention because they are directed to named entity recognition.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein each of the base and custom neural network models includes a bidirectional recurrent neural network (BRNN) model as taught by Le et al. to the disclosed invention of Abdel-Reheem et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage a Combined Bi-LSTM and CRF Model because “addition of CRF layer to the Bi-LSTM model significantly increases it’s quality” in the Named Entity Recognition task and because “Bi-directional LSTM encoders have been demonstrated to be useful in many NLP tasks such as machine translation, question answering, and especially for NER problem” (Le et al. pg. 101 Section 5; pg. 93 Section 2.2).
Regarding Claim 16,
Abdel-Reheem et al. in view of Le et al. teaches the non-transitory machine-readable medium of claim 15.
Abdel-Reheem et al. in view of Le et al. does not appear to explicitly teach wherein the method further comprises: performing a global training process on the base neural network classifier, for a plurality of users, using a global training dataset.
However, Miao et al. teaches wherein the method further comprises: performing a global training process on the base neural network classifier, for a plurality of users, using a global training dataset (pg. 2 [0023]: “The client may then generate an update to the global version by providing user feedback and/or other input data as training data to the global version. In turn, the clients may transmit updates 114-116 to server 102, and server 102 may merge updates 114-116 into subsequent global versions of statistical model 108. After a new global version of statistical model 108 is created, server 102 may transmit the new global version to the clients to propagate updates” and pg. 4 [0045]: “aggregated input from the other users used to create global versions of statistical model 108” teach that the global version of the statistical model is trained with a global training data set with data coming from multiple users, thus rendering the training process to be a global process; pg. 1-2 [0021]: “Statistical model 108 may be used to perform statistical inference, estimation, classification, clustering, personalization, recommendation, optimization, hypothesis testing, and/or other types of data analysis. For example, statistical model 108 may be a regression model, artificial neural network...hierarchical model, and/or ensemble model. The results of such analysis may be used to discover relationships, patterns, and/or trends in the data; gain insights from the input data; and/or guide decisions or actions related to the data. For example, statistical model 108 may be used to analyze input data related to users, organizations, applications, websites, content, and/or other categories. Statistical model 108 may then be used to output scores, provide recommendations, make predictions, manage relationships, and/or personalize user experiences based on the data” teaches the statistical model 108 can be a neural network and the model can be used to analyze input data related to users to perform classification).
Abdel-Reheem et al., Le et al., and Miao et al. are analogous art to the claimed invention because they are directed to machine learning analytics.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the method further comprises: performing a global training process on the base neural network classifier, for a plurality of users, using a global training dataset as taught by Miao et al. to the disclosed invention of Abdel-Reheem et al. in view of Le et al.
One of ordinary skill in the arts would have been motivated to make this modification because “updates to the global versions are not affected by variations in the processing speed, computational power, and/or network delay of individual clients, statistical model 108 may be updated faster than distributed machine learning techniques that include barriers or locks for synchronizing statistical model updates” (Miao et al. pg. 2 [0026]).
Abdel-Reheem et al. in view of Le et al. in view of Miao et al. does not appear to explicitly teach performing a custom training process on the hierarchical neural network classifier, for a first user of the plurality of users, using a custom training dataset associated with the first user.
Chen et al. teaches performing a custom training process on the hierarchical neural network classifier, for a first user of the plurality of users, using a custom training dataset associated with the first user (Fig. 2 teaches the Hierarchical user-session RNN (corresponds to a hierarchical neural network classifier) that contains a user-level RNN and a session-level RNN; pg. 1094 Section 2.1: “As in [10], session-level RNNs are our starting point. Here in the neural based query suggestion model (NQS), queries in the current session are taken as sequential input” and pg. 1094 Section 2.2: “We assume that a user u has Nu query sessions” teach that the Hierarchical user-session RNN is trained with data associated with a first user’s query sessions, thus rendering the training process to be custom to the data associated with user; Table 1 teaches a plurality of users).
Abdel-Reheem et al., Le et al., Miao et al., and Chen et al. are analogous art to the claimed invention because they are directed to machine learning analytics.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate performing a custom training process on the hierarchical neural network classifier, for a first user of the plurality of users, using a custom training dataset associated with the first user as taught by Chen et al. to the disclosed invention of Abdel-Reheem et al. in view of Le et al. in view of Miao et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage a Hierarchical user-session RNN because the “hierarchical structure can model both the user’s short-term and long-term search behavior effectively” (Chen et al. pg. 1096 Section 5).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Abdel-Reheem et al. (US 9,971,763 B2) in view of Le et al. (“Application of a Hybrid Bi-LSTM-CRF Model to the Task of Russian Named Entity Recognition”) in view of Miao et al. (US 2017/0091652 A1) in view of Chen et al. in view of Xu et al. (“Neural Fine-Grained Entity Type Classification with Hierarchy-Aware Loss”).
Regarding Claim 11,
Abdel-Reheem et al. in view of Le et al. in view of Miao et al. in view of Chen et al. teaches the method of claim 9.
Abdel-Reheem et al. in view of Le et al. in view of Miao et al. in view of Chen et al. does not appear to explicitly teach wherein the base neural network model and custom neural network model are implemented with different types of neural network models.
However, Xu et al. teaches wherein the base neural network model and custom neural network model are implemented with different types of neural network models (Fig. 2 teaches the architecture of the Neural Fine-Grained Entity Type Classification model in which the base model (LSTM encoder) is different from the custom model (Bi-directional attentive LSTM encoder)).
Abdel-Reheem et al., Le et al., Miao et al., Chen et al., and Xu et al. are analogous art to the claimed invention because they are directed to machine learning analytics.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the base neural network model and custom neural network model are implemented with different types of neural network models as taught by Xu et al. to the disclosed invention of Abdel-Reheem et al. in view of Le et al. in view of Miao et al. in view of Chen et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to leverage “a neural network based model which jointly learns representations for entity mentions and their context” because “the proposed model is robust to...noise and outperforms previous state-of-the-art methods significantly” (Xu et al. pg. 8 Section 6 to pg. 9).

Response to Arguments
Applicant's arguments filed on 12/01/2021 with respect to the 35 U.S.C. 103 rejection of claims 1-20 have been fully considered but are not persuasive.
Applicant asserts that “Abdel-Reheem and Le, and to the extent that Examiner relies upon Ma, alone or in combination, do not teach or suggest at least these limitations. The OA acknowledges that Abdel-Reheem in view of Le "does not appear to explicitly teach wherein the plurality of sub-classifiers are automatically determined using a confusion matrix." OA, 41. Instead, in the rejection regarding claim 5, the OA relies upon Ma to teach automatically determining the plurality of sub-classifiers using a confusion matrix. To the extent that Examiner relies on Ma to teach that "a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset" as recited in claim 1 as amended, Applicant submits that Ma fails to teach or suggest either the above limitation of claim 1 as amended or the limitation of claim 5” (Remarks, pg. 8-9).
Examiner’s Response:
The Examiner respectfully disagrees. First, the amended limitation in question is the following limitation of claim 1: “wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset,” which is a limitation of different scope from the following limitation of original claim 5: “wherein the plurality of sub-classifiers are automatically determined using a confusion matrix” (emphasis added). Second, contrary to Applicant’s arguments, the current Office Action does not rely on Ma to teach the amended limitation "a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset" of claim 1. 
As indicated in the current Office Action, Le et al. teaches wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset (Fig. 1 and pg. 94 Section 2.4: “Thus, we expected that a combination of CRF model with a Bi-LSTM neural network encoding [2] should increase the accuracy of the tagging decisions. The architecture of the model is presented on the Fig. 1...To improve accuracy of the prediction a CRF layer is trained to enforce constraints dependent on the order of tags. Full set of parameters for this model consists of parameters of Bi-LSTM layers (weight matrices, biases, word embedding matrix) and transition matrix of CRF layer. All these parameters are tuned during training stage by back propagation algorithm with stochastic gradient descent. Dropout is applied to avoid over-fitting and improve the system performance” teaches that the structure of the Combined Bi-LSTM and CRF Model (corresponds to hierarchical neural network classifier), as represented by its parameters, is tuned (determined) automatically during a training stage by the back propagation algorithm with stochastic gradient descent, which corresponds to performing a first training process using training data (also see pg. 97 Section 3.3)).

Applicant asserts that “Examiner relies on paragraphs [0070] and [0031] of Ma to teach some of the elements. However, Ma simply teaches "an automatic tuning process" to "identify the best settings for the hyperparameters though the hyperparameters may optionally be selected as an input option." Ma, [0070]. Ma's teachings of tuning the hyperparameters is for controlling the training process, because hyperparameter is a parameter whose value is used to control the training process, while the structure of the hierarchical neural network classifier as claimed is derived via training” (Remarks, pg. 9).
Examiner’s Response:
The Examiner respectfully disagrees. First, as noted above, the amended limitation in question is the following limitation of claim 1: “wherein a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset,” which is a limitation of different scope from the following limitation of original claim 5: “wherein the plurality of sub-classifiers are automatically determined using a confusion matrix” (emphasis added). Second, contrary to Applicant’s arguments, the current Office Action does not rely on Ma to teach the amended limitation "a structure of the hierarchical neural network classifier is determined automatically by performing a first training process using a first training dataset" of claim 1, therefore Applicant’s arguments regarding how Ma does not teach the amended limitation of claim 1 are moot. Please see the response above and the current Office Action for more information. 
As arguments regarding independent claims 13 and 18 are analogous to those for claim 1, the above responses are applicable to claims 13 and 18.
As arguments regarding dependent claims 2-12, 14-17, and 19-20 are analogous to those for claim 1, the above responses are applicable to dependent claims 2-12, 14-17, and 19-20.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Y.C./Examiner, Art Unit 2125                   

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125