DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
2.	 Applicant’s arguments and amendments in the Amendment, with respect to the objection to claims 14, 20, and 21 have been fully considered and are persuasive.  Therefore, the objection has been withdrawn.  Applicant’s arguments and amendments in the Amendment, with respect to the rejections of claims 1, 7, 13, and 19, and claims depending therefrom, under 35 U.S.C. 103 have been fully considered and are persuasive in part, as detailed below.  Therefore, the rejection has been withdrawn.  However, upon further consideration, new grounds of rejection are made in view of Ramerth et al., U.S. Patent App. Pub. No. 20120166942. Original Claims 1, 7, 13, and 19 are amended.  Amended independent Claims 1, 7, 13, and 19 have been considered as discussed below.  
3.	Applicant argues in the Amendment that Poria does not describe “wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually” and “biasing the model towards expecting the identified parts of speech” as now recited in amended independent Claims 1, 7, 13, and 19.  Ramerth et al., U.S. Patent App. Pub. No. 20120166942 is cited as teaching these features, as discussed below. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


4.	Claims 1 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Chiu et al. (“Named Entity Recognition with Bidirectional LSTM-CNNs,” hereinafter “Chiu”) in view of Salloum (U.S. Patent Publication No. 20190043486 hereinafter “Salloum”), Vinyals et al. (“Show and Tell: A Neural Image Caption Generator,” hereinafter “Vinyals”), and U.S. Patent App. Pub. No. 20120166942 (Ramerth et al., hereinafter “Ramerth”).
With respect to Claim 1, Chiu describes:
A method for deriving a model for a chatbot for predicting entities in a given sentence comprising a plurality of words wherein the words are separated by space, the method comprising the steps of:
…
obtaining features from the truncated sentence;
(Section 2.1 of Chiu describes a Bi-LSTM that transforms “word features into named entity tag scores” (page 359 of Chiu, top left).  Thus, Chiu describes that features are the input.)
performing a Long Short-Term Memory Recurring Neural Network forward pass on the features to obtain a first set of results;
(The left branch of Figure 3 of Chiu shows a LSTM forward pass.)

[Image: media_image1.png]

performing a Long Short-Term Memory Recurring Neural Network backward pass on the features to obtain a second set of results;
(The right branch of Figure 3 of Chiu shows a LSTM backward pass.)
performing a first concatenating on the first set of results and the second set of results.”
(Chiu describes on page 359 that “These two vectors are then simply added together to produce the final output.” This addition is being cited as the claimed “first concatenating.”)
However, Chiu does not describe:
“inputting the sentence into a named-entity recognition module and truncating the sentence;”
“wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
and
“performing a second concatenation on the first concatenation using output target entities, wherein the output target entities are shifted by one step;
obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words;
obtaining a plurality of encoded outputs with dimensions equal to the number of entities; and
biasing the model towards expecting the identified parts of speech.”
However, Salloum describes “inputting the sentence into a named-entity recognition module and truncating the sentence.”  
(Paragraph 42 of Salloum describes a system that truncates sentences longer than 512 tokens, pads sentences shorter than 512 tokens, and then uses the result as the input to a Bi-LSTM.)
It would have been obvious before the effective filing date of the claimed invention to include the sentence truncation of Salloum into the method of Chiu to provide greater speech detection accuracy, as described in paragraph 43 of Salloum.
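For illustration only, the truncation and padding cited from paragraph 42 of Salloum can be sketched as follows. This is a minimal sketch and not Salloum's implementation: the padding symbol, the choice to pad on the right, and the function name truncate_or_pad are assumptions made for the example.

```python
from typing import List

MAX_TOKENS = 512     # length limit cited from paragraph 42 of Salloum
PAD_TOKEN = "<pad>"  # hypothetical padding symbol; not taken from Salloum


def truncate_or_pad(tokens: List[str], max_len: int = MAX_TOKENS,
                    pad: str = PAD_TOKEN) -> List[str]:
    """Return exactly max_len tokens: cut long inputs, right-pad short ones."""
    if len(tokens) >= max_len:
        return tokens[:max_len]
    # Assumption: padding is appended at the end of the sequence.
    return tokens + [pad] * (max_len - len(tokens))


# Words separated by spaces, as in the claimed sentence.
sentence = "book a flight to Boston".split(" ")
fixed = truncate_or_pad(sentence)
```

Every input then has the same length, so it can be supplied to a fixed-size Bi-LSTM input layer.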
Chiu in view of Salloum does not teach:
 “wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
and
“performing a second concatenation on the first concatenation using output target entities, wherein the output target entities are shifted by one step;
obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words;
obtaining a plurality of encoded outputs with dimensions equal to the number of entities; and
biasing the model towards expecting the identified parts of speech.”
However, Vinyals describes:
“performing a second concatenation on the first concatenation using output target entities, wherein the output target entities are shifted by one step;
(The caption to Figure 2 on page 3158 of Vinyals describes that “the predicted word at time t-1 is fed back in addition to the memory output m at time t into the Softmax for word prediction.”  The predicted word is being cited as “output target entities,” and the predicted word is shifted by one step in that the predicted word for time t-1 is used.  This predicted word is fed back in addition to the memory output at time t (cited as the result of the first concatenation).  The combination of the predicted word at time t-1 and the memory output m at time t is cited as the second concatenation.)

[Image: media_image2.png]

obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words; and
(Page 3158 of Vinyals describes that the final word prediction is obtained from the softmax, which would include “obtaining a fully connected set of neurons”.)
obtaining a plurality of encoded outputs with dimensions equal to the number of entities.”
(Page 3158 of Vinyals describes that the final word prediction is obtained from the softmax, which would include “obtaining a plurality of encoded outputs”.)
It would have been obvious before the effective filing date of the claimed invention to include the feedback of the predicted word at time t-1 of Vinyals into the method of Chiu and Salloum to better prevent exploding or vanishing gradients, as described in the first paragraph of Section 3.1 on page 3158 of Vinyals.
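For illustration only, the feedback arrangement cited from Vinyals (the output for time t-1 combined with the memory output m at time t) can be sketched as follows. This is a minimal sketch under stated assumptions and not code from Vinyals: the function names are hypothetical, outputs are represented as plain lists, and filling the first position with zeros is an assumption made so that the first time step has an input.

```python
from typing import List

Vector = List[float]


def shift_right(targets: List[Vector]) -> List[Vector]:
    """Delay the target vectors by one step; a zero vector fills position 0."""
    if not targets:
        return []
    zero = [0.0] * len(targets[0])  # assumption: zeros in the empty first slot
    return [zero] + targets[:-1]


def concat_with_previous_output(memory: List[Vector],
                                targets: List[Vector]) -> List[Vector]:
    """For each step t, join memory output m_t with the target from step t-1."""
    shifted = shift_right(targets)
    return [m + y for m, y in zip(memory, shifted)]


memory = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # stand-in for m at t = 0, 1, 2
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # stand-in for outputs at t = 0, 1, 2
combined = concat_with_previous_output(memory, targets)
# combined[1] is [0.3, 0.4, 1.0, 0.0]: m at t = 1 joined with the output at t = 0
```

Each combined vector has a length equal to the memory dimension plus the output dimension, and would then be passed to the fully connected softmax layer.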
Chiu in view of Salloum and Vinyals does not explicitly describe:
“wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
and
biasing the model towards expecting the identified parts of speech.”
However, Ramerth describes a named entity recognition system that includes parts of speech as feature attributes (paragraph 15).  Further, paragraph 17 of Ramerth describes that a replacement candidate that matches a statistically expected part of speech or other attribute (e.g., gerund, named entity, particular word sense, etc.) is assigned a higher confidence score than if the candidate did not match the expected attribute.  This assigning of higher confidence values for expected parts of speech is cited as “biasing the model” as recited in amended Claim 1.
It would have been obvious before the effective filing date of the claimed invention to include the biasing towards an expected part of speech of Ramerth into the method of Chiu, Salloum, and Vinyals to better predict a next word in a sequence, as described in paragraph 12 of Ramerth.
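For illustration only, the confidence adjustment cited from paragraph 17 of Ramerth can be sketched as follows. This is a minimal sketch and not Ramerth's implementation: the additive bonus, its value, the tag names, and the function name score_candidates are assumptions made for the example.

```python
from typing import Dict, List, Tuple

POS_BONUS = 0.2  # hypothetical boost; no numeric value is taken from Ramerth


def score_candidates(candidates: List[Tuple[str, str, float]],
                     expected_pos: str,
                     bonus: float = POS_BONUS) -> Dict[str, float]:
    """Raise the confidence of candidates whose part-of-speech tag is expected."""
    scores: Dict[str, float] = {}
    for word, pos_tag, base_confidence in candidates:
        matched = pos_tag == expected_pos
        scores[word] = base_confidence + (bonus if matched else 0.0)
    return scores


# Each candidate: (word, part-of-speech tag, base confidence).
candidates = [("run", "VERB", 0.5), ("rum", "NOUN", 0.5)]
scores = score_candidates(candidates, expected_pos="VERB")
best = max(scores, key=scores.get)  # the candidate matching the expected tag
```

With equal base confidences, the candidate whose tag matches the expected part of speech receives the higher score, which is the behavior cited as “biasing the model.”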
With respect to Claim 7, Chiu describes:
A system for deriving a model for a chatbot for predicting entities in a given sentence comprising a plurality of words wherein the words are separated by space, the system comprising: …
obtaining features from the truncated sentence;
(Section 2.1 of Chiu describes a Bi-LSTM that transforms “word features into named entity tag scores” (page 359 of Chiu, top left).  Thus, Chiu describes that features are the input.)
performing a Long Short-Term Memory Recurring Neural Network forward pass on the features to obtain a first set of results;
(The left branch of Figure 3 of Chiu shows a LSTM forward pass.)

[Image: media_image1.png]

performing a Long Short-Term Memory Recurring Neural Network backward pass on the features to obtain a second set of results;
(The right branch of Figure 3 of Chiu shows a LSTM backward pass.)
performing a first concatenating on the first set of results and the second set of results.”
(Chiu describes on page 359 that “These two vectors are then simply added together to produce the final output.” This addition is being cited as the claimed “first concatenating.”)
However, Chiu does not describe:
“a processor; and 
a memory in communication with the processor, the memory storing instructions that, when executed by the processor, causes the processor to derive the model by:”
“inputting the sentence into a named-entity recognition module and truncating the sentence;”
“wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
and
“performing a second concatenation on the first concatenation using output target entities, wherein the output target entities are shifted by one step;
obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words;
obtaining a plurality of encoded outputs with dimensions equal to the number of entities; and
biasing the model towards expecting the identified parts of speech.”
However, Salloum describes:
“a processor; and 
(Salloum teaches a processor at paragraphs 62 and 63.)
a memory in communication with the processor, the memory storing instructions that, when executed by the processor, causes the processor to derive the model by:”
(Salloum teaches a memory at paragraphs 62 and 63.)
“inputting the sentence into a named-entity recognition module and truncating the sentence.”  
(Paragraph 42 of Salloum describes a system that truncates sentences longer than 512 tokens, pads sentences shorter than 512 tokens, and then uses the result as the input to a Bi-LSTM.)
It would have been obvious before the effective filing date of the claimed invention to include the sentence truncation of Salloum into the method of Chiu to provide greater speech detection accuracy, as described in paragraph 43 of Salloum.
Chiu in view of Salloum does not teach:
“performing a second concatenation on the first concatenation using output target entities, wherein the output target entities are shifted by one step;
wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words; 
obtaining a plurality of encoded outputs with dimensions equal to the number of entities; and
biasing the model towards expecting the identified parts of speech.”
However, Vinyals describes:
“performing a second concatenation on the first concatenation using output target entities, wherein the output target entities are shifted by one step;
(The caption to Figure 2 on page 3158 of Vinyals describes that “the predicted word at time t-1 is fed back in addition to the memory output m at time t into the Softmax for word prediction.”  The predicted word is being cited as “output target entities,” and the predicted word is shifted by one step in that the predicted word for time t-1 is used.  This predicted word is fed back in addition to the memory output at time t (cited as the result of the first concatenation).  The combination of the predicted word at time t-1 and the memory output m at time t is cited as the second concatenation.)

[Image: media_image2.png]

obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words; and
(Page 3158 of Vinyals describes that the final word prediction is obtained from the softmax, which would include “obtaining a fully connected set of neurons”.)
obtaining a plurality of encoded outputs with dimensions equal to the number of entities.”
(Page 3158 of Vinyals describes that the final word prediction is obtained from the softmax, which would include “obtaining a plurality of encoded outputs”.)
It would have been obvious before the effective filing date of the claimed invention to include the feedback of the predicted word at time t-1 of Vinyals into the method of Chiu and Salloum to better prevent exploding or vanishing gradients, as described in the first paragraph of Section 3.1 on page 3158 of Vinyals.
Chiu in view of Salloum and Vinyals does not explicitly describe:
“wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
and
biasing the model towards expecting the identified parts of speech.”
However, Ramerth describes a named entity recognition system that includes parts of speech as feature attributes (paragraph 15).  Further, paragraph 17 of Ramerth describes that a replacement candidate that matches a statistically expected part of speech or other attribute (e.g., gerund, named entity, particular word sense, etc.) is assigned a higher confidence score than if the candidate did not match the expected attribute.  This assigning of higher confidence values for expected parts of speech is cited as “biasing the model” as recited in amended Claim 7.
It would have been obvious before the effective filing date of the claimed invention to include the biasing towards an expected part of speech of Ramerth into the method of Chiu, Salloum, and Vinyals to better predict a next word in a sequence, as described in paragraph 12 of Ramerth.
5.	Claims 2-6 and 8-12 are rejected under 35 U.S.C. 103 as being unpatentable over Chiu in view of Salloum, Vinyals, and Ramerth and further in view of Poria et al. (“Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-Level Multimodal Sentiment Analysis,” hereinafter “Poria”).
With regard to Claim 2, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the shifting is performed to the right and zeroes are input into an empty first position.”  (However, Section 3 of Poria (page 2540, near middle of right column) describes that input vectors may be shifted to the right, and the open spots can be zero padded.) 
It would have been obvious before the effective filing date of the claimed invention to include the zero padded shifted vector of Poria into the method of Chiu in view of Salloum, Vinyals, and Ramerth to allow for convolution, as described at page 2540, near middle of right column of Poria.
With regard to Claim 3, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the Long Short-Term Memory Recurring Neural Network comprises 100 memory cells.”  However, Poria describes an input vector of dimension 21,114 and an input layer of 21,114 neurons (page 2540, near middle of right column of Poria).  Although this does not explicitly describe “100 memory cells,” the number of memory cells is a result effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of claimed memory cells is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include 100 memory cells into the method of Chiu in view of Salloum, Vinyals, and Ramerth as this feature is a routine optimization of the number of memory cells based on a design need.
With regard to Claim 4, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the first concatenation results in 200 dimensions.”  However, Poria describes a sixth layer of 500 neurons just before the softmax layer (page 2540, lower portion of right column of Poria).  Although this does not explicitly describe a vector of “200 dimensions,” the number of dimensions is a result effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of dimensions is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a 200 dimensional vector into the method of Chiu in view of Salloum, Vinyals, and Ramerth as this feature is a routine optimization of the number of dimensions based on a design need.  
With regard to Claim 5, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the second concatenation results in a number of dimensions proportionally related to a number of features from the truncated sentence.”  However, Poria describes a sixth layer of 500 neurons just before the softmax layer (page 2540, lower portion of right column of Poria).  Although this does not explicitly describe a vector having “a number of dimensions proportionally related to a number of features from the truncated sentence,” the number of dimensions is a result effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of dimensions is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a vector having a number of dimensions proportionally related to a number of features from the truncated sentence into the method of Chiu in view of Salloum, Vinyals, and Ramerth as this feature is a routine optimization of the number of dimensions based on a design need.
With regard to Claim 6, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the features further comprise one or more of: word embeddings, special character information, digit count information, digit chunks count information, and fixed-size ordinally forgetting encoding.”  However, Poria describes using word embeddings as a feature (page 2540, lower portion of left column of Poria).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a word embedding feature as described by Poria into the method of Chiu in view of Salloum, Vinyals, and Ramerth as this would provide a more accurate classifier, as described in the upper portion of the left column of page 2540 of Poria.
With regard to Claim 8, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the shifting is performed to the right and zeroes are input into an empty first position.”  (However, Section 3 of Poria (page 2540, near middle of right column) describes that input vectors may be shifted to the right, and the open spots can be zero padded.) 
It would have been obvious before the effective filing date of the claimed invention to include the zero padded shifted vector of Poria into the method of Chiu in view of Salloum, Vinyals, and Ramerth to allow for convolution, as described at page 2540, near middle of right column of Poria.
With regard to Claim 9, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the Long Short-Term Memory Recurring Neural Network comprises 100 memory cells.”  However, Poria describes an input vector of dimension 21,114 and an input layer of 21,114 neurons (page 2540, near middle of right column of Poria).  Although this does not explicitly describe “100 memory cells,” the number of memory cells is a result effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of claimed memory cells is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include 100 memory cells into the method of Chiu in view of Salloum, Vinyals, and Ramerth as this feature is a routine optimization of the number of memory cells based on a design need.
With regard to Claim 10, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the first concatenation results in 200 dimensions.”  However, Poria describes a sixth layer of 500 neurons just before the softmax layer (page 2540, lower portion of right column of Poria).  Although this does not explicitly describe a vector of “200 dimensions,” the number of dimensions is a result effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of dimensions is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a 200 dimensional vector into the method of Chiu in view of Salloum, Vinyals, and Ramerth as this feature is a routine optimization of the number of dimensions based on a design need.  
With regard to Claim 11, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the second concatenation results in a number of dimensions proportionally related to a number of features from the truncated sentence.”  However, Poria describes a sixth layer of 500 neurons just before the softmax layer (page 2540, lower portion of right column of Poria).  Although this does not explicitly describe a vector having “a number of dimensions proportionally related to a number of features from the truncated sentence,” the number of dimensions is a result effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of dimensions is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a vector having a number of dimensions proportionally related to a number of features from the truncated sentence into the method of Chiu in view of Salloum, Vinyals, and Ramerth as this feature is a routine optimization of the number of dimensions based on a design need.
With regard to Claim 12, Chiu in view of Salloum, Vinyals, and Ramerth do not explicitly describe “wherein the features further comprise one or more of: word embeddings, special character information, digit count information, digit chunks count information, and fixed-size ordinally forgetting encoding.”  However, Poria describes using word embeddings as a feature (page 2540, lower portion of left column of Poria).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a word embedding feature as described by Poria into the method of Chiu in view of Salloum, Vinyals, and Ramerth as this would provide a more accurate classifier, as described in the upper portion of the left column of page 2540 of Poria.
6.	Claims 13, 14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chiu in view of Salloum, Vinyals, and Ramerth and further in view of Wu et al. (“Evaluating the Utility of Hand-crafted Features in Sequence Labelling,” hereinafter “Wu”).
With respect to Claim 13, Chiu describes:
A method for deriving a model for a chatbot for predicting entities in a given sentence comprising a plurality of words wherein the words are separated by space, the method comprising the steps of:
…
obtaining features from the truncated sentence;
(Section 2.1 of Chiu describes a Bi-LSTM that transforms “word features into named entity tag scores” (page 359 of Chiu, top left).  Thus, Chiu describes that features are the input.)
performing a Long Short-Term Memory Recurring Neural Network forward pass on the features to obtain a first set of results;
(The left branch of Figure 3 of Chiu shows a LSTM forward pass.)

[Image: media_image1.png]

performing a Long Short-Term Memory Recurring Neural Network backward pass on the features to obtain a second set of results;
(The right branch of Figure 3 of Chiu shows a LSTM backward pass.)
performing a first concatenating on the first set of results and the second set of results.”
(Chiu describes on page 359 that “These two vectors are then simply added together to produce the final output.” This addition is being cited as the claimed “first concatenating.”)
However, Chiu does not describe:
“inputting the sentence into a named-entity recognition module and truncating the sentence;”
wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
“performing a second concatenation on the first concatenation using output target entities, wherein entities are encoded using multi-hot encoding;
obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words;
obtaining a plurality of encoded outputs with dimensions equal to the number of entities; and
biasing the model towards expecting the identified parts of speech.”
However, Salloum describes “inputting the sentence into a named-entity recognition module and truncating the sentence.”  
(Paragraph 42 of Salloum describes a system that truncates sentences longer than 512 tokens, pads sentences shorter than 512 tokens, and then uses the result as the input to a Bi-LSTM.)
It would have been obvious before the effective filing date of the claimed invention to include the sentence truncation of Salloum into the method of Chiu to provide greater speech detection accuracy, as described in paragraph 43 of Salloum.
Chiu in view of Salloum does not teach:
“performing a second concatenation on the first concatenation using output target entities, wherein entities are encoded using multi-hot encoding;
wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words; 
obtaining a plurality of encoded outputs with dimensions equal to the number of entities; and
biasing the model towards expecting the identified parts of speech.”
However, Vinyals describes:
“performing a second concatenation on the first concatenation using output target entities, [wherein entities are encoded using multi-hot encoding];
(The caption to Figure 2 on page 3158 of Vinyals describes that “the predicted word at time t-1 is fed back in addition to the memory output m at time t into the Softmax for word prediction.”  The predicted word is being cited as “output target entities.” This predicted word is fed back in addition to the memory output at time t (cited as the result of the first concatenation).  The combination of the predicted word at time t-1 and the memory output m at time t is cited as the second concatenation.)

[Image: media_image2.png]

obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words; and
(Page 3158 of Vinyals describes that the final word prediction is obtained from the softmax, which would include “obtaining a fully connected set of neurons”.)
obtaining a plurality of encoded outputs with dimensions equal to the number of entities.”
(Page 3158 of Vinyals describes that the final word prediction is obtained from the softmax, which would include “obtaining a plurality of encoded outputs”.)
It would have been obvious before the effective filing date of the claimed invention to include the feedback of the predicted word at time t-1 of Vinyals into the method of Chiu and Salloum to better prevent exploding or vanishing gradients, as described in the first paragraph of Section 3.1 on page 3158 of Vinyals.
Chiu in view of Salloum and Vinyals does not explicitly describe:
“wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
wherein entities are encoded using multi-hot encoding; and
biasing the model towards expecting the identified parts of speech.”
However, Ramerth describes a named entity recognition system that includes parts of speech as feature attributes (paragraph 15).  Further, paragraph 17 of Ramerth describes that a replacement candidate that matches a statistically expected part of speech or other attribute (e.g., gerund, named entity, particular word sense, etc.) is assigned a higher confidence score than if the candidate did not match the expected attribute.  This assigning of higher confidence values for expected parts of speech is cited as “biasing the model” as recited in amended Claim 13.
It would have been obvious before the effective filing date of the claimed invention to include the biasing towards an expected part of speech of Ramerth into the method of Chiu, Salloum, and Vinyals to better predict a next word in a sequence, as described in paragraph 12 of Ramerth.
Chiu in view of Salloum, Vinyals, and Ramerth does not describe “wherein entities are encoded using multi-hot encoding.”
However, Wu describes “wherein entities are encoded using multi-hot encoding.”
(Section 2.2 of Wu describes at the upper part of the right column that multi-hot encoded vectors can be used to represent vectors in a Bi-LSTM system.  As the term “entities” is not used elsewhere in the claim, the broadest reasonable interpretation of this term is any vector in the system.)
 Accordingly, it would have been obvious before the effective filing date of the claimed invention to include the multi-hot encoded vectors of Wu into the method of Chiu in view of Salloum, Vinyals, and Ramerth to more compactly represent entities, as described in section 2.2 of Wu.
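For illustration only, a multi-hot encoded vector of the kind cited from Section 2.2 of Wu can be sketched as follows. This is a minimal sketch and not code from Wu: the label set and the function name multi_hot are assumptions made for the example.

```python
from typing import List, Sequence


def multi_hot(active: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Return a 0/1 vector with a 1 at every position whose label is active."""
    active_set = set(active)
    return [1 if label in active_set else 0 for label in vocabulary]


# Hypothetical entity label set; not taken from Wu or from the claims.
ENTITY_LABELS = ["PERSON", "LOCATION", "ORGANIZATION", "DATE"]
vector = multi_hot(["LOCATION", "DATE"], ENTITY_LABELS)
# vector is [0, 1, 0, 1]: two labels are set in a single vector
```

Unlike a one-hot vector, more than one position may be set at once, and the vector length equals the number of labels.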
With regard to Claim 14, the end of Section 2.1 of Chiu describes that a final output is derived from two vectors added together.  As “unique entities” and “expected output entities” are not defined anywhere in the claim, the final output of Chiu is being interpreted as “unique entities” and the two vectors added together to create the final output are interpreted as “expected output entities.”
With respect to Claim 19, Chiu describes:
A system for deriving a model for a chatbot for predicting entities in a given sentence comprising a plurality of words wherein the words are separated by space, the system comprising:
…
obtaining features from the truncated sentence;
(Section 2.1 of Chiu describes a Bi-LSTM that transforms “word features into named entity tag scores” (page 359 of Chiu, top left).  Thus, Chiu describes that features are the input.)
performing a Long Short-Term Memory Recurring Neural Network forward pass on the features to obtain a first set of results;
(The left branch of Figure 3 of Chiu shows an LSTM forward pass.)

[Image: media_image1.png]

performing a Long Short-Term Memory Recurring Neural Network backward pass on the features to obtain a second set of results;
(The right branch of Figure 3 of Chiu shows an LSTM backward pass.)
performing a first concatenating on the first set of results and the second set of results.
(Chiu describes on page 359 that “These two vectors are then simply added together to produce the final output.” This addition is being cited as the claimed “first concatenating.”)
However, Chiu does not describe:
“a processor; and
a memory in communication with the processor, the memory storing instructions that, when executed by the processor, causes the processor to derive the model by:”
“inputting the sentence into a named-entity recognition module and truncating the sentence;
wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
 performing a second concatenation on the first concatenation using output target entities, wherein entities are encoded using multi-hot encoding;
obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words; 
obtaining a plurality of encoded outputs with dimensions equal to the number of entities; and
biasing the model towards expecting the identified parts of speech.”
However, Salloum describes:
“a processor; and
(Salloum teaches a processor at paragraphs 62 and 63.)
a memory in communication with the processor, the memory storing instructions that, when executed by the processor, causes the processor to derive the model by:”
(Salloum teaches memory at paragraphs 62 and 63.)
“inputting the sentence into a named-entity recognition module and truncating the sentence.”  
(Paragraph 42 of Salloum describes a system that truncates sentences longer than 512 tokens, pads sentences shorter than 512 tokens, and then uses the result as the input to a Bi-LSTM.)
It would have been obvious before the effective filing date of the claimed invention to include the sentence truncation of Salloum into the method of Chiu to provide greater speech detection accuracy, as described in paragraph 43 of Salloum.
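For illustration only, the truncate-or-pad behavior attributed above to paragraph 42 of Salloum can be sketched as follows. This is a minimal sketch and not code from Salloum: the function name `truncate_or_pad`, the `<pad>` token, and the default argument are assumptions chosen to mirror the 512-token limit cited above.

```python
def truncate_or_pad(tokens, max_len=512, pad_token="<pad>"):
    """Return exactly max_len tokens.

    Inputs longer than max_len are cut; shorter inputs are padded.
    The name, pad token, and default are hypothetical (not from Salloum).
    """
    clipped = tokens[:max_len]  # truncation step
    return clipped + [pad_token] * (max_len - len(clipped))  # padding step
```

Under these assumptions, every sentence reaches the Bi-LSTM with the same fixed length regardless of its original length.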
Chiu in view of Salloum does not teach:
“wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
 performing a second concatenation on the first concatenation using output target entities, wherein entities are encoded using multi-hot encoding;
obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words; 
obtaining a plurality of encoded outputs with dimensions equal to the number of entities; and
biasing the model towards expecting the identified parts of speech.”
However, Vinyals describes:
“performing a second concatenation on the first concatenation using output target entities, [wherein entities are encoded using multi-hot encoding];
(The caption to Figure 2 on page 3158 of Vinyals describes that “the predicted word at time t-1 is fed back in addition to the memory output m at time t into the Softmax for word prediction.”  The predicted word is being cited as “output target entities.” This predicted word is fed back in addition to the memory output at time t (cited as the result of the first concatenation).  The combination of the predicted word at time t-1 and the memory output m at time t is cited as the second concatenation.)

[Image: media_image2.png]

obtaining a fully connected set of neurons from the second concatenation, which are shared across all encoded words; and
(Page 3158 of Vinyals describes that the final word prediction is obtained from the softmax, which would include “obtaining a fully connected set of neurons”.)
obtaining a plurality of encoded outputs with dimensions equal to the number of entities.”
(Page 3158 of Vinyals describes that the final word prediction is obtained from the softmax, which would include “obtaining a plurality of encoded outputs”.)
It would have been obvious before the effective filing date of the claimed invention to include the feedback of the predicted word at time t-1 of Vinyals into the method of Chiu and Salloum to better prevent exploding or vanishing gradients, as described in the first paragraph of Section 3.1 on page 3158 of Vinyals.
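As a point of reference, a softmax of the general kind cited above maps one raw score per class to one probability per class, so its output has as many entries as there are classes. The sketch below is generic and is not taken from Vinyals; the function name and the example scores are assumptions.

```python
import math


def softmax(scores):
    """Map one raw score per class to one probability per class."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes; the output also has three entries.
probabilities = softmax([1.0, 2.0, 3.0])
```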
Chiu in view of Salloum and Vinyals does not describe:
“wherein the features comprise at least encodings for parts of speech which further comprise tags identifying the parts of speech individually;
wherein entities are encoded using multi-hot encoding; and
biasing the model towards expecting the identified parts of speech.”
However, Ramerth describes a named entity recognition system that includes parts of speech as feature attributes (paragraph 15).  Further, paragraph 17 of Ramerth describes that a replacement candidate that matches a statistically expected part of speech or other attribute (e.g., gerund, named entity, particular word sense, etc.) is assigned a higher confidence score than if the candidate did not match the expected attribute.  This assigning of higher confidence values for expected parts of speech is cited as “biasing the model” as recited in amended Claim 19.
It would have been obvious before the effective filing date of the claimed invention to include the biasing towards an expected part of speech of Ramerth into the method of Chiu, Salloum, and Vinyals to better predict a next word in a sequence, as described in paragraph 12 of Ramerth.
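The biasing characterized above, in which a candidate matching the expected part of speech receives a higher confidence score than one that does not, can be sketched as follows. This is a hedged illustration only: the multiplicative `boost`, the function name, and the tag strings are assumptions, and Ramerth's actual scoring method is not reproduced here.

```python
def biased_confidence(base_score, candidate_pos, expected_pos, boost=1.5):
    """Raise the score when the candidate's part of speech is the expected one.

    The multiplicative boost is an assumption made for illustration;
    it is not taken from Ramerth.
    """
    if candidate_pos == expected_pos:
        return base_score * boost
    return base_score
```

With these assumed values, a candidate tagged as the expected part of speech always scores higher than an otherwise identical candidate with a different tag.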
Chiu in view of Salloum, Vinyals, and Ramerth does not describe “wherein entities are encoded using multi-hot encoding.”
However, Wu describes “wherein entities are encoded using multi-hot encoding.”
(Section 2.2 of Wu describes at the upper part of the right column that multi-hot encoded vectors can be used to represent vectors in a Bi-LSTM system.  As the term “entities” is not used elsewhere in the claim, the broadest reasonable interpretation of this term is any vector in the system.)
 Accordingly, it would have been obvious before the effective filing date of the claimed invention to include the multi-hot encoded vectors of Wu into the method of Chiu in view of Salloum, Vinyals, and Ramerth to more compactly represent entities, as described in section 2.2 of Wu.
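For context, multi-hot encoding in its general sense represents a set of labels as a single binary vector with a 1 at the position of every label present. The sketch below is generic and is not taken from Wu; the function name and the example label vocabulary are assumptions.

```python
def multi_hot(labels, vocabulary):
    """Encode a set of labels as one binary vector over the vocabulary.

    Unlike one-hot encoding, more than one position may be 1.
    A label missing from the vocabulary raises KeyError.
    """
    index = {label: i for i, label in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for label in labels:
        vector[index[label]] = 1
    return vector


# Hypothetical entity vocabulary, for illustration only.
ENTITY_TYPES = ["PERSON", "ORG", "LOCATION", "DATE"]
encoded = multi_hot(["PERSON", "LOCATION"], ENTITY_TYPES)
```

A single vector of length four here carries two labels at once, which is the compactness the rationale above refers to.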

7.	Claims 15-18, 20, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Chiu in view of Salloum, Vinyals, Ramerth, and Wu and further in view of Poria.
With regard to Claim 15, Chiu in view of Salloum, Vinyals, Ramerth, and Wu does not explicitly describe “wherein the Long Short-Term Memory Recurring Neural Network comprises 100 memory cells.”  However, Poria describes an input vector of dimension 21,114 and an input layer of 21,114 neurons (page 2540, near middle of right column of Poria).  Although this does not explicitly describe “100 memory cells,” the number of memory cells is a result-effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of claimed memory cells is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include 100 memory cells into the method of Chiu in view of Salloum, Vinyals, Ramerth, and Wu as this feature is a routine optimization of the number of memory cells based on a design need.
With regard to Claim 16, Chiu in view of Salloum, Vinyals, Ramerth, and Wu does not explicitly describe “wherein the first concatenation results in 200 dimensions.”  However, Poria describes a sixth layer of 500 neurons just before the softmax layer (page 2540, lower portion of right column of Poria).  Although this does not explicitly describe a vector of “200 dimensions,” the number of dimensions is a result-effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of dimensions is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a 200-dimensional vector into the method of Chiu in view of Salloum, Vinyals, Ramerth, and Wu as this feature is a routine optimization of the number of dimensions based on a design need.
With regard to Claim 17, Chiu in view of Salloum, Vinyals, Ramerth, and Wu does not explicitly describe “wherein the second concatenation results in a number of dimensions proportionally related to a number of features from the truncated sentence.”  However, Poria describes a sixth layer of 500 neurons just before the softmax layer (page 2540, lower portion of right column of Poria).  Although this does not explicitly describe a vector having “a number of dimensions proportionally related to a number of features from the truncated sentence,” the number of dimensions is a result-effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of dimensions is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a vector having a number of dimensions proportionally related to a number of features from the truncated sentence into the method of Chiu in view of Salloum, Vinyals, Ramerth, and Wu as this feature is a routine optimization of the number of dimensions based on a design need.
With regard to Claim 18, Chiu in view of Salloum, Vinyals, Ramerth, and Wu does not explicitly describe “wherein the features further comprise one or more of: word embeddings, special character information, digit count information, digit chunks count information, and fixed-size ordinally forgetting encoding.”  However, Poria describes using word embeddings as a feature (page 2540, lower portion of left column of Poria).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a word embedding feature as described by Poria into the method of Chiu in view of Salloum, Vinyals, Ramerth, and Wu as this would provide a more accurate classifier, as described in the upper portion of the left column of page 2540 of Poria.
With regard to Claim 20, Chiu in view of Salloum, Vinyals, Ramerth, and Wu does not explicitly describe “wherein the Long Short-Term Memory Recurring Neural Network comprises 100 memory cells.”  However, Poria describes an input vector of dimension 21,114 and an input layer of 21,114 neurons (page 2540, near middle of right column of Poria).  Although this does not explicitly describe “100 memory cells,” the number of memory cells is a result-effective variable, that is, a variable which achieves a recognized result.  Thus, the specific number of claimed memory cells is a matter of routine experimentation.  See MPEP 2145(II)(B).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include 100 memory cells into the method of Chiu in view of Salloum, Vinyals, Ramerth, and Wu as this feature is a routine optimization of the number of memory cells based on a design need.
With regard to Claim 21, Chiu in view of Salloum, Vinyals, Ramerth, and Wu does not explicitly describe “wherein the features further comprise one or more of: word embeddings, special character information, digit count information, digit chunks count information, and fixed-size ordinally forgetting encoding.”  However, Poria describes using word embeddings as a feature (page 2540, lower portion of left column of Poria).  Accordingly, it would have been obvious before the effective filing date of the claimed invention to include a word embedding feature as described by Poria into the method of Chiu in view of Salloum, Vinyals, Ramerth, and Wu as this would provide a more accurate classifier, as described in the upper portion of the left column of page 2540 of Poria.

Conclusion
8.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
U.S. Pat. App. Pub. No. 20130288722 (Ramanujam et al.) also describes a named entity recognition system where parts of speech labels are features.
9.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY, whose telephone number is (571)272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656