DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. JP2019-068553, filed on 03/29/2019.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/26/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The drawings are objected to because the text in Figures 5, 6A and 6B is not in English. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-5 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claims 1, 4 and 5 recite “executing a first calculating processing that includes calculating, for each encoder time corresponding to a word string in the input text, a hidden state at the encoder time from a hidden state at one previous encoder time based on a word in the input text and a label of a named entity corresponding to the encoder time; executing an input processing that includes inputting the hidden state output from the encoder to a decoder; executing a second calculating processing that includes calculating, for each decoder time corresponding to the word string in a summary output from the decoder, a hidden state at the decoder time from a hidden state at one previous decoder time based on the word and label of the named entity in the summary generated at the one previous decoder time; executing a third calculating processing that includes calculating a first probability distribution based on the hidden state at the decoder time and the hidden state at the encoder time, the first probability distribution being a probability distribution in which each of words in the word string in the input text is to be copied as a word in the summary at the decoder time; executing a fourth calculating processing that includes calculating a second probability distribution based on the hidden state at the decoder time, the second probability distribution being a probability distribution in which each of words in a dictionary of a model including the encoder and the decoder is to be generated as a word in the summary at the decoder time; and executing a generating processing that includes generating words in the summary at the decoder time based on the first probability distribution and the second probability distribution.”
The limitations of “executing…” and “generating…” as drafted cover mental activities. More specifically, a human mind could obtain input text; calculate, at each encoder time step corresponding to a word string in the input text, a hidden state from the hidden state at the previous time step based on a word in the input text and a label of a named entity at that time step; input the hidden state output by the encoder to a decoder; calculate, at each decoder time step corresponding to the word string in a summary output from the decoder, a hidden state from the hidden state at the previous decoder time step based on the word and the label of the named entity in the summary; find a first probability of the word being copied and a second probability of the word being generated; and generate a summary based on these probabilities.
Independent claim 3 additionally recites “A learning method implemented by a computer, the method comprising: obtaining learning input text and a correct answer summary; for each encoder time corresponding to a word string in the learning input text, calculating a hidden state at the encoder time from a hidden state at one previous encoder time based on a word in the learning input text and a label of a named entity corresponding to the encoder time; inputting the hidden state output from the encoder to a decoder; for each decoder time corresponding to a word string in the correct answer summary, calculating a hidden state at the decoder time from a hidden state at one previous decoder time based on a word in the correct answer summary and a label of a named entity corresponding to the decoder time; calculating a first probability distribution based on the hidden state at the decoder time and the hidden state at the encoder time, the first probability distribution being a probability distribution in which each of words in the word string in the learning input text is to be copied as a word in the summary at the decoder time; calculating a second probability distribution based on the hidden state at the decoder time, the second probability distribution being a probability distribution in which each of words in a dictionary of a model including the encoder and the decoder is to be generated as a word in the summary at the decoder time; …”
The limitations of “obtaining…”, “inputting…”, and “calculating…” as drafted cover mental activities. More specifically, a human mind could obtain input text and a correct summary; calculate, at each encoder time step corresponding to a word string in the input text, a hidden state from the hidden state at the previous time step based on a word in the input text and a label of a named entity at that time step; input the hidden state output by the encoder to a decoder; calculate, at each decoder time step corresponding to the word string in the summary, a hidden state from the hidden state at the previous decoder time step based on the word and the label of the named entity in the summary; find a first probability of the word being copied, a second probability of the word being generated, and a third probability of a named-entity label being selected; and generate a summary based on these probabilities. The mind then compares the summary to the correct summary and learns from the observed differences.
This judicial exception is not integrated into a practical application. In particular, claims 1, 3, 4 and 5 recite the additional elements of “processor” and “memory” as per the independent claims. For example, paragraphs [0062]-[0063] of the as-filed specification describe using a general purpose computing environment or computing device as recited in [0047]. Accordingly, these additional elements do not integrate the abstract idea into a practical application.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to a generalized computing architecture. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the additional limitations in the claims noted above are directed towards insignificant extra-solution activity. The claims are not patent eligible.
With respect to claim 2, the claim recites that the hidden state at the decoder time is calculated based on the label of the named entity selected at the one previous decoder time, calculating a third probability distribution that each label of a named entity is to be selected at the decoder time, and selecting a label of a named entity at the decoder time based on the third probability distribution. This relates to a mind calculating a hidden state in the decoder using the label of the named entity, calculating a third probability that each named-entity label is selected, and choosing a label. No additional limitations are present.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
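The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.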

Claims 1-5 are rejected under 35 U.S.C. 103 as being unpatentable over Abigail SEE et al., "Get To The Point: Summarization with Pointer-Generator Networks", ACL, April 25, 2017 (“See”), and further in view of Thiago C. FERREIRA et al., "NeuralREG: An end-to-end approach to referring expression generation", ACL, July 15, 2018 (“Ferreira”).
Regarding claims 1, 4 and 5, See teaches executing an obtaining processing that includes obtaining input text; (see p. 1, section 1, para. 1, where “Extractive methods assemble summaries exclusively from passages (usually whole sentences) taken directly from the source text”) 
executing a first calculating processing that includes calculating, for each encoder time corresponding to a word string in the input text, a hidden state at the encoder time from a hidden state at one previous encoder time based on a word in the input text (see p. 2, section 2.1, para. 1, where “The tokens of the article w_i (e.g., word in the input text) are fed one-by-one into the encoder (a single-layer bidirectional LSTM), producing a sequence of encoder hidden states h_i.”)
executing a second calculating processing that includes calculating, for each decoder time corresponding to the word string in a summary output from the decoder, a hidden state at the decoder time from a hidden state at one previous decoder time based on the word (see p. 2, section 2.1, para. 1, where “On each step t, the decoder (a single-layer unidirectional LSTM) receives the word embedding of the previous word (…at test time it is the previous word emitted by the decoder) and has decoder state s_t” and the decoder hidden states are shown in Figure 3)
executing a third calculating processing that includes calculating a first probability distribution based on the hidden state at the decoder time and the hidden state at the encoder time, the first probability distribution being a probability distribution in which each of words in the word string in the input text is to be copied as a word in the summary at the decoder time (see p. 3, Figure 3 for the attention distribution, which is calculated from the encoder and decoder hidden states, and is “a probability distribution over the source words, that tells the decoder where to look to produce the next word.” (p. 3, section 2.1, para. 1))
executing a fourth calculating processing that includes calculating a second probability distribution based on the hidden state at the decoder time, the second probability distribution being a probability distribution in which each of words in a dictionary of a model including the encoder and the decoder is to be generated as a word in the summary at the decoder time (see p. 3, Figure 3 for the generation probability p_gen, which is based on the decoder’s hidden states, and see p. 3, Figure 3 caption, where “For each decoder timestep a generation probability p_gen ∈ [0, 1] is calculated, which weights the probability of generating words from the vocabulary, versus copying words from the source text.”, and see p. 3, section 2.2, para. 3, where “the extended vocabulary (e.g., dictionary) denote the union of the vocabulary (e.g., from the previous steps involving encoder and decoder), and all words appearing in the source document.”)
and executing a generating processing that includes generating words in the summary at the decoder time based on the first probability distribution and the second probability distribution. (see p. 3, Figure 3 caption, where “The vocabulary distribution and the attention distribution are weighted and summed to obtain the final distribution, from which we make our prediction”, and see also p. 5, section 3, para. 3, where it is stated that previous works “do not mix the probabilities from the copy distribution and the vocabulary distribution. We believe the mixture approach described here is better for abstractive summarization”)
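For illustration only, the weighted mixture quoted above from the Figure 3 caption of See can be sketched as follows. This is a minimal sketch under stated assumptions, not code from See or from the application: the function name final_distribution, the dictionary-based representation of the distributions, and the use of the attention weights directly as the copy distribution are all hypothetical choices made for readability.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a vocabulary distribution with a copy (attention) distribution.

    p_gen         -- generation probability in [0, 1]
    vocab_dist    -- dict mapping each vocabulary word to its probability
    attention     -- list of attention weights, one per source token
    source_tokens -- list of source words aligned with `attention`

    All names here are hypothetical and chosen for illustration.
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(attention) != len(source_tokens):
        raise ValueError("one attention weight is needed per source token")

    final = defaultdict(float)
    # Generation part: weight the vocabulary distribution by p_gen.
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob
    # Copy part: weight the attention mass on each source token by
    # (1 - p_gen); repeated source words accumulate their weights.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)
```

In this sketch a source word that is absent from the vocabulary (an out-of-vocabulary word) still receives probability mass through the copy part, which corresponds to the extended vocabulary described in the passage of See cited above.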
As to claims 4 and 5, CRM claim 4 and generating apparatus claim 5 correspond to the steps of method claim 1, with each claimed element's function corresponding to the claimed method step. Accordingly, claims 4 and 5 are similarly rejected under the same rationale as applied above with respect to claim 1 (see also p. 6, section 5, para. 4, “single Tesla K40m GPU with a batch size of 16”).
See does not teach the hidden state within the encoder and decoder including a label of a named entity. Ferreira teaches a hidden state at the encoder time from a hidden state at one previous encoder time based on a label of a named entity corresponding to the encoder time; (see p. 1962, section 4-4.1, para. 1-2 where “NeuralREG aims to generate a referring expression (e.g., label) y = {y_1, y_2, ..., y_T} with T tokens to refer to a target entity (e.g., named entity) token x^(wiki) given a discourse precontext X^(pre)… The precontext X^(pre) is represented by forward and backward hidden-state vectors. The final annotation vector for each encoding timestep t is obtained by the concatenation of the forward and backward representations h^(pre)”)
executing an input processing that includes inputting the hidden state output from the encoder to a decoder; calculating, for each decoder time corresponding to the word string in a summary output from the decoder, a hidden state at the decoder time from a hidden state at one previous decoder time based on the label of the named entity in the summary generated at the one previous decoder time; (see p. 1962, section 4-4.2, para. 1-2, where “Finally, the encoding of target entity x^(wiki) is simply its entry (e.g., label) in the shared input word-embedding matrix V_wiki …All decoders at each timestep i of the generation process take as input features their previous state s_(i−1), the target entity-embedding V_wiki”, and the V_wiki matrix is taken from the encoder output)
See and Ferreira are combinable because they both teach the use of an encoder/decoder architecture for generating expressions (whether summaries or referential phrases) by taking advantage of sequences of tokens within a text. Therefore it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the generic text generating method with encoder/decoder hidden states and two probability distributions laid out in See with Ferreira’s teaching of the encoder and decoder hidden states also including a label of a named entity. 

Regarding claim 3, See teaches a learning method implemented by a computer, the method comprising: obtaining learning input text and a correct answer summary (see p. 9, section 8, para. 5, where “during training, the model receives word-by-word supervision in the form of the reference summary (e.g., correct answer summary)” in addition to the article sentences copied as source text)
for each encoder time corresponding to a word string in the learning input text, calculating a hidden state at the encoder time from a hidden state at one previous encoder time based on a word in the learning input text (see p. 2, section 2.1, para. 1, where “The tokens of the article w_i (e.g., word in the learning input text) are fed one-by-one into the encoder (a single-layer bidirectional LSTM), producing a sequence of encoder hidden states h_i.”)
for each decoder time corresponding to a word string in the correct answer summary, calculating a hidden state at the decoder time from a hidden state at one previous decoder time based on a word in the correct answer summary (see p. 2, section 2.1, para. 1, where “On each step t, the decoder (a single-layer unidirectional LSTM) receives the word embedding of the previous word (while training, this is the previous word of the reference summary….) and has decoder state s_t” and the decoder hidden states are shown in Figure 3)
 calculating a first probability distribution based on the hidden state at the decoder time and the hidden state at the encoder time, the first probability distribution being a probability distribution in which each of words in the word string in the learning input text is to be copied as a word in the summary at the decoder time; (see p. 3, Figure 3 for the attention distribution, which is calculated from the encoder and decoder hidden states, and is “a probability distribution over the source words, that tells the decoder where to look to produce the next word.” (p. 3, section 2.1, para. 1))
calculating a second probability distribution based on the hidden state at the decoder time, the second probability distribution being a probability distribution in which each of words in a dictionary of a model including the encoder and the decoder is to be generated as a word in the summary at the decoder time (see Figure 3 for the generation probability p_gen, which is based on the decoder’s hidden states, and see p. 3, Figure 3 caption, where “For each decoder timestep a generation probability p_gen ∈ [0, 1] is calculated, which weights the probability of generating words from the vocabulary, versus copying words from the source text.”, and see p. 3, section 2.2, para. 3, where “the extended vocabulary (e.g., dictionary) denote the union of the vocabulary (e.g., from the previous steps involving encoder and decoder), and all words appearing in the source document.”)
calculating a first loss between the first probability distribution and the second probability distribution and the word in the correct answer summary at the decoder time (see p. 3, Equation 9, where “the probability distribution over the extended vocabulary” involves the first attention distribution and second vocabulary distribution and see p. 4, section 2.2, para. 4, where “The loss function is as described in equations (6) and (7), but with respect to our modified probability distribution P(w) given in equation (9).”)
and updating the parameters of the model based on the first loss (see p. 6, section 5, para. 2, where “for the models with vocabulary size 50k, the baseline model has 21,499,600 parameters, the pointer-generator (e.g., from the pointer-generator network described in section 2.2) adds 1153 extra parameters…We use loss on the validation set to implement early stopping”)
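For illustration only, a loss of the kind cited above, namely the negative log-likelihood of each reference-summary word under the per-timestep word distribution, averaged over the decoder timesteps, can be sketched as follows. This is a minimal sketch under stated assumptions, not code from See or from the application: the function name sequence_nll, the dictionary representation of each distribution, and the small constant eps (added only to avoid taking the logarithm of zero) are hypothetical.

```python
import math


def sequence_nll(final_dists, reference_words, eps=1e-12):
    """Average negative log-likelihood of a reference summary.

    final_dists     -- list of dicts, one per decoder timestep, mapping
                       each candidate word to its probability
    reference_words -- list of reference-summary words, one per timestep
    eps             -- hypothetical smoothing constant, an assumption of
                       this sketch rather than part of any cited method
    """
    if len(final_dists) != len(reference_words):
        raise ValueError("one distribution is needed per reference word")
    if not reference_words:
        raise ValueError("reference summary must not be empty")

    total = 0.0
    for dist, word in zip(final_dists, reference_words):
        # Per-timestep loss: -log P(target word at this timestep).
        total += -math.log(dist.get(word, 0.0) + eps)
    # Overall loss: mean of the per-timestep losses.
    return total / len(reference_words)
```

A parameter update based on this loss would typically be carried out by a gradient-based optimizer in a training framework; that step is outside the scope of this sketch.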
See does not teach the hidden state within the encoder and decoder including a label of a named entity, and a third probability distribution for each of the labels of named entities. Ferreira teaches a hidden state at the encoder time from a hidden state at one previous encoder time based on a label of a named entity corresponding to the encoder time; (see p. 1962, section 4-4.1, para. 1-2 where “NeuralREG aims to generate a referring expression (e.g., label) y = {y_1, y_2, ..., y_T} with T tokens to refer to a target entity (e.g., named entity) token x^(wiki) given a discourse precontext X^(pre)… The precontext X^(pre) is represented by forward and backward hidden-state vectors. The final annotation vector for each encoding timestep t is obtained by the concatenation of the forward and backward representations h^(pre)”)
inputting the hidden state output from the encoder to a decoder; calculating, for each decoder time corresponding to the word string in a summary output from the decoder, a hidden state at the decoder time from a hidden state at one previous decoder time based on the label of the named entity in the summary generated at the one previous decoder time; (see p. 1962, section 4-4.2, para. 1-2, where “Finally, the encoding of target entity x^(wiki) is simply its entry (e.g., label) in the shared input word-embedding matrix V_wiki …All decoders at each timestep i of the generation process take as input features their previous state s_(i−1), the target entity-embedding V_wiki”, and the V_wiki matrix is taken from the encoder output)
and a third probability distribution that each of labels of named entities is to be selected at a decoder time next to the decoder time; (see p. 1963, section 4.2, para. 2-3, where “the attention probability α_ij^(k) (e.g., third probability distribution) determines the amount of contribution of the jth token of k-context in the generation of the ith token of the referring expression (e.g., generation of each of labels of named entities). In each decoding step i, a final summary-vector for each context c_i^(k) is computed by summing the encoder states h_j^(k) weighted by the attention probabilities α_i^(k)” and “CAtt is an LSTM decoder augmented with an attention mechanism (Bahdanau et al., 2015) over the pre- and pos-context encodings, which is used to compute c_i at each timestep”)
and calculating a second loss between the third probability distribution at the decoder time calculated at the one previous decoder time and the label of the named entity of the word in the correct answer summary (see p. 1961, section 3.3, para. 3, where “Each instance of the final dataset consists of a truecased tokenized referring expression (e.g., from correct answer), the target entity (distinguished by its Wikipedia ID), and the discourse context preceding and following the relevant reference” and see p. 1963, section 4.2, para. 6-8 where “Given the summary-vector c_i, the embedding of the previous referring expression token V_(y_(i−1)), the previous decoder state s_(i−1) and the entity-embedding V_wiki, the decoders predict their next state which later is used to compute a probability distribution over the tokens in the output vocabulary for the next timestep as Equations 10 and 11 show…. The decoder is trained to minimize the negative log likelihood (e.g., loss) of the next token in the target referring expression”)
See and Ferreira are combinable because they both teach the use of an encoder/decoder architecture for generating expressions (whether summaries or referential phrases) by taking advantage of sequences of tokens within a text. Therefore it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the more general training method with encoder/decoder hidden states and two probability distributions laid out in See with Ferreira’s teaching of the encoder and decoder hidden states including a label of a named entity, another probability distribution for each of the labels of named entities, and another loss function based on the probability distribution. One would be motivated to do so because generating reference labels for entities using the hidden states in an encoder/decoder architecture is a more integrated approach shown to provide more relevant labels than feature extraction (p. 2, section 2, para. 5), enabling the text summarization model to effectively deal with entities within an input text.

Regarding claim 2, the combination of See and Ferreira teaches the generating method according to claim 1. Ferreira teaches calculating a third probability distribution that each label of a named entity is to be selected at the decoder time (see p. 1963, section 4.2, para. 3, where “the attention probability α_ij^(k) (e.g., third probability distribution) determines the amount of contribution of the jth token of k-context in the generation of the ith token of the referring expression (e.g., generation of each of labels of named entities)” and is computed by the decoder: “CAtt is an LSTM decoder augmented with an attention mechanism (Bahdanau et al., 2015) over the pre- and pos-context encodings, which is used to compute c_i at each timestep. We compute energies e_ij^(pre) and e_ij^(pos) between encoder states h_i^(pre) and h_i^(post) and decoder state s_(i−1). These scores are normalized through the application of the softmax function to obtain the final attention probability α_ij^(pre) and α_ij^(post)”)
and selecting a label of a named entity at the decoder time based on the third probability distribution calculated at the one previous decoder time, (see p. 1963, section 4.2, para. 2, where “In each decoding step i, a final summary-vector for each context c_i^(k) is computed by summing the encoder states h_j^(k) weighted by the attention probabilities α_i^(k)”)
wherein the hidden state at the decoder time is calculated based on the label of the named entity selected at the one previous decoder time. (see p. 1962, section 4-4.2, para. 1-2, where “Finally, the encoding of target entity x^(wiki) is simply its entry (e.g., label) in the shared input word-embedding matrix V_wiki …All decoders at each timestep i of the generation process take as input features their previous state s_(i−1), the target entity-embedding V_wiki”, and the V_wiki matrix is taken from the encoder output)
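For illustration only, the two attention steps quoted above from Ferreira (normalizing energies with a softmax, then summing encoder states weighted by the resulting attention probabilities) can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Ferreira or from the application: the function names softmax and context_vector are hypothetical, and the energies are taken as given inputs rather than computed from encoder and decoder states.

```python
import math


def softmax(energies):
    """Normalize a list of energies into attention probabilities."""
    peak = max(energies)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(e - peak) for e in energies]
    total = sum(exps)
    return [x / total for x in exps]


def context_vector(encoder_states, energies):
    """Return (attention weights, weighted sum of encoder states).

    encoder_states -- list of equal-length lists of floats, one per token
    energies       -- list of floats, one per encoder state (assumed to be
                      precomputed; how they are scored is not modeled here)
    """
    if len(encoder_states) != len(energies):
        raise ValueError("one energy is needed per encoder state")
    weights = softmax(energies)
    dim = len(encoder_states[0])
    context = [0.0] * dim
    for weight, state in zip(weights, encoder_states):
        for d in range(dim):
            context[d] += weight * state[d]
    return weights, context
```

In this sketch, equal energies yield a uniform attention distribution, so the summary vector is simply the mean of the encoder states.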

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SARVAJNA KALVA whose telephone number is (571) 272-4692. The examiner can normally be reached on Monday - Friday 9 to 6. Examiner interviews are available via telephone; see http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppairmy.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/SARVAJNA KALVA/               Examiner, Art Unit 2659                                                                                                                                                                                         

/PIERRE LOUIS DESIR/               Supervisory Patent Examiner, Art Unit 2659