DETAILED ACTION
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This communication is in response to the Applicant’s submission filed 01 July 2022 [hereinafter Response], where:
Claims 1-15, 17 and 18 are amended.
Claim 16 is cancelled. 
Claims 1-15, 17, and 18 are pending.
Claims 1-15, 17, and 18 are rejected.
Foreign priority is claimed to JP2018-022446, filed 27 June 2018. A certified copy of this paper has been filed 17 June 2019. Accordingly, receipt is acknowledged of certified copies of papers required by 37 CFR 1.55. 
Specification
4.	The objection to the title is WITHDRAWN in view of the Applicant’s amendment to the title.

Claim Rejections - 35 U.S.C. § 101
8.	The following is a quotation of 35 U.S.C. § 101:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
9.	Claims 1-15, 17, and 18 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more. 
Claim 1 recites a “device,” which is a machine and is one of the four categories of subject matter that Congress deemed to be appropriate subject matter for a patent. Claim 1, however, recites the limitations of “generate a plurality of headings from the plurality of pieces of output information,” and “automatically select . . . at least one heading of the plurality of headings . . . .” These limitations pertain to an observation, evaluation, judgment, and/or opinion, and are “mental processes” (see MPEP § 2106.04(a)(2), subsection III), which is one of the groupings of abstract ideas. Claim 1 also recites “automatically determine a similarity between each of headings.” This limitation pertains to a mathematical relationship, and is a “mathematical concept” (see MPEP § 2106.04(a)(2), subsection I), which is another of the groupings of abstract ideas. Accordingly, claim 1 recites an abstract idea.
The abstract idea of claim 1 is not integrated into a practical application because the only other additional elements recited in claim 1 are (a) a processor programmed, (b) an electronic device, and (c) an output device, which are generic computer components (that is, a processor) upon which the abstract idea is executed and which do not represent a practical application of the abstract idea (see MPEP § 2106.04(d)). Other additional elements recited in claim 1 include (d) “acquire a plurality of pieces of output information . . . ,” and “automatically output the selected at least one heading . . . ,” which are synonymous with receiving and transmitting data and are directed to insignificant extra-solution activities (see MPEP § 2106.05(d), subsection II.i). Also, the limitations that the “[a plurality of pieces of output information] generated from predetermined target information by a plurality of models, each of the models generating, from input information, the plurality of pieces of output information, . . . ,” merely provide further detail of the data being “acquired,” and accordingly, do not integrate the abstract idea into a practical application. Also, the recited language regarding “a plurality of models” is a field-of-use limitation that “generally link[s] the use of a judicial exception to a particular technological environment or field of use” (MPEP § 2106.04(d)), and cannot integrate the judicial exception into a practical application. Still also, the limitations reciting “each of the plurality of pieces of output information being a single word and having an order relation relative to the other pieces of output information,” merely provide additional detail of the output information, while the limitations reciting “the at least one heading summarizing the predetermined target information,” simply link the abstract idea to the intended use of summarizing data.
Simply providing additional detail to the abstract idea, or linking the abstract idea to a field of use, cannot integrate the judicial exception into a practical application. Therefore, claim 1 is directed to the abstract idea.
Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Generally linking the abstract idea to a field of use (that is, processing data received from an output of a plurality of models) does not provide an inventive concept (MPEP § 2106.05(h)). Also, executing on generic computer components cannot provide significantly more than the abstract idea itself (MPEP § 2106.05(d)). Moreover, there is no nexus between the field of use and the generic computer components which, when taken in combination, could provide an inventive concept or provide significantly more than the abstract idea. Therefore, claim 1 is subject-matter ineligible.
Claims 2, 3, and 5 depend from claim 1, and only include limitations that recite mental processes (claim 2: select, based on a semantic similarity between headings, the output information that is to be output; claim 3: select, based on cosine similarity between vectors associated with the headings, the output information that is to be output; claim 5: select, from among the headings, a heading in which the similarity with another piece of output information satisfies a predetermined criteria). The recited limitations of “selecting” based on a similarity criterion are a form of evaluation, and accordingly, claims 2, 3, and 5 recite an abstract idea. The claims also do not recite additional elements that integrate the abstract idea into a practical application, nor do they recite additional elements that amount to significantly more than the abstract idea. Thus, claims 2, 3, and 5 are patent ineligible.
Claim 4 depends from claim 3, and only includes the additional limitations of “generate a vector, for every heading, by integrating the vectors associated with the plurality of pieces of output information as the vector that is associated with the heading,” which recites generating by integrating (that is, combining), and “selects the heading . . . based on the cosine similarity between the generated vectors,” which is a selection based on known information; each recites a mental process, which is an abstract idea. Claim 4 also does not recite additional elements that integrate the abstract idea into a practical application, nor does it recite additional elements that amount to significantly more than the abstract idea. Thus, claim 4 is patent ineligible.
Claim 6 depends from claim 1, and claims 7 and 8 depend from claim 6; these claims only include additional limitations that recite a mental process (claim 6: “estimate . . . a probability distribution of the output information . . . ,” and “select, from among the plurality of headings, the heading included in a predetermined area in the probability distribution;” claim 7: “estimate, based on kernel density estimation in which the headings generated by the plurality of models are regarded as samples, the probability distribution of the heading that is generated from the predetermined target information;” and claim 8: “select, in the probability distribution, the headings included in an area in which a density of the headings is higher than in other areas of the probability distribution.”), because estimating and selecting based on a criterion are forms of “observations, evaluations, judgments, and opinions” (MPEP § 2106.04(a)(2), subsection III.A). The claims also do not recite additional elements that integrate the abstract idea into a practical application, nor do they recite additional elements that amount to significantly more than the abstract idea. Thus, claims 6, 7, and 8 are patent ineligible.
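For illustration only, the kernel density estimation quoted from claim 7 (headings regarded as samples) and the density-based selection quoted from claim 8 can be sketched as follows. The sketch assumes, hypothetically, that each generated heading is represented by a numeric vector and that an isotropic Gaussian kernel with a fixed bandwidth is used; neither assumption is drawn from the claims, and the function names are illustrative, not the Applicant's implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kde_density(point: Vector, samples: List[Vector], bandwidth: float) -> float:
    """Isotropic Gaussian kernel density estimate at `point`.

    Each sample (one vector per generated heading, an assumed
    representation) contributes one Gaussian kernel centred on itself.
    """
    d = len(point)
    norm = (2 * math.pi * bandwidth ** 2) ** (d / 2)
    total = 0.0
    for s in samples:
        sq_dist = sum((p - q) ** 2 for p, q in zip(point, s))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2)) / norm
    return total / len(samples)


def select_dense_headings(headings: List[str], vectors: List[Vector],
                          bandwidth: float = 1.0, keep: int = 1) -> List[str]:
    """Return the `keep` headings whose vectors lie where the estimated
    density is highest (a hypothetical selection rule for illustration)."""
    scored = [(kde_density(v, vectors, bandwidth), h)
              for h, v in zip(headings, vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored[:keep]]
```

In this sketch, headings whose vectors cluster together receive a higher estimated density than an isolated heading, so an outlying heading is the one left out.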
Claims 9, 14, and 15 depend from claim 1, and claims 10-13 depend from claim 9; each recites the additional limitations of “acquire” and of using a recurrent neural network, a plurality of models, and an encoder/decoder (claim 9: “acquire the heading generated by a recurrent neural network, and the recurrent neural network generates the heading as different from information which is input to the recurrent neural network;” claim 14: “acquire the plurality of headings . . . by a plurality of models each of which includes an encoder that generates . . . feature information indicating the feature held by the input information and a decoder that sequentially generates a plurality of pieces of information included in the headings from the feature information . . . ;” and claim 15: “acquire the plurality of headings generated . . . by the plurality of models each of which generates the output information that is a text from the input information that is a text.”). Claims 10-13 merely recite further details of the abstract idea of claim 9 (claim 10: “acquire the headings . . . generated from a plurality of models in each of which a connection coefficient between nodes is randomly different and that is allowed to learn a feature that is individually held by learning information;” claim 11: “acquire the headings generated by the plurality of models that is generated from an identical model and that is allowed to individually learn the feature held by the-learning information to different stages;” claim 12: “acquire the headings generated by a plurality of models each having a different connection relation between nodes;” and claim 13: “acquire the headings generated by a plurality of models each of which has learned the feature of different pieces of learning target information and that has learned the feature of a plurality of pieces of learning target information generated from predetermined learning information”).
Because “acquire” is a form of receiving data, the claims recite insignificant extra-solution activity that cannot integrate the abstract idea into a practical application. Also, using a recurrent neural network, a plurality of models, or an encoder/decoder form of the plurality of models for their intended purposes similarly does not integrate the abstract idea into a practical application. Also, the additional limitations merely recite additional detail, in the form of acquiring, with regard to the data provided to the plurality of models. The claims also do not recite additional elements that amount to significantly more than the abstract idea. Thus, claims 9-15 are patent ineligible.
Claim 17 recites an “output method,” which is a process and is one of the four categories of subject matter that Congress deemed to be appropriate subject matter for a patent. Claim 17, however, recites the limitations of “generating a plurality of headings from the plurality of pieces of output information,” and “automatically selecting . . . at least one heading of the plurality of headings . . . .” These limitations pertain to an observation, evaluation, judgment, and/or opinion, and are “mental processes” (see MPEP § 2106.04(a)(2), subsection III), which is one of the groupings of abstract ideas. Claim 17 also recites “automatically determining a similarity between each of headings.” This limitation pertains to a mathematical relationship, and is a “mathematical concept” (see MPEP § 2106.04(a)(2), subsection I), which is another of the groupings of abstract ideas. Accordingly, claim 17 recites an abstract idea.
The abstract idea of claim 17 is not integrated into a practical application because the only other additional elements recited in claim 17 are (a) an electronic device, and (b) an output device, which are generic computer components (that is, a processor) upon which the abstract idea is executed and which do not represent a practical application of the abstract idea (see MPEP § 2106.04(d)). Other additional elements recited in claim 17 include (c) “acquiring a plurality of pieces of output information . . . ,” and “automatically outputting the selected at least one heading . . . ,” which are synonymous with receiving and transmitting data and are directed to insignificant extra-solution activities (see MPEP § 2106.05(d), subsection II.i). Also, the limitations that the “[a plurality of pieces of output information] generated from predetermined target information by a plurality of models, each of the models generating, from input information, the plurality of pieces of output information, . . . ,” merely provide further detail of the data being “acquired,” and accordingly, do not integrate the abstract idea into a practical application. Also, the recited language regarding “a plurality of models” is a field-of-use limitation that “generally link[s] the use of a judicial exception to a particular technological environment or field of use” (MPEP § 2106.04(d)), and cannot integrate the judicial exception into a practical application. Still also, the limitations reciting “each of the plurality of pieces of output information being a single word and having an order relation relative to the other pieces of output information,” merely provide additional detail of the output information, while the limitations reciting “the at least one heading summarizing the predetermined target information,” simply link the abstract idea to the intended use of summarizing data.
Simply providing additional detail to the abstract idea, or linking the abstract idea to a field of use, cannot integrate the judicial exception into a practical application. Therefore, claim 17 is directed to the abstract idea.
Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Generally linking the abstract idea to a field of use (that is, processing data received from an output of a plurality of models) does not provide an inventive concept (MPEP § 2106.05(h)). Also, executing on generic computer components cannot provide significantly more than the abstract idea itself (MPEP § 2106.05(d)). Moreover, there is no nexus between the field of use and the generic computer components which, when taken in combination, could provide an inventive concept or provide significantly more than the abstract idea. Therefore, claim 17 is subject-matter ineligible.
Claim 18 recites a “non-transitory computer-readable storage medium,” which is an article of manufacture and is one of the four categories of subject matter that Congress deemed to be appropriate subject matter for a patent. Claim 18, however, recites the limitations of “generating a plurality of headings from the plurality of pieces of output information,” and “automatically selecting . . . at least one heading of the plurality of headings . . . .” These limitations pertain to an observation, evaluation, judgment, and/or opinion, and are “mental processes” (see MPEP § 2106.04(a)(2), subsection III), which is one of the groupings of abstract ideas. Claim 18 also recites “automatically determining a similarity between each of headings.” This limitation pertains to a mathematical relationship, and is a “mathematical concept” (see MPEP § 2106.04(a)(2), subsection I), which is another of the groupings of abstract ideas. Accordingly, claim 18 recites an abstract idea.
The abstract idea of claim 18 is not integrated into a practical application because the only other additional elements recited in claim 18 are (a) a computer to execute, (b) an electronic device, and (c) an output device, which are generic computer components (that is, a computer) upon which the abstract idea is executed and which do not represent a practical application of the abstract idea (see MPEP § 2106.04(d)). Other additional elements recited in claim 18 include (d) “acquiring a plurality of pieces of output information . . . ,” and “automatically outputting the selected at least one heading . . . ,” which are synonymous with receiving and transmitting data and are directed to insignificant extra-solution activities (see MPEP § 2106.05(d), subsection II.i). Also, the limitations that the “[a plurality of pieces of output information] generated from predetermined target information by a plurality of models, each of the models generating, from input information, the plurality of pieces of output information, . . . ,” merely provide further detail of the data being “acquired,” and accordingly, do not integrate the abstract idea into a practical application. Also, the recited language regarding “a plurality of models” is a field-of-use limitation that “generally link[s] the use of a judicial exception to a particular technological environment or field of use” (MPEP § 2106.04(d)), and cannot integrate the judicial exception into a practical application. Still also, the limitations reciting “each of the plurality of pieces of output information being a single word and having an order relation relative to the other pieces of output information,” merely provide additional detail of the output information, while the limitations reciting “the at least one heading summarizing the predetermined target information,” simply link the abstract idea to the intended use of summarizing data.
Simply providing additional detail to the abstract idea, or linking the abstract idea to a field of use, cannot integrate the judicial exception into a practical application. Therefore, claim 18 is directed to the abstract idea.
Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Generally linking the abstract idea to a field of use (that is, processing data received from an output of a plurality of models) does not provide an inventive concept (MPEP § 2106.05(h)). Also, executing on generic computer components cannot provide significantly more than the abstract idea itself (MPEP § 2106.05(d)). Moreover, there is no nexus between the field of use and the generic computer components which, when taken in combination, could provide an inventive concept or provide significantly more than the abstract idea. Therefore, claim 18 is subject-matter ineligible.
Claim Rejections - 35 U.S.C. § 103
10.	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
11.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
12.	Claims 1-6, 8, 9, 13-15, 17, and 18 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20190287012 to Celikyilmaz et al. [hereinafter Celikyilmaz] in view of Mir Tafseer Nayeem, "Methods of Sentence Extraction, Abstraction and Ordering for Automatic Text Summarization," Univ. of Lethbridge (Thesis Paper, 2017) [hereinafter Nayeem].
Regarding claims 1, 17, and 18, Celikyilmaz teaches [an] output device (Celikyilmaz ¶ 0048 teaches a computing system 500), a method (Celikyilmaz ¶ 0005), and a non-transitory computer-readable storage medium (Celikyilmaz ¶ 0015) comprising:
a processor (Celikyilmaz ¶ 0008 teaches using one or more hardware processors) programmed to:
acquire a plurality of pieces of output information generated from predetermined target information (Celikyilmaz ¶ 0060 teaches the label of the respective ground-truth token (that is, from predetermined target information) is chosen to determine its associated probability) by a plurality of models (Celikyilmaz, Fig. 5, teaches (Examiner annotations in dashed text-boxes):
[Celikyilmaz, Fig. 5, with Examiner annotations in dashed text boxes; image not reproduced]
(Celikyilmaz ¶ 0051 teaches decoder component 506 may be used during the inference (or test) phase to produce output with an already trained network, but may also, in some instances, be employed during training of the neural network 502)), each of the models generating, from input information, the plurality of pieces of output information, each of the plurality of pieces of output information being a single word (Celikyilmaz ¶ 0040 teaches [a]t each time step t, the decoder 112 predicts a new token y_t in the output sequence 402 (e.g., a new word in the summary) (that is, “the new token y_t” or “a new word” is each of the plurality of pieces of output information being a single word)) and having an order relation (Celikyilmaz ¶ 0049 teaches [f]or a given definition and set of parameters 511 of the neural network 502, the decoder component 506 manages the process of generating an output sequence (that is, an “output sequence” has an order relation, in that the output information includes a plurality of pieces of information having an order relation) from a given input (that is, from input information) using the neural network 502) relative to the other pieces of information (Celikyilmaz ¶ 0006 teaches encoder output may flow into the computation, by the decoder, of an output probability distribution over an extended vocabulary that includes . . . tokens (that is, “tokens” are other pieces of information) copied from the input sequences to the various encoder agents; Celikyilmaz ¶ 0050 teaches [f]rom the vocabulary distributions output by the neural-network decoder 112, the decoder component 506 determines the output sequence (that is, the “vocabulary distributions” are relative to other pieces of information));
* * *
an output unit that outputs, as the association information, the output information selected by the selecting unit (Celikyilmaz ¶ 0051 teaches the decoder component 506 may cause the computed output 514 (that is, output information) (e.g., an answer to a question, a summary of a text file, or an image caption) to be displayed on-screen or stored for later retrieval. Alternatively, the input 512 may be fed into the decoder component 506 from another computational component (within or outside the computing system 500), and/or the output 514 may be sent to a downstream computational component for further processing).
Though Celikyilmaz teaches a cosine similarity between two consecutively generated sentences summarizing target content, Celikyilmaz does not explicitly teach -
* * *
generate a plurality of headings from the plurality of pieces of output information and automatically determine a similarity between each of headings;
automatically select, based on the automatically determined similarity between the headings, the at least one heading summarizing the predetermined target information; and
* * *
But Nayeem teaches -
* * *
generate a plurality of headings from the plurality of pieces of output information (Nayeem at p. 39, “3.2.5 Sentence Selection,” first paragraph, teaches to extract sentences (that is, generate a plurality of headings from the plurality of pieces of output information) that cover as many important concepts as possible, while ensuring the summary length is within a given budgeted constraint . . . . [W]e use keyphrases as concepts. Keyphrases are the words or phrases that represent the main topics of a document. Sentences containing the most relevant keyphrases are important for the summary generation (that is, “keyphrases” are pieces of output information)) and automatically determine a similarity between each of headings (Nayeem at p. 41, “3.3 Sentence Ordering,” second paragraph, teaches [o]ur assumption is that a good sentence order implies the similarity between all adjacent sentences since word repetition (more specifically, named entity repetition) is one of the formal signs of text coherence (Barzilay et al., 2002). We define coherence of document D which consists of sentences from S1 to Sn in the following equation. For calculating Sim(Si, Si+1), we use the similarity function described in equation (3.1):
[Nayeem, p. 41, coherence equation for document D; image not reproduced]
with λ = 0.5, giving the named entities a little more preference (that is, automatically determine a similarity between each of headings));
automatically select, based on the automatically determined similarity between the headings (Nayeem, at p. 43, “3.4 Evaluation,” first paragraph, teaches [w]e evaluate our system ILPRankSumm ([Integer Linear Programming (ILP)] based sentence selection with TextRank for Extractive Summarization) using ROUGE (Lin, 2004) on the [Document Understanding Conference (DUC)] 2004 document set (Task-2, Length limit (L) = 100 words)), the at least one heading summarizing the predetermined target information (Nayeem, at p. 44, “3.4.1 Baseline Systems,” first paragraph, teaches [an] extractive summarizer can jointly maintain information coverage from the document side (that is, the predetermined target information) and non-redundancy from the summary side (that is, the at least one heading summarizing the predetermined target information)); and
* * *
Celikyilmaz and Nayeem are from the same or similar field of endeavor. Celikyilmaz teaches an encoder-decoder network for text summarization. Nayeem teaches a rank-based sentence selection that retains the most important and non-redundant contents to form a summary. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the word vector similarity to a ground truth of Celikyilmaz with the word-vector-similarity-based summary generation of Nayeem.
The motivation for doing so is to improve the informativity as well as the grammaticality of the generated sentences for a document set. (Nayeem, Abstract).
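For illustration only, the adjacent-sentence coherence relied on from Nayeem at p. 41 can be sketched as follows. Because Nayeem’s coherence equation and equation (3.1) are not reproduced in this action, the sketch assumes (i) that coherence is the average of Sim(Si, Si+1) over adjacent sentence pairs and (ii) a word-overlap (Jaccard) similarity as a hypothetical stand-in for Sim, chosen only because Nayeem identifies word repetition as a sign of text coherence. The function names are illustrative.

```python
from typing import Callable, List


def jaccard(s: str, t: str) -> float:
    """Hypothetical stand-in for Sim: word-set overlap of two sentences."""
    a, b = set(s.lower().split()), set(t.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coherence(sentences: List[str], sim: Callable[[str, str], float]) -> float:
    """Average similarity of each sentence to the next one (assumed form)."""
    n = len(sentences)
    if n < 2:
        raise ValueError("coherence needs at least two sentences")
    return sum(sim(sentences[i], sentences[i + 1]) for i in range(n - 1)) / (n - 1)
```

Under these assumptions, an ordering that places word-overlapping sentences next to each other scores higher than one that separates them.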
Regarding claim 2, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 1, as described above in detail.
Celikyilmaz teaches -
wherein the processor is programmed to select, based on a semantic similarity between the headings (Celikyilmaz ¶ 0057 teaches that [the] decoder hidden-state vectors at the end of each sentence . . . can then be used to compute the cosine similarity (that is, similarity) between two consecutively generated sentences (that is, “sentences” pertain to semantic similarity between the headings) [having a] resulting semantic-cohesion loss to be minimized), the output information that is to be output (see Celikyilmaz ¶ 0051, referred to hereinabove).
Regarding claim 3, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 1, as described above in detail. 
Celikyilmaz teaches -
wherein the processor is programmed to select, based on cosine similarity . . . associated with the headings (Celikyilmaz ¶ 0057 teaches that [the] decoder hidden-state vectors at the end of each sentence . . . can then be used to compute the cosine similarity (that is, selects, based on cosine similarity) between two consecutively generated sentences), the output information that is to be output (see Celikyilmaz ¶ 0051, referred to hereinabove).
Though Celikyilmaz teaches “cosine similarity” between two consecutively generated sentences, Celikyilmaz, however, does not explicitly teach “cosine similarity between vectors.”
But Nayeem teaches “cosine similarity between vectors,” (Nayeem, at p. 11, “2.2 Word Embedding,” first paragraph, teaches the concept of word embedding, which is a vector representation of words; Nayeem, at p. 36, “3.2.2 Sentence Similarity,” second paragraph, teaches we calculate the cosine similarity between the sentence vectors obtained from the above equation to find the relative distance between Si and Sj).
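For reference, the standard cosine similarity between two sentence vectors, as relied on from Nayeem at p. 36, is shown below; the notation is illustrative and is not reproduced from Nayeem.

```latex
\cos(\mathbf{s}_i, \mathbf{s}_j) = \frac{\mathbf{s}_i \cdot \mathbf{s}_j}{\lVert \mathbf{s}_i \rVert \, \lVert \mathbf{s}_j \rVert}
```

A value of 1 indicates vectors pointing in the same direction and a value of 0 indicates orthogonal (unrelated) vectors, which is why the measure serves as a relative distance between Si and Sj.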
Regarding claim 4, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 3, as described above in detail. 
Nayeem teaches -
wherein the processor is programmed to generate a vector, for every heading, by integrating the vectors associated with the plurality of pieces of output information (Nayeem, at p. 12, “2.2.2 Word2Vec Embedding,” first paragraph, teaches [t]he language model assigns (that is, “assigns” is by integrating the vectors associated with) higher probabilities to grammatical and meaningful sentences, and lower probabilities to meaningless sentence constructions (that is, “probabilities” are the plurality of pieces of output information)) as the vector that is associated with the heading (see Celikyilmaz ¶ 0051, referred to hereinabove) and 
selects the heading that is to be output based on the cosine similarity between the generated vectors (Celikyilmaz ¶ 0051 teaches the decoder component 506 may cause the computed output 514 (that is, output information) (e.g., an answer to a question, a summary of a text file, or an image caption) to be displayed on-screen or stored for later retrieval. Alternatively, the input 512 may be fed into the decoder component 506 from another computational component (within or outside the computing system 500), and/or the output 514 may be sent to a downstream computational component for further processing).
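For illustration only, the claim 4 limitations (integrating word vectors into one vector per heading, then selecting a heading by cosine similarity between the generated vectors) can be sketched as follows. The sketch assumes, hypothetically, that “integrating” means averaging the word vectors and that the selected heading is the one with the highest mean cosine similarity to the other headings; neither rule is taken from the claims or the references, and all names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Integrate word vectors into one heading vector by averaging (assumed)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_heading(headings: Dict[str, List[Sequence[float]]]) -> str:
    """Pick the heading whose integrated vector is, on average, most similar
    to the other headings' vectors (hypothetical rule; needs >= 2 headings)."""
    vecs = {h: mean_vector(ws) for h, ws in headings.items()}

    def score(h: str) -> float:
        sims = [cosine(vecs[h], vecs[o]) for o in vecs if o != h]
        return sum(sims) / len(sims)

    return max(vecs, key=score)
```

Under this rule, a heading that lies “between” the others in vector space is preferred as the representative heading.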
Regarding claim 5, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 1, as described above in detail.
Though Celikyilmaz teaches a similarity between two consecutively generated sentences, Celikyilmaz, however, does not explicitly teach -
wherein the processor is programmed to select, from among the plurality of pieces of headings, a heading in which the similarity with another piece of output information satisfies a predetermined criteria.
But Nayeem teaches -
wherein the processor is programmed to select, from among the plurality of pieces of headings, a heading in which the similarity with another piece of output information satisfies a predetermined criteria (Nayeem, at p. 36, “3.2.2 Sentence Similarity,” second paragraph, teaches we calculate the cosine similarity between the sentence vectors obtained from the [equation]:
[Nayeem, p. 36, sentence-vector equation; image not reproduced]
to find the relative distance between Si and Sj; Nayeem, at p. 36, “3.2.2 Sentence Similarity,” third paragraph, teaches [a] standalone similarity function can be used in this work with different λ values . . . . The main challenge is finding an optimal λ threshold, as shown in Table 3.1 (Examiner annotations in dashed text box):
[Nayeem, Table 3.1, with Examiner annotations in dashed text box; image not reproduced]
To find the optimal threshold λ for the similarity function Sim(Si, Sj) . . . , we use the SICK dataset of SemEval-2014 (that is, a heading in which the similarity with another piece of output information satisfies a predetermined criteria); see also Nayeem, at p. 55, “4.4 Experimental Setup,” second paragraph, teaches [t]o ensure pure abstractive compression generation, we remove paths that have cosineSimilarity ≥ 0.9 to any of the original sentence in the cluster (that is, “0.9” is a predetermined criteria)).
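For illustration only, the predetermined criterion relied on from Nayeem at p. 55 (removing candidates whose cosine similarity to any original sentence is at least 0.9) can be sketched as a threshold filter. The sketch assumes, hypothetically, that candidates and original sentences are already represented as vectors; the function names and the dictionary of labeled candidates are illustrative.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def filter_by_threshold(candidates: Dict[str, Sequence[float]],
                        originals: List[Sequence[float]],
                        threshold: float = 0.9) -> List[str]:
    """Keep only candidates whose similarity to every original sentence is
    below `threshold`, i.e. drop any with cosine similarity >= threshold."""
    return [label for label, vec in candidates.items()
            if all(cosine(vec, o) < threshold for o in originals)]
```

A candidate nearly parallel to an original sentence (cosine of about 0.995) is removed, while an unrelated candidate (cosine of 0) is kept.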
Regarding claim 6, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 1, as described above in detail.
Celikyilmaz teaches -
wherein the processor is programmed to estimate, based on the similarity between the headings, a probability distribution of the output information (Celikyilmaz Fig. 1, teaches
[Celikyilmaz, Fig. 1; image not reproduced]
(Celikyilmaz ¶ 0028 teaches when the neural network is used to compute the probability of the "ground-truth" output sequence of a known training pair of input and output sequences (that is, predetermined target information), the previous token of the output sequence is taken from the ground-truth output sequence (that is, based on the similarity between the headings). In the inference phase (or test phase), when no ground truth is available, the previous token of the output sequence is the output token computed by the decoder 112 in the previous time step (which, e.g., in the case of greedy decoding, takes the value that is most probable in the probability distribution output by the decoder 112))) that is generated from the predetermined target information (Celikyilmaz ¶ 0063 teaches the training switches over to a mixed training objective (that is, the “mixed training objective” is predetermined target information), e.g., as shown, combining [maximum-likelihood-estimation (MLE)], semantic-cohesion, and RL losses (that is, MLE estimates, based on the similarity between the pieces of output information, a probability distribution of the output information that is likely to be generated from the predetermined target information)), . . . .
Though Celikyilmaz teaches probability distributions on a similarity basis with a ground truth, Celikyilmaz does not explicitly teach -
* * *
. . . wherein the processor is programmed to select, from among the plurality of headings, a heading included in a predetermined area in the probability distribution.
But Nayeem teaches -
* * *
. . . wherein the processor is programmed to select, from among the plurality of headings (Celikyilmaz ¶ 0030 teaches the decoder selects values for the tokens of the output sequence by including tokens [of the respective input sequence are] lifted from the input to the encoder agents 104, 105, 106 (that is, select, from among the pieces of [acquired] output information)), a heading included in a predetermined area in the probability distribution (Nayeem, Fig. 2.1, teaches a visualization of word-to-word similarity in which all non-stop words from both headlines are embedded into a word2vec space (that is, a predetermined area in the probability distribution):

[Image: Nayeem, Fig. 2.1]

Nayeem, at p. 12, “2.2.2 Word2Vec Embedding,” first paragraph, teaches [v]ector space models have been used in distributional semantics since the 1990s for estimating continuous representations of words).
Regarding claim 8, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 6, as described above in detail.
Though Celikyilmaz teaches abstractive summarization using probability estimation, Celikyilmaz does not explicitly teach - 
wherein the processor is programmed to select, in the probability distribution, the headings included in an area in which a density of the headings is higher than in other areas of the probability distribution.
But Nayeem teaches -
wherein the processor is programmed to select, in the probability distribution, the headings included in an area in which a density of the headings is higher than in other areas of the probability distribution (Nayeem, at p. 24, “2.5.1 Encoder-Decoder Framework,” first paragraph, teaches [to capture semantic relationship between words in a source sentence,] it is useful to project the one-hot vector into a low-dimensional semantic space as a dense vector with fixed dimensions (that is, a density of the headings). For instance, [sentence vector] $s_i = C w_i$ for the $i$-th word, with $C \in \mathbb{R}^{K \times |V|}$ as the projection matrix, where $K$ is the dimensionality of the word embedding vector and $|V|$ is the size of the fixed vocabulary (that is, as the variables change, a density of the headings is higher than in other areas of the probability distribution); see also Nayeem, at p. 55, “4.4 Experimental Setup,” second paragraph, which teaches “compression,” in which [t]o ensure pure abstractive compression generation, we remove the paths that have cosineSimilarity ≥ 0.9 to any of the original sentence in the cluster (that is, “compression generation” corresponds to an area in which a density of the headings is higher than in other areas of the probability distribution, where paths with a cosine similarity of less than 0.9 are less compressed)).
Regarding claim 9, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 1, as described above in detail.
Celikyilmaz teaches -
wherein the processor is programmed to acquire the headings generated by a recurrent neural network, and the recurrent neural network generates the heading as different from information which is input to the recurrent neural network (Celikyilmaz ¶ 0013 teaches encoder-decoder neural network includes a plurality of intercommunicating multi-layer encoder agents, each encoder agent taking, as input to one or more of its layers, one or more respective message vectors computed from hidden-state output of the other ones of the plurality of encoder agents; and a decoder comprising a recurrent neural network taking, as input at each time step, a respective current decoder state and a context vector computed from top-layer hidden-state outputs of the plurality of encoder agents (that is, information which is input to the recurrent neural network); Celikyilmaz ¶ 0014 teaches the decoder is configured to generate a sequence of output probability distributions (that is, acquire headings generated by a recurrent neural network) over a vocabulary (that is, “generate a sequence of output probability distributions” corresponds to the recurrent neural network generating the heading as different from information which is input to the recurrent neural network)).
Regarding claim 11, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 9, as described in detail above.
Celikyilmaz teaches -
wherein the processor is programmed to acquire the headings generated by the plurality of models that is generated from an identical model (Celikyilmaz ¶ 0026 teaches in text summarization, multiple agents of identical architecture (that is, from an identical model) may be used to process different sections of the input text (that is, acquires the headings generated by the plurality of models that is generated from an identical model)) and that is allowed to individually learn a feature held by the learning information to different stages ([Examiner notes that the “that is allowed to individually learn” language merely recites an intended use and is not positively recited within the claim]).
Regarding claim 13, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 9, as described above in detail.
Celikyilmaz teaches -
wherein the processor is programmed to acquire the headings generated by a plurality of models each of which has learned the feature of different pieces of learning target information and that has learned a feature of a plurality of pieces of learning target information generated from predetermined learning information (Celikyilmaz ¶ 0052 teaches training data 516 includes pairs of an input sequence (e.g., a sequence of words for a text, or a sequence of pixels for an image) and an output sequence that constitutes the ground-truth output for the input. The type and data format of the input and output sequences depends on the specific application for which the neural network 502 is to be trained. For abstractive summarization, for instance, the input sequences may be longer texts, and the corresponding output sequences may be human-generated text sequences. As another example, for image captioning, the input sequences are images, and the output sequences may be human-generated image captions (that is, by a plurality of models each of which has learned the feature of different pieces of learning target information and that has learned a feature of a plurality of pieces of learning target information generated from predetermined learning information)).
Regarding claim 14, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 1, as described above in detail.
Celikyilmaz teaches -
wherein the processor is programmed to acquire the plurality of headings generated from predetermined target information by a plurality of models each of which includes an encoder that generates, when input information including a plurality of pieces of information is input, feature information indicating the feature held by the input information and a decoder that sequentially generates a plurality of pieces of information included in the headings from a feature information that has been generated by the encoder (Celikyilmaz ¶¶ 0046-47 teaches [i]n contrast to a single-agent encoder-decoder network, the multi-agent pointer network 120 allows each agent to "vote" for a different out-of-vocabulary word at time step t, and only the word that is relevant to the generated summary up to time t is collaboratively selected as a result of the agent attentions . . . . Having described various aspects of a multi-agent encoder-decoder neural-network architecture in accordance herewith, the description now turns, with reference to FIG. 5, to a computing system 500 for implementing, training, and using such a neural-network architecture for sequence-to-sequence mapping tasks (that is, a plurality of models each of which includes an encoder that generates, when input information including a plurality of pieces of information is input, feature information indicating the feature held by the input information and a decoder that sequentially generates a plurality of pieces of information included in the headings from a feature information that has been generated by the encoder)).
Regarding claim 15, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 1, as described in detail above.
Celikyilmaz teaches -
wherein the processor is programmed to acquire the plurality of headings generated, from the target information that is a text, by the plurality of models each of which generates the output information that is a text from the input information that is a text (Celikyilmaz, Claim 17, teaches generating text output from input to the encoder-decoder neural network (that is, each [model] of which generates the output information that is a text from the input information that is a text)).
13.	Claim 7 is rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20190287012 to Celikyilmaz et al. [hereinafter Celikyilmaz] in view of Mir Tafseer Nayeem, "Methods of Sentence Extraction, Abstraction and Ordering for Automatic Text Summarization," Univ. of Lethbridge (Thesis Paper, 2017) [hereinafter Nayeem], and Efron et al., “Temporal Feedback for Tweet Search with Non-Parametric Density Estimation,” ACM (2017) [hereinafter Efron].
Regarding claim 7, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 6, as described in detail above. 
Though Celikyilmaz and Nayeem teach text summarization using probability estimation, the combination of Celikyilmaz and Nayeem does not explicitly teach - 
wherein the processor is programmed to estimate, based on kernel density estimation in which the headings generated by the plurality of models are regarded as samples, the probability distribution of the heading that is generated from the predetermined target information.
But Efron teaches -
wherein the processor is programmed to estimate, based on kernel density estimation in which the headings generated by the plurality of models (Efron, Abstract, teaches in the context of tweet search and temporal feedback: starting with an initial set of results from a baseline retrieval model, we estimate the temporal density of relevant documents, which is then used for result reranking. Our contributions lie in a method to characterize this temporal density function using kernel density estimation . . . and an approach to integrating this information into a standard retrieval model) are regarded as samples (Efron, right column of p. 36, “4.1 Kernel Density Estimation,” first paragraph, teaches [l]et $\{x_1, x_2, \ldots, x_n\}$ be an i.i.d. sample drawn from some distribution with an unknown density $f$ (that is, headings generated by the plurality of models are regarded as samples)), the probability distribution of the heading that is generated from the predetermined target information (Efron, Table 1, teaches: 

[Image: Efron, Table 1]

Efron, Table 1 Caption, teaches [t]uning parameters for the temporal retrieval models. Separate values for each parameter were estimated (that is, probability distribution of the heading) from training topics (that is, “training topics” is generated from the predetermined target information) for runs with no lexical relevance feedback and runs with lexical feedback).
Celikyilmaz, Nayeem, and Efron are from the same or similar field of endeavor. Celikyilmaz teaches abstractive summarization that may employ multiple communicating encoder agents to encode multiple respective input sequences that collectively constitute the overall input. Nayeem teaches a rank based sentence selection that retains the most important and non-redundant contents to form a summary. Efron teaches determining document clustering with respect to time in the context of tweet searching. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Celikyilmaz and Nayeem pertaining to abstractive summarization based on respective sentence similarities with the kernel density estimation of Efron.
The motivation for doing so is that temporal feedback improves over standard lexical (that is, semantic) feedback (Efron, Abstract).
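By way of illustration only, kernel density estimation in which generated headings are treated as samples, followed by selection of a heading from the highest-density region, may be sketched as follows. Efron applies kernel density estimation to document timestamps, not to headings. The mapping of each heading to a numeric vector (for example, a sentence embedding), the isotropic Gaussian kernel, the bandwidth value, and the function names gaussian_kde and select_densest are assumptions made for this sketch.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density estimate built from d-dimensional samples using an isotropic Gaussian kernel."""
    n, d = len(samples), len(samples[0])
    # Normalizing constant: n * (h * sqrt(2*pi))^d, so the estimate integrates to 1.
    norm = n * (bandwidth * math.sqrt(2 * math.pi)) ** d
    def density(x):
        total = 0.0
        for s in samples:
            sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return total / norm
    return density

def select_densest(headings, vectors, bandwidth=1.0):
    """Treat each heading's (hypothetical) vector as a KDE sample and return the heading of highest density."""
    density = gaussian_kde(vectors, bandwidth)
    scores = [density(v) for v in vectors]
    best = max(range(len(headings)), key=scores.__getitem__)
    return headings[best]
```

For example, with three headings whose vectors lie close together near the origin and one outlier far away, the heading at the center of the cluster is returned, because the outlier contributes almost nothing to the estimated density there.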
14.	Claims 10 and 12 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20190287012 to Celikyilmaz et al. [hereinafter Celikyilmaz] in view of Mir Tafseer Nayeem, "Methods of Sentence Extraction, Abstraction and Ordering for Automatic Text Summarization," Univ. of Lethbridge (Thesis Paper, 2017) [hereinafter Nayeem], and Mishra et al., “Leveraging Semantic Annotations for Event-Focused Search & Summarization,” Thesis (2017) [hereinafter Mishra].
Regarding claim 10, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 9, as described in detail above. 
Though Celikyilmaz and Nayeem teach the features of abstractive summarization using probability estimation, the combination of Celikyilmaz and Nayeem does not explicitly teach -
wherein the processor is programmed to acquire the headings generated by a plurality of models that is generated from a plurality of models in each of which a connection coefficient between nodes is randomly different and that is allowed to learn a feature that is individually held by learning information.
But Mishra teaches -
wherein the processor is programmed to acquire the headings generated by a plurality of models that is generated from a plurality of models in each of which a connection coefficient between nodes is randomly different (Mishra at p. 161, “5.5.2 Goals, Measures, and Methodology,” last two paragraphs, teaches the Rand method randomly sets the edge weights (that is, the connection coefficient between nodes is randomly different) in an event graph. This method highlights the quality of the temporal expressions in the seed set. . . . We set the following parameters for our methods) and that is allowed to learn the feature that is individually held by learning information
([Examiner notes that the “that is allowed to learn” language merely recites an intended use and is not positively recited within the claim. Also, Examiner notes that the “connection coefficient” is not defined by the claims or specification; however, the specification recites “the model database 32, a plurality of models that are obtained by randomly changing the initial parameter and that output each of the words included in the output sentence in the appearance order of the words if each of the words included in the input sentence is input in the appearance order of the words is registered.” (Specification at p. 24, lines 3-8). That is, the “connection coefficient” is a parameter that is adjusted during training of the neural network]). 
Celikyilmaz, Nayeem, and Mishra are from the same or similar field of endeavor. Celikyilmaz teaches abstractive summarization that may employ multiple communicating encoder agents to encode multiple respective input sequences that collectively constitute the overall input. Nayeem teaches a rank based sentence selection that retains the most important and non-redundant contents to form a summary. Mishra teaches linking semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Celikyilmaz and Nayeem pertaining to abstractive summarization based on respective sentence similarities with the parameter randomization of Mishra.
The motivation for doing so is to improve searches across disparate information sources by addressing the problems of linking (through an integer linear program for global references across text, time, geolocations, and entities) and of estimating accurate probabilistic time models (Mishra, Abstract).
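For illustration only, the Specification's description of a plurality of models “obtained by randomly changing the initial parameter” may be sketched as a set of models that share one architecture but whose initial connection weights are drawn from differently seeded random number generators. The Glorot-style uniform range, the seeding scheme, and the function names init_weights and build_ensemble are assumptions for this sketch. They are not taken from the Specification or from Mishra, whose Rand method randomly sets edge weights in an event graph rather than in a neural network.

```python
import random

def init_weights(n_in, n_out, seed):
    """Initialize an n_in x n_out matrix of connection coefficients from a seeded RNG."""
    rng = random.Random(seed)
    limit = (6 / (n_in + n_out)) ** 0.5  # Glorot-style uniform range (assumed)
    return [[rng.uniform(-limit, limit) for _ in range(n_out)] for _ in range(n_in)]

def build_ensemble(n_models, n_in, n_out, base_seed=0):
    """Every model has the same shape; only the random initial weights differ."""
    return [init_weights(n_in, n_out, base_seed + k) for k in range(n_models)]
```

Because each model is seeded separately, the ensemble is reproducible, yet no two models start from the same connection coefficients.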
Regarding claim 12, the combination of Celikyilmaz and Nayeem teaches all of the limitations of claim 9, as described above in detail.
Though Celikyilmaz and Nayeem each teach the features of abstractive summarization using probability estimation, the combination of Celikyilmaz and Nayeem does not explicitly teach - 
wherein the processor is programmed to acquire the headings generated by a plurality of models each having a different connection relation between nodes.
But Mishra teaches -
wherein the processor is programmed to acquire the headings generated by a plurality of models each having a different connection relation between nodes (Mishra at p. 161, “5.5.2 Goals, Measures, and Methodology,” last two paragraphs, teaches the Rand method randomly sets (that is, “randomly sets” corresponds to “different”) the edge weights (that is, a plurality of models each having a different connection relation between nodes) in an event graph. This method highlights the quality of the temporal expressions in the seed set. . . . We set the following parameters for our methods . . . .
[Examiner notes that the “connection relation” is synonymous with “connection coefficient,” and that neither term is defined by the claims or the specification; however, the specification recites “the model database 32, a plurality of models that are obtained by randomly changing the initial parameter and that output each of the words included in the output sentence in the appearance order of the words if each of the words included in the input sentence is input in the appearance order of the words is registered.” (Specification at p. 24, lines 3-8; Specification at p. 35, lines 16-18). That is, the “connection coefficient” is a parameter that is adjusted during training of the neural network]).
Celikyilmaz, Nayeem, and Mishra are from the same or similar field of endeavor. Celikyilmaz teaches abstractive summarization that may employ multiple communicating encoder agents to encode multiple respective input sequences that collectively constitute the overall input. Nayeem teaches a rank based sentence selection that retains the most important and non-redundant contents to form a summary. Mishra teaches linking semantically similar information contained across the different sources to enforce a structure thereby providing multiple access paths to relevant information. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Celikyilmaz and Nayeem pertaining to abstractive summarization based on respective sentence similarities with the parameter randomization of Mishra.
The motivation for doing so is to improve searches across disparate information sources by addressing the problems of linking (through an integer linear program for global references across text, time, geolocations, and entities) and of estimating accurate probabilistic time models (Mishra, Abstract).
Response to Arguments
15.	Examiner has fully considered the Applicant’s arguments, and responds below accordingly:
16.	Applicant argues with regard to the rejection under Section 101 that “the present claims are directed to a practical application under step 2A, prong 2. This is because the present application describes a more efficient way of generating headings.” (Response at pp. 8-9).
Examiner respectfully disagrees. The inquiry under Step 2A, Prong Two, is whether the claim as a whole integrates the recited judicial exception into a practical application of that exception. A claim that integrates a judicial exception into a practical application will apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception. (see MPEP § 2106.04(d)).
Applicant points to the Specification (PGPUB ¶¶ 6, 7, and 36) in support of improving efficiency in generating headings, and submits that claim 1 recites an additional limitation of “automatically select, based on the automatically determined similarity between the headings.” This limitation, however, recites an abstract idea directed to a mental process, as set out in the rejections hereinabove. Accordingly, this limitation is not “an additional limitation,” and cannot demonstrate that the claim as a whole integrates the exception into a practical application. 
17.	Applicant argues that Celikyilmaz “does not disclose [the instant claims to] ‘acquire a plurality of pieces of output information generated from predetermined target information by a plurality of models . . . .’” (Response at p. 10).
Examiner respectfully disagrees. Though Applicant has amended the claim to provide further clarity, Celikyilmaz teaches these features, as set out in detail in the rejection above. 
18.	Applicant argues that Celikyilmaz “does not disclose ‘automatically select, based on the automatically determined similarity between the headings, at least one heading of the plurality of headings.’” (Response at p. 11).
Examiner agrees. Examiner cites to the teachings of Nayeem as teaching these features, as set out in detail hereinabove.
Conclusion
19.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
20.	The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
(US Patent 9858257 to Hamaker et al.) teaches identifying similarities of context between a new publication and new deviations of an intentional linguistic deviation. 
(Kyono et al., “Source-side Prediction for Neural Headline Generation,” arXiv 2017) teaches estimating the probability distributions over source and target vocabularies to capture a correspondence between source and target tokens.
21.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/K.L.S./
Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122