DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Applicant’s amendments submitted on 11/15/2022 have been entered.  Claims 1-20 are pending.
Abstract.  The examiner withdraws the objection to the abstract in view of the Applicant’s amendments to the Abstract.
Specification. The examiner withdraws the objections to the specification (specifically paragraphs 0002, 0026, 0032, 0033, 0045, and 0049) in view of the Applicant’s amendments to the specification.
Objection to Claim 9.  The examiner withdraws the objection to claim 9 in view of the Applicant’s amendments.
Rejections to Claims 1-20 under 35 U.S.C. 112(b). The examiner withdraws the rejections of claims 1-20 under 35 U.S.C. 112(b) in view of the Applicant’s amendments to the claims.
Rejections to Claims 1-20 under 35 U.S.C. 103.  In view of the Applicant’s amendments to the claims, the examiner withdraws the rejections of claims 1-5, 7-12, 14-18, and 20 under LIU in view of DEVLIN and provides new grounds for rejection under 35 U.S.C. 103 under LIU in view of VAN DEN OORD as disclosed below that were necessitated by Applicant’s amendments.  Further, the examiner withdraws the rejections of claims 6, 13, and 19 under the combination of LIU-DEVLIN-AHKOUK and provides new grounds for rejection of such claims under LIU-VAN DEN OORD-AHKOUK as disclosed below that were necessitated by Applicant’s amendments.

Response to Arguments
Applicant's arguments filed 11/15/2022 have been fully considered but they are not persuasive. Further, Applicant’s arguments with respect to the DEVLIN reference are moot in view of the new grounds of rejection provided herein.  Applicant argues the following on pages 11-12 in the 11/15/2022 amendments:
Applicants respectfully submit that the cited references do not render the computer system of claim 1 unpatentable. For instance, the cited references do not disclose or suggest wherein the embedding layer is configured to generate a set of training data by generating a  plurality of token embeddings for the plurality of tokens and combining a token embedding in the plurality of token embeddings for a token in the plurality of tokens at a current position with a set of token embeddings in the plurality of token embeddings for a set of tokens in the plurality of tokens at one or more of a previous position and a subsequent position. 

The Office Action cites to Figure 1 of Liu and Figure 2 of Devlin as disclosing wherein one or more of the embedding layer and the classifier layer combine masked tokens at a current position with tokens at one or more of a previous position and a subsequent position. However, the cited sections of Liu and Devlin describes, for a particular input token, summing together a token embedding for the particular input token, a segmentation embedding for the particular input token, and a position embedding for the particular input token. Neither Liu nor Devlin teaches an embedding layer that generates a set of training data by (1) generating a plurality of token embeddings for the plurality of tokens and (2) combining a token embedding in the plurality of token embeddings for a token in the plurality of tokens at a current position with a set of token embeddings in the plurality of token embeddings for a set of tokens in the plurality of tokens at one or more of a previous position and a subsequent position. That is, Liu and Devlin do not disclose the concept of combining, for a particular input token, the token embedding for the particular token with token embeddings for tokens at other positions (e.g., a previous position, a subsequent position, etc.). As such, neither Liu, Devlin, nor their combination discloses the present limitation. None of the other cited references cure this deficiency. Accordingly, the cited references do not disclose the present limitation. (emphasis added).


	The examiner respectfully disagrees with Applicant’s argument that LIU does not teach “an embedding layer that generates a set of training data by (1) generating a plurality of token embeddings for the plurality of tokens”. LIU discloses token embeddings, together with interval segment embeddings and position embeddings (see p. 2, Fig. 1).  LIU further discloses generating and embedding training data (i.e., the CNN/DailyMail and New York Times Annotated Corpus datasets) for training a transformer baseline based on BERT (pp. 3-4, sections 3.2 and 4).
	With respect to Applicant’s argument that LIU does not teach “an embedding layer that generates a set of training data by” “(2) combining a token embedding in the plurality of token embeddings for a token in the plurality of tokens at a current position with a set of token embeddings in the plurality of token embeddings for a set of tokens in the plurality of tokens at one or more of a previous position and a subsequent position”, the examiner refers to the new grounds of rejection under LIU in view of VAN DEN OORD that were necessitated by Applicant’s amendments to at least claims 1, 8, and 15.

	On page 13 of Applicant’s 11/15/2022 amendments, Applicant argues that claims 8-14 and 15-20 are distinguishable over the prior art for the same reasons as claim 1, above.  As noted above, the examiner respectfully disagrees with Applicant’s arguments with respect to the LIU reference and has set forth new grounds of rejection for each of claims 1-20 as set forth herein.
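
The per-position summation that Applicant attributes to LIU and DEVLIN (a token embedding, a segment embedding, and a position embedding summed for one input token) can be sketched as follows. This is a minimal illustration only, not code from LIU or DEVLIN; the table contents and the names TOKEN_TABLE, SEGMENT_TABLE, position_embedding, and input_embedding are hypothetical, and integer vectors are used purely to keep the arithmetic easy to follow.

```python
# Hypothetical, tiny embedding tables (assumed values, not taken from any reference).
TOKEN_TABLE = {"[CLS]": [1, 2], "sent": [3, 4], "[SEP]": [5, 6]}
SEGMENT_TABLE = {"A": [10, 0], "B": [0, 10]}


def position_embedding(pos, dim=2):
    # Stand-in for a learned position embedding: a vector that depends only on pos.
    return [100 * pos] * dim


def input_embedding(token, segment, pos):
    """Sum the three embeddings that all belong to the same position."""
    tok = TOKEN_TABLE[token]
    seg = SEGMENT_TABLE[segment]
    p = position_embedding(pos, len(tok))
    return [t + s + q for t, s, q in zip(tok, seg, p)]


tokens = ["[CLS]", "sent", "[SEP]"]
segments = ["A", "A", "A"]
embedded = [
    input_embedding(tok, seg, i)
    for i, (tok, seg) in enumerate(zip(tokens, segments))
]
```

In this sketch every term of the sum is indexed by the same position; no embedding from a neighboring position enters the result.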

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 7-12, 14-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu Y., Fine-tune BERT for extractive summarization. arXiv preprint arXiv:1903.10318. 2019 Mar 25., hereinafter referenced as LIU, in view of van den Oord et al., US 20180075343 A1, hereinafter referenced as VAN DEN OORD.

Regarding Claim 1, LIU discloses:
A computer system (BERTSUM system, a variant of BERT, for extracting document summaries; Abstract) comprising: 
one or more processors (3 GPUs (GTX 1080 Ti); Section 3.1); and 
a non-transitory computer-readable medium storing instructions that when executed by the one or more processors causes the one or more processors to (system implemented using PyTorch and OpenNMT, open source machine learning frameworks, with code provided at the referenced GitHub link; Abstract and Section 3.1): 
receive a set of input data (BERTSUM system receives input documents as depicted in Figure 1 below; input documents include the CNN/Dailymail and NYT datasets; Section 1), the input data comprising a plurality of tokens at a plurality of positions, (Figure 1, the input documents are tokenized into word tokens (e.g., <sent>), including addition of [CLS] and [SEP] tokens, with position embeddings (e.g., E1, E2, … E12); Section 2.1 – Encoding Multiple Sentences) 

[Figure 1 of LIU: image not reproduced]

process the plurality of tokens in an embedding layer (See Figure 1 above, with token embeddings, interval segment embeddings, and position embeddings input into BERT; Section 2.1), the embedding layer being coupled to a transformer layer (See Fig. 1 above, BERT is a transformer and the embedded tokens are input into BERT; Abstract); 
process the plurality of tokens in the transformer layer (See Figure 1 above, BERT processes the tokens and outputs sentence vectors Ti; Section 2.1 – Extractive Summarization with BERT), the transformer layer being coupled to a classifier layer (See Figure 1 above, Summarization Layers receive sentence vectors from BERT and output extracted summaries using a simple classifier, an inter-sentence transformer, and a recurrent neural network; Section 2.2 – Fine-tuning with Summarization Layers); and 
process the plurality of tokens in the classifier layer, (See Figure 1 above, Summarization Layers output extracted summaries and use multiple classifiers; Section 2.2 – Fine-tuning with Summarization Layers) the classifier layer being coupled to a loss layer (loss of the model is calculated using the binary classification entropy on the output of the Summarization Layers; Section 2.2 – Fine-tuning with Summarization Layers), 
wherein the embedding layer is configured to generate a set of training data by generating a plurality of token embeddings for the plurality of tokens (CNN/DailyMail and New York Times Annotated Corpus used for training; p. 3, section 3.2; embeddings for training BERTSUM described at pp. 1-2, section 2.1 and Fig. 1).

However, LIU fails to explicitly teach:
and combining a token embedding in the plurality of token embeddings for a token in the plurality of tokens at a current position with a set of token embeddings in the plurality of token embeddings for a set of tokens in the plurality of tokens at one or more of a previous position and a subsequent position.

However, in a related field of endeavor, VAN DEN OORD pertains to processing and generating sequences using neural networks.  (para. 0002).  Fig. 10 is a flow diagram of a process for generating a target sequence from a source sequence, which maps words in the source sequence to corresponding source embedding vectors, and the embedding model is a bag of n-grams that associates adjacent character tokens by adding the respective n-gram embedding vectors. (paras. 0038, 0169).

The LIU-VAN DEN OORD combination makes obvious:
 and combining a token embedding in the plurality of token embeddings for a token in the plurality of tokens at a current position with a set of token embeddings in the plurality of token embeddings for a set of tokens in the plurality of tokens at one or more of a previous position and a subsequent position. (VAN DEN OORD discloses generating source embedding vectors for n-gram tokens by adding the respective embedding vectors or concatenating the respective embedding vectors, e.g., combining token embeddings for a current token with one or more tokens at a previous and subsequent position, where a 3-gram of tokens includes a center (i.e., current) token, a previous token, and a subsequent token; VAN DEN OORD, paras. 0038, 0169; LIU discloses grouping tokens into trigrams for trigram blocking; LIU, p. 3, section 3.1; the LIU-VAN DEN OORD combination now includes a 3-gram embedding in the embedding layer along with the existing token embeddings, interval segment embeddings, and position embeddings; LIU, p. 2, fig. 1 and p. 3, section 3.1 with VAN DEN OORD, paras. 0038, 0169).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the n-gram token adding/concatenation teachings of VAN DEN OORD with LIU.  As disclosed in VAN DEN OORD, one of ordinary skill would be motivated to utilize n-gram embeddings because they enable an open vocabulary that is able to more easily predict rare words, proper names, numerical digits, and so on.  (para. 0044).
The examiner further notes that LIU discloses utilizing 3-grams as part of trigram blocking.  (LIU, p. 3, section 3.1).
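
The combining operation discussed above (a current token embedding combined with the embeddings at the previous and subsequent positions, by adding or by concatenating) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of VAN DEN OORD, LIU, or the present application; the function name combine_with_neighbors, the mode parameter, and the example vectors are hypothetical, and positions at the sequence boundary simply use whichever neighbors exist (no padding).

```python
def combine_with_neighbors(embeddings, mode="sum"):
    """Combine each position's embedding with its previous and subsequent neighbors.

    embeddings: list of equal-length vectors, one per token position.
    mode: "sum" adds the vectors element-wise; "concat" joins them end to end.
    """
    combined = []
    for i in range(len(embeddings)):
        # Window of previous, current, and subsequent positions that exist.
        window = [
            embeddings[j]
            for j in (i - 1, i, i + 1)
            if 0 <= j < len(embeddings)
        ]
        if mode == "sum":
            combined.append([sum(vals) for vals in zip(*window)])
        elif mode == "concat":
            combined.append([v for vec in window for v in vec])
        else:
            raise ValueError("mode must be 'sum' or 'concat'")
    return combined


example = [[1, 2], [10, 20], [100, 200]]
summed = combine_with_neighbors(example, mode="sum")
joined = combine_with_neighbors(example, mode="concat")
```

Unlike a per-position sum of token, segment, and position embeddings, each output vector here depends on embeddings from up to three different positions.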

Regarding Claim 2, LIU in view of VAN DEN OORD discloses the computer system of claim 1.  The LIU-VAN DEN OORD combination makes obvious:

wherein the embedding layer combines the token embedding for the token at the current position with a first token embedding in the set of token embeddings for a first token in the set of tokens at a previous position and a second token embedding in the set of token embeddings for a second token in the set of tokens at a subsequent position. (VAN DEN OORD discloses generating source embedding vectors for n-gram tokens by adding the respective embedding vectors or concatenating the respective embedding vectors, e.g., combining token embeddings for a current token with one or more tokens at a previous and subsequent position, where a 3-gram of tokens includes a center (i.e., current) token, a previous token (i.e., a first token with associated token embeddings), and a subsequent token (i.e., a second token with associated token embeddings); VAN DEN OORD, paras. 0038, 0169; LIU discloses grouping tokens into trigrams for trigram blocking; LIU, p. 3, section 3.1; the LIU-VAN DEN OORD combination now includes a 3-gram embedding in the embedding layer along with the existing token embeddings, interval segment embeddings, and position embeddings; LIU, p. 2, fig. 1 and p. 3, section 3.1 with VAN DEN OORD, paras. 0038, 0169).

Regarding Claim 3, LIU in view of VAN DEN OORD discloses the computer system of claim 1.  LIU further discloses:
wherein the set of training data is used to train the transformer layer. (CNN/DailyMail and New York Times Annotated Corpus used for training; p. 3, section 3.2; BERT model, i.e., the transformer layer, is trained for the summarization task using these data sets; p. 4, section 4)

Regarding Claim 4, LIU in view of VAN DEN OORD discloses the computer system of claim 1.  The LIU-VAN DEN OORD combination further makes obvious:
	wherein the combining by the embedding layer comprises summing the token embedding for the token at the current position and the set of token embeddings for the set of tokens at the one or more of the previous position and the subsequent position. (VAN DEN OORD discloses generating source embedding vectors for n-gram tokens by summing the respective embedding vectors, where a 3-gram of tokens includes a center (i.e., current) token, a previous token, and a subsequent token; VAN DEN OORD, paras. 0038, 0169; LIU discloses grouping tokens into trigrams for trigram blocking; LIU, p. 3, section 3.1; the LIU-VAN DEN OORD combination now includes a 3-gram embedding in the embedding layer along with the existing token embeddings, interval segment embeddings, and position embeddings; LIU, p. 2, fig. 1 and p. 3, section 3.1 with VAN DEN OORD, paras. 0038, 0169).

Regarding Claim 5, LIU in view of VAN DEN OORD discloses the computer system of claim 1.  LIU further discloses:
wherein the classifier layer is configured to concatenate a first subset of the plurality of token embeddings and a second subset of the plurality of token embeddings. (the BERTSUM output is an “extractive summary,” which generates a summary by “copying and concatenating the most important spans (usually sentences [represented by tokens in a series of positions]) in a document”; p. 1, section 1.  The output from the Summarization Layers (i.e., classifier layers) identifies predicted score vectors Yi, which identifies the sentences (represented by tokens in a series of positions and their respective embeddings) to be copied and concatenated into the extractive summary, along with their respective token embeddings; Section 2.2)

Regarding Claim 7, LIU in view of VAN DEN OORD discloses the computer system of claim 1.  LIU further discloses:
wherein the classifier layer comprises gather modules to collect in parallel the plurality of tokens. (see Figure 1 above, the inputs T1, T2, and T3 to the Summarization Layers are depicted in parallel; the Summarization Layers, i.e., classifier layer, also include 3 separate layers providing separate outputs (simple classifier, inter-sentence transformer, recurrent neural network), which may collect and process masked tokens (input as sentence vectors T1, T2, and T3 with tokens in a series of positions) in parallel using the 3 separate GPUs referenced above with respect to Claim 1; Sections 2.2 and 3.1)

Claim 8 is directed to a method that corresponds to the computer system of claim 1, and is therefore rejected under the same grounds as claim 1 above.
Claim 9 depends from claim 8 and is directed to a method that corresponds to the computer system of claim 2, and is therefore rejected under the same grounds as claims 2 and 8 above.
Claim 10 depends from claim 8 and is directed to a method that corresponds to the computer system of claim 3, and is therefore rejected under the same grounds as claims 3 and 8 above.
Claim 11 depends from claim 8 and is directed to a method that corresponds to the computer system of claim 4, and is therefore rejected under the same grounds as claims 4 and 8 above.
Claim 12 depends from claim 8 and is directed to a method that corresponds to the computer system of claim 5, and is therefore rejected under the same grounds as claims 5 and 8 above.
Claim 14 depends from claim 8 and is directed to a method that corresponds to the computer system of claim 7, and is therefore rejected under the same grounds as claims 7 and 8 above.
Claim 15 is directed to a non-transitory machine-readable medium that corresponds to the computer system of claim 1, and is therefore rejected under the same grounds as claim 1 above.
Claim 16 depends from claim 15 and is directed to a non-transitory machine-readable medium that corresponds to the computer system of claim 2, and is therefore rejected under the same grounds as claims 2 and 15 above.
Claim 17 depends from claim 15 and is directed to a non-transitory machine-readable medium that corresponds to the computer system of claim 4, and is therefore rejected under the same grounds as claims 4 and 15 above.
Claim 18 depends from claim 15 and is directed to a non-transitory machine-readable medium that corresponds to the computer system of claim 5, and is therefore rejected under the same grounds as claims 5 and 15 above.
Claim 20 depends from claim 15 and is directed to a non-transitory machine-readable medium that corresponds to the computer system of claim 7, and is therefore rejected under the same grounds as claims 7 and 15 above.

Claims 6, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over LIU in view of VAN DEN OORD and further in view of Ahkouk K, Machkour M, Ennaji M, Erraha B, Antari J. Comparative study of existing approaches on the Task of Natural Language to Database Language. In 2019 International Conference of Computer Science and Renewable Energies (ICCSRE) 2019 Jul 22 (pp. 1-6). IEEE, hereinafter referenced as AHKOUK.

Regarding Claim 6, LIU in view of VAN DEN OORD discloses the computer system of claim 1.  However, the LIU-VAN DEN OORD combination fails to explicitly teach:
wherein the embedding layer comprises embedding tables to process in parallel the plurality of tokens.

	However, in a related field of endeavor, AHKOUK pertains to translating natural language questions to database languages using the BERT language model.  (p. 1, section I and p. 5, section II).  The LIU-VAN DEN OORD-AHKOUK combination makes obvious:
	wherein the embedding layer comprises embedding tables to process in parallel the plurality of tokens. (AHKOUK discloses using BERT to determine the correct meaning for a word using the rest of the sentence as context (e.g., “lead” may mean “to guide” or a particular “metal” depending on the context) and then the applicable string tokens may be converted to numerical vectors to input into different models, where a person of ordinary skill would understand that a look-up table, i.e., an embedding table, is one way to convert from string (characters) to numbers (e.g., an ASCII or UNICODE character table); AHKOUK, page 5, left column; AHKOUK further explains that the BERT transform provides a parallelism benefit and that the “structure allows to train the different parts of the input in parallel”; AHKOUK, page 5, left column; the LIU-VAN DEN OORD-AHKOUK combination now utilizes the string-to-vector teachings of AHKOUK with respect to the input embeddings of LIU to convert character strings to ASCII or UNICODE for processing; LIU, p. 2, fig. 1 and p. 3, section 3.1 with VAN DEN OORD, paras. 0038, 0169; and AHKOUK, page 5, left column).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of AHKOUK to LIU and VAN DEN OORD.  AHKOUK builds on the same BERT language model specifically identified in LIU.  As disclosed in AHKOUK, one of ordinary skill would be motivated to train different parts of the input in parallel, to “reduce[] the time of training enormously.”  (AHKOUK at page 5, left column). 
The examiner notes that parallelism could take advantage of the multiple GPUs used in LIU (see Section 3.1) and would reduce the 50,000 steps training time set forth in LIU (see Section 3.1).
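
The notion of an embedding table used as a look-up table, with the tokens processed in parallel, can be sketched as follows. This is a minimal illustration only and is not taken from AHKOUK, LIU, or the present application; the names VOCAB, EMBEDDING_TABLE, lookup, and embed_parallel and all table values are hypothetical. A thread pool is used here merely to show that each look-up is independent of the others; it is an assumption of the sketch, not a statement of how any cited system achieves parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical vocabulary (string token -> row index) and embedding table
# (row index -> numeric vector).
VOCAB = {"lead": 0, "the": 1, "team": 2}
EMBEDDING_TABLE = [[1, 0], [0, 1], [2, 2]]


def lookup(token):
    """Convert one string token to its numeric vector via the table."""
    return EMBEDDING_TABLE[VOCAB[token]]


def embed_parallel(tokens, workers=2):
    """Look up all tokens concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lookup, tokens))


vectors = embed_parallel(["lead", "the", "team"])
```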

Claim 13 depends from claim 8 and is directed to a method that corresponds to the computer system of claim 6, and is therefore rejected under the same grounds as claims 6 and 8 above.
Claim 19 depends from claim 15 and is directed to a non-transitory machine-readable medium that corresponds to the computer system of claim 6, and is therefore rejected under the same grounds as claims 6 and 15 above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 11256707 B1 (Xiong et al.) discloses at Col. 8, lines 16-18, that token embeddings for n-grams are averaged across an entire utterance and concatenated.
Fisher, Joseph, et al. "Merge and label: A novel neural network architecture for nested NER." arXiv preprint arXiv:1907.00464 (2019), pp. 1-11.  At section 2.4, token embedding merging is explained at the structure layer.
Basta, Christine, et al. "Evaluating the underlying gender bias in contextualized word embeddings." arXiv preprint arXiv:1904.08783 (2019), pp. 1-7.  Section 2.1 discloses that the “outcome of word2vec is an embedding table, where a numeric vector is associated to each of the words included in the vocabulary.”

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
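
The timing rule stated above can be sketched as simple date arithmetic. This is an illustrative sketch only: the function names add_months and reply_period_expiry and every date used with them are hypothetical, and the sketch assumes plain calendar-month arithmetic without accounting for weekends, federal holidays, or extensions of time under 37 CFR 1.136(a). It should not be relied upon to compute an actual due date.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` later, clamped to the last day of that month."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_period_expiry(mailing, first_reply=None, advisory_mailed=None):
    """Expiry of the shortened statutory period under the rule quoted above."""
    two_months = add_months(mailing, 2)
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)

    expiry = three_months
    # If a first reply was filed within two months and the advisory action is
    # mailed after the three-month period ends, the period runs to that mailing.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= two_months
        and advisory_mailed > three_months
    ):
        expiry = advisory_mailed
    # In no event later than six months from the mailing date of the action.
    return min(expiry, six_months)


# Hypothetical dates for illustration only.
mailing = date(2023, 1, 10)
default_expiry = reply_period_expiry(mailing)
extended_expiry = reply_period_expiry(mailing, date(2023, 2, 20), date(2023, 4, 25))
capped_expiry = reply_period_expiry(mailing, date(2023, 2, 20), date(2023, 8, 1))
```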
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL C. LEE/
Examiner, Art Unit 2655

/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655