DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Regarding India Provisional Patent Application No. 202041038527, filed on September 7, 2020, receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement submitted on 02/10/2021 has been considered by the examiner.
Drawings
The drawings are objected to because in Figure 3, element 304 is labeled “resource detection component” but is referred to as “resource management component” in the instant specification at paras. 0054-0056 and it is unclear how a resource management component, such as a virtualization application, can be utilized for “resource detection”.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 2, and 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Pruksachatkun, Yada, et al. "Intermediate-task transfer learning with pretrained models for natural language understanding: When and why does it work?." arXiv preprint arXiv:2005.00628 (May 9, 2020), hereinafter referenced as PRUKSACHATKUN, in view of Liu, Yinhan, et al. "Roberta: A robustly optimized bert pretraining approach." arXiv preprint arXiv:1907.11692 (2019), hereinafter referenced as LIU, and further in view of Lee, Jinhyuk, et al. "BioBERT: a pre-trained biomedical language representation model for biomedical text mining." Bioinformatics, (published Sept. 10, 2019), pp. 1234-1240, hereinafter referenced as LEE. 

Pursuant to MPEP 2131.01 I and II, the examiner also cites to the following article to provide further definitions and enablement for the BERT model, cited in PRUKSACHATKUN:
Devlin, Jacob et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019), cited in Applicant’s 02/10/2021 IDS, hereinafter referenced as DEVLIN. (for BERT)

Regarding claim 1, PRUKSACHATKUN discloses
A method, comprising: (method of improving pretrained models on an intermediate task before fine-tuning again on a target task of interest; p. 1, section 1)
receiving, by a device, (NVIDIA P40 GPUs; p. 5, section 3) training data that includes one or more datasets associated with natural language processing; (“genre/source” data for training intermediate tasks and target tasks identified in table 1 on p. 3; training data is received by NVIDIA GPUs for experiments; p. 5, section 3)
training, by the device, a masked event cause-effect bidirectional encoder representations from transformers (C-BERT) model, (RoBERTa model, which stands for robustly optimized BERT pretraining approach; p. 2, section 2.1 and p. 11; RoBERTa is trained by utilizing masked language modeling tasks; p. 7, section 4.2; pursuant to MPEP 2131.01 II, examiner cites to LIU to define the meaning of “RoBERTa”, where LIU is the article cited in PRUKSACHATKUN for the RoBERTa model, and LIU at p. 1, section 1, defines RoBERTa as a “replication study of BERT pretraining” where RoBERTa proposes “an improved recipe for training BERT models”) to generate pretrained weights and a trained masked event C-BERT model; (RoBERTa is fine-tuned, e.g., trained, on the Cosmos QA intermediate task that includes questions concerning the causes or effects of events that require reasoning based on context and commonsense reasoning and the fine-tuned RoBERTa model, e.g., trained masked event C-BERT model, is a neural network model having pretrained weights; pp. 2-3, sections 2.1 and 2.2.1; pursuant to MPEP 2131.01 II, examiner cites to DEVLIN, which is cited in both PRUKSACHATKUN and LIU to define the BERT model, where DEVLIN defines BERT as a multi-layer bidirectional transformer with a number of layers, where the nodes in the layers have respective trained weights, e.g., pretrained weights; DEVLIN, p. 4173, section 3 and p. 4176, section 4.1)
training, by the device, an event aware C-BERT model, (transfer learning from intermediate task, e.g., Cosmos QA fine-tuned RoBERTa model, to target task or probing task, e.g., COPA (cause and effect of premises) or Cosmos QA (cause or effects of events that requires reasoning) or Commonsense QA or ReCoRD (entity recognition); Abstract and p. 1, Fig. 1 and pp. 3-4, section 2.2.2) with the training data and the pretrained weights, to generate a trained event aware C-BERT model; (transfer learning to final target and probing tasks utilizes data sources in Table 1, e.g., training data, to transfer intermediate task model, such as Cosmos QA fine-tuned RoBERTa model, e.g., event aware C-BERT model with pretrained weights, to generate a final task or probing task model, e.g., RoBERTa fine-tuned for COPA or Commonsense QA or Cosmos QA or ReCoRD; pp. 2-3, sections 2.1, 2.2 and Table 1 and pp. 3-4, sections 2.2.2 and 2.2.3)
receiving, by the device, natural language text data identifying one or more natural language events; (target tasks and probing tasks have datasets with text inputs, including entities and events, e.g., natural language text data identifying natural language events; pp. 4-5, sections 2.2.2 and 2.2.3; experiments use GPUs, which run a pipeline including the datasets from 10 target tasks and 25 probing tasks; pp. 5-6, section 3 and Fig. 2)
processing, by the device, the natural language text data, with the trained masked event C-BERT model, to determine one or more weights; (“run the same pipeline three times for the 11 intermediate tasks”, e.g., run the pipeline on the fine-tuned Cosmos QA RoBERTa model, which is the intermediate model, where the output from the intermediate task is the one or more weights, or values; pp. 5-6, section 3 and Fig. 2)
processing, by the device, the natural language text data and the one or more weights, with the trained event aware C-BERT model, (experiment pipeline then runs the experiment using the datasets and the intermediate tasks, e.g., natural language text data and the one or more weights, on the final task and probe models such as the fine-tuned COPA and Cosmos QA models, e.g., trained event aware C-BERT model; pp. 5-6, section 3 and Fig. 2) to predict one or more causality relationships between the one or more natural language events; and (COPA final task model is a “classification task that consists of premises and a question that asks for the cause or effect of each premise, in which models must correctly pick between two possible choices” and Cosmos QA “concern[s] the causes or effects of events that require reasoning not only based on the exact text spans in the context, but also wide-range abstractive commonsense reasoning”; pp. 3-5, sections 2.2.1 and 2.2.2; the output of the COPA and Cosmos QA final task models is a prediction based on cause-effect from the input dataset, e.g., natural language events)

However, PRUKSACHATKUN fails to explicitly teach:
masking, by the device, the training data to generate masked training data; 
with the masked training data,
performing, by the device, one or more actions, based on the one or more causality relationships between the one or more natural language events.

However, in a related field of endeavor, LIU is an article that introduces and describes the RoBERTa model referenced in PRUKSACHATKUN.  The PRUKSACHATKUN-LIU combination makes obvious:
masking, by the device, the training data to generate masked training data; (LIU discloses that RoBERTa uses masked language modeling using 5 different text corpora, e.g., training data, and further uses both static and dynamic masking on the training data to generate masking patterns for training; LIU, p. 3, section 3.2 and p. 4, sections 4 and 4.1; the PRUKSACHATKUN-LIU combination now applies the training techniques of LIU, including dynamic and static masking to generate training masking patterns, to the training data in PRUKSACHATKUN; PRUKSACHATKUN, pp. 3-5 and table 1; with LIU, p. 3, section 3.2 and p. 4, sections 4 and 4.1)
training, by the device, a masked event cause-effect bidirectional encoder representations from transformers (C-BERT) model, with the masked training data, to generate pretrained weights and a trained masked event C-BERT model; (LIU discloses re-implementing BERT with larger datasets and using both static and dynamic masking on the training data to generate masking patterns for training; LIU, p. 2, section 2.3 and p. 3, sections 3.1 and 3.2 and p. 4, sections 4 and 4.1; the PRUKSACHATKUN-LIU combination now pre-trains the RoBERTa model of PRUKSACHATKUN (as opposed to fine-tuning), using the Cosmos QA task dataset, which was masked pursuant to the BERT and RoBERTa masked language model training as disclosed in LIU, e.g., using the masked training data, (under MPEP 2131, see also DEVLIN at p. 4174, section 3.1, explaining that pre-training uses the [MASK] token and fine-tuning does not), to generate a newly trained RoBERTa model, e.g., trained masked event C-BERT model; PRUKSACHATKUN, pp. 2-3, sections 2.1 and 2.2.1 with LIU, p. 2, section 2.3 and p. 3, sections 3.1 and 3.2 and p. 4, sections 4 and 4.1)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of LIU to PRUKSACHATKUN.  Indeed, one of ordinary skill would be motivated to do so because PRUKSACHATKUN is built upon the RoBERTa model introduced and described in LIU.  Moreover, one of ordinary skill would further be motivated to apply the teachings of LIU, as LIU explains that RoBERTa is an improvement on BERT by re-training the BERT model using a larger data set, training on longer sequences, and performing dynamic masking, and one of ordinary skill would therefore understand that there would be performance benefits for a cause-effect version of BERT by retraining on new datasets rather than merely fine-tuning, as explained in LIU.  (LIU, p. 1, section 1)
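
For illustration only, the distinction between the static and dynamic masking discussed in LIU can be sketched as follows. This is a simplified sketch and not the implementation of any cited reference: the function names, the example sentence, and the fixed seed are hypothetical, and each selected token is simply replaced with [MASK], whereas the procedure described in DEVLIN also leaves some selected tokens unchanged or swaps them for random tokens.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, rate, rng):
    """Replace roughly `rate` of the tokens with MASK (always at least one)."""
    n = max(1, round(len(tokens) * rate))
    positions = set(rng.sample(range(len(tokens)), n))
    return [MASK if i in positions else t for i, t in enumerate(tokens)]

def static_masking(tokens, epochs, rate=0.15, seed=0):
    """Mask once during preprocessing; every epoch reuses the same pattern."""
    masked = mask_tokens(tokens, rate, random.Random(seed))
    return [masked for _ in range(epochs)]

def dynamic_masking(tokens, epochs, rate=0.15, seed=0):
    """Draw a new masking pattern each time the sequence is served."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, rate, rng) for _ in range(epochs)]

# Hypothetical example sentence, not taken from any cited dataset.
tokens = "the storm caused the flight to be delayed".split()
static_epochs = static_masking(tokens, epochs=3)
dynamic_epochs = dynamic_masking(tokens, epochs=3)
```

In this sketch the static variant yields an identical masked sequence for every epoch, while the dynamic variant may mask a different position each time the sequence is seen.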

However, the PRUKSACHATKUN-LIU combination fails to explicitly teach:
performing, by the device, one or more actions, based on the one or more causality relationships between the one or more natural language events.

However, in a related field of endeavor, LEE discloses the BioBERT model, which was initialized with the BERT weights, and then pre-trained using biomedical domain corpora.   (p. 1235, section 2).  BioBERT is capable of performing downstream tasks, including named entity recognition, relation extraction, and question answering.  (p. 1236, section 3.3).

The PRUKSACHATKUN-LIU-LEE combination makes obvious:
performing, by the device, one or more actions, based on the one or more causality relationships between the one or more natural language events. (LEE discloses that BioBERT can be applied to various downstream text mining tasks, including question answering; LEE, p. 1236, section 3.3; the PRUKSACHATKUN-LIU-LEE combination now uses the newly-trained RoBERTa model using the COPA and Cosmos QA final task models, to perform downstream text mining, such as performing question and answering tasks, e.g., one or more actions, using the cause-effect relations determined by the COPA and Cosmos QA final task models; PRUKSACHATKUN, pp. 2-5, sections 2.1 and 2.2.1 and 2.2.2 with LEE, p. 1236, section 3.3)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of LEE to PRUKSACHATKUN and LIU.  As disclosed in LEE, one of ordinary skill would be motivated to utilize the improvements that LEE made to the BERT model, including the performance benefits of re-training (as opposed to fine-tuning) the BERT model, to the RoBERTa model of PRUKSACHATKUN and LIU, which as described above, is itself an improvement on BERT.  (LEE, p. 1235, sections 1 and 2).  As disclosed in LEE, one of ordinary skill would further be motivated to apply the teachings of LEE in order to achieve state-of-the-art performance on downstream tasks, including biomedical text mining tasks, while requiring only minimal architectural modifications. (LEE, p. 1235, section 2).  

	Regarding claim 2, the PRUKSACHATKUN-LIU-LEE combination discloses the method of claim 1.  The PRUKSACHATKUN-LIU-LEE combination further discloses:
wherein the one or more datasets include one or more of: 
a Semeval 2007 dataset, a Semeval 2010 dataset, an adverse drug effect dataset, or a drug-drug interaction dataset. (PRUKSACHATKUN discloses using the Semeval 2010 task 8 dataset; PRUKSACHATKUN, p. 5, section 2.2.3; LIU discloses re-implementing BERT with larger datasets and using both static and dynamic masking on the training data to generate masking patterns for training; LIU, p. 2, section 2.3 and p. 3, sections 3.1 and 3.2 and p. 4, sections 4 and 4.1; LEE discloses re-training BERT on biomedical corpora to generate BioBERT, as opposed to fine-tuning; LEE, pp. 1235-1236, section 3.2 and Fig. 1, and p. 1237, section 4.2 (discussing that it took 23 days to pre-train BERT using biomedical corpora); the PRUKSACHATKUN-LIU-LEE combination now pre-trains the RoBERTa model of PRUKSACHATKUN (as opposed to fine-tuning), using the Cosmos QA task dataset and Semeval 2010 task 8 dataset, which was masked pursuant to the BERT and RoBERTa masked language model training as disclosed in LIU; PRUKSACHATKUN, p. 5, section 2.2.3 with LIU, p. 2, section 2.3 and p. 3, sections 3.1 and 3.2 and p. 4, sections 4 and 4.1 and LEE, pp. 1235-36, section 3.2 and Fig. 1 and p. 1237, section 4.2)

	Regarding claim 8, the PRUKSACHATKUN-LIU-LEE combination discloses:
A device, comprising: (LIU discloses training and implementing RoBERTa using DGX-1 machines; LIU, p. 3, section 3.1; pursuant to MPEP 2131 I, the examiner cites to NVIDIA, the “NVIDIA DGX-1: The Essential Instrument for AI Research” (July 2019) datasheet, which explains that the DGX-1 is an AI server that includes NVIDIA GPUs, system memory, storage, networking capabilities, etc.; PRUKSACHATKUN discloses training and implementing software using NVIDIA P40 GPUs; PRUKSACHATKUN, p. 5, section 3; the PRUKSACHATKUN-LIU-LEE combination is now implemented on a server-based system using NVIDIA DGX-1 machines with NVIDIA GPUs)
one or more memories; and (LIU discloses training and implementing RoBERTa using DGX-1 machines; LIU, p. 3, section 3.1; pursuant to MPEP 2131 I, the examiner cites to NVIDIA, the “NVIDIA DGX-1: The Essential Instrument for AI Research” (July 2019) datasheet, which explains that the DGX-1 is an AI server that includes GPU memory and system memory)
one or more processors, communicatively coupled to the one or more memories, configured to: (LIU discloses training and implementing RoBERTa using DGX-1 machines; LIU, p. 3, section 3.1; pursuant to MPEP 2131 I, the examiner cites to NVIDIA, the “NVIDIA DGX-1: The Essential Instrument for AI Research” (July 2019) datasheet, which explains that the DGX-1 is an AI server that includes NVIDIA GPUs, system memory, storage, networking capabilities, etc.; PRUKSACHATKUN discloses training and implementing software using NVIDIA P40 GPUs; PRUKSACHATKUN, p. 5, section 3; the PRUKSACHATKUN-LIU-LEE combination is now implemented on a server-based system using NVIDIA DGX-1 machines with NVIDIA GPUs)
receive training data that includes one or more datasets associated with natural language processing; replace event descriptions, provided in the training data, with blank tokens to generate masked training data; (LIU discloses that RoBERTa uses masked language modeling using 5 different text corpora, e.g., training data, and further uses both static and dynamic masking on the training data to generate masking patterns, e.g., replacing tokens in the training data with static and dynamic masking patterns, utilizing [MASK] tokens, or blank tokens, for training; LIU, p. 3, section 3.2 and p. 4, sections 4 and 4.1; PRUKSACHATKUN discloses “genre/source” data for training intermediate tasks and target tasks identified in table 1 on p. 3; PRUKSACHATKUN discloses that RoBERTa is fine-tuned, e.g., trained, on the Cosmos QA intermediate task that includes questions concerning the causes or effects of events that require reasoning based on context and commonsense reasoning; PRUKSACHATKUN, pp. 2-3, sections 2.1 and 2.2.1; PRUKSACHATKUN discloses that training data is received by NVIDIA GPUs for experiments; PRUKSACHATKUN, p. 5, section 3; pursuant to MPEP 2131.01 II, examiner cites to DEVLIN, which is cited in both PRUKSACHATKUN and LIU (including for static masking) to define the BERT model, where DEVLIN explains that masked language modeling uses a blank [MASK] token; DEVLIN, p. 4174, section 3.1 and p. 4182, section A.1; the PRUKSACHATKUN-LIU-LEE combination now applies the training techniques of LIU, including dynamic and static masking to generate training masking patterns, to the training data in PRUKSACHATKUN, including training data using the Cosmos QA intermediate task that includes questions concerning the causes or effects of events, e.g., event descriptions; PRUKSACHATKUN, pp. 2-3, sections 2.1 and 2.2.1, pp. 3-5 and table 1; with LIU, p. 3, section 3.2 and p. 4, sections 4 and 4.1)
The remaining limitations in claim 8 correspond to the limitations of the method of claim 1, and therefore claim 8 is rejected under the same grounds set forth above with respect to claim 1 under 35 U.S.C. 103 in view of the PRUKSACHATKUN-LIU-LEE combination.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of LIU to PRUKSACHATKUN.  Indeed, one of ordinary skill would be motivated to do so because PRUKSACHATKUN is built upon the RoBERTa model introduced and described in LIU.  Moreover, one of ordinary skill would further be motivated to apply the teachings of LIU, as LIU explains that RoBERTa is an improvement on BERT by re-training the BERT model using a larger data set, training on longer sequences, and performing dynamic masking, and one of ordinary skill would therefore understand that there would be performance benefits for a cause-effect version of BERT by retraining on new datasets rather than merely fine-tuning, as explained in LIU.  (LIU, p. 1, section 1)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of LEE to PRUKSACHATKUN and LIU.  As disclosed in LEE, one of ordinary skill would be motivated to utilize the improvements that LEE made to the BERT model, including the performance benefits of re-training (as opposed to fine-tuning) the BERT model, to the RoBERTa model of PRUKSACHATKUN and LIU, which as described above, is itself an improvement on BERT.  (LEE, p. 1235, sections 1 and 2).  As disclosed in LEE, one of ordinary skill would further be motivated to apply the teachings of LEE in order to achieve state-of-the-art performance on downstream tasks, including biomedical text mining tasks, while requiring only minimal architectural modifications. (LEE, p. 1235, section 2).  

Regarding claim 9, the PRUKSACHATKUN-LIU-LEE combination discloses the device of claim 8, including the limitation “wherein the one or more natural language events”.  PRUKSACHATKUN further teaches:
wherein the one or more natural language events include one or more of: 
a nominal in the natural language text data, 
a phrase in the natural language text data, or 
a span of text in the natural language text data. (target tasks and probing tasks have datasets with text inputs, including entities and events, e.g., natural language events, such as the Cosmos QA intermediate task and final tasks that includes “questions concern[ing] the causes or effects of events that require reasoning not only based on the exact text spans in the context, but also wide-range abstractive commonsense reasoning”, e.g., phrases and spans of text in the natural language text, and the Winograd Scheme Challenge with noun and pronoun phrases, e.g., nominals; PRUKSACHATKUN, pp. 2-5, sections 2.1 and 2.2.1-3)

Regarding claim 10, the PRUKSACHATKUN-LIU-LEE combination discloses the device of claim 8, including the “wherein the one or more processors, when processing the natural language text data and the one or more weights, with the trained event aware C-BERT model, to predict the one or more causality relationships between the one or more natural language events” limitation (see claim 8). PRUKSACHATKUN further teaches:
wherein the one or more processors, when processing the natural language text data and the one or more weights, with the trained event aware C-BERT model, to predict the one or more causality relationships between the one or more natural language events, are configured to: (see claim 8)
combine event information, sentence argument structure, and overall sentence context, associated with the one or more natural language events, to predict the one or more causality relationships between the one or more natural language events. (COPA final task model is a “classification task that consists of premises and a question that asks for the cause or effect of each premise, in which models must correctly pick between two possible choices”, e.g., event information, the Cosmos QA dataset “concern[s] the causes or effects of events that require reasoning not only based on the exact text spans in the context, but also wide-range abstractive commonsense reasoning”, e.g., event information and sentence context, and the edge-probing “tasks focus on the syntactic and semantic relations between spans in a sentence”, e.g., event information and sentence argument structure; pp. 3-5, sections 2.2.1 and 2.2.2; the output of the COPA and Cosmos QA final task and edge-probing task models is a prediction based on cause-effect from the input dataset, e.g., natural language events)

Claims 3, 4, 11, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over the PRUKSACHATKUN-LIU-LEE combination and further in view of Hendrickx, Iris, et al. "SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals." (2010) pp. 33-38 (cited in Applicant’s 02/10/2021 IDS), hereinafter referenced as HENDRICKX.

Regarding claim 3, the PRUKSACHATKUN-LIU-LEE combination discloses the method of claim 1.  PRUKSACHATKUN further teaches:
wherein the one or more datasets include one or more sentences (e.g., datasets for QAMR, HellaSwag, CCG, QA-SRL, SST-2, MNLI all disclose datasets having one or more sentences; pp. 2-3, section 2.2.1 and table 1)

However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
one or more pairs of event interactions, and the method further comprises: 
annotating the one or more sentences with event descriptions; and 
assigning one or more labels for each pair of the one or more pairs of event interactions.

However, in a related field of endeavor, HENDRICKX discloses the SemEval-2010 task 8 dataset cited in PRUKSACHATKUN. The PRUKSACHATKUN-LIU-LEE-HENDRICKX combination makes obvious:
wherein the one or more datasets include one or more sentences and one or more pairs of event interactions, (HENDRICKX discloses that the SemEval-2010 dataset includes cause-effect semantic relations; p. 33, section 2.1; the PRUKSACHATKUN-LIU-LEE-HENDRICKX combination uses the SemEval-2010 dataset described in HENDRICKX as disclosed in PRUKSACHATKUN; PRUKSACHATKUN, p. 5, section 2.2.3 with HENDRICKX, p. 33, section 2.1) and the method further comprises: 
annotating the one or more sentences with event descriptions; and (HENDRICKX discloses annotating sentences with cause-effect relations, such as, “those cancers were caused by radiation exposures”, e.g., event descriptions; HENDRICKX, p. 33, section 2.1 and p. 34, sections 2.2 and 2.3; the PRUKSACHATKUN-LIU-LEE-HENDRICKX combination uses the SemEval-2010 dataset, with annotated cause-effect, e.g. event descriptions, as described in HENDRICKX and as disclosed in PRUKSACHATKUN; PRUKSACHATKUN, p. 5, section 2.2.3 with HENDRICKX, p. 33, section 2.1 and p. 34, sections 2.2 and 2.3)
assigning one or more labels for each pair of the one or more pairs of event interactions. (HENDRICKX discloses annotating sentences with cause-effect relations and labels, such as in the sentence, “When I came, the <e1>apples</e1> were already put in the <e2>basket</e2>.”, where <e1></e1> and <e2></e2> are labels for the event interaction; HENDRICKX, p. 34, section 2.3; the PRUKSACHATKUN-LIU-LEE-HENDRICKX combination uses the SemEval-2010 dataset, with labeled cause-effect, e.g. labeled event interactions, as described in HENDRICKX and as disclosed in PRUKSACHATKUN; PRUKSACHATKUN, p. 5, section 2.2.3 with HENDRICKX, p. 34, section 2.3).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the teachings of HENDRICKX with PRUKSACHATKUN, LIU, and LEE.  Indeed, PRUKSACHATKUN cites to HENDRICKX for the SemEval-2010 task 8 dataset.  As disclosed in HENDRICKX, one of ordinary skill would further be motivated to apply the teachings of HENDRICKX to take advantage of the SemEval-2010 dataset, comprising around 1,200 sentences annotated by two independent human annotators, with disagreements between the annotators resolved, as a ground truth dataset for training and testing a model.  (p. 34, section 2.3).  One of ordinary skill would further be motivated to utilize the HENDRICKX SemEval-2010 task 8 dataset, as HENDRICKX discloses that the cause-effect relation led to particularly high performance by compared models.  (p. 38, section 4).
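
For illustration only, the <e1></e1> and <e2></e2> markup in the HENDRICKX sentence quoted with respect to claim 3 can be read programmatically as sketched below. The function name and the regular expression are hypothetical and are not taken from HENDRICKX or PRUKSACHATKUN; the sketch merely shows how the two tagged nominals of one sentence can be recovered as a pair.

```python
import re

# Matches <e1>...</e1> or <e2>...</e2>, tolerating stray spaces inside the tags.
TAG = re.compile(r"<(e[12])>\s*(.*?)\s*</\1>")

def extract_nominals(sentence):
    """Return the tagged nominals keyed by tag name, plus the untagged sentence."""
    nominals = {m.group(1): m.group(2) for m in TAG.finditer(sentence)}
    plain = TAG.sub(lambda m: m.group(2), sentence)
    return nominals, plain

example = "When I came, the <e1>apples</e1> were already put in the <e2>basket</e2>."
nominals, plain = extract_nominals(example)
```

Here `nominals` holds the pair of tagged nominals and `plain` holds the sentence with the markup removed.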

Regarding claim 4, the PRUKSACHATKUN-LIU-LEE combination discloses the method of claim 1, including the “wherein masking the training data to generate the masked training data comprises” limitation (see claim 1).  However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
replacing event descriptions, provided in the training data, with blank tokens to generate the masked training data.

However, in a related field of endeavor, HENDRICKX discloses the SemEval-2010 task 8 dataset cited in PRUKSACHATKUN. The PRUKSACHATKUN-LIU-LEE-HENDRICKX combination makes obvious:
replacing event descriptions, provided in the training data, (HENDRICKX discloses the SemEval-2010 task 8 dataset that has annotated and labeled cause-effect relations, e.g., labeled event descriptions; HENDRICKX, p. 33, section 2.1 and p. 34, sections 2.2 and 2.3) with blank tokens to generate the masked training data. (LIU discloses that RoBERTa uses masked language modeling, including using both static and dynamic masking on the training data to generate masking patterns for training; LIU, p. 3, section 3.2 and p. 4, sections 4 and 4.1; the PRUKSACHATKUN-LIU-LEE-HENDRICKX combination now applies the training techniques of LIU, including dynamic and static masking to generate training masking patterns, to the SemEval-2010 task 8 dataset disclosed in HENDRICKX and utilized in PRUKSACHATKUN; PRUKSACHATKUN, pp. 3-5 and table 1; with LIU, p. 3, section 3.2 and p. 4, sections 4 and 4.1 and HENDRICKX, p. 33, section 2.1 and p. 34, sections 2.2 and 2.3).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the teachings of HENDRICKX with PRUKSACHATKUN, LIU, and LEE.  Indeed, PRUKSACHATKUN cites to HENDRICKX for the SemEval-2010 task 8 dataset.  As disclosed in HENDRICKX, one of ordinary skill would further be motivated to apply the teachings of HENDRICKX to take advantage of the SemEval-2010 dataset, comprising around 1,200 sentences annotated by two independent human annotators, with disagreements between the annotators resolved, as a ground truth dataset for training and testing a model.  (p. 34, section 2.3).  One of ordinary skill would further be motivated to utilize the HENDRICKX SemEval-2010 task 8 dataset, as HENDRICKX discloses that the cause-effect relation led to particularly high performance by compared models.  (p. 38, section 4).
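
For illustration only, replacing tagged event descriptions with blank tokens, as recited in claim 4, can be sketched as follows. This is a sketch under stated assumptions and is neither Applicant's method nor the method of any cited reference: the "[BLANK]" placeholder, the function name, and the placement of the <e1>/<e2> tags in the example sentence are hypothetical.

```python
import re

BLANK = "[BLANK]"  # hypothetical placeholder token, chosen for this sketch only

# Matches a whole tagged span, including its <e1>/<e2> markup.
TAGGED = re.compile(r"<(e[12])>.*?</\1>")

def blank_events(sentence, blank=BLANK):
    """Replace every tagged span in the sentence with the blank token."""
    return TAGGED.sub(blank, sentence)

# Tag placement below is assumed for illustration.
example = "Those <e1>cancers</e1> were caused by radiation <e2>exposures</e2>."
masked = blank_events(example)
```

The resulting `masked` string keeps the surrounding context (“were caused by radiation”) while hiding both tagged spans.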

Regarding claim 11, the PRUKSACHATKUN-LIU-LEE combination discloses the device of claim 8.  However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
wherein each of the one or more causality relationships between the one or more natural language events includes a cause-effect interaction between two or more events expressed in a text expression.

However, in a related field of endeavor, HENDRICKX discloses the SemEval-2010 task 8 dataset cited in PRUKSACHATKUN. The PRUKSACHATKUN-LIU-LEE-HENDRICKX combination makes obvious:
wherein each of the one or more causality relationships between the one or more natural language events includes a cause-effect interaction between two or more events expressed in a text expression. (HENDRICKX discloses annotating sentences with cause-effect relations and labels, such as in the sentence, “When I came, the <e1> apples</e1> were already put in the <e2>basket</e2>.”, e.g., a text expression, where <e1></e1> and <e2></e2> are labels for the cause-effect interaction; HENDRICKX, p. 34, section 2.3; the PRUKSACHATKUN-LIU-LEE-HENDRICKX combination uses the SemEval-2010 dataset, with labeled cause-effect, e.g. labeled event interactions, as described in HENDRICKX and as disclosed in PRUKSACHATKUN; PRUKSACHATKUN, p. 5, section 2.2.3 with HENDRICKX, p. 34, section 2.3).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the teachings of HENDRICKX with PRUKSACHATKUN, LIU, and LEE.  Indeed, PRUKSACHATKUN cites to HENDRICKX for the SemEval-2010 task 8 dataset.  As disclosed in HENDRICKX, one of ordinary skill would further be motivated to apply the teachings of HENDRICKX to take advantage of the SemEval-2010 dataset, comprising around 1,200 sentences annotated by two independent human annotators, with disagreements between the annotators resolved, as a ground truth dataset for training and testing a model.  (p. 34, section 2.3).  One of ordinary skill would further be motivated to utilize the HENDRICKX SemEval-2010 task 8 dataset, as HENDRICKX discloses that the cause-effect relation led to particularly high performance by compared models.  (p. 38, section 4).

Regarding claim 15, the PRUKSACHATKUN-LIU-LEE combination discloses:
A non-transitory computer-readable medium storing a set of instructions, the set of instructions comprising: (LIU discloses training and implementing RoBERTa using DGX-1 machines; LIU, p. 3, section 3.1; pursuant to MPEP 2131 I, the examiner cites to NVIDIA, the “NVIDIA DGX-1: The Essential Instrument for AI Research” (July 2019) datasheet, which explains that the DGX-1 is an AI server that includes NVIDIA GPUs, system memory, data storage including a 4x1.92 TB SSD Raid 0 system, e.g., non-transitory computer-readable storage medium; PRUKSACHATKUN discloses training and implementing software using NVIDIA P40 GPUs; PRUKSACHATKUN, p. 5, section 3; the PRUKSACHATKUN-LIU-LEE combination is now implemented on a server-based system using NVIDIA DGX-1 machines with NVIDIA GPUs and a 4x1.92 TB SSD Raid 0 system, e.g., non-transitory computer-readable storage medium storing computer instructions)
one or more instructions that, when executed by one or more processors of a device, cause the device to: (LIU discloses training and implementing RoBERTa using DGX-1 machines; LIU, p. 3, section 3.1; pursuant to MPEP 2131 I, the examiner cites to NVIDIA, the “NVIDIA DGX-1: The Essential Instrument for AI Research” (July 2019) datasheet, which explains that the DGX-1 is an AI server that includes NVIDIA GPUs, system memory, storage, networking capabilities, etc.; PRUKSACHATKUN discloses training and implementing software using NVIDIA P40 GPUs; PRUKSACHATKUN, p. 5, section 3; the PRUKSACHATKUN-LIU-LEE combination is now implemented on a server-based system using NVIDIA DGX-1 machines with NVIDIA GPUs with software instructions)
receive data that includes one or more datasets associated with natural language processing; (PRUKSACHATKUN discloses target tasks and probing tasks have datasets with text inputs, including entities and events, e.g., natural language text data identifying natural language events; pp. 4-5, sections 2.2.2 and 2.2.3; experiments use GPUs, which run a pipeline including the datasets from 10 target tasks and 25 probing tasks; PRUKSACHATKUN, pp. 5-6, section 3 and Fig. 2)
mask the training data to generate masked training data; (LIU discloses that RoBERTa uses masked language modeling using 5 different text corpora, e.g., training data, and further uses both static and dynamic masking on the training data to generate masking patterns for training; LIU, p. 3, section 3.2 and p. 4, sections 4 and 4.1; the PRUKSACHATKUN-LIU combination now applies the training techniques of LIU, including dynamic and static masking to generate training masking patterns, to the training data in PRUKSACHATKUN; PRUKSACHATKUN, pp. 3-5 and table 1; with LIU, p. 3, section 3.2 and p. 4, sections 4 and 4.1)
train a masked event cause-effect bidirectional encoder representations from transformers (C-BERT) model, with the masked training data, to generate pretrained weights and a trained masked event C-BERT model; (LIU discloses re-implementing BERT with larger datasets and using both static and dynamic masking on the training data to generate masking patterns for training; LIU, p. 2, section 2.3 and p. 3, sections 3.1 and 3.2 and p. 4, sections 4 and 4.1; LEE discloses re-training BERT on biomedical corpora to generate BioBERT, as opposed to fine-tuning; LEE, pp. 1235-1236, section 3.2 and Fig. 1, and p. 1237, section 4.2 (discussing that it took 23 days to pre-train BERT using biomedical corpora); LEE further discloses that BERT is a masked language model that predicts randomly masked words in a sequence, and that BioBERT and BERT have the same structure and same pre-training hyper-parameters; LEE, p. 1235, section 2 and p. 1237, sections 4.1 and 4.2; the PRUKSACHATKUN-LIU-LEE combination now pre-trains the RoBERTa model of PRUKSACHATKUN (as opposed to fine-tuning), using the Cosmos QA task dataset, which was masked pursuant to the BERT and RoBERTa masked language model training as disclosed in LIU and LEE, e.g., using the masked training data, (under MPEP 2131, see also DEVLIN at 4174, section 3.1, explaining that pre-training uses the [MASK] token and fine-tuning does not), to generate a newly-trained RoBERTa model, e.g., trained masked event C-BERT model; PRUKSACHATKUN, pp. 2-3, sections 2.1 and 2.2.1 with LIU, p. 2, section 2.3 and p. 3, sections 3.1 and 3.2 and p. 4, sections 4 and 4.1 and LEE, pp. 1235-36, sections 2 and 3.2 and Fig. 1 and p. 1237, sections 4.1 and 4.2)
train an event aware C-BERT model, (transfer learning from intermediate task, e.g., Cosmos QA fine-tuned RoBERTa model, to target task or probing task, e.g., COPA (cause and effect of premises) or Cosmos QA (cause or effects of events that requires reasoning) or Commonsense QA or ReCoRD (entity recognition); PRUKSACHATKUN, Abstract and p. 1, Fig. 1 and pp. 3-4, section 2.2.2) with the training data and the pretrained weights, to generate a trained event aware C-BERT model; (transfer learning to final target and probing tasks utilizes data sources in Table 1, e.g., training data, to transfer intermediate task model, such as Cosmos QA fine-tuned RoBERTa model, e.g., event aware C-BERT model with pretrained weights, to generate a final task or probing task model, e.g., RoBERTa fine-tuned for COPA or Commonsense QA or Cosmos QA or ReCoRD; PRUKSACHATKUN, pp. 2-3, sections 2.1, 2.2 and Table 1 and pp. 3-4, sections 2.2.2 and 2.2.3)
receive natural language text data identifying one or more natural language events; (target tasks and probing tasks have datasets with text inputs, including entities and events, e.g., natural language text data identifying natural language events; PRUKSACHATKUN, pp. 4-5, sections 2.2.2 and 2.2.3; experiments use GPUs, which run a pipeline including the datasets from 10 target tasks and 25 probing tasks; PRUKSACHATKUN, pp. 5-6, section 3 and Fig. 2)
process the natural language text data, with the trained masked event C-BERT model, to determine one or more weights; (“run the same pipeline three times for the 11 intermediate tasks”, e.g., run the pipeline on the fine-tuned Cosmos QA RoBERTa model, which is the intermediate model, where the output from the intermediate task is the one or more weights, or values; PRUKSACHATKUN, pp. 5-6, section 3 and Fig. 2)
process the natural language text data and the one or more weights, with the trained event aware C-BERT model, (experiment pipeline then runs the experiment using the datasets and the intermediate tasks, e.g., natural language text data and the one or more weights, on the final task and probe models such as the fine-tuned COPA and Cosmos QA models, e.g., trained event aware C-BERT model; PRUKSACHATKUN, pp. 5-6, section 3 and Fig. 2) to predict one or more causality relationships between the one or more natural language events; and (COPA final task model is a “classification task that consists of premises and a question that asks for the cause or effect of each premise, in which models must correctly pick between two possible choices” and Cosmos QA “concern[s] the causes or effects of events that require reasoning not only based on the exact text spans in the context, but also wide-range abstractive commonsense reasoning”; PRUKSACHATKUN, pp. 3-5, sections 2.2.1 and 2.2.2; the output of the COPA and Cosmos QA final task models is a prediction based on cause-effect from the input dataset, e.g., natural language events)
perform one or more actions, based on the one or more causality relationships between the one or more natural language events. (LEE discloses that BioBERT can be applied to various downstream text mining tasks, including question answering; LEE, p. 1236, section 3.3; the PRUKSACHATKUN-LIU-LEE combination now uses the newly-trained RoBERTa model using the COPA and Cosmos QA final task models, to perform downstream text mining, such as performing question and answering tasks, e.g., one or more actions, using the cause-effect relations determined by the COPA and Cosmos QA final task models; PRUKSACHATKUN, pp. 2-5, sections 2.1 and 2.2.1 and 2.2.2 with LEE, p. 1236, section 3.3)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of LIU to PRUKSACHATKUN.  Indeed, one of ordinary skill would be motivated to do so because PRUKSACHATKUN is built upon the RoBERTa model introduced and described in LIU.  Moreover, one of ordinary skill would further be motivated to apply the teachings of LIU, as LIU explains that RoBERTa is an improvement on BERT by re-training the BERT model using a larger data set, training on longer sequences, and performing dynamic masking, and one of ordinary skill would therefore understand that there would be performance benefits for a cause-effect version of BERT by retraining on new datasets rather than merely fine-tuning, as explained in LIU.  (LIU, p. 1, section 1)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of LEE to PRUKSACHATKUN and LIU.  As disclosed in LEE, one of ordinary skill would be motivated to utilize the improvements that LEE made to the BERT model, including the performance benefits of re-training (as opposed to fine-tuning) the BERT model, to the RoBERTa model of PRUKSACHATKUN and LIU, which as described above, is itself an improvement on BERT.  (LEE, p. 1235, sections 1 and 2).  As disclosed in LEE, one of ordinary skill would further be motivated to apply the teachings of LEE in order to achieve state-of-the-art performance on downstream tasks, including biomedical text mining tasks, while requiring only minimal architectural modifications. (LEE, p. 1235, section 2).  

However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
annotate one or more sentences, provided in the one or more datasets, with event descriptions to generate annotated data; 
assign one or more labels for each pair of event interactions, provided in the one or more datasets, to generate labeled data; 
combine the annotated data and the labeled data to form training data; 

However, in a related field of endeavor, HENDRICKX discloses the SemEval-2010 task 8 dataset cited in PRUKSACHATKUN. The PRUKSACHATKUN-LIU-LEE-HENDRICKX combination makes obvious:
annotate one or more sentences, provided in the one or more datasets, with event descriptions to generate annotated data; (HENDRICKX discloses annotating sentences with cause-effect relations, such as, “those cancers were caused by radiation exposures”, e.g., event descriptions; HENDRICKX, p. 33, section 2.1 and p. 34, sections 2.2 and 2.3; the PRUKSACHATKUN-LIU-LEE-HENDRICKX combination uses the SemEval-2010 dataset, with annotated cause-effect, e.g. event descriptions, as described in HENDRICKX and as disclosed in PRUKSACHATKUN; PRUKSACHATKUN, p. 5, section 2.2.3 with HENDRICKX, p. 33, section 2.1 and p. 34, sections 2.2 and 2.3)
assign one or more labels for each pair of event interactions, provided in the one or more datasets, to generate labeled data; (HENDRICKX discloses annotating sentences with cause-effect relations and labels, such as in the sentence, “When I came, the <e1> apples</e1> were already put in the <e2>basket</e2>.”, where <e1></e1> and <e2></e2> are labels for the event interaction; HENDRICKX, p. 34, section 2.3; the PRUKSACHATKUN-LIU-LEE-HENDRICKX combination uses the SemEval-2010 dataset, with labeled cause-effect, e.g. labeled event interactions, as described in HENDRICKX and as disclosed in PRUKSACHATKUN; PRUKSACHATKUN, p. 5, section 2.2.3 with HENDRICKX, p. 34, section 2.3).
combine the annotated data and the labeled data to form training data; (the PRUKSACHATKUN-LIU-LEE-HENDRICKX combination now uses the SemEval-2010 dataset, with annotated cause-effect and labeled event interactions, as described in HENDRICKX and as disclosed in PRUKSACHATKUN; PRUKSACHATKUN, p. 5, section 2.2.3 with HENDRICKX, p. 34, section 2.3).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the teachings of HENDRICKX with PRUKSACHATKUN, LIU, and LEE.  Indeed, PRUKSACHATKUN cites to HENDRICKX for the SemEval-2010 task 8 dataset.  As disclosed in HENDRICKX, one of ordinary skill would further be motivated to apply the teachings of HENDRICKX to take advantage of the SemEval-2010 dataset, comprising around 1,200 sentences annotated by two independent human annotators, with disagreements between the annotators resolved, as a ground truth dataset for training and testing a model.  (p. 34, section 2.3).  One of ordinary skill would further be motivated to utilize the HENDRICKX SemEval-2010 task 8 dataset, as HENDRICKX discloses that the cause-effect relation led to particularly high performance by compared models.  (p. 38, section 4).

Claim 16 depends from claim 15 and claims a non-transitory computer-readable medium storing instructions that when carried out correspond to the device of claim 4, and therefore claim 16 is rejected using the same grounds as claims 4 and 15 above.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over the PRUKSACHATKUN-LIU-LEE combination in view of Veitch, Victor, et al. "Adapting Text Embeddings for Causal Inference." arXiv preprint arXiv:1905.12741v.2 (July 25, 2020) pp. 1-10, hereinafter referenced as VEITCH.

Regarding claim 5, the PRUKSACHATKUN-LIU-LEE combination discloses the method of claim 1, including the “wherein each of the masked event C-BERT model and the event aware C-BERT model” limitation as recited in claim 1.  However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
is a feed-forward neural network model built on a bidirectional encoder representations from transformers model.  

	However, in a related field of endeavor, VEITCH pertains to causal inference on text, where the BERT language model is adapted to perform causal inference.  (pp. 1-2, section 1). The PRUKSACHATKUN-LIU-LEE-VEITCH combination makes obvious:
wherein each of the masked event C-BERT model and the event aware C-BERT model is a feed-forward neural network model (VEITCH discloses an encoder using a feedforward neural network; VEITCH, p. 4, section 3 and p. 6, section 4.1) built on a bidirectional encoder representations from transformers model.  (VEITCH discloses a modified BERT language model, called “Causal BERT”, with an additional logit-linear layer mapping and 2-hidden layer neural network; VEITCH, p. 4, section 3; PRUKSACHATKUN discloses fine-tuning RoBERTa using intermediate and final tasks; PRUKSACHATKUN, pp. 2-3, sections 2.1, 2.2 and Table 1 and pp. 3-4, sections 2.2.2 and 2.2.3; the PRUKSACHATKUN-LIU-LEE-VEITCH combination now modifies RoBERTa as disclosed in PRUKSACHATKUN and adds the feedforward neural network encoder disclosed in VEITCH; PRUKSACHATKUN, pp. 2-3, sections 2.1, 2.2 and Table 1 and pp. 3-4, sections 2.2.2 and 2.2.3 with VEITCH, p. 4, section 3 and p. 6, section 4.1).

	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of VEITCH to PRUKSACHATKUN, LIU, and LEE.  As disclosed in VEITCH, one of ordinary skill would be motivated to apply the teachings of VEITCH because VEITCH explains the advantages of supervised dimensionality reduction and language modeling (using BERT) for producing causally sufficient embeddings.  (p. 2, section 1). As disclosed in VEITCH, one of ordinary skill would further be motivated to apply the teachings of VEITCH to adjust for confounding (e.g., common causes) when estimating causal effects.  (p. 2, section 1 and p. 3, section 2).

Claims 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over the PRUKSACHATKUN-LIU-LEE combination further in view of Narain et al., US 20160171383 A1, hereinafter referenced as NARAIN.

Regarding claim 12, the PRUKSACHATKUN-LIU-LEE combination discloses the device of claim 8, including the “one or more processors, when performing the one or more actions” limitation (see claim 8).  The PRUKSACHATKUN-LIU-LEE combination further discloses:
wherein the one or more processors, when performing the one or more actions, are configured to one or more of: 
generate a response based on the one or more causality relationships (LEE discloses that BioBERT can be applied to various downstream text mining tasks, including question answering on datasets with Q&A, e.g., responses, pertaining to drug/chemical interactions; LEE, p. 1236, section 3.3 and p. 1239, table 9; the PRUKSACHATKUN-LIU-LEE combination now uses the newly-trained RoBERTa model using the COPA and Cosmos QA final task models, to perform downstream text mining, such as performing question and answering tasks, e.g., one or more actions, using the cause-effect relations determined by the COPA and Cosmos QA final task models and the drug/chemical datasets of LEE; PRUKSACHATKUN, pp. 2-5, sections 2.1 and 2.2.1 and 2.2.2 with LEE, p. 1236, section 3.3 and p. 1239, table 9)

	However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
generate and display a user interface that includes data identifying the one or more causality relationships; or 
provide the response to a user device.

	However, in a related field of endeavor, NARAIN relates to systems and methods for data analysis, and in particular, for using healthcare data to generate a causal relationship network model. (para. 0002).  Figs. 1 and 21 disclose a computer system 900, which can be used to implement client devices 110, 115, 120, 125 or server 135, which includes a video display unit 910 and a user interface navigation device 914. (paras. 0184-0185).  Figs. 7, 8, and 9A-9B depict a relationship model.  (paras. 0040-0043).  The relationships can relate to medical conditions and/or drugs, including predictors based on such medical conditions and/or drugs.  (paras. 0066-0067, 0098-0103).

	The PRUKSACHATKUN-LIU-LEE-NARAIN combination makes obvious:
generate and display a user interface that includes data identifying the one or more causality relationships; or (NARAIN discloses displaying a graphical representation of the causal relationship network model, e.g., data identifying one or more causality relationships, as part of a user interface; NARAIN, paras. 0100-0104; Figs. 7, 8, and 9A and 9B; NARAIN discloses that relationship-network module 230 may be implemented on client devices 110, 115, 120, and 125, and can be used to generate a causal relationship network model; NARAIN, paras. 0082-0086; the PRUKSACHATKUN-LIU-LEE-NARAIN combination now uses the newly-trained RoBERTa model using the COPA and Cosmos QA final task models from PRUKSACHATKUN, to perform downstream text mining, such as performing question and answering tasks, e.g., one or more actions, using the cause-effect relations determined by the COPA and Cosmos QA final task models, or any of the models utilized for BioBERT as in LEE, and then displays the cause-effect relations using the relationship-network model and user interface of NARAIN; PRUKSACHATKUN, pp. 2-5, sections 2.1 and 2.2.1 and 2.2.2 with LEE, p. 1236, section 3.3 and NARAIN, paras. 0082-0086, 0100-0104)
generate a response based on the one or more causality relationships and provide the response to a user device. (NARAIN discloses an AI-based informatics platform that uses the relationship network models to obtain predictions, e.g., responses, accompanied by confidence levels, regarding hypotheses in the healthcare space; NARAIN, paras. 0095, 0109; client devices 110, 115, 120, and 125 are part of the healthcare analysis system and receive a causal relationship network generated by server 135; NARAIN, Fig. 1, paras. 0074, 0077; the PRUKSACHATKUN-LIU-LEE-NARAIN combination now uses the newly-trained RoBERTa model using the COPA and Cosmos QA final task models, or any of the models utilized for BioBERT as in LEE, and then displays the cause-effect relations using the relationship-network model and user interface of NARAIN on a client device; PRUKSACHATKUN, pp. 2-5, sections 2.1 and 2.2.1 and 2.2.2 with LEE, p. 1236, section 3.3 and NARAIN, paras. 0077, 0095, 0109)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of NARAIN with PRUKSACHATKUN, LIU, and LEE.  As disclosed in NARAIN, one of ordinary skill would be motivated to utilize the teachings of NARAIN to use data-driven methodology in healthcare to expedite medical research and improve patient care by providing unique insights into relationships of the data points in the datasets, such as drug interactions and disease and/or biological process interactions.  (paras. 0124, 0171-0172).  As further disclosed in NARAIN, one of ordinary skill would also be motivated to utilize the user interface of NARAIN to allow users to input information about medical conditions and then review displayed graphical representations of variables and predictors and the relationships between the variables and predictors. (paras. 0022-0023).
The examiner notes that LEE discloses the BioBERT pre-trained biomedical language representation model for biomedical text mining which is also in the healthcare field.

Regarding claim 13, the PRUKSACHATKUN-LIU-LEE combination discloses the device of claim 8, including the “one or more processors, when performing the one or more actions” limitation (see claim 8).  The PRUKSACHATKUN-LIU-LEE combination further discloses:
wherein the one or more processors, when performing the one or more actions, are configured to one or more of:
retrain the masked event C-BERT model or the event aware C-BERT model based on the one or more causality relationships. (LEE discloses that BioBERT may be fine-tuned, e.g., re-trained, using the biomedical text mining task of question/answering; LEE, p. 1235, section 2; PRUKSACHATKUN discloses the output of the COPA and Cosmos QA final task models is a prediction based on cause-effect events, e.g., causality relationships; PRUKSACHATKUN, pp. 3-5, sections 2.2.1 and 2.2; the PRUKSACHATKUN-LIU-LEE combination now fine-tunes RoBERTa, as disclosed in PRUKSACHATKUN and as trained using the Cosmos QA dataset of PRUKSACHATKUN or any of the biomedical datasets of LEE, and fine-tunes RoBERTa in view of the biomedical text mining task of question/answering as disclosed in LEE; PRUKSACHATKUN, pp. 3-5, sections 2.2.1 and 2.2 with LEE, p. 1235, section 2)

	However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
determine a decision support task based on the one or more causality relationships and cause the decision support task to be implemented; or 

However, in a related field of endeavor, NARAIN relates to systems and methods for data analysis, and in particular, for using healthcare data to generate a causal relationship network model. (para. 0002).  The PRUKSACHATKUN-LIU-LEE-NARAIN combination makes obvious:
determine a decision support task based on the one or more causality relationships and cause the decision support task to be implemented; or (NARAIN discloses an AI-based informatics platform that uses the relationship network models to generate hypotheses and predictions, e.g., decision support tasks, accompanied by confidence levels and provides the predictions and hypotheses to the client device for display on a user interface, e.g., implement the decision support task; NARAIN, paras. 0095, 0109; the PRUKSACHATKUN-LIU-LEE-NARAIN combination now uses the newly-trained RoBERTa model using the COPA and Cosmos QA final task models, or any of the models utilized for BioBERT as in LEE, and then displays predictions and hypotheses as disclosed in NARAIN using a user interface on a client device; PRUKSACHATKUN, pp. 2-5, sections 2.1 and 2.2.1 and 2.2.2 with LEE, p. 1236, section 3.3 and NARAIN, paras. 0077, 0095, 0109; the examiner notes that the broadest reasonable interpretation of implementing a decision support task includes determining a solution based on the causality relationships and providing the solution to the user device; instant specification, para. 0034)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of NARAIN with PRUKSACHATKUN, LIU, and LEE.  As disclosed in NARAIN, one of ordinary skill would be motivated to utilize the teachings of NARAIN to use data-driven methodology in healthcare to expedite medical research and improve patient care by providing unique insights into relationships of the data points in the datasets, such as drug interactions and disease and/or biological process interactions.  (paras. 0124, 0171-0172).  As further disclosed in NARAIN, one of ordinary skill would also be motivated to utilize the user interface of NARAIN to allow users to input information about medical conditions and then review displayed graphical representations of variables and predictors and the relationships between the variables and predictors. (paras. 0022-0023).
The examiner notes that LEE discloses the BioBERT pre-trained biomedical language representation model for biomedical text mining which is also in the healthcare field.

Regarding claim 14, the PRUKSACHATKUN-LIU-LEE combination discloses the device of claim 8, including the “one or more processors, when performing the one or more actions” limitation (see claim 8).  However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
wherein the one or more processors, when performing the one or more actions, are configured to: 
provide the one or more causality relationships for display;
receive feedback based on providing the one or more causality relationships for display; and
modify the masked event C-BERT model or the event aware C-BERT model based on the feedback.

However, in a related field of endeavor, NARAIN relates to systems and methods for data analysis, and in particular, for using healthcare data to generate a causal relationship network model. (para. 0002).  The PRUKSACHATKUN-LIU-LEE-NARAIN combination makes obvious:
provide the one or more causality relationships for display; (NARAIN discloses displaying a graphical representation of the causal relationship network model, e.g., data identifying one or more causality relationships, as part of a user interface; NARAIN, paras. 0100-0104; Figs. 7, 8, and 9A and 9B; NARAIN discloses that relationship-network module 230 may be implemented on client devices 110, 115, 120, and 125, and can be used to generate a causal relationship network model; NARAIN, paras. 0082-0086; the PRUKSACHATKUN-LIU-LEE-NARAIN combination now uses the newly-trained RoBERTa model using the COPA and Cosmos QA final task models from PRUKSACHATKUN, to perform downstream text mining, such as performing question and answering tasks, e.g., one or more actions, using the cause-effect relations determined by the COPA and Cosmos QA final task models, or any of the models utilized for BioBERT as in LEE, and then displays the cause-effect relations using the relationship-network model and user interface of NARAIN; PRUKSACHATKUN, pp. 2-5, sections 2.1 and 2.2.1 and 2.2.2 with LEE, p. 1236, section 3.3 and NARAIN, paras. 0082-0086, 0100-0104)
receive feedback based on providing the one or more causality relationships for display; and (NARAIN discloses that the user may select one or more nodes displayed in the graphical representation of part or all of the causal relationship network model, e.g., user feedback of the displayed causality relationships; NARAIN, para. 0100; the PRUKSACHATKUN-LIU-LEE-NARAIN combination now uses the newly-trained RoBERTa model using the COPA and Cosmos QA final task models from PRUKSACHATKUN, or any of the models utilized for BioBERT as in LEE, and then displays the cause-effect relations using the relationship-network model and user interface of NARAIN and receives user feedback as disclosed in NARAIN; PRUKSACHATKUN, pp. 2-5, sections 2.1 and 2.2.1 and 2.2.2 with LEE, p. 1236, section 3.3 and NARAIN, paras. 0082-0086, 0100-0104)
modify the masked event C-BERT model or the event aware C-BERT model based on the feedback. (LEE discloses that BioBERT may be fine-tuned, e.g., modified, using the biomedical text mining task of question/answering; LEE, p. 1235, section 2; PRUKSACHATKUN discloses the output of the COPA and Cosmos QA final task models is a prediction based on cause-effect events, e.g., causality relationships; PRUKSACHATKUN, pp. 3-5, sections 2.2.1 and 2.2; the PRUKSACHATKUN-LIU-LEE-NARAIN combination now fine-tunes, e.g., modifies, RoBERTa (where the modified versions of RoBERTa correspond to the masked event C-BERT model or the event aware C-BERT model as explained with respect to claim 8) in view of the biomedical text mining task of question/answering as disclosed in LEE and the user selections disclosed in NARAIN; PRUKSACHATKUN, pp. 3-5, sections 2.2.1 and 2.2 with LEE, p. 1235, section 2 and NARAIN, para. 0100)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of NARAIN with PRUKSACHATKUN, LIU, and LEE.  As disclosed in NARAIN, one of ordinary skill would be motivated to utilize the teachings of NARAIN to use data-driven methodology in healthcare to expedite medical research and improve patient care by providing unique insights into relationships of the data points in the datasets, such as drug interactions and disease and/or biological process interactions.  (paras. 0124, 0171-0172).  As further disclosed in NARAIN, one of ordinary skill would also be motivated to utilize the user interface of NARAIN to allow users to input information about medical conditions and then review displayed graphical representations of variables and predictors and the relationships between the variables and predictors. (paras. 0022-0023).
The examiner notes that LEE discloses the BioBERT pre-trained biomedical language representation model for biomedical text mining which is also in the healthcare field.


Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over the PRUKSACHATKUN-LIU-LEE-HENDRICKX combination further in view of NARAIN.
 
	Claim 19 depends from claim 15 and claims a non-transitory computer-readable medium storing instructions that when carried out correspond to the device of claims 12 and 13, and therefore claim 19 is rejected using the same grounds as claims 12, 13, and 15 above.
Claim 20 depends from claim 15 and claims a non-transitory computer-readable medium storing instructions that when carried out correspond to the device of claim 14, and therefore claim 20 is rejected using the same grounds as claims 14 and 15, above.

Regarding claims 19 and 20, therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of NARAIN with those of PRUKSACHATKUN, LIU, LEE, and HENDRICKX.  As disclosed in NARAIN, one of ordinary skill would be motivated to use the data-driven methodology of NARAIN in healthcare to expedite medical research and improve patient care by providing unique insights into relationships among the data points in the datasets, such as drug interactions and disease and/or biological process interactions (NARAIN, paras. 0124, 0171-0172).  As further disclosed in NARAIN, one of ordinary skill would further be motivated to utilize the user interface of NARAIN to allow users to input information about medical conditions and then review displayed graphical representations of variables and predictors and the relationships between the variables and predictors (NARAIN, paras. 0022-0023).
The examiner notes that LEE discloses the BioBERT pre-trained biomedical language representation model for biomedical text mining which is also in the healthcare field.
Allowable Subject Matter
Claims 6, 7, 17, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  

Regarding claim 6, the PRUKSACHATKUN-LIU-LEE combination discloses the method of claim 1, including the “wherein training the masked event C-BERT model, with the masked training data, to generate the pretrained weights and the trained masked event C-BERT model” limitation (see claim 1).  The PRUKSACHATKUN-LIU-LEE combination further discloses:
generating vectors for masked events in the masked training data; (LIU discloses that the BERT model (that RoBERTa is based on) takes as input a sequence of tokens x1, x2, … xN, e.g., an input vector, and that masked language modeling is performed on training data, using both static and dynamic masking on the training data to generate masking patterns; LIU, p. 2, section 2.1, p. 3, section 3.2 and p. 4, sections 4 and 4.1)
a non-linear activation layer (LIU discloses a GELU activation function; LIU, p. 2, section 2.4)

	However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
averaging the vectors to determine a sentence context and final contexts for the masked events; and
processing the sentence context and the final contexts with a non-linear activation layer and a fully connected layer to generate the pretrained weights and the trained masked event C-BERT model.

	However, in a related field of endeavor, US 20190354850 A1 (WATSON) discloses techniques for performing transfer learning in neural networks to enhance the performance of one or more machine learning tasks. (para. 0001).  WATSON discloses:
averaging the vectors to determine a sentence context and final contexts for the masked events; and (WATSON discloses vector averaging, where input vectors 302, which can be applied to sentence representations, e.g., sentence vectors, are input into a neural network to determine output vector representations 316 (also called a context set), e.g., sentence contexts on a per-sentence basis, and final contexts, e.g., an event defined within one or more sentences; WATSON, paras. 0030, 0067)
processing the sentence context and the final contexts (context set vectors 316, as disclosed in WATSON at para. 0067) with a non-linear activation layer and a fully connected layer to generate the pretrained weights and the trained masked event C-BERT model.  (WATSON discloses an output fully connected layer 318, transforming the context set representation 316 into predictions vector 320; WATSON, para. 0067)

	While PRUKSACHATKUN, LIU, LEE, and WATSON detail each limitation as claimed, one of ordinary skill in the art before the effective filing date of the present application would not have found sufficient motivation to combine the references without the hindsight of Applicant’s disclosure.  As such, the claims are allowable as the prior art neither anticipates nor renders obvious the limitations of claim 6 as currently presented.
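For illustration only, the operations recited in claim 6 (averaging vectors to obtain a sentence context and per-event contexts, then applying a non-linear activation layer and a fully connected layer) can be sketched as follows. This sketch is not taken from the application or from any cited reference. The function names, the use of token index spans to delimit events, the concatenation of the contexts, and all dimensions and weights are assumptions made for the example; GELU is used as the activation only because LIU is cited for it.

```python
import math


def gelu(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def average_vectors(vectors):
    # Element-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fully_connected(x, weights, bias):
    # One dense layer: y = W x + b, with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def classify(token_vectors, event_spans, weights, bias):
    # Sentence context: average over every token vector (assumed pooling).
    sentence_context = average_vectors(token_vectors)
    # Final context per event: average over that event's token span
    # (hypothetical half-open [start, end) index pairs).
    event_contexts = [average_vectors(token_vectors[start:end])
                      for start, end in event_spans]
    # Concatenate the contexts (an assumption), apply the non-linear
    # activation, then the fully connected layer.
    features = list(sentence_context)
    for context in event_contexts:
        features.extend(context)
    activated = [gelu(value) for value in features]
    return fully_connected(activated, weights, bias)
```

With three 2-dimensional token vectors and two event spans, the feature vector has six entries, so the hypothetical weight matrix needs six columns per output row.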

	Regarding claim 7, the PRUKSACHATKUN-LIU-LEE combination discloses the method of claim 1, including the “wherein training the event aware C-BERT model, with the training data and the pretrained weights, to generate the trained event aware C-BERT model” limitation (see claim 1).  The PRUKSACHATKUN-LIU-LEE combination further discloses:
generating vectors for events in the masked training data; (LIU discloses that the BERT model (that RoBERTa is based on) takes as input a sequence of tokens x1, x2, … xN, e.g., an input vector, and that masked language modeling is performed on training data, using both static and dynamic masking on the training data to generate masking patterns; LIU, p. 2, section 2.1, p. 3, section 3.2 and p. 4, sections 4 and 4.1)
a non-linear activation layer (LIU discloses a GELU activation function; LIU, p. 2, section 2.4)

However, the PRUKSACHATKUN-LIU-LEE combination fails to explicitly teach:
averaging the vectors, based on the pretrained weights, to determine a sentence context and final contexts for the events; and
processing the sentence context and the final contexts with a non-linear activation layer and a fully connected layer to generate the trained event aware C-BERT model.

However, in a related field of endeavor, US 20190354850 A1 (WATSON) discloses techniques for performing transfer learning in neural networks to enhance the performance of one or more machine learning tasks. (para. 0001).  WATSON discloses:
averaging the vectors, based on the pretrained weights, to determine a sentence context and final contexts for the events; and (WATSON discloses vector averaging, where input vectors 302, which can be applied to sentence representations, e.g., sentence vectors, are input into a neural network with pretrained weights, to determine output vector representations 316 (also called a context set), e.g., sentence contexts on a per-sentence basis, and final contexts, e.g., an event defined within one or more sentences; WATSON, paras. 0030, 0067)
processing the sentence context and the final contexts (context set vectors 316, as disclosed in WATSON at para. 0067) with a non-linear activation layer (disclosed by LIU as discussed above with respect to this claim 7) and a fully connected layer to generate the trained event aware C-BERT model. (WATSON discloses an output fully connected layer 318, transforming the context set representation 316 into predictions vector 320; WATSON, para. 0067)

While PRUKSACHATKUN, LIU, LEE, and WATSON detail each limitation as claimed, one of ordinary skill in the art before the effective filing date of the present application would not have found sufficient motivation to combine the references without the hindsight of Applicant’s disclosure.  As such, the claims are allowable as the prior art neither anticipates nor renders obvious the limitations of claim 7 as currently presented.

	Claim 17 depends from claim 15 and claims a non-transitory computer-readable medium storing instructions that, when carried out, correspond to the method of claim 6, and therefore claim 17 contains allowable subject matter for the same reason as set forth above with respect to claim 6.
Claim 18 depends from claim 15 and claims a non-transitory computer-readable medium storing instructions that, when carried out, correspond to the method of claim 7, and therefore claim 18 contains allowable subject matter for the same reason as set forth above with respect to claim 7.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20210326751 A1 (Liu et al.) discloses the “ALUM” training algorithm for pretraining and fine-tuning machine learning models. (para. 0053).  ALUM is compared to BERT and RoBERTa and performed on datasets related to commonsense reasoning.  (paras. 0065-0066).
US 20210151029 A1 (Gururani et al.) discloses a text-to-speech encoder that utilizes an attention mechanism to generate attention weights for each of the combined encodings and averages each combined encoding by the respective attention weight to generate a context vector. (para. 0032).
US 20200372025 A1 (Yoon et al.) discloses utilizing averaging to consolidate multiple vectors of context representation in a language model. (para. 0032).
Bouraoui, Zied, et al. "Inducing relational knowledge from BERT." Proceedings of the AAAI Conference on Artificial Intelligence. (April 3, 2020), pp. 7456-7463.  Discloses fine-tuning BERT to identify relations in a dataset, where such relations include attributive knowledge, causality, and other forms of commonsense knowledge.  (p. 7461, section 4.2).
Dasgupta, Tirthankar, et al. "Automatic extraction of causal relations from text using linguistically informed deep neural networks." Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue. 2018, pp. 306-316.  Discloses a neural network architecture for automatically extracting cause-effect relations from text.
Ding, Xiao, et al. "ELG: an event logic graph." arXiv preprint arXiv:1907.08015 (2019), pp. 1-11.  Discloses using BERT to extract causal relations from text and to generate event logic graphs.  (p. 5, section 3.3.2).
Kayesh, Humayun, et al. "Answering binary causal questions: A transfer learning based approach." 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, (July 2020).  Discloses fine-tuning the BERT model to answer binary causal questions.  (p. 5, section 1).  
Phang, Jason, et al. "Sentence encoders on stilts: Supplementary training on intermediate labeled-data tasks." arXiv preprint arXiv:1811.01088 (2018), pp. 1-12. Discloses utilizing intermediate labeled-data tasks as a second stage of pretraining.  BERT is one of the machine learning models utilized.  (p. 3, section 3).
Yu, Bei, et al. "Detecting causal language use in science findings." EMNLP-IJCNLP (2019) pp. 4664-4674. Discloses a BERT-based prediction model that classifies sentences based on causality.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL C. LEE/Examiner, Art Unit 2655


/JESSE S PULLIAS/Primary Examiner, Art Unit 2655