DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.


Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Drawings
The drawings filed on 10/3/2019 were accepted.


Allowable Subject Matter
Claims 13-17 are allowed.
The following is an examiner’s statement of reasons for allowance: Claim 13 is similar to claim 1 but also includes the limitation that the identification of actions to be executed is based on “verbs in the topics identified in the received domain-specific document.” Claim 9 contains a similar limitation. While the prior art discloses identifying actions to perform based on the identified topics and identifying the part of speech of terms within the documents, it does not disclose the identification of verbs in the topics themselves, nor the identification of actions based on those verbs.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”


Claim Objections
Claims 3 and 9-12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim 4 is objected to because of the following informalities: The limitation “a selection method parameter set to select percentile” is unclear. Based on context from the specification, it seems the applicant is trying to indicate that a user can set the percentile, but this should be explicitly stated in the claim. Appropriate correction is required.



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 7, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Costas (US 20180053128 A1; filed 8/17/2017) in view of Bennett et al (US 20160337295 A1; filed 5/15/2015).

With regards to claim 1, Costas discloses an Artificial Intelligence (AI)-based regulatory data processing system comprising: one or more processors; and a non-transitory data storage comprising processor-executable instructions that are executed by one or more processors to (Costas, paragraph 1: “The technologies used in this invention relate to the fields of Artificial Intelligence, Machine Learning… These systems and methods can be embodied in computer code, and the realization of its activities may be exposed to its users in the form of a desktop application, a web page, a set of Application Programming Interfaces (API), automated digital assistant (such as a chatbot), and other methods and means;” computer code and machine learning are executed with processors and storage): receive a domain-specific document for analysis, the received domain-specific document including regulatory text associated with a specific domain (Costas, fig. 2: Policies, Procedures, Work Instructions received at block 205; paragraph 45: “the documents provided in block 205”); extract topics in the received domain-specific document using a topic extraction model (Costas, Fig. 2: Hierarchical Topic Extraction; paragraph 38: “create the Hierarchical Topic Model”), the topic extraction model being trained via unsupervised training on prior domain-specific documents (Costas, paragraph 38: “create the Hierarchical Topic Model, which is the output of block 190… There are well-known algorithms to extract topics from a set of documents in an automatic fashion. Two of the most widely known and used are Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA).” LSI and LDA models are trained via unsupervised training), the prior domain-specific documents in a regulatory text corpus, and the topic extraction model for identifying the topics from the received domain-specific document (Costas, Fig. 1 and paragraph 38: the output of block 160, which includes the segmented and tagged regulatory documents, is used to create the Hierarchical Topic Model; Also see paragraph 36 and 42 for more details on block 190 input; “domain-specific” can be interpreted as just being related to a business, which would include any regulations or standards associated with the business); 
identify and classify one or more entities in the received domain-specific document … (Costas, paragraph 35: “Block 160 will then perform further processing, with the goals of obtaining a standard representation of the meaning of the input texts by finding words or sequences of words with similar semantic content. This can include word lemmatization, disambiguation, and canonicalization. The output of block 160 does not only include a set of specifically processed words, but also combination of words (bigrams, trigrams, etc.) so that the system processes the fullest meaning of the input text. Block 160 then performs lemmatization by implementing well-known algorithms. The system can further use algorithms such as Word2Vec for disambiguation and canonicalization.”); 
classify one or more portions in the received domain-specific document as belonging to one of a plurality of predetermined sections, the predetermined sections in a section identification model, and the section identification model being trained via supervised learning on the prior domain-specific documents (Costas, paragraph 36: “Block 180 is a classification model, implemented by a supervised machine learning means, trained using the output of block 160.” Paragraph 44: “The output of block 215 is presented to the model that was trained in 191, in order to determine which areas of the regulations and other pertinent texts input in FIG. 1 are related to the documents input in FIG. 2.” Also, figs. 1-2: The model 191 is trained using inputs 110, 120, and 130, then used to classify parts of an input document in 220);
 identify one or more actions to be executed for implementing processes based at least on the one or more sections and the topics extracted from the received domain-specific document (Costas, paragraph 40: “documents with a poor regulatory compliance score can be input to FIG. 1, so that other documents that are given to FIG. 2 and that have the same types of deficiencies are identified as well, so companies can take action and correct those deficiencies in a timely manner.”); 
obtain a corresponding priority for each of the one or more actions…; and generate one or more notifications regarding the actions and the corresponding priorities for each of the actions (Costas, paragraph 24: “The system will judge the likelihood of the applicability of those regulations in direct proportion to the calculated distance, and will have a configurable cutoff so that regulations that exceed that distance will not be used in the regulatory risk analysis;” paragraph 62: “The embodiment, in step 610, will then provide the analysis results showing what part of the applicable areas of the regulatory corpus are not properly covered by the received documents in step 560. In step 620, compliance indexes are provided based on how well the received documents cover the specific applicable areas of the regulatory corpus and how well the applicable areas of the regulatory corpus are covered by the received documents;” the output that is provided to the user of the system is being interpreted as the notification, while the compliance index is being interpreted as a type of priority – See below for a time related prioritization).
However, Costas does not disclose classify one or more entities in the received domain-specific document as belonging to entities in an entity identification model, the entity identification model being trained via supervised learning using the prior domain-specific documents… obtain a corresponding priority for each of the one or more actions, the obtaining based at least on date entities determined from the one or more entities.
However, Bennett et al teaches classify one or more entities in the received domain-specific document as belonging to entities in an entity identification model, the entity identification model being trained via supervised learning using the prior domain-specific documents (Bennett et al, paragraph 29: “identifying and extracting requests and commitments and related information using machine learning procedures that operate on training sets of annotated corpora of sentences or messages (e.g., machine learning features)”)… obtain a corresponding priority for each of the one or more actions, the obtaining based at least on date entities determined from the one or more entities (Bennett et al, paragraph 19: “a process may augment extracted task content (e.g., requests or commitments)… with additional information, such as identification of… times/dates;” paragraph 20: “task content (e.g., the proposal or affirmation of a commitment or request) of a communication may be further processed or analyzed to identify or infer semantics of the commitment or request including: specified or inferred pertinent dates (e.g., deadlines for completing the commitment or request)”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Costas and Bennett et al such that the extracted entities include dates that are conveyed to the user. This would have enabled the invention to provide important information about actions such as deadlines and other time management activities (Bennett, paragraph 20: “Such information resources, for example, may provide information about time, people, locations, and so on. The identified task content and inferences about the task content may be used to drive automatic (e.g., computer generated) services such as reminders, revisions (e.g., and displays) of to-do lists, appointments, meeting requests, and other time management activities.”).

	With regards to claim 2, which depends on claim 1, Costas discloses access the regulatory text corpus that includes prior domain-specific documents having regulatory information pertaining to the specific domain; and train the topic extraction model to extract the topics via unsupervised training on the prior domain-specific documents in the regulatory text corpus wherein the prior domain-specific documents include domain-specific topics (Costas, paragraph 38: “create the Hierarchical Topic Model, which is the output of block 190… There are well-known algorithms to extract topics from a set of documents in an automatic fashion. Two of the most widely known and used are Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA).” LSI and LDA models are trained via unsupervised training).

	With regards to claim 7, which depends on claim 1, Costas discloses wherein to classify the one or more portions in the received domain-specific document the processor is to: access the regulatory text corpus that includes prior domain-specific documents having regulatory information pertaining to the specific domain (Costas, Fig. 1 or fig. 3: Input documents 110, 120, 130 or 310 and 315); and train the section identification model to classify portions of the received domain-specific document under one of the plurality of predetermined sections via supervised training on the prior domain-specific documents in the regulatory text corpus, wherein the prior domain-specific documents include labelled training data with portions of the prior domain-specific documents annotated as being classified under one of the plurality of predetermined sections (Costas, paragraph 36: “Block 180 is a classification model, implemented by a supervised machine learning means, trained using the output of block 160.” paragraph 37: “In addition to the labels, which were extracted in block 140, the other components that a supervised machine-learning algorithm needs to train the system are the text structure features. In this case, since we are working with text, the text structure features are the output of block 160”).

Claim 18 recites substantially similar limitations to claim 1 and is thus rejected along the same rationale.

With regards to claim 19, which depends on claim 18, Costas discloses obtain from the topic extraction model, a subset of the prior domain-specific documents identified as relevant to the received domain-specific document; estimate similarities between corresponding sections of each of the prior domain-specific documents and the domain-specific document; and compare the similarities to a similarity threshold (Costas, paragraph 46: “In the case of block 245, this index can be calculated by a set similarity measure, such as the Jaccard index. The Jaccard index is obtained by comparing the set of words comprising highly ranked topics of the relevant documents identified in block 220 and the topics identified in block 190 with the set of words in the topics with high ranking, for a particular document, found in block 225. If a document provided in block 205, for example, has a high Jaccard index, the risk that that document has missed important areas of highly rated classifications output in block 220 is low. If the Jaccard or other appropriate set similarity measurements are low, then the risk is high. Other measures of set similarity may also be used.” Paragraph 25: “if a document's compliance level is determined to be over 95%”).
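For illustration only, the set similarity measure named in the quoted passage of Costas (the Jaccard index) and its comparison to a threshold can be sketched as follows. The word sets, the function name, and the 0.5 threshold are hypothetical and are not taken from Costas or from the application.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of words."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(set_a & set_b) / len(union)


# Hypothetical topic word sets: one from the regulatory corpus, one from a received document.
regulation_topic_words = {"audit", "retention", "records", "privacy"}
document_topic_words = {"audit", "records", "training"}

score = jaccard_index(regulation_topic_words, document_topic_words)

# Hypothetical similarity threshold; a low score corresponds to higher risk in Costas' terms.
SIMILARITY_THRESHOLD = 0.5
meets_threshold = score >= SIMILARITY_THRESHOLD
```

Here two of the five distinct words are shared, so the index is 0.4 and the example document falls below the assumed threshold.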


Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Costas in view of Bennett et al, and further in view of Csomai et al (US 20100145678 A1; filed 11/6/2009).

With regards to claim 4, which depends on claim 1, the combination of Costas and Bennett et al does not teach yet Csomai et al teaches wherein to identify and classify the one or more entities in the received domain-specific document the processor is to: extract linguistic features from the regulatory text of the received domain-specific document using an entity feature selection model based on sequence labelling (Csomai et al, paragraph 40: “Every sequence of words in a document such as a book represents a potential candidate for an entry in the keyword collection such as a back-of-the-book index… These represent the candidate index entries that will be used in the classification algorithm. Candidate entries from a training data set are then labeled as positive or negative, depending on whether the given n-gram was found in the back-of-the-book index associated with the book.”) with a selection method parameter set to select percentile (Csomai et al, paragraph 43: “For this example, an undersampling solution was adopted, where 10% of the negative examples are randomly selected for retention in the training data”) and a scoring mode set to Chi-squared (Csomai et al, paragraph 48: “To measure the informativeness of a keyword or keyphrase… χ2 (chi-squared) independence test, which measures the degree to which two events happen together more often than by chance”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Costas, Bennett et al, and Csomai et al such that the entity extraction of Bennett is based on the sequence labeling taught by Csomai et al. This would have enabled the invention to more accurately determine the informativeness of key phrases (Csomai et al, paragraph 48: “The informativeness of a keyphrase (or keyword) is measured by finding if the phrase (or word) occurs in the document more frequently than it would by chance.” Paragraph 41: “Moreover, the set is extremely unbalanced, with a ratio of positive and negative examples of 1:675, which makes it unsuitable for most machine learning algorithms. In order to address this problem, it is desirable to find ways to reduce the size of the data set, possibly eliminating the training instances that will have the least negative effect on the usability of the data set”).
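For illustration only, a chi-squared independence score combined with a percentile cutoff can be sketched as below. This is one possible reading of the quoted claim language and is not the method of Csomai et al or of the application; the toy documents, the labels, the 25 percent cutoff, and all function names are hypothetical.

```python
import math


def chi_squared_2x2(n11, n10, n01, n00):
    """Pearson chi-squared statistic for a 2x2 contingency table of counts."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a row or column is empty, so no association can be measured
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def score_terms(docs):
    """Score each term by how strongly its presence is associated with the label."""
    vocab = set().union(*(tokens for tokens, _ in docs))
    scores = {}
    for term in vocab:
        n11 = sum(1 for t, y in docs if term in t and y == 1)
        n10 = sum(1 for t, y in docs if term in t and y == 0)
        n01 = sum(1 for t, y in docs if term not in t and y == 1)
        n00 = sum(1 for t, y in docs if term not in t and y == 0)
        scores[term] = chi_squared_2x2(n11, n10, n01, n00)
    return scores


def select_percentile(scores, percentile):
    """Keep the top `percentile` percent of terms by score (ties broken alphabetically)."""
    keep = max(1, math.ceil(len(scores) * percentile / 100))
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:keep]


# Hypothetical labeled sentences: 1 = regulatory obligation, 0 = other text.
docs = [
    ({"shall", "submit", "report"}, 1),
    ({"shall", "retain", "records"}, 1),
    ({"the", "report", "summary"}, 0),
    ({"the", "meeting", "summary"}, 0),
]

scores = score_terms(docs)
selected = select_percentile(scores, 25)
```

A term such as "report", which appears equally often under both labels, scores zero, while "shall", which appears only under label 1, receives the highest score in this toy data.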

With regards to claim 5, which depends on claim 4, Costas does not disclose yet Bennett et al teaches train the entity identification model to identify and classify domain-specific entities in the received domain-specific document via supervised training on the prior domain-specific documents in the regulatory text corpus, wherein the prior domain-specific documents include labelled training data identifying and classifying the domain-specific entities (Bennett et al, paragraph 75: “a training phase in which a machine learning algorithm is trained using supervised/labeled training data”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Costas and Bennett et al such that the extraction model is trained using labeled data. This would have enabled the invention to classify the extracted terms into categories (Bennett et al, paragraph 73: “Support vector machine block 804 may function as a supervised learning model with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. For example, given a set of training data, each marked as belonging to one of two categories, a support vector machine training algorithm builds a machine learning model that assigns new training data into one category or the other.”).


Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Costas in view of Bennett et al and Csomai et al, and further in view of Lafferty et al (John Lafferty, et al., “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” Int'l Conf. on Machine Learning (ICML), pp. 282-289, 2001).

With regards to claim 6, which depends on claim 5, Costas does not disclose wherein the entity identification model is based on conditional random fields (CRF) methodology.
	However, Lafferty et al teaches wherein the entity identification model is based on conditional random fields (CRF) methodology (Lafferty et al, abstract: “We present conditional random fields, a framework for building probabilistic models to segment and label sequence data”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Costas, Bennett, Csomai, and Lafferty et al such that the entity identification model was based on the conditional random fields methodology as taught by Lafferty et al. “Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states” (Lafferty et al, abstract).


Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Costas in view of Deolalikar et al (US 20140164297 A1; filed 12/10/2012).

With regards to claim 8, which depends on claim 7, Costas discloses wherein the section identification model is based on … Naive Bayes methodology (Costas, paragraph 37: “There are well-known supervised machine learning algorithms for text classification, which include Naïve Bayes, Bayes Networks, and Support Vector Classifiers.” Note: This sentence is stated in the context of training the classification model).
However, Costas does not disclose a multinomial Naïve Bayes methodology.
Deolalikar et al teaches multinomial Naïve Bayes methodology (Deolalikar et al, paragraph 55: “The present disclosure describes multinomial naïve Bayes (MNB);” paragraph 15: “Naïve Bayes text classifiers may be used in classification of textual documents because of their simplicity, ease of implementation, ease of model interpretation, speed, and range of applications on which they yield surprisingly good results. Some research on textual document classification focuses on learning when adequate training samples are given, and naïve Bayes is no exception.”).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have combined Costas and Deolalikar et al such that the section identification model used the multinomial naïve Bayes methodology. Deolalikar “describes multinomial naïve Bayes (MNB), which may be posited as the version of naïve Bayes most appropriate for text classification” (Deolalikar et al, paragraph 55).
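For illustration only, a multinomial naïve Bayes text classifier of the general kind referenced above can be sketched as follows. This is a minimal sketch with add-one (Laplace) smoothing; the section labels, training tokens, and function names are hypothetical and are not taken from Costas, Deolalikar et al, or the application.

```python
import math
from collections import Counter, defaultdict


def train_mnb(samples, alpha=1.0):
    """Fit log priors and smoothed log word likelihoods from (tokens, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_docs.values())
    model = {}
    for label in class_docs:
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(class_docs[label] / total_docs)
        log_lik = {w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab}
        model[label] = (log_prior, log_lik)
    return model


def predict_mnb(model, tokens):
    """Return the label with the highest posterior log score; unseen words are ignored."""
    def log_score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[w] for w in tokens if w in log_lik)
    return max(model, key=log_score)


# Hypothetical labeled portions of prior documents, annotated with a section name.
samples = [
    (["term", "means", "defined"], "definitions"),
    (["means", "refers", "term"], "definitions"),
    (["shall", "submit", "report"], "obligations"),
    (["must", "retain", "shall"], "obligations"),
]

model = train_mnb(samples)
predicted_section = predict_mnb(model, ["firm", "shall", "submit"])
```

The multinomial variant models word counts per class, which is why repeated occurrences of a word such as "shall" raise its likelihood under the "obligations" label in this toy data.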

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Costas in view of Bennett et al, and further in view of examiner’s official notice.

With regards to claim 20, which depends on claim 19, Costas discloses output… identities of one or more of the subset of the prior domain-specific documents that have corresponding sections that do not meet the similarity threshold (Costas, paragraph 62: “In steps 590 and 600, the topic model created in step 550 is used to determine how well the received documents cover the applicable parts of the regulatory corpus by comparing the topic model to the topics extracted from the contents of the documents received in step 560. The embodiment, in step 610, will then provide the analysis results showing what part of the applicable areas of the regulatory corpus are not properly covered by the received documents in step 560”).
However, Costas does not expressly disclose output via a graphical user interface (GUI).
Examiner takes official notice that displaying an output via a GUI was well known in the art before the time of filing. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Costas such that the output is displayed via a GUI. This would have enabled a user to view and/or access the output.




Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Calapodescu (US 20170300565 A1): Teaches entity extraction that uses conditional random field methodology.
Levi et al (US 20130297626 A1): Extracts policy information from text and classifies it. Uses Naïve Bayes.
Homeyer (US 20190026634 A1): Teaches automatically assigning team members to roles for a project.
Kaufman et al (US 20040205461 A1): Teaches using chunk size as a parameter for latent semantic indexing.
Jackson (US 20160103823 A1): Teaches extraction of rules and provisions from legal documents using multiple machine learning models.
Lagi et al (US 20200065857 A1): Teaches extracting topics from online sources and messaging someone based on the results.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRODERICK C ANDERSON whose telephone number is (313)446-6566. The examiner can normally be reached Monday-Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at 571-272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.C.A/ Examiner, Art Unit 2178
/STEPHEN S HONG/ Supervisory Patent Examiner, Art Unit 2178