Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

With respect to claims 1-19, the claims recite a series of steps for analysis and determination of relationships from a variety of data sources.  The claims are therefore directed to a statutory category, namely a process (a series of acts).  However, the claims are also directed to a judicial exception.  The claims recite, step by step, an abstract idea that falls within the “Mental Processes” category of abstract ideas, i.e., concepts performed in the human mind.  This category covers an idea standing alone, such as an uninstantiated concept, plan, or scheme, as well as a mental process (thinking) that “can be performed in the human mind, or by a human using pen and paper.”  Like the invention in Alice Corp., the instant claims merely limit the abstract idea to a computer environment by using a computer to perform the analysis and determination of relationships from a variety of data sources.  This is an abstract idea.  Further, at Step 2B, the claims do not recite any additional limitations that amount to significantly more than the abstract idea.  The claims require no additional limitations.  The generic computer components (processor, etc.) are claimed to perform their basic functions of analysis and determination of relationships from a variety of data sources. This recitation of the computer limitations amounts to mere 
	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 7-11, 14, and 16-19 are rejected under 35 U.S.C. 103(a) as being unpatentable over Oh et al. (U.S. Pub. 2016/0155058 A1) in view of Aravamudan et al. (U.S. Pub. 2018/0082197 A1).
With respect to claims 1 and 10, Oh et al. discloses a computer-implemented method for analyzing data from a variety of data sources, the method comprising: 
identifying keywords in the received data (i.e., “training texts are stored in advance. Using labeling unit 232, to each of the texts, labels indicating a position of a clue word stored in causal relation clue word storage unit 58 and ranges of a cause part and a result part of the causal relation expression connected by the clue word are manually added” (0104) or “Various existing method may be used for the document search by related document searching unit 54. By way of example, a method of document search using a content word extracted from the question as a keyword may be applied” (0058)); 
generating sentence or word embeddings based on the identified keywords (i.e., “Answer candidate extracting unit 56 extracts answer candidates each consisting of a set of five consecutive sentences, from the sentences contained in the documents searched by related document searching unit 54. As shown in FIG. 5, answer candidate extracting unit 56 extracts a plurality of sets each including five sentences, as represented by the first set 120 of five sentences, the second set 122 of five sentences, . . . the second to last set 130 of five sentences and the last set 132 of five sentences” (0061) and “Each training data set contains a question and a plurality of sentences representing causal relations serving as answer candidates to the question. Each sentence has a label attached, indicating whether or not the result part of the causal relation expression included in the sentence is to be used as an answer to the question of the same training data set” (0097)); 
receiving a selection of one or more labels based on the generated sentence or word embeddings (i.e., “Using labeling unit 232, to each of the texts, labels indicating a position of a clue word stored in causal relation clue word storage unit 58 and ranges of a cause part and a result part of the causal relation expression connected by the clue word are manually added. Sentences having the labels added are stored as training data in training data storage unit 234” (0104)); 
adding the selected one or more labels to a model (i.e., “Using labeling unit 232, to each of the texts, labels indicating a position of a clue word stored in causal relation clue word storage unit 58 and ranges of a cause part and a result part of the causal relation expression connected by the clue word are manually added. Sentences having the labels added are stored as training data in training data storage unit 234” (0104)); 
training the model over the common data structure based on a configuration file (i.e., “Referring to FIG. 11, in learning unit 290 of scoring unit 302, a plurality of training data sets are stored in training document storage unit 310. Each training data set includes a question and a plurality of sentences representing causal relations serving as answer candidates to the question. Each sentence has a label indicating whether or not the result part of the causal relation included in the sentence is to be the answer to the question of the same training data set”(0105) and fig. 9 shows training data storage unit and machine learning unit 26 to create CRF model 222); and 
generating a result in response to a user question based on the model (i.e., “a clue word specifying unit 220 specifying, in each input answer candidate, any word stored in causal relation clue word storage unit 58; a CRF model 222 trained in advance to specify a cause part and a result part of the causal relation expression connected by the word, once the clue word is specified in the answer candidate; and a causal relation expression specifying unit 224, adding tags indicating start and end positions of a cause part and tags indicating start and end positions of a result part of the causal relation expression connected by the clue word to each answer candidate by looking up CRF model 222 using the clue word specified by clue word specifying unit 220 and the answer candidate, and outputting the answer candidate as answer candidate 204”(0075)), wherein the generating includes: 
retrieving related documents from the received data (i.e., “Any technique may be used provided that it can collect documents considered to be related to the question” (0044), 0050, 0055 and “question-answering system 30 further includes: a related document searching unit 54 searching and extracting documents considered to be related to question 34 from object document storage unit 32, by an existing information retrieval technique,” (0058)); 
determining which information should be reported from which of the retrieved related documents (i.e., “a causal relation recognizing unit 60 recognizing causal relation expression included in extracted answer candidates” (abstract) or “non-factoid question-answering system capable of giving appropriate answers to non-factoid questions, by appropriately handling various expressions of causal relations appearing in documents, can be provided” (0014) and “from these answer candidate texts, causal relations effective in generating an answer to the question are recognized in the following manner. The result part of each causal relation is expressed with appropriate features and used for supervised learning, whereby the result part as an answer candidate is evaluated. One having high evaluation is adopted as an answer” (0045)); and 
providing the result based on the determination and a graph schema associated with the related documents (i.e., “it can be seen from the values of each rank ( graph 322) of the answer accuracy obtained by the technique of Non-Patent Literature 3 that the answer accuracy at the point of 25% of questions of which the score of the top answer is in the highest 25% among that of the top answer of all the questions (represented by chain-dotted line 326 in FIG. 13) was 62%. In contrast, the value of the technique in accordance with the above-described embodiment (graph 320) was 83%. For reference, FIG. 13 also shows a graph 324 representing an example using only the causal relation for ranking” (0119)); but Oh et al. does not explicitly disclose receiving, as inputs, data from the variety of data sources, and converting the received data from each of the variety of data sources into a common data structure.  However, Aravamudan et al. discloses receiving, as inputs, data from the variety of data sources (fig. 1 shows receiving data from a variety of data sources such as 101, 117, 102); and 
converting the received data from each of the variety of data sources into a common data structure (i.e., “FIG. 1 includes the system store 114 that can be used to convert words into vectors and analysis of the resulting semantic BioKnowledge graph in accordance with some embodiments of the present disclosure. The system store 114 can include information stored in a structured semantic database 106 (which can be a traditional database); a knowledge graph(s) 107 (which can be directed graphs of labeled (extracted from both paths 101a and 102a) and/or unlabeled entities (extracted from the 102a path)); word embeddings 108 (which can include word(s) and/or sentence(s)), document/paragraph/sentence embeddings 109; and sequence representations of unstructured data 110” (0263)).  It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to include Aravamudan’s feature in order to obtain better entity recognition from unstructured sources, as this purpose has been well known in the art, as evidenced by the teaching of Aravamudan et al. (0008).  Both references are in the same field, namely document analysis. 

With respect to claims 2 and 11, Oh et al. discloses wherein the variety of data sources includes at least one of a machine-readable document (i.e., a related document searching unit 54 responsive to a question, for taking out answer candidates from an object document storage unit 32 (abstract)), non-machine readable document, spreadsheet, image, a Hypertext Markup Language file.  
i.e., “From the view point of fully reflecting the nature of a task on the features used and other information”(0010)).  
With respect to claims 7 and 16, Oh et al. discloses further comprising: performing at least one quality assessment check (i.e., “semantic features such as semantic classes of words, evaluation expressions.” (0005) and “The result part of each causal relation is expressed with appropriate features and used for supervised learning, whereby the result part as an answer candidate is evaluated” (0045)).  
With respect to claims 8 and 17, Oh et al. discloses further comprising: receiving, via a user interface, at least one expression (i.e., “The result part of each causal relation is expressed with appropriate features and used for supervised learning, whereby the result part as an answer candidate is evaluated” (0045)); providing the at least one expression to the model (i.e., “causal relation expressions existing in texts prepared for searching an answer are recognized, and by supervised learning using appropriate features, an answer to the question is specified from the texts. Corresponding procedures are summarized below” (0038), 0075); selecting, with an application programming interface, annotation candidates associated with the at least one expression (i.e., “Using labeling unit 232, to each of the texts, labels indicating a position of a clue word stored in causal relation clue word storage unit 58 and ranges of a cause part and a result part of the causal relation expression connected by the clue word are manually added” (0104)); and training the model based on the selected annotation candidates (i.e., “for taking out answer candidates from an object document storage unit 32; an answer candidate extracting unit 56 extracting plausible ones from the answer candidates; a causal relation recognizing unit 60 recognizing causal relation expression included in extracted answer candidates” (abstract)).  
With respect to claims 9 and 18, Oh et al. discloses wherein, during training (i.e., “a training text storage unit 230 storing training texts” (0076) and fig. 9), target truth labels (232) (fig. 9) and features are extracted from a training dataset (230) (fig. 9), and then provided to the model (CRF model) (222) (fig. 9). 
With respect to claim 19, Oh et al. discloses a computer-implemented system for analyzing data from a variety of data sources, the system comprising: 
an application programming interface; and 
a processor, wherein the processor is configured to: 
generate a result in response to a user question based on a machine learning model (i.e., “causal relation expressions existing in texts prepared for searching an answer are recognized, and by supervised learning using appropriate features, an answer to the question is specified from the texts. Corresponding procedures are summarized below” (0038)), wherein the generating includes: 
retrieving related documents from the received data (i.e., “Any technique may be used provided that it can collect documents considered to be related to the question”(0044)); 
determining which information should be reported from which of the retrieved related documents (i.e., “from these answer candidate texts, causal relations effective in generating an answer to the question are recognized in the following manner. The result part of each causal relation is expressed with appropriate features and used for supervised learning, whereby the result part as an answer candidate is evaluated. One having high evaluation is adopted as an answer” (0045)); and 
providing the result based on the determination and a graph schema associated with the related documents (i.e., “it can be seen from the values of each rank ( graph 322) of the answer accuracy obtained by the technique of Non-Patent Literature 3 that the answer accuracy at the point of 25% of questions of which the score of the top answer is in the highest 25% among that of the top answer of all the questions (represented by chain-dotted line 326 in FIG. 13) was 62%. In contrast, the value of the technique in accordance with the above-described embodiment (graph 320) was 83%. For reference, FIG. 13 also shows a graph 324 representing an example using only the causal relation for ranking” (0119)); 
wherein the machine learning model is trained on annotation candidates provided by the application programming interface (i.e., “Referring to FIG. 9, causal relation recognizing unit 60 includes: a clue word specifying unit 220 specifying, in each input answer candidate, any word stored in causal relation clue word storage unit 58; a CRF model 222 trained in advance to specify a cause part and a result part of the causal relation expression connected by the word, once the clue word is specified in the answer candidate; and a causal relation expression specifying unit 224, adding tags indicating start and end positions of a cause part and tags indicating start and end positions of a result part of the causal relation expression connected by the clue word to each answer candidate by looking up CRF model 222 using the clue word specified by clue word specifying unit 220 and the answer candidate, and outputting the answer candidate as answer candidate 204” (0075)).  Further, Aravamudan et al. discloses a computer-implemented system for analyzing data from a variety of data sources (fig. 1 shows receiving data from a variety of data sources such as 101, 117, 102).  It would have been obvious for a person of ordinary skill in the art, before the . 
Claims 3-4 and 12-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over Oh et al. (U.S. Pub. 2016/0155058 A1) and Aravamudan et al. (U.S. Pub. 2018/0082197 A1), and further in view of Pendar et al. (U.S. Pub. 2017/060993 A1).
With respect to claims 3 and 12, Oh et al. and Aravamudan et al. disclose all limitations recited in claim 1 except for further comprising: splitting the received data into component documents, wherein the received data is split based on one of a heuristic model and a trained model.  However, Pendar et al. discloses further comprising: splitting the received data into component documents, wherein the received data is split based on one of a heuristic model and a trained model (i.e., “data collection module 222 that receives textual data (e.g. unlabeled textual data) from one or more of the network I/F module 208, a storage device 212 and input/output device 210 and passes it to the training set generator 228, an initial concept receiver module 224 that receives an initial concept, and passes the initial concept to the initial keyword generator module 226, an initial keyword generator module 226 that determines one or more knowledge sources and identifies a set of initial keywords using the one or more knowledge sources, and a training set generator 228 for searching, scoring, splitting and extracting Machine Learning features from the data.” (0040)). It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to include Pendar’s feature in order to get better machine learning training that is accurate and efficient for the stated purpose has 
With respect to claims 4 and 13, Pendar et al. discloses further comprising: tokenizing at least one of word elements and sentence elements in the received data (i.e., “the present disclosure focuses on the healthcare/medical space. In some embodiments, the following terms can be used interchangeably: "entity" and "token."” (0140)); and further, adding default word embeddings to at least one of the tokenized word elements and sentence elements (i.e., “the normalization process can involve generating Resource Description Framework (RDF) triples (node A, node B, with an attribute edge connecting them). The normalization/classification can leverage 107a, 108a off the existing structured data 106 and embeddings 108 for merging. Unstructured data can go to 102a through a tokenization/normalization, which can involve, for example, cleaning up tokens. In some embodiments, tokens can be words and/or phrases that constitute input to a machine learning model. For example, the word "the" is a token” (0264)).  
Claims 6 and 15 are rejected under 35 U.S.C. 103(a) as being unpatentable over Oh et al. (U.S. Pub. 2016/0155058 A1) and Aravamudan et al. (U.S. Pub. 2018/0082197 A1), and further in view of Brown et al. (U.S. Pub. 2012/0084293 A1).
With respect to claims 6 and 15, Oh et al. discloses the method of claim 5, wherein (i) the task type is one of: train, validate, and predict (i.e., “In order to determine whether an answer candidate is relevant as an answer to the question, it is necessary to recognize presence/absence of an entailment relation between the expression in the question and the expression in the answer candidate. This, however, is a difficult task. Therefore, in the present embodiment, the entailment relation is recognized using a concept of "polarity" of a predicate” (0084)).  Further, Aravamudan et al. discloses (iii) the features include word embeddings (i.e., “FIG. 1 includes the system store 114 that can be used to convert words into vectors and analysis of the resulting semantic BioKnowledge graph in accordance with some embodiments of the present disclosure. The system store 114 can include information stored in a structured semantic database 106 (which can be a traditional database); a knowledge graph(s) 107 (which can be directed graphs of labeled (extracted from both paths 101a and 102a) and/or unlabeled entities (extracted from the 102a path)); word embeddings 108 (which can include word(s) and/or sentence(s)), document/paragraph/sentence embeddings 109; and sequence representations of unstructured data 110” (0263)).  Oh et al. and Aravamudan et al. do not explicitly disclose (ii) the algorithm type is one of a regression algorithm and a recursive algorithm.  However, Brown et al. discloses the algorithm type is one of a regression algorithm and a recursive algorithm (i.e., “a classification model 71 is trained over instances (based on prior data) with each candidate answer being classified as true/false for the question (using logistic regression or linear regression function or other types of prediction functions as known in the art)” (0082)).  
It would have been obvious for a person of ordinary skill in the art, before the effective filing date of the claimed invention, to include Brown et al.’s feature in order to obtain accurate training for predicting the result, as this purpose has been well known in the art, as evidenced by the teaching of Brown et al. (0082).  Both references are in the same field, namely document analysis. 
Reference:
All related references are listed in PTO-892.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG T VY whose telephone number is (571)272-1954.  The examiner can normally be reached on M-F 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi can be reached on (571)272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/HUNG T VY/
Primary Examiner, Art Unit 2163
July 29, 2021