DETAILED ACTION

This action is responsive to communications filed on April 17, 2019. This action is made Non-Final.
Claims 1-20 are pending in the case. 
Claims 1, 8, and 15 are independent claims.
Claims 1-20 are rejected.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/17/2019 is in compliance with the provisions of 37 C.F.R. 1.97. Accordingly, the IDS is being considered by the examiner.

Claim Interpretation
	Claims 15-20 are interpreted as statutory under a 35 USC 101 CRM analysis. The Examiner notes that the Specification, at Paragraph 0024, recites “a computer readable storage medium, as used herein, is not to be construed as being transitory signals per se…” Accordingly, the “medium” recited in claims 15-20 is interpreted as a non-transitory medium.

Improper Markush Grouping
Claims 4, 6, 11, 13, 18, and 20 are rejected on the basis that the claims include an improper Markush grouping of alternatives. See In re Harnisch, 631 F.2d 716, 721-22 (CCPA 1980) and Ex parte Hozumi, 3 USPQ2d 1059, 1060 (Bd. Pat. App. & Int. 1984). A Markush grouping is proper if the alternatives defined by the Markush group (i.e., alternatives from which a selection is to be made in the context of a combination or process, or alternative chemical compounds as a whole) share a “single structural similarity” and a common use. A Markush grouping meets these requirements in two situations. First, a Markush grouping is proper if the alternatives are all members of the same recognized physical or chemical class or the same art-recognized class, and are disclosed in the specification or known in the art to be functionally equivalent and have a common use. Second, where a Markush grouping describes alternative chemical compounds, whether by words or chemical formulas, and the alternatives do not belong to a recognized class as set forth above, the members of the Markush grouping may be considered to share a “single structural similarity” and common use where the alternatives share both a substantial structural feature and a common use that flows from the substantial structural feature. See MPEP § 2117.
The Markush groupings of (1) a percentage of valid characters, a percentage of valid words, an average sentence length, an average number of sentences per page, an average number of words per page, an average words per document, and an average number of pages per document, and (2) a business letter, a poem, an essay, a legal document, a promissory note, a medical record, a scientific article, a newspaper article, a technical article, a thesis paper, a journal article, a blog entry, a financial memo, a resume, a patent application, and a post to a social media site, are improper because the alternatives defined by each Markush grouping do not share both a single structural similarity and a common use.
To overcome this rejection, Applicant may set forth each alternative (or grouping of patentably indistinct alternatives) within an improper Markush grouping in a series of independent or dependent claims and/or present convincing arguments that the group members recited in the alternative within a single claim in fact share a single structural similarity as well as a common use.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Contreras et al., US Patent Application Publication no. US 2020/0142998 (“Contreras”), and further in view of Alkov et al., US Patent Application Publication no. US 2014/0358928 (“Alkov”).
Claim 1:
	Contreras teaches or suggests a method implemented by an information handling system that includes a processor and a memory accessible by the processor, the method comprising:
	receiving a document (see para. 0005 - receiving a plurality of documents, wherein each of the plurality of documents contains natural language text. A plurality of values is determined for a first plurality of predefined attributes of the plurality of documents.);
	retrieving a set of linguistic metrics (see para. 0003 - generating a plurality of quality scores for the plurality of documents by processing the plurality of values using a machine learning model, wherein the plurality of quality scores indicate a suitability of each of the plurality of documents to be processed using a target processing operation; para. 0012 - utilize concrete rules and machine learning models to provide objective quality determinations for documents to be ingested; para. 0014 - identify new attributes that are not considered by the current models, and the models can be updated to include these new attributes; para. 0015 – integration of their corpora. In such an embodiment, the relevant heuristics may vary; para. 0022 - analyzes these representative values to identify features or heuristics that are useful to define the sets and classify documents as high or low quality; para. 0023 - once the Attribute Extractor 140 has extracted the relevant attributes or features for each document, these attributes or features are analyzed by the Document Scorer 150 to generate a quality score for each document. rejects documents with a quality score below a threshold, and approves the documents with a score above a threshold; para. 0026 - analyzes these representative values to identify features or heuristics that are useful to define the sets and classify documents as high or low quality.);
	automatically determining a quality of the received document, wherein the quality of the received document is based on a set of linguistic features found in the document as compared to the retrieved set of linguistic metrics (see para. 0022 - analyzes these representative values to identify features or heuristics that are useful to define the sets and 
	ingesting the document into a corpus that is utilized by a question-answering (QA) system, wherein in the ingesting is based on the determined quality (see para. 0041 - generating a quality score comprises identifying or extracting one or more attributes of the document, and analyzing those attributes using one or more ML models. determines whether the generated quality score exceeds a threshold. If the score does not exceed the threshold, the method 400 continues to block 425, where the Quality Assessment Application 130 flags the selected document; para. 0042 - if the score exceeds a first threshold, the document can be marked as acceptable for immediate processing. If the score is below the first threshold but above a second, the document can be marked as potentially useful or acceptable, pending further processing. Additionally, in an embodiment, if the score is below the second threshold, the document can be rejected or flagged as unacceptable; para. 0043 - Quality Assessment Application 130 flags the document as suitable, or otherwise provides an indication that the document is ready for 
	Contreras does not appear to explicitly disclose the limitations “and a document type, wherein the document type identifies a document category to which the received document belongs; metrics corresponding to the document type.”
	Alkov teaches or suggests and a document type, wherein the document type identifies a document category to which the received document belongs; linguistic metrics corresponding to the document type; linguistic features compared to linguistic metrics (see para. 0021 - clustering of questions in accordance with the features/attributes extracted from the questions. In one aspect of the illustrative embodiments, as part of a question analysis phase, the question is analyzed to identify various features/attributes of the question, e.g., focus, lexical answer type (LAT), question classification (QClass), and question sections (QSections). subsequently submitted questions may be similarly clustered such as by measuring the Euclidean dimensional distance of the subsequent questions from cluster centers. Depending on the training/testing objective, the subsequently submitted questions can be either accepted or rejected based on the clustering of the subsequently submitted questions with regard to the defined clusters; para. 0036 - Categorizing the questions, such as in terms of roles, type of information, tasks, or the like, associated with the question, in each document of a corpus of data may allow the QA system to more quickly and efficiently identify documents containing content related to a specific query; para. 0041 - performs deep analysis on the language of the input question and the language used in each of the portions of the corpus of data; para. 0096 - after having been generated by the separate feature/attribute extraction, clustering, and 
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method taught in Contreras to include the limitations “and a document type, wherein the document type identifies a document category to which the received document belongs; metrics corresponding to the document type” for the purpose of efficiently identifying documents containing content related to a specific query, thereby improving accuracy, performance, and confidence in a QA or knowledge system, as taught by Alkov (para. 0038).
Claim(s) 8 and 15:
Claim(s) 8 and 15 correspond to Claim 1, and thus, Contreras and Alkov teach or suggest the limitations of claim(s) 8 and 15 as well.

Claim 2:
	Contreras further teaches or suggests computing a quality score corresponding to the determined quality; and comparing the quality score to a quality threshold, wherein the ingesting is performed in response to the quality score meeting the quality threshold (see para. 0005 - identifying a subset of documents from the plurality of documents having respective quality scores below a predefined threshold. The subset of documents is flagged for further processing. Finally, the operation includes selectively processing, using the target processing operation, at least one document of the plurality of documents that is not flagged; para. 0023 - Document Scorer 150 automatically rejects documents with a quality score below a threshold, and approves the documents with a score above a threshold. In such an embodiment, the approved documents can be used for further ingestion or processing; para. 0041 - generating a quality score comprises identifying or extracting one or more attributes of the document, and analyzing those attributes using one or more ML models. determines whether the generated quality score exceeds a threshold. If the score does not exceed the threshold, the method 400 continues to block 425, where the Quality Assessment Application 130 flags the selected document; para. 0042 - if the score exceeds a first threshold, the document can be marked as acceptable for immediate processing. If the score is below the first threshold but above a second, the document can be marked as potentially useful or acceptable, pending further processing. Additionally, in an embodiment, if the score is below the second threshold, the document can be rejected or flagged as unacceptable; para. 0043 - Quality Assessment Application 130 flags the document as suitable, or otherwise provides an indication that the document is ready for ingestion or processing; para. 0045 - selectively processes, using the target processing operation, at least one document of the plurality of documents that is not flagged.).
Claim(s) 9 and 16:
Claim(s) 9 and 16 correspond to Claim 2, and thus, Contreras and Alkov teach or suggest the limitations of claim(s) 9 and 16 as well.
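For illustration only, the threshold comparison discussed for Claim 2 can be sketched as follows. This is a minimal sketch modeled on the three outcomes quoted above from Contreras, para. 0042; it is not code from either reference. The names Disposition and triage are hypothetical, and the handling of a score exactly equal to a threshold (it falls to the lower tier) is an assumption, since the quoted passage does not address ties.

```python
from enum import Enum


class Disposition(Enum):
    """Hypothetical labels for the three outcomes quoted from para. 0042."""
    ACCEPT = "acceptable for immediate processing"
    REVIEW = "potentially useful, pending further processing"
    REJECT = "rejected or flagged as unacceptable"


def triage(score: float, first_threshold: float, second_threshold: float) -> Disposition:
    """Map a quality score onto one of three dispositions.

    Assumption: a score equal to a threshold does not "exceed" it,
    so ties fall to the lower tier.
    """
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if score > first_threshold:
        return Disposition.ACCEPT
    if score > second_threshold:
        return Disposition.REVIEW
    return Disposition.REJECT
```

Under this sketch, only documents returned as ACCEPT would proceed directly to ingestion; how REVIEW documents are handled is left open, as it is in the quoted passage.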

Claim 3:
	Contreras further teaches or suggests computing a quality score corresponding to the determined quality; and ingesting, into the corpus, the quality score as metadata that is associated with the ingested document (see para. 0014 - ML model(s) can be refined based on new attributes or heuristics. For example, in one embodiment, if a document is scored poorly but the document is known to be of high quality, the document can be analyzed to identify new attributes that are not considered by the current models, and the models can be updated to include these new attributes; para. 0027 - ML Model(s) 155 may be refined based on this revised classification (e.g., the document can be used as an example of a "high quality" input. determine that this new attribute should be added as a new Heuristic 160 to aid future scoring; para. 0029 – Document Scorer 150 also identifies features that lowered the Quality Score 215, as indicated by block 220. That is, in some embodiments, the Document Scorer 150 can determine which of the Document Attributes 210 increased the Quality Score 215, and which decreased it; para. 0037 - determines that it is likely useful or relevant in determining the quality of documents, and that future documents should be analyzed based in part on this attribute.).
Claim(s) 10 and 17:
Claim(s) 10 and 17 correspond to Claim 3, and thus, Contreras and Alkov teach or suggest the limitations of claim(s) 10 and 17 as well.

Claim 4:
	Contreras further teaches or suggests wherein each of the linguistic features are weighted based on an importance of the respective linguistic feature and wherein one or more of the linguistic features are selected from a group consisting of a percentage of valid characters, a percentage of valid words, an average sentence length, an average number of sentences per page, an average number of words per page, an average words per document, and an average number of pages per document (see para. 0018 - the Heuristics 160 are attributes or features that have been previously identified as useful or important in determining the quality of a document. the heuristics can include features such as the number of sentences in the document, the percentage of sentences that are complete (e.g., as opposed to incomplete sentences or unexpected sentence breaks), the average length of sentences in a document, the average number of blank spaces or lines between sentences, words, or tokens in the document, the number of sections or other structured document tags, the number of sentences in each section or structure, and the like; para. 0021 - the attributes can include the number of sentences in the document, the percentage of sentences that are complete (e.g., as opposed to incomplete sentences or unexpected sentence breaks), the average length of sentences in the document, the average number of blank spaces or lines between sentences, words, or tokens in the document, the number of sections or other structured document tags, the number of sentences in each section or structure, a number of unexpected or misplaced characters (e.g., characters or symbols from a different alphabet or language), a number of unmatched parenthesis or other symbols, and the like; para. 0026 - Heuristics Generator 145 analyzes these representative  tends to correlate with a higher or lower Quality Score. correlations are determined automatically (e.g., by processing a set of values, and increasing or decreasing one or more values to determine the effect on the Quality Score.).
	Alkov further teaches or suggests to the document type (see para. 0021 - clustering of questions in accordance with the features/attributes extracted from the questions. In one aspect of the illustrative embodiments, as part of a question analysis phase, the question is analyzed to identify various features/attributes of the question, e.g., focus, lexical answer type (LAT), question classification (QClass), and question sections (QSections).; para. 0036 - Categorizing the questions, such as in terms of roles, type of information, tasks, or the like, associated with the question, in each document of a corpus of data may allow the QA system to more quickly and efficiently identify documents containing content related to a specific query; para. 0096 - after having been generated by the separate feature/attribute extraction, clustering, and generation of the training and testing question sets; para. 0098 - separate training question sets and testing questions sets may be generated for different domains. training of the QA system, clustering may be 
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method taught in Contreras to include the limitation “to the document type” for the purpose of efficiently identifying documents containing content related to a specific query, thereby improving accuracy, performance, and confidence in a QA or knowledge system, as taught by Alkov (para. 0038).
Claim(s) 11 and 18:
Claim(s) 11 and 18 correspond to Claim 4, and thus, Contreras and Alkov teach or suggest the limitations of claim(s) 11 and 18 as well.
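For illustration only, the weighted linguistic features recited in Claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of Contreras, Alkov, or the application. The function names, the definition of a "valid" character (printable ASCII or whitespace), the definition of a "valid" word (membership in a caller-supplied vocabulary), and the scoring rule (weighted closeness of each feature to a reference metric for the document type) are all hypothetical.

```python
import re


def linguistic_features(text: str, vocabulary: set) -> dict:
    """Compute three of the features recited in Claim 4 (definitions are assumptions)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a character is "valid" if it is printable ASCII or whitespace.
    valid_chars = sum(1 for ch in text if ch.isascii() and (ch.isprintable() or ch.isspace()))
    # Assumption: a word is "valid" if its lowercase form is in the vocabulary.
    valid_words = sum(1 for w in words if w.lower() in vocabulary)
    return {
        "pct_valid_characters": 100.0 * valid_chars / len(text) if text else 0.0,
        "pct_valid_words": 100.0 * valid_words / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


def quality(features: dict, reference: dict, weights: dict) -> float:
    """Weighted closeness of features to reference metrics, in the range 0.0 to 1.0."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 0.0
    for name, weight in weights.items():
        expected = reference[name]
        # Relative deviation from the reference metric; zero reference is treated as a match.
        deviation = abs(features[name] - expected) / expected if expected else 0.0
        score += weight * max(0.0, 1.0 - deviation)
    return score / total_weight
```

In this sketch the reference dictionary stands in for the linguistic metrics retrieved for a given document type, and the weights stand in for the importance of each feature; neither the values nor the scoring rule is taken from the record.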

Claim 5:
	Contreras further teaches or suggests prior to receiving the document, training a machine learning (ML) system of the linguistic metrics, wherein the training comprises: inputting a plurality of training document to the ML system, wherein the training documents are known to be high quality documents; extracting the linguistic metrics from the plurality of training documents; and providing a weighting of the extracted linguistic metrics based on an importance of the respective linguistic metrics (see para. 0018 - the Heuristics 160 are attributes or features that have been previously identified as useful or important in determining the quality of a document. the heuristics can include features such as the number of sentences in the document, the percentage of sentences that are complete (e.g., as opposed to incomplete sentences or unexpected sentence breaks), the average length of sentences in a document, the average number of blank spaces or lines between sentences, words, or tokens in the document, the number of sections or other structured document tags, the number of sentences in each section or structure, and the like; para. 0021 - documents and extracts attributes, characteristics, or features about the text of each document. the attributes can include the number of sentences in the document, the percentage of sentences that are complete (e.g., as opposed to incomplete sentences or unexpected sentence breaks), the average length of sentences in the document, the average number of blank spaces or lines between sentences, words, or tokens in the document, the number of sections or other structured document tags, the number of sentences in each section or structure, a number of unexpected or misplaced characters (e.g., characters or symbols from a different alphabet or language), a number of unmatched parenthesis or other symbols, and the like; para. 0026 - Heuristics Generator 145 analyzes these representative values to identify features or heuristics that are useful to define the sets and classify documents as high or low quality. In one embodiment, the Heuristics Generator 145 determines whether the values for a particular feature differ between the high and low quality sets. If the difference exceeds a threshold, the Heuristics Generator 145 can determine that the feature is valuable, and add it to the list of Heuristics; para. 0028 – attributes can include the number of sentences, the number of sections, the number of  tends to correlate with a higher or lower Quality Score. correlations are determined automatically (e.g., by processing a set of values, and increasing or decreasing one or more values to determine the effect on the Quality Score; para. 0039 - single ML Model 155 can be trained based on documents gathered from multiple corpora. Once trained, the ML Models 155 can be used to classify or score new documents, in order to determine their suitability for ingestion or processing.).
	Alkov further teaches or suggests of the document type; corresponding to the document type; the document type; the document type (see para. 0021 - clustering of questions in accordance with the features/attributes extracted from the questions. In one aspect of the illustrative embodiments, as part of a question analysis phase, the question is analyzed to identify various features/attributes of the question, e.g., focus, lexical answer type (LAT), question classification (QClass), and question sections (QSections).; para. 0036 - Categorizing the questions, such as in terms of roles, type of information, tasks, or the like, associated with the question, in each document of a corpus of data may allow the QA system to more quickly and efficiently identify documents containing content related to a specific query; para. 0096 - after having been generated by the separate feature/attribute extraction, clustering, and generation of the training and testing question sets; para. 0098 - separate training question sets and testing questions sets may be generated for different domains. training of the QA system, clustering may be performed on the training questions to generate training clusters associated with different question domains, e.g., topics, areas of interest, question subject matter categories, or the like. These question domains may be of various types including, for example, healthcare, financial, legal, or other types of 
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method taught in Contreras to include the limitations “of the document type; corresponding to the document type; the document type; the document type” for the purpose of efficiently identifying documents containing content related to a specific query, thereby improving accuracy, performance, and confidence in a QA or knowledge system, as taught by Alkov (para. 0038).
Claim(s) 12 and 19:
Claim(s) 12 and 19 correspond to Claim 5, and thus, Contreras and Alkov teach or suggest the limitations of claim(s) 12 and 19 as well.
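For illustration only, the training step recited in Claim 5 (extracting linguistic metrics from training documents known to be of high quality, per document type) can be sketched as follows. This is a minimal sketch that substitutes a simple per-type average for the ML system recited in the claim; it is not the training procedure of Contreras or Alkov. The function name reference_metrics and the input shape are hypothetical, and the importance weighting recited in the claim is not modeled here.

```python
from collections import defaultdict
from statistics import mean


def reference_metrics(training_set):
    """Average each feature over high-quality training documents, grouped by type.

    training_set: iterable of (document_type, features) pairs, where features
    maps a feature name to a numeric value (hypothetical input shape).
    """
    by_type = defaultdict(lambda: defaultdict(list))
    for doc_type, features in training_set:
        for name, value in features.items():
            by_type[doc_type][name].append(value)
    # Collapse each list of observed values into a single reference metric.
    return {
        doc_type: {name: mean(values) for name, values in columns.items()}
        for doc_type, columns in by_type.items()
    }
```

The resulting dictionary, keyed by document type, plays the role of the set of linguistic metrics that would later be retrieved for a received document of that type.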

Claim 6:
	Alkov further teaches or suggests wherein the document type is selected from a group consisting of a business letter, a poem, an essay, a legal document, a promissory note, a medical record, a scientific article, a newspaper article, a technical article, a thesis paper, a journal article, a blog entry, a financial memo, a resume, a patent application, and a post to a social media site (see para. 0098 - separate training question sets and testing questions sets may be generated for different domains. training of the QA system, clustering may be performed on the training questions to generate training clusters associated with different question domains, e.g., topics, areas of interest, question subject matter categories, or the 
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method taught in Contreras to include the limitation “wherein the document type is selected from a group consisting of a business letter, a poem, an essay, a legal document, a promissory note, a medical record, a scientific article, a newspaper article, a technical article, a thesis paper, a journal article, a blog entry, a financial memo, a resume, a patent application, and a post to a social media site” for the purpose of efficiently identifying documents containing content related to a specific query, thereby improving accuracy, performance, and confidence in a QA or knowledge system, as taught by Alkov (para. 0038).
Claim(s) 13 and 20:
Claim(s) 13 and 20 correspond to Claim 6, and thus, Contreras and Alkov teach or suggest the limitations of claim(s) 13 and 20 as well.

Claim 7:
	Alkov further teaches or suggests wherein the type of document further includes a document subtype (see para. 0082 - questions than those included in training question set 592 such that the training question set 592 is a subset of the question pool; para. 0083 - based on predefined criteria (e.g., creation dates, numbers of times the question has been 
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system and method taught in Contreras to include the limitation “wherein the type of document further includes a document subtype” for the purpose of efficiently identifying documents containing content related to a specific query, thereby improving accuracy, performance, and confidence in a QA or knowledge system, as taught by Alkov (para. 0038).
Claim(s) 14:
Claim(s) 14 corresponds to Claim 7, and thus, Contreras and Alkov teach or suggest the limitations of claim(s) 14 as well.



Note
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
	Beller et al., US Patent Application Publication no. US 2017/0329754.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew T McIntosh whose telephone number is (571)270-7790. The examiner can normally be reached M-Th 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANDREW T MCINTOSH/Primary Examiner, Art Unit 2176