DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is responsive to the original application filed on 9/23/2019. This action is Non-Final. Claims 1 – 20 are pending and have been examined.  
Drawings
The applicant’s drawings submitted are acceptable for examination purposes. 
Specification
The applicant’s specification submitted is acceptable for examination purposes. 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 – 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-patentable subject matter, namely an abstract idea without significantly more. The judicial exception is not integrated into a practical application. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The eligibility analysis in support of these findings is provided below, in accordance with the “2019 Revised Patent Subject Matter Eligibility Guidance” (published on 1/7/2019 in Fed. Register, Vol. 84, No. 4 at pgs. 50 – 57, hereinafter referred to as the “2019 PEG”).
Step 1. In accordance with Step 1 of the eligibility inquiry (as explained in MPEP 2106), it is first noted that the claimed method (claims 1 – 18), medium (claim 19), and device (claim 20) are each directed to one of the eligible categories of subject matter and therefore satisfy Step 1.
Step 2A. In accordance with Step 2A Prong One of the 2019 PEG, it is noted that the claims recite an abstract idea by reciting a method of organizing human activity, which falls into the “software per se” group within the enumerated groupings of abstract ideas set forth in the 2019 PEG. The claims recite the abstract idea of joined features, which also falls within the mental processes group within the enumerated groupings of abstract ideas set forth in the 2019 PEG. The recitation of generic computer components does not negate the abstractness of a given limitation. The limitations reciting the abstract idea are highlighted in italics and the limitations directed to additional elements are highlighted in bold, as set forth in exemplary claim 1:
A method comprising: combining feature types within one or more levels of feature sets into a joined host level feature set; extracting numerical features and content features from ground truth documents and random documents; joining the numerical features with the one or more levels of feature sets to create a set of joined features for the ground truth documents and the random documents; training a document scoring model utilizing machine learning to score documents using the set of joined features; scoring documents with document scores using the document scoring model based upon the content features and the set of joined features with document scores obtained during training; and selectively indexing a subset of the documents based upon the document scores of the documents.
With respect to Step 2A Prong Two of the 2019 PEG, the judicial exception is not integrated into a practical application. The additional elements are directed to joined features (claim 1). However, these elements fail to integrate the abstract idea into a practical application because they fail to provide an improvement to the functioning of a computer or to any other technology or technical field, fail to apply the exception with a particular machine, fail to apply the judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition, fail to effect a transformation of a particular article to a different state or thing, and fail to apply or use the abstract idea in a meaningful way beyond generally linking the use of the judicial exception to a particular technological environment. Furthermore, these elements have been fully considered; however, they are directed to the use of generic computing elements to perform the abstract idea, which is not sufficient to amount to a practical application (as noted in the 2019 PEG). Such use is tantamount to simply saying “apply it” using a general purpose computer, which merely serves to tie the abstract idea to a particular technological environment (a computer-based operating environment) by using the computer as a tool to perform the abstract idea, and which is not sufficient to amount to a practical application.
Accordingly, because the Step 2A Prong One and Prong Two analysis resulted in the conclusion that the claims are directed to an abstract idea, additional analysis under Step 2B of the eligibility inquiry must be conducted in order to determine whether any claim element or combination of elements amounts to significantly more than the judicial exception.
Step 2B. It has been determined that the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional limitations are directed to document scoring, though at a very high level of generality and without imposing a meaningful limitation on the scope of the claim. Such generic, high-level, and nominal involvement of a computer or computer-based elements for carrying out the invention merely serves to tie the abstract idea to a particular technological environment, which is not enough to render the claims patent-eligible, as noted at pg. 74624 of Federal Register/ Vol. 79, No. 241, citing Alice, which in turn cites Mayo. See, e.g., Alice Corp. Pty. Ltd. v. CLS Bank Int'l, 134 S. Ct. 2347, 2359-60, 110 USPQ2d 1976, 1984 (2014). See also OIP Techs. v. Amazon.com, 788 F.3d 1359, 1364, 115 USPQ2d 1090, 1093-94 (Fed. Cir. 2015) ("Just as Diehr could not save the claims in Alice, which were directed to 'implement[ing] the abstract idea of intermediated settlement on a generic computer', it cannot save OIP's claims directed to implementing the abstract idea of price optimization on a generic computer.") (citations omitted). See also Affinity Labs of Texas LLC v. DirecTV LLC, 838 F.3d 1253, 1257-1258 (Fed. Cir. 2016) (mere recitation of a GUI does not make a claim patent-eligible); Intellectual Ventures I LLC v. Capital One Bank, 792 F.3d 1363, 1370 (Fed. Cir. 2015) ("the interactive interface limitation is a generic computer element").
The additional elements are broadly applied to the abstract idea(s) at a high level of generality ("similar to how the recitation of the computer in the claims in Alice amounted to mere instructions to apply the abstract idea of intermediated settlement on a generic computer," as explained in MPEP § 2106.05(f)) and they operate in well-understood, routine, and conventional manners. Furthermore, generally transmitting, analyzing, and outputting (e.g., displaying) data are examples of insignificant extra-solution activity. The recitation that document scoring is performed by an apparatus/device is the epitome of "mere instructions to implement an abstract idea on a computer".
MPEP § 2106.05(d)(II) sets forth the following:
The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity.
• Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec ... ; TLI Communications LLC v. AV Auto. LLC ... ; OIP Techs., Inc., v. Amazon.com, Inc ... ; buySAFE, Inc. v. Google, Inc ... ;
• Performing repetitive calculations, Flook ... ; Bancorp Services v. Sun Life ... ;
• Electronic recordkeeping, Alice Corp ... ; Ultramercial ... ;
• Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc ... ;
• Electronically scanning or extracting data from a physical document, Content Extraction and Transmission, LLC v. Wells Fargo Bank ... ; and
• A web browser's back and forward button functionality, Internet Patent Corp. v. Active Network, Inc. ...

. . . Courts have held computer-implemented processes not to be significantly more than an abstract idea (and thus ineligible) where the claim as a whole amounts to nothing more than generic computer functions merely used to implement an abstract idea, such as an idea that could be done by a human analog (i.e., by hand or by merely thinking) ...
In addition, when taken as an ordered combination, the elements add nothing that is not already present when the elements are taken individually. There is no indication that the combination of elements integrates the abstract idea into a practical application. Their collective functions merely provide conventional computer implementation. Therefore, when viewed as a whole, these additional claim elements do not provide meaningful limitations that transform the abstract idea into a practical application of the abstract idea, nor does the ordered combination amount to significantly more than the abstract idea itself.
The dependent claims 2 – 18 have been fully considered as well. However, consistent with the findings for the claims above, these claims are likewise directed to the abstract idea of joined features, without integrating it into a practical application and with, at most, a general purpose computer that serves to tie the idea to a particular technological environment, which does not add significantly more to the claims. The ordered combination of elements in the dependent claims (including the limitations inherited from the parent claim(s)) adds nothing that is not already present when the elements are taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Accordingly, the subject matter encompassed by the dependent claims fails to amount to significantly more than the abstract idea.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wong et al., U.S. Patent Application Publication No.: 2008/0104113 (Hereinafter “Wong”), and further in view of Wang et al., U.S. Patent Application Publication No.: 2012/0143792 (Hereinafter “Wang”).
Regarding claim 1, Wong teaches, a method comprising:
combining feature types within one or more levels of feature sets into a joined host level feature set (Wong [0090]: In practice, an overall score can be generated using any combination of the metrics described above (and, in some embodiments, in addition to other suitable metrics).  For this example, the overall score is calculated in response to the domain density score, the anchor text score, the URL string score, and the category need score, and the overall score is also influenced by the link proximity score.  In one embodiment, a "combined" score is generated from the domain density, anchor text, URL string, and category need scores, and that combined score is adjusted using the link proximity score to obtain the downloading priority.);
extracting numerical features and content features from ground truth documents and random documents (Wong [0106]: As mentioned above in connection with the anchor text metric, task 308 may be performed to extract words from the anchor text of the URL and, for each extracted word, calculate a respective hash value that serves as an anchor text token.  Moreover, task 308 may be performed to extract words from the anchor text, identify at least one combination of extracted words, and, for each combination of words, calculate a respective hash value that serves as an anchor text token.  Likewise, process 300 may derive one or more URL string tokens from the character string of the URL (task 310).  As mentioned above in connection with the URL string metric, task 310 may be performed to extract strings from the URL and, for each extracted string, calculate a respective hash value that serves as a URL string token.  Moreover, task 310 may be performed to extract strings from the URL, identify at least one combination of extracted strings, and, for each combination of strings, calculate a respective hash value that serves as a URL string token.);
Wong does not clearly teach, joining the numerical features with the one or more levels of feature sets to create a set of joined features for the ground truth documents and the random documents; However, Wang [0039 – 0040] teaches, “Document features that may be used to determine the value of a page for indexing include page length, topics of the page, number of ads in the page, and the like.  … Additionally, characteristics or attributes of the links between pages can also be considered as features for page selection, referred to herein as "edge features" (i.e., links between pages are represented as edges in the URL graph 110 described below).  Examples of edge features that may be considered can include whether the hyperlink between two pages is an inter-website link or an intra-website link.  Other edge features may include the number of real or separate hyperlinks between the two pages, and so forth.  The edge features can be attached to or associated with the two pages involved, and included as the features 108 or 128 for those pages.”
training a document scoring model utilizing machine learning to score documents using the set of joined features (Wang [0015]: For example, a training set of pages may be sorted into labeled groups based on gathered user behavior data.  Some labeled groups may be assigned a higher priority for being selected for indexing than other labeled groups.  Further, there may be multiple sources of data that can be used to define and determine appropriate labels or classifications for the sorted groups of pages.  Sources of label data may include user behavior information, such as click information, sampled queries and results, bookmark data, relevance data, spam data, abandoned queries, and so forth.  The labels for the groups may be defined based on information combined from the multiple sources to generate a label graph.  The label graph may be a directed graph that represents relative priority of each labeled group for selection for indexing.).
scoring documents with document scores using the document scoring model based upon the content features and the set of joined features with document scores obtained during training (Wang [0032]: In label graph 200, hierarchical selection priority relationships are established for the various different types of labeled groups established for the crawled training pages 120 obtained from the set of training URLs 114.  For example, the training URLs 114 and crawled training pages 120 may be cross-referenced with the user behavior data 122 for sorting the crawled training page 120 into the labeled groups and for establishing the label graph 200.  In the label graph 200, the clicked top-1 pages 214 have the highest priority for selection, as indicated by the edges 220 outbound to unclicked top-1 pages 216 and clicked top-10 pages 218.  Further, clicked top-10 pages 218 have a higher priority than clicked top-1000 pages 222 or unclicked top-10 pages 224, and both of these have a higher priority for selection than unclicked top-1000 pages 226.  In addition, good abandonment pages 228 also have a higher priority than clicked top-1000 pages 222 or unclicked top-10 pages 224, while not-in-index pages 230 have a higher priority than the unclicked top-1000 pages 226.  Highly bookmarked pages 232 and the unclicked top-1000 pages 226 have a higher priority than other unclicked pages 234 and spam or junk pages 236.); and
selectively indexing a subset of the documents based upon the document scores of the documents (Wang [0065]: At block 610, the page selection component 102 selects, for indexing, a subset of the crawled web pages 126 based on the URL graph 110, the extracted features 128 and the model 112.  Thus, when performing the selecting, the model 112 takes into consideration the links between URLs, the features of particular pages, and an established priority hierarchy for various types of user behavior with respect to the pages.). 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to incorporate the teachings of Wang into Wong’s system by adding the extracted features. The references (Wong and Wang) are analogous art directed to the same field of endeavor, namely data indexing. An ordinarily skilled artisan would have been motivated to do so to provide Wong’s system with enhanced machine learning (see Wang [Abstract], [0015], [0032], [0039 – 0040], [0065]). One of the biggest advantages of machine learning algorithms is their ability to improve over time. Machine learning technology typically improves efficiency and accuracy thanks to the ever-increasing amounts of data that are processed.
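For illustration only, the sequence of steps recited in claim 1 (joining features, training a scoring model, scoring documents, and selectively indexing) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Wong, or Wang: the function names, the feature values, the labeling of ground truth documents as 1 and random documents as 0, and the use of a small logistic-regression scorer as the machine learning technique are all hypothetical stand-ins.

```python
import math

def join_features(host_level, numerical):
    """Join host-level features with per-document numerical features (hypothetical layout)."""
    return [*host_level, *numerical]

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_scoring_model(rows, labels, epochs=500, lr=0.1):
    """Fit a tiny logistic-regression scorer by stochastic gradient descent.

    Assumption: this stands in for whatever machine learning technique is used.
    """
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = _sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def score(model, x):
    """Return a document score in (0, 1) for one joined feature vector."""
    weights, bias = model
    return _sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def select_for_indexing(docs, model, threshold=0.5):
    """Keep only the documents whose score exceeds the threshold."""
    return [doc_id for doc_id, x in docs.items() if score(model, x) > threshold]

# Hypothetical training data: ground truth documents (label 1), random documents (label 0).
training_rows = [
    join_features([0.9], [0.8]),
    join_features([0.8], [0.9]),
    join_features([0.2], [0.1]),
    join_features([0.1], [0.2]),
]
training_labels = [1, 1, 0, 0]
model = train_scoring_model(training_rows, training_labels)

# Hypothetical candidate documents to be scored and selectively indexed.
candidates = {
    "doc_a": join_features([0.85], [0.9]),
    "doc_b": join_features([0.1], [0.2]),
}
selected = select_for_indexing(candidates, model)
```

In this sketch only the candidate whose joined features resemble the ground truth documents is selected; the threshold value of 0.5 is likewise an assumption.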
Regarding claim 2, the method of claim 1, wherein a set of search results for a query comprises a document, and the method comprising:
assigning a rank to a document within the set of search results based upon a document score assigned to the document (Wong [0029]: An embodiment of URL scoring module 208 may also be configured to assign downloading priorities to new URLs based upon one or more metrics that otherwise influence the order in which web crawler core module 202 downloads new web pages.).
Regarding claim 3, the method of claim 2, comprising:
displaying the set of search results in response to receiving the query, wherein a document is populated within the set of search results based upon the rank (Wong [0112]: The web crawler application may be included in, or executed by, the web crawler core module 202 in FIG. 2).  If process 300 determines that the downloaded web page (from task 302) contains more outlinked URLs that need to be analyzed (query task 328), then process 300 may be re-entered at task 306 to perform a similar procedure on the next outlinked URL.  Otherwise, process 300 may proceed to download the next web page in an order that is determined by its downloading priority (task 330).). 
Regarding claim 4, the method of claim 1, comprising:
indexing a document based upon a document score exceeding a threshold (Wong [0022]: Web crawler system 200 may include, for example: a web crawler core module 202; a web page classifier 204 coupled to web crawler core module 202; an indexing engine 206 coupled to the web page classifier 204; a URL scoring module 208 coupled to indexing engine 206; and a token analyzer 210 coupled to web page classifier 204.  For this embodiment, web page classifier 204, indexing engine 206, and URL scoring module 208 represent (or are otherwise associated with) an indexing module 212 for web crawler system 200.).
Regarding claim 5, the method of claim 1, comprising:
refraining from indexing a document based upon a document score not exceeding a threshold (Wong [0022]: Web crawler system 200 may include, for example: a web crawler core module 202; a web page classifier 204 coupled to web crawler core module 202; an indexing engine 206 coupled to the web page classifier 204; a URL scoring module 208 coupled to indexing engine 206; and a token analyzer 210 coupled to web page classifier 204.  For this embodiment, web page classifier 204, indexing engine 206, and URL scoring module 208 represent (or are otherwise associated with) an indexing module 212 for web crawler system 200.).
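For illustration only, the complementary limitations of claims 4 and 5 (indexing when a document score exceeds a threshold, and refraining from indexing when it does not) can be sketched as a single partition of scored documents. The function name, the document identifiers, the scores, and the threshold value are hypothetical and are not drawn from the application or the cited references.

```python
def partition_by_threshold(scores, threshold):
    """Split documents into those to index (score exceeds threshold) and those to skip."""
    indexed = [doc_id for doc_id, s in scores.items() if s > threshold]
    skipped = [doc_id for doc_id, s in scores.items() if s <= threshold]
    return indexed, skipped

# Hypothetical document scores; a score equal to the threshold does not exceed it.
document_scores = {"doc_a": 0.9, "doc_b": 0.4, "doc_c": 0.5}
indexed, skipped = partition_by_threshold(document_scores, threshold=0.5)
```

Every document lands in exactly one of the two lists, which is why the two claims read as mirror images of one another.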
Regarding claim 6, the method of claim 1, wherein the one or more levels of feature sets comprise at least one of page level features joined into a joined page level feature set, domain level features joined into a joined domain level feature set, or host level features joined into a joined host level feature set (Wang [0016]: Implementations herein may also take into account, during the page selection, a plurality of features identified from the crawled web pages.  For example, features that may be considered can include: ranking features, such as page rank, domain rank, number of in-links, etc.; URL features, such as URL length, number of hyphens in a URL, number of digits in a URL, URL depth, etc.; click features, such as number of clicks during different types of user browsing behavior; graph propagation features based on URL graph propagation to obtain projected aggregated clicks, user satisfaction, etc.; document features, such as page length, topics of the page, number of ads in the page, etc.; and link or edge features, such as whether a hyperlink is a link between two pages within the same website or at different sites, number of hyperlinks between two pages, and the like.).
Regarding claim 7, the method of claim 1, wherein the document score is indicative of at least one of an importance or quality of the document (Wong [Abstract]: The system employs a plurality of URL scoring metrics that generate individual scores for outlinked URLs contained in a downloaded web page.  For each outlinked URL, the individual scores are combined using an appropriate algorithm or formula to generate an overall score that represents a downloading priority for the outlinked URL.  The web crawler application can then download subsequent web pages in an order that is influenced by the downloading priorities.).
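For illustration only, the scoring described in Wong [Abstract] and [0090] (individual scores combined into an overall score that serves as a downloading priority, with the combined score adjusted by the link proximity score) can be sketched as follows. Wong does not specify the combining formula in the passages quoted here, so the unweighted mean, the multiplicative adjustment, the function names, and the example URLs and score values are all assumptions.

```python
import heapq

def downloading_priority(domain_density, anchor_text, url_string, category_need, link_proximity):
    """Combine four metric scores, then adjust by link proximity.

    Assumption: an unweighted mean followed by multiplication; the cited
    passages do not state the actual formula.
    """
    combined = (domain_density + anchor_text + url_string + category_need) / 4.0
    return combined * link_proximity

def crawl_order(url_scores):
    """Return URLs from highest to lowest downloading priority using a heap."""
    heap = [(-downloading_priority(**s), url) for url, s in url_scores.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# Hypothetical outlinked URLs with hypothetical metric scores.
url_scores = {
    "https://example.com/about": dict(
        domain_density=0.2, anchor_text=0.2, url_string=0.2,
        category_need=0.2, link_proximity=0.5),
    "https://example.com/product/1": dict(
        domain_density=0.8, anchor_text=0.8, url_string=0.8,
        category_need=0.8, link_proximity=1.0),
    "https://example.com/category/shoes": dict(
        domain_density=0.6, anchor_text=0.6, url_string=0.6,
        category_need=0.6, link_proximity=0.9),
}
order = crawl_order(url_scores)
```

The heap stores negated priorities because Python's `heapq` is a min-heap, so the URL with the highest priority is popped first.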
Regarding claim 8, the method of claim 1, wherein the document score is indicative of a relevancy of the document (Wong [0108]: In this example, web crawling process 300 also generates an anchor text score (task 314) in response to the anchor text of the URL being analyzed.  Notably, task 314 may utilize the anchor text token(s) derived by task 308.  As mentioned above in connection with the anchor text metric, for each extracted word, task 314 calculates a word score that indicates the probability of relevance to the desired web page type (e.g., commercial product web pages).  Moreover, task 314 calculates a combined word score that indicates the probability of relevance to the desired web page type.  In this example, one of the individual scores (word score or combined word score) is selected as the anchor text score utilized by process 300.).
Regarding claim 9, the method of claim 1, wherein a numerical feature corresponds to a numerical statistic of a target document (Wong [0030]: The downloading priority may, for example, be a simple numerical score.  In one embodiment, URL scoring module 208 calculates the downloading priority in response to all of the individual scores by processing the individual scores with a suitable algorithm or function.).
Regarding claim 10, the method of claim 1, wherein a document comprises a webpage (Wong [0014]: The downloading order is influenced by URL scoring that is performed for outlinks (and their corresponding URLs) contained in downloaded and analyzed web pages.  The URL scoring makes crawling and indexing of web pages of a desired or specified type more efficient, thus enabling targeted web crawling to be performed with less computation and hardware.).
Regarding claim 11, the method of claim 1, wherein a document comprises a text document (Wong [0030]: For this example, URL scoring module 208 is suitably configured to generate a plurality of scores for a new URL, where each individual score is related to a different scoring or ranking metric.  These metrics include, without limitation: a domain density metric that results in a domain density score for the URL; an anchor text metric that results in an anchor text score for the URL; a URL string score metric that results in a URL string score for the URL; a link proximity metric that results in a link proximity score for the URL; and a category need metric that results in a category need score for the URL--the category need metric may indicate a predicted category for the web page corresponding to the URL.).
Regarding claim 12, the method of claim 1, wherein a numerical feature corresponds to a number of times a target document is linked to (Wong [0069]: The URL string score metric is used to predict the likelihood that a URL string will lead to a web page of the desired type/category.  The URL string represents the actual character string that defines the URL (in contrast to the anchor text, which is the visible rendered link that corresponds to the underlying URL string).  In the example where commercial product web pages are of interest, the URL string score metric results in a URL string score that indicates probability that the outlinked web page is a product page.).
Regarding claim 13, the method of claim 6, wherein a domain level feature corresponds to a feature of a domain associated with a target document (Wong [0033]: Referring to FIG. 2, the indexing rate for each domain is maintained in domain tracking database 220.  Generally, the domain density score resulting from the domain density metric indicates relevance of a URL to a desired web page type.  In the example where commercial product web pages are of interest, for each domain, the domain density metric maintains a count of the total product pages indexed and the total pages processed.). 
Regarding claim 14, the method of claim 6, wherein a host level feature corresponds to a feature of a host associated with a target document (Wang [0067]: Each web server 704 may host or provide one or more web pages 708 having one or more corresponding URLs that may be targeted for crawling by a search engine 710 on the computing device 702.).
Regarding claim 15, the method of claim 1, wherein the machine learning comprises a gradient boosted decision tree regression technique (Wang [0014]: machine-learning techniques for page selection).
Regarding claim 16, the method of claim 1, comprising:
merging the numerical features with the content features for scoring a document using the document scoring model (Wong [0023]: Web crawler core module 202 is configured to download web pages for analysis and indexing by web crawler system 200.  Generally, web crawler system 200 analyzes downloaded web pages, ranks/scores the outgoing links (which correspond to outgoing URLs that point to different web pages) contained in the downloaded web pages using URL scoring module 208, and uses the URL scores to influence the order in which web crawler core module 202 downloads web pages corresponding to the outgoing links.).
Regarding claim 17, the method of claim 1, wherein a numerical feature corresponds to a ratio of an amount of a first type of content within a target document to an amount of a second type of content within the target document (Wong [0030]: URL scoring module 208 calculates a downloading priority (an overall score for the URL) from at least some of the individual scores.  The downloading priority may, for example, be a simple numerical score.  In one embodiment, URL scoring module 208 calculates the downloading priority in response to all of the individual scores by processing the individual scores with a suitable algorithm or function.).
Regarding claim 18, the method of claim 1, wherein the content features comprise textual features of a target document (Wong [0025]: For example, web page classifier 204 may analyze and process: content included in a web page, the URL of the web page; anchor text (i.e., the visible text associated with a hyperlink on the web page) of outgoing links on the web page; the URLs of outgoing links on the web page; and the like.  Web page classifier 204 sends classified web pages (and possibly descriptive data, characterization data, or metadata related to the classified web pages) to indexing engine 206 for further processing.).
Regarding claim 19, Wong teaches, a non-transitory machine readable medium comprising instructions for performing a method, which when executed by a machine, causes the machine to:
combine feature types within one or more levels of feature sets into a joined host level feature set (Wong [0090]: In practice, an overall score can be generated using any combination of the metrics described above (and, in some embodiments, in addition to other suitable metrics).  For this example, the overall score is calculated in response to the domain density score, the anchor text score, the URL string score, and the category need score, and the overall score is also influenced by the link proximity score.  In one embodiment, a "combined" score is generated from the domain density, anchor text, URL string, and category need scores, and that combined score is adjusted using the link proximity score to obtain the downloading priority.);
extract numerical features and content features from ground truth documents and random documents (Wong [0106]: As mentioned above in connection with the anchor text metric, task 308 may be performed to extract words from the anchor text of the URL and, for each extracted word, calculate a respective hash value that serves as an anchor text token.  Moreover, task 308 may be performed to extract words from the anchor text, identify at least one combination of extracted words, and, for each combination of words, calculate a respective hash value that serves as an anchor text token.  Likewise, process 300 may derive one or more URL string tokens from the character string of the URL (task 310).  As mentioned above in connection with the URL string metric, task 310 may be performed to extract strings from the URL and, for each extracted string, calculate a respective hash value that serves as a URL string token.  Moreover, task 310 may be performed to extract strings from the URL, identify at least one combination of extracted strings, and, for each combination of strings, calculate a respective hash value that serves as a URL string token.);
Wong does not clearly teach, join the numerical features with the one or more levels of feature sets to create a set of joined features for the ground truth documents and the random documents; However, Wang [0039 – 0040] teaches, “Document features that may be used to determine the value of a page for indexing include page length, topics of the page, number of ads in the page, and the like.  … Additionally, characteristics or attributes of the links between pages can also be considered as features for page selection, referred to herein as "edge features" (i.e., links between pages are represented as edges in the URL graph 110 described below).  Examples of edge features that may be considered can include whether the hyperlink between two pages is an inter-website link or an intra-website link.  Other edge features may include the number of real or separate hyperlinks between the two pages, and so forth.  The edge features can be attached to or associated with the two pages involved, and included as the features 108 or 128 for those pages.”
train a document scoring model utilizing machine learning to score documents using the set of joined features (Wang [0015]: For example, a training set of pages may be sorted into labeled groups based on gathered user behavior data.  Some labeled groups may be assigned a higher priority for being selected for indexing than other labeled groups.  Further, there may be multiple sources of data that can be used to define and determine appropriate labels or classifications for the sorted groups of pages.  Sources of label data may include user behavior information, such as click information, sampled queries and results, bookmark data, relevance data, spam data, abandoned queries, and so forth.  The labels for the groups may be defined based on information combined from the multiple sources to generate a label graph.  The label graph may be a directed graph that represents relative priority of each labeled group for selection for indexing.).
score documents with document scores using the document scoring model based upon the content features and the set of joined features with document scores obtained during training (Wang [0032]: In label graph 200, hierarchical selection priority relationships are established for the various different types of labeled groups established for the crawled training pages 120 obtained from the set of training URLs 114.  For example, the training URLs 114 and crawled training pages 120 may be cross-referenced with the user behavior data 122 for sorting the crawled training page 120 into the labeled groups and for establishing the label graph 200.  In the label graph 200, the clicked top-1 pages 214 have the highest priority for selection, as indicated by the edges 220 outbound to unclicked top-1 pages 216 and clicked top-10 pages 218.  Further, clicked top-10 pages 218 have a higher priority than clicked top-1000 pages 222 or unclicked top-10 pages 224, and both of these have a higher priority for selection than unclicked top-1000 pages 226.  In addition, good abandonment pages 228 also have a higher priority than clicked top-1000 pages 222 or unclicked top-10 pages 224, while not-in-index pages 230 have a higher priority than the unclicked top-1000 pages 226.  Highly bookmarked pages 232 and the unclicked top-1000 pages 226 have a higher priority than other unclicked pages 234 and spam or junk pages 236.); and 
selectively index a subset of the documents based upon the document scores of the documents (Wang [0065]: At block 610, the page selection component 102 selects, for indexing, a subset of the crawled web pages 126 based on the URL graph 110, the extracted features 128 and the model 112.  Thus, when performing the selecting, the model 112 takes into consideration the links between URLs, the features of particular pages, and an established priority hierarchy for various types of user behavior with respect to the pages.). 
It would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art to incorporate the teachings of Wang into Wong’s system by adding extracted features. The references (Wong and Wang) are analogous art because they are directed to the same field of endeavor, such as data indexing. A person of ordinary skill in the art would have been motivated to do so to provide Wong’s system with enhanced machine learning (see Wang [Abstract], [0015], [0032], [0039 – 0040], [0065]). One of the biggest advantages of machine learning algorithms is their ability to improve over time. Machine learning technology typically improves efficiency and accuracy thanks to the ever-increasing amounts of data that are processed.
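For illustration only, the label graph described in the Wang [0032] passage quoted above can be sketched as a directed graph in which an edge runs from a higher-priority labeled group to a lower-priority one. The edge list below paraphrases that passage; the names EDGES, build_graph, and outranks are hypothetical, and treating priority as transitive (reachability along edges) is an assumption of this sketch, not a statement of how Wang implements the graph.

```python
from collections import defaultdict

# (higher_priority, lower_priority) pairs paraphrased from Wang [0032].
# Group names are shortened; reference numerals are omitted.
EDGES = [
    ("clicked top-1", "unclicked top-1"),
    ("clicked top-1", "clicked top-10"),
    ("clicked top-10", "clicked top-1000"),
    ("clicked top-10", "unclicked top-10"),
    ("clicked top-1000", "unclicked top-1000"),
    ("unclicked top-10", "unclicked top-1000"),
    ("good abandonment", "clicked top-1000"),
    ("good abandonment", "unclicked top-10"),
    ("not-in-index", "unclicked top-1000"),
    ("highly bookmarked", "other unclicked"),
    ("highly bookmarked", "spam or junk"),
    ("unclicked top-1000", "other unclicked"),
    ("unclicked top-1000", "spam or junk"),
]


def build_graph(edges):
    """Build an adjacency map: label -> set of directly lower-priority labels."""
    graph = defaultdict(set)
    for higher, lower in edges:
        graph[higher].add(lower)
    return graph


def outranks(graph, a, b):
    """Return True if b is reachable from a, i.e. a has higher priority than b.

    Assumption of this sketch: priority is transitive along directed edges.
    """
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


label_graph = build_graph(EDGES)
print(outranks(label_graph, "clicked top-1", "spam or junk"))
print(outranks(label_graph, "spam or junk", "clicked top-1"))
```

Under the transitivity assumption, groups with no connecting path (for example, highly bookmarked pages and clicked top-10 pages) are simply not ordered relative to each other.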

Regarding claim 20, Wong teaches, a computing device comprising: 
a memory comprising instructions (Wong [0019]: memory); and 
a processor coupled to the memory, the processor configured to execute the instructions to cause the processor to (Wong [0016]: multiprocessor): 
combine feature types within one or more levels of feature sets into a joined host level feature set (Wong [0090]: In practice, an overall score can be generated using any combination of the metrics described above (and, in some embodiments, in addition to other suitable metrics).  For this example, the overall score is calculated in response to the domain density score, the anchor text score, the URL string score, and the category need score, and the overall score is also influenced by the link proximity score.  In one embodiment, a "combined" score is generated from the domain density, anchor text, URL string, and category need scores, and that combined score is adjusted using the link proximity score to obtain the downloading priority.); 
extract numerical features and content features from ground truth documents and random documents (Wong [0106]: As mentioned above in connection with the anchor text metric, task 308 may be performed to extract words from the anchor text of the URL and, for each extracted word, calculate a respective hash value that serves as an anchor text token.  Moreover, task 308 may be performed to extract words from the anchor text, identify at least one combination of extracted words, and, for each combination of words, calculate a respective hash value that serves as an anchor text token.  Likewise, process 300 may derive one or more URL string tokens from the character string of the URL (task 310).  As mentioned above in connection with the URL string metric, task 310 may be performed to extract strings from the URL and, for each extracted string, calculate a respective hash value that serves as a URL string token.  Moreover, task 310 may be performed to extract strings from the URL, identify at least one combination of extracted strings, and, for each combination of strings, calculate a respective hash value that serves as a URL string token.); 
Wong does not clearly teach: join the numerical features with the one or more levels of feature sets to create a set of joined features for the ground truth documents and the random documents. However, Wang [0039 – 0040] teaches, “Document features that may be used to determine the value of a page for indexing include page length, topics of the page, number of ads in the page, and the like.  … Additionally, characteristics or attributes of the links between pages can also be considered as features for page selection, referred to herein as "edge features" (i.e., links between pages are represented as edges in the URL graph 110 described below).  Examples of edge features that may be considered can include whether the hyperlink between two pages is an inter-website link or an intra-website link.  Other edge features may include the number of real or separate hyperlinks between the two pages, and so forth.  The edge features can be attached to or associated with the two pages involved, and included as the features 108 or 128 for those pages.”
train a document scoring model utilizing machine learning to score documents using the set of joined features (Wang [0015]: For example, a training set of pages may be sorted into labeled groups based on gathered user behavior data.  Some labeled groups may be assigned a higher priority for being selected for indexing than other labeled groups.  Further, there may be multiple sources of data that can be used to define and determine appropriate labels or classifications for the sorted groups of pages.  Sources of label data may include user behavior information, such as click information, sampled queries and results, bookmark data, relevance data, spam data, abandoned queries, and so forth.  The labels for the groups may be defined based on information combined from the multiple sources to generate a label graph.  The label graph may be a directed graph that represents relative priority of each labeled group for selection for indexing.);
score documents with document scores using the document scoring model based upon the content features and the set of joined features with document scores obtained during training (Wang [0032]: In label graph 200, hierarchical selection priority relationships are established for the various different types of labeled groups established for the crawled training pages 120 obtained from the set of training URLs 114.  For example, the training URLs 114 and crawled training pages 120 may be cross-referenced with the user behavior data 122 for sorting the crawled training page 120 into the labeled groups and for establishing the label graph 200.  In the label graph 200, the clicked top-1 pages 214 have the highest priority for selection, as indicated by the edges 220 outbound to unclicked top-1 pages 216 and clicked top-10 pages 218.  Further, clicked top-10 pages 218 have a higher priority than clicked top-1000 pages 222 or unclicked top-10 pages 224, and both of these have a higher priority for selection than unclicked top-1000 pages 226.  In addition, good abandonment pages 228 also have a higher priority than clicked top-1000 pages 222 or unclicked top-10 pages 224, while not-in-index pages 230 have a higher priority than the unclicked top-1000 pages 226.  Highly bookmarked pages 232 and the unclicked top-1000 pages 226 have a higher priority than other unclicked pages 234 and spam or junk pages 236.); and 
selectively index a subset of the documents based upon the document scores of the documents (Wang [0065]: At block 610, the page selection component 102 selects, for indexing, a subset of the crawled web pages 126 based on the URL graph 110, the extracted features 128 and the model 112.  Thus, when performing the selecting, the model 112 takes into consideration the links between URLs, the features of particular pages, and an established priority hierarchy for various types of user behavior with respect to the pages.).
It would have been obvious, before the effective filing date of the claimed invention, to a person having ordinary skill in the art to incorporate the teachings of Wang into Wong’s system by adding extracted features. The references (Wong and Wang) are analogous art because they are directed to the same field of endeavor, such as data indexing. A person of ordinary skill in the art would have been motivated to do so to provide Wong’s system with enhanced machine learning (see Wang [Abstract], [0015], [0032], [0039 – 0040], [0065]). One of the biggest advantages of machine learning algorithms is their ability to improve over time. Machine learning technology typically improves efficiency and accuracy thanks to the ever-increasing amounts of data that are processed.
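For illustration only, the sequence of limitations mapped above (join features, train a scoring model, score documents, selectively index a subset) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Wong, Wang, or the application: all function names, feature values, and the 0.5 threshold are hypothetical; logistic regression trained by stochastic gradient descent stands in for whatever machine learning technique is actually used; and content features are omitted for brevity.

```python
import math


def join_features(doc_numeric, host_features):
    """Append a host-level feature vector to a document's numerical features."""
    return list(doc_numeric) + list(host_features)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights (bias stored last) with per-sample updates."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            xb = list(x) + [1.0]
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, xb)]
    return w


def score(w, x):
    """Return a document score in (0, 1) for a joined feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))


def select_for_indexing(scored_docs, threshold=0.5):
    """Keep only the document ids whose score meets the (hypothetical) threshold."""
    return [doc_id for doc_id, s in scored_docs if s >= threshold]


# Hypothetical training data: label 1 = ground truth document, 0 = random document.
ground_truth = [join_features([0.9, 0.8], [0.7]), join_features([0.8, 0.9], [0.7])]
random_docs = [join_features([0.1, 0.2], [0.3]), join_features([0.2, 0.1], [0.3])]
model = train(ground_truth + random_docs, [1, 1, 0, 0])

# Score two unseen documents and index only those above the threshold.
candidates = {
    "doc-a": join_features([0.85, 0.9], [0.8]),
    "doc-b": join_features([0.15, 0.1], [0.2]),
}
scored = [(doc_id, score(model, x)) for doc_id, x in candidates.items()]
print(select_for_indexing(scored))
```

The threshold-based selection is one possible reading of "selectively index a subset"; a top-k cutoff on the sorted scores would serve the same illustrative purpose.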


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Doyle, US 2020/0125639, Generating Training Data from a Machine Learning Model to Identify Offensive Language
Zheng, US 2009/0248668, Learning Ranking Functions Incorporating Isotonic Regression for Information Retrieval and Ranking
Cossock, US 7,197,497, Method and Apparatus for Machine Learning a Document Relevance Function
Doyle, US 2020/0126533, Machine Learning Model for Identifying Offensive, Computer-Generated Natural Language Text or Speech

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SABA AHMED whose telephone number is (571)270-0236.  The examiner can normally be reached on MON – FRI: 9AM – 5PM EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam can be reached on 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SABA AHMED/
Examiner, Art Unit 2154


/HOSAIN T ALAM/
Supervisory Patent Examiner, Art Unit 2154