Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 29-33, 36-42, 45-48 are rejected under 35 U.S.C. 103 as being unpatentable over Friedlander (US 2008/0082356) in view of Meyer (US 2009/0067756).
Friedlander discloses:
29, 39, 48. (New) A model-assisted selection system for identifying candidates for placement into a cohort (“A cohort is a group of individuals with common characteristics. Frequently, cohorts are used to test the effectiveness of medical treatments”, 0030; “ automatically selecting an optimal control cohort. Attributes are selected based on patient data. Treatment cohort records are clustered to form clustered treatment cohorts. Control cohort records are scored to form potential control cohort members. The optimal control cohort is selected by minimizing differences between the potential control cohort members and the clustered treatment cohorts.”, abstract), the system comprising: 
a data interface (reads on any input/output device or GUI, clients, servers, network or bus, Fig. 1-2); and at least one processing device (client, server, Fig. 1; 206, Fig. 2) programmed to: 
receive, via the data interface, a selection of one or more relevant search terms (performing data mining, searches/queries, “Data mining application 308 searches the records for attributes that most frequently occur in common and groups the related records or members accordingly for display or analysis to the user. This grouping process is referred to as clustering. The results of clustering show the number of detected clusters and the attributes that make up each cluster. Clustering is further described with respect to FIGS. 4A-4B.”, 0037; “Data mining is the process of automatically searching large volumes of data for patterns. Data mining may be further defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Data mining application 308 uses computational techniques from statistics, information theory, machine learning, and pattern recognition.”, 0036); 
receive, via the data interface, a plurality of records (reads on medical records of patients in or out of a cohort) from a database storing records associated with individuals (receiving records via data mining, “Data mining application 308 may be used to cluster records in feature database 304 based on similar attributes. Data mining application 308 searches the records for attributes that most frequently occur in common and groups the related records or members accordingly for display or analysis to the user. This grouping process is referred to as clustering. The results of clustering show the number of detected clusters and the attributes that make up each cluster. Clustering is further described with respect to FIGS. 4A-4B”, 0037; 0036) in a population of individuals (“Patient population records 502 are all records for patients who are potential control cohort members”, 0047; “A control cohort is a group selected from a population that is used as the control”, 0005);
extract, from the plurality of records (e.g., patient records, claim 4 or data mining records, 0036-0037), at least one snippet (reads on any amount of data received in the searching/querying process) based on the one or more relevant search terms and including at least one neighboring term (see Meyer) in addition to the one or more relevant search terms (see Meyer); 
derive, based on the at least one extracted snippet (reads on any amount of data received in the data mining, searching/querying process, 0036-0037; claim 4), one or more feature vectors associated with the plurality of records (clustering involves using feature vectors, Fig. 5; “Each record is assigned to a single cluster, but by using data mining application 308, a user may determine a record's Euclidean dimensional distance for all cluster prototypes. Clustering is performed for the treatment cohort”, 0040); 
score individuals associated with the plurality of records using the one or more feature vectors (scoring reads on determining distance from centroids/centers of clusters to the present vector, feature database, Fig. 5; 304, Fig. 3; “ Scores are generated based on the distance between each patient record and each of the cluster prototypes. Scores closer to zero have a higher degree of similarity to the cluster prototype. The higher the score, the more dissimilar the record is from the cluster prototype”, 0039; “a user may determine a record's Euclidean dimensional distance for all cluster prototypes. Clustering is performed for the treatment cohort. Clinical test control cohort selection program 310 minimizes the sum of the Euclidean distances between the individuals or members in the treatment cohorts and the control cohort”, 0040; “patient B 416 is scored into the cluster prototype or center of cluster 1 406, cluster 2 408, cluster 3 410 and cluster 4 412. A Euclidean distance between patient B 416 and cluster 1 406, cluster 2 408, cluster 3 410 and cluster 4 412 is shown. In this example, distance 1 426, separating patient B 416 from cluster 1 406, is the closest. Distance 3 428, separating patient B 416 from cluster 3 410, is the furthest. These distances indicate that cluster 1 406 is the best fit.”, 0045); and 
determine whether the individuals are candidates for a cohort based on the scoring (“The optimal control cohort is selected by minimizing differences between the potential control cohort members and the clustered treatment cohorts.”, 0008;
“Clinical test control cohort selection program 310 may incorporate an integer programming model, such as integer programming system 806 of FIG. 8. This program may be programmed in International Business Machine Corporation products, such as Mathematical Programming System extended (MPSX), the IBM Optimization Subroutine Library, or the open source GNU Linear Programming Kit. The illustrative embodiments minimize the summation of all records/cluster prototype Euclidean distances from the potential control cohort members to select the optimum control cohort.”, 0040; “ Scores are generated based on the distance between each patient record and each of the cluster prototypes. Scores closer to zero have a higher degree of similarity to the cluster prototype. The higher the score, the more dissimilar the record is from the cluster prototype”, 0039; “a user may determine a record's Euclidean dimensional distance for all cluster prototypes. Clustering is performed for the treatment cohort. Clinical test control cohort selection program 310 minimizes the sum of the Euclidean distances between the individuals or members in the treatment cohorts and the control cohort”, 0040;
“patient B 416 is scored into the cluster prototype or center of cluster 1 406, cluster 2 408, cluster 3 410 and cluster 4 412. A Euclidean distance between patient B 416 and cluster 1 406, cluster 2 408, cluster 3 410 and cluster 4 412 is shown. In this example, distance 1 426, separating patient B 416 from cluster 1 406, is the closest. Distance 3 428, separating patient B 416 from cluster 3 410, is the furthest. These distances indicate that cluster 1 406 is the best fit.”, 0045).
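For illustration only, the distance-based scoring quoted above (0039, 0040, 0045) can be sketched as follows. This is a minimal sketch and is not code from Friedlander; the function names, the two-dimensional prototypes, and the sample record are hypothetical and were chosen only to show that a score closer to zero indicates a better fit.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def score_record(record_vec, prototypes):
    """Map each cluster name to the record's distance from that prototype."""
    return {name: euclidean(record_vec, proto) for name, proto in prototypes.items()}

def best_fit(scores):
    """Return the cluster whose prototype is nearest (smallest score)."""
    return min(scores, key=scores.get)

# Hypothetical cluster prototypes and one hypothetical patient record.
prototypes = {"cluster_1": [1.0, 1.0], "cluster_3": [9.0, 9.0]}
patient_b = [2.0, 1.0]
scores = score_record(patient_b, prototypes)
```

Here `best_fit(scores)` selects the prototype with the smallest distance, mirroring the "best fit" language of paragraph 0045.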
Friedlander fails to particularly call for using neighboring terms, search terms and records associated with individuals in a population.
Meyer teaches using search terms and records associated with individuals in a population (“identifying the most probable correct candidate word can be achieved using each candidate word as a search arguments in an Internet search engine (by using an API, for example), and the measured number of hits from each word forms basis for deciding the most probable version of the word.”, 0014; “able to search the documents electronically (medical records for example, key word searching etc., electronic catalogues, databases with historical documents and information etc.),”, 0002; 0015-0016) and neighboring terms (one term/word/phrase can be used in a query, or a plurality of terms can be used along with words before and/or after the main search term or keyword, Meyer: “whenever the measurement of hits provides a stalemate between candidates, for example an equal number of hits between two candidates, the candidate words are first combined with the previous word relative to the uncertain word under investigation, and then the combined words are used as search argument on the Internet, secondly the at least one succeeding word relative from the word under investigation on the same text line is used in a similar manner. Further, a combination of the at least one previous word, the word under investigation and the at least one succeeding word is also used as a search argument. The number of hits from each combination is used in a confirmation process to decide the most probable version of the words”, 0015-0016).
It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and it is well known to perform data mining or searching using one or more words or phrases. By considering search terms before and after the main keyword used in searching, one can further define the search when searching for patients in a population to consider for a cohort. Each word or phrase can receive its own vector when clustering to determine closeness to the centroid of the cluster.
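For illustration only, a snippet that includes a search term together with neighboring terms before and after it can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from either reference: the function name, the `window` parameter, the single-word search term, and the sample record text are all hypothetical.

```python
import re

def extract_snippets(text, term, window=1):
    """Return snippets holding `term` plus up to `window` words on each side."""
    # Assumption: simple lowercase word tokenization; `term` is a single word.
    words = re.findall(r"\w+", text.lower())
    target = term.lower()
    snippets = []
    for i, word in enumerate(words):
        if word == target:
            start = max(0, i - window)
            # Slice covers the preceding word(s), the term, and the succeeding word(s).
            snippets.append(" ".join(words[start:i + window + 1]))
    return snippets

# Hypothetical record text.
record = "Patient reports chronic asthma symptoms. No asthma medication listed."
snippets = extract_snippets(record, "asthma", window=1)
```

Each returned snippet contains the search term plus one word before and one word after it, so the neighboring terms are available to any later feature derivation.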

30, 40. (New) The system of claim 29, wherein the selection of the one or more relevant search terms is input to the data interface by one or more users (reads on any input/output device or GUI, clients, servers, network or bus, Fig. 1-2 operated by a user, person(s) “Data mining application 308 may be used to cluster records in feature database 304 based on similar attributes. Data mining application 308 searches the records for attributes that most frequently occur in common and groups the related records or members accordingly for display or analysis to the user. This grouping process is referred to as clustering. The results of clustering show the number of detected clusters and the attributes that make up each cluster. Clustering is further described with respect to FIGS. 4A-4B”, 0037). 

31. (New) The system of claim 29, wherein the selection of the one or more relevant search terms is retrieved by the data interface from one or more storage media (data mining/searching data is input by users and patient records, clusters, vectors, search terms are stored, Fig. 3-9).

32, 41. (New) The system of claim 29, wherein the at least one snippet includes one or more additional terms after the one or more relevant search terms (one term/word/phrase can be used in a query, or a plurality of terms can be used along with words before and/or after the main search term or keyword, Meyer: “whenever the measurement of hits provides a stalemate between candidates, for example an equal number of hits between two candidates, the candidate words are first combined with the previous word relative to the uncertain word under investigation, and then the combined words are used as search argument on the Internet, secondly the at least one succeeding word relative from the word under investigation on the same text line is used in a similar manner. Further, a combination of the at least one previous word, the word under investigation and the at least one succeeding word is also used as a search argument. The number of hits from each combination is used in a confirmation process to decide the most probable version of the words”, 0015-0016).

33, 42. (New) The system of claim 29, wherein the at least one snippet includes one or more additional terms before the one or more relevant search terms (one term/word/phrase can be used in a query, or a plurality of terms can be used along with words before and/or after the main search term or keyword, Meyer: “whenever the measurement of hits provides a stalemate between candidates, for example an equal number of hits between two candidates, the candidate words are first combined with the previous word relative to the uncertain word under investigation, and then the combined words are used as search argument on the Internet, secondly the at least one succeeding word relative from the word under investigation on the same text line is used in a similar manner. Further, a combination of the at least one previous word, the word under investigation and the at least one succeeding word is also used as a search argument. The number of hits from each combination is used in a confirmation process to decide the most probable version of the words”, 0015-0016).

36, 45. (New) The system of claim 29, wherein at least one value along at least one dimension of the one or more feature vectors depends on the at least one neighboring term (using a plurality of keywords or phrases when data mining, Figs. 3-9; 
one or more terms/words/phrases can be used in a query along with words before and after the main search term or keyword; Meyer: “whenever the measurement of hits provides a stalemate between candidates, for example an equal number of hits between two candidates, the candidate words are first combined with the previous word relative to the uncertain word under investigation, and then the combined words are used as search argument on the Internet, secondly the at least one succeeding word relative from the word under investigation on the same text line is used in a similar manner. Further, a combination of the at least one previous word, the word under investigation and the at least one succeeding word is also used as a search argument. The number of hits from each combination is used in a confirmation process to decide the most probable version of the words”, 0015-0016). 

37, 46. (New) The system of claim 29, wherein at least one value along at least one dimension of the one or more feature vectors depends on a number of instances of the one or more relevant search terms in the plurality of records (Myer: “identifying the most probable correct candidate word can be achieved using each candidate word as a search arguments in an Internet search engine (by using an API, for example), and the measured number of hits from each word forms basis for deciding the most probable version of the word.”, 0014).
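For illustration only, a feature vector whose values depend on the number of instances of each search term in a record can be sketched as follows. This is a minimal sketch and is not code from either reference; the function name, the tokenization, and the sample text are hypothetical.

```python
import re
from collections import Counter

def count_vector(record_text, search_terms):
    """Return one dimension per search term, holding that term's occurrence count."""
    # Assumption: case-insensitive matching on whole single-word terms.
    counts = Counter(re.findall(r"\w+", record_text.lower()))
    return [counts[term.lower()] for term in search_terms]

# Hypothetical record text and search terms.
vector = count_vector(
    "Asthma noted. Asthma inhaler prescribed.",
    ["asthma", "inhaler", "diabetes"],
)
```

A term that never appears simply contributes a zero along its dimension.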

Claim Rejections - 35 USC § 103
Claims 34-35, 43-44 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Friedlander and Meyer in view of Allen (US 2016/0148114). The combination fails to particularly call for structured data.

34, 43. (New) The system of claim 29, wherein the at least one processing device is further programmed to extract structured information from the plurality of records.

35, 44. (New) The system of claim 34, wherein the one or more feature vectors are further derived based on the extracted structured information.
	Allen teaches structured data (“The corpus ingestion logic 410 operates on a training corpus of information 450 to read and process historical data within the corpus of information 450 (both structured and unstructured data), i.e. content within the corpus of information that is associated with dates/times, so as to associate the content with the particular dates/times. This process may involve analyzing the structured content and/or natural language statements within the corpus of information to identify textual patterns of content and structured fields that specify dates/times and then associating the corresponding textual content with those dates/times. Moreover, with regard to unstructured content, a creation date/time of the unstructured content in the corpus may be used to associate a date/time with the unstructured content.”, 0083; “the patient medical record may comprise a plurality of structured fields for specifying the patient's name, date of birth, address, occupation, contact information, answers to health status questionnaire, and other standard information used to identify the patient and identify a condition of the patient. In addition, the patient medical record may comprise free-form text areas where notes may be included in the patient medical record by medical professionals, e.g., nurses, doctors, medical technicians, lab personnel, and the like, in a natural language manner”, 0103).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and because, by using structured and unstructured data, the searches can involve both set fields and text data as well as natural language data.
Claim Rejections - 35 USC § 103
Claims 38, 47 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Friedlander and Meyer in view of Rawlings (US 2007/0174252). The combination fails to particularly call for normalizing.
38, 47. (New) The system of claim 37, wherein the at least one value depends on a normalized representation of the number of instances.
	Rawlings teaches normalizing data (“ These types of calculations may be used to effectively normalize data among cities with vastly different populations--i.e., having 200 "hits" in a large city may actually indicate that it would be a more difficult recruiting region than a significantly smaller city having the same number of hits. ”, 0127).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and because normalizing data achieves better results across populations and/or groups of patients of varying size.
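
For illustration only, the kind of normalization described in the quoted Rawlings passage (0127) can be sketched as a simple rate. This is a minimal sketch and is not code from Rawlings; the function name and the population figures are hypothetical, and dividing by population size is only one assumed way to normalize a hit count.

```python
def normalized_count(hits, population):
    """Express a raw hit count as a fraction of the population it came from."""
    if population <= 0:
        raise ValueError("population must be positive")
    return hits / population

# Hypothetical figures: the same 200 hits in a large city and in a small city.
large_city_rate = normalized_count(200, 2_000_000)
small_city_rate = normalized_count(200, 100_000)
```

With equal raw counts, the larger city yields the lower normalized value, consistent with the observation that 200 hits in a large city may indicate a more difficult recruiting region.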

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID R VINCENT whose telephone number is (571)272-3080. The examiner can normally be reached ~Mon-Fri 12-8:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DAVID R VINCENT/
Primary Examiner, Art Unit 2123