Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 7, and 10 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Johns (US 2013/0268526).
Johns discloses:
(Previously Presented) A computing system, comprising:
at least one processor (Fig. 1);
memory, operatively connected to the at least one processor and including instructions that, when executed by the at least one processor, perform a method, the method to at least:
receive first and second records;
normalize the first and second records (does not specify exactly how records are normalized, and reads on e.g., normalizing counts of tokens, scores or documents “normalize the counts such ten occurrences of a word in a single page document could be normalized to be equivalent to fifty occurrences of the same word in a five page document”, 0057; “Because the process defines that a similarity score is calculated for every document in the dataset, that total set of similarity scores can be used to normalize each of those similarity scores to something comparable across datasets. Given extremely normalized similarity scores, a given designated document can yield a useful single set of similar documents derived from multiple datasets, by applying the condition that a given normalized similarity score is beyond some standard threshold.”, 0066, 0094, “normalizing the counts of the tokens between the related document and all documents in the source”, 0088) to obtain corresponding first and second entity tokens (comparing tokens, e.g., “For any given document of text (or in some cases simply text typed in by a user or input from multiple documents), a similarity calculation determines for each other document in the dataset a numeric similarity score. The computing process to determine the similarity score involves, with the possible aid of the statistics calculated during the computing process, comparing each token's count in a designated document (or text) to its matching token's count in each other document in the dataset. For a given token, the magnitude of token counts between two documents has a directly proportional contribution to the magnitude of the similarity score (i.e. the closer the token counts are for each token included in two compared documents, the more significant the contribution to improving the similarity score).”, 0062-0068; “This approach helps create consistency when terms are being counted and compared between any two documents. 
The computing process must also determine tokens in each document. For example, each word in a document can be considered a token.”, 0052);
	determine, based on at least the first and second entity tokens a match result indicative of whether the first and second records resolve to a same item (0061-0068, e.g., “Whenever the step of determining a numeric value that measures the magnitude of each token's significance in the dataset is utilized in the computing process, a given token's value of significance has a directly proportional contribution to the magnitude of the similarity score between two documents. In other words, if a particular token's value of significance is high, then the closeness of that particular token's count between documents is of increased importance in the similarity calculation between those documents. In the preferred embodiment, this means that if a given token count is exactly the same in a designated document and a compared document, the higher the value of significance for that particular token, the more favorable impact that exact token match will have in the similarity calculation between the documents”, 0064); and
	in response to the match result indicating that the first and second records do not resolve to the same item (2 or more records can be similar or the same based on comparisons, token counts, similarity scores), obtain a search result, based on at least one of the first and second entity tokens, the search result being added as a record to be processed by the computing system (adding similar but not duplicate documents to a classification/cluster/category and/or searching on one or more records, “collecting any of the related documents into a collection; permitting at least one related document returned to be selected for a further search utilizing the content of the at least one related document as the search criteria in a selected source to return additional related documents”, 0046;
“By conducting such a similarity calculation for all documents in a dataset, the top N most similar documents or least similar documents to a designated document or text can then easily be obtained. A given similarity score is consistently comparable to any other similarity score in the dataset, but it may not be comparable to a similarity score calculated by passing the designated document through some other entirely different dataset. Because the process defines that a similarity score is calculated for every document in the dataset, that total set of similarity scores can be used to normalize each of those similarity scores to something comparable across datasets”, 0067; “permitting the related document returned to be selected for a further search utilizing the text of the related document as the search terms/criteria in a selected source”, 0089).
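For illustration only, the kind of token-count comparison described in the cited passages of Johns (normalizing counts for document length, then scoring how close each token's count is between two documents, weighted by token significance) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Johns or from the application: the function names, the whitespace tokenizer, the closeness formula, and the default significance weight of 1.0 are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: each lowercased, whitespace-separated word is one token.
    return text.lower().split()


def normalized_counts(text):
    # Divide raw token counts by document length so that documents of
    # different lengths can be compared on the same scale.
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {}
    return {tok: n / total for tok, n in Counter(tokens).items()}


def similarity(doc_a, doc_b, significance=None):
    # Hypothetical score in [0, 1]: for each token, the closer the normalized
    # counts are, the larger the contribution; each contribution is scaled by
    # the token's significance weight (1.0 when none is supplied).
    a = normalized_counts(doc_a)
    b = normalized_counts(doc_b)
    significance = significance or {}
    score = 0.0
    weight_total = 0.0
    for tok in set(a) | set(b):
        count_a = a.get(tok, 0.0)
        count_b = b.get(tok, 0.0)
        closeness = 1.0 - abs(count_a - count_b) / max(count_a, count_b)
        weight = significance.get(tok, 1.0)
        score += weight * closeness
        weight_total += weight
    return score / weight_total if weight_total else 0.0
```

Under these assumptions, a short document and a longer document with the same token proportions score as identical, and raising the significance of a token that matches exactly raises the overall score.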

2.	(Original) The computing system of claim 1 and further comprising:
a record update component that receives the match result and updates a record of items based on the match result (collections of records can be updated automatically or manually by a user adding a document to a collection, “collecting any of the related documents into a collection; 
permitting at least one related document returned to be selected for a further search utilizing the content of the at least one related document as the search criteria in a selected source to return additional related documents”, 0046; updating search results and adding or not adding a record to a cluster/classified groups 0002, 0005 or see “permitting the related document returned to be selected for a further search utilizing the text of the related document as the search terms/criteria in a selected source”, 0089; searching on one or more records).

3.	(Previously Presented) The computing system of claim 2 wherein the record update component is configured to, in response to the match result indicating that the first and second records resolve to the same item, aggregate a superset of the attributes (attributes are not further defined and read on any criteria or statistics pertaining to a record/document; aggregating queries, or records comprising words/tokens, “with the possible aid of the statistics calculated during the computing process, comparing each token's count in a designated document (or text) to its matching token's count in each other document in the dataset.”, 0062;
“the more favorable impact that exact token match will have in the similarity calculation between the documents”, 0064) in the first and second records and update the same item in the record of items based on the superset of attributes (updating search results and adding a record to a cluster/classified groups 0002, 0005 or see “permitting the related document returned to be selected for a further search utilizing the text of the related document as the search terms/criteria in a selected source”, 0089; 
searching on one or more records, “By conducting such a similarity calculation for all documents in a dataset, the top N most similar documents or least similar documents to a designated document or text can then easily be obtained. A given similarity score is consistently comparable to any other similarity score in the dataset, but it may not be comparable to a similarity score calculated by passing the designated document through some other entirely different dataset. Because the process defines that a similarity score is calculated for every document in the dataset, that total set of similarity scores can be used to normalize each of those similarity scores to something comparable across datasets”, 0067).

4.	(Previously Presented) The computing system of claim 3 wherein the method further comprises accessing a set of previously learned resolutions (reads on similarity scores, classifications, “different researchers may search different databases attempting to find the same type of documents. In other words, two different researchers may think that a given document they are searching for should be contained in two different databases due to their own notions of the proper categorization of the searched for document”, 0005;
“permitting the related document returned to be selected for a further search utilizing the text of the related document as the search terms/criteria in a selected source”, 0089) to identify whether a previously learned resolution indicates that the first and second entity tokens resolve to the same item (“different researchers may search different databases attempting to find the same type of documents. In other words, two different researchers may think that a given document they are searching for should be contained in two different databases due to their own notions of the proper categorization of the searched for document”, 0005; “if a particular token's value of significance is high, then the closeness of that particular token's count between documents is of increased importance in the similarity calculation between those documents. In the preferred embodiment, this means that if a given token count is exactly the same in a designated document and a compared document, the higher the value of significance for that particular token, the more favorable impact that exact token match will have in the similarity calculation between the documents”, 0064).

7.	(Previously Presented) The computing system of claim 1 and further comprising:
a partition component that receives an input record set and that partitions the input record set into blocks based on partitioning criteria (reads on identifying words, stop words or keywords in a record or parsing, see e.g., “parsing out the relevant text from any markup in a document and all documents in the source”, 0067; “The computing process may also transform phrases into individual tokens by, for example, taking a multiword phrase and making it into a single token in all documents.”, 0055).

10.	(Previously Presented) The computing system of claim 1 wherein each of the first and second records includes an attribute of the subject of the record (reads on records may be e.g., documents, words, sentences, categories, search terms, keywords, stop words, “calculating other statistics that apply to each token in the document and all documents in the source; statistics; and determining a numeric value that measures the magnitude of each token's significance in the source”, 0067; “permitting the related document returned to be selected for a further search utilizing the text of the related document as the search terms/criteria in a selected source”, 0089). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5-6, 8-9, and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Johns, as set forth above, in view of Simard (US 2015/0019463).

5.	(Original) The computing system of claim 4 and further comprising:
a supervised machine learning system that provides the set of previously learned resolutions.
6.	(Previously Presented) The computing system of claim 5 wherein the method further comprises providing the match result to the supervised machine learning system to update the previously learned resolutions (“permitting the related document returned to be selected for a further search utilizing the text of the related document as the search terms/criteria in a selected source”, 0089, 0064).
	Although Johns discloses classifying, Johns fails to particularly call for supervised machine learning, as specified in claims 5-6.
	Simard teaches supervised machine learning (“A supervised learning algorithm takes a set of input/output pairs and predicts a function that aims at predicting the output given the input”, 0307; “Classification as used herein is the task of predicting a class label given an input item. For that purpose, use is made of a supervised learning algorithm, which can automatically infer a function that maps an input feature representation to a class label from a set of labeled items, i.e., items for which the correct class has been identified by a human labeler”, 0348).
	It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and an SVM/machine learning algorithm can provide an effective and efficient method to classify records into 

8.	(Previously Presented) The computing system of claim 7 wherein the partition component partitions the input record set into blocks (reads on identifying words, stop words or keywords in a record or parsing, see e.g., “parsing out the relevant text from any markup in a document and all documents in the source”, 0067; “The computing process may also transform phrases into individual tokens by, for example, taking a multiword phrase and making it into a single token in all documents.”, 0055) based on geographic location information contained in each record in the input record set (Simard: “With regard to the "months" dictionary, for the word "May," all the instances of "may" throughout the entire corpus might be considered. For each instance of "may," the two words before and the two words after the word "may" might be examined, for example. Based on those four words, a prediction may be made as to whether the word in the middle ("may") is a month or not”, 0257).
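For illustration only, partitioning an input record set into blocks based on a partitioning criterion, as recited in claims 7-8, might be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Johns, Simard, or the application: the function name, the dictionary record format, and the use of a `postal_code` field as the geographic key are hypothetical.

```python
from collections import defaultdict


def partition_into_blocks(records, key="postal_code"):
    # Group records that share the same value of the partitioning attribute.
    # Records that lack the attribute are collected under the key None.
    blocks = defaultdict(list)
    for record in records:
        blocks[record.get(key)].append(record)
    return dict(blocks)


# Hypothetical input record set.
records = [
    {"name": "Acme Corp", "postal_code": "98052"},
    {"name": "ACME Corporation", "postal_code": "98052"},
    {"name": "Globex", "postal_code": "10001"},
]
blocks = partition_into_blocks(records)
```

In such a scheme, only records that fall in the same block would later be compared with one another, which reduces the number of pairwise comparisons.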

9.	(Previously Presented) The computing system of claim 8 and further comprising:
a plurality of different business subsystems, each having a record set, the record sets from the plurality of different business subsystems comprising the input record set.
	Johns discloses a plurality of record sets from a plurality of sources (“different researchers may search different databases attempting to find the same type of documents. In other words, two different researchers may think that a given document they are searching for should be contained in two different databases due to their own notions of the proper categorization of the searched for document”, 0005) but fails to particularly call for the record sets to be from business subsystems.
Simard teaches business subsystems (“A thorough approach may require building a large number of classifiers, corresponding to the various types of information, activities, and products”, abstract).
It would have been obvious to combine the references before the effective filing date because they are in the same field of endeavor and adding business subsystems can include searching on different types of business products or categories.  Labeling where the data comes from does not change the algorithm and amounts to non-functional descriptive material.  The examiner takes official notice that it is well-known for business/research organizations to have different departments or 

11.	(Previously Presented) The computing system of claim 10 and further comprising:
a vector generator that generates a similarity vector with vector values corresponding to each attribute in a normalized form, the vector values being indicative of whether the corresponding attributes in a first and second entity tokens match one another (vectors are used in classifying/clustering records Simard: “A feature f.sub.j(i, d) is a vector function of the document d, defined over each of the token position i. The featurization of the document is defined as f(d)=(f.sub.0 (., d), . . . , f.sub.J-1(., d)), where J is the number of individual features.”, 0386; “each token 1412 has a corresponding feature vector 1414 of dimension 2, as illustrated in FIG. 14”, 0419). It would be obvious to use vectors in a classification system to determine how records compare to each other or to a centroid/cluster.

12.	(Previously Presented) The computing system of claim 11 and further comprising:
a threshold component that identifies whether a similarity measure meets a threshold value and, if so, provides the match result to indicate that the first and second records resolve to the same item (classification inherently uses some type of threshold/hyperplane or centroid to decide which category/cluster input data belongs to, “A default setting or user selected setting could be utilized to create a threshold value for a similarity score that must be achieved for a document to be included in the collection or the maximum/minimum number of documents that can be included in the collection”, 0096;
Simard: “Another could come from semantic concepts automatically derived from clustering words based on a large database of documents.”, 0214).

13.	(Previously Presented) The computing system of claim 12 and further comprising:
a weighting component (reads on level of importance or search terms a user selects for a query) that identifies weights for each vector value and generates the similarity measure based on a weighted combination of the vector values (using weights, “Moreover, if a user wanted to alter the weighting assigned to the step of determining a numeric value that measures the magnitude of each token's significance in the dataset to .
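For illustration only, the chain recited in claims 11-13 (a similarity vector over normalized attributes, a weighted combination of the vector values, and a threshold test that yields the match result) might be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Johns, Simard, or the application: the function names, the normalization rule, the example weights, and the default threshold of 0.75 are hypothetical.

```python
def normalize_value(value):
    # Assumption: normalization lowercases and collapses whitespace.
    return " ".join(str(value).lower().split())


def similarity_vector(rec_a, rec_b, attributes):
    # One value per attribute: 1.0 if both records carry the attribute and
    # the normalized values match, otherwise 0.0.
    vector = []
    for attr in attributes:
        both_present = attr in rec_a and attr in rec_b
        match = both_present and (
            normalize_value(rec_a[attr]) == normalize_value(rec_b[attr])
        )
        vector.append(1.0 if match else 0.0)
    return vector


def weighted_similarity(vector, weights):
    # Weighted combination of the vector values, scaled to [0, 1].
    total = sum(weights)
    if not total:
        return 0.0
    return sum(v * w for v, w in zip(vector, weights)) / total


def is_match(rec_a, rec_b, attributes, weights, threshold=0.75):
    # The match result is positive when the measure meets the threshold.
    vector = similarity_vector(rec_a, rec_b, attributes)
    return weighted_similarity(vector, weights) >= threshold
```

With attributes name, city, and phone, two records that agree on name and city but differ on phone would resolve to the same item under weights of 2, 1, 1 but not under weights of 1, 1, 2, which shows how the choice of weights affects the match result.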

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID R VINCENT whose telephone number is (571)272-3080.  The examiner can normally be reached on ~Mon-Fri 12-8:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have 






/DAVID R VINCENT/Primary Examiner, Art Unit 2123