DETAILED ACTION

Notice of Pre-AIA  or AIA  Status
The present application is being examined under the pre-AIA  first to invent provisions. 

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
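As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 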
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “an ingestion engine to generate a tokenized data source” in claim 1.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim limitation “an ingestion engine to generate a tokenized data source” in claim 1 invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The phrase “ingestion engine” does not appear in the original disclosure outside of claim 1. No defined structure exists that is clearly associated with this phrase.  Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
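(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
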
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

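The following is a quotation of the appropriate paragraphs of pre-AIA  35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –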

(b) the invention was patented or described in a printed publication in this or a foreign country or in public use or on sale in this country, more than one year prior to the date of application for patent in the United States.

Claim(s) 1-3, 5-7, 9-11, 13-14 is/are rejected under pre-AIA  35 U.S.C. 102(b) as being anticipated by Cetintemel et al. (hereinafter Cetintemel), Self-Adaptive User Profiles for Large-Scale Data Delivery.
Regarding Claim 1, Cetintemel discloses a system comprising:
one or more processors [“push-based WWW page dissemination” Abstract; Note: push-based WWW page dissemination requires a computer which has a processor]; 
a data repository having stored thereon a plurality of documents [“Internet” pg. 1, col. 1, line 27]; 
a feature extractor, under control of the one or more processors, to identify data features from the plurality of documents stored on the data repository [“a term is a word that exists in the document” pg. 3, col. 1, line 22]; 
an ingestion engine to generate a tokenized data source from the plurality of documents stored on the data repository [“each document is represented as a vector of term and weight pairs” pg. 3, col. 1, lines 18-19]; 
a memory having stored thereon a set of instructions [“push-based WWW page dissemination” Abstract; Note: push-based WWW page dissemination requires a computer which has memory and instructions] that when executed by the one or more processors causes the system to: 

present, to one or more analysts, a first source datum and a second source datum from the plurality of data elements based at least in part on the identifiers within the data cluster [“Relevance feedback” §2.2, lines 1-3]; 
score the first source datum and the second source datum based at least in part on relevance of the first source datum or the second source datum to a substantive topic [“Users provide feedback to the system about the data items” pg. 2, col. 1, lines 30-31]; 
compare scores of the first source datum to the score of the second source datum [“incremental feedback can update a query (or profile) for each individual document judgment” pg. 3, col. 2, lines 24-27]; and 
providing a computer-based discovery avatar for discovering content within the data repository, wherein activity of the computer-based discovery avatar is optimized based on the comparison of scores [“In order to effectively target the right information to the right people, push-based systems rely upon user profiles that indicate the general information types (but not necessarily the specific data items) that a user is interested in receiving. For users, profiles are a means of passively retrieving relevant information.” pg. 1, col. 1-2, lines 31-3].

Regarding Claim 2, Cetintemel discloses the system of claim 1.  Cetintemel further discloses wherein the feature extractor identifies the data features from the plurality of documents using one or more of the following: 1) a natural language processor [“term and weight pairs” pg. 3, col. 1, line 19], 2) k-means, or 3) latent Dirichlet allocation and topic modeling.

Regarding Claim 3, Cetintemel discloses the system of claim 1.  Cetintemel further discloses wherein the ingestion engine uses white space tokenization to create the tokenized data source [“a term is a word” pg. 3, col. 1, line 22].

Regarding Claim 5, Cetintemel discloses a method comprising:
analyzing extracted data features from a tokenized data source to identify a data cluster, wherein the data cluster includes a plurality of data elements associated with identifiers and extracted data features that share an attribute [“profile can also be represented as a vector (or a collection of vectors), which can be derived from the previously judged document vectors.” pg. 3, col. 1, lines 10-12; Figures 1, 2; “clustering document vectors” pg. 4, col. 1, line 1]; 
presenting, to an analyst, a first source datum for review, from the plurality of data elements from the tokenized data source, based at least in part on the identifiers within the data cluster [“Relevance feedback” §2.2, lines 1-3];

presenting, to the analyst, a second source datum from the plurality of data elements from the data cluster, wherein the second source datum is selected based at least in part on the identifiers within the data cluster [“Relevance feedback” §2.2, lines 1-3]; 
scoring the second source datum based at least in part on relevance of the second source datum to the substantive topic  [“Users provide feedback to the system about the data items” pg. 2, col. 1, lines 30-31]; 
comparing the score of the first source datum to the score of the second source datum [“incremental feedback can update a query (or profile) for each individual document judgment” pg. 3, col. 2, lines 24-27]; and 
generating a computer-based discovery avatar by optimizing a mathematical model based at least in part on the comparison of scores [“In order to effectively target the right information to the right people, push-based systems rely upon user profiles that indicate the general information types (but not necessarily the specific data items) that a user is interested in receiving. For users, profiles are a means of passively retrieving relevant information.” pg. 1, col. 1-2, lines 31-3].

Regarding Claim 6, Cetintemel discloses the method of claim 5.  Cetintemel further discloses further comprising identifying, using a natural language processor, the 

Regarding Claim 7, Cetintemel discloses the method of claim 5.  Cetintemel further discloses  further comprising:
creating the tokenized data source by tokenizing source data, wherein the tokenized data source is based at least in part on white space tokenization [“a term is a word” pg. 3, col. 1, line 22]; and 
vectorizing the tokenized data source [“each document is represented as a vector of term and weight pairs” pg. 3, col. 1, lines 18-19].
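
For illustration only, the two steps mapped above (white space tokenization followed by vectorization into term and weight pairs) can be sketched as follows. This is a minimal sketch and not a characterization of the claimed method or of Cetintemel's implementation: the function names `whitespace_tokenize` and `vectorize` are hypothetical, and raw term counts are assumed as the weights.

```python
from collections import Counter


def whitespace_tokenize(text):
    """Split text on runs of whitespace; punctuation stays attached to terms."""
    return text.split()


def vectorize(tokens):
    """Return a {term: weight} mapping; the weight is assumed to be a raw count."""
    return dict(Counter(tokens))


tokens = whitespace_tokenize("push based data delivery of data")
vector = vectorize(tokens)
```

Under these assumptions, `vector` maps each distinct word to the number of times it occurs, which is one simple form of a "vector of term and weight pairs."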

Regarding Claim 9, Cetintemel discloses the method of claim 5.  Cetintemel further discloses further comprising using a custom feature list to generate the extracted data features [“a term is a word” pg. 3, col. 1, line 22; Note: Using only words from a document can be a custom feature list.].

Regarding Claim 10, Cetintemel discloses the method of claim 5.  Cetintemel further discloses wherein the data cluster is a first data cluster in a plurality of data clusters identified by analyzing the extracted data features [“profile vector” p1, p2, p3 in Figure 1] and the method further comprising selecting an object within each of the plurality of data clusters having a largest magnitude for presentation to the analyst [“cluster representative” pg. 4, col. 1, lines 8-9; “threshold” pg. 4, col. 1, line 7].



Regarding Claim 13, Cetintemel discloses the method of claim 5.  Cetintemel further discloses further comprising:
selecting a second data source [“WWW pages” Abstract]; and 
creating, using the computer-based discovery avatar, a second set of data clusters from the second data source [“Internet users” pg. 1, col. 1, line 27; Note: several users would each have their own user profiles].

Regarding Claim 14, Cetintemel discloses the method of claim 5.  Cetintemel further discloses wherein the computer-based discovery avatar is deployed for use on a plurality of data sources [“WWW pages” Abstract] to create a plurality of data clusters that are scored and used to rank each of the plurality of data sources [“score and then rank-order a collection of documents based on their likelihood of relevance to a particular profile.” §4.3, lines 3-5] according to relevance to the substantive topic [“Yahoo! categories” Abstract].
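
For illustration only, scoring and rank-ordering documents against a profile can be sketched as below. This is a sketch under stated assumptions, not the reference's algorithm: cosine similarity is assumed as the relevance score, documents and the profile are assumed to be sparse {term: weight} mappings, and the names `cosine` and `rank` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def rank(profile, documents):
    """Return document names ordered from most to least similar to the profile."""
    return sorted(documents, key=lambda name: cosine(profile, documents[name]), reverse=True)


profile = {"jazz": 1.0, "music": 1.0}
documents = {
    "a": {"jazz": 1.0, "music": 1.0},
    "b": {"music": 1.0},
    "c": {"sports": 1.0},
}
ordering = rank(profile, documents)
```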

Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA  35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
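(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.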

Claims 4, 12 is/are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cetintemel in view of Hsu et al. (hereinafter Hsu), A Practical Guide to Support Vector Classification.
Regarding Claim 4, Cetintemel discloses the system of claim 1.  Cetintemel further discloses wherein the computer-based discovery avatar is a first computer-based discovery avatar based on a first mathematical model [“profile vector” Figure 1] and a set of instructions that when executed by the one or more processors causes the system to:
create a second computer-based discovery avatar by optimizing a second mathematical model [“Incorporating a document vector into a profile vector” Figure 1; Note: By updating the existing profile vector with a new document vector a new updated profile vector is created.]; 
create a cross-trained mathematical model incorporating a second attribute from the first mathematical model inherent in the first computer-based discovery avatar within the second computer-based discovery avatar [“incorporating a new document vector into a profile vector” pg. 4, col. 2, lines 20-21; Note: the updated model uses the attributes of the original model as a starting point before updating.].
However, Cetintemel fails to explicitly disclose validate the cross-trained mathematical model by deploying the second computer-based discovery avatar on the tokenized data source.

It would have been obvious to one having ordinary skill in the art, having the teachings of Cetintemel and Hsu before him at the time of invention, to modify the data management system of Cetintemel to incorporate validation of Hsu.
Given the advantage of validating a model to ensure model accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification.

Regarding Claim 12, Cetintemel discloses the method of claim 5.
However, Cetintemel fails to explicitly disclose wherein the computer-based discovery avatar categorizes the tokenized data source based at least in part on use of support vector machines.
Hsu discloses wherein the computer-based discovery avatar categorizes the tokenized data source based at least in part on use of support vector machines [“support vector machine (SVM) is a popular classification technique” Abstract; Table I].
It would have been obvious to one having ordinary skill in the art, having the teachings of Cetintemel and Hsu before him at the time of invention, to modify the method of Cetintemel to incorporate SVMs of Hsu for classification.
Given the advantage of ease of classification by using a useful, well-tested, and well-known classification technique, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 8 is/are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cetintemel in view of Chi et al. (hereinafter Chi), U.S. Patent Application Publication 2003/0018636.
Regarding Claim 8, Cetintemel discloses the method of claim 5.
However, while Cetintemel discloses clustering in general, Cetintemel fails to explicitly disclose wherein analyzing the extracted data features from the tokenized data source to identify the data cluster is based on k-means clustering or latent Dirichlet allocation (LDA) and topic modeling.
Chi discloses wherein analyzing the extracted data features from the tokenized data source to identify the data cluster is based on k-means clustering [“Multi-modal clustering” and “using a type of multi-modal clustering such as K-means” Abstract] or latent Dirichlet allocation (LDA) and topic modeling.
It would have been obvious to one having ordinary skill in the art, having the teachings of Cetintemel and Chi before him at the time of invention, to modify the clustering of Cetintemel to incorporate K-means clustering of Chi.
Given the advantage of a substitution of the known element of K-means clustering to obtain predictable results and the use of a known technique of K-means clustering to improve the combination in a similar way, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 15 is/are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cetintemel.
Regarding Claim 15, Cetintemel discloses the method of claim 14.

Therefore, it would have been obvious to a person of ordinary skill in the art at the time of filing the application to use various data in the method because such data does not functionally relate to the steps in the method claimed and because the subjective interpretation of the data does not patentably distinguish the claimed invention.

Claims 16, 18, 19 is/are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cetintemel in view of Stading, U.S. Patent Application Publication 2008/0244429.
Regarding Claim 16, Cetintemel discloses a method comprising:
analyzing, using a natural language processor [“term and weight pairs” pg. 3, col. 1, line 19], data features extracted from a data source to determine a one or more data clusters [“clustering document vectors” pg. 4, col. 1, line 1; “profile vector” pg. 4, col. 1, line 17], wherein the one or more data clusters include extracted data features that 
presenting, to one or more analysts, data elements from the one or more data clusters for review and scoring, wherein the data elements from the one or more data clusters are selected based at least in part on the identifiers relating to the super-set topic [“Users provide feedback to the system about the data items” pg. 2, col. 1, lines 30-31; “Relevance feedback” pg. 3, col. 1, line 47; “incremental feedback” pg. 3, col. 2, line 24]; 
generating a computer-based discovery avatar parent, by optimizing a first mathematical model, based at least in part on a comparison of scores of the data elements provided by the one or more analysts [“profile vector” Figure 1]; 
generating a computer-based discovery avatar child, by optimizing a second mathematical model, based on a second set of extracted data features that share a second attribute that is related to both the super-set topic and a subset topic [“the document vector is incorporated into that cluster and the cluster representative is repositioned” pg. 4, col. 1, lines 7-9; Fig. 1; “Yahoo! categories” Abstract].
However, Cetintemel fails to explicitly disclose generating a graphical user interface to receive, from the one or more analysts, a search request and present, to the one or more analyst, the source data based on the computer-based discovery avatar parent and the computer-based discovery avatar child in response to the search request.

It would have been obvious to one having ordinary skill in the art, having the teachings of Cetintemel and Stading before him at the time of invention, to modify the method of classification of Cetintemel to incorporate the generation of a GUI of Stading.
Given the advantage of providing a way for a user to interact with the system in order to increase its usefulness, one having ordinary skill in the art would have been motivated to make this obvious modification.

Regarding Claim 18, Cetintemel and Stading disclose the method of claim 16.  Cetintemel further discloses wherein the computer-based discovery avatar parent is memorialized and locked from further iterative improvement [“set to 0…active vectors will not be changed with feedback, and virtually no adaptation will take place” pg. 6, col. 1, lines 5-7].

Regarding Claim 19, Cetintemel and Stading disclose the method of claim 16.  Cetintemel further discloses further comprising generating a cross-trained mathematical model based on the first mathematical model and the second mathematical model [“incorporating a new document vector into a profile vector” pg. 4, col. 2, lines 20-21].

Claim 17 is/are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cetintemel and Stading in view of Rennison et al. (hereinafter Rennison), U.S. Patent 6,154,213.
Regarding Claim 17, Cetintemel and Stading disclose the method of claim 16.
However, Cetintemel fails to explicitly disclose wherein the subset topic is defined by terms that are included in a set of terms used to define the super-set topic or by terms that are additive to a set of terms used to define the super-set topic.
Rennison discloses wherein the subset topic is defined by terms that are included in a set of terms used to define the super-set topic or by terms that are additive to a set of terms used to define the super-set topic [“"X is subtopic of Y" (inverse "is supertopic of"): this is a general parent/child relation, where the important defining factor is that more information is specified in the child than in the parent (e.g. a distinguishing feature, another modifying term, etc.).” col. 23, lines 26-32; “The basic idea is that if a document is "about" a subtopic, e.g. jazz, that entails that it is also loosely "about" the supertopic music.” col. 31, lines 36-38].
It would have been obvious to one having ordinary skill in the art, having the teachings of Cetintemel, Stading, and Rennison before him at the time of invention, to modify the combination to incorporate terms used for supertopics and subtopics.

Claim 20 is/are rejected under pre-AIA  35 U.S.C. 103(a) as being unpatentable over Cetintemel and Stading in view of Harris, U.S. Patent Application Publication 2003/0187584.
Regarding Claim 20, Cetintemel and Stading disclose the method of claim 16.
However, Cetintemel fails to explicitly disclose wherein scoring the data elements includes dynamically scoring the data elements by assigning a score as a measure of certainty based on a distance between a positioning of a respective data element of the data elements from a hyper-plane.
Harris discloses wherein scoring the data elements includes dynamically scoring the data elements by assigning a score as a measure of certainty based on a distance between a positioning of a respective data element of the data elements from a hyper-plane [“a score for a particular sample is the Euclidean distance from that hyperplane” ¶31, 35].
It would have been obvious to one having ordinary skill in the art, having the teachings of Cetintemel, Stading, and Harris before him at the time of invention, to modify the combination to incorporate the scoring technique of Harris.
Given the advantage of scoring the data to improve classification accuracy, one having ordinary skill in the art would have been motivated to make this obvious modification.
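
For illustration only, a score based on distance from a hyperplane can be sketched as below. This is a minimal sketch, not Harris's implementation or the claimed scoring: it assumes a hyperplane defined by a weight vector `w` and offset `b` (w·x + b = 0), and the function name `hyperplane_distance` is hypothetical. The sign indicates the side of the hyperplane; the magnitude is the Euclidean distance, which can be read as a measure of certainty.

```python
import math


def hyperplane_distance(w, b, x):
    """Signed Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("weight vector must be non-zero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


far_score = hyperplane_distance([3.0, 4.0], -5.0, [3.0, 4.0])   # (9 + 16 - 5) / 5
near_score = hyperplane_distance([3.0, 4.0], -5.0, [0.0, 0.0])  # (0 + 0 - 5) / 5
```

In this example the first point lies farther from the hyperplane than the second, and on the opposite side.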

Examiner’s Note
The Examiner respectfully requests that the Applicant, in preparing responses, fully consider the entirety of the reference(s) as potentially teaching all or part of the claimed invention.  It is noted, REFERENCES ARE RELEVANT AS PRIOR ART FOR ALL THEY CONTAIN.  “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned.  They are part of the literature of the art, relevant for all they contain.”  In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).  A reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including non-preferred embodiments (see MPEP 2123).  The Examiner has cited particular locations in the reference(s) as applied to the claim(s) above for the convenience of the Applicant.  Although the specified citations are representative of the teachings of the art and are applied to the specific limitations within the individual claim(s), typically other passages and figures will apply as well.

Conclusion
Applicant is reminded that in amending in response to a rejection of claims, the patentable novelty must be clearly shown in view of the state of the art disclosed by the references cited and the objections made.  Applicant must also show how the amendments avoid such references and objections.  See 37 CFR §1.111(c).  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT H BEJCEK II whose telephone number is (571)270-3610.  The examiner can normally be reached on Monday - Friday: 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached on (571) 270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/R.B./            Examiner, Art Unit 2123

/ALEXEY SHMATOV/           Supervisory Patent Examiner, Art Unit 2123