DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 12-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claims 12 and 13 recite “the plurality of database servers.” There is insufficient antecedent basis for this term. It appears as though claims 12 and 13 were intended to depend on claim 10, not claim 9. For the purposes of examination, claims 12 and 13 are read as depending on claim 10, in order to provide sufficient antecedent basis. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 9-11, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cossock, US 2004/0215606 A1 (hereinafter “Cossock”), in view of Guggilla et al., US 2019/0065991 A1 (hereinafter “Guggilla”).

As per claims 1, 19, and 20, Cossock teaches:
acquiring, by the server, document data associated with respective documents from the plurality of documents (Cossock ¶ 0045), where properties of respective documents are acquired;
for each document from the plurality of documents, generating, by the server employing a Machine Learning Algorithm (MLA), a respective document vector based on the respective document data (Cossock ¶ 0045), where a set of features is generated for a document, the MLA having been trained:
based on a given training document-query pair associated with a respective relevance score, the relevance score being indicative of a relevance of a training document in the given training pair to a training query in the given training pair (Cossock ¶¶ 0038-43), where scores are assigned to queries manually for the purpose of training,
to generate (i) a training document vector for the training document and (ii) a training query vector for the training query, such that a proximity value between (i) the training document vector of the training document and (ii) the training query vector of the training query is representative of the relevance score (Cossock ¶ 0044, “minimizing an error associated with the training relevance scores and the relevance scores produced by the relevance function”), where the trained relevance function is based on a set of features, i.e., the respective vectors (Cossock ¶ 0045).

Cossock, however, does not teach:
storing, by the server, the plurality of documents as groups of documents in the database system, each group of documents being associated with a respective group vector,
a given group of documents having documents associated with document vectors that are in a spatial proximity to the respective group vector.

The analogous and compatible art of Guggilla, however, teaches clustering documents (Guggilla ¶ 0027, “k-means machine learning for clustering”) around a value – the claimed group vector – within spatial proximity (Guggilla ¶ 0027, “the output is the property value for the object. This value is the average of the values of its K-nearest neighbors.”).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Guggilla with those of Cossock to apply k-NN regression of Guggilla to the output of the trained relevance function of Cossock in order to assist in document recall by identifying a group of similar documents.

As per claim 2, the rejection of claim 1 is incorporated, but Cossock does not teach:
wherein the spatial proximity is indicative of the documents in the given group of documents being similar to one another.

The analogous and compatible art of Guggilla, however, teaches clustering documents (Guggilla ¶ 0027, “k-means machine learning for clustering”) around a value – the claimed group vector – within spatial proximity (Guggilla ¶ 0027, “the output is the property value for the object. This value is the average of the values of its K-nearest neighbors.”).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Guggilla with those of Cossock to apply k-NN regression of Guggilla to the output of the trained relevance function of Cossock in order to assist in document recall by identifying a group of similar documents.

As per claim 3, the rejection of claim 1 is incorporated, but Cossock does not teach:
determining, by the server, a respective group vector for each group of documents based on the document vectors associated with the plurality of documents.

The analogous and compatible art of Guggilla, however, teaches determining a cluster (Guggilla ¶ 0027, “k-means machine learning for clustering”) for documents within spatial proximity to a value – the claimed group vector – for the cluster (Guggilla ¶ 0027, “the output is the property value for the object. This value is the average of the values of its K-nearest neighbors.”).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Guggilla with those of Cossock to apply k-NN regression of Guggilla to the output of the trained relevance function of Cossock in order to assist in document recall by identifying a group of similar documents.

As per claim 4, the rejection of claim 1 is incorporated, but Cossock does not teach:
wherein the groups of documents comprise K number of groups, K being a pre-determined number.

The analogous and compatible art of Guggilla, however, teaches determining k clusters (Guggilla ¶ 0027, “k-means machine learning for clustering”), where k-means clustering clusters into k groups (e.g., Specification ¶ 0170).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Guggilla with those of Cossock to apply k-NN regression of Guggilla to the output of the trained relevance function of Cossock in order to assist in document recall by identifying a group of similar documents.

As per claim 5, the rejection of claim 1 is incorporated, but Cossock does not teach:
wherein the method further comprises grouping the plurality of documents into the groups of documents.
The analogous and compatible art of Guggilla, however, teaches clustering documents into groups (Guggilla ¶ 0027).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Guggilla with those of Cossock to apply k-NN regression of Guggilla to the output of the trained relevance function of Cossock in order to assist in document recall by identifying a group of similar documents.

As per claim 6, the rejection of claim 5 is incorporated, but Cossock does not teach:
executing, by the server, a K-means-type algorithm onto the document vectors associated with the plurality of documents thereby determining the group vectors and the respectively associated groups of documents of the plurality of documents.

The analogous and compatible art of Guggilla, however, teaches a k-means clustering algorithm to cluster documents into groups (Guggilla ¶ 0027, “k-means machine learning for clustering”).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Guggilla with those of Cossock to apply k-NN regression of Guggilla to the output of the trained relevance function of Cossock in order to assist in document recall by identifying a group of similar documents.
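For illustration only, the K-means-type grouping recited in claim 6 may be sketched as follows. The sketch is not drawn from Guggilla or from the Specification; the function names, the squared Euclidean distance, the seeded random initialization, and the fixed iteration count are all assumptions. Each returned group vector is the mean of the document vectors assigned to it, so the documents of a group are, by construction, in spatial proximity to their group vector.

```python
import random
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    # Component-wise mean; this becomes the group vector (centroid).
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def k_means(
    doc_vectors: List[Vector], k: int, iterations: int = 20, seed: int = 0
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper (assumption): returns (group_vectors, assignment),
    where assignment[i] is the index of the group holding doc_vectors[i]."""
    rng = random.Random(seed)
    # Initialize the K group vectors with K distinct documents chosen at random.
    group_vectors = [list(v) for v in rng.sample(doc_vectors, k)]
    assignment = [0] * len(doc_vectors)
    for _ in range(iterations):
        # Assignment step: each document joins the group with the nearest group vector.
        assignment = [
            min(range(k), key=lambda g: squared_distance(v, group_vectors[g]))
            for v in doc_vectors
        ]
        # Update step: each group vector moves to the mean of its member documents.
        for g in range(k):
            members = [v for v, a in zip(doc_vectors, assignment) if a == g]
            if members:  # keep the previous group vector if a group empties out
                group_vectors[g] = mean(members)
    return group_vectors, assignment
```

For example, with document vectors [0.0, 0.0], [0.1, 0.0], [5.0, 5.0], and [5.1, 4.9] and K = 2, the first two documents share one group and the last two share the other, whose group vector converges to approximately [5.05, 4.95].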

As per claim 9, the rejection of claim 1 is incorporated, but Cossock does not teach:
wherein the database system is configured to host a database separated into a plurality of shards, and wherein the storing the groups of documents comprises:
storing, by the server, the groups of documents as respective shards of the database in the database system, each shard being associated with the respective group vector.

The analogous and compatible art of Guggilla, however, teaches using a classifier to map documents to cluster centroids, where the clusters are the shards, where the mapping is stored, thereby storing documents as shards associated with centroids – the claimed group vectors (Guggilla ¶ 0038). 

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Guggilla with those of Cossock to apply k-NN classification of Guggilla to the 

As per claim 10, the rejection of claim 9 is incorporated, but Cossock does not teach:
wherein the database system comprises a plurality of database servers, and wherein the storing the groups of documents as the respective shards comprises:
storing, by the server, the plurality of shards of the database on the plurality of database servers of the database system.

The analogous and compatible art of Guggilla, however, teaches storing clusters of documents in a distributed repository (Guggilla ¶¶ 0033, 0038, 0039).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Guggilla with those of Cossock to store clusters created by the trained classifier in a distributed repository in order to aid in document retrieval.

As per claim 11, the rejection of claim 10 is incorporated, but Cossock does not teach:
wherein a given database server of the plurality of database servers stores more than one of the plurality of shards.

The analogous and compatible art of Guggilla, however, teaches storing clusters of documents in a distributed repository (Guggilla ¶¶ 0033, 0038, 0039).


Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Cossock, US 2004/0215606 A1 (hereinafter “Cossock”), in view of Guggilla et al., US 2019/0065991 A1 (hereinafter “Guggilla”), and further in view of Roitblat, US 6,189,002 B1 (hereinafter “Roitblat”).

As per claim 7, the rejection of claim 1 is incorporated, and Cossock teaches:
receiving, by the server, a current query from an electronic device communicatively coupled to the server, the current query for providing the electronic device with a current document being relevant to the current query (Cossock ¶ 0082, “a user submits a query to the search engine”); and
receiving, by the server, query data associated with the current query (Cossock ¶ 0082), where terms in the query are obtained, along with other data identifying the query.

Cossock, however, does not teach:
for the current query, generating, by the server employing the MLA, a current query vector for the current query based on the query data associated with the current query;
determining, by the server, a most similar group vector to the current query vector amongst the group vectors, the most similar group vector being associated with a target group of documents; and
accessing, by the server, the database system for retrieving documents from the target group of documents.

The analogous and compatible art of Roitblat, however, teaches generating, using a “neural network” – a claimed machine learning algorithm – a “resulting profile” – the claimed query vector – based on terms in the query – the claimed query data – and determining the most similar “centroid” – the claimed group vector – the centroid representing a cluster of documents, and retrieving the documents from the cluster in order to rank them for recall (Roitblat 10:20-39).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Roitblat with those of Cossock to combine the use of the neural network of Roitblat with the trained relevance function of Cossock, which is itself a neural network (Cossock ¶¶ 0061-64) in order to better locate relevant documents to the query.

As per claim 8, the rejection of claim 7 is incorporated, but Cossock does not teach:
wherein the accessing the database system comprises not retrieving documents from other groups of documents other than the target group of documents.

The analogous and compatible art of Roitblat, however, teaches generating, using a “neural network” – a claimed machine learning algorithm – a “resulting profile” – the claimed query vector – based on terms in the query – the claimed query data – and determining the most similar “centroid” – the claimed group vector – the centroid representing a cluster of documents, and retrieving the documents from the cluster in order to rank them for recall, where only the documents in the best-match cluster are retrieved.

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Roitblat with those of Cossock to use the trained relevance function of Cossock (Cossock ¶¶ 0061-64) as the neural network of Roitblat in order to better locate relevant documents to the query.
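For illustration only, the query-time steps of claims 7 and 8 (determining the group vector most similar to the current query vector and accessing only the target group) may be sketched as follows. The cosine similarity measure, the dictionary layout of groups, and the function names are assumptions and are not taken from Roitblat, Cossock, or the Specification.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_target_group(
    query_vector: Vector,
    group_vectors: Dict[str, Vector],
    groups: Dict[str, List[str]],
) -> Tuple[str, List[str]]:
    """Hypothetical helper (assumption): returns the target group id and its documents."""
    # Determine the most similar group vector to the current query vector.
    target = max(
        group_vectors,
        key=lambda g: cosine_similarity(query_vector, group_vectors[g]),
    )
    # Access only the target group; documents of other groups are never read.
    return target, list(groups[target])
```

With group vectors {"g0": [1.0, 0.0], "g1": [0.0, 1.0]}, a query vector of [0.9, 0.2] selects "g0", and only the documents stored in "g0" are returned.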

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Cossock, US 2004/0215606 A1 (hereinafter “Cossock”), in view of Guggilla et al., US 2019/0065991 A1 (hereinafter “Guggilla”), and further in view of Borthakur, HDFS Architecture Guide (hereinafter “Borthakur”).

As per claim 12, the rejection of claim 10 is incorporated, but Cossock does not teach:
wherein more than one database servers of the plurality of database servers store a given shard from the plurality of shards.

The analogous and compatible art of Borthakur, however, teaches an HDFS store that replicates data stored therein (Borthakur pg. 3).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Borthakur with those of Cossock and Guggilla to replicate the data stored in the distributed repository of Guggilla to promote fault tolerance (Borthakur pg. 3).

As per claim 13, the rejection of claim 10 is incorporated, but Cossock does not teach:
wherein the plurality of database servers are physically located in more than one geographic locations.

The analogous and compatible art of Borthakur, however, teaches an HDFS store that replicates data stored therein that spans multiple data centers, and therefore contains servers located in more than one geographic location (Borthakur pp. 3-5).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Borthakur with those of Cossock and Guggilla to replicate the data stored in the distributed repository of Guggilla to promote fault tolerance (Borthakur pg. 3).

Claims 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Cossock, US 2004/0215606 A1 (hereinafter “Cossock”), in view of Guggilla et al., US 2019/0065991 A1 (hereinafter “Guggilla”), and Borthakur, HDFS Architecture Guide (hereinafter “Borthakur”), and further in view of Roitblat, US 6,189,002 B1 (hereinafter “Roitblat”).

As per claim 15, the rejection of claim 13 is incorporated, and Cossock further teaches:
receiving, by the server, a current query from an electronic device communicatively coupled to the server, the current query for providing the electronic device with a current document being relevant to the current query (Cossock ¶ 0082, “a user submits a query to the search engine”); and
receiving, by the server, query data associated with the current query (Cossock ¶ 0082), where terms in the query are obtained, along with other data identifying the query.

Cossock, however, does not teach:
for the current query, generating, by the server employing the MLA, a current query vector for the current query based on the query data associated with the current query; or
determining, by the server, a most similar group vector to the current query vector amongst the group vectors, the most similar group vector being associated with a target shard from the plurality of shards.

The analogous and compatible art of Roitblat, however, teaches generating, using a “neural network” – a claimed machine learning algorithm – a “resulting profile” – the claimed query vector – based on terms in the query – the claimed query data – and determining the most similar “centroid” – the claimed group vector – the centroid representing a cluster of documents – a claimed shard – and retrieving the documents from the cluster in order to rank them for recall (Roitblat 10:20-39).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Roitblat with those of Cossock to combine the use of the neural network of Roitblat with the trained relevance function of Cossock, which is itself a neural network (Cossock ¶¶ 0061-64) in order to better locate relevant documents to the query.

None of Cossock, Guggilla, or Roitblat, however, teaches:
accessing, by the server, a target database server from the plurality of database servers for retrieving documents of the target shard, the target database server storing the target shard.

The analogous and compatible art of Borthakur, however, teaches an HDFS store that replicates data stored therein that spans multiple data centers, and therefore contains servers located in more than one geographic location (Borthakur pp. 3-5).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Borthakur with those of Cossock and Guggilla to access the data stored in the distributed repository of Guggilla as replicated by Borthakur in order to promote fault tolerance (Borthakur pg. 3).

As per claim 16, the rejection of claim 15 is incorporated, but Cossock does not teach:
wherein the accessing the target database server comprises:
not accessing, by the server, other database servers of the database system other than the target database server.

The analogous and compatible art of Borthakur, however, teaches that when accessing an HDFS store that replicates data stored therein that spans multiple data centers, only a single server in a single data center is to be accessed (Borthakur pp. 3-5).

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Borthakur with those of Cossock and Guggilla to access the data stored in the distributed repository of Guggilla as replicated by Borthakur in order to promote fault tolerance (Borthakur pg. 3).

As per claim 17, the rejection of claim 15 is incorporated, but Cossock does not teach:
wherein the method further comprises determining the target database server:
based on a geographical location of the electronic device and the plurality of database servers.

The analogous and compatible art of Borthakur, however, teaches that when accessing an HDFS store that replicates data stored therein that spans multiple data centers, only a single server in a single 

It would therefore have been obvious to one of ordinary skill in the art to combine the teachings of Borthakur with those of Cossock and Guggilla to access the data stored in the distributed repository of Guggilla as replicated by Borthakur in order to promote fault tolerance (Borthakur pg. 3).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Cossock, US 2004/0215606 A1 (hereinafter “Cossock”), in view of Guggilla et al., US 2019/0065991 A1 (hereinafter “Guggilla”), and further in view of Miller et al., US 2018/0357240 A1 (hereinafter “Miller”).

As per claim 18, the rejection of claim 1 is incorporated, and Cossock further teaches:
wherein the MLA is a Neural Network (NN) (Cossock ¶ 0064), the NN comprises a document-dedicated portion,
the document-dedicated portion being configured to generate the training document vector based on document data associated with the training document (Cossock ¶ 0044).

Cossock, however, does not teach:
the NN comprises a query-dedicated portion;
the query-dedicated portion being configured to generate the training query vector based on query data associated with the training query, and
the document-dedicated portion and the query-dedicated portion having been trained together such that the proximity value between (i) the training document vector and (ii) the training query vector is representative of the relevance score.

The analogous and compatible art of Miller, however, teaches training a machine-learning model to generate a query vector based on a query separate from a machine-learning model to generate a document vector for a document (Miller ¶ 0029).

It would therefore have been obvious to one of ordinary skill in the art at the time of filing to combine the teachings of Miller with those of Cossock and Guggilla to separately train a machine-learning model to generate query vectors in order to provide search results that are better directed to the query intent.

Allowable Subject Matter
Claim 14 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
As allowable subject matter has been indicated, applicant's reply must either comply with all formal requirements or specifically traverse each requirement not complied with.  See 37 CFR 1.111(b) and MPEP § 707.07(a).
The following is a statement of reasons for the indication of allowable subject matter: the prior art does not teach assigning clusters of documents to servers such that any two database servers of the plurality of database servers that are geographically close store clusters having group vectors that are more similar to each other than the group vectors of clusters that are stored on any other two database servers of the plurality of database servers that are geographically farther from each other than the two database servers.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM SPIELER whose telephone number is (571)270-3883.  The examiner can normally be reached on Monday-Friday, 11-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes can be reached on 571-270-1006.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


WILLIAM SPIELER
Primary Examiner
Art Unit 2159



/WILLIAM SPIELER/               Primary Examiner, Art Unit 2159