DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
Claims 1-8, 10-17 and 19 are pending, of which claims 1, 10 and 19 are in independent form.  Claims 1-8, 10-17 and 19 are rejected under 35 U.S.C. 103.

Response to Claim Amendments and Arguments
The claim amendments and arguments filed on 07 January 2022, as they apply to the 35 U.S.C. 103 rejections of the claims, have been fully considered.  On page 10 of the remarks, Applicant’s representative appears to take the position that the cited prior art references do not teach the amended portions of the independent claims with respect to the determination of the number of pre-computed query results to be re-computed in a specific re-computation cycle in a dynamic way through a Reinforcement Learning algorithm by applying an exploitation mode of the Reinforcement Learning algorithm based on pre-computed query results which have been re-computed in the previous time interval.  Examiner has reviewed the claim amendments and arguments, conducted a prior art search based on the claims as amended, and applied a new reference to the amended claims, as detailed in the rejection provided below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 10-12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Rickard U.S. Pub. No. 2004/0034633 (hereinafter “Rickard”) in view of Legrand et al. U.S. Pub. No. 2015/0234890 (hereinafter “Legrand”) in further view of Vorobev et al. U.S. Pub. No. 2017/0140053 (hereinafter “Vorobev”).
Regarding independent claim 1, Rickard discloses:
A computation machine for re-computing pre-computed query results stored at a database server, the pre-computed query results composed of a plurality of shares, each share including a certain number of pre-computed query results… (Rickard at paragraph [0022] discloses, as part of a data search system, a processor coupled to data storage capacity and memory including a database.  Additionally, Rickard at paragraph [0141] discloses iteratively improving search results based on evaluations of a subset of results.  Examiner is interpreting the iterative improvement of search results disclosed in Rickard above as reading on pre-computed query results… with the previous iteration of search results being the pre-computed query results.  Additionally, Examiner is interpreting a subset of results as reading on a share.)

While Rickard at paragraph [0030] discloses computational resources needed to perform queries, Rickard does not disclose:
…and wherein computation resources of the computation machine needed to re-compute a pre-computed query result of a first share depend on whether or not other pre-computed query results of the first share are re-computed during a given time interval forming a current re-computation cycle.
In other words, Rickard does not disclose computation resources dependent on other pre-computed query results re-computed during a given time interval.
However, Legrand at paragraph [0084] teaches in part, “Re-computing such interrelated pre-computed search results together (i.e., within the same re-computation cycle) could include synergetic effects and may thus be more efficient than re-computing them separately.”
Both the Rickard reference and the Legrand reference, such as Legrand at paragraph [0010], are in the field of endeavor of iteratively re-computing search results.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the iteratively improving search results disclosed in Rickard with the selection of which search results to recompute, based in part on dependency relationships between queries and computing resource constraints, taught in Legrand to facilitate increasing efficiency in data processing (See Legrand at paragraph [0003]).

While Rickard at paragraph [0022] discloses, as part of a data search system, hardware components such as a processor coupled to data storage capacity and memory including a database, Rickard at paragraph [0028] discloses textual data stored in databases and web servers, and Rickard at paragraph [0141] discloses iteratively improving search results based on evaluations of a subset of results using Reinforcement learning and a feedback loop, and while Examiner is of the position that the iterations [i.e., previous time intervals] of evaluating subsets of results and feedback looping read on receiving a request to dynamically re-compute results, Rickard does not disclose:
the computation machine comprising: one or more processors; and a memory coupled with the one or more processors, the memory including program code that, when executed by the one or more processors, cause the computation machine to: receive a request to dynamically re-compute the pre-computed query results of at least the first share; retrieve, from a statistics server, an indication of the pre-computed query results of the first share which have been re-computed in a previous time interval; determine, based on the pre-computed query results of the first share which have been re-computed in the previous time interval, a number of pre-computed query results in the first share to be re-computed in the current re-computation cycle based on a Reinforcement Learning algorithm by applying an exploitation mode of the Reinforcement Learning algorithm or applying an exploration mode of the Reinforcement Learning algorithm…
In other words, Rickard does not disclose determine…a number of pre-computed query results…to be re-computed…based on a Reinforcement Learning algorithm applying an exploitation mode…or…an exploration mode…
However, Vorobev at paragraph [0010] teaches reinforcement learning and a method for performing the explore vs exploit tradeoff.  Further, Vorobev at paragraph [0016] teaches an iterative machine learning model for ranking.  Lastly, Vorobev at paragraph [0021] teaches in part, “…applying a second machine learned algorithm to determine, for each of the plurality of candidate web resources, an exploration score based at least in part of the respective predicted relevancy parameter, and inputting the determined exploration score of the plurality of candidate web resources into a bandit-based-ranking algorithm for: (i) ranking the plurality of candidate web resources; (ii) selecting a subset of higher-ranked candidate web resources by applying a predefined inclusion parameter indicative of an acceptable number of candidate web resources of the plurality of candidate web resources…”
Both the Rickard reference and the Vorobev reference are in the field of endeavor of iteratively re-computing search results.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the iteratively improving search results using Reinforcement Learning as disclosed in Rickard with the use of Reinforcement Learning, iterative ranking and exploration in selecting a number of results in a subset as taught in Vorobev to facilitate improving query results over the long term by taking feedback into consideration (See Vorobev at paragraph [0019]).

While Rickard discloses iteratively improving search results and computational resources being needed to perform queries, Rickard does not disclose:
 the determined number of pre-computed query results is limited by the computation resources of the computation machine available during the given time interval…
However, Legrand in the Abstract teaches in part, “Within a given time interval, a computation platform re-computes these pre-computed search results having a re-computation indicator indicating the highest need for re-computation. The number of pre-computed search results re-computed by the computation platform is limited by the computation resources of the computation platform that are available for the re-computation within the given time interval.”

While Rickard at paragraph [0151] discloses reinforcement learning comprising an iterative scheme that incorporates the user’s initial query and priori knowledge in query refinement, Rickard does not disclose:
store the determined number of pre-computed query results in the first share in the statistics server for a subsequent determination of the number of pre-computed query results to be re-computed in a subsequent computation cycle; re-compute the determined number of pre-computed query results in the first share during the current re-computation cycle.
However, Legrand in the Abstract teaches in part, “The number of pre-computed search results re-computed by the computation platform is limited by the computation resources of the computation platform that are available for the re-computation within the given time interval.”  Additionally, Legrand at paragraph [0100] teaches in part with emphasis added by the Examiner, “Generally, the computation resources needed to re-compute the pre-computed search results to be re-computed are dynamically estimated by the re-computation controller 2 while selecting the pre-computed search results to be re-computed during the next computation cycle.”  Examiner is interpreting selecting search results to be re-computed during the next computation cycle as reading on …subsequent determination of the number of pre-computed query results to be re-computed in a subsequent computation cycle.

provide the re-computed determined number of pre-computed query results in the first share to the database server (Rickard at paragraph [0135] discloses providing search results.)

Regarding dependent claim 2, all of the particulars of claim 1 have been addressed above.  Additionally, Rickard as modified with Legrand and Vorobev discloses:
wherein the request to dynamically re-compute the pre-computed query results of the first share indicates the pre-computed query results of the first share which are to be re-computed, and the computation machine is arranged to determine the number of the pre-computed query results in the first share to be re-computed based on the indicated pre-computed query results of the at least first share (Rickard at paragraph [0141] discloses performing reinforcement learning on a subset of results.  Examiner is of the position that in selecting a subset of results to perform reinforcement learning on, Rickard is determining the number of pre-computed query results in the first share to be re-computed.  Additionally, Vorobev at paragraph [0010] teaches reinforcement learning and a method for performing the explore vs exploit tradeoff.  Further, Vorobev at paragraph [0016] teaches an iterative machine learning model for ranking.  Lastly, Vorobev at paragraph [0021] teaches in part, “…applying a second machine learned algorithm to determine, for each of the plurality of candidate web resources, an exploration score based at least in part of the respective predicted relevancy parameter, and inputting the determined exploration score of the plurality of candidate web resources into a bandit-based-ranking algorithm for: (i) ranking the plurality of candidate web resources; (ii) selecting a subset of higher-ranked candidate web resources by applying a predefined inclusion parameter indicative of an acceptable number of candidate web resources of the plurality of candidate web resources…”)

Regarding dependent claim 3, all of the particulars of claim 1 have been addressed above.  Additionally, Rickard as modified with Legrand and Vorobev discloses:
in response to determining to apply the exploitation mode, determining the number of pre-computed query results in the first share to be re-computed in the current re-computation cycle based on a value function of the Reinforcement Learning algorithm for the number of pre-computed query results indicated by the retrieved indication, the value function associating possible selections of the pre-computed query results in the first share for re-computation with respective estimated rewards (Vorobev at paragraph [0010] teaches reinforcement learning and a method for performing the explore vs exploit tradeoff.  Examiner is of the position that Reinforcement learning is generally a machine learning method that relies on rewarding desired behaviors and punishing undesirable behaviors.  Additionally, Examiner is of the position that exploration involves exploring wide ranges of a sample set for possible solution regions whereas exploitation involves finding minimum or maximum values in smaller regions of a sample set.  The Vorobev reference in teaching a tradeoff between exploration and exploitation in applications using reinforcement learning is teaching the tradeoffs or costs in exploring a large set vs the potential or estimated rewards of an optimal solution.)

Regarding independent claim 10, while independent claim 10, a method claim, and independent claim 1, a machine claim, are directed towards different statutory classes, they are similar in scope.  Therefore claim 10 is rejected under the same rationale as claim 1.

Regarding dependent claim 11, all of the particulars of claim 10 have been addressed above.  Claim 11 is rejected under the same rationale as claim 2.


Regarding dependent claim 12, all of the particulars of claim 10 have been addressed above.  Claim 12 is rejected under the same rationale as claim 3.

Regarding independent claim 19, while independent claim 19, a computer program product claim, and independent claim 1, a machine claim, are directed towards different statutory classes, they are similar in scope.  Therefore, claim 19 is rejected under the same rationale as claim 1.

Claims 4-5 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Rickard in view of Legrand in view of Vorobev in further view of Jones, III U.S. Pub. No. 2012/0209794 (hereinafter “Jones”).
Regarding dependent claim 4, all of the particulars of claims 1 and 3 have been addressed above.  While Rickard at paragraph [0141] discloses using reinforcement learning in conjunction with a search engine and Vorobev at paragraph [0009] teaches gathering and maximizing rewards, Rickard as modified does not disclose:
wherein the estimated rewards associated by the value function are specified by a reward function of the Reinforcement Learning algorithm which attributes an aggregated value of a plurality of key performance indicators to a corresponding selection of pre-computed query results of the first share.
In other words, Rickard as modified does not disclose aggregating a reward value with respect to the Reinforcement Learning algorithm.
However, Jones at paragraph [0048] does teach aggregating a reward value with use in reinforcement learning, more specifically, Jones teaches in part, “Otherwise (if convergence is not yet achieved) then feedback, r, a "reward" function, computed by UPDATE as a function of the prediction, PY, current sensory input, X, and feedback, F. The FDBKOUT provides reward r to neighboring nodes and aggregated in each as an updated feedback, F. This updated feedback variable (F) is in turn used to UPDATE the state value, V, which is used in a reinforcement learning to UPDATE the policy, π. And this completes the cycle.”
Both the Rickard reference and the Jones reference are in the field of endeavor of applications of reinforcement learning and machine learning.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the updating search results using reinforcement learning as disclosed in Rickard with the aggregating of a reward value used in conjunction with reinforcement learning as taught in Jones to facilitate improving the reward/punishment models generated using reinforcement learning.

Regarding dependent claim 5, all of the particulars of claims 1 and 3-4 have been addressed above.  Additionally, Rickard as modified with Legrand does disclose:
wherein the aggregated value of the plurality of key performance indicators is based on a sum of numbers of the pre-computed query results to be re-computed, elapsed time of re-computing the pre-computed query results to be re-computed in the given time interval, maximum computation resources to re-compute the pre-computed query results indicated in the request, a maximum of memory required to re-compute pre-computed query results indicated in the request, or a combination thereof (Rickard at paragraph [0141] discloses iteratively improving a subset of search results and Legrand at paragraph [0100] teaches the number of re-computed results is limited by computational capacity and time interval of a computational cycle.)

Regarding dependent claim 13, all of the particulars of claims 10 and 12 have been addressed above.  Claim 13 is rejected under the same rationale as claim 4.

Regarding dependent claim 14, all of the particulars of claims 10 and 12-13 have been addressed above.  Additionally, claim 14 is rejected under the same rationale as claim 5.

Claims 6, 8, 15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Rickard in view of Legrand in view of Vorobev in view of Jones in further view of Cabral et al. U.S. Pub. No. 2017/0358285 (hereinafter “Cabral”).
Regarding dependent claim 6, all of the particulars of claims 1 and 3-4 have been addressed above.  While Rickard as modified does disclose using reinforcement learning in recomputing search results and balancing exploration and exploitation interests, Rickard does not disclose:
wherein the program code causes the computation machine to determine to apply the exploitation mode of the Reinforcement Learning algorithm or to apply the exploration mode of the Reinforcement Learning algorithm by causing the computation machine to: determine a reward given by the value function for the pre-computed query results indicated by the retrieved indication; determine a number of re-computation cycles to apply the exploitation mode of the Reinforcement Learning algorithm based on a comparison of the determined reward with a reward threshold; and apply the exploitation mode of the Reinforcement Learning algorithm during the current re-computation cycle and subsequent re-computation cycles given by the determined number of re-computation cycles.
In other words, Rickard as modified does not explicitly disclose a reward threshold.
However, Cabral in the Abstract teaches the following:
An approach is provided in which an information handling system configures a reinforcement learning model based on inspiration selections received from a user. The information handling system performs training iterations using the configured reinforcement learning model, which generates multiple actions and multiple rewards corresponding to multiple actions. The information handling system determines that the multiple rewards reach an empirical threshold and, in turn, generates a musical composition based on the multiple actions.

Both the Rickard reference and the Cabral reference are in the field of endeavor of applications of reinforcement learning and machine learning.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the updating search results using reinforcement learning as disclosed in Rickard with the use of a reward threshold in a reinforcement learning model as taught in Cabral to facilitate improving the reward/punishment models generated using reinforcement learning.

Regarding dependent claim 8, all of the particulars of claims 1, 3-4 and 6 have been addressed above.  Additionally, Rickard as modified does disclose:
wherein the program code causes the computation machine to: in response to determining to apply the exploration mode, iteratively adapt the number of pre-computed query results indicated by the retrieved indication at least based on the number of pre-computed query results indicated by the retrieved indication, the value function of the Reinforcement Learning algorithm for the number of pre-computed query results indicated by the retrieved indication and the reward threshold (Rickard at paragraph [0100] discloses iteratively improving a subset of search results using reinforcement learning.  Examiner is of the position that Reinforcement learning uses a reward/punishment system to implement learning techniques.  Legrand at paragraph [0100] teaches that the number of recomputed search results is limited based on computational capacity and time, and lastly Cabral in the Abstract teaches the use of a reward threshold in reinforcement learning models.)

Regarding dependent claim 15, all of the particulars of claims 10 and 12-13 have been addressed above.  Additionally, claim 15 is rejected under the same rationale as claim 6.

Regarding dependent claim 17, all of the particulars of claims 10, 12-13 and 15 have been addressed above.  Additionally, claim 17 is rejected under the same rationale as claim 8.

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Rickard in view of Legrand in view of Vorobev in view of Jones in view of Cabral in further view of Sivasubramanian et al. U.S. Patent No. 8,429,097 (hereinafter “Sivasubramanian”).
Regarding dependent claim 7, all of the particulars of claims 1, 3-4 and 6 have been addressed above.  While Rickard as modified does disclose the use of reward thresholds in reinforcement learning, Rickard does not disclose:
wherein the reward threshold is defined by given percentages of threshold values of respective ones of the plurality of key performance indicators for which re-computation during the given time interval would result in an error state of the computation machine.
In other words, Rickard as modified does not disclose a reward threshold resulting in an error state.
However, Sivasubramanian at Column 9, Lines 55-63 teaches, “In some embodiments, the application of a business rule to prune the set of candidate actions may result in identification of only one acceptable candidate action. In this case, the reinforcement learning technique may be forced to select this action. In some embodiments, the application of a business rule to prune the set of candidate actions may result in a null set of candidate actions. In this case, an error or other type of exception may be raised in the system.”
Both the Rickard reference and the Sivasubramanian reference are in the field of endeavor of applications of reinforcement learning and machine learning.  Before the effective filing date of the claimed invention it would have been obvious to one of ordinary skill in the art to combine the iterative updating of search results using reinforcement learning and reward thresholds as disclosed in Rickard as modified with the use of rules, exceptions and errors in reinforcement learning applications as taught in Sivasubramanian to facilitate improving the reward/punishment models generated using reinforcement learning.

Regarding dependent claim 16, all of the particulars of claims 10, 12-13 and 15 have been addressed above.  Additionally, claim 16 is rejected under the same rationale as claim 7.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
2015/0310068
Paragraph [0025] as it relates to using reinforcement learning to aid in tuning results or exploration of the document population of a query.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY G GEMIGNANI whose telephone number is (571)272-1018. The examiner can normally be reached M-F 8-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain T Alam, can be reached on 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/A.G.G./Examiner, Art Unit 2154
/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154