DETAILED ACTION

This is the first office action regarding application number 16/156,300, filed October 10, 2018.

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

Drawings
The drawings are objected to because of the following informalities:
Figure 5: the drawing contains misspellings of the word “POLICY”, e.g., “POLYCY a1” should be “POLICY a1”; “POLYCY a2” should be “POLICY a2”.
Figure 9A, element S02: misspelling of the word “SELECT”; “SELESECT POLICY a” should be “SELECT POLICY a”.
Figure 9A, element S06: no description text defining element S06.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet.

Specification
The disclosure is objected to because of the following informalities:
p.14 line 17: the indicated element “s_prev” is not shown in Figure 5. Appropriate correction is required.
p.14 line 20: the indicated element “policy a” is not shown in Figure 5. Appropriate correction is required.
p.14 line 21: Figure 6 does not correlate with the description provided in step 5, e.g., it does not contain the elements listed in step 5. It is not clear whether this is an error due to missing text in step 5, or an error in specifying an incorrect figure. Appropriate correction is required.
pp.18-19: elements 112, 110, 122 shown in Figure 8 are not described in the corresponding paragraphs describing Figure 8. Appropriate correction is required.
p.21 line 2: indicated element “S26” should be corrected as “S25”. Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are:
Claim 1: an action-value function initializing unit that 
inputs search information including a history of a solution, a constraint equation, and an initial state of a selectable domain of a decision variable, 
sets a decision variable selected in each step and a value of the decision variable as a policy, and 
initializes an action-value function including the policy, a selectable domain of a decision variable before policy decision, and a selectable domain of a decision variable after the policy decision as parameters …
Claim 1: a post transition state calculating unit that 
calculates a selectable domain region of the decision variable after the policy decision from the selectable domain of the decision variable before the policy decision and the policy by constrain propagation …
Claim 1: a search unit that 
receives problem information including the constraint equation and the initial state of the domain of the decision variable and information of the action-value function initialized by the action-value function initializing unit, 
obtains a value of a corresponding action-value function from the policy, the domain of the decision variable before the policy decision, and a domain of the action-value function after the policy decision, 
searches for a policy in which the action-value function is largest, and 
searches for an optimum solution for the problem information.
Claim 3: an action-value function learning unit that 
receives the search information, 
sets an improvement degree of a score for an objective function as a compensation, and 
updates the action-value function on the basis of the compensation.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
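For orientation only, the function recited for the post transition state calculating unit (calculating the selectable domain after a policy decision from the selectable domain before it, by constraint propagation) can be illustrated with the minimal sketch below. The sketch is not taken from the specification or from any cited reference. The function name `propagate`, the dictionary-of-sets data layout, and the restriction to pairwise "not equal" constraints are assumptions made purely for illustration.

```python
from collections import deque


def propagate(domains, neq_pairs, policy):
    """Return the post-decision domains after applying policy = (variable, value).

    domains:   dict mapping each variable to the set of values still selectable
               (the pre-decision state).
    neq_pairs: iterable of (x, y) pairs meaning "x != y" (assumed constraint form).
    Returns None when some domain becomes empty, i.e. the policy is infeasible.
    """
    var, value = policy
    if value not in domains[var]:
        return None

    # Work on a copy so the pre-decision state is left untouched.
    post = {v: set(d) for v, d in domains.items()}
    post[var] = {value}

    neighbors = {v: set() for v in domains}
    for x, y in neq_pairs:
        neighbors[x].add(y)
        neighbors[y].add(x)

    # Each variable whose domain shrinks to one value is propagated in turn.
    queue = deque([var])
    while queue:
        fixed = queue.popleft()
        (fixed_value,) = post[fixed]
        for other in neighbors[fixed]:
            if fixed_value in post[other]:
                post[other].discard(fixed_value)
                if not post[other]:
                    return None
                if len(post[other]) == 1:
                    queue.append(other)
    return post
```

Under these assumptions, the returned dictionary plays the role of the selectable domain after the policy decision, and the input dictionary plays the role of the selectable domain before it.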


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding Claim 1, 
Claim 1 recites the claim limitation “an action-value function initializing unit”, which invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph according to the above section. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The specification fails to disclose an algorithm for the action-value function initializing unit, other than indicating that Formula 2 [Q(s_pre, s_post, a) = f(x2, y2, z2)] represents the initialization process, which is not sufficient structure to perform the claimed function. According to the specification: "The action-value function initializing unit 120 illustrated in Fig. 8 is a functional unit that initializes the action-value function Q. The action-value function initializing unit 120 initializes the action-value function Q using a history of a problem and a solution of previous data (offline learning 200). Here, Q is updated and initialized in accordance with Formula 2 by using the score of the objective function as the compensation." Initializing the action-value function Q encompasses a wide range of possible methods, meaning that there are no boundaries for determining the scope or extent of this initialization process. While a person skilled in the relevant art would be able to identify a policy 'a' and domain variables 's_pre' and 's_post', and to initialize those variables, there is no way to determine or measure whether that particular initialization would map to the expected initialization for this action-value function Q according to the claimed invention. Therefore, this claim limitation is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Claim 1 further recites the claim limitation “a post transition state calculating unit”, which invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph according to the above section. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. According to the specification p.9 line 22-p.10 line 3: "By executing the program 106, the solution search processing apparatus executes functions of respective functional units such as an action-value function initializing unit 120, a search unit 121, a post transition state calculating unit 122, and an action-value function learning unit 123. The function of the respective units will be described later in detail". However, subsequent paragraphs of the specification do not provide any further description for this claimed post transition state calculating unit. By failing to distinctly point out a particular post transition state calculating unit, the specification fails to provide the metes and bounds of this post transition state calculating unit. Therefore, this claim limitation is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Claim 1 further recites the claim limitation “a search unit”, which invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph according to the above section. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. According to the specification p.19 lines 1-7: "The search unit 121 is a functional unit that searches for a solution in accordance with the action-value function Q. The search unit 121 receives data from the current search information in accordance with the action-value function Q tuned in the offline learning 200 and searches the optimum solution and the quasi-optimum solution by taking the policy a in each step." The specification proceeds to describe the search process steps used by the solution search processing apparatus in Figures 9A and 9B (with Figure 9A describing the ε-greedy technique for the 1-ε probability, and Figure 9B describing the ε-greedy technique for the ε probability), but fails to delineate which steps within the search process are performed by the search unit. Essentially the entire solution search processing apparatus can be considered a search unit, which makes the metes and bounds of the claimed search unit unclear. While a person skilled in the relevant art would understand how to perform searching and construct a search unit, there would be no way to determine or measure whether that search unit would match the search unit of the claimed invention. Therefore, this claim limitation is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
For the above Claim 1 limitations that invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, the applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
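For illustration only, the ε-greedy selection referred to above in connection with Figures 9A and 9B of the application can be sketched as follows. The sketch is not taken from the specification. The function name `select_policy`, the callable `q` standing in for Q(s_pre, s_post, a), the list of candidate (policy, s_post) pairs, and the uniformly random choice on the ε branch are all assumptions; the last is simply the textbook form of the ε-greedy technique.

```python
import random


def select_policy(q, s_pre, candidates, epsilon, rng=random):
    """Pick one (policy, s_post) pair from the candidates by epsilon-greedy selection.

    q:          callable q(s_pre, s_post, policy) returning a number
                (hypothetical stand-in for the action-value function Q).
    candidates: non-empty list of (policy, s_post) pairs reachable from s_pre.
    epsilon:    probability of taking the exploratory branch.
    """
    if not candidates:
        raise ValueError("no selectable policy")
    if rng.random() < epsilon:
        # Exploratory branch (probability epsilon): assumed to be a uniform random pick.
        return rng.choice(candidates)
    # Greedy branch (probability 1 - epsilon): policy with the largest action value.
    return max(candidates, key=lambda c: q(s_pre, c[1], c[0]))
```

With `epsilon` set to 0 the function always returns the candidate with the largest value of `q`, which corresponds to the claim language "searches for a policy in which the action-value function is largest".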
Claim 1 further recites the limitation "in each step" in p.24, line 9:
an action-value function initializing unit that 
inputs search information including a history of a solution, a constraint equation, and an initial state of a selectable domain of a decision variable, 
sets a decision variable selected in each step and a value of the decision variable as a policy, and
initializes an action-value function including the policy, a selectable domain of a decision variable before policy decision, and a selectable domain of a decision variable after the policy decision as parameters…
There is insufficient antecedent basis for this limitation in the claim, since none of the claim limitations within the action-value function initializing unit are enumerated as “a first step”, “a second step”, etc., thus making it unclear which step is being referenced by this term. Therefore, this claim limitation is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph. For the purposes of examination, this limitation “in each step” will not be given any patentable weight in Claim 1.
Regarding Claims 2-3,
Claims 2-3 are dependent claims of Claim 1, and hence are also rejected as being indefinite under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph, by virtue of dependency.
Regarding Claim 3, 
Claim 3 recites the claim limitation “an action-value function learning unit”, which invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph according to the above section. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. According to the specification p.18 lines 14-24: "The action-value function learning unit 123 is a functional unit that learns the action-value function Q. The action-value function learning unit 123 searches for the improvement solution to the problem of the previous data for the initialized action-value function Q using the ε-greedy technique, and updates Q using the improvement degree as the compensation (the offline learning 200 (Formula 3)). Further, it is called during the search for the current problem, the improvement solution is searched for by the ε-greedy technique, and Q is updated using the improvement degree as the compensation (the online learning 210 (Formula 3))." The specification proceeds to describe the search process steps used by the solution search processing apparatus in Figures 9A and 9B (with Figure 9A describing the ε-greedy technique for the 1-ε probability, and Figure 9B describing the ε-greedy technique for the ε probability), but fails to delineate which steps within the search process are performed by the action-value function learning unit. Essentially the entire solution search processing apparatus implementing the search process can be considered an action-value function learning unit, which makes the metes and bounds of the claimed action-value function learning unit unclear. While a person skilled in the relevant art would understand how to perform searching and construct an action-value function learning unit, there would be no way to determine or measure whether that action-value function learning unit would match the action-value function learning unit of the claimed invention. Therefore, this claim limitation is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
For the above Claim 3 limitation that invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, the applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
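If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 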
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
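For illustration only, an update of an action-value function "using the improvement degree as the compensation", as quoted above from the specification, can be sketched as follows. Formula 3 of the application is not reproduced in this action, so the sketch substitutes the textbook temporal-difference update and should not be read as the applicant's formula. The function name `update_q`, the table layout, the parameters `alpha` and `gamma`, and the treatment of the objective as a quantity to be minimized are all assumptions.

```python
from collections import defaultdict


def update_q(q_table, key, prev_score, new_score, next_keys=(), alpha=0.1, gamma=0.9):
    """Update q_table[key] in place and return the new value.

    q_table:    defaultdict(float) keyed by (s_pre, s_post, policy) tuples (assumed layout).
    prev_score: objective score before the policy was applied.
    new_score:  objective score after the policy was applied.
    next_keys:  keys of the entries selectable from the resulting state.
    """
    # Reward ("compensation") is the improvement of the objective score.
    # A minimization objective is assumed, so a lower new score is an improvement.
    reward = prev_score - new_score
    best_next = max((q_table[k] for k in next_keys), default=0.0)
    # Textbook temporal-difference update (an assumption, not Formula 3).
    q_table[key] += alpha * (reward + gamma * best_next - q_table[key])
    return q_table[key]


q_table = defaultdict(float)
update_q(q_table, ("s0", "s1", "a1"), prev_score=10.0, new_score=7.0, alpha=0.5)
```

In this sketch the same function would serve both an offline pass over previous data and an online pass during the current search; only the source of the scores differs.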
Regarding Claim 4,
Claim 4 is a dependent claim of Claim 3, and hence is also rejected as being indefinite under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph, by virtue of dependency.
Regarding Claim 5, 
Claim 5 recites the limitation "in each step" in p.26 line 7:
a step of inputting, by the solution search processing apparatus, search information including a history of a solution, a constraint equation, and an initial state of a selectable domain of a decision variable, 
setting a decision variable selected in each step and a value of the decision variable as a policy, and
initializing an action-value function including the policy, a selectable domain of a decision variable before policy decision, and a selectable domain of a decision variable after the policy decision as parameters;
a step of calculating …
a step of receiving …
There is insufficient antecedent basis for this limitation in the claim, since the claim limitation itself begins with “a step of”, and it is not clear which step or steps this “in each step” applies to. Therefore, this claim limitation is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. For the purposes of examination, this limitation “in each step” will not be given any patentable weight in Claim 5.
Regarding Claims 6-7,
Claims 6-7 are dependent claims of Claim 5, and hence are also rejected as being indefinite under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph, by virtue of dependency.
Regarding Claim 8,
Claim 8 is a dependent claim of Claim 7, and hence is also rejected as being indefinite under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, by virtue of dependency.

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-4 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding Claim 1,
Claim 1 recites the claim limitation “a post transition state calculating unit”. However, there is no written description that defines in sufficient detail the metes and bounds of the corresponding structure, material, or acts for performing the entire claimed function and that clearly links the structure, material, or acts to the function. According to the specification p.9 line 22-p.10 line 3: "By executing the program 106, the solution search processing apparatus executes functions of respective functional units such as an action-value function initializing unit 120, a search unit 121, a post transition state calculating unit 122, and an action-value function learning unit 123. The function of the respective units will be described later in detail". However, subsequent paragraphs of the specification do not provide any further description for this claimed post transition state calculating unit. By failing to distinctly point out a particular post transition state calculating unit, the specification fails to provide the metes and bounds of this post transition state calculating unit. Therefore, this claim limitation fails the written description requirement and is rejected under 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph.
Regarding Claims 2-3,
Claims 2-3 are dependent claims of Claim 1, and hence are also rejected as failing to comply with the written description requirement under 35 U.S.C. 112(a) or pre-AIA  35 U.S.C. 112, first paragraph, by virtue of dependency.
Regarding Claim 4,
Claim 4 is a dependent claim of Claim 3, and hence is also rejected as failing to comply with the written description requirement under 35 U.S.C. 112(a) or pre-AIA  35 U.S.C. 112, first paragraph, by virtue of dependency.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3 and 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Fukui, Toshio, EP1710736A1, published 10/11/2006 [hereafter referred to as Fukui] in view of Xu et al., Learning Adaptation to Solve Constraint Satisfaction Problems, LION 2009 Learning and Intelligent OptimizatioN, January 2009, Microsoft Research, pp.1-5 [hereafter referred to as Xu].
Regarding Claim 1, Fukui teaches
A solution search processing apparatus that searches for a quasi-optimum solution for an objective function of a discrete optimization problem ([Fukui paragraph [0008]: a constraint-solver performing a solution search for a problem (“solution search processing apparatus”), where the problem is defined by simultaneous equations fk (which represents an action-value function), where solving the set of functions fk defines a policy, and the process of solving the policy is an optimization problem (“…the description of constraint conditions and the data used for the relevant constraint conditions are necessary in the declarative programming method, and the number of the constraint conditions and data becomes huge if it is applied to actual problems. An actual problem here is, for example, a combinatorial optimization problem that satisfies simultaneous equations of fk (X1, X2, ..., Xn)≤bk (k=1, 2, ... , m) and that makes evaluation function f0 (X1, X2, ..., Xn) minimum value. That is, as shown in Fig.17, the computer (the constraint solver in Fig.17) searches in a lump for the data which simultaneously satisfy all constraint conditions if the constraint conditions expressed with m pieces of simultaneous equations and evaluation function f0 are given to the computer.”).]), 
comprising:  
an action-value function initializing unit that 
inputs search information including a history of a solution, a constraint equation, and an initial state of a selectable domain of a decision variable ([Fukui Figure 10, elements 1, 2; paragraph [0015]: a constraint-based solver includes a Means for Constraint Condition Input, which is responsible for inputting a constraint condition (“inputs search information including … a constraint equation”), and a Means for Variable Value Setting, which is responsible for setting variable values used for the constraint conditions (“In the Constraint-based Solver of this invention which include- Means for Constraint Condition Inputting to input constraint conditions, Means for Variable Value Setting to set the values of variables used for the constraint conditions which are input by the above-mentioned Means for Constraint Condition Inputting …”).] [Fukui Figure 11, element 10; paragraph [0071]: a constraint-based solver receiving recorded input from databases and system resources, linking the variables in the constraint conditions based on time (“inputs search information including a history of a solution …”) (“As shown in Fig. 10, according to the configuration of the Constraint-based Solver 400 in this embodiment, partial or all variables in constraint conditions can be linked with the values recorded in aforementioned databases 11 and system resources 12, and such variables in the system as are linked with other matters like time, thus enabling easy management of variables that consistency between the variables outside the Constraint-based Solver 400 and the variables of constraint conditions is achieved.”).] 
[Fukui Figure 10, elements 1, 2; paragraph [0021]: constraint condition input includes the initial values for constraint variables (“inputs search information including … an initial state of a selectable domain of a decision variable”) (“Means for Constraint Condition Inputting 1 is to input the constraint conditions that the user decided to use through imagining the procedures of constraint propagation by the computer (CPU). Means for Variable Value Setting 2 is to set up specific initial values that the user assigns for each variable that comprises the constraint constraints, which were input with Means for Constraint Condition Inputting 1. Moreover, as described in detail later, it also sets up the values other than initial values (hereinafter, referred to as variation values) to each variable.”).]), 
sets a decision variable selected ([Fukui Figure 10, elements 1,2; paragraph [0021]: constraint condition input includes the initial values for constraint variables (“decision variable”) (“Means for Constraint Condition Inputting 1 is to input the constraint conditions that the user decided to use through imagining the procedures of constraint propagation by the computer (CPU). Means for Variable Value Setting 2 is to set up specific initial values that the user assigns for each variable that comprises the constraint constraints, which were input with Means for Constraint Condition Inputting 1. Moreover, as described in detail later, it also sets up the values other than initial values (hereinafter, referred to as variation values) to each variable.”).] [Fukui Figure 10, elements 1, 2; paragraph [0015]: a constraint-based solver includes a Means for Constraint Condition Input, which is responsible for inputting a constraint condition, and a Means for Variable Value Setting, which is responsible for setting variable values used for the constraint conditions (“sets a decision variable selected  and a value of the decision variable as a policy”) (“In the Constraint-based Solver of this invention which include- Means for Constraint Condition Inputting to input constraint conditions, Means for Variable Value Setting to set the values of variables used for the constraint conditions which are input by the above-mentioned Means for Constraint Condition Inputting …”).]), and 
initializes an action-value function including the policy, a selectable domain of a decision variable before policy decision, and a selectable domain of a decision variable after the policy decision as parameters ([Fukui Figure 10, element 2; paragraph [0014]: a constraint-based solver includes a Means for Variable Value Setting, which sets the constraint variables in the constraint conditions to a default value; the initializing of each of the constraint conditions represents initializing an action-value function, and thus as a whole, the initial state of the policy (“initializes an action-value function including the policy, a selectable domain of a decision variable before policy decision, and a selectable domain of a decision variable after the policy decision as parameters”) (“Means for Variable Value Setting to set the default and variation values used for the constraint conditions which are input by the above-mentioned Means for Constraint Condition Inputting…”).]); 
a post transition state calculating unit that 
calculates a selectable domain region of the decision variable after the policy decision from the selectable domain of the decision variable before the policy decision and the policy by constrain propagation ([Fukui Figure 10, element 8; paragraph [0022]: a constraint-based solver includes a Means for Constraint Condition Extraction, which extracts the constraint variables from the set of constraint conditions (“…from the selectable domain of the decision variable before the policy decision”), with the Means for Constraint Condition Extraction 8 containing the same functionality as Means for Constraint Condition Extraction 3 per paragraph [0070] (“Means for Constraint Condition Extraction 3 extracts constraint conditions bearing each variable to which an initial value or variation value as a constituent element is set from the constraint conditions that are input. In this embodiment, Means for Constraint Condition Extraction 3 is configured so that relevant constraint conditions are extracted in a mass, but it is no problem to extract one at a time, so that the solution of a constraint condition with Means for Constraint Condition Calculation 4 to be hereinafter described and the extraction of the next constraint condition are repeated in order.”).] 
[Figure 10, element 4; paragraph [0014]: a constraint-based solver includes a Means for Constraint Condition Calculation, which performs a calculation on the input variables received from the Means for Constraint Condition Extraction, to generate a solution of solved variables (“a selectable domain region of the decision variable after the policy decision”) using constraint propagation (“calculates … from the selectable domain of the decision variable before the policy decision by constrain propagation”) (“Means for Constraint Condition Calculation to calculate the solutions of the relevant constraint conditions, assigning the default or variation values to the above-mentioned variables, in conformity to the procedure orders of the constraint propagation which are considered by the above-mentioned user, concerning all of the constraint conditions that are extracted by the above-mentioned Means for Constraint Condition Extraction”).]); and 
a search unit that 
receives problem information including the constraint equation and the initial state of the domain of the decision variable and information of the action-value function initialized by the action-value function initializing unit ([Fukui Figure 18; paragraph [0018]: the constraint-based solver represents the search unit, as it searches for the solutions of constraint conditions (“Fig. 18 is a conceptual figure that shows the positioning between the conventional programming language environment and the Constraint-based Solver of the present invention, which searches for the solutions of constraint conditions.”).] [Fukui Figure 10, elements 8, 2, 1; Figure 2, elements S23-S27; paragraphs [0034]-[0035]: the Means for Constraint Condition Extraction receiving information from the databases, the Means for Constraint Condition Inputting (“receives problem information including the constraint equation…”), and the Means for Variable Value Setting (“receives problem information including … the initial state of the domain of the decision variable and information of the action-value function initialized by the action-value function initializing unit”), and performing an iterative step for each variable Sm in the constraint conditions, generating a set S that has variables that were modified based on the received inputs, and generating a set Q by solving a constraint condition Qm, with Q representing the condition set related with each variable in the set S (“In the step S23, Means for Constraint Condition Extraction 3 generates the set S that has the variables modified in the step S22 as its elements. Here generating the set S means, for example, writing suffixes, with which the abovementioned modified names of variables and the array elements of variables can be identified, in a certain memory area on the computer, and so forth. 
In addition, Means for Constraint Condition Extraction 3 judges whether there are any elements in the above-mentioned set S, that is, any variables of which the values have been modified with Means for Variable Value Setting 2 (the step S24). As a result of the above-mentioned judgment, when there are elements in the set S (when the set S is not an empty set), Means for Constraint Condition Extraction 3 picks up the element Sm one by one (the step S25). … Then, the set Q of the constraints related to the element Sm, which is picked up as mentioned above, is generated in the step S26. That is, if there are any constraint conditions that are comprised including the element Sm, Means for Constraint Condition Extraction 3 output them to the set Q. Then, while there are elements in the abovementioned set Q in the step S27, the following steps S28 to S30 are executed repeatedly. … ”).]), 
obtains a value of a corresponding action-value function from the policy, the domain of the decision variable before the policy decision, and a domain of the action-value function after the policy decision ([Fukui Figure 10, elements 8, 2, 1; Figure 2, elements S23-S27; paragraphs [0034]-[0035]: the Means for Constraint Condition Extraction receiving information from the databases, the Means for Constraint Condition Inputting, and the Means for Variable Value Setting, and initiating the iterative search for each variable Sm in the constraint conditions, generating a set S that has variables (“the domain of the decision variable before the policy decision …”) that were modified (“a domain of the action-value function after the policy decision”) based on the received inputs, and generating a set Q by solving a constraint condition Qm (“a value of a corresponding action-value function from the policy”), with Q representing the condition set related with each variable in the set S (“obtains a value of a corresponding action-value function from the policy, the domain of the decision variable before the policy decision, and a domain of the action-value function after the policy decision”), with the Means for Constraint Condition Extraction 8 containing the same functionality as Means for Constraint Condition Extraction 3 per paragraph [0070] (“In the step S23, Means for Constraint Condition Extraction 3 generates the set S that has the variables modified in the step S22 as its elements. Here generating the set S means, for example, writing suffixes, with which the abovementioned modified names of variables and the array elements of variables can be identified, in a certain memory area on the computer, and so forth. In addition, Means for Constraint Condition Extraction 3 judges whether there are any elements in the above-mentioned set S, that is, any variables of which the values have been modified with Means for Variable Value Setting 2 (the step S24). 
As a result of the above-mentioned judgment, when there are elements in the set S (when the set S is not an empty set), Means for Constraint Condition Extraction 3 picks up the element Sm one by one (the step S25). … Then, the set Q of the constraints related to the element Sm, which is picked up as mentioned above, is generated in the step S26. That is, if there are any constraint conditions that are comprised including the element Sm, Means for Constraint Condition Extraction 3 output them to the set Q. Then, while there are elements in the abovementioned set Q in the step S27, the following steps S28 to S30 are executed repeatedly. … ”).]), 
…
searches for an optimum solution for the problem information ([Fukui paragraph [0008]: a constraint-solver performing the solution search for a problem, where the problem is defined by simultaneous equations fk (which represents an action-value function), where solving the set of functions fk defines a policy, and the process of solving the policy represents a search using the problem information (“…the description of constraint conditions and the data used for the relevant constraint conditions are necessary in the declarative programming method, and the number of the constraint conditions and data becomes huge if it is applied to actual problems. An actual problem here is, for example, a combinatorial optimization problem that satisfies simultaneous equations of fk (X1, X2, ..., Xn)≤bk (k=1, 2, ... , m) and that makes evaluation function f0 (X1, X2, ..., Xn) minimum value. That is, as shown in Fig.17, the computer (the constraint solver in Fig.17) searches in a lump for the data which simultaneously satisfy all constraint conditions if the constraint conditions expressed with m pieces of simultaneous equations and evaluation function f0 are given to the computer.”).] [Fukui Figure 10, elements 4, 5; Figure 2, S28-S30; paragraphs [0036]-[0037]: the Means for Constraint Condition Calculation performing the calculation to search for variables and their values that should be modified to satisfy the constraint condition for all conditions Qm until all conditions have been solved, with the Means for Variable Value Resetting resetting the values of the variables that were found; this process represents a search for an optimum solution for the problem information (“In the step S28, Means for Constraint Condition Calculation 4 picks up a constraint condition Qm from the set Q. And Means for Constraint Condition Calculation 4 solves the constraint condition Qm picked up in the step S28, and searches for variables and their values that should be modified so as to satisfy the constraint condition Qm (the step S29). 
Here is the detailed content of the process in the step S29. (1) Whether the constraint condition Qm is satisfied or not is checked, and if it is satisfied, the present step S29 is completed. (2) Variables and their values which should be modified to satisfy the constraint condition Qm are searched for. (3) Whether the values of variables that are searched for in the item (2) above are true or not is verified. Although the item (2) above is indispensable as the process in the step S29, the other items (1) and (3) are not necessarily so. The user may judge whether they are necessary or not in accordance with the target problems, and comprise so that they are explicitly or implicitly designated as constraint conditions. Furthermore, in the step S30, Means for Variable Value Resetting 5 resets the values of the variables that are searched for in the step S29, and in the following process the constraint conditions are solved based on the values that are reset.”).]).  
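For illustration only, the iterative procedure quoted above from Fukui (steps S23 to S30) may be sketched as follows. This is a minimal Python sketch and is not Fukui's implementation. The function name propagate, the example rules total_rule and shipped_rule, the integer variable values, and the re-queuing of changed variables into the set S are assumptions made for the example and do not appear in the reference.

```python
from typing import Callable, Dict, List, Set, Tuple

Values = Dict[str, int]
# Hypothetical representation: a constraint is (variables it mentions, solver).
# The solver returns the variable updates needed to satisfy the constraint,
# or an empty dict if the constraint already holds.
Constraint = Tuple[Set[str], Callable[[Values], Values]]


def propagate(values: Values, modified: Set[str],
              constraints: List[Constraint]) -> Values:
    values = dict(values)
    s = set(modified)                  # S23: set S of modified variables
    while s:                           # S24: are there elements in S?
        sm = s.pop()                   # S25: pick up one element Sm
        q = [c for c in constraints if sm in c[0]]  # S26: set Q related to Sm
        while q:                       # S27: while Q has elements
            _, solve = q.pop(0)        # S28: pick up a constraint Qm
            updates = solve(values)    # S29: find values that satisfy Qm
            values.update(updates)     # S30: reset those variable values
            s |= set(updates)          # assumption: changed variables re-enter S
    return values


def total_rule(v: Values) -> Values:
    # Hypothetical constraint: total = price * qty
    want = v["price"] * v["qty"]
    return {} if v["total"] == want else {"total": want}


def shipped_rule(v: Values) -> Values:
    # Hypothetical constraint: shipped = total + 5
    want = v["total"] + 5
    return {} if v["shipped"] == want else {"shipped": want}


constraints = [({"price", "qty", "total"}, total_rule),
               ({"total", "shipped"}, shipped_rule)]
start = {"price": 3, "qty": 2, "total": 0, "shipped": 0}
result = propagate(start, {"price"}, constraints)
```

In this sketch, marking price as modified causes total and then shipped to be recomputed, which mirrors the chained resetting of variable values described in the quoted steps.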
However, Fukui does not teach
searches for a policy in which the action-value function is largest …
Xu teaches
searches for a policy in which the action-value function is largest ([p.1 Section 2 CSP Background: expressing a constraint satisfaction problem (CSP) in the context of an adaptive learning problem (“A CSP is defined as a pair {V,C}, where V is a set of variables, each with a finite domain of values, and C is a set of constraints, expressed as relations over the variables in V. In general, CSP instances are solved by making a sequence of decisions that select a variable at a time and assign a value from its respective domain.”).] [p.2 Section 3.1 Problem Setup and Section 3.2 Function Approximation and Q-learning: referring to Algorithm 1 (Reinforcement Learning Algorithm in the context of CSP), using an adaptive/reinforcement learning algorithm to solve a search-based constraint-satisfaction problem (CSP) solver by defining the search for an optimal policy as a search for the states (search space of the CSP) that satisfy an argmax function (“searches for a policy in which the action-value function is largest”) (“The search-based CSP solver can be formulated as a reinforcement learning task as below: – A set of states S. Each state s ∈ S is an instance or sub-instance of CSP. The state space corresponds to the search space of the CSP. – A set of actions A. Each action a ∈ A is a variable ordering heuristic function. – Transition function T : S×A → S. Each transition corresponds to a decision made when solving a CSP instance. For example, given a pair (s, a), we can use the variable ordering heuristic a for the (sub-)instance s to select a variable. After the variable is selected and assigned a value (by some fixed value ordering heuristic), the (sub-)instance s is changed to another (sub-)instance s', which is exactly a transition. – Reward function R. The goal of a CSP solver is to solve instances. In the goal states where a solution is found, a reward is assigned. 
… Q-learning is a reinforcement learning technique that learns an optimal action-value function Q*(s, a), giving the expected value of taking action a in state s. Since the state space here is exponentially large, we approximate Q(s, a) by a function based on the features of a state s. Q(s, a) = w_a · f(s) (1) where w_a is a weight vector for the action a and f(s) is a feature vector representing the state s. Assuming that we have all functions Q(s, a) for all actions, the optimal policy is π*(s) = argmax_a Q*(s, a).”).]) …
Fukui and Xu are analogous art because both teach search algorithms for solving constraint satisfaction problems.
([p.2 Section 3 Learning Adaption: “By directly interacting with the solver, learning adaption is able to discover a new strategy and therefore has the potential to solve instances that cannot be solved with the currently available approaches. The existing learning techniques for constraint-based problems include supervised learning [4] (QBF) and reinforcement learning [5] (SAT). … Here we describe how general reinforcement learning [7] can be applied in the context of CSP with the goal to adapt the search process dynamically at each decision point based on the structural properties of the current sub-problem to be solved.”]).
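For illustration only, the policy selection cited from Xu, in which Q(s, a) is approximated as the product of a per-action weight vector and a state feature vector and the policy selects the action with the largest Q value, may be sketched as follows. This is a minimal Python sketch under stated assumptions and is not Xu's code. The function names q_value and greedy_action, the action labels "min_domain" and "max_degree", and all numeric values are hypothetical.

```python
from typing import Dict, List


def q_value(weights: List[float], features: List[float]) -> float:
    # Linear approximation Q(s, a) = w_a . f(s)
    return sum(w * f for w, f in zip(weights, features))


def greedy_action(weights_by_action: Dict[str, List[float]],
                  features: List[float]) -> str:
    # Policy: choose the action a that maximizes Q(s, a)
    return max(weights_by_action,
               key=lambda a: q_value(weights_by_action[a], features))


# Hypothetical weights for two variable ordering heuristics (the actions)
weights_by_action = {
    "min_domain": [0.5, 1.0],
    "max_degree": [1.0, -1.0],
}
state_features = [2.0, 1.0]  # hypothetical feature vector f(s)

best = greedy_action(weights_by_action, state_features)
```

Here Q for "min_domain" is 2.0 and Q for "max_degree" is 1.0, so the sketch selects "min_domain".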
Regarding Claim 2, Fukui in view of Xu teaches
The solution search processing apparatus according to claim 1, wherein the search unit 
sets an improvement degree of a score for an objective function as a compensation ([Xu p.3 Section 3.2 Function Approximation and Q-learning: referring to Algorithm 1 (Reinforcement Learning Algorithm in the context of CSP), within the for loop in line 15 of the algorithm, the weight vector elements w_{a_j} are updated with new values based on a reward function R (“compensation”), a learning rate α, a discount factor γ, and the Q values at levels j and j+1 upon each time a CSP instance (constraint equation; “objective function”) is being solved [w_{a_j} = w_{a_j} + α ∙ (R_j + γ ∙ max_{a'} Q(s', a') - Q(s, a_j)) ∙ f(s)] (“sets an improvement degree of a score for an objective function as compensation”), where the weight vectors are used in the Q function (“action-value function”) [Q(s,a) = w_a ∙ f(s)] that is used to determine the decision taken from state s and action a, with the chosen decision being the one which yields the largest value [argmax_a Q*(s,a)].]) and 
updates the action-value function on the basis of the compensation ([Xu p.3 Section 3.2 Function Approximation and Q-learning: referring to Algorithm 1 (Reinforcement Learning Algorithm in the context of CSP), within the for loop in line 15 of the algorithm, the weight vector elements w_{a_j} are updated with new values based on a reward function R (“compensation”), a learning rate α, a discount factor γ, and the Q values at levels j and j+1 upon each time a CSP instance (constraint equation) is being solved [w_{a_j} = w_{a_j} + α ∙ (R_j + γ ∙ max_{a'} Q(s', a') - Q(s, a_j)) ∙ f(s)], where the weight vectors are used to update the Q function (“action-value function”) [Q(s,a) = w_a ∙ f(s)] (“updates the action-value function on the basis of the compensation”) that is used to determine the decision taken from state s and action a, with the chosen decision being the one which yields the largest value [argmax_a Q*(s,a)].]).  
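For illustration only, the weight update cited from Xu, in which the weight vector of the chosen action is adjusted by the learning rate α times the temporal-difference error (reward plus γ times the largest next-state Q value, minus the current Q value) times the feature vector f(s), may be sketched as follows. This is a minimal Python sketch under stated assumptions and is not Xu's Algorithm 1. The function names q_value and q_update, the action labels "a1" and "a2", and all numeric values are hypothetical.

```python
from typing import Dict, List


def q_value(weights: List[float], features: List[float]) -> float:
    # Linear approximation Q(s, a) = w_a . f(s)
    return sum(w * f for w, f in zip(weights, features))


def q_update(weights: Dict[str, List[float]], action: str,
             features: List[float], reward: float,
             next_features: List[float],
             alpha: float, gamma: float) -> Dict[str, List[float]]:
    # Current estimate Q(s, a_j) for the action that was taken
    q_sa = q_value(weights[action], features)
    # Largest estimate over all actions in the next state s'
    best_next = max(q_value(w, next_features) for w in weights.values())
    # Temporal-difference error: R_j + gamma * max Q(s', a') - Q(s, a_j)
    td_error = reward + gamma * best_next - q_sa
    # Only the weight vector of the taken action changes
    updated = {a: list(w) for a, w in weights.items()}
    updated[action] = [w + alpha * td_error * f
                       for w, f in zip(weights[action], features)]
    return updated


# Hypothetical values chosen so the arithmetic is exact
weights = {"a1": [0.0, 0.0], "a2": [0.0, 0.0]}
updated = q_update(weights, "a1", features=[1.0, 2.0], reward=1.0,
                   next_features=[1.0, 1.0], alpha=0.5, gamma=0.5)
```

With all weights initially zero, the temporal-difference error equals the reward of 1.0, so the weights for "a1" become [0.5, 1.0] and the weights for "a2" are unchanged.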
Regarding Claim 3, Fukui in view of Xu teaches
The solution search processing apparatus according to claim 1, further comprising, 
an action-value function learning unit that 
receives the search information ([Fukui Figure 10, elements 1, 2; paragraph [0015]: a constraint-based solver includes a Means for Constraint Condition Input, which is responsible for inputting a constraint condition, and a Means for Variable Value Setting, which is responsible for setting variable values used for the constraint conditions (“In the Constraint-based Solver of this invention which include- Means for Constraint Condition Inputting to input constraint conditions, Means for Variable Value Setting to set the values of variables used for the constraint conditions which are input by the above-mentioned Means for Constraint Condition Inputting …”).] [Fukui Figure 10, elements 8, 2, 1; Figure 2, elements S23-S27; paragraphs [0034]-[0035]: the Means for Constraint Condition Extraction receiving information from the databases, the Means for Constraint Condition Inputting, and the Means for Variable Value Setting (“receives the search information”), and initiating an iterative search for each variable Sm in the constraint conditions, generating a set S that has variables that were modified based on the received inputs, and generating a set Q by solving a constraint condition Qm, with Q representing the condition set related with each variable in the set S (“obtains a value of a corresponding action-value function from the policy, the domain of the decision variable before the policy decision, and a domain of the action-value function after the policy decision”), with the Means for Constraint Condition Extraction 8 containing the same functionality as Means for Constraint Condition Extraction 3 per paragraph [0070] (“In the step S23, Means for Constraint Condition Extraction 3 generates the set S that has the variables modified in the step S22 as its elements. 
Here generating the set S means, for example, writing suffixes, with which the abovementioned modified names of variables and the array elements of variables can be identified, in a certain memory area on the computer, and so forth. In addition, Means for Constraint Condition Extraction 3 judges whether there are any elements in the above-mentioned set S, that is, any variables of which the values have been modified with Means for Variable Value Setting 2 (the step S24). As a result of the above-mentioned judgment, when there are elements in the set S (when the set S is not an empty set), Means for Constraint Condition Extraction 3 picks up the element Sm one by one (the step S25). … Then, the set Q of the constraints related to the element Sm, which is picked up as mentioned above, is generated in the step S26. That is, if there are any constraint conditions that are comprised including the element Sm, Means for Constraint Condition Extraction 3 output them to the set Q. Then, while there are elements in the abovementioned set Q in the step S27, the following steps S28 to S30 are executed repeatedly. … ”).]), 
sets an improvement degree of a score for an objective function as a compensation ([Xu p.3 Section 3.2 Function Approximation and Q-learning: referring to Algorithm 1 (Reinforcement Learning Algorithm in the context of CSP), within the for loop in line 15 of the algorithm, the weight vector elements w_{a_j} are updated with new values based on a reward function R (“compensation”), a learning rate α, a discount factor γ, and the Q values at levels j and j+1 upon each time a CSP instance (constraint equation; “objective function”) is being solved [                        
                            
                                
                                    w
                                
                                
                                    
                                        
                                            a
                                        
                                        
                                            j
                                        
                                    
                                
                            
                            =
                             
                            
                                
                                    w
                                
                                
                                    
                                        
                                            a
                                        
                                        
                                            j
                                        
                                    
                                
                            
                            +
                             
                            α
                             
                            ∙
                            
                                
                                    
                                        
                                            R
                                        
                                        
                                            j
                                        
                                    
                                    +
                                     
                                    γ
                                    ∙
                                    
                                        
                                            
                                                
                                                    m
                                                    a
                                                    x
                                                
                                                
                                                    a
                                                
                                            
                                        
                                        
                                            '
                                        
                                    
                                    Q
                                    
                                        
                                            
                                                
                                                    s
                                                
                                                
                                                    '
                                                
                                            
                                            ,
                                            
                                                
                                                    a
                                                
                                                
                                                    '
                                                
                                            
                                        
                                    
                                    -
                                    Q
                                    
                                        
                                            s
                                            ,
                                            
                                                
                                                    a
                                                
                                                
                                                    j
                                                
                                            
                                        
                                    
                                
                            
                            ∙
                            f
                            (
                            s
                            )
                        
                    ] (“sets an improvement degree of a score for an objective function as compensation”), where the weight vectors are used in the Q function (“action-value function”) [Q(s,a) =                         
                            
                                
                                    w
                                
                                
                                    a
                                
                            
                        
                     ∙ f(s)] that is used to determine the decision taken from state s and action a, with the chosen decision being the one which yields the largest value [argmaxa Q*(s,a)].]), and 
updates the action-value function on the basis of the compensation ([Xu p.3 Section 3.2 Function Approximation and Q-learning: referring to Algorithm 1 (Reinforcement Learning Algorithm in the context of CSP), within the for loop in line 15 of the algorithm, the weight vector elements $w_{a_j}$ are updated with new values based on a reward function R (“compensation”), a learning rate α, a discount factor γ, and the Q values at levels j and j+1 upon each time a CSP instance (constraint equation) is being solved [$w_{a_j} = w_{a_j} + \alpha \cdot \left( R_j + \gamma \cdot \max_{a'} Q(s', a') - Q(s, a_j) \right) \cdot f(s)$], where the weight vectors are used to update the Q function (“action-value function”) [$Q(s,a) = w_a \cdot f(s)$] (“updates the action-value function on the basis of the compensation”) that is used to determine the decision taken from state s and action a, with the chosen decision being the one which yields the largest value [$\arg\max_a Q^*(s,a)$].]).  
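For illustration only, the weight update cited from Xu above can be sketched in Python as follows. This is a minimal sketch and is not code taken from Xu or from the application: the function names `q_value` and `update_weights`, the dictionary layout of the weights (one weight vector per action), and the feature vectors passed in are all assumptions made for the example.

```python
from typing import Dict, List


def q_value(weights: Dict[str, List[float]], action: str, features: List[float]) -> float:
    """Linear action-value estimate Q(s, a) = w_a . f(s)."""
    return sum(w * f for w, f in zip(weights[action], features))


def update_weights(
    weights: Dict[str, List[float]],
    action: str,
    features: List[float],
    reward: float,
    next_features: List[float],
    alpha: float,
    gamma: float,
) -> float:
    """Apply w_a <- w_a + alpha * (R + gamma * max_a' Q(s', a') - Q(s, a)) * f(s).

    Updates `weights[action]` in place and returns the temporal-difference error.
    """
    # Largest action value reachable from the next state s'.
    best_next = max(q_value(weights, a, next_features) for a in weights)
    # Difference between the reward-based target and the current estimate.
    td_error = reward + gamma * best_next - q_value(weights, action, features)
    # Move the chosen action's weights along the feature vector f(s).
    weights[action] = [w + alpha * td_error * f for w, f in zip(weights[action], features)]
    return td_error
```

In this sketch the reward argument plays the role mapped to the claimed “compensation”, and the in-place change to the weight vector is what alters the value later returned by `q_value`, i.e., the action-value function.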
Regarding Claim 5,
A solution search method by a solution search processing apparatus that searches for a quasi-optimum solution for an objective function of a discrete optimization problem, comprising: 
a step of 
inputting, by the solution search processing apparatus, search information including a history of a solution, a constraint equation, and an initial state of a selectable domain of a decision variable (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale), 
setting a decision variable selected (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale), and 
initializing an action-value function including the policy, a selectable domain of a decision variable before policy decision, and a selectable domain of a decision variable after the policy decision as parameters (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale); 
a step of 
calculating, by the solution search processing apparatus, a selectable domain region of the decision variable after the policy decision from the selectable domain of the decision variable before the policy decision and the policy by constraint propagation (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale); and 
a step of receiving, by the solution search processing apparatus, problem information including the constraint equation and the initial state of the domain of the decision variable and information of the action-value function initialized by the action-value function initializing unit (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale), 
obtaining a value of a corresponding action-value function from the policy, the domain of the decision variable before the policy decision, and a domain of the action-value function after the policy decision (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale), 
searching for a policy in which the action-value function is largest (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale), and 
searching for an optimum solution for the problem information (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale).  
Regarding Claim 6, Fukui in view of Xu teaches
The solution search processing method according to claim 5, wherein, in the step of 
searching for the optimum solution for the problem information (This claim limitation is similar in scope to a corresponding claim limitation in Claim 2, and hence is rejected under similar rationale), 
an improvement degree of a score for an objective function is set as a compensation (This claim limitation is similar in scope to a corresponding claim limitation in Claim 2, and hence is rejected under similar rationale), and 
the action-value function is updated on the basis of the compensation (This claim limitation is similar in scope to a corresponding claim limitation in Claim 2, and hence is rejected under similar rationale).  
Regarding Claim 7, Fukui in view of Xu teaches
The solution search processing method according to claim 5, further comprising, a step of 
receiving the search information (This claim limitation is similar in scope to a corresponding claim limitation in Claim 3, and hence is rejected under similar rationale), 
setting an improvement degree of a score for an objective function as a compensation (This claim limitation is similar in scope to a corresponding claim limitation in Claim 3, and hence is rejected under similar rationale), and 
updating the action-value function on the basis of the compensation (This claim limitation is similar in scope to a corresponding claim limitation in Claim 3, and hence is rejected under similar rationale).  
Claims 4 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Fukui, Toshio, EP1710736, published 10/11/2006 [hereafter referred as Fukui] in view of Xu et al., Learning Adaptation to Solve Constraint Satisfaction Problems, LION 2009 Learning and Intelligent OptimizatioN, January 2009, Microsoft Research, pp.1-5 [hereafter referred as Xu], as applied to Claim 3 and Claim 7, in view of Sutton et al., Reinforcement Learning: An Introduction, 1998 MIT Press, Chapter 2, pp.25-49 [hereafter referred as Sutton].
Regarding Claim 4, Fukui in view of Xu as applied to Claim 3 teaches
The solution search processing apparatus according to claim 3.
However, Fukui in view of Xu does not teach
wherein the action-value function learning unit uses an ε-greedy technique as a selection strategy of a policy for learning the action-value function.
Sutton teaches
wherein the action-value function learning unit uses an ε-greedy technique as a selection strategy of a policy for learning the action-value function ([Sutton p.28 1st full paragraph, Section 2.2 Action-Value Methods: an action selection rule (i.e., a value ordering heuristic) using the ε-greedy technique, where most of the time the action can be a greedy action to select the action with the highest estimated action value, but once in a while (with probability ε) it can select a random action independent of the action-value estimates (“The simplest action selection rule is to select the action (or one of the actions) with highest estimated action value, that is, to select on play t one of the greedy actions a*, for which $Q_t(a^*) = \max_a Q_t(a)$. This method always exploits current knowledge to maximize immediate reward; it spends no time at all sampling apparently inferior actions to see if they might really be better. A simple alternative is to behave greedily most of the time, but every once in a while, say with small probability ε, instead select an action at random, uniformly, independently of the action-value estimates. We call methods using this near-greedy action selection rule ε-greedy methods. An advantage of these methods is that, in the limit as the number of plays increases, every action will be sampled an infinite number of times, guaranteeing that $k_a \to \infty$ for all a, and thus ensuring that all the $Q_t(a)$ converge to $Q^*(a)$. This of course implies that the probability of selecting the optimal action converges to greater than 1-ε, that is, to near certainty.”).]) … 
Both Fukui in view of Xu and Sutton are analogous art as both teach adaptive/reinforcement learning methods.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the value ordering heuristic h used in the adaptive/reinforcement learning algorithm of Fukui in view of Xu and replace it with the ε-greedy selection strategy of Sutton as a way to incorporate the ε-greedy technique into a constraint-based solver system to solve a constraint satisfaction problem. The motivation to combine is taught in Sutton, as ε-greedy techniques perform better over time and have less chance of getting stuck at a sub-optimal solution, thus improving the performance of a constraint-based solver system by finding a more optimal solution ([Sutton p.28 3rd full paragraph – p.29 Section 2.2 “Figure 2.1 compares a greedy method with two ε-greedy methods (ε=0.01 and ε=0.1), as described above, on the 10-armed testbed. Both methods formed their action-value estimates using the sample-average technique. The upper graph shows the increase in expected reward with experience. The greedy method improved slightly faster than the other methods at the very beginning, but then leveled off at a lower level. It achieved a reward per step of only about 1, compared with the best possible of about 1.55 on this testbed. The greedy method performs significantly worse in the long run because it often gets stuck performing suboptimal actions. The ε-greedy methods eventually perform better because they continue to explore, and to improve their chances of recognizing the optimal action. The ε=0.1 method explores more, and usually finds the optimal action earlier, but never selects it more than 91% of the time. The ε=0.01 method improves more slowly, but eventually performs better than the ε=0.1 method on both performance measures. It is also possible to reduce ε over time to try to get the best of both high and low values.”]).
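For illustration only, the ε-greedy selection rule described in the cited Sutton passage can be sketched in Python as follows. This is a minimal sketch and is not code taken from Sutton, Xu, or Fukui: the function name `epsilon_greedy`, the dictionary of action-value estimates keyed by policy name, and the optional `rng` parameter are assumptions made for the example.

```python
import random
from typing import Dict, Optional


def epsilon_greedy(
    q_values: Dict[str, float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Select an action from estimated action values using the epsilon-greedy rule."""
    rng = rng or random.Random()
    # With probability epsilon, explore: pick uniformly at random,
    # independently of the action-value estimates.
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    # Otherwise exploit: pick an action with the highest estimated value.
    return max(q_values, key=q_values.get)
```

With `epsilon` set to 0 the rule reduces to the purely greedy selection discussed in the quoted passage, and with `epsilon` set to 1 every selection is a uniform random draw.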
Regarding Claim 8, Fukui in view of Xu as applied to Claim 7 teaches
The solution search processing method according to claim 7, wherein, the step of updating the action-value function, an ε-greedy technique is used as a selection strategy of a policy for learning the action-value function (This claim limitation is similar in scope to a corresponding claim limitation in Claim 4, and hence is rejected under similar rationale).  

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332.  The examiner can normally be reached on Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen can be reached on 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.




/WILLIAM WAI YIN KWAN/
Examiner, Art Unit 2121



/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121