Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Step 1 of 101 Analysis.
Claims 1-5 are directed to a processor implemented method for performing negotiation tasks using reinforced learning agents; therefore, Claims 1-5 are within at least one of the four statutory categories (i.e., method). Claims 6-10 are directed to a system for performing negotiation tasks using reinforced learning agents; therefore, Claims 6-10 are within at least one of the four statutory categories (i.e., system). Claims 11-15 are directed to one or more non-transitory machine readable information storage mediums for performing the instructions for negotiation tasks using reinforced learning agents; therefore, Claims 11-15 are within at least one of the four statutory categories (i.e., article of manufacture).
Under Step 2A of the 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG), it is determined whether the claims are directed to a judicially recognized exception.
Step 2A is a two-prong inquiry.
Under Prong 1, it is determined whether the claim recites a judicial exception (YES).
The exemplary Claims 6-10 recite limitations that fall within the "certain methods of organizing human activity" grouping of abstract ideas, including:
Re. Claim 6, A system (102) for performing a negotiation task, the system (102) comprises:
a processor (202);
an Input/output (I/O) interface (204); and
a memory (208) coupled to the processor (202), the memory (208) comprising:
receive, by a negotiating agent implemented by the processor, a request for performing the negotiation task between the negotiating agent and an opposition agent, to agree on an optimal contract proposal comprising a plurality of clauses from a set of clauses predefined for the negotiation task, wherein each of the negotiating agent and the opposition agent comprises a plurality of behavioral models modeled based on a reward function;
negotiate one on one, by the negotiating agent with the plurality of behavioral models of the opposition agent to agree on a plurality of intermediate contract proposals, wherein the negotiation between each of the negotiating agent and the opposition agent is in accordance with a negotiation training procedure; and
select, by a selector agent, the optimal contract proposal from the plurality of intermediate contract proposals generated
Re. Claim 7, The system (102) as claimed in claim 6, wherein each of the plurality of behavioral models comprises a Selfish-Selfish (SS) model, a Selfish-Prosocial (SP) model, a Prosocial-Selfish (PS) model and a Prosocial-Prosocial (PP) model reflecting behavioral aspect of the negotiating agent paired with behavioral aspect of the opposition agent.
Re. Claim 8, The system (102) as claimed in claim 6, wherein the negotiating training procedure for performing the negotiation task between each of the negotiating agent and the opposition agent comprises:
obtaining, by the negotiating agent at time step 't' a plurality of state inputs, wherein the plurality of state inputs includes a utility function, an opponent offer, a previous opponent offer and an agent ID; 
generating, by the negotiating agent for the corresponding behavior from the plurality of behavioral models, a first intermediate contract proposal utilizing the plurality of said state inputs for performing the negotiation task, wherein the first intermediate contract proposal predicts the number of bits to be flipped during the performance of the negotiation task;
generating, by the opposition agent at next time step 't+1
assigning, a reward for each behavior model of the intermediate contract proposal of the negotiating agent and the opposition agent based on the performed negotiation task.
Re. Claim 9, The system (102) as claimed in claim 6, wherein assigning the reward for each behavior model of the intermediate contract proposal comprises:
a maximum reward is assigned to the negotiating agent and the opposition agent, if the generated intermediate contract proposal is optimal; and
a minimum reward is assigned to the negotiating agent and the opposition agent, if the generated intermediate contract proposal is not optimal.
Re. Claim 10, The system as claimed in claim 6, wherein selecting the optimal contract proposal using the selector agent comprises:
obtaining, the plurality of contract proposals generated by the negotiating agent and the opposition agent for each behavior from the plurality of behavioral models; and
determining, the intermediate contract proposal utilizing the plurality of contract proposals obtained from the plurality of behavioral models of the negotiating agent and the opposition agent and the maximum reward attained by each of the intermediate contract proposals and the frequency distribution of the negotiating agent selection sequence.
The MPEP addresses the instant case with clear guidance:
2106.04(a)(2) Abstract Idea Groupings
II. CERTAIN METHODS OF ORGANIZING HUMAN ACTIVITY
fundamental economic principles or practices (including hedging, insurance, and mitigating risk)
commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; and business relations)
managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions)
The limitations of Claims 6-10 (receive a request to agree on an optimal contract proposal; negotiate one on one to agree on a plurality of intermediate contract proposals; select the optimal contract proposal from the plurality of intermediate contract proposals generated; performing the negotiation task between each of the negotiating agent and the opposition agent; generating, by the negotiating agent for the corresponding behavior from the plurality of behavioral models, a first intermediate contract proposal; assigning a reward for each behavior model; a maximum reward is assigned if the generated contract proposal is optimal; a minimum reward is assigned if the contract proposal is not optimal; obtaining the plurality of contract proposals generated; and determining the intermediate contract proposal) are directed toward the goal of efficiently completing the negotiation between multiple parties, saving both time and money by using a process that has a higher likelihood of attaining mutually beneficial results. All of these limitations recite an abstract commercial and legal interaction, specifically one regarding contract negotiation and formation.
Under Prong 2, it is determined whether the claim recites additional elements that integrate the exception into a practical application of the exception. This judicial exception is not integrated into a practical application (NO).
Claims 6-10 do not recite additional elements beyond the judicial exception. The claims recite no hardware or software, or any indication that the parts of the system operate in a technological environment, other than a generic computer system to receive, negotiate and select the optimal contract proposal. The use of a generic computer amounts to mere instructions to implement an abstract idea on a computer, which is insufficient to integrate the exception into a practical application.
Additionally, the claimed elements are insufficient to integrate the abstract idea into a practical application because the claim fails to i) reflect an improvement in the functioning of a computer or an improvement to another technology or technical field, ii) apply the judicial exception with, or use the judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim, iii) effect a transformation or reduction of a particular article to a different state or thing, or iv) apply or use the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment. Instead, this invention is merely implementing this abstract idea on generic computers and applying it to determine the most mutually beneficial outcome. See: MPEP § 2106.04(d)(I)
Accordingly, the judicial exception is not integrated into a practical application.
Under Step 2B, it is determined whether the claims recite additional elements that amount to significantly more than the judicial exception. The claims of the present application do not include additional elements that are sufficient to amount to significantly more than the judicial exception (NO).
The claims do not recite additional elements beyond the judicial exception(s). The claims recite no hardware or software, or any indication that the steps of the method are performed in a technological environment, other than to receive inputs from multiple parties to perform negotiations based on a negotiation training procedure for all parties; these are mere instructions to implement an abstract idea on a computer and are therefore insufficient to amount to significantly more than the judicial exception.
The analysis above applies to all statutory categories of invention, so method Claim 1, and dependent Claims 2-5 and article of manufacture Claim 11 and dependent Claims 12-15 are also rejected under 35 U.S.C. 101.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 5, 6, 7, 10, 11, 12 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Hadingham et al. (US 7,373,325 B1, hereafter Hadingham), further in view of Se Hyung Oh (Does Content of Concessions Matter in Negotiation? Match Between Concessions Strategy and Target’s Regulatory Focus, Graduate School of Vanderbilt University, hereafter Oh) and further in view of Cao et al. (Automated negotiation for e-commerce decision making: A goal deliberated agent architecture for multi-strategy selection, Decision Support Systems 73 (2015) pgs. 1-14, hereafter Cao).
Re. Claims 1, 6 and 11, Hadingham teaches A processor implemented method for performing a negotiation task (Abs.; A system for use on an electronic network for negotiating contracts between at least one buyer and at least one seller), the method comprising:
Hadingham further teaches receiving, by a negotiating agent implemented by the processor, a request for performing the negotiation task between the negotiating agent and an opposition agent, to agree on an optimal contract proposal (Abs.; in which proposals may be made or called for by buyers and/or sellers, and in which each party is represented by a software agent.) Hadingham doesn’t teach comprising a plurality of clauses from a set of clauses predefined for the negotiation task, wherein each of the negotiating agent and the opposition agent comprises a plurality of behavioral models modeled based on a reward function;
However, Oh, in the same field of endeavor, does teach comprising a plurality of clauses from a set of clauses predefined for the negotiation task (Oh, Study One, Negotiation Task; The instructions stated that the price had already been agreed upon between the seller and the buyer but that the details of the car should be negotiated. There  wherein each of the negotiating agent and the opposition agent comprises a plurality of behavioral models modeled based on a reward function (Oh, Chap. 4, ¶ 3; Participants were invited to the experiment through email. The email described the experiment as a car sales negotiation simulation with a matched opponent over the internet. However, the negotiation partner was actually simulated by a computer, without any involvement of another participant. Participants volunteered for the experiment for a chance to win $100, determined by a lottery. Individuals’ momentary regulatory focus can be aroused by the instruction of experiment (e.g, Idson, Liberman, & Higgins, 2004; Shah, Higgins, Friedman, 1998). For example, if participants are instructed to obtain some reward for their negotiation performance, participants tend to become gain-focused (or promotion-focused).);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hadingham’s negotiation technique with Oh’s teaching that the value of what individuals are doing comes not only from the outcome (e.g., high benefit and low cost), but also from their subjective feelings of “rightness,” which is generated when they pursue a goal with a means that matches with their regulatory focus. In other words, independent of the outcome value, when people pursue a goal in a manner that sustains their regulatory focus (e.g., promotion-focused) people pursue a goal with an eager manner (Oh, Ch. II, Literature Review, Regulatory Fit, p. 39).
Oh teaches negotiating one on one, by the negotiating agent (Oh, Chap. 4, Negotiating Task, pg. 67; Participants were informed that they would take the role of car buyer and that their opponent over the internet (computer) would take the role of car dealer.) with the plurality of behavioral models of the opposition agent to agree (Oh, Chap. IV, Negotiating Task, pg. 66; Then, they were asked to answer Higgins et al.’s (2001) Regulatory Focus Questionnaire (RFQ). The RFQ measures individual differences in chronic regulatory focus, based on personal histories of success or failure of achieving goals in a promotion-focused or prevention-focused manner (see Appendix I). Examiner points out that regulatory focus is the behavioral model used in this study.) on a plurality of intermediate contract proposals (Oh, Chap. II, Concession and Reciprocity in Negotiation, ¶ 1, Pg. 8; In most negotiation settings, concession is an imperative tactical move to generate an agreement between two parties, and multiple occurrences of concession are expected during the process of negotiation (Pruitt, 1981). Examiner notes that the occurrence of concession in each round of negotiation is an intermediate contract proposal.); wherein the negotiation between each of the negotiating agent and the opposition agent is in accordance with a negotiation training procedure (Oh, Chap. IV, Negotiation Task, pg. 68; The instructions said that the point system reflected buyer’s needs and the actual values in the car market. The instructions also said that participants’ decisions should be based on the values presented in the table, not on their individual preferences.); and
The combination of Hadingham and Oh does not teach selecting, by a selector agent, the optimal contract proposal from the plurality of intermediate contract proposals generated by performing negotiation between the negotiation agent and the opposition agent based on the negotiation training procedure.
However, Cao does teach selecting, by a selector agent, the optimal contract proposal from the plurality of intermediate contract proposals generated by performing negotiation between the negotiation agent and the opposition agent based on the negotiation training procedure (Cao, §7.3.3, ¶3; Secondly, when the buyer adopts strategy selection mechanism to negotiate with the seller, we can find out from Table 1 that in comparison with the seller with Boulware tactics, the seller with Conceder tactics reaches a higher success rate, but the utility product (UP) and the utility difference (UD) are worse. This implies that the conceder tactic can help the seller to get more agreements because it is an aggressive strategy sacrificing the agent's own utility as compensation. When we consider all the factors and situations, the Boulware tactic is a better choice for both sides to obtain a better outcome.), 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hadingham and Oh’s system for negotiations to keep pace with the rapid growth of global emarkets, in view of which there has been a significant interest in designing Automated Negotiation System (ANS) [10] that can serve as surrogates for human business decision-makers, where software agents are designed to autonomously act on behalf of the real-world parties [11,12] (Cao, §1. Introduction, ¶2).
Cao further teaches wherein the selector agent is an ensemble of the plurality of behavioral models of the negotiating agent and the opposition agent (Cao, §2.2, ¶4; Departing from the prior studies [27,32], we conceive a possible way of conducting multi-strategy selection: combining the behavior dependent and time-dependent to take both the opponent's negotiation history and the time factor into consideration.).
Re. Claims 2, 7 and 12, Hadingham, Oh and Cao teach The method as claimed in claim 1, Oh teaches wherein each of the plurality of behavioral models comprises a Selfish-Selfish (SS) model, a Selfish-Prosocial (SP) model, a Prosocial-Selfish (PS) model and a Prosocial-Prosocial (PP) model reflecting behavioral aspect of the negotiating agent paired with behavioral aspect of the opposition agent
Re. Claims 5, 10 and 15, Hadingham, Oh and Cao teach The method as claimed in claim 1, Cao teaches wherein selecting the optimal contract proposal using the selector agent comprises:
obtaining, the plurality of contract proposals generated by the negotiating agent and the opposition agent for each behavior from the plurality of behavioral models (Cao, §4, Step 3; The negotiating agent begins to handle the deliberation situations by applying varieties of the negotiation strategies, which have been stored in the belief base as a static belief, to generate negotiation goal options, which is so-called negotiation desires representing the possible negotiation goals that the agent tries to achieve. The negotiating agent then chooses among those possible goals based on certain constraints.) and
determining, the intermediate contract proposal utilizing the plurality of contract proposals obtained from the plurality of behavioral models of the negotiating agent and the opposition agent and the maximum reward attained by each of the intermediate contract proposals and the frequency distribution of the negotiating agent selection sequence.
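For illustration only, the selection recited in this limitation can be sketched as follows. This is a minimal sketch under stated assumptions and is neither the applicant's nor Cao's implementation: the `Proposal` record, the `select_proposal` function, the 0/1 clause encoding, and the tie-breaking rule (highest reward first, then the behavioral model that appears most often in the selection sequence) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    model: str      # behavioral model that produced the proposal, e.g. "SS" or "PP"
    clauses: tuple  # hypothetical encoding: one 0/1 flag per predefined clause
    reward: float   # reward attained by this intermediate proposal


def select_proposal(proposals, selection_sequence):
    """Pick the proposal with the highest reward; break ties by how often
    its behavioral model appears in the selection sequence (assumed rule)."""
    if not proposals:
        raise ValueError("no intermediate proposals to select from")
    frequency = Counter(selection_sequence)
    return max(proposals, key=lambda p: (p.reward, frequency[p.model]))
```

Under these assumptions, two proposals with equal reward are separated by the frequency with which their behavioral model was previously selected.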
Claims 3, 8 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Hadingham et al. (US 7,373,325 B1, hereafter Hadingham), further in view of Se Hyung Oh, Does Content of Concessions Matter in Negotiation? Match Between Concessions Strategy and Target’s Regulatory Focus (Graduate School of Vanderbilt University, hereafter Oh), further in view of Cao et al. (Automated negotiation for e-commerce decision making: A goal deliberated agent architecture for multi-strategy selection, Decision Support Systems 73 (2015) pgs. 1-14, hereafter Cao) and further in view of Naveh (US 20050021486, hereafter Naveh).
Re. Claims 3, 8 and 13, Hadingham, Oh and Cao teach The method as claimed in claim 1, Cao teaches wherein the negotiating training procedure for performing the negotiation task between each of the negotiating agent and the opposition agent comprises:
obtaining, by the negotiating agent at time step 't' a plurality of state inputs, wherein the plurality of state inputs includes a utility function, an opponent offer, a previous opponent offer and an agent ID (Cao, § 3, Definition 1(i); I represents the beliefs triggered by interaction, which is received from the environment or other negotiating agent during the negotiation process., Examiner notes that the definition step is same for both negotiating agents.), and (Cao, § 3, Definition 1(iii); S={s1, s2, ⋯, sn} is a set of negotiation strategies performed by the negotiating agent. Where negotiation strategy si is a function s: I -> O, meaning that the agent receives some input proposals (in the set of I) from its opposing negotiation party, and implements the current negation strategy, and then makes some new offers proposals (in the set of O) against that of its negotiating partner.), and (Cao, § 3, Definition 1(iv); M = {m1, m2, ⋯, mn} is the utility 
generating, by the negotiating agent for the corresponding behavior from the plurality of behavioral models, a first intermediate contract proposal utilizing the plurality of said state inputs for performing the negotiation task (Cao, §2.2 Negotiating Strategy, ¶ 3; However, human negotiators usually perform a behavioral game process [31], rather than surmising the opponent's next offer in real world negotiations. Normally it is required to observe the opponent's behavior, including offers, words, actions, and so on, to collect enough information before making the next decision.). Cao does not teach wherein the first intermediate contract proposal predicts the number of bits to be flipped during the performance of the negotiation task.
However, Naveh teaches wherein the first intermediate contract proposal predicts the number of bits to be flipped during the performance of the negotiation task (Naveh, ¶ 87; On the other hand, if the solver finds at step 70 that the new state has a lower cost than the current state, it discards the current state and keeps the new state for further processing. Since the selection of bits to be changed, made at step 64, was successful in reducing the cost, processor 28 learns the characteristics of this state hop, at a learning step 74. These learned characteristics are applied in subsequent iterations through step 64, as described above. The characteristics learned at step 74 may include, for example, characteristic hop sizes (i.e., the number of bits to be flipped in each hop); sets of correlated bits (which should 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hadingham, Oh and Cao’s system for negotiations so that each constraint may be expressed as a relation, defined over some subset of the variables, denoting valid combinations of their values. A solution to the problem is an assignment of a value to each variable from its domain that satisfies all of the hard constraints (Naveh, ¶ 2).
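For illustration only, the "number of bits to be flipped" language can be read against a simple example. This is a minimal sketch, assuming a contract proposal is encoded as a bit vector with one bit per predefined clause (an encoding that is an assumption here and is not stated in the cited passages); `bits_flipped` is a hypothetical helper that counts the positions at which two proposals differ.

```python
def bits_flipped(current, proposed):
    """Count the clause bits that differ between two proposals
    (the Hamming distance between the two bit vectors)."""
    if len(current) != len(proposed):
        raise ValueError("proposals must cover the same set of clauses")
    return sum(a != b for a, b in zip(current, proposed))
```

For example, `bits_flipped([1, 0, 1, 1], [1, 1, 0, 1])` counts the two clauses whose inclusion changed between the proposals.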
Cao teaches generating, by the opposition agent at next time step 't+1' for the corresponding behavior from the plurality of behavioral models, a second intermediate contract proposal based on the first intermediate contract proposal obtained from the negotiating agent, wherein the second intermediate contract proposal maximizes the offer in the intermediate contract proposal for performing the negotiation task (Cao, §6.3 and Fig. 4; Based on the theoretical model proposed in the previous subsections, Fig. 4 (below) shows the formal description of the multi-strategy selection algorithm, which consists of the following seven steps:
Step 1 At the beginning, the first offer is made by the seller.
Step 2 Since the buyer needs 3 sequential offers of the seller to get seller's concession rate, if it is the first two round of negotiation, the process will proceed to step 3; otherwise, the process goes to step 5.
Step 3 If the buyer's current offer is larger than the seller's current offer, the buyer does not need to propose its offer. Instead, the buyer accepts the seller's current 
Step 4 After the buyer's offer, the seller will make its new offer according to its strategy. If the next offer is less than the buyer's current offer, the seller will accept the buyer's current offer and go to the end; otherwise, sets the new offer of the next round, and go back to step 2.
Step 5 If the current negotiation is between the third round and the negotiation deadline (i.e., the shorter one between the buyer's and seller's deadlines), the flow goes to step 6, otherwise, terminates. Examiner notes that Steps 4 and 5 are included in this reference to indicate the process if negotiations go beyond t+1=2.).

[Figure image: media_image1.png]

Fig. 4. Buyer's multi-strategy selection algorithm, where t is the time, θ is the concession rate of seller agent, Bcur and Scur are the current bid of the Buyer and Seller respectively; Snext is Seller's bid for the next round of negotiation.
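For illustration only, the alternating-offer flow in the steps quoted above can be sketched as follows. This is a simplified sketch under stated assumptions and is not Cao's algorithm: the `negotiate` function and its `seller_offers` and `buyer_offers` callables are hypothetical, and the concession-rate estimation of Step 2 and the strategy selection of the steps not quoted above are omitted.

```python
def negotiate(seller_offers, buyer_offers, deadline):
    """Alternating-offer loop loosely following the quoted Steps 1, 3, 4 and 5.

    seller_offers, buyer_offers: hypothetical callables mapping a round
    number to a price; deadline: the shorter of the two parties' deadlines.
    Returns (agreed_price, round), or (None, deadline) if no agreement.
    """
    s_cur = seller_offers(0)            # Step 1: the seller makes the first offer
    for t in range(deadline):
        b_cur = buyer_offers(t)
        if b_cur > s_cur:               # Step 3: buyer accepts the seller's current offer
            return s_cur, t
        s_next = seller_offers(t + 1)   # Step 4: seller computes its next offer
        if s_next < b_cur:              # Step 4: seller accepts the buyer's current offer
            return b_cur, t
        s_cur = s_next                  # otherwise the next round begins
    return None, deadline               # Step 5: deadline reached, negotiation terminates
```

With a seller conceding from 100 in steps of 10 and a buyer raising from 50 in steps of 10, the buyer's offer first exceeds the seller's current offer in round 3, so the sketch returns the seller's offer of 70.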
Oh teaches assigning, a reward for each behavior model of the intermediate contract proposal of the negotiating agent and the opposition agent based on the performed negotiation task (Oh, Chap. II, Regulatory Focus, pg. 37; An individual’s regulatory focus was also stimulated by framing rewards or penalties for task performance differently. Researchers induced a promotion focus in subjects by framing rewards for a task as a benefit to be gained, and they stimulated a prevention focus in subjects by framing penalties for a task as a loss to be avoided (e.g., Crowe & Higgins, 1997; Forster, Higgins, & Idson, 1998).).
Claims 4, 9 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Hadingham et al. (US 7,373,325 B1, hereafter Hadingham), further in view of Se Hyung Oh, Does Content of Concessions Matter in Negotiation? Match Between Concessions Strategy and Target’s Regulatory Focus (Graduate School of Vanderbilt University, hereafter Oh), further in view of Cao et al. (Automated negotiation for e-commerce decision making: A goal deliberated agent architecture for multi-strategy selection, Decision Support Systems 73 (2015) pgs. 1-14, hereafter Cao) and further in view of Chen et al. (A reinforcement learning optimized negotiation method based on mediator agent, Expert Systems with Applications 41 (2014) pgs. 7630-7640, hereafter “Chen”).
Re. Claims 4, 9 and 14, Hadingham, Oh and Cao teach The method as claimed in claim 1, Hadingham, Oh and Cao do not teach wherein assigning the reward for each behavior model of the intermediate contract proposal comprises:
However, Chen does teach wherein assigning the reward for each behavior model of the intermediate contract proposal comprises:
a maximum reward is assigned to the negotiating agent and the opposition agent, if the generated intermediate contract proposal is optimal (Chen, §2.3, Definition 3; Both agents submit their offers to the mediator agent simultaneously. Only after the negotiation achieves success, it can obtain the corresponding reward value r. Expected utility is the reward value acquired after success. Assuming when the negotiation achieves success, the agreement price is pT, and the agreement quantity is qT, then the reward value of each agent is as follows:

[Equation image: media_image2.png]

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Hadingham, Oh and Cao’s system for negotiations with Chen’s process and equations for Reinforcement learning toward an efficient machine learning method, learning from environment state to action mapping to maximize the cumulative reward obtained from environment, finding the optimal behavioral strategy by trial and error (Chen, Introduction, pg. 7631); and
Chen teaches a minimum reward is assigned to the negotiating agent and the opposition agent, if the generated intermediate contract proposal is not optimal (see analysis for maximum reward above; same process and equation applies).
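For illustration only, the maximum/minimum reward limitations can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Chen's reward equation: the constants `R_MAX` and `R_MIN`, the `assign_rewards` function, and the caller-supplied `is_optimal` predicate are hypothetical.

```python
R_MAX = 1.0  # hypothetical maximum reward
R_MIN = 0.0  # hypothetical minimum reward


def assign_rewards(proposal, is_optimal):
    """Return (negotiating_agent_reward, opposition_agent_reward).

    Both agents receive the maximum reward when the intermediate proposal
    is judged optimal by the supplied predicate, and the minimum otherwise.
    """
    reward = R_MAX if is_optimal(proposal) else R_MIN
    return reward, reward
```

The optimality test is deliberately left to the caller, since the quoted limitations do not define how optimality is determined.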
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE EDWARD DUNNING JR whose telephone number is (469)295-9281.  The examiner can normally be reached on 7:30 - 4:30 CST Mon - Fri.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Florian “Ryan” Zeender can be reached on (571) 272-6790.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/G.E.D./Examiner, Art Unit 3627
/JAN P MINCARELLI/Primary Examiner, Art Unit 3627