DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the response filed on 9/26/2022.
Claim 5 is amended. Claim 6 is added.
The Amendment filed 9/26/2022 has been entered.
Claims 1-6 are currently pending and have been examined. 
This action is made FINAL.

Response to Arguments
The amendment to claim 5 overcomes the objection in the non-final rejection (7/29/2022). Upon further search and consideration, claim 5 is allowable (see reasons for allowance section below).
Applicant’s argument, see pages 7-8, with respect to the interpretation of claim 1 is unpersuasive. The applicant recites “a learning unit configured to generate a learning model for determining a quality of an operation of the industrial robot using the state data (i.e., "at least a state of force and moment applied to a manipulator and a state of a position and a posture in an operation when a force control of the industrial robot is carried out" recited at lines 4-5 of claim 1) and the determination data (i.e., "for the acquired data" recited at lines 9-10 of claim 1).” The examiner maintains the position that the claim does not require the learning unit to use all of the acquired data (force, moment, position, and posture); rather, the claim requires that the learning unit use the state data and the determination data, and the state data is merely described as being created based on the acquired data. The state of force and moment and the state of a position and a posture make up the acquired data, and therefore these ‘states’ are interpreted to be different from the state data created by the preprocessing unit. Accordingly, Muraoka teaches a learning unit configured to generate a learning model for determining a quality of an operation of the industrial robot using the state data and the determination data (see the rejection provided below).
Applicant’s argument, see page 8, with respect to the combination of Muraoka and Rajkumar not disclosing all of the features recited in claim 1 is unpersuasive. The applicant’s argument is based on Rajkumar being silent on "an industrial robot that has a function of detecting force and moment applied to a manipulator,"... and “a state of a position and a posture in an operation when a force control of the industrial robot is carried out.” However, the examiner relies on Muraoka to teach these features and only relies on Rajkumar for the concept of creating the evaluation function by determining a threshold value.  Even though Rajkumar does not teach detecting force and moment applied nor a state of a position and a posture during force control, Rajkumar is relevant because it teaches evaluating robot operation in the context of robot learning. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Muraoka to further include the teachings of Rajkumar to set thresholds for the evaluation function to filter out bad training examples in order to improve the training and performance (“The computing system can also set quality thresholds and quality checks for the training of machine learning models, so that updates to the models improve performance and are based on valid examples. These evaluation and filtering steps can also provide a measure of security by blocking malicious or intentionally incorrect information from propagating between robots and degrading robot performance.” [0162]). Accordingly, the prior art rejections are maintained.
Applicant’s argument, see page 9, with respect to Rajkumar and Muraoka not disclosing the features of newly added claim 6 has been fully considered and is persuasive. However, upon further consideration, a new ground(s) of rejection is made in view of Muraoka, Rajkumar, and Inoue (NPL: Deep Reinforcement Learning for High Precision Assembly Tasks).

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: Data acquisition unit in claims 1-2 and 5; Evaluation function creation unit in claims 1-3 and 5; Determination data creation unit in claims 1 and 5; Preprocessing unit in claims 1 and 5; Learning unit in claims 1 and 5-6.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. The structure of the units is described on page 9: “each functional block shown in FIG. 2 is achieved by the CPU 11 of the determination apparatus 1 shown in FIG. 1 and the processor 101 of the machine learning device 100 that executes respective system programs and controls the operations of the respective units of the determination apparatus 1 and the machine learning device 100.”
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1 is/are rejected under 35 U.S.C. 103 as being unpatentable over Muraoka (US 20190184564 A1) in view of Rajkumar (US 20190197396 A1).

Regarding Claim 1, Muraoka discloses:
A determination apparatus for determining an operation of an industrial robot that has a function of detecting force and moment applied to a manipulator, the determination apparatus comprising (“a robot control method controlling a robot so as to mount a first component supported by a hand of the robot driven by an actuator to a second component. The robot control method including: a reinforcement learning step acquiring a correspondence-relation between a plurality of half-mounted-states of the first component and an optimal action of the robot giving the highest reward for each of the plurality of half-mounted-states by mounting the first component to the second component multiple times by driving the hand; and a mounting step, when mounting the first component to the second component, detecting a half-mounted-state of the first component, identifying an optimal action corresponding to the half-mounted-state detected based on the correspondence-relation acquired in the reinforcement learning step, and controlling the actuator in accordance with the optimal action identified.” See at least [0006]; Examiner Interpretation: Detecting a half mounted state involves detecting force and moment. See at least [0041]):
a data acquisition unit configured to acquire, as acquired data, at least a state of force and moment applied to a manipulator and a state of a position and a posture in an operation when a force control of the industrial robot is carried out (“Each servo motor 13 is provided with an encoder 14 that detects the rotation angle of the servo motor 13. The detected rotation angle is fed back to the controller 2, which then feedback-controls the position and posture of the hand 12 in a three-dimensional space.” See at least [0023]; “the controller 2 receives signals from the encoder 14, as well as from a force detector 15 and an input unit 16.” see at least [0029]; “The force detector 15 detects translational forces Fx, Fy, and Fz in the X-axis, Y-axis, and Z-axis directions and moments Mx, My, and Mz around the X-axis, Y-axis, and Z-axis acting on the hand.” See at least [0030]; Examiner Interpretation: The data acquisition unit comprises the encoder, force detector, and input unit.);
an evaluation function creation unit configured to create an evaluation function that evaluates a quality of the operation based on the acquired data (“The Q-value is updated by the following formula (I) on the basis of a state s.sub.t and an action a.sub.t at time t.” [0034]; “In the formula (I), α is a coefficient (leaning rate) representing the degree to which the Q-value is updated, and γ is a coefficient (discount rate) representing the degree to which the result of an event which may occur from now on is reflected. The coefficients α, γ are properly adjusted and set within 0<α≤1 and 0<γ≤1, respectively, on the basis of experience. Also, r is an index (reward) for evaluating the action at with respect to a change in the state s.sub.t and is set such that the Q-value is increased when the state s.sub.t becomes better.” See at least [0035]; “The learning control unit 23 sets the reward r of the formula (I) in each step in accordance with the reward table in FIG. 6.” See at least [0046]; Examiner Interpretation: The Q-value is the measure of quality of the operation. The formula (I) is the function that evaluates the quality of the operation, and the learning control unit is the creation unit that creates the evaluation function (I) by setting the reward r value in the formula (I).);
a determination data creation unit configured to create determination data for the acquired data using the evaluation function created by the evaluation function creation unit (“The learning control unit 23 selects any action that allows for obtaining a positive reward, from these applicable actions and causes the robot 1 to take the selected action, as well as calculates the Q-value using the formula (I) each time it selects an action.” See at least [0054]; Examiner Interpretation: The calculation of the Q-value is the creation of the determination data based on the formula (I) which was created by the evaluation function creation unit.);
a preprocessing unit configured to create state data for machine learning based on the acquired data (“By using the amount of change ΔFz of the force as a parameter, the state can be identified accurately without being affected by the individual differences between workpieces 100. If the force Fz itself is used as a parameter, the threshold needs to be reset each time the type of workpiece changes. On the other hand, in the present embodiment, the amount of change ΔFz of the force is used as a parameter. Thus, even if the type of workpiece changes, the threshold does not need to be reset, and the state is easily identified. The moment Mx becomes a positive value when a rotation force in the positive Y-direction acts on the hand 12, and it becomes a negative value when a rotation force in the negative Y-direction acts on the hand 12. By determining whether the value of the moment Mx is positive or negative, the direction of misalignment of the workpiece 100 with respect to the axis CL3 can be identified.” [0041]; “The Q-value is set in accordance with the state and action in each of steps ST1 to ST20.” See at least [0056]; Examiner Interpretation: The identification of state (state data) based on forces and moments (acquired data) is the creation of state data based on the acquired data. These states are particular types of press-fitting situations/alignments (see at least [0040-44] and fig. 5).);
and a learning unit configured to generate a learning model for determining a quality of an operation of the industrial robot using the state data and the determination data (“The workpiece mounting operation as reinforcement learning is repeatedly performed until the Q-value converges in each of steps ST1 to ST20.” See at least [0054]; “The Q-values are updated in the reinforcement learning step. When the Q-values converge (the right side of FIG. 11), the converged Q-table is stored in the memory unit 21. The normal control unit 24 in FIG. 1 selects an action having the highest Q-value in each states from among the Q-tables stored in the memory unit 21. For example, when in the state MD1, the action a2 is selected, and when in the state MD2, the action a1 is selected. The normal control unit 24 then controls the servo motor 13 so that the robot 1 performs the selected action.” See at least [0057]; Examiner Interpretation: The generation of the learning model is the performing of reinforcement learning that updates and converges the Q-value to a value higher than an initial value. This then provides an optimal action based on the adjusted operation associated with the converged Q-value. This learning model determines the quality of the operation (converged Q-value) based on state data (state is used in part to determine the Q-value [0056]) and the updating of Q-values (determination data).).
wherein the evaluation function creation unit is further configured to create the evaluation function by determining a weighting coefficient (“In the formula (I), α is a coefficient (leaning rate) representing the degree to which the Q-value is updated, and γ is a coefficient (discount rate) representing the degree to which the result of an event which may occur from now on is reflected. The coefficients α, γ are properly adjusted and set within 0<α≤1 and 0<γ≤1, respectively, on the basis of experience.” [0035]; Examiner Interpretation: The coefficients described are the weighting coefficients that create the evaluation function (I).)
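
For illustration only, the Q-value update and the highest-Q action selection relied upon above can be sketched as follows. This is a minimal sketch, assuming that formula (I) of Muraoka takes the standard tabular Q-learning form; the formula itself is not reproduced in the quoted passages. The function names, the default values of α and γ, and the reward of 1.0 are hypothetical and are not taken from Muraoka; only the state labels MD1 and MD2 and the action labels a1 and a2 come from the quoted text.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step (ASSUMED form of formula (I)).

    alpha is the learning rate (0 < alpha <= 1) and gamma is the
    discount rate (0 < gamma <= 1), as described in [0035].
    """
    # Best Q-value reachable from the next state over all applicable actions.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the current estimate toward reward + discounted future value.
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def best_action(q, state, actions):
    """Select the action with the highest Q-value for the given state."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage with the state/action labels quoted from [0057].
q_table = defaultdict(float)          # all Q-values start at 0.0
actions = ["a1", "a2"]
q_update(q_table, "MD1", "a2", reward=1.0, next_state="MD2", actions=actions)
chosen = best_action(q_table, "MD1", actions)
```

In this sketch, the reward argument plays the role of the index r set from the reward table, and the table q_table corresponds to the Q-table that is stored once the values converge.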

Muraoka does not explicitly teach 
wherein the evaluation function creation unit is further configured to create the evaluation function by determining … a threshold value.  
However, Rajkumar teaches
	“The computing system can also set quality thresholds and quality checks for the training of machine learning models, so that updates to the models improve performance and are based on valid examples.” [0162]
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Muraoka to further include the teachings of Rajkumar to set thresholds for the evaluation function to filter out bad training examples in order to improve the function (“The computing system can also set quality thresholds and quality checks for the training of machine learning models, so that updates to the models improve performance and are based on valid examples. These evaluation and filtering steps can also provide a measure of security by blocking malicious or intentionally incorrect information from propagating between robots and degrading robot performance.” [0162]). Even though Rajkumar is directed to sharing learned information between robots, the evaluation process itself of shared operational information would be obvious to use in a situation where the robot evaluates its own operation. 
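
For illustration only, the use of a quality threshold to keep bad training examples out of the learning step can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the scoring callable, the example records, and the threshold value are all hypothetical, and neither Muraoka nor Rajkumar is asserted to implement the filtering in this way.

```python
def filter_examples(examples, score, threshold):
    """Keep only the examples whose quality score meets the threshold.

    `score` is a hypothetical callable that maps an example to a number;
    examples scoring below `threshold` are excluded from training.
    """
    return [ex for ex in examples if score(ex) >= threshold]


# Hypothetical operation records with an assumed precomputed quality value.
records = [
    {"id": 1, "quality": 0.9},
    {"id": 2, "quality": 0.3},
    {"id": 3, "quality": 0.7},
]
valid = filter_examples(records, score=lambda r: r["quality"], threshold=0.5)
valid_ids = [r["id"] for r in valid]
```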

Claim 2 is rejected under 35 U.S.C. 103 as being obvious over Muraoka (US 20190184564 A1) in view of Rajkumar (US 20190197396 A1) and Nishioka (US 20170315540 A1).
The applied reference, Nishioka, has a common assignee with the instant application. Based upon the earlier publication date of the reference, it constitutes prior art under 35 U.S.C. 102(a)(1).

Regarding claim 2,
Modified Muraoka discloses,
The determination apparatus according to claim 1,
Muraoka does not explicitly disclose:
wherein the data acquisition unit is configured to create temporary determination data in which a quality of an operation is determined according to time taken for the operation, 
and the evaluation function creation unit is configured to create the evaluation function based on the temporary determination data.  
However, Nishioka teaches:
	The data acquisition unit creates temporary determination data in which a quality of an operation is determined according to time taken for the operation (“The input part 22 is, for example, a keyboard for a personal computer connected to the machine controller 13 or a touch panel on a display device. It is understood that the predetermined range of dimensions and the predetermined time may be set by the manufacturing administrative computer 14.” [0052]; “During manufacturing of a part including multiple workpieces, the manufacturing adjustment system 10 thus configured sets a workpiece tolerance for each of the machines 12-1 to 12-n in real time, thereby adjusting the dimensions and manufacturing time of the manufactured part within the respective predetermined ranges.” [0071]; “In step S16 of FIG. 2, the decision part 21 decides whether the total machining time calculated by the total machining-time calculation part 20 falls within the predetermined time. If the calculated total machining time does not fall within the predetermined time as a decision result, the process returns to step S11 where the tolerance setting part 23 resets a workpiece tolerance for each of the machines 12-1 to 12-n. At this point, workpiece tolerances previously set for some of the machines 12-1 to 12-n are increased with the predetermined enlargement ratio. In other words, rough tolerances are set. In this case, it is preferable that the machines with the rough workpiece tolerances out of the machines 12-1 to 12-n be those which are likely to contribute to a reduction in total machining time. For example, the machine for producing a workpiece with a maximum machining time is preferably selected from the machines 12-1 to 12-n. The predetermined enlargement ratio may be equal to, for example, the reduction ratio. If the calculated total machining time falls within the predetermined time as a decision result of step S16, the process advances to step S12. Specifically, a workpiece is produced for each of the machines 12-1 to 12-n without changing the previously set workpiece tolerances.” [0069]; Examiner Interpretation: The data acquisition unit is interpreted as the decision part which receives a predetermined time from a user. This part makes a temporary determination of the quality of the operation by comparing the machining time to the input predetermined time. This determination data is temporary because a new determination is made every time a new work piece is operated on.).
The evaluation function creation unit creates the evaluation function based on the temporary determination data (“the learning unit 27 evaluates part dimensions and total machining times based on predetermined reference dimensions and a predetermined time when workpieces produced according to the workpiece tolerances of the machines 12-1 to 12-n are combined. At this point, the smaller the difference between the part dimension and the predetermined reference dimension, the higher the evaluation scores for the part dimension. The shorter the total machining time of a part relative to the predetermined time, the higher the evaluation score for the total machining time.” [0062]; Examiner Interpretation: An evaluation function is created based on the temporary determination data by using the difference between the actual machining time and the predetermined time.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the modified Muraoka to include the teachings of Nishioka because, when a machine degrades, it is desirable to maintain the timeliness and quality of manufacturing. Nishioka’s method of adjusting machine parameters optimizes machining quality and time in real time (A machine in this application can refer to an industrial robot. [0043]; “Moreover, in the inventions described in Patent Literature 1 and Patent Literature 2, during the manufacturing of the part including the workpieces, the tolerances of the workpieces cannot be set in real time in machines so as to adjust the dimensions and manufacturing time of the part within respective predetermined ranges. Furthermore, a workpiece tolerance for each machine cannot be adjusted in real time in consideration of an actual machine status or an actual workpiece machining status, for example, machining accuracy reduced by wear of a tool.” See at least [0008]).
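
For illustration only, the time-based determination and evaluation described in the passages quoted from Nishioka can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the units (seconds), and the particular scoring expression are hypothetical. The quoted text states only that a shorter total machining time relative to the predetermined time yields a higher evaluation score; it does not give a formula.

```python
def temporary_determination(elapsed_s, limit_s):
    """Temporary good/bad label: does the operation time fall within the limit?"""
    return elapsed_s <= limit_s


def time_score(elapsed_s, limit_s):
    """ASSUMED scoring rule: the shorter the time relative to the limit,
    the higher the score (positive within the limit, negative beyond it)."""
    return (limit_s - elapsed_s) / limit_s


# Hypothetical operation times checked against a predetermined time of 10 s.
label_fast = temporary_determination(8.0, 10.0)
label_slow = temporary_determination(12.0, 10.0)
score_fast = time_score(8.0, 10.0)
score_slow = time_score(12.0, 10.0)
```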

Claim 3 is rejected under 35 U.S.C. 103 as being obvious over Muraoka (US 20190184564 A1) in view of Rajkumar (US 20190197396 A1) and Porter (US 10766137 B1).

Regarding Claim 3,
Modified Muraoka teaches
The determination apparatus according to claim 1, 
Muraoka does not explicitly teach
wherein the evaluation function creation unit is further configured to create temporary determination data for classifying each of a plurality of categories into a plurality of subcategories on a standard set by an operator.  
However, Porter teaches
Classifying into subcategories (“the classifier 234 can be trained using unsupervised training. For example, when provided with a large corpus of training examples of successful (or unsuccessful) task performance, the classifier 234 can use clustering and/or correlation analysis to identify features common to a significant subset of the training examples. The parameters of the classifier 234 can be set to identify such features in new input data to determine whether the new input data matches the paradigm of success (or failure) represented by the training examples.” Col. 10, lines 36-45; “Another advantage of the process 400 is that once the robot is consistently performing at a high level of success, recorded observations of its performance can be used to train the machine learning classifier (or a different machine learning classifier) to identify other examples of task success with greater precision. For example, after using the classifier (or human input) to identify that the recorded observations represent greater than a threshold degree of success at the task, the robotic control system 220 may determine to store these as examples of success in the training data repository.” See at least Col. 15, lines 42-51.; Examiner Interpretation: The original categories are successful and unsuccessful observations. The successful operations are further categorized into levels of success based on being better or worse than a threshold. The threshold is based on training data (determination data).);
Creation of temporary determination data that classifies on a standard set by an operator (“The machine learning classifier 234 is a module configured to learn the parameters that enable it to evaluate the level of success at a particular task that is represented by input data. The training data repository 242 stores training data that can be used to learn these parameters, and as illustrated can include one or both of external training data 241 and data received from observation system 215. External training data 241 can include examples of task performance by a human, computer simulation, or a different robotic system, and preferably is in the same format as the data that is received from the observation system 215. Data received from the observation system 215 may depict one or more robotic systems 210 performing the task. In some embodiments, the external training data 241 can include human-provided labels (e.g., A or B preferences, success level scores, or binary success/failure indications) on data received from the observation system 215.” See at least Col. 10, lines 7-24; Examiner Interpretation: The human provided labels on the training data is the determination data that is used for classifying with a standard set by an operator. The training data used for classifying levels of success is temporary because the training data can change.).
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Muraoka to further include the teachings of Porter so that machine learning can replicate a more specific group of results and therefore improve and/or maintain the robot’s operation quality (“For example, a training system can capture videos of a robot performing a task and provide these videos to the computer model for automated evaluation. The result of this evaluation by the computer model can be used to generate and/or recalibrate the control policy that dictates what sequence of actions the robot will take to perform the task. By using machine-learned computer models to evaluate task performance, the present technology is able to identify the factors that lead to success at a task and then leverage knowledge of those factors to develop robotic control policies without requiring significant human oversight throughout the process.” See at least Col. 1, lines 49-60; “Beneficially, the disclosed techniques can also be leveraged to recalibrate robotic systems by using a trained machine learning classifier to detect when robot task performance becomes less successful, and by using the output of the classifier to recalibrate the reward function and policy of the robotic system.” Col. 3, lines 3-9).

Claim 4 is rejected under 35 U.S.C. 103 as being obvious over Muraoka (US 20190184564 A1) in view of Rajkumar (US 20190197396 A1), Nishioka (US 20170315540 A1), and Porter (US 10766137 B1).

Regarding Claim 4,
Modified Muraoka teaches
The determination apparatus according to claim 2,
Muraoka teaches the weighting coefficient as a parameter of the evaluation function (see at least [0035] cited in claim 1 rejection).

Muraoka does not explicitly teach the concept of
wherein the evaluation function parameters (weighting coefficient and the threshold value) are determined so that respective evaluation values calculated at a time match the temporary determination data included in evaluation function creation data.  
However, Rajkumar teaches the threshold value as a parameter of the evaluation function (see at least [0162] cited in claim 1 rejection).

Rajkumar also does not explicitly teach the concept of
wherein the evaluation function parameters (weighting coefficient and the threshold value) are determined so that respective evaluation values calculated at a time match the temporary determination data included in evaluation function creation data.  
However, Porter teaches
“The machine learning classifier 234 is a module configured to learn the parameters that enable it to evaluate the level of success at a particular task that is represented by input data. The training data repository 242 stores training data that can be used to learn these parameters, and as illustrated can include one or both of external training data 241 and data received from observation system 215. External training data 241 can include examples of task performance by a human, computer simulation, or a different robotic system, and preferably is in the same format as the data that is received from the observation system 215. Data received from the observation system 215 may depict one or more robotic systems 210 performing the task. In some embodiments, the external training data 241 can include human-provided labels (e.g., A or B preferences, success level scores, or binary success/failure indications) on data received from the observation system 215.” See at least Col. 10, lines 7-24; “In some embodiments, a control engineer may input features that influence the success of the task performance, and the reward predictor 236 may learn the weights of these features that fit the output of the classifier 234. …  In use, the reward function 244 can receive recorded task observation data and generate a reward value representing the level of success of the recorded observation.” See at least Col. 11, lines 5-24; “The parameters of the classifier 234 can be set to identify such features in new input data to determine whether the new input data matches the paradigm of success (or failure) represented by the training examples.” Col. 10, lines 41-45; “For example, after using the classifier (or human input) to identify that the recorded observations represent greater than a threshold degree of success at the task, the robotic control system 220 may determine to store these as examples of success in the training data repository.” See at least col. 
15, lines 47-51; Examiner Interpretation: It is interpreted that the success level scores provided by the human to label the external training data are temporary determination data. These scores are used to train (i.e., determine) the parameters (i.e., weighting coefficients and threshold values) of the evaluation function such that the respective evaluation value (or reward value) calculated when the external training data is provided matches the success level score provided by the human for that particular external training data.
	It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of Muraoka, Rajkumar, and Nishioka to further include the teachings of Porter so that the temporary determination data can be used to benefit machine learning in a way to replicate a more specific group of results and therefore improve and/or maintain the robot’s operation quality (“For example, a training system can capture videos of a robot performing a task and provide these videos to the computer model for automated evaluation. The result of this evaluation by the computer model can be used to generate and/or recalibrate the control policy that dictates what sequence of actions the robot will take to perform the task. By using machine-learned computer models to evaluate task performance, the present technology is able to identify the factors that lead to success at a task and then leverage knowledge of those factors to develop robotic control policies without requiring significant human oversight throughout the process.” See at least Col. 1, lines 49-60; “Beneficially, the disclosed techniques can also be leveraged to recalibrate robotic systems by using a trained machine learning classifier to detect when robot task performance becomes less successful, and by using the output of the classifier to recalibrate the reward function and policy of the robotic system.” Col. 3, lines 3-9). Even though Porter’s temporary determination data is not particularly determined according to the particular quality metric of time taken for the operation (claim 2), the concept of applying the temporary determination data to train the parameters of an evaluation function would be applied in the same way no matter the particular quality metric and would yield the same benefits.  
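For illustration only, the interpretation above (human-provided success level scores used to determine the weighting coefficients and a threshold value of an evaluation function) can be sketched as follows. This is a minimal sketch and is not taken from Porter or from the claims: the two features (a normalized peak force and a normalized operation time), the numeric scores, the linear form of the evaluation function, and all function names are assumptions made for the example.

```python
# Hypothetical illustration: feature names, scores, and the linear form of the
# evaluation function are assumptions, not taken from any cited reference.

def evaluate(weights, bias, features):
    """Evaluation value: bias plus the weighted sum of the features."""
    return bias + sum(w * x for w, x in zip(weights, features))

def fit_evaluation_function(samples, scores, lr=0.2, epochs=3000):
    """Fit weights and bias by gradient descent on the mean squared error
    between the evaluation value and the human-provided score."""
    n = len(samples)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, scores):
            err = evaluate(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += 2 * err * x[j] / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

def choose_threshold(weights, bias, samples, labels):
    """Midpoint between the lowest-valued success and highest-valued failure."""
    values = [evaluate(weights, bias, x) for x in samples]
    successes = [v for v, ok in zip(values, labels) if ok]
    failures = [v for v, ok in zip(values, labels) if not ok]
    return (min(successes) + max(failures)) / 2

# Assumed features per recorded operation: (normalized force, normalized time).
samples = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.7), (0.5, 0.5), (0.3, 0.8)]
# Assumed human success level scores (the "temporary determination data").
scores = [0.88, 0.85, 0.25, 0.25, 0.55, 0.58]
# Assumed binary success/failure labels for the same operations.
labels = [True, True, False, False, True, True]

weights, bias = fit_evaluation_function(samples, scores)
threshold = choose_threshold(weights, bias, samples, labels)
print(weights, bias, threshold)
```

The same fitting step would apply unchanged if the score reflected a different quality metric, which is the point made in the rationale above.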

Claim 6 is rejected under 35 U.S.C. 103 as being obvious over Muraoka (US 20190184564 A1) in view of Rajkumar (US 20190197396 A1) and Inoue (NPL: Deep Reinforcement Learning for High Precision Assembly Tasks).

Regarding Claim 6,
Modified Muraoka teaches
The determination apparatus according to claim 1,
Muraoka does not explicitly teach
wherein the learning unit is configured to generate the learning model using, as the state data, (a) the state of force and moment applied to the manipulator and (b) the state of the position and the posture in the operation.
However, Inoue teaches
“To learn the peg-in-hole task, we use a recurrent neural network, namely, Long Short Term Memory (LSTM) trained using reinforcement learning” See at least page 1, I. Introduction, paragraph 5; “In this section, we explain the RL algorithm to learn the peg-in-hole task (Fig. 2). The RL agent observes the current states of the system defined as: 
    [Equation image omitted: definition of the state observed by the RL agent]
where F and M are the average force and moment obtained from the force-torque sensor; the subscript x, y, z denotes the axis. The peg position P is calculated by applying forward kinematics to joint angles measured by the robot encoders.” See at least page 2, III. Reinforcement Learning with Long Short Term Memory, paragraph 1. Also see the discussion of Q-learning and of training a deep recurrent neural network using state data on page 3, section III. Reinforcement Learning with Long Short Term Memory.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the teachings of modified Muraoka to further include the teachings of Inoue to generate a learning model to robustly perform high precision fitting tasks with a robot without a long setup time (See at least page 6, V. Conclusions and Future Work).
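For illustration only, state data of the kind described in the quoted passage (averaged force and moment from a force-torque sensor, plus a position obtained by forward kinematics from joint angles) can be assembled as sketched below. This is a sketch under stated assumptions and is not Inoue's implementation: the planar two-link arm, the sensor readings, the choice and ordering of the components, and all function names are hypothetical.

```python
import math

# Hypothetical illustration: the arm geometry, readings, and the ordering of
# the state components are assumptions, not taken from the cited reference.

def mean_wrench(samples):
    """Average a list of (Fx, Fy, Fz, Mx, My, Mz) force-torque readings."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(6)]

def planar_fk(joint_angles, link_lengths):
    """Forward kinematics of a planar serial arm; returns the tip (x, y)."""
    x = y = 0.0
    angle = 0.0
    for q, length in zip(joint_angles, link_lengths):
        angle += q  # joint angles accumulate along the chain
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y

def build_state(wrench_samples, joint_angles, link_lengths):
    """Concatenate averaged force/moment with the position from kinematics."""
    px, py = planar_fk(joint_angles, link_lengths)
    return mean_wrench(wrench_samples) + [px, py]

state = build_state(
    wrench_samples=[(1.0, 0.0, -4.0, 0.1, 0.0, 0.0),
                    (3.0, 0.0, -6.0, 0.3, 0.0, 0.0)],
    joint_angles=[math.pi / 2, -math.pi / 2],  # assumed encoder readings (rad)
    link_lengths=[0.3, 0.2],                   # assumed link lengths (m)
)
print(state)
```

A learning model that takes such a vector as input uses both the force and moment state and the position state as state data.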

Allowable Subject Matter
Claim 5 is allowed.

Reasons for Allowance
The following is an examiner’s statement of reasons for allowance: The relevant prior art, evaluated separately and in combination, does not disclose the entirety of the limitations of independent claim 5, since classifying the success and failure categories into further subcategories as disclosed by the applicant is not taught. The closest prior art found is Muraoka (US 20190184564 A1), as it discloses creating an evaluation function to evaluate and learn how to optimally perform an operation (see the prior art cited in the rejection of claim 1). However, Muraoka does not teach the subcategorization and does not disclose all of the limitations of claim 5, either on its own or in combination with other relevant art.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
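For illustration only, the timing rule in the preceding paragraph can be sketched as a date calculation. This is a sketch under stated assumptions and not an official calculator: the dates are hypothetical, the function names are invented for the example, and the sketch ignores extensions of time under 37 CFR 1.136(a) as well as any adjustment for periods ending on a weekend or holiday.

```python
import calendar
from datetime import date

# Hypothetical illustration: dates and function names are assumptions.

def add_months(d, months):
    """Same day-of-month `months` later, clamped to the month's last day."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def shortened_period_expiry(mailed, first_reply=None, advisory_mailed=None):
    """Expiry of the shortened statutory period, before any extension of time."""
    three_months = add_months(mailed, 3)
    six_months = add_months(mailed, 6)
    expiry = three_months
    # A first reply within two months, with the advisory action mailed after
    # the three-month period ends, moves the expiry to the advisory mailing date.
    if (first_reply is not None and advisory_mailed is not None
            and first_reply <= add_months(mailed, 2)
            and advisory_mailed > three_months):
        expiry = advisory_mailed
    # In no event later than six months from the mailing date.
    return min(expiry, six_months)

mailed = date(2022, 11, 30)  # assumed mailing date, for the example only
print(shortened_period_expiry(mailed))
print(shortened_period_expiry(mailed, date(2023, 1, 15), date(2023, 3, 10)))
```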

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Karston G Evans whose telephone number is (571)272-8480. The examiner can normally be reached Mon-Fri 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abby Lin, can be reached at (571)270-3976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/K.G.E./Examiner, Art Unit 3664
/ABBY Y LIN/Supervisory Patent Examiner, Art Unit 3664