Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
1.  This action is in response to the application filed 2/21/2019.
2.  Claims 1-20 have been examined and are pending in the application.

Claim Rejections - 35 USC § 112
The following is a quotation of the second paragraph of 35 U.S.C. 112:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention. 


3.  Claims 1-13 are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
The following term lacks antecedent basis:
(i) "the gearbox model" (claim 1, line 2).  Correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4.  Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mallya Kasaragod, U.S. Publication No. 2020/0167437.
As to claim 1, Mallya Kasaragod teaches a method for automated design, the method comprising: 
instantiating a model (…initiate simulation of a robotic device application to train a reinforcement learning model…, paragraph 0074 page 11) having an initial parameter state in a modeling environment (…specifies an initial state of a simulation environment…, paragraph 0067 page 10); 
analyzing and/or characterizing the model in the modeling environment to determine model performance (…the system simulation agent 404 determines the resulting state of the simulation environment in response to performance of the selected action or pairing of initial state and action…, paragraph 0077 pages 11-12); and 
determining whether the model performance satisfies a performance target; wherein upon a determination that the model performance does not satisfy the performance target: a reward is calculated based on the model performance; a reinforcement machine learning agent determines a parameter change action based on the reward and a current parameter state of the model; and an updated parameter state of the model is determined based on the parameter change action (…Based on the simulation environment state achieved through execution of the action, the application may determine, based on the reinforcement function, a reward value. The simulation application may transmit this information to the training application operating in the other software container instance to cause the training application to use this information to update the reinforcement learning model…, paragraph 0021 page 2; …the model training application 412 may evaluate the reinforcement learning model 414 during subsequent iterations to determine whether a termination condition has been met. For instance, if based on the simulation data 416 obtained from the memory buffer, the model training application 412 determines that the reinforcement learning model 414 has converged on an optimal solution (e.g., the average reward value over an N number of iterations is greater than a minimum threshold value, etc.) and a determination is made that the reward value is not going to improve beyond the average reward value, the model training application 412 may transmit a notification to the system simulation agent 404 to indicate completion of the simulation. While average reward values are used extensively throughout the present disclosure for the purpose of illustration, other statistics or metrics involving reward values may be used to define a termination condition (e.g., average change in the reward value over a set of previous simulation iterations is below a threshold value, etc.). Similarly, the model training application 412 may determine that a termination condition has been satisfied based on the number of data points processed from the simulation data 416 collected from the memory buffer or in response to a determination that a time limit for performance of the simulation has elapsed. The model training application 412 and the system simulation agent 404 may provide simulation updates to a client account, which the customer may access to determine the state of the simulation…, paragraph 0079 page 12).
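Mallya Kasaragod does not explicitly teach the model is a gearbox model.  However, Mallya Kasaragod teaches the model is a robotic device model (…a simulation management service receives a request, from a customer, to perform reinforcement learning for a set of robotic devices…, paragraph 0019 pages 1-2).  It would have been obvious at the time the invention was made to a person of ordinary skill in the art to have modified the Mallya Kasaragod reference to include a gearbox model because a gearbox is essentially a part of a robotic device, as well known in the art.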
As to claim 2, Mallya Kasaragod as modified further teaches iteratively performing the following operations until the gearbox model performance satisfies the performance target: analyzing and/or characterizing the gearbox model having the updated parameter state in the modeling environment to determine the gearbox model performance; calculating a new reward based on the gearbox model performance; determining, by the reinforcement machine learning agent, a new parameter change action based on the new reward and the updated parameter state of the gearbox model; and determining a new updated parameter state of the gearbox model based on the new parameter change action (…The training of the reinforcement learning model may further take into account the reward value, as determined via the custom-designed reinforcement function, corresponding to the action performed, the initial state, and the state attained via execution of the action. The training application container may provide the updated reinforcement learning model to a simulation application container to utilize in the simulation of the application and to obtain new state-action-reward data that may be used to continue updating the reinforcement learning model…, paragraph 0040 page 5).
As to claim 3, Mallya Kasaragod as modified further teaches upon a determination that the gearbox model performance satisfies the performance target, outputting the current parameter state of the gearbox model as a final gearbox design (…the model training application 412 may evaluate the reinforcement learning model 414 during subsequent iterations to determine whether a termination condition has been met. For instance, if based on the simulation data 416 obtained from the memory buffer, the model training application 412 determines that the reinforcement learning model 414 has converged on an optimal solution (e.g., the average reward value over an N number of iterations is greater than a minimum threshold value, etc.) and a determination is made that the reward value is not going to improve beyond the average reward value, the model training application 412 may transmit a notification to the system simulation agent 404 to indicate completion of the simulation. While average reward values are used extensively throughout the present disclosure for the purpose of illustration, other statistics or metrics involving reward values may be used to define a termination condition (e.g., average change in the reward value over a set of previous simulation iterations is below a threshold value, etc.). Similarly, the model training application 412 may determine that a termination condition has been satisfied based on the number of data points processed from the simulation data 416 collected from the memory buffer or in response to a determination that a time limit for performance of the simulation has elapsed. The model training application 412 and the system simulation agent 404 may provide simulation updates to a client account, which the customer may access to determine the state of the simulation…, paragraph 0079 page 12).
As to claim 4, Mallya Kasaragod as modified does not explicitly teach the gearbox is a gear reducer.  However, as disclosed above, Mallya Kasaragod teaches the model is a robotic device model (…a simulation management service receives a request, from a customer, to perform reinforcement learning for a set of robotic devices…, paragraph 0019 pages 1-2).  It would have been obvious at the time the invention was made to a person of ordinary skill in the art to have modified the Mallya Kasaragod reference to include a gear reducer because a gear reducer is essentially a part of a robotic device, as well known in the art.
As to claim 5, Mallya Kasaragod as modified further teaches the initial parameter state corresponds to an initial gearbox design provided by a user (…the simulation management service provides, through a graphical user interface (GUI), an editor that the customer may use to define the computer-executable code. Through the GUI, the customer may identify the simulation environment of the robotic devices, as well as other parameters that may be used to define the characteristics of the robotic devices…, paragraph 0019 pages 1-2). 
As to claim 6, Mallya Kasaragod as modified further teaches the performance target is based on engineering requirements provided by a user (The simulation management service may evaluate the provided parameters and the simulation environment to identify the variables in the simulation environment that may affect the system performance (e.g., learning the reinforcement learning model) and may expose these variables to the customer as function parameters in the editor. The customer may utilize any of these variables to build the custom-designed reinforcement function…, paragraph 0019 pages 1-2). 
As to claim 7, Mallya Kasaragod as modified further teaches the parameter change action comprises at least one of enlargements, reductions, material substitutions, or changes to shafts, bearings or gears, or changes to a kinematic layout of the gearbox (…Through the GUI, the customer may identify the simulation environment of the robotic devices, as well as other parameters that may be used to define the characteristics of the robotic devices…, paragraph 0019 pages 1-2).  Note the discussion of claim 1 above for the rationale for including the gearbox as part of the robotic device in the Mallya Kasaragod reference.
As to claim 8, Mallya Kasaragod as modified further teaches the reinforcement machine learning agent determines the parameter change action based upon a value of the reward (…the system simulation agent 404 utilizes a value function to select, from a set of pairings of initial simulation environment states and corresponding actions, a pairing that may be used as input to the simulation application to cause the simulation application to perform the action. During the initial execution of the simulation application, the system simulation agent 404 may select this pairing at random, since the reinforcement learning model 406 has not been updated to provide sufficient guidance for selecting a pairing that would result in a higher reward value in accordance with the reinforcement function defined by the customer. In an embodiment, the system simulation agent 404 can additionally, or alternatively, utilize a policy function to identify an initial state for the simulation, which may be used to select the appropriate action to be performed within the simulation environment. Similar to the value function described above, the system simulation agent 404 may select the action to be performed at random if it is the initial action to be selected based on the initial state of the simulation environment. The action may be selected at random since the reinforcement learning model 406 has not been updated to provide the sufficient…, paragraph 0076 page 11).
As to claim 9, Mallya Kasaragod as modified further teaches the reinforcement machine learning agent determines the parameter change action based on a randomization algorithm (…the system simulation agent 404 may select this pairing at random, since the reinforcement learning model 406 has not been updated to provide sufficient guidance for selecting a pairing that would result in a higher reward value in accordance with the reinforcement function defined by the customer…, paragraph 0076 page 11). 
As to claim 10, Mallya Kasaragod as modified further teaches the initial parameter state defines at least one of a dimension of a gearbox component, a material property, a surface hardness, a tolerance class, a type of gearbox, or a number of shafts, gears or bearings (…Through the GUI, the customer may identify the simulation environment of the robotic devices, as well as other parameters that may be used to define the characteristics of the robotic devices…, paragraph 0019 pages 1-2).  Note the discussion of claim 1 above for the rationale for including the gearbox as part of the robotic device in the Mallya Kasaragod reference.
As to claim 11, Mallya Kasaragod as modified further teaches the reward is calculated based upon a design criteria corresponding to operating efficiency (…Based on the simulation environment state achieved through execution of the action, the application may determine, based on the reinforcement function, a reward value…, paragraph 0021 page 2). 
As to claim 12, Mallya Kasaragod as modified further teaches the model environment comprises a machine element analysis program (…The simulation management service may inject the custom-defined reinforcement function into the application and execute the application in the simulation environment generated within the corresponding software container instance. The application may select an initial simulation environment state and a corresponding action to be performed by the robotic device in the simulation environment…, paragraph 0021 page 2). 
As to claim 13, Mallya Kasaragod as modified further teaches the reinforcement machine learning agent is configured to maximize a cumulative reward or to maximize a current reward (…the model training application 412 may evaluate the reinforcement learning model 414 during subsequent iterations to determine whether a termination condition has been met. For instance, if based on the simulation data 416 obtained from the memory buffer, the model training application 412 determines that the reinforcement learning model 414 has converged on an optimal solution (e.g., the average reward value over an N number of iterations is greater than a minimum threshold value, etc.) and a determination is made that the reward value is not going to improve beyond the average reward value, the model training application 412 may transmit a notification to the system simulation agent 404 to indicate completion of the simulation. While average reward values are used extensively throughout the present disclosure for the purpose of illustration, other statistics or metrics involving reward values may be used to define a termination condition (e.g., average change in the reward value over a set of previous simulation iterations is below a threshold value, etc.). Similarly, the model training application 412 may determine that a termination condition has been satisfied…, paragraph 0079 page 12).
As to claims 14-20, note the discussions of claims 1-2, 6, 5, 12, 7 and 9 above, respectively. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Publication No. 2020/0074241 discloses a reinforcement learning architecture for facilitating reinforcement learning in connection with operation of an external real-time system that includes a plurality of devices operating in a real-world environment.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andy Ho whose telephone number is (571) 272-3762.  A voice mail service is also available for this number.  The examiner can normally be reached on Monday – Friday, 8:30 am – 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Dennis Chow, can be reached at (571) 272-7767.

Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist whose telephone number is 571-272-2100.
Any response to this action should be mailed to:
Commissioner for Patents 
P.O. Box 1450
Alexandria, VA 22313-1450
Or fax to:
AFTER-FINAL faxes must be signed and sent to (571) 273-8300.
OFFICIAL faxes must be signed and sent to (571) 273-8300.
NON-OFFICIAL faxes should not be signed; please send to (571) 273-3762.

/Andy Ho/
Primary Examiner
Art Unit 2194