DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 01/30/2019. This action is in response to the amendments and remarks filed on 09/06/2022. In the current amendments, claims 1, 7, 11, and 13 are amended. Claims 1-18 are pending and have been examined.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7-10 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Wei (“Composite rules selection using reinforcement learning for dynamic job-shop scheduling”) in view of Waschneck et al. (“Deep Reinforcement Learning for Semiconductor Production Scheduling”) and further in view of Wang (“Learning policies for single machine job dispatching”).
Claim 1. 
Wei teaches a method, comprising: generating a trained … model configured to provide instructions to a factory process of a factory, the generating comprising (Page 1088 SECTION VII. Conclusions and Future Research “The agent is able to perform dynamic scheduling based on the available information provided by the scheduling system. It was trained by Q-learning algorithm….application within manufacturing systems” teaches training a model for a manufacturing system (corresponding to the factory)):
obtaining parameters from the factory process to derive a state of the factory process, the parameters comprising slack time (Page 1085 SECTION IV. Constructing New Composite Rules “the critical ratio is obtained by dividing the slack time by the remaining processing time…. at first step, agent selects the job with the smallest CR (critical ratio) value.(i.e. according to CR rule) Then the job is entailed to the machine with the earliest finish time(EFT) to finish its operation on this machine” teaches using a critical ratio, obtained from the slack time, to select the job whose operation is to be finished (corresponding to the factory process));
determining instructions for the factory process based on applying a … model on the factory process trained against lateness and tardiness (Page 1086 Section B. Designing the Reward Function “the goal of the learning agent… In this study, the scheduling goal is to minimize the mean tardiness of finished jobs. Therefore, jobs' EMLT is used to determine the amount of reward or penalty for the agent's decision (composite rules selection)” and Page 1087 SECTION VI. Experimental Results “At the end of training, the Q-learning agent gives better results than the aforementioned alternatives” teaches training a Q-learning agent whose reward is determined by the EMLT, which reflects tardiness and lateness);
providing instructions to the factory process for execution at the factory (Page 1087 SECTION VI. Experimental Results “in our experience of cooperating with real-world manufacturers” teaches a manufacturing process);
obtaining state transitions of the state from updated parameters of the factory process including completion time as received from the factory (Page 1086 Section D. Searching Stop Condition “These are the immediate reward, and the action value of the state to which a transition occurs as a result of that action” and Page 1084 Section A. Assumption for Job-shop Scheduling Problems “C_i is the completed time of job i” and Page 1085 SECTION IV. Constructing New Composite Rules “the machine which may finish its operation with EFT has the highest priority rating for selection” teaches state transitions obtained from the result of an action, including the finishing time);
and calculating a reward for the instructions provided to update the … model, the reward based on deriving lateness and tardiness from predetermined job due and the completion time (Page 1086 Section B. Designing the Reward Function “actions in the same state, the reward function is defined as the following equation: R_t = e − q·EMLT (12) where e and q are constants of positive value for regulating EMLT to the reward received by the agent” and Page 1085 SECTION IV. Constructing New Composite Rules “the machine which may finish its operation with EFT has the highest priority rating for selection” teaches calculating a reward based on the EMLT (corresponding to lateness and tardiness), which is derived from the job due date and the finishing time);
wherein the reward is calculated by … derived lateness and tardiness over the completion time (Section B. Designing the Reward Function, Page 1086, 2nd column, 4th paragraph: “the reward function is defined as the following equation R_t = e − q·EMLT (12)”; Section III. Intelligent Agents and Dynamic Job-Shop Scheduling: “Definition 2: The estimated mean lateness (EMLT) is defined as: [equation image not reproduced]”; and A. State Determination Criterion, Page 1086, 1st column, 2nd paragraph: “This EMLT value was chosen over job tardiness since it is able to distinguish between early jobs and late job (unlike the tardiness measure)”. These passages teach a reward calculated from the EMLT value, where the EMLT value accounts for tardiness, lateness, and P_ij (completion time)).
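For illustration only, the critical-ratio dispatching step relied on above (Wei, Section IV: select the job with the smallest CR, then assign it to the machine with the earliest finish time) can be sketched in Python as below. This is not Wei's code. The `Job` fields, the `dispatch` helper, and the slack-time definition (due date minus current time minus remaining processing time) are assumptions introduced here, and remaining processing time is assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    due_date: float
    remaining_processing_time: float  # assumed > 0


def slack_time(job: Job, now: float) -> float:
    # Assumed definition: time left until the due date minus the work still required.
    return job.due_date - now - job.remaining_processing_time


def critical_ratio(job: Job, now: float) -> float:
    # Wei, Section IV: slack time divided by remaining processing time.
    return slack_time(job, now) / job.remaining_processing_time


def dispatch(jobs, machine_free_at, processing_time, now):
    """Pick the job with the smallest CR, then the machine giving the earliest finish time (EFT)."""
    job = min(jobs, key=lambda j: critical_ratio(j, now))

    def finish_time(machine):
        # The job can start once the machine is free (and not before the current time).
        return max(now, machine_free_at[machine]) + processing_time[(job.name, machine)]

    machine = min(machine_free_at, key=finish_time)
    return job.name, machine, finish_time(machine)


jobs = [Job("J1", due_date=20, remaining_processing_time=5),
        Job("J2", due_date=12, remaining_processing_time=4)]
machine_free_at = {"M1": 3, "M2": 0}
processing_time = {("J1", "M1"): 3, ("J1", "M2"): 3,
                   ("J2", "M1"): 2, ("J2", "M2"): 6}
print(dispatch(jobs, machine_free_at, processing_time, now=0))
```

In this hypothetical data, J2 has the smaller critical ratio (2.0 versus 3.0) and finishes earliest on M1 (time 5 versus 6 on M2), so the sketch returns ("J2", "M1", 5).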
While Wei teaches generating and using a trained model in their method, Wei does not teach that the trained model is a trained deep learning model.  
Waschneck, however, teaches generating a trained deep learning model configured to provide instructions to a factory process of a factory (Page 303 Section B. Supervised and RL in a Factory Simulation “The DQN agents are trained in cycles” and Page 301 SECTION I. Introduction “In this paper we apply deep Reinforcement Learning (RL) to production scheduling in complex job shops utilizing cooperative Deep Q Network (DQN) agents” teaches generating a trained Deep Q Network (corresponding to a deep learning model)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wei by using a deep learning model, as does Waschneck, as the trained model to provide instructions to the factory process. The motivation to do so is that deep models can “optimize production autonomously for different targets” and achieve “promising results” in scheduling problems similar to that of Wei (Waschneck, Abstract & pg. 302, 1st column, 4th paragraph).
While Wei in view of Waschneck teaches a reward function for a trained deep learning model, Wei in view of Waschneck does not teach that the reward is calculated by dividing the derived lateness and tardiness over the completion time.
However, Wang teaches wherein the reward is calculated by dividing the derived lateness and tardiness over the completion time (4.2. Factors for developing the reward function & Pages 557-558 “Factors C, D, and E are concerned with the development of an appropriate reward function….. Factor C defines the number of ranges for determining the amount of reward/penalty…. Similar to factor B, a large value of n for factor D permits distinguishing between jobs that are extremely tardy when the system is under heavy loading condition or employing some dispatching rules like SPT…… Factor E impacts the magnitude of the reward and penalty assigned to each range of the reward function” and Tables 1 and 2 teach a reward calculated from the EMPT (completion time) and the tardiness using division).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wei in view of Waschneck by calculating the reward, based on tardiness, lateness, and completion time, through division by the completion time, as does Wang. The motivation to do so is that a reward function based on tardiness is “used to determine the amount of the reward or penalty for the agent's decision” in scheduling problems similar to that of Wei in view of Waschneck (Wang, 4.2. Factors for developing the reward function & Page 557).
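For illustration only, the claimed limitation of dividing the derived lateness and tardiness over the completion time can be sketched as follows. Lateness and tardiness follow Wei's equations (6) and (5). How the two ratios are combined into a single reward (here summed and negated, so that late jobs are penalized and early jobs rewarded) is an assumption; it is not specified in the claim language quoted above or in the cited passages of Wang.

```python
def lateness(completion_time: float, due_date: float) -> float:
    # Wei, equation (6): L_i = C_i - DD_i (negative when the job is early).
    return completion_time - due_date


def tardiness(completion_time: float, due_date: float) -> float:
    # Wei, equation (5): T_i = max{C_i - DD_i, 0}.
    return max(completion_time - due_date, 0.0)


def ratio_reward(completion_time: float, due_date: float) -> float:
    """Hypothetical reward: lateness and tardiness each divided by the completion time.

    Summing the two ratios and negating the sum is an illustrative assumption.
    """
    if completion_time <= 0:
        raise ValueError("completion time must be positive")
    late_ratio = lateness(completion_time, due_date) / completion_time
    tardy_ratio = tardiness(completion_time, due_date) / completion_time
    return -(late_ratio + tardy_ratio)


print(ratio_reward(10, 8))   # late job: penalized
print(ratio_reward(10, 15))  # early job: rewarded
```

Because lateness is negative for an early job while tardiness is clipped at zero, the sketch keeps the distinction between early and late jobs that Wei cites as the reason for preferring EMLT over tardiness alone.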
Claim 2. 
Wei in view of Waschneck further in view of Wang teaches the method of claim 1,
Wei further teaches wherein the determining instructions for the factory process, providing instructions to the factory process for execution at the factory, obtaining state transitions from updated parameters of the factory process as received from the factory (Page 1086 Section D. Searching Stop Condition “These are the immediate reward, and the action value of the state to which a transition occurs as a result of that action” and Page 1084 Section A. Assumption for Job-shop Scheduling Problems “C_i is the completed time of job i” and Page 1085 SECTION IV. Constructing New Composite Rules “the machine which may finish its operation with EFT has the highest priority rating for selection” teaches state transitions obtained from the result of an action, including the finishing time),
and calculating a reward for the instructions provided to update the … model are conducted in order and repeated until the … model converges as the trained … model (Page 1086 Section B. Designing the Reward Function “actions in the same state, the reward function is defined as the following equation: R_t = e − q·EMLT (12)” and Page 1087 Section VI. Experimental Results “the robustness and convergence of RL. At the end of training, the Q-learning agent gives better results than the aforementioned alternatives” teaches calculating a reward and the model converging as it is trained).
The Wei/Waschneck/Wang combination has already demonstrated that the trained model is a trained deep learning model.
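For illustration only, the ordered and repeated sequence recited in claim 2 (provide an instruction, observe the state transition, calculate the reward, update the model, and stop once the model converges) can be sketched with a tabular Q-learning update applied in repeated sweeps over every state-instruction pair. The toy single-machine environment `factory_step`, its rewards, and the convergence tolerance are hypothetical, and a lookup table stands in for the deep learning model.

```python
STATES = range(4)   # hypothetical state: number of jobs waiting at one machine
ACTIONS = (0, 1)    # hypothetical instructions: 0 = dispatch next job, 1 = leave machine idle


def factory_step(state, action):
    """Hypothetical stand-in for the factory: returns (next_state, reward)."""
    if state == 0:
        return 0, 0.0           # queue empty: absorbing state, no further reward
    if action == 0:
        return state - 1, 1.0   # a job completes
    return state, -1.0          # idling delays every waiting job


def train_until_converged(gamma=0.9, alpha=0.5, tol=1e-6, max_sweeps=10_000):
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for s in STATES:
            for a in ACTIONS:
                # Provide instruction a in state s; observe the transition and reward.
                s_next, reward = factory_step(s, a)
                # Q-learning target: reward plus discounted best value of the next state.
                target = reward + gamma * max(q[(s_next, b)] for b in ACTIONS)
                updated = q[(s, a)] + alpha * (target - q[(s, a)])
                max_change = max(max_change, abs(updated - q[(s, a)]))
                q[(s, a)] = updated
        # Converged once a full sweep changes no value by more than tol.
        if max_change < tol:
            return q, sweep
    raise RuntimeError("Q-values did not converge within max_sweeps")


def greedy_instruction(q, state):
    """Instruction the converged model would provide in a given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q, sweeps = train_until_converged()
print(sweeps, [greedy_instruction(q, s) for s in range(1, 4)])
```

Sweeping every pair in a fixed order keeps the sketch deterministic; an agent interacting with a real process would instead choose instructions with an exploration policy.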
Claim 3. 
Wei in view of Waschneck further in view of Wang teaches the method of claim 1,
Wei further teaches wherein the state of the factory process comprises a machine state and a job queue state (Page 1084 Section A. Assumption for Job-shop Scheduling Problems “We assume that there are 9 different and nonparallel machines and 15 jobs within the simulation environment” teaches a state comprising machine and job (corresponding to job queue) states),
the machine state indicative of whether a machine associated with the factory process is allocated or idle at each predetermined timeslot (Page 1084 Section A. Assumption for Job-shop Scheduling Problems “We assume that there are 9 different and nonparallel machines… The processing times and inter-arrival times are defined” teaches a machine state associated with defined processing times),
the job queue state indicative of characteristics of jobs waiting to be processed in job slots or in backlog (Page 1084 Section A. Assumption for Job-shop Scheduling Problems “The jobs arrive at machines, wait in the queue until the machine is available” teaches jobs waiting to be processed).
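For illustration only, the machine state and job queue state of claim 3 can be pictured as below: an occupancy grid with one row per machine and one column per future timeslot (1 = allocated, 0 = idle), plus a fixed number of visible job slots, with any remaining jobs counted as backlog. The grid layout, the slot count, and all names are assumptions and are not taken from Wei.

```python
from dataclasses import dataclass


@dataclass
class WaitingJob:
    processing_time: int
    slack_time: float


def machine_state(allocations, num_machines, horizon):
    """Occupancy grid: rows are machines, columns are timeslots; 1 = allocated, 0 = idle.

    allocations: (machine_index, start_slot, end_slot) tuples, end slot exclusive.
    """
    grid = [[0] * horizon for _ in range(num_machines)]
    for machine, start, end in allocations:
        for t in range(max(start, 0), min(end, horizon)):
            grid[machine][t] = 1
    return grid


def job_queue_state(waiting, num_slots):
    """Characteristics of the first num_slots jobs, plus a count of jobs left in the backlog."""
    visible = waiting[:num_slots]
    slots = [(job.processing_time, job.slack_time) for job in visible]
    slots += [(0, 0.0)] * (num_slots - len(visible))  # pad empty job slots
    backlog = len(waiting) - len(visible)
    return slots, backlog


print(machine_state([(0, 0, 2), (1, 1, 3)], num_machines=2, horizon=4))
print(job_queue_state([WaitingJob(3, 5.0), WaitingJob(2, 1.0), WaitingJob(4, 0.5)], num_slots=2))
```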
Claim 4. 
Wei in view of Waschneck further in view of Wang teaches the method of claim 3,
Wei further teaches wherein the job queue state comprises processing time and the slack time (Page 1085 SECTION IV. Constructing New Composite Rules “critical ratio is obtained by dividing the slack time by the remaining processing time.. CR then gives the highest priority to the job” teaches obtaining a value for each job from the slack time and the processing time),
wherein the method further comprises determining the lateness and tardiness based on a function of the job due time and completion time (Page 1084 Section A. Assumption for Job-shop Scheduling Problems “The tardiness of job i is: T_i = max{C_i − DD_i, 0} (5) The lateness of job i is defined as: L_i = C_i − DD_i (6) where C_i is the completed time of job i, DD_i represents the due date of i job” teaches lateness and tardiness based on the completion time and the job due date).
Waschneck further teaches wherein the state of the factory process is represented in a 2-D matrix involving the machine state and the job queue state (Page 303 SECTION A. Production Scheduling as Markov Decision Process “The state space S = S_machines × S_jobs is a combination of machine states s_machine = {s_1, …, s_{M_w}} ∈ S_machines for M_w machines at a workcenter w and the state of surrounding jobs s_jobs = ⟨s_1, …, s_j⟩ ∈ S_jobs for j jobs….Machine capabilities are encoded in the dedication matrix d” teaches a matrix representation involving the job state and the machine state),
wherein the job queue state comprises processing time and the slack time represented in arrays (Page 303 SECTION A. Production Scheduling as Markov Decision Process “The state of the system s_t ∈ S at time t, where S is the space of all possible states…The state space S = S_machines × S_jobs is a combination of machine states s_machine = {s_1, …, s_{M_w}} ∈ S_machines for M_w machines at a workcenter w and the state of surrounding jobs s_jobs = ⟨s_1, …, s_j⟩ ∈ S_jobs for j jobs….Machine capabilities are encoded in the dedication matrix d” teaches the state represented as a matrix (corresponding to arrays)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wei by representing the machine state and the job queue state as a matrix, as does Waschneck. The motivation to do so is that, in matrix form, “Machine capabilities are encoded in the dedication matrix d (see Section II)” and “The machine state s_x reduces to a one-hot vector s_x ∈ {0,1}^ST for ST setup types” in scheduling problems similar to that of Wei (Waschneck, Page 303 SECTION A. Production Scheduling as Markov Decision Process).
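For illustration only, one way to combine the machine state with job-queue arrays of processing time and slack time into a single 2-D matrix, as claim 4 recites, is sketched below. The row layout, the zero padding, and the slack-time definition (due date minus current time minus processing time) are assumptions; the sketch does not reproduce Waschneck's state encoding.

```python
def slack_time(due_date, now, processing_time):
    # Assumed definition: time left until the due date minus the work still required.
    return due_date - now - processing_time


def state_matrix(machine_rows, jobs, now, num_slots):
    """Stack machine-occupancy rows with processing-time and slack-time arrays into one 2-D matrix.

    machine_rows: list of 0/1 rows (one per machine, one column per timeslot)
    jobs: (processing_time, due_date) tuples for jobs waiting in the queue
    """
    visible = jobs[:num_slots]
    padding = [0] * (num_slots - len(visible))
    processing = [p for p, _ in visible] + padding
    slack = [slack_time(d, now, p) for p, d in visible] + padding
    width = max([len(row) for row in machine_rows] + [num_slots])

    def pad(row):
        # Zero-pad every row to a common width so the result is rectangular.
        return list(row) + [0] * (width - len(row))

    return [pad(row) for row in machine_rows] + [pad(processing), pad(slack)]


matrix = state_matrix([[1, 1, 0, 0], [0, 1, 1, 0]], jobs=[(3, 10), (2, 4)], now=1, num_slots=3)
for row in matrix:
    print(row)
```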
Claims 7-10.
Claims 7-10 recite a non-transitory computer readable medium storing instructions for executing a process, the instructions performing precisely the method of claims 1-4. Because Wei performs their method on a computer (Wei, Page 1087 Section VI. Experimental Results), in which a non-transitory computer readable medium is inherent, claims 7-10 are rejected for the reasons set forth in the rejections of claims 1-4, respectively.
Claims 13-16.
Claims 13-16 recite an apparatus configured to control a factory process of a factory through instructions, the apparatus comprising: a memory configured to store a trained deep learning model; and a processor configured to perform precisely the method of claims 1-4. Because Wei performs their method on a computer (Wei, Page 1087 Section VI. Experimental Results and Page 1086 SECTION V. Q-Learning Application to Composite Rule Selection), in which a memory and a processor are inherent, claims 13-16 are rejected for the reasons set forth in the rejections of claims 1-4, respectively.
Claims 5-6, 11-12 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wei in view of Waschneck further in view of Wang and further in view of Taylor (“Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning”).
Claim 5. 
Wei in view of Waschneck further in view of Wang teaches the method of claim 1, further comprising:
Wei further teaches applying the … model to provide instructions to the … factory (Page 1088 SECTION VII. Conclusions and Future Research “The agent is able to perform dynamic scheduling based on the available information provided by the scheduling system. It was trained by Q-learning algorithm….application within manufacturing systems”).
Waschneck further teaches generating source state trajectories based on optimal policy of the factory determined from the trained deep learning model (“The neural network predicts the action at based on the state st…the agent trains the neural network in such a way that it predicts the cumulative, weighted rewards for all actions… The optimal action-value function Q∗(s, a) is approximated during the Q-learning algorithm” teaches training a deep learning model that predicts actions from states based on an optimal policy).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wei by using a deep learning model, as does Waschneck, as the trained model to provide instructions to the factory process. The motivation to do so is that deep models can “optimize production autonomously for different targets” and achieve “promising results” in scheduling problems similar to that of Wei (Waschneck, Abstract & pg. 302, 1st column, 4th paragraph).
While Wei in view of Waschneck further in view of Wang teaches a deep learning model in their method, Wei in view of Waschneck further in view of Wang does not teach transferring the deep learning model.
However, Taylor teaches transferring the … model from the factory to another factory, the transferring comprising (Page 158 Section 5.1 Hand-Coded Keepaway Mappings “we can intuitively define these mappings between states and actions to transfer knowledge between the two tasks” teaches transferring the model between two tasks (corresponding to the factory and the another factory)):
computing a projection function between the factory and the another factory (Page 157 Section 2.2 Incompletely Defined Inter-Task Mappings “we can still use TVITM-PS with a modified transfer functional, ρI, generated using the same steps described in Algorithm 1” teaches computing a function between tasks (corresponding to the factories));
converting the source state trajectories into target state trajectories (Page 159 Section 5.1 Hand-Coded Keepaway Mappings “We map the novel target action, ‘Pass to third closest keeper,’ to ‘Pass to second closest keeper’ in the source task” teaches mapping (corresponding to converting) between the source task and the target task);
executing a recovery process to determine the … model for the another factory from the target state trajectories (Page 159 Section 5.3 Learned Keepaway Inter-Task Mappings “we consider the Keepaway state at the time an action is executed and the state of the world, as perceived from that same keeper, after the action has successfully finished (i.e. the next time a keeper can select a macro-action)” teaches executing a process on target-task states to determine the model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wei in view of Waschneck further in view of Wang by transferring the deep learning model from the factory to another factory, as does Taylor. The motivation to do so is that transfer learning can “significantly speed up learning in pairs of related RL tasks” in scheduling problems similar to that of Wei in view of Waschneck further in view of Wang (Taylor, Conclusion & pg. 163, 1st column, 5th paragraph).
The Wei/Waschneck combination has already demonstrated that the trained model is a trained deep learning model.
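For illustration only, converting source state trajectories into target state trajectories through an inter-task state mapping, in the spirit of Taylor's hand-coded Keepaway mappings, can be sketched as below. Treating the projection function as a per-variable mapping from each target state variable to a source state variable is an assumption, and the variable names (`queue_M1` and so on) are hypothetical. Mapping the new machine M3 onto M2 loosely mirrors Taylor's mapping of “Pass to third closest keeper” onto “Pass to second closest keeper”.

```python
def project_state(source_state, chi_x):
    """Build a target-factory state: each target variable takes the value of its mapped source variable."""
    return {target_var: source_state[source_var] for target_var, source_var in chi_x.items()}


def convert_trajectory(source_trajectory, chi_x):
    """Convert a source state trajectory into a target state trajectory, one state at a time."""
    return [project_state(state, chi_x) for state in source_trajectory]


# Hypothetical mapping: the target factory adds machine M3, which is treated like source machine M2.
chi_x = {"queue_M1": "queue_M1", "queue_M2": "queue_M2", "queue_M3": "queue_M2"}
source_trajectory = [{"queue_M1": 3, "queue_M2": 1},
                     {"queue_M1": 2, "queue_M2": 0}]
print(convert_trajectory(source_trajectory, chi_x))
```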
Claim 6. 
Wei in view of Waschneck further in view of Wang and further in view of Taylor teaches the method of claim 5,
Taylor further teaches the recovery process comprising: computing action trajectories for the another factory (Page 159 Section 5.3 Learned Keepaway Inter-Task Mappings “we consider the Keepaway state at the time an action is executed and the state of the world, as perceived from that same keeper, after the action has successfully finished (i.e. the next time a keeper can select a macro-action)” teaches determining the actions executed in the target task),
and training the … model through use of the target state trajectories as an input and the action trajectories as output class labels (Page 157 Section 2.1 Constructing a Transfer Functional “Given χH,X, χH,A, and a trained network πsource, our goal is to create a new network πtarget that can function in the target task.….the target policy has more or fewer inputs (state variables) or outputs (actions)” teaches training a model with state variables as inputs and actions as outputs).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wei in view of Waschneck further in view of Wang by transferring the deep learning model from the factory to another factory, as does Taylor. The motivation to do so is that transfer learning can “significantly speed up learning in pairs of related RL tasks” in scheduling problems similar to that of Wei in view of Waschneck further in view of Wang (Taylor, Conclusion & pg. 163, 1st column, 5th paragraph).
The Wei/Waschneck/Wang combination has already demonstrated that the trained model is a trained deep learning model.
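For illustration only, the supervised step of claim 6 (target state trajectories as inputs, action trajectories as output class labels) can be sketched as below. A nearest-centroid classifier is swapped in as a simple stand-in for the deep learning model; it is not Taylor's method, and the state vectors and action labels are hypothetical.

```python
from collections import defaultdict
from math import dist


def train_nearest_centroid(states, actions):
    """Fit one centroid per action label from (target state, action label) training pairs."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(states, actions):
        total = sums.setdefault(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(total, x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict_action(centroids, state):
    """Return the action label whose centroid is closest (Euclidean distance) to the state."""
    return min(centroids, key=lambda label: dist(centroids[label], state))


# Hypothetical target state trajectory (queue lengths at M1, M2, M3) and the actions taken.
target_states = [[3, 1, 1], [2, 0, 0], [0, 4, 4], [1, 3, 3]]
action_labels = ["dispatch_M1", "dispatch_M1", "dispatch_M2", "dispatch_M2"]
centroids = train_nearest_centroid(target_states, action_labels)
print(predict_action(centroids, [3, 0, 0]), predict_action(centroids, [0, 5, 5]))
```

A deep network trained with a cross-entropy loss on the same (state, label) pairs would play the role the centroids play here.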
Claims 11-12.
Claims 11-12 recite a non-transitory computer readable medium storing instructions for executing a process, the instructions performing precisely the method of claims 5-6. Because Wei performs their method on a computer (Wei, Page 1087 Section VI. Experimental Results), in which a non-transitory computer readable medium is inherent, claims 11-12 are rejected for the reasons set forth in the rejections of claims 5-6, respectively.
Claims 17-18.
Claims 17-18 recite an apparatus configured to control a factory process of a factory through instructions, the apparatus comprising: a memory configured to store a trained deep learning model; and a processor configured to perform precisely the method of claims 5-6. Because Wei performs their method on a computer (Wei, Page 1087 Section VI. Experimental Results and Page 1086 SECTION V. Q-Learning Application to Composite Rule Selection), in which a memory and a processor are inherent, claims 17-18 are rejected for the reasons set forth in the rejections of claims 5-6, respectively.

Response to Arguments
Applicant's arguments filed on 09/06/2022 with respect to the rejections under 35 U.S.C. §§ 102 and 103 have been fully considered but they are not persuasive.
Regarding claim 1, Applicant asserts “Wei fails to disclose, "wherein the reward is calculated by dividing the derived lateness and tardiness over the completion time" as recited in amended independent claim 1. Thus, Wei in view of Waschneck fails to teach, disclose, or suggest, at least, the above-cited features of amended independent claim 1. As discussed above, amended independent claim 1 is patentably distinguishable over Wei in view of Waschneck” (Remarks Pg. 9).
Examiner response: Applicant's arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. A newly cited prior art reference, Wang et al. (“Learning policies for single machine job dispatching”), has been applied to teach the limitations referred to in this argument.
              Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOKESHA G PATEL whose telephone number is (571)272-6267. The examiner can normally be reached Monday-Friday 8am-5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Afshar, Kamran can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LOKESHA G PATEL/Examiner, Art Unit 2125                                                   
/BRIAN M SMITH/Primary Examiner, Art Unit 2122