DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

2.	The following is a NON-FINAL Office Action upon examination of application number 17/006,810 in response to Applicant’s Request for Continued Examination (RCE) filed on April 21, 2022.

3.	In accordance with Applicant’s amendment, claims 1, 7, and 8 have been amended, and claims 5-6 and 12-13 have been canceled. Claims 1, 4, 7-8, 11, and 14 are currently pending.

Priority

4.	Application 17/006,810, filed 08/29/2020, claims priority from Provisional Application 62/896,059, filed 09/05/2019. Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119 and/or 35 U.S.C. 120 is acknowledged.

Response to Amendment

5.	In the response filed April 21, 2022, Applicant amended claims 1, 7, and 8 and canceled claims 5-6 and 12-13. No new claims were presented for examination.

6.	Applicant's amendments to claim 7 are hereby acknowledged. The amendments are sufficient to overcome the previously issued claim objection; accordingly, this objection has been withdrawn.

7.	Applicant's amendments to the claims are hereby acknowledged. The amendments are not sufficient to overcome the previously issued claim rejections under 35 U.S.C. 101; accordingly, these rejections have been maintained.

Response to Arguments

8.	Applicant's arguments filed April 21, 2022, have been fully considered.

9.	Applicant submits “the limitations reciting in the current claim 1 fails to fall under the “Mental Processes”. In fact, the steps of cleaning productions data, pre-processing and calculating the productions data to obtain extraction data, establishing a reinforcement learning model and inputting the extraction data to the reinforcement learning mode, establishing different simulating environments according to the score function and the extraction data, producing multiple scheduling decisions for each of the different simulating environments cannot be performed via human observation or evaluation. Especially, a reinforcement learning model is established, and different simulating environments are established based on the score function and the extraction data, multiple scheduling decisions are then produced, and finally an optimal scheduling decision is judged,…, which should not be considered as mental processes that can be performed via human evaluation or judgment.” [Applicant’s Remarks, 04/21/2022, page 7]

With particular respect to the §101 rejection of claim 1, Applicant argues with respect to Step 2A, Prong One of the eligibility inquiry that “the steps of cleaning productions data, pre-processing and calculating the productions data to obtain extraction data, establishing a reinforcement learning model and inputting the extraction data to the reinforcement learning mode, establishing different simulating environments according to the score function and the extraction data, producing multiple scheduling decisions for each of the different simulating environments cannot be performed via human observation or evaluation.” The Examiner respectfully disagrees. In the “2019 Revised Patent Subject Matter Eligibility Guidance” (published on 01/07/2019 in Fed. Register, Vol. 84, No. 4 at pgs. 50-57), the USPTO provided instructions for evaluating claims under Step 2A by setting forth three groupings of abstract ideas: Mathematical Concepts, Mental Processes, and Certain Methods of Organizing Human Activities. In this instance, claim 1 has been found to recite an abstract idea that falls under the “Mental Processes” grouping set forth in the 2019 PEG because it recites steps that can be accomplished mentally, such as via human observation and evaluation, perhaps with the aid of pen and paper.
The §101 rejection found the limitations in claim 1 to be directed to an abstract idea that falls under the “Mental Processes” grouping, based on the limitations reciting cleaning production data; pre-processing and calculating the production data to obtain extraction data; establishing a model and inputting the extraction data to the model to produce an optimal scheduling decision according to a score function and the extraction data; establishing different simulating environments according to the score function and the extraction data; producing multiple scheduling decisions for each of the different simulating environments; and judging the optimal scheduling decision for each simulating environment by virtue of a reward mechanism. These limitations recite an abstract idea that falls into the “Mental processes — concepts performed in the human mind (including an observation, evaluation, judgment, opinion)” group within the enumerated groupings of abstract ideas set forth in the 2019 PEG. As claimed, the steps can be practically performed mentally by a human evaluating information. Cleaning, pre-processing, and calculating the production data to obtain extraction data; establishing a model and inputting the extraction data to the model to produce an optimal scheduling decision; producing multiple scheduling decisions for each of the different simulating environments; and judging the optimal scheduling decision for each simulating environment encompass evaluation steps that can be accomplished mentally, such as via human observation and judgment, perhaps with the aid of pen and paper. For instance, the step of cleaning production data can be performed via human observation, perhaps by documenting incorrect or duplicate data with the aid of pen and paper. Because the “cleaning” step can be carried out manually, such as via human evaluation or judgment, it is a mental step.
Similarly, the Examiner points out that a human user can make a judgment regarding an optimal scheduling decision in his or her mind. The above-noted steps describe data gathering, observation, and decision making. In particular, data is collected, analyzed, and evaluated to judge the optimal scheduling decision for each simulating environment, which together amount to a combination of “observation, evaluation, judgment, opinion.” 2019 Revised Guidance, 84 Fed. Reg. at 52. Thus, the steps recite the abstract concept of “mental processes.” Therefore, Applicant’s arguments under Step 2A, Prong One are not persuasive because the claims have been shown to set forth or describe activities falling under the “Mental Processes” abstract idea grouping set forth in the 2019 PEG. The Office maintains that the claims recite an abstract idea.
Applicant suggests that the details of the reinforcement learning model could not be performed in the human mind [Remarks at page 7]. The Examiner points out that the underlying analysis of the claims may be performed in the human mind and/or with the aid of pen and paper. Furthermore, other claim limitations directed to gathering information and making decisions may likewise be performed in the human mind and/or with the aid of pen and paper. The claims also incorporate mathematical concepts, as explained in the rejection. The details of the reinforcement learning model are addressed more explicitly under Step 2A, Prong Two, and Step 2B of the subject matter eligibility analysis. For the reasons detailed above, this argument is found unpersuasive.

10.	Applicant submits “in the current claim 1 and claim 8, the system and the method as a whole provide improvement to scheduling decision field. An optimal scheduling decision can be automatically produced by the system and the method, without manual interference to the ERP or MES system, so as to improve the efficiency and reduce labor costs. Therefore, the subject matter encompassed by the current independent claims amounts to practical application and significantly more than an abstract idea itself.” [Applicant’s Remarks, 04/21/2022, pages 7-8]

The Examiner respectfully disagrees. Under Step 2A, Prong Two of the eligibility inquiry, Applicant argues that “the subject matter encompassed by the current independent claims amounts to practical application.” The additional elements in exemplary claim 1 are directed to: a scheduling calculation host, multiple databases, a user terminal, and a reinforcement learning model, which merely serve to tie the abstract idea to a particular technological environment (a computer-based operating environment) via generic computing hardware and software/instructions. This is not sufficient to amount to a practical application, as noted in the 2019 PEG. Furthermore, Applicant’s claims are devoid of any discernible change, transformation, or improvement to a computer (software or hardware) or to any existing technology. Applicant has not shown that any specific technological improvement is achieved within the scope of the claims. It bears emphasis that none of the scheduling calculation host, the multiple databases, the user terminal, or any other technological element is modified or improved upon in any discernible manner. Instead, the result produced by the claims is simply information indicating multiple scheduling decisions, which is not a technical result or an improvement thereof.
Nevertheless, even assuming arguendo that an improvement was achieved, improving the process of producing scheduling decisions would, at most, provide an improvement to a business process (as supported by Applicant’s Specification at paragraph 0027: “As a result, the scheduling process of production management is simplified to assist users in improving production efficiency and reducing production costs.”) using generic computing elements. Any incidental improvement achieved by automating the claim steps would come from the capabilities of a general-purpose computer rather than from the sequence of steps recited in the method itself, which does not materially alter the patent eligibility of the claim. See Bancorp Servs., L.L.C. v. Sun Life Assurance Co. of Can. (U.S.), 687 F.3d 1266, 1278 (Fed. Cir. 2012) (“[T]he fact that the required calculations could be performed more efficiently via a computer does not materially alter the patent eligibility of the claimed subject matter.”) (cited in the Federal Circuit's FairWarning decision).
Next, it is noted that there is nothing particular about the computing elements (i.e., the scheduling calculation host, multiple databases, and user terminal), nor is there anything in the claims or Specification showing these elements as being modified or improved upon in any manner whatsoever. Instead, these generic computing elements are the equivalent of simply adding the words “apply it” to the abstract idea, which is not sufficient to amount to a practical application. See MPEP 2106.05(f)/(h). As explained above, the claims do nothing to modify, reconfigure, manipulate, or transform the computer, computer software, or any technology in any discernible manner, much less yield an improvement thereto. While Applicant submits that “the system and the method as a whole provide improvement to scheduling decision field. An optimal scheduling decision can be automatically produced by the system and the method, without manual interference to the ERP or MES system, so as to improve the efficiency and reduce labor costs,” [Remarks, at pages 7-8], Applicant has provided no showing that implementing the claim steps amounts to a technical improvement. It is noted that, even if claimed, reducing labor costs would not, on its face, necessarily improve the ability of the computer to function. It is not clear how the claimed limitations would accomplish a result that realizes an improvement in computer functionality.
Lastly, the additional elements, including the scheduling calculation host, multiple databases, a user terminal, and a reinforcement learning model, fail to integrate the abstract idea into a practical application because they fail to provide an improvement to the functioning of a computer or to any other technology or technical field, fail to apply the exception with a particular machine, fail to apply the judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition, fail to effect a transformation of a particular article to a different state or thing, and fail to apply/use the abstract idea in a meaningful way beyond generally linking the use of the judicial exception to a particular technological environment. For the above reasons, this argument is found unpersuasive.
In response to Applicant’s argument that the subject matter encompassed by the current independent claims amounts to “significantly more than an abstract idea itself,” the Examiner respectfully disagrees. It is noted that, to satisfy the “significantly more” inquiry, Applicant would need to show that the limitations (1) improve another technology, (2) improve the functioning of the computer itself, (3) apply the judicial exception with, or by use of, a particular machine, (4) effect a transformation or reduction of a particular article to a different state or thing, (5) add a specific limitation other than what is well-understood, routine, and conventional in the field, or add unconventional steps that confine the claim to a particular useful application, or (6) add other meaningful limitations beyond generally linking the use of the judicial exception to a particular technological environment.
In this case, the claims do not include limitations that meet any of the criteria listed above. Further, the Examiner points out that there is no actual improvement to another technology or technical field, no improvement to the functioning of the computer itself, and no meaningful limitation beyond generally linking the use of the abstract idea to a particular technological environment evident in the claims. The steps recited in the claims could be programmed to be performed on a variety of different computer platforms. While the claim limitations are implemented by a computer, the computer is nothing more than a general-purpose computer (as evidenced by Applicant’s Specification at paragraph [0036]: “The user terminal 230 may include a user interface, such as a display or a tablet computer, connected to the scheduling calculation host 210. The scheduling calculation host 210 is configured to perform a series of processing to the production data from the database 220, and then generate an optimal scheduling decision to the user interface for the user's reference…”). The Specification does not indicate that all of the procedures are implemented on a special-purpose computer. The claimed invention is directed, at best, to an improvement in the business process of producing multiple scheduling decisions. The problem of low efficiency and high labor costs that Applicant’s claimed invention purports to address is not a technical problem; it is a business problem.
Furthermore, it is not clear how the claimed limitations would accomplish a result that realizes an improvement in computer functionality. The Examiner emphasizes that nowhere in Applicant’s Specification is there any discussion or suggestion that the problem or solution is a technical one, nor is there even a hint of any contemplated improvement to technology. Although Applicant’s claims involve a reinforcement learning model, the model is recited at a high level of generality and has not been shown to yield a technical improvement. While Applicant submits that “the system and the method as a whole provide improvement to scheduling decision field,” Applicant has not adequately explained how the additional elements of the claim add any meaningful limits on the abstract idea. At most, the claimed invention appears to provide an improvement beneficial to end users. The focus of the claims is not on an improvement in computers as tools, but on certain independently abstract ideas that use computers as tools. The claims do not, for example, purport to improve the functioning of any of the computer components, nor do they effect an improvement in any other technology or technical field. Accordingly, this argument is found unpersuasive.

For the reasons above, Applicant’s arguments concerning the §101 rejection are not persuasive at showing Applicant’s claims as eligible.

11.	Applicant submits “Chan fails to mention pre-processing and calculating the production data to obtain an extraction data; and the reinforcement learning model is configured to establishing different simulating environments according to the score function and the extraction data, producing multiple scheduling decisions for each of the different simulating environments, and judging the optimal scheduling decision for each simulating environment by virtue of a reward mechanism.” [Applicant’s Remarks, 04/21/2022, pages 8-9]

	In response to Applicant’s argument that “Chan fails to mention pre-processing and calculating the production data to obtain an extraction data,” it is noted that Chan was not asserted as disclosing the disputed limitation. As described in the Office Actions dated 08/17/2021 and 02/01/2022, Serita was relied upon to disclose the limitation of pre-processing and calculating the production data to obtain an extraction data. Moreover, in response to Applicant’s argument that Chan fails to mention that the reinforcement learning model is configured to establish different simulating environments according to the score function and the extraction data, produce multiple scheduling decisions for each of the different simulating environments, and judge the optimal scheduling decision for each simulating environment by virtue of a reward mechanism, the Examiner notes that Chan was not asserted as disclosing these limitations either. For the reasons provided above, this argument is deemed moot.

12.	Applicant submits “Serita only discusses an order information and due date as input, which is suitable to flow line production, but not for multi-task sequence and multi-machine. In the claimed invention, multiple factors are considered, production time, order delivery date, machine maintenance status, urgency, and current production status will be input to the reinforcement learning model to perform the high-efficient and production. Such multiple data are beneficial to lead to a more comprehensive simulating, and to produce a desirable scheduling decision. Lettowsky also fails to discuss the above features.” [Applicant’s Remarks, 04/21/2022, page 9]

Regarding the rejection under 35 U.S.C. § 103, Applicant submits that “Serita only discusses an order information and due date as input, which is suitable to flow line production, but not for multi-task sequence and multi-machine.” The Examiner points out that the claims do not present any specifics regarding a multi-task sequence or multiple machines. As to the discussion of production time, order delivery date, machine maintenance status, urgency, and current production status, Serita was asserted only as disclosing the order delivery date and current production status. In at least paragraphs 0026, 0028, and 0046, Serita teaches the instant limitation by obtaining information such as a current production status and an order due date. In particular, Serita’s method for facilitating due date management in manufacturing systems, which encompasses considering inputs including a current production status and an order due date, as discussed in at least paragraphs 0026 and 0028, is reasonably understood as teaching the disputed limitation “the extraction data comprises order delivery date, and current production status.”
In further support of the reasonableness of mapping Serita’s current production status to the claimed current production status, it is noted that paragraph 0026 of Serita discloses that “Scheduling policy 261 provides a function that takes current production status as input and outputs a dispatching decision.” In further support of the reasonableness of mapping Serita’s order due date to the claimed order delivery date, it is noted that paragraph 0027 of Serita discloses that “In situations where that customer can request the preferred due date, the order information can include such information.” Thus, given the broadest reasonable interpretation consistent with the specification in construing the claimed invention, it is the Examiner’s position that the disclosure of Serita teaches, or at least suggests, the disputed limitation. Accordingly, this argument is found unpersuasive.
Lastly, in response to Applicant’s argument that “Lettowsky also fails to discuss the above features,” it is noted that this argument is a mere allegation of patentability with no supporting rationale or explanation. Merely stating that a reference does not teach a feature offers no insight as to why the specific sections of the prior art relied upon by the Examiner fail to disclose the claimed features. Applicant's arguments amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. Accordingly, this argument is found unpersuasive.

14.	Applicant’s remaining arguments either logically depend from the above-rejected arguments, in which case they too are unpersuasive for the reasons set forth above, or they are directed to features which have been newly added via amendment. Therefore, this is the Examiner's first opportunity to consider these limitations in view of the prior art, and any arguments regarding these limitations would be inappropriate since the limitations have not yet been examined. A full rejection of these limitations in view of the prior art will be presented later in this Office Action.

Claim Objections

15.	Claims 1 and 8 are objected to because of the following informalities (typographical errors):

Claim 1 was amended to recite “inputting the extraction data to the reinforcement learning mode”. Claim 1 should recite “inputting the extraction data to the reinforcement learning model”.
Claim 8 was amended to recite “inputting the extraction data to the reinforcement learning mode”. Claim 8 should recite “inputting the extraction data to the reinforcement learning model”.

Claim Rejections - 35 USC § 101

16.	35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

17.	Claims 1, 4, 7-8, 11, and 14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-patentable subject matter. The claims are directed to an abstract idea without significantly more.

18.	Claims 1, 4, 7-8, 11, and 14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The judicial exception is not integrated into a practical application. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The eligibility analysis in support of these findings is provided below, in accordance with the “2019 Revised Patent Subject Matter Eligibility Guidance” (published on 1/7/2019 in Fed. Register, Vol. 84, No. 4 at pgs. 50-57, hereinafter referred to as the “2019 PEG”) and further clarified in the “October 2019 Update: Subject Matter Eligibility” (published on 10/17/2019, hereinafter referred to as the “October 2019 Update”).
With respect to Step 1 of the eligibility inquiry (as explained in MPEP 2106), it is first noted that the system (claims 1, 4, and 7) and the method (claims 8, 11, and 14) are each directed to at least one potentially eligible category of subject matter (machine and process, respectively), and therefore satisfy Step 1 of the eligibility inquiry.
With respect to Step 2A Prong One of the 2019 PEG, it is next noted that the claims recite an abstract idea that falls under the “Mental Processes” group within the enumerated groupings of abstract ideas set forth in the 2019 PEG by reciting steps that can be performed in the human mind via observation, evaluation, judgment, or opinion, and that also falls under the “Mathematical Concepts” grouping, such as mathematical relationships, formulas, and calculations. With respect to independent claim 1, the limitations reciting the abstract idea are indicated in bold below:
cleaning production data (This step falls under the “Mental Processes” grouping by reciting a step for fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. This step can be performed via human observation, perhaps by documenting the incorrect or duplicate data with the aid of pen and paper. The “cleaning” step can be carried out manually, such as via human evaluation or judgment, and is therefore a mental step);
pre-processing and calculating the production data to obtain extraction data (This step recites mathematical concepts, such as mathematical relationships, formulas or equations, or calculations, as well as mental processes that can be performed in the mind via observation, evaluation, and judgment); and
establishing a reinforcement learning model and inputting the extraction data to the reinforcement learning mode to produce an optimal scheduling decision according to a score function and the extraction data (This step is performable as mathematical relationships, formulas, equations, and/or calculations, as well as via mental processes that can be performed via human evaluation or judgment. The Examiner further notes that, although “a reinforcement learning model” is recited in the “produce” step, the actual step of producing an optimal scheduling decision according to a score function and the extraction data is an activity that mimics human thought processes of determining an optimal scheduling decision, i.e., evaluation, perhaps with the aid of paper and pencil, to create graphic data, a mathematical relationship, or the like perceptible in the human mind. See FairWarning IP, LLC v. Iatric Systems, Inc., 839 F.3d 1089, 1093-94 (Fed. Cir. 2016). The Federal Circuit has held similar concepts to be abstract. For example, the Federal Circuit has held that abstract ideas include the concepts of collecting data, analyzing the data, and reporting the results of the collection and analysis, including when limited to particular content. See, e.g., Intellectual Ventures I LLC v. Capital One Fin. Corp., 850 F.3d 1332, 1340-41 (Fed. Cir. 2017) (identifying the abstract idea of organizing, displaying, and manipulating data); Elec. Power Grp., LLC v. Alstom S.A., 830 F.3d 1350, 1354 (Fed. Cir. 2016) (characterizing collecting information, analyzing information by steps people go through in their minds, or by mathematical algorithms, and presenting the results of collecting and analyzing information, without more, as matters within the realm of abstract ideas).
Thus, but for the recitation of the scheduling calculation host implementing the reinforcement learning model, the step involving the reinforcement learning model involves mathematical relationships and mental processes that can be performed via human evaluation or judgment, and therefore also falls within the realm of mental processes);
wherein the reinforcement learning model is configured to implement: establishing different simulating environments according to the score function and the extraction data, producing multiple scheduling decisions for each of the different simulating environments, and judging the optimal scheduling decision for each simulating environment by virtue of a reward mechanism (This step is also performable as mathematical relationships, formulas, equations, and/or calculations, as well as via mental processes that can be performed via human evaluation or judgment. As noted above, although “a reinforcement learning model” is recited, the actual step of producing multiple scheduling decisions for each of the different simulating environments and the step of judging the optimal scheduling decision are activities that mimic human thought processes of judging an optimal scheduling decision for each simulating environment, i.e., evaluation, perhaps with the aid of paper and pencil, to create graphic data, a mathematical relationship, or the like perceptible in the human mind. See FairWarning IP, LLC v. Iatric Systems, Inc., 839 F.3d 1089, 1093-94 (Fed. Cir. 2016). The Federal Circuit has held similar concepts to be abstract. For example, the Federal Circuit has held that abstract ideas include the concepts of collecting data, analyzing the data, and reporting the results of the collection and analysis, including when limited to particular content. See, e.g., Intellectual Ventures I LLC v. Capital One Fin. Corp., 850 F.3d 1332, 1340-41 (Fed. Cir. 2017) (identifying the abstract idea of organizing, displaying, and manipulating data); Elec. Power Grp., LLC v. Alstom S.A., 830 F.3d 1350, 1354 (Fed. Cir. 2016) (characterizing collecting information, analyzing information by steps people go through in their minds, or by mathematical algorithms, and presenting the results of collecting and analyzing information, without more, as matters within the realm of abstract ideas).
Thus, but for the recitation of the scheduling calculation host implementing the reinforcement learning model, the step involving the reinforcement learning model involves mathematical relationships and mental processes that can be performed via human evaluation or judgment, and therefore also falls within the realm of mental processes);
Considered together, these steps set forth an abstract idea of receiving, analyzing, and calculating production data, and producing multiple scheduling decisions based thereon, which falls under the “Mental Processes” and “Mathematical Concepts” abstract idea groupings set forth in the 2019 PEG since these activities can be accomplished as mental steps and mathematical calculations. Independent claim 8 recites limitations similar to those of claim 1 and is therefore determined to recite the same abstract idea as claim 1.
With respect to Step 2A Prong Two of the 2019 PEG, the judicial exception is not integrated into a practical application. The additional elements are directed to: a scheduling calculation host, multiple databases, a user terminal, and a reinforcement learning model (claim 1); the scheduling calculation host and databases (claim 4); the scheduling calculation host and user terminal (claim 7); multiple databases, a data cleaning module, and a reinforcement learning model (claim 8); and databases (claim 11). These elements have been considered; however, they merely describe elements of one or more generic computers and/or instructions (software) to implement the abstract idea, similar to simply adding the words “apply it,” which is not sufficient to amount to a practical application, as noted in the 2019 PEG. See also MPEP 2106.05(f). See also Alice Corp., 134 S. Ct. 2347, 110 USPQ2d 1976; Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015). Furthermore, the additional element(s) fail to provide an improvement to the functioning of a computer or to any other technology or technical field, fail to apply the exception with a particular machine, fail to apply the judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition, fail to effect a transformation of a particular article to a different state or thing, and fail to apply/use the abstract idea in a meaningful way beyond generally linking the use of the judicial exception to a particular technological environment. When the “reinforcement learning model” (claims 1 and 7-8) is evaluated as an additional element, it is recited at a high level of generality and has not been shown to improve upon any technology or the computing apparatus itself.
Accordingly, because the Step 2A Prong One and Prong Two analysis resulted in the conclusion that the claims are directed to an abstract idea, additional analysis under Step 2B of the eligibility inquiry must be conducted in order to determine whether any claim element or combination of elements amounts to significantly more than the judicial exception.
With respect to Step 2B of the eligibility inquiry, it has been determined that the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional elements are directed to: a scheduling calculation host, multiple databases, a user terminal, and a reinforcement learning model (claim 1); the scheduling calculation host and databases (claim 4); the scheduling calculation host and user terminal (claim 7); multiple databases, a data cleaning module, and a reinforcement learning model (claim 8); and databases (claim 11). These additional elements have been evaluated, but fail to add significantly more to the claims because they amount to using generic computing elements or instructions (software) to perform the abstract idea, similar to adding the words “apply it” (or an equivalent), which merely serves to link the use of the judicial exception to a particular technological environment (network computing environment) and does not amount to significantly more than the abstract idea itself. Notably, Applicant’s Specification suggests that virtually any computing device(s) under the sun may be used to implement the invention, including generic computers (Specification at paragraph [0036]: e.g., “The user terminal 230 may include a user interface, such as a display or a tablet computer, connected to the scheduling calculation host 210. The scheduling calculation host 210 is configured to perform a series of processing to the production data from the database 220, and then generate an optimal scheduling decision to the user interface for the user's reference.”). Therefore, the additional elements merely describe generic computing elements or computer-executable instructions (software) that serve only to tie the abstract idea to a particular operating environment, which does not add significantly more to the abstract idea. See, e.g., Alice Corp., 134 S. Ct. 2347, 110 USPQ2d 1976; Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015).
Next, when the “reinforcement learning model” (claims 1, 7-8) is evaluated as an additional element, these features are recited at a high level of generality and encompass well-understood, routine, and conventional prior art activity. See, e.g., Hobbs et al., Pub. No.: US 2003/0179717 A1, noting in paragraph [0008] that “Reinforcement Learning is a well-known technique relying on taking actions and observing the reward (or "Reinforcement") which results and adjusting future actions in accordance with that reward.” Accordingly, the use of a reinforcement learning model does not add significantly more to the claims.
In addition, when the elements are taken as an ordered combination, the combination adds nothing that is not already present when the elements are taken individually. There is no indication that the combination of elements integrates the abstract idea into a practical application. Their collective functions merely provide generic computer implementation. Therefore, when viewed as a whole, these additional claim elements do not provide meaningful limitations that transform the abstract idea into a practical application of the abstract idea, nor do they, as an ordered combination, amount to significantly more than the abstract idea itself.
Dependent claims 4, 7, 11, and 14 recite the same abstract idea as recited in the independent claims and, when evaluated under Step 2A Prong One, are found to merely recite additional details that narrow the abstract idea. For example, claims 4, 7, 11, and 14 recite steps for cleaning and filtering useless data in the production data and adjusting results of scheduling decisions in real time, which, similar to base claims 1 and 8, fall under the same “Mental Processes” abstract idea grouping by describing mental activities that can be accomplished via human observation, judgment, or evaluation. The ordered combination of elements in the dependent claims (including the limitations inherited from the parent claim(s)) adds nothing that is not already present when the elements are taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide generic computer implementation. Accordingly, the subject matter encompassed by the dependent claims fails to amount to a practical application or significantly more than the abstract idea itself.
For more information, see MPEP 2106. The January 2019 Guidance is available at https://www.uspto.gov/patent/laws-and-regulations/examination-policy/subject-matter-eligibility.

Claim Rejections - 35 USC § 103

19.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

20.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

21.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

22.	Claims 1, 4, 7, 11, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Serita et al., Pub. No.: US 2021/0056484 A1, [hereinafter Serita], in view of Chan et al., Pub. No.: US 2020/0387818 A1, [hereinafter Chan], in view of Lettowsky et al., Pub. No.: US 2019/0240889 A1, [hereinafter Lettowsky], in further view of Wen et al., Pub. No.: WO 2020/040763 A1, [hereinafter Wen].

As per claim 1, Serita teaches a production scheduling system, comprising a scheduling calculation host, multiple databases connected to the scheduling calculation host, and a user terminal (paragraph 0011, discussing a system, involving means for determining an initial scheduling policy for internal processes to meet the order and a due date policy for the order; means for executing a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; means for executing a machine learning process on results of the simulation; and means for outputting the finalized scheduling policy and the due date policy in response to the order; paragraph 0014, “FIG. 2, illustrates an example of the system architecture”; paragraph 0024, discussing that FIG. 2 illustrates an example of the system architecture, in accordance with an example implementation. Example implementations can provide due date and scheduling management system [i.e., production scheduling system] on top of an existing system in a factory. In example implementations, it is assumed that a factory manages customer orders with a computer system that facilitates functionality for an Enterprise Resource Planning (ERP) system. It is also assumed that a factory traces its production status and instructs the operations through a computer system [i.e., user terminal], which is a common function of Manufacturing Execution System (MES); paragraph 0025, discussing that due date production scheduling (DDPS) system [i.e., production scheduling system] stores history data in a factory about orders and production status, which facilitates the DDPS to build a simulation model that reproduces the events in the past. DDPS optimization module [i.e., scheduling calculation host] provides a function to learn policies for due date quotation and scheduling in simulation environment. DDPS also provides a function to apply learned policies to the existing system; FIG. 2, element 221, illustrating an Orders database, and element 231, illustrating a Production status database [i.e., multiple databases connected to the scheduling calculation host; as illustrated in FIG. 2, the Orders database and the Production status database are connected to the DDPS optimization module]; FIG. 2, element 241 illustrating an Order history database, and element 242 illustrating a Production history database), and the scheduling calculation host configured to implement:

pre-processing and calculating the production data to obtain extraction data (paragraph 0030, discussing that the module collects historical data of orders and production status. This can be done by transferring batched data from the ERP and MES system, or by fetching snapshots in a real time fashion from those systems and storing the snapshots for a certain period. The required period of historical data depends on the variety of the customer orders and productions, and can be adjusted accordingly to fit the desired implementation; paragraph 0031, discussing that the module builds a simulation environment of order arriving and production lines with a discrete-event simulation (DES). A DES is a general tool to simulate stochastic discrete events and widely used to simulate discrete processes in manufacturing lines. With sufficient knowledge regarding the production lines including the process flow described in Table 1 and a list of machines, example implementations can thereby construct a simulator that virtually generates events in production lines. The events include job arrivals in a process, starting operation, finishing operation, and so on. From historical order information, example implementations can extract order trends [i.e., transferring batched data from the ERP and MES system to obtain extracted order trends suggests pre-processing and calculating the production data to obtain extraction data; it is noted that batched data is data that has been collected and processed] such as arriving rate of order in terms of product, quantity, and requested due date. The extracted trends are fed to the simulator to generate virtual orders which reflects the actual order trends. The simulator built at this stage is used to learn the policies; paragraph 0033, discussing that the simulator generates virtual orders based on the trends extracted; paragraph 0045, discussing that when the scheduler selects an order, the application module extracts job information as well as current production status from existing systems and convert them into a form to which the learned policy can be applied; paragraph 0046); and

establishing a reinforcement learning model and inputting the extraction data to produce an optimal scheduling decision according to a score function and the extraction data (paragraph 0008, discussing that example implementations search the optimal policies by updating both policies simultaneously based on the feedback in a simulation environment. The feedback reflects on how each decision was (e.g., better/worse) in terms of a target metric to be improved on; paragraph 0010, discussing instructions for processing and responding to an order, the instructions involving determining an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) executing a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) executing a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations [i.e., evaluating the scheduling decisions according to a scoring function to determine a finalized scheduling policy suggests producing an optimal scheduling decision according to a score function]; iteratively executing a) and b) until a finalized scheduling policy and the due date policy is determined; and outputting the finalized scheduling policy [i.e., outputting the finalized scheduling policy is considered to be producing an optimal scheduling decision] and the due date policy in response to the order; paragraph 0033, discussing that DDPS (due date production scheduling system) continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted. Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies. Then, the simulator advances the clock and generates the next event based on the decision taken. The simulation result gives the DDPS feedback regarding how good or bad the current situation is. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback, DDPS updates the policies based on it. This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0035, discussing that the detail of the mechanism to update due date quotation and scheduling policies is as follows. As mentioned previously, DDPS module uses reinforcement learning (RL) to update the policies. RL defines an agent, which is a subject taking an action according to the policy and interacting with an environment; paragraph 0045, discussing that when the scheduler selects an order, the application module extracts job information as well as current production status from existing systems and convert them into a form to which the learned policy can be applied [i.e., an optimal scheduling decision is produced according to the extraction data]; paragraph 0060, discussing that processor(s) can be configured to, in response to an order received by the apparatus such as computing device 703, determine an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) execute a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) execute a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations as described with respect to reinforcement learning and the reinforcement learning implementations herein; iteratively execute a) and b) until a finalized scheduling policy and the due date policy is determined; and output the finalized scheduling policy and the due date policy in response to the order; paragraph 0064, “regarding reinforcement learning implementations, the scoring function can be a cost function that is weighted based on quoted due date and actual delivery date as determined by the simulation”; paragraph 0046);

wherein the reinforcement learning model is configured to implement: establishing different simulating environments according to the score function and the extraction data, producing multiple scheduling decisions for each of the different simulating environments, and judging the optimal scheduling decision for each simulating environment by virtue of a reward mechanism (paragraph 0008, discussing that example implementations search the optimal policies by updating both policies simultaneously based on the feedback in a simulation environment. The feedback reflects on how each decision was (e.g., better/worse) in terms of a target metric to be improved on; paragraph 0012, discussing that the apparatus involving a processor, configured to, in response to an order received by the apparatus, determine an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) execute a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) execute a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations; iteratively execute a) and b) until a finalized scheduling policy and the due date policy is determined; and output the finalized scheduling policy and the due date policy in response to the order [i.e., iteratively executing simulations involving scheduling decisions and executing a machine learning process on results of the simulations according to a scoring function suggests establishing different simulating environments according to the score function and the extraction data and producing multiple scheduling decisions for each of different simulating environments]; paragraph 0033, discussing that DDPS continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted. Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies. Then, the simulator advances the clock and generates the next event based on the decision taken. The simulation result gives the DDPS feedback regarding how good or bad the current situation is [i.e., providing feedback regarding how good or bad the scheduling decision determined by the simulation is suggests judging the optimal scheduling decision for each simulating environment]. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback, DDPS updates the policies based on it. This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0035, discussing that the detail of the mechanism to update due date quotation and scheduling policies is as follows. As mentioned previously, DDPS module uses reinforcement learning (RL) to update the policies. RL defines an agent, which is a subject taking an action according to the policy and interacting with an environment; paragraph 0039, discussing that RL (reinforcement learning) receives a feedback from an environment as a reward. There are several options to define the reward for DDPS agents; paragraph 0043, discussing that FIG. 4 illustrates how two agents learn the policies through the interaction with the simulation environment, in accordance with an example implementation. Whenever the simulator 400 issues an event related to the due date quotation or job dispatch, the relevant agent 410 takes an action based on the current policy. The action taken changes the state of the environment in the simulation environment 400 and generates the feedback. Based on the reward, the agents update their policies [i.e., updating policies of the agents of the Reinforcement Learning algorithm based on a reward suggests judging the optimal scheduling decision for each simulating environment by virtue of a reward mechanism]. Sharing the same reward allows the agents to update their policy towards the same goal; paragraph 0060, discussing that processor(s) 810 can be configured to, in response to an order received by the apparatus such as computing device 703, determine an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) execute a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) execute a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations as described with respect to reinforcement learning and the reinforcement learning implementations herein; iteratively execute a) and b) until a finalized scheduling policy and the due date policy is determined; and output the finalized scheduling policy and the due date policy in response to the order; paragraph 0064, “regarding reinforcement learning implementations, the scoring function can be a cost function that is weighted based on quoted due date and actual delivery date as determined by the simulation”; paragraphs 0031, 0042);

the scheduling calculation host is further configured to implement calculating and extracting the extraction data which is input to the model (paragraph 0025, discussing that the due date production scheduling (DDPS) system stores history data in a factory about orders and production status, which facilitates the DDPS to build a simulation model that reproduces the events in the past. DDPS optimization module provides a function to learn policies for due date quotation and scheduling in simulation environment. DDPS also provides a function to apply learned policies to the existing system; paragraph 0027, discussing that Table 2 illustrates an example of order information...In situations where that customer can request the preferred due date, the order information can include such information...; paragraph 0030, discussing that the module collects historical data of orders and production status. This can be done by transferring batched data from the ERP and MES system, or by fetching snapshots in a real time fashion from those systems and storing the snapshots for a certain period. The required period of historical data depends on the variety of the customer orders and productions, and can be adjusted accordingly to fit the desired implementation; paragraph 0031, discussing that the module builds a simulation environment of order arriving and production lines with a discrete-event simulation (DES). A DES is a general tool to simulate stochastic discrete events and widely used to simulate discrete processes in manufacturing lines...From historical order information, example implementations can extract order trends such as arriving rate of order in terms of product, quantity, and requested due date. The extracted trends are fed to the simulator to generate virtual orders which reflects the actual order trends. The simulator built at this stage is used to learn the policies [i.e., the extracted trends that are fed to the simulator suggest extraction data which is input to the reinforcement learning model]; paragraph 0033, discussing that DDPS continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted. Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies…The simulation result gives the DDPS feedback regarding how good or bad the current situation is. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback, DDPS updates the policies based on it. This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0045, discussing that when the scheduler selects an order, the application module extracts job information as well as current production status from existing systems and convert them into a form to which the learned policy can be applied; paragraph 0028),

the extraction data comprises order delivery date, and current production status (paragraph 0026, discussing a function that takes an order information and current production status as input; paragraph 0028, discussing that quoted due is a due date the factory replied to the customer as a result of due date quotation. Process represents the current stage in which the job is positioned; paragraph 0045, discussing that when the scheduler selects an order, the application module extracts job information as well as current production status from existing systems and convert them into a form to which the learned policy can be applied; paragraph 0046, discussing that FIG. 6 illustrates an example of the dashboard for production scheduling operation, in accordance with an example implementation. The top left panel shows machine status in a factory. When the scheduler selects a process waiting for an assignment of the next job, the jobs in that process appear in the top right panel. The application module extracts current production status and applies the policies learned).

While Serita teaches pre-processing and calculating the production data to obtain extraction data, it does not explicitly teach cleaning production data from the multiple databases; inputting the extraction data to the reinforcement learning model; the scheduling calculation host is further configured to implement calculating and extracting the extraction data which is input to the reinforcement model; and the extraction data comprises production time, machine maintenance status, and urgency. Chan, in the analogous art of industrial process optimization, teaches:

cleaning production data from the multiple databases (paragraph 0006, discussing a computer system for building and deploying a model to optimize assets in an industrial process. The system includes a processor operatively coupled to a data storage system. The processor is configured to implement a data preparation module, a model development module, and an execution module; paragraph 0081, “various process data are obtained 120 by importing P&ID plant design data, and loading plant historical operating data. An improved dataset is generated by aggregation, data cleansing, and pre-processing 120”; paragraph 0090, discussing that the method may load (import) operations data for the subject production process variables from other sources, such as plant P&ID and design data, other plant data servers, plant management systems, or any other resources of the plant...The loaded operations data includes continuous measurements for a number of process variables for the subject production process, as, typically, measurements for hundreds or even thousands of process variables are stored in the plant historian or plant asset database over time for a production process; paragraph 0093, “the method 100, at step 120-2, performs data cleansing and repair on the raw input dataset generated in step 120-1”; paragraph 0131, discussing that the raw dataset may be cleansed of such missing values and bad measurements to generate a cleansed dataset; paragraph 0142, discussing that the MOP (Monthly Operation Plan) model building starts with selecting data sources and loading historical MOP cases data, then the data is cleaned and preprocessed; paragraph 0152, discussing that the user may request data cleansing to be performed on the generated dataset (or a plant system of network environment may automatically request the performance of data cleansing). In response, the user interface may communicate with the input data preparation module to perform functions on the dataset that may include data screening, slicing, repairing, and pre-processing to reduce the dataset (e.g., remove bad quality data segments and measurements for uninformative process variables); paragraph 0150).

Chan suggests that the extraction data comprises machine maintenance status (paragraph 0043, “a system should be able to build a model from historical data and maintenance information, predict failures in advance, and provide action guidance to prevent a process or an equipment from failures or unplanned shut-downs”; paragraph 0062, discussing that with validated input data values, the system executes one or more tasks with pre-defined problems in step (1). This may include generating online model predictions of a production quality, a projected profit, or an early detection of equipment failures, depending on the applications; the system execution may also include resolving an optimized production plan for maximum profits, an optimal equipment maintenance schedule for maximum uptime, or an adjustment of plant operation for minimum cost, etc.).

Serita is directed towards a system and method for optimizing production scheduling. Chan is directed towards production planning optimization. Therefore, they are deemed to be analogous, as they are both directed towards optimization of production scheduling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Serita with Chan because the references are analogous art, both being directed to solutions for production planning, which falls within Applicant’s field of endeavor (production scheduling systems and methods), and because modifying Serita to include Chan’s features for cleaning production data from the multiple databases, in the manner claimed, would serve the motivation of reducing the dataset by cleansing bad quality data segments and generating an improved dataset by aggregation, data cleansing, and pre-processing (Chan at paragraph 0081), or in the pursuit of deploying multiple predictive models in an easy workflow to support asset optimization and long-term sustained safe operation and production, which supports manufacturers in continually optimizing the performance of their assets by improving safety, managing risk, reducing downtime, enhancing productivity, and increasing profitability (Chan at paragraph 0124); and further obvious because the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

The Serita-Chan combination does not explicitly teach inputting the extraction data to the reinforcement learning mode; the scheduling calculation host is further configured to implement calculating and extracting the extraction data which is input to the reinforcement model, and the extraction data comprises production time, machine maintenance status, and urgency. Lettowsky in the analogous art of production planning systems teaches:

the extraction data comprises production time and machine maintenance status. (paragraph 0002, discussing a method for providing, retrieving and using a data element in the value chain/value creation chain of a plastic sheet material for producing a final product; paragraph 0225, discussing that the trigger to request this information can either be performed by an operator of the plant, an employee in the work preparation or automatically. The information contains information on the recipe as well as machine settings...Just like the system reads information from a memory about a suitable detection device, it writes this information into the memory, which is relevant for further processing of one or more semi-finished product(s). Information may include, inter alia: date of manufacture, manufacturing time, i.e. start time and end time of the roll, place of manufacture, the used machine (type, manufacturer, machine number, maintenance condition), recipe (exact name of type and amount of raw materials used), environmental conditions during production (air humidity, temperature and air pressure), machine setting values, process parameters and inline recorded quality parameters of the produced semi-finished product; paragraph 0236, discussing that additional information added to a data element as information can relate the date of manufacture, the time of manufacture, i.e. start time and end time of the roll, the place of manufacture, the machine used (type, manufacturer, machine number, maintenance condition), recipe, environmental conditions in the production, machine setting values, process parameters and quality parameters of the semi-finished products; paragraph 0109).

The Serita-Chan combination is directed towards production scheduling optimization. Lettowsky is directed towards optimized production planning and control. Therefore, they are deemed to be analogous because both are directed towards optimization of production planning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the Serita-Chan combination with Lettowsky because the references are analogous art, both being directed to solutions for production planning, which falls within applicant’s field of endeavor (production scheduling systems and methods), and because modifying the Serita-Chan combination to include Lettowsky’s feature of extraction data comprising production time and machine maintenance status, in the manner claimed, would serve the motivation of reducing the cost of a production process and/or the throughput times of a production process (Lettowsky at paragraph 0269); and further obvious because the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

The Serita-Chan-Lettowsky combination does not explicitly teach inputting the extraction data to the reinforcement learning mode to produce an optimal scheduling decision; the scheduling calculation host is further configured to implement calculating and extracting the extraction data which is input to the reinforcement model, and the extraction data comprises urgency. However, Wen in the analogous art of production scheduling teaches these concepts. Wen teaches:

inputting the extraction data to the reinforcement learning mode to produce an optimal scheduling decision (paragraph 0004, discussing methods and systems for a fast production scheduling approach based on deep reinforcement learning (DRL); paragraph 0031, discussing that a state of a production schedule is identified. The state of the production schedule may include information relating to factors for different machines (e.g. machine availability, product on machine, remaining execution time, machine input queue, machine output queue), different products, workers, environmental factors, cost parameters, supply, demand, transportation parameters, among other data that is related to the production schedule; paragraph 0033, discussing that the simulator may predict multiple possible states for the future based on a current state and prior run simulations. The simulator may introduce random disturbances to relevant aspects of the environment, e.g. variable processing times, machine breakdown, and electricity prices when generating the future predicted states. Each of the future predicted states may be used below to identify future steps and generate a production schedule; paragraph 0034, discussing that the state is input into a neural network trained to generate a plurality of sub-optimal scheduling policies. In an embodiment, the neural network may be a deep reinforcement learning (DRL) network. The DRL is pre-trained to identify one or more sub-optimal policies. The term sub-optimal here refers to policies that may not be the ideal or optimal policy for proceeding with the production schedule; paragraph 0036, discussing that the reward provided by the simulator may be identified by the simulator...The reward may reflect a makespan or other quantifiable value that reflects the object or objects of the production schedule.
The reward may be based on multiple different values and may be determined using an algorithm that weighs different values differently; paragraph 0038, discussing that the neural network is defined as a plurality of sequential feature units or layers. The machine network inputs state data, compresses the state data into a latent space and maps the features from the latent space using the LSTM 402. The encoder is trained using a classical unsupervised learning algorithm to train an autoencoder; paragraph 0057, discussing that a production schedule is generated using the one or more near optimal scheduling policies. The production schedule may include one or more steps or actions to perform as defined by the one or more optimal policies...In a scenario where there is a change to the underlying manufacturing parameters (e.g. a cost or machine change), the production schedule may be adapted. The change in state may be input into the DRL (deep reinforcement learning) directly or using the manufacturing plant simulator [i.e., inputting data into the deep reinforcement learning model corresponds to inputting the extraction data to the reinforcement learning model]. The DRL agent may use the updated search and parameters to generate a new near optimal production schedule given the new state and manufacturing parameters; paragraph 0032);

the scheduling calculation host is further configured to implement calculating and extracting the extraction data which is input to the reinforcement model (paragraph 0057, discussing that a production schedule is generated using the one or more near optimal scheduling policies. The production schedule may include one or more steps or actions to perform as defined by the one or more optimal policies...In a scenario where there is a change to the underlying manufacturing parameters (e.g. a cost or machine change), the production schedule may be adapted. The change in state may be input into the DRL (deep reinforcement learning) directly or using the manufacturing plant simulator [i.e., inputting data into the deep reinforcement learning model corresponds to inputting the extraction data to the reinforcement learning model]. The DRL agent may use the updated search and parameters to generate a new near optimal production schedule given the new state and manufacturing parameters; paragraph 0060, discussing that the DRL agent 103 runs a plurality of simulations using data from the simulator and identifies a reward using a predefined reward function. A high fidelity manufacturing plant simulator is configured to generate new states given an action from the agent. The simulator and the agent are configured to run simulations from a state to an end of a production schedule. The rewards for each state and complete simulation may be calculated using a known reward function; paragraph 0082, discussing that the sub-optimal scheduling policies are used by the processor. The processor speeds up the search for the optimal policies. The processor is configured to return an initial schedule based on static state information…The processor performs continuous rollout.
The rollout continues to search for a feasible/better scheduling policy and augment the schedule dynamically even after some tasks have already been dispatched…The continuous rollout provides that the schedule reacts to the dynamics of manufacturing systems in a timely manner, e.g. the schedule can be adjusted if, for example, a machine fails, or conditions change (e.g. price of a commodity or power changes or CPU availability or delivery or environmental conditions); paragraph 0032); and 

the extraction data comprises urgency (paragraph 0029, discussing that the sub-optimal scheduling policies are fed into the MCTS agent. The MCTS agent speeds up the search for near optimal policies in the offline training phase…In real-time scheduling, the MCTS agent performs continuous rollout utilizing the continual acquisition of incrementally available information, e.g. machine breakdown, machine processing time etc. For example, the calculated policies from DRL (deep reinforcement learning) agent may become infeasible because of machine breakdown or significant changes in environmental conditions, e.g. order priority [i.e., urgency]. The rollout continues to search for a feasible/better scheduling policy and augment the schedule dynamically even after some tasks have already been dispatched. The continuous rollout provides that the schedule reacts to the dynamics of manufacturing systems in a timely manner, e.g. the schedule can be adjusted if, for example, a machine fails or conditions change (e.g. price of a commodity or power changes or CPU...); paragraph 0031, discussing that a state of a production schedule is identified. The state of the production schedule may include information relating to factors for different machines (e.g. machine availability, product on machine, remaining execution time, machine input queue, machine output queue), different products, workers, environmental factors, cost parameters, supply, demand, transportation parameters, among other data that is related to the production schedule; paragraph 0057).

The Serita-Chan-Lettowsky combination is directed towards production scheduling optimization. Wen is directed towards a system for production scheduling. Therefore, they are deemed to be analogous because both are directed towards optimization of production schedules. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the Serita-Chan-Lettowsky combination with Wen because the references are analogous art, both being directed to solutions for production scheduling, which falls within applicant’s field of endeavor (production scheduling systems and methods), and because modifying the Serita-Chan-Lettowsky combination to include Wen’s features for inputting the extraction data to the reinforcement learning mode to produce an optimal scheduling decision, calculating and extracting the extraction data which is input to the reinforcement model, and including extraction data comprising urgency, in the manner claimed, would serve the motivation of increasing production efficiency, maximizing the efficiency of the operation, and reducing costs (Wen at paragraph 0002), or the pursuit of quickly and efficiently generating an efficient schedule for operation of the machines in order to generate, for example, a final product (Wen at paragraph 0075); and further obvious because the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

As per claim 4, the Serita-Chan-Lettowsky-Wen combination teaches the production scheduling system according to claim 1. Although not taught by Serita, Chan in the analogous art of industrial process optimization teaches wherein the scheduling calculation host is further configured to implement cleaning and filtering useless data in the production data of the databases (paragraph 0090, discussing that the method may load (import) operations data for the subject production process variables from other sources, such as plant P&ID and design data, other plant data servers, plant management systems, or any other resources of the plant...The loaded operations data includes continuous measurements for a number of process variables for the subject production process, as, typically, measurements for hundreds or even thousands of process variables are stored in the plant historian or plant asset database over time for a production process; paragraph 0095, discussing that the method provides flexibility to pre-process the marked bad quality measurement values of the dataset with several repair and removal processing options to cleanse these values; paragraph 0131, discussing that the raw dataset may be cleansed of such missing values and bad measurements to generate a cleansed dataset; paragraph 0142, discussing that the MOP (Monthly Operation Plan) model building starts with selecting data sources and loading historical MOP cases data, then the data is cleaned and preprocessed; paragraph 0152, discussing that the user may request data cleansing to be performed on the generated dataset. In response, the user interface 401 may communicate with the input data preparation module to perform functions on the dataset that may include data screening, slicing, repairing, and pre-processing to reduce the dataset (e.g., remove bad quality data segments and measurements for uninformative process variables) [i.e., removing bad quality data segments suggests cleaning and filtering useless data]; paragraph 0153, discussing that the input data preparation module may further reduce the enriched dataset by removing identified redundant inputs in each highly correlated input group, and eliminating less-contributed inputs through feature selections…; paragraph 0154).

Serita is directed towards a system and method for optimizing production scheduling. Chan is directed towards production planning optimization. Therefore, they are deemed to be analogous because both are directed towards optimization of production scheduling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Serita with Chan because the references are analogous art, both being directed to solutions for production planning, which falls within applicant’s field of endeavor (production scheduling systems and methods), and because modifying Serita to include Chan’s feature for cleaning and filtering useless data in the production data of the databases, in the manner claimed, would serve the motivation of reducing the dataset by cleansing bad quality data segments and generating an improved dataset by aggregation, data cleansing, and pre-processing (Chan at paragraph 0081), or the pursuit of deploying multiple predictive models in an easy workflow to support asset optimization and long-term sustained safe operation and production, which supports manufacturers continually optimizing the performance of their assets: improving safety, managing risk, reducing downtime, enhancing productivity, and increasing profitability (Chan at paragraph 0124); and further obvious because the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

As per claim 7, the Serita-Chan-Lettowsky-Wen combination teaches the production scheduling system according to claim 1. Serita further teaches wherein the scheduling calculation host is further configured to implement receiving feedback information from the user terminal and adjusting results of scheduling decisions in real time, according to the feedback information returned by the reinforcement learning model (paragraph 0008, discussing that example implementations involve a virtual environment which simulate the factory production lines and order arrivals. The example implementations learn the policy for due date quotation and scheduling through trials of decisions made on the virtual environment. Due date quotation and scheduling processes are modeled as a collaborative decision making process, wherein the example implementations search the optimal policies by updating both policies simultaneously based on the feedback in a simulation environment [i.e., updating both policies simultaneously based on the feedback in a simulation environment is considered to be adjusting results of scheduling decisions according to the information feedback module – as described below the update to the scheduling policies is executed in real-time]. The feedback reflects on how each decision was (e.g., better/worse) in terms of a target metric to be improved on, such as tardiness. After learning the due date quotation and scheduling policies on the virtual environment, the policies are installed on the actual operation system. The policies thereby facilitate a factory or manufacturing system to execute an appropriate decision in a real time fashion; paragraph 0033, discussing that DDPS continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted at 301. 
Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies. Then, the simulator advances the clock and generates the next event based on the decision taken. The simulation result gives the DDPS feedback regarding how good or bad the current situation is. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback (i.e., the information feedback module is connected with the user terminal), DDPS updates the policies based on it. This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0035, describing a mechanism that uses reinforcement learning (RL) to update the due date quotation and scheduling policies; paragraph 0039, discussing that the RL receives a feedback from an environment as a reward (i.e., the feedback information returned by the reinforcement learning model); paragraphs 0043, 0054).

Claim 8 recites substantially similar limitations that stand rejected via the art citations and rationale applied to claim 1, as discussed above. Further, as per claim 8 Serita teaches a production scheduling method (paragraph 0070, discussing a method for processing and responding to an order, involving determining an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) executing a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) executing a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations; iteratively executing a) and b) until a finalized scheduling policy and the due date policy is determined; and outputting the finalized scheduling policy and the due date policy in response to the order; paragraphs 0069, 0070), comprising the steps of:

(2) pre-processing and calculating the production data to obtain extraction data (paragraph 0030, discussing that the module collects historical data of orders and production status. This can be done by transferring batched data from the ERP and MES system, or by fetching snapshots in a real time fashion from those systems and storing the snapshots for a certain period. The required period of historical data depends on the variety of the customer orders and productions, and can be adjusted accordingly to fit the desired implementation; paragraph 0031, discussing that the module builds a simulation environment of order arriving and production lines with a discrete-event simulation (DES). A DES is a general tool to simulate stochastic discrete events and widely used to simulate discrete processes in manufacturing lines. With sufficient knowledge regarding the production lines including the process flow described in Table 1 and a list of machines, example implementations can thereby construct a simulator that virtually generates events in production lines. The events include job arrivals in a process, starting operation, finishing operation, and so on. From historical order information, example implementations can extract order trends [i.e., transferring batched data from the ERP and MES system to obtain extracted order trends suggests pre-processing and calculating the production data to obtain extraction data – it is noted that batched data is data that has been collected and processed] such as arriving rate of order in terms of product, quantity, and requested due date. The extracted trends are fed to the simulator to generate virtual orders which reflects the actual order trends. 
The simulator built at this stage is used to learn the policies; paragraph 0033, discussing that the simulator generates virtual orders based on the trends extracted; paragraph 0045, discussing that when the scheduler selects an order, the application module extracts job information as well as current production status from existing systems and convert them into a form to which the learned policy can be applied; paragraph 0046); and

(3) creating reinforcement learning model and inputting the extraction data to produce an optimal scheduling decision according to a score function and the extraction data (paragraph 0008, discussing that example implementations search the optimal policies by updating both policies simultaneously based on the feedback in a simulation environment. The feedback reflects on how each decision was (e.g., better/worse) in terms of a target metric to be improved on; paragraph 0010, discussing instructions for processing and responding to an order, the instructions involving determining an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) executing a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) executing a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations [i.e., evaluating the scheduling decisions according to a scoring function to determine a finalized scheduling policy suggests producing an optimal scheduling decision according to a score function]; iteratively executing a) and b) until a finalized scheduling policy and the due date policy is determined; and outputting the finalized scheduling policy [i.e., outputting the finalized scheduling policy is considered to be producing an optimal scheduling decision] and the due date policy in response to the order; paragraph 0033, discussing that DDPS (due date production scheduling system) continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted. 
Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies. Then, the simulator advances the clock and generates the next event based on the decision taken. The simulation result gives the DDPS feedback regarding how good or bad the current situation is. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback, DDPS updates the policies based on it. This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0035, discussing that the detail of the mechanism to update due date quotation and scheduling policies is as follows. As mentioned previously, DDPS module uses reinforcement learning (RL) to update the policies. RL defines an agent, which is a subject taking an action according to the policy and interacting with an environment; paragraph 0045, discussing that when the scheduler selects an order, the application module extracts job information as well as current production status from existing systems and convert them into a form to which the learned policy can be applied [i.e., an optimal scheduling decision is produced according to the extraction data]; paragraph 0060, discussing that processor(s) can be configured to, in response to an order received by the apparatus such as computing device 703, determine an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) execute a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) execute a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due 
date quotations as described with respect to reinforcement learning and the reinforcement learning implementations herein; iteratively execute a) and b) until a finalized scheduling policy and the due date policy is determined; and output the finalized scheduling policy and the due date policy in response to the order; paragraph 0064, “regarding reinforcement learning implementations, the scoring function can be a cost function that is weighted based on quoted due date and actual delivery date as determined by the simulation”; paragraph 0046);

wherein the step (3) comprises establishing different simulating environments according to the score function and the extraction data, producing multiple scheduling decisions corresponding to each of the different simulating environments, and judging the optimal scheduling decision for each simulating environment (paragraph 0008, discussing that example implementations search the optimal policies by updating both policies simultaneously based on the feedback in a simulation environment. The feedback reflects on how each decision was (e.g., better/worse) in terms of a target metric to be improved on; paragraph 0012, discussing an apparatus involving a processor configured to, in response to an order received by the apparatus, determine an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) execute a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) execute a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations; iteratively execute a) and b) until a finalized scheduling policy and the due date policy is determined; and output the finalized scheduling policy and the due date policy in response to the order [i.e., iteratively executing the simulation involving scheduling decisions and executing a machine learning process on results of the simulation according to a scoring function suggests producing multiple scheduling decisions corresponding to each of the different simulating environments]; paragraph 0033, discussing that DDPS continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted.
Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies. Then, the simulator advances the clock and generates the next event based on the decision taken. The simulation result gives the DDPS feedback regarding how good or bad the current situation is [i.e., providing feedback regarding how good or bad the scheduling decision determined by the simulation is suggests judging the optimal scheduling decision for each simulating environment]. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback, DDPS updates the policies based on it. This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0035, discussing that the detail of the mechanism to update due date quotation and scheduling policies is as follows. As mentioned previously, DDPS module uses reinforcement learning (RL) to update the policies. RL defines an agent, which is a subject taking an action according to the policy and interacting with an environment; paragraph 0039, discussing that RL (reinforcement learning) receives a feedback from an environment as a reward. There are several options to define the reward for DDPS agents; paragraph 0043, discussing that FIG. 4 illustrates how two agents learn the policies through the interaction with the simulation environment, in accordance with an example implementation. Whenever the simulator 400 issues an event related to the due date quotation or job dispatch, the relevant agent 410 takes an action based on the current policy. The action taken changes the state of the environment in the simulation environment 400 and generates the feedback. Based on the reward, the agents update their policies.
Sharing the same reward allows the agents to update their policy towards the same goal; paragraph 0060, discussing that processor(s) 810 can be configured to, in response to an order received by the apparatus such as computing device 703, determine an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) execute a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) execute a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations as described with respect to reinforcement learning and the reinforcement learning implementations herein; iteratively execute a) and b) until a finalized scheduling policy and the due date policy is determined; and output the finalized scheduling policy and the due date policy in response to the order; paragraph 0064, “regarding reinforcement learning implementations, the scoring function can be a cost function that is weighted based on quoted due date and actual delivery date as determined by the simulation”; paragraphs 0031, 0042);

constructing a scheduling virtual environment according to the extraction data and the different simulating environments, and constructing multiple sub-learning models according to the multiple scheduling decisions (paragraph 0008, discussing that example implementations described involve a virtual environment which simulate the factory production lines and order arrivals. The example implementations learn the policy for due date quotation and scheduling through trials of decisions made on the virtual environment [i.e., scheduling virtual environment]. Due date quotation and scheduling processes are modeled as a collaborative decision making process, wherein the example implementations search the optimal policies by updating both policies simultaneously based on the feedback in a simulation environment. The feedback reflects on how each decision was (e.g., better/worse) in terms of a target metric to be improved on, such as tardiness. After learning the due date quotation and scheduling policies on the virtual environment, the policies are installed on the actual operation system. 
The policies thereby facilitate a factory or manufacturing system to execute an appropriate decision in a real time fashion; paragraph 0009, discussing a method for processing and responding to an order, involving determining an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) executing a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) executing a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations; iteratively executing a) and b) until a finalized scheduling policy and the due date policy is determined [i.e., the machine learning models executed on the results of the simulation to evaluate the scheduling decisions are considered to be the multiple sub-learning models according to the multiple scheduling decisions]; paragraph 0031, discussing that the module builds a simulation environment of order arriving and production lines with a discrete-event simulation; paragraph 0033, discussing that DDPS continuously update the policies through multiple runs of the simulation; paragraph 0043);

determining whether a key performance indicator (KPI) of each scheduling decision is better than a KPI, if yes, rewarding the corresponding sub-learning model (paragraph 0008, discussing that example implementations described involve a virtual environment which simulate the factory production lines and order arrivals. The example implementations learn the policy for due date quotation and scheduling through trials of decisions made on the virtual environment. Due date quotation and scheduling processes are modeled as a collaborative decision making process, wherein the example implementations search the optimal policies by updating both policies simultaneously based on the feedback in a simulation environment. The feedback reflects on how each decision was (e.g., better/worse) in terms of a target metric to be improved on, such as tardiness. After learning the due date quotation and scheduling policies on the virtual environment, the policies are installed on the actual operation system. The policies thereby facilitate a factory or manufacturing system to execute an appropriate decision in a real time fashion; paragraph 0033, discussing that at 303, DDPS continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted at 301. Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies. Then, the simulator advances the clock and generates the next event based on the decision taken. The simulation result gives the DDPS feedback regarding how good or bad the current situation is. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback, DDPS updates the policies based on it. 
This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0039, discussing that RL (reinforcement learning) receives a feedback from an environment as a reward. There are several options to define the reward for DDPS agents; paragraph 0043, discussing that FIG. 4 illustrates how two agents learn the policies through the interaction with the simulation environment, in accordance with an example implementation. Whenever the simulator 400 issues an event related to the due date quotation or job dispatch, the relevant agent 410 takes an action based on the current policy. The action taken changes the state of the environment in the simulation environment and generates the feedback. Based on the reward, the agents update their policies [i.e., rewarding the corresponding sub-learning model]; paragraph 0042; [With respect to the limitation “if yes, rewarding the corresponding sub-learning model,” it is noted that, when given its broadest reasonable interpretation, claim 10 covers a method that comprises determining whether a key performance indicator (KPI) of each scheduling decision is better than a historical KPI, and that stops if a key performance indicator of each scheduling decision is not better than a historical KPI. As such, the step of “if yes, rewarding the corresponding sub-learning model” is conditional, and claim 10 need not invoke this step. See Cybersettle, Inc. v. Nat’l Arbitration Forum, Inc., 243 Fed. Appx. 603, 607 (Fed. Cir. 2007) (unpublished) (“It is of course true that method steps may be contingent. If the condition for performing a contingent step is not satisfied, the performance recited by the step need not be carried out in order for the claimed method to be performed.”)]); and 

judging optimization degree of each scheduling decision, thereby producing the optimal scheduling decision (paragraph 0008, discussing that example implementations described involve a virtual environment which simulate the factory production lines and order arrivals. The example implementations learn the policy for due date quotation and scheduling through trials of decisions made on the virtual environment. Due date quotation and scheduling processes are modeled as a collaborative decision making process, wherein the example implementations search the optimal policies by updating both policies simultaneously based on the feedback in a simulation environment. The feedback reflects on how each decision was (e.g., better/worse) in terms of a target metric to be improved on [i.e., judging optimization degree of each scheduling decision]. After learning the due date quotation and scheduling policies on the virtual environment, the policies are installed on the actual operation system. The policies thereby facilitate a factory or manufacturing system to execute an appropriate decision in a real time fashion; paragraph 0009, discussing a method for processing and responding to an order, involving determining an initial scheduling policy for internal processes to meet the order and a due date policy for the order; a) executing a simulation involving scheduling decisions and due date quotations based on the initial scheduling policy and the due date policy; b) executing a machine learning process on results of the simulation to update the scheduling policy and the due date policy by evaluating the scheduling decisions and the due date quotations according to a scoring function which is common for evaluating the scheduling decisions and evaluating the due date quotations; iteratively executing a) and b) until a finalized scheduling policy and the due date policy is determined; and outputting the finalized scheduling policy [i.e., producing the optimal scheduling decision] 
and the due date policy in response to the order; paragraph 0033, discussing that DDPS continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted. Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies. Then, the simulator advances the clock and generates the next event based on the decision taken. The simulation result gives the DDPS feedback regarding how good or bad the current situation is [i.e., providing feedback regarding how good or bad the scheduling decision is suggests judging optimization degree of each scheduling decision]. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback, DDPS updates the policies based on it. This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0067, discussing that the internal processes can include manufacturing processes, and the processor(s) can be configured to output the finalized scheduling policy and the due date policy in response to the order by responding to the order with the due date policy and dispatching the finalized scheduling policy to the one or more machines to execute a manufacturing process according to the finalized scheduling policy as illustrated in FIG. 5 and FIG. 6, and whereupon when the finalized scheduling policy is determined, it can be dispatched to the one or more machines illustrated in FIG. 7 through the user interface of FIG. 6 via network and PLCs to be executed by the corresponding machines);

wherein step (2) comprises calculating and extracting the extraction data which is input to the model (paragraph 0025, discussing that the due date production scheduling (DDPS) system stores history data in a factory about orders and production status, which facilitates the DDPS to build a simulation model that reproduces the events in the past. DDPS optimization module provides a function to learn policies for due date quotation and scheduling in simulation environment. DDPS also provides a function to apply learned policies to the existing system; paragraph 0027, discussing that Table 2 illustrates an example of order information...In situations where that customer can request the preferred due date, the order information can include such information...; paragraph 0030, discussing that the module collects historical data of orders and production status. This can be done by transferring batched data from the ERP and MES system, or by fetching snapshots in a real time fashion from those systems and storing the snapshots for a certain period. The required period of historical data depends on the variety of the customer orders and productions, and can be adjusted accordingly to fit the desired implementation; paragraph 0031, discussing that the module builds a simulation environment of order arriving and production lines with a discrete-event simulation (DES). A DES is a general tool to simulate stochastic discrete events and widely used to simulate discrete processes in manufacturing lines...From historical order information, example implementations can extract order trends such as arriving rate of order in terms of product, quantity, and requested due date. The extracted trends are fed to the simulator to generate virtual orders which reflects the actual order trends. 
The simulator built at this stage is used to learn the policies [i.e., the extracted trends that are fed to the simulator suggests extraction data which is input to the reinforcement learning model]; paragraph 0033, discussing that DDPS continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted. Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies…The simulation result gives the DDPS feedback regarding how good or bad the current situation is. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback, DDPS updates the policies based on it. This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0045, discussing that when the scheduler selects an order, the application module extracts job information as well as current production status from existing systems and convert them into a form to which the learned policy can be applied; paragraph 0028), and

the extraction data comprises order delivery date, and current production status (paragraph 0026, discussing a function that takes an order information and current production status as input; paragraph 0028, discussing that quoted due is a due date the factory replied to the customer as a result of due date quotation. Process represents the current stage in which the job is positioned; paragraph 0045, discussing that when the scheduler selects an order, the application module extracts job information as well as current production status from existing systems and convert them into a form to which the learned policy can be applied; paragraph 0046, discussing that FIG. 6 illustrates an example of the dashboard for production scheduling operation, in accordance with an example implementation. The top left panel shows machine status in a factory. When the scheduler selects a process waiting for an assignment of the next job, the jobs in that process appear in the top right panel. The application module extracts current production status and applies the policies learned).

While Serita teaches pre-processing and calculating the production data; and determining whether a key performance indicator (KPI) of each scheduling decision is better than a KPI, it does not explicitly teach that the production data is from the data cleaning module; inputting the extraction data to the reinforcement learning mode; that the KPI is a historical KPI; wherein step (2) comprises calculating and extracting the extraction data which is input to the reinforcement learning model, and the extraction data comprises production time, machine maintenance status, and urgency. Chan in the analogous art of industrial processes optimization teaches:

pre-processing and calculating the production data from the data cleaning module (paragraph 0053, discussing that the system performs data pre-processing, which includes data screening, repairing, and other preparation such as filtering, aggregation etc.; paragraph 0072, discussing that for a model-based solution, the embodiments can provide the following methods and execution steps to support successful applications: Perform feature calculation and extractions required by model inputs, such as applying transforms to raw data, compute derived variable values from measurements, running through inferential models to generate property estimated values, etc.; Execute model predictions and solve optimization problems online for the ultimate application solutions at a repeated cycle; and Export model prediction and solution results for decision making or real-time process control and optimization implementation; paragraph 0054, discussing that the system continues operating on the cleansed dataset--performing feature enhancement and feature selection, which may include calculating one or more features from original process data and operation data; paragraph 0097, discussing that the method may also prepare the measurement data with pre-processing options; paragraph 0104, discussing that the method 120 then performs data feature enrichment on the cleansed/repaired input dataset resulting from step 120-2. The feature enrichment enhances the dataset by adding physically meaningful or numerically more relevant derived process variables and corresponding values. Step 120-3 automatically derives various feature variables and corresponding values from the measurements of candidate process variables in the dataset. 
The derived feature variable values may be more predicative of the identified at least one process dependent variable of the subject plant process than the measurements of candidate process variables in the dataset; paragraph 0117, discussing that the method at 120-4 may export a small subset of the projected latent variables (e.g., mathematically equivalent to a set of transformed new variables) from the PCA or PLS model for use as "transformed" final model inputs (instead of the larger number of process variables) to build the model. The method, at step 120-5, may generate the reduced subset by truncating the projected latent variables from the PCA (Principal Component Analysis) model using a best trade-off between model fitting and simplicity…The reduced subset of the projected latent variables can represent most of the useful correlation information needed to facilitate the modeling efforts. In this way, the PCA model acts as a "data transform and compression" module [i.e., the data transform and compression module is considered to be the pre-processing calculation module, adapted for pre-processing and calculating the production data] and a "pre-filter" module with respect to the set of latent variables used as input to build a final PSE model for the subject process; paragraph 0119, discussing that the enhanced modeling depends on the availability of the amount of data and the extractable and useful information contained in the data, also depends on the specific PSE problem to solve; paragraphs 0069, 0081, 0142); and

determining whether a key performance indicator (KPI) of each scheduling decision is better than a historical KPI (paragraph 0011, discussing a user defined key performance indicator for the subject industrial process; paragraph 0063, discussing that the system monitors its performance while generating predictions and solutions, and can perform model adaptions when model predictions and solutions become sub-optimal. In such a way, the system keeps its model and solutions updated and ensures a sustained performance; paragraph 0073, discussing that a set of methods of performance monitoring and self-model adaptation to support sustained performance of the system. The methods can include: a pre-defined key performance indicator ( KPI) of model quality or optimizer performance measure, which is used to evaluate the current performance of a model or a solution based on recent process data; paragraph 0123, discussing that the method may deploy one or more models and execute one or more optimization tasks. These models may compare the current real-time data of the subject plant process to pre-defined performance criterions from historical data of the subject plant process [i.e., the pre-defined performance criterion from historical data is considered to be the historical KPI]. Based on the comparison, one or more models detect whether degradation in performance conditions appeared in the subject plant process; paragraphs 0038, 0066, 0168).

Serita is directed towards a system and method for optimizing production scheduling. Chan is directed towards production planning optimization. Therefore, they are deemed to be analogous, as they are both directed towards optimization of production scheduling. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Serita with Chan’s feature for determining whether a key performance indicator (KPI) of each scheduling decision is better than a historical KPI because the references are analogous art, both being directed to solutions for production planning, which falls within applicant’s field of endeavor (production scheduling systems and methods), and because each individual element and its function are shown in the prior art, albeit in separate references, such that the difference between the claimed subject matter and the prior art rests not on any individual element or function but in the very combination itself, that is, in the substitution of the historical KPI of Chan for the target KPI of Serita. Both are key performance indicators that are used to make a comparison; thus, the simple substitution of one known element for another producing a predictable result renders the claim obvious. Additionally, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify Serita with Chan’s teachings of supporting manufacturers in continually optimizing the performance of their assets by improving safety, managing risk, reducing downtime, enhancing productivity, and increasing profitability (Chan at paragraph 0124). 
Furthermore, modifying Serita to include Chan’s features for pre-processing and calculating the production data from the data cleaning module, in the manner claimed, would serve the motivation of reducing the dataset by cleansing bad quality data segments and generating an improved dataset by aggregation, data cleansing, and pre-processing (Chan at paragraph 0081).

The Serita-Chan combination does not explicitly teach inputting the extraction data to the reinforcement learning mode; wherein step (2) comprises calculating and extracting the extraction data which is input to the reinforcement learning model, and the extraction data comprises production time, machine maintenance status, and urgency. Lettowsky in the analogous art of production planning systems teaches:

the extraction data comprises production time and machine maintenance status (paragraph 0002, discussing a method for providing, retrieving and using a data element in the value chain/value creation chain of a plastic sheet material for producing a final product; paragraph 0225, discussing that the trigger to request this information can either be performed by an operator of the plant, an employee in the work preparation or automatically. The information contains information on the recipe as well as machine settings...Just like the system reads information from a memory about a suitable detection device, it writes this information into the memory, which is relevant for further processing of one or more semi-finished product(s). Information may include, inter alia: date of manufacture, manufacturing time, i.e. start time and end time of the roll, place of manufacture, the used machine (type, manufacturer, machine number, maintenance condition), recipe (exact name of type and amount of raw materials used), environmental conditions during production (air humidity, temperature and air pressure), machine setting values, process parameters and inline recorded quality parameters of the produced semi-finished product; paragraph 0236, discussing that additional information added to a data element as information can relate the date of manufacture, the time of manufacture, i.e. start time and end time of the roll, the place of manufacture, the machine used (type, manufacturer, machine number, maintenance condition), recipe, environmental conditions in the production, machine setting values, process parameters and quality parameters of the semi-finished products; paragraph 0109).

The Serita-Chan combination is directed towards production scheduling optimization. Lettowsky is directed towards optimized production planning and control. Therefore, they are deemed to be analogous, as they are both directed towards optimization of production planning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the Serita-Chan combination with Lettowsky because the references are analogous art, both being directed to solutions for production planning, which falls within applicant’s field of endeavor (production scheduling systems and methods), and because modifying the Serita-Chan combination to include Lettowsky’s feature for including extraction data comprising production time and machine maintenance status, in the manner claimed, would serve the motivation of reducing the cost of a production process and/or the throughput times of a production process (Lettowsky at paragraph 0269); and further obvious because the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

The Serita-Chan-Lettowsky combination does not explicitly teach inputting the extraction data to the reinforcement learning mode to produce an optimal scheduling decision; wherein step (2) comprises calculating and extracting the extraction data which is input to the reinforcement learning model, and the extraction data comprises urgency. However, Wen in the analogous art of production scheduling teaches these concepts. Wen teaches:

inputting the extraction data to the reinforcement learning mode to produce an optimal scheduling decision (paragraph 0004, discussing methods and systems for a fast production scheduling approach based on deep reinforcement learning (DRL); paragraph 0031, discussing that a state of a production schedule is identified. The state of the production schedule may include information relating to factors for different machines (e.g. machine availability, product on machine, remaining execution time, machine input queue, machine output queue), different products, workers, environmental factors, cost parameters, supply, demand, transportation parameters, among other data that is related to the production schedule; paragraph 0033, discussing that the simulator may predict multiple possible states for the future based on a current state and prior run simulations. The simulator may introduce random disturbances to relevant aspects of the environment, e. g. variable processing times, machine breakdown, and electricity prices when generating the future predicted states. Each of the future predicted states may be used below to identify futures steps and generate a production schedule; paragraph 0034, discussing that the state is input into a neural network trained to generate a plurality of sub-optimal scheduling policies. In an embodiment, the neural network may be a deep reinforcement learning (DRL) network. The DRL is pre-trained to identify one or more sub-optimal policies. The term sub-optimal here refers to policies that may not be the ideal or optimal policy for proceeding with the production schedule; paragraph 0036, discussing that the reward provided by the simulator may be identified by the simulator...The reward may reflect a makespan or other quantifiable value that reflects the object or objects of the production schedule. 
The reward may be based on multiple different values and may be determined using an algorithm that weigh different values differently; paragraph 0038, discussing that the neural network is defined as a plurality of sequential feature units or layers. The machine network inputs state data, compresses the state data into a latent space and maps the features from the latent space using the LSTM 402. The encoder is trained using classical unsupervised learning algorithm to train an autoencoder; paragraph 0057, discussing that a production schedule is generated using the one or more near optimal scheduling policies. The production schedule may include one or more steps or actions to perform as defined by the one or more optimal policies...In a scenario where there is a change to the underlying manufacturing parameters (e.g. a cost or machine change), the production schedule may be adapted. The change in state may be input into the DRL (deep reinforcement learning) directly or using the manufacturing plant simulator [i.e., inputting data into the deep reinforcement learning model corresponds to inputting the extraction data to the reinforcement learning model]. The DRL agent may use the updated search and parameters to generate a new near optimal production schedule given the new state and manufacturing parameters; paragraph 0032);

wherein step (2) comprises calculating and extracting the extraction data which is input to the reinforcement learning model (paragraph 0057, discussing that a production schedule is generated using the one or more near optimal scheduling policies. The production schedule may include one or more steps or actions to perform as defined by the one or more optimal policies...In a scenario where there is a change to the underlying manufacturing parameters (e.g. a cost or machine change), the production schedule may be adapted. The change in state may be input into the DRL (deep reinforcement learning) directly or using the manufacturing plant simulator [i.e., inputting data into the deep reinforcement learning model corresponds to inputting the extraction data to the reinforcement learning model]. The DRL agent may use the updated search and parameters to generate a new near optimal production schedule given the new state and manufacturing parameters; paragraph 0060, discussing that the DRL agent 103 runs a plurality of simulations using data from the simulator and identifies a reward using a predefined reward function. A high fidelity manufacturing plant simulator is configured to generate new states given an action from the agent. The simulator and the agent are configured to run simulations from a state to an end of a production schedule. The rewards for each state and complete simulation may be calculated using a known reward function; paragraph 0082, discussing that the sub-optimal scheduling policies are used by the processor. The processor speeds up the search for the optimal polices. The processor is configured to return an initial schedule based on static state information…The processor performs continuous rollout. 
The rollout continues to search for a feasible/better scheduling policy and augment the schedule dynamically even after some tasks have already been dispatched…The continuous rollout provides that the schedule reacts to the dynamics of manufacturing systems in a timely manner, e.g. the schedule can be adjusted if, for example, a machine fails, or conditions change (e.g. price of a commodity or power changes or CPU availability or delivery or environmental conditions); paragraph 0032); and 

the extraction data comprises urgency (paragraph 0029, discussing that the sub-optimal scheduling policies are fed into the MCTS agent. The MCTS agent speeds up the search for near optimal polices in offline training phase…In real-time scheduling, the MCTS agent performs continuous rollout utilizing the continual acquisition of incrementally available information, e.g. machine breakdown, machine processing time etc. For example, the calculated policies from DRL (deep reinforcement learning) agent may become infeasible because of machine breakdown or significant changes in environmental conditions, e.g. order priority [i.e., urgency]. The rollout continues to search for a feasible/better scheduling policy and augment the schedule dynamically even after some tasks have already been dispatched. The continuous rollout provides that the schedule reacts to the dynamics of manufacturing systems in a timely manner, e.g. the schedule can be adjusted if, for example, a machine fails or conditions change (e.g. price of a commodity or power changes or CPU; paragraph 0031, discussing that a state of a production schedule is identified. The state of the production schedule may include information relating to factors for different machines (e.g. machine availability, product on machine, remaining execution time, machine input queue, machine output queue), different products, workers, environmental factors, cost parameters, supply, demand, transportation parameters, among other data that is related to the production schedule; paragraph 0057).

The Serita-Chan-Lettowsky combination is directed towards production scheduling optimization. Wen is directed towards a system for production scheduling. Therefore, they are deemed to be analogous, as both are directed towards optimization of production schedules. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the Serita-Chan-Lettowsky combination with Wen because the references are analogous art, as they are both directed to solutions for production scheduling, which falls within applicant’s field of endeavor (production scheduling systems and methods), and because modifying the Serita-Chan-Lettowsky combination to include Wen’s features for inputting the extraction data to the reinforcement learning mode to produce an optimal scheduling decision, calculating and extracting the extraction data which is input to the reinforcement model, and including extraction data comprising urgency, in the manner claimed, would serve the motivation of increasing production efficiency, maximizing the efficiency of the operation, and reducing costs (Wen at paragraph 0002), or of quickly and efficiently generating an efficient schedule for operation of the machines in order to generate, for example, a final product (Wen at paragraph 0075). The combination would have been further obvious because the claimed invention is merely a combination of old elements, in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

Claim 11 recites substantially similar limitations that stand rejected via the art citations and rationale applied to claim 4, as discussed above.

Claim 14 recites substantially similar limitations that stand rejected via the art citations and rationale applied to claim 7, as discussed above. Further, as per claim 14, Serita teaches the production scheduling method according to claim 8, further comprising receiving feedback information from the user terminal and adjusting results of scheduling decisions in real time according to the feedback information (paragraph 0008, discussing that the example implementations search the optimal policies by updating both policies simultaneously based on the feedback in a simulation environment [i.e., updating both policies simultaneously based on the feedback in a simulation environment is considered to be adjusting results of scheduling decisions according to the feedback information – as described below, the update to the scheduling policies is executed in real time]. The feedback reflects on how each decision was (e.g., better/worse) in terms of a target metric to be improved on, such as tardiness. After learning the due date quotation and scheduling policies on the virtual environment, the policies are installed on the actual operation system. The policies thereby facilitate a factory or manufacturing system to execute an appropriate decision in a real time fashion; paragraph 0021, discussing that selection can be conducted by a user through a user interface [i.e., user terminal] or other input means; paragraph 0033, discussing that DDPS continuously update the policies through multiple runs of the simulation. The simulator generates virtual orders based on the trends extracted at 301. Each time the simulator issues an event such as order arriving and machine becoming available, the DDPS module makes a decision based on current policies. Then, the simulator advances the clock and generates the next event based on the decision taken. The simulation result gives the DDPS feedback regarding how good or bad the current situation is. A feedback is quantified by metrics described in the following session. Once DDPS receives feedback, DDPS updates the policies based on it. This update process can be done following the framework known as Reinforcement Learning, one of the machine learning methods; paragraph 0035, describes a mechanism to update due date quotation and scheduling policies using reinforcement learning (RL) to update the policies; paragraph 0051).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
A.	Tashman, Pub. No.: US 2020/0265331 A1 – describes a system for predicting equipment failure events and optimizing manufacturing operations.
B.	Waschneck, Bernd, et al. "Optimization of global production scheduling with deep reinforcement learning." Procedia CIRP 72 (2018): 1264-1269 – describes learning systems for production control.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Darlene Garcia-Guerra whose telephone number is (571) 270-3339. The examiner can normally be reached on M-F 7:30a.m.-5:00p.m. EST. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian M. Epstein, can be reached on 571-270-5389. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Darlene Garcia-Guerra/
Examiner, Art Unit 3683