DETAILED ACTION
This communication is a Non-Final Office Action rejection on the merits. Claims 1-20 are currently pending and have been addressed below.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 10/21/2022 (related to the 103 Rejection) have been fully considered but are moot in view of the new grounds of rejection. Applicant's amendments necessitated the new ground(s) of rejection presented in this Office action. A rejection based on the newly cited reference(s) follows.
Applicant's arguments filed on 10/21/2022 (related to the 101 Rejection) have been fully considered but they are not persuasive.
Applicant states, on pages 8-10, that the pending claims provide a system and method for linking events in time to identify one or more processes, and that real-time inputs are then monitored and matched to the one or more processes by forming hypotheses and changing those hypotheses as necessary. See, e.g., Specification, paragraphs [0035], [0052]. Applicant further states that the claims are not merely using mathematical correlations and, as such, cannot be categorized as a mathematical concept abstract idea.
Examiner respectfully disagrees with Applicant. Claim 1 elements are considered to be abstract ideas because they are directed to “mathematical concepts” which include “mathematical relationships.” This is a form of “mathematical relationships” because the system is organizing information through mathematical correlations to identify a process. Also, this is a form of “mental processes” because the system is merely “comparing sequences and determining alterations.” If a claim limitation, under its broadest reasonable interpretation, covers “mathematical relationships” or “mental processes,” then it falls within the “mathematical concepts” or “mental processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
The mere nominal recitation of generic computer components does not take the claim out of the “mathematical concepts” or “mental processes” grouping. The additional elements of “unsupervised learning,” “non-transitory computer-readable medium,” and “keystroke and application telemetry” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer element (MPEP 2106.05(f)). Also, the “keystroke and application telemetry” is considered “field of use” (MPEP 2106.05(h)) at Step 2A, Prong 2, since the “keystroke and application telemetry” is merely used to collect information and the technology is not improved. At Step 2B, this is a conventional computer function of receiving or transmitting data over a network (see MPEP 2106.05(d)).
The claim fails to recite any improvements to another technology or technical field, improvements to the functioning of the computer itself, use of a particular machine, effecting a transformation or reduction of a particular article to a different state or thing, adding unconventional steps that confine the claim to a particular useful application, and/or meaningful limitations beyond generally linking the use of an abstract idea to a particular environment.  See 84 Fed. Reg. 55. Viewed individually or as a whole, these additional claim element(s) do not provide meaningful limitation(s) to transform the abstract idea into a patent eligible application of the abstract idea such that the claim(s) amounts to significantly more than the abstract idea itself.  
Examiner concludes that the additional elements in combination fail to amount to significantly more than the abstract idea based on findings that each element merely performs the same function(s) in combination as each element performs separately. The claim is not patent eligible. 
Independent claim 9 recites similar features and therefore is rejected for the same reasons as independent claim 1. Claims 2-8, 10-16, 17-18, and 19-20 are rejected for having the same deficiencies as those set forth with respect to the claims that they depend from, independent claims 1 or 9.
Examiner recommends further including specific training steps of a neural network model in the claims. For example, reciting specific steps of training an RNN/LSTM (i.e., receiving data, training the RNN/LSTM with the data, generating a process script with probabilities, and updating and retraining the machine learning model over time) may be enough to overcome the 101 Rejection. See Example 39 of the 2019 Revised Patent Subject Matter Eligibility Guidance.




Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e. an abstract idea) without reciting significantly more. 

Independent Claim 1
Step One - First, pursuant to step 1 in the January 2019 Revised Patent Subject Matter Eligibility Guidance (“2019 PEG”) at 84 Fed. Reg. 53, claim 1 is directed to an apparatus, which is a statutory category.
Step 2A, Prong One - Claim 1 recites: A system for discovering business processes, the system is configured to: receive a plurality of sets of multimodal event data from a plurality of sources, at least one of the plurality of sets received in real-time, each set of the multimodal event data including a plurality of event instances, the plurality of data sources including audio data, video data, and a keystroke data log including timestamps for when a key was pressed; associate each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data; correlate by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data, wherein the vectorized data associated with at least one of the sets of multimodal data including tasks or events that occur with an unknown or variable duration separating occurrences within the tasks or events, and wherein the vectorized data associated with at least two of the sets of multimodal event data vectorizes data have at least two different modalities, the process mining server using time-windowing to group events from the multimodal event data into macro level tasks for the one or more processes; generate and store a process model script for the one or more processes, wherein the at least one of the plurality of sets of multimodal event data received in real-time is classified as belonging to a first process in the one or more processes: continue monitoring the at least one of the plurality of sets of multimodal event data in real-time, and based on the continued monitoring, adjusting and determining that the at least one of the plurality of sets of multimodal event data received in real-time belongs to a second process in the one or more processes that is different from the first process. These claim elements are considered to be abstract ideas because they are directed to “mathematical concepts” which include “mathematical relationships.” This is a form of “mathematical relationships” because the system is organizing information through mathematical correlations to identify a process. Also, this is a form of “mental processes” because the system is merely “comparing sequences and determining alterations” (see MPEP 2106.04(a)(2)). If a claim limitation, under its broadest reasonable interpretation, covers “mathematical relationships” or “mental processes,” then it falls within the “mathematical concepts” or “mental processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A, Prong 2 - The judicial exception is not integrated into a practical application. Claim 1 includes the following additional elements: unsupervised learning; a non-transitory computer readable medium; and keystroke data including application telemetry.
The unsupervised learning is merely used to predict one or more processes from the vectorized data (Paragraph 0033). The non-transitory computer readable medium is merely used to store instructions (Paragraph 0005). The “keystroke and application telemetry” is merely used to monitor and save keystroke data, wherein the keystroke data can include a timestamp for when a key was pressed, the specific key that was pressed, etc. (Paragraph 0024). Merely stating that the step is performed by a computer component results in “apply it” on a computer (MPEP 2106.05(f)). These elements of “unsupervised learning,” “non-transitory computer-readable medium,” and “keystroke and application telemetry” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer element. Further, the unsupervised learning merely indicates a “particular technology environment” in which to apply a judicial exception, which includes limiting the abstract idea of collecting information (e.g. collecting input data), analyzing it (e.g. correlating the plurality of event vectors to identify one or more processes), and displaying certain results (e.g. generating a process model script). The “keystroke and application telemetry” is considered “field of use,” as it is merely used to collect data and the technology is not improved. Accordingly, alone and in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Step 2B - The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claims describe how to generally “apply” the concept of generating a process model script for the one or more processes. The specification shows that the unsupervised learning is merely used to predict one or more processes from the vectorized data (Paragraph 0033). The non-transitory computer readable medium is merely used to store instructions (Paragraph 0005). The “keystroke and application telemetry” is merely used to monitor and save keystroke data, wherein the keystroke data can include a timestamp for when a key was pressed, the specific key that was pressed, etc. (Paragraph 0024). In this case, the unsupervised learning is merely used as a tool to perform an abstract idea and the “keystroke and application telemetry” is considered a conventional computer function of receiving or transmitting data over a network (see MPEP 2106.05(d)). Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.

Independent Claim 9
Step One - First, pursuant to step 1 in the January 2019 Revised Patent Subject Matter Eligibility Guidance (“2019 PEG”) at 84 Fed. Reg. 53, claim 9 is directed to a method, which is a statutory category.
Step 2A, Prong One - Claim 9 recites: A method for discovering business processes, the method comprising: receiving a plurality of sets of multimodal event data from a plurality of sources, each set of the multimodal event data including a plurality of event instances, the plurality of data sources including a keystroke data log including timestamps for when a key was pressed; associating each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data; correlating by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data, wherein the vectorized data associated with at least one of the sets of multimodal data including tasks or events that occur with an unknown or variable duration separating occurrences within the tasks or events, and wherein the vectorized data associated with at least two of the sets of multimodal event data vectorizes data have at least two different modalities; the process mining server using time-windowing to group events from the multimodal event data into macro level tasks for the one or more processes; generating and storing a process model script for the one or more processes, wherein the at least one of the plurality of sets of multimodal event data received in real-time is classified as belonging to a first process in the one or more processes; continuing to monitor the at least one of the plurality of sets of multimodal event data in real-time, and based on the continued monitoring, adjusting and determining that the at least one of the plurality of sets of multimodal event data received in real-time belongs to a second process in the one or more processes that is different from the first process. These claim elements are considered to be abstract ideas because they are directed to “mathematical concepts” which include “mathematical relationships.” This is a form of “mathematical relationships” because the system is organizing information through mathematical correlations to identify a process. Also, this is a form of “mental processes” because the system is merely “comparing sequences and determining alterations” (see MPEP 2106.04(a)(2)). If a claim limitation, under its broadest reasonable interpretation, covers “mathematical relationships” or “mental processes,” then it falls within the “mathematical concepts” or “mental processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A, Prong 2 - The judicial exception is not integrated into a practical application. Claim 9 includes the following additional elements: unsupervised learning; and keystroke data including application telemetry.
The unsupervised learning is merely used to predict one or more processes from the vectorized data (Paragraph 0033). The “keystroke and application telemetry” is merely used to monitor and save keystroke data, wherein the keystroke data can include a timestamp for when a key was pressed, the specific key that was pressed, etc. (Paragraph 0024). Merely stating that the step is performed by a computer component results in “apply it” on a computer (MPEP 2106.05(f)). These elements of “unsupervised learning” and “keystroke and application telemetry” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer element. Further, the unsupervised learning merely indicates a “particular technology environment” in which to apply a judicial exception, which includes limiting the abstract idea of collecting information (e.g. collecting input data), analyzing it (e.g. correlating the plurality of event vectors to identify one or more processes), and displaying certain results (e.g. generating a process model script). The “keystroke and application telemetry” is considered “field of use,” as it is merely used to collect data and the technology is not improved. Accordingly, alone and in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Step 2B - The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claims describe how to generally “apply” the concept of generating a process model script for the one or more processes. The unsupervised learning is merely used to predict one or more processes from the vectorized data (Paragraph 0033). The “keystroke and application telemetry” is merely used to monitor and save keystroke data, wherein the keystroke data can include a timestamp for when a key was pressed, the specific key that was pressed, etc. (Paragraph 0024). In this case, the unsupervised learning is merely used as a tool to perform an abstract idea and the “keystroke and application telemetry” is considered a conventional computer function of receiving or transmitting data over a network (see MPEP 2106.05(d)). Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.
Dependent claims 2-3 and 10-11 are not directed to any additional abstract ideas and are also not directed to any additional non-abstract claim elements. Rather, these claims offer further descriptive limitations of elements found in the independent claim and addressed above such as by specifying how correlation is used for determining a similarity or dissimilarity between the first process matrix and the second process matrix. These processes are similar to the abstract idea noted in the independent claim because they further the limitations of the independent claim which are directed to “mathematical concepts” or “mental processes.” In addition, no additional elements are integrated into the abstract idea. Therefore, the claims still recite an abstract idea that can be grouped into “mathematical concepts” or “mental processes.”
Dependent claims 4 and 12 are not directed to additional abstract ideas, but are directed to an additional non-abstract claim element. The additional non-abstract claim element is a long short-term memory (LSTM) neural network. The LSTM neural network is merely used for correlating the plurality of event vectors (Paragraph 0033). The LSTM neural network is considered a “particular technological environment” (MPEP 2106.05(h)) at Step 2A. Also, the LSTM neural network is merely used as a tool to perform an abstract idea at Step 2B. Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.
Dependent claims 5-8 and 13-16 are not directed to any additional abstract ideas and are also not directed to any additional non-abstract claim elements. Rather, these claims offer further descriptive limitations of elements found in the independent claim and addressed above such as by specifying: wherein the process model script includes one or more directed graphs; wherein the process model script is a robotic process automation (RPA) script; wherein the plurality of sources includes two or more selected from the group consisting of: one or more Internet Information Services (IIS) log files, one or more Apache log file, one or more application log files, one or more standard operating procedure (SOP) manuals, one or more screen capture logs, one or more keystroke logs, one or more business process documents (BPDs); and wherein the process model script identifies higher probability processes in the one or more processes. These processes are similar to the abstract idea noted in the independent claim because they further the limitations of the independent claim which are directed to “mathematical concepts” which include “mathematical relationships.” In addition, no additional elements are integrated into the abstract idea. Therefore, the claims still recite an abstract idea that can be grouped into “mathematical concepts.”
Dependent claims 17 and 19 are not directed to any additional abstract ideas and are also not directed to any additional non-abstract claim elements. Rather, these claims offer further descriptive limitations of elements found in the independent claim and addressed above such as by specifying: to group events from the multimodal event data into the macro level tasks to match tasks in a standard operating procedure manual. These processes are similar to the abstract idea noted in the independent claim because they further the limitations of the independent claim which are directed to “mathematical concepts” or “mental processes.” In addition, no additional elements are integrated into the abstract idea. Therefore, the claims still recite an abstract idea that can be grouped into “mathematical concepts” or “mental processes.”
Dependent claims 18 and 20 are not directed to additional abstract ideas, but are directed to an additional non-abstract claim element. The additional non-abstract claim element is “display a visual representation.” The “display a visual representation” is merely used to display the generated process (Paragraph 0069). Merely stating that the step is performed by a computer component results in “apply it” on a computer (MPEP 2106.05(f)), which is applicable at both Step 2A, Prong 2 and Step 2B. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, instructions to display and/or arrange information in a graphical user interface may not be sufficient to show an improvement in computer functionality (MPEP 2106.05(a)). Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 1, 5-7, 9, 13-15, 17-18, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (US 2020/0206920 A1), in view of Linnell et al. (US 9,278,449 B1).
Regarding claim 1 (Currently Amended), Ma et al. discloses a system for discovering business processes using unsupervised learning (Paragraph 0002, The present invention relates to process automation, and more specifically, this invention relates to systems and methods for identifying processes for robotic automation; Paragraph 0148, As an example, if during a segmentation operation 306 the following sequences are observed five times each within a given event stream or set of event streams: ABCDEFGHI, ABCDEFGPQ, XYZDEFGHI, and XYZDEFGPQ, then it is plausible to assume that the sequence “DEFG” represents a common individual task, and that the sequences “ABC” and “PQ” are instances of other tasks and should not be combined with the sequence “DEFG.” The frequency thresholds that indicate whether or not to extend a sequence are experimentally determined, in various approaches; Paragraph 0152, Moreover, in certain approaches segmentation per operation 306 of method 300 uses unsupervised models to delineate different tasks within event streams), the system including a non-transitory computer-readable medium storing computer-executable instructions thereon such that when the instructions are executed (Paragraph 0009, In another implementation, a computer program product for discovering processes for robotic process automation (RPA) includes a computer readable storage medium having program instructions embodied therewith. The computer readable storage medium is not a transitory signal per se, and the program instructions are executable by a processor to cause the processor to), the system is configured to:
receive a plurality of sets of multimodal event data from a plurality of sources, at least one of the plurality of sets received in real-time (Paragraph 0289, The illustrative interactive, automated agent implementation of the presently described inventive concepts, as with many other implementations, is unique primarily with respect to the nature of the inputs provided by the user and appropriate responses defining the various event streams. In the context of a “call center,” for example, it may be appropriate for an interactive, automated agent to generate one or more automated introductory communications to provide context to the user based on the user's desired goal/task to be performed. In addition, upon receiving various responses from the user, the automated agent may provide appropriate replies, preferably including suitable options from among which the user may choose to ultimately obtain the desired goal/perform the desired task; Paragraph 0292, For instance, event streams corresponding to users seeking to obtain a particular service are preferably grouped and separated from event streams corresponding to users seeking to post data to a project, similarly event streams corresponding to users seeking to purchase or return a particular product are preferably grouped and separated from event streams corresponding to other task types; Paragraph 0293, For example, users may be required/requested to indicate the particular task they wish to accomplish and streams filtered based on the users' responses; Paragraph 0303, For instance, in one approach identifying processes suitable for creating an automated, interactive agent includes recording event streams of users interacting with an online marketplace configured to facilitate obtaining services and/or products, modifying services and/or products, and/or canceling/terminating services or returning products. 
The event streams may be recorded in accordance with operation 302 of method 300, and preferably include a user's indication (whether provided in textual, auditory, visual format, combinations thereof, or otherwise) of the service(s) and/or product(s) of interest, as well as whether the user desires to obtain, modify, or cancel/return such service(s) and/or product(s). Preferably, entire turns of conversation, and associated actions taken by each party to the conversation, are independently considered individual events within the event stream; Examiner interprets the “responses of the user” as the data received in real time), each set of the multimodal event data including a plurality of event instances, the plurality of data sources including audio data, video data, and a keystroke data log including timestamps for when a key was pressed, the keystroke data including application telemetry associated with a software application in which the key presses were inputted (Paragraph 0022, An “event stream” as referenced in the present disclosure is a recorded sequence of UI actions (e.g. mouse clicks, keyboard strokes, interactions with various elements of a graphical user interface (GUI), auditory input, eye movements and/or blinks, pauses, gestures (including gestures received/input via a touchscreen device, as well as gestures performed in view of a camera, e.g. for VR applications), etc.) and/or associated device actions (e.g. OS actions, API calls, calls to data source(s), etc.) for a particular user over a particular time period. 
In preferred implementations, an “event stream” may also include contextual information associated with the user's interactions, such as an identity of the user, various data sources relied upon/used in the course of the user's interactions, content of the computing device's display, including but not limited to content of a particular window, application, UI, etc., either in raw form or processed to revel, for instance, key-value pairs, and/or other elements displayed on the screen, particularly contextual information such as the window/application/UI/UI element, etc. upon which the user is focused; time of day at which various interactions are performed, one or more “groups” with which the user is associated (e.g. a project name, a workgroup, a department of the enterprise, a position or rank of the user, various permissions associated with the user, etc.), an identification of a device, operating system, application, etc. associated with the user performing various interactions within the event stream, or any other relevant contextual information that may be provided by the computing device during the course of the event stream, whether such information is directly or indirectly related to input provided by the user, as would be understood by a person having ordinary skill in the art upon reading the present disclosures; Paragraph 0024, “Event streams” may be conceptualized as a series of events, where each “event” includes any suitable number or combination of UI actions within an event stream. For example, in one approach an event may include a particular keystroke, mouse click, or combination thereof performed within a given application running on a computing device. One concrete example would be a left mouse click while a particular key, such as Control, Shift, Alt, etc. is depressed and the computing device is “focused” on a spreadsheet application. 
Similarly, the meaning of a keypress “enter” depends upon the application/window/UI element upon which a user is focused, e.g. pressing the “enter” key when an application icon is selected may launch the application, while pressing “enter” when focused on a cell of a spreadsheet may cause a function to be executed or value entered into the cell. Accordingly, such events may indicate to perform a certain operation on a certain data value represented within a table or other data structure; Paragraph 0075, The event streams may be recorded using any combination of known techniques such as keylogging, video recording, audio recording, screen recording/snap shots, etc.; Paragraph 0303, The event streams may be recorded in accordance with operation 302 of method 300, and preferably include a user's indication (whether provided in textual, auditory, visual format, combinations thereof, or otherwise) of the service(s) and/or product(s) of interest, as well as whether the user desires to obtain, modify, or cancel/return such service(s) and/or product(s); Examiner interprets the “plurality of sets of multimodal event data from a plurality of sources” as the keyboard strokes, gestures, textual, auditory, and visual format input, video recording, or combinations thereof); 
associate each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data (Paragraph 0161, In an exemplary embodiment, each subsequence includes categorical and/or numerical features, where categorical features include a process or application ID; an event type (e.g. mouse click, keypress, gesture, button press, etc.; a series of UI widgets invoked during the subsequence; and/or a value (such as a particular character or mouse button press) for various events in the subsequence. Numerical features may include, for example, a coordinate location corresponding to an action, a numerical identifier corresponding to a particular widget within a UI, a time elapsed since a most recent previous event occurrence, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure; Paragraph 0162,  Preferably, the feature vectors are calculated using a known auto-encoder and yield dense feature vectors for each window, e.g. vectors having a dimensionality in a range from about 50 to about 100 for a window length of about 30 events per subsequence. Exemplary auto-encoders suitable for calculating features for the various subsequences may include conventional auto-encoder networks, language-oriented networks (e.g. skip-grams), fastText, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure. 
Alternatively, feature vectors can be calculated in this manner for each event in the event stream, using any of the methods above, and the feature vector for a window of length N starting at position p is the concatenation of the feature vectors of the events corresponding to the positions p through p+N−1); 
correlate by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data (Paragraph 0148, As an example, if during a segmentation operation 306 the following sequences are observed five times each within a given event stream or set of event streams: ABCDEFGHI, ABCDEFGPQ, XYZDEFGHI, and XYZDEFGPQ, then it is plausible to assume that the sequence “DEFG” represents a common individual task, and that the sequences “ABC” and “PQ” are instances of other tasks and should not be combined with the sequence “DEFG.” The frequency thresholds that indicate whether or not to extend a sequence are experimentally determined, in various approaches; Paragraph 0152, Moreover, in certain approaches segmentation per operation 306 of method 300 uses unsupervised models to delineate different tasks within event streams), wherein the vectorized data associated with at least one of the sets of multimodal data including tasks or events that occur with an unknown or variable duration separating occurrences within the tasks or events (Paragraph 0141, The simplest implementation of segmentation per method 300 and operation 306 involves analyzing the text of concatenated event streams to extract common subsequences of a predetermined length. A major and unique challenge for repetitive pattern discovery in RPA mining is that the length of sequences of events that implement the same task is not the same, and that there is a duration associated with each event. Notably, this challenge is unique in the context of RPA mining, even though other fields such as bioinformatics face similar problems with respect to pattern discovery. Without consideration of the duration of events, an event sequence can be represented by a sequence of characters, without loss of generality. For instance, suppose that ABABCABABCABABC is an event stream. It contains repetitive sequence patterns that may not be unique.
For instance, repetitive sequence patterns with length 2 are AB; BA; BC in the above event stream. The repetitive sequence patterns with length 4 are ABAB; BABC; ABCA; and BCAB, while the repetitive sequence patterns with length 5 are ABABC; BABCA; ABCAB; BCABA; and CABAB; It can be noted that the claim language is written in alternative form. The limitation taught by Ma et al. is based on “including tasks or events that occur with an unknown duration”), and wherein the vectorized data associated with at least two of the sets of multimodal event data vectorizes data have at least two different modalities (Paragraph 0161, In an exemplary embodiment, each subsequence includes categorical and/or numerical features, where categorical features include a process or application ID; an event type (e.g. mouse click, keypress, gesture, button press, etc.; a series of UI widgets invoked during the subsequence; and/or a value (such as a particular character or mouse button press) for various events in the subsequence. Numerical features may include, for example, a coordinate location corresponding to an action, a numerical identifier corresponding to a particular widget within a UI, a time elapsed since a most recent previous event occurrence, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure; Paragraph 0162, Preferably, the feature vectors are calculated using a known auto-encoder and yield dense feature vectors for each window, e.g. vectors having a dimensionality in a range from about 50 to about 100 for a window length of about 30 events per subsequence. Exemplary auto-encoders suitable for calculating features for the various subsequences may include conventional auto-encoder networks, language-oriented networks (e.g. skip-grams), fastText, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure.
Alternatively, feature vectors can be calculated in this manner for each event in the event stream, using any of the methods above, and the feature vector for a window of length N starting at position p is the concatenation of the feature vectors of the events corresponding to the positions p through p+N−1), the process mining server using time-windowing to group events from the multimodal event data into macro level tasks for the one or more processes (Paragraph 0022, An “event stream” as referenced in the present disclosure is a recorded sequence of UI actions (e.g. mouse clicks, keyboard strokes, interactions with various elements of a graphical user interface (GUI), auditory input, eye movements and/or blinks, pauses, gestures (including gestures received/input via a touchscreen device, as well as gestures performed in view of a camera, e.g. for VR applications), etc.) and/or associated device actions (e.g. OS actions, API calls, calls to data source(s), etc.) for a particular user over a particular time period; Paragraph 0024, Preferably, “events” refer to a single point in time (or small window of time, e.g. less than one second or an amount of time required to perform a more complex action such as a double-click, gesture, or other action that is defined by multiple related inputs intended to be interpreted as a single input), along with the associated interactions and/or device actions occurring at the given point in time. In all cases, where an event encompasses multiple user interactions, device actions, etc., these are contiguous interactions, actions, etc. forming a single linear sequence; Paragraph 0152, In more approaches, a predefined window of length N may be used to segment event streams into individual traces; Paragraph 0197, Segmenting based on a window-based distance metric); 
generate and store a process model script for the one or more processes (Paragraph 0223, Regardless of whether segmentation and clustering are performed separately or in a combined fashion, in operation 310 of method 300, one or more processes for robotic automation (RPA) are identified from among the clustered traces. Identifying processes for RPA includes identifying segments/traces wherein a human-performed task is subject to automation (e.g. capable of being understood and performed by a computer without human direction); Paragraph 0229, For instance, in several preferred approaches method 300 may include selectively building a robotic process automation (RPA) model for at least one cluster based at least in part on a frequency of one or more variants of the clustered traces. The RPA model(s) may include or be represented by a directed, acyclic graph (DAG) describing some or all of the traces of a given cluster. Preferably, selectively building the RPA model comprises identifying a minimum-weight, maximum-frequency path from an initial node of the DAG to a final node of the DAG; Paragraph 0288, Recorded event streams may be stored, preferably in one or more tables of a database, in a peripheral 120 configured for data storage, e.g. a storage unit 220 as shown in FIG. 2);
wherein the at least one of the plurality of sets of multimodal event data received in real-time is classified as belonging to a first process in the one or more processes: continue monitoring the at least one of the plurality of sets of multimodal event data in real-time, ... (Paragraph 0289, The illustrative interactive, automated agent implementation of the presently described inventive concepts, as with many other implementations, is unique primarily with respect to the nature of the inputs provided by the user and appropriate responses defining the various event streams. In the context of a “call center,” for example, it may be appropriate for an interactive, automated agent to generate one or more automated introductory communications to provide context to the user based on the user's desired goal/task to be performed. In addition, upon receiving various responses from the user, the automated agent may provide appropriate replies, preferably including suitable options from among which the user may choose to ultimately obtain the desired goal/perform the desired task; Paragraph 0292, For instance, event streams corresponding to users seeking to obtain a particular service are preferably grouped and separated from event streams corresponding to users seeking to post data to a project, similarly event streams corresponding to users seeking to purchase or return a particular product are preferably grouped and separated from event streams corresponding to other task types; Paragraph 0293, For example, users may be required/requested to indicate the particular task they wish to accomplish and streams filtered based on the users' responses; Paragraph 0303, For instance, in one approach identifying processes suitable for creating an automated, interactive agent includes recording event streams of users interacting with an online marketplace configured to facilitate obtaining services and/or products, modifying services and/or products, and/or canceling/terminating
services or returning products. The event streams may be recorded in accordance with operation 302 of method 300, and preferably include a user's indication (whether provided in textual, auditory, visual format, combinations thereof, or otherwise) of the service(s) and/or product(s) of interest, as well as whether the user desires to obtain, modify, or cancel/return such service(s) and/or product(s). Preferably, entire turns of conversation, and associated actions taken by each party to the conversation, are independently considered individual events within the event stream; Examiner interprets the “responses of the user” as the data received in real time. Further, based on the user responses from the one or more processes, the system adjusts hypotheses accordingly (e.g. streams filtered based on the user’s responses)).
Although Ma et al. discloses all the limitations above and selecting a process based on real-time data (e.g. streams filtered based on the user’s responses), Ma et al. does not specifically disclose based on the continued monitoring, adjusting and determining that the at least one of the plurality of sets of multimodal event data received in real-time belongs to a second process in the one or more processes that is different from the first process.
However, Linnell et al. discloses wherein the at least one of the plurality of sets of multimodal event data received in real-time is classified as belonging to a first process in the one or more processes: continue monitoring the at least one of the plurality of sets of multimodal event data in real-time, and based on the continued monitoring, adjusting and determining that the at least one of the plurality of sets of multimodal event data received in real-time belongs to a second process in the one or more processes that is different from the first process (Column 2, lines 6-32, In a further example, a system including a first computing device and a second computing device is disclosed. The first computing device may be configured to display a visual simulation of one or more robotic devices executing corresponding sequences of operations within a workcell. The second computing device may be configured to receive input data from the first computing device that identifies one or more data sources to monitor, where the input data further indicates one or more adjustments to make in response to one or more deviations by at least one of the data sources from at least one predicted state during subsequent execution of one or more sequences of operations by one or more robotic devices within a workcell. The second computing device may also be configured to receive one or more data streams from the one or more data sources during execution of the sequences of operations by the robotic devices within the workcell. The second computing device may further be configured to identify, based on the received data streams, a deviation by one of the data sources from a predicted state for which the received input data indicates one or more adjustments to the sequences of operations for the one or more robotic devices. 
The second computing device may additionally be configured to provide instructions to the one or more robotic devices to execute the adjusted sequences of operations. The second computing device may further be configured to provide instructions to the first computing device to update the visual simulation based on the adjusted sequences of operations; Column 23, lines 18-25, In some examples, the data sources to watch may include one or more of the robotic devices. For instance, control system 908 may receive information from the robotic devices such as position information, joint parameters, position information associated with an axis of motion for a robotic device, parameters associated with operation of an end-effector mounted tool, and/or diagnostic information related to systems on a robotic device).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the unsupervised learning to identify one or more processes, wherein the process is selected based on real-time data of the invention of Ma et al. to further incorporate wherein the process is adjusted based on deviations in the current process of the invention of Linnell et al. because doing so would allow the system to identify, based on the received data streams, a deviation by one of the data sources from a predicted state for which the received input data indicates one or more adjustments to the sequences of operations for the one or more robotic devices (see Linnell et al., Column 2, lines 6-32). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
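For illustration only, the frequency-based segmentation example quoted above from Ma et al., Paragraph 0148, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of either reference: it assumes each event is encoded as a single character, and it substitutes a plain longest-common-substring search for the experimentally determined frequency thresholds that Ma et al. describes. The function name `longest_common_substring` is hypothetical and does not appear in Ma et al. or Linnell et al.

```python
from typing import List


def longest_common_substring(sequences: List[str]) -> str:
    """Return the longest substring present in every sequence.

    Hypothetical helper: each character stands for one recorded event.
    Ties are resolved in favor of the earliest match in the first sequence.
    """
    if not sequences:
        return ""
    first = sequences[0]
    # Try candidate lengths from longest to shortest so the first hit is maximal.
    for length in range(len(first), 0, -1):
        for start in range(len(first) - length + 1):
            candidate = first[start:start + length]
            if all(candidate in other for other in sequences[1:]):
                return candidate
    return ""


# The four event sequences quoted from Ma et al., Paragraph 0148.
streams = ["ABCDEFGHI", "ABCDEFGPQ", "XYZDEFGHI", "XYZDEFGPQ"]
shared_task = longest_common_substring(streams)
print(shared_task)  # DEFG
```

The shared run "DEFG" is the only run of that length found in all four sequences, which matches the common individual task identified in the quoted passage; the prefixes and suffixes that differ between sequences are left as separate tasks.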
Regarding claim 9 (Currently Amended), Ma et al. discloses a method for discovering business processes using unsupervised learning (Paragraph 0002, The present invention relates to process automation, and more specifically, this invention relates to systems and methods for identifying processes for robotic automation; Paragraph 0148, As an example, if during a segmentation operation 306 the following sequences are observed five times each within a given event stream or set of event streams: ABCDEFGHI, ABCDEFGPQ, XYZDEFGHI, and XYZDEFGPQ, then it is plausible to assume that the sequence “DEFG” represents a common individual task, and that the sequences “ABC” and “PQ” are instances of other tasks and should not be combined with the sequence “DEFG.” The frequency thresholds that indicate whether or not to extend a sequence are experimentally determined, in various approaches; Paragraph 0152, Moreover, in certain approaches segmentation per operation 306 of method 300 uses unsupervised models to delineate different tasks within event streams), the method comprising:  
receiving a plurality of sets of multimodal event data from a plurality of sources (Paragraph 0289, The illustrative interactive, automated agent implementation of the presently described inventive concepts, as with many other implementations, is unique primarily with respect to the nature of the inputs provided by the user and appropriate responses defining the various event streams. In the context of a “call center,” for example, it may be appropriate for an interactive, automated agent to generate one or more automated introductory communications to provide context to the user based on the user's desired goal/task to be performed. In addition, upon receiving various responses from the user, the automated agent may provide appropriate replies, preferably including suitable options from among which the user may choose to ultimately obtain the desired goal/perform the desired task; Paragraph 0292, For instance, event streams corresponding to users seeking to obtain a particular service are preferably grouped and separated from event streams corresponding to users seeking to post data to a project, similarly event streams corresponding to users seeking to purchase or return a particular product are preferably grouped and separated from event streams corresponding to other task types; Paragraph 0293, For example, users may be required/requested to indicate the particular task they wish to accomplish and streams filtered based on the users' responses; Paragraph 0303, For instance, in one approach identifying processes suitable for creating an automated, interactive agent includes recording event streams of users interacting with an online marketplace configured to facilitate obtaining services and/or products, modifying services and/or products, and/or canceling/terminating services or returning products.
The event streams may be recorded in accordance with operation 302 of method 300, and preferably include a user's indication (whether provided in textual, auditory, visual format, combinations thereof, or otherwise) of the service(s) and/or product(s) of interest, as well as whether the user desires to obtain, modify, or cancel/return such service(s) and/or product(s). Preferably, entire turns of conversation, and associated actions taken by each party to the conversation, are independently considered individual events within the event stream; Examiner interprets the “responses of the user” as the data received in real time), each set of the multimodal event data including a plurality of event instances, the plurality of data sources including a keystroke data log including timestamps for when a key was pressed, the keystroke data including application telemetry associated with a software application in which the key presses were inputted (Paragraph 0022, An “event stream” as referenced in the present disclosure is a recorded sequence of UI actions (e.g. mouse clicks, keyboard strokes, interactions with various elements of a graphical user interface (GUI), auditory input, eye movements and/or blinks, pauses, gestures (including gestures received/input via a touchscreen device, as well as gestures performed in view of a camera, e.g. for VR applications), etc.) and/or associated device actions (e.g. OS actions, API calls, calls to data source(s), etc.) for a particular user over a particular time period. 
In preferred implementations, an “event stream” may also include contextual information associated with the user's interactions, such as an identity of the user, various data sources relied upon/used in the course of the user's interactions, content of the computing device's display, including but not limited to content of a particular window, application, UI, etc., either in raw form or processed to reveal, for instance, key-value pairs, and/or other elements displayed on the screen, particularly contextual information such as the window/application/UI/UI element, etc. upon which the user is focused; time of day at which various interactions are performed, one or more “groups” with which the user is associated (e.g. a project name, a workgroup, a department of the enterprise, a position or rank of the user, various permissions associated with the user, etc.), an identification of a device, operating system, application, etc. associated with the user performing various interactions within the event stream, or any other relevant contextual information that may be provided by the computing device during the course of the event stream, whether such information is directly or indirectly related to input provided by the user, as would be understood by a person having ordinary skill in the art upon reading the present disclosures; Paragraph 0024, “Event streams” may be conceptualized as a series of events, where each “event” includes any suitable number or combination of UI actions within an event stream. For example, in one approach an event may include a particular keystroke, mouse click, or combination thereof performed within a given application running on a computing device. One concrete example would be a left mouse click while a particular key, such as Control, Shift, Alt, etc. is depressed and the computing device is “focused” on a spreadsheet application.
Similarly, the meaning of a keypress “enter” depends upon the application/window/UI element upon which a user is focused, e.g. pressing the “enter” key when an application icon is selected may launch the application, while pressing “enter” when focused on a cell of a spreadsheet may cause a function to be executed or value entered into the cell. Accordingly, such events may indicate to perform a certain operation on a certain data value represented within a table or other data structure; Paragraph 0075, The event streams may be recorded using any combination of known techniques such as keylogging, video recording, audio recording, screen recording/snap shots, etc.; Paragraph 0303, The event streams may be recorded in accordance with operation 302 of method 300, and preferably include a user's indication (whether provided in textual, auditory, visual format, combinations thereof, or otherwise) of the service(s) and/or product(s) of interest, as well as whether the user desires to obtain, modify, or cancel/return such service(s) and/or product(s); Examiner interprets the “plurality of sets of multimodal event data from a plurality of sources” as the keyboard strokes, gestures, textual, auditory, visual format, video recording, or combinations thereof);
associating each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data (Paragraph 0161, In an exemplary embodiment, each subsequence includes categorical and/or numerical features, where categorical features include a process or application ID; an event type (e.g. mouse click, keypress, gesture, button press, etc.; a series of UI widgets invoked during the subsequence; and/or a value (such as a particular character or mouse button press) for various events in the subsequence. Numerical features may include, for example, a coordinate location corresponding to an action, a numerical identifier corresponding to a particular widget within a UI, a time elapsed since a most recent previous event occurrence, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure; Paragraph 0162,  Preferably, the feature vectors are calculated using a known auto-encoder and yield dense feature vectors for each window, e.g. vectors having a dimensionality in a range from about 50 to about 100 for a window length of about 30 events per subsequence. Exemplary auto-encoders suitable for calculating features for the various subsequences may include conventional auto-encoder networks, language-oriented networks (e.g. skip-grams), fastText, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure. 
Alternatively, feature vectors can be calculated in this manner for each event in the event stream, using any of the methods above, and the feature vector for a window of length N starting at position p is the concatenation of the feature vectors of the events corresponding to the positions p through p+N−1); 
correlating by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data (Paragraph 0148, As an example, if during a segmentation operation 306 the following sequences are observed five times each within a given event stream or set of event streams: ABCDEFGHI, ABCDEFGPQ, XYZDEFGHI, and XYZDEFGPQ, then it is plausible to assume that the sequence “DEFG” represents a common individual task, and that the sequences “ABC” and “PQ” are instances of other tasks and should not be combined with the sequence “DEFG.” The frequency thresholds that indicate whether or not to extend a sequence are experimentally determined, in various approaches; Paragraph 0152, Moreover, in certain approaches segmentation per operation 306 of method 300 uses unsupervised models to delineate different tasks within event streams), wherein the vectorized data associated with at least one of the sets of multimodal data including tasks or events that occur with an unknown or variable duration separating occurrences within the tasks or events (Paragraph 0141, The simplest implementation of segmentation per method 300 and operation 306 involves analyzing the text of concatenated event streams to extract common subsequences of a predetermined length. A major and unique challenge for repetitive pattern discovery in RPA mining is that the length of sequences of events that implement the same task is not the same, and that there is a duration associated with each event. Notably, this challenge is unique in the context of RPA mining, even though other fields such as bioinformatics face similar problems with respect to pattern discovery. Without consideration of the duration of events, an event sequence can be represented by a sequence of characters, without loss of generality. For instance, suppose that ABABCABABCABABC is an event stream. It contains repetitive sequence patterns that may not be unique.
For instance, repetitive sequence patterns with length 2 are AB; BA; BC in the above event stream. The repetitive sequence patterns with length 4 are ABAB; BABC; ABCA; and BCAB, while the repetitive sequence patterns with length 5 are ABABC; BABCA; ABCAB; BCABA; and CABAB; It can be noted that the claim language is written in alternative form. The limitation taught by Ma et al. is based on “including tasks or events that occur with an unknown duration”), and wherein the vectorized data associated with at least two of the sets of multimodal event data vectorizes data have at least two different modalities (Paragraph 0161, In an exemplary embodiment, each subsequence includes categorical and/or numerical features, where categorical features include a process or application ID; an event type (e.g. mouse click, keypress, gesture, button press, etc.; a series of UI widgets invoked during the subsequence; and/or a value (such as a particular character or mouse button press) for various events in the subsequence. Numerical features may include, for example, a coordinate location corresponding to an action, a numerical identifier corresponding to a particular widget within a UI, a time elapsed since a most recent previous event occurrence, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure; Paragraph 0162, Preferably, the feature vectors are calculated using a known auto-encoder and yield dense feature vectors for each window, e.g. vectors having a dimensionality in a range from about 50 to about 100 for a window length of about 30 events per subsequence. Exemplary auto-encoders suitable for calculating features for the various subsequences may include conventional auto-encoder networks, language-oriented networks (e.g. skip-grams), fastText, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure.
Alternatively, feature vectors can be calculated in this manner for each event in the event stream, using any of the methods above, and the feature vector for a window of length N starting at position p is the concatenation of the feature vectors of the events corresponding to the positions p through p+N−1), the process mining server using time-windowing to group events from the multimodal event data into macro level tasks for the one or more processes (Paragraph 0022, An “event stream” as referenced in the present disclosure is a recorded sequence of UI actions (e.g. mouse clicks, keyboard strokes, interactions with various elements of a graphical user interface (GUI), auditory input, eye movements and/or blinks, pauses, gestures (including gestures received/input via a touchscreen device, as well as gestures performed in view of a camera, e.g. for VR applications), etc.) and/or associated device actions (e.g. OS actions, API calls, calls to data source(s), etc.) for a particular user over a particular time period; Paragraph 0024, Preferably, “events” refer to a single point in time (or small window of time, e.g. less than one second or an amount of time required to perform a more complex action such as a double-click, gesture, or other action that is defined by multiple related inputs intended to be interpreted as a single input), along with the associated interactions and/or device actions occurring at the given point in time. In all cases, where an event encompasses multiple user interactions, device actions, etc., these are contiguous interactions, actions, etc. forming a single linear sequence; Paragraph 0152, In more approaches, a predefined window of length N may be used to segment event streams into individual traces; Paragraph 0197, Segmenting based on a window-based distance metric); 
generating and storing a process model script for the one or more processes (Paragraph 0223, Regardless of whether segmentation and clustering are performed separately or in a combined fashion, in operation 310 of method 300, one or more processes for robotic automation (RPA) are identified from among the clustered traces. Identifying processes for RPA includes identifying segments/traces wherein a human-performed task is subject to automation (e.g. capable of being understood and performed by a computer without human direction); Paragraph 0229, For instance, in several preferred approaches method 300 may include selectively building a robotic process automation (RPA) model for at least one cluster based at least in part on a frequency of one or more variants of the clustered traces. The RPA model(s) may include or be represented by a directed, acyclic graph (DAG) describing some or all of the traces of a given cluster. Preferably, selectively building the RPA model comprises identifying a minimum-weight, maximum-frequency path from an initial node of the DAG to a final node of the DAG; Paragraph 0288, Recorded event streams may be stored, preferably in one or more tables of a database, in a peripheral 120 configured for data storage, e.g. a storage unit 220 as shown in FIG. 2);
wherein the at least one of the plurality of sets of multimodal event data received in real-time is classified as belonging to a first process in the one or more processes: continuing to monitor the at least one of the plurality of sets of multimodal event data in real-time, ... (Paragraph 0289, The illustrative interactive, automated agent implementation of the presently described inventive concepts, as with many other implementations, is unique primarily with respect to the nature of the inputs provided by the user and appropriate responses defining the various event streams. In the context of a “call center,” for example, it may be appropriate for an interactive, automated agent to generate one or more automated introductory communications to provide context to the user based on the user's desired goal/task to be performed. In addition, upon receiving various responses from the user, the automated agent may provide appropriate replies, preferably including suitable options from among which the user may choose to ultimately obtain the desired goal/perform the desired task; Paragraph 0292, For instance, event streams corresponding to users seeking to obtain a particular service are preferably grouped and separated from event streams corresponding to users seeking to post data to a project, similarly event streams corresponding to users seeking to purchase or return a particular product are preferably grouped and separated from event streams corresponding to other task types; Paragraph 0293, For example, users may be required/requested to indicate the particular task they wish to accomplish and streams filtered based on the users' responses; Paragraph 0303, For instance, in one approach identifying processes suitable for creating an automated, interactive agent includes recording event streams of users interacting with an online marketplace configured to facilitate obtaining services and/or products, modifying services and/or products, and/or canceling/terminating
services or returning products. The event streams may be recorded in accordance with operation 302 of method 300, and preferably include a user's indication (whether provided in textual, auditory, visual format, combinations thereof, or otherwise) of the service(s) and/or product(s) of interest, as well as whether the user desires to obtain, modify, or cancel/return such service(s) and/or product(s). Preferably, entire turns of conversation, and associated actions taken by each party to the conversation, are independently considered individual events within the event stream; Examiner interprets the “responses of the user” as the data received in real time. Further, based on the user responses from the one or more processes, the system adjusts hypotheses accordingly (e.g. streams filtered based on the user’s responses)).
Although Ma et al. discloses all the limitations above and selecting a process based on real-time data (e.g. streams filtered based on the user’s responses), Ma et al. does not specifically disclose based on the continued monitoring, adjusting and determining that the at least one of the plurality of sets of multimodal event data received in real-time belongs to a second process in the one or more processes that is different from the first process.
However, Linnell et al. discloses wherein the at least one of the plurality of sets of multimodal event data received in real-time is classified as belonging to a first process in the one or more processes; continuing to monitor the at least one of the plurality of sets of multimodal event data in real-time, and based on the continued monitoring, adjusting and determining that the at least one of the plurality of sets of multimodal event data received in real-time belongs to a second process in the one or more processes that is different from the first process (Column 2, lines 6-32, In a further example, a system including a first computing device and a second computing device is disclosed. The first computing device may be configured to display a visual simulation of one or more robotic devices executing corresponding sequences of operations within a workcell. The second computing device may be configured to receive input data from the first computing device that identifies one or more data sources to monitor, where the input data further indicates one or more adjustments to make in response to one or more deviations by at least one of the data sources from at least one predicted state during subsequent execution of one or more sequences of operations by one or more robotic devices within a workcell. The second computing device may also be configured to receive one or more data streams from the one or more data sources during execution of the sequences of operations by the robotic devices within the workcell. The second computing device may further be configured to identify, based on the received data streams, a deviation by one of the data sources from a predicted state for which the received input data indicates one or more adjustments to the sequences of operations for the one or more robotic devices. 
The second computing device may additionally be configured to provide instructions to the one or more robotic devices to execute the adjusted sequences of operations. The second computing device may further be configured to provide instructions to the first computing device to update the visual simulation based on the adjusted sequences of operations; Column 23, lines 18-25, In some examples, the data sources to watch may include one or more of the robotic devices. For instance, control system 908 may receive information from the robotic devices such as position information, joint parameters, position information associated with an axis of motion for a robotic device, parameters associated with operation of an end-effector mounted tool, and/or diagnostic information related to systems on a robotic device).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the unsupervised learning to identify one or more processes, wherein the process is selected based on real-time data, of the invention of Ma et al. to further incorporate adjusting the process based on deviations in the current process, as in the invention of Linnell et al., because doing so would allow the system to identify, based on the received data streams, a deviation by one of the data sources from a predicted state for which the received input data indicates one or more adjustments to the sequences of operations for the one or more robotic devices (see Linnell et al., Column 2, lines 6-32). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Regarding claims 5 and 13 (Original), which depend from claims 1 and 9, the combination of Ma et al. and Linnell et al. discloses all the limitations in claims 1 and 9. Ma et al. further discloses wherein the process model script includes one or more directed graphs (Paragraph 0229, The RPA model(s) may include or be represented by a directed, acyclic graph (DAG) describing some or all of the traces of a given cluster).
Regarding claims 6 and 14 (Original), which depend from claims 1 and 9, the combination of Ma et al. and Linnell et al. discloses all the limitations in claims 1 and 9. Ma et al. further discloses wherein the process model script is a robotic process automation (RPA) script (Paragraph 0229, For instance, in several preferred approaches method 300 may include selectively building a robotic process automation (RPA) model for at least one cluster based at least in part on a frequency of one or more variants of the clustered traces. The RPA model(s) may include or be represented by a directed, acyclic graph (DAG) describing some or all of the traces of a given cluster. Preferably, selectively building the RPA model comprises identifying a minimum-weight, maximum-frequency path from an initial node of the DAG to a final node of the DAG).
Regarding claims 7 and 15 (Original), which are dependent of claims 1 and 9, the combination of Ma et al. and Linnell et al. discloses all the limitations in claims 1 and 9. Ma et al. further discloses wherein the plurality of sources includes two or more selected from the group consisting of: one or more Internet Information Services (IIS) log files, one or more Apache log file, one or more application log files, one or more standard operating procedure (SOP) manuals, one or more screen capture logs, one or more keystroke logs, one or more business process documents (BPDs) (Paragraph 0075, The event streams may be recorded using any combination of known techniques such as keylogging, video recording, audio recording, screen recording/snap shots, etc; Paragraph 0303, The event streams may be recorded in accordance with operation 302 of method 300, and preferably include a user's indication (whether provided in textual, auditory, visual format, combinations thereof, or otherwise) of the service(s) and/or product(s) of interest, as well as whether the user desires to obtain, modify, or cancel/return such service(s) and/or product(s)).
Regarding claims 17 and 19 (New), which depend from claim 1, the combination of Ma et al. and Linnell et al. discloses all the limitations in claim 1. Ma et al. further discloses being configured to group events from the multimodal event data into the macro level tasks to match tasks in a standard operating procedure manual (Paragraph 0091, a “process name” field and name of an associated computer process (which may be specified from the user, obtained from image data depicting the process, obtained from or provided by the system manager/task manager of the operating system, obtained by performing a lookup using a related value such as a process ID, or any other suitable technique for obtaining computer process names), such as “notepad.exe” “WINWORD.exe”, etc. as would be appreciated by a skilled artisan reading the present disclosures).
Regarding claims 18 and 20 (New), which depend from claim 1, the combination of Ma et al. and Linnell et al. discloses all the limitations in claim 1. Ma et al. further discloses being configured to (i) display a visual representation of the one or more processes based on the process model script, (ii) automate the one or more processes based on the process model script, or both (i) and (ii) (Paragraph 0037, Keeping the foregoing definitions in mind, the following description discloses several preferred implementations of systems, methods and computer program products for identifying processes for robotic process automation, and building robotic process automation models to improve the efficiency of tasks typically performed manually by a user interacting with a device such as a computer, tablet, smartphone, personal digital assistant, etc.; Paragraph 0241, An exemplary DAG 400 for building an RPA model is shown in FIG. 4, according to one aspect of the inventive concepts presented herein. The DAG 400 generally comprises a plurality of nodes 402 connected by edges 404).


Claims 4, 8, 12, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (US 2020/0206920 A1), in view of Linnell et al. (US 9,278,449 B1), in further view of Dechu et al. (US 2020/0320383 A1).
Regarding claims 4 and 12 (Original), which depend from claims 1 and 9, the combination of Ma et al. and Linnell et al. discloses all the limitations in claims 1 and 9. Although Ma et al. discloses correlating the plurality of event vectors (Paragraphs 0148 and 0152), the combination of Ma et al. and Linnell et al. does not specifically disclose wherein the correlation is performed using a long short term memory (LSTM) neural network.
However, Dechu et al. discloses to correlate the plurality of event vectors using a long short term memory (LSTM) neural network (Paragraph 0021, The case vectors are provided as input to the event sequence prediction model 118. The event sequence prediction model 118 learns associations between the cases and complete traces based on the case vectors and the event embeddings. The joint model trainer 114 outputs the trained joint model 120; Paragraph 0024, In FIG. 2, iterations of the LSTM 212 correspond to LSTM 212-1 . . . LSTM 212-N, and events from the event dictionary are represented as E0 . . . EN-1. Event vectors are created for the events and are provided as input to the LSTM 212 model. In the non-limiting example shown in FIG. 2, the event vectors are created using Event2Vec. At each iteration, the LSTM 212 model outputs a probability distribution for a given event represented as log P(E1) . . . log P(EN). It is noted that each of LSTM 212-1 . . . LSTM 212-N may include multiple LSTM layers; Paragraph 0035, The joint machine-learning model may include two or more of: a convolutional neural network; a recurrent neural network; and a long short-term memory (LSTM) network).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the unsupervised learning to identify one or more processes, wherein the one or more processes are identified by process mining based on a correlation between the plurality of events, of the invention of Ma et al. to further incorporate identifying the correlation between the plurality of events using a long short term memory (LSTM) neural network, as in the invention of Dechu et al., because doing so would allow the system to use multiple LSTM layers to give the probability distribution of the subsequent step (see Dechu et al., Paragraph 0026). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Regarding claims 8 and 16 (Original), which depend from claims 1 and 9, the combination of Ma et al. and Linnell et al. discloses all the limitations in claims 1 and 9. Although Ma et al. discloses identifying a candidate process model script for robotic automation, the combination of Ma et al. and Linnell et al. does not specifically disclose wherein the process model script identifies higher probability processes in the one or more processes.
However, Dechu et al. discloses wherein the process model script identifies higher probability processes in the one or more processes (Paragraph 0024, In FIG. 2, iterations of the LSTM 212 correspond to LSTM 212-1 . . . LSTM 212-N, and events from the event dictionary are represented as E0 . . . EN-1. Event vectors are created for the events and are provided as input to the LSTM 212 model. In the non-limiting example shown in FIG. 2, the event vectors are created using Event2Vec. At each iteration, the LSTM 212 model outputs a probability distribution for a given event represented as log P(E1) . . . log P(EN). It is noted that each of LSTM 212-1 . . . LSTM 212-N may include multiple LSTM layers; Paragraph 0029, The additional details from the image(s) and comments provided by the customer for each case may be used to one or more of: (i) train the joint machine learning model (120) to consider these features and (ii) to predict the complete trace of the events that are likely to happen and preemptively perform one or more corrective actions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the unsupervised learning to identify one or more processes, wherein the one or more processes are identified by process mining based on a correlation between the plurality of events, of the invention of Ma et al. to further incorporate identifying the correlation between the plurality of events using a long short term memory (LSTM) neural network, as in the invention of Dechu et al., because doing so would allow the system to use multiple LSTM layers to give the probability distribution of the subsequent step (see Dechu et al., Paragraph 0026). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

Claims 2-3 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (US 2020/0206920 A1), in view of Linnell et al. (US 9,278,449 B1), in further view of Liu (US 2017/0177703 A1).
Regarding claims 2 and 10 (Original), which depend from claims 1 and 9, the combination of Ma et al. and Linnell et al. discloses all the limitations in claims 1 and 9. Ma et al. further discloses being configured to correlate the plurality of event vectors by: joining a first subset of the plurality of event vectors to create a first process matrix, joining a second subset of the plurality of event vectors to create a second process matrix, determining a similarity between the first process matrix and the second process matrix, the similarity measured as a [Euclidean distance] between the first process matrix and the second process matrix, and identifying that the first process matrix and the second process matrix refer to a same process in the one or more processes based on the similarity being below a threshold (Paragraph 0157, According to the preferred, “combined” or “hybrid” segmentation and clustering approach, recorded event streams are concatenated (optionally following cleaning/normalization), and substantially similar subsequences (i.e. having a content similarity greater than a predetermined similarity threshold) that appear within an event stream more often than a predetermined frequency threshold and cannot be extended in length without creating larger overall changes in the clustering (e.g. greater than a predetermined weight or distance threshold) are identified; Paragraph 0163, Regardless of the particular manner in which feature vectors are generated, a distance matrix is computed for all pairs of subsequences. The preferred metric for the distance given the calculation of the feature vectors as described above is the Euclidean distance; however, other distance metrics can also be of value, for instance the cosine similarity, or the Levenshtein distance if the feature vectors are understood to be directly word sequences in the event language. 
Clusters of non-overlapping subsequences may then be identified according to similarity, using various techniques and without departing from the scope of the inventive concepts described herein. For example, in one embodiment a predetermined set k of pairs of subsequences characterized by the smallest distances between the elements of the pairs among the overall distance matrix may be selected as initial clusters representing k task types).
Although Ma et al. discloses all the limitations above and similarities between processes using a Euclidean distance, the combination of Ma et al. and Linnell et al. does not specifically disclose wherein the similarity is measured as a dot product.
However, Liu discloses to correlate the plurality of event vectors by: joining a first subset of the plurality of event vectors to create a first process matrix, joining a second subset of the plurality of event vectors to create a second process matrix, determining a similarity between the first process matrix and the second process matrix, the similarity measured as a dot product between the first process matrix and the second process matrix, and identifying that the first process matrix and the second process matrix refer to a same process in the one or more processes based on the similarity being below a threshold (Paragraph 0047, In some embodiments, in generating the set of shared semantic vectors, the vector component 222 determines a semantic similarity measured by a similarity function. In some instances, the similarity function is a cosine similarity function. The cosine similarity may be a dot product modified by a normalization of the dot products of the vectors. For example, the semantic similarity measure may be represented as X*Y/(∥X∥*∥Y∥), where X is the source vector and Y is the target vector. Each vector X and Y may be floating points with n dimensions. X*Y may be the sum of the n dimensional floating points for the X vector and the Y vector. ∥X∥ may be a normalization of the dot product of the X vector. ∥Y∥ may be a normalization of the dot product of the Y vector. Although described in a specified embodiment with respect to cosine similarity functions, it should be understood that the semantic similarity may be determined by any suitable manner; Paragraph 0048, After generating a representation for each term of the one or more first categories and one or more second categories, the vector component 222 compares one or more terms for a specified first category of the one or more first categories to one or more terms for a specified second category of the one or more second categories using the cosine similarity function. 
The cosine similarity function may measure a similarity between two or more semantic vectors (e.g., vector representations of terms of each category). In some embodiments, the two or more semantic vectors may be non-zero vectors between which the cosine similarity function measures the cosine of the angle between the two or more vectors. In some embodiments, vectors are determined to be similar where the cosine of the angle between vectors is between zero and one in a positive space. In some instances, cosine similarity may be additionally determined when the cosine of the angle between vectors is above a predetermined threshold; Paragraph 0050, In various embodiments, the mapping is performed by learning the semantic similarity between a source sequence and a target sequence. In example embodiments, the semantic similarity, also referred to as semantic relevance, may be measured by a cosine similarity function sim (X, Y), where X represents the semantic vector of source sequence (i.e., derived from the seller's taxonomy) and Y represents the semantic vector of target sequence (i.e., derived from the category tree of the publication system). Both X and Y represent points in the shared semantic vector space. The output of the cosine similarity function represents how close those two points in the shared semantic vector space, i.e., how semantically similar between the source sequence and the target sequence. Generally, the best matched category of Y has the highest similarity score to X. The source sequence vector and target sequence vector have the same number of dimensions. For example, the source sequence represents an entry on the seller's inventory list (e.g., the seller's taxonomy entry and the item title) and the target sequence represents a category tree path (root-to-leaf) used by the publication system. 
In various embodiments, the target sequence is pre-computed before runtime and the source sequence is computed during runtime and then compared to the target sequence during runtime).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify how the similarities between a first process matrix and a second process matrix are determined in the invention of Ma et al. to further specify that the similarities are determined using a cosine similarity function, as in the invention of Liu, because doing so would allow the system to determine similarity between two vectors when the cosine of the angle between vectors is above a predetermined threshold (see Liu, Paragraph 0048). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
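For illustration only, and not as part of either cited disclosure, the two similarity measures discussed above (the Euclidean distance preferred in Ma et al., Paragraph 0163, and the cosine similarity of Liu, Paragraph 0047, i.e. a dot product divided by the product of the vector norms) can be sketched as follows. The function names, the flattening of a process matrix into a single vector, and the sample values are assumptions made for this sketch and do not appear in the references or the claims.

```python
import math


def flatten(matrix):
    """Flatten a process matrix (assumed here to be a list of event vectors) into one vector."""
    return [value for row in matrix for value in row]


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Dot product normalized by the product of the vector norms."""
    norm_product = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    if norm_product == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(x, y) / norm_product


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical process matrices: the second is the first scaled by two.
first = flatten([[1.0, 0.0], [0.0, 2.0]])
second = flatten([[2.0, 0.0], [0.0, 4.0]])

print(cosine_similarity(first, second))
print(euclidean_distance(first, second))
```

Because the second hypothetical matrix is a scaled copy of the first, the cosine similarity is 1 (up to floating-point rounding) while the Euclidean distance is nonzero. The two measures can therefore rank the same pair differently, and a threshold chosen for one does not carry over to the other.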
Regarding claims 3 and 11 (Original), which depend from claims 1 and 9, the combination of Ma et al. and Linnell et al. discloses all the limitations in claims 1 and 9. Ma et al. further discloses being configured to correlate the plurality of event vectors by: joining a first subset of the plurality of event vectors to create a first process matrix, joining a second subset of the plurality of event vectors to create a second process matrix, determining a similarity between the first process matrix and the second process matrix, the similarity measured as a [Euclidean distance] between the first process matrix and the second process matrix, and identifying that the first process matrix and the second process matrix are different processes in the one or more processes based on the similarity being above a threshold (Paragraph 0157, According to the preferred, “combined” or “hybrid” segmentation and clustering approach, recorded event streams are concatenated (optionally following cleaning/normalization), and substantially similar subsequences (i.e. having a content similarity greater than a predetermined similarity threshold) that appear within an event stream more often than a predetermined frequency threshold and cannot be extended in length without creating larger overall changes in the clustering (e.g. greater than a predetermined weight or distance threshold) are identified; Paragraph 0163, Regardless of the particular manner in which feature vectors are generated, a distance matrix is computed for all pairs of subsequences. The preferred metric for the distance given the calculation of the feature vectors as described above is the Euclidean distance; however, other distance metrics can also be of value, for instance the cosine similarity, or the Levenshtein distance if the feature vectors are understood to be directly word sequences in the event language. 
Clusters of non-overlapping subsequences may then be identified according to similarity, using various techniques and without departing from the scope of the inventive concepts described herein. For example, in one embodiment a predetermined set k of pairs of subsequences characterized by the smallest distances between the elements of the pairs among the overall distance matrix may be selected as initial clusters representing k task types).
Although Ma et al. discloses all the limitations above and similarities between processes using a Euclidean distance, the combination of Ma et al. and Linnell et al. does not specifically disclose wherein the similarity is measured as a dot product.
However, Liu discloses to correlate the plurality of event vectors by: joining a first subset of the plurality of event vectors to create a first process matrix, joining a second subset of the plurality of event vectors to create a second process matrix, determining a similarity between the first process matrix and the second process matrix, the similarity measured as a dot product between the first process matrix and the second process matrix, and identifying that the first process matrix and the second process matrix are different processes in the one or more processes based on the similarity being above a threshold (Paragraph 0047, In some embodiments, in generating the set of shared semantic vectors, the vector component 222 determines a semantic similarity measured by a similarity function. In some instances, the similarity function is a cosine similarity function. The cosine similarity may be a dot product modified by a normalization of the dot products of the vectors. For example, the semantic similarity measure may be represented as X*Y/(∥X∥*∥Y∥), where X is the source vector and Y is the target vector. Each vector X and Y may be floating points with n dimensions. X*Y may be the sum of the n dimensional floating points for the X vector and the Y vector. ∥X∥ may be a normalization of the dot product of the X vector. ∥Y∥ may be a normalization of the dot product of the Y vector. Although described in a specified embodiment with respect to cosine similarity functions, it should be understood that the semantic similarity may be determined by any suitable manner; Paragraph 0048, After generating a representation for each term of the one or more first categories and one or more second categories, the vector component 222 compares one or more terms for a specified first category of the one or more first categories to one or more terms for a specified second category of the one or more second categories using the cosine similarity function. 
The cosine similarity function may measure a similarity between two or more semantic vectors (e.g., vector representations of terms of each category). In some embodiments, the two or more semantic vectors may be non-zero vectors between which the cosine similarity function measures the cosine of the angle between the two or more vectors. In some embodiments, vectors are determined to be similar where the cosine of the angle between vectors is between zero and one in a positive space. In some instances, cosine similarity may be additionally determined when the cosine of the angle between vectors is above a predetermined threshold; Paragraph 0050, In various embodiments, the mapping is performed by learning the semantic similarity between a source sequence and a target sequence. In example embodiments, the semantic similarity, also referred to as semantic relevance, may be measured by a cosine similarity function sim (X, Y), where X represents the semantic vector of source sequence (i.e., derived from the seller's taxonomy) and Y represents the semantic vector of target sequence (i.e., derived from the category tree of the publication system). Both X and Y represent points in the shared semantic vector space. The output of the cosine similarity function represents how close those two points in the shared semantic vector space, i.e., how semantically similar between the source sequence and the target sequence. Generally, the best matched category of Y has the highest similarity score to X. The source sequence vector and target sequence vector have the same number of dimensions. For example, the source sequence represents an entry on the seller's inventory list (e.g., the seller's taxonomy entry and the item title) and the target sequence represents a category tree path (root-to-leaf) used by the publication system. 
In various embodiments, the target sequence is pre-computed before runtime and the source sequence is computed during runtime and then compared to the target sequence during runtime).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify how the similarities between a first process matrix and a second process matrix are determined in the invention of Ma et al. to further specify that the similarities are determined using a cosine similarity function, as in the invention of Liu, because doing so would allow the system to determine similarity between two vectors when the cosine of the angle between vectors is above a predetermined threshold (see Liu, Paragraph 0048). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARJORIE PUJOLS-CRUZ whose telephone number is (571)272-4668. The examiner can normally be reached Monday through Thursday, 7:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patricia H Munson, can be reached at (571)270-5396. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/M.P./
Examiner, Art Unit 3624

/PATRICIA H MUNSON/
Supervisory Patent Examiner, Art Unit 3624