DETAILED ACTION
This communication is a Final Office Action rejection on the merits. Claims 1-16 are currently pending and have been addressed below.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 06/14/2022 (related to the 103 Rejection) have been fully considered but are moot in view of new grounds of rejection. Applicant's amendments necessitated the new ground(s) of rejection presented in this Office action. Rejection based on a newly cited reference(s) follows.
Applicant's arguments filed on 06/14/2022 (related to the 101 Rejection) have been fully considered but they are not persuasive.
Applicant states, on page 7, that Independent claims 1 and 9 have been amended to recite, inter alia, that the plurality of data sources include a keystroke data including timestamps for when a key was pressed, the keystroke data including application telemetry associated with a software application in which the key presses were inputted. It is submitted that independent claims 1 and 9 as amended do not fall within any mathematical concepts grouping of abstract ideas, and therefore satisfy Step 2A, Prong One. Keystrokes and application telemetry are not mathematical concepts.
Examiner respectfully disagrees with Applicant. These claim elements are considered to be abstract ideas because they are directed to “mathematical concepts” which include “mathematical relationships.” This is a form of “mathematical relationships” because the system is organizing information through mathematical correlations to identify a process. If a claim limitation, under its broadest reasonable interpretation, covers “mathematical relationships,” then it falls within the “mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
The mere nominal recitation of generic computer components does not take the claim out of the mathematical concepts grouping. The additional element of “keystroke and application telemetry” is merely used to monitor and save keystroke data, wherein the keystroke data can include a timestamp for when a key was pressed, the specific key that was pressed, etc. (Paragraph 0024). This is considered “field of use” under MPEP 2106.05(h) at Step 2A, Prong 2, since the “keystroke and application telemetry” is merely used to collect information. At Step 2B, this is a conventional computer function of receiving or transmitting data over a network (See MPEP 2106.05(d)).
Therefore, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is ineligible.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e. an abstract idea) without reciting significantly more. 

Independent Claim 1
Step One - First, pursuant to step 1 in the January 2019 Revised Patent Subject Matter Eligibility Guidance (“2019 PEG”) on 84 Fed. Reg. 53, claim 1 is directed to an apparatus, which is a statutory category.
Step 2A, Prong One - Claim 1 recites: A system for discovering business processes using unsupervised learning, the system is configured to: receive a plurality of sets of multimodal event data from a plurality of sources, each set of the multimodal event data including a plurality of event instances, the plurality of data sources including a keystroke data log including timestamps for when a key was pressed; associate each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data; correlate by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data, wherein the vectorized data associated with at least one of the sets of multimodal data including tasks or events that occur with an unknown or variable duration separating occurrences within the tasks or events, and wherein the vectorized data associated with at least two of the sets of multimodal event data vectorizes data have at least two different modalities, the process mining server using time-windowing to group events from the multimodal event data into macro level tasks for the one or more processes; and generate and store a process model script for the one or more processes. These claim elements are considered to be abstract ideas because they are directed to “mathematical concepts” which include “mathematical relationships.” This is a form of “mathematical relationships” because the system is organizing information through mathematical correlations to identify a process. 
If a claim limitation, under its broadest reasonable interpretation, covers “mathematical relationships,” then it falls within the “mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2 - The judicial exception is not integrated into a practical application. Claim 1 includes additional elements: an unsupervised learning; a non-transitory computer readable medium; and keystroke data including application telemetry.
The unsupervised learning is merely used for predicting one or more processes from the vectorized data (Paragraph 0033). The non-transitory computer readable medium is merely used for storing instructions (Paragraph 0005). The “keystroke and application telemetry” is merely used to monitor and save keystroke data, wherein the keystroke data can include a timestamp for when a key was pressed, the specific key that was pressed, etc. (Paragraph 0024). Merely stating that the step is performed by a computer component results in “apply it” on a computer (MPEP 2106.05(f)). These elements of “unsupervised learning,” “non-transitory computer-readable medium,” and “keystroke and application telemetry” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer element. Further, the unsupervised learning merely indicates a “particular technology environment” in which to apply a judicial exception, which includes limiting the abstract idea of collecting information (e.g. collecting input data), analyzing it (e.g. correlating the plurality of event vectors to identify one or more processes), and displaying certain results (e.g. generating a process model script). The “keystroke and application telemetry” is considered “field of use”, as it is just used to collect data and the technology is not improved. Accordingly, alone and in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Step 2B - The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claims describe how to generally “apply” the concept of generating a process model script for the one or more processes. The specification shows that the unsupervised learning is merely used for predicting one or more processes from the vectorized data (Paragraph 0033). The non-transitory computer readable medium is merely used for storing instructions (Paragraph 0005). The “keystroke and application telemetry” is merely used to monitor and save keystroke data, wherein the keystroke data can include a timestamp for when a key was pressed, the specific key that was pressed, etc. (Paragraph 0024). In this case, the unsupervised learning is merely used as a tool to perform an abstract idea and the “keystroke and application telemetry” is considered a conventional computer function of receiving or transmitting data over a network (See MPEP 2106.05(d)). Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.

Independent Claim 9
Step One - First, pursuant to step 1 in the January 2019 Revised Patent Subject Matter Eligibility Guidance (“2019 PEG”) on 84 Fed. Reg. 53, claim 9 is directed to a method, which is a statutory category.
Step 2A, Prong One - Claim 9 recites: A method for discovering business processes using unsupervised learning, the method comprising: receiving a plurality of sets of multimodal event data from a plurality of sources, each set of the multimodal event data including a plurality of event instances, the plurality of data sources including a keystroke data log including timestamps for when a key was pressed; associating each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data; correlating by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data, wherein the vectorized data associated with at least one of the sets of multimodal data including tasks or events that occur with an unknown or variable duration separating occurrences within the tasks or events, and wherein the vectorized data associated with at least two of the sets of multimodal event data vectorizes data have at least two different modalities; the process mining server using time-windowing to group events from the multimodal event data into macro level tasks for the one or more processes; and generating and storing a process model script for the one or more processes. These claim elements are considered to be abstract ideas because they are directed to “mathematical concepts” which include “mathematical relationships.” This is a form of “mathematical relationships” because the method is organizing information through mathematical correlations to identify a process. 
If a claim limitation, under its broadest reasonable interpretation, covers “mathematical relationships,” then it falls within the “mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2 - The judicial exception is not integrated into a practical application. Claim 9 includes additional elements: an unsupervised learning; and keystroke data including application telemetry.
The unsupervised learning is merely used for predicting one or more processes from the vectorized data (Paragraph 0033). The “keystroke and application telemetry” is merely used to monitor and save keystroke data, wherein the keystroke data can include a timestamp for when a key was pressed, the specific key that was pressed, etc. (Paragraph 0024). Merely stating that the step is performed by a computer component results in “apply it” on a computer (MPEP 2106.05(f)). These elements of “unsupervised learning” and “keystroke and application telemetry” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer element. Further, the unsupervised learning merely indicates a “particular technology environment” in which to apply a judicial exception, which includes limiting the abstract idea of collecting information (e.g. collecting input data), analyzing it (e.g. correlating the plurality of event vectors to identify one or more processes), and displaying certain results (e.g. generating a process model script). The “keystroke and application telemetry” is considered “field of use”, as it is just used to collect data and the technology is not improved. Accordingly, alone and in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Step 2B - The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claims describe how to generally “apply” the concept of generating a process model script for the one or more processes. The specification shows that the unsupervised learning is merely used for predicting one or more processes from the vectorized data (Paragraph 0033). The “keystroke and application telemetry” is merely used to monitor and save keystroke data, wherein the keystroke data can include a timestamp for when a key was pressed, the specific key that was pressed, etc. (Paragraph 0024). In this case, the unsupervised learning is merely used as a tool to perform an abstract idea and the “keystroke and application telemetry” is considered a conventional computer function of receiving or transmitting data over a network (See MPEP 2106.05(d)). Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.
Dependent claims 2-3 and 10-11 are not directed to any additional abstract ideas and are also not directed to any additional non-abstract claim elements. Rather, these claims offer further descriptive limitations of elements found in the independent claim and addressed above such as by specifying how correlation is used for determining a similarity or dissimilarity between the first process matrix and the second process matrix. These processes are similar to the abstract idea noted in the independent claim because they further the limitations of the independent claim which are directed to “mathematical concepts” which include “mathematical relationships.” In addition, no additional elements are integrated into the abstract idea. Therefore, the claims still recite an abstract idea that can be grouped into “mathematical concepts.”
Dependent claims 4 and 12 are not directed to additional abstract ideas, but are directed to an additional non-abstract claim element. The additional non-abstract claim element is a long short term memory (LSTM) neural network. The LSTM neural network is merely used for correlating the plurality of event vectors (Paragraph 0033). The LSTM neural network is considered a “particular technological environment” under MPEP 2106.05(h) at Step 2A. Also, the LSTM neural network is merely used as a tool to perform an abstract idea at Step 2B. Thus, nothing in the claims adds significantly more to the abstract idea. The claims are ineligible.
Dependent claims 5-8 and 13-16 are not directed to any additional abstract ideas and are also not directed to any additional non-abstract claim elements. Rather, these claims offer further descriptive limitations of elements found in the independent claim and addressed above such as by specifying: wherein the process model script includes one or more directed graphs; wherein the process model script is a robotic process automation (RPA) script; wherein the plurality of sources includes two or more selected from the group consisting of: one or more Internet Information Services (IIS) log files, one or more Apache log file, one or more application log files, one or more standard operating procedure (SOP) manuals, one or more screen capture logs, one or more keystroke logs, one or more business process documents (BPDs); and wherein the process model script identifies higher probability processes in the one or more processes. These processes are similar to the abstract idea noted in the independent claim because they further the limitations of the independent claim which are directed to “mathematical concepts” which include “mathematical relationships.” In addition, no additional elements are integrated into the abstract idea. Therefore, the claims still recite an abstract idea that can be grouped into “mathematical concepts.”

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 1, 4-9, 12-14, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Dechu et al. (US 2020/0320383 A1), in view of Ma et al. (US 2020/0206920 A1).
Regarding claim 1 (Currently Amended), Dechu et al. discloses a system for discovering business processes using unsupervised learning (Abstract, Methods, systems, and computer program products for complete trace prediction of process instance using multimodal attributes are provided herein. A computer-implemented method includes receiving a request to resolve an issue related to a product and/or a service, wherein the request comprises multimodal data corresponding to at least two modalities; creating a case based on the request, wherein the case comprises a plurality of case attributes corresponding to (i) queue state information related to a status of other pending requests and (ii) the multimodal data; generating a vector representation for the case based on the plurality of case attributes; providing the vector representation as input to a joint machine learning model to determine a sequence of events for resolving the issue, wherein the joint machine learning model is trained based at least in part on prior requests and sequences of events corresponding to the prior requests; Paragraph 0019, The historical event logs 110 are also provided as input to an event embedding encoder 112. The event embedding encoder 112 encodes events from the historical event logs 110 to create event embeddings (e.g., vector representations of events). For example, the event embedding encoder 112 learns an unsupervised embedding for each event as in Act2Vec, and the unsupervised task is used to predict an event from its context), the system including a non-transitory computer-readable medium storing computer-executable instructions thereon such that when the instructions are executed (Paragraph 0042, A data processing system suitable for storing and/or executing program code will include at least one processor 602 coupled directly or indirectly to memory elements 604 through a system bus 610. 
The memory elements can include local memory employed during actual implementation of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during implementation), the system is configured to: 
receive a plurality of sets of multimodal event data from a plurality of sources, each set of the multimodal event data including a plurality of event instances (Paragraph 0015, An embodiment described herein includes predicting a complete trace for a process instance (such as a business process, for example) using multi-modal inputs available at the time the process instance is initiated. The multimodal inputs may include, for example, case features, images, comments, queue features, etc. Additionally, at least one of the example embodiments described herein includes learning a joint machine learning model that takes multimodal inputs in vector form), …;
associate each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data (Paragraph 0015, An embodiment described herein includes predicting a complete trace for a process instance (such as a business process, for example) using multi-modal inputs available at the time the process instance is initiated. The multimodal inputs may include, for example, case features, images, comments, queue features, etc. Additionally, at least one of the example embodiments described herein includes learning a joint machine learning model that takes multimodal inputs in vector form; Paragraph 0017, FIG. 1 is a diagram illustrating a system architecture, according to an embodiment of the invention. By way of illustration in FIG. 1, a vector embedding generator 104 generates vector embeddings for multimodal data 102. As an example, the multimodal data 102 may include data relating to images, comments, audio, video, etc. The vector embedding generator 104 may generate one or more vector embeddings for each type of data; Examiner interprets the vector embedding generator 104 as the adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data); 
correlate by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data (Paragraph 0019, The historical event logs 110 are also provided as input to an event embedding encoder 112. The event embedding encoder 112 encodes events from the historical event logs 110 to create event embeddings (e.g., vector representations of events). For example, the event embedding encoder 112 learns an unsupervised embedding for each event as in Act2Vec, and the unsupervised task is used to predict an event from its context; Paragraph 0020, The system architecture depicted in FIG. 1 also includes a joint model trainer 114 comprising a case embedding generator 116 and an event sequence prediction model 118. The case embedding generator 116 generates case vectors for each case based on the vector embeddings of the multimodal data 102, the vectors generated by the queue state embedding generator 108, and the vectors generated by the event embedding encoder 112. For example, the vector embeddings of the multimodal data, the queue state, and the events may be concatenated to generate and/or obtain one vector for each case; Paragraph 0021, The case vectors are provided as input to the event sequence prediction model 118. The event sequence prediction model 118 learns associations between the cases and complete traces based on the case vectors and the event embeddings. The joint model trainer 114 outputs the trained joint model 120; Examiner interprets the event sequence prediction model 118 as the process mining server), …; 
and generate and store a process model script for the one or more processes (Figure 1, item 128; Paragraph 0021, The case vectors are provided as input to the event sequence prediction model 118. The event sequence prediction model 118 learns associations between the cases and complete traces based on the case vectors and the event embeddings. The joint model trainer 114 outputs the trained joint model 120; Paragraph 0022, The sequential event predictor 126 determines a complete trace 128 for the new case document 122 using the trained joint model 120. In the example depicted in FIG. 1, the complete trace 128 includes five sequential events (i.e., E1, E2, E3, E4 and E5), wherein the last event (i.e., E5) corresponds to an end of trace (EOT) event).
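For illustration only, the concatenation described in Paragraph 0020 of Dechu et al. (the vector embeddings of the multimodal data, the queue state, and the events being concatenated into one vector for each case) can be sketched as follows. This is a minimal sketch and not the reference's implementation; the function name build_case_vector, its parameter names, and the sample values are assumptions introduced here and do not appear in Dechu et al.

```python
from typing import List, Sequence


def build_case_vector(
    multimodal_embedding: Sequence[float],
    queue_state_embedding: Sequence[float],
    event_embedding: Sequence[float],
) -> List[float]:
    """Concatenate three embeddings into a single case vector.

    Hypothetical helper: the names and the ordering of the parts are
    assumptions made for this sketch only.
    """
    case_vector: List[float] = []
    case_vector.extend(multimodal_embedding)   # e.g. image/comment embeddings
    case_vector.extend(queue_state_embedding)  # status of other pending requests
    case_vector.extend(event_embedding)        # embeddings of logged events
    return case_vector


# Made-up sample values, chosen only to show the resulting layout.
example_case_vector = build_case_vector([0.1, 0.2], [0.3], [0.4, 0.5, 0.6])
```

The resulting vector has a length equal to the sum of the lengths of its three parts, which is the only property the sketch is intended to show.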
Although Dechu et al. discloses to: receive a plurality of sets of multimodal event data from a plurality of sources (Figure 1, item 102, Multimodal Data); and correlate by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data (see Figure 1 and Paragraph 0019), Dechu et al. does not specifically disclose the specifics of how the data is vectorized (e.g. embedded) and wherein the plurality of data sources including a keystroke data log including timestamps for when a key was pressed, the keystroke data including application telemetry associated with a software application in which the key presses were inputted.
However, Ma et al. discloses a system for discovering business processes using unsupervised learning (Paragraph 0002, The present invention relates to process automation, and more specifically, this invention relates to systems and methods for identifying processes for robotic automation; Paragraph 0148, As an example, if during a segmentation operation 306 the following sequences are observed five times each within a given event stream or set of event streams: ABCDEFGHI, ABCDEFGPQ, XYZDEFGHI, and XYZDEFGPQ, then it is plausible to assume that the sequence “DEFG” represents a common individual task, and that the sequences “ABC” and “PQ” are instances of other tasks and should not be combined with the sequence “DEFG.” The frequency thresholds that indicate whether or not to extend a sequence are experimentally determined, in various approaches; Paragraph 0152, Moreover, in certain approaches segmentation per operation 306 of method 300 uses unsupervised models to delineate different tasks within event streams), the system including a non-transitory computer-readable medium storing computer-executable instructions thereon such that when the instructions are executed (Paragraph 0009, In another implementation, a computer program product for discovering processes for robotic process automation (RPA) includes a computer readable storage medium having program instructions embodied therewith. The computer readable storage medium is not a transitory signal per se, and the program instructions are executable by a processor to cause the processor to), the system is configured to: 
receive a plurality of sets of multimodal event data from a plurality of sources, each set of the multimodal event data including a plurality of event instances, the plurality of data sources including a keystroke data log including timestamps for when a key was pressed, the keystroke data including application telemetry associated with a software application in which the key presses were inputted (Paragraph 0022, An “event stream” as referenced in the present disclosure is a recorded sequence of UI actions (e.g. mouse clicks, keyboard strokes, interactions with various elements of a graphical user interface (GUI), auditory input, eye movements and/or blinks, pauses, gestures (including gestures received/input via a touchscreen device, as well as gestures performed in view of a camera, e.g. for VR applications), etc.) and/or associated device actions (e.g. OS actions, API calls, calls to data source(s), etc.) for a particular user over a particular time period. In preferred implementations, an “event stream” may also include contextual information associated with the user's interactions, such as an identity of the user, various data sources relied upon/used in the course of the user's interactions, content of the computing device's display, including but not limited to content of a particular window, application, UI, etc., either in raw form or processed to revel, for instance, key-value pairs, and/or other elements displayed on the screen, particularly contextual information such as the window/application/UI/UI element, etc. upon which the user is focused; time of day at which various interactions are performed, one or more “groups” with which the user is associated (e.g. a project name, a workgroup, a department of the enterprise, a position or rank of the user, various permissions associated with the user, etc.), an identification of a device, operating system, application, etc. 
associated with the user performing various interactions within the event stream, or any other relevant contextual information that may be provided by the computing device during the course of the event stream, whether such information is directly or indirectly related to input provided by the user, as would be understood by a person having ordinary skill in the art upon reading the present disclosures; Paragraph 0024, “Event streams” may be conceptualized as a series of events, where each “event” includes any suitable number or combination of UI actions within an event stream. For example, in one approach an event may include a particular keystroke, mouse click, or combination thereof performed within a given application running on a computing device. One concrete example would be a left mouse click while a particular key, such as Control, Shift, Alt, etc. is depressed and the computing device is “focused” on a spreadsheet application. Similarly, the meaning of a keypress “enter” depends upon the application/window/UI element upon which a user is focused, e.g. pressing the “enter” key when an application icon is selected may launch the application, while pressing “enter” when focused on a cell of a spreadsheet may cause a function to be executed or value entered into the cell. Accordingly, such events may indicate to perform a certain operation on a certain data value represented within a table or other data structure; Examiner interprets the “plurality of sets of multimodal event data from a plurality of sources” as the keyboard strokes and the gestures); 
associate each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data (Paragraph 0162,  Preferably, the feature vectors are calculated using a known auto-encoder and yield dense feature vectors for each window, e.g. vectors having a dimensionality in a range from about 50 to about 100 for a window length of about 30 events per subsequence. Exemplary auto-encoders suitable for calculating features for the various subsequences may include conventional auto-encoder networks, language-oriented networks (e.g. skip-grams), fastText, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure. Alternatively, feature vectors can be calculated in this manner for each event in the event stream, using any of the methods above, and the feature vector for a window of length N starting at position p is the concatenation of the feature vectors of the events corresponding to the positions p through p+N−1); 
correlate by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data (Paragraph 0148, As an example, if during a segmentation operation 306 the following sequences are observed five times each within a given event stream or set of event streams: ABCDEFGHI, ABCDEFGPQ, XYZDEFGHI, and XYZDEFGPQ, then it is plausible to assume that the sequence “DEFG” represents a common individual task, and that the sequences “ABC” and “PQ” are instances of other tasks and should not be combined with the sequence “DEFG.” The frequency thresholds that indicate whether or not to extend a sequence are experimentally determined, in various approaches; Paragraph 0152, Moreover, in certain approaches segmentation per operation 306 of method 300 uses unsupervised models to delineate different tasks within event streams), wherein the vectorized data associated with at least one of the sets of multimodal data including tasks or events that occur with an unknown or variable duration separating occurrences within the tasks or events (Paragraph 0141, The simplest implementation of segmentation per method 300 and operation 306 involves analyzing the text of concatenated event streams to extract common subsequences of a predetermined length. A major and unique challenge for repetitive pattern discovery in RPA mining is that the length of sequences of events that implement the same task is not the same, and that there is a duration associated with each event. Notably, this challenge is unique in the context of RPA mining, even though other fields such as bioinformatics face similar problems with respect to pattern discovery. Without consideration of the duration of events, an event sequence can be represented by a sequence of characters, without loss of generality. For instance, suppose that ABABCABABCABABC is an event stream. It contains repetitive sequence patterns that may not be unique. 
For instance, repetitive sequence patterns with length 2 are AB; BA; BC in the above event stream. The repetitive sequence patterns with length 4 are ABAB; BABC; ABCA; and BCAB, while the repetitive sequence patterns with length 5 are ABABC; BABCA; ABCAB; BCABA; and CABAB; It can be noted that the claim language is written in alternative form. The limitation taught by Ma et al. is based on “including tasks or events that occur with an unknown duration”), and wherein the vectorized data associated with at least two of the sets of multimodal event data vectorizes data have at least two different modalities (Paragraph 0024, “Event streams” may be conceptualized as a series of events, where each “event” includes any suitable number or combination of UI actions within an event stream. For example, in one approach an event may include a particular keystroke, mouse click, or combination thereof performed within a given application running on a computing device. One concrete example would be a left mouse click while a particular key, such as Control, Shift, Alt, etc. is depressed and the computing device is “focused” on a spreadsheet application. Similarly, the meaning of a keypress “enter” depends upon the application/window/UI element upon which a user is focused, e.g. pressing the “enter” key when an application icon is selected may launch the application, while pressing “enter” when focused on a cell of a spreadsheet may cause a function to be executed or value entered into the cell. Accordingly, such events may indicate to perform a certain operation on a certain data value represented within a table or other data structure; Paragraph 0162, Preferably, the feature vectors are calculated using a known auto-encoder and yield dense feature vectors for each window, e.g. vectors having a dimensionality in a range from about 50 to about 100 for a window length of about 30 events per subsequence. 
Exemplary auto-encoders suitable for calculating features for the various subsequences may include conventional auto-encoder networks, language-oriented networks (e.g. skip-grams), fastText, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure. Alternatively, feature vectors can be calculated in this manner for each event in the event stream, using any of the methods above, and the feature vector for a window of length N starting at position p is the concatenation of the feature vectors of the events corresponding to the positions p through p+N−1), the process mining server using time-windowing to group events from the multimodal event data into macro level tasks for the one or more processes (Paragraph 0022, An “event stream” as referenced in the present disclosure is a recorded sequence of UI actions (e.g. mouse clicks, keyboard strokes, interactions with various elements of a graphical user interface (GUI), auditory input, eye movements and/or blinks, pauses, gestures (including gestures received/input via a touchscreen device, as well as gestures performed in view of a camera, e.g. for VR applications), etc.) and/or associated device actions (e.g. OS actions, API calls, calls to data source(s), etc.) for a particular user over a particular time period; Paragraph 0024, Preferably, “events” refer to a single point in time (or small window of time, e.g. less than one second or an amount of time required to perform a more complex action such as a double-click, gesture, or other action that is defined by multiple related inputs intended to be interpreted as a single input), along with the associated interactions and/or device actions occurring at the given point in time. In all cases, where an event encompasses multiple user interactions, device actions, etc., these are contiguous interactions, actions, etc. forming a single linear sequence); 
and generate and store a process model script for the one or more processes (Paragraph 0223, Regardless of whether segmentation and clustering are performed separately or in a combined fashion, in operation 310 of method 300, one or more processes for robotic automation (RPA) are identified from among the clustered traces. Identifying processes for RPA includes identifying segments/traces wherein a human-performed task is subject to automation (e.g. capable of being understood and performed by a computer without human direction); Paragraph 0229, For instance, in several preferred approaches method 300 may include selectively building a robotic process automation (RPA) model for at least one cluster based at least in part on a frequency of one or more variants of the clustered traces. The RPA model(s) may include or be represented by a directed, acyclic graph (DAG) describing some or all of the traces of a given cluster. Preferably, selectively building the RPA model comprises identifying a minimum-weight, maximum-frequency path from an initial node of the DAG to a final node of the DAG; Paragraph 0288, Recorded event streams may be stored, preferably in one or more tables of a database, in a peripheral 120 configured for data storage, e.g. a storage unit 220 as shown in FIG. 2).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the unsupervised learning of Dechu et al., which identifies one or more processes from vectorized data that includes a plurality of sets of multimodal event data, to further specify how the unsupervised learning clusters the data to identify one or more processes, as taught by Ma et al., because doing so would allow the system to identify one or more processes for robotic automation (RPA) from among the clustered traces (See Ma et al., Paragraph 0223). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Regarding claim 9 (Currently Amended), Dechu et al. discloses a method for discovering business processes using unsupervised learning (Abstract, Methods, systems, and computer program products for complete trace prediction of process instance using multimodal attributes are provided herein. A computer-implemented method includes receiving a request to resolve an issue related to a product and/or a service, wherein the request comprises multimodal data corresponding to at least two modalities; creating a case based on the request, wherein the case comprises a plurality of case attributes corresponding to (i) queue state information related to a status of other pending requests and (ii) the multimodal data; generating a vector representation for the case based on the plurality of case attributes; providing the vector representation as input to a joint machine learning model to determine a sequence of events for resolving the issue, wherein the joint machine learning model is trained based at least in part on prior requests and sequences of events corresponding to the prior requests; Paragraph 0019, The historical event logs 110 are also provided as input to an event embedding encoder 112. The event embedding encoder 112 encodes events from the historical event logs 110 to create event embeddings (e.g., vector representations of events). For example, the event embedding encoder 112 learns an unsupervised embedding for each event as in Act2Vec, and the unsupervised task is used to predict an event from its context), the method comprising: 
receiving a plurality of sets of multimodal event data from a plurality of sources, each set of the multimodal event data including a plurality of event instances (Paragraph 0015, An embodiment described herein includes predicting a complete trace for a process instance (such as a business process, for example) using multi-modal inputs available at the time the process instance is initiated. The multimodal inputs may include, for example, case features, images, comments, queue features, etc. Additionally, at least one of the example embodiments described herein includes learning a joint machine learning model that takes multimodal inputs in vector form), …;
associating each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data (Paragraph 0015, An embodiment described herein includes predicting a complete trace for a process instance (such as a business process, for example) using multi-modal inputs available at the time the process instance is initiated. The multimodal inputs may include, for example, case features, images, comments, queue features, etc. Additionally, at least one of the example embodiments described herein includes learning a joint machine learning model that takes multimodal inputs in vector form; Paragraph 0017, FIG. 1 is a diagram illustrating a system architecture, according to an embodiment of the invention. By way of illustration in FIG. 1, a vector embedding generator 104 generates vector embeddings for multimodal data 102. As an example, the multimodal data 102 may include data relating to images, comments, audio, video, etc. The vector embedding generator 104 may generate one or more vector embeddings for each type of data; Examiner interprets the vector embedding generator 104 as the adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data); 
correlating by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data (Paragraph 0019, The historical event logs 110 are also provided as input to an event embedding encoder 112. The event embedding encoder 112 encodes events from the historical event logs 110 to create event embeddings (e.g., vector representations of events). For example, the event embedding encoder 112 learns an unsupervised embedding for each event as in Act2Vec, and the unsupervised task is used to predict an event from its context; Paragraph 0020, The system architecture depicted in FIG. 1 also includes a joint model trainer 114 comprising a case embedding generator 116 and an event sequence prediction model 118. The case embedding generator 116 generates case vectors for each case based on the vector embeddings of the multimodal data 102, the vectors generated by the queue state embedding generator 108, and the vectors generated by the event embedding encoder 112. For example, the vector embeddings of the multimodal data, the queue state, and the events may be concatenated to generate and/or obtain one vector for each case; Paragraph 0021, The case vectors are provided as input to the event sequence prediction model 118. The event sequence prediction model 118 learns associations between the cases and complete traces based on the case vectors and the event embeddings. The joint model trainer 114 outputs the trained joint model 120; Examiner interprets the event sequence prediction model 118 as the process mining server), …; 
and generating and storing a process model script for the one or more processes (Figure 1, item 128; Paragraph 0021, The case vectors are provided as input to the event sequence prediction model 118. The event sequence prediction model 118 learns associations between the cases and complete traces based on the case vectors and the event embeddings. The joint model trainer 114 outputs the trained joint model 120; Paragraph 0022, The sequential event predictor 126 determines a complete trace 128 for the new case document 122 using the trained joint model 120. In the example depicted in FIG. 1, the complete trace 128 includes five sequential events (i.e., E1, E2, E3, E4 and E5), wherein the last event (i.e., E5) corresponds to an end of trace (EOT) event).
Although Dechu et al. discloses receiving a plurality of sets of multimodal event data from a plurality of sources (Figure 1, item 102, Multimodal Data); and correlating by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data (see Figure 1 and Paragraph 0019), Dechu et al. does not specifically disclose the specifics of how the data is vectorized (e.g. embedded) and wherein the plurality of data sources including a keystroke data log including timestamps for when a key was pressed, the keystroke data including application telemetry associated with a software application in which the key presses were inputted.
However, Ma et al. discloses a method for discovering business processes using unsupervised learning (Paragraph 0002, The present invention relates to process automation, and more specifically, this invention relates to systems and methods for identifying processes for robotic automation; Paragraph 0148, As an example, if during a segmentation operation 306 the following sequences are observed five times each within a given event stream or set of event streams: ABCDEFGHI, ABCDEFGPQ, XYZDEFGHI, and XYZDEFGPQ, then it is plausible to assume that the sequence “DEFG” represents a common individual task, and that the sequences “ABC” and “PQ” are instances of other tasks and should not be combined with the sequence “DEFG.” The frequency thresholds that indicate whether or not to extend a sequence are experimentally determined, in various approaches; Paragraph 0152, Moreover , in certain approaches segmentation per operation 306 of method 300 uses unsupervised models to delineate different tasks within event streams), the method comprising: 
receiving a plurality of sets of multimodal event data from a plurality of sources, each set of the multimodal event data including a plurality of event instances, the plurality of data sources including a keystroke data log including timestamps for when a key was pressed, the keystroke data including application telemetry associated with a software application in which the key presses were inputted (Paragraph 0022, An “event stream” as referenced in the present disclosure is a recorded sequence of UI actions (e.g. mouse clicks, keyboard strokes, interactions with various elements of a graphical user interface (GUI), auditory input, eye movements and/or blinks, pauses, gestures (including gestures received/input via a touchscreen device, as well as gestures performed in view of a camera, e.g. for VR applications), etc.) and/or associated device actions (e.g. OS actions, API calls, calls to data source(s), etc.) for a particular user over a particular time period. In preferred implementations, an “event stream” may also include contextual information associated with the user's interactions, such as an identity of the user, various data sources relied upon/used in the course of the user's interactions, content of the computing device's display, including but not limited to content of a particular window, application, UI, etc., either in raw form or processed to revel, for instance, key-value pairs, and/or other elements displayed on the screen, particularly contextual information such as the window/application/UI/UI element, etc. upon which the user is focused; time of day at which various interactions are performed, one or more “groups” with which the user is associated (e.g. a project name, a workgroup, a department of the enterprise, a position or rank of the user, various permissions associated with the user, etc.), an identification of a device, operating system, application, etc. 
associated with the user performing various interactions within the event stream, or any other relevant contextual information that may be provided by the computing device during the course of the event stream, whether such information is directly or indirectly related to input provided by the user, as would be understood by a person having ordinary skill in the art upon reading the present disclosures; Paragraph 0024, “Event streams” may be conceptualized as a series of events, where each “event” includes any suitable number or combination of UI actions within an event stream. For example, in one approach an event may include a particular keystroke, mouse click, or combination thereof performed within a given application running on a computing device. One concrete example would be a left mouse click while a particular key, such as Control, Shift, Alt, etc. is depressed and the computing device is “focused” on a spreadsheet application. Similarly, the meaning of a keypress “enter” depends upon the application/window/UI element upon which a user is focused, e.g. pressing the “enter” key when an application icon is selected may launch the application, while pressing “enter” when focused on a cell of a spreadsheet may cause a function to be executed or value entered into the cell. Accordingly, such events may indicate to perform a certain operation on a certain data value represented within a table or other data structure; Examiner interprets the “plurality of sets of multimodal event data from a plurality of sources” as the keyboard strokes and the gestures); 
associating each set of the multimodal event data with a respective vector representation, such that the plurality of event instances is represented as a plurality of event vectors, each of the respective vector representations being generated by a plurality of vectorizers that include an adaptor to read the multimodal event data and convert the multimodal event data to a corresponding vectorized data (Paragraph 0162,  Preferably, the feature vectors are calculated using a known auto-encoder and yield dense feature vectors for each window, e.g. vectors having a dimensionality in a range from about 50 to about 100 for a window length of about 30 events per subsequence. Exemplary auto-encoders suitable for calculating features for the various subsequences may include conventional auto-encoder networks, language-oriented networks (e.g. skip-grams), fastText, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure. Alternatively, feature vectors can be calculated in this manner for each event in the event stream, using any of the methods above, and the feature vector for a window of length N starting at position p is the concatenation of the feature vectors of the events corresponding to the positions p through p+N−1); 
correlating by a process mining server the plurality of event vectors using unsupervised learning to identify one or more processes from the vectorized data (Paragraph 0148, As an example, if during a segmentation operation 306 the following sequences are observed five times each within a given event stream or set of event streams: ABCDEFGHI, ABCDEFGPQ, XYZDEFGHI, and XYZDEFGPQ, then it is plausible to assume that the sequence “DEFG” represents a common individual task, and that the sequences “ABC” and “PQ” are instances of other tasks and should not be combined with the sequence “DEFG.” The frequency thresholds that indicate whether or not to extend a sequence are experimentally determined, in various approaches; Paragraph 0152, Moreover , in certain approaches segmentation per operation 306 of method 300 uses unsupervised models to delineate different tasks within event streams), wherein the vectorized data associated with at least one of the sets of multimodal data including tasks or events that occur with an unknown or variable duration separating occurrences within the tasks or events (Paragraph 0141, The simplest implementation of segmentation per method 300 and operation 306 involves analyzing the text of concatenated event streams to extract common subsequences of a predetermined length. A major and unique challenge for repetitive pattern discovery in RPA mining is that the length of sequences of events that implement the same task is not the same, and that there is a duration associated with each event. Notably, this challenge is unique in the context of RPA mining, even though other fields such as bioinformatics face similar problems with respect to pattern discovery. Without consideration of the duration of events, an event sequence can be represented by a sequence of characters, without loss of generality. For instance, suppose that ABABCABABCABABC is an event stream. It contains repetitive sequence patterns that may not be unique. 
For instance, repetitive sequence patterns with length 2 are AB; BA; BC in the above event stream. The repetitive sequence patterns with length 4 are ABAB; BABC; ABCA; and BCAB, while the repetitive sequence patterns with length 5 are ABABC; BABCA; ABCAB; BCABA; and CABAB; It can be noted that the claim language is written in alternative form.  The limitation taught by Ma et al. is based on “including tasks or events that occur with an unknown duration"), and wherein the vectorized data associated with at least two of the sets of multimodal event data vectorizes data have at least two different modalities (Paragraph 0024, “Event streams” may be conceptualized as a series of events, where each “event” includes any suitable number or combination of UI actions within an event stream. For example, in one approach an event may include a particular keystroke, mouse click, or combination thereof performed within a given application running on a computing device. One concrete example would be a left mouse click while a particular key, such as Control, Shift, Alt, etc. is depressed and the computing device is “focused” on a spreadsheet application. Similarly, the meaning of a keypress “enter” depends upon the application/window/UI element upon which a user is focused, e.g. pressing the “enter” key when an application icon is selected may launch the application, while pressing “enter” when focused on a cell of a spreadsheet may cause a function to be executed or value entered into the cell. Accordingly, such events may indicate to perform a certain operation on a certain data value represented within a table or other data structure; Paragraph 0162,  Preferably, the feature vectors are calculated using a known auto-encoder and yield dense feature vectors for each window, e.g. vectors having a dimensionality in a range from about 50 to about 100 for a window length of about 30 events per subsequence. 
Exemplary auto-encoders suitable for calculating features for the various subsequences may include conventional auto-encoder networks, language-oriented networks (e.g. skip-grams), fastText, etc. as would be appreciated by a person having ordinary skill in the art upon reading the present disclosure. Alternatively, feature vectors can be calculated in this manner for each event in the event stream, using any of the methods above, and the feature vector for a window of length N starting at position p is the concatenation of the feature vectors of the events corresponding to the positions p through p+N−1), the process mining server using time-windowing to group events from the multimodal event data into macro level tasks for the one or more processes (Paragraph 0022, An “event stream” as referenced in the present disclosure is a recorded sequence of UI actions (e.g. mouse clicks, keyboard strokes, interactions with various elements of a graphical user interface (GUI), auditory input, eye movements and/or blinks, pauses, gestures (including gestures received/input via a touchscreen device, as well as gestures performed in view of a camera, e.g. for VR applications), etc.) and/or associated device actions (e.g. OS actions, API calls, calls to data source(s), etc.) for a particular user over a particular time period; Paragraph 0024, Preferably, “events” refer to a single point in time (or small window of time, e.g. less than one second or an amount of time required to perform a more complex action such as a double-click, gesture, or other action that is defined by multiple related inputs intended to be interpreted as a single input), along with the associated interactions and/or device actions occurring at the given point in time. In all cases, where an event encompasses multiple user interactions, device actions, etc., these are contiguous interactions, actions, etc. forming a single linear sequence); 
and generating and storing a process model script for the one or more processes (Paragraph 0223, Regardless of whether segmentation and clustering are performed separately or in a combined fashion, in operation 310 of method 300, one or more processes for robotic automation (RPA) are identified from among the clustered traces. Identifying processes for RPA includes identifying segments/traces wherein a human-performed task is subject to automation (e.g. capable of being understood and performed by a computer without human direction); Paragraph 0229, For instance, in several preferred approaches method 300 may include selectively building a robotic process automation (RPA) model for at least one cluster based at least in part on a frequency of one or more variants of the clustered traces. The RPA model(s) may include or be represented by a directed, acyclic graph (DAG) describing some or all of the traces of a given cluster. Preferably, selectively building the RPA model comprises identifying a minimum-weight, maximum-frequency path from an initial node of the DAG to a final node of the DAG; Paragraph 0288, Recorded event streams may be stored, preferably in one or more tables of a database, in a peripheral 120 configured for data storage, e.g. a storage unit 220 as shown in FIG. 2).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the unsupervised learning of Dechu et al., which identifies one or more processes from vectorized data that includes a plurality of sets of multimodal event data, to further specify how the unsupervised learning clusters the data to identify one or more processes, as taught by Ma et al., because doing so would allow the system to identify one or more processes for robotic automation (RPA) from among the clustered traces (See Ma et al., Paragraph 0223). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Regarding claims 4 and 12 (Original), which depend from claims 1 and 9, the combination of Dechu et al. and Ma et al. discloses all the limitations in claims 1 and 9. Dechu et al. further discloses correlating the plurality of event vectors using a long short term memory (LSTM) neural network (Paragraph 0021, The case vectors are provided as input to the event sequence prediction model 118. The event sequence prediction model 118 learns associations between the cases and complete traces based on the case vectors and the event embeddings. The joint model trainer 114 outputs the trained joint model 120; Paragraph 0024, In FIG. 2, iterations of the LSTM 212 correspond to LSTM 212-1 . . . LSTM 212-N, and events from the event dictionary are represented as E0 . . . EN-1. Event vectors are created for the events and are provided as input to the LSTM 212 model. In the non-limiting example shown in FIG. 2, the event vectors are created using Event2Vec. At each iteration, the LSTM 212 model outputs a probability distribution for a given event represented as log P(E1) . . . log P(EN). It is noted that each of LSTM 212-1 . . . LSTM 212-N may include multiple LSTM layers; Paragraph 0035, The joint machine-learning model may include two or more of: a convolutional neural network; a recurrent neural network; and a long short-term memory (LSTM) network).
Regarding claims 5 and 13 (Original), which depend from claims 1 and 9, the combination of Dechu et al. and Ma et al. discloses all the limitations in claims 1 and 9. Although Dechu et al. discloses a process model script (Paragraphs 0021-0022), Dechu et al. does not specifically disclose wherein the process model script includes one or more directed graphs.
However, Ma et al. discloses wherein the process model script includes one or more directed graphs (Paragraph 0229, The RPA model(s) may include or be represented by a directed, acyclic graph (DAG) describing some or all of the traces of a given cluster).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to use the process model script of Dechu et al. and further specify that the process model script includes one or more directed graphs, as taught by Ma et al., because doing so would allow the system to describe some or all of the traces of a given cluster by using a DAG (See Ma et al., Paragraph 0229). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Regarding claims 6 and 14 (Original), which depend from claims 1 and 9, the combination of Dechu et al. and Ma et al. discloses all the limitations in claims 1 and 9. Although Dechu et al. discloses a process model script (Paragraphs 0021-0022), Dechu et al. does not specifically disclose wherein the process model script is a robotic process automation (RPA) script.
However, Ma et al. discloses wherein the process model script is a robotic process automation (RPA) script (Paragraph 0229, For instance, in several preferred approaches method 300 may include selectively building a robotic process automation (RPA) model for at least one cluster based at least in part on a frequency of one or more variants of the clustered traces. The RPA model(s) may include or be represented by a directed, acyclic graph (DAG) describing some or all of the traces of a given cluster. Preferably, selectively building the RPA model comprises identifying a minimum-weight, maximum-frequency path from an initial node of the DAG to a final node of the DAG).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to use the process model script of Dechu et al. and further specify that the process model script is a robotic process automation (RPA) script, as taught by Ma et al., because doing so would allow the system to build a robotic process automation (RPA) model for at least one cluster based at least in part on a frequency of one or more variants of the clustered traces (See Ma et al., Paragraph 0229). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Regarding claims 7 and 15 (Original), which depend from claims 1 and 9, the combination of Dechu et al. and Ma et al. discloses all the limitations in claims 1 and 9. Dechu et al. further discloses wherein the plurality of sources includes two or more selected from the group consisting of: one or more Internet Information Services (IIS) log files, one or more Apache log file, one or more application log files, one or more standard operating procedure (SOP) manuals, one or more screen capture logs, one or more keystroke logs, one or more business process documents (BPDs) (Paragraph 0017, FIG. 1 is a diagram illustrating a system architecture, according to an embodiment of the invention. By way of illustration in FIG. 1, a vector embedding generator 104 generates vector embeddings for multimodal data 102. As an example, the multimodal data 102 may include data relating to images, comments, audio, video, etc. The vector embedding generator 104 may generate one or more vector embeddings for each type of data; Paragraph 0019, The historical event logs 110 are also provided as input to an event embedding encoder 112. The event embedding encoder 112 encodes events from the historical event logs 110 to create event embeddings (e.g., vector representations of events). For example, the event embedding encoder 112 learns an unsupervised embedding for each event as in Act2Vec, and the unsupervised task is used to predict an event from its context; It can be noted that the claim language is written in alternative form. The limitation taught by Dechu et al. is based on “one or more application log files” and “one or more screen capture logs.” Examiner interprets an event log as an application log file and an image as a screen capture log).
Regarding claims 8 and 16 (Original), which depend from claims 1 and 9, Dechu et al. discloses all the limitations in claims 1 and 9. Dechu et al. further discloses wherein the process model script identifies higher probability processes in the one or more processes (Paragraph 0024, In FIG. 2, iterations of the LSTM 212 correspond to LSTM 212-1 . . . LSTM 212-N, and events from the event dictionary are represented as E0 . . . EN-1. Event vectors are created for the events and are provided as input to the LSTM 212 model. In the non-limiting example shown in FIG. 2, the event vectors are created using Event2Vec. At each iteration, the LSTM 212 model outputs a probability distribution for a given event represented as log P(E1) . . . log P(EN). It is noted that each of LSTM 212-1 . . . LSTM 212-N may include multiple LSTM layers; Paragraph 0029, The additional details from the image(s) and comments provided by the customer for each case may be used to one or more of: (i) train the joint machine learning model (120) to consider these features and (ii) to predict the complete trace of the events that are likely to happen and preemptively perform one or more corrective actions).

Claims 2-3 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Dechu et al. (US 2020/0320383 A1), in view of Ma et al. (US 2020/0206920 A1), in further view of Liu (US 2017/0177703 A1).
Regarding claims 2 and 10 (Original), which depend from claims 1 and 9, the combination of Dechu et al. and Ma et al. discloses all the limitations in claims 1 and 9. Dechu et al. further discloses correlating the plurality of event vectors by: joining a first subset of the plurality of event vectors to create a first process matrix, joining a second subset of the plurality of event vectors to create a second process matrix, determining a similarity between the first process matrix and the second process matrix, … (Paragraph 0032, In accordance with at least one example embodiment, the determination of the sequence of event considers one or more of: case-specific attributes, process-oriented attributes, importance of a given case and similarities between a given case and other cases. Case-specific attributes may include, for example, the amount of an item, vendor name of the item, category type of the item, etc. Process-oriented attributes may include, for example, sequence of actions taken on the issue, comments corresponding to specific actions taken, etc. The importance of a given case may relate to whether the customer is, for example, a frequent or a high-value customer when the current issue is raised. Similarities between a given case and other cases may correspond to a number of issues raised in the other cases that are similar to an issue of the given case within a certain time frame (e.g., within a day, month, etc.), and a number of actions (e.g., process steps) taken on all such issues).
Although Dechu et al. discloses all the limitations above and similarities between sequence of actions taken on the issue, Dechu et al. does not specifically disclose wherein the similarity is measured as a dot product between the first process matrix and the second process matrix, and identifying that the first process matrix and the second process matrix refer to a same process in the one or more processes based on the similarity being below a threshold.
However, Liu discloses to correlate the plurality of event vectors by: joining a first subset of the plurality of event vectors to create a first process matrix, joining a second subset of the plurality of event vectors to create a second process matrix, determining a similarity between the first process matrix and the second process matrix, the similarity measured as a dot product between the first process matrix and the second process matrix, and identifying that the first process matrix and the second process matrix refer to a same process in the one or more processes based on the similarity being below a threshold (Paragraph 0047, In some embodiments, in generating the set of shared semantic vectors, the vector component 222 determines a semantic similarity measured by a similarity function. In some instances, the similarity function is a cosine similarity function. The cosine similarity may be a dot product modified by a normalization of the dot products of the vectors. For example, the semantic similarity measure may be represented as X*Y (∥X∥*∥Y∥), where X is the source vector and Y is the target vector. Each vector X and Y may be floating points with n dimensions. X*Y may be the sum of the n dimensional floating points for the X vector and the Y vector. ∥X∥ may be a normalization of the dot product of the X vector. ∥Y∥ may be a normalization of the dot product of the Y vector. Although described in a specified embodiment with respect to cosine similarity functions, it should be understood that the semantic similarity may be determined by any suitable manner; Paragraph 0048, After generating a representation for each term of the one or more first categories and one or more second categories, the vector component 222 compares one or more terms for a specified first category of the one or more first categories to one or more terms for a specified second category of the one or more second categories using the cosine similarity function. 
The cosine similarity function may measure a similarity between two or more semantic vectors (e.g., vector representations of terms of each category). In some embodiments, the two or more semantic vectors may be non-zero vectors between which the cosine similarity function measures the cosine of the angle between the two or more vectors. In some embodiments, vectors are determined to be similar where the cosine of the angle between vectors is between zero and one in a positive space. In some instances, cosine similarity may be additionally determined when the cosine of the angle between vectors is above a predetermined threshold; Paragraph 0050, In various embodiments, the mapping is performed by learning the semantic similarity between a source sequence and a target sequence. In example embodiments, the semantic similarity, also referred to as semantic relevance, may be measured by a cosine similarity function sim (X, Y), where X represents the semantic vector of source sequence (i.e., derived from the seller's taxonomy) and Y represents the semantic vector of target sequence (i.e., derived from the category tree of the publication system). Both X and Y represent points in the shared semantic vector space. The output of the cosine similarity function represents how close those two points in the shared semantic vector space, i.e., how semantically similar between the source sequence and the target sequence. Generally, the best matched category of Y has the highest similarity score to X. The source sequence vector and target sequence vector have the same number of dimensions. For example, the source sequence represents an entry on the seller's inventory list (e.g., the seller's taxonomy entry and the item title) and the target sequence represents a category tree path (root-to-leaf) used by the publication system. 
In various embodiments, the target sequence is pre-computed before runtime and the source sequence is computed during runtime and then compared to the target sequence during runtime).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the manner in which the invention of Dechu et al. determines the similarities between a first process matrix and a second process matrix to further specify that the similarities are determined using the cosine similarity function of the invention of Liu because doing so would allow the system to determine similarity between two vectors when the cosine of the angle between vectors is above a predetermined threshold (See Liu, Paragraph 0048). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
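For illustration only, the cosine similarity measure quoted from Liu (Paragraphs 0047-0048) may be sketched as a dot product divided by the product of the vector magnitudes, followed by a comparison against a predetermined threshold. The function names `cosine_similarity` and `is_similar` and the threshold value of 0.8 are hypothetical and do not appear in Liu or in the application; the sketch assumes plain lists of floating-point values of equal dimension.

```python
import math


def cosine_similarity(x, y):
    """Return the dot product of x and y divided by the product of their magnitudes."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # The cosine of the angle is undefined for a zero vector.
        raise ValueError("cosine similarity requires non-zero vectors")
    return dot / (norm_x * norm_y)


def is_similar(x, y, threshold=0.8):
    """Hypothetical check: treat vectors as similar when the cosine exceeds the threshold."""
    return cosine_similarity(x, y) > threshold
```

Under these assumptions, two vectors pointing in the same direction yield a value near one and orthogonal vectors yield zero, so the threshold comparison separates the two cases.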
Regarding claims 3 and 11 (Original), which depend from claims 1 and 9, the combination of Dechu et al. and Ma et al. discloses all the limitations in claims 1 and 9. Dechu et al. further discloses correlating the plurality of event vectors by: joining a first subset of the plurality of event vectors to create a first process matrix, joining a second subset of the plurality of event vectors to create a second process matrix, determining a similarity between the first process matrix and the second process matrix, … (Paragraph 0032, In accordance with at least one example embodiment, the determination of the sequence of event considers one or more of: case-specific attributes, process-oriented attributes, importance of a given case and similarities between a given case and other cases. Case-specific attributes may include, for example, the amount of an item, vendor name of the item, category type of the item, etc. Process-oriented attributes may include, for example, sequence of actions taken on the issue, comments corresponding to specific actions taken, etc. The importance of a given case may relate to whether the customer is, for example, a frequent or a high-value customer when the current issue is raised. Similarities between a given case and other cases may correspond to a number of issues raised in the other cases that are similar to an issue of the given case within a certain time frame (e.g., within a day, month, etc.), and a number of actions (e.g., process steps) taken on all such issues).
Although Dechu et al. discloses all the limitations above and similarities between sequence of actions taken on the issue, Dechu et al. does not specifically disclose wherein the similarity is measured as a dot product between the first process matrix and the second process matrix, and identifying that the first process matrix and the second process matrix are different processes in the one or more processes based on the similarity being above a threshold.
However, Liu discloses to correlate the plurality of event vectors by: joining a first subset of the plurality of event vectors to create a first process matrix, joining a second subset of the plurality of event vectors to create a second process matrix, determining a similarity between the first process matrix and the second process matrix, the similarity measured as a dot product between the first process matrix and the second process matrix, and identifying that the first process matrix and the second process matrix are different processes in the one or more processes based on the similarity being above a threshold (Paragraph 0047, In some embodiments, in generating the set of shared semantic vectors, the vector component 222 determines a semantic similarity measured by a similarity function. In some instances, the similarity function is a cosine similarity function. The cosine similarity may be a dot product modified by a normalization of the dot products of the vectors. For example, the semantic similarity measure may be represented as X*Y (∥X∥*∥Y∥), where X is the source vector and Y is the target vector. Each vector X and Y may be floating points with n dimensions. X*Y may be the sum of the n dimensional floating points for the X vector and the Y vector. ∥X∥ may be a normalization of the dot product of the X vector. ∥Y∥ may be a normalization of the dot product of the Y vector. Although described in a specified embodiment with respect to cosine similarity functions, it should be understood that the semantic similarity may be determined by any suitable manner; Paragraph 0048, After generating a representation for each term of the one or more first categories and one or more second categories, the vector component 222 compares one or more terms for a specified first category of the one or more first categories to one or more terms for a specified second category of the one or more second categories using the cosine similarity function. 
The cosine similarity function may measure a similarity between two or more semantic vectors (e.g., vector representations of terms of each category). In some embodiments, the two or more semantic vectors may be non-zero vectors between which the cosine similarity function measures the cosine of the angle between the two or more vectors. In some embodiments, vectors are determined to be similar where the cosine of the angle between vectors is between zero and one in a positive space. In some instances, cosine similarity may be additionally determined when the cosine of the angle between vectors is above a predetermined threshold; Paragraph 0050, In various embodiments, the mapping is performed by learning the semantic similarity between a source sequence and a target sequence. In example embodiments, the semantic similarity, also referred to as semantic relevance, may be measured by a cosine similarity function sim (X, Y), where X represents the semantic vector of source sequence (i.e., derived from the seller's taxonomy) and Y represents the semantic vector of target sequence (i.e., derived from the category tree of the publication system). Both X and Y represent points in the shared semantic vector space. The output of the cosine similarity function represents how close those two points in the shared semantic vector space, i.e., how semantically similar between the source sequence and the target sequence. Generally, the best matched category of Y has the highest similarity score to X. The source sequence vector and target sequence vector have the same number of dimensions. For example, the source sequence represents an entry on the seller's inventory list (e.g., the seller's taxonomy entry and the item title) and the target sequence represents a category tree path (root-to-leaf) used by the publication system. 
In various embodiments, the target sequence is pre-computed before runtime and the source sequence is computed during runtime and then compared to the target sequence during runtime).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the manner in which the invention of Dechu et al. determines the similarities between a first process matrix and a second process matrix to further specify that the similarities are determined using the cosine similarity function of the invention of Liu because doing so would allow the system to determine dissimilarity between two vectors when the cosine of the angle between vectors is below a predetermined threshold (See Liu, Paragraph 0048). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARJORIE PUJOLS-CRUZ whose telephone number is (571)272-4668. The examiner can normally be reached Mon-Thu 7:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patricia H Munson, can be reached at (571)270-5396. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/M.P./Examiner, Art Unit 3624
/PATRICIA H MUNSON/Supervisory Patent Examiner, Art Unit 3624