DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 10/14/2020 and 04/29/2022 were filed. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-2, 4, 9-12, 14-15, 18-19, and 21-22 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Puszkiewicz (US PAT 11099972 B2, Filed Date: Nov. 19, 2018, hereinafter “Puszkiewicz”).
Regarding independent claim 1, Puszkiewicz teaches: A system, comprising:
one or more user computing systems comprising respective recorder processes; (Puszkiewicz − [Col. 7 ll. 5-12] FIG. 3 shows a block diagram of a validation system 300 for validating an interface of an application, according to an example embodiment. Validation system 300 is an example implementation of system 100 of FIG. 1. [Col. 9 ll. 40-45] Image capturer 306 may capture images as screenshots, or portions of screenshots representing application GUI 312.)
and a server configured to train an artificial intelligence (AI) / machine learning (ML) model to recognize applications, screens, and user interface (UI) elements using computer vision (CV) (Examiner Notes: computer vision (CV) as non-textual visual components in an image such as a click, and/or hovering) (Puszkiewicz – Fig. 1, Server 106, Fig. 2 flowchart, [Col. 11 ll. 50-55] Fig. 3, For example, model 326 may analyze an image using one or more suitable image analysis techniques known and appreciated to those skilled in the relevant art, to analyze each captured image to locate graphical elements present in the image, and classify such elements. As will be described in greater detail below, model 326 may comprise a machine-learning based model that is trained by model generator 324.)
and to recognize user interactions with the applications, screens, and UI elements, (Puszkiewicz – [Col. 8 ll. 38-52] In some implementations, test script 302 may comprise one or more automated pointing device interactions, keyboard interactions, voice-based interactions, etc. Illustrative examples of such automated interactions include, but are not limited to, hovering over and/or clicking on interactive elements in application GUI 312 via a pointing device interaction (e.g., by sending commands to application GUI 312 to move a pointing device), typing characters or strings in application GUI 312, transmitting a voice-based interaction to application GUI 312, and/or any other type of interaction resembling a user interaction of the GUI.)
wherein the respective recorder processes are configured to: record screenshots or video frames of a display associated with the respective user computing system and other information, (Puszkiewicz – [Col. 9 ll. 40-45] Image capturer 306 may capture images as screenshots, or portions of screenshots representing application GUI 312.)
and send the recorded screenshots or video frames, and the other information, to storage accessible by the server, (Puszkiewicz – [Col. 10 ll. 1-6] In examples, image capturer 306 may store each captured image in a storage device, such as image storage 316. Image storage 316 may comprise any suitable storage device for storing hundreds, thousands, or even a greater number of images representing application GUI 312 (or a plurality of application GUIs).)
and the server is configured to: initially train the AI/ML model to recognize the applications, screens, and UI elements that are present in the recorded screenshots or video frames using the recorded screenshots or video frames and the other information, (Puszkiewicz – [Col. 15 ll. 9-19] In example implementations, model generator 324 may comprise one or more suitable machine-learning algorithms for training 342 model 326 for classifying graphical objects. Model generator 324 may comprise any suitable classification algorithm, [Col. 15 ll. 20-45] object classifier 320 may implement one or more OCR models or techniques to extract a letter, number, word, phrase, string, etc. associated with graphical objects detected in captured images. Object detector 318 may detect graphical objects on the captured image and object classifier 320 may classify the detected objects as described above.) 
and after the AI/ML model can recognize the applications, screens, and UI elements in the recorded screenshots or video frames with a confidence, train the AI/ML model to recognize individual user interactions with the UI elements. (Puszkiewicz – [Cols. 13,14 ll. 47-67, 1-2] In example embodiments, UI image validator 322 may also enable model 326 to be continuously refined and/or retrained 340 based on the outcome of the validation of application GUI 312. In this manner, model generator 324 may be configured to continuously retrain and/or refine model 326 based on user input (or lack thereof), thereby further improving the accuracy of the model and the automated validation of an application GUI. [Col. 16 ll. 1-5] model 326 may be configured to classify objects only where a measure of confidence exceeds a threshold value (e.g., a 90% confidence value) that may be predetermined and/or configurable in a similar manner as described above. The retraining of the model occurs after the model recognizes a screenshot with a confidence value during validation by UI image validator 322.)
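For illustration only, the confidence gating quoted above from Puszkiewicz [Col. 16 ll. 1-5] can be sketched as follows. This is a minimal sketch and not the reference's implementation: the function name `confident_classifications`, the `(label, score)` tuple layout, and the sample labels are hypothetical, and only the 90% example threshold comes from the cited passage.

```python
from typing import List, Tuple

# Hypothetical representation: each prediction is (label, confidence in 0..1).
Prediction = Tuple[str, float]


def confident_classifications(
    predictions: List[Prediction], threshold: float = 0.90
) -> List[Prediction]:
    """Keep only classifications whose confidence exceeds the threshold.

    The 0.90 default mirrors the "90% confidence value" example in the
    cited passage; the threshold is configurable, as the passage states.
    """
    return [(label, score) for label, score in predictions if score > threshold]


# Hypothetical sample data for illustration.
sample = [("button", 0.97), ("menu", 0.62), ("icon", 0.91)]
kept = confident_classifications(sample)
```

With the sample data, only the "button" and "icon" entries pass the gate; the "menu" entry falls below the threshold and would not be classified.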
Regarding dependent claim 2, Puszkiewicz discloses all the features with respect to claim 1 as outlined above.
Puszkiewicz teaches: wherein the individual user interactions comprise button presses, entry of single characters or character sequences, selection of active UI elements, menu selections, screen changes, voice inputs, gestures, providing biometric information, haptic interactions, or a combination thereof. (Puszkiewicz − [Col. 8 ll. 5-27] graphical objects present in application GUI 312 are not limited to the above illustrative examples but may include any other types of selectable or non-selectable elements that may be displayed, including icons, buttons, lists, menus, toolbars, etc. In other examples, however, GUI elements such as completion list elements or other selectable options (e.g., pop-up dialogs, windows, etc.) may be presented on application GUI 312 in dynamic locations. As a result, application GUI 312 may comprise elements for which locations may differ depending on the type and location of received interactions.)
Regarding dependent claim 4, Puszkiewicz discloses all the features with respect to claim 1 as outlined above.
Puszkiewicz teaches: wherein the other information comprises a web browser history, one or more heat maps, key presses, mouse clicks, locations of mouse clicks and/or graphical elements on the display that a user is interacting with, locations where the user was looking on the display, time stamps associated with the screenshots or video frames, text that the user entered, content that the user scrolled past, a time that the user stopped on a part of content shown in the display, what application the user is interacting with, voice inputs, gestures, emotion information, biometrics, information pertaining to periods of no user activity, haptic information, multi-touch input information, or a combination thereof. (Puszkiewicz – [Col. 8 ll. 5-27, 38-52] In some implementations, test script 302 may comprise one or more automated pointing device interactions, keyboard interactions, voice-based interactions, etc. Illustrative examples of such automated interactions include, but are not limited to, hovering over and/or clicking on interactive elements in application GUI 312 via a pointing device interaction (e.g., by sending commands to application GUI 312 to move a pointing device), typing characters or strings in application GUI 312, transmitting a voice-based interaction to application GUI 312, and/or any other type of interaction resembling a user interaction of the GUI.)
Regarding dependent claim 9, Puszkiewicz discloses all the features with respect to claim 1 as outlined above.
Puszkiewicz teaches: further comprising: an automation box operably connected to a user computing system of the one or more user computing systems, the automation box configured to: receive input from one or more user input devices, associate time stamps with the input, (Puszkiewicz – [Col. 4 ll. 20-28] The test script launcher may further be configured to capture a plurality of images representing the GUI at different points in time (timestamp), such as different points in time based on the automatic interaction with the GUI, and associate a set of tags for each image that identifies expected objects in the image.)
and send the time stamped input to storage accessible by the server, (Puszkiewicz – [Col. 10 ll. 1-6] In examples, image capturer 306 may store each captured image in a storage device, such as image storage 316. Image storage 316 may comprise any suitable storage device for storing hundreds, thousands, or even a greater number of images representing application GUI 312 (or a plurality of application GUIs).)
wherein the server is configured to use the time stamped input for the initial training of the AI/ML model. (Puszkiewicz – [Cols. 13,14 ll. 47-67, 1-2] In example embodiments, UI image validator 322 may also enable model 326 to be continuously refined and/or retrained 340 based on the outcome of the validation of application GUI 312. In this manner, model generator 324 may be configured to continuously retrain and/or refine model 326 based on user input (or lack thereof), thereby further improving the accuracy of the model and the automated validation of an application GUI.)
Regarding dependent claim 10, Puszkiewicz discloses all the features with respect to claim 1 as outlined above.
Puszkiewicz teaches: wherein server is configured to perform the initial training of the AI/ML model without a prior knowledge of the applications, screens, and UI elements in the screenshots or video frames. (Puszkiewicz – Fig. 2 flowchart [Col. 9 ll. 39-45] Training the model for the first time is considered training without prior knowledge, because the images used for training the model are captured in step 206. [Col. 9 ll. 39-45] In step 206, a plurality of images representing the GUI at different points in time are captured. For instance, with reference to FIG. 3, image capturer 306 may capture 330 a plurality of images representing application GUI 312 at one or more points in time.)
Regarding independent claim 11, Puszkiewicz teaches: A non-transitory computer-readable medium storing a computer program configured to 
train an artificial intelligence (AI) / machine learning (ML) model to recognize applications, screens, and user interface (UI) elements using computer vision (CV) (Examiner Notes: computer vision (CV) as non-textual visual components in an image such as a click, and/or hovering) (Puszkiewicz – Fig. 1, Server 106, Fig. 2 flowchart, [Col. 11 ll. 50-55] Fig.3, For example, model 326 may analyze an image using one or more suitable image analysis techniques known and appreciated to those skilled in the relevant art, to analyze each captured image to locate graphical elements present in the image, and classify such elements. As will be described in greater detail below, model 326 may comprise a machine-learning based model that is trained by model generator 324.) 
and/or to recognize user interactions with the applications, screens, and UI elements, (Puszkiewicz – [Col. 8 ll. 38-52] In some implementations, test script 302 may comprise one or more automated pointing device interactions, keyboard interactions, voice-based interactions, etc. Illustrative examples of such automated interactions include, but are not limited to, hovering over and/or clicking on interactive elements in application GUI 312 via a pointing device interaction (e.g., by sending commands to application GUI 312 to move a pointing device), typing characters or strings in application GUI 312, transmitting a voice-based interaction to application GUI 312, and/or any other type of interaction resembling a user interaction of the GUI.)
the computer program configured to cause at least one processor to: access recorded screenshots or video frames of displays associated with one or more computing systems (Puszkiewicz – [Col. 9 ll. 40-45] Image capturer 306 may capture images as screenshots, or portions of screenshots representing application GUI 312.)
and access other information associated with the one or more computing systems; (Puszkiewicz – [Col. 10 ll. 1-6] In examples, image capturer 306 may store each captured image in a storage device, such as image storage 316. Image storage 316 may comprise any suitable storage device for storing hundreds, thousands, or even a greater number of images representing application GUI 312 (or a plurality of application GUIs).)
and initially train the AI/ML model to recognize the applications, screens, and UI elements that are present in the recorded screenshots or video frames using the recorded screenshots or video frames and the other information, (Puszkiewicz – [Col. 15 ll. 9-19] In example implementations, model generator 324 may comprise one or more suitable machine-learning algorithms for training 342 model 326 for classifying graphical objects. Model generator 324 may comprise any suitable classification algorithm, [Col. 15 ll. 20-45] object classifier 320 may implement one or more OCR models or techniques to extract a letter, number, word, phrase, string, etc. associated with graphical objects detected in captured images. Object detector 318 may detect graphical objects on the captured image and object classifier 320 may classify the detected objects as described above.)
wherein the initial training of the AI/ML model is performed without a prior knowledge of the applications, screens, and UI elements in the screenshots or video frames. (Puszkiewicz – Fig. 2 flowchart [Col. 9 ll. 39-45] Training the model for the first time is considered training without prior knowledge, because the images used for training the model are captured in step 206. [Col. 9 ll. 39-45] In step 206, a plurality of images representing the GUI at different points in time are captured. For instance, with reference to FIG. 3, image capturer 306 may capture 330 a plurality of images representing application GUI 312 at one or more points in time.)
Regarding dependent claim 12, Puszkiewicz discloses all the features with respect to claim 11 as outlined above.
Puszkiewicz teaches: wherein after the AI/ML model can recognize the applications, screens, and UI elements in the recorded screenshots or video frames with a confidence, the computer program is further configured to cause the at least one processor to: train the AI/ML model to recognize individual user interactions with the UI elements. (Puszkiewicz – [Cols. 13,14 ll. 47-67, 1-2] In example embodiments, UI image validator 322 may also enable model 326 to be continuously refined and/or retrained 340 based on the outcome of the validation of application GUI 312. In this manner, model generator 324 may be configured to continuously retrain and/or refine model 326 based on user input (or lack thereof), thereby further improving the accuracy of the model and the automated validation of an application GUI. [Col. 16 ll. 1-5] model 326 may be configured to classify objects only where a measure of confidence exceeds a threshold value (e.g., a 90% confidence value) that may be predetermined and/or configurable in a similar manner as described above. The retraining of the model occurs after the model recognizes a screenshot with a confidence value during validation by UI image validator 322.)
Regarding dependent claim 14, Puszkiewicz discloses all the features with respect to claim 12 as outlined above.
Puszkiewicz teaches: wherein the individual user interactions comprise button presses, entry of single characters or character sequences, selection of active UI elements, menu selections, screen changes, voice inputs, gestures, providing biometric information, haptic interactions, or a combination thereof. (Puszkiewicz − [Col. 8 ll. 5-27] graphical objects present in application GUI 312 are not limited to the above illustrative examples but may include any other types of selectable or non-selectable elements that may be displayed, including icons, buttons, lists, menus, toolbars, etc. In other examples, however, GUI elements such as completion list elements or other selectable options (e.g., pop-up dialogs, windows, etc.) may be presented on application GUI 312 in dynamic locations. As a result, application GUI 312 may comprise elements for which locations may differ depending on the type and location of received interactions.)
Regarding dependent claim 15, Puszkiewicz discloses all the features with respect to claim 11 as outlined above.
Puszkiewicz teaches: wherein the other information comprises a web browser history, one or more heat maps, key presses, mouse clicks, locations of mouse clicks and/or graphical elements on the display that a user is interacting with, locations where the user was looking on the display, time stamps associated with the screenshots or video frames, text that the user entered, content that the user scrolled past, a time that the user stopped on a part of content shown in the display, what application the user is interacting with, voice inputs, gestures, emotion information, biometrics, information pertaining to periods of no user activity, haptic information, multi-touch input information, or a combination thereof. (Puszkiewicz – [Col. 8 ll. 5-27, 38-52] In some implementations, test script 302 may comprise one or more automated pointing device interactions, keyboard interactions, voice-based interactions, etc. Illustrative examples of such automated interactions include, but are not limited to, hovering over and/or clicking on interactive elements in application GUI 312 via a pointing device interaction (e.g., by sending commands to application GUI 312 to move a pointing device), typing characters or strings in application GUI 312, transmitting a voice-based interaction to application GUI 312, and/or any other type of interaction resembling a user interaction of the GUI.)
Regarding independent claim 18, Puszkiewicz teaches: A computer-implemented method for 
training an artificial intelligence (AI) / machine learning (ML) model to recognize applications, screens, and user interface (UI) elements using computer vision (CV) (Examiner Notes: computer vision (CV) as non-textual visual components in an image such as a click, and/or hovering) (Puszkiewicz – Fig. 1, Server 106, Fig. 2 flowchart, [Col. 11 ll. 50-55] Fig.3, For example, model 326 may analyze an image using one or more suitable image analysis techniques known and appreciated to those skilled in the relevant art, to analyze each captured image to locate graphical elements present in the image, and classify such elements. As will be described in greater detail below, model 326 may comprise a machine-learning based model that is trained by model generator 324.)
and to recognize user interactions with the applications, screens, and UI elements, (Puszkiewicz – [Col. 8 ll. 38-52] In some implementations, test script 302 may comprise one or more automated pointing device interactions, keyboard interactions, voice-based interactions, etc. Illustrative examples of such automated interactions include, but are not limited to, hovering over and/or clicking on interactive elements in application GUI 312 via a pointing device interaction (e.g., by sending commands to application GUI 312 to move a pointing device), typing characters or strings in application GUI 312, transmitting a voice-based interaction to application GUI 312, and/or any other type of interaction resembling a user interaction of the GUI.)
the method comprising: accessing recorded screenshots or video frames of displays associated with one or more computing systems (Puszkiewicz – [Col. 9 ll. 40-45] Image capturer 306 may capture images as screenshots, or portions of screenshots representing application GUI 312.)
and accessing other information associated with the one or more computing systems; (Puszkiewicz – [Col. 10 ll. 1-6] In examples, image capturer 306 may store each captured image in a storage device, such as image storage 316. Image storage 316 may comprise any suitable storage device for storing hundreds, thousands, or even a greater number of images representing application GUI 312 (or a plurality of application GUIs).)
initially training the AI/ML model to recognize the applications, screens, and UI elements that are present in the recorded screenshots or video frames using the recorded screenshots or video frames and the other information; (Puszkiewicz – [Col. 15 ll. 9-19] In example implementations, model generator 324 may comprise one or more suitable machine-learning algorithms for training 342 model 326 for classifying graphical objects. Model generator 324 may comprise any suitable classification algorithm, [Col. 15 ll. 20-45] object classifier 320 may implement one or more OCR models or techniques to extract a letter, number, word, phrase, string, etc. associated with graphical objects detected in captured images. Object detector 318 may detect graphical objects on the captured image and object classifier 320 may classify the detected objects as described above.)
and after the AI/ML model can recognize the applications, screens, and UI elements in the recorded screenshots or video frames with a confidence, training the AI/ML model to recognize individual user interactions with the UI elements. (Puszkiewicz – [Cols. 13,14 ll. 47-67, 1-2] In example embodiments, UI image validator 322 may also enable model 326 to be continuously refined and/or retrained 340 based on the outcome of the validation of application GUI 312. In this manner, model generator 324 may be configured to continuously retrain and/or refine model 326 based on user input (or lack thereof), thereby further improving the accuracy of the model and the automated validation of an application GUI. [Col. 16 ll. 1-5] model 326 may be configured to classify objects only where a measure of confidence exceeds a threshold value (e.g., a 90% confidence value) that may be predetermined and/or configurable in a similar manner as described above. The retraining of the model occurs after the model recognizes a screenshot with a confidence value during validation by UI image validator 322.)
Regarding dependent claim 19, Puszkiewicz discloses all the features with respect to claim 18 as outlined above.
Puszkiewicz teaches: wherein the initial training of the AI/ML model is performed without a prior knowledge of the applications, screens, and UI elements in the screenshots or video frames. (Puszkiewicz – Fig. 2 flowchart [Col. 9 ll. 39-45] Training the model for the first time is considered training without prior knowledge, because the images used for training the model are captured in step 206. [Col. 9 ll. 39-45] In step 206, a plurality of images representing the GUI at different points in time are captured. For instance, with reference to FIG. 3, image capturer 306 may capture 330 a plurality of images representing application GUI 312 at one or more points in time.)
Regarding dependent claim 21, Puszkiewicz discloses all the features with respect to claim 19 as outlined above.
Puszkiewicz teaches: wherein the individual user interactions comprise button presses, entry of single characters or character sequences, selection of active UI elements, menu selections, screen changes, voice inputs, gestures, providing biometric information, haptic interactions, or a combination thereof. (Puszkiewicz − [Col. 8 ll. 5-27] graphical objects present in application GUI 312 are not limited to the above illustrative examples but may include any other types of selectable or non-selectable elements that may be displayed, including icons, buttons, lists, menus, toolbars, etc. In other examples, however, GUI elements such as completion list elements or other selectable options (e.g., pop-up dialogs, windows, etc.) may be presented on application GUI 312 in dynamic locations. As a result, application GUI 312 may comprise elements for which locations may differ depending on the type and location of received interactions.)
Regarding dependent claim 22, Puszkiewicz discloses all the features with respect to claim 19 as outlined above.
Puszkiewicz teaches: wherein the other information comprises a web browser history, one or more heat maps, key presses, mouse clicks, locations of mouse clicks and/or graphical elements on the display that a user is interacting with, locations where the user was looking on the display, time stamps associated with the screenshots or video frames, text that the user entered, content that the user scrolled past, a time that the user stopped on a part of content shown in the display, what application the user is interacting with, voice inputs, gestures, emotion information, biometrics, information pertaining to periods of no user activity, haptic information, multi-touch input information, or a combination thereof. (Puszkiewicz – [Col. 8 ll. 5-27, 38-52] In some implementations, test script 302 may comprise one or more automated pointing device interactions, keyboard interactions, voice-based interactions, etc. Illustrative examples of such automated interactions include, but are not limited to, hovering over and/or clicking on interactive elements in application GUI 312 via a pointing device interaction (e.g., by sending commands to application GUI 312 to move a pointing device), typing characters or strings in application GUI 312, transmitting a voice-based interaction to application GUI 312, and/or any other type of interaction resembling a user interaction of the GUI.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 3, 7, 8, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Puszkiewicz in view of Ramamurthy (US PGPUB: 20190324781 A1, Filed Date: Jun. 12, 2018, hereinafter “Ramamurthy”).
Regarding dependent claim 3, Puszkiewicz discloses all the features with respect to claim 1 as outlined above.
Puszkiewicz does not explicitly teach: wherein the training of the AI/ML model to recognize the individual user interactions with the UI elements comprises comparing two or more consecutive screenshots or video frames and determining that a typed character appeared from one screenshot to another, a button was pressed, or a menu selection occurred.
However, Ramamurthy teaches: wherein the training of the AI/ML model to recognize the individual user interactions with the UI elements comprises comparing two or more consecutive screenshots or video frames and determining that a typed character appeared from one screenshot to another, a button was pressed, or a menu selection occurred. (Ramamurthy − [0054] FIG. 9A depicts example screenshots 900A of a GUI to determine user interactions. FIG. 9A depicts a technique for identifying the text field that a user may work on based on blinking keyboard caret. The two screenshots shown in FIG. 9A represent the same text box captured at a defined time interval. In the first screenshot, the caret is not visible. In the second screenshot the caret is visible. The system may compare these two screenshots taken between a brief time interval to identify the blinking caret as shown in FIG. 9A. [0055] The method for finding the blinking caret accurately comprises: (i) Running a background thread/timer that captures screenshots of application interface every 0.5 seconds (configurable interval), (ii) Comparing the currently captured image with the image that was captured during the previous timer interval, (iii) Finding the differences between the two images, (iv) Discarding very small differences, e.g., any difference which is smaller than 2×2 pixel rectangle, (v) Identifying blinking caret/cursor, (vi) Given the region where the caret is blinking, finding the control which falls within this region, and (vii) Identifying that the control region is where the next user action/interaction may occur.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Puszkiewicz and Ramamurthy, as each invention relates to analyzing and developing models based on GUI interactions with GUI elements. Adding the teaching of Ramamurthy provides Puszkiewicz with the ability to compare screenshots when identifying GUI interactions with blinking carets, thereby providing the benefit of continuous improvement of the machine-learning based model for GUI interactions with GUI elements.
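For illustration only, the screenshot comparison steps quoted from Ramamurthy [0055] can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference's implementation: frames are modeled as plain 2D lists of grayscale values, the periodic capture timer of step (i) is omitted, all function and control names are hypothetical, and "smaller than a 2×2 pixel rectangle" is read here as either dimension being under 2 pixels.

```python
from typing import Dict, List, Optional, Tuple

Frame = List[List[int]]          # grayscale pixels, indexed frame[y][x]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def diff_box(prev: Frame, curr: Frame) -> Optional[Box]:
    """Steps (ii)-(iii): bounding box of all differing pixels, or None."""
    changed = [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(prev, curr))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs), max(ys))


def is_significant(box: Box, min_size: int = 2) -> bool:
    """Step (iv): keep a difference only if it spans min_size x min_size."""
    left, top, right, bottom = box
    return right - left + 1 >= min_size and bottom - top + 1 >= min_size


def control_at(box: Box, controls: Dict[str, Box]) -> Optional[str]:
    """Step (vi): name of the first control whose bounds contain the box."""
    left, top, right, bottom = box
    for name, (cl, ct, cr, cb) in controls.items():
        if cl <= left and ct <= top and right <= cr and bottom <= cb:
            return name
    return None


def next_interaction_target(
    prev: Frame, curr: Frame, controls: Dict[str, Box]
) -> Optional[str]:
    """Steps (ii)-(vii): control where the next interaction is expected."""
    box = diff_box(prev, curr)
    if box is None or not is_significant(box):
        return None
    return control_at(box, controls)
```

Given two frames that differ only in a small block of pixels inside one control's bounds, `next_interaction_target` returns that control's name; a single-pixel difference is discarded as noise and yields `None`.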
Regarding dependent claim 7, Puszkiewicz discloses all the features with respect to claim 1 as outlined above.
Puszkiewicz does not explicitly teach: wherein the respective recorder processes are implemented as feedback loop processes that continuously or periodically compare a current screenshot or video frame to a previous screenshot or video frame and identify one or more locations where changes between the current screenshot or video frame and the previous screenshot or video frame occurred.
However, Ramamurthy teaches: wherein the respective recorder processes are implemented as feedback loop processes that continuously or periodically compare a current screenshot or video frame to a previous screenshot or video frame and identify one or more locations where changes between the current screenshot or video frame and the previous screenshot or video frame occurred. (Ramamurthy − [0054] FIG. 9A depicts example screenshots 900A of a GUI to determine user interactions. FIG. 9A depicts a technique for identifying the text field that a user may work on based on blinking keyboard caret. The two screenshots shown in FIG. 9A represent the same text box captured at a defined time interval. In the first screenshot, the caret is not visible. In the second screenshot the caret is visible. The system may compare these two screenshots taken between a brief time interval to identify the blinking caret as shown in FIG. 9A. [0055] The method for finding the blinking caret accurately comprises: (i) Running a background thread/timer that captures screenshots of application interface every 0.5 seconds (configurable interval), (ii) Comparing the currently captured image with the image that was captured during the previous timer interval, (iii) Finding the differences between the two images, (iv) Discarding very small differences, e.g., any difference which is smaller than 2×2 pixel rectangle, (v) Identifying blinking caret/cursor, (vi) Given the region where the caret is blinking, finding the control which falls within this region, and (vii) Identifying that the control region is where the next user action/interaction may occur.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Puszkiewicz and Ramamurthy, as each invention relates to analyzing and developing models based on GUI interactions with GUI elements. Adding the teaching of Ramamurthy provides Puszkiewicz with the ability to compare screenshots when identifying GUI interactions with blinking carets, thereby providing the benefit of continuous improvement of the machine-learning-based model for GUI interaction with GUI elements.
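For illustration only, the screenshot-comparison steps quoted from Ramamurthy [0055] (compare the current capture with the previous one, find the differences, discard very small differences) can be sketched as follows. This is a minimal sketch and not code from either reference: the function names `changed_regions` and `diff_stream`, the representation of a screenshot as a 2-D list of pixel values, the grouping of changed pixels by 4-connectivity, and the reading of "smaller than 2×2" as a bounding-box area below 4 pixels are all assumptions made for the example.

```python
from collections import deque


def changed_regions(prev, curr, min_area=4):
    """Return bounding boxes (top, left, bottom, right) of changed pixel groups.

    prev and curr are equally sized 2-D lists of pixel values (an assumption).
    Groups whose bounding-box area is below min_area are discarded as noise.
    """
    rows, cols = len(curr), len(curr[0])
    diff = [[prev[r][c] != curr[r][c] for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not diff[r][c] or seen[r][c]:
                continue
            # Flood-fill one connected group of changed pixels (4-connectivity).
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and diff[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = (bottom - top + 1) * (right - left + 1)
            if area >= min_area:
                boxes.append((top, left, bottom, right))
    return boxes


def diff_stream(frames):
    """Yield the changed regions for each consecutive pair of frames."""
    previous = None
    for frame in frames:
        if previous is not None:
            yield changed_regions(previous, frame)
        previous = frame


# Usage: a 1-pixel-wide, 4-pixel-tall caret appears, plus one stray noise pixel.
before = [[0] * 6 for _ in range(6)]
after = [row[:] for row in before]
for y in range(1, 5):
    after[y][2] = 1
after[5][5] = 1
print(changed_regions(before, after))
```

In a recorder of the kind discussed, `diff_stream` would presumably be fed by a timer that captures a screenshot at a configurable interval; that capture step is omitted here because it depends on the platform.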
Regarding dependent claim 8, Puszkiewicz in view of Ramamurthy discloses all the features with respect to claim 7 as outlined above.
Puszkiewicz teaches: wherein the respective recorder processes are further configured to: perform optical character recognition (OCR) on the one or more locations where the changes occurred; (Puszkiewicz – [Col. 12 ll. 6-9] GUI validator 112 may also implement one or more optical character recognition (OCR) techniques to extract one or more alphanumeric characters from a captured image (or a subset thereof). [Col. 18 ll. 14-21] Validator UI enabling a user to view, modify bounded regions of an image. For instance, a predetermined interaction, such as clicking on a bounded region or hovering over a bounded region with a pointing device may cause application the validator interface to display the region identifier associated with the bounded image.)
compare results of the OCR to content of a keyboard queue to determine whether a match exists; and when a match exists, link text associated with the match to a respective location. (Puszkiewicz – [Col. 15 ll. 34-45] Upon classification, object classifier 320 may be configured to extract text from one or more classified objects and UI image validator 322 may compare the text expected to be present in the image (e.g., based on one or more associated tags) with the text extracted from the captured image. In this way, additional validation may be performed on images (or portions thereof) representing application GUI 312. It is noted that OCR techniques or models may be implemented as part of object classifier 320 or may be implemented separate from object classifier 320 in examples)
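As a purely illustrative reading of the limitation recited above (comparing OCR results to the content of a keyboard queue and linking matched text to a location), the following minimal sketch may help. It is not taken from Puszkiewicz or from the application: the function name `link_typed_text`, the representation of OCR output as `(text, location)` pairs, and the use of a substring test as the matching rule are assumptions.

```python
from collections import deque


def link_typed_text(ocr_results, keyboard_queue):
    """Map each location to its OCR text when that text was recently typed.

    ocr_results: iterable of (text, location) pairs for the changed regions.
    keyboard_queue: recently pressed characters in order (oldest first).
    """
    typed = "".join(keyboard_queue)
    links = {}
    for text, location in ocr_results:
        # A match exists when the recognized text appears in the typed characters.
        if text and text in typed:
            links[location] = text
    return links


# Usage: only the region whose text matches the keystrokes is linked.
queue = deque("hello")
results = [("hello", (10, 20)), ("Submit", (50, 60))]
print(link_typed_text(results, queue))
```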
Regarding dependent claim 13, Puszkiewicz discloses all the features with respect to claim 12 as outlined above.
Puszkiewicz does not explicitly teach: wherein the training of the AI/ML model to recognize the individual user interactions with the UI elements comprises comparing two or more consecutive screenshots or video frames and determining that a typed character appeared from one to another, a button was pressed, or a menu selection occurred.
However, Ramamurthy teaches: wherein the training of the AI/ML model to recognize the individual user interactions with the UI elements comprises comparing two or more consecutive screenshots or video frames and determining that a typed character appeared from one to another, a button was pressed, or a menu selection occurred. (Ramamurthy − [0054] FIG. 9A depicts example screenshots 900A of a GUI to determine user interactions. FIG. 9A depicts a technique for identifying the text field that a user may work on based on blinking keyboard caret. The two screenshots shown in FIG. 9A represent the same text box captured at a defined time interval. In the first screenshot, the caret is not visible. In the second screenshot the caret is visible. The system may compare these two screenshots taken between a brief time interval to identify the blinking caret as shown in FIG. 9A. [0055] The method for finding the blinking caret accurately comprises: (i) Running a background thread/timer that captures screenshots of application interface every 0.5 seconds (configurable interval), (ii) Comparing the currently captured image with the image that was captured during the previous timer interval, (iii) Finding the differences between the two images, (iv) Discarding very small differences, e.g., any difference which is smaller than 2×2 pixel rectangle, (v) Identifying blinking caret/cursor, (vi) Given the region where the caret is blinking, finding the control which falls within this region, and (vi) Identifying that the control region is where the next user action/interaction may occur.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Puszkiewicz and Ramamurthy, as each invention relates to analyzing and developing models based on GUI interactions with GUI elements. Adding the teaching of Ramamurthy provides Puszkiewicz with the ability to compare screenshots when identifying GUI interactions with blinking carets, thereby providing the benefit of continuous improvement of the machine-learning-based model for GUI interaction with GUI elements.
Regarding dependent claim 20, Puszkiewicz discloses all the features with respect to claim 19 as outlined above.
Puszkiewicz does not explicitly teach: wherein the training of the AI/ML model to recognize the individual user interactions with the UI elements comprises comparing two or more consecutive screenshots or video frames and determining that a typed character appeared from one to another, a button was pressed, or a menu selection occurred.
However, Ramamurthy teaches: wherein the training of the AI/ML model to recognize the individual user interactions with the UI elements comprises comparing two or more consecutive screenshots or video frames and determining that a typed character appeared from one to another, a button was pressed, or a menu selection occurred. (Ramamurthy − [0054] FIG. 9A depicts example screenshots 900A of a GUI to determine user interactions. FIG. 9A depicts a technique for identifying the text field that a user may work on based on blinking keyboard caret. The two screenshots shown in FIG. 9A represent the same text box captured at a defined time interval. In the first screenshot, the caret is not visible. In the second screenshot the caret is visible. The system may compare these two screenshots taken between a brief time interval to identify the blinking caret as shown in FIG. 9A. [0055] The method for finding the blinking caret accurately comprises: (i) Running a background thread/timer that captures screenshots of application interface every 0.5 seconds (configurable interval), (ii) Comparing the currently captured image with the image that was captured during the previous timer interval, (iii) Finding the differences between the two images, (iv) Discarding very small differences, e.g., any difference which is smaller than 2×2 pixel rectangle, (v) Identifying blinking caret/cursor, (vi) Given the region where the caret is blinking, finding the control which falls within this region, and (vi) Identifying that the control region is where the next user action/interaction may occur.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Puszkiewicz and Ramamurthy, as each invention relates to analyzing and developing models based on GUI interactions with GUI elements. Adding the teaching of Ramamurthy provides Puszkiewicz with the ability to compare screenshots when identifying GUI interactions with blinking carets, thereby providing the benefit of continuous improvement of the machine-learning-based model for GUI interaction with GUI elements.

Claim(s) 5, 6, 16, 17 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Puszkiewicz in view of Gupta (US PGPUB: 20190251707 A1, Filed Date: Feb. 15, 2018, hereinafter “Gupta”).
Regarding dependent claim 5, Puszkiewicz discloses all the features with respect to claim 1 as outlined above.
Puszkiewicz does not explicitly teach: wherein the one or more user computing systems or the server are configured to generate one or more heat maps, the other information comprising the one or more heat maps, and the one or more heat maps comprise a frequency that a user used applications, a frequency that the user interacted with components of the applications, locations of the components in the applications, content of the applications and/or components, or a combination thereof.
However, Gupta teaches: wherein the one or more user computing systems or the server are configured to generate one or more heat maps, the other information comprising the one or more heat maps, and the one or more heat maps comprise a frequency that a user used applications, a frequency that the user interacted with components of the applications, locations of the components in the applications, content of the applications and/or components, or a combination thereof. (Gupta − [0043] The eye gaze data 134 is thus data from which a saliency score for each content element in the training content items can be determined. the system may store the pixel-level heat map for each content item as part of the eye gaze data 134.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Puszkiewicz and Gupta, as each invention relates to analyzing and developing models based on GUI interactions with GUI elements. Adding the teaching of Gupta provides Puszkiewicz with the ability to generate heat maps for GUI interaction, thereby providing the benefit of continuous improvement of the machine-learning-based model for GUI interaction with GUI elements.
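For illustration only, a heat map recording how frequently a user interacted with locations in an application window could be accumulated as in the minimal sketch below. This is not the eye-gaze heat map of Gupta [0043] nor code from the application: the function name `build_heat_map`, the use of click coordinates as the interaction data, and the fixed cell size are assumptions.

```python
def build_heat_map(points, width, height, cell=10):
    """Count interactions per grid cell of a width x height window.

    points: iterable of (x, y) pixel coordinates of user interactions.
    Returns a list of rows; each entry is the interaction count for one cell.
    """
    cols = (width + cell - 1) // cell   # round up so edge pixels get a cell
    rows = (height + cell - 1) // cell
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # Ignore interactions that fall outside the window.
        if 0 <= x < width and 0 <= y < height:
            grid[y // cell][x // cell] += 1
    return grid


# Usage: two clicks in the top-left cell, one in the bottom-right cell.
print(build_heat_map([(5, 5), (7, 3), (25, 15)], width=30, height=20))
```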
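Regarding dependent claim 6, Puszkiewicz in view of Gupta discloses all the features with respect to claim 5 as outlined above.
Puszkiewicz does not explicitly teach: wherein the one or more user computing systems or the server are configured to derive the one or more heat maps from display analysis comprising detection of typed and/or pasted text, caret tracking, active element detection, or a combination thereof.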
However, Gupta teaches: wherein the one or more user computing systems or the server are configured to derive the one or more heat maps from display analysis comprising detection of typed and/or pasted text, caret tracking, active element detection, or a combination thereof. (Gupta − [0043] The eye gaze data 134 is thus data from which a saliency score for each content element in the training content items can be determined. the system may store the pixel-level heat map for each content item as part of the eye gaze data 134.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Puszkiewicz and Gupta, as each invention relates to analyzing and developing models based on GUI interactions with GUI elements. Adding the teaching of Gupta provides Puszkiewicz with the ability to generate heat maps for GUI interaction, thereby providing the benefit of continuous improvement of the machine-learning-based model for GUI interaction with GUI elements.
Regarding dependent claim 16, Puszkiewicz discloses all the features with respect to claim 11 as outlined above.
Puszkiewicz does not explicitly teach: wherein the computer program is further configured to cause the at least one processor to: generate one or more heat maps, the other information comprising the one or more heat maps, wherein the one or more heat maps comprise a frequency that a user used one or more applications, a frequency that the user interacted with components of the one or more applications, locations of the components in the one or more applications, content of the one or more applications and/or components, or a combination thereof.
However, Gupta teaches: wherein the computer program is further configured to cause the at least one processor to: generate one or more heat maps, the other information comprising the one or more heat maps, wherein the one or more heat maps comprise a frequency that a user used one or more applications, a frequency that the user interacted with components of the one or more applications, locations of the components in the one or more applications, content of the one or more applications and/or components, or a combination thereof. (Gupta − [0043] The eye gaze data 134 is thus data from which a saliency score for each content element in the training content items can be determined. the system may store the pixel-level heat map for each content item as part of the eye gaze data 134.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Puszkiewicz and Gupta, as each invention relates to analyzing and developing models based on GUI interactions with GUI elements. Adding the teaching of Gupta provides Puszkiewicz with the ability to generate heat maps for GUI interaction, thereby providing the benefit of continuous improvement of the machine-learning-based model for GUI interaction with GUI elements.
Regarding dependent claim 17, Puszkiewicz in view of Gupta discloses all the features with respect to claim 16 as outlined above.
Puszkiewicz does not explicitly teach: wherein the one or more heat maps are derived from display analysis comprising detection of typed and/or pasted text, caret tracking, active element detection, or a combination thereof.
However, Gupta teaches: wherein the one or more heat maps are derived from display analysis comprising detection of typed and/or pasted text, caret tracking, active element detection, or a combination thereof. (Gupta − [0043] The eye gaze data 134 is thus data from which a saliency score for each content element in the training content items can be determined. the system may store the pixel-level heat map for each content item as part of the eye gaze data 134.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Puszkiewicz and Gupta, as each invention relates to analyzing and developing models based on GUI interactions with GUI elements. Adding the teaching of Gupta provides Puszkiewicz with the ability to generate heat maps for GUI interaction, thereby providing the benefit of continuous improvement of the machine-learning-based model for GUI interaction with GUI elements.
Regarding dependent claim 23, Puszkiewicz discloses all the features with respect to claim 19 as outlined above.
Puszkiewicz does not explicitly teach: further comprising: generating one or more heat maps, the other information comprising the one or more heat maps, wherein the one or more heat maps comprise a frequency that a user used one or more applications, a frequency that the user interacted with components of the one or more applications, locations of the components in the one or more applications, content of the one or more applications and/or components, or a combination thereof, and the one or more heat maps are derived from display analysis comprising detection of typed and/or pasted text, caret tracking, active element detection, or a combination thereof.
However, Gupta teaches: further comprising: generating one or more heat maps, the other information comprising the one or more heat maps, wherein the one or more heat maps comprise a frequency that a user used one or more applications, a frequency that the user interacted with components of the one or more applications, locations of the components in the one or more applications, content of the one or more applications and/or components, or a combination thereof, and the one or more heat maps are derived from display analysis comprising detection of typed and/or pasted text, caret tracking, active element detection, or a combination thereof. (Gupta − [0043] The eye gaze data 134 is thus data from which a saliency score for each content element in the training content items can be determined. the system may store the pixel-level heat map for each content item as part of the eye gaze data 134.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Puszkiewicz and Gupta, as each invention relates to analyzing and developing models based on GUI interactions with GUI elements. Adding the teaching of Gupta provides Puszkiewicz with the ability to generate heat maps for GUI interaction, thereby providing the benefit of continuous improvement of the machine-learning-based model for GUI interaction with GUI elements.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
D4: Carmi US-9448908-B2: modeling an application based on GUI events and information via screenshots
D5: Jelveh US-20190087691-A1: machine learning system for determining user action and intent based on screen image analysis (screenshots)
D6: Kochura US-10474440-B2: recognizing UI elements of an application using computer-vision based training information (screenshots)
D7: P K US-20200019418-A1: capturing screenshots of user interface actions within test event data for training machine learning models
D8: Fernandes US-20200249964-A1: automatic detection of user interface elements via screenshots
D9: Singh US-11513670-B2: learning user interface controls via data synthesis via screenshots

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARL E BARNES JR whose telephone number is (571)270-3395. The examiner can normally be reached Monday-Friday 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula, can be reached at 571-272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CARL E BARNES JR/Examiner, Art Unit 2177    

/CESAR B PAULA/Supervisory Patent Examiner, Art Unit 2177