DETAILED ACTION
The action is responsive to the amendment filed on 06/22/2022. Claims 1 and 3-11 are pending in the case. Claims 1 and 6 are independent claims.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6 and 8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

As to claim 6, the claim recites “receives an output of said to thereby assign data”, which is grammatically incomplete because no element is named after “said”. For the purposes of examination, Examiner has interpreted the claim as reciting “receives an output of said recognition neural network to thereby assign data”.

As to claim 8, the claim recites “assign text data recognized by to said at least one area”, which is grammatically incomplete because no element is named after “recognized by”. For the purposes of examination, Examiner has interpreted the claim as reciting “assign text data recognized by said text processing neural network to said at least one area”.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1 and 3-11 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Ang (US 20190018675 A1).

As to claim 1, Ang discloses a method for performing at least one task involving at least one interaction with a user interface, the method being executed by a processor (Ang Figure 1 101), the method comprising:
a) receiving said user interface ("the controller is configured to: receive a plurality of images of one or more GUIs of one or more software applications,” Ang paragraph 0018);
b) segmenting said user interface using a segmentation neural network to determine different areas of said user interface ("the controller is configured to: receive a plurality of images of one or more GUIs of one or more software applications, analyze the one or more GUI images via an image recognition algorithm to identify a position and associate a function for each of one or more user input controls found in the images," Ang paragraph 0018; "Once downloaded, the images are examined by one or more pieces of code which are capable of machine learning (e.g., artificially intelligent program(s)) which analyze the format, placement of text and buttons, data entry fields, etc. to automatically generate executable code that can be used to automatically control the software," Ang paragraph 0041; “In one example, tens of thousands of ‘software application screenshot’ images are fed to Elmo for the bot to learn to accurately identify the patterns of given software application controls. The system applies a mask to the Training Data using a convolutional neural network such as, for example, Faster RCNN. In another example, a fully convolutional network is used to improve the training model,” Ang paragraph 0008; “The software components of the bot are built to enable the bot to click, focus, enter text, select checkboxes, select drop down options, and locate labels. These functions may be build, for example, in C+ and Python. The bot performs the above actions in combination with the software application controls and a classifier program based on convolutional neural networks (CNNs) to build the model,” Ang paragraph 0009, using a neural network to locate areas of the UI that contain buttons and labels (i.e., segmenting the UI));
c) analyzing said user interface using a recognition neural network to determine data associated with each of said different areas ("In a further example, the bot uses Natural Language Processing (NLP) algorithms to parse user instructions. The bot may be taught how to interpret human commands (text (string)/chat/email) and convert those user instruction sets into executable instructions, such as clicking, selecting, text input, etc," Ang paragraph 0012; “In one example, tens of thousands of “software application screenshot” images are fed to Elmo for the bot to learn to accurately identify the patterns of given software application controls. The system applies a mask to the Training Data using a convolutional neural network such as, for example, Faster RCNN. In another example, a fully convolutional network is used to improve the training model,” Ang paragraph 0008; “The software components of the bot are built to enable the bot to click, focus, enter text, select checkboxes, select drop down options, and locate labels. These functions may be build, for example, in C+ and Python. The bot performs the above actions in combination with the software application controls and a classifier program based on convolutional neural networks (CNNs) to build the model,” Ang paragraph 0009, using a neural network to determine controls and interpreting text instructions to use those controls);
d) determining, using a text processing neural network, which areas in said user interface contain data relevant to said at least one task by associating the relevant data with corresponding relevant areas, said determining comprising ignoring interface areas irrelevant to said at least one task ("An example of executable instructions may be 'Enter First Name”=>“Textbox Input' (90-97% probability) 'Parameters: First Name'. At run-time, the bot looks to identify the appropriate control through which to enter the first name. For example, the bot may identify the control based on what has been learned from prior examination of related screen shots or else it will try to locate the control based on label/text extraction and algorithms to find the control for 'First Name.' The words 'First Name' immediately adjacent to a textbox input indicate a high probability for that input to be the correct control," Ang paragraph 0013; Ang Figure 7 701 “Elmo, Please add ‘Bob Smith’ With email bsmith@o.com to the accounting system” command with name and email data; “For example: a user could email the following command to a bot (the bot is referred to herein ‘Elmo’): ‘Elmo, please add user ‘Nancy Smith’ to system’. The ‘Elmo’ bot would previously have the following computerized set of controls trained on this task: (1) Open program ‘User management system.exe’; (2) Click the tab Users; (3) Enter first name and last name in textboxes; (4) Click Submit. As described further herein, the Elmo bot knows to use the instructions above for the task assigned by a system user. The bot then executes the task and sends a confirmation to the user once complete,” Ang paragraph 0007, the users tab, first name and last name textboxes and submit button are all interacted with while no other control or part of the screen is interacted with (i.e., it is ignored));
e) executing said at least one task by executing at least one interaction with either: 
- at least one of said relevant areas determined in step d) ("At run-time, the bot looks to identify the appropriate control through which to enter the first name," Ang paragraph 0013; "For example, using voice commands, an end user may instruct the system 10 to please add 'Bob Smith' to a QuickBooks contact page. The system 10 can discern from these spoken commands that it should carry out the action of adding 'Bob Smith' to the QuickBooks contact page... The system 10 will then proceed to analyze both the spoken commands as well as the screen shot, determine the program control(s) to utilize (or create them), then finally carry out the action specified," Ang paragraph 0064); or 
- relevant data contained in said at least one of said areas.

As to claim 3, Ang further discloses a method according to claim 1, wherein b) further comprises determining which areas of said user interface can be activated ("the controller is configured to: receive a plurality of images of one or more GUIs of one or more software applications, analyze the one or more GUI images via an image recognition algorithm to identify a position and associate a function for each of one or more user input controls found in the images," Ang paragraph 0018, identifying the positions of user input controls (i.e., which areas of the user interface can be activated)).

As to claim 4, Ang further discloses a method according to claim 1, further comprising a step of determining which areas of said user interface comprises at least one field into which data is to be entered ("the controller is configured to: receive a plurality of images of one or more GUIs of one or more software applications, analyze the one or more GUI images via an image recognition algorithm to identify a position and associate a function for each of one or more user input controls found in the images," Ang paragraph 0018; "At run-time, the bot looks to identify the appropriate control through which to enter the first name," Ang paragraph 0013, identifying the positions of text user input controls (i.e., which area of the user interface into which data can be entered)).

As to claim 5, Ang further discloses a method according to claim 1, wherein said at least one task includes at least one of: 
- copying data into a data entry field in said user interface ("The execution instructions may be instructions to click a button, input text, select checkboxes, selects drop down options within an application, etc. or any combination thereof," Ang paragraph 0019); 
- activating at least one button on said user interface ("The execution instructions may be instructions to click a button, input text, select checkboxes, selects drop down options within an application, etc. or any combination thereof," Ang paragraph 0019); 
- copying data from at least one area in said user interface; and 
- selecting data from at least one area in said user interface ("The execution instructions may be instructions to click a button, input text, select checkboxes, selects drop down options within an application, etc. or any combination thereof," Ang paragraph 0019, system can automatically select checkboxes or drop down options (i.e., selecting data from an area in the user interface)).

As to claim 6, Ang discloses a system for determining components of a user interface, the system comprising: 
a processor (“These AI programs are run by the server's 100 central processing unit (CPU) 101 and stored upon its memory 102,” Ang paragraph 0037); and 
a non-transitory storage medium operatively connected to the processor, the non-transitory storage medium storing computer-readable instructions, the processor, upon executing the computer-readable instructions (“These AI programs are run by the server's 100 central processing unit (CPU) 101 and stored upon its memory 102,” Ang paragraph 0037), being configured for:
- determining, using a segmentation neural network, different areas in said user interface ("the controller is configured to: receive a plurality of images of one or more GUIs of one or more software applications, analyze the one or more GUI images via an image recognition algorithm to identify a position and associate a function for each of one or more user input controls found in the images," Ang paragraph 0018; “In this embodiment, the centralized server 100 hosts one or more artificial intelligence (AI) programs, algorithms, etc. These AI programs are run by the server's 100 central processing unit (CPU) 101 and stored upon its memory 102. The AI programs conduct various tasks to analyze and learn how to control various other software programs 160,” Ang paragraph 0037; "Once downloaded, the images are examined by one or more pieces of code which are capable of machine learning (e.g., artificially intelligent program(s)) which analyze the format, placement of text and buttons, data entry fields, etc. to automatically generate executable code that can be used to automatically control the software," Ang paragraph 0041; “In one example, tens of thousands of ‘software application screenshot’ images are fed to Elmo for the bot to learn to accurately identify the patterns of given software application controls. The system applies a mask to the Training Data using a convolutional neural network such as, for example, Faster RCNN. In another example, a fully convolutional network is used to improve the training model,” Ang paragraph 0008; “The software components of the bot are built to enable the bot to click, focus, enter text, select checkboxes, select drop down options, and locate labels. These functions may be build, for example, in C+ and Python. The bot performs the above actions in combination with the software application controls and a classifier program based on convolutional neural networks (CNNs) to build the model,” Ang paragraph 0009, using a neural network to locate areas of the UI that contain buttons and labels (i.e., segmenting the UI));
- determining, using a recognition neural network, a function for at least one of said different areas in said user interface ("the controller is configured to: receive a plurality of images of one or more GUIs of one or more software applications, analyze the one or more GUI images via an image recognition algorithm to identify a position and associate a function for each of one or more user input controls found in the images," Ang paragraph 0018; “In one example, tens of thousands of ‘software application screenshot’ images are fed to Elmo for the bot to learn to accurately identify the patterns of given software application controls. The system applies a mask to the Training Data using a convolutional neural network such as, for example, Faster RCNN. In another example, a fully convolutional network is used to improve the training model,” Ang paragraph 0008; “The software components of the bot are built to enable the bot to click, focus, enter text, select checkboxes, select drop down options, and locate labels. These functions may be build, for example, in C+ and Python. The bot performs the above actions in combination with the software application controls and a classifier program based on convolutional neural networks (CNNs) to build the model,” Ang paragraph 0009, using a neural network to locate areas of the UI that contain buttons and labels (i.e., segmenting the UI)); 
- determining, using a text processing neural network, data associated with at least one of said different areas in said user interface ("In a further example, the bot uses Natural Language Processing (NLP) algorithms to parse user instructions. The bot may be taught how to interpret human commands (text (string)/chat/email) and convert those user instruction sets into executable instructions, such as clicking, selecting, text input, etc," Ang paragraph 0012); 
wherein 
- said recognition neural network receives an output of said segmentation neural network to thereby assign functions to said at least one area in said user interface ("the controller is configured to: receive a plurality of images of one or more GUIs of one or more software applications, analyze the one or more GUI images via an image recognition algorithm to identify a position and associate a function for each of one or more user input controls found in the images," Ang paragraph 0018); and
- said text processing neural network receives an output of said recognition neural network to thereby assign data to be associated with said at least one area in said user interface ("An example of executable instructions may be 'Enter First Name’=>’Textbox Input' (90-97% probability) 'Parameters: First Name'. At run-time, the bot looks to identify the appropriate control through which to enter the first name. For example, the bot may identify the control based on what has been learned from prior examination of related screen shots or else it will try to locate the control based on label/text extraction and algorithms to find the control for 'First Name.' The words 'First Name' immediately adjacent to a textbox input indicate a high probability for that input to be the correct control," Ang paragraph 0013; Ang Figure 7 701 “Elmo, Please add ‘Bob Smith’ With email bsmith@o.com to the accounting system” command with name and email data; “For example: a user could email the following command to a bot (the bot is referred to herein ‘Elmo’): ‘Elmo, please add user ‘Nancy Smith’ to system’. The ‘Elmo’ bot would previously have the following computerized set of controls trained on this task: (1) Open program ‘User management system.exe’; (2) Click the tab Users; (3) Enter first name and last name in textboxes; (4) Click Submit. As described further herein, the Elmo bot knows to use the instructions above for the task assigned by a system user. The bot then executes the task and sends a confirmation to the user once complete,” Ang paragraph 0007).
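
For illustration only, the data flow recited in the two "wherein" clauses of claim 6 (segmentation output feeding recognition, and recognition output feeding text processing) can be sketched as below. This is a minimal sketch under stated assumptions; it is not code from Ang or from the application and is not a construction of the claim. The `Area` type, the `run_pipeline` function, and the three `stub_*` stages are hypothetical placeholders standing in for trained neural networks.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Area:
    """One area of the user interface (hypothetical representation)."""
    bbox: Tuple[int, int, int, int]  # x, y, width, height in pixels
    function: str = ""               # filled in by the recognition stage
    data: str = ""                   # filled in by the text processing stage


def run_pipeline(
    image: object,
    segment: Callable[[object], List[Area]],
    recognize: Callable[[object, List[Area]], List[Area]],
    assign_text: Callable[[object, List[Area]], List[Area]],
) -> List[Area]:
    areas = segment(image)             # stage 1: find areas
    areas = recognize(image, areas)    # stage 2 consumes the output of stage 1
    return assign_text(image, areas)   # stage 3 consumes the output of stage 2


# Hypothetical stubs; a real system would call trained models here.
def stub_segment(image: object) -> List[Area]:
    return [Area(bbox=(100, 10, 150, 20)), Area(bbox=(10, 100, 60, 25))]


def stub_recognize(image: object, areas: List[Area]) -> List[Area]:
    # Toy rule: wide areas are treated as textboxes, narrow ones as buttons.
    return [replace(a, function="textbox" if a.bbox[2] > 100 else "button")
            for a in areas]


def stub_assign_text(image: object, areas: List[Area]) -> List[Area]:
    # Toy rule: attach fixed text depending on the function assigned above.
    return [replace(a, data="First Name" if a.function == "textbox" else "Submit")
            for a in areas]


result = run_pipeline(None, stub_segment, stub_recognize, stub_assign_text)
```

Each stage returns new `Area` values rather than mutating its input, which keeps the dependency of each stage on the output of the previous stage explicit.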

As to claim 7, Ang further discloses the system according to claim 6, wherein said text processing neural network is configured to recognize text data in said user interface ("The one or more GUI images may be further analyzed by a text extraction program to identify a position and associate a function for each of one or more user input controls found in the images. The text extraction program may utilize optical character recognition," Ang paragraph 0019).

As to claim 8, Ang further discloses the system according to claim 6, wherein said text processing neural network is configured to assign text data recognized by said text processing neural network to said at least one area in said user interface ("The one or more GUI images may be further analyzed by a text extraction program to identify a position and associate a function for each of one or more user input controls found in the images. The text extraction program may utilize optical character recognition. The identified position of an input control may be a centroid position of the input control. The execution instructions may be instructions to click a button, input text, select checkboxes, selects drop down options within an application, etc. or any combination thereof," Ang paragraph 0019; "Alternatively, if the program control is a text box or combo box, the control finder program 510 is programmed to check directly to the right or directly below where an indicated label is found. For example, if the label is ‘First Name’ the system 10 will use text extraction to get coordinates of the ‘First Name’ label, and then locate the program control nearest to the right or directly below the label," Ang paragraph 0055; "An example of executable instructions may be 'Enter First Name”=>“Textbox Input' (90-97% probability) 'Parameters: First Name'. At run-time, the bot looks to identify the appropriate control through which to enter the first name. For example, the bot may identify the control based on what has been learned from prior examination of related screen shots or else it will try to locate the control based on label/text extraction and algorithms to find the control for 'First Name.' The words 'First Name' immediately adjacent to a textbox input indicate a high probability for that input to be the correct control," Ang paragraph 0013).
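
For illustration only, the label-based control lookup quoted above from Ang paragraph 0055 (locating the program control nearest to the right of, or directly below, a label) can be sketched as below. This is a minimal sketch under stated assumptions and is not code from Ang or from the application. The `Box` type, the `find_control_for_label` function, the pixel coordinates, and the overlap test used to decide that a control is "directly to the right" or "directly below" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in screen pixels (hypothetical representation)."""
    name: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def right(self) -> float:
        return self.x + self.w

    @property
    def bottom(self) -> float:
        return self.y + self.h


def _overlaps(a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> bool:
    """True when the open intervals (a_lo, a_hi) and (b_lo, b_hi) intersect."""
    return a_lo < b_hi and b_lo < a_hi


def find_control_for_label(label: Box, controls: List[Box]) -> Optional[Box]:
    """Return the nearest control directly right of or directly below the label."""
    candidates = []
    for c in controls:
        # "Directly to the right": starts past the label and shares its rows.
        if c.x >= label.right and _overlaps(label.y, label.bottom, c.y, c.bottom):
            candidates.append((c.x - label.right, c))
        # "Directly below": starts under the label and shares its columns.
        elif c.y >= label.bottom and _overlaps(label.x, label.right, c.x, c.right):
            candidates.append((c.y - label.bottom, c))
    if not candidates:
        return None
    # Pick the candidate with the smallest gap to the label.
    return min(candidates, key=lambda pair: pair[0])[1]


label = Box("First Name", x=10, y=10, w=80, h=20)
controls = [
    Box("first_name_textbox", x=100, y=10, w=150, h=20),
    Box("last_name_textbox", x=100, y=40, w=150, h=20),
    Box("submit_button", x=10, y=100, w=60, h=25),
]
match = find_control_for_label(label, controls)
```

In this hypothetical layout the textbox on the same row as the label is selected because its gap to the label is smaller than that of the button below.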

As to claim 9, Ang further discloses the system according to claim 6, wherein an output of said system is used to execute at least one task on said user interface, said at least one task involving at least one interaction with either: 
- at least one of said areas in said user interface ("At run-time, the bot looks to identify the appropriate control through which to enter the first name," Ang paragraph 0013; "For example, using voice commands, an end user may instruct the system 10 to please add 'Bob Smith' to a QuickBooks contact page. The system 10 can discern from these spoken commands that it should carry out the action of adding 'Bob Smith' to the QuickBooks contact page... The system 10 will then proceed to analyze both the spoken commands as well as the screen shot, determine the program control(s) to utilize (or create them), then finally carry out the action specified," Ang paragraph 0064); or 
- with data contained in said at least one of said areas.

As to claim 10, Ang further discloses the method according to claim 1, wherein d) is performed independently of a location of said areas ("An example of executable instructions may be 'Enter First Name’=>’Textbox Input' (90-97% probability) 'Parameters: First Name'. At run-time, the bot looks to identify the appropriate control through which to enter the first name. For example, the bot may identify the control based on what has been learned from prior examination of related screen shots or else it will try to locate the control based on label/text extraction and algorithms to find the control for 'First Name.' The words 'First Name' immediately adjacent to a textbox input indicate a high probability for that input to be the correct control," Ang paragraph 0013; “Additionally, as the billing software is updated, changed, or replaced, the automated task(s) specified by an end user may continue to be carried out by the system 10. This is enabled its ability to identify control elements of a given piece of software. Practically, this could be useful if, for instance, the billing software changes its interface and alters the inputs needed to email out invoices. Using previous coding methods, the automated invoice email control program would also need to be updated. However, the present system 10 can identify the change, as well as determine the new steps required to send out the invoices monthly, as previously instructed. If it is still possible to conduct the instructed task, the system 10 continues to do so after adapting its program controls automatically,” Ang paragraph 0043, the locations of controls relevant to a task are determined at run-time so that a control can be located even if it has been moved (i.e., the relevant control is determined and located independently of its previous location)).

As to claim 11, Ang further discloses the method according to claim 10, further comprising, prior to d): 
determining a context of the task and a context of the user interface (“Additionally, as the billing software is updated, changed, or replaced, the automated task(s) specified by an end user may continue to be carried out by the system 10. This is enabled its ability to identify control elements of a given piece of software. Practically, this could be useful if, for instance, the billing software changes its interface and alters the inputs needed to email out invoices. Using previous coding methods, the automated invoice email control program would also need to be updated. However, the present system 10 can identify the change, as well as determine the new steps required to send out the invoices monthly, as previously instructed. If it is still possible to conduct the instructed task, the system 10 continues to do so after adapting its program controls automatically,” Ang paragraph 0043, controls and inputs needed to complete a task are identified (i.e., a context of the task) even if the UI of the application changes (i.e., a context of the UI)); and wherein 
d) is based on said determined context of the task and said determined context of the user interface (“Additionally, as the billing software is updated, changed, or replaced, the automated task(s) specified by an end user may continue to be carried out by the system 10. This is enabled its ability to identify control elements of a given piece of software. Practically, this could be useful if, for instance, the billing software changes its interface and alters the inputs needed to email out invoices. Using previous coding methods, the automated invoice email control program would also need to be updated. However, the present system 10 can identify the change, as well as determine the new steps required to send out the invoices monthly, as previously instructed. If it is still possible to conduct the instructed task, the system 10 continues to do so after adapting its program controls automatically,” Ang paragraph 0043).

Response to Arguments
Applicant's arguments filed 06/22/2022 have been fully considered but they are not persuasive. 

As to the arguments concerning limitation b) of claim 1, Ang does disclose segmentation of the user interface into different areas. In Ang, areas of the UI containing controls or labels are identified ("Once downloaded, the images are examined by one or more pieces of code which are capable of machine learning (e.g., artificially intelligent program(s)) which analyze the format, placement of text and buttons, data entry fields, etc. to automatically generate executable code that can be used to automatically control the software," Ang paragraph 0041). Identifying parts of the UI that contain controls or labels is segmentation according to the broadest reasonable interpretation of the term. Furthermore, this segmentation can be achieved by a model that is based on a convolutional neural network (“The software components of the bot are built to enable the bot to click, focus, enter text, select checkboxes, select drop down options, and locate labels. These functions may be build, for example, in C+ and Python. The bot performs the above actions in combination with the software application controls and a classifier program based on convolutional neural networks (CNNs) to build the model,” Ang paragraph 0009). Examiner further notes that Ang discloses using image recognition algorithms to identify a position of a control in the UI (“receive a plurality of images of one or more GUIs of one or more software applications, analyze the one or more GUI images via an image recognition algorithm to identify a position and associate a function for each of one or more user input controls found in the images,” Ang paragraph 0018). The images can then be further analyzed by a text extraction algorithm to determine a function for the control (“The one or more GUI images may be further analyzed by a text extraction program to identify a position and associate a function for each of one or more user input controls found in the images,” Ang paragraph 0019, the text extraction program further analyzes (i.e., after the image recognition algorithm has identified the position of the input controls) the GUI images).

As to the arguments concerning limitation c) of claim 1, Ang does disclose this limitation. In Ang, a model based on a convolutional neural network can be used to parse a user’s textual instructions (i.e., data) and determine which UI elements (i.e., different areas of the screen) are associated with the instructions ("In a further example, the bot uses Natural Language Processing (NLP) algorithms to parse user instructions. The bot may be taught how to interpret human commands (text (string)/chat/email) and convert those user instruction sets into executable instructions, such as clicking, selecting, text input, etc," Ang paragraph 0012; “The software components of the bot are built to enable the bot to click, focus, enter text, select checkboxes, select drop down options, and locate labels. These functions may be build, for example, in C+ and Python. The bot performs the above actions in combination with the software application controls and a classifier program based on convolutional neural networks (CNNs) to build the model,” Ang paragraph 0009).

As to the arguments concerning limitation d) of claim 1, Ang does disclose this limitation. As shown above, in Ang a user’s text instructions are parsed to determine which UI elements to activate to complete the user’s desired task. Ang then performs the task requested in the user’s instructions by interacting with only the UI elements needed to complete the task (i.e., other irrelevant elements are ignored) (“For example: a user could email the following command to a bot (the bot is referred to herein ‘Elmo’): ‘Elmo, please add user ‘Nancy Smith’ to system’. The ‘Elmo’ bot would previously have the following computerized set of controls trained on this task: (1) Open program ‘User management system.exe’; (2) Click the tab Users; (3) Enter first name and last name in textboxes; (4) Click Submit. As described further herein, the Elmo bot knows to use the instructions above for the task assigned by a system user. The bot then executes the task and sends a confirmation to the user once complete,” Ang paragraph 0007).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20190294641 A1 discloses determining functional and descriptive elements of application images for intelligent screen automation where icons are analyzed to determine their function for use in program automation.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL SAMWEL whose telephone number is (313)446-6549. The examiner can normally be reached Monday through Thursday 8:00-6:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Ell, can be reached at (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DANIEL SAMWEL/             Primary Examiner, Art Unit 2171