DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The Amendment filed on 5/26/2022 in Application No. 17/251,468 has been received and entered. Claims 1, 3-13 and 15-20 are now pending. Claims 2 & 14 are canceled. Claims 1, 3, 4, 6, 8, 13, 15, 16, 18 & 20 have been amended.

Response to Amendment
Applicant’s amendment necessitated new grounds of rejection. 
This action is made final in view of the new grounds of rejection.
 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 5, 7, 10, 11, 13, 16, 17, 19 & 20 are rejected under 35 U.S.C. 103 as being unpatentable over KANG et al. (U.S. Pub 2020/0020334) hereinafter Kang, in view of Vangen et al. (U.S. Pub 2019/0243883) hereinafter Vangen.

As per Claim 1, Kang teaches determining a user intent to interact with a particular graphical user interface ("GUI") based at least in part on a free-form natural language input; based on the user intent, performing a function (Fig. 4, Fig. 5A, ¶112 wherein the electronic device 101 may receive a first user utterance 501 through the microphone 280 and provide data about the first user utterance 501 to an external server including an ASR system and an intelligence system. The intelligence system may apply natural-language understanding to a text obtained by, e.g., the ASR system and determine, e.g., the user's intent, thereby generating a command including a task corresponding thereto)
comprising an interactive webpage with one or more interactive elements. (Fig. 4A, Fig. 5A-5B, ¶106, ¶108, ¶110 wherein, in a case where the first application program is a web browsing application, a screen downloaded from a server corresponding to an access URL may be transitorily or non-transitorily stored, and wherein the electronic device 101 may display a second execution screen 510 corresponding to a particular URL, and the second execution screen 510 may include the first user interface, such as a text box 511 and a keyboard 512 for text entry to the text box 511)
and automatically populating the identified interactive element with data determined from the user intent. (Fig. 5A, ¶113 wherein the electronic device 101 may obtain the first user utterance 501 “Register Study schedule on February second.” The electronic device 101 may send the data about the first user utterance 501 to the external server, and the external server may apply ASR to the received data, thus obtaining the text “Register, Study schedule on February, second.” The external server may generate a command including tasks to execute a schedule management application and register the schedule of “Study” on February 2 on the schedule management application, corresponding to the first user utterance 501 from the obtained text using the intelligence system. The external server may send the generated command to the electronic device 101, and the electronic device 101 may perform the task included in the command. As shown on the right side of FIG. 5A, the electronic device 101 may display an execution screen 520 of the schedule management application and display the result of registering the schedule 522 of “Study” on the February 2 item 521)
As set forth above, Kang teaches performing a function based on the user intent. However, Kang does not explicitly teach identifying a target visual cue to be located in the GUI; obtaining a bitmap screenshot of the GUI; using a trained machine learning model, performing object recognition processing on the bitmap screenshot of the GUI to generate output indicative of a location of a detected instance of the target visual cue in the bitmap screenshot; and, based on the location of the detected instance of the target visual cue, identifying one or more of the interactive elements of the GUI.
Vangen teaches identifying a target visual cue to be located in the GUI; (¶189 wherein computer vision object detection and classification are used on the web page bitmap to extract the location of the play icon on the video player's player controls bar)
obtaining a bitmap screenshot of the GUI; (¶47, ¶48 wherein the elements of interest in a web page can be identified from a visual analysis of the web page, including a visual analysis of the bitmap representation of the rendered web page; this can be done using appropriate computer vision techniques, such as pattern matching, optical character recognition, etc.)
using a trained machine learning model, (¶49, ¶186 wherein such visual analysis can be performed as desired, e.g. by scanning the bitmap to be displayed for the web page, to identify, e.g., and in an embodiment, visual elements, such as text (words), symbols, icons, etc., that represent elements within the web page, e.g., and in an embodiment, to identify whether a web page contains any particular, e.g. predefined, and in an embodiment selected, visual elements (e.g. icons or text) that could correspond to desired elements of interest within the web page, and wherein a neural network or machine learning based classifier may be fed with data extracted using one or more of these methods and used to detect and classify the state of a web page)
performing object recognition processing on the bitmap screenshot of the GUI (¶47, ¶48 wherein the elements of interest in a web page can be identified from a visual analysis of the web page, including a visual analysis of the bitmap representation of the rendered web page; this can be done using appropriate computer vision techniques, such as pattern matching, optical character recognition, etc.)
to generate output indicative of a location of a detected instance of the target visual cue in the bitmap screenshot; and, based on the location of the detected instance of the target visual cue, identifying one or more of the interactive elements of the GUI. (¶189 wherein computer vision object detection and classification are used on the web page bitmap to extract the location of the play icon on the video player's player controls bar, and a user input actuator emulator is then used to move the mouse to the location of the play button and to emulate a left mouse click to actuate the play button)
It would have been obvious to one having ordinary skill in the art at the time the invention was filed to utilize the teaching of Vangen, namely a browser module configured to retrieve web pages from the Internet and an analysis module (60) operable to analyze a retrieved web page to identify elements of interest in the web page, with the teaching of processing user speech of Kang, because Vangen teaches a machine learning module that can learn from analysis of and interactions with web pages how to improve its operation, such as how to better identify elements of interest in a web page and/or how to better interact with a web page, and that can analyze a user's interactions with web pages and web services that they access and correspondingly adapt and improve its operation based on that analysis. (¶109, ¶110)

As per Claim 4, the rejection of claim 1 is hereby incorporated by reference; Kang as modified further teaches further comprising: automatically submitting the data determined from the user intent; and receiving a subsequent webpage that is generated at least in part on the data determined from the user intent. (Fig. 5A, ¶113 wherein the electronic device 101 may obtain the first user utterance 501 “Register Study schedule on February second.” The electronic device 101 may send the data about the first user utterance 501 to the external server, and the external server may apply ASR to the received data, thus obtaining the text “Register, Study schedule on February, second.” The external server may generate a command including tasks to execute a schedule management application and register the schedule of “Study” on February 2 on the schedule management application, corresponding to the first user utterance 501 from the obtained text using the intelligence system. The external server may send the generated command to the electronic device 101, and the electronic device 101 may perform the task included in the command. As shown on the right side of FIG. 5A, the electronic device 101 may display an execution screen 520 of the schedule management application and display the result of registering the schedule 522 of “Study” on the February 2 item 521; as taught by Kang)

As per Claim 5, the rejection of claim 4 is hereby incorporated by reference; Kang as modified further teaches further comprising searching a uniform resource locator ("URL") or content of the subsequent webpage to determine an outcome of the automatic submitting. (Fig. 5A, ¶106, ¶110 wherein, in a case where the first application program is a web browsing application, a screen downloaded from a server corresponding to an access URL may be transitorily or non-transitorily stored, and the first user interface may be included in the downloaded screen, and wherein, when the second-type user input is entered while the first user interface is displayed, the electronic device 101 may perform the second operation on a user utterance 501 input through the microphone 280; as taught by Kang)
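For illustration only, the limitation recited in Claim 5 (searching a URL or content of a subsequent webpage to determine an outcome of an automatic submission) might be sketched as below. This is a minimal sketch and is not taken from Kang, Vangen, or the application; the function name `submission_outcome`, the failure marker strings, and the example URL are all hypothetical assumptions.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical phrases assumed to indicate that a submission did not succeed.
FAILURE_MARKERS = ("no results found", "submission failed")


def submission_outcome(url: str, page_text: str, submitted_value: str) -> str:
    """Classify the result of an automatic form submission.

    Looks for the submitted value in the query string of the subsequent
    page's URL, then in the page content. Returns "success", "failure",
    or "unknown".
    """
    text = page_text.lower()

    # Content search: an explicit failure phrase takes precedence.
    if any(marker in text for marker in FAILURE_MARKERS):
        return "failure"

    # URL search: parse_qs decodes percent-escapes and '+' into spaces.
    query_values = [
        value
        for values in parse_qs(urlparse(url).query).values()
        for value in values
    ]
    if submitted_value in query_values:
        return "success"

    # Content search: the submitted value is echoed back on the page.
    if submitted_value.lower() in text:
        return "success"

    return "unknown"


if __name__ == "__main__":
    print(
        submission_outcome(
            "https://example.com/search?q=study+schedule",
            "Results for study schedule",
            "study schedule",
        )
    )
```

The sketch checks the URL first and falls back to the page content, so either source alone is enough to report an outcome, which mirrors the "URL or content" alternative in the claim language.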

As per Claim 7, the rejection of claim 1 is hereby incorporated by reference; Kang as modified further teaches wherein the free-form natural language input takes the form of a speech input captured at a microphone, and (Fig. 4, Fig. 5A, ¶112 wherein the electronic device 101 may receive a first user utterance 501 through the microphone 280 and provide data about the first user utterance 501 to an external server including an ASR system and an intelligence system. The intelligence system may apply natural-language understanding to a text obtained by, e.g., the ASR system and determine, e.g., the user's intent, thereby generating a command including a task corresponding thereto; as taught by Kang)
the method further includes performing speech recognition processing on the speech input to generate textual output. (Fig. 5B, ¶113, ¶116 wherein the electronic device 101 may obtain the second user utterance 503 “Register Study schedule on February second.” The electronic device 101 may send the data about the second user utterance 503 to the external server, which may include an automatic speech recognition (ASR) system capable of generating text using data about an utterance and an intelligence system capable of natural-language understanding text, grasping the meaning of the text, and generating a command corresponding to the text, and the external server may apply ASR to the received data, thus obtaining the text “Register, Study schedule on February, second.” The external server may send the obtained text to the electronic device 101, and the electronic device 101 may display at least part 513 of the obtained text in the text box 511 as shown on the right side of FIG. 5B. According to various embodiments of the present invention, the electronic device 101 may be configured to input the text received from the external server to the first user interface based on the state information indicating that the first user interface is being displayed; as taught by Kang)

As per Claim 10, the rejection of claim 1 is hereby incorporated by reference; Kang as modified further teaches further comprising generating, based on the identified interactive element (¶189 wherein computer vision object detection and classification are used on the web page bitmap to extract the location of the play icon on the video player's player controls bar; as taught by Vangen)
a script (¶185 wherein Javascript may be injected in the web page and used to test for the presence of particular Javascript and/or HTML elements (functions and/or objects), and their attributes, in order to determine their current state, with the presence and/or state of one or more of these elements being used to determine the particular state of the web page; as taught by Vangen)
that is subsequently executable in association with the GUI and a subsequent free-form natural language input to trigger automatic population of the identified interactive element with data determined from a subsequent user intent determined from the subsequent free-form natural language input and submission of the data determined from the user intent via the GUI. (Fig. 5B, ¶113, ¶116 wherein the electronic device 101 may obtain the second user utterance 503 “Register Study schedule on February second.” The electronic device 101 may send the data about the second user utterance 503 to the external server, which may include an automatic speech recognition (ASR) system capable of generating text using data about an utterance and an intelligence system capable of natural-language understanding text, grasping the meaning of the text, and generating a command corresponding to the text, and the external server may apply ASR to the received data, thus obtaining the text “Register, Study schedule on February, second.” The external server may send the obtained text to the electronic device 101, and the electronic device 101 may display at least part 513 of the obtained text in the text box 511 as shown on the right side of FIG. 5B. According to various embodiments of the present invention, the electronic device 101 may be configured to input the text received from the external server to the first user interface based on the state information indicating that the first user interface is being displayed; as taught by Kang)

As per Claim 11, the rejection of claim 1 is hereby incorporated by reference; Kang as modified further teaches wherein the subsequent automatic population and submission is performed without one or more of identifying the target visual cue, performing the object recognition, or identifying the interactive element of the GUI. (Fig. 5B, ¶113, ¶116 wherein the electronic device 101 may obtain the second user utterance 503 “Register Study schedule on February second.” The electronic device 101 may send the data about the second user utterance 503 to the external server, which may include an automatic speech recognition (ASR) system capable of generating text using data about an utterance and an intelligence system capable of natural-language understanding text, grasping the meaning of the text, and generating a command corresponding to the text, and the external server may apply ASR to the received data, thus obtaining the text “Register, Study schedule on February, second.” The external server may send the obtained text to the electronic device 101, and the electronic device 101 may display at least part 513 of the obtained text in the text box 511 as shown on the right side of FIG. 5B. According to various embodiments of the present invention, the electronic device 101 may be configured to input the text received from the external server to the first user interface based on the state information indicating that the first user interface is being displayed; as taught by Kang)

Claim 13 is similar in scope to Claim 1; therefore, Claim 13 is rejected under the same rationale as Claim 1.

Claim 16 is similar in scope to Claim 4; therefore, Claim 16 is rejected under the same rationale as Claim 4.

Claim 17 is similar in scope to Claim 5; therefore, Claim 17 is rejected under the same rationale as Claim 5.

Claim 19 is similar in scope to Claim 7; therefore, Claim 19 is rejected under the same rationale as Claim 7.

Claim 20 is similar in scope to Claim 1; therefore, Claim 20 is rejected under the same rationale as Claim 1.

Claims 3 & 15 are rejected under 35 U.S.C. 103 as being unpatentable over Kang in view of Vangen as applied to claims 1 & 13 above, and further in view of Lorimor et al. (U.S. Pat 10,037,552) hereinafter Lori.

As per Claim 3, the rejection of claim 1 is hereby incorporated by reference; Kang as modified previously taught the interactive webpage and the location of the detected instance of the target visual cue. Kang as modified further teaches wherein the interactive element of the GUI is identified by a document object model ("DOM") of the interactive webpage. (¶185 wherein the state of the web page may be determined by parsing the DOM (Document Object Model) for the web page to detect the presence and visibility of particular DOM items and then determining the state based on the presence and visibility of (or lack of) one or more DOM items. The bitmap for the web page may also be processed (analyzed) with computer vision to detect and classify particular objects in the supplied bitmap, with a presence (or lack of) of one or more objects being used to determine the state of the web page; as taught by Vangen)
However, Kang as modified does not explicitly teach wherein the interactive element of the GUI is identified by comparing a document object model ("DOM") of the interactive webpage with the location of the detected instance of the target visual cue.
Lori teaches wherein the interactive element of the GUI is identified by comparing a document object model ("DOM") of the interactive webpage with the location of the detected instance of the target visual cue. (Fig. 3, col 2 lines 19-29 wherein the advertisement discovery equipment may determine from the Document Object Model (DOM) associated with a publisher web page that a particular advertisement is located at a particular location on a web page, determine one or more test points within the location of the advertisement, obtain the visible element at each test point (e.g., by requesting the visible element at the test point with a web browser and/or web crawler of the advertisement discovery equipment), obtain an element of the advertisement at that test point, and compare the visible element to the advertisement element.)
It would have been obvious to one having ordinary skill in the art at the time the invention was filed to utilize the teaching of discovery and tracking of obscured web-based advertisements of Lori with the teaching of processing user speech of Kang as modified because Lori teaches providing improved systems for discovering and tracking internet-based advertisements that can distinguish between obscured and unobscured advertisements by determining a score for the advertisement. If the visible percentage is below a threshold, the advertisement discovery equipment may determine that the advertisement is obscured and may take suitable action for an obscured advertisement. If the visible percentage is above the threshold, the advertisement discovery equipment may determine that the advertisement is visible and may take suitable action for a visible advertisement. (col. 1 lines 45-47, col. 2 lines 35-50)
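For illustration only, comparing a DOM with the location of a detected visual cue, as recited in Claim 3, might be sketched as below. This is a minimal sketch under stated assumptions and is not the method of Lori, Vangen, or the application: the `Box` and `DomElement` types, the selectors, and the coordinates are hypothetical, and each DOM element's bounding box is assumed to have been obtained already from the rendered page in the same pixel coordinates as the screenshot.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in screenshot pixel coordinates."""
    left: float
    top: float
    width: float
    height: float

    def center(self) -> Tuple[float, float]:
        return (self.left + self.width / 2, self.top + self.height / 2)

    def contains(self, x: float, y: float) -> bool:
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)

    def area(self) -> float:
        return self.width * self.height


@dataclass(frozen=True)
class DomElement:
    """Hypothetical summary of one DOM node and its rendered box."""
    selector: str
    box: Box
    interactive: bool


def element_at_cue(cue: Box, elements: Sequence[DomElement]) -> Optional[DomElement]:
    """Return the smallest interactive element whose box contains the
    center of the detected visual cue, or None if there is no such element."""
    x, y = cue.center()
    hits = [e for e in elements if e.interactive and e.box.contains(x, y)]
    # Prefer the smallest enclosing box so a button wins over its container.
    return min(hits, key=lambda e: e.box.area(), default=None)


if __name__ == "__main__":
    dom = [
        DomElement("form#search", Box(0, 0, 400, 100), interactive=False),
        DomElement("input#q", Box(10, 10, 300, 40), interactive=True),
        DomElement("button#go", Box(320, 10, 40, 40), interactive=True),
    ]
    detected_cue = Box(325, 15, 30, 30)  # e.g. output of an object detector
    match = element_at_cue(detected_cue, dom)
    print(match.selector if match else None)
```

Using the cue's center as a single test point is a simplification; several test points inside the detected box could be checked in the same way.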

Claim 15 is similar in scope to Claim 3; therefore, Claim 15 is rejected under the same rationale as Claim 3.

Claims 6, 8 & 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kang in view of Vangen as applied to claims 1, 5 & 16 above, and further in view of PHAM et al. (U.S. Pub 2021/0081475) hereinafter Pham.

As per Claim 6, the rejection of claim 5 is hereby incorporated by reference; Kang as modified previously taught the automatic submitting. However, Kang as modified does not explicitly teach wherein the method further includes training the machine learning model based on the outcome of the automatic submitting.
Pham teaches wherein the method further includes training the machine learning model based on the outcome of the automatic submitting. (Fig. 1, ¶46, ¶47, ¶62 wherein content generation subsystem 116 may train a prediction model, such as a machine learning model wherein the prediction model may be trained using training data including the initially accessed websites, the selected text from the initially accessed websites, and the subsequently accessed websites wherein the prediction model may include one or more neural networks wherein image item 204 may include an image of an object. Image item 204 may be analyzed using an object recognition computer vision model to determine the object included within the image, and a topic associated with the object may be determined by the object recognition computer vision model. In some embodiments, the object recognition computer vision model may be a convolutional neural network )
It would have been obvious to one having ordinary skill in the art at the time the invention was filed to utilize the teaching of integrating content into web pages of Pham with the teaching of processing user speech of Kang as modified because Pham teaches integrating content into one or more online resources, including, for example, embedding hyperlinks into webpage content based on prior user interactions, allowing users to personalize documents based on their preferences. (¶1, ¶2)

As per Claim 8, the rejection of claim 1 is hereby incorporated by reference; Kang as modified further teaches the machine learning model comprises a neural network. (¶58, ¶186 wherein the analysis of a web page is used to identify elements of interest in the web page and the analysis module is implemented as a trained neural network or other suitable machine learning system, and wherein a neural network or machine learning based classifier may be fed with data extracted using one or more of these methods and used to detect and classify the state of a web page)
However, Kang as modified does not explicitly teach wherein the machine learning model comprises a convolutional neural network.
Pham teaches wherein the machine learning model comprises a convolutional neural network. (Fig. 2B, Fig. 6, ¶62 wherein image item 204 may include an image of an object. Image item 204 may be analyzed using an object recognition computer vision model to determine the object included within the image, and a topic associated with the object may be determined by the object recognition computer vision model. In some embodiments, the object recognition computer vision model may be a convolutional neural network (CNN))
It would have been obvious to one having ordinary skill in the art at the time the invention was filed to utilize the teaching of integrating content into web pages of Pham with the teaching of processing user speech of Kang as modified because Pham teaches integrating content into one or more online resources, including, for example, embedding hyperlinks into webpage content based on prior user interactions, allowing users to personalize documents based on their preferences. (¶1, ¶2)

Claim 18 is similar in scope to Claim 6; therefore, Claim 18 is rejected under the same rationale as Claim 6.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Kang in view of Vangen, as applied to claim 1 above, and further in view of Gibson et al. (U.S. Pat 10,944,845) hereinafter Gibson.

As per Claim 9, the rejection of claim 1 is hereby incorporated by reference; Kang as modified previously taught the user intent. However, Kang as modified does not explicitly teach wherein the user intent comprises submission of a search query using the GUI, and the target visual cue comprises a magnifying glass.
Gibson teaches wherein the user intent comprises submission of a search query using the GUI, and the target visual cue comprises a magnifying glass. (Fig. 8, col. 12 lines 37-58 wherein the top of a screen may contain interface elements (e.g., search bar with a magnifying glass icon) that can be clicked on. When the interface elements (e.g., the magnifying glass icon) is selected or clicked on, the aggregation application may direct a portion of the home screen (e.g., on the right) to disappear, and wherein, as the user begins typing, the aggregation application will cause the suggested search queries to populate according to the text the user has typed in)
It would have been obvious to one having ordinary skill in the art at the time the invention was filed to utilize the teaching of consolidated content aggregation of Gibson with the teaching of processing user speech of Kang because Gibson teaches a content aggregation and delivery system, and more specifically an analytics engine that analyzes and selects particular content from aggregated content, wherein tracking 760 may be performed during the process and information from the tracking may be used in a feedback loop to improve selection of a particular content for the user. (col. 1 lines 18-21, col. 9 lines 25-29)

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Kang in view of Vangen as applied to claim 11 above, and further in view of Nolan et al. (U.S. Pat 9,954,729) hereinafter Nolan.

As per Claim 12, the rejection of claim 11 is hereby incorporated by reference; Kang as modified previously taught submission of the data determined from the user intent. However, Kang as modified does not explicitly teach further comprising validating that submission of the data determined from the user intent resulted in a desired outcome, wherein the script is generated based on the validating.
Nolan teaches further comprising validating that submission of the data determined from the user intent resulted in a desired outcome, wherein the script is generated based on the validating. (Fig. 5, col. 8 lines 38-67 wherein a test is conducted to determine whether the results of the validation rules is positive and if the results of the validation are positive, a user at the client 102 can optionally request the generation of the script subsequent to a confirmation of a valid processing)
It would have been obvious to one having ordinary skill in the art at the time the invention was filed to utilize the teaching of provisioning and configuration of network infrastructure of Nolan with the teaching of processing user speech of Kang as modified because Nolan teaches a tool that utilizes pre-configured templates to collect information utilized in the configuration of the infrastructure equipment and automatically generates configuration scripts. The tool dramatically increases the ability to configure or re-configure infrastructure equipment. (Abstract)

Response to Arguments
Applicant's arguments with respect to claim 1 have been considered but are moot in view of the new ground(s) of rejection, wherein Vangen is relied upon to teach the following limitations: “identifying a target visual cue to be located in the GUI; obtaining a bitmap screenshot of the GUI; using a trained machine learning model, performing object recognition processing on the bitmap screenshot of the GUI to generate output indicative of a location of a detected instance of the target visual cue in the bitmap screenshot; based on the location of the detected instance of the target visual cue, identifying one or more of the interactive elements of the GUI.”
 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-Form 892 for a list of cited references.


	Inquiry
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGIE BADAWI whose telephone number is (571)270-7590. The examiner can normally be reached Monday thru Wednesday 9:00am - 5:00pm EST with Thursdays and Fridays off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Renee Chavez, can be reached on (571) 270-1104. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANGIE BADAWI/
Primary Examiner, Art Unit 2179