DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is in response to the Amendments and Arguments filed on 03 December 2020. The Applicants’ amendment and remarks have been carefully considered, but they are not persuasive. Hence, this Action has been made FINAL. 
Any rejections of the previous office action not addressed in this action are considered resolved and no longer pertain to the prosecution of this application.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03 December 2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Amendments and Arguments
Firstly, the applicant argues that Sharifi et al. may recognize objects on a display, but is silent regarding recognition based on actions performed and type of feature, as recited in amended claim 1. The examiner notes, though, that Sharifi et al., fig. 22(2205-2215), shows how images (i.e., a plurality of features) are identified based on the user selecting the assistance window (i.e., an action). The type of feature associated with the selection of the assistance window is an image. Thus, the rejection of claim 1 is maintained.
Secondly, the applicant argues that Sharifi et al. does not teach or suggest that the metadata would include slot type, as recited in amended claim 21. The examiner notes, though, that Sharifi et al., col. 31, lines 60-65, explains how metadata is associated with key items in the captured screen image. Thus, the rejection of claim 21 is maintained.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1, 3-6, 9, 11-12, 14-18, and 20-23 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US 9916328, hereinafter referred to as Sharifi et al.

Regarding claim 1 (currently amended), Sharifi et al. discloses a method, comprising: 

receiving, at an electronic device, a command directed to a first application operated by the electronic device (“In the example of FIG. 24, display 2400 represents a selectable assistance window 2405 with a preview 2410 of the previously captured image,” Sharifi et al., col. 38, lines 6-9. Here, the command is a selection of an assistance window.); 

capturing, at the electronic device, a plurality of features presented by the first application in response to user interface interactions with the first application (“The previously captured screen image represented by preview 2410 may be included, for example, in an index of previously captured screen images from the user device. When the user selects the assistance window (or a control for the window, etc.), the system may automatically take the mobile device to the state represented by the preview 2410, using the previously captured screen image as the selected image,” Sharifi et al., col. 38, lines 9-16. Also, “Process 2200 may begin when the system receives a selection of a first image that represents a previously captured screen (2205) ...The first image is associated with a timestamp and a mobile application that was executing when the image was captured,” Sharifi et al., col. 38, lines 45-54.), wherein each feature of the plurality of features is identified based on actions performed and type of feature (Sharifi et al., fig. 22(2205-2215) shows how images (i.e., a plurality of features) are identified based on the user selecting the assistance window (i.e., an action). The type of feature associated with the selection of the assistance window is an image.); 

capturing, at the electronic device, data communicated with the first application via the user interface interactions with the first application (Sharifi et al., col. 38, lines 45-54. Here, the communicated data are previously captured screen images and associated timestamps.); and 

learning a task based on the captured plurality of features and communicated data (“When the user selects the assistance window (or a control for the window, etc.), the system may automatically take the mobile device to the state represented by the preview 2410, using the previously captured screen image as the selected image. As one example, the system may use the machine learning algorithm to determine that the user makes a dinner reservation for two at Mr. Calzone most Fridays and generate assistance window 2405 to automate the next reservation,” Sharifi et al., col. 38, lines 12-21.).  
As to claim 9, device claim 9 and method claim 1 are related as method and device of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 9 is similarly rejected under the same rationale as applied above with respect to method claim 1. Also, Sharifi et al., col. 2, lines 24-27, teach memory and processor. And, Sharifi et al., col. 2, lines 47-51, teach CRM and instructions.
As to claim 15, CRM claim 15 and method claim 1 are related as method and CRM of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 15 is similarly rejected under the same rationale as applied above with respect to method claim 1. Also, Sharifi et al., col. 2, lines 24-27, teach memory and processor. And, Sharifi et al., col. 2, lines 47-51, teach CRM and instructions.

Regarding claim 3 (original), Sharifi et al. discloses method of claim 1, further comprising: 

applying the command to a second application based on the task by: 

selecting the task from a task set based on the command (Sharifi et al., col. 26, lines 38-45. The task is selected by applying the command/action to open the crossword application.); 



applying another task from a different task set to supplement remaining interactions with the second application (“…the system may include a machine learning algorithm that can learn actions commonly performed by the user in the past and predict when it is likely the user intends to perform those actions again. For example, if the user commonly opens two applications together, e.g., a crossword application and a dictionary application, the action may be opening the dictionary application when the user opens the crossword application,” Sharifi et al., col. 26, lines 38-45. Here, the (implicit) command to open the dictionary is another task used to supplement the remaining interaction with the crossword application (i.e., the second application).).
As to claim 11, device claim 11 and method claim 3 are related as method and device of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 11 is similarly rejected under the same rationale as applied above with respect to method claim 3. Also, Sharifi et al., col. 2, lines 24-27, teach memory and processor. And, Sharifi et al., col. 2, lines 47-51, teach CRM and instructions.
As to claim 17, CRM claim 17 and method claim 3 are related as method and CRM of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 17 is similarly rejected under the same rationale as applied above with respect to method claim 3. Also, Sharifi et al., col. 2, lines 24-27, teach memory and processor. And, Sharifi et al., col. 2, lines 47-51, teach CRM and instructions.

Regarding claim 4 (currently amended), Sharifi et al. discloses method of claim 1, wherein: 

the task includes: 

a set of data representing a sequence of actions interacting with the first application and a semantic 

Regarding claim 5 (original), Sharifi et al. discloses the method of claim 1, wherein the plurality of features includes visual features comprising user interface elements (Sharifi et al., col. 15, lines 23-26 and col. 34, lines 64-65. See also Sharifi et al., fig. 21.).  

Regarding claim 6 (currently amended), Sharifi et al. discloses the method of claim 1, further comprising:

extracting, from the user interface interactions with the first application (Sharifi et al., fig. 18, shows the current screen on the mobile device, which includes information that suggests a calendar event. This information is extracted from the interactions with the current screen.), slot type and slot value (Here, slot type = date/time and slot value = specific values (Wednesday/11:45).); 

wherein: 


for the user interface interactions with the first application without voice instruction (“User input action data 351 represents user input actions such as taps, swipes, text input, or any other action the user takes to interact with the mobile device 170,” Sharifi et al., col. 13, lines 61-64. Thus, the interactions with the first application may be performed via non-voice instructions, such as taps, swipes, etc.), the extracting includes an extraction of one of textual data (In Sharifi et al., fig. 18, the user enters text and information that is extracted to generate assistance window 1805.) or one or more visual icons from an interface displayed by the electronic device (The image of Sharifi et al., fig. 16(1625) may be selected as a visual icon.); 

the textual data are used directly as a slot type (Sharifi et al., fig. 18, shows an example in which the user enters text information that suggests scheduling an event. The system then provides an assistance window which includes a calendar widget that adds event information, such as date and time. The date and time text data are slot types, with the specific date and time being the slot values.); and 



Regarding claim 12 (currently amended), Sharifi et al. discloses the electronic device of claim 9, wherein: 

the task includes: a set of data representing a sequence of actions interacting with the first application (Sharifi et al., fig. 17, shows how the user enters text (i.e., actions) for interacting with the current screen (i.e., first application).), and a semantic 

the plurality of features includes visual features comprising user interface elements (Sharifi et al., col. 15, lines 23-26 and col. 34, lines 64-65. See also Sharifi et al., fig. 21.).  

Regarding claim 14 (original), Sharifi et al. discloses the electronic device of claim 13, wherein the processor is further configured to: 

construct labeled utterance samples for natural language understanding engine development based on the slot type and the slot value (“The system may use conventional natural language processing techniques to respond to natural language queries, whether typed or spoken,” Sharifi et al., col. 31, lines 7-9. Thus, natural language processing is used to extract meaning from the utterance, such as slot type/value.).  
As to claim 20, CRM claim 20 and device claim 14 are related as device and CRM of using same, with each claimed element’s function corresponding to the device step. Accordingly, claim 20 is similarly rejected under the same rationale as applied above with respect to device claim 14. Also, Sharifi et al., col. 2, lines 24-27, teach memory and processor. And, Sharifi et al., col. 2, lines 47-51, teach CRM and instructions.

Regarding claim 18 (currently amended), Sharifi et al. discloses the non-transitory processor-readable medium of claim 15, wherein the task includes: 

a set of data representing a sequence of actions interacting with the first application, and a semantic

the plurality of features includes visual features comprising user interface elements (Sharifi et al., col. 15, lines 23-26 and col. 34, lines 64-65. See also Sharifi et al., fig. 21.).  

Regarding claim 21 (currently amended), Sharifi et al. discloses an electronic device comprising: 

a memory storing instructions (Sharifi et al., col. 2, lines 24-26); and 

at least one processor executing the instructions (Sharifi et al., col. 2, lines 24-26), the at least one processor configured to: 

receive, at the electronic device, a command directed to a first application operated by the electronic device (Sharifi et al., fig. 17, shows that the user enters a request for a contact’s phone number into the current screen (a first application).); 

capture, at the electronic device, user interface interactions with the first application (Sharifi et al., fig. 17, shows user interactions with the current screen (i.e., first application).); and 

metadata information based on the user interface interactions with the first application, slot type and slot value for understanding of the command (Sharifi et al., fig. 17, shows how slot types for name and number are extracted from the interactions, as well as the corresponding values “Robert Jones” and “(888)342-4506”. Also, “In addition, the system may associate metadata with the image and key item. For example, the metadata may include where in the image the key item occurs, the rank of the key item with regard to the image, a timestamp for the image, a geo location of the device when the image was captured, etc.,” Sharifi et al., col. 31, lines 60-65. Thus, metadata is associated with key items in the captured screen image.).  


Regarding claim 22 (currently amended), Sharifi et al. discloses the electronic device of claim 21, wherein: 

the at least one processor is further configured to: 

extract one of textual data or one or more visual icons from an interface displayed by the electronic device for the user interface interactions with the first application without voice instruction (In Sharifi et al., fig. 18, the user enters text and information that is extracted to generate assistance window 1805.), wherein the textual data or the one or more visual icons is identified based on actions performed and type of feature (“The assistance window 1805 includes a calendar widget that adds a new event to the calendar with the event information, such as date and time, surfaced based on information found in the screen,” Sharifi et al., col. 34, lines 37-40. The textual data is 

capture, at the electronic device, a plurality of features presented by the first application in response to the user interface interactions with the first application (As shown in Sharifi et al., fig. 18, features, such as calendar event are captured in response to the interactions with the first application.); 

the textual data are used directly as a slot type (Sharifi et al., fig. 18, shows an example in which the user enters text information that suggests scheduling an event. The system then provides an assistance window which includes a calendar widget that adds event information, such as date and time. The date and time text data are slot types, with the specific date and time being the slot values.); and 

the one or more visual icons are processed to extract semantic meaning (Sharifi et al., fig. 16(1640) shows that annotation data (semantic information) may be extracted from the selected image (i.e., the visual icon).), and the semantic meaning is used as another slot type (Sharifi et al., col. 33, lines 48-55, explains that the state of the mobile device represented by the image is a slot type.).  


Regarding claim 23 (currently amended), Sharifi et al. discloses the electronic device of claim 22, wherein the at least one processor is further configured to: 

capture, at the electronic device, data communicated with the first application associated with the user interface interactions with the first application (“The user inputs may have been captured, for example, by a screen capture engine, such as screen capture application 301 of FIG. 3,” Sharifi et al., col. 39, lines 4-6.);  

capture, at the electronic device, a plurality of features presented by the first application in response to the user interface interactions with the first application (“Process 2000 may begin when the system receives an image of a screen captured at a mobile device (2005),” Sharifi et al., col. 34, lines 63-65. Here, the image contains a plurality of features.); and 

learn a task associated with the command based on the plurality of features and communicated data (“The server may use the user input actions and set of screen capture images as input to a machine learning algorithm, for example as data. The machine learning algorithm may be configured to predict future actions based on past actions, and could be used to determine action events,” Sharifi et al., col. 40, lines 24-29.).  

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 7-8, 10, 13, 16, 19, and 24-27 are rejected under 35 U.S.C. 103 as being unpatentable over US 9916328, hereinafter referred to as Sharifi et al., in view of US 20190102482, hereinafter referred to as Ni.  

Regarding claim 2 (currently amended), Sharifi et al. discloses the method of claim 1, further comprising: 

constructing a graph representing correlations between the communicated data, wherein the communicated data includes voice data (Sharifi et al., col. 5, lines 16-41. And, Sharifi et al., col. 31, lines 5-9, explains that communicated data may include voice data.); and 

determining semantic meaning of the captured plurality of features based on the graph, wherein the plurality of features comprises at least one of textual features or icon features (Sharifi et al., col. 6, lines 29-60. This excerpt shows that the features comprise textual features.), and each of the plurality of features is further identified based on feature properties (Sharifi et al., col. 39, lines 1-4. The timestamps (i.e., feature properties) are used to capture the appropriate screen images.). 

Although implied, Sharifi et al. does not specifically disclose speech-to-text conversion for converting a user’s voice input to text so that textual features may be analyzed.

Ni is cited to teach speech-to-text conversion (Ni, para [0050]). Ni benefits Sharifi et al. by providing Sharifi et al. with the ability to generate text based on voice input (Ni, para [0050]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sharifi et al. with those of Ni to enhance the user interactions of Sharifi et al. 
As to claim 10, device claim 10 and method claim 2 are related as method and device of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 10 is similarly rejected under the same rationale as applied above with respect to method claim 2. Also, Sharifi et al., col. 2, lines 24-27, teach memory and processor. And, Sharifi et al., col. 2, lines 47-51, teach CRM and instructions.
As to claim 16, CRM claim 16 and method claim 2 are related as method and CRM of using same, with each claimed element’s function corresponding to the method step. Accordingly, claim 16 is similarly rejected under the same rationale as applied above with respect to method claim 2. Also, Sharifi et al., col. 2, lines 24-27, teach memory and processor. And, Sharifi et al., col. 2, lines 47-51, teach CRM and instructions.

Regarding claim 7 (currently amended), Sharifi et al. discloses the method of claim 6, wherein for the user interface interactions with the first application with voice instruction, at least a portion of the voice instruction is used as the slot type, and another portion of the voice instruction is used as the slot value (Sharifi et al., col. 5, lines 16-41. And, Sharifi et al., col. 31, lines 5-9, explains that communicated data may include voice data. The natural language processing is used to extract information, such as slot type/value.). 

Although implied, Sharifi et al. does not specifically disclose speech-to-text conversion for converting a user’s voice input to text so that textual features may be analyzed.

Ni is cited to teach speech-to-text conversion (Ni, para [0050]). Ni benefits Sharifi et al. by providing Sharifi et al. with the ability to generate text based on voice input (Ni, para [0050]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sharifi et al. with those of Ni to enhance the user interactions of Sharifi et al. 

Regarding claim 8 (original), Sharifi et al., as modified by Ni, discloses the method of claim 6, further comprising constructing labeled utterance samples for natural language understanding engine development based on the slot type and the slot value (“The system may use conventional natural language processing techniques to respond to natural language queries, whether typed or spoken,” Sharifi et al., col. 31, lines 7-9. Thus, natural language processing is used to extract meaning from the utterance, such as slot type/value.).  

Although implied, Sharifi et al. does not specifically disclose speech-to-text conversion for converting a user’s voice input to text so that textual features may be analyzed.

Ni is cited to teach speech-to-text conversion (Ni, para [0050]). Ni benefits Sharifi et al. by providing Sharifi et al. with the ability to generate text based on voice input (Ni, para [0050]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sharifi et al. with those of Ni to enhance the user interactions of Sharifi et al. 

Regarding claim 13 (currently amended), Sharifi et al., as modified by Ni, discloses the electronic device of claim 9, wherein: 

the processor is further configured to: 


extract, from the user interface interactions with the first application, slot type and slot value (Sharifi et al., fig. 18, shows the current screen on the mobile device, which includes information that suggests a calendar event. This information is extracted from the interactions with the current screen. Here, slot type = date/time and slot value = specific values (Wednesday/11:45).); and  

for the user interface interactions with the first application without voice instruction (“User input action data 351 represents user input actions such as taps, swipes, text input, or any other action the user takes to interact with the mobile device 170,” Sharifi et al., col. 13, lines 61-64. Thus, the interactions with the first application may be performed via non-voice instructions, such as taps, swipes, etc.), extract one of textual data or one or more visual icons from an interface displayed by the electronic device, the textual data are used directly as a slot type (Sharifi et al., fig. 18, shows an example in which the user enters text information that suggests scheduling an event. The system then provides an assistance window which includes a calendar widget that adds event information, such as date and time. The date and time text data are slot types, with the specific date and time being the slot values.), the one or more visual icons are processed to extract semantic meaning (Sharifi et al., fig. 16(1640) shows that annotation data (semantic information) may be extracted from the selected image (i.e., the visual icon).), and the semantic meaning is used as another slot type (Sharifi et al., 

for the user interface interactions with the first application with voice instruction: at least a portion of the voice instruction is used as the slot type, and another portion of the voice instruction is used as the slot value (“The system may use conventional natural language processing techniques to respond to natural language queries, whether typed or spoken,” Sharifi et al., col. 31, lines 7-9. Thus, natural language processing is used to extract meaning from the utterance, such as slot type/value.). 
As to claim 19, CRM claim 19 and device claim 13 are related as device and CRM of using same, with each claimed element’s function corresponding to the device step. Accordingly, claim 19 is similarly rejected under the same rationale as applied above with respect to device claim 13. Also, Sharifi et al., col. 2, lines 24-27, teach memory and processor. And, Sharifi et al., col. 2, lines 47-51, teach CRM and instructions.

Regarding claim 24 (currently amended), Sharifi et al. discloses the electronic device of claim 23, wherein the user interface interactions with the first application comprises voice instruction, at least a portion of the voice instruction is used as the slot type, and another portion of the voice instruction is used as the slot value (“The system may use conventional natural language processing techniques to respond to natural language queries, whether typed or spoken,” Sharifi et al., col. 31, lines 7-9. Thus, natural language processing is used to extract meaning from the utterance, such as slot type/value.). 

Although implied, Sharifi et al. does not specifically disclose speech-to-text conversion for converting a user’s voice input to text so that textual features may be analyzed.

Ni is cited to teach speech-to-text conversion (Ni, para [0050]). Ni benefits Sharifi et al. by providing Sharifi et al. with the ability to generate text based on voice input (Ni, para [0050]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sharifi et al. with those of Ni to enhance the user interactions of Sharifi et al. 


Regarding claim 25 (original), Sharifi et al. discloses the electronic device of claim 23, wherein the at least one processor is further configured to: 

construct labeled utterance samples for natural language understanding engine development based on the slot type and the slot value (“The system may use conventional natural language processing techniques to respond to natural language queries, whether typed or spoken,” Sharifi et al., col. 31, lines 7-9. Thus, natural language processing is used to extract meaning from the utterance, such as slot type/value.); 

construct a graph representing correlations between the communicated data, wherein the communicated data includes voice data (Sharifi et al., col. 5, lines 16-41. And, Sharifi et al., col. 31, lines 5-9, explains that communicated data may include voice data.); and 



Although implied, Sharifi et al. does not specifically disclose speech-to-text conversion for converting a user’s voice input to text so that textual features may be analyzed.

Ni is cited to teach speech-to-text conversion (Ni, para [0050]). Ni benefits Sharifi et al. by providing Sharifi et al. with the ability to generate text based on voice input (Ni, para [0050]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sharifi et al. with those of Ni to enhance the user interactions of Sharifi et al. 

Regarding claim 26 (original), Sharifi et al., as modified by Ni, discloses the electronic device of claim 25, wherein the at least one processor is further configured to: 

apply the command to a second application based on the task by: 

select the task from a task set based on the command (Sharifi et al., col. 26, lines 38-45. The task is selected by applying the command/action to open the crossword application.); 

apply the task to carry out a part of interactions with the second application (Sharifi et al., col. 26, lines 38-45. The task is the command/action to the crossword application.); and 

apply another task from a different task set to supplement remaining interactions with the second application (“…the system may include a machine learning algorithm that can learn actions commonly performed by the user in the past and predict when it is likely the user intends to perform those actions again. For example, if the user commonly opens two applications together, e.g., a crossword application and a dictionary application, the action may be opening the dictionary application when the user opens the crossword application,” Sharifi et al., col. 26, lines 38-45. Here, the (implicit) command to open the dictionary is another task used to supplement the remaining interaction with the crossword application (i.e., the second application).).  

Regarding claim 27 (currently amended), Sharifi et al., as modified by Ni, discloses the electronic device of claim 25, wherein:  

the task includes: 

a set of data representing a sequence of actions interacting with the first application, and a semantic

the plurality of features includes visual features comprising user interface elements (Sharifi et al., col. 15, lines 23-26 and col. 34, lines 64-65. See also Sharifi et al., fig. 21.).


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571)272-0899.  The examiner can normally be reached on Mon-Fri 8-6.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/ANNE L THOMAS-HOMESCU/Primary Examiner, Art Unit 2656