DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-20 are pending in this application.
Response to Arguments
Regarding Rejection under 35 U.S.C. 103
Applicant’s arguments with respect to rejections have been fully considered but they are not persuasive.
Regarding claim 1, Applicant argues that the rejection under 35 U.S.C. 103 is improper because (1) the combination of Eledath and Badr does not teach or suggest amended claim 1, and (2) Badr does not teach or suggest that the processor determines whether to trigger the camera by matching the first utterance input with the first recognized data, because the camera continues to run.
The Examiner respectfully disagrees. The rejection under 35 U.S.C. 103 remains proper because Eledath in view of Badr does teach or suggest the newly amended claims. In response to Applicant's argument that the references fail to show certain features of Applicant's invention, it is noted that the features upon which Applicant relies (i.e., “triggering camera automatically”) are not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). In addition, Badr teaches that, in response to determining that the request is not resolvable, a prompt instructs the user to capture additional sensor data ([claim 1][0005][0053]), which is indicative of “activate the camera when the first recognized data does not match the first utterance input”. Thus, the rejection is maintained at this time. Please see the rejection below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Eledath et al. (US Pub. 2016/0378861) in view of Badr et al. (US Pub. 2018/0336414).
Regarding claim 1, Eledath discloses an electronic device comprising: a microphone; a display; a camera; a memory; and a processor (Figs. 1 and 4, microphone 116, display device 138, database 106, computing system 110), wherein the processor is configured to: 
receive a first utterance input through the microphone (Figs. 5 and 6, [0104][0105] receiving natural language speech audio/query);
obtain first recognized data from a first image displayed on the display or stored in the memory, wherein the first recognized data comprises information of an object or a text included in the first image (Figs. 5 and 6, [0104][0105] “Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person”; employment information of the captured image is indicative of “first recognized data from a first image”, as recited in claim);
determine whether the first recognized data matches the first utterance input by confirming whether an attribute of the first recognized data matches a domain of the first utterance input ([0062][0066][0068] “The system 110 combines the high confidence hypothesis matches produced by multiple subsystems to arrive at a final interpretation of the user's speech …Each visual understanding algorithm produces results with associated confidence scores so that higher-level reasoning components like the dynamic information aperture filter can adaptively ask for user guidance and to adapt the workflow.”);
store the first recognized data in association with the first utterance input when the first recognized data matches the first utterance input (Figs. 5 and 6, [0104][0105] “stores the link and related information in the knowledge base 106 or other databases or searchable storage locations”);
[activate the camera when the first recognized data does not match] the first utterance input; obtain second recognized data from a second image collected through the camera, wherein the second recognized data comprises information of an object or a text included in the second image ([0055][0067][0072][0119][0131][0137] when the speech query requires further processing, the system initiates processing to determine additional details about the image object through real-time access to visual feature data, which is facilitated by the system 110's ability to rapidly cache data based on context);
determine whether the second recognized data matches the first utterance input by confirming whether an attribute of the second recognized data matches the domain of the first utterance input ([0072] “If the intent reasoner cannot determine the best match, it will first ask the dynamic information aperture module for additional information …If the multi-modal reasoning leads to inconclusive intent the reasoner will default to asking for additional information from the user to finalize the user intent”); and 
store the second recognized data in association with the first utterance input when the second recognized data matches the first utterance input ([0134] Based on the determined similarity of the extracted faces, the system establishes the links and may store these links).
Eledath does not explicitly teach the bracketed limitation; however, Badr does explicitly teach:
[activate the camera when the first recognized data does not match] the first utterance input ([claim 1][0005] “determining that the one or more initial attributes fail to define the object with the degree of specificity necessary for resolving the request, wherein the degree of specificity is a target degree of classification in a classification taxonomy; in response to determining that the request is not resolvable: providing, for presentation to the user via the automated assistant interface of the client device, a prompt that instructs the user to capture additional sensor data or to move the object”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method of resolving a user’s speech query using image data as taught by Eledath with the method of determining that the request is not resolvable based on the initial sensor data as taught by Badr, so as to provide guidance on further input that will enable the request to be resolved and thereby enhance interaction between human and machine (Badr, [0005]).
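For illustration only, the control flow recited in claim 1 (match the first recognized data against the utterance, store on a match, otherwise activate the camera and repeat the check on the second recognized data) may be sketched as follows. This is a minimal sketch under stated assumptions: every name in it (RecognizedData, Utterance, Store, matches, handle_utterance) is hypothetical and appears in neither the claims nor the cited references, and the attribute/domain comparison is reduced to a simple string equality.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RecognizedData:
    attribute: str  # hypothetical label, e.g. "parking_location"
    value: str      # object or text information recognized in an image


@dataclass
class Utterance:
    text: str
    domain: str     # hypothetical label for the domain of the utterance


@dataclass
class Store:
    records: List[Tuple[str, str]] = field(default_factory=list)

    def save(self, utterance: Utterance, data: RecognizedData) -> None:
        # Store the recognized data in association with the utterance.
        self.records.append((utterance.text, data.value))


def matches(data: Optional[RecognizedData], utterance: Utterance) -> bool:
    # Assumption: "attribute matches domain" is modeled as string equality.
    return data is not None and data.attribute == utterance.domain


def handle_utterance(
    utterance: Utterance,
    recognize_first_image: Callable[[], Optional[RecognizedData]],
    capture_with_camera: Callable[[], Optional[RecognizedData]],
    store: Store,
) -> str:
    first = recognize_first_image()   # image displayed or stored in memory
    if matches(first, utterance):
        store.save(utterance, first)
        return "stored_first"
    # The camera callback is invoked only when the first data does not match.
    second = capture_with_camera()
    if matches(second, utterance):
        store.save(utterance, second)
        return "stored_second"
    return "no_match"
```

In this sketch the camera callback is never invoked when the first recognized data already matches, which corresponds to the conditional activation in the bracketed limitation.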
Regarding claim 2, Eledath in view of Badr discloses the electronic device of claim 1, and Eledath further discloses:
wherein the processor is further configured to analyze the first image or the second image when the first utterance input includes a designated keyword or a parameter to perform a task corresponding to the first utterance input is omitted (Eledath, Figs. 5 and 6, [0038][0104][0105][0123][0131] identifying the person depicted in the image as well as employment information about the person by facial recognition technology corresponding to the utterance).
Regarding claim 3, Eledath in view of Badr discloses the electronic device of claim 1, and Eledath further discloses:
wherein the processor is further configured to: execute a voice recognition application, and process the first utterance input through the voice recognition application (Eledath, Fig. 6 and [0049][0062][0105] speech recognition and understanding technology).
Regarding claim 4, Eledath in view of Badr discloses the electronic device of claim 1, and Eledath further discloses:
wherein the processor is further configured to determine a format of the first recognized data or a format of the second recognized data based on an attribute of the first utterance input (Fig. 6, [0038][0104][0105] determining a format based on the input which includes the phrases e.g. “add a link”).
Regarding claim 5, Eledath in view of Badr discloses the electronic device of claim 2, and Eledath further discloses:
wherein the processor is further configured to determine a screen shot of an execution screen of a recently executed application as the first image ([0038] real time detecting of a later frame of a video/image).
Regarding claim 6, Eledath in view of Badr discloses the electronic device of claim 1, and Eledath further discloses:
wherein the processor is further configured to determine the first image based on receipt of the first image from a recently executed application ([0038] detecting object from a later frame of a video/image).
Regarding claim 7, Eledath in view of Badr discloses the electronic device of claim 1, and Eledath further discloses:
wherein the processor is further configured to determine a preview screen of the camera as the second image (Fig. 9, items 904 and 906, [0111] extracting a part of the image from an original image).
Regarding claim 8, Eledath in view of Badr discloses the electronic device of claim 7, and Eledath further discloses:
wherein the processor is further configured to display, on the display, a user notification or an image capture guide when the second recognized data corresponding to an attribute of the first utterance input is not recognized in the preview screen (Fig. 9, items 904 and 906, [0111] “The text box overlaid on the image 904 indicates that the system 110 provides feedback to let the user know that the user's inquiry has been received and is being processed”).
Regarding claim 9, Eledath in view of Badr discloses the electronic device of claim 1. Eledath does not explicitly teach the following limitation; however, Badr does explicitly teach:
wherein the processor is further configured to determine, as the second image, an image captured in response to an image capture input from a user (Badr, [0008][0013][0057] acquiring an additional picture based on an image capture input by a user).
Regarding claim 10, Eledath in view of Badr discloses the electronic device of claim 1, and Eledath further discloses:
wherein the processor is further configured to store additional information associated with the first recognized data or the second recognized data together with the first recognized data or the second recognized data ([0128] applying “one or more pattern matching algorithms or statistical or rules-based inference algorithms, which in turn utilize a plurality of data sources or knowledge bases such as information need models 1810, collection plan 1814, and dynamic user context 1816 (e.g., a combination of live and stored data) to perform inferencing”).
Regarding claim 11, Eledath in view of Badr discloses the electronic device of claim 1. Eledath does not explicitly teach the following limitation; however, Badr does explicitly teach:
wherein the processor is further configured to: store a portion of the first recognized data in association with a first application, and store another portion of the first recognized data in association with a second application (Figs. 2A and 2B, [0062]-[0068] a picture 261A is associated with the request engine 124 and an additional picture 261B is associated with the request resolution engine 130).
Regarding claim 12, Eledath in view of Badr discloses the electronic device of claim 1, and Eledath further discloses:
wherein the processor is further configured to display the first recognized data or the second recognized data automatically or when a second utterance input is received (Fig. 5, item 508, [0104][0105] displaying the recognized data automatically).
Regarding claim 13, Eledath in view of Badr discloses the electronic device of claim 1. Eledath does not explicitly teach the following limitation; however, Badr does explicitly teach:
wherein the processor is further configured to: execute an application associated with a task corresponding to the first utterance input, and allow the application to enter a designated state based on the first recognized data or the second recognized data (Fig. 4, [0059]-[0062] user input field 164 may be operable by a user to order a product).
Regarding claim 14, Eledath in view of Badr discloses the electronic device of claim 13. Eledath does not explicitly teach the following limitation; however, Badr does explicitly teach:
wherein the processor is further configured to fill a field included in a user interface of the application based on the first recognized data or the second recognized data (Fig. 4, [0059]-[0062] ordering a product by the graphical user interface which includes a user input field 164).
Regarding claim 15, Eledath discloses an electronic device comprising: a microphone; a display; a camera; a memory; and a processor (Figs. 1 and 4, microphone 116, display device 138, database 106, computing system 110), wherein the processor is configured to: execute a first application, obtain a first utterance input through the microphone from the first application (Figs. 5 and 6, [0081][0104][0105] receiving natural language speech audio/query through AR application),  
obtain first recognized data associated with the first utterance input from a first image displayed on the display or stored in the memory, wherein the first recognized data comprises information of an object or a text included in the first image (Figs. 5 and 6, [0104][0105] “Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person”; employment information of the captured image is indicative of “first recognized data from a first image”, as recited in claim);
store the first recognized data when an attribute of the first recognized data matches a domain of the first utterance input (Figs. 5 and 6, [0104][0105] “stores the link and related information in the knowledge base 106 or other databases or searchable storage locations”; [0062][0066][0068] “The system 110 combines the high confidence hypothesis matches produced by multiple subsystems to arrive at a final interpretation of the user's speech …Each visual understanding algorithm produces results with associated confidence scores so that higher-level reasoning components like the dynamic information aperture filter can adaptively ask for user guidance and to adapt the workflow.”);
Eledath does not explicitly teach the following limitation; however, Badr does explicitly teach:
provide the first recognized data so that the first recognized data is used through a second application (Figs. 2A and 2B, [0062]-[0068] a picture 261A is associated with the request engine 124 and an additional picture 261B is associated with the request resolution engine 130).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method of resolving a user’s speech query using image data as taught by Eledath with the method of determining that the request is not resolvable based on the initial sensor data as taught by Badr, so as to provide guidance on further input that will enable the request to be resolved and thereby enhance interaction between human and machine (Badr, [0005]).
Regarding claim 16, Eledath in view of Badr discloses the electronic device of claim 15. Eledath does not explicitly teach the following limitation; however, Badr does explicitly teach:
wherein the processor is further configured to: analyze the first image; execute the first application based on a result of the first image that is executed; and provide, to a user, a query corresponding to the result of an analysis through the first application (Fig. 2A, [0013] A prompt/query is provided in response to determining the request being unresolvable).
Regarding claim 17, Eledath in view of Badr discloses the electronic device of claim 15. Eledath does not explicitly teach the following limitation; however, Badr does explicitly teach:
wherein the processor is further configured to store the first recognized data in a security area of the memory when the first recognized data or the first utterance input is related to personal information about a user ([0113] “the systems and methods discussed herein collect, store and/or use user personal information only upon receiving explicit authorization from the relevant users to do so”).
Regarding claim 18, Eledath discloses a voice recognition method performed in an electronic device, the voice recognition method comprising:
receive a first utterance input through the microphone (Figs. 5 and 6, [0104][0105] receiving natural language speech audio/query);
extracting first recognized data from a first image displayed on a display of the electronic device or stored in a memory of the electronic device, wherein the first recognized data comprises information of an object or a text included in the first image (Figs. 5 and 6, [0104][0105] “Based on the system 110's semantic understanding of the feature 612, the system 110 generates and displays virtual element 614, which identifies the person depicted in the image as well as employment information about the person”; employment information of the captured image is indicative of “first recognized data from a first image”, as recited in claim);
determining whether the first recognized data that is extracted from the first image matches the first utterance input by confirming whether an attribute of the first recognized data matches a domain of the first utterance input ([0062][0066][0068] “The system 110 combines the high confidence hypothesis matches produced by multiple subsystems to arrive at a final interpretation of the user's speech …Each visual understanding algorithm produces results with associated confidence scores so that higher-level reasoning components like the dynamic information aperture filter can adaptively ask for user guidance and to adapt the workflow.”);
storing the first recognized data in association with the first utterance input when the first recognized data that is extracted from the first image matches the first utterance input (Figs. 5 and 6, [0104][0105] “stores the link and related information in the knowledge base 106 or other databases or searchable storage locations”);
[activating a camera when the first recognized data that is extracted from the first image does not match] the first utterance input; extracting second recognized data from a second image collected through the camera, wherein the second recognized data comprises information of an object or a text included in the second image ([0055][0067][0072][0119][0131][0137] when the speech query requires further processing, the system initiates processing to determine additional details about the image object through real-time access to visual feature data, which is facilitated by the system 110's ability to rapidly cache data based on context); and
storing the second recognized data in association with the first utterance input when the second recognized data that is extracted from the second image matches the first utterance input ([0134] Based on the determined similarity of the extracted faces, the system establishes the links and may store these links).
Eledath does not explicitly teach the bracketed limitation; however, Badr does explicitly teach:
[activating a camera when the first recognized data that is extracted from the first image does not match] the first utterance input ([claim 1][0005] “determining that the one or more initial attributes fail to define the object with the degree of specificity necessary for resolving the request, wherein the degree of specificity is a target degree of classification in a classification taxonomy; in response to determining that the request is not resolvable: providing, for presentation to the user via the automated assistant interface of the client device, a prompt that instructs the user to capture additional sensor data or to move the object”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method of resolving a user’s speech query using image data as taught by Eledath with the method of determining that the request is not resolvable based on the initial sensor data as taught by Badr, so as to provide guidance on further input that will enable the request to be resolved and thereby enhance interaction between human and machine (Badr, [0005]).
Regarding claims 19-20, these claims are method claims corresponding to device claims 5 and 7, respectively. Therefore, claims 19-20 are rejected under the same rationale as applied to claims 5 and 7 above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
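By way of a worked example, the reply-period rule stated above can be expressed as a short calculation. This is a minimal sketch under stated assumptions, not an official computation: the function names add_months and shortened_period_expiry are hypothetical, month arithmetic clamps to the last day of a shorter month, and the sketch does not account for weekends, federal holidays, or extension fees.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Return d shifted forward by whole months, clamping the day if needed."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_expiry(
    mailing: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """Expiry of the shortened statutory period for reply to a final action."""
    two_months = add_months(mailing, 2)
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)

    expiry = three_months
    # If a first reply is filed within two months and the advisory action is
    # mailed after the three-month period ends, the period runs to the
    # advisory action's mailing date.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= two_months
        and advisory_mailed > three_months
    ):
        expiry = advisory_mailed
    # In no event does the period run past six months from the mailing date.
    return min(expiry, six_months)


# Hypothetical dates chosen only to exercise the rule.
print(shortened_period_expiry(date(2024, 1, 15)))
print(shortened_period_expiry(date(2024, 1, 15), date(2024, 3, 1), date(2024, 5, 1)))
```

The dates in the usage lines are illustrative and do not correspond to the mailing date of this action.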
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571) 272-5933. The examiner can normally be reached 9 AM-3 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/SEONG-AH A SHIN/Primary Examiner, Art Unit 2659