DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendment
The amendment filed on December 8, 2021 has been entered.
The amendment of claims 1, 7, 8, 14, 15, and 20 has been acknowledged.
In view of the amendment, the 35 U.S.C. 112(b) and Double Patenting rejections have been withdrawn.

Response to Arguments
Applicant’s arguments filed on December 8, 2021, with respect to the pending claims, have been fully considered but are moot because the arguments rely on newly added and/or amended claim limitations. The examiner has revised the rejections to match the new claim limitations.

Claim Rejections - 35 USC § 103
Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Athsani et al. (US 2013/0044132 A1), in view of Ganjam et al. (US 2017/0311053 A1), and further in view of Folkens et al. (US 10,185,898 B1), hereinafter referred to as Athsani, Ganjam, and Folkens, respectively.
Regarding claim 1, Athsani teaches a method implemented by one or more processors (Athsani Fig. 9 & ¶0012: “The mobile device further includes processor and a memory that is configured to perform one or more of the above described operations”), the method comprising: 
determining, by a computing device, that a user has utilized an automated assistant that is accessible via the computing device (Athsani Fig. 2 & ¶0004: “As the user points the mobile device's camera at one or more objects in one or more scenes, such objects are automatically analyzed by the UAR to identify the one or more objects”); 
causing, based on the user’s utilization of the automated assistant, a camera of the computing device to provide a real-time image feed at a display interface of the computing device (Athsani ¶0004: “As the user points the mobile device's camera at one or more objects in one or more scenes, such objects are automatically analyzed by the UAR to identify the one or more objects and then provide meta data regarding the identified objects in the display of the mobile device … The user can utilize the UAR to continuously pass the camera over additional objects and scenes so that the meta data presented in the display of the mobile device is continuously updated”); 
identifying, while the camera is providing the real-time image feed, an object that is represented in the real-time image feed from the camera (Athsani ¶0004 discussed above; Athsani ¶0005: “a method of providing information regarding one or more scenes captured with a camera of a mobile device”; Athsani ¶0044: “The image or video that is received by the camera may then be processed so as to identify one or more objects in the scene in operation 210”);
generating, based on identifying the object, object data that characterizes the object that is graphically represented in the real-time image feed from the camera (Athsani Fig. 7 & ¶0042-¶0043: “an image/video of a restaurant (i.e., Mike's Cafe) 710 is captured in the display 704. If the camera is not pointed at a scene, the procedure 200 may again check whether the UAR option has been selected … When the camera is pointed at a scene, such scene may be displayed with overlaid UAR options for selecting an encyclopedia, decision support, or action mode in operation 208”), 
wherein the object data includes graphical content, that is different from the real-time image feed, and natural language content characterizing the object (Athsani Fig. 7 & ¶0042-¶0043: “an image/video of a restaurant (i.e., Mike's Cafe) 710 is captured in the display 704. If the camera is not pointed at a scene, the procedure 200 may again check whether the UAR option has been selected … When the camera is pointed at a scene, such scene may be displayed with overlaid UAR options for selecting an encyclopedia, decision support, or action mode in operation 208” – note the graphical representation shown in Figs. 7C-7F are different from Fig. 7B, an image of the restaurant); 
causing, in response to the object being in a field of view of the camera and based on the object data, the graphical content and the natural language content to be rendered at the display interface, wherein the graphical content includes a graphical representation of the object that is simultaneously rendered with the natural language content characterizing the object (Athsani Fig. 7 & ¶0042-¶0043 discussed above – an image of the restaurant is simultaneously rendered with the text “Mike’s Cafe”).
However, Athsani does not appear to explicitly teach determining that a user has provided a spoken utterance directed to an automated assistant; and rendering the natural language content with the graphical representation of the object.
Pertaining to the same field of endeavor, Ganjam teaches determining that a user has provided a spoken utterance directed to an automated assistant (Ganjam ¶0025: “receives one or more natural language queries from the user as part of a user input 137, and responds using stored information associated with the user, or using information retrieved from one or more external sources (i.e., the Internet)”; Ganjam Figs. 3-4 described in ¶0091-¶0092: teaches that when the user speaks “yes” to the automated assistant, the assistant provides a graphical representation of the scene including the attributes associated with the objects; see another example shown in Ganjam Figs. 7-8).
Athsani and Ganjam are considered to be analogous art because they are directed to image processing for automated assistant applications. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the user augmented reality for camera-enabled mobile devices (as taught by Athsani) to detect the user’s vocal queries (as taught by Ganjam) because the combination allows the user to interact with the application without holding the device (Ganjam Figs. 3-4).
However, Athsani, in view of Ganjam, does not appear to explicitly teach that the graphical content is a graphical representation of the object that is obtained from a different source; and rendering the natural language content with the graphical representation of the object.
Pertaining to the same field of endeavor, Folkens teaches that the graphical content is a graphical representation of the object that is obtained from a different source; and rendering the natural language content with the graphical representation of the object (Folkens Figs. 2-3: the real-time image from Fig. 2, 210 is shown in Fig. 3 as 310 at the same time with a graphical representation of the object being image 320. The images are further displayed simultaneously; Folkens col. 25 lines 22-52: “Image Capture Screen 210 as illustrated is generated by, for example, an application executing on a smartphone, electronic glasses or other Image Source 120. Image Capture Screen 210 is includes features configured to capture an image, mark a specific area of interest, and receive image tags”; Folkens col. 25 lines 53-58: “Image Capture Screen 210 further includes a Field 240 showing a previously captured image and resulting image tags. In the example, show the previously captured image includes the same white cup without the Rectangle 230 and the image tags include ‘White Starbucks Coffee Cup.’ Also shown is text stating ‘Slide for options’”; Folkens col. 37 lines 1-13: “Display 920 is optionally configured to display both an image and an advertisement. For example, Display 920 may be configured to display an advertisement and an image at the same time (e.g., as an overlay. Display 920 may further be configured to display an image sequence received from Image Processing System 110”).
Athsani, in view of Ganjam, and Folkens are considered to be analogous art because they are directed to image processing for automated assistant applications. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the user augmented reality for camera-enabled mobile devices using speech (as taught by Athsani, in view of Ganjam) to provide a reference image containing graphical representation of the object of interest (as taught by Folkens) because the combination provides previously processed images with matching characteristics for faster processing.

Regarding claims 8 and 15, Athsani, in view of Ganjam and Folkens, further teaches a non-transitory computer readable storage medium and a computing device, comprising: one or more processors, and memory configured to store instructions that, when executed by one or more processors, cause the one or more processors to perform operations of claim 1 (Athsani Fig. 9 & ¶0076: “The computer system 900 includes any number of processors 902 (also referred to as central processing units, or CPUs) that are coupled to storage devices including 

Regarding claims 2, 9, and 16, Athsani, in view of Ganjam and Folkens, teaches the method, non-transitory computer readable storage medium, and computing device of claims 1, 8, and 15, further comprising: selecting, based on the spoken utterance, a conversation mode from a plurality of conversation modes for interacting with an assistant application via the camera of the computing device, wherein generating the object is performed in accordance with the selected conversation mode (Athsani Fig. 3 & ¶0005: “When a camera of the mobile device is pointed at a scene having one or more object(s), an image or video of the scene is presented in a display of the mobile device, and the image or video is overlaid with a plurality of options for selecting one of a plurality of user augmented reality modes that include an encyclopedia mode, a decision support mode, and an action mode”; Ganjam Figs. 3-4 & 7-8: the assistant program provides different modes based on what is being spoken by the user).

Regarding claims 3, 10, and 17, Athsani, in view of Ganjam and Folkens, teaches the method, non-transitory computer readable storage medium, and computing device of claims 1, 8, and 15, further comprising: receiving, at the display interface of the computing device, a selection of the graphical representation of the object, and causing, in response to receiving the selection of the graphical representation of the object, additional data to be rendered at the display interface of the computing device, wherein the additional data is associated with the object (Athsani Abstract: “The meta data is interactive and allows the user to obtain additional Athsani Fig. 4 & ¶0053: “it may then be determined whether more information has been requested in operation 410. For example, a user may select search filter options, such as the number or type (e.g., category) of search results. If more information has been requested, an additional search may be performed and resulting search results may be overlaid in an image/video that is returned to the mobile device in operations 404, 406, and 408. If the user selects filter criteria, the search may be performed or search results refined based on such filter criteria”; Ganjam Figs. 3-4 & 7-8 discussed above – after processing the user’s voice command, the device provides additional information regarding the objects appearing in the scene, also see Ganjam Figs. 5-6).

Regarding claims 4, 11, and 18, Athsani, in view of Ganjam and Folkens, teaches the method, non-transitory computer readable storage medium, and computing device of claims 1, 8, and 15, further comprising: causing, in response to the object being in the field of view of the camera, audio to be rendered via an audio interface of the computing device, wherein the audio is rendered when the object is graphically represented in the real-time image feed from the camera (Athsani ¶0029: “The obtained information may then be presented to the user on the mobile device, for example, via the display or an audio output”; Ganjam ¶0023: “Where the sensor data is sound, the entities 167 may include individuals or animals whose identity or type can be recognized by their voice or sounds”; Ganjam ¶0026: “the entity engine 120 may apply one or more of audio, face, text, and object recognition algorithms to the sensor data to identify the entities 167”; Ganjam ¶0047: “Each entity 167 in the global entity data 213 may include descriptive information that may be used by the entity identifier 210, and one or more APIs, to identify the entities 167. The descriptive information may include information such as size, color, 

Regarding claims 5, 12, and 19, Athsani, in view of Ganjam and Folkens, teaches the method, non-transitory computer readable storage medium, and computing device of claims 1, 8, and 15, further comprising: causing, in response to the object being in the field of view of the camera and based on the object data, audible content to be rendered via an audio interface of the computing device, wherein the audible content includes additional natural language content characterizing the object (Athsani ¶0029 discussed above; Athsani Fig. 5: different mode outputs, e.g., encyclopedia and decision support, may be presented to the user via the display or audio according to Athsani ¶0029).

Regarding claims 6-7, 13-14, and 20, Athsani, in view of Ganjam and Folkens, teaches the method, non-transitory computer readable storage medium, and computing device of claims 1, 8, and 15, wherein processing the one or more images from the real-time image feed includes: 
performing optical character recognition on the one or more images from the real-time image feed (Athsani ¶0065: “For graphical components on a captured image/video, the size, shape, location, etc. that correspond to specific objects in an image/video may be predetermined which allows a template to be constructed for particular object sets), 15) using optical character recognition algorithms and methods”); and 
identifying a reference image determined to match the one or more images under consideration (Athsani ¶0063: “the image/video 606 includes the restaurant ‘Mike's Cafe' 605. The data may also be captured in the image recognition database 611, and this recorded data may be used for later object recognition of a similar image/video and/or so as to be accessible by the user A (602). The image recognition database could also contain training object Athsani ¶0066: “This recorded data may be used for later searching with respect to a similar image/video and/or so as to be accessible by the user A (602)”; also see Folkens col. 25 lines 53-58: “Image Capture Screen 210 further includes a Field 240 showing a previously captured image and resulting image tags. In the example, show the previously captured image includes the same white cup without the Rectangle 230 and the image tags include ‘White Starbucks Coffee Cup.’”; Folkens col. 42 lines 5-19: “an image descriptor identifying a vehicle shape may match with a previously stored image descriptor associated with a ‘vehicle’ class”),
the graphical representation of the object is a reference image of the object, and the reference image is rendered on top of the real-time image feed (Folkens Figs. 2-3 discussed above).

Statement Regarding Double Patenting
Upon further consideration of the claims as a whole and the amendment filed on December 8, 2021, the examiner has withdrawn the Double Patenting rejections because the claims of the instant application require determining the user’s speech as the first step of the process. Rather than the user moving the camera device and activating the automated assistant based on the camera movement, the instant application activates the automated assistant only after recognizing the user’s spoken utterance.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOO J SHIN whose telephone number is (571)272-9753. The examiner can normally be reached M-F; 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Soo Shin/
Primary Examiner, Art Unit 2667