Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 6/30/2022 has been entered.
 
Response to Arguments
Applicant's arguments submitted on 6/30/2022 have been fully considered, but are moot, because the arguments are directed at Katayama.  In response to Applicant’s amendments, Examiner now cites new prior art Harris et al., US 2016/0109954 A1 (hereinafter referred to as “Harris”), that discloses presenting the image data to a user via a display; and augmenting the image data presented in the display, while the objects remain in the line-of-sight of the camera, by: generating text comprising information associated with the first object; and overlaying the image data presented in the display with the text, a position of the overlaid text being based on the detected location of the first object within the image data (see Harris Figs. 12G, 16C, and 31D, and paras. 0210, 0277, and 0353, where “. . . the V-GLASSES may determine a position of the virtual label (e.g., the X-Y coordinate values, etc.) 2063, e.g., the virtual label may be positioned close to the object, and inject the generated virtual label overlaying the live video at the position 2065”).  It would have been obvious to one of ordinary skill in the art at the time of filing to use the object recognition and display technique of Harris to highlight and describe the locations of the detected objects of Sawada, because it is predictable that doing so would improve the speed and ease at which object locations are understood by the user, and that the displayed information would assist the user in deciding the user’s next action, for example, grabbing or purchasing the object.

Claim Objections
The claim objections are hereby withdrawn in response to Applicant’s amendments.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 2, 8, 9, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sawada et al., US 2019/0197345 A1 (hereinafter referred to as “Sawada”) in view of Harris et al., US 2016/0109954 A1 (hereinafter referred to as “Harris”).

Regarding claim 1, Sawada discloses a method, comprising: capturing image data of objects within a line-of-sight of a camera (see Sawada Figs. 1-4, and paras. 0032 and 0037, where a CPU and memory execute instructions for processing images captured by a camera; see also paras. 0078-0083 and 0091, where the example applications indicate that the object detector detects the foreground object and differentiates the foreground object from the background, and that background includes various background objects such as roads, non-character regions of signs, and a conveyer line, for example); generating a two-dimensional general activation map based on the image data; comparing each value within the two-dimensional general activation map to a predetermined threshold; and detecting a location of a first object of the objects within the image data based on a set of values within the two-dimensional general activation map that exceed the predetermined threshold (see Sawada Figs. 1-4, and paras. 0053-0060, where object regions are detected by comparing each of the features from the saliency map to a threshold). 
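For illustration only, the comparing and detecting steps recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not code drawn from Sawada or from the application: the function name `detect_locations`, the sample map values, and the threshold of 0.5 are all hypothetical.

```python
def detect_locations(activation_map, threshold):
    """Return the (row, col) positions whose value exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(activation_map)
        for c, value in enumerate(row)
        if value > threshold
    ]


# Hypothetical 3x4 two-dimensional map; the values are invented for illustration.
activation_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.0],
]

# Every value is compared to the threshold; the surviving set marks the object.
hits = detect_locations(activation_map, 0.5)
print(hits)
```

The set of positions returned is the "set of values ... that exceed the predetermined threshold" from which a location is detected.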
Sawada does not explicitly disclose presenting the image data to a user via a display; and augmenting the image data presented in the display, while the objects remain in the line-of-sight of the camera, by: generating text comprising information associated with the first object; and overlaying the image data presented in the display with the text, a position of the overlaid text being based on the detected location of the first object within the image data.
However, Harris discloses presenting the image data to a user via a display; and augmenting the image data presented in the display, while the objects remain in the line-of-sight of the camera, by: generating text comprising information associated with the first object; and overlaying the image data presented in the display with the text, a position of the overlaid text being based on the detected location of the first object within the image data (see Harris Figs. 12G, 16C, and 31D, and paras. 0210, 0277, and 0353, where “. . . the V-GLASSES may determine a position of the virtual label (e.g., the X-Y coordinate values, etc.) 2063, e.g., the virtual label may be positioned close to the object, and inject the generated virtual label overlaying the live video at the position 2065”).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the object recognition and display technique of Harris to highlight and describe the locations of the detected objects of Sawada, because it is predictable that doing so would improve the speed and ease at which object locations are understood by the user, and that the displayed information would assist the user in deciding the user’s next action, for example, grabbing or purchasing the object.
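As a hedged illustration of the combination described above, the following sketch places a text label at a position derived from a detected object location. It does not reproduce Harris's implementation: the function name `label_position`, the `(top, left, bottom, right)` box layout, and the `margin` and `label_height` defaults are assumptions made only for this example.

```python
def label_position(box, margin=4, label_height=12):
    """Return the (x, y) pixel position for a text label placed near a box.

    box is (top, left, bottom, right) in image coordinates, y growing downward.
    The label goes just above the box, or just below it when there is no room.
    """
    top, left, bottom, right = box
    y = top - margin - label_height
    if y < 0:
        # Not enough room above the object, so place the label under it.
        y = bottom + margin
    return (left, y)


above = label_position((50, 20, 80, 60))  # room above the object
below = label_position((5, 10, 30, 60))   # object touches the top edge
print(above, below)
```

Because the label coordinates are computed from the box, the overlaid text follows the detected location of the object within the image data.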

Regarding claim 2, Sawada discloses wherein the two-dimensional general activation map is generated without using class-specific weights (see Sawada Figs. 1-4, and paras. 0053-0060, where object regions are detected by comparing each of the features from the saliency map to a threshold without using any class-specific weights).

Regarding claim 8, Sawada discloses a non-transitory computer readable medium including one or more sequences of instructions that, when executed by one or more processors, cause a computing system to perform operations comprising: capturing image data of objects within a line-of-sight of a camera (see Sawada Figs. 1-4, and paras. 0032 and 0037, where a CPU and memory execute instructions for processing images captured by a camera; see also paras. 0078-0083 and 0091, where the example applications indicate that the object detector detects the foreground object and differentiates the foreground object from the background, and that background includes various background objects such as roads, non-character regions of signs, and a conveyer line, for example); generating a two-dimensional general activation map based on the image data; comparing each value within the two-dimensional general activation map to a predetermined threshold; and detecting a location of a first object within the image data based on a set of values within the two-dimensional general activation map that exceed the predetermined threshold (see Sawada Figs. 1-4, and paras. 0053-0060, where object regions are detected by comparing each of the features from the saliency map to a threshold).
Sawada does not explicitly disclose presenting the image data to a user via a display, while objects remain in the line-of-sight of the camera; generating text comprising information associated with the first object; and overlaying the image data presented in the display with the text, a position of the overlaid text being based on the detected location of the first object within the image data.
However, Harris discloses presenting the image data to a user via a display, while objects remain in the line-of-sight of the camera; generating text comprising information associated with the first object; and overlaying the image data presented in the display with the text, a position of the overlaid text being based on the detected location of the first object within the image data (see Harris Figs. 12G, 16C, and 31D, and paras. 0210, 0277, and 0353, where “. . . the V-GLASSES may determine a position of the virtual label (e.g., the X-Y coordinate values, etc.) 2063, e.g., the virtual label may be positioned close to the object, and inject the generated virtual label overlaying the live video at the position 2065”).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the object recognition and display technique of Harris to highlight and describe the locations of the detected objects of Sawada, because it is predictable that doing so would improve the speed and ease at which object locations are understood by the user, and that the displayed information would assist the user in deciding the user’s next action, for example, grabbing or purchasing the object.

Regarding claim 9, Sawada discloses wherein the two-dimensional general activation map is generated without using class-specific weights (see Sawada Figs. 1-4, and paras. 0053-0060, where object regions are detected by comparing each of the features from the saliency map to a threshold without using any class-specific weights).

Regarding claim 15, Sawada discloses a system, comprising: a processor; and a memory having programming instructions stored thereon, which, when executed by the processor, causes the system to perform operations, comprising: capturing image data of objects within a line-of-sight of a camera (see Sawada Figs. 1-4, and paras. 0032 and 0037, where a CPU and memory execute instructions for processing images captured by a camera; see also paras. 0078-0083 and 0091, where the example applications indicate that the object detector detects the foreground object and differentiates the foreground object from the background, and that background includes various background objects such as roads, non-character regions of signs, and a conveyer line, for example); generating a two-dimensional general activation map based on the image data; comparing each value within the two-dimensional general activation map to a predetermined threshold; and detecting a location of a first object within the image data based on a set of values within the two-dimensional general activation map that exceed the predetermined threshold (see Sawada Figs. 1-4, and paras. 0053-0060, where object regions are detected by comparing each of the features from the saliency map to a threshold).
Sawada does not explicitly disclose presenting the image data to a user via a display, while the objects remain in the line-of-sight of the camera; generating text comprising information associated with the first object; and overlaying the image data presented in the display with the text, a position of the overlaid text being based on the detected location of the first object within the image data.
However, Harris discloses presenting the image data to a user via a display, while the objects remain in the line-of-sight of the camera; generating text comprising information associated with the first object; and overlaying the image data presented in the display with the text, a position of the overlaid text being based on the detected location of the first object within the image data (see Harris Figs. 12G, 16C, and 31D, and paras. 0210, 0277, and 0353, where “. . . the V-GLASSES may determine a position of the virtual label (e.g., the X-Y coordinate values, etc.) 2063, e.g., the virtual label may be positioned close to the object, and inject the generated virtual label overlaying the live video at the position 2065”).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the object recognition and display technique of Harris to highlight and describe the locations of the detected objects of Sawada, because it is predictable that doing so would improve the speed and ease at which object locations are understood by the user, and that the displayed information would assist the user in deciding the user’s next action, for example, grabbing or purchasing the object.

Regarding claim 16, Sawada discloses wherein the two-dimensional general activation map is generated without using class-specific weights (see Sawada Figs. 1-4, and paras. 0053-0060, where object regions are detected by comparing each of the features from the saliency map to a threshold without using any class-specific weights).

Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Sawada in view of Harris as applied to claims 1, 8, and 15 above, and further in view of Katayama et al., US 2020/0193241 A1 (hereinafter referred to as “Katayama”).

Regarding claim 3, Sawada discloses wherein detecting the location of the first object within the image data based on the set of values within the two-dimensional general activation map that exceeds the predetermined threshold comprises: grouping one or more elements that exceed the predetermined threshold (see Sawada Figs. 1-4, and paras. 0053-0060, where object regions are detected by comparing each of the features from the saliency map to a threshold).
Sawada does not explicitly disclose grouping one or more elements into a bounding box, wherein the bounding box defines the location of the first object within the image data.
However, Katayama discloses grouping one or more elements into a bounding box, wherein the bounding box defines the location of the first object within the image data (see Katayama Figs. 6 and 9, and paras. 0065-0067 and 0071-0073, where a bounding box and a text label for an object are overlaid on an image displayed to a user).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the bounding box technique of Katayama to further highlight and describe the locations of the detected objects of Sawada, as modified by Harris, because it is predictable that doing so would improve the speed and ease at which object locations are understood by the user (see Katayama paras. 0006-0009, where “[i]t is an object of the present invention to provide a surgical instrument detection system that can readily identify the kinds and number of diverse surgical instruments without special processing, such as application of an optically readable symbol, to the surgical instruments”).
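For illustration only, grouping the above-threshold elements into a bounding box can be sketched as follows. This is a minimal sketch and not the method of Sawada or Katayama: the function name `bounding_box`, the `(top, left, bottom, right)` return layout, and the sample values are hypothetical.

```python
def bounding_box(activation_map, threshold):
    """Group above-threshold elements into one (top, left, bottom, right) box.

    Returns None when no element exceeds the threshold.
    """
    hits = [
        (r, c)
        for r, row in enumerate(activation_map)
        for c, value in enumerate(row)
        if value > threshold
    ]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    # The extreme row and column indices define the enclosing box.
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical map values, invented for illustration.
activation_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.0],
]
box = bounding_box(activation_map, 0.5)
print(box)
```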

Regarding claim 10, Sawada discloses wherein detecting the location of the first object within the image data based on the set of values within the two-dimensional general activation map that exceeds the predetermined threshold comprises: grouping one or more elements that exceed the predetermined threshold (see Sawada Figs. 1-4, and paras. 0053-0060, where object regions are detected by comparing each of the features from the saliency map to a threshold).
Sawada does not explicitly disclose grouping one or more elements into a bounding box, wherein the bounding box defines the location of the first object within the image data.
However, Katayama discloses grouping one or more elements into a bounding box, wherein the bounding box defines the location of the first object within the image data (see Katayama Figs. 6 and 9, and paras. 0065-0067 and 0071-0073, where a bounding box and a text label for an object are overlaid on an image displayed to a user).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the bounding box technique of Katayama to further highlight and describe the locations of the detected objects of Sawada, as modified by Harris, because it is predictable that doing so would improve the speed and ease at which object locations are understood by the user (see Katayama paras. 0006-0009, where “[i]t is an object of the present invention to provide a surgical instrument detection system that can readily identify the kinds and number of diverse surgical instruments without special processing, such as application of an optically readable symbol, to the surgical instruments”).

Regarding claim 17, Sawada discloses wherein detecting the location of the first object within the image data based on the set of values within the two-dimensional general activation map that exceeds the predetermined threshold comprises: grouping one or more elements that exceed the predetermined threshold (see Sawada Figs. 1-4, and paras. 0053-0060, where object regions are detected by comparing each of the features from the saliency map to a threshold).
Sawada does not explicitly disclose grouping one or more elements into a bounding box, wherein the bounding box defines the location of the first object within the image data.
However, Katayama discloses grouping one or more elements into a bounding box, wherein the bounding box defines the location of the first object within the image data (see Katayama Figs. 6 and 9, and paras. 0065-0067 and 0071-0073, where a bounding box and a text label for an object are overlaid on an image displayed to a user).
It would have been obvious to one of ordinary skill in the art at the time of filing to use the bounding box technique of Katayama to further highlight and describe the locations of the detected objects of Sawada, as modified by Harris, because it is predictable that doing so would improve the speed and ease at which object locations are understood by the user (see Katayama paras. 0006-0009, where “[i]t is an object of the present invention to provide a surgical instrument detection system that can readily identify the kinds and number of diverse surgical instruments without special processing, such as application of an optically readable symbol, to the surgical instruments”).

Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Sawada in view of Harris and Katayama as applied to claims 3, 10, and 17 above, and further in view of Hisada, US 2019/0392606 A1 (hereinafter referred to as “Hisada”).

Regarding claim 4, Sawada does not explicitly disclose further comprising: receiving a request from the user for a desired granularity of the first object; based on the request, adjusting the predetermined threshold; and regrouping the one or more elements based on the adjusted predetermined threshold.
However, Hisada discloses further comprising: receiving a request from the user for a desired granularity of the first object; based on the request, adjusting the predetermined threshold; and regrouping the one or more elements based on the adjusted predetermined threshold (see Hisada para. 0059, where a granularity and corresponding threshold are used).
It would have been obvious to one of ordinary skill in the art at the time of filing to permit the user of Sawada and Harris to choose the granularity to match the object the user wants to detect, because it is predictable that doing so would improve the accuracy of the object detection by searching for the object at its closest and most correct granularity level.
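As a hedged illustration of the adjusting and regrouping steps, the sketch below maps a requested granularity to a threshold and regroups the map elements at that threshold. The mapping `GRANULARITY_THRESHOLDS`, its two levels and values, and the function names are assumptions made only for this example and are not taken from Hisada.

```python
# Hypothetical mapping from a user-requested granularity to a threshold.
GRANULARITY_THRESHOLDS = {"coarse": 0.3, "fine": 0.7}


def group(activation_map, threshold):
    """Group above-threshold elements into a (top, left, bottom, right) box."""
    hits = [
        (r, c)
        for r, row in enumerate(activation_map)
        for c, value in enumerate(row)
        if value > threshold
    ]
    if not hits:
        return None
    rows, cols = zip(*hits)
    return (min(rows), min(cols), max(rows), max(cols))


def regroup_for_request(activation_map, granularity):
    """Adjust the threshold for the requested granularity, then regroup."""
    return group(activation_map, GRANULARITY_THRESHOLDS[granularity])


# Hypothetical map: a strong core surrounded by weaker responses.
activation_map = [
    [0.0, 0.4, 0.4, 0.0],
    [0.4, 0.9, 0.8, 0.4],
    [0.0, 0.4, 0.4, 0.0],
]
coarse_box = regroup_for_request(activation_map, "coarse")
fine_box = regroup_for_request(activation_map, "fine")
print(coarse_box, fine_box)
```

In this example the lower threshold yields a box around the whole response region, while the higher threshold yields a box around only the strong core.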

Regarding claim 11, Sawada does not explicitly disclose further comprising: receiving a request from the user for a desired granularity of the first object; based on the request, adjusting the predetermined threshold; and regrouping the one or more elements based on the adjusted predetermined threshold.
However, Hisada discloses further comprising: receiving a request from the user for a desired granularity of the first object; based on the request, adjusting the predetermined threshold; and regrouping the one or more elements based on the adjusted predetermined threshold (see Hisada para. 0059, where a granularity and corresponding threshold are used).
It would have been obvious to one of ordinary skill in the art at the time of filing to permit the user of Sawada and Harris to choose the granularity to match the object the user wants to detect, because it is predictable that doing so would improve the accuracy of the object detection by searching for the object at its closest and most correct granularity level.

Regarding claim 18, Sawada does not explicitly disclose wherein the operations further comprise: receiving a request from the user for a desired granularity of the first object; based on the request, adjusting the predetermined threshold; and regrouping the one or more elements based on the adjusted predetermined threshold.
However, Hisada discloses wherein the operations further comprise: receiving a request from the user for a desired granularity of the first object; based on the request, adjusting the predetermined threshold; and regrouping the one or more elements based on the adjusted predetermined threshold (see Hisada para. 0059, where a granularity and corresponding threshold are used).
It would have been obvious to one of ordinary skill in the art at the time of filing to permit the user of Sawada and Harris to choose the granularity to match the object the user wants to detect, because it is predictable that doing so would improve the accuracy of the object detection by searching for the object at its closest and most correct granularity level.

Claims 5, 6, 12, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sawada in view of Harris as applied to claims 1, 8, and 15 above, and further in view of Yu et al., US 2018/0032840 A1 (hereinafter referred to as “Yu”).

Regarding claim 5, Sawada does not explicitly disclose wherein detecting the location of the first object within the image data comprises: interpolating the two-dimensional general activation map to generate a tighter bounding box about the first object.
However, Yu discloses wherein detecting the location of the first object within the image data comprises: interpolating the two-dimensional general activation map to generate a tighter bounding box about the first object (see Yu para. 0037, where the map is interpolated to determine the bounding box).
It would have been obvious to one of ordinary skill in the art at the time of filing to simply substitute the map resizing of Sawada, as modified by Harris, with the interpolation of Yu, because it is predictable that doing so would successfully resize the maps to the original image size, and it is also predictable that interpolation strategies are more accurate than simple resizing by ensuring the new values take into account the entire neighborhood of values instead of merely duplicating a single value.  
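For illustration only, the difference between interpolation and simple value duplication can be sketched with a bilinear resize, in which each output value blends its neighboring input values. Bilinear interpolation is chosen here as an assumption; Yu is not cited for any particular interpolation scheme, and the function name `bilinear_resize` and the sample map are hypothetical.

```python
def bilinear_resize(grid, out_h, out_w):
    """Resize a 2-D list of floats to out_h x out_w by bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map the output row to a fractional input row (corners aligned).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Blend the four surrounding input values.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# Hypothetical 2x2 map stretched to 2x3.
resized = bilinear_resize([[0.0, 1.0], [0.0, 1.0]], 2, 3)
print(resized)
```

The new middle column takes a value between its two neighbors, whereas duplication would simply copy one of them.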

Regarding claim 6, Sawada discloses the two-dimensional general activation map (see Sawada Figs. 1-4, and paras. 0053-0060, 0068, and 0069, where the saliency maps correspond to image size). 
Sawada does not explicitly disclose wherein the tighter bounding box does not align.
However, Yu discloses wherein the tighter bounding box does not align (see Yu para. 0037, where cropping is disclosed).

Regarding claim 12, Sawada does not explicitly disclose wherein detecting the location of the first object within the image data comprises: interpolating the two-dimensional general activation map to generate a tighter bounding box about the first object.
However, Yu discloses wherein detecting the location of the first object within the image data comprises: interpolating the two-dimensional general activation map to generate a tighter bounding box about the first object (see Yu para. 0037, where the map is interpolated to determine the bounding box).
It would have been obvious to one of ordinary skill in the art at the time of filing to simply substitute the map resizing of Sawada, as modified by Harris, with the interpolation of Yu, because it is predictable that doing so would successfully resize the maps to the original image size, and it is also predictable that interpolation strategies are more accurate than simple resizing by ensuring the new values take into account the entire neighborhood of values instead of merely duplicating a single value.  

Regarding claim 13, Sawada discloses the two-dimensional general activation map (see Sawada Figs. 1-4, and paras. 0053-0060, 0068, and 0069, where the saliency maps correspond to image size).
Sawada does not explicitly disclose wherein the tighter bounding box does not align.
However, Yu discloses wherein the tighter bounding box does not align (see Yu para. 0037, where cropping is disclosed).

Regarding claim 19, Sawada does not explicitly disclose wherein detecting the location of the first object within the image data comprises: interpolating the two-dimensional general activation map to generate a tighter bounding box about the first object.
However, Yu discloses wherein detecting the location of the first object within the image data comprises: interpolating the two-dimensional general activation map to generate a tighter bounding box about the first object (see Yu para. 0037, where the map is interpolated to determine the bounding box).
It would have been obvious to one of ordinary skill in the art at the time of filing to simply substitute the map resizing of Sawada, as modified by Harris, with the interpolation of Yu, because it is predictable that doing so would successfully resize the maps to the original image size, and it is also predictable that interpolation strategies are more accurate than simple resizing by ensuring the new values take into account the entire neighborhood of values instead of merely duplicating a single value.  

Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sawada in view of Harris as applied to claims 1, 8, and 15 above, and further in view of Somerville, US 2010/0295760 A1 (hereinafter referred to as “Somerville”).

Regarding claim 7, Sawada does not explicitly disclose further comprising: converting the image data to a normalized matrix-based representation of the image data.
However, Somerville discloses further comprising: converting the image data to a normalized matrix-based representation of the image data (see Somerville Figs. 3B-3D, and para. 0081, where the image is converted to a normalized matrix).
It would have been obvious to one of ordinary skill in the art at the time of filing to simply substitute the display technique of Sawada, as modified by Harris, with that of Somerville, because it is predictable that doing so would succeed at displaying the image, and it is predictable that normalizing the image will reduce erroneous outliers thereby improving the aesthetics of the displayed image.
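As a hedged illustration of converting image data to a normalized matrix-based representation, the sketch below rescales pixel values to the range 0 to 1. Min-max scaling is an assumption made for this example; Somerville is not cited for any particular normalization, and the function name `normalize` and the sample pixel values are hypothetical.

```python
def normalize(image):
    """Rescale a 2-D list of pixel values to the range [0.0, 1.0]."""
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant image has no range to scale; map every pixel to 0.0.
        return [[0.0 for _ in row] for row in image]
    return [[(pixel - lo) / span for pixel in row] for row in image]


# Hypothetical 2x2 grayscale image with 8-bit pixel values.
matrix = normalize([[0, 128], [255, 64]])
print(matrix)
```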

Regarding claim 14, Sawada does not explicitly disclose further comprising: converting the image data to a normalized matrix-based representation of the image data.
However, Somerville discloses further comprising: converting the image data to a normalized matrix-based representation of the image data (see Somerville Figs. 3B-3D, and para. 0081, where the image is converted to a normalized matrix).
It would have been obvious to one of ordinary skill in the art at the time of filing to simply substitute the display technique of Sawada, as modified by Harris, with that of Somerville, because it is predictable that doing so would succeed at displaying the image, and it is predictable that normalizing the image will reduce erroneous outliers thereby improving the aesthetics of the displayed image.

Regarding claim 20, Sawada does not explicitly disclose wherein the operations further comprise: converting the image data to a normalized matrix-based representation of the image data.
However, Somerville discloses wherein the operations further comprise: converting the image data to a normalized matrix-based representation of the image data (see Somerville Figs. 3B-3D, and para. 0081, where the image is converted to a normalized matrix).
It would have been obvious to one of ordinary skill in the art at the time of filing to simply substitute the display technique of Sawada, as modified by Harris, with that of Somerville, because it is predictable that doing so would succeed at displaying the image, and it is predictable that normalizing the image will reduce erroneous outliers thereby improving the aesthetics of the displayed image.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW M MOYER whose telephone number is (571)272-9523. The examiner can normally be reached Monday-Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at (571)270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW M MOYER/
Primary Examiner, Art Unit 2663