Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Claims 1 – 20 are pending in this application. Claims 1, 8 and 15 are independent.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. § 112(b) or 35 U.S.C. § 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the Inventor(s) (or (pre-AIA) Applicant(s)) regards as the invention.

Claim(s) 4, 11 and 18 are rejected under 35 U.S.C. § 112(b) or 35 U.S.C. § 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which Inventor(s) (or (pre-AIA) Applicant(s)) regards as the invention.

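Claims 4, 11 and 18 recite the limitation "…and weight the autofocus parameters…". However, there is insufficient antecedent basis for this limitation in the claims: the claims previously introduce "a set of autofocus statistics" but do not introduce "autofocus parameters."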
Appropriate action is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Burachas, Giedrius (US-20190370587-A1, hereinafter simply referred to as Burachas) in view of Lund, Christopher Dale (US-20200372610-A1, hereinafter simply referred to as Lund).

Regarding independent claims 1, 8 and 15, Burachas teaches:
A method (See at least Burachas, ¶ [0101], FIGS. 3, 5, 6; "…heatmap 600 and 602…") for utilizing a saliency heatmap (See at least Burachas, ¶ [0102], FIGS. 3, 5, 6; "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…"), comprising: obtaining image data corresponding to an image of a scene (See at least Burachas, ¶ [0102], FIGS. 3, 5, 6; "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…overlay model 120 generates heatmaps 600 and 602 in a manner that shows the association within scene graph 111 between the hand (identifying the person referenced in the question) and what that person is eating and/or holding…"); generating a saliency heatmap for the image of the scene based on a saliency network (i.e., merely a network that generates/outputs a heatmap according to para. [0012] of Applicant’s PG PUB – which is seen to correspond to VQA model 20 (which embodies model 120) of Burachas) (See at least Burachas, ¶ [0102], FIGS. 3, 5, 6; "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…overlay model 120 generates heatmaps 600 and 602 in a manner that shows the association within scene graph 111 between the hand (identifying the person referenced in the question) and what that person is eating and/or holding…").
Burachas teaches the subject matter of the claimed inventive concept as mapped in the rejection above.
However, Burachas does not expressly disclose the concept of wherein the saliency heatmap indicates a likelihood of saliency for a corresponding portion of the scene; and manipulating the image data based on the saliency heatmap.
Nevertheless, Lund teaches the concept of wherein the saliency heatmap indicates a likelihood (e.g., probability(ies)/likelihoods of Lund) of saliency for a corresponding portion of the scene (See at least Lund, ¶ [Abstract], FIGS. 4, 5A, 5B, 8; "…The convolutional neural network identifies a probability that each pixel in the scaled document image is text and generates a heat map of these probabilities. The heat map is then scaled back to the size of the original document image, and the probabilities in the heat map are used to adjust the intensities of the text and non-text pixels…"); and manipulating the image data based on the saliency heatmap (See at least Lund, ¶ [Abstract, 0048, 0049], FIGS. 4, 5A, 5B, 8; "…The convolutional neural network identifies a probability that each pixel in the scaled document image is text and generates a heat map of these probabilities. The heat map is then scaled back to the size of the original document image, and the probabilities in the heat map are used to adjust the intensities of the text and non-text pixels of the image…", "…The heat maps generated by the convolutional neural network may use colors, shades of gray, or other means to indicate a range of likelihoods of finding a particular feature type at specific locations within the image…", "…the intensity of each pixel may be adjusted by a variable amount that depends on the associated probability and the original intensity of the pixel…the intensity-adjusted image may be processed to identify bounding boxes…").
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique, as disclosed in Lund, of a saliency heatmap that indicates a likelihood of saliency for a corresponding portion of the scene, and of manipulating the image data based on the saliency heatmap, to modify the known and similar device of Burachas for the desirable and advantageous purpose of improving the effectiveness of identifying the portions of a document image that correspond to text and the portions that correspond to non-textual background elements, so that OCR algorithms can more effectively recognize and extract text from the document image, as discussed in Lund (See ¶ [0007]), thereby helping to improve overall system robustness.
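
For illustration only, the heatmap-driven manipulation quoted from Lund above (per-pixel probabilities used to adjust pixel intensities) might be sketched as follows. The linear blend rule, the function name manipulate_by_heatmap, and the background parameter are assumptions made for this sketch; they are not taken from Lund, Burachas, or the claims.

```python
def manipulate_by_heatmap(image, heatmap, background=1.0):
    """Blend each pixel toward a background value according to its likelihood.

    image:   2-D list of intensities in [0, 1]
    heatmap: 2-D list of per-pixel likelihoods in [0, 1], same shape as image
    A pixel with likelihood 1.0 is kept unchanged; a pixel with likelihood
    0.0 is replaced by the background value. This blend is a hypothetical
    rule chosen for illustration, not a formula from either reference.
    """
    if len(image) != len(heatmap) or any(
        len(img_row) != len(map_row) for img_row, map_row in zip(image, heatmap)
    ):
        raise ValueError("image and heatmap must have the same shape")
    return [
        [p * v + (1.0 - p) * background for v, p in zip(img_row, map_row)]
        for img_row, map_row in zip(image, heatmap)
    ]
```

In this sketch the adjustment to each pixel depends on both its likelihood and its original intensity, which mirrors the general description in the quoted passage without asserting any particular implementation.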

Regarding dependent claims 2, 9 and 16, Burachas modified by Lund above teaches:
wherein the saliency network is trained on training image data to identify a likely salient foreground object (e.g., text portions of image in Lund) (See at least Lund, ¶ [Abstract, 0048, 0049], FIGS. 4, 5A, 5B, 8; "…The convolutional neural network identifies a probability that each pixel in the scaled document image is text and generates a heat map of these probabilities. The heat map is then scaled back to the size of the original document image, and the probabilities in the heat map are used to adjust the intensities of the text and non-text pixels of the image…", "…The heat maps generated by the convolutional neural network may use colors, shades of gray, or other means to indicate a range of likelihoods of finding a particular feature type at specific locations within the image…", "…the intensity of each pixel may be adjusted by a variable amount that depends on the associated probability and the original intensity of the pixel…the intensity-adjusted image may be processed to identify bounding boxes…").

Regarding dependent claims 3, 10 and 17, Burachas modified by Lund above teaches:
wherein the saliency heatmap provides a saliency value (e.g., label in Burachas) for each pixel in the image (See at least Burachas, ¶ [0057, 0102], FIGS. 3, 5, 6; "…Semantic segmentation may apply, to each of the classified one or more pixels in each of the frames and/or images, a label corresponding to the object type. For example, the set of semantic classification labels to which pixels may be semantically labeled may include: Sky, Building, Pole, Road Marking, Road, Pavement, Tree, Sign Symbol, Fence, Vehicle, Pedestrian, and Bicycle. Thus, each semantically-segmented sub-view may contain only those pixels that are labeled with a particular class…", "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…overlay model 120 generates heatmaps 600 and 602 in a manner that shows the association within scene graph 111 between the hand (identifying the person referenced in the question) and what that person is eating and/or holding…" Also, see at least Lund, ¶ [Abstract, 0010, 0012, 0046 – 0049, 0058 – 0062], FIGS. 4, 5A, 5B, 8).

Regarding dependent claims 4, 11 and 18, Burachas modified by Lund above teaches:
obtain a set of autofocus statistics (e.g., a well-known feature of e.g., image capture device 43 (e.g., a camera) of Burachas) for one or more pixels in the image (See at least Burachas, ¶ [0057, 0102], FIGS. 3, 5, 6; "…Semantic segmentation may apply, to each of the classified one or more pixels in each of the frames and/or images, a label corresponding to the object type. For example, the set of semantic classification labels to which pixels may be semantically labeled may include: Sky, Building, Pole, Road Marking, Road, Pavement, Tree, Sign Symbol, Fence, Vehicle, Pedestrian, and Bicycle. Thus, each semantically-segmented sub-view may contain only those pixels that are labeled with a particular class…", "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…overlay model 120 generates heatmaps 600 and 602 in a manner that shows the association within scene graph 111 between the hand (identifying the person referenced in the question) and what that person is eating and/or holding…" Also, see at least Lund, ¶ [Abstract, 0010, 0012, 0028, 0046 – 0049, 0058 – 0062], FIGS. 4, 5A, 5B, 8); and weight the autofocus parameters based on the saliency value for the one or more pixels in the image (See at least Burachas, ¶ [0057, 0102], FIGS. 3, 5, 6; "…Semantic segmentation may apply, to each of the classified one or more pixels in each of the frames and/or images, a label corresponding to the object type. For example, the set of semantic classification labels to which pixels may be semantically labeled may include: Sky, Building, Pole, Road Marking, Road, Pavement, Tree, Sign Symbol, Fence, Vehicle, Pedestrian, and Bicycle. Thus, each semantically-segmented sub-view may contain only those pixels that are labeled with a particular class…", "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…overlay model 120 generates heatmaps 600 and 602 in a manner that shows the association within scene graph 111 between the hand (identifying the person referenced in the question) and what that person is eating and/or holding…" Also, see at least Lund, ¶ [Abstract, 0010, 0012, 0028, 0046 – 0049, 0058 – 0062], FIGS. 4, 5A, 5B, 8).

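
For illustration only, weighting a set of per-pixel autofocus statistics by per-pixel saliency values, as recited in the limitation above, might be sketched as follows. The function name weight_autofocus_stats, the choice of a multiplicative weight, and the normalized summary score are assumptions made for this sketch and are not drawn from either reference or from the claims.

```python
def weight_autofocus_stats(af_stats, saliency):
    """Weight per-pixel autofocus statistics by per-pixel saliency values.

    af_stats: 2-D list of focus statistics (e.g., a sharpness score per pixel)
    saliency: 2-D list of saliency values in [0, 1], same shape as af_stats
    Returns the weighted statistics and a saliency-weighted average score.
    Both the multiplicative weighting and the averaging are hypothetical
    choices made for this illustration.
    """
    weighted = [
        [stat * weight for stat, weight in zip(stat_row, weight_row)]
        for stat_row, weight_row in zip(af_stats, saliency)
    ]
    total_weight = sum(sum(row) for row in saliency)
    if total_weight > 0:
        score = sum(sum(row) for row in weighted) / total_weight
    else:
        score = 0.0  # no salient pixels: nothing to weight toward
    return weighted, score
```

Under these assumptions, pixels with a saliency value of zero contribute nothing to the score, so the summary reflects only the regions the heatmap marks as salient.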
Regarding dependent claims 5, 12 and 19, Burachas modified by Lund above teaches:
generate a binary mask (e.g., mask of Burachas) of the image by applying a threshold value to the saliency value for each pixel (See at least Burachas, ¶ [0057, 0074, 0076, 0077, 0102, 0107], FIGS. 3, 5, 6; "…Semantic segmentation may apply, to each of the classified one or more pixels in each of the frames and/or images, a label corresponding to the object type. For example, the set of semantic classification labels to which pixels may be semantically labeled may include: Sky, Building, Pole, Road Marking, Road, Pavement, Tree, Sign Symbol, Fence, Vehicle, Pedestrian, and Bicycle. Thus, each semantically-segmented sub-view may contain only those pixels that are labeled with a particular class…", "…Fusion module 116 may be trained to recognize computed gradients within a threshold gradient of the point in the feature space representative of the one or more keywords 115…Fusion module 116 may output selected gradient 117 to attention mask generation module 118…", "…Attention mask generation module 118 may associate selected gradient 117 with scene graph 111, assigning, based on selected gradient 117, attention scores to each scene, object, part, sub-part, etc…", "…Attention mask generation module 118 may associate the attention scores with various nodes in scene graph 111 to transform scene graph 111 into attention mask 119, thereby obtaining attention mask 119…", "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…overlay model 120 generates heatmaps 600 and 602 in a manner that shows the association within scene graph 111 between the hand (identifying the person referenced in the question) and what that person is eating and/or holding…", "…Attention mask generation module 118 may present the image with bounding box overlays…" Also, see at least Lund, ¶ [Abstract, 0010, 0012, 0028, 0046 – 0049, 0058 – 0062], FIGS. 4, 5A, 5B, 8); and generate a bounding box based on the saliency heatmap and a bounding box network (See at least Burachas, ¶ [0057, 0074, 0076, 0077, 0102, 0107], FIGS. 3, 5, 6; "…Semantic segmentation may apply, to each of the classified one or more pixels in each of the frames and/or images, a label corresponding to the object type. For example, the set of semantic classification labels to which pixels may be semantically labeled may include: Sky, Building, Pole, Road Marking, Road, Pavement, Tree, Sign Symbol, Fence, Vehicle, Pedestrian, and Bicycle. Thus, each semantically-segmented sub-view may contain only those pixels that are labeled with a particular class…", "…Fusion module 116 may be trained to recognize computed gradients within a threshold gradient of the point in the feature space representative of the one or more keywords 115…Fusion module 116 may output selected gradient 117 to attention mask generation module 118…", "…Attention mask generation module 118 may associate selected gradient 117 with scene graph 111, assigning, based on selected gradient 117, attention scores to each scene, object, part, sub-part, etc…", "…Attention mask generation module 118 may associate the attention scores with various nodes in scene graph 111 to transform scene graph 111 into attention mask 119, thereby obtaining attention mask 119…", "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…overlay model 120 generates heatmaps 600 and 602 in a manner that shows the association within scene graph 111 between the hand (identifying the person referenced in the question) and what that person is eating and/or holding…", "…Attention mask generation module 118 may present the image with bounding box overlays…" Also, see at least Lund, ¶ [Abstract, 0010, 0012, 0028, 0046 – 0049, 0058 – 0062], FIGS. 4, 5A, 5B, 8).
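
For illustration only, applying a threshold value to per-pixel saliency values to obtain a binary mask, and then deriving a bounding box, might be sketched as follows. The function names binary_mask and bounding_box and the default threshold of 0.5 are assumptions made for this sketch. The limitation above recites a bounding box network; this sketch substitutes a simple geometric extent of the mask and does not represent a trained network or the method of either reference.

```python
def binary_mask(saliency, threshold=0.5):
    """Return a 0/1 mask: 1 where the saliency value meets the threshold."""
    return [[1 if value >= threshold else 0 for value in row] for row in saliency]


def bounding_box(mask):
    """Return (top, left, bottom, right) enclosing all 1-pixels, or None.

    This is a plain min/max extent over the mask, used here as a stand-in
    for a learned bounding box network (an assumption of this sketch).
    """
    coords = [
        (r, c) for r, row in enumerate(mask) for c, value in enumerate(row) if value
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))
```

Under these assumptions, a higher threshold yields a smaller mask and therefore a tighter box around only the most salient pixels.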

Regarding dependent claims 6, 13 and 20, Burachas modified by Lund above teaches:
wherein the bounding box network has been trained to estimate a location of the bounding box and a movement of the bounding box over a series of frames comprising the (See at least Burachas, ¶ [0057, 0066, 0074, 0076, 0077, 0102, 0107], FIGS. 3, 5, 6; "…Semantic segmentation may apply, to each of the classified one or more pixels in each of the frames and/or images, a label corresponding to the object type. For example, the set of semantic classification labels to which pixels may be semantically labeled may include: Sky, Building, Pole, Road Marking, Road, Pavement, Tree, Sign Symbol, Fence, Vehicle, Pedestrian, and Bicycle. Thus, each semantically-segmented sub-view may contain only those pixels that are labeled with a particular class…", "…Scene-graph generation module 110 may specify each node of scene graph 111 to define the object, part, sub-part, etc. at a pixel-level of granularity, a bounding box outlining the object, part, sub-part, etc. to highlight a general location of each object, part, sub-part, etc., and/or one or more attributes associated with the object, part, sub-part, etc. (such as a color, a shape, a label--which may also be referred to as a tag or feature)…", "…Fusion module 116 may be trained to recognize computed gradients within a threshold gradient of the point in the feature space representative of the one or more keywords 115…Fusion module 116 may output selected gradient 117 to attention mask generation module 118…", "…Attention mask generation module 118 may associate selected gradient 117 with scene graph 111, assigning, based on selected gradient 117, attention scores to each scene, object, part, sub-part, etc…", "…Attention mask generation module 118 may associate the attention scores with various nodes in scene graph 111 to transform scene graph 111 into attention mask 119, thereby obtaining attention mask 119…", "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…overlay model 120 generates heatmaps 600 and 602 in a manner that shows the association within scene graph 111 between the hand (identifying the person referenced in the question) and what that person is eating and/or holding…", "…Attention mask generation module 118 may present the image with bounding box overlays…" Also, see at least Lund, ¶ [Abstract, 0010, 0012, 0028, 0046 – 0049, 0058 – 0062], FIGS. 4, 5A, 5B, 8).

Regarding dependent claims 7 and 14, Burachas modified by Lund above teaches:
wherein the saliency network comprises optical flow data (e.g., motion/velocity of objects in at least ¶ [0037] of Burachas) (See at least Burachas, ¶ [0057, 0066, 0074, 0076, 0077, 0102, 0107], FIGS. 3, 5, 6; "…Semantic segmentation may apply, to each of the classified one or more pixels in each of the frames and/or images, a label corresponding to the object type. For example, the set of semantic classification labels to which pixels may be semantically labeled may include: Sky, Building, Pole, Road Marking, Road, Pavement, Tree, Sign Symbol, Fence, Vehicle, Pedestrian, and Bicycle. Thus, each semantically-segmented sub-view may contain only those pixels that are labeled with a particular class…", "…Scene-graph generation module 110 may specify each node of scene graph 111 to define the object, part, sub-part, etc. at a pixel-level of granularity, a bounding box outlining the object, part, sub-part, etc. to highlight a general location of each object, part, sub-part, etc., and/or one or more attributes associated with the object, part, sub-part, etc. (such as a color, a shape, a label--which may also be referred to as a tag or feature)…", "…Fusion module 116 may be trained to recognize computed gradients within a threshold gradient of the point in the feature space representative of the one or more keywords 115…Fusion module 116 may output selected gradient 117 to attention mask generation module 118…", "…Attention mask generation module 118 may associate selected gradient 117 with scene graph 111, assigning, based on selected gradient 117, attention scores to each scene, object, part, sub-part, etc…", "…Attention mask generation module 118 may associate the attention scores with various nodes in scene graph 111 to transform scene graph 111 into attention mask 119, thereby obtaining attention mask 119…", "…heatmap 600 may correctly identify, with pixel-level granularity the pizza, the slice of pizza, coloring the pizza red…overlay model 120 generates heatmaps 600 and 602 in a manner that shows the association within scene graph 111 between the hand (identifying the person referenced in the question) and what that person is eating and/or holding…", "…Attention mask generation module 118 may present the image with bounding box overlays…" Also, see at least Lund, ¶ [Abstract, 0010, 0012, 0028, 0046 – 0049, 0058 – 0062], FIGS. 4, 5A, 5B, 8).

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure: See the Notice of References Cited (PTO–892).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IDOWU O. OSIFADE whose telephone number is (571) 272-0864. The Examiner can normally be reached on Monday-Friday 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the Examiner’s Supervisor, Kim Vu, can be reached at (571) 272-3859. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. 
Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/IDOWU O OSIFADE/
Primary Examiner, Art Unit 2666