DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 7, 10-12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kwant et al. (US Patent Publication 2019/0073774 A1, hereinafter “Kwant”) in view of Yang et al. (US Patent Publication 2019/0102908 A1, hereinafter “Yang”).
With respect to claim 1,
Kwant discloses the invention substantially as claimed, including receiving an image depicting an object, the image comprising an n-dimensional array-like data structure (Kwant [0047], Item 401 in Figure 4 e.g. Fig. 4 illustrates an example input image 401 that depicts a road sign that is to be detected [receiving an image depicting an object] wherein said input image can be captured in real-time by a camera system of the vehicle as one or more raster images at a predetermined pixel resolution [the image comprising an n-dimensional array-like data structure (i.e., an image that is w pixels wide by h pixels high with d channels of depth)]);
generating a set of image features using a Convolutional Neural Network (CNN) encoder implemented on one or more computers (Kwant [0043, 0046, 0073] e.g. the computer vision system processes an image to generate a cell-based parametric representation of one or more edges of the object as depicted in the image [generating a set of image features using an encoder] wherein computer vision system employs a neural network (e.g., a convolutional neural network) to recognize objects [using a CNN encoder] wherein computer vision system may include multiple servers, intelligent networking devices, computing devices [implemented on one or more computers], components and corresponding software for providing parametric representations);
predicting a set of vertex predictions using the set of image features (Kwant [0066] e.g. the computer vision system can trace the perimeter of the intersection or polygon interior, take an intersection of the intersection and the edge lines, determine the vertices of the intersection [predicting a set of vertex predictions using the set of image features] to define a polygon having the determined vertices, and/or any other equivalent process);
producing a set of polygon predictions of the object (Kwant [0068] e.g. The computer vision system can then retrieve ground truth representations 1113a and 1113b that depict the known road sign from different known camera poses [producing a set of polygon predictions of the object that exploits the set of vertex predictions and the set of image features]); and
selecting a polygon object annotation from the set of polygon predictions (Kwant [0068] e.g. The distance and camera pose of the ground truth polygon with the greatest polygon similarity can then be selected as the camera pose of the detected polygon [selecting a polygon object annotation from the set of polygon predictions]).
Kwant may not explicitly teach using a recurrent decoder wherein the recurrent decoder implemented on one or more computers.  
Yang, in the same field of computer vision, teaches determination of polygon predictions utilizing a recurrent neural network.  Further, Yang teaches using a recurrent decoder implemented on one or more computers (Yang [0058], Figure 1C, Items 234-236 in Figure 2C e.g. the polygon predictions 234-236 of Figure 2C can utilize any network such as RNN, 2D convolution, 3D convolution, etc. [using a recurrent decoder] wherein The PPU may be included in a desktop computer, a laptop computer, a tablet computer, servers, super-computers [the recurrent decoder implemented on one or more computers] a smart-phone, personal digital assistant (PDA), a digital camera, a vehicle, a head mounted display, a hand-held electronic device, and the like).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang.  Yang teaches determination of polygon predictions utilizing a recurrent neural network in order to ease computational and memory requirements of said predictions, whereas Kwant is silent in this respect.  One of ordinary skill in the art would have motivation to ease computational power and memory demand by implementing a recurrent neural network in the computer vision system (Yang [0005]).
With respect to claim 2,
The combination of Kwant and Yang teaches all the limitations of claim 1.
Kwant further teaches wherein the selecting of a polygon object annotation is performed using an evaluator network (Kwant [0068] e.g. The computer vision system then evaluates the polygon similarity between the detected polygon against each of the ground truth polygons using any known polygon similarity evaluation or metric [wherein the selecting of a polygon object annotation is performed using an evaluator network]).
With respect to claim 7,
The combination of Kwant and Yang teaches all the limitations of claim 2.
Yang further teaches wherein producing each polygon prediction of the set of polygon predictions includes a series of timesteps to produce a set of vertex predictions defining the polygon prediction and the evaluator network is applied at each timestep of the series of timesteps (Yang [0073], Figure 2D e.g. the clips t-1 and t+1 provide the necessary information for associating the catching and shooting of a ball with the subject [wherein producing each polygon prediction of the set of polygon predictions includes a series of timesteps (i.e. t-1 and t+1) to produce a set of vertex predictions defining the polygon prediction and the evaluator network is applied at each timestep of the series of timesteps]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang.  Yang, in the same field of computer vision as Kwant, teaches utilizing timesteps to refine polygon predictions of objects in order to improve accuracy of ground truths.  One of ordinary skill in the art would have motivation to improve the accuracy of ground truth data by utilizing timesteps to analyze successive frames in a computer vision system (Yang [0074]).
With respect to claim 10,
The combination of Kwant and Yang teaches all the limitations of claim 1.
Kwant further teaches wherein the image is received from a sensor (Kwant [0047]).
With respect to claim 11,
Claim 11 is directed to a system that performs the method recited in claim 1.  Therefore, the rejection made to claim 1 is applied to claim 11.
With respect to claim 12,
The combination of Kwant and Yang teaches all the limitations of claim 11.
Kwant further teaches to produce each polygon of the set of polygon predictions one vertex at a time (Kwant [0066] Figure 6B e.g. to construct the boundary, the computer vision system can trace the perimeter of the intersection or polygon interior, take an intersection of the intersection and the edge lines, determine the vertices of the intersection [to produce each polygon of the set of polygon predictions one vertex at a time] to define a polygon having the determined vertices, and/or any other equivalent process).
Yang further teaches wherein the recurrent decoder includes an attention unit at each time step (Yang [0065] e.g. the same anchor tube is regressed to respective shapes 234, 235 and 236 to capture the person to a sufficient completeness so that a classification of the action can be made with a high level of confidence [wherein the decoder applies an attention unit (i.e. identifies a substantial semantically similar region of the image) at each time step (i.e. 234, 235, and 236 representing time steps, iterations, in the recurrent analysis)]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang.  Yang, in the art of computer vision, further teaches use of an attention unit at each time step to identify objects in a sequence of images in order to increase classification accuracy.  One of ordinary skill in the art would have motivation to increase accuracy of a computer vision system by utilizing an attention unit at each time step (Yang [0065]).
With respect to claim 18,
Claim 18 is directed to a system that performs the method recited in claim 10.  Therefore, the rejection made to claim 10 is applied to claim 18.

Claims 3 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Kwant in view of Yang as applied to claim 1 above, and further in view of Bazrafkan et al. (US Patent Publication 2018/0211155 A1, hereinafter “Bazrafkan”).
With respect to claim 3,
The combination of Kwant and Yang teaches all the limitations of claim 1.
Kwant further teaches generating a higher resolution polygon prediction from the polygon object annotation (Kwant [0073, 0102] e.g. For example, in areas where higher resolution is needed, smaller cells can be used to provide greater resolution [generating a higher resolution polygon from the polygon object annotation] wherein computer vision system may include multiple servers, intelligent networking devices, computing devices [implemented on one or more computers], components and corresponding software for providing parametric representations).
Kwant may not explicitly teach using a graph neural network.  
Bazrafkan, in the similar art of neural networks as Kwant and Yang, teaches representation of complex neural network architectures as a graph to rationalize resources required for the network.  Further, Bazrafkan teaches using a graph neural network (Bazrafkan [0050], Figures 4 and 5 e.g. contraction of the graphs shown in Figure 1, right hand side, results in the optimized graph shown in Figure 5 right hand side [using a graph neural network (i.e., representing a neural network architecture as a graph)]).
 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang with Bazrafkan.  Bazrafkan, in the similar art of neural networks as Kwant and Yang, teaches representation of complex neural network architectures as a graph to rationalize resources required for the overall network.  One of ordinary skill in the art would have motivation to represent the complex detection architectures of an object detection system as a graph in order to rationalize resources required for the detection system (Bazrafkan [0019]).
With respect to claim 5,
The combination of Kwant, Yang, and Bazrafkan teaches all the limitations of claim 3.
Yang further teaches wherein the decoder applies an attention unit at each time step wherein the attention unit is a computer-implemented structure that can accomplish the function of visual temporal attention (Yang [0065] e.g. the same anchor tube is regressed to respective shapes 234, 235 and 236 to capture the person to a sufficient completeness so that a classification of the action can be made with a high level of confidence [wherein the decoder applies an attention unit (i.e. identifies a substantial semantically similar region of the image) at each time step (i.e. 234, 235, and 236 representing time steps in the analysis)]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant, Yang and Bazrafkan.  Yang, in the same art of computer vision as Kwant, and in the similar field of neural networks of Bazrafkan, further teaches use of an attention unit at each time step to identify objects in a sequence of images in order to increase classification accuracy.  One of ordinary skill in the art would have motivation to increase accuracy of a computer vision system by utilizing an attention unit at each time step (Yang [0065]).
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Kwant in view of Yang and Bazrafkan as applied to claim 3 above, and further in view of Hazan (US Patent Publication 2019/0197682 A1).
With respect to claim 4,
The combination of Kwant, Yang and Bazrafkan teaches all the limitations of claim 3.
The combination of Kwant, Yang and Bazrafkan may not explicitly teach adding a set of supplementary vertex predictions to a set of primary vertex predictions defining the polygon object annotation; defining a propagation model; and applying the propagation model to adjust the position of the vertices of the set of supplementary vertex predictions and the vertices of the set of primary vertex predictions.  
Hazan, in the same art of computer vision, teaches the use of a propagation model to adjust vertex positions in a detected object in order to improve classification.  Further, Hazan teaches adding a set of supplementary vertex predictions to a set of primary vertex predictions defining the polygon object annotation (Hazan [0038], Figure 6 e.g. The layers were segmented by computing the center of mass of the lesion, stretching rays from the center, and segmenting each ray to equal number of segments to create layer waypoints [adding a set of supplementary vertex predictions to a set of primary vertex predictions defining the polygon object annotation (i.e. each successive layer’s representation represents a set of supplementary vertex predictions)]); defining a propagation model; and applying the propagation model to adjust the position of the vertices of the set of supplementary vertex predictions and the vertices of the set of primary vertex predictions (Hazan [0044], Figure 6 e.g. statistics of features from all pixels in each layer may be calculated separately.  The resulting vectors for a layer may propagate to the next layer of the network, thereby providing additional information on the tumor [defining a propagation model (i.e. the convolutional neural network classification model in 0044); and applying the propagation model to adjust the position of the vertices of the set of supplementary vertex predictions and the vertices of the set of primary vertex predictions (i.e. the CNN model adjusts the position of the vertices through successive layers, as seen in Figure 6)]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant, Yang and Bazrafkan with Hazan.  Hazan, in the same art of computer vision as Kwant, teaches the use of a propagation model to adjust vertex positions in a detected object in order to improve classification.  One of ordinary skill in the art would have motivation to improve classification of a computer vision system by implementing a propagation model to adjust vertex positions of the predicted objects (Hazan [0025]).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Kwant in view of Yang as applied to claim 2 above, and further in view of Katabi et al. (US Patent Publication 2019/0188533 A1, hereinafter “Katabi”).
With respect to claim 6,
The combination of Kwant and Yang teaches all the limitations of claim 2.
The combination of Kwant and Yang may not explicitly teach wherein the evaluator network predicts an Intersection over Union (IoU) of each polygon prediction of the set of polygon predictions using gamma testing, and the polygon object annotation is the polygon prediction of the set of polygon predictions having the maximum IoU.  
Katabi, in the same art of computer vision, teaches use of an Intersection over Union metric to improve time spent training a model.  Further, Katabi teaches wherein the evaluator network predicts an Intersection over Union (IoU) of each polygon prediction of the set of polygon predictions (Katabi [0088] e.g. A binary label is assigned to each window for training to indicate whether it contains a subject or not.  To set the label, a simple intersection-over-union (IoU) metric is used [wherein the evaluator network predicts an Intersection over Union (IoU) of each polygon prediction of the set of polygon predictions]) using gamma testing, and the polygon object annotation is the polygon prediction of the set of polygon predictions having the maximum IoU (Katabi [0088-0089] e.g. a binary label is assigned to each window for training to indicate whether it contains a subject or not wherein a window that overlaps more than 0.7 IoU with any ground truth region is set as positive and a window that overlaps less than 0.3 IoU with all ground truth is set as negative [using gamma testing (i.e. by comparing a rectangular window, polygon, that may contain a subject or not, to all the ground truth or “In house” data), and the polygon object annotation is the polygon prediction of the set of polygon predictions having the maximum IoU (i.e. the highest IoU produces a positive output)]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang with Katabi.  Katabi, in the same art of computer vision, teaches use of an Intersection over Union metric to improve time spent training a model.  One of ordinary skill in the art would have motivation to improve training time by implementing an Intersection over Union metric using gamma testing in a computer vision system (Katabi [0087]).
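For illustration only, the IoU-based labeling and maximum-IoU selection discussed above can be sketched as follows. This is a minimal sketch, not an implementation taken from Katabi or from the claims: it assumes axis-aligned rectangular windows given as (x_min, y_min, x_max, y_max), computes IoU directly against a reference region rather than predicting it with an evaluator network, and the function names box_iou, label_window and select_max_iou are hypothetical. The 0.7 and 0.3 thresholds are those quoted from Katabi [0088-0089].

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_window(window, ground_truths, pos=0.7, neg=0.3):
    """Return 1 (positive), 0 (negative), or None (neither threshold met)."""
    # "More than pos with any region" is the same as "best IoU > pos";
    # "less than neg with all regions" is the same as "best IoU < neg".
    best = max((box_iou(window, g) for g in ground_truths), default=0.0)
    if best > pos:
        return 1
    if best < neg:
        return 0
    return None


def select_max_iou(candidates, reference):
    """Pick the candidate with the highest IoU against the reference (assumed helper)."""
    return max(candidates, key=lambda c: box_iou(c, reference))
```

For example, select_max_iou([(0, 0, 1, 1), (0, 0, 2, 2)], (0, 0, 2, 2)) returns (0, 0, 2, 2), the candidate that coincides with the reference region.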
Claims 8, 9, 17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kwant in view of Yang as applied to claims 1 and 11 above, and further in view of Castrejón et al., “Annotating Object Instances with a Polygon-RNN” (IEEE Conference on Computer Vision and Pattern Recognition (CVPR), April 18, 2017, pp. 5230-5238), hereinafter “Castrejón.”
With respect to claim 8,
The combination of Kwant and Yang teaches all the limitations of claim 1.
The combination of Kwant and Yang may not explicitly teach wherein the set of polygon predictions comprises one or more human corrections to the set of vertex predictions.  
Castrejón, in the same art of computer vision as Kwant and Yang, teaches a human correction feedback mechanism to improve accuracy of predicted polygons.  Further, Castrejón teaches wherein the set of polygon predictions comprises one or more human corrections to the set of vertex predictions (Castrejón 4.3, p. 5235 e.g. The main advantage of our model is that it allows a human annotator to easily interfere if a mistake occurs [wherein the set of polygon predictions comprises one or more human corrections to the set of vertex predictions]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang with Castrejón.  Castrejón, in the same art of computer vision as Kwant and Yang, teaches a human correction feedback mechanism to improve accuracy of predicted polygons.  One of ordinary skill in the art would have motivation to provide a human feedback mechanism to improve the accuracy of predicted polygons in a computer vision system (Castrejón 4.3, p. 5235).
With respect to claim 9,
The combination of Kwant and Yang teaches all the limitations of claim 1.
The combination of Kwant and Yang may not explicitly teach wherein the set of polygon predictions comprises one or more human corrections to the set of vertex predictions.  
Castrejón, in the same art of computer vision as Kwant and Yang, teaches a human correction feedback mechanism to improve accuracy of predicted polygons.  Further, Castrejón teaches wherein the set of polygon predictions comprises one or more human corrections to the set of vertex predictions (Castrejón 4.3, p. 5235 e.g. The main advantage of our model is that it allows a human annotator to easily interfere if a mistake occurs [wherein the set of polygon predictions comprises one or more human corrections to the set of vertex predictions]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang with Castrejón.  Castrejón, in the same art of computer vision as Kwant and Yang, teaches a human correction feedback mechanism to improve accuracy of predicted polygons.  One of ordinary skill in the art would have motivation to provide a human feedback mechanism to improve the accuracy of predicted polygons in a computer vision system (Castrejón 4.3, p. 5235).
With respect to claim 20,
The combination of Kwant and Yang teaches all the limitations of claim 1.
The combination of Kwant and Yang may not explicitly teach applying one or more simulated human corrections or human corrections to one or more vertex predictions.  
Castrejón, in the same art of computer vision as Kwant and Yang, teaches a human correction feedback mechanism to improve accuracy of predicted polygons.  Further, Castrejón teaches applying one or more simulated human corrections or human corrections to one or more vertex predictions (Castrejón 4.3, p. 5235 e.g. The main advantage of our model is that it allows a human annotator to easily interfere if a mistake occurs [wherein the set of polygon predictions comprises one or more human corrections to the set of vertex predictions]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang with Castrejón.  Castrejón, in the same art of computer vision as Kwant and Yang, teaches a human correction feedback mechanism to improve accuracy of predicted polygons.  One of ordinary skill in the art would have motivation to provide a human feedback mechanism to improve the accuracy of predicted polygons in a computer vision system (Castrejón 4.3, p. 5235).
With respect to claims 17 and 19,
Claims 17 and 19 are directed to systems that perform the methods recited in claims 8 and 9, respectively.  Therefore, the rejections made to claims 8 and 9 are applied to claims 17 and 19, respectively.
Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Kwant in view of Yang as applied to claim 11 above, and further in view of Li et al., “Gated Graph Sequence Neural Networks” (International Conference on Learning Representations (ICLR) 2016, published 22 September 2017), hereinafter “Li.”
With respect to claim 13,
The combination of Kwant and Yang teaches all the limitations of claim 11.
The combination of Kwant and Yang may not explicitly teach a gated graph neural network for generating a higher resolution polygon from the selected polygon prediction.  
Li, in the similar art of neural networks as Kwant and Yang, teaches a gated graph neural network architecture to infer custom features of an input graph.  Further, Li teaches a gated graph neural network for generating a higher resolution polygon from the selected polygon prediction (Li Section 1 p. 1, Section 3, p. 3 e.g. an extension of Graph Neural networks (i.e. a Gated Graph Neural Network) that outputs sequences … Examples include paths on a graph, enumerations of graph nodes with desirable properties, or sequences of global classifications mixed with, for example, a start and end node (i.e. wherein a change in path requires a higher resolution, or determination of said path requires a higher resolution, desirable properties) wherein Gated Graph Neural Networks (GG-NNs) … use Gated Recurrent Units and unroll the recurrence for a fixed number of steps T and use backpropagation through time in order to compute gradients [a gated graph neural network for generating a higher resolution polygon from the selected polygon prediction]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang with Li.  Li, in the similar art of neural networks as Kwant and Yang, teaches a gated graph neural network architecture to infer custom features of an input graph.  One of ordinary skill in the art would have motivation to infer custom features of an input graph by using a gated graph neural network architecture in a computer vision system (Li Section 1, p. 1).
With respect to claim 14,
The combination of Kwant, Yang and Li teaches all the limitations of claim 13.
Li further teaches wherein the gated graph neural network includes a propagation block and an output block (Li Section 2, p. 2 e.g. First, there is a propagation step that computes node representations for each node; second, an output model maps from node representations and corresponding labels to an output [wherein the gated graph neural network includes a propagation block and an output block]).
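For illustration only, the two-stage structure quoted from Li (a propagation step that computes node representations, followed by an output model) can be sketched as below. This is a deliberately simplified stand-in under stated assumptions: each propagation round replaces a node state with the mean of its own state and its neighbours' states, rather than the gated recurrent update Li describes, and the output model simply sums each final state and adds the node's annotation. The names propagate and output_model are hypothetical.

```python
def propagate(states, edges, steps):
    """Propagation block (simplified): `steps` rounds of neighbour averaging.

    states: dict mapping node -> list of floats (node representation)
    edges:  iterable of undirected (u, v) pairs
    """
    neighbours = {n: [] for n in states}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    for _ in range(steps):
        new_states = {}
        for n, h in states.items():
            # Average the node's own state with its neighbours' states.
            group = [h] + [states[m] for m in neighbours[n]]
            new_states[n] = [sum(vals) / len(group) for vals in zip(*group)]
        states = new_states
    return states


def output_model(states, annotations):
    """Output block (simplified): map each node state and annotation to a score."""
    return {n: sum(states[n]) + annotations[n] for n in states}
```

For example, with states {"a": [0.0], "b": [2.0]}, a single edge ("a", "b") and one propagation step, both node states become [1.0]; the output model with annotations {"a": 0, "b": 1} then yields {"a": 1.0, "b": 2.0}.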
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Kwant in view of Yang as applied to claim 11 above, and further in view of Fitzpatrick (US Patent Publication 2019/0000382 A1).
With respect to claim 15,
The combination of Kwant and Yang teaches all the limitations of claim 11.
The combination of Kwant and Yang may not explicitly teach an application unit for receiving a resultant object annotation.  
Fitzpatrick, in the same art of computer vision as Kwant and Yang, teaches an application unit that displays information computed from the computer vision system in order to provide feedback about said computations.  Further, Fitzpatrick teaches an application unit for receiving a resultant object annotation (Fitzpatrick Claim 13 e.g. wherein the image of the one or more items is an image, picture, video or visual display on a television, computer monitor or virtual reality device [an application unit for receiving a resultant object annotation]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang with Fitzpatrick.  Fitzpatrick, in the same art of computer vision as Kwant and Yang, teaches an application unit that displays information computed from the computer vision system in order to provide feedback about said computations.  One of ordinary skill in the art would have motivation to provide feedback to a user via an application unit in a computer vision system (Fitzpatrick Figure 1, Claim 13).
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Kwant in view of Yang as applied to claim 11 above, and further in view of Long et al., “Fully Convolutional Networks for Semantic Segmentation” (IEEE Conference on Computer Vision and Pattern Recognition (CVPR), April 2017, pp. 3431-3440), hereinafter “Long.”
With respect to claim 16,
The combination of Kwant and Yang teaches all the limitations of claim 11.
The combination of Kwant and Yang may not explicitly teach wherein the CNN encoder includes a skip layer architecture.  
Long, in the same art of computer vision as Kwant and Yang, teaches a skip layer architecture to improve segmentation analysis in computer vision applications.  Further, Long teaches wherein the CNN encoder includes a skip layer architecture (Long Abstract, p. 3431 e.g. we then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations [wherein the CNN encoder includes a skip layer architecture]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Kwant and Yang with Long.  Long, in the same art of computer vision as Kwant and Yang, teaches a skip layer architecture to improve segmentation analysis in computer vision applications.  One of ordinary skill in the art would have motivation to augment a computer vision architecture with a skip layer in order to improve segmentation analysis (Long, Abstract).
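For illustration only, the skip architecture quoted from Long (combining information from a deep, coarse layer with information from a shallow, fine layer) can be sketched on one-dimensional feature lists. This is a minimal sketch under stated assumptions: nearest-neighbour 2x upsampling and element-wise addition are chosen here for simplicity and are not asserted to be the fusion used in Long, and the names upsample2x and skip_fuse are hypothetical.

```python
def upsample2x(coarse):
    """Nearest-neighbour upsampling: repeat every coarse value twice."""
    return [v for v in coarse for _ in range(2)]


def skip_fuse(fine, coarse):
    """Fuse a shallow, fine feature list with a deep, coarse one.

    The coarse features are upsampled to the fine resolution and added
    element-wise, so the result keeps the fine layer's resolution.
    """
    up = upsample2x(coarse)
    if len(up) != len(fine):
        raise ValueError("fine layer must be exactly twice as long as coarse layer")
    return [f + c for f, c in zip(fine, up)]
```

For example, skip_fuse([1, 2, 3, 4], [10, 20]) returns [11, 12, 23, 24]: each coarse value contributes to the two fine positions it covers.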

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to UTPAL D SHAH whose telephone number is (571)272-5729. The examiner can normally be reached M-F: 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward Urban, can be reached at 571-272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/UTPAL D SHAH/Primary Examiner, Art Unit 2665