Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Arguments
Applicant's amendments and remarks submitted 05/10/2021 have been entered and considered, but are not found convincing. Claims 1, 3, 8, 10, 15, and 17 have been amended. In summary, claims 1-21 are pending in the application. Applicant’s amendments have necessitated the new grounds of rejection set forth herein; accordingly, this action is made final.
Specification:
Applicant has amended the specification to correct a typographical error. The objection to the specification has been withdrawn.
Claim Rejections - 35 U.S.C. 103
Applicant's arguments filed with respect to independent claims 1, 8 and 15 have been fully considered but are moot because the rejection has been modified to address the newly added limitations. The Examiner relies on Mikhailov for the argued limitations.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
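(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 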
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

“A hybrid representation generator for” in claim 15; “an input that is configured to”; “a selection unit that is configured to”; “an output that is configured to”; “a media unit signature that” in claim 15.
“hybrid representation generator” in claims 16-21.
“A hybrid representation generator” is being interpreted to cover the corresponding structure “system 4900” described in the specification paragraph [00342] “System 4900 may include sensing unit 4902, communication unit 4904, input 4911, processor 4950, and output 4919. The communication unit 4904 may include the input and/or the output.”
“an input that is configured to”; “an output that is configured to” are being interpreted to cover the corresponding structure described in the specification paragraph [00343] “Input and/or output may be any suitable communications component such as a network interface card, universal serial bus (USB) port, disk reader, modem or transceiver that may be operative to use protocols such as are known in the art to communicate either directly, or indirectly, with other elements of the system.”
“a selection unit that is configured to”; “a media unit signature that” are being interpreted to cover the corresponding structure “a processor” described in the specification paragraph [00344] “Processor 4950 may include at least some out of • Multiple spanning elements 4951(q). • Multiple merge elements 4952(r). • Object detector 4953. • Cluster manager 4954. • Controller 4955. • Selection unit 4956. • Object detection determination unit 4957. • Signature generator 4958. • Movement information unit 4959. • Identifier unit 4960.”
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
1.	Claims 1-4, 8-11, 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." IEEE transactions on pattern analysis and machine intelligence 39.6 (2016): 1137-1149 (“Ren”) in view of Lin, Tsung-Yi, et al. "Feature pyramid networks for object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017 (“Lin”) further in view of Mikhailov, U.S. Patent Application Publication No. 20090102835 (“Mikhailov”).



See Fig. 3 of Ren
Regarding independent claim 1, Ren teaches a method for generating a hybrid representation of a media unit, the method comprises:
receiving or generating the media unit (see section 3.1 Region Proposal Networks, “A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score”); processing the media unit by performing multiple iterations, wherein at least some of the multiple iterations (see section 3.1 Region Proposal Networks, “A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score”); selecting, based on an output of the multiple iterations, media unit regions of interest that contributed to the output of the multiple iterations (see section 3.1 Region Proposal Networks, second paragraph, “To generate region proposals, we slide a small network over the convolutional feature map output by the last shared convolutional layer. This small network takes as input an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a lower-dimensional feature (256-d for ZF and 512-d for VGG, with ReLU [33] following). This feature is fed into two sibling fully-connected layers—a box-regression layer (reg) and a box-classification layer (cls). We use n = 3 in this paper, noting that the effective receptive field on the input image is large (171 and 228 pixels for ZF and VGG, respectively). This mini-network is illustrated at a single position in Figure 3 (left). Note that because the mini-network operates in a sliding-window fashion, the fully-connected layers are shared across all spatial locations. This architecture is naturally implemented with an n×n convolutional layer followed by two sibling 1 × 1 convolutional layers (for reg and cls, respectively).”; 3.1.1 Anchors, “At each sliding-window location, we simultaneously predict multiple region proposals, where the number of maximum possible proposals for each location is denoted as k. … For a convolutional feature map of a size W × H (typically ∼2,400), there are WHk anchors in total.”); and providing the hybrid representation (see Fig. 3, right, which shows classification labels and bounding boxes for, for example, person, car, dog, etc.), wherein the hybrid representation comprises shape information regarding shapes of the media unit regions of interest (see Fig. 3, which shows bounding boxes) and a media unit signature that comprises identifiers that identify the media unit regions of interest (see Fig. 3, where the anchor is considered as an identifier, as are the classification labels such as person and dog with their ratio numbers); wherein the shape information comprises polygons that represent shapes that substantially bound the media unit regions of interest (see Fig. 3, which shows bounding boxes for, for example, person, car, dog). Ren is understood to be silent on the remaining limitations of claim 1.
In the same field of endeavor, Lin teaches receiving or generating the media unit (see section 3. Feature Pyramid Networks, second paragraph “Our method takes a single-scale image of an arbitrary size as input, and outputs proportionally sized feature maps at multiple levels, in a fully convolutional fashion….”); processing the media unit by performing multiple iterations, wherein at least some of the multiple iterations comprises applying, by spanning elements of the iteration, dimension expansion process that are followed by a merge operation (see section 3. Top-down pathway and lateral connections, second paragraph “Fig. 3 shows the building block that constructs our top-down feature maps. With a coarser-resolution feature map, we upsample the spatial resolution by a factor of 2 (using nearest neighbor upsampling for simplicity). The upsampled map is then merged with the corresponding bottom-up map (which undergoes a 1×1 convolutional layer to reduce channel dimensions) by element-wise addition. This process is iterated until the finest resolution map is generated. To start the iteration, we simply attach a 1×1 convolutional layer on C5 to produce the coarsest resolution map. Finally, we append a 3×3 convolution on each merged map to generate the final feature map, which is to reduce the aliasing effect of upsampling. This final set of feature maps is called {P2, P3, P4, P5}, corresponding to {C2,C3,C4,C5} that are respectively of the same spatial sizes.”); selecting, based on an output of the multiple iterations, media unit regions of interest that contributed to the output of the multiple iterations (section 4.2. Feature Pyramid Networks for Fast RCNN, “Fast R-CNN [11] is a region-based object detector in which Region-of-Interest (RoI) pooling is used to extract features. Fast R-CNN is most commonly performed on a single-scale feature map. To use it with our FPN, we need to assign RoIs of different scales to the pyramid levels. … Both the corresponding image region size (light orange) and canonical object size (dark orange) are shown.”).
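For illustration only, the top-down merge step quoted from Lin (upsampling a coarser map by a factor of 2 with nearest neighbor upsampling, then merging with the same-size bottom-up map by element-wise addition) can be sketched as follows. This is a minimal sketch under stated assumptions, not Lin's implementation: the maps are single-channel nested lists, the 1×1 and 3×3 convolutions are omitted, and the function names are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Double height and width by repeating each value (nearest neighbor)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def merge_top_down(coarse, lateral):
    """Upsample the coarser map by 2 and add the same-size lateral map.

    Assumption: `lateral` stands in for the bottom-up map after its 1x1
    convolution, which is not modeled here.
    """
    up = upsample_nearest_2x(coarse)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("lateral map must be exactly twice the coarse size")
    return [[u + l for u, l in zip(up_row, lat_row)]
            for up_row, lat_row in zip(up, lateral)]


# Hypothetical 1x2 coarse map merged with a 2x4 lateral map.
coarse = [[1.0, 2.0]]
lateral = [[0.5, 0.5, 0.5, 0.5],
           [0.0, 0.0, 0.0, 0.0]]
merged = merge_top_down(coarse, lateral)
```

Repeating this step from the coarsest level down to the finest corresponds to the iteration described in the quoted passage.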
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the Region Proposal Network (RPN) of Ren, which simultaneously predicts object bounds and objectness scores at each position, with the feature pyramid networks as seen in Lin because this modification would take a single –
In the same field of endeavor, Mikhailov teaches wherein the shape information comprises polygons that represent shapes that substantially bound the media unit regions of interest (¶0028 “Once the objects 103, 105, 107 have been located in the image 101, Polygons 113, 115, 117 may be traced around each object, as indicated at 206 and as illustrated in FIG. 3C. Polygon data 124 may be stored in the memory 120. There are a number of different techniques for tracing the polygons”), wherein a number of edges per polygon of the polygons is based on a shape of a media unit region of interest represented by the polygon (¶0024 as shown in Fig. 3C “The processor 110 may be programmed with instructions that facilitate such operation. In particular, the processor 110 may be programmed with image capture instructions 112 that obtain an image 101 from the image capture device 106 and store the data 122 representing the image 101 or retrieve the stored image data 122 from some other device. The processor 110 may be further programmed with outlining instructions 114 that analyze the image data 122 to locate edges of the objects 103, 105, 107 in the image 101 and generate data 124 representing the corresponding polygons 113, 115, 117. The polygon data 124 may identify, among other things, locations of endpoints of a plurality of line segments that make up each side of each polygon. The locations of the endpoints within the image 101 may be defined with respect to some coordinate system. An origin of the coordinate system may be arbitrarily defined and each location may be identified in terms of a 
Therefore, in the combination of Ren and Lin, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the bounding box of Ren with the polygon tracing of Mikhailov, which optimizes the number of line segments in the polygon surrounding the object, because this modification would automatically trace polygons around each object (¶0028 of Mikhailov).
Thus, the combination of Ren, Lin and Mikhailov teaches a method for generating a hybrid representation of a media unit, the method comprises: receiving or generating the media unit; processing the media unit by performing multiple iterations, wherein at least some of the multiple iterations comprises applying, by spanning elements of the iteration, dimension expansion process that are followed by a merge operation; selecting, based on an output of the multiple iterations, media unit regions of interest that contributed to the output of the multiple iterations; and providing the hybrid representation, wherein the hybrid representation comprises shape information regarding shapes of the media unit regions of interest, and a media unit signature that comprises identifiers that identify the media unit regions of interest ; wherein the shape information comprises polygons that represent shapes that substantially bound the media unit regions of interest, wherein a number of edges per polygon of the polygons is based on a shape of a media unit region of interest represented by the polygon.
Regarding claim 2, Ren, Lin and Mikhailov teach the method according to claim 1 wherein the selecting of the media regions of interest is executed per segment out of multiple segments of the media unit (see section 3.1 Region Proposal Networks, second paragraph of Ren “To generate region proposals, we slide a small network over the convolutional feature map output by the last shared convolutional layer. This small network takes as input an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a lower-dimensional feature (256-d for ZF and 512-d for VGG, with ReLU [33] following). This feature is fed into two sibling fully-connected layers—a box-regression layer (reg) and a box-classification layer (cls). We use n = 3 in this paper, noting that the effective receptive field on the input image is large (171 and 228 pixels for ZF and VGG, respectively). This mini-network is illustrated at a single position in Figure 3 (left). Note that because the mini-network operates in a sliding-window fashion, the fully-connected layers are shared across all spatial locations. This architecture is naturally implemented with an n×n convolutional layer followed by two sibling 1 × 1 convolutional layers (for reg and cls, respectively).”; 3.1.1 Anchors of Ren, “At each sliding-window location, we simultaneously predict multiple region proposals, where the number of maximum possible proposals for each location is denoted as k. … For a convolutional feature map of a size W × H (typically ∼2,400), there are WHk anchors in total.”; see 3. Feature Pyramid Networks of Lin “Our goal is to leverage a ConvNet’s pyramidal feature hierarchy, which has semantics from low to high levels, and build a feature pyramid with high-level semantics throughout. The resulting Feature Pyramid Network is general purpose and in this paper we focus on sliding window proposers (Region Proposal Network, RPN for short) [29] and region-based detectors (Fast R-CNN) [11]. We also generalize FPNs to instance segmentation proposals in Sec. 6.”). In addition, the same motivation is used as the rejection for claim 1.
Regarding claim 3, Ren, Lin and Mikhailov teach the method according to claim 1 wherein at least one of the polygons differs from a rectangle (Fig. 3C of Mikhailov, where the polygon 117 differs from a rectangle). In addition, the same motivation is used as the rejection for claim 1.
Regarding claim 4, Ren, Lin and Mikhailov teach the method according to claim 1 wherein the providing of the hybrid representation of the media unit comprises compressing the shape information of the media unit to provide compressed shape information of the media unit (3.1.1 Anchors of Ren, “At each sliding-window location, we simultaneously predict multiple region proposals, where the number of maximum possible proposals for each location is denoted as k. So the reg layer has 4k outputs encoding the coordinates of k boxes, and the cls layer outputs 2k scores that estimate probability of object or not object for each proposal. The k proposals are parameterized relative to k reference boxes, which we call anchors. An anchor is centered at the sliding window in question, and is associated with a scale and aspect ratio (Figure 3, left). By default we use 3 scales and 3 aspect ratios, yielding k = 9 anchors at each sliding position. For a convolutional feature map of a size W × H (typically ∼2,400), there are WHk anchors in total.”; Fig. 3 of Ren, where the labels person, dog, car are considered as compressed shape information).
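For illustration only, the anchor arithmetic in the quoted passage of Ren (3 scales and 3 aspect ratios giving k = 9 anchors per sliding position, 4k reg outputs, 2k cls outputs, and WHk anchors in total) can be sketched as follows. The scale and ratio values, the area-preserving box construction, the 60 × 40 feature-map size, and the function names are assumptions made for this sketch and are not taken from the passage quoted above.

```python
from itertools import product


def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x1, y1, x2, y2) centered at (cx, cy).

    Assumption: each box keeps an area of roughly scale**2 while its
    height/width ratio equals `ratio`.
    """
    boxes = []
    for s, r in product(scales, ratios):
        w = s / r ** 0.5
        h = s * r ** 0.5
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def count_outputs(W, H, k):
    """Counts stated in the quoted passage: WHk anchors, 4k reg and 2k cls outputs."""
    return {
        "anchors": W * H * k,
        "reg_per_location": 4 * k,
        "cls_per_location": 2 * k,
    }


k = len(anchors_at(0, 0))          # 3 scales x 3 ratios
counts = count_outputs(60, 40, k)  # hypothetical map with W x H = 2,400
```

With these assumed sizes the sketch yields 21,600 anchors in total, which is simply W × H × k for a map of about 2,400 positions.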
Regarding independent claim 8, Ren teaches a non-transitory computer readable medium for generating a hybrid representation of a media unit, the non-transitory computer readable medium stores instructions (see abstract of Ren “For the very deep VGG-16 model [3], our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks”) for: The remainder of claim 8 is similar in scope to claim 1 and is therefore rejected under the same rationale.
Regarding claim 9, Ren, Lin and Mikhailov teach the non-transitory computer readable medium according to claim 8. The remainder of claim 9 is similar in scope to claim 2 and is therefore rejected under the same rationale.
Regarding claim 10, Ren, Lin and Mikhailov teach the non-transitory computer readable medium according to claim 8. The remainder of claim 10 is similar in scope to claim 3 and is therefore rejected under the same rationale.
Regarding claim 11, Ren, Lin and Mikhailov teach the non-transitory computer readable medium according to claim 8. The remainder of claim 11 is similar in scope to claim 4 and is therefore rejected under the same rationale.
Regarding independent claim 15, Ren teaches a hybrid representation generator for generating a hybrid representation of a media unit, the hybrid representation generator (section 3 FASTER R-CNN of Ren “Our object detection system, called Faster R-CNN, is composed of two modules.”) comprises: The remainder of claim 15 is similar in scope to claim 1 and is therefore rejected under the same rationale.
Regarding claim 16, Ren, Lin and Mikhailov teach the hybrid representation generator according to claim 15. The remainder of claim 16 is similar in scope to claim 2 and is therefore rejected under the same rationale.
Regarding claim 17, Ren, Lin and Mikhailov teach the hybrid representation generator according to claim 15. The remainder of claim 17 is similar in scope to claim 3 and is therefore rejected under the same rationale.
Regarding claim 18, Ren, Lin and Mikhailov teach the hybrid representation generator according to claim 15 that is configured to: The remainder of claim 18 is similar in scope to claim 4 and is therefore rejected under the same rationale.
2.	Claims 5-7, 12-14, 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." IEEE transactions on pattern analysis and machine intelligence 39.6 (2016): 1137-1149 (“Ren”) in view of Lin, Tsung-Yi, et al. "Feature pyramid networks for object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017 (“Lin”) further in view of Mikhailov, U.S. Patent Application Publication No. 20090102835 (“Mikhailov”) further in view of Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." IEEE transactions on pattern analysis and machine intelligence 32.9 (2009): 1627-1645 (“Felzenszwalb”).
Regarding claim 5, Ren, Lin and Mikhailov teach the method according to claim 4 comprising: comparing the media unit signature of the media unit to signatures of multiple concept structures to find a matching concept structure that has at least one matching signature that matches to the media unit signature (3.1.2 Loss Function of Ren “For training RPNs, we assign a binary class label (of being an object or not) to each anchor. We assign a positive label to two kinds of anchors: (i) the anchor/anchors with the highest Intersection-over-Union (IoU) overlap with a ground-truth box, or (ii) an anchor that has an IoU overlap higher than 0.7 with any ground-truth box. Note that a single ground-truth box may assign positive labels to multiple anchors.  Usually the second condition is sufficient to determine the positive samples; but we still adopt the first condition for the reason that in some rare cases the second condition may find no positive sample. We assign a negative label to a non-positive anchor if its IoU ratio is lower than 0.3 for all ground-truth boxes. Anchors that are neither positive nor negative do not contribute to the training objective” where an anchor that has 
In the same field of endeavor, Felzenszwalb teaches calculating higher accuracy shape information that is related to regions of interest of the media unit, wherein the higher accuracy shape information is of higher accuracy than the compressed shape information of the media unit (see section 7.3 Contextual Information, “…Let (D1, . . . , Dk) be a set of detections obtained using k different models (for different object categories) in an image I. Each detection (B, s) ∈ Di is defined by a bounding box B = (x1, y1, x2, y2) and a score s. We define the context of I in terms of a k-dimensional vector c(I) = (σ(s1), . . . , σ(sk)) where si is the score of the highest scoring detection in Di, and σ(x) = 1/(1+exp(−2x)) is a logistic function for renormalizing the scores. To rescore a detection (B, s) in an image I we build a 25-dimensional feature vector with the original score of the detection, the top-left and bottom-right bounding box coordinates, and the image context, g = (σ(s), x1, y1, x2, y2, c(I)). (30) The coordinates x1, y1, x2, y2 ∈ [0, 1] are normalized by the width and height of the image. We use a category specific classifier to score this vector to obtain a new score for the detection. The classifier is trained to distinguish correct detections from false positives by integrating contextual information defined by g.” where score of highest scoring detection is considered higher accuracy shape information), wherein the calculating is based on shape information associated with at least some of the matching signatures 
Therefore, in the combination of Ren, Lin and Mikhailov, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Region Proposal Network (RPN) of Ren, which simultaneously predicts object bounds and objectness scores at each position, with the bounding box prediction and the rescoring of detections using contextual information as seen in Felzenszwalb because this modification would lead to a noticeable improvement in the average precision on several categories in the PASCAL datasets (see section 7.3 Contextual Information, last paragraph of Felzenszwalb).
Thus, the combination of Ren, Lin, Mikhailov, and Felzenszwalb teaches comparing the media unit signature of the media unit to signatures of multiple concept structures to find a matching concept structure that has at least one matching signature that matches to the media unit signature; and calculating higher accuracy shape information that is related to regions of interest of the media unit, wherein the higher accuracy shape information is of higher accuracy than the compressed shape information of the media unit, wherein the calculating is based on shape information associated with at least some of the matching signatures.
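For illustration only, the rescoring feature vector quoted from section 7.3 of Felzenszwalb, g = (σ(s), x1, y1, x2, y2, c(I)) with σ(x) = 1/(1+exp(−2x)), can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the sample values are hypothetical, k = 20 categories is inferred only from the stated 25-dimensional vector (1 + 4 + k = 25), and the category specific classifier that scores the vector is not modeled.

```python
import math


def sigma(x):
    """Logistic renormalization from the quoted passage: 1 / (1 + exp(-2x))."""
    return 1.0 / (1.0 + math.exp(-2.0 * x))


def context_vector(best_scores):
    """c(I): one renormalized score per category, taken from the
    highest scoring detection of that category in the image."""
    return [sigma(s) for s in best_scores]


def rescoring_features(box, score, img_w, img_h, best_scores):
    """Build g = (sigma(s), x1, y1, x2, y2, c(I)) with coordinates normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    return [sigma(score),
            x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h] + context_vector(best_scores)


# Hypothetical detection in a 200 x 100 image with 20 object categories.
best_scores = [0.0] * 20
g = rescoring_features((20, 10, 120, 60), 0.0, 200, 100, best_scores)
```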
Regarding claim 6, Ren, Lin, Mikhailov, and Felzenszwalb teach the method according to claim 5 comprising determining shapes of the media unit regions of interest using the higher accuracy shape information (see section 7.3 Contextual Information of Felzenszwalb “…Let (D1, . . . , Dk) be a set of detections obtained using k different models (for different object categories) in an image I. Each detection (B, s) ∈ Di is defined by a bounding box B = (x1, y1, x2, y2) and a score s. We define the context of I in terms of a k-dimensional vector c(I) = (σ(s1), . . . , σ(sk)) where si is the score of the highest scoring detection in Di, and σ(x) = 1/(1+exp(−2x)) is a logistic function for renormalizing the scores. To rescore a detection (B, s) in an image I we build a 25-dimensional feature vector with the original score of the detection, the top-left and bottom-right bounding box coordinates, and the image context, g = (σ(s), x1, y1, x2, y2, c(I)). (30) The coordinates x1, y1, x2, y2 ∈ [0, 1] are normalized by the width and height of the image. We use a category specific classifier to score this vector to obtain a new score for the detection. The classifier is trained to distinguish correct detections from false positives by integrating contextual information defined by g.”; 8 EMPIRICAL RESULTS of Felzenszwalb “A predicted bounding box is considered correct if it overlaps more than 50% with a ground-truth bounding box, otherwise the bounding box is considered a false positive detection. Multiple detections are penalized. If a system predicts several bounding boxes that overlap with a single ground-truth bounding box, only one prediction is considered 
Regarding claim 7, Ren, Lin, Mikhailov, and Felzenszwalb teach the method according to claim 5 wherein for each media unit region of interest, the calculating of the higher accuracy shape information comprises virtually overlaying shapes of corresponding media units of interest of at least some of the matching signatures (7.1 Bounding Box Prediction, 7.2 Non-Maximum Suppression, “Using the matching procedure from Section 3.2 we usually get multiple overlapping detections for each instance of an object. We use a greedy procedure for eliminating repeated detections via non-maximum suppression. After applying the bounding box prediction method described above we have a set of detections D for a particular object category in an image. Each detection is defined by a bounding box and a score. We sort the detections in D by score, and greedily select the highest scoring ones while skipping detections with bounding boxes that are at least 50% covered by a bounding box of a previously selected detection”; 8 EMPIRICAL RESULTS of Felzenszwalb “A predicted bounding box is considered correct if it overlaps more than 50% with a ground-truth bounding box, otherwise the bounding box is considered a false positive detection. Multiple detections are penalized. If a system predicts several bounding boxes that overlap with a single ground-truth bounding box, only one prediction is considered correct, the others are considered false positives. One scores a system by the average precision (AP) of its precision-recall curve across a testset.”). In addition, the same motivation is used as the rejection for claim 5.
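For illustration only, the greedy non-maximum suppression quoted from section 7.2 of Felzenszwalb (sort detections by score, select the highest scoring ones, and skip any detection whose bounding box is at least 50% covered by the box of a previously selected detection) can be sketched as follows. This is a minimal sketch under stated assumptions: boxes are (x1, y1, x2, y2) tuples, coverage is computed as intersection area divided by the area of the candidate box, and the function names and sample detections are hypothetical.

```python
def area(box):
    """Area of an (x1, y1, x2, y2) box; empty or inverted boxes count as 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def covered_fraction(box, by):
    """Fraction of `box` that lies inside `by`."""
    inter = area((max(box[0], by[0]), max(box[1], by[1]),
                  min(box[2], by[2]), min(box[3], by[3])))
    a = area(box)
    return inter / a if a > 0 else 0.0


def greedy_nms(detections, threshold=0.5):
    """Keep detections in descending score order, skipping any whose box is
    at least `threshold` covered by an already kept box."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(covered_fraction(box, kept_box) < threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: the second box lies entirely inside the first.
detections = [((0, 0, 10, 10), 0.9),
              ((1, 1, 9, 9), 0.8),
              ((20, 20, 30, 30), 0.7)]
kept_scores = [score for _, score in greedy_nms(detections)]
```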
Regarding claim 12, Ren, Lin and Mikhailov teach the non-transitory computer readable medium according to claim 11 that stores instructions for: The remainder of claim 12 is similar in scope to claim 5 and is therefore rejected under the same rationale.
Regarding claim 13, Ren, Lin, Mikhailov and Felzenszwalb teach the non-transitory computer readable medium according to claim 12 that stores instructions for: The remainder of claim 13 is similar in scope to claim 6 and is therefore rejected under the same rationale.
Regarding claim 14, Ren, Lin, Mikhailov and Felzenszwalb teach the non-transitory computer readable medium according to claim 12. The remainder of claim 14 is similar in scope to claim 7 and is therefore rejected under the same rationale.
Regarding claim 19, Ren, Lin and Mikhailov teach the hybrid representation generator according to claim 18 that is configured to: The remainder of claim 19 is similar in scope to claim 5 and is therefore rejected under the same rationale.
Regarding claim 20, Ren, Lin, Mikhailov and Felzenszwalb teach the hybrid representation generator according to claim 19 that is configured to: The remainder of claim 20 is similar in scope to claim 6 and is therefore rejected under the same rationale.
Regarding claim 21, Ren, Lin, Mikhailov and Felzenszwalb teach the hybrid representation generator according to claim 19 that is configured to: The remainder of claim 21 is similar in scope to claim 7 and is therefore rejected under the same rationale.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SARAH LE whose telephone number is (571)270-7842.  The examiner can normally be reached on Monday: 8AM-4:30PM EST, Tuesday: 8 AM-3:30PM EST, Wednesday: 8AM-2:30PM EST, Thursday and Friday off.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Zimmerman, can be reached on 571-272-7653.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SARAH LE/Primary Examiner, Art Unit 2619