DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 4, 9, 10, 13, 18 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cobb et al. (US 2011/0052000).
Regarding claim 1, Cobb et al. discloses a method comprising: 
receiving, at a video analysis service, video data captured by one or more cameras at a particular location (“Network 110 receives video data (e.g., video stream(s), video images, or the like) from the video input source 105. The video input source 105 may be a video camera, a VCR, DVR, DVD, computer, web-cam device, or the like. For example, the video input source 105 may be a stationary video camera aimed at a certain area (e.g., a subway station, a parking lot, a building entry/exit, etc.), which records the events taking place therein” at paragraph 0031, line 1); 
applying, by the service, a neural network-based model to portions of the video data, to identify objects within the video data (“In one embodiment, the machine learning engine may include a mapper component configured to parse data coming from the context event stream and the primitive event stream and to supply portions of these streams as input to multiple neural networks (e.g., Adaptive Resonance Theory networks). As is known, Adaptive Resonance Theory (ART) describes a class of neural network models which use supervised and unsupervised learning methods. Each individual ART network generates clusters from the set of inputs data specified for that ART network. Each cluster represents an observed statistical distribution of a particular thing or event being modeled by that ART network. Further, the mapper component may be configured to detect unusual events occurring in the scene depicted by the video frames” at paragraph 0022, line 1); 
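mapping, by the service, outputs of the neural network-based model to symbols using a conceptual space, wherein the outputs of the model comprise the identified objects (“In one embodiment, the machine learning engine may also include an analyzer. The semantic labeler may send a symbol trajectory to the analyzer. The symbol trajectory may be derived from observing a foreground object moving through the scene. The symbol trajectory represents semantic concepts extracted from the trajectory. Further, the analyzer may determine whether the symbol trajectory is anomalous (relative to prior observation)” at paragraph 0026, line 1); 
applying, by the service, a symbolic reasoning engine to the symbols, to generate an alert (“For example, the analyzer may compute a likelihood of observing the symbol trajectory (based on symbol trajectories previously observed in the scene). Further still, anomalous behavior of a foreground object (i.e., behavior that produces a symbol trajectory determined to be anomalous) may result in an alert passed to users of the behavioral recognition system” at paragraph 0026, line 8); and 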
sending, by the service, the alert to a user interface in conjunction with the video data (“Additionally, data describing whether a normal/abnormal behavior/event has been determined and/or what such behavior/event is may be provided to output devices 118 to issue alerts, for example, an alert message presented on a GUI screen” at paragraph 0033, line 8).
Regarding claim 4, Cobb et al. discloses a method wherein applying the neural network-based model to portions of the video data, to identify objects within the video data, comprises: 
tracking movement of an object over time across frames from the video data (“Further, the computer vision engine may identify features (e.g., height/width in pixels, average color values, shape, area, and the like) used to track the object from frame-to-frame” at paragraph 0020, line 10).
Regarding claim 9, Cobb et al. discloses a method wherein the identified objects comprise a vehicle (“In such a case, the computer vision engine could initially recognize the car as a foreground object; classify it as being a vehicle” at paragraph 0021, line 8) and a pedestrian (“Once identified, the object may be evaluated by a classifier configured to determine what is depicted by the foreground object (e.g., a vehicle or a person)” at paragraph 0020, line 5).
Regarding claim 10, Cobb et al. discloses an apparatus, comprising: 
one or more network interfaces to communicate with a network (“The network 110 may transmit video data recorded by the video input 105 to the computer system 115” at paragraph 0030, line 7); 
a processor coupled to the network interfaces and configured to execute one or more processes (“Illustratively, the computer system 115 includes a CPU 120” at paragraph 0030, line 9); and 
a memory configured to store a process executable by the processor (“memory 130 containing both a computer vision engine 135 and a machine learning engine 140” at paragraph 0030, line 11), the process when executed configured to: 
receive video data captured by one or more cameras at a particular location (“Network 110 receives video data (e.g., video stream(s), video images, or the like) from the video input source 105. The video input source 105 may be a video camera, a VCR, DVR, DVD, computer, web-cam device, or the like. For example, the video input source 105 may be a stationary video camera aimed at a certain area (e.g., a subway station, a parking lot, a building entry/exit, etc.), which records the events taking place therein” at paragraph 0031, line 1); 
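apply a neural network-based model to portions of the video data, to identify objects within the video data (“In one embodiment, the machine learning engine may include a mapper component configured to parse data coming from the context event stream and the primitive event stream and to supply portions of these streams as input to multiple neural networks (e.g., Adaptive Resonance Theory networks). As is known, Adaptive Resonance Theory (ART) describes a class of neural network models which use supervised and unsupervised learning methods. Each individual ART network generates clusters from the set of inputs data specified for that ART network. Each cluster represents an observed statistical distribution of a particular thing or event being modeled by that ART network. Further, the mapper component may be configured to detect unusual events occurring in the scene depicted by the video frames” at paragraph 0022, line 1); 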
map outputs of the neural network-based model to symbols using a conceptual space, wherein the outputs of the model comprise the identified objects (“In one embodiment, the machine learning engine may also include an analyzer. The semantic labeler may send a symbol trajectory to the analyzer. The symbol trajectory may be derived from observing a foreground object moving through the scene. The symbol trajectory represents semantic concepts extracted from the trajectory. Further, the analyzer may determine whether the symbol trajectory is anomalous (relative to prior observation)” at paragraph 0026, line 1); 
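apply a symbolic reasoning engine to the symbols, to generate an alert (“For example, the analyzer may compute a likelihood of observing the symbol trajectory (based on symbol trajectories previously observed in the scene). Further still, anomalous behavior of a foreground object (i.e., behavior that produces a symbol trajectory determined to be anomalous) may result in an alert passed to users of the behavioral recognition system” at paragraph 0026, line 8); and 
send the alert to a user interface in conjunction with the video data (“Additionally, data describing whether a normal/abnormal behavior/event has been determined and/or what such behavior/event is may be provided to output devices 118 to issue alerts, for example, an alert message presented on a GUI screen” at paragraph 0033, line 8).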
Regarding claim 13, Cobb et al. discloses an apparatus wherein the apparatus applies the neural network-based model to portions of the video data, to identify objects within the video data, by: 
tracking movement of an object over time across frames from the video data (“Further, the computer vision engine may identify features (e.g., height/width in pixels, average color values, shape, area, and the like) used to track the object from frame-to-frame” at paragraph 0020, line 10).
Regarding claim 18, Cobb et al. discloses an apparatus wherein the identified objects comprise a vehicle (“In such a case, the computer vision engine could initially recognize the car as a foreground object; classify it as being a vehicle” at paragraph 0021, line 8) and a pedestrian (“Once identified, the object may be evaluated by a classifier configured to determine what is depicted by the foreground object (e.g., a vehicle or a person)” at paragraph 0020, line 5).
Regarding claim 19, Cobb et al. discloses a tangible, non-transitory, computer-readable medium storing program instructions (“One embodiment of the invention is implemented as a program product for use with a computer system. The program(s) of the program product defines functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Examples of computer-readable storage media include (i) non-writable storage 
receiving, at a video analysis service, video data captured by one or more cameras at a particular location (“Network 110 receives video data (e.g., video stream(s), video images, or the like) from the video input source 105. The video input source 105 may be a video camera, a VCR, DVR, DVD, computer, web-cam device, or the like. For example, the video input source 105 may be a stationary video camera aimed at a certain area (e.g., a subway station, a parking lot, a building entry/exit, etc.), which records the events taking place therein” at paragraph 0031, line 1); 
applying, by the service, a neural network-based model to portions of the video data, to identify objects within the video data (“In one embodiment, the machine learning engine may include a mapper component configured to parse data coming from the context event stream and the primitive event stream and to supply portions of these streams as input to multiple neural networks (e.g., Adaptive Resonance Theory networks). As is known, Adaptive Resonance Theory (ART) describes a class of neural network models which use supervised and unsupervised learning methods. Each individual ART network generates clusters from the set of inputs data specified for that ART network. Each cluster represents an observed statistical distribution of a particular thing or event being modeled by that ART network. Further, the mapper component may be configured to detect unusual events occurring in the scene depicted by the video frames” at paragraph 0022, line 1); 
mapping, by the service, outputs of the neural network-based model to symbols using a conceptual space, wherein the outputs of the model comprise the identified objects (“In one embodiment, the machine learning engine may also include an analyzer. The semantic labeler may send a symbol trajectory to the analyzer. The symbol trajectory may be derived from observing a foreground object moving through the scene. The symbol trajectory represents semantic concepts extracted from the trajectory. Further, the analyzer may determine whether the symbol trajectory is anomalous (relative to prior observation)” at paragraph 0026, line 1); 
applying, by the service, a symbolic reasoning engine to the symbols, to generate an alert (“For example, the analyzer may compute a likelihood of observing the symbol trajectory (based on symbol trajectories previously observed in the scene). Further still, anomalous behavior of a foreground object (i.e., behavior that produces a symbol trajectory determined to be anomalous) may result in an alert passed to users of the behavioral recognition system” at paragraph 0026, line 8); and 
sending, by the service, the alert to a user interface in conjunction with the video data (“Additionally, data describing whether a normal/abnormal behavior/event has been determined and/or what such behavior/event is may be provided to output devices 118 to issue alerts, for example, an alert message presented on a GUI screen” at paragraph 0033, line 8).
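For illustration only, the likelihood computation quoted from Cobb et al. at paragraph 0026 can be sketched as follows. This is a minimal sketch and not the method of the reference: the class name TrajectoryAnalyzer, the symbol names, the transition-count model, the add-one smoothing, and the threshold value are all hypothetical assumptions introduced here.

```python
import math
from collections import Counter


class TrajectoryAnalyzer:
    """Hypothetical analyzer that scores a symbol trajectory against
    previously observed trajectories (an assumed first-order model)."""

    def __init__(self, alphabet, threshold=-1.0):
        self.alphabet = list(alphabet)
        # Assumed cutoff on the mean log-likelihood per transition.
        self.threshold = threshold
        self.pair_counts = Counter()   # counts of (symbol, next symbol)
        self.first_counts = Counter()  # counts of symbol as a transition source

    def observe(self, trajectory):
        """Record the transitions of a previously observed trajectory."""
        for a, b in zip(trajectory, trajectory[1:]):
            self.pair_counts[(a, b)] += 1
            self.first_counts[a] += 1

    def mean_log_likelihood(self, trajectory):
        """Mean log-probability per transition, with add-one smoothing."""
        pairs = list(zip(trajectory, trajectory[1:]))
        if not pairs:
            return 0.0
        k = len(self.alphabet)
        total = 0.0
        for a, b in pairs:
            p = (self.pair_counts[(a, b)] + 1) / (self.first_counts[a] + k)
            total += math.log(p)
        return total / len(pairs)

    def is_anomalous(self, trajectory):
        """True when the trajectory is unlikely relative to prior observation."""
        return self.mean_log_likelihood(trajectory) < self.threshold


analyzer = TrajectoryAnalyzer(["enter", "walk", "stop", "exit"])
for _ in range(50):
    analyzer.observe(["enter", "walk", "exit"])

if analyzer.is_anomalous(["enter", "stop", "enter"]):
    print("ALERT: anomalous symbol trajectory")
```

In this sketch a trajectory that matches prior observation scores close to zero, while one built from transitions not previously seen falls below the assumed threshold and would trigger the alert.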


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 7, 8, 11, 16, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Cobb et al. and Chaubard (US 2020/0005225).
Regarding claim 2, Cobb et al. discloses a method as described in claim 1 above.
Cobb et al. does not explicitly disclose that sending the alert to the user interface in conjunction with the video data comprises: providing the alert as an overlay for one or more frames of the video data.
Chaubard teaches a method in the same field of endeavor of scene surveillance, wherein sending the alert to the user interface in conjunction with the video data comprises: 
providing the alert as an overlay for one or more frames of the video data (“In another embodiment, the out-of-stock detection system presents the image with the generated bounding box 120 for each product label 115 for display to a store associate who manually identifies the product 110 associated with each bounding box 120 and inputs identifying information for the product 110, such as the SKU number, into an input field for each bounding box 120 provided in a user interface by the out-of-stock detection system” at paragraph 0021).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the bounding box display as taught by Chaubard for the alert of Cobb et al. to clearly identify the area of the frame where the anomaly is detected.
Regarding claim 7, Cobb et al. discloses a method as described in claim 1 above.
Cobb et al. does not explicitly disclose that the identified objects comprise a shelf and one or more items on the shelf.
Chaubard teaches a method in the same field of endeavor of scene surveillance, wherein the identified objects comprise a shelf and one or more items on the shelf (“Once the bounding boxes 120 are generated for each product label 115, the product 110 associated with each bounding box 120 is identified. As used herein, a product 110 associated with a bounding box 120 is an item of the same type and of the same brand. For example, a shelf unit may display multiple different brands and kinds of soap. Thus, as used herein, a product 110 refers to the same kind of soap being offered under the same brand name and each of the different brands and kinds of soap would have their own bounding box 120. Moreover, a product label 115 identifies a product display location for displaying the same type of product of the same brand, which is bounded by a bounding box 120” at paragraph 0017).

Regarding claim 8, Chaubard discloses a method wherein the alert is indicative of an item availability on the shelf (“In another embodiment, the out-of-stock detection system presents the image with the generated bounding box 120 for each product label 115 for display to a store associate who manually identifies the product 110 associated with each bounding box 120 and inputs identifying information for the product 110, such as the SKU number, into an input field for each bounding box 120 provided in a user interface by the out-of-stock detection system” at paragraph 0021).
Regarding claim 11, Cobb et al. discloses an apparatus as described in claim 10 above.
Cobb et al. does not explicitly disclose that sending the alert to the user interface in conjunction with the video data comprises: providing the alert as an overlay for one or more frames of the video data.
Chaubard teaches an apparatus in the same field of endeavor of scene surveillance, wherein the apparatus sends the alert to the user interface in conjunction with the video data by: 
providing the alert as an overlay for one or more frames of the video data (“In another embodiment, the out-of-stock detection system presents the image with the generated bounding box 120 for each product label 115 for display to a store associate who manually identifies the product 110 associated with each bounding box 120 and inputs identifying information for the product 110, such as the SKU number, into an input field for each bounding box 120 provided in a user interface by the out-of-stock detection system” at paragraph 0021).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the bounding box display as taught by Chaubard for the alert of Cobb et al. to clearly identify the area of the frame where the anomaly is detected.
Regarding claim 16, Cobb et al. discloses an apparatus as described in claim 10 above.
Cobb et al. does not explicitly disclose that the identified objects comprise a shelf and one or more items on the shelf.
Chaubard teaches an apparatus in the same field of endeavor of scene surveillance, wherein the identified objects comprise a shelf and one or more items on the shelf (“Once the bounding boxes 120 are generated for each product label 115, the product 110 associated with each bounding box 120 is identified. As used herein, a product 110 associated with a bounding box 120 is an item of the same type and of the same brand. For example, a shelf unit may display multiple different brands and kinds of soap. Thus, as used herein, a product 110 refers to the same kind of soap being offered under the same brand name and each of the different brands and kinds of soap would have their own bounding box 120. Moreover, a product label 115 identifies a product display location for displaying the same type of product of the same brand, which is bounded by a bounding box 120” at paragraph 0017).
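Regarding claim 17, Chaubard discloses an apparatus wherein the alert is indicative of an item availability on the shelf (“In another embodiment, the out-of-stock detection system presents the image with the generated bounding box 120 for each product label 115 for display to a store associate who manually identifies the product 110 associated with each bounding box 120 and inputs identifying information for the product 110, such as the SKU number, into an input field for each bounding box 120 provided in a user interface by the out-of-stock detection system” at paragraph 0021).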
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the system of Cobb et al. in a store setting as taught by Chaubard to be able to identify anomalous events in a retail environment.
Regarding claim 20, Cobb et al. discloses a computer-readable medium as described in claim 19 above.
Cobb et al. does not explicitly disclose that sending the alert to the user interface in conjunction with the video data comprises: providing the alert as an overlay for one or more frames of the video data.
Chaubard teaches a method in the same field of endeavor of scene surveillance, wherein sending the alert to the user interface in conjunction with the video data comprises: 
providing the alert as an overlay for one or more frames of the video data (“In another embodiment, the out-of-stock detection system presents the image with the generated bounding box 120 for each product label 115 for display to a store associate who manually identifies the product 110 associated with each bounding box 120 and inputs identifying information for the product 110, such as the SKU number, into an input field for each bounding box 120 provided in a user interface by the out-of-stock detection system” at paragraph 0021).
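It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the bounding box display as taught by Chaubard for the alert of Cobb et al. to clearly identify the area of the frame where the anomaly is detected.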

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Cobb et al. and Ramani et al. (US 2004/0249809).
Regarding claim 3, Cobb et al. discloses a method as described in claim 1 above.
Cobb et al. does not explicitly disclose that applying the neural network-based model to portions of the video data, to identify objects within the video data, comprises: dividing a frame from the video data into segmented regions by applying a segmented Bezier curve approximation to the frame.
Ramani et al. teaches a method in the same field of endeavor of machine learning-based object recognition, wherein applying the neural network-based model to portions of the video data, to identify objects within the video data, comprises: 
dividing a frame from the video data into segmented regions by applying a segmented Bezier curve approximation to the frame (“Bezier curves are used to obtain a parametric equation of the prismatic skeletal entities. This provides an affine invariant measure of the curve shape. An algorithm based on the curvature for curve similarity comparison is used and similar curves have similar curvature profiles. To compare two curves the curves are divided into a defined number of entities and the curvature profile for the curve is the sequence of these curvatures from one end of the curve to the other. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the Bezier curve approximation as taught by Ramani et al. on the video data of Cobb et al. to be able to characterize the shapes of the objects for subsequent processing.
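For illustration only, the curvature-profile comparison described in the quoted passage of Ramani et al. can be sketched as follows. This is a minimal sketch under stated assumptions and not the algorithm of the reference: the function names, the use of a single cubic Bezier curve, the midpoint sampling, and the mean absolute difference measure are all hypothetical choices made here.

```python
def _derivatives(ctrl, t):
    """First and second derivatives of a cubic Bezier curve at parameter t."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    u = 1.0 - t
    dx = 3 * u * u * (x1 - x0) + 6 * u * t * (x2 - x1) + 3 * t * t * (x3 - x2)
    dy = 3 * u * u * (y1 - y0) + 6 * u * t * (y2 - y1) + 3 * t * t * (y3 - y2)
    ddx = 6 * u * (x2 - 2 * x1 + x0) + 6 * t * (x3 - 2 * x2 + x1)
    ddy = 6 * u * (y2 - 2 * y1 + y0) + 6 * t * (y3 - 2 * y2 + y1)
    return dx, dy, ddx, ddy


def curvature_profile(ctrl, segments=8):
    """Signed curvature sampled at the midpoint of each of `segments`
    equal parameter intervals, ordered from one end of the curve to the other."""
    profile = []
    for i in range(segments):
        t = (i + 0.5) / segments
        dx, dy, ddx, ddy = _derivatives(ctrl, t)
        speed_sq = dx * dx + dy * dy
        if speed_sq == 0.0:
            profile.append(0.0)  # degenerate point: treat as flat (assumption)
        else:
            profile.append((dx * ddy - dy * ddx) / speed_sq ** 1.5)
    return profile


def profile_distance(a, b):
    """Mean absolute difference between two equal-length curvature profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of entries")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


line = [(0, 0), (1, 0), (2, 0), (3, 0)]  # straight segment
arch = [(0, 0), (1, 2), (2, 2), (3, 0)]  # symmetric arch
print(profile_distance(curvature_profile(line), curvature_profile(arch)))
```

Under these assumptions, curves with similar shapes produce similar profiles and a small distance, while the straight segment and the arch produce a clearly nonzero distance.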
Regarding claim 12, Cobb et al. discloses an apparatus as described in claim 10 above.
Cobb et al. does not explicitly disclose that applying the neural network-based model to portions of the video data, to identify objects within the video data, comprises: dividing a frame from the video data into segmented regions by applying a segmented Bezier curve approximation to the frame.
Ramani et al. teaches an apparatus in the same field of endeavor of machine learning-based object recognition, wherein the apparatus applies the neural network-based model to portions of the video data, to identify objects within the video data, by: 
dividing a frame from the video data into segmented regions by applying a segmented Bezier curve approximation to the frame (“Bezier curves are used to obtain a parametric equation of the prismatic skeletal entities. This provides an affine invariant measure of the curve shape. An algorithm based on the curvature for curve similarity comparison is used and similar curves have similar curvature profiles. To compare two curves the curves are divided into a defined number of entities and the curvature profile for the curve is the sequence of these curvatures from one end of the curve to the other. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the Bezier curve approximation as taught by Ramani et al. on the video data of Cobb et al. to be able to characterize the shapes of the objects for subsequent processing.

Claims 5, 6, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Cobb et al. and Modayil (“Robot Developmental Learning of an Object Ontology Grounded in Sensorimotor Experience”).
Regarding claim 5, Cobb et al. discloses a method as described in claim 1 above.
Cobb et al. does not explicitly disclose that mapping outputs of the neural network-based model to symbols using a conceptual space comprises: applying a seed ontology to the outputs of the neural network-based model.
Modayil teaches a method in the same field of endeavor of machine learning object recognition, wherein mapping outputs of the neural network-based model to symbols using a conceptual space comprises: applying a seed ontology to the outputs of the neural network-based model (“The robot’s task is to create an object ontology that enables object perception, reasoning, and interaction. Figure 3.1(d) shows the robot’s description of the environment after the object ontology has been learned. The robot can then represent an object in the environment, recognize the object, and estimate an 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the object recognition system as taught by Cobb et al. in a robotic application as taught by Modayil as the “learned ontology creates trackers for individual objects, forms percepts from observations, forms classes to generalize from past experience, and learns actions to change the perceived properties of an object” (Modayil at section 6.10, paragraph 3, line 2).
Regarding claim 6, Modayil discloses a method further comprising: 
using a sensori-motor control system to expand the ontology for a particular object (“This chapter describes an algorithm for a robot to autonomously learn new high-level actions to change object properties. This work extends the learned ontology of objects by learning actions that are grounded in the robot’s sensorimotor experience” at section 6.1, paragraph 2, line 1).
Regarding claim 14, Cobb et al. discloses an apparatus as described in claim 10 above.
Cobb et al. does not explicitly disclose that mapping outputs of the neural network-based model to symbols using a conceptual space comprises: applying a seed ontology to the outputs of the neural network-based model.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the object recognition system as taught by Cobb et al. in a robotic application as taught by Modayil as the “learned ontology creates trackers for individual objects, forms percepts from observations, forms classes to generalize from past experience, and learns actions to change the perceived properties of an object” (Modayil at section 6.10, paragraph 3, line 2).
Regarding claim 15, Modayil discloses an apparatus wherein the process when executed is further configured to: use a sensori-motor control system to expand the ontology for a particular object (“This chapter describes an algorithm for a robot to autonomously learn new high-level actions to change object properties. This work extends the learned ontology of objects by learning actions that are grounded in the robot’s sensorimotor experience” at section 6.1, paragraph 2, line 1).


Conclusion


Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATRINA R FUJITA, whose telephone number is (571) 270-1574. The examiner can normally be reached Monday through Friday, 9:30 am to 5:30 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-






/KATRINA R FUJITA/Primary Examiner, Art Unit 2662