Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
 
Claims 1-4, 7-12, 15-18 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bleiweiss et al. (U.S. Patent No. 9,928,605 B2).
          Regarding claim 1, Bleiweiss discloses an image processing system comprising: an image data interface arranged to receive first image data representative of an image frame (see abstract, various systems and methods for real-time cascaded object recognition are described herein. A system for real-time cascaded object recognition comprises a processor; and a memory, including instructions, which when executed on the processor, cause the processor to perform the operations comprising: accessing image data at the system, the image data of an environment around the system, the image data is captured by a camera system; determining a set of regions in the image data, the set of regions including candidate objects; transmitting a subset of the image data corresponding to the set of regions to a remote server, the remote server to analyze the subset of the image data and detect an object in the subset of the image data; and receiving at 
           Also column 2, lines 35-40 and lines 66-67, the input video stream 102 includes a scene with multiple objects. In an object proposal stage 104, one or more of the objects are identified as candidates and the portions of the image with the candidate objects are segmented. Some of the candidates may be filtered based on search criteria. The input image may be a portion of a video stream (e.g., one frame of a video stream));
           a first, local, object classifier arranged to perform object classification in the image frame (see above, also column 2, lines 39-40, a local classifier (first classifier) may be used to filter the objects);
           and storage arranged to store categorization data comprising a set of object definitions for use during object classification (see column 3, lines 14-31, a local classifier 206 is executed on the local machine (e.g., user device). The local classifier 206 is a simple classifier, which is able to run quickly on the local device. The goal of this classifier is to rule out obvious image regions, so that only relevant regions are sent to the cloud for additional processing. The local classifier 206 may filter the image data based on surface characteristics of the objects in the image (e.g., planar or non-planar); dimensions, volume, or colors of objects; distance objects are from the camera; IMU data; or location information (e.g., global positioning system (GPS) coordinates or Simultaneous Localization and Mapping (SLAM) tags). Distance or depth data may be obtained using a depth camera in the camera system of the local machine. A depth camera may include an infrared (IR) camera that is able to pick up a projected IR light, which may be projected from an IR laser projector. While some depth cameras use a single IR camera, the use of multiple IR cameras provides a stereoscopic IR to produce depth);

           wherein the first object classifier is arranged to: detect an object in the image frame; determine whether to transmit image data for the detected object to a second, remote, object classifier (see column 2, lines 51-67, the process illustrated in FIG. 1 may run continually or periodically so that a user, a robot, a drone, or other camera-enabled device may detect and identify objects as it moves through the room. For example, as the camera moves toward the sofa, additional candidate segments may be obtained and sent to a cloud service to identify. If additional objects, such as a pillow and a remote control are on the sofa, the additional objects may be recognized and their identification may be provided to the local device in the room. In this manner, a robot tasked to retrieve a remote control from a sofa may use the ongoing object recognition process to first find the sofa and then after moving to the sofa, finding the remote control on the sofa. FIG. 2 is a block diagram illustrating control and data flow 200, according to 
           in response to said determining: transmit second image data, derived from the first image data, to the second object classifier (column 2, lines 30-47, the client system may be a user device operated by a user. The user device may be any type of compute device including, but not limited to a mobile phone, a smartphone, a phablet, a tablet, a personal digital assistant, a laptop, a digital camera, a desktop computer, an in-vehicle infotainment system, or the like. The input video stream 102 includes a scene with multiple objects. In an object proposal stage 104, one or more of the objects are identified as candidates and the portions of the image with the candidate objects are segmented. Some of the candidates may be filtered based on search criteria. A local classifier (first classifier) may be used to filter the objects. Candidates that pass the filter are sent individually to the cloud (stage 106). The candidates may be sent to the same cloud service or different ones. For example, one cloud service may be used to classify furniture while another may be used to classify lights and lamps. Based on the local classification, the appropriate cloud service classifier may be selected and the candidate may be sent to the appropriate cloud service classifier (second classifier));
           and receive object data, representative of the detected object, from the second object classifier (see column 2, lines 48-50, full classifiers run in the cloud (stage 108) and the objects that are detected are returned to the client).
          Regarding claim 2, Bleiweiss discloses an image processing system according to claim 1, wherein the first object classifier being arranged to determine whether to transmit image data for the detected object to the second object classifier comprises the first object classifier being arranged to determine that the detected object does not have a predetermined correspondence with an 
           Also column 7, line 14-24, in an embodiment, the system 600 comprises an image compression module 610 to compress the subset of the image data before transmitting the subset 
           Finally, column 8, lines 5-25, in an embodiment, the method 700 also includes compressing the subset of the image data before transmitting the subset of the image data to the remote server. In a further embodiment, compressing the subset of the image data is performed using one of: run-length encoding, area image compression, differential pulse-code modulation (DPCM) and predictive coding, entropy encoding, adaptive dictionary algorithms, deflation, chain codes, reducing color space, chroma subsampling, transform coding, or fractal compression. In an embodiment, the method 700 includes displaying the indication of the object on a display of the compute device. For example, the object's identification may be displayed to a user to confirm the correctness of the identification. Either the local or remote classifier may be trained using user feedback. Thus, in various embodiments, a feedback mechanism is implemented in case of mis-detected objects or false positives. For example, the user may indicate a false detection, upload the relevant images, and re-train the classifier on the cloud). 
          Regarding claim 3, Bleiweiss discloses an image processing system according to claim 1, wherein the first object classifier being arranged to determine whether to transmit image data for the detected object to the second object classifier comprises the first object classifier being arranged to: select a subset of the set of object definitions and determine whether the detected object has a predetermined correspondence with an object definition in the subset of object definitions (see claim 2, also column 4, lines 5-20, the compressed image data is transmitted to one or more cloud 
          Regarding claim 4, Bleiweiss discloses an image processing system according to claim 3, wherein the first object classifier is arranged to select the subset of object definitions based on a most recently used scheme (column 2, lines 12-26, the systems and methods described herein support a large set of recognized objects by using cloud resources. This mechanism yields a just-in-time usage module of the classifier, embracing the cloud for hierarchical layers of the classifier depending on the robot or the device's location. This approach enables an infinite number of recognized object, which would be impossible using a purely local approach given the limitations of local hardware. Several mechanisms are implemented to efficiently use cloud resources. While local devices may have less power and resources, a local classifier may be used to pre-process an image and identify regions of the image, which may then be segmented, compressed, and transmitted to cloud resources for further processing. Additional operations are discussed below).

          Regarding claim 7, Bleiweiss discloses an image processing system according to claim 1, wherein the first object classifier is arranged to: determine a location of the detected object in the image frame; define a portion of the image frame on the basis of the determined location; and derive the second image data, for transmitting to the second object classifier, based on the portion of the image frame (see claim 1, also column 3, lines 50-55, in another aspect, using IMU data, the local classifier 206 may confirm that objects are consistently located in the image based on gravity and accelerometer input. Such data may be used by the local classifier 206 to filter on 
          Regarding claim 8, Bleiweiss discloses an image processing system according to claim 1, wherein the first image data comprises first feature data representative of at least one feature in the image frame (see column 4, lines 48-62, in an embodiment, to increase efficiency, the local device compresses the image data before sending it to the cloud. In one aspect, rather than stream the video data all the time, interesting object proposals are segmented using depth data and only that segmented data is streamed to the cloud. This enables real-time streaming of large environments, focusing on relevant data only. Using depth data provides an additional efficiency because a standard RGB camera would not work as well due to the complexity of differentiating between object colors. Instead of depth data, edge detection or other “feature detection” algorithms may 
          Regarding claim 9, Bleiweiss discloses an image processing system according to claim 1, wherein the second image data comprises second feature data derived from the first image data using the object classifier (see column 4, lines 5-20, the compressed image data is transmitted to one or more cloud services where a full classifier is used for additional processing. At stage 210, the full classifier is applied at a cloud service. The classifier may be a convolutional neural network (CNN or ConvNet), which is a type of feed-forward artificial neural network where the individual neurons are tiled in such a way that they respond to overlapping regions in the visual field. The full classifier may detect a specific object in the selected pixels (image region), detect a specific brand or manufacturer, detect a model number, or the like. The full classifier may work as the camera/robot moves closer to the object of interest. For example, additional features may become visible on an object and sent to the cloud, such as a logo on the object, which may be classified. The additional features (e.g., model, make, brand, etc.) may be returned to the user device. Also column 4, lines 48-62, in an embodiment, to increase efficiency, the local device compresses the image data before sending it to the cloud. In one aspect, rather than stream the video data all the time, interesting object proposals are segmented using depth data and only that segmented data is streamed to the cloud. This enables real-time streaming of large environments, focusing on relevant data only. Using depth data provides an additional efficiency because a standard RGB camera would not work as well due to the complexity of differentiating between object colors. Instead of depth data, edge detection or other “feature detection” algorithms may be used for segmentation. However, using depth data for segmentation is advantageous because other segmentation algorithms are more processor intensive, less robust to lighting changes, etc.).

           Also column 3, lines 14-31, a local classifier 206 is executed on the local machine (e.g., user device). The local classifier 206 is a simple classifier, which is able to run quickly on the local device. The goal of this classifier is to rule out obvious image regions, so that only relevant regions are sent to the cloud for additional processing. The local classifier 206 may filter the image data based on surface characteristics of the objects in the image (e.g., planar or non-planar); dimensions, volume, or colors of objects; distance objects are from the camera; IMU data; or location information (e.g., global positioning system (GPS) coordinates or Simultaneous Localization and Mapping (SLAM) tags). Distance or depth data may be obtained using a depth camera in the camera system of the local machine. A depth camera may include an infrared (IR) camera that is able to pick up a projected IR light, which may be projected from an IR laser projector. While some depth cameras use a single IR camera, the use of multiple IR cameras provides a stereoscopic IR to produce depth).
          Regarding claim 11, Bleiweiss discloses an image processing system according to claim 1, wherein the first object classifier is arranged to further receive second categorization data, comprising an object definition corresponding to the detected object, from the second object classifier (see column 3, lines 14-31, a local classifier 206 is executed on the local machine (e.g., user device). The local classifier 206 is a simple classifier, which is able to run quickly on the local device. The goal of this classifier is to rule out obvious image regions, so that only relevant regions are sent to the cloud for additional processing. The local classifier 206 may filter the image data based on surface characteristics of the objects in the image (e.g., planar or non-planar); dimensions, volume, or colors of objects; distance objects are from the camera; IMU data; or location information (e.g., global positioning system (GPS) coordinates or Simultaneous Localization and Mapping (SLAM) tags). Distance or depth data may be obtained using a depth camera in the camera system of the local machine. A depth camera may include an infrared (IR) camera that is able to pick up a projected IR light, which may be projected from an IR laser projector. While some depth cameras use a single IR camera, the use of multiple IR cameras provides a stereoscopic IR to produce depth).
          Regarding claim 12, Bleiweiss discloses an image processing system according to claim 11, wherein the first object classifier is arranged to update the first categorization data by at least one of: (a) including the received object definition in the set of object definitions; or (b) replacing an existing object definition in the set of object definitions with the received object definition (see claim 1, also column 5, lines 6-14, in FIG. 4 another object 400 is segmented to identify three regions 402A, 402B, 402C, and certain pixels 404A, 404B, 404C are identified in each of the regions 402. The pixels 404 are transmitted to the cloud for classification. In both FIGS. 3 and 4, the removal of regions of the complete image, which do not include proposed objects, effectively reduces the amount of data to send. This reduction acts as a form of image compression and may remove large portions of the original image's pixels).
           With regard to claims 15-18 and 20, arguments analogous to those presented above for claims 1, 2, 3, 4, 7, 8, 9, 10, 11 and 12 are respectively applicable to claims 15-18 and 20.

Allowable Subject Matter
Claims 5, 6, 13, 14 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Seyed Azarian whose telephone number is (571) 272-7443. The examiner can normally be reached on Monday through Thursday from 6:00 a.m. to 7:30 p.m. 
           If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Matthew Bella, can be reached at (571) 272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
           Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/SEYED H AZARIAN/
Primary Examiner, Art Unit 2667
April 17, 2021