Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1 – 2, 8 – 9, 11, 13, 15 – 17 and 19 – 24 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang et al. (“Improving pedestrian detection using motion-guided filtering”, IDS), hereinafter referred to as Wang.

Regarding claim 1, Wang discloses a method (Fig. 1), comprising: 
obtaining a first image of a scene at a first time (Fig. 1: current frame It); 
obtaining a second image of at least a portion of the scene (Fig. 1: background image Bt), the second image including image information captured at a second time that is different from the first time (Sect. 3.1, lines 14-15: “[ ... ] The initial background B0 is obtained following a temporal median filter on the first 200 frames of the video”); 
obtaining a difference (Fig. 1: temporal gradient Δt) between one or more pixels of the first image and one or more corresponding pixels of the second image (cf. equation (1) in Sect. 3.1); 
providing the obtained difference as an input to a machine-learning model (as illustrated in Fig. 1, cf. lines 1-5 of Sect. 3, the temporal gradient Δt is subjected to a non-linear filter operation (ii) for subsequently being input to a pedestrian detection step (iii); in one embodiment, said pedestrian detection step (iii) is performed by a deep learning based pedestrian detector called DeepPed, cf. lines 1-9 of Sect. 4); and 
obtaining, as an output from the machine-learning model responsive to providing the input, an identification of an object or an action depicted in at least one of the first image or the second image (Fig. 1: each bounding box in the final output Ot is an identification of a respective pedestrian in the image It).
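For illustration only, the differencing step mapped above can be sketched as follows. This is a minimal sketch and not Wang's implementation: the function names `median_background` and `temporal_gradient` are hypothetical, the background is assumed to be a temporal median over an initial stack of frames (as quoted from Sect. 3.1), and the per-pixel difference is assumed to be the Euclidean norm of the RGB difference.

```python
import numpy as np

def median_background(frames):
    """Temporal median over a stack of frames shaped (N, H, W, 3)."""
    return np.median(frames, axis=0)

def temporal_gradient(frame, background):
    """Per-pixel Euclidean norm of the RGB difference, shaped (H, W)."""
    diff = frame.astype(np.float64) - background.astype(np.float64)
    return np.linalg.norm(diff, axis=-1)

# Synthetic data (assumption): a static scene repeated five times.
rng = np.random.default_rng(0)
static = rng.integers(0, 256, size=(4, 4, 3)).astype(np.float64)
frames = np.stack([static] * 5)
bg = median_background(frames)

# The current frame differs from the background at one pixel only.
current = static.copy()
current[1, 2] += np.array([3.0, 4.0, 0.0])
grad = temporal_gradient(current, bg)
```

In this sketch `grad` is non-zero only at the changed pixel, which is the kind of signal that would subsequently be passed on towards a detector.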

Regarding claim 2 (depends on claim 1), Wang discloses the method further comprising providing at least one of the first image (the original frame It in Fig. 1) or the second image as an additional input to the machine-learning model (for generating the output Ot based on the original frame It).

Regarding claim 8 (depends on claim 1), Wang discloses the method wherein obtaining the difference between the one or more pixels of the first image and the one or more corresponding pixels of the second image comprises obtaining a raw difference or an absolute value of the raw difference between the one or more pixels of the first image and the one or more corresponding pixels of the second image (cf. equation (1) in Sect. 3.1).

Regarding claim 9 (depends on claim 8), Wang discloses the method further comprising applying a filter to the obtained difference prior to providing the obtained difference as the input to the machine-learning model (see the nonlinear filter operation (ii) in Fig. 1).

Regarding claim 11 (depends on claim 1), Wang discloses the method wherein obtaining the second image comprises generating the second image (i.e. the background image Bt) by combining a plurality of additional images each captured at a corresponding time prior to the first time (as expressed in equation (2) of Sect. 3.1, the frames I1, ... , It-1, which precede the current frame It are combined to generate the background image Bt).
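For illustration only, a recursive combination of earlier frames into a background image can be sketched as follows. The exact form of equation (2) of Wang is not reproduced in this Office action, so the exponential running average used here, the parameter `alpha`, and the function names are assumptions made purely for the sketch.

```python
import numpy as np

def update_background(prev_background, prev_frame, alpha=0.05):
    """One recursive step: blend the previous background with the previous frame."""
    return (1.0 - alpha) * prev_background + alpha * prev_frame

def background_from_history(initial_background, frames, alpha=0.05):
    """Fold a sequence of earlier frames into a single background image."""
    background = initial_background.astype(np.float64)
    for frame in frames:
        background = update_background(background, frame.astype(np.float64), alpha)
    return background

# Assumed toy data: a black initial background and three identical earlier frames.
b0 = np.zeros((2, 2))
history = [np.full((2, 2), 100.0)] * 3
bt = background_from_history(b0, history, alpha=0.5)
```

Each earlier frame contributes to the resulting background `bt`, so the background is a combination of images captured before the current frame.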

Regarding claim 13 (depends on claim 1), Wang discloses the method wherein obtaining the first image comprises capturing the first image using a stationary camera, wherein obtaining the second image comprises capturing the second image with the stationary camera, and wherein each of the pixels of the first image and each of the corresponding pixels of the second image are captured using the same physical pixel of the stationary camera (cf. Sect. 1, first bullet point: "[ ... ] The filter [ ... ] works on a large variety of surveillance videos"; surveillance cameras are typically stationary CCTV cameras), wherein the corresponding pixels are the same physical pixels (cf. equation (1) in Sect. 3.1).

Regarding claim 15 (depends on claim 1), Wang discloses the method further comprising training the machine-learning model by: providing a training difference image as a training input to the machine-learning model, the training difference image generated from a subtraction of a first training image captured at a first time and a second training image captured at a second time, at least one of the first training image or the second training image including an image of a known training object; generating, as a training output of the machine-learning model using a set of weights of the machine-learning model and responsive to providing the training difference image, a training output; comparing the training output from the machine-learning model with a label corresponding to the known training object; and adjusting one or more weights of the machine-learning model based on the comparing (cf. Sect. 4: "We tested [ ... ] DeepPed by Tome et al. [25], a deep learning based pedestrian detector"; the DeepPed detector is a convolutional neural network, which is trained by optimising the weights of the network).
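For illustration only, the recited training steps (provide a difference image, generate an output from the weights, compare with a label, adjust the weights) can be sketched as follows. A single-layer logistic classifier is swapped in here for brevity; it is not the DeepPed network, and the function name `train_step` and the learning rate `lr` are hypothetical.

```python
import numpy as np

def train_step(weights, diff_image, label, lr=0.1):
    """One gradient step of a logistic classifier on a flattened difference image."""
    x = diff_image.ravel()
    prediction = 1.0 / (1.0 + np.exp(-(weights @ x)))  # training output
    error = prediction - label                          # comparison with the label
    new_weights = weights - lr * error * x              # weight adjustment
    return new_weights, prediction

# Assumed toy data: the first training image contains an "object" at one pixel.
first = np.zeros((2, 2))
second = np.zeros((2, 2))
first[0, 0] = 1.0
diff = first - second

w0 = np.zeros(4)
w1, p0 = train_step(w0, diff, label=1.0)
_, p1 = train_step(w1, diff, label=1.0)
```

After one adjustment the output for the same difference image moves towards the label, which is the behaviour the comparing and adjusting steps are meant to produce.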

Regarding claim 16 (depends on claim 1), Wang discloses the method wherein the identification of the object or the action comprises a classification of the object or the action.

Regarding claim 17, Wang discloses a method (Fig. 1), comprising: 
obtaining a first image of a scene captured at a first time (Fig. 1: current frame It); 
obtaining a second image (Fig. 1: background image Bt) containing image information for at least a portion of the scene captured at a second time that is prior to the first time (the recursive equation (2) in Sect. 3.1 shows that the background image Bt comprises image information of the previous background image Bt-1, which is obtained inter alia from the previous frame It-1 captured at time t-1); 
providing the first image and the second image as input to a machine-learning model (as illustrated in Fig. 1, both the current frame It and the background image Bt are input to a subsequent process, which finally performs pedestrian detection; in one embodiment, the pedestrian detection is performed by a deep learning based pedestrian detector called DeepPed, cf. lines 1-9 of Sect. 4); and 
obtaining, as an output from the machine-learning model responsive to providing the input, an identification of an object or an action depicted in at least one of the first image or the second image (Fig. 1: each bounding box in the final output Ot is an identification of a respective pedestrian in the image It).

Regarding claims 19 and 22, they correspond to claim 11 and are therefore rejected for the same reasons set forth for claim 11.

Regarding claim 20 (depends on claim 17), Wang discloses the method wherein the machine-learning model has been trained (cf. Sect. 4: "We tested [ ... ] DeepPed by Tome et al. [25], a deep learning based pedestrian detector"; the DeepPed detector is a convolutional neural network, which is trained by optimising the weights of the network) to recognize the object or the action based, at least in part, on a change in position of the object between the first image and the second image (as illustrated in Fig. 4 and in the lower branch of Fig. 1).

Regarding claim 21, Wang discloses a system (see the last five paragraphs of Sect. 4), comprising: 
a camera configured to capture a first image of a scene (see the last five paragraphs of Sect. 4); and 
one or more processors (see the last paragraph of Sect. 4), configured to: 
obtain the first image of the scene (Fig. 1: current frame It); 
obtain a second image of at least a portion of the scene (Fig. 1: background image Bt); 
obtain a difference (Fig. 1: temporal gradient Δt) between one or more pixels of the first image and one or more corresponding pixels of the second image (cf. equation (1) in Sect. 3.1); 
provide the obtained difference as an input to a machine-learning model (as illustrated in Fig. 1, cf. lines 1-5 of Sect. 3, the temporal gradient Δt is subjected to a non-linear filter operation (ii) for subsequently being input to a pedestrian detection step (iii); in one embodiment, said pedestrian detection step (iii) is performed by a deep learning based pedestrian detector called DeepPed, cf. lines 1-9 of Sect. 4); and 
obtain, as an output from the machine-learning model responsive to providing the input, an identification of an object or an action depicted in at least one of the first image or the second image (Fig. 1: each bounding box in the final output Ot is an identification of a respective pedestrian in the image It).

Regarding claim 23, it corresponds to claim 13 and is therefore rejected for the same reasons set forth for claim 13.

Regarding claim 24, it corresponds to claim 9 and is therefore rejected for the same reasons set forth for claim 9.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 3 – 5 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang.

Regarding claim 3 (depends on claim 2), Wang fails to explicitly disclose the method wherein providing the obtained difference and the at least one of the first image or the second image comprises concatenating the obtained difference and the first image, and providing a result of the concatenating as a single input to the machine-learning model.
	However, Wang discloses obtaining the difference and the first image, and providing both as an input to the machine-learning model (as illustrated in Fig. 1, both the current frame It and the background image Bt are input to a subsequent process, which finally performs pedestrian detection; in one embodiment, the pedestrian detection is performed by a deep learning based pedestrian detector called DeepPed, cf. lines 1-9 of Sect. 4). 
Wang discloses a “base” method upon which the claimed invention can be seen as an improvement.
Wang contains a known technique that is applicable to the base method, namely that both inputs are provided to the machine-learning model (Fig. 1).
One of ordinary skill in the art would have recognized that applying the known technique would have yielded predictable results and resulted in an improved system (KSR scenario C, Use of Known Technique To Improve Similar Devices (Methods, or Products) in the Same Way).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by concatenating the obtained difference and the first image, and providing a result of the concatenating as a single input to the machine-learning model. The motivation for doing this is that the input can be more concise so that the application of Wang can be extended.

Regarding claim 4 (depends on claim 3), Wang discloses the method wherein the obtained difference comprises a difference image that includes a difference between each of the pixels of the first image and each corresponding pixel of the second image (cf. equation (1) in Sect. 3.1).

Regarding claim 5 (depends on claim 4), Wang discloses the method wherein the first image and the second image are each multi-channel images, wherein obtaining the difference comprises obtaining a difference image for each channel of the multi-channel images (second para. of Sect. 3.1: obtaining a difference image channel for each image channel is an obvious intermediate step when calculating the Euclidean norm of the pixel values of the difference image in RGB space).

Regarding claim 18, it corresponds to claim 3 and is therefore rejected for the same reasons set forth for claim 3.

Claim(s) 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Huang et al. (US Patent Application Publication 2017/0270652), hereinafter referred to as Huang.

Regarding claim 6 (depends on claim 5), Wang fails to explicitly disclose the method wherein concatenating the obtained difference and the first image comprises adding the difference image for each channel as an additional channel of the first image.
	However, in a similar field of endeavor Huang discloses an image processing method (Fig. 4). In addition, Huang discloses adding the difference image for each channel as an additional channel of the image ([0047], claim 17).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by adding the difference image for each channel as an additional channel of the image. The motivation for doing this is that the defect/object can be more accurately detected so that the method of Wang can be more accurate.
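For illustration only, adding a per-channel difference image as additional channels of the first image can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `concat_difference_channels` is hypothetical, a raw (signed) per-channel difference is assumed, and the channel ordering (image channels first, difference channels last) is an arbitrary choice not taken from Wang or Huang.

```python
import numpy as np

def concat_difference_channels(first_image, second_image):
    """Return an (H, W, 2C) array: the first image followed by its per-channel difference."""
    first = first_image.astype(np.float32)
    second = second_image.astype(np.float32)
    diff = first - second                          # per-channel raw difference, (H, W, C)
    return np.concatenate([first, diff], axis=-1)  # single multi-channel input

# Assumed toy data: two constant 3-channel images.
first = np.full((2, 3, 3), 10, dtype=np.uint8)
second = np.full((2, 3, 3), 4, dtype=np.uint8)
stacked = concat_difference_channels(first, second)
```

The result `stacked` is a single array that could be supplied to a model as one input, with the difference information carried in the trailing channels.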

Claim(s) 7 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Rigney et al. (US Patent 6,985,172, IDS), hereinafter referred to as Rigney.

Regarding claim 7 (depends on claim 1), Wang fails to explicitly disclose the method further comprising performing image pre-processing operations on the first image and the second image prior to obtaining the difference.
	However, in a similar field of endeavor Rigney discloses a model-based incident detection system (abstract). In addition, Rigney discloses the system performing image pre-processing operations on the first image and the second image prior to obtaining the difference (Fig. 4: image intensity normalisation).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by performing image pre-processing operations on the first image and the second image prior to obtaining the difference. The motivation for doing this is to correct for changing ambient conditions so that the method of Wang can be more accurate.
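For illustration only, intensity normalisation applied to both images before differencing can be sketched as follows. The zero-mean, unit-variance normalisation used here is an assumption chosen for the sketch; the Office action cites only "image intensity normalisation" in Fig. 4 of Rigney, and the function names are hypothetical.

```python
import numpy as np

def normalise_intensity(image, eps=1e-8):
    """Shift and scale an image to zero mean and unit standard deviation."""
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + eps)

def normalised_difference(first_image, second_image):
    """Difference computed after normalising both images."""
    return normalise_intensity(first_image) - normalise_intensity(second_image)

# Assumed toy data: the same scene under a global brightness and contrast change.
rng = np.random.default_rng(1)
scene = rng.uniform(0.0, 100.0, size=(8, 8))
brighter = 1.5 * scene + 20.0

raw = scene - brighter
norm = normalised_difference(scene, brighter)
```

Here the raw difference is large everywhere although nothing in the scene moved, whereas the normalised difference is essentially zero, which is the sense in which pre-processing corrects for changing ambient conditions.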

Regarding claim 14 (depends on claim 1), Wang fails to explicitly disclose the method further comprising, prior to obtaining the difference between the one or more pixels of the first image and the one or more corresponding pixels of the second image, aligning the first image and the second image to identify the one or more corresponding pixels of the second image that correspond to the one or more pixels of the first image.
	However, in a similar field of endeavor Rigney discloses a model-based incident detection system (abstract). In addition, Rigney discloses the system, prior to obtaining the difference between the one or more pixels of the first image and the one or more corresponding pixels of the second image, aligning the first image and the second image to identify the one or more corresponding pixels of the second image that correspond to the one or more pixels of the first image (cf. col. 4, ll. 43-45: "[ ... ] image registration would be conducted as necessary for images obtained from variable aiming cameras or fixed cameras on non-rigid platforms").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by, prior to obtaining the difference between the one or more pixels of the first image and the one or more corresponding pixels of the second image, aligning the first image and the second image to identify the one or more corresponding pixels of the second image that correspond to the one or more pixels of the first image. The motivation for doing this is to be able to compare or integrate the data obtained from these different measurements.
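For illustration only, aligning two images before differencing can be sketched as follows. Phase correlation is used here as one well-known registration technique; it is an assumption for the sketch, since Rigney is cited only for "image registration" in general. The function name `estimate_shift` is hypothetical, and the sketch handles only integer, circular translations of single-channel images.

```python
import numpy as np

def estimate_shift(reference, moving):
    """Estimate (dy, dx) so that np.roll(moving, (dy, dx), axis=(0, 1)) matches reference."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross = f_ref * np.conj(f_mov)
    cross /= np.abs(cross) + 1e-12          # keep phase information only
    corr = np.fft.ifft2(cross).real         # peak location encodes the translation
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = reference.shape
    if dy > h // 2:                         # map wrapped indices to signed shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Assumed toy data: the second image is a circularly shifted copy of the first.
rng = np.random.default_rng(2)
reference = rng.uniform(size=(16, 16))
moving = np.roll(reference, (3, -2), axis=(0, 1))

shift = estimate_shift(reference, moving)
aligned = np.roll(moving, shift, axis=(0, 1))
difference = reference - aligned
```

Once the second image has been rolled back by the estimated shift, corresponding pixels coincide and the pixel-wise difference reflects scene changes rather than camera displacement.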

Claim(s) 10 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Gaidon (European Patent Application Publication EP 3 229 206, IDS).

Regarding claim 10 (depends on claim 1), Wang fails to explicitly disclose the method wherein obtaining the second image comprises capturing the second image at the second time that is different from the first time, and wherein the second time is prior to the first time.
	However, in a similar field of endeavor Gaidon discloses a system for deep data association for online multi-class multi-object tracking (abstract). In addition, Gaidon discloses the system capturing the second image at the second time that is different from the first time, and wherein the second time is prior to the first time (cf. par. 8: “[…] transforming the previous and current video frames into a temporal difference input image”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by capturing the second image at the second time that is different from the first time, wherein the second time is prior to the first time. The motivation for doing this is that various timing situations for the input images can be handled so that the application of Wang can be broadened.

Regarding claim 12 (depends on claim 1), Wang fails to explicitly disclose the method wherein obtaining the second image comprises selecting the second image from a candidate pool of images each captured at a corresponding time prior to the first time.
	However, in a similar field of endeavor Gaidon discloses a system for deep data association for online multi-class multi-object tracking (abstract). In addition, Gaidon discloses that obtaining the second image comprises selecting the second image from a candidate pool of images each captured at a corresponding time prior to the first time (cf. par. 25: “[…] the temporal difference generator 44 transforms two video frames into a temporal difference input image. In the contemplated embodiment, the video frames can be adjacent frames. Or, in the case where the system operates on every nth frame, the video frames can include a current frame and a temporally distant previous frame”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang such that obtaining the second image comprises selecting the second image from a candidate pool of images each captured at a corresponding time prior to the first time. The motivation for doing this is that various timing situations for the input images can be handled so that the application of Wang can be broadened.
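For illustration only, selecting a second image from a pool of earlier frames can be sketched as follows. This is a minimal sketch under stated assumptions: the buffer of `(time, frame)` pairs, the function name `select_reference`, and the `stride` parameter (standing in for "every nth frame") are hypothetical and are not taken from Gaidon.

```python
from collections import deque

def select_reference(pool, current_time, stride):
    """Pick the most recent pooled frame captured at least `stride` time units earlier."""
    candidates = [(t, frame) for t, frame in pool if t <= current_time - stride]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])

# Assumed toy data: a rolling pool holding the five most recent frames.
pool = deque(maxlen=5)
for t in range(10):
    pool.append((t, f"frame{t}"))

chosen = select_reference(pool, current_time=10, stride=3)
missing = select_reference(pool, current_time=10, stride=20)
```

With `stride=1` the sketch returns the adjacent previous frame, and larger values return a temporally more distant frame, mirroring the two cases quoted from par. 25.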

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to QIAN YANG whose telephone number is (571)270-7239. The examiner can normally be reached on Monday-Thursday 6am-6pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached on 571-272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/QIAN YANG/Primary Examiner, Art Unit 2668