DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Each of these claims recites the abbreviation “DNN” without the long form of the phrase. Claim 7 is rejected for its dependency on claim 4. For the purpose of examination, the examiner will treat the limitation as instead reciting “deep neural network (DNN)”.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
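A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.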

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Kontschieder et al. (U.S. Pub. No. 2016/0071284), hereinafter “Kontschieder”, in view of Alvarez (U.S. Pub. No. 2016/0203360).
Claim 1 is met by the combination of Kontschieder and Alvarez, wherein
Kontschieder teaches:
A moving state analysis device (See the Abstract and Fig. 3.) comprising: 
a memory (See memory 310 in Fig. 3.); and 
a processor coupled to the memory and configured to (See processor 308 in Fig. 3.): 
detect, from image data associated with a frame, an object (See [0040]: “The centering process 406 may comprise using template-matching to detect a head or other specified body part of the person depicted in the frame.”) and a region of the object (See result 410 in Fig. 4 and [0040]: “Once detected this body part may be centered in the pre-processed image 410 and scaled or mapped 408 to a canonical depth (in the case that depth frames are involved).”), for each of frames (See frame 400 of multiple frames of a video in Fig. 4.) that constitute first video data captured in a course of movement of a first moving body (See Fig. 2 and [0021]: “In various examples described below, videos of motor tasks are processed to analyze the motor tasks. For example, to classify the motor tasks into two or more classes such as good, medium, weak (for example, to indicate performance level of the motor task). By using an automated, video processing system as described herein it is possible to obtain unbiased, accurate ratings/classifications in a fast and practical manner.”), and 
learn a [neural network] model (See trained machine learning system 508 in Fig. 5 and [0041]: “For example, the trained machine learning system comprises a random decision forest, an ensemble of randomized support vector machines, neural networks, or boosting systems.”) that takes video data and sensor data as input (See the depth camera data serving as “sensor data” in [0032]: “In some examples, in addition to or instead of, the depth camera 302, the capture device 100 can comprise an RGB camera 306. The RGB camera 306 is arranged to capture sequences of images of the scene at visible light frequencies, and can hence provide images that can be used to augment the depth images.”) and that outputs a probability of each moving state (See [0074]: “The processes outputs 816 at least one motor task class as a result, and is able to output a confidence weighting of the motor task class. This helps any subsequent algorithm assess whether the proposal is good or not. More than one motor class may be output; for example, where there is uncertainty.”), based on the first video data (See the videos in the “labeled videos of motor tasks” 500 in Fig. 5 and [0041]: “The labeled videos are pre-processed using preprocessing apparatus 118 described above to produce training data 502.”), a feature of first sensor data measured in relation to the first moving body and corresponding to a capture of the first video data (See the videos in the “labeled videos of motor tasks” 500 in Fig. 5 and [0032]: “In some examples, in addition to or instead of, the depth camera 302, the capture device 100 can comprise an RGB camera 306. The RGB camera 306 is arranged to capture sequences of images of the scene at visible light frequencies, and can hence provide images that can be used to augment the depth images.” Then see [0028], where the depth values serve as the claimed “feature of first sensor data”.), a detection result of the object and the region of the object (See processing 118 in Fig. 5 and [0040]: “The centering process 406 may comprise using template-matching to detect a head or other specified body part of the person depicted in the frame. Once detected this body part may be centered in the pre-processed image 410”.), and information that indicates a moving state associated with the first video data (See labels in the “labeled videos of motor tasks” 500 in Fig. 5 and [0041]: “The machine learning system is trained using labeled videos 500 of motor tasks. The labels indicate which class the depicted motor task falls into. The labels may be assigned by human judges for example.”).
Kontschieder does not appear to disclose the following; however, Alvarez teaches:
learn a DNN model (See [0036]: “In an embodiment, the processor may utilize data from or related to a Deep Neural Networks (DNN) to recognize gestures.”)
Motivation to combine:
Kontschieder and Alvarez together teach the limitations of claim 1. Alvarez is directed to a similar field of art (analysis of images and videos to identify gestures). Therefore, Kontschieder and Alvarez are combinable. Kontschieder only broadly discloses learning a neural network model. Modifying the system and method of Kontschieder by adding the capability of learning a deep neural network model, as taught by Alvarez, would yield the expected and predictable result of enabling features to be generated and optimally tuned for more effective moving state prediction. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Kontschieder and Alvarez in this way.

Claim 2 is met by the combination of Kontschieder and Alvarez, wherein
The combination of Kontschieder and Alvarez teaches:
The moving state analysis device according to claim 1, wherein the processor is configured to 
And Kontschieder further teaches:
detect, from image data associated with a frame, an object and a region of the object, for each of frames that constitute second video data captured in a course of movement of a second moving body (See Fig. 8, 800 and [0071]: “A video is referred to as ‘unseen’ to distinguish it from a training video which has the motor task class specified. Note that the unseen video can be pre-processed to an extent, as described above, with reference to FIG. 4.” Fig. 4 depicts detecting an object and a region of the object.), and calculate a probability of each moving state (See Fig. 8, 816 and [0074]: “The processes outputs 816 at least one motor task class as a result, and is able to output a confidence weighting of the motor task class.”), with respect to the second video data, by inputting into the [neural network] model the second video data (See Fig. 8, unseen video.), second sensor data measured in relation to the second moving body and corresponding to a capture of the second video data (In the case that an RGB camera and depth camera are both used as in Figs. 1 & 3 and [0032], depth data appears to be measured along with the unseen video.), and a detection result of the object and the region of the object detected from the image data associated with the frame from which the second video data is constituted, the [neural network] model being read and executed by the hardware processor (See Fig. 8, 800 and [0071]: “A video is referred to as ‘unseen’ to distinguish it from a training video which has the motor task class specified. Note that the unseen video can be pre-processed to an extent, as described above, with reference to FIG. 4.” Fig. 4 depicts detecting an object and a region of the object.).
And Alvarez further teaches:
DNN (See [0036]: “In an embodiment, the processor may utilize data from or related to a Deep Neural Networks (DNN) to recognize gestures.”)
Motivation to combine:
See the motivation to combine in the treatment of claim 1.

Claim 3 is met by the combination of Kontschieder and Alvarez, wherein
The combination of Kontschieder and Alvarez teaches:
The moving state analysis device according to claim 2, wherein the processor is configured to generate, based on the detection result of the object and the region of the object by the detection unit, data indicating for each object a feature of a region in which the object appears, learn the DNN model based on the generated data in relation to the first video data (See trained machine learning system 508 downstream from processing 118 in Fig. 5, where depth values (serving as the claimed “feature of a region in which the object appears”) are generated for a body part as described in [0040]: “Once detected this body part may be centered in the pre-processed image 410 and scaled or mapped 408 to a canonical depth (in the case that depth frames are involved).”); and calculate a probability of each moving state based on the generated data in relation to the second video data (See [0074]: “The processes outputs 816 at least one motor task class as a result, and is able to output a confidence weighting of the motor task class. This helps any subsequent algorithm assess whether the proposal is good or not. More than one motor class may be output; for example, where there is uncertainty.”).

Claim 4 is met by the combination of Kontschieder and Alvarez for the reasons given in the treatment of claim 1. Kontschieder further teaches:
A moving state analysis method executed by a computer (See the Abstract.) comprising:

Claim 5 is met by the combination of Kontschieder and Alvarez for the reasons given in the treatment of claim 2.

Claim 6 is met by the combination of Kontschieder and Alvarez for the reasons given in the treatment of claim 3.

Claim 7 is met by the combination of Kontschieder and Alvarez for the reasons given in the treatment of claim 1. Kontschieder further teaches:
A non-transitory computer-readable recording medium having a program (See [0084].) that causes a computer to execute the moving state analysis method of claim 4.


Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN S LEE whose telephone number is (571)272-1981. The examiner can normally be reached 11 AM - 7 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Jonathan S Lee/
Primary Examiner, Art Unit 2661