DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to because the text appearing in the drawings has not been translated into English.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claim 16 is objected to because of the following informalities: the claim at lines 22-23 appears to contain a typographical error in the form of an extraneous “and” in “ensuring a recording of the 2D thermal images and any said 2D video images and is effected at a same time.” The second “and” makes the claim grammatically incorrect and, while not rendering the claim indefinite, makes it less clear. In the interest of compact prosecution, the Examiner will interpret the claim as if it recites “ensuring a recording of the 2D thermal images and any said 2D video images is effected at a same time.” Appropriate correction is required.
Claims 22-23 and 31 are objected to because of the following informalities:  the claims refer to an abbreviation or acronym “WB” in which it is not clear what is actually meant by the acronym, although its functional characteristics can be gleaned from inspection.  It is possible this arises from the same issue as the drawing sheets not being translated as they also refer to “WB” in the drawings without any explanation.  Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “calibration unit for ensuring…”, “synchronization unit for ensuring…”, “segmentation unit for segmenting…”, “reconstruction unit for reconstructing…”, “projection unit for projecting…”, and “identification unit for identifying…” in claim 16 and, through it, in its dependent claims; as well as a “2D supplementation unit that supplements” of claim 17, a “3D supplementation unit, which supplements” of claim 18, an “assignment unit, which assigns” of claim 24, and a “weighting unit which weights…” of claim 26.
Claim 25’s “assignment unit” is not interpreted as such given that “any image processing units” may be used as the structure to perform the claimed technique.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 16 and 20-30 is/are rejected under 35 U.S.C. 102(a)(1) and/or 102(a)(2) as being anticipated by Kanaujia et al1 (“Kanaujia”).
Regarding claim 16, Kanaujia teaches a motion analysis system for at least one moved or moving object which is thermally distinct from their surroundings, the motion analysis system comprising (see below for the elements acting together to function as such a system; note that the recitation “for at least one moved or moving object which is thermally distinct from their surroundings” is an intended use and is given weight according to the body of the claim, addressed below, where the object is referred to; note also that “a moved or moving object” covers both an object that is not currently moving but has moved in the past (“moved”) and an object that continues to move in some regard (“moving”); and further note that an object may be considered thermally distinct from its surroundings if any difference, quality, or characteristic may be ascribed to the object compared to a surrounding or background):
a group of cameras (see Kanaujia, paragraphs 0077-0079 teaching “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” as in paragraph 0069) having an arrangement in relation to one another such that a field of view of each of said cameras overlaps (see Kanaujia, paragraphs 0077-0079 and figure 1B for example where as above the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” and as in figure 1B the frames of the sources can be seen to overlap from the different fields of view of the cameras imaging the subject such that as in paragraph 0081 the “3D visual hull extraction module 103 receives plural video streams from block 102 and extracts a 3D visual hull of a human object detected in the plural video streams” where figure 1B “shows six frames of six different video streams taken with different perspectives (different angles) of a monitored area” which are overlapping perspectives as can be clearly seen from these cameras arranged to have overlapping perspectives), said group of cameras having:
at least a first and a second camera with objective lenses which are disposed at a distance x of at least two meters from one another and/or with optical axes being oriented at an angle a of at least 45 degrees with respect to one another (see Kanaujia, paragraphs 0077-0079 and figure 1B for example where as above the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” and as in figure 1B the frames of the sources can be seen to overlap from the different fields of view of the cameras imaging the subject such that as in paragraph 0081 the “3D visual hull extraction module 103 receives plural video streams from block 102 and extracts a 3D visual hull of a human object detected in the plural video streams” where figure 1B “shows six frames of six different video streams taken with different perspectives (different angles) of a monitored area” where such angles can be seen to be at least 45 degrees with respect to one another as further evidenced by the frames from such cameras showing a subject from overlapping views which include opposing views with angles of 180 degrees with respect to one another for example);
said first camera is a thermal imaging camera for recording thermal radiation via continuous digital storage of 2D thermal images using at least one thermal imaging recorder (see Kanaujia, paragraphs 0077-0079 teaching “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” as in paragraph 0069, meaning that a first camera of those seen in figure 1B as explained above is taught to be an “infrared (IR) video camera,” which records thermal radiation via continuous digital storage of 2D thermal images using the IR video camera as the recorder, where the recorded thermal radiation could be reflected thermal radiation, or the thermal images could be provided by the “thermal video camera”);
said second camera is:
a further thermal imaging camera (see Kanaujia, paragraphs 0077-0079 teaching “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” and such cameras could correspond to the cameras capturing the frames as in figure 1B from the overlapping perspectives and as above these could be from a further thermal imaging camera such as another “infrared (IR) video camera” or another “thermal video camera”); or
a video image camera for recording light radiation via continuous digital storage of 2D video images using at least one video image recorder (see Kanaujia, paragraphs 0077-0079 teaching “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” and such cameras could correspond to the cameras capturing the frames as in figure 1B from the overlapping perspectives and as above these could be from a further thermal imaging camera such as another “infrared (IR) video camera” or “thermal video camera” which record light radiation in the infrared spectrum and/or acquire thermal video data and are thus such a video image camera as claimed or could be any of the “video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device”);
a calibration unit for ensuring spatial 3D calibration of all of said cameras with overlapping fields of view (note that a unit which is functioning “for ensuring spatial 3D calibration of all of said cameras” is extremely broad even under the interpretation in view of the Specification at page 8, lines 4-6, which simply discloses “[u]sing a calibration unit 51, simultaneous spatial 3D calibration of all thermal cameras 11, 12, ... and possibly video cameras 21, 22, ... with overlapping fields of view 111, 121, ...; 211, 221, ... is ensured, for example according to the known prior art”; in light of this, see Kanaujia, teaching the system functioning as a calibration unit to ensure spatial 3D calibration of all said cameras as in paragraphs 0099-0104 teaching a “module 102” that “may provide streams of multi-view image sequences of a human target from a set of calibrated cameras as inputs,” which through such calibrated camera input ensures spatial 3D calibration of all said cameras, which may be the “four or more calibrated cameras…placed along directions to maximally capture the entire viewing sphere around the target,” and these capture the overlapping fields of view that comprise this viewing sphere such that “[i]mage streams from multiple calibrated sensors may be used to reconstruct 3D volumetric representation (visual hull) of the human target using space carving”;
finally, note that Applicant admits that this unit is prior art as the Specification teaches on page 8 “[u]sing a calibration unit 51…for example according to the known prior art”);
a synchronization unit for ensuring a recording of the 2D thermal images and any said 2D video images (note that the claim does not define or require any specific technique for how a synchronization unit ensures recording, and thus any technique which ensures that recording of the 2D thermal and video images is effected at the same time and/or ensures that recording time points of the 2D images are known in some manner is within the scope of the claims, as even Applicant’s specification does not specifically limit the manner in which synchronization is accomplished, though it at least provides some structure for doing so; thus see Kanaujia where, in the portions cited below, the system is taught to act as such a synchronization unit: the 2D thermal and video images are ensured to be recorded at the same time, as this is necessary for the initial 3D space carving into a visual hull shown in figure 1B and its associated description below, and the recording time points of the 2D images are known, as the space carving of the target silhouettes must use frames shot at the same time, since the space carving would not function if the viewpoints of each camera were not synchronized; moreover, the mere fact that the 2D images are shot by video cameras means that the recording time points are known in relation to one another, as the frame rate at which a video is shot ensures that the recording time points of the 2D images are known relative to one another; see also for example paragraphs 0081-0082 teaching the “video streams are analyzed to detect the human object, and each of the six frames (associated with the same time) shown in (a) of FIG. 1B may be used to extract a 3D visual hull by module 103” such that the recording time points of the 2D images are known and recording of the 2D thermal and video images is effected at a “same time,” which again means that this time point is known);
a segmentation unit for segmenting associated 2D pixel regions of the object in synchronously available thermal images and any said video images according to predefined homogeneity criteria (“homogeneity criteria” being some criteria that things are homogeneous or similar, such that segmenting of the 2D regions is segmenting of regions which are similar or homogeneous in some way, such as having similar characteristics or otherwise defining features recognizable in the 2D image; see Kanaujia, paragraphs 0081-0083 and figure 1B where “visual hull extraction module 103 receives plural video streams from block 102 and extracts a 3D visual hull of a human object detected in the plural video streams” where “FIG. 1B shows six frames of six different video streams taken with different perspectives (different angles) of a monitored area” and a “human within the monitored area appears as a human object on each of these frames” such that the “video streams are analyzed to detect the human object, and each of the six frames (associated with the same time) shown in (a) of FIG. 1B may be used to extract a 3D visual hull by module 103” where this “feature extraction” acts as a segmentation unit for segmenting the 2D pixel regions of the object in the images according to predefined homogeneity criteria, such as a region being homogeneous if it meets a predefined definition of a “human object,” allowing the human silhouette to be segmented out from the images for 3D visual hull reconstruction; note this is further taught in paragraph 0100 teaching “Module 102 may provide streams of multi-view image sequences of a human target from a set of calibrated cameras as inputs” and “3D volumetric reconstruction (visual hull) of the target is obtained by module 103 using space carving from the target silhouettes” such that the target silhouettes are segmented 2D pixel regions of the object in the synchronously available images, segmented according to criteria that the region belongs to the human object target);
a reconstruction unit for reconstructing a 3D voxel model of the object from segmented 2D pixel regions (see Kanaujia, paragraphs 0081-0082 teaching “3D visual hull extraction module 103 receives plural video streams from block 102 and extracts a 3D visual hull of a human object detected in the plural video streams” such that this is a reconstruction unit which reconstructs a 3D voxel model of the object called a “3D visual hull” from the 2D pixel regions in the frames as further explained in Kanaujia, paragraphs 0104 teaching “[i]mage streams from multiple calibrated sensors may be used to reconstruct 3D volumetric representation (visual hull) of the human target using space carving” where an “[o]ctree-based fast iterative space carving algorithm may be used to extract volumetric reconstruction of the target” where a “single volume (cube) that completely encloses the working space of the acquisition system may be defined” and “[b]ased on the projection to the camera image plane, each voxel is classified as inside, outside or on the boundary of the visual hull using the target silhouette” such that the “boundary voxels may be iteratively subdivided into eight parts (voxels) until the size of voxels is less than the threshold size” defining then a 3D voxel model of the object from the segmented 2D pixel regions of the target silhouette—note that this 3D voxel model as well as the images and regions from which it is created then serve as the basis for all 3D voxel model reconstruction as below where these are the basis for the visual hull, as well as the coarse 3D shape model and the detailed 3D shape model, all of which are 3D voxel models reconstructed from the original segmented 2D pixel regions as further explained below; 
as noted above, 3D reconstruction may be considered to occur from the 2D pixel regions multiple times if such segmented regions are used to reconstruct a 3D voxel model in some manner, such that in a first stage there is a 3D reconstruction to generate a first 3D voxel model such as the “visual hull” 3D voxel model, but there is also 3D reconstruction based on these segmented regions whereby a refined 3D reconstruction is found, comprising a 3D pose refined and 3D shape estimated 3D reconstruction from the 2D images which comprises a 3D voxel model, such as in paragraph 0083 teaching “3D pose refinement module 105 receives the 3D visual hull from module 103 and receives pose hypotheses from module 104 and refines the pose predictions” and “pose predictions may be refined by the 3D pose refinement module 105 using cylindrical body part models to obtain a coarse 3D human shape model” where “[f]or each pose prediction, cylinder model parts may be mapped on a part by part basis (e.g., leg, arm, torso, head, etc.) to the skeleton corresponding to the pose prediction” where the “[s]izes of the cylinder may be selected by comparing the cylinder to the 3D visual hull to maximize correspondence” to generate a “resulting coarse 3D human shape model” which is another 3D voxel model available for processing which has been reconstructed (and still from the segmented 2D regions which are the basis for the entire technique) and which is then provided to the 3D shape estimation stage which may generate another version of a 3D voxel model; relatedly, see paragraph 0090 teaching “generation of a coarse 3D human shape model for each pose and comparing the same to the extracted 3D visual hull obtained in step S103” where “the coarse 3D human shape model have a calculated silhouette compared with silhouettes of the human object for each of the plural video images of the video streams to refine the initial pose hypotheses” such that again it can be seen that a 3D voxel model such as the coarse 3D human shape model is reconstructed by the system acting as a reconstruction unit;
note the final instance of 3D reconstruction from the initial 2D segmented regions above can be found taught as in paragraph 0085 where “3D shape estimation module 107 receives the refined pose estimations from module 105 and receives the different body type detailed 3D human shape models from module 106 and provides an estimated pose and a detailed 3D model of the human object detected in the video streams” such that again based on the 2D segmented regions of the object initially, there is then created a “detailed 3D model” which is a 3D voxel model which is generated by module 107 acting as such a reconstruction unit claimed where “FIG. 1B represents this operation at (e), showing a coarse 3D shape human model provided by module 105 (at top of (e)) transformed into a detailed 3D shape human model (at the bottom of (e))” all of which of course is based on the initial segmented regions of the frames of video taken by the multicam setup);
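For illustration only, the silhouette-based space carving referenced from Kanaujia, paragraph 0104, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Kanaujia or from the application: it uses a flat voxel grid rather than the octree subdivision Kanaujia describes, it assumes orthographic views, and the names `carve_visual_hull`, `disc`, and `cameras` are hypothetical. A voxel is kept only if its centre projects inside the silhouette in every view.

```python
from itertools import product


def carve_visual_hull(grid_size, cameras):
    """Return the set of voxels whose centre lies inside every silhouette.

    cameras is a list of (project, in_silhouette) pairs (hypothetical
    interface): project maps a 3D point to 2D image coordinates and
    in_silhouette(u, v) reports whether that pixel belongs to the object.
    """
    hull = set()
    for voxel in product(range(grid_size), repeat=3):
        centre = tuple(i + 0.5 for i in voxel)
        # A voxel that falls outside any one silhouette is carved away.
        if all(in_sil(*project(centre)) for project, in_sil in cameras):
            hull.add(voxel)
    return hull


def disc(cx, cy, r):
    """Toy silhouette: a filled disc standing in for a segmented object."""
    return lambda u, v: (u - cx) ** 2 + (v - cy) ** 2 <= r ** 2


# Two assumed orthographic views of an 8 x 8 x 8 working volume.
cameras = [
    (lambda p: (p[0], p[1]), disc(4, 4, 2)),  # looking along the z axis
    (lambda p: (p[1], p[2]), disc(4, 4, 2)),  # looking along the x axis
]
hull = carve_visual_hull(8, cameras)
```

The resulting `hull` is the intersection of the volumes swept out by each silhouette, which is why every view has to show the object at the same instant for the carving to be meaningful.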
a projection unit projecting the 3D voxel model as reference for a search space back into the synchronously available 2D thermal images and any said 2D video images (see Kanaujia, paragraphs 0104-0105 where as above the 3D voxel model is reconstructed from the regions and then “[a]s 2D shapes of the silhouette are used in discriminative 3D pose prediction, a visual hull is back projected to obtain clean silhouettes of the target using Z-buffering” where these “improved silhouettes generate cleaner shape descriptors for improved 3D pose estimation using bottom-up methods” such that here the pose prediction module acts as such a projection unit projecting the 3D visual hull model as a reference for a search space in the 2D silhouette space;
additionally and/or alternatively, note in paragraph 0083 when the reconstructed 3D voxel model is the “coarse 3D human shape model” used by module 105 it is taught that “FIG. 1B represents the operations of 3D pose refinement module 105 at (d), showing cylindrical human body parts mapped to portions of the skeleton representing a pose prediction” where the “resulting coarse 3D human shape model is compared to the visual hull extracted from module 103 to refine the prose predictions” or “in the alternative, other comparisons of the coarse 3D human shape model may be made with the video images, such as a comparison of a calculated silhouette to a silhouette extracted from a corresponding video image frame” such that here and as can be seen in the figure 1B in part (d) that the coarse 3D human model which is reconstructed is used as a reference for a search space back into the synchronously available “corresponding video image frame” allowing the 3D voxel model reconstructed to be compared with the 2D silhouette extracted—see also related paragraph 0090 teaching with regard to the coarse 3D human shape model as the 3D voxel model “to compare the coarse 3D human shape model to the human object in the video images to refine the pose hypotheses” where “the coarse 3D human shape model have a calculated silhouette compared with silhouettes of the human object for each of the plural video images of the video streams to refine the initial pose hypotheses” such that this projects the 3D coarse human shape voxel model back to the 2D image silhouette space to search the space for comparison purposes;
alternatively and/or additionally, note that when the 3D voxel model is to be considered the detailed shape model as from paragraph 0085 and part (e ) of figure 1B as explained above, it is further taught in paragraph 0085 “[u]sing the detailed 3D human shape models obtained by mapping the different body type detailed 3D human shape models to each of several pose predictions (via the associated coarse 3D human shape model), for each video stream of the plural video streams, a calculated silhouette of the detailed 3D human shape model may be compared to a silhouette extracted from the video image frame of that video stream” where the “calculated silhouetted may correspond to a projection of the detailed 3D human shape model to the image plane of the corresponding video image from which the actual silhouette is extracted” such that here there is a projection from the 3D model space as a reference for a search space back into the synchronously available frames “from the video image frame of that video stream” – note paragraph 0098 teaches further with regard to another instance of projecting the 3D voxel model where “accessories may be detected by detecting significant anomalies between the 3D human model and the video image(s) of the video streams” and “the final estimated 3D human model may be used to calculate corresponding silhouette(s) on one or more image planes of the video images (e.g., by projecting the estimated 3D human model onto these image planes)” where the “calculated silhouette(s) may be compared to corresponding extracted silhouette(s) extracted from the video images to analyze the anomalous shape” and further as in paragraph 0099 “the final 3D human model may be projected onto each of the video image planes to obtain a calculated silhouette for each of these images planes (the image planes may correspond to each of the video images of the video streams taken by the various video cameras monitoring the desired location)” and “a video image of each of 
the video streams” may be analyzed to extract a silhouette of the detected human object” such that here there is projection of the 3D voxel model as a reference for a search space back into the available images for comparison); and
an identification unit for identifying silhouettes of the object in the synchronously available 2D thermal images and any said 2D video images on a basis of the search space that is defined by back projection (see Kanaujia, paragraphs 0104-0105 for the case where the back projection refers to the 3D visual hull voxel model, where as above the 3D voxel model is reconstructed from the regions and then “[a]s 2D shapes of the silhouette are used in discriminative 3D pose prediction, a visual hull is back projected to obtain clean silhouettes of the target using Z-buffering” where these “improved silhouettes generate cleaner shape descriptors for improved 3D pose estimation using bottom-up methods” such that here the pose prediction module acts as an identification unit which identifies such silhouettes above which are synchronously available for comparison on a basis of the search space of the 2D silhouette space which is used for searching for pose information about the subject;
additionally and/or alternatively, see Kanaujia, paragraph 0083 as above for the case when the back projection refers to the back projection of the coarse 3D human shape voxel model to a “corresponding video image frame” for “comparisons of the coarse 3D human shape model” such as “comparison of a calculated silhouette to a silhouette extracted from a corresponding video image frame,” then the pose refinement module generating this refined coarse 3D model is such an identification unit identifying silhouettes in the synchronously available corresponding video images on the basis of the search space -- see also related paragraph 0090 teaching with regard to the coarse 3D human shape model as the 3D voxel model “to compare the coarse 3D human shape model to the human object in the video images to refine the pose hypotheses” where “the coarse 3D human shape model have a calculated silhouette compared with silhouettes of the human object for each of the plural video images of the video streams to refine the initial pose hypotheses” such that this projects the 3D coarse human shape voxel model back to the 2D image silhouette space to search the space for comparison purposes;
alternatively and/or additionally as explained above when the back projection refers to the back projection of the detailed shape model then the shape estimation module acts as such an identification unit for identifying silhouettes as in paragraph 0085 referenced above teaching “[u]sing the detailed 3D human shape models obtained by mapping the different body type detailed 3D human shape models to each of several pose predictions (via the associated coarse 3D human shape model), for each video stream of the plural video streams, a calculated silhouette of the detailed 3D human shape model may be compared to a silhouette extracted from the video image frame of that video stream” where the “calculated silhouetted may correspond to a projection of the detailed 3D human shape model to the image plane of the corresponding video image from which the actual silhouette is extracted” such that as explained above there is a projection from the 3D model space as a reference for a search space back into the synchronously available frames “from the video image frame of that video stream” such that this identifies such silhouettes of the object in the multicamera images in order to perform this comparison such that the “estimated pose and shape of the human object may be determined as that which results in the best comparison of the calculated silhouette and the extracted silhouettes” –see further as in paragraph 0099 “the final 3D human model may be projected onto each of the video image planes to obtain a calculated silhouette for each of these images planes (the image planes may correspond to each of the video images of the video streams taken by the various video cameras monitoring the desired location)” and “a video image of each of the video streams may be analyzed to extract a silhouette of the detected human object” such that here there is projection of the 3D voxel model as a reference for a search space back into the available images for comparison where this is based 
on identifying silhouettes of the object in the synchronously available frames as the model silhouette is then compared with the corresponding silhouette of the captured frame to determine in this instance whether there is some anomalous object associated with the target subject).
Regarding claim 20, Kanaujia teaches all that is required as applied to claim 16 above and further teaches wherein said segmentation unit, which is cycled through in an iterative sequence and in a process additionally takes into consideration the search space limitations from a preceding iteration step and adapts homogeneity criteria to a current iteration step (note that an “iterative sequence” is interpreted to simply mean a process which iterates in some manner which itself is extremely broad such that in this context an iterative sequence is some series of steps followed which at each step works to provide some further processing or improvement on an initial calculation; thus see Kanaujia, paragraphs 0081-0083 and figure 1B where as taught above, “visual hull extraction module 103 receives plural video streams from block 102 and extracts a 3D visual hull of a human object detected in the plural video streams” where “FIG. 1B shows six frames of six different video streams taken with different perspectives (different angles) of a monitored area” and a “human within the monitored area appears as a human object on each of these frames” such that the “video streams are analyzed to detect the human object, and each of the six frames (associated with the same time) shown in (a) of FIG. 
1B may be used to extract a 3D visual hull by module 103” where this “feature extraction” acts as a segmentation unit for segmenting the 2D pixel regions of the object in the images according to predefined homogeneity criteria such as a region being homogeneous if it meets a predefined definition of a “human object” allowing the human silhouette to be segmented out from the images for 3D visual hull reconstruction and this is a step which is cycled through such as in the iterative process of figures 1A and 1B and can be seen to take into consideration the search space limitations from a preceding iteration step which may be implicit such as the image frames of the subject being captured in an initial step by the available cameras and adapts homogeneity criteria to the current iteration step such as by segmenting out the subject in each current view of the cameras for reconstruction and as in paragraphs 0104-0105 “2D shapes of the silhouette are used in discriminative 3D pose prediction, a visual hull is back projected to obtain clean silhouettes of the target using Z-buffering” such that also this takes into consideration the search space limitations from the preceding iteration step capturing possibly noisy images which need to be cleaned and the homogeneity criteria is adapted to a current iteration step such as the homogeneity criteria being adapted to a current step as the clean silhouettes are used as the homogeneity criteria for segmenting for the following 3D reconstruction steps).
Regarding claim 21, Kanaujia teaches all that is required as applied to claim 16 above and further teaches wherein said reconstruction unit, which selects, in an iterative sequence, additionally the segmented 2D pixel regions used for reconstruction of the 3D voxel model in dependence on a current iteration step, a type of camera and/or a quality criteria of the 2D pixel regions (see Kanaujia, paragraphs 0081-0083 and figure 1B where as taught above, “visual hull extraction module 103 receives plural video streams from block 102 and extracts a 3D visual hull of a human object detected in the plural video streams” where “FIG. 1B shows six frames of six different video streams taken with different perspectives (different angles) of a monitored area” and a “human within the monitored area appears as a human object on each of these frames” such that the “video streams are analyzed to detect the human object, and each of the six frames (associated with the same time) shown in (a) of FIG. 1B may be used to extract a 3D visual hull by module 103” where this “feature extraction” acts as a segmentation unit for segmenting the 2D pixel regions of the object in the images according to predefined homogeneity criteria such as a region being homogeneous if it meets a predefined definition of a “human object” allowing the human silhouette to be segmented out from the images for 3D visual hull reconstruction and this is a step which is cycled through such as in the iterative process of figures 1A and 1B and can be seen to be selecting the segmented 2D pixel regions used for reconstruction based on a current iteration step such as that in paragraphs 0104-0105 teaching “2D shapes of the silhouette are used in discriminative 3D pose prediction, a visual hull is back projected to obtain clean silhouettes of the target using Z-buffering” meaning that these regions are used based on the current iteration step performing this technique, which also must be seen as being dependent on the 
type of camera as well given that the camera provides the frame of pixels of the object which would be different depending on the camera and image type necessarily, and finally quality criteria of the 2D pixel region is also a basis for the pixels actually used as those which go through the above cleaning process are considered to be higher quality pixels).
Regarding claim 22, Kanaujia teaches all that is required as applied to claim 16 above and further teaches wherein: said group of cameras includes at least two thermal imaging cameras; and
said reconstruction unit, which initially reconstructs a 3D WB voxel model of the objects only from segmented 2D WB pixel regions (note that “a 3D WB voxel model” only makes sense in light of the Specification, where Applicant does not establish any exclusive meaning to “WB” but uses “WB” only in examples whereby this refers to the images taken by thermal cameras such as at page 6 for example referring to “2D thermal images WB” such that these are recognized as images from the thermal cameras; thus see Kanaujia, paragraphs 0077-0079 teaching “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” and such cameras could correspond to the cameras capturing the frames as in figure 1B from the overlapping perspectives and as above these could be from a further thermal imaging camera such as another “infrared (IR) video camera” or another “thermal video camera” such that in the case where both cameras are thermal video cameras which are capturing thermal images WB as each frame to be supplied for reconstruction then an initial 3D visual hull which is reconstructed would be a 3D WB voxel model of the objects only from segmented 2D WB pixel regions).
Regarding claim 23, Kanaujia teaches all that is required as applied to claim 22 above and further teaches wherein said projection unit, which initially projects back the 3D WB voxel model as a reference for the search space into the synchronously available 2D thermal images and any said 2D video images (see Kanaujia as applied in parent claim 22 above teaching at paragraphs 0077-0079 for example “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” and such cameras could correspond to the cameras capturing the frames as in figure 1B from the overlapping perspectives and as above these could be from a further thermal imaging camera such as another “infrared (IR) video camera” or another “thermal video camera” such that in the case where both cameras are thermal video cameras which are capturing thermal images WB as each frame to be supplied for reconstruction then an initial 3D visual hull which is reconstructed would be a 3D WB voxel model of the objects only from segmented 2D WB pixel regions).
Regarding claim 24, Kanaujia teaches all that is required as applied to claim 16 above and further teaches an assignment unit, which assigns points of previously known silhouettes of a model of the object as a correspondence to points of identified silhouettes and/or assigns the points of the identified silhouettes as correspondence to the points of the previously known silhouettes of the model of the object (this is taught during space carving where points of the silhouettes are tracked or assigned values during this process; see Kanaujia, paragraphs 0081-0082 teaching “3D visual hull extraction module 103 receives plural video streams from block 102 and extracts a 3D visual hull of a human object detected in the plural video streams” such that this is a reconstruction unit which reconstructs a 3D voxel model of the object called a “3D visual hull” from the 2D pixel regions in the frames as further explained in Kanaujia, paragraphs 0104 teaching “[i]mage streams from multiple calibrated sensors may be used to reconstruct 3D volumetric representation (visual hull) of the human target using space carving” where an “[o]ctree-based fast iterative space carving algorithm may be used to extract volumetric reconstruction of the target” where a “single volume (cube) that completely encloses the working space of the acquisition system may be defined” and “[b]ased on the projection to the camera image plane, each voxel is classified as inside, outside or on the boundary of the visual hull using the target silhouette” such that the “boundary voxels may be iteratively subdivided into eight parts (voxels) until the size of voxels is less than the threshold size” defining then a 3D voxel model of the object from the segmented 2D pixel regions of the target silhouette such that the system acts as an assignment unit which assigns points of previously known silhouettes of a model of the object as a correspondence to points of identified silhouettes such as when each object pixel 
from each image such as in (a) of figure 1B is assigned to a same point as another object pixel thus allowing for the photogrammetric calculations to create the 3D visual hull reconstruction, in other words when each image is processed from each camera to determine its silhouette then these points are tracked in order to find corresponding silhouette points in currently identified silhouettes until the 3D visual hull is extracted once all images are processed).
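For illustration only, the octree-style space carving quoted above from Kanaujia, paragraph 0104, in which each voxel is classified as inside, outside or on the boundary of the visual hull and boundary voxels are subdivided into eight parts, can be sketched as follows. This is a minimal sketch and not Kanaujia's implementation: it assumes three hypothetical axis-aligned orthographic views in place of calibrated perspective cameras, silhouettes given as sets of integer pixel coordinates, a root cube whose edge length is a power of two, and a stopping rule that halts once a voxel reaches `min_size`. The names `classify`, `carve` and `min_size` are hypothetical.

```python
from itertools import product

def classify(origin, size, views):
    """Label a cubic voxel 'inside', 'outside' or 'boundary'.

    views: list of (drop_axis, silhouette) pairs. ASSUMPTION: each view is an
    axis-aligned orthographic projection that discards coordinate `drop_axis`;
    silhouette is a set of (u, v) integer pixels belonging to the target.
    """
    partial = False
    for drop_axis, silhouette in views:
        keep = [a for a in range(3) if a != drop_axis]
        u0, v0 = origin[keep[0]], origin[keep[1]]
        # Count how many pixels of the voxel's footprint lie in the silhouette.
        hits = sum((u, v) in silhouette
                   for u in range(u0, u0 + size)
                   for v in range(v0, v0 + size))
        if hits == 0:
            return "outside"      # carved away by this view
        if hits < size * size:
            partial = True        # straddles the silhouette edge in this view
    return "boundary" if partial else "inside"

def carve(origin, size, views, min_size=1):
    """Recursively subdivide boundary voxels into eight children."""
    label = classify(origin, size, views)
    if label == "outside":
        return []
    if label == "inside" or size <= min_size:
        return [(origin, size, label)]
    half = size // 2
    voxels = []
    for dx, dy, dz in product((0, half), repeat=3):
        child = (origin[0] + dx, origin[1] + dy, origin[2] + dz)
        voxels.extend(carve(child, half, views, min_size))
    return voxels

# Hypothetical input: the same 4x4 square silhouette seen along each axis.
square = {(u, v) for u in range(2, 6) for v in range(2, 6)}
views = [(axis, square) for axis in range(3)]
hull = carve((0, 0, 0), 8, views)
```

Under these assumptions the retained voxels together form the cube consistent with all three silhouettes; a calibrated multi-camera system would replace the footprint test with a projection of each voxel into the camera image plane.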
Regarding claim 25, Kanaujia teaches all that is required as applied to claim 16 above and further teaches an assignment unit, which additionally to correspondences obtained from data of the silhouettes, uses data of further sensors, any image processing units and/or a depth image camera for establishing correspondences (see Kanaujia, paragraphs 0077-0079 teaching “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” and such cameras could correspond to the cameras capturing the frames as in figure 1B from the overlapping perspectives which for example include more than the two required cameras above such that this is establishing correspondences from data of further sensors and/or depth image cameras and for example the correspondences are obtained using any image processing unit such as the “computer” performing such steps above as in paragraphs 0072-0077 where the computer and its components function as image processing units as they perform image processing).
Regarding claim 26, Kanaujia teaches all that is required as applied to claim 24 above and further teaches a weighting unit, which weights established correspondences in accordance with fixedly predefined and/or variable parameters (see Kanaujia, paragraph 0111 teaching that in “multi-camera settings, visual cues can be fused at feature level to train a single discriminative model to predict 3D pose using concatenated feature vector obtained from multiple sensors” such that “pose predictions may be performed by module 104” acting as a weighting unit as “combined predictive distribution is obtained by simply summing the mixture of Gaussian distributions obtained from each of the sensor models C={C1, . . . , CN} with gate weights re-weighted to sum to one” according to equation “(5)” where “N is the number of sensors and M are the experts in each of the Mixture of Experts model used to learn the mapping” such that this weights established correspondences in accordance with these fixedly predefined and/or variable parameters above).
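For illustration only, the re-weighting quoted above from Kanaujia, paragraph 0111, in which per-sensor mixtures of Gaussians are summed and the gate weights are re-weighted to sum to one, can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Kanaujia's equation (5): it assumes one-dimensional Gaussians, each given as a hypothetical (gate, mean, variance) triple, and the function names `combine_sensor_mixtures` and `density` are hypothetical.

```python
import math

def combine_sensor_mixtures(sensor_mixtures):
    """Pool the components of every sensor's mixture and renormalize gates.

    sensor_mixtures: list (one entry per sensor) of lists of
    (gate, mean, variance) triples. ASSUMPTION: 1-D Gaussians for brevity.
    """
    components = [c for mixture in sensor_mixtures for c in mixture]
    total = sum(gate for gate, _, _ in components)
    # Re-weight so that the gate weights of the combined mixture sum to one.
    return [(gate / total, mean, var) for gate, mean, var in components]

def density(mixture, x):
    """Evaluate a 1-D Gaussian mixture density at x."""
    return sum(
        g * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        for g, m, v in mixture
    )

# Hypothetical per-sensor predictive distributions.
sensor_a = [(0.7, 0.0, 1.0), (0.3, 2.0, 0.5)]
sensor_b = [(1.0, 0.5, 2.0)]
combined = combine_sensor_mixtures([sensor_a, sensor_b])
```

When each sensor's gates already sum to one, the combined density in this sketch is simply the average of the per-sensor densities.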
Regarding claim 27, Kanaujia teaches all that is required as applied to claim 16 above and further teaches a motion analysis unit, which analyzes poses and/or movements of the object from a finally available orientation of a model of the object (see Kanaujia, paragraphs 0085-0086 teaching the 3D shape estimation module 107 acting as such a motion analysis unit where it analyzes pose of the target human object from a finally available orientation of a model of the object such as the finally available “detailed 3D human shape model” when it “receives the refined pose estimations from module 105 and receives the different body type detailed 3D human shape models from module 106 and provides an estimated pose and a detailed 3D model of the human object detected in the video streams” and “[u]sing the detailed 3D human shape models obtained by mapping the different body type detailed 3D human shape models to each of several pose predictions (via the associated coarse 3D human shape model), for each video stream of the plural video streams, a calculated silhouette of the detailed 3D human shape model may be compared to a silhouette extracted from the video image frame of that video stream” such that here the pose of the object is analyzed with respect to its pose compared to the silhouette and additionally this may also be considered to analyze movements of the object from the final model as the 3D model is of a moving object and thus gives information about the movements such as their movement through the surveilled environment).
Regarding claim 28, Kanaujia teaches all that is required as applied to claim 16 above and further teaches a motion tracking unit, which performs a reorientation of a model of the object from assigned correspondences (see Kanaujia, paragraphs 0081-0085 and figure 1B where for example in the 3D pose refinement step this functions as the claimed motion tracking unit as it performs reorientation of a model to a certain “coarse 3D human shape model” where “comparisons may be made for each of the different pose predictions of the pose hypotheses output by module 104 (corresponding to (c) in FIG. 1B) and used to modify the pose hypotheses output by module 104” and for example “probabilities associated with each of the poses may be adjusted” such that this results in reorientation of a model through such pose refinement where changing pose changes orientation of the model as recognized by one of ordinary skill in the art).
Regarding claim 29, Kanaujia teaches all that is required as applied to claim 28 above and further teaches wherein said motion tracking unit, which, in an iterative procedure, carries out after each iteration, with already present correspondences and/or with correspondences which have been re-established, a reorientation of the model of the object until an orientation of the model meets a predefined criterion (see Kanaujia, paragraphs 0081-0085 and figure 1B where for example in the 3D pose refinement step this functions as the claimed motion tracking unit as it performs a first “3D pose refinement” reorientation of a model to a certain “coarse 3D human shape model” where “comparisons may be made for each of the different pose predictions of the pose hypotheses output by module 104 (corresponding to (c) in FIG. 1B) and used to modify the pose hypotheses output by module 104” and for example “probabilities associated with each of the poses may be adjusted” such that this results in reorientation of a model through such pose refinement using already present correspondences and performs reorientation at each iteration such as in the next 3D shape estimation and pose refinement as in part (e) where “estimated pose and shape of the human object may be determined as that which results in the best comparison of the calculated silhouette and the extracted silhouettes” such that this best comparison is the predefined criterion).
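For illustration only, selecting the pose that gives the best comparison between calculated and extracted silhouettes, as discussed above, can be sketched as follows. This is a minimal sketch under stated assumptions: the passages of Kanaujia quoted above do not specify a comparison measure, so intersection-over-union averaged across views is assumed here, silhouettes are represented as sets of pixel coordinates, and the names `overlap`, `select_pose`, `pose_a` and `pose_b` are hypothetical.

```python
def overlap(calculated, extracted):
    """Intersection-over-union of two silhouettes given as pixel sets.

    ASSUMPTION: IoU stands in for the unspecified silhouette comparison.
    """
    union = calculated | extracted
    return len(calculated & extracted) / len(union) if union else 1.0

def select_pose(calculated_by_pose, extracted):
    """Return the pose whose projected silhouettes best match all views.

    calculated_by_pose: dict mapping a pose label to a list of calculated
    silhouettes, one per camera view, in the same order as `extracted`.
    """
    scores = {
        pose: sum(overlap(c, e) for c, e in zip(per_view, extracted)) / len(extracted)
        for pose, per_view in calculated_by_pose.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical silhouettes for two camera views and two pose hypotheses.
extracted = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(0, 0), (1, 0)}]
calculated_by_pose = {
    "pose_a": [{(0, 0), (0, 1)}, {(0, 0)}],
    "pose_b": [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(0, 0), (1, 0), (2, 0)}],
}
best, scores = select_pose(calculated_by_pose, extracted)
```

In an iterative setting, the same score could serve as the stopping criterion, with refinement repeated until the score no longer improves.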
Regarding claim 30, Kanaujia teaches all that is required as applied to claim 28 above and further teaches a motion analysis unit, which analyzes poses and/or movements of the object from a finally available orientation of the model of the object (see Kanaujia, paragraphs 0085-0086 teaching the 3D shape estimation module 107 acting as such a motion analysis unit where it analyzes pose of the target human object from a finally available orientation of a model of the object such as the finally available “detailed 3D human shape model” when it “receives the refined pose estimations from module 105 and receives the different body type detailed 3D human shape models from module 106 and provides an estimated pose and a detailed 3D model of the human object detected in the video streams” and “[u]sing the detailed 3D human shape models obtained by mapping the different body type detailed 3D human shape models to each of several pose predictions (via the associated coarse 3D human shape model), for each video stream of the plural video streams, a calculated silhouette of the detailed 3D human shape model may be compared to a silhouette extracted from the video image frame of that video stream” such that here the pose of the object is analyzed with respect to its pose compared to the silhouette and additionally this may also be considered to analyze movements of the object from the final model as the 3D model is of a moving object and thus gives information about the movements such as their movement through the surveilled environment ).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kanaujia in view of Lee et al. (“Lee”).
Regarding claim 19, Kanaujia teaches all that is required as applied to claim 16 above and further teaches wherein: said group of cameras has said thermal imaging camera and at least two video image cameras (see Kanaujia, paragraphs 0077-0079 and figure 1B for example where as above the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” and as in figure 1B the frames of the sources can be seen to overlap from the different fields of view of the cameras imaging the subject where such cameras are disclosed as being a thermal imaging camera and at least two video image cameras as in “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” as in paragraph 0069 meaning that a first camera of those as seen in figure 1B as explained above is taught to be a thermal image video camera and the second camera could be a video image camera at 45 degrees or greater as in the arrangement surrounding the subject in figure 1B and a third camera could also be a video camera as within the description of Kanaujia);
said first camera is said thermal imaging camera and said second camera is said video image camera, said objective lenses of said first and second cameras are disposed at a distance x of at least two meters from one another and/or the optical axes of which are oriented at the angle a of at least 45 degrees with respect to one another (see Kanaujia, paragraphs 0077-0079 and figure 1B for example where as above the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” and as in figure 1B the frames of the sources can be seen to overlap from the different fields of view of the cameras imaging the subject where such cameras are disclosed as being a thermal imaging camera and at least two video image cameras as in  “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” as in paragraph 0069 meaning that a first camera of those as seen in figure 1B as explained above is taught to be a thermal image video camera and the second camera could be a video image camera at 45 degrees or greater as in the arrangement surrounding the subject in figure 1B such as one 
across from another viewpoint); and
said group of cameras includes a third camera being a video image camera, an objective lens of said third camera is disposed immediately adjacent to said objective lens of said thermal imaging camera (see Kanaujia, paragraphs 0077-0079 and figure 1B for example where as above the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” and as in figure 1B the frames of the sources can be seen to overlap from the different fields of view of the cameras imaging the subject where such cameras are disclosed as being a thermal imaging camera and at least two video image cameras as in “video surveillance system 101” which “may be configured to monitor a scene to estimate human shapes of detected human objects in one or more video streams” such that one or more video streams could come from a group of cameras in this context as further explained where the system “provides multiple video streams from multiple video sources” such that it “may comprise three video cameras operating to take a video of an area to be monitored” where these “video sources” are taught as being any of “one or more of the following: a video imager and lens apparatus; a video camera; a digital video camera; a color camera; a monochrome camera; a camera; a camcorder; a PC camera; a webcam; an infrared (IR) video camera; a low-light video camera; a thermal video camera; a closed-circuit television (CCTV) camera; a pan, tilt, zoom (PTZ) camera; and a video sensing device” as in paragraph 0069 meaning that a first camera of those as seen in figure 1B as explained above is taught to be a thermal image video camera and the second camera could be a video image camera at 45 degrees or greater as in the arrangement surrounding the subject in figure 1B and a third camera could also be a video camera as within the description of Kanaujia and could be adjacent as also seen in the arrangement) such 
that the optical axes of said first and third cameras are oriented substantially parallel with respect to one another.
Kanaujia teaches all that is required as applied to claim 19 as explained above, but fails to teach an arrangement of the third camera and the thermal imaging camera in which an objective lens of said third camera is disposed immediately adjacent to said objective lens of said thermal imaging camera.  Kanaujia does not specifically teach such an arrangement, although any viewpoint surrounding the subject could be applied in the method.  Thus Kanaujia stands as a base device upon which the claimed invention can be seen as an improvement, whereby such a fixed camera arrangement could lead to simpler, faster, and more efficient fusion and synchronization of captured images of a subject.
In the same field of endeavor relating to tracking moving subjects using multicamera thermal and visible camera arrangements (see Lee, Abstract, teaching “detecting human subjects in physical spaces, reconstructing their location in 3D, describing their spectral characteristics at each 3D location, and rendering 3D and spectral characteristics in real time” through “integrating thermal infrared (IR) cameras and visible spectral cameras so that the 3D and spectral characteristics can be acquired in real time”), Lee teaches that it is known to arrange a thermal camera and a visible spectrum video camera with their objective lenses disposed immediately adjacent to one another and with the optical axes of said first and third cameras oriented substantially parallel with respect to one another (see Lee, figure 1 and Section 2, showing a thermal video camera and a visible spectrum camera disposed immediately adjacent to each other with their objective lenses being substantially parallel, such that “Each camera cluster used in our experiments for 3D imaging consists of one thermal IR camera and four visible spectrum cameras (see Figure 1)”).  Thus Lee teaches the above known technique applicable to the base system of Kanaujia.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Kanaujia to adopt Lee’s arrangement of at least a first and third camera with respect to one another, as the results would have been predictable and would have resulted in an improved system.  The results would predictably be that images synchronized and fused as in Lee would be captured as in Lee and used in the reconstruction techniques of Kanaujia.  Note that Lee suggests in the Introduction that the images can be used with other techniques, teaching “most widely used motion based foreground object detection, or standard background subtraction method can be replaced with thermal infrared (IR) based foreground detection while gaining the thermal signature in addition to the visible spectral signature used for 3D stereo-based reconstruction.”  Thus the use of such an arrangement from the system of Lee would be compatible with, and yield predictable results for, the reconstruction steps.  The modification would yield an improved system, as Lee suggests that through “integrating thermal-IR cameras” the system is “able to increase (a) robustness to illumination changes and moving clusters of cameras, and (b) real-time performance.”
Claim(s) 31 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kanaujia in view of St-Laurent et al3 (“St-Laurent”).
Regarding claim 31, Kanaujia teaches all that is required as applied to claim 16 above but fails to specifically teach a visualization unit, which uses texture mapping to visualize temperature data of segmented 2D WB pixel regions on the 3D voxel model and/or on an aligned model.  Rather, Kanaujia teaches texture applied to the 3D voxel model, such as the detailed 3D mesh texture seen in figure 1B, part (e), where the parts are seen with different textures.  However, this does not visualize temperature data of the thermal camera segmented from the images on the 3D voxel model or an aligned model, as the claims require some visualization of the temperature data itself.  Thus Kanaujia is found to be a base system upon which the claimed invention can be seen as an improvement, whereby such temperature visualization mapped to a 3D voxel model would give another type of information about the tracked subject, providing additional uses for thermal image data already obtained for other visualization purposes and leading to greater efficiency and more varied visualization options for a user.
In the same field of endeavor relating to tracking moving objects, such as a user, using a multicamera setup including at least a thermal video camera and a more conventional visual video camera (see St-Laurent, Introduction, teaching “widely known in the field of image fusion, the combination of thermal and visible images” where multiple approaches are known, such as “representative image fusion” and “analytical image fusion” for example.  The surveyed approaches include those in which “objects of interest are first extracted from infrared images along the hypothesis that pedestrians are warmer than background” and “regions of interest are then used in both spectrums for contour extraction and fusion”; those in which “moving objects are detected and tracked independently in each spectrum” and an “analysis of object’s temporal persistence is used at every frame to select the more reliable sensor”; and those in which “[d]etection and targets tracking tasks are performed independently for visible and thermal images and a confidence measure is employed to weight sensor data at every time instant”.  Specifically, in St-Laurent the technique involves “the distinctive characteristic of combining information from both sensors at pixel level” such that “every pixel is classified as foreground or background along its similarity to thermal-colour background model”), St-Laurent teaches using texture mapping to visualize temperature data of segmented 2D WB pixel regions on the 3D voxel model and/or on an aligned model (see St-Laurent, section 4 and figure 3, teaching “visualization of image registration” on an aligned model where “Figure 3 illustrates the achieved quality of the registration of thermal and colour images” such that for “visualization purposes, the red channel of the colour image has been replaced by the scaled thermal image”).  Thus St-Laurent provides a known technique applicable to the base system of Kanaujia as explained above.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Kanaujia with the above teachings of St-Laurent, as doing so would be no more than application of a known technique to a base device ready for improvement, which would yield predictable results and result in an improved system.  The results of the combination would predictably be that, instead of simply using the thermal video camera to capture 2D frames for segmentation purposes, the technique and suggestion of St-Laurent would be utilized to use the information both for segmentation purposes and for visualization purposes with regard to a model reconstructed from the initial thermal and other images captured of the object.  This would result in the detailed shape model of Kanaujia being textured according to registration of each thermal image with the model, such that the thermal WB images are visualized as texture mapped to the model.  This would result in an improved system, as the captured temperature would also be conveyed to the user in the final model, giving them additional useful information and also allowing them to see how closely a thermal image has been aligned with a model based on such images.
Allowable Subject Matter
Claims 17 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  the prior art of record fails to teach or suggest the respective claim limitations when considered as a whole.
Regarding claim 17, the prior art teaches all that is required as applied to parent claim 16, but fails to teach a “2D supplementation unit that supplements missing 2D thermal images in a manner such that a synchronous 2D thermal image is always present for each 2D video image” which could be used when the image frequency of recording is different as required by the claim.  Kanaujia is silent as to any teaching of frequencies differing between cameras and is also silent as to any supplementation unit that would function as claimed.  Furthermore, in the instances where the Examiner is able to find relevant teachings of differing recording frequencies in multicamera setups including thermal and visible light video cameras, there is no suggestion to provide a supplementation unit as recited.  For example, in the Introduction of St-Laurent, which surveys various prior art techniques, none of the known techniques involves a supplementation unit as claimed; the techniques instead choose between image modalities as the input for reconstructions.  Furthermore, Lee actually teaches that the frequency of recording differs between the imaging modalities, but there the visible spectrum camera is slower than the thermal camera, and the solution to the different frame rates does not involve or suggest a 2D supplementation unit as recited (see Lee, methodology section, “synchronization” section).  Rather, the only technique applied based on such a difference is a synchronization technique which does not involve supplementing any of the 2D images, much less missing thermal images.  While such interpolation of frames could be technically trivial, it would also require additional processing power and could affect downstream reconstruction, such that it cannot be said to be an obvious technique to apply to the claimed system, especially when related systems refrain from using such a technique.  Thus the claims recite allowable subject matter as explained above.
Regarding claim 18, Kanaujia teaches all that is required as applied to claim 16 above but fails to specifically teach wherein a model frequency of 3D voxel models produced by said reconstruction unit is lower than an image frequency of the 2D thermal images recorded using said thermal imaging camera and/or any said 2D video images recorded using said video image camera; and further comprising a 3D supplementation unit, which supplements missing said 3D voxel models in a manner such that for each said 2D thermal image and/or any said 2D video image a synchronous 3D voxel model is always present.  Rather, Kanaujia is silent with regard to any techniques which deal with a situation in which the 3D reconstruction occurs at a different rate.  Furthermore, the Examiner is unable to find any teaching or suggestion in any other prior art which utilizes such a 3D supplementation unit to provide missing 3D voxel models, or which identifies such missing 3D voxel models in a related process.  Similarly to the above explanation with regard to claim 17, there is not necessarily any reason to provide missing 3D voxel models, as it would first have to be determined that such models are missing, and furthermore such reconstruction would come with associated processing costs, such that it would not necessarily be obvious to provide any such missing voxel models.  The Examiner is unable to find any teaching or suggestion of such limitations in the claimed context and for the same purposes in the prior art; thus the claims contain allowable subject matter.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCOTT E SONNERS whose telephone number is (571)270-7504. The examiner can normally be reached Mon-Friday 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached on (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SCOTT E SONNERS/ Examiner, Art Unit 2613

/XIAO M WU/ Supervisory Patent Examiner, Art Unit 2613
1 US PGPUB No. 20130250050
2 Lee SK, McHenry K, Kooper R, Bajcsy P. Characterizing human subjects in real-time and three-dimensional spaces by integrating thermal-infrared and visible spectrum cameras. In 2009 IEEE International Conference on Multimedia and Expo, 2009 Jun 28 (pp. 1708-1711).
3 St-Laurent L, Maldague X, Prévost D. Combination of colour and thermal sensors for enhanced object detection. In 2007 10th International Conference on Information Fusion, 2007 Jul 9 (pp. 1-8).