DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 10 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
The term “consistent” in claim 10 is a relative and/or subjective term which renders the claim indefinite. The term “consistent” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. 
Page 15 of the specification states that the phrase “consistent association” results in at least a subset of the estimated 3D positions, but does not disclose what the scope of the consistency is or what additional elements result from the “consistent association.” For example, it is impossible to determine, based on the claims and the specification, whether the term “consistent” corresponds to ± 0.1 or 100% error, a different range of numbers/percentages, or some other metric. One of ordinary skill in the art would not be able to determine what is considered to be “consistent” association vs. inconsistent association solely based on the original disclosure. 
A claim that requires the exercise of subjective judgment without restriction renders the claim indefinite. In re Musgrave, 431 F.2d 882, 893, 167 USPQ 280, 289 (CCPA 1970). Claim scope cannot depend solely on the unrestrained, subjective opinion of a particular individual purported to be practicing the invention. Datamize LLC v. Plumtree Software, Inc., 417 F.3d 1342, 1350, 75 USPQ2d 1801, 1807 (Fed. Cir. 2005); see also Interval Licensing LLC v. AOL, Inc., 766 F.3d 1364, 1373, 112 USPQ2d 1188 (Fed. Cir. 2014). 
For the purpose of further examination, the claim has been interpreted as defining an association of one object between the 2D images in the group of 2D images.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1-3, 5-7, and 10-17 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Amin et al. (“Multi-view Pictorial Structures for 3D Human Pose Estimation,” In Proceedings British Machine Vision Conference, 2013, pp: 45.1-45.12), hereinafter referred to as Amin.
Regarding claim 1, Amin teaches a method of determining positioning of objects in a scene, said method comprising:
obtaining object detection data corresponding to two-dimensional, 2D, images of the scene, said object detection data comprising an object identifier of a respective object in a respective 2D image among the 2D images of the scene and a location of a respective reference point of the respective object in the respective 2D image (Amin pg. 2: “Representing 3D pose as a collection of 2D projections allows to directly tap into recent literature on articulated 2D pose estimation … we propose a 2D pose estimation approach that extends our state-of-the-art 2D pictorial structures model [22] with color features and more effective spatial terms”; Amin Fig. 1: a mixture of multi-view 2D pictorial structures are used to estimate the 3D pose);
processing the object detection data to generate candidate association data which associates pairs of objects between the 2D images of the scene (Amin Fig. 1: shows the multi-view correspondence and appearance constraints; Amin pg. 5: “We denote the 2D body configuration as Lm and the image evidence as Im for view m … we introduce pairwise factors between every pair of corresponding parts in each view … When more than two views are available we connect the corresponding 2D body parts in all pairs of views”);
computing a plurality of estimated three-dimensional, 3D, positions in a scene coordinate system of the scene for associated pairs of objects in the candidate association data (Amin pg. 6: “Multi-view correspondence. The factor fncor encodes the constraint that part locations in each view should agree on the same 3D position. Given a pair of corresponding part locations ln1 and ln2 we first reconstruct the corresponding position of the part in 3D using linear triangulation”);
determining one or more clusters of the plurality of estimated 3D positions (Amin pg. 5: “We obtain the mixture components by clustering the training data with k-means and learning a separate model for each cluster”; Amin pg. 6: “we first cluster the 3D training poses with k-means”; Amin Table 1(c); Amin pg. 7: “With our mixture of pictorial structures (with 16 components, 3D clustering, and min-var selection) the error significantly reduces to 83.2mm”);
generating, based on estimated 3D positions in at least one cluster among the one or more clusters, final association data which associates one or more objects between the 2D images of the scene (Amin Table 1(c) & pg. 7 discussed above); and
computing, based on the final association data, one or more final 3D positions in the scene coordinate system of the scene for one or more reference points of said one or more objects (Amin Fig. 1 & pg. 6: “Given a pair of corresponding part locations ln1 and ln2 we first reconstruct the corresponding position of the part in 3D using linear triangulation … Finally, given the 2D projections estimated by the multi-view pictorial structures model we reconstruct the 3D pose using triangulation”).
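For illustration only, the triangulation step cited above (reconstructing a 3D position from a pair of corresponding 2D part locations) can be sketched as follows. This is a minimal sketch and not Amin's implementation: it substitutes midpoint triangulation of two viewing rays for the linear triangulation quoted from Amin, and it assumes each 2D location has already been back-projected to a ray defined by a camera center and a direction. The function name `triangulate_midpoint` and its parameters are hypothetical.

```python
def triangulate_midpoint(o1, d1, o2, d2):
    """Estimate a 3D point as the midpoint of the shortest segment joining
    two viewing rays o1 + s*d1 and o2 + t*d2 (hypothetical helper)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    w = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    # Ray parameters of the mutually closest points on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]


# Assumed setup: two cameras at (0, 0, 0) and (4, 0, 0), both viewing (1, 2, 5).
point = triangulate_midpoint((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2, 5))
```

With noisy detections the two rays generally do not intersect, which is why the sketch returns the midpoint of the closest approach rather than an exact intersection.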

Regarding claim 2, Amin teaches the method of claim 1, wherein said processing the object detection data comprises: determining a candidate correspondence of said one or more objects between pairs of 2D images among the 2D images of the scene, and wherein the candidate association data associates object identifiers between said pairs of 2D images to represent the candidate correspondence (Amin Fig. 1 & pg. 3: “We build on our publicly available implementation of pictorial structures [22], which consists of 10 parts that correspond to left/right body limbs, torso and head”; Amin pg. 6: “We define the joint appearance feature vector by concatenating the features from multiple views … and train a boosted part detector using this representation”). 

Regarding claim 3, Amin teaches the method of claim 2, wherein the candidate association data further associates a set of reference points between said pairs of 2D images (Amin pg. 3: “we describe our approach to 2D pose estimation that relies on the pictorial structures model”; Amin pg. 4: “The pairwise terms in Eq. 1 encode the spatial constraints between model parts and are modeled with a Gaussian distribution in the transformed space of the joint between two parts”; Amin pg. 6: “Given a pair of corresponding part locations ln1 and ln2 we first reconstruct the corresponding position of the part in 3D using linear triangulation”). 

Regarding claim 5, Amin teaches the method of claim 1, wherein said generating the final association data comprises: 
determining a primary object association between a group of 2D images for said at least one cluster, the primary object association identifying a primary object in each 2D image among the group of 2D images (Amin Fig. 1 & pg. 5 discussed above; Amin pg. 7: “For each view this results in 896 training images from 5 subjects and 1154 test images from 7 subjects”); 
computing, based on the primary object association, at least one candidate 3D position in the scene coordinate system of the scene (Amin Fig. 1 & pg. 6 discussed above, triangulation is used to determine 3D position); and 
projecting said at least one candidate 3D position onto the group of 2D images to generate at least one projected 2D position on said each 2D image, wherein the final association data is generated based on said at least one projected 2D position on said each 2D image (Amin Fig. 1; Amin pg. 1: “In this paper we argue that the search complexity can be reduced significantly by formulating the 3D inference problem as a joint inference over 2D projections of pose in each of the camera views”; Amin pg. 2: “Representing 3D pose as a collection of 2D projections allows to directly tap into recent literature on articulated 2D pose estimation”; Amin pg. 5: “In the first step we jointly estimate the 2D projections of the 3D body joints in each view … In the second step, we use the estimated 2D projections and recover the 3D pose by triangulation”). 
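For illustration only, the projecting step recited in claim 5 (mapping a candidate 3D position onto a 2D image) can be sketched with a pinhole camera model. This is a minimal sketch under stated assumptions: each camera is assumed to be described by a 3x4 projection matrix, and the function name `project_point` and the example matrix values are hypothetical and are not drawn from Amin or from the application.

```python
def project_point(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 camera matrix P
    (given as three rows of four numbers) to pixel coordinates (u, v)."""
    Xh = list(X) + [1.0]  # homogeneous coordinates
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    if w == 0:
        raise ValueError("point projects to infinity")
    return (u / w, v / w)


# Assumed camera: focal length 100 px, principal point (320, 240), at the origin.
P = [[100, 0, 320, 0],
     [0, 100, 240, 0],
     [0, 0, 1, 0]]
uv = project_point(P, (1, 2, 5))
```

Applying the same function with each camera's matrix yields one projected 2D position per image in the group.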

Regarding claim 6, Amin teaches the method of claim 5, wherein said computing the at least one candidate 3D position comprises: computing a plurality of candidate 3D positions for reference points of said primary object in said each 2D image, wherein said projecting results in projected 2D positions on said each 2D image, and wherein said generating the final association data further comprises: evaluating the projected 2D positions on said each 2D image in relation to reference points of the respective object in said each 2D image, wherein the final association data is generated based on said evaluating (Amin Fig. 1, pg. 1, & pg. 5 discussed above; also see Amin pg. 6: “We then project the training data of each 3D cluster and learn 2D models from the projected data. We visualize the components learned on the boxing data in Fig. 1. Note that the resulting 2D mixture components are consistent across views by construction as they are learned from the projections of the same 3D poses. We exploit this fact by jointly selecting the best mixture component across all views and adapt the component selection procedure introduced in Sec. 2 accordingly”). 

Regarding claim 7, Amin teaches the method of claim 6, wherein said evaluating the projected 2D positions comprises: computing a comparison score for the projected 2D positions on said each 2D image in relation to the reference points of the respective object in said each 2D image; selecting, based on the comparison score, objects in the group of 2D images and including, in the final association data, an association between thus-selected objects in the group of 2D images (Amin pg. 6 discussed above – the best mixture components are chosen; also see Amin pg. 5: “We select the mixture component using criteria directly related to the quality of the pose estimation … we select the best component with the minimal uncertainty in the marginal posterior distributions of the body parts”). 
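For illustration only, the comparison score and selection recited in claim 7 can be sketched as follows. The scoring rule used here (mean Euclidean distance between projected 2D positions and detected reference points) is an assumption chosen for simplicity; it is neither the minimal-uncertainty component selection described by Amin nor necessarily the scoring used in the application. The function names and the dictionary layout are hypothetical.

```python
import math


def comparison_score(projected, detected):
    """Mean Euclidean distance between projected 2D positions and the
    detected reference points of one object (lower is better)."""
    if len(projected) != len(detected) or not projected:
        raise ValueError("point lists must be non-empty and equal in length")
    total = sum(math.dist(p, q) for p, q in zip(projected, detected))
    return total / len(projected)


def select_object(projected, candidates):
    """Return the identifier of the candidate object whose reference
    points lie closest to the projected 2D positions."""
    return min(candidates,
               key=lambda oid: comparison_score(projected, candidates[oid]))


# Assumed data: two detected objects in one 2D image, two reference points each.
projected = [(10, 10), (20, 20)]
candidates = {"a": [(10, 11), (20, 21)], "b": [(50, 50), (60, 60)]}
best = select_object(projected, candidates)
```

Repeating the selection in each 2D image of the group gives the set of selected objects between which an association could then be recorded.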

Regarding claim 10, Amin teaches the method of claim 5, wherein the primary object association defines a consistent association of one object between the 2D images in the group of 2D images (Amin Fig. 1). 

Regarding claim 11, Amin teaches the method of claim 5, wherein said determining the primary object association comprises: evaluating the estimated 3D positions in said at least one cluster to select a set of estimated 3D positions that originate from a single object in each 2D image among the group of 2D images, wherein the primary object association identifies the single object in each 2D image among the group of 2D images (Amin Fig. 1: the associated 2D positions and the triangulated 3D positions correspond to a single object, e.g., a person).

Regarding claim 12, Amin teaches the method of claim 11, wherein said computing the plurality of estimated 3D positions comprises: assigning a score value to each estimated 3D position in the plurality of estimated 3D positions, wherein the set of estimated 3D positions is selected to optimize an aggregation of score values while ensuring that the set of estimated 3D positions originates from one object in said each 2D image in the group of 2D images (Amin pg. 5-6 discussed above teaches that the best mixture components having minimal uncertainty are selected). 

Regarding claim 13, Amin teaches the method of claim 12, wherein the score value is a probability value assigned by said processing the object detection data (Amin pg. 5: the best mixture component having the minimal uncertainty is selected and the uncertainty is calculated using a covariance matrix). 

Regarding claim 14, Amin teaches the method of claim 1, wherein the respective object among the associated objects in the candidate association data is assigned a plurality of reference points, and wherein said computing the plurality of estimated 3D positions is performed for a subset of the plurality of reference points of the respective object among the associated objects in the candidate association data (Amin Fig. 1: a pair of images is compared for corresponding/matching points, i.e., one image is always a reference for the other image. Also see Amin Eq. 4 & pg. 5: “When more than two views are available we connect the corresponding 2D body parts in all pairs of views. The posterior in Eq. 4 then includes multi-view appearance and correspondence factors for each pair of connected parts in all views as well as within-view spatial and appearance factors”; see Amin pg. 6 discussed above regarding estimating 3D position using triangulation). 

Regarding claim 15, Amin teaches the method of claim 1, wherein said computing the one or more final 3D positions comprises: combining, between the 2D images of the scene and in accordance with the final association data, locations of said one or more reference points of said one or more objects, and operating a position calculation function on the thus-combined locations to generate the one or more final positions (Amin Fig. 1 and pg. 5-6 discussed above). 

Regarding claim 16, Amin teaches the method of claim 1, wherein the respective 2D image comprises a 2D digital image, and wherein said location of the respective reference point of the respective object in the respective 2D image is given in a local coordinate system with a fixed relation to the 2D digital image (Amin Fig. 1: the images used for the multi-view pictorial structure are 2D – the locations of the correspondence points are in a fixed relationship between the image pairs; also see Amin pg. 7: “In order to compensate for the slight differences in positioning of joints in our model and in HumanEva we add a fixed offset to each joint”). 

Regarding claim 17, Amin teaches the method of claim 16, further comprising: obtaining 2D digital images captured by imaging devices facing the scene; and processing the 2D digital images to generate said object detection data (Amin Fig. 1: the images used for the multi-view pictorial structure are 2D). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 4, 8, 19, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Amin et al. (“Multi-view Pictorial Structures for 3D Human Pose Estimation,” In Proceedings British Machine Vision Conference, 2013, pp: 45.1-45.12), in view of Hallett et al. (US 2020/0143561 A1), hereinafter referred to as Amin and Hallett, respectively.
Regarding claim 4, Amin teaches the method of claim 1, wherein said determining the one or more clusters comprises: operating a clustering algorithm on the plurality of estimated 3D positions (Amin pg. 5: “We obtain the mixture components by clustering the training data with k-means and learning a separate model for each cluster”). 
However, Amin does not appear to explicitly teach that the clustering algorithm is density-based.
Pertaining to the same field of endeavor, Hallett teaches that the clustering algorithm is density-based (Hallett ¶0033: “this location may include an overlap region of multiple fields of view”; Hallett ¶0076: “where the pair of entities may have been detected from a set of images from multiple cameras”; Hallett ¶0078: “embodiments may use a density-based clustering algorithm to a set of attributes amongst the set of entities associated with the set of entity pairs in order to determine which of the set entities are most similar to each other”).
Amin and Hallett are considered to be analogous art because they are directed to multi-view tracking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the multi-view pose estimation (as taught by Amin) to use a density-based clustering algorithm (as taught by Hallett) because the combination allows determining similar objects (Hallett ¶0078).
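For illustration only, a density-based clustering of estimated 3D positions, as contemplated by the combination above, can be sketched as follows. DBSCAN is used here as one well-known density-based algorithm; this choice is an assumption, since Hallett ¶0078 refers only to “a density-based clustering algorithm” and applies it to entity attributes. The function name `dbscan` and the parameters `eps` and `min_pts` are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels


# Assumed data: two dense groups of 3D positions and one isolated outlier.
positions = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0),
             (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5),
             (20, 20, 20)]
labels = dbscan(positions, eps=0.5, min_pts=3)
```

Unlike k-means, this approach does not require the number of clusters to be fixed in advance and leaves isolated positions unassigned.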

Regarding claim 8, Amin teaches the method of claim 7, wherein said generating the final association data further comprises: identifying, among the plurality of estimated 3D positions, a first set of estimated 3D positions that correspond to said association between the thus-selected objects; generating an updated plurality of estimated 3D positions by removing the first set of estimated 3D positions from the plurality of estimated 3D positions (Amin pg. 6 footnote: “In all experiments we sample 1,000 locations for each part and remove the duplicates”). 
However, Amin does not appear to explicitly teach repeating said determining the one or more clusters and said generating the final association data for the updated plurality of estimated 3D positions.
Pertaining to the same field of endeavor, Hallett teaches repeating said determining the one or more clusters and said generating the final association data for the updated plurality of estimated 3D positions (Hallett ¶0083: “upon a determination that the matching entity confidence factor satisfies the matching confidence threshold, the entity-tracking system may associate the first entity with the second entity. The entity-tracking system may associate the first entity with the second entity directly, such as by updating a record representing the second entity to include a reference to the first entity. Alternatively, or in addition, the entity-tracking system may associate the first entity with the second entity by updating a record indicating that the first entity is associated with the reference model and updating a second record indicating that the second entity is associated with the reference model.”; Hallett ¶0095: “this operation may be repeated for each of a set of entities detected”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the multi-view pose estimation (as taught by Amin) to repeat the process (as taught by Hallett) because the combination results in a higher confidence factor (Hallett ¶0095).

Regarding claim 19, Amin teaches a pictorial structure model comprising the method and processes described in claim 1 (see the rejection of claim 1 above). 
However, Amin does not appear to explicitly teach a non-transitory computer-readable medium comprising computer instructions executed by a processing system.
Pertaining to the same field of endeavor, Hallett teaches a non-transitory computer-readable medium comprising computer instructions executed by a processing system (Hallett ¶0008: “a system, including: one or more processors; and memory storing instructions that when executed by the processors cause the processors to effectuate operations of the above-mentioned process”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the multi-view pose estimation (as taught by Amin) to use a non-transitory computer-readable medium (as taught by Hallett) because the combination is more convenient and allows the algorithm to be stored on a memory and automatically executed by a processor of the computer rather than being manually written each time.

Regarding claim 20, Amin teaches a pictorial structure model comprising the method and processes described in claim 1 (see the rejection of claim 1 above). However, Amin does not appear to explicitly teach a monitoring device configured to perform the processes described in claim 1.
Pertaining to the same field of endeavor, Hallett teaches a device performing the method (Hallett ¶0008 discussed above).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the multi-view pose estimation (as taught by Amin) to use a device (as taught by Hallett) because the combination is more convenient and allows the algorithm to be stored on a memory and automatically executed by a processor of the computer rather than being manually written each time.
Therefore, claim 20 is rejected using the same rationale as applied to claims 1 and 19 discussed above.

Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Amin et al. (“Multi-view Pictorial Structures for 3D Human Pose Estimation,” In Proceedings British Machine Vision Conference, 2013, pp: 45.1-45.12), in view of Ryu et al. (US 2018/0308253 A1), hereinafter referred to as Amin and Ryu, respectively.
Regarding claim 18, Amin teaches the method of claim 1, further comprising: matching the one or more final 3D positions to one or more final 3D positions computed (Amin Fig. 1). 
However, Amin does not appear to explicitly teach that the matching is performed at one or more preceding time points and/or at one or more succeeding time points.
Pertaining to the same field of endeavor, Ryu teaches that the matching is performed at one or more preceding time points and/or at one or more succeeding time points (Ryu ¶0007: “predicting target object templates formed by mapping effective pixel positions of the target object in frames generated by the assistant vision sensor in time points adjacent to the timestamp of the frame corresponding to the first target object template to the imaging plane of the dynamic vision sensor according to the spatial relative relation between the dynamic vision sensor and the assistant vision sensor”; Ryu ¶0024: “the time points adjacent to the timestamp of the frame corresponding to the first target object template comprises: time points of predetermined time intervals between the timestamp of the frame corresponding to the first target object template and a timestamp of a previous frame, and/or time points of predetermined time intervals between the timestamp of the frame corresponding to the first target object template and a timestamp of a next frame”).
Amin and Ryu are considered to be analogous art because they are directed to multi-view tracking. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the multi-view pose estimation (as taught by Amin) to match the adjacent frames (as taught by Ryu) because the combination allows tracking a moving target (Ryu Abstract).
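For illustration only, matching final 3D positions computed at one time point to those computed at a preceding time point can be sketched as follows. This is a generic greedy nearest-neighbor sketch under stated assumptions; it is not the template-based prediction described by Ryu, and the function name `match_positions` and the threshold `max_dist` are hypothetical.

```python
import math


def match_positions(previous, current, max_dist):
    """Greedily pair each current 3D position with the nearest unused
    previous position within max_dist.
    Returns a dict mapping current index -> previous index."""
    pairs = sorted(
        (math.dist(c, p), ci, pi)
        for ci, c in enumerate(current)
        for pi, p in enumerate(previous)
    )
    matches, used_prev = {}, set()
    for dist, ci, pi in pairs:
        if dist > max_dist:
            break  # all remaining pairs are farther apart
        if ci in matches or pi in used_prev:
            continue
        matches[ci] = pi
        used_prev.add(pi)
    return matches


# Assumed data: two positions at the preceding time point, three at the current one.
previous = [(0, 0, 0), (10, 0, 0)]
current = [(10.2, 0, 0), (0.1, 0, 0), (50, 50, 50)]
matches = match_positions(previous, current, max_dist=1.0)
```

A current position left without a match (the third one in the assumed data) could be treated as a newly appearing object.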

Allowable Subject Matter
Claim 9 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter: 
Regarding claim 9, the prior art of record teaches the method of claim 8 (refer to the rejection above).
However, the prior art, alone or in combination, does not appear to teach or suggest identifying a second set of estimated 3D positions which are located within a predefined distance from the one or more final 3D positions computed based on the final association data, wherein said generating the updated plurality of estimated 3D positions further comprises: removing the second set of estimated 3D positions from the plurality of estimated 3D positions. 
U.S. 9,868,212 (hereinafter Hinterstoisser) further teaches that subsampling based solely on a distance constraint may remove at least some important information (e.g., it would remove a point if it is too close to another point in the subsampled set even if its normal is fundamentally different). Therefore, there is no obvious motivation to combine the closest prior art (Amin) with Hinterstoisser.
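For illustration only, the limitation of claim 9 identified above (removing estimated 3D positions located within a predefined distance from the final 3D positions) can be sketched as follows. This is a minimal sketch under stated assumptions and does not represent the applicant's implementation; the function name `remove_near_final` and the parameter `max_dist` are hypothetical.

```python
import math


def remove_near_final(estimated, finals, max_dist):
    """Drop every estimated 3D position lying within max_dist of any
    final 3D position and return the updated list."""
    return [
        e for e in estimated
        if all(math.dist(e, f) > max_dist for f in finals)
    ]


# Assumed data: one final position at the origin and a 0.1 distance threshold.
estimated = [(0, 0, 0), (0.05, 0, 0), (3, 3, 3)]
updated = remove_near_final(estimated, finals=[(0, 0, 0)], max_dist=0.1)
```

The updated list could then be clustered again, so that positions already explained by a final 3D position do not influence the next iteration.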

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOO J SHIN whose telephone number is (571)272-9753. The examiner can normally be reached M-F; 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Soo Shin/Primary Examiner, Art Unit 2667