DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 1-6 are objected to because of the following informalities: claim 1, line 10, recites “the different time instances:” but should recite “the different time instances;”. Claims 2-6 are objected to based on their dependency on the objected-to base claim and inherit the same deficiency. Appropriate correction is required.
Claims 7-12 are objected to because of the following informalities: claim 7, line 10, recites “the different time instances:” but should recite “the different time instances;”. Claims 8-12 are objected to based on their dependency on the objected-to base claim and inherit the same deficiency. Appropriate correction is required.
Claims 13-18 are objected to because of the following informalities: claim 13, line 5, recites “an detection unit” but should recite “a detection unit”. Further, line 10 recites “the different time instances:” but should recite “the different time instances;”. Claims 14-18 are objected to based on their dependency on the objected-to base claim and inherit the same deficiency. Appropriate correction is required.




Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO internet Web site contains terminal disclaimer forms which may be used.  Please visit http://www.uspto.gov/forms/.  The filing date of the application will determine what form should be used.  A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.  For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp. 
Claims 1, 7, and 13 of the instant application are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-2, 8-9, and 15-16 of US Patent 11,308,312 B2, as described below. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1, 7, and 13 of the instant application merely broaden the scope of claims 1-2, 8-9, and 15-16 of US Patent 11,308,312 B2. It has been held that the omission of an element and its function is an obvious expedient if the remaining elements perform the same function as before (see MPEP §804.II.B.1 and §2144.04 II.A).
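Claim comparison: claims 1, 7, and 13 of Instant Application 17/702,570 are reproduced first, followed by claims 1+2, 8+9, and 15+16 of US Patent 11,308,312 B2.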
Claim 1. A method implemented on at least one machine including at least one processor, memory, and communication platform capable of connecting to a network for estimating occupancy of a three-dimensional (3D) scene, the method comprising: receiving continuous image data, acquired by at least one sensor from the 3D scene having at least one of a user and one or more objects therein, wherein the user is engaged in a human machine dialogue; detecting, based on the continuous image data, the user and the one or more objects at different time instances and corresponding characteristics associated therewith; estimating 3D occupancy dynamics of the 3D scene across the different time instances by, with respect to each of the different time instances: determining, a spatial relationship between the user and each of the one or more objects, a 3D volumetric space occupied by the user, and a 3D space occupancy of the 3D scene based on the 3D volumetric space and the spatial relationship with each of the one or more objects, and constructing a 3D space occupancy record of the 3D scene associated with the time instance based on the 3D space occupancy, wherein the 3D space occupancy records related to different time instances describe the 3D occupancy dynamics.





Claim 7. Machine readable and non-transitory medium having information recorded thereon for estimating occupancy of a three-dimensional (3D) scene, wherein the information, when read by the machine, causes the machine to perform the following: receiving continuous image data, acquired by at least one sensor from the 3D scene having at least one of a user and one or more objects therein, wherein the user is engaged in a human machine dialogue; detecting, based on the continuous image data, the user and the one or more objects at different time instances and corresponding characteristics associated therewith; estimating 3D occupancy dynamics of the 3D scene across the different time instances by, with respect to each of the different time instances: determining, a spatial relationship between the user and each of the one or more objects, a 3D volumetric space occupied by the user, and a 3D space occupancy of the 3D scene based on the 3D volumetric space and the spatial relationship with each of the one or more objects, and constructing a 3D space occupancy record of the 3D scene associated with the time instance based on the 3D space occupancy, wherein the 3D space occupancy records related to different time instances describe the 3D occupancy dynamics.





Claim 13. A system for estimating occupancy of a three-dimensional (3D) scene, comprising: at least one sensor configured for acquiring continuous image data from the 3D scene having at least one of a user and one or more objects therein, wherein the user is engaged in a human machine dialogue; an detection unit implemented using a processor and configured for detecting, based on the continuous image data, the user and the one or more objects at different time instances and corresponding characteristics associated therewith; a 3D space occupancy estimator implemented using a processor and configured for estimating 3D occupancy dynamics of the 3D scene across the different time instances by, with respect to each of the different time instances: obtaining, a spatial relationship between the user and each of the one or more objects, a 3D volumetric space occupied by the user, and a 3D space occupancy of the 3D scene based on the 3D volumetric space and the spatial relationship with each of the one or more objects, and constructing a 3D space occupancy record of the 3D scene associated with the time instance based on the 3D space occupancy, wherein the 3D space occupancy records related to different time instances describe the 3D occupancy dynamics.




Claims 1+2. A method implemented on at least one machine including at least one processor, memory, and communication platform capable of connecting to a network for understanding a three dimensional (3D) scene, the method comprising: receiving image data acquired, by a camera at different time instances, with respect to the 3D scene which includes at least one of a user and one or more objects present therein; detecting a face of the user at each of the different time instances; and with respect to each of at least some of the different time instances, generating a 2D user profile of the user based on the face detected at the time instance, wherein the 2D user profile represents a region in the image data occupied by the user at the time instance, obtaining a 3D position of the face of the user in the 3D scene corresponding to the region in the image data, determining, based on a face based human model, a 3D prism in the 3D scene at the 3D position, estimating a 3D volumetric space occupied by the user in the 3D scene based on the 3D prism, determining spatial relationships between the one or more objects and the 3D volumetric space, estimating, based on the 3D volumetric space and the spatial relationships, a 3D space occupancy of the 3D scene, and dynamically updating a 3D space occupancy record of the 3D scene to reflect the estimated 3D space occupancy of the 3D scene.
2. The method of claim 1, wherein the user is engaged in a human machine dialogue.

Claims 8+9. Machine readable and non-transitory medium having information recorded thereon for understanding a three dimensional (3D) scene, wherein the information, when read by the machine, causes the machine to perform: receiving image data acquired, by a camera at different time instances, with respect to the 3D scene which includes at least one of a user and one or more objects present therein; detecting a face of the user at each of the different time instances; and with respect to each of at least some of the different time instances, generating a 2D user profile of the user based on the face detected at the time instance, wherein the 2D user profile represents a region in the image data occupied by the user at the time instance, obtaining a 3D position of the face of the user in the 3D scene corresponding to the region in the image data, determining, based on a face based human model, a 3D prism in the 3D scene at the 3D position, estimating a 3D volumetric space occupied by the user in the 3D scene based on the 3D prism, determining spatial relationships between the one or more objects and the 3D volumetric space, estimating, based on the 3D volumetric space and the spatial relationships, a 3D space occupancy of the 3D scene, and dynamically updating a 3D space occupancy record of the 3D scene to reflect the estimated 3D space occupancy of the 3D scene.
9. The medium of claim 8, wherein the user is engaged in a human machine dialogue.

Claims 15+16. A system for understanding a three dimensional (3D) scene, comprising: a face detection unit configured for receiving image data acquired, by a camera at different time instances, with respect to the 3D scene which includes at least one of a user and one or more objects present therein, and detecting a face of the user at each of the different time instances; and a faced based human tracking unit configured for, with respect to each of at least some of the different time instances, generating a 2D user profile of the user based on the face detected at the time instance, wherein the 2D user profile represents a region in the image data occupied by the user at the time instance; a human 3D occupancy estimator configured for, with respect to the at least some of the time instances, obtaining a 3D position of the face of the user in the 3D scene corresponding to the region in the image data, determining, based on a face based human model, a 3D prism in the 3D scene at the 3D position, estimating a 3D volumetric space occupied by the user in the 3D scene based on the 3D prism, and a spatial relationship identifier configured for determining spatial relationships between the one or more objects and the 3D volumetric space; and a dynamic occupancy updater configured for dynamically updating a 3D space occupancy record of the 3D scene to reflect a 3D space occupancy of the 3D scene that is estimated based on the 3D volumetric space and the spatial relationships.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-8, 10-14, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Schoenberg (US PGPUB 2017/0270711 A1) in view of Breazeal (US PGPUB 2015/0314454 A1).

As per claim 1, Schoenberg discloses a method implemented on at least one machine (Schoenberg, Fig. 6:600 and Fig. 7:700) including at least one processor (Schoenberg, Fig. 6:600 and Fig. 7:700), memory (Schoenberg, Fig. 7:704), and communication platform capable of connecting to a network for estimating occupancy of a three-dimensional (3D) scene (Schoenberg, Fig. 7:710, and paragraphs 11, 17 and 69), the method comprising: 
receiving continuous image data, acquired by at least one sensor from the 3D scene having at least one of a user and one or more objects therein (Schoenberg, Fig. 3:302:304, and paragraphs 13, 15, 27 and 28, discloses that the imaging device(s) may image the physical space continuously; see also Fig. 5), 
detecting, based on the continuous image data, the user and the one or more objects at different time instances and corresponding characteristics associated therewith (Schoenberg, paragraphs 11, 13 and 16, discloses that regions of the physical space may be identified based on measurements of sensors of the HMD 118 (e.g., GPS, accelerometers/gyroscopes, etc.) and/or processing, such as object recognition); 
estimating 3D occupancy dynamics of the 3D scene across the different time instances by, with respect to each of the different time instances (Schoenberg, Fig. 3:304, and paragraphs 28 and 29): 
determining, a spatial relationship between the user and each of the one or more objects, a 3D volumetric space occupied by the user (Schoenberg, paragraphs 28, 29 and 72), and a 3D space occupancy of the 3D scene based on the 3D volumetric space and the spatial relationship with each of the one or more objects (Schoenberg, paragraphs 11, 26 and 28, discloses, as a real-world example, that a path may traverse a room, have a stopping point at a location of a piece of furniture (e.g., a couch), then continue from the stopping point to an exit or other destination point (e.g., a doorway exiting a boundary of the physical space, where the physical space may be bound by the field(s) of view of one or more cameras monitoring the physical space)), and 
constructing a 3D space occupancy record of the 3D scene associated with the time instance based on the 3D space occupancy (Schoenberg, paragraphs 28, 30, 48 and 49, discloses that the controller 604 may be configured to generate or update a three-dimensional model of the physical using information from outward facing image sensors 610A, 610B), wherein the 3D space occupancy records related to different time instances describe the 3D occupancy dynamics (Schoenberg, paragraphs 21, 28 and 30, discloses that the method returns to 306 to examine a next frame and determine/record occupancy changes in that frame. For example, the analysis of depth images to determine movement and/or interactions of physical objects within the physical space may be performed by examining a plurality of depth images (e.g., sequentially and/or simultaneously)).
Schoenberg does not explicitly disclose wherein the user is engaged in a human machine dialogue.
Breazeal discloses wherein the user is engaged in a human machine dialogue (Breazeal, paragraph 59, discloses that interactions may be followed in a “directed dialog” manner. For instance, after the intent of taking a picture has been identified, the PCD 100 may ask directed questions, either for confirming what was just heard or asking for additional information (e.g., “Do you want me to take a picture of you?”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Schoenberg by implementing a multimodal dialog system in the system of Schoenberg, as taught by Breazeal.
The motivation would be to improve and optimize the customization of content/delivery to individual users over time based on user preferences and user reaction (Breazeal, paragraph 151).
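
For illustration only, the per-time-instance steps recited in claim 1 (determining a volumetric space for the user, determining a spatial relationship with each object, and constructing one occupancy record per time instance) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the instant application, Schoenberg, or Breazeal: the names Box3D, OccupancyRecord, build_record and occupancy_dynamics are hypothetical, the user and objects are assumed to be already detected as axis-aligned 3D boxes, and the spatial relationship is assumed to be a simple centre-to-centre distance.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Box3D:
    """Hypothetical axis-aligned 3D box given by its min and max corners."""
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]

    def center(self) -> tuple[float, ...]:
        return tuple((a + b) / 2 for a, b in zip(self.lo, self.hi))

    def volume(self) -> float:
        v = 1.0
        for a, b in zip(self.lo, self.hi):
            v *= max(0.0, b - a)
        return v


@dataclass(frozen=True)
class OccupancyRecord:
    """Hypothetical record of the scene occupancy at one time instance."""
    time: float
    user_volume: float
    distances: dict[str, float]  # object name -> distance to the user's centre


def build_record(time: float, user: Box3D, objects: dict[str, Box3D]) -> OccupancyRecord:
    # Assumed spatial relationship: distance between box centres.
    distances = {name: dist(user.center(), box.center()) for name, box in objects.items()}
    return OccupancyRecord(time, user.volume(), distances)


def occupancy_dynamics(frames) -> list[OccupancyRecord]:
    # One record per time instance; the ordered list stands in for the "dynamics".
    return [build_record(t, user, objects) for t, user, objects in frames]
```

Here each element of frames is assumed to be a (time, user box, objects) tuple produced by an upstream detector that is outside the scope of the sketch.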


As per claim 2, Schoenberg in view of Breazeal further discloses the method of claim 1, wherein the step of detecting the user comprises: 
detecting, from the continuous image data, a face of the user (Breazeal, paragraphs 65 and 164, discloses that the PCD 100 may utilize sound localization and face tracking); 
tracking the face of the user appearing in different image frames of the continuous image data (Breazeal, paragraphs 64 and 165); and 
generating, with respect to each of at least some of the different image frames at different time instances (Schoenberg, Fig. 3:302:304, and paragraphs 13, 15, 27 and 28), a 2D profile of the user based on the face detected, wherein the 2D profile corresponds to a region in the image frame where the user is detected (Breazeal, paragraph 67, discloses that grid based particle filters may help to fuse the inputs of 3D (stereo) and 2D (vision) sensing in a single coordinate system and enforce the constraint that the space may be occupied by only one object at any given time).

As per claim 4, Schoenberg in view of Breazeal further discloses the method of claim 1, wherein the 3D occupancy records corresponding to the different time instances include information about a trajectory of the user in the 3D scene during the human machine dialogue (Breazeal, paragraphs 59 and 63, discloses that the PCD 100 may be configured to locate the person in 3D and accordingly, determine the trajectory of the person using sensors such as vision, depth, motion, sound, color, features & active movement).


As per claim 5, Schoenberg in view of Breazeal further discloses the method of claim 1, wherein the 3D occupancy dynamics are rendered on a device during the human machine dialogue (Breazeal, paragraphs 59, 63 and 67).

As per claim 6, Schoenberg in view of Breazeal further discloses the method of claim 4, wherein the human machine dialogue is conducted based on the 3D occupancy dynamics (Breazeal, paragraphs 59, 67, 131 and 399).

As per claim 7, Schoenberg discloses a machine readable and non-transitory medium having information recorded thereon for estimating occupancy of a three-dimensional (3D) scene (Schoenberg, paragraphs 11, 17, 60, and 69), wherein the information, when read by the machine (Schoenberg, Fig. 7:700:702, and paragraph 60, discloses that the logic machine may include one or more processors configured to execute software instructions), causes the machine to perform the following: 
For the rest of the claim limitations, please see the analysis of claim 1.

As per claim 8, please see the analysis of claim 2.

As per claim 10, please see the analysis of claim 4.

As per claim 11, please see the analysis of claim 5.

As per claim 12, please see the analysis of claim 6.

As per claim 13, Schoenberg discloses a system for estimating occupancy of a three-dimensional (3D) scene (Schoenberg, Fig. 6:600 and Fig. 7:700), comprising: 
at least one sensor configured for acquiring continuous image data from the 3D scene having at least one of a user and one or more objects therein (Schoenberg, Fig. 3:302:304, and paragraphs 13, 15, 27 and 28, discloses that the imaging device(s) may image the physical space continuously; see also Fig. 6, image sensor 610A, and Fig. 5), 
an detection unit implemented using a processor and configured for detecting, based on the continuous image data, the user and the one or more objects at different time instances and corresponding characteristics associated therewith (Schoenberg, paragraphs 11, 13 and 16, discloses that regions of the physical space may be identified based on measurements of sensors of the HMD 118 (e.g., GPS, accelerometers/gyroscopes, etc.) and/or processing, such as object recognition); 
a 3D space occupancy estimator implemented using a processor and configured for estimating 3D occupancy dynamics of the 3D scene across the different time instances by, with respect to each of the different time instances (Schoenberg, Fig. 3:304, and paragraphs 28 and 29): 
obtaining, a spatial relationship between the user and each of the one or more objects, a 3D volumetric space occupied by the user (Schoenberg, paragraphs 28, 29 and 72), and a 3D space occupancy of the 3D scene based on the 3D volumetric space and the spatial relationship with each of the one or more objects (Schoenberg, paragraphs 11, 26 and 28, discloses, as a real-world example, that a path may traverse a room, have a stopping point at a location of a piece of furniture (e.g., a couch), then continue from the stopping point to an exit or other destination point (e.g., a doorway exiting a boundary of the physical space, where the physical space may be bound by the field(s) of view of one or more cameras monitoring the physical space)), and 
constructing a 3D space occupancy record of the 3D scene associated with the time instance based on the 3D space occupancy (Schoenberg, paragraphs 28, 30, 48 and 49, discloses that the controller 604 may be configured to generate or update a three-dimensional model of the physical using information from outward facing image sensors 610A, 610B), wherein the 3D space occupancy records related to different time instances describe the 3D occupancy dynamics (Schoenberg, paragraphs 21, 28 and 30, discloses that the method returns to 306 to examine a next frame and determine/record occupancy changes in that frame. For example, the analysis of depth images to determine movement and/or interactions of physical objects within the physical space may be performed by examining a plurality of depth images (e.g., sequentially and/or simultaneously)).
Schoenberg does not explicitly disclose wherein the user is engaged in a human machine dialogue.
Breazeal discloses wherein the user is engaged in a human machine dialogue (Breazeal, paragraph 59, discloses that interactions may be followed in a “directed dialog” manner. For instance, after the intent of taking a picture has been identified, the PCD 100 may ask directed questions, either for confirming what was just heard or asking for additional information (e.g., “Do you want me to take a picture of you?”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Schoenberg by implementing a multimodal dialog system in the system of Schoenberg, as taught by Breazeal.
The motivation would be to improve and optimize the customization of content/delivery to individual users over time based on user preferences and user reaction (Breazeal, paragraph 151).

As per claim 14, Schoenberg in view of Breazeal further discloses the system of claim 13, wherein the 3D space occupancy estimator comprises: 
a face detection unit implemented using a processor and configured for detecting, from the continuous image data, a face of the user (Breazeal, paragraphs 65 and 164, discloses that the PCD 100 may utilize sound localization and face tracking); 
a face based human tracking unit implemented using a processor and configured for tracking the face of the user appearing in different image frames of the continuous image data (Breazeal, paragraph 63, discloses that the PCD 100 may be configured to locate the person in 3D and accordingly, determine the trajectory of the person using sensors such as vision, depth, motion, sound, color, features & active movement); and 
a human 3D occupancy estimator implemented using a processor and configured for generating, with respect to each of at least some of the different image frames at different time instances (Schoenberg, Fig. 3:302:304, and paragraphs 13, 15, 27 and 28), a 2D profile of the user based on the face detected, wherein the 2D profile corresponds to a region in the image frame where the user is detected (Breazeal, paragraph 67, discloses that grid based particle filters may help to fuse the inputs of 3D (stereo) and 2D (vision) sensing in a single coordinate system and enforce the constraint that the space may be occupied by only one object at any given time).

As per claim 16, please see the analysis of claim 4.

As per claim 17, please see the analysis of claim 5.

As per claim 18, please see the analysis of claim 6.


Claims 3, 9, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Schoenberg (US PGPUB 2017/0270711 A1) in view of Breazeal (US PGPUB 2015/0314454 A1), and further in view of Friedland (US PGPUB 2017/0083753).

As per claim 3, Schoenberg in view of Breazeal further discloses the method of claim 2, wherein the step of determining the 3D volumetric space at the time instance comprises: determining a 3D position of the face of the user in the 3D scene corresponding to the region in the image frame (Breazeal, paragraph 63, discloses, under Person Tracking, that the PCD 100 may be configured to locate the person in 3D and accordingly, determine the trajectory of the person using sensors such as vision, depth, motion, sound, color, features & active movement); 
Schoenberg in view of Breazeal does not explicitly disclose obtaining, based on a face based human model, a 3D prism in the 3D scene corresponding to the 3D position; and estimating the 3D volumetric space occupied by the user in the 3D scene based on the 3D prism.
Friedland discloses obtaining, based on a face based human model, a 3D prism in the 3D scene corresponding to the 3D position (Friedland, Figs. 37A-D, and paragraphs 109 and 132); and 
estimating the 3D volumetric space occupied by the user in the 3D scene based on the 3D prism (Friedland, paragraph 132, discloses that a face track can be thought of as a cylindrical or tube-like volume in spacetime, with the time boundaries of the space-time volume defined by the times of acquisition of the first and final frames of the face track and the three-dimensional volume corresponding to the area occupied by the face sub-image within video frames integrated over time).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Schoenberg in view of Breazeal by implementing a face tracking technique in the system, as taught by Friedland.
The motivation would be to facilitate real-time processing of information acquired from two or more imaging devices (Friedland, paragraph 157).
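
For illustration only, the limitation of obtaining a 3D prism from a face based human model and estimating the occupied volume from that prism may be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of the instant application or of Friedland: the function names face_to_prism and prism_volume are hypothetical, the prism is assumed to be an axis-aligned box with the y axis pointing up, and the body-to-face ratios are placeholder values chosen only for the example.

```python
def face_to_prism(face_center, face_height,
                  height_ratio=7.5, width_ratio=3.0, depth_ratio=1.5):
    """Scale a detected face into an axis-aligned box around the whole body.

    face_center is the (x, y, z) position of the face and face_height its
    height in the same units. The three ratios are assumed placeholders,
    not values taken from any cited reference.
    """
    x, y, z = face_center
    top = y + face_height / 2              # top of the head
    height = face_height * height_ratio    # assumed full body height
    half_w = face_height * width_ratio / 2
    half_d = face_height * depth_ratio / 2
    lo = (x - half_w, top - height, z - half_d)
    hi = (x + half_w, top, z + half_d)
    return lo, hi


def prism_volume(lo, hi):
    """Volume of the box spanned by corners lo and hi."""
    volume = 1.0
    for a, b in zip(lo, hi):
        volume *= max(0.0, b - a)
    return volume
```

Under these assumptions, a face of height 0.2 yields a box 0.6 wide, 1.5 tall and 0.3 deep, whose volume serves as the estimate of the space occupied by the user.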

As per claim 9, please see the analysis of claim 3.

As per claim 15, please see the analysis of claim 3.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYED Z HAIDER whose telephone number is (571)270-5169. The examiner can normally be reached MONDAY-FRIDAY 9-5:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SAM K Ahn, can be reached at 571-272-3044. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SYED HAIDER/Primary Examiner, Art Unit 2633