DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 10-15, 17-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US Patent Publication: 20200202119, “Wang”) in view of Chipalkatty et al. (US Patent Publication: 20180311818, “Chipalkatty”).

Regarding claim 18, Wang teaches a system (Fig. 2) comprising:
one or more processors (processor of the AR apparatus); and
one or more computer-readable non-transitory storage media coupled to one or more of the processors and comprising instructions operable when executed by one or more of the processors (“[181] The instructions or software to control a processor or computer to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, are recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media.”) to cause the system to:
determine a user intent to perform a task in a physical environment surrounding the user; (Wang, par. 0004; “[0076]… A server may recognize the apparatus user and establish a communication connection with the AR apparatus. When a video is received from the camera, the server may predict an action of the apparatus user captured in the video. [0077] Referring to FIG. 4A, in the scene 410, an action of an apparatus user is predicted to correspond to “eating”. In this example, AR functions mapped to eating may include, as examples, a function of detecting and highlighting food, and a function of displaying calories of food.” “[0175] Referring to FIG. 19, the action prediction apparatus 1900 includes a processor 1910 and a memory 1920. Any one or any combination of the server 700 of FIG. 7, the AR apparatus 800 of FIG. 8 and the AR apparatus 1200 may perform an action prediction by implementing the action prediction apparatus 1900. For example, the action prediction apparatus 1900 may be implemented as at least a portion of at least one of the server 700 of FIG. 7, the AR apparatus 800 of FIG. 8 and the AR apparatus 1200 of FIG. 12.”)
send a query based on the user intent to a mapping server that stores AR activities containing spatial and semantic information of physical items in the physical environment surrounding the user, wherein the mapping server is configured to identify a subset of the physical items that are relevant to the user intent; (The AR apparatus 230 sends a query based on the user intent to a mapping server that stores different AR activities to be mapped to the user action. “[0068] For example, when a required AR function is not stored in the AR apparatus 230, the server 210 may control the AR apparatus 230 so that the AR apparatus 230 may download the required AR function from the AR application provider 240. The AR application provider 240 may store various AR applications. For example, the AR application provider 240 may store AR applications for AR functions stored in the mapping DB.”) but Wang doesn’t expressly teach a three-dimensional (3D) occupancy map containing spatial and semantic information of physical items in the physical environment surrounding the user.
However, Chipalkatty teaches, a three-dimensional (3D) occupancy map containing spatial and semantic information of physical items in the physical environment surrounding a user. (“[0043] The planning module 117 bridges the gap between task definition and task execution, computing a world representation from explicit and implicit task definitions. In particular, the planning module 117 utilizes task-planning and motion-planning methodologies to create the robot workflow program from the tasks and constraints provided by the user and/or the perception module 115. Obstacles and free-space may be represented by a discrete 3D occupancy grid map, and the planning block may compile poses of all objects and workpieces within the workspace.”)
Wang and Chipalkatty are analogous art, as both are from the field of AR processing.
Therefore it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang to send a query based on the user intent to a mapping server that stores a 3D occupancy map containing spatial and semantic information of physical items in the physical environment surrounding the user, wherein the mapping server is configured to identify a subset of the physical items that are relevant to the user intent, as taught by Chipalkatty.
The motivation for the modification is that the AR apparatus would present AR activities in a more readily viewable form within a 3D map.
Wang as modified by Chipalkatty teaches, receive, from the mapping server, a response to the query comprising a portion of the 3D occupancy containing the subset of the physical items specific to the user intent; (Wang, “[0077]….In this example, AR functions mapped to eating may include, as examples, a function of detecting and highlighting food, and a function of displaying calories of food. The AR apparatus executes the above AR functions based on a control of the server. For example, in FIG. 4A, food is detected and highlighted in the AR apparatus, and calories of the detected food may be displayed as “400 KJ”. In the example of FIG. 4A, the AR apparatus is assumed as smart glasses. However, this is only an example, and the AR apparatus may be, for example, a smartphone or a smart watch.” Chipalkatty is relied upon for sending the 3D occupancy map for the AR application.)
capture a plurality of video frames of the physical environment using a camera associated with a device worn by the user; (Wang, “[0076] FIGS. 4A-4C illustrate examples of application scenes of a third-person video. Referring to FIGS. 4A-4C, in each of scene 410 (FIG. 4A), scene 420 (FIG. 4B) and scene 430 (FIG. 4C), an AR apparatus and an apparatus user are captured using a camera.”) and
process the plurality of video frames and the portion of the 3D occupancy map to provide one or more action labels associated with the task. (Wang, refer to FIGS. 4A-4C and see the output AR display. “[0077]….In this example, AR functions mapped to eating may include, as examples, a function of detecting and highlighting food, and a function of displaying calories of food. The AR apparatus executes the above AR functions based on a control of the server. For example, in FIG. 4A, food is detected and highlighted in the AR apparatus, and calories of the detected food may be displayed as “400 KJ”. In the example of FIG. 4A, the AR apparatus is assumed as smart glasses. However, this is only an example, and the AR apparatus may be, for example, a smartphone or a smart watch.” These AR activities or functions are displayed better with the portion of the 3D occupancy map that relates to the AR activities and the user action.)
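For illustration only, the discrete 3D occupancy grid map of Chipalkatty [0043] and the intent-specific portion of the map recited in claim 18 can be pictured with the minimal Python sketch below. It is not taken from Wang or Chipalkatty. The names Voxel, RELEVANT_LABELS and query_portion, and the sample labels, are hypothetical and are assumed only to show how a stored map might be narrowed to the items relevant to a user intent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voxel:
    index: tuple  # (i, j, k) cell index in the discrete grid (spatial information)
    label: str    # label of the item occupying the cell (semantic information)


# Hypothetical intent-to-label table; not taken from any cited reference.
RELEVANT_LABELS = {"eating": {"food", "plate"}, "smoking": {"ashtray"}}


def query_portion(occupancy_map, intent):
    """Return only the occupied voxels whose label is relevant to the intent."""
    wanted = RELEVANT_LABELS.get(intent, set())
    return [v for v in occupancy_map if v.label in wanted]


occupancy_map = [
    Voxel((0, 0, 0), "food"),
    Voxel((1, 0, 0), "plate"),
    Voxel((4, 2, 1), "ashtray"),
    Voxel((5, 5, 0), "chair"),
]

# The response to an "eating" query is the portion holding food and plate only.
portion = query_portion(occupancy_map, "eating")
```

In this sketch an intent with no entry in the table yields an empty portion rather than the whole map, which is one possible design choice among several.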

Claim 1 is directed to a method whose steps are similar in scope and function to the elements of system claim 18; claim 1 is therefore rejected on the same rationale as set forth in the rejection of claim 18.

Claim 15 is directed to one or more computer-readable non-transitory storage media (Wang, “[181] The instructions or software to control a processor or computer to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, are recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media.”) whose elements are similar in scope and function to the elements of system claim 18; claim 15 is therefore rejected on the same rationale as set forth in the rejection of claim 18.

Regarding claims 10, 17 and 20, Wang as modified by Chipalkatty teaches, the task is an action direction task; (Wang, “[0079] Referring to FIG. 4C, in the scene 430, an action of an apparatus user is predicted to correspond to a “smoking” action. … In this example, AR functions mapped to the smoking action may include a function of displaying whether a user is currently located in a smoking area and a function of guiding a nearest smoking area. The AR apparatus executes the above AR functions based on the control of the server.” Similarly, scenes 410, 420 and 430 of FIGS. 4A-4C are all actions.) and
the one or more action labels aid in performing the action direction task. (Wang, “[0079]….For example, in FIG. 4C, a current location corresponding to a “smoking area”, and a distance to a nearby smoking area and navigation information may be displayed on the AR apparatus.” Labels are shown in the last column of the images in FIGS. 4A, 4B and 4C.)

Regarding claim 11, Wang as modified by Chipalkatty teaches, wherein:
the device worn by the user is an augmented-reality device; (Wang, Fig. 2, element 230; the AR apparatus is worn by the user) and
the one or more action labels are overlaid on a display screen of the augmented-reality device. (Wang, “[0079]….For example, in FIG. 4C, a current location corresponding to a “smoking area”, and a distance to a nearby smoking area and navigation information may be displayed on the AR apparatus.” Labels are shown in the last column of the images in FIGS. 4A, 4B and 4C.)

Regarding claim 12, Wang as modified by Chipalkatty teaches wherein the plurality of video frames and the portion of the 3D occupancy map are processed in parallel. (Wang, “[0081]…. Many of the operations shown in FIG. 5 may be performed in parallel or concurrently.” Therefore, the processing of the video frames and the processing of the portion of the 3D occupancy map can be performed in parallel.)

Regarding claim 13, Wang as modified by Chipalkatty teaches, wherein the user intent is determined explicitly through a voice command of the user. (Wang, “[0069] FIG. 3 illustrates an example of a mapping relationship between a human body action and an AR function. FIG. 3 illustrates actions, and AR functions mapped to the actions. The actions may include, for example, but not limited to, singing, smoking, handshaking or instrument playing.” Here, singing is the user intent, which is determined through the voice of the user.)



Regarding claim 14, Wang as modified by Chipalkatty teaches, wherein the user intent is determined automatically, without explicit user input, based on one or more of a current location, time of day, or previous history of the user. (Wang predicts the user intent using a CNN, which uses previous history. Wang, “[0017] The performing of the action prediction may include acquiring a video-based local feature image from an image frame of the video, extracting a first feature associated with a human body pose action and a second feature associated with an interactive action from the video-based local feature image with a first 3D CNN having a human body pose action as a classification label and a second 3D CNN having an interactive action as a classification label; and fusing the first feature and the second feature and acquiring an action classification result.”)

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Wang as modified by Chipalkatty and further in view of Taylor et al. (US Patent Publication: 20200262427, “Taylor”).

Regarding claim 7, Wang as modified by Chipalkatty doesn’t expressly teach wherein the portion of the 3D occupancy is a parent-children semantic occupancy map comprising a parent voxel and a plurality of children voxels.
However, Taylor teaches, the portion of the 3D occupancy is a parent-children semantic occupancy map comprising a parent voxel and a plurality of children voxels. (“[0095] The occupancy map 318 may include one or more root nodes. Where the occupancy map 318 is broken down into an octree data structure, each parent node within the octree data structure may have at most eight children, such that each node may be represented by three bits.”)
Wang as modified by Chipalkatty and Taylor are analogous art, as both are from the field of 3D graphics processing.
Therefore it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang as modified by Chipalkatty to have the portion of the 3D occupancy as a parent-children semantic occupancy map comprising a parent voxel and a plurality of children voxels as taught by Taylor.
The motivation to include Taylor is to generate the occupancy map by a standard approach, using a known method of resource mapping in an occupancy map.
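For illustration only, the parent-children relationship of an octree such as the one described in Taylor [0095] can be sketched in Python as follows. The sketch is not Taylor's implementation. The functions child_index and split_parent, and the convention of packing one bit per axis into a three-bit child index, are assumptions made here to show why eight children per parent correspond to three bits.

```python
def child_index(x_bit, y_bit, z_bit):
    """Pack one bit per axis into a 3-bit child index in the range 0..7."""
    return (x_bit << 2) | (y_bit << 1) | z_bit


def split_parent(origin, size):
    """Split a cubic parent voxel into its eight equally sized children.

    Returns a dict mapping each 3-bit child index to (child_origin, child_size).
    """
    half = size / 2
    children = {}
    for x_bit in (0, 1):
        for y_bit in (0, 1):
            for z_bit in (0, 1):
                child_origin = (
                    origin[0] + x_bit * half,
                    origin[1] + y_bit * half,
                    origin[2] + z_bit * half,
                )
                children[child_index(x_bit, y_bit, z_bit)] = (child_origin, half)
    return children


# A parent voxel of edge length 2.0 at the origin yields eight unit children.
children = split_parent((0.0, 0.0, 0.0), 2.0)
```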

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Wang as modified by Chipalkatty and further in view of Hadar et al. (US Patent Publication: 20220051111, “Hadar”).

Regarding claim 9, Wang as modified by Chipalkatty is silent regarding the limitation wherein the subset of the physical items specific to the user intent is identified, at the mapping server, using a scene graph or a knowledge graph.
However, Hadar teaches, a subset of the physical items is identified, using a scene graph or a knowledge graph. (“[0016] This specification generally describes a knowledge graph system that determines nodes that provide the most impact on target nodes and improve the knowledge graphs by adjusting the impact of the actual element represented by the node. A knowledge graph can represent a real world system, such as a computer network, roadways in a geographic area, or a population of people during an epidemic outbreak. The nodes of the knowledge graph can represent the real world elements in the system, e.g., computing devices in a computer network, roads in the geographic area, or people in the population. The edges between the nodes can represent the relationships between the real world elements, e.g., pathways between pairs of elements and the characteristics of the pathways.”)
Wang as modified by Chipalkatty and Hadar are analogous art, as both are from the field of identifying objects.
Therefore it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Wang as modified by Chipalkatty so that the subset of the physical items specific to the user intent is identified, at the mapping server, using a scene graph or a knowledge graph, in the same manner that Hadar identifies a subset of physical items using a scene graph or a knowledge graph.
The motivation to include Hadar is to use a faster method of identifying real objects in an environment.
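For illustration only, a knowledge graph of the general kind described in Hadar [0016], with nodes for real-world elements and edges for the relationships between them, can be sketched in Python as follows. The sketch is not drawn from Hadar. The edge list, the relation names and the function related_items are hypothetical and are assumed only to show how a subset of items could be selected by following the edges that leave an intent node.

```python
# Each edge is (source node, relation, destination node).
# All node names and relations are hypothetical examples.
EDGES = [
    ("eating", "uses", "fork"),
    ("eating", "uses", "plate"),
    ("smoking", "uses", "ashtray"),
    ("plate", "rests_on", "table"),
]


def related_items(intent, edges, relation="uses"):
    """Return, in edge order, the items linked to the intent node by the relation."""
    return [dst for src, rel, dst in edges if src == intent and rel == relation]


# Items connected to the "eating" intent by a "uses" edge.
subset = related_items("eating", EDGES)
```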

Allowable Subject Matter
Claims 2-6, 8, 16 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Dependent claims 2, 16 and 19 are objected to for the following reasons. Regarding claims 2, 16 and 19, Wang teaches,
wherein processing the plurality of video frames and the portion of the 3D occupancy map comprises: generating a first feature map based on processing of the plurality of video frames; (Wang, “[0127] Referring to FIG. 14, at least a portion of image frames is extracted from a video including a human body action and may be input to a 3D CNN 1410. For example, each of the image frames may correspond to a color image, and may have, for example, a dimension of H×W×T×3. The 3D CNN 1410 generates frame-based global feature images of image frames at different times based on spatial domains of the image frames and temporal domains of the image frames.”)
Although Wang has two feature maps (a local feature map and a global feature map), no feature map is generated from a 3D occupancy map. The combination of prior art fails to expressly teach, generating a second feature map based on processing of the portion of the 3D occupancy map; processing the first feature map and the second feature map to generate an action region map, the action region map indicating a probability of action happening within each region of the portion of the 3D occupancy map; filtering, via an attention pooling process, the second feature map associated with the portion of the 3D occupancy map based on the action region map; and using the first feature map associated with the plurality of video frames and the filtered second feature map associated with the portion of the 3D occupancy map to generate the one or more action labels for display on the device worn by the user.
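For illustration only, one common reading of attention pooling is a probability-weighted sum of per-region features, sketched in Python below. The claims do not specify this formulation, and the sketch is not drawn from the application or from any cited reference. The function attention_pool, the normalization of the weights and the sample values are assumptions.

```python
def attention_pool(region_features, action_probs):
    """Weight each region's feature vector by its action probability and sum.

    region_features: one feature vector per map region (the second feature map).
    action_probs: one probability per region (the action region map).
    """
    total = sum(action_probs)
    if total == 0:
        raise ValueError("action region map has no probability mass")
    weights = [p / total for p in action_probs]
    dim = len(region_features[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]


# Three regions with 2-dimensional features; the middle region has zero
# probability of action, so it contributes nothing to the pooled result.
region_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
action_probs = [0.5, 0.0, 0.5]
pooled = attention_pool(region_features, action_probs)
```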

Claims 3-6 are objected to by virtue of their dependency.

Claim 8 is objected to, but would be allowable, because the combination of prior art fails to expressly teach the limitation wherein each children voxel of the plurality of children voxels comprises a plurality of grids indicating a coarse location or feature of an item of the subset of the physical items specific to the user intent.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tapas Mazumder whose telephone number is (571)270-7466. The examiner can normally be reached M-F 8:00 AM-5:00 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached on 570-272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TAPAS MAZUMDER/           Primary Examiner, Art Unit 2619