DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. JP2020-020068, filed on 02/07/2020.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 6-11 are rejected under 35 U.S.C. 103 as being unpatentable over D'AMATO et al. (US 20180315329 A1, hereinafter D’Amato), in view of Zhu et al. (US 20200082544 A1, hereinafter Zhu).

Regarding Claim 6, D'Amato teaches a system of generating virtual and real composite image data, comprising (D'Amato, Paragraph [0003], Embodiments of the present technology includes methods and systems for teaching a user to perform a manual task with an extended reality (XR) device)
(D’Amato, Paragraph [0013], learning system that includes a motion capture system to record an expert's hands); one or more processors (D’Amato, Fig. 2A, Element 220 Processor) that perform the following (D’Amato, Paragraph [0008], Rendering the model of the expert's hand may be performed by distributing rendering processes across a plurality of processors): acquiring captured image data capturing the image of the real space as seen from the user's point of view (D’Amato, Paragraph [0069], an instrument is likely to be within the field of view of the user); inputting the captured image data into a trained model (D’Amato, Paragraph [0054], generating labelled training data where the representation of the expert's hands is actively measured and tracked by a secondary apparatus, which is then correlated to recordings collected by the motion capture system 210), the trained model outputting segmentation data segmenting the captured image data into a first region in which a target object is displayed (D’Amato, Paragraph [0068], display expert's hands slightly above the piano keys of a stationary piano <read on target object>), a second region in which at least a part of the user's body is displayed (D’Amato, Paragraph [0017], using an XR learning system to display a rendered model of an expert's hands), [[ and a third region that is other than the first and second regions; ]] and compositing data of the first region and data of the second region with a virtual space image data based on the segmentation data (D’Amato, Paragraph [0068], Once a reference is identified, the model of the expert's hands can be displayed in a proper position and orientation in relation to the stationary reference, e.g., display expert's hands <read on second region> slightly above the piano keys of a stationary piano <read on first region>).
D’Amato does not explicitly disclose, but Zhu teaches, a third region (Zhu, Fig. 4, Element 420) that is other than the first and second regions (Zhu, Fig. 4, Element 416 (first region) and 418 (second region); Paragraph [0099], [0104], in a second frame 404 of the sequence, motion is detected between the first and second frames, and a first region of interest 416 and second region of interest 418 are selected based on the detected motion between the first and second frames. implemented at this stage, but only in respect of the image data for the third region of interest 420 in an attempt to classify any objects within that third region of interest 420).
Zhu and D'Amato are analogous art because both are directed to processing image data with a neural network. D'Amato provides a learning system that uses a conventional neural network during image processing and combines the user's body part with the musical instrument so that the user can practice the musical instrument. Zhu provides image processing with a neural network, using a trained model with multiple layers and regions to display and/or hide image data as needed according to the configured data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the trained model with different regions and layers taught by Zhu into the invention of D'Amato, such that the image processing system can use multiple layers in the trained model, including an intermediate layer, to allow the user to choose which data is included in or excluded from the resulting image, which increases the flexibility and functionality of the image processing system using the neural network.
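
For illustration only, and not as a characterization of what either reference discloses, the compositing limitation of Claim 6 can be sketched as follows. The label values, the function name composite, and the list-based image representation are assumptions introduced for this sketch; none of them appear in the claims, D'Amato, or Zhu.

```python
# Hypothetical per-pixel labels; the claims do not prescribe any encoding.
BACKGROUND, TARGET_OBJECT, USER_BODY = 0, 1, 2  # third, first, second region


def composite(captured, virtual, segmentation):
    """Overlay first- and second-region pixels onto the virtual space image.

    All three arguments are equally sized 2D lists. `segmentation` holds one
    label per pixel, as a segmentation model might output (assumed format).
    """
    keep = {TARGET_OBJECT, USER_BODY}
    return [
        [cap if seg in keep else virt
         for cap, virt, seg in zip(cap_row, virt_row, seg_row)]
        for cap_row, virt_row, seg_row in zip(captured, virtual, segmentation)
    ]


# Toy 2x3 "images": p = piano pixel, h = hand pixel, r = room, v = virtual.
captured = [["p", "p", "h"], ["r", "r", "r"]]
virtual = [["v", "v", "v"], ["v", "v", "v"]]
segmentation = [
    [TARGET_OBJECT, TARGET_OBJECT, USER_BODY],
    [BACKGROUND, BACKGROUND, BACKGROUND],
]
result = composite(captured, virtual, segmentation)
```

In this sketch, pixels labelled as the third region are replaced by the virtual space image, while pixels of the first and second regions are carried over from the captured image.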

Regarding Claim 7, D'Amato teaches a method of generating a trained model, comprising (D’Amato, Paragraph [0017], [0054], FIG. 3 is a flow chart that illustrates a method of using an XR learning system to display a rendered model of an expert's hands performing a task on a user's XR device using a recording of the expert's hands. the CPM is trained to recognize the expert's hands. This can be accomplished by generating labelled training data where the representation of the expert's hands is actively measured and tracked by a secondary apparatus): setting up a neural network having an input layer (D’Amato, Paragraph [0053], One method is the use of a convolutional pose machine (CPM), which is a type of DLN, to generate the bone-by-bone representation of the expert's hands. A CPM is a series of convolutional neural networks, each with multiple layers and nodes), [[ one or more intermediate layers, ]] and an output layer (D’Amato, Paragraph [0056], an image recorded at a particular resolution, corresponding to a particular frame from a series of images in a video, can be used as input <read on input layer> to the CPM, which outputs <read on output layer> the 3D translational and rotational data of each bone in the expert's hands), the input layer being configured to receive captured image data capturing an image of a real space as seen from a user's point of view, [[ the one or more intermediate layers ]] having trainable parameters (D’Amato, Paragraph [0013], learning system that includes a motion capture system to record an expert's hands; [0072], The XR system 230 can also render the model of the expert's hands at variable speeds <read on trainable parameter>; [0076], those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used), the output layer being configured to output segmentation data segmenting the captured image data into a first region in which a target object is displayed, the second region in which at least a part of the user's body is displayed (D’Amato, Paragraph [0017], using an XR learning system to display a rendered model of an expert's hands), [[ and a third region that is other than the first and second regions; ]] and training the one or more intermediate layers having the trainable parameters using training data (D’Amato, Paragraph [0054], In order to use the CPM to extract the representation of an expert performing a task, the CPM is trained to recognize the expert's hands. This can be accomplished by generating labelled training data where the representation of the expert's hands is actively measured and tracked. those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used), the training data including first input image data having one of the target object (D’Amato, Paragraph [0068], display expert's hands slightly above the piano keys of a stationary piano <read on target object>) and the at least the part of the user's body in each image (D’Amato, Paragraph [0017], using an XR learning system to display a rendered model of an expert's hands <read on user’s body>), second input image data having both of the target object and the at least the part of the user's body in each image (D’Amato, Paragraph [0068], display expert's hands slightly above the piano keys of a stationary piano <read on target object>; [0017], using an XR learning system to display a rendered model of an expert's hands <read on user’s body>; [0027], The XR learning system provides the ability to both record and display an expert's hands <user’s body> while the expert performs a particular task), [[ and third input image data having neither of the target object and the at least the part of the user's body in each image, and correct answer data providing correct segmentation of the target object and the at least the part of the user's body in each image of the first, second, and third image data. ]]
But D’Amato does not explicitly disclose the one or more intermediate layers having trainable parameters, a third region that is other than the first and second regions, or third input image data having neither of the target object and the at least the part of the user's body in each image.
However, Zhu teaches the one or more intermediate layers having trainable parameters (Zhu, Paragraph [0023], The artificial neural network comprise one or more fully connected networks including one or more activation layers, such as an input layer that receives an input (e.g. the image data (data elements) for the region of interest), one or more intermediate or "hidden" layers, and an output layer that provides a result (e.g. indicating a classification of an object)), and a third region (Zhu, Fig. 4, Element 420) that is other than the first and second regions (Zhu, Fig. 4, Element 416 (first region) and 418 (second region); Paragraph [0099], [0104], in a second frame 404 of the sequence, motion is detected between the first and second frames, and a first region of interest 416 and second region of interest 418 are selected based on the detected motion between the first and second frames. implemented at this stage, but only in respect of the image data for the third region of interest 420 in an attempt to classify any objects within that third region of interest 420), third input image data having neither of the target object and the at least the part of the user's body in each image (Zhu, Fig. 4, Element 420 <third image data> is neither of Element 418 of second region <read on target object> and Element 416 first region <read on user’s body>), and correct answer data providing correct segmentation of the target object and the at least the part of the user's body in each image of the first, second, and third image data (Zhu, Fig. 4, the final frame data Element 414 including first region 416, second region 418 and third region 420).
Zhu and D'Amato are analogous art because both are directed to processing image data with a neural network. D'Amato provides a learning system that uses a conventional neural network during image processing and combines the user's body part with the musical instrument so that the user can practice the musical instrument. Zhu provides image processing with a neural network, using a trained model with multiple layers to display and/or hide image data as needed according to the configured data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the trained model taught by Zhu into the invention of D'Amato, such that the image processing system can use multiple layers in the trained model, including an intermediate layer, to allow the user to choose which data is included in or excluded from the resulting image, which increases the flexibility and functionality of the image processing system using the neural network.


Regarding Claim 8, it recites limitations similar in scope to the limitations of Claim 6, but in a device. As shown in the rejection above, the combination of D’Amato and Zhu discloses the limitations of Claim 6. Additionally, D’Amato discloses a device that maps to Fig. 2C and Paragraph [0015] (D’Amato, Fig. 2C, Paragraph [0015], an exemplary XR device from FIG. 2A to display a recording of an expert's hands while a user is performing a manual task). Thus, Claim 8 is met according to the mapping presented in the rejection of Claim 6, given that the device corresponds to the system.

Regarding Claim 9, the combination of D’Amato and Zhu teaches the invention in Claim 8. 
The combination further teaches wherein the target object is a musical instrument (D’Amato, Paragraph [0068], display expert's hands slightly above the piano keys of a stationary piano <read on target object>), and the part of the user's body is a part of the user playing the musical instrument (D’Amato, Paragraph [0004], rendering the model of the expert's hand comprises playing an audio recording of the musical instrument played), and wherein the compositing of the data of the first region and the data of the second region with the virtual space is performed in real time (D’Amato, Paragraph [0068], display expert's hands slightly above the piano keys of a stationary piano <read on target object of first region>; [0017], using an XR learning system to display a rendered model of an expert's hands <second region>; [0006], the camera may provide the series of images to the DLN in real time; This enables the processor to generate the model of the expert's hand and the XR device to render the model of the expert's hand in real time; [0026], As understood by those of skill in the art, XR refers to real-and-virtual combined environments and human-machine interactions generated by computer technology and wearables).

Regarding Claim 10, the combination of D’Amato and Zhu teaches the invention in Claim 8. 
The combination further teaches wherein the target object includes at least one of a musical instrument (D’Amato, Paragraph [0068], display expert's hands slightly above the piano keys of a stationary piano <read on target object>), [[ the drink, and the portable terminal device, ]] and wherein the part of the user's body includes a part of the user touching the at least one of the musical instrument, [[ the drink, and the portable terminal device ]] (D’Amato, Paragraph [0004], rendering the model of the expert's hand comprises playing an audio recording of the musical instrument played; [0019], is an image showing an example of an expert's hands playing a guitar).

Regarding Claim 11, the combination of D’Amato and Zhu teaches the invention in Claim 8. 
The combination further teaches wherein the trained model includes: an input layer that receives the captured image data capturing the image of the real space as seen from the user's point of view (D’Amato, Paragraph [0013], learning system that includes a motion capture system to record an expert's hands); an output layer that outputs the segmentation data segmenting the captured image data into the first region in which the target object is displayed (D’Amato, Paragraph [0068], display expert's hands slightly above the piano keys of a stationary piano <read on target object>), the second region in which at least the part of the user's body is displayed (D’Amato, Paragraph [0017], using an XR learning system to display a rendered model of an expert's hands <read on second region>), [[ and the third region that is other than the first and second regions; and one or more intermediate layers having ]] parameters that have been trained using training data (D’Amato, Paragraph [0013], learning system that includes a motion capture system to record an expert's hands; [0072], The XR system 230 can also render the model of the expert's hands at variable speeds <read on trainable parameter>; [0076], those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used), the training data including first input image data having one of the target object (D’Amato, Paragraph [0068], display expert's hands slightly above the piano keys of a stationary piano <read on target object>) and the at least the part of the user's body in each image (D’Amato, Paragraph [0017], using an XR learning system to display a rendered model of an expert's hands <read on user’s body>), second input image data having both of the target object and the at least the part of the user's body in each image (D’Amato, Paragraph [0068], display expert's hands slightly above the piano keys of a stationary piano <read on target object>; [0017], using an XR learning system to display a rendered model of an expert's hands <read on user’s body>; [0027], The XR learning system provides the ability to both record and display an expert's hands <user’s body> while the expert performs a particular task), [[ and third input image data having neither of the target object and the at least the part of the user's body in each image, and correct answer data providing correct segmentation of the target object and the at least the part of the user's body in each image of the first, second, and third image data. ]]
D’Amato does not explicitly disclose, but Zhu teaches, the third region that is other than the first and second regions (Zhu, Fig. 4, Element 416 (first region) and 418 (second region); Paragraph [0099], [0104], in a second frame 404 of the sequence, motion is detected between the first and second frames, and a first region of interest 416 and second region of interest 418 are selected based on the detected motion between the first and second frames. implemented at this stage, but only in respect of the image data for the third region of interest 420 in an attempt to classify any objects within that third region of interest 420); and one or more intermediate layers having parameters that have been trained using training data (Zhu, Paragraph [0023], The artificial neural network comprise one or more fully connected networks including one or more activation layers, such as an input layer that receives an input (e.g. the image data (data elements) for the region of interest), one or more intermediate or "hidden" layers, and an output layer that provides a result (e.g. indicating a classification of an object)).
Zhu and D'Amato are analogous art because both are directed to processing image data with a neural network. D'Amato provides a learning system that uses a conventional neural network during image processing and combines the user's body part with the musical instrument so that the user can practice the musical instrument. Zhu provides image processing with a neural network, using a trained model with multiple layers to display and/or hide image data as needed according to the configured data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the trained model taught by Zhu into the invention of D'Amato, such that the image processing system can use multiple layers in the trained model, including an intermediate layer, to allow the user to choose which data is included in or excluded from the resulting image, which increases the flexibility and functionality of the image processing system using the neural network.
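
For illustration only, the three kinds of training images recited in Claims 7 and 11 (images containing one of the target object and the body part, images containing both, and images containing neither) can be distinguished from the correct answer data as sketched below. The label values and the function name training_category are hypothetical and are not taken from the claims or from either reference.

```python
# Hypothetical per-pixel labels for the correct answer (ground-truth) data.
BACKGROUND, TARGET_OBJECT, USER_BODY = 0, 1, 2


def training_category(label_map):
    """Return which claimed group a training image belongs to.

    `label_map` is a 2D list of per-pixel labels (assumed format).
    "first"  -> exactly one of target object / user's body is present
    "second" -> both are present
    "third"  -> neither is present
    """
    present = {label for row in label_map for label in row}
    has_object = TARGET_OBJECT in present
    has_body = USER_BODY in present
    if has_object and has_body:
        return "second"
    if has_object or has_body:
        return "first"
    return "third"


only_object = [[TARGET_OBJECT, BACKGROUND]]
object_and_body = [[TARGET_OBJECT, USER_BODY]]
empty_scene = [[BACKGROUND, BACKGROUND]]
categories = [training_category(m)
              for m in (only_object, object_and_body, empty_scene)]
```

The sketch only sorts labelled images into the three groups; how a network is then trained on them is outside its scope.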

Regarding Claim 1, it recites limitations similar in scope to the limitations of Claim 6, but as a method, and the combination of D'Amato and Zhu teaches all the limitations of Claim 6. Therefore, Claim 1 is rejected under the same rationale.

Regarding Claim 2, it recites limitations similar in scope to the limitations of Claim 9 and therefore is rejected under the same rationale.

Regarding Claim 3, it recites limitations similar in scope to the limitations of Claim 10 and therefore is rejected under the same rationale.

Regarding Claim 4, it recites limitations similar in scope to the limitations of Claim 11 and therefore is rejected under the same rationale.


Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over D'AMATO et al. (US 20180315329 A1, hereinafter D’Amato), in view of Zhu et al. (US 20200082544 A1, hereinafter Zhu), as applied to Claims 1 and 8 above, respectively, and further in view of Sawaki (US 20180373328 A1).

Regarding Claim 12, the combination of D’Amato and Zhu teaches the invention in Claim 8. 
The combination does not explicitly disclose, but Sawaki teaches, wherein the one or more processors (Sawaki, Fig. 2, Element 210 Processor) composite the data of the first region with the virtual space image data (Sawaki, Paragraph [0094], The processor 210 of the computer 200 defines a field-of-view region 15 in the virtual space 11 based on the position and inclination (reference line of sight 16) of the virtual camera) when the one or more processors determine that the user's line-of-sight (Sawaki, Paragraph [0171], the processor 210A of the HMD set 11A acquires avatar information for determining a motion of the avatar object. the face tracking data is data representing motions of parts forming the face of the user 5A and line-of-sight data), and wherein the one or more processors do not composite the compositing of the data of the first region with the virtual space image data when the one or more processors determine that the user's line-of-sight data indicates that the user is not looking at the target object (Sawaki, Paragraph [0006], The method further includes identifying an eye gaze position of the first virtual line of sight in accordance with the first virtual line of sight. The method further includes defining a predetermined condition relating to an interest of the first user; [0108], While the user 5 is wearing the HMD 120 (having a non-transmissive monitor 130), the user 5 can visually recognize only the panorama image 13 developed in the virtual space 11 without visually recognizing the real world).
Sawaki and D'Amato are analogous art because both are directed to processing image data with a neural network. D'Amato provides a learning system that uses a conventional neural network during image processing and combines the user's body part with the musical instrument so that the user can practice the musical instrument. Sawaki provides image processing with a neural network and uses line-of-sight data to allow the system to display the focus area based on the user’s field of view. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the line-of-sight data taught by Sawaki into the modified invention of D'Amato such that during the 


Regarding Claim 5, it recites limitations similar in scope to the limitations of Claim 12 and therefore is rejected under the same rationale.
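
For illustration only, the line-of-sight condition recited in Claims 5 and 12 can be sketched as a gate on the first region. The label values, the function name composite_with_gaze, and the boolean looking_at_target input are hypothetical. Keeping the second region composited in both cases is an assumption of this sketch, since the claim language quoted above addresses only the first region.

```python
# Hypothetical per-pixel labels; not taken from the claims or the references.
BACKGROUND, TARGET_OBJECT, USER_BODY = 0, 1, 2


def composite_with_gaze(captured, virtual, segmentation, looking_at_target):
    """Composite the first region only while the user looks at the target.

    `looking_at_target` stands in for the result of evaluating line-of-sight
    data; how that determination is made is outside this sketch.
    """
    keep = {USER_BODY}  # assumption: the body region is always composited
    if looking_at_target:
        keep.add(TARGET_OBJECT)
    return [
        [cap if seg in keep else virt
         for cap, virt, seg in zip(cap_row, virt_row, seg_row)]
        for cap_row, virt_row, seg_row in zip(captured, virtual, segmentation)
    ]


captured = [["p", "h", "r"]]      # piano, hand, room pixels
virtual = [["v", "v", "v"]]
segmentation = [[TARGET_OBJECT, USER_BODY, BACKGROUND]]
looking = composite_with_gaze(captured, virtual, segmentation, True)
not_looking = composite_with_gaze(captured, virtual, segmentation, False)
```

When looking_at_target is false, the target object pixel is replaced by the virtual space image; when it is true, the captured pixel is kept.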

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20200275976 A1	Algorithm-based optimization for knee arthroplasty procedures
US 10,643,593 B1	Prediction-based communication latency elimination in a distributed virtualized orchestra
US 11,127,148 B1	Parallax correction for partially overlapping stereo depth images
	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUJANG TSWEI whose telephone number is (571)272-6669. The examiner can normally be reached 8:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YuJang Tswei/Primary Examiner, Art Unit 2619