DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed 3/21/2022 has been entered. Claim 1 has been amended. Claims 1-7 are pending in the current application.

Response to Arguments
Applicant’s arguments filed 3/21/2022 with respect to amended claim 1 and similar claims have been considered but are moot because the new ground of rejection set forth in the current Office Action is based on newly cited references.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Malagawa et al. US-PGPUB No. 2021/0181854 (hereinafter Malagawa) in view of Parland US-PGPUB No. 2020/0393909 (hereinafter Parland) and Tadros et al. US-PGPUB No. 2022/0164097 (hereinafter Tadros).
Re Claim 1: 
Malagawa teaches a method for generating realistic content based on a motion of a user, comprising: 
generating a video of the user by a camera (
Malagawa teaches at Paragraph 0119 that the smartphone 11 can use various methods such as using a stereo camera to recognize the 3D position of the fingertip and at Paragraph 0032 that by following a locus of the user’s fingertip, the AR display application can display on the AR image display screen 13 an AR image obtained by superimposing a virtual drawing image 14 drawn by a line following the locus of the fingertip on the image of a real space captured by the image capturing device. 
Malagawa teaches at FIG. 9 and Paragraph 0097 generating a video of the user by a camera and at Paragraph 0038-0039 that the operation on the line drawing operation button 22 may switch the start or finish of the creation of the virtual drawing image 14 each time the touch is performed….the line width operation panel 23 is a GUI for continuously operating changes to the width of the line representing the virtual drawing image 14); 
recognizing a hand motion of the user from the generated video (
Malagawa teaches at FIG. 9 and Paragraph 004 that the AR display application can simultaneously recognize a plurality of indication points, e.g., simultaneously recognize the fingertips of the user A and the user B to create two virtual drawing images 14b and 14c); 
deriving hand coordinates depending on a shape of a hand and position of the hand based on the recognized hand motion for drawing a picture of an object (Malagawa teaches at Paragraph 0034 that the AR display application can generate virtual drawing data for displaying the virtual drawing image 14, for example, data indicating the locus of the fingertip represented by the absolute coordinate system on a real space, according to the absolute coordinate system on a real space. This allows the AR display application to display the created virtual drawing image 14 on the AR image display screen 13 from all directions as if the virtual drawing image 14 were virtually placed on a real space); 
outputting the picture of the object on an output screen based on the derived hand coordinates after recognizing the hand motion indicating that the drawing is completed (Malagawa teaches at Paragraph 0034 that the AR display application can generate virtual drawing data for displaying the virtual drawing image 14, for example, data indicating the locus of the fingertip represented by the absolute coordinate system on a real space, according to the absolute coordinate system on a real space. This allows the AR display application to display the created virtual drawing image 14 on the AR image display screen 13 from all directions as if the virtual drawing image 14 were virtually placed on a real space);
Malagawa does not explicitly teach, but suggests, the claim limitations of: 
pre-processing the output picture of the object based on a correction algorithm;  
generating realistic 3D content of the object from the pre-processed picture based on a deep learning model on the output screen; and 
providing the generated realistic 3D content of the object to the user in a virtual space. 
However, Tadros and Parland teach the claim limitations of: 
generating a video of the user by a camera (Parland teaches at Paragraph 0064 a live streaming video of a presenter making gestures and at Paragraph 0068 that the gesture detection module 202 is activated such that it will continuously monitor for and detect triggers for the remainder of the video session); 
pre-processing the output picture of the object based on a correction algorithm (
Tadros teaches at Paragraph 0049 that the machine learning component 112 receives the representation of the user 101 gesture input and predicts using the trained machine learning model design coordinates describing an intended design based on the representation of the user 101 gesture input…a preprocessor receives the representation of the gesture input as an image and generates a re-scaled, reshaped image matrix…the machine learning component is configured to predict design coordinates for representations of a specific size…the application 111 resizes the representation of the user 101 gesture input as part of generating the rescaled, reshaped image matrix based on the resized representation….the application 111 reshapes a 3D representation based on a logged user gesture input…as part of generating the rescaled, reshaped image matrix….the preprocessor converts the representation of the user gesture input that is in the form of an image file into the input image matrix. 
Tadros teaches at Paragraph 0033 that the application 111 applies the image recognition model (correction algorithm) to determine an intended design based on a representation of the user 101 gesture input and at Paragraph 0058-0059 that the user may not input a geometrically correct square and may not draw the line extending exactly from the center of the square, and the application 111 predicts intended design coordinates 301-A corresponding to the intended design of the user. 
Tadros teaches at FIG. 3 and Paragraph 0057 pre-processing the output picture of the object based on a correction algorithm by generating the intended design coordinates interpreted from the user interface gestures, for example, generating the intended design coordinates 301-A, 301-B, 301-C and 301-D corrected from 300-A, 300-B, 300-C and 300-D, wherein the intended design coordinates may be computer code representations of intended designs…pixel maps of intended designs. 
Parland teaches at Paragraph 0076-0080 comparing the detected gesture with known gestures that are stored in a gesture library in database 126…if the detected gesture matches a known or approved gesture in the gesture library, then the gesture interpretation module 205 initiates the drawing module 206…an administrator may determine that dots, lines, and circles are three types of gestures that should be recognized for subsequent gesture visualization…if the recorded gesture falls within a certain standard deviation of the standard base gesture, then a match is determined, and at Paragraph 0085-0086 that the gesture interpretation module 205 is configured to identify an intended location of the gesture…a presenter may gesture in the general direction of the content being displayed rather than walking towards a different portion of the display screen where the content is being displayed. The gesture interpretation module 205 is configured to identify the intended location or placement of gesture drawing…the gesture interpretation module 205 may use one or more sensors to evaluate the angle at which a presenter’s finger is pointing and calculate a three-dimensional coordinate of the intended location of the gesture…and subsequently work with the drawing module 206 to draw in the intended location. 
Parland teaches at Paragraph 0119-0120 that a gesture is detected during a video stream using a camera…the gesture detection module 202 uses a trained machine learning model to detect gesture triggers…if a presenter uses a finger to underline a word, then each frame that features the gesture of the finger’s underlining will have an associated coordinate grid created by the drawing module 206. Subsequently, the drawing module 206 will digitally mark the coordinates of the gesture from each frame and connect the marks from frame to frame to generate the digital drawing. The digital drawing is then stored as a gesture layer that is combined or layered on top of the original video stream. 
Parland teaches at Paragraph 0097-0098 that the drawing module 206 works in conjunction with the gesture interpretation module 205 to draw or stamp approved, known gesture shapes that are stored in the database 126…The drawing module 206 is configured to estimate a size for the pre-fabricated gesture shapes based on the presenter’s gesture, estimate a placement or location and/or referenced content as previously discussed and then stamp the pre-fabricated gesture shape at the determined location and/or in association with the referenced content);  
generating realistic 3D content of the object from the pre-processed picture based on a deep learning model on the output screen; and 
providing the generated realistic 3D content of the object to the user in a virtual space (Tadros teaches at Paragraph 0049 that the machine learning component 112 receives the representation of the user 101 gesture input and predicts using the trained machine learning model design coordinates describing an intended design based on the representation of the user 101 gesture input…a preprocessor receives the representation of the gesture input as an image and generates a re-scaled, reshaped image matrix…the machine learning component is configured to predict design coordinates for representations of a specific size…the application 111 resizes the representation of the user 101 gesture input as part of generating the rescaled, reshaped image matrix based on the resized representation….the application 111 reshapes a 3D representation based on a logged user gesture input…as part of generating the rescaled, reshaped image matrix….the preprocessor converts the representation of the user gesture input that is in the form of an image file into the input image matrix…The machine learning model provides design coordinates which indicate the intended dimensions, orientation or other features of the design intended by the user from the matrix representation. The machine model provides an output comprising the design coordinates. 
Tadros teaches at Paragraph 0031 that the application 111 renders an updated 3D space for display via the user interface 113 including the object added to the 3D virtual space and at Paragraph 0035 that the user computing device 110 receives the trained machine learning model as the machine learning component 112 of the application 111 when downloading the application 111. 
Tadros teaches at Paragraph 0058-0059 that the application 111 (executing the machine learning component 112 as shown in FIG. 1 and Paragraph 0031) generates a user interface 113 input by mapping the intended design coordinates 301-A to the user interface and then generates output in virtual environment 302-A in response to the user interface 113 input. The output in virtual environment 302-A comprises a cube displayed in the 3D virtual space. The cube constitutes the realistic 3D content of the object. 
Tadros teaches at FIG. 3 pre-processing the output picture of the object based on a correction algorithm by generating the intended design coordinates interpreted from the user interface gestures, for example, generating the intended design coordinates 301-A, 301-B, 301-C and 301-D corrected from 300-A, 300-B, 300-C and 300-D and outputting the respective virtual outputs 302-A, 302-B, 302-C and 302-D derived from the respective intended design coordinates. 
Parland teaches at Paragraph 0082 that the gesture interpretation module 205 initiates the drawing module 206 for further processing. Parland teaches at Paragraph 0085-0086 that the gesture interpretation module 205 is configured to identify an intended location of the gesture…a presenter may gesture in the general direction of the content being displayed rather than walking towards a different portion of the display screen where the content is being displayed. The gesture interpretation module 205 is configured to identify the intended location or placement of gesture drawing….the gesture interpretation module 205 may use one or more sensors to evaluate the angle at which a presenter’s finger is pointing and calculate a three-dimensional coordinate of the intended location of the gesture….and subsequently work with the drawing module 206 to draw in the intended location.    

Parland teaches at FIGS. 5A-5B and Paragraph 0114 that the dotted oval represents the movement of the gesture 51 and the gesture visualization 512 has been generated and displayed through the use of the drawing module 206. 
Parland teaches at Paragraph 0094-0097 that the drawing module 206 will follow the gesture created by the finger and draw a circle and a coordinate grid is used to generate a free-form digital drawing of the gesture….each frame’s coordinates are then combined to create a single digital drawing of the gesture…a mark such as a dot is used to indicate the coordinate position of the gesture from one frame to the next…..The drawing library includes standard shapes for a dot, a line, a circle, a cylinder, a triangle, a square, a rectangle, an arrow and at Paragraph 0101 that the stitching module 208 accesses the original video and the gesture layer stored in the database 126 in order to stitch the video and the gesture layer together to generate a gesture visualization.  
Parland teaches at FIG. 5B and Paragraph 0114 that the gesture visualization 512 has been generated and displayed through the use of drawing module 206, the stitching module 208 and the display module 210. 
Parland teaches at Paragraph 0061 that a computing device may run known input data through a deep neural network 300 in an attempt to compute a particular known output, at Paragraph 0068 that a neural network may be used to train a machine learning algorithm or model to detect a gesture completion such as moving a finger or hand away from a projection screen, and at Paragraph 0086 that an unsupervised machine learning model may be used to learn an intended location of a gesture drawing and subsequently work with the drawing module 206 to draw in the intended location. 
Parland teaches at Paragraph 0119-0120 that a gesture is detected during a video stream using a camera…the gesture detection module 202 uses a trained machine learning model to detect gesture triggers…if a presenter uses a finger to underline a word, then each frame that features the gesture of the finger’s underlining will have an associated coordinate grid created by the drawing module 206. Subsequently, the drawing module 206 will digitally mark the coordinates of the gesture from each frame and connect the marks from frame to frame to generate the digital drawing. The digital drawing is then stored as a gesture layer that is combined or layered on top of the original video stream). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further incorporate Tadros’s processing of the object for generating realistic 3D content of the object using the machine learning component comprising a convolutional neural network and/or Parland’s deep learning model for correcting the rough gesture inputs to generate the standard realistic gesture outputs into the method and system of Malagawa. One of ordinary skill in the art would have been motivated to provide machine learning models that recognize the corrected hand gestures based on the rough gesture inputs. 
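
For illustration only, the resizing step cited from Tadros at Paragraph 0049 (a preprocessor that turns a gesture representation into a rescaled image matrix of a specific size) can be pictured with the following minimal Python sketch. The function name rasterize_stroke, the default grid size and the rounding rule are assumptions made for this sketch; they are not taken from Tadros, from the other cited references or from the application.

```python
def rasterize_stroke(points, size=8):
    """Rescale (x, y) stroke points into a size x size binary matrix.

    Hypothetical helper: one common scale is used for both axes so the
    aspect ratio of the drawn shape is preserved.
    """
    if not points:
        raise ValueError("stroke has no points")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Largest extent of the stroke; fall back to 1.0 for a single point.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        # Round to the nearest cell and clamp to the last index.
        col = min(size - 1, int((x - min_x) / span * (size - 1) + 0.5))
        row = min(size - 1, int((y - min_y) / span * (size - 1) + 0.5))
        grid[row][col] = 1
    return grid
```

Under these assumptions, a rough square drawn anywhere on the screen at any size is reduced to the same small matrix, which is the kind of fixed-size input a trained model could consume.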

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Malagawa et al. US-PGPUB No. 2021/0181854 (hereinafter Malagawa) in view of Parland US-PGPUB No. 2020/0393909 (hereinafter Parland), Tadros et al. US-PGPUB No. 2022/0164097 (hereinafter Tadros), Hasegawa US-PGPUB No. 2011/0221768 (hereinafter Hasegawa) and Shimura et al. US-Patent No. 10,186,057 (hereinafter Shimura). 
Re Claim 2: 
Claim 2 encompasses the same scope of invention as claim 1 except for the additional claim limitation that the outputting of the picture on the output screen includes: outputting the picture in a picture layer on the output screen; and generating a user interface (UI) menu on the output screen based on a length of an arm from the recognized hand motion, and the UI menu allows line color and thickness of the picture to be changed.
Hasegawa teaches outputting the picture in a picture layer on the output screen; and generating a user interface (UI) menu on the output screen based on a length of an arm from the recognized hand motion (Hasegawa teaches at Paragraph 0075 that the area in the image where the GUI elements may be disposed is changed (corrected) and the GUI elements are disposed at appropriate positions and at Paragraph 0059 that the shape of the hand of the person in the image is recognized and a gesture is recognized based on changes in the shape pattern and/or position of the hand. For example, coincidence of a GUI element and the hand and an operation of a GUI element by the hand are recognized), but does not teach the claim limitation that the UI menu allows line color and thickness of the picture to be changed.
Shimura further teaches the claim limitation that the UI menu allows line color and thickness of the picture to be changed (Shimura teaches at FIGS. 24-26 and 28 and column 32, lines 48-65 causing the display of a selection menu to select the drawing from and selecting the line thickness from the selection menu and at column 33, lines 1-36 that the display control unit 221b may establish the color of the line based on color space information indicating a pre-established relationship of correspondence between the rearward coordinate and the color…the rearward coordinate is not restricted to being associated with the line thickness or color and may be associated with a tone density or enlargement ratio. 
Shimura teaches at column 35, lines 34-50 that the line thickness can be selected in accordance with the rearward coordinate of the specified position…this guidance image shows by text and a graphic that the thickness of the displayed line can be made thicker by moving the hand related to drawing input toward the display and thinner by moving the hand toward the user).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have used the UI menu to change the line color or the line thickness of the strokes in response to recognition of the user’s gesture. One of ordinary skill in the art would have been motivated to further modify the attributes of the drawn picture through the UI elements. 
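
For illustration only, the relationship cited from Shimura at column 35, lines 34-50 (a thicker line as the hand moves toward the display, a thinner line as it moves toward the user) can be pictured as a clamped linear mapping. In the Python sketch below, the function name thickness_from_depth, the distance range and the pixel range are assumptions chosen for the example; Shimura is not cited for any particular formula.

```python
def thickness_from_depth(depth, near=0.2, far=0.8, min_px=1.0, max_px=12.0):
    """Map the hand-to-display distance to a line thickness in pixels.

    Hypothetical mapping: `depth` is the distance between hand and display
    (same unit as `near` and `far`); a hand at `near` or closer gives
    `max_px`, a hand at `far` or beyond gives `min_px`.
    """
    if far <= near:
        raise ValueError("far must be greater than near")
    clamped = max(near, min(far, depth))
    # t is 1.0 when the hand is nearest the display and 0.0 when farthest.
    t = (far - clamped) / (far - near)
    return min_px + t * (max_px - min_px)
```

A color could be selected in the same way by using t to index into a predefined color table, which corresponds to the pre-established relationship between the rearward coordinate and the color noted above.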

Claims 3-5 are rejected under 35 U.S.C. 103 as being unpatentable over Malagawa et al. US-PGPUB No. 2021/0181854 (hereinafter Malagawa) in view of Parland US-PGPUB No. 2020/0393909 (hereinafter Parland), Tadros et al. US-PGPUB No. 2022/0164097 (hereinafter Tadros), Hasegawa US-PGPUB No. 2011/0221768 (hereinafter Hasegawa), Shimura et al. US-Patent No. 10,186,057 (hereinafter Shimura) and Tanimura US-PGPUB No. 2015/0220155 (hereinafter Tanimura). 
Re Claim 3: 
Claim 3 encompasses the same scope of invention as claim 2 except for the additional claim limitation that the pre-processing includes: producing equations of lines based on coordinates of the output picture; comparing slopes of the produced equations; and changing the lines to a straight line based on the comparison result. 
Tanimura teaches the claim limitation that the pre-processing includes: producing equations of lines based on coordinates of the output picture; comparing slopes of the produced equations; and changing the lines to a straight line based on the comparison result (Tanimura teaches at Paragraph 0056-0058 that a sequence of user’s hand motions will be referred to as a trajectory and the controller 150 determines whether or not the trajectory of the user’s hand corresponds to any one of the movement patterns by checking the trajectory of the user’s hand with the feature data, and at FIGS. 7B and 8C and Paragraph 0093 that the movement trajectory 2 is calculated as a linear trajectory having a slope of zero based on the extracted feature data in the movement-pattern information table…a best-fit straight line (a regression line) may be obtained using the centroid coordinate data acquired in the step S103 to calculate a degree of similarity between the hand movement trajectory and the approximate straight line). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have approximated the feature data in the movement-pattern information table using a linear equation of a regression line. One of ordinary skill in the art would have been motivated to correct the drawn stroke using a straight line. 
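
For illustration only, the three recited steps (producing line equations from coordinates, comparing their slopes, and changing the lines to a straight line) can be pictured with the following minimal Python sketch, which uses an ordinary least-squares regression line of the kind Tanimura mentions at Paragraph 0093. The function names fit_line and straighten, the sliding-window size and the 10-degree tolerance are assumptions made for this sketch; neither Tanimura nor the application is cited for these particular choices.

```python
import math


def fit_line(points):
    """Least-squares fit of y = m*x + b to (x, y) points; returns (m, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("vertical run: slope undefined in y = m*x + b form")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def straighten(points, window=3, max_angle_deg=10.0):
    """Replace a stroke by its regression line when local slopes agree."""
    if len(points) < window:
        return list(points)
    # Step 1: produce a line equation for each sliding window of points.
    angles = []
    for i in range(len(points) - window + 1):
        m, _ = fit_line(points[i:i + window])
        angles.append(math.degrees(math.atan(m)))
    # Step 2: compare the slopes (expressed as angles so steep lines
    # are not over-weighted).
    if max(angles) - min(angles) > max_angle_deg:
        return list(points)  # slopes disagree: leave the stroke as drawn
    # Step 3: change the stroke to a single straight line.
    m, b = fit_line(points)
    return [(x, m * x + b) for x, _ in points]
```

In this sketch a slightly wobbly diagonal stroke is snapped onto one line, while a stroke with a visible bend is returned unchanged.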
Re Claim 4: 
Claim 4 encompasses the same scope of invention as claim 3 except for the additional claim limitation that the pre-processing further includes: defining a variable located on the lines; generating a new line based on the defined variable; and correcting a curve based on the generated new line and a trajectory of the defined variable.
Tanimura teaches the claim limitation that the pre-processing further includes: defining a variable located on the lines; generating a new line based on the defined variable; and correcting a curve based on the generated new line and a trajectory of the defined variable (Tanimura teaches at Paragraph 0056-0058 that a sequence of user’s hand motions will be referred to as a trajectory and the controller 150 determines whether or not the trajectory of the user’s hand corresponds to any one of the movement patterns by checking the trajectory of the user’s hand with the feature data, and at FIGS. 7B and 8C and Paragraph 0093 that the movement trajectory 2 is calculated as a linear trajectory having a slope of zero based on the extracted feature data in the movement-pattern information table…a best-fit straight line (a regression line) may be obtained using the centroid coordinate data acquired in the step S103 to calculate a degree of similarity between the hand movement trajectory and the approximate straight line. 
Tanimura shows at FIGS. 7B and 8C-8D and Paragraph 0064-0072 that the movement trajectory extractor 152 transcribes the plurality of specified centroid coordinates on the same coordinate plane and links the plurality of transcribed centroid coordinates…thereby extracting the trajectory of the human hand…extracts the movement trajectory that has a movement pattern matching the feature data among the extracted trajectories by checking the extracted trajectory information with the feature data contained in the movement pattern information table…two movement trajectories of the movement pattern that match the feature of the feature data 2. Accordingly, the new trajectory 2 is generated based on the linear regression variable. 
Tanimura teaches that a movement trajectory 2 is calculated as in the leftward/rightward linear movement after the controller 150 analyzes image data that is picked up by the imager 130 and stored in the image memory and determines to which one of the plurality of pieces of combination information stored in the command table the trajectory of the user’s hand corresponds). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have approximated the feature data in the movement-pattern information table using a linear equation of a regression line. One of ordinary skill in the art would have been motivated to correct the drawn stroke using a straight line. 
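
For illustration only, one possible reading of the recited steps (a variable located on the lines, a new line generated from that variable, and a curve corrected from the trajectory) resembles the well-known de Casteljau construction of a quadratic curve between two line segments. The Python sketch below shows that construction; treating the claim language this way is an assumption made for the example, and the function names lerp and smooth_corner are hypothetical. Nothing here is attributed to Tanimura or to the application.

```python
def lerp(p, q, t):
    """Point at fraction t of the way from p to q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def smooth_corner(p0, p1, p2, steps=8):
    """Replace the corner p0-p1-p2 by a smooth quadratic curve.

    For each value of the variable t in [0, 1]:
      a moves along the first line p0-p1,
      b moves along the second line p1-p2,
      the new line a-b is formed, and the point at fraction t of that
      new line is recorded. The recorded points trace the curve.
    """
    curve = []
    for i in range(steps + 1):
        t = i / steps
        a = lerp(p0, p1, t)
        b = lerp(p1, p2, t)
        curve.append(lerp(a, b, t))
    return curve
```

The resulting curve starts at p0, ends at p2 and is pulled toward the corner point p1 without passing through it.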
Re Claim 5: 
Claim 5 encompasses the same scope of invention as claim 4 except for the additional claim limitation that the pre-processing further includes: extracting the picture layer from the output screen; and cropping the pre-processed picture from the extracted picture layer based on the hand coordinates.
Tanimura teaches the claim limitation that the pre-processing further includes: extracting the picture layer from the output screen; and cropping the pre-processed picture from the extracted picture layer based on the hand coordinates (Tanimura shows at FIGS. 7B and 8C-8D and at Paragraph 0076 cropping the coverage occupied by movement trajectory B2 or the movement trajectory 2, wherein the relative information acquirer 153 specifies a coverage occupied by the trajectory 2, for example using the ranges of the minimum rectangle, enclosed with dashed lines, enclosing the movement trajectory 2. 
Tanimura teaches at Paragraph 0056-0058 that a sequence of user’s hand motions will be referred to as a trajectory and the controller 150 determines whether or not the trajectory of the user’s hand corresponds to any one of the movement patterns by checking the trajectory of the user’s hand with the feature data, and at FIGS. 7B and 8C and Paragraph 0093 that the movement trajectory 2 is calculated as a linear trajectory having a slope of zero based on the extracted feature data in the movement-pattern information table…a best-fit straight line (a regression line) may be obtained using the centroid coordinate data acquired in the step S103 to calculate a degree of similarity between the hand movement trajectory and the approximate straight line). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have approximated the feature data in the movement-pattern information table using a linear equation of a regression line. One of ordinary skill in the art would have been motivated to correct the drawn stroke using a straight line. 
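
For illustration only, the minimum enclosing rectangle cited from Tanimura at Paragraph 0076 and the recited cropping based on the hand coordinates can be pictured with the following minimal Python sketch. The picture layer is modeled as a list of pixel rows; the function names bounding_box and crop_layer and the optional margin are assumptions made for this sketch and are not taken from Tanimura or from the application.

```python
def bounding_box(points, margin=0):
    """Smallest axis-aligned rectangle enclosing integer (x, y) points.

    Returns (left, top, right, bottom), all inclusive, grown by `margin`.
    """
    if not points:
        raise ValueError("no hand coordinates given")
    xs = [int(x) for x, _ in points]
    ys = [int(y) for _, y in points]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)


def crop_layer(layer, box):
    """Cut the rectangle `box` out of a layer given as a list of rows.

    The box is clamped to the layer so a margin cannot run off the edge.
    """
    height = len(layer)
    width = len(layer[0]) if layer else 0
    left, top, right, bottom = box
    left, top = max(0, left), max(0, top)
    right, bottom = min(width - 1, right), min(height - 1, bottom)
    return [row[left:right + 1] for row in layer[top:bottom + 1]]
```

For example, calling crop_layer(layer, bounding_box(hand_points)) keeps only the part of the layer the tracked hand actually passed over, where hand_points is a hypothetical list of tracked hand coordinates.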


Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Malagawa et al. US-PGPUB No. 2021/0181854 (hereinafter Malagawa) in view of Parland US-PGPUB No. 2020/0393909 (hereinafter Parland), Tadros et al. US-PGPUB No. 2022/0164097 (hereinafter Tadros), Grundhoefer et al. US-PGPUB No. 2020/0401804 (hereinafter Grundhoefer) and Faulkner et al. US-PGPUB No. 2021/0096726 (hereinafter Faulkner). 
Re Claim 6: 
Claim 6 encompasses the same scope of invention as claim 1 except for the additional claim limitation that generating realistic 3D content of the object from the pre-processed picture based on the deep learning model comprises: picture image learning by the deep learning model using an open graffiti data set, wherein the open graffiti data set comprises coordinate data of an image, and inputting the pre-processed picture into the deep learning model to generate the realistic 3D content of the object based on the coordinate data of the image from the open graffiti data set. 
Grundhoefer in view of Faulkner further teaches the claim limitation that generating realistic 3D content of the object from the pre-processed picture based on the deep learning model comprises: picture image learning by the deep learning model using an open graffiti data set, wherein the open graffiti data set comprises coordinate data of an image, and inputting the pre-processed picture into the deep learning model to generate the realistic 3D content of the object based on the coordinate data of the image from the open graffiti data set. 
In other words, Grundhoefer in view of Faulkner teaches each part of the claim limitation that generating realistic 3D content of the object from the pre-processed picture based on the deep learning model comprises: (i) picture image learning by the deep learning model using an open graffiti data set, (ii) wherein the open graffiti data set comprises coordinate data of an image, and (iii) inputting the pre-processed picture into the deep learning model to generate the realistic 3D content of the object based on the coordinate data of the image from the open graffiti data set. The same citations are relied upon for parts (i), (ii) and (iii) (Grundhoefer teaches at Paragraph 0069 “the sensor is used to estimate a shape, orientation, and position of the hand of an HMD user” and at Paragraph 0070-0071 that the method 500 detects a location of an object based on the image wherein the object is a hand, hand and part of an arm, a body part…the surface of the object is a 3D area corresponding to the detected object or a 3D area corresponding to the detected object…a depth sensor may be used to construct and analyze a depth map to reconstruct the HMD user’s hand and track movement and pose, e.g., position and orientation of the hand…a 3D camera can use machine learning techniques to fit a hand model to the 3D image for detecting and tracking the hand. 
It is noted Grundhoefer implicitly teaches determining the hand’s coordinates while Faulkner explicitly teaches determining the hand’s coordinates relative to a coordinate system. Faulkner teaches at Paragraph 0079 “hand tracking device 140 is controlled by hand tracking unit 243 to track the position/location of one or more portions of the user’s hands…relative to a coordinate system defined relative to the user’s hand” and at Paragraph 0082-0084 “the image sensors 404 project a pattern of spots onto a scene containing the hand 406…the controller 110 computes the 3D coordinates of points in the scene including points on the surface of the user’s hand by triangulation…the hand tracking device 440 may use other methods of 3D mapping…the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps…The pose typically includes 3D locations of the user’s hand joints and finger tips”). 

Faulkner teaches at Paragraph 0083-0084 that the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user’s hand…The software matches these descriptors of the hand to patch descriptors stored in a database 408 based on a prior learning process in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user’s hand joints and finger tips…The software may also analyze the trajectory of the hands and/or fingers over multiple frames in the sequence in order to identify gestures. Faulkner teaches at Paragraph 0209 that this upward flick gesture across the middle of the index finger using the thumb causes a currently selected user interface object to be pushed into the 3D environment and initiates an immersive experience, e.g., a 3D movie, or 3D virtual experience and at Paragraph 0259 displaying the indication of one or more interaction options available for the first object such as a 3D character. Faulkner’s hand tracking and recognition features can be readily combined with Grundhoefer’s hand tracking and recognition features so that the 3D virtual content can be displayed in association with the recognized hand gestures based on a deep learning model. 
Grundhoefer teaches at Paragraph 0039 and Paragraph 0047 that the CGR virtual content location unit 242 is configured to determine a virtual content location to place virtual content based on a detected body part or object held by a body part. The CGR presentation unit 244 is configured to present virtual content, e.g., 3D content that will be used as part of CGR environments for one or more users. 
Grundhoefer teaches at Paragraph 0060 that hand tracking functionality at the HMD supports gesture recognition…hand gesture recognition using machine learning or other image-based recognition technique is used. Grundhoefer teaches at Paragraph 0070 that the object is a hand, a hand and part of an arm, a body part, or an object held by the body part of the HMD user and at Paragraph 0071 the object, e.g., the HMD user’s hand, is detected or tracked. For example, a depth sensor may be used to construct and analyze a depth map to reconstruct the HMD user’s hand and track movement and pose of the hand…the hand can be detected and tracked using two or more cameras that generate a 3D map of the physical environment…the hand can be tracked using SLAM processes…a 2D camera can use machine learning techniques to fit a hand model to the 2D images for detecting and tracking the hand….a pre-scanned geometric representation of the HMD user’s hand can be used with a video stream of the physical environment to detect and track the hand in the video stream and at Paragraph 0073-0074 that detecting an object of a particular type, e.g., hand, and having a particular surface characteristic, e.g., flat palm visible, could be a trigger for displaying the CGR virtual content based on a surface of the object…other triggers, e.g., hand gestures, could be used to initiate the display of the CGR virtual content at the HMD user’s hand, e.g., palm facing in and circled to the right.  
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have corrected/reconstructed/fitted the shape and/or position and/or type of the drawn picture model based on the detected gestures using a trained machine learning model. One of ordinary skill in the art would have rendered modified realistic content at a location adapted to the location of the drawn picture model based on a deep learning model.  
Re Claim 7: 
Claim 7 encompasses the same scope of invention as claim 6 except for the additional claim limitation that a realistic content generating unit is used that comprises an object detection algorithm and performs a process including extracting a candidate area as a position of an object from the pre-processed picture and classifying a class of the extracted candidate area. 
Grundhoefer in view of Faulkner teaches the claim limitation that a realistic content generating unit is used that comprises an object detection algorithm and performs a process including extracting a candidate area as a position of an object from the pre-processed picture and classifying a class of the extracted candidate area (Grundhoefer teaches at Paragraph 0069 “the sensor is used to estimate a shape, orientation, and position of the hand of an HMD user” and at Paragraph 0070-0071 that the method 500 detects a location of an object based on the image wherein the object is a hand, hand and part of an arm, a body part…the surface of the object is a 3D area corresponding to the detected object or a 3D area corresponding to the detected object….a depth sensor may be used to construct and analyze a depth map to reconstruct the HMD user’s hand and track movement and pose, e.g., position and orientation of the hand…a 3D camera can use machine learning techniques to fit a hand model to the 3D image for detecting and tracking the hand. 
Faulkner teaches at Paragraph 0083-0084 that the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user’s hand…The software matches these descriptors of the hand to patch descriptors stored in a database 408 based on a prior learning process in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user’s hand joints and finger tips…The software may also analyze the trajectory of the hands and/or fingers over multiple frames in the sequence in order to identify gestures. Faulkner teaches at Paragraph 0209 that this upward flick gesture across the middle of the index finger using the thumb causes a currently selected user interface object to be pushed into the 3D environment and initiates an immersive experience, e.g., a 3D movie, or 3D virtual experience and at Paragraph 0259 displaying the indication of one or more interaction options available for the first object such as a 3D character. Faulkner’s hand tracking and recognition features can be readily combined with Grundhoefer’s hand tracking and recognition features so that the 3D virtual content can be displayed in association with the recognized hand gestures based on a deep learning model. 
Grundhoefer teaches at Paragraph 0039 and Paragraph 0047 that the CGR virtual content location unit 242 is configured to determine a virtual content location to place virtual content based on a detected body part or object held by a body part. The CGR presentation unit 244 is configured to present virtual content, e.g., 3D content that will be used as part of CGR environments for one or more users. 
Grundhoefer teaches at Paragraph 0060 that hand tracking functionality at the HMD supports gesture recognition…hand gesture recognition using machine learning or other image-based recognition technique is used. Grundhoefer teaches at Paragraph 0070 that the object is a hand, a hand and part of an arm, a body part, or an object held by the body part of the HMD user and at Paragraph 0071 the object, e.g., the HMD user’s hand, is detected or tracked. For example, a depth sensor may be used to construct and analyze a depth map to reconstruct the HMD user’s hand and track movement and pose of the hand…the hand can be detected and tracked using two or more cameras that generate a 3D map of the physical environment…the hand can be tracked using SLAM processes…a 2D camera can use machine learning techniques to fit a hand model to the 2D images for detecting and tracking the hand….a pre-scanned geometric representation of the HMD user’s hand can be used with a video stream of the physical environment to detect and track the hand in the video stream and at Paragraph 0073-0074 that detecting an object of a particular type, e.g., hand, and having a particular surface characteristic, e.g., flat palm visible, could be a trigger for displaying the CGR virtual content based on a surface of the object…other triggers, e.g., hand gestures, could be used to initiate the display of the CGR virtual content at the HMD user’s hand, e.g., palm facing in and circled to the right).   
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have corrected/reconstructed/fitted the shape and/or position and/or type of the drawn picture model based on the detected gestures using a trained machine learning model. One of ordinary skill in the art would have rendered modified realistic content at a location adapted to the location of the drawn picture model based on a deep learning model.  

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIN CHENG WANG whose telephone number is (571)272-7665. The examiner can normally be reached Mon-Fri 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached on 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JIN CHENG WANG/Primary Examiner, Art Unit 2613