Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Response to Amendment
This is in response to applicant’s amendment/response filed on 10/26/2022, which has been entered and made of record. Claims 1, 3-28, and 30-54 are pending in the application.

Response to Arguments
Applicant's arguments filed on 10/26/2022 regarding the rejection of the claims under 35 U.S.C. 103 have been fully considered, but they are not persuasive.
Applicant submits “the combination of Kim and Booth fails to describe or make obvious an apparatus including one or more processors configured to, at least, "capture a pose of a user of an extended reality system, the pose of the user including a location of the user within a portion of a real-world environment associated with the extended reality system," "capture one or more frames of the portion of the real-world environment without the user," and "overlay the digital avatar representation of the user onto the one or more frames of the portion of the real-world environment in a frame location corresponding to the location of the user within the portion of the real-world environment associated with the captured pose," as claimed.” (Remarks, Page 16, third paragraph.)
The examiner disagrees with Applicant’s premises and conclusion. The examiner understands Applicant’s concept as illustrated in Figs. 4A-4F and described in ¶¶0040-0041 of the specification. However, the independent claims as presented do not explicitly recite all of the features described in Figs. 4A-4F and ¶¶0040-0041. Under the broadest reasonable interpretation of the claims, Kim still teaches each of the claimed terms. In particular, “a portion of a real-world environment” reads on the environment in which the users capture the image, and “a frame location corresponding to the location of the user within the portion of the real-world environment associated with the captured pose” reads on the location from which the user takes the dual images. The examiner further provides the mapping table below to show how each claim limitation has been mapped.

Mapping table: claim language of 17174137 vs. Kim (US Pub 20150009349 A1)

Claim: the pose of the user including a location of the user within a portion of a real-world environment
Kim: ¶0005, “allow users to capture an event or thing of interest (taken by the rear-facing camera) while simultaneously capturing his or her own expression or reaction to the event (taken by the front-facing camera).” The location of the users where they capture the image is mapped to “a location of the user within a portion of a real-world environment.”

Claim: a digital avatar representation of the user reflecting the pose of the user
Kim: Fig. 6B. Face 610 is mapped to “a digital avatar representation.”

Claim: capture one or more frames of the portion of the real-world environment without the user
Kim: Fig. 6A. The environment of the users where they capture the image is mapped to “the portion of the real-world environment.”

Claim: overlay the digital avatar representation of the user onto the one or more frames of the portion of the real-world environment
Kim: Fig. 6C. Element 620 is overlaid onto element 615; 615 is mapped to “frames of the portion of the real-world environment.”

Claim: in a frame location corresponding to the location of the user within the portion of the real-world environment
Kim: Fig. 6C. The frame location corresponds to the location of the user, which is the same location where the user takes the dual images.

Claim: the portion of the real-world environment associated with the captured pose
Kim: The environment of the users where they capture the image is mapped to “the portion of the real-world environment.”

  


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3-5, 10-20, 22, 24-28, 30-32, 37-47, 49, and 51-53 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (US Pub 20150009349 A1) in view of Booth et al. (US Pub 20210209854 A1), further in view of Kawakami et al. (US Pub 2021/0233325 A1).

As to claim 1, Kim discloses an apparatus for capturing self-images (Figs. 6A-6C) (¶0033), comprising a memory and one or more processors coupled to the memory (¶0033), the one or more processors being configured to:
capture a pose of a user (¶0030, “the electronic device may also capture a “user” image, “foreground” scenery image or (generally) a front image, utilizing the front-facing camera of the electronic device. The front-facing camera is preferably instructed to output a part of the image that only includes desired elements of the front image, as indicated by the user, which may be, for example, a portion of the user's body (e.g., the user's face, particularly). This may be determined based on information designating a pixel-based area of interest which includes the part of the user (e.g., the user's face).”) (¶0005, “allow users to capture an event or thing of interest (taken by the rear-facing camera) while simultaneously capturing his or her own expression or reaction to the event (taken by the front-facing camera).” See also ¶0044, ¶0059, ¶¶0063-0064);
generate a digital avatar representation of the user, the digital avatar representation of the user reflecting the pose of the user (¶0064, “the user may review the composition of the front image comprising their face 405 through a live preview screen where the user's face 405 is tracked and displayed.” ¶0067, “display the preview images in user-designated sizes at user-designated positions.” Fig. 6B); 
capture one or more frames of the portion of the real-world environment without the user (Fig. 6A, ¶0077, “a rear image 600 is captured by the second image sensor 152, as depicted in FIG. 6A.”); and 
overlay the digital avatar representation of the user onto the one or more frames of the portion of the real-world environment in a frame location corresponding to the location of the user within the portion of the real-world environment associated with the captured pose (Figs. 6A-6C, ¶0076, “The ISP 156 may then create a combined dual-shot image 515 by overlaying the partial image 510 over the rear image 505.”).
Kim does not explicitly disclose an extended reality environment or an extended reality system.
Booth teaches an extended reality environment and an extended reality system (Booth, abstract, “artificial reality system can provide a user self representation in an artificial reality environment based on a self portion from an image of the user.” Fig. 2A, 2B, Fig. 3. ¶0023-0024).
Kim and Booth are considered to be analogous art because both pertain to captured images. It would have been obvious before the effective filing date of the claimed invention to have modified Kim with the features of “an extended reality environment and an extended reality system” as taught by Booth. The suggestion/motivation would have been that the process of capturing images, applying a machine learning model to extract a self portion, and displaying the self portion as a self representation can be performed with significantly less computing power than that required by existing systems to track part of a user, map determined body positions into a virtual space, and render an avatar positioned according to the determined body positions (Booth, ¶0026), and that artificial reality systems provide users the ability to experience different worlds, learn in new ways, and make better connections with others (Booth, ¶0002).
Regarding “a digital avatar representation of the user,” the examiner maintains that Kim’s face image can be interpreted as “a digital avatar representation of the user.” However, to the extent “avatar” is given a narrower meaning, Kawakami is relied upon as teaching this limitation.
Kawakami teaches “a digital avatar representation of the user” (Kawakami, Fig. 3, ¶0034, “The video synthesis device 1 fixes the avatar 100 to a coordinate system that corresponds to the real space and synthesizes the avatar 100 with the video image.”).
Kim, Booth and Kawakami are considered to be analogous art because all pertain to captured images. It would have been obvious before the effective filing date of the claimed invention to have modified Kim with the features of “a digital avatar representation of the user” as taught by Kawakami. The suggestion/motivation would have been that, by using face-tracking technology to synthesize a CG character into a selfie video image, a CG character reflecting the user's own facial expression can easily be synthesized in real time with a live-action video image (Kawakami, ¶0004).

As to claim 3, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to generate the digital avatar representation of the user before capturing the one or more frames of the portion of the real-world environment (Kim, Fig. 5, ¶0061, “Dual-shot mode refers to the capture of two images simultaneously or sequentially with two different image sensors”; when the images are captured sequentially, it would have been obvious to capture the front image before the rear image. Kim, ¶0064, “taking dual-shot photos begins instead with automatic recognition of the user's face 405, which is then previewed as a real-time image. This enables the user to more easily decide whether to capture the image without altering their own position.” Booth, ¶0017, “An artificial reality system can generate the self representation by capturing images of the user, in real time, and applying a machine learning model to classify a self portion of each of the images. The artificial reality system can display a version of the self portions as a self representation in the artificial reality environment by positioning the version in the artificial reality environment relative to the user's perspective view into the artificial reality environment.”).

As to claim 4, claim 3 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to display, within a display of the extended reality system through which the real-world environment is visible, the digital avatar representation of the user in the frame location corresponding to the location of the user within the portion of the real-world environment (Kim, ¶0030, “The electronic device may then display a dual-shot preview screen by overlaying the front image over the rear image.” ¶0059, “a combined dual-shot image is created by combining a front image and a rear image captured by the first and second image sensors 151 and 152, respectively. The rear image is obtained by capturing a desired image using the second image sensor 152 of the electronic device 100. The front image is created by capturing a subject using the first image sensor 151 of the electronic device 100. For example, the first image sensor 151 may be configured to capture an image of the user who is holding the electronic device 100, and the second image sensor 152 may be configured to capture whatever the user is currently looking at.”).

As to claim 5, claim 4 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to: detect user input corresponding to an instruction to capture the one or more frames of the portion of the real-world environment while the digital avatar representation of the user is displayed within the display of the extended reality system; and capture the one or more frames of the portion of the real-world environment based on the user input (Kim, ¶0030, “The front-facing camera is preferably instructed to output a part of the image that only includes desired elements of the front image, as indicated by the user, which may be, for example, a portion of the user's body (e.g., the user's face, particularly). This may be determined based on information designating a pixel-based area of interest which includes the part of the user (e.g., the user's face). The electronic device may then display a dual-shot preview screen by overlaying the front image over the rear image.” ¶0060, “when producing a dual-shot image, upon receiving the appropriate user input, the controller 110 may instruct the first image sensor 151 to capture the front image for (in this case a picture of the user), set a pixel-area of interest that includes only the user's body from among the entire pixel-area of the captured image (which may be the user's face, hands, etc.), and then output an image corresponding to the pixel-area of interest. The controller 110 may also instruct the second image sensor 152 to capture a rear image (presumably of some object, person or landscape of interest) and output the rear image. Finally, the controller 110 may combine the rear image and the pixel-area of interest from the front image.” ¶0064, “taking dual-shot photos begins instead with automatic recognition of the user's face 405, which is then previewed as a real-time image. This enables the user to more easily decide whether to capture the image without altering their own position.” ¶0067, “The user may analyze the first and second preview images to decide when to take a dual-shot photo.”).

As to claim 10, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to: generate a first digital avatar representation of the user of a first fidelity (Kim, ¶0075, “Once the controller 110 recognizes the subject portion based on features from the received pixel information, the controller 110 may instruct the first image sensor 151 to output a partial image based on information regarding the position of the subject part. The first image sensor 151 preferably then outputs an image signal 510 based on the instruction corresponding only to the pixel-area of interest. Thus, the first image sensor 151 can output a live preview image output from a part of the full range of pixels.”); and obtain a second digital avatar representation of the user of a second fidelity, wherein the second fidelity is higher than the first fidelity (Kim, ¶0075, “after recognition of the subject portion, the controller 110 may determine whether a change in the position of the subject portion has been made within the full range pixel-area of the first image sensor 151 in order to display a real-time preview screen through tracking of the subject portion. If the controller 110 determines that a change of the position of the subject portion has been made, the controller 110 may modify the pixel-area of interest so that the desired subject portion continues to be present in the partial image. The controller 110 thereafter instructs the first image sensor 151 to output a partial image 510 consisting of only the modified pixel-area of interest.” Modifying the pixel-area of interest “so that the desired subject portion continues to be present in the partial image” is interpreted as producing a representation of higher fidelity.).

As to claim 11, claim 10 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to: display the first digital avatar representation of the user within a display of the extended reality system before the pose of the user is captured (Kim, ¶0075, “Once the controller 110 recognizes the subject portion based on features from the received pixel information, the controller 110 may instruct the first image sensor 151 to output a partial image based on information regarding the position of the subject part. The first image sensor 151 preferably then outputs an image signal 510 based on the instruction corresponding only to the pixel-area of interest. Thus, the first image sensor 151 can output a live preview image output from a part of the full range of pixels.”); 
generate the second digital avatar representation of the user based on the pose of the user being captured (Kim, ¶0075, “after recognition of the subject portion, the controller 110 may determine whether a change in the position of the subject portion has been made within the full range pixel-area of the first image sensor 151 in order to display a real-time preview screen through tracking of the subject portion. If the controller 110 determines that a change of the position of the subject portion has been made, the controller 110 may modify the pixel-area of interest so that the desired subject portion continues to be present in the partial image. The controller 110 thereafter instructs the first image sensor 151 to output a partial image 510 consisting of only the modified pixel-area of interest.”); and 
overlay the second digital avatar representation of the user onto the one or more frames of the real-world environment (Kim, ¶0076, “The ISP 156 may then create a combined dual-shot image 515 by overlaying the partial image 510 over the rear image 505.”).

As to claim 12, claim 10 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to: display the first digital avatar representation of the user within a display of the extended reality system before the one or more frames of the portion of the real-world environment are captured; generate the second digital avatar representation of the user based on the one or more frames of the portion of the real-world environment being captured; and overlay the second digital avatar representation of the user onto the one or more frames of the portion of the real-world environment (Kim, ¶0069, “ISP 156 may receive real-time images captured by the image sensor module 150, process the images to fit desired visual characteristics (size, quality, resolution, etc.), and display the processed images.” ¶0076, “The present invention may also be applied to real-time images or real-time image frames that are sequentially inputted or read. The touch screen 190 may display the combined dual-shot image 515 created by the ISP 156. The method and apparatus may also be utilized when the user desires to create dual-shot images using sequentially-captured photographs. In this case, the electronic device 100 may be configured such that the second image sensor 152 first takes a picture of a subject in focus, and, after a predetermined time lag, the first image sensor 151 then takes a subject in focus (or vice versa). The images may then be overlaid automatically, creating a dual-shot image.” Figs. 6A-6C, ¶0077. See also Booth, ¶0018, “The artificial reality system can merge the contemporaneously captured images into a single image and adjust them to be from the user's perspective.” Booth, ¶0022, “The artificial reality system can also identify movements of the user, e.g., by tracking a controller or a body part of the user. Based on this movement, instead of having to capture a new self portion of the user and create a new self representation, the artificial reality system can adjust the self representation to match the user's movement. This can provide more accurate self representations. For example, a controller may be able to report its position to an artificial reality system headset more quickly than the artificial reality system can capture images and create a new self representation. By warping the existing self representation to match the movement until a new self representation can be created from more current captured self portions of images, the artificial reality system can keep the self representation spatially accurate according to the user's body position.”).

As to claim 13, claim 10 is incorporated and the combination of Kim, Kawakami and Booth discloses the first digital avatar representation is based on a first machine learning algorithm and the second digital avatar representation of the user is based on a second machine learning algorithm (Kim, ¶0074, “The desired subject portion may be recognized algorithmically utilizing, for example, image pattern algorithms. The controller 100 may set the pixel-area of interest based on such known pattern information, which may comprise image patterns of faces, hands, etc. The controller 110 may alternatively recognize the subject portion based on a number of “learning model” algorithms.” ¶0075, “the controller 110 may instruct the first image sensor 151 to output a partial image based on information regarding the position of the subject part.” Kim's reference to a number of “learning model” algorithms suggests first and second machine learning algorithms. See also Booth, ¶0019, “The artificial reality system can identify the self portion by applying a machine learning model to the image. This machine learning model can be of various types such as a type of neural network, a support vector machine, Bayes classifier, decision tree, etc. The machine learning model can be trained to identify self portions in images based on a set of training images, with portions (e.g., set areas, pixels, etc.) tagged as either depicting a user from a self-perspective or not.”).

As to claim 14, claim 13 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to: generate the first digital avatar representation of the user based on implementing the first machine learning algorithm on the extended reality system (Kim, ¶0074, “The desired subject portion may be recognized algorithmically utilizing, for example, image pattern algorithms. The controller 100 may set the pixel-area of interest based on such known pattern information, which may comprise image patterns of faces, hands, etc. The controller 110 may alternatively recognize the subject portion based on a number of “learning model” algorithms.” ¶0075, “the controller 110 may instruct the first image sensor 151 to output a partial image based on information regarding the position of the subject part.”); and 
cause a server configured to generate digital avatar representations of users to generate the second digital avatar representation of the user based on implementing the second machine learning algorithm (see Booth, ¶0019, “The artificial reality system can identify the self portion by applying a machine learning model to the image. This machine learning model can be of various types such as a type of neural network, a support vector machine, Bayes classifier, decision tree, etc. The machine learning model can be trained to identify self portions in images based on a set of training images, with portions (e.g., set areas, pixels, etc.) tagged as either depicting a user from a self-perspective or not.”).

As to claim 15, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to: capture a pose of a person within the portion of the real-world environment; generate a digital avatar representation of the person, the digital avatar representation of the person reflecting the pose of the person; and overlay the digital avatar representation of the user and the digital avatar representation of the person onto the one or more frames of the portion of the real-world environment (Kim, ¶0075, “Once the controller 110 recognizes the subject portion based on features from the received pixel information, the controller 110 may instruct the first image sensor 151 to output a partial image based on information regarding the position of the subject part. The first image sensor 151 preferably then outputs an image signal 510 based on the instruction corresponding only to the pixel-area of interest. Thus, the first image sensor 151 can output a live preview image output from a part of the full range of pixels. In one embodiment of the present invention, after recognition of the subject portion, the controller 110 may determine whether a change in the position of the subject portion has been made within the full range pixel-area of the first image sensor 151 in order to display a real-time preview screen through tracking of the subject portion. If the controller 110 determines that a change of the position of the subject portion has been made, the controller 110 may modify the pixel-area of interest so that the desired subject portion continues to be present in the partial image. The controller 110 thereafter instructs the first image sensor 151 to output a partial image 510 consisting of only the modified pixel-area of interest.” ¶0076, “The outputted partial image may then be received by the ISP 156. The ISP 156 may then create a combined dual-shot image 515 by overlaying the partial image 510 over the rear image 505.”).

As to claim 16, claim 15 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to generate the digital avatar representation of the person based at least in part on information associated with the digital avatar representation of the person received from an extended reality system of the person (Booth, ¶0019, “The artificial reality system can identify the self portion by applying a machine learning model to the image.” ¶0020, “The artificial reality system can use the classifications from the machine learning model to create a mask, which then can be applied to the original image to extract the self portion from the image.” ¶0021, “The artificial reality system can then identify a self portion of the image that depicts part of the user's torso, hands, arms, legs and feet by applying a trained machine learning model. The area of the identified self portion can be used as a mask to extract the self portion from the image. The artificial reality system can then display the extracted self portion in the artificial reality system relative to the user's point of view, thus allowing the user to see a self representation showing her real-world torso, hands, arms, legs and feet in the artificial reality environment.”).

As to claim 17, claim 16 is incorporated and the combination of Kim, Kawakami and Booth discloses the information associated with the digital avatar representation of the person includes a machine learning model trained to generate digital avatar representations of the person (Booth, ¶0019, “The artificial reality system can identify the self portion by applying a machine learning model to the image.” ¶0020, “The artificial reality system can use the classifications from the machine learning model to create a mask, which then can be applied to the original image to extract the self portion from the image.” ¶0021, “The artificial reality system can then identify a self portion of the image that depicts part of the user's torso, hands, arms, legs and feet by applying a trained machine learning model. The area of the identified self portion can be used as a mask to extract the self portion from the image. The artificial reality system can then display the extracted self portion in the artificial reality system relative to the user's point of view, thus allowing the user to see a self representation showing her real-world torso, hands, arms, legs and feet in the artificial reality environment.”).

As to claim 18, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to: capture a plurality of poses of the user associated with a plurality of frames; generate a plurality of digital avatar representations of the user corresponding to the plurality of frames; and overlay the plurality of digital avatar representations of the user onto the one or more frames of the portion of the real-world environment, the one or more frames of the portion of the real-world environment including a plurality of frames of the portion of the real-world environment (Booth, ¶0022, “The artificial reality system can also identify movements of the user, e.g., by tracking a controller or a body part of the user. Based on this movement, instead of having to capture a new self portion of the user and create a new self representation, the artificial reality system can adjust the self representation to match the user's movement. This can provide more accurate self representations. For example, a controller may be able to report its position to an artificial reality system headset more quickly than the artificial reality system can capture images and create a new self representation. By warping the existing self representation to match the movement until a new self representation can be created from more current captured self portions of images, the artificial reality system can keep the self representation spatially accurate according to the user's body position.” See also ¶0052, ¶0074.).

As to claim 19, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to: generate the digital avatar representation of the user using a first machine learning algorithm (Kim, ¶0074, “The desired subject portion may be recognized algorithmically utilizing, for example, image pattern algorithms. The controller 100 may set the pixel-area of interest based on such known pattern information, which may comprise image patterns of faces, hands, etc. The controller 110 may alternatively recognize the subject portion based on a number of “learning model” algorithms.” ¶0075, “the controller 110 may instruct the first image sensor 151 to output a partial image based on information regarding the position of the subject part.”); and 
overlay the digital avatar representation of the user onto the one or more frames of the portion of the real-world environment using a second machine learning algorithm (Booth, ¶0019, “The artificial reality system can identify the self portion by applying a machine learning model to the image.” ¶0020, “The artificial reality system can use the classifications from the machine learning model to create a mask, which then can be applied to the original image to extract the self portion from the image.” ¶0021, “The artificial reality system can then identify a self portion of the image that depicts part of the user's torso, hands, arms, legs and feet by applying a trained machine learning model. The area of the identified self portion can be used as a mask to extract the self portion from the image. The artificial reality system can then display the extracted self portion in the artificial reality system relative to the user's point of view, thus allowing the user to see a self representation showing her real-world torso, hands, arms, legs and feet in the artificial reality environment.” See also ¶0052, ¶0060.).

As to claim 20, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to capture the pose of the user based at least in part on image data captured by an inward-facing camera system of the extended reality system (Kim, Fig. 2, Fig. 4, ¶0013, “A desired subject part in a front image captured by a first image sensor is algorithmically recognized. An area of pixels of interest is set depending on where the desired subject part is recognized in the front image.”).

As to claim 22, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to capture the pose of the user based at least in part on determining a gesture of the user (Kim, ¶0053, “The controller 110 may also detect various user inputs received through other modules, such as the image sensor module 150, the input/output module 160, the sensor module 170, etc. The user inputs may include different forms of information entered into the electronic device 100, such as touches, user gestures, voice, pupil movements, vital signs, etc.”).

As to claim 24, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the one or more processors are configured to capture the one or more frames of the portion of the real-world environment using an outward-facing camera system of the extended reality system (Kim, Fig. 3, Fig. 6A, ¶0021, “a rear image captured by a rear-facing camera”).

As to claim 25, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the apparatus includes the extended reality system (Booth, ¶0023).

As to claim 26, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the apparatus includes a mobile device (Kim, Fig. 2, Fig. 3).

As to claim 27, claim 1 is incorporated and the combination of Kim, Kawakami and Booth discloses the apparatus further comprising a display (Kim, Fig. 2, ¶0031).

As to claim 28, the combination of Kim, Kawakami and Booth discloses a method for capturing self-images in extended reality environments, the method comprising: capturing a pose of a user of an extended reality system, the pose of the user including a location of the user within a portion of a real-world environment associated with the extended reality system; generating a digital avatar representation of the user, the digital avatar representation of the user reflecting the pose of the user; capturing one or more frames of the portion of the real-world environment without the user; and overlaying the digital avatar representation of the user onto the one or more frames of the portion of the real-world environment in a frame location corresponding to the location of the user within the portion of the real-world environment associated with the captured pose (See claim 1 for detailed analysis.).

As to claim 30, claim 28 is incorporated and the combination of Kim, Kawakami and Booth discloses generating the digital avatar representation of the user is performed before capturing the one or more frames of the portion of the real-world environment (See claim 3 for detailed analysis.).

As to claim 31, claim 30 is incorporated and the combination of Kim, Kawakami and Booth discloses displaying, within a display of the extended reality system through which the real-world environment is visible, the digital avatar representation of the user in the frame location corresponding to the location of the user within the portion of the real-world environment (See claim 4 for detailed analysis.).

As to claim 32, claim 31 is incorporated and the combination of Kim, Kawakami and Booth discloses capturing the one or more frames of the portion of the real-world environment further comprises: detecting user input corresponding to an instruction to capture the one or more frames of the portion of the real-world environment while the digital avatar representation of the user is displayed within the display of the extended reality system; and capturing the one or more frames of the portion of the real-world environment based on the user input (See claim 5 for detailed analysis.).

As to claim 37, claim 28 is incorporated and the combination of Kim, Kawakami and Booth discloses generating the digital avatar representation of the user includes: generating a first digital avatar representation of the user of a first fidelity; and obtaining a second digital avatar representation of the user of a second fidelity, wherein the second fidelity is higher than the first fidelity (See claim 10 for detailed analysis.).

As to claim 38, claim 37 is incorporated and the combination of Kim, Kawakami and Booth discloses displaying the first digital avatar representation of the user within a display of the extended reality system before the pose of the user is captured; generating the second digital avatar representation of the user based on the pose of the user being captured; and overlaying the second digital avatar representation of the user onto the one or more frames of the portion of the real-world environment (See claim 11 for detailed analysis.).

As to claim 39, claim 37 is incorporated and the combination of Kim, Kawakami and Booth discloses displaying the first digital avatar representation of the user within a display of the extended reality system before the one or more frames of the portion of the real-world environment are captured; generating the second digital avatar representation of the user based on the one or more frames of the portion of the real-world environment being captured; and overlaying the second digital avatar representation of the user onto the one or more frames of the real-world environment (See claim 12 for detailed analysis.).

As to claim 40, claim 37 is incorporated and the combination of Kim, Kawakami and Booth discloses the first digital avatar representation is based on a first machine learning algorithm and the second digital avatar representation of the user is based on a second machine learning algorithm (See claim 13 for detailed analysis.).

As to claim 41, claim 40 is incorporated and the combination of Kim, Kawakami and Booth discloses generating the first digital avatar representation of the user includes implementing the first machine learning algorithm on the extended reality system; and obtaining the second digital avatar representation of the user includes causing a server configured to generate digital avatar representations of users to generate the second digital avatar representation of the user based on implementing the second machine learning algorithm (See claim 14 for detailed analysis.).

As to claim 42, claim 28 is incorporated and the combination of Kim, Kawakami and Booth discloses capturing a pose of a person within the portion of the real-world environment; generating a digital avatar representation of the person, the digital avatar representation of the person reflecting the pose of the person; and overlaying the digital avatar representation of the user and the digital avatar representation of the person onto the one or more frames of the portion of the real-world environment (See claim 15 for detailed analysis.).

As to claim 43, claim 42 is incorporated and the combination of Kim, Kawakami and Booth discloses the digital avatar representation of the person is generated based at least in part on information associated with the digital avatar representation of the person received from an extended reality system of the person (See claim 16 for detailed analysis.).

As to claim 44, claim 43 is incorporated and the combination of Kim, Kawakami and Booth discloses the information associated with the digital avatar representation of the person includes a machine learning model trained to generate digital avatar representations of the person (See claim 17 for detailed analysis.).

As to claim 45, claim 28 is incorporated and the combination of Kim, Kawakami and Booth discloses capturing a plurality of poses of the user associated with a plurality of frames; generating a plurality of digital avatar representations of the user corresponding to the plurality of frames; and overlaying the plurality of digital avatar representations of the user onto the one or more frames of the portion of the real-world environment, the one or more frames of the portion of the real-world environment including a plurality of frames of the portion of the real-world environment (See claim 18 for detailed analysis.).

As to claim 46, claim 28 is incorporated and the combination of Kim, Kawakami and Booth discloses generating the digital avatar representation of the user includes using a first machine learning algorithm; and overlaying the digital avatar representation of the user onto the one or more frames of the portion of the real-world environment includes using a second machine learning algorithm (See claim 19 for detailed analysis.).

As to claim 47, claim 28 is incorporated and the combination of Kim, Kawakami and Booth discloses capturing the pose of the user includes capturing image data using an inward-facing camera system of the extended reality system (See claim 20 for detailed analysis.).

As to claim 49, claim 28 is incorporated and the combination of Kim, Kawakami and Booth discloses capturing the pose of the user includes determining a gesture of the user (See claim 22 for detailed analysis.).

As to claim 51, claim 28 is incorporated and the combination of Kim, Kawakami and Booth discloses capturing the one or more frames of the portion of the real-world environment includes capturing image data using an outward-facing camera system of the extended reality system (See claim 24 for detailed analysis.).

As to claim 52, the combination of Kim, Kawakami and Booth discloses a non-transitory computer-readable storage medium for capturing self-images in extended reality environments, the non-transitory computer-readable storage medium comprising: instructions stored therein which, when executed by one or more processors, cause the one or more processors to: capture a pose of a user of an extended reality system, the pose of the user including a location of the user within a portion of a real-world environment associated with the extended reality system; generate a digital avatar representation of the user, the digital avatar representation of the user reflecting the pose of the user; capture one or more frames of the portion of the real-world environment without the user; and overlay the digital avatar representation of the user onto the one or more frames of the portion of the real-world environment in a frame location corresponding to the location of the user within the portion of the real-world environment associated with the captured pose (See claim 1 for detailed analysis.).

As to claim 53, claim 52 is incorporated and the combination of Kim, Kawakami and Booth discloses generating the digital avatar representation of the user before capturing the one or more frames of the portion of the real-world environment (See claim 3 for detailed analysis.).

Claims 6-9, 33-36, 54 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (US Pub 20150009349 A1) in view of Booth et al. (US Pub 20210209854 A1), Kawakami et al. (US Pub 2021/0233325 A1) and Tan (US Pub 20090175609 A1).

As to claim 6, claim 1 is incorporated and the combination of Kim, Kawakami and Booth does not explicitly disclose the one or more processors are configured to capture the one or more frames of the portion of the real-world environment before capturing the pose of the user.
Tan teaches capturing the one or more frames of the portion of the real-world environment before capturing the pose of the user (Tan, Fig. 5, abstract, ¶0061-0062, ¶0072-0074).
Kim, Kawakami, Tan and Booth are considered to be analogous art because all pertain to captured images. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kim with the features of “capture the one or more frames of the portion of the real-world environment before capturing the pose of the user” as taught by Tan. The claim would have been obvious because a particular known technique was recognized as part of the ordinary capabilities of one skilled in the art.

As to claim 7, claim 6 is incorporated and the combination of Kim, Kawakami, Tan and Booth discloses the one or more processors are configured to display, within a display of the extended reality system on which the one or more frames of the portion of the real-world environment are displayed, the digital avatar representation of the user in a display location corresponding to the location of the user within the portion of the real-world environment (Kim, ¶0030, ¶0059).

As to claim 8, claim 7 is incorporated and the combination of Kim, Kawakami, Tan and Booth discloses the one or more processors are configured to update the frame location of the digital avatar representation of the user based on detecting a change in the location of the user within the portion of the real-world environment (Booth, ¶0017, “An artificial reality system can generate the self representation by capturing images of the user, in real time, and applying a machine learning model to classify a self portion of each of the images. The artificial reality system can display a version of the self portions as a self representation in the artificial reality environment by positioning the version in the artificial reality environment relative to the user's perspective view into the artificial reality environment.” ¶0022, “Based on this movement, instead of having to capture a new self portion of the user and create a new self representation, the artificial reality system can adjust the self representation to match the user's movement. This can provide more accurate self representations. For example, a controller may be able to report its position to an artificial reality system headset more quickly than the artificial reality system can capture images and create a new self representation. By warping the existing self representation to match the movement until a new self representation can be created from more current captured self portions of images, the artificial reality system can keep the self representation spatially accurate according to the user's body position.”).

As to claim 9, claim 7 is incorporated and the combination of Kim, Kawakami, Tan and Booth discloses the one or more processors are further configured to: detect user input corresponding to an instruction to capture the pose of the user while the digital avatar representation of the user is displayed within the display of the extended reality system; and capture the pose of the user based on the user input (Booth, ¶0022, “The artificial reality system can also identify movements of the user, e.g., by tracking a controller or a body part of the user. Based on this movement, instead of having to capture a new self portion of the user and create a new self representation, the artificial reality system can adjust the self representation to match the user's movement. This can provide more accurate self representations. For example, a controller may be able to report its position to an artificial reality system headset more quickly than the artificial reality system can capture images and create a new self representation. By warping the existing self representation to match the movement until a new self representation can be created from more current captured self portions of images, the artificial reality system can keep the self representation spatially accurate according to the user's body position.”).

As to claim 33, claim 28 is incorporated and the combination of Kim, Kawakami, Tan and Booth discloses capturing the one or more frames of the real-world environment is performed before capturing the pose of the user (See claim 6 for detailed analysis.).

As to claim 34, claim 33 is incorporated and the combination of Kim, Kawakami, Tan and Booth discloses displaying, within a display of the extended reality system on which the one or more frames of the portion of the real-world environment are displayed, the digital avatar representation of the user in a display location corresponding to the location of the user within the portion of the real-world environment (See claim 7 for detailed analysis.).

As to claim 35, claim 34 is incorporated and the combination of Kim, Kawakami, Tan and Booth discloses updating the display location of the digital avatar representation of the user based on detecting a change in the location of the user within the portion of the real-world environment (See claim 8 for detailed analysis.).

As to claim 36, claim 34 is incorporated and the combination of Kim, Kawakami, Tan and Booth discloses capturing the pose of the user of the extended reality system further comprises: detecting user input corresponding to an instruction to capture the pose of the user while the digital avatar representation of the user is displayed within the display of the extended reality system; and capturing the pose of the user based on the user input (See claim 9 for detailed analysis.).

As to claim 54, claim 52 is incorporated and the combination of Kim, Kawakami, Tan and Booth discloses capture the one or more frames of the portion of the real-world environment before capturing the pose of the user (See claim 6 for detailed analysis.).

Claims 21, 48 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (US Pub 20150009349 A1) in view of Booth et al. (US Pub 20210209854 A1), Kawakami et al. (US Pub 2021/0233325 A1) and Smith et al. (US Pub 2020/0327378 A1).

As to claim 21, claim 1 is incorporated and the combination of Kim, Kawakami and Booth teaches the one or more processors are configured to capture the pose of the user based at least in part on input of the user (Kim, ¶0053, “The controller 110 may also detect various user inputs received through other modules, such as the image sensor module 150, the input/output module 160, the sensor module 170, etc. The user inputs may include different forms of information entered into the electronic device 100, such as touches, user gestures, voice, pupil movements, vital signs, etc.” Kim, ¶0064, “taking dual-shot photos begins instead with automatic recognition of the user's face 405, which is then previewed as a real-time image.”).
The combination of Kim, Kawakami and Booth does not explicitly disclose determining an expression of the user.
Smith teaches capturing the pose of the user based at least in part on determining an expression of the user (Smith, ¶0195, “The system or auxiliary systems may create digital products (such as memes or synopses) that are based on user instruction or are automatically recapped based on reactions. Memorable moments may be captured with or without user's reactions in an image, such as a heartbeat monitor during a scary scene of a horror movie and/or facial expression overlaid on character.”).
Kim, Kawakami, Smith and Booth are considered to be analogous art because all pertain to captured images. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kim with the features of “determining an expression of the user” as taught by Smith. The suggestion/motivation would have been to automatically generate recaps based on user reactions (Smith, ¶0195).

As to claim 48, claim 28 is incorporated and the combination of Kim, Kawakami, Smith and Booth discloses capturing the pose of the user includes determining an expression of the user (See claim 21 for detailed analysis.).

Claims 23, 50 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (US Pub 20150009349 A1) in view of Booth et al. (US Pub 20210209854 A1), Kawakami et al. (US Pub 2021/0233325 A1) and Dascola et al. (US Pub 2022/0214743 A1).

As to claim 23, claim 1 is incorporated and the combination of Kim, Kawakami and Booth does not explicitly disclose determining the location of the user within the real-world environment based at least in part on generating a three-dimensional map of the real-world environment.
Dascola teaches determining the location of the user within the real-world environment based at least in part on generating a three-dimensional map of the real-world environment (Dascola, ¶0092, “generate an XR map (e.g., a 3D map of the mixed reality scene or a map of the physical environment into which computer generated objects can be placed to generate the extended reality) based on media content data.” ¶0100, “Software running on a processor in the image sensors 404 and/or the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps. The software matches these descriptors to patch descriptors stored in a database 408, based on a prior learning process, in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user's hand joints and finger tips.”).
Kim, Kawakami, Dascola and Booth are considered to be analogous art because all pertain to captured images. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kim with the features of “determine the location of the user within the real-world environment based at least in part on generating a three-dimensional map of the real-world environment” as taught by Dascola. The suggestion/motivation would have been to estimate the pose of the user in each frame (Dascola, ¶0100).

As to claim 50, claim 28 is incorporated and the combination of Kim, Kawakami, Dascola and Booth discloses determining the location of the user within the real-world environment based at least in part on generating a three-dimensional map of the real-world environment (See claim 23 for detailed analysis.).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YU CHEN whose telephone number is (571)270-7951.  The examiner can normally be reached on M-F 8-5 PST Mid-day flex.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at 571-270-7951.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/YU CHEN/
Primary Examiner, Art Unit 2613