DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Applicant’s arguments and amendment filed on 02/11/2022 have been entered and reviewed. Accordingly, the action is made final.
Claim status:
Claims 1-14 are pending.
Claims 1 and 9 are amended.
No claim is new.
No claim is cancelled.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-4, 9 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al. (US Patent Publication: 20140184644, “Sharma”) in view of Russell et al. (US Patent Publication: 20170262967, “Russell”).


Regarding claim 1, Sharma teaches a non-transitory computer readable medium ([0057] and memory 105m at Fig. 10) storing instructions for executing a process for a device comprising a camera (element 110 at Fig. 10) and a processor (element 105p at Fig. 10), the instructions comprising:
detecting a target image from images received from the camera; (Sharma, “[0027] FIG. 1 illustrates a mobile device 100 capable of detecting moveable foreground object in images captured by a camera 110 without depth information. The foreground object may be, e.g., a finger 112 or hand of a user 111, or any other desired object that is not attached to the imaged scene. The foreground object may be tracked in subsequently captured images. It should be understood that the mobile device 100 may capture a video stream of the scene, where the video stream is composed of a plurality of frames or images. Accordingly, captured images, as used herein, should be understood to include individual images as well as frames of video. [0028] Mobile device 100 is shown in FIG. 1 in landscape mode imaging a scene 102 that includes a table 104 with a target 106 and a foreground object 112 in the form of a finger of the user 111.”)
upon initiation of a recording of the target image, generating a mask on gestures made to the target image screen detected from the images received from the camera; (Sharma [0033] and [0049] generate a mask based on a gesture: “[0033] … If desired, but not necessarily, a mask of the foreground object may be generated based on the comparison of the image and the reference image, and the foreground object may be segmented from the image using the mask.”)
While Sharma teaches detecting a target image from the images captured by a camera, it does not teach that the target image is the device screen (i.e., detecting a device screen from images received from the camera), nor does it teach generating perspective corrected frames of the detected device screen from the images received from the camera.
However, Russell teaches detecting a device screen from images received from the camera and generating perspective corrected frames of the detected device screen from the images received from the camera. (Russell, paragraphs [0007] and [0024], detects a display screen from the camera images and then performs perspective correction of the detected device screen: “[0024] … FIG. 2 is a conceptual illustration of a device having a curved display screen 130 displaying a test pattern 210, according to various embodiments of the present invention. As shown, the test pattern 210 may be generated by the perspective correction application 112 and outputted to the curved display screen 130, which may be coupled to and/or integrated with the computing device 100.”)
Sharma and Russell are analogous art, as both are from the field of image processing.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma to include detection of a device screen, extending Sharma’s generation of AR images to screen images for wider applicability, and to include generating perspective corrected frames of the detected device screen as taught by Russell, for the purpose of accounting for the effect of camera movement during image capture.
Sharma as modified by Russell teaches generating a mask on gestures made to the target image screen detected from the images received from the camera, wherein the mask is based on one or more frames from the recording. (Sharma [0033] and [0049] teach generating a mask on the target image, and this teaching is applied to the detected device screen from Russell. Sharma: “[0033] … If desired, but not necessarily, a mask of the foreground object may be generated based on the comparison of the image and the reference image, and the foreground object may be segmented from the image using the mask.” Here, a mask is created based on the difference between the target image (recorded image) and the reference image, which means the mask is generated based on the target image (images from the recording).)
Sharma as modified by Russell teaches processing the recording for reference images of the device screen, and interactions made to the device screen based on the mask; (Sharma [0033] processes reference images and detects the interaction made on the image. Sharma: “[0033] At least one of the image and a reference image of the scene, which does not include the foreground object, is warped so the image and the reference image have a same view (204), e.g., such as a frontal view. The reference image is of the scene or a portion of the scene and does not include the foreground object and is, thus, the background in the scene. For example, the reference image may be an image of only the known target or may be an image that includes the known target and an area around the target. The image is compared to a reference image after warping to detect pixels that belong to the point of interest on the foreground object (206). The comparison of the image and the reference image identifies the portion of the image that is the foreground object from which pixels may be detected as extracted features, e.g., using SIFT, SURF, etc. If desired, but not necessarily, a mask of the foreground object may be generated based on the comparison of the image and the reference image, and the foreground object may be segmented from the image using the mask. The pixels may then be detected using the foreground object segmented from the image. The point of interest on the foreground object is then detected using the pixels (208).”) and
Sharma as modified by Russell teaches generating augmented reality (AR) overlays for the reference images based on the interactions made to the device screen based on the mask. (Sharma [0033] generates an augmentation based on the interaction and the mask: “For example, the image is displayed on the display (210) and an augmentation is rendered on the display over the image based on the point of interest (212).”)

Regarding claim 9, Sharma teaches a non-transitory computer readable medium ([0057] and memory 105m at Fig. 10) storing instructions for executing management apparatus configured to facilitate an application for a mobile device (element 100, Fig. 1), the instructions comprising:
receiving a recording of a target image; (Sharma, “[0027] FIG. 1 illustrates a mobile device 100 capable of detecting moveable foreground object in images captured by a camera 110 without depth information. The foreground object may be, e.g., a finger 112 or hand of a user 111, or any other desired object that is not attached to the imaged scene. The foreground object may be tracked in subsequently captured images. It should be understood that the mobile device 100 may capture a video stream of the scene, where the video stream is composed of a plurality of frames or images. Accordingly, captured images, as used herein, should be understood to include individual images as well as frames of video. [0028] Mobile device 100 is shown in FIG. 1 in landscape mode imaging a scene 102 that includes a table 104 with a target 106 and a foreground object 112 in the form of a finger of the user 111.”)
a mask on gestures made to the target image; (Sharma [0033] and [0049] generate a mask based on a gesture: “[0033] … If desired, but not necessarily, a mask of the foreground object may be generated based on the comparison of the image and the reference image, and the foreground object may be segmented from the image using the mask.”)

Sharma does not expressly teach that the target image is a device screen, perspective corrected frames of the device screen, or a mask on gestures made to the device screen.
However, Russell teaches receiving a recording of a device screen, perspective corrected frames of the device screen, and a mask on gestures made to the device screen. (Russell, paragraphs [0007] and [0024], detects a display screen from the camera images and then performs perspective correction of the detected device screen: “[0024] … FIG. 2 is a conceptual illustration of a device having a curved display screen 130 displaying a test pattern 210, according to various embodiments of the present invention. As shown, the test pattern 210 may be generated by the perspective correction application 112 and outputted to the curved display screen 130, which may be coupled to and/or integrated with the computing device 100.”)
Sharma and Russell are analogous art, as both are from the field of image processing.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma to include receiving a recording of a device screen, perspective corrected frames of the device screen, and a mask on gestures made to the device screen as taught by Russell, for the purpose of accounting for the effect of camera movement during image capture.

Sharma as modified by Russell teaches processing the recording for reference images of the device screen, and interactions made to the device screen based on the mask; (Sharma [0033] processes reference images and detects the interaction made on the image. Sharma: “[0033] At least one of the image and a reference image of the scene, which does not include the foreground object, is warped so the image and the reference image have a same view (204), e.g., such as a frontal view. The reference image is of the scene or a portion of the scene and does not include the foreground object and is, thus, the background in the scene. For example, the reference image may be an image of only the known target or may be an image that includes the known target and an area around the target. The image is compared to a reference image after warping to detect pixels that belong to the point of interest on the foreground object (206). The comparison of the image and the reference image identifies the portion of the image that is the foreground object from which pixels may be detected as extracted features, e.g., using SIFT, SURF, etc. If desired, but not necessarily, a mask of the foreground object may be generated based on the comparison of the image and the reference image, and the foreground object may be segmented from the image using the mask. The pixels may then be detected using the foreground object segmented from the image. The point of interest on the foreground object is then detected using the pixels (208).”) and
 
Sharma as modified by Russell teaches wherein the mask is based on one or more frames from the recording. (Sharma [0033] and [0049] teach generating a mask on the target image, and this teaching is applied to the detected device screen from Russell. Sharma: “[0033] … If desired, but not necessarily, a mask of the foreground object may be generated based on the comparison of the image and the reference image, and the foreground object may be segmented from the image using the mask.” Here, a mask is created based on the difference between the target image (recorded image) and the reference image, which means the mask is generated based on the target image (images from the recording).)

Sharma as modified by Russell teaches generating augmented reality (AR) overlays for the reference images based on the interactions made to the device screen based on the mask. (Sharma [0033] generates an augmentation based on the interaction and the mask: “For example, the image is displayed on the display (210) and an augmentation is rendered on the display over the image based on the point of interest (212).”)

Regarding claims 3 and 11, Sharma as modified by Russell teaches wherein the processing the recording for reference images of the detected device screen, and interactions made to the detected device screen based on the mask comprises using frames from the recording in which the mask is not overlaid on the detected device screen as the reference images. (Sharma [0032] generates the reference image from captured images in which the mask is not overlaid: “[0032] … a reference image or model of the target may be known and stored, or the target may be learned in real-time based on one or more captured images of the scene, e.g., using Simultaneous Localization and Mapping (SLAM), or other appropriate techniques. Additionally or alternatively, the pose may be determined using, e.g., a sensor based tracker. [0033] At least one of the image and a reference image of the scene, which does not include the foreground object, is warped so the image and the reference image have a same view (204), e.g., such as a frontal view.”)


Regarding claims 4 and 12, Sharma as modified by Russell teaches wherein the processing the recording for reference images of the detected device screen, and interactions made to the detected device screen based on the mask comprises identifying the interactions from identifying fingertip interactions on the detected device screen. (Sharma [0028] identifies the interaction based on the fingertip: “[0028] … For example, by tracking the position of the user's fingertips over multiple images, the mobile device 100 can discern gestures made by the user and hence a user's intended action may be determined from the captured images.”)

Claims 2 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma as modified by Russell and further in view of Mullins et al. (US Patent Publication: 20180290057, “Mullins”).

Regarding claims 2 and 10, Sharma as modified by Russell does not expressly teach transmitting, to a database, another recording for playback comprising the reference images and the AR overlays.
However, Mullins teaches transmitting or storing to a database another recording for playback comprising the reference images and the AR overlays. (Mullins [0051] stores reference images and overlays in a database: “[0051] The storage device 308 may also store a database that identifies reference objects (visual references or images of objects) and corresponding AR experiences (e.g., animations, 3D virtual objects, interactive features of the 3D virtual objects).”)
Mullins and Sharma as modified by Russell are analogous art, as they are from the field of AR processing.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma as modified by Russell to transmit, to a database, another recording for playback comprising the reference images and the AR overlays as taught by Mullins, for the purpose of reusing the reference images and augmentations at a future time without needing to regenerate them.

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Sharma as modified by Russell and further in view of Shreve et al. (US Patent Publication: 2014/0064566, “Shreve”).

Regarding claims 5 and 13, Sharma as modified by Russell teaches, wherein the generating augmented reality (AR) overlays for the reference images based on the interactions made to the detected device screen based on the mask comprises, for each of the interactions:
determining a location for each of the AR overlays corresponding to the each of the interactions on the reference images based on a location of the each of the interactions; (Sharma determines the position of the finger to determine the location of the AR overlay. Sharma: “[0048] … For example, as illustrated in FIG. 5F, in each captured image 280, the augmentation, e.g., disk 294, may be generated and displayed based on the position of the detected finger 112, as illustrated in image 292. Accordingly, the augmentations maybe rendered so that it appears that the tracked foreground object interacts with the augmentations in the display.”)
determining a type for the each of the AR overlays corresponding to each of the interactions; (Sharma [0052] detects the type of gesture (temporal/non-temporal) based on comparison with a database.)
generating the each of the AR overlays for the each of the interactions on the reference images at the location on the reference images and according to the type. (Sharma [0048] and [0052] disclose generation of an AR overlay based on the location of the interaction and the type of gesture.)
While Sharma as modified by Russell teaches determining the type of AR overlay corresponding to each of the interactions, it does not teach doing so based on differences between binarized images of the reference images corresponding to the each of the interactions.
However, Shreve teaches determining the type of AR overlay corresponding to each of the interactions based on differences between binarized images of the reference images corresponding to the each of the interactions. (Shreve [0030] determines the type of gesture based on a difference in binary images.)
Sharma as modified by Russell and Shreve are analogous art, as they are from the field of image processing.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Sharma as modified by Russell to include determining the type of AR overlay corresponding to each of the interactions based on differences between binarized images of the reference images corresponding to the each of the interactions as taught by Shreve, for the purpose of using a known alternative method of determining a gesture.

Allowable Subject Matter
Claims 6 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claims 6 and 14 are objected to because the best combination of prior art teaches wherein the determining the type for the each of the AR overlays based on differences between binarized images of the reference images corresponding to the interactions comprises: for the differences between binarized images indicating transitions between above the threshold, determining the type for the each of the AR overlays as a swipe gesture overlay (Shreve [0030] determines the type of gesture as a swipe based on a difference in binary images over a threshold), but fails to expressly teach, for the differences between binarized images being below a threshold, determining the type for the each of the AR overlays as a tap gesture overlay.

 

Claims 7-8 are allowed.
The following is an examiner’s statement of reasons for allowance:

Claim 7 is allowed for the following reasons. Sharma teaches a non-transitory computer readable medium (Sharma Fig. 10, element 105m) storing instructions for executing a process for a device comprising a camera (element 110 in Fig. 10) and a processor (element 105p in Fig. 10).
Russell teaches detecting a device screen from images received from the camera. (Russell, paragraphs [0007] and [0024], detects a display screen from the camera images: “[0024] … FIG. 2 is a conceptual illustration of a device having a curved display screen 130 displaying a test pattern 210, according to various embodiments of the present invention. As shown, the test pattern 210 may be generated by the perspective correction application 112 and outputted to the curved display screen 130, which may be coupled to and/or integrated with the computing device 100.”)
However, the combination of the best available prior art (Sharma, Russell, Shreve) fails to expressly teach, as a whole, upon initiation of a playback of the recording corresponding to the detected device screen: playing the recording corresponding to the detected device screen until an augmented reality (AR) overlay corresponding to an interaction is reached; stopping the recording until a change is detected on the detected device screen from the images received from the camera; and continuing playback of the recording once the change is detected on the detected device screen.

Claim 8 is allowed by virtue of dependency.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Response to Arguments
Applicant’s arguments, see remarks Page , filed 02/11/2022, with respect to rejection of claims 1 and 9 under 35 USC 103 have been fully considered and are not persuasive.  Therefore, the rejection has been maintained. 

Applicant argues, see remarks Page 8, “Sharma and Russell, alone or in combination, do not disclose the features of claim 1. For example, Sharma and Russell do not disclose that “the mask is based on one or more frames from the recording,” as recited in claim 1. The Office cites paragraph [0033] of Sharma as disclosing the mask of claim 1. However, claim 1 has been amended to recite that “the mask is based on one or more frames from the recording.” Sharma does not disclose the features of amended claim 1. For example, paragraph [0033] of Sharma discloses that “a mask of the foreground object may be generated based on the comparison of the image and the reference image.” Sharma discloses that its mask is based on a comparison between two images. There is no disclosure or suggestion in Sharma that “the mask is based on one or more frames from the recording,” as recited in claim 1. Sharma’s teaching that the mask is based on a comparison of images is not a mask “based on one or more frames from the recording,” as recited in claim 1. Thus, Sharma does not disclose the features of claim 1.”
Examiner replies that applicant’s argument is not persuasive. Sharma [0033] indicates that “a mask of the foreground object may be generated based on the comparison of the image and the reference image.” Here, “the image” is the target image recorded by the camera. Because the mask is determined based on the difference between the recorded image and a reference image, the mask is also based on the target image, i.e., the recorded image. As an example: if C is based on A and B, then C is based on A. Therefore, Sharma as modified by Russell teaches that the mask is “based on one or more frames from the recording.”
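The reasoning above may be illustrated with a minimal sketch. The function name `difference_mask`, the threshold value, and the toy pixel values are assumptions chosen only for illustration; none of them is taken from Sharma, Russell, or the application, and the sketch is not asserted to be the method of any reference. It shows only that a mask computed by comparing a recorded frame with a reference image changes when the recorded frame changes, i.e., that such a mask depends on the frame.

```python
def difference_mask(frame, reference, threshold=30):
    """Return a binary mask: 1 where |frame - reference| exceeds threshold.

    `frame` and `reference` are equal-sized lists of rows of grayscale
    values. The threshold of 30 is an arbitrary illustrative choice.
    """
    return [
        [1 if abs(f - r) > threshold else 0 for f, r in zip(frame_row, ref_row)]
        for frame_row, ref_row in zip(frame, reference)
    ]


# Hypothetical reference image (background only, no foreground object).
reference = [[10, 10, 10],
             [10, 10, 10]]

# Two hypothetical recorded frames with a bright foreground object
# at different positions.
frame_a = [[10, 200, 10],
           [10, 200, 10]]
frame_b = [[10, 10, 10],
           [200, 10, 10]]

mask_a = difference_mask(frame_a, reference)
mask_b = difference_mask(frame_b, reference)

# Same reference, different frames: the masks differ, so the mask
# is a function of the recorded frame as well as of the reference.
print(mask_a)
print(mask_b)
```

With the reference held fixed, the two frames yield different masks, which is the sense in which a comparison-based mask is also based on the recorded frame.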

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tapas Mazumder whose telephone number is (571)270-7466.  The examiner can normally be reached on M-F 8:00 AM-5:00 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached on 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/TAPAS MAZUMDER/           Primary Examiner, Art Unit 2616