Office Action Appendix

Claim Rejections - 35 USC § 102/103

Regarding claim 1, Zobel (US 2021/0027439) discloses a non-transitory computer readable medium for generating images [Fig. 5], the non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause a computing device to:
segment instances of a first object and a second object in a sequence of images; 
[Zobel: Figs. 5-8 and paragraphs 48 (“…example segmentation may be a cropping, generating another layer including the segment, generating the segment as another image”), 54 (“…segment the depictions 618-628 into segments 606-616”).  Note that the objects (faces of different persons) shown as 618 (resp. 624) and 622 (resp. 628) are considered a first and a second object, respectively, and are segmented into 606 and 610]
receive, via a graphical user interface at a user client device, a selection of the first object in a first position from a first image;
[Zobel: Fig. 6 and paragraphs 54 (“…The segment with the preferred orientation for each object may be included in the final image 634”), 57 (“…The first frame includes a first depiction of the object of interest with a first orientation”), 62 (“…display the segment 606 and the segment 612 side by side on a touch sensitive display 414. The user may then touch the displayed segment including the preferred orientation of the object, and the selected segment may be included in the final image 634”)]
based on the selection, generate a fixed object image comprising the first object fixed in the first position;
[Zobel: Fig. 6 and paragraphs 48 (“…example segmentation may be…generating another layer including the segment, generating the segment as another image”), 57 (“…The first frame includes a first depiction of the object of interest with a first orientation”)]
present, via the graphical user interface, the fixed object image with the first object fixed in the first position and segmented instances of the second object sequencing through a plurality of positions from images of the sequence of images;
receive, via the graphical user interface, a selection of the second object in a second position from a second image;
based on the selection of the second object, generate a composite image comprising the first object in the first position and the second object in the second position
[Zobel: Fig. 6 and paragraphs 62 (“…a user may select the preferred orientation of an object for the final image. For example, the device 400 may display the segment 606 and the segment 612 side by side on a touch sensitive display 414. The user may then touch the displayed segment including the preferred orientation of the object, and the selected segment may be included in the final image 634”), 63 (“… the segment 616 may be laid as a second layer over the first frame 602 (as a first layer of an image) to cover segment 610…then merge the layers to generate the final image 634”).  Note that while Zobel uses 606 and 612 as an example, it is obvious to present 610 and 616 to the user to select the preferred one to generate the composite shown in image 634 of Fig. 6]

Regarding claims 2-4, 11, and 13, both Hou et al. (US 2021/0158043) and Rhodes et al. (US 2020/0026928) disclose:
(Claims 2 and 11) segment the instances of the first object and the second object by assigning pixels of images of the sequence of images labels utilizing a segmentation neural network
(Claim 3) generate object masks for instances of the first object and the second object by extracting groups of pixels with a same label
(Claims 4 and 13) segment the instances of the first object and the second object by detecting the first object and the second object in images of the sequence of images utilizing an object detection neural network
[Hou: Fig. 5 and paragraphs 53 (“…neural network module 220 performs semantic segmentation and object detection on an input image…generates a plurality of bounding boxes associated with an object… each bounding box…has an associated confidence level”), 55 (“…generate a mask assignment (instance mask) for the object…defining the contour (outline shape) of the object”).  Note that Zobel discloses segmenting multiple objects in multiple images]
[Rhodes: Fig. 5 and paragraphs 56 (“…The segmentation CNN provides candidate segmentations of the current video frame”), 58 (“…at operation 509, where the final segmentation may be thresholded…to generate a binary segmentation mask…indicating pixels deemed to be within the object of interest”)]
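
For illustration only, the claim 3 limitation (generating object masks by extracting groups of pixels with a same label) can be sketched as follows. The sketch assumes that a per-pixel label map has already been produced by some segmentation network. The function name masks_from_labels, the sample label values, and the convention that label 0 denotes background are hypothetical and are not taken from Hou, Rhodes, or the application.

```python
def masks_from_labels(label_map, background=0):
    """Return {label: binary mask} for every non-background label.

    label_map is a list of rows; each entry is the integer label assigned
    to that pixel (a hypothetical output of a segmentation network).
    """
    masks = {}
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            if label == background:
                continue
            if label not in masks:
                # Start an all-zero mask with the same shape as the label map.
                masks[label] = [[0] * len(src_row) for src_row in label_map]
            masks[label][r][c] = 1
    return masks


# Hypothetical 3x4 label map: 1 = first object, 2 = second object.
label_map = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 2],
]
masks = masks_from_labels(label_map)
```

Each entry of masks is a binary image that is 1 where the corresponding label was assigned and 0 elsewhere, i.e., one mask per labeled object.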

Regarding claim 5, Zobel discloses: 
generate the composite image by inserting a segmented second object in the second position into the fixed object image
[Zobel: Figs. 5-8 and paragraphs 46 (“ FIG. 6…generating a final image 634 from a first frame 602…and from a second frame 604”), 62 (“…the selected segment may be included in the final image 634”), 63 (“… the segment 616 may be laid as a second layer over the first frame 602 (as a first layer of an image) to cover segment 610…then merge the layers to generate the final image 634”)]
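
For illustration only, the layering described in the cited paragraph 63 (a segment laid as a second layer over the first frame and then merged) can be sketched as below. This is a minimal sketch under stated assumptions and is not Zobel's implementation: images are plain lists of rows, and the function name merge_layers, the mask argument, and the top/left placement offsets are hypothetical.

```python
def merge_layers(base, segment, mask, top, left):
    """Paste `segment` over `base` wherever `mask` is 1; return a new image.

    `top` and `left` give the row/column in `base` where the segment's
    upper-left corner is placed. Pixels falling outside `base` are skipped.
    """
    out = [row[:] for row in base]  # copy so the first layer is not modified
    for r, mask_row in enumerate(mask):
        for c, keep in enumerate(mask_row):
            rr, cc = top + r, left + c
            in_bounds = 0 <= rr < len(out) and 0 <= cc < len(out[rr])
            if keep and in_bounds:
                out[rr][cc] = segment[r][c]
    return out


# Hypothetical data: a uniform first frame and a 2x2 segment from a second frame.
first_frame = [[10, 10, 10, 10] for _ in range(3)]
segment = [[7, 8], [9, 6]]
mask = [[1, 1], [0, 1]]
final_image = merge_layers(first_frame, segment, mask, top=1, left=2)
```

Only the pixels flagged in the mask replace pixels of the first layer; the remainder of the merged image comes from the first frame.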

Regarding claim 6, Zobel further discloses:
generate the fixed object image by removing the second object from the first image and generating background pixels to fill in pixels of the removed second object
[Zobel: Figs. 9, 10 and paragraphs 63 (“… the segment 616 may be laid as a second layer over the first frame 602 (as a first layer of an image) to cover segment 610…then merge the layers to generate the final image 634”), 74 (“FIG. 10…depiction 1002 may be from a second frame…and the remainder may be from a first frame”), 76 (“… use neighboring pixel data from the background 1006 to fill in region 1004”).  Note that Fig. 10 shows filling the void left by a portion of the removed second object with background.  See also Au (US 2007/0286482): Fig. 5 and paragraph 13 (“…The extracted object(s)…can be carved out in the initial set-up image. The results are a "reference with hole" image and an object reference image…The hole…in the "reference with hole" image, can be filled by…an in-painting algorithm”).  Note that in-painting will fill the hole with its surrounding (background in this case) color/pattern]

Regarding claim 7, Zobel and Agarwala et al. (“Interactive digital photomontage,” ACM SIGGRAPH, 2004 – IDS) disclose: 
present the fixed object image and the second object sequencing through the plurality of positions by sequentially superimposing the second object in the plurality of positions from the sequence of images
[Zobel: Figs. 9, 10 and paragraphs 63 (“… the segment 616 may be laid as a second layer over the first frame 602 (as a first layer of an image) to cover segment 610…then merge the layers to generate the final image 634”), 74 (“FIG. 10…depiction 1002 may be from a second frame…and the remainder may be from a first frame”).  Note that Zobel discloses superimposing a second object on the first image.]
[Agarwala: Sect. 2.2, 2nd ¶ (“…source images that best satisfy the locally-specified image objective is presented to the user, along with a shaded indication of the region that would be copied to the composite, in…the selection window…As the user scrolls through the source images in the selection window, the composite window is continually updated to show the result of copying the indicated region of the current source to the current composite.  The user can “accept” the current composite at any time”).  Note that Agarwala discloses presenting composites formed with each image containing the desired regions (the second object in Zobel’s case)]

Regarding claim 8, Rhodes et al. (US 2020/0026928) discloses:
wherein the sequence of images comprises frames from an input video or a plurality of burst images
[Rhodes: Fig. 5 and paragraphs 20 (“…segment each video frame of a video sequence into, for example, foreground and background regions”), 56 (“…The segmentation CNN provides candidate segmentations of the current video frame”)]

Regarding claim 9, Zobel discloses: 
wherein the second position comprises a position outside a frame of the second image
[Zobel: Fig. 6.  Note that in the final image 634, the second object (the rightmost) is in a second position outside of a frame such as 612 of the second image 632]

Regarding claim 10, it is similarly analyzed and rejected as per the analyses of claim 1 (includes most of the limitations), claim 2 (regarding segmentation using a neural network), and claim 3 (regarding object masks).

Regarding claim 11, it is similarly analyzed and rejected as per the analyses of claim 10 (base claim) and claim 2. 

Regarding claim 12, it is similarly analyzed and rejected as per the analyses of claim 11 (parent claim) and the disclosures of Cleland et al. (US 2014/0079296) for the following:
generate filtered images by removing noise from the images of the sequences of images utilizing a median filter;
creating grayscale images from the filtered images;
utilizing the grayscale images as input to the segmentation neural network.
[Cleland: Paragraph 79 (“…By using a color to grayscale image and applying a median filter, image noise may be reduced and a greater number of true branch points could be located. This prepares the image for the blood vessel segmentation”).  Note that while, as disclosed, grayscale conversion is performed before median filtering, one of ordinary skill would have been motivated to perform median filtering before grayscale conversion, as set forth in the claim, because there are only a limited number of execution orders (two in this case), and would thereby have obtained the predictable result of a noise-filtered grayscale image for segmentation.  Note further that the use of a neural network is taught by Hou et al. (US 2021/0158043), per the analysis of claim 10.
See also Karki et al. (US 2020/0202524): Fig. 1 and paragraph 19 (“…The output grayscale images…are sent to a Deep Convolutional Network 106…customized based on the task…include segmentation of the grayscale image”)]
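
For illustration only, the claimed order of operations (median filtering first, then grayscale conversion) can be sketched as below. This is a minimal sketch under stated assumptions and is not Cleland's or the applicant's implementation: each color channel is a plain list of rows, the 3x3 window size and the shrinking of the window at image borders are arbitrary choices, and the grayscale weights are the commonly used ITU-R BT.601 luma coefficients. The names median_filter_3x3, to_grayscale, and preprocess are hypothetical.

```python
from statistics import median


def median_filter_3x3(channel):
    """Replace each pixel with the median of its (up to) 3x3 neighborhood."""
    h, w = len(channel), len(channel[0])
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            # The window shrinks at the borders instead of padding the image.
            window = [channel[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out_row.append(median(window))
        out.append(out_row)
    return out


def to_grayscale(red, green, blue):
    """Weighted sum of the three channels (BT.601 luma weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b
             for r, g, b in zip(r_row, g_row, b_row)]
            for r_row, g_row, b_row in zip(red, green, blue)]


def preprocess(red, green, blue):
    """Claimed order: remove noise per channel first, then convert to gray."""
    filtered = [median_filter_3x3(ch) for ch in (red, green, blue)]
    return to_grayscale(*filtered)


# Hypothetical 3x3 channel with a single noisy pixel in the center.
noisy = [[50, 50, 50], [50, 255, 50], [50, 50, 50]]
gray = preprocess(noisy, noisy, noisy)
```

Swapping the two calls inside preprocess gives the other of the two possible execution orders; in either order the result is a noise-filtered grayscale image that could be supplied to a segmentation network.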

Regarding claim 13, it is similarly analyzed and rejected as per the analyses of claim 10 (base claim) and claim 4.

Regarding claim 14, it is similarly analyzed and rejected as per the analyses of claim 10 (base claim) and claim 6.

Regarding claim 15, it is similarly analyzed and rejected as per the analyses of claim 14 (parent claim) and the disclosure of Agarwala et al. (US 2013/0128121) for the following:
generate the background pixels for the region based on analyzing corresponding regions in images of the sequence of images
[Agarwala: Fig. 8 and paragraph 99 (“…Elements 806 through 816 may be repeated for every frame in the video sequence to fill in the hole in that frame…As indicated at 808, a suitable source frame in the video sequence may be found (that is, another frame that has content that may be used to at least partially fill the current hole in the current frame)”)]
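
For illustration only, generating background pixels from corresponding regions in other images of the sequence can be sketched as below. This is a minimal sketch and not the method of Agarwala: it assumes the frames are already aligned (e.g., a static camera), that each frame has a binary mask marking pixels covered by the removed object, and it copies each hole pixel from the first other frame in which that pixel is uncovered. The name fill_from_sequence and the sample data are hypothetical.

```python
def fill_from_sequence(frames, masks, target):
    """Fill masked (hole) pixels of frames[target] from other frames.

    frames: list of images (lists of rows), assumed pixel-aligned.
    masks:  list of binary masks; 1 marks a pixel covered by the object.
    Hole pixels that are covered in every frame are left unchanged.
    """
    out = [row[:] for row in frames[target]]
    for r, mask_row in enumerate(masks[target]):
        for c, is_hole in enumerate(mask_row):
            if not is_hole:
                continue
            for i, frame in enumerate(frames):
                # Use the first other frame where this pixel shows background.
                if i != target and not masks[i][r][c]:
                    out[r][c] = frame[r][c]
                    break
    return out


# Hypothetical two-frame sequence; 99 and 98 mark object pixels.
frames = [
    [[1, 2, 3], [4, 99, 99]],
    [[1, 98, 3], [4, 5, 6]],
]
masks = [
    [[0, 0, 0], [0, 1, 1]],
    [[0, 1, 0], [0, 0, 0]],
]
filled = fill_from_sequence(frames, masks, target=0)
```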

Regarding claim 16, it is similarly analyzed and rejected per the analysis of claim 10 (base claim), as well as the disclosure of Wang et al. (US 2017/0206662) for the following:
receiving, via the graphical user interface, a first user input indicating a first location and a second user input indicating a second location in an image of the sequence of images;
[Wang: Figs. 10, 11 and paragraph 126 (“…step 1101, a seed may be defined…with a user intervention. For instance, a user may designate a voxel of interest through a graphical user interface (e.g., a display module 140 in FIG. 1). In some embodiments, two or more seeds may be defined”).  Note that the applied teaching is for a user to indicate positions (“seeds”) for segmentation]
detecting the first object based on the neural network and the first location; and
detecting the second object based on the neural network and the second location.
[Wang: Figs. 10, 11 and paragraph 119 (“In step 1002, target voxels may be determined based on an algorithm and the seed…the algorithm used in step 1002 may refer to an segmentation algorithm including…a neural network segmentation”)]

Regarding claim 17, it is similarly analyzed and rejected per the analysis of claim 10 (base claim), as well as Zobel's Fig. 6 (with refs. 606 and 608 corresponding to the first and the second objects and 616 to the third object mask).

Regarding claim 18, it is similarly analyzed per the analysis of claim 10 (base claim) with the additional disclosures of Rhodes et al. (US 2020/0026928) and Sun (US 6,731,799):
generating binarized masks comprising approximate boundaries for the instances of the first object and the second object;
[Rhodes: Fig. 5 and paragraphs 56 (“…The segmentation CNN provides candidate segmentations of the current video frame”), 58 (“…at operation 509, where the final segmentation may be thresholded…to generate a binary segmentation mask…indicating pixels deemed to be within the object of interest”)]
refining the approximate boundaries of the binarized masks utilizing an active contour model
[Sun: Fig. 4; col. 8, lines 57-60 (“At step 96, the object boundary is revised, such as with an active contour model to derive a final estimate of the object boundary for the current image frame”); col. 13, lines 50-53 (“Upon completion of the boundary propagation, there is an output estimate of the object boundary. Such output estimate is refined at step 96 (see FIG. 4) using an active contour model or another refining process”)]

Regarding claim 19, it is similarly analyzed and rejected as per the analysis of claim 1, which includes all the limitations of claim 19.

Regarding claim 20, it is similarly analyzed as per the analysis and rejection of claim 19 along with the disclosure by Lettau (US 2013/0124572) of the following:
providing, for display via the graphical user interface at the user client device, a scroll bar for navigating the previews.
[Lettau: Fig. 3B and paragraph 32 (“…interface 302 is shown with…scroll bar 322…repository navigation tools 336…may be used to navigate between assets 312-319 or others…repository navigation tools 336 may be implemented using…scroll bars…As an example, scroll bar 322 may be used to navigate asset editing panel 360”)]

Claim Interpretation - 35 USC § 112(f)

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f).  The presumption that 35 U.S.C. 112(f) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.  

Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f).  The presumption that 35 U.S.C. 112(f) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function. 

Claim limitations in this application that use the word “means” (or “step for”) are presumed to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.

This application includes one or more claim limitations in claims 1-20 that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) include the placeholder “device” (such as “computing device” included in claims 1 and 10 and “user client device” included in claim 19) either expressly or, in the case of dependent claims, by inheritance.

A review of the specification shows that the following appears to be the corresponding structure, material, or acts for performing the claimed function described in the specification for the 35 U.S.C. 112(f) limitation: Fig. 1 and paragraph 42 of the published application for the user client device, as well as Fig. 11 and paragraphs 123 and 127.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f), it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f), applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f).

For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Conclusion and Contact Information

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Huang et al. (US 2014/0369627)—[Figs. 4, 5 and paragraphs 35 (“… utilization score of digital photos 400a, 400c meet or exceed a utilization threshold…stores the digital photos 400a, 400c for compositing purposes”), 41 (“…selects one of the digital photos 400a, 400c (FIG. 4)…as a base photo for compositing purposes”), 42 (“…retrieve candidate region(s)…for purposes of inserting…in place of a swap region in the base frame”)]
Ghadyali et al. (US 2021/0019528)—[Figs. 30, 35 and paragraphs 329 (“…produces the detected bounding boxes on objects in the image…The human identification and segmentation can be achieved…by a deep learning neural network model which classifies each pixel as human from the rest of the scene”)]
Lukk et al. (US 2012/0176481)—[Fig. 6, paragraph 55 (“…images 631 and 632 further include hole areas 637 and 638…holes such as hole areas 637 and 638 are filled at least in part by using data from secondary images 616 and 617”) and claim 3 (“…filling a hole in at least a first one of the stereoscopic images using image data from one or more images of the secondary image sequences”)]
Vachtsevanos et al. (US 2002/0054694)—[Fig. 7 and paragraph 189 (“…Some techniques that may be utilized as part preprocessing module 55 include…median filters… Wavelet neural network (WNN) module 190 can act as…the pattern identifier and classifier”)]
Ajemba et al. (US 2013/0230230)—[Fig. 1 and paragraph 116 (“… the initial segmentation may utilize an average size median filter of 10x10”)]
Brada et al. (US 2020/0294239)—[Figs. 5, 6 and paragraphs 46 (“…a cascaded neural network that separately processes a gray-scale representation of the image and the initial segmentation map, and combines the separately processed results of both to generate the refined segmentation map”), 53 (“… the refinement predictive model may include a cascaded neural network that separately processes an RGB or a gray-scale representation of the image and the initial segmentation map, and combines the separately processed results to generate the refined segmentation map”)]
Sunkavalli et al. (“Video Snapshots: Creating High-Quality Images from Video Clips,” IEEE Transactions on Visualization and Computer Graphics, Vol. 18, Issue 11, November 2012; date of publication: 06 March 2012)
Teodosio et al. (“Salient Video Stills: Content and Context Preserved,” Proc. ACM Int’l Conf. Multimedia, Vol. 1, No. 1, February, 2005)
Kwatra et al. (“Graph-cut textures: Image and video synthesis using graph cuts,” ACM Transactions on Graphics, Vol. 22, Issue 3; July 2003)

/YUBIN HUNG/
Primary Examiner, Art Unit 2662
September 1, 2022