Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.
Drawings as filed 07/26/2019 are accepted.
IDS as filed are considered.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3, 7, 11, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sigmard et al. (US 2003/0202697) in view of Wang et al. (CN 109671102).
As to claim 1:
Sigmard discloses a non-transitory computer-readable medium (¶0184, CRM) storing instructions thereon that, when executed by at least one processor, cause a computing device to:

identify a foreground image, a background image, and a segmentation mask corresponding to the foreground image; (¶0013, foreground and background images, segmenting mask are produced)

and

generate a composite digital image based on the foreground image, the background image, and the segmentation mask (See Fig. 1, ¶0043, using processed image data from elements 106, 108, 110, the element 112 combines the foreground/background processed data along with mask data to generate an output image file that is a composite of said various data) by:

generating a foreground feature map based on the foreground image and the segmentation mask utilizing a foreground encoder of image analysis system; (¶0037, the segmenting mask is used to separate the foreground from the input, see at least ¶0134-0137, the foreground encoder then encodes the outputted foreground and generates a foreground bitstream output of the layer, i.e. foreground feature map)

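generating a background feature map based on the background image and the segmentation mask utilizing a background encoder of image analysis system; (¶0037, the segmenting mask is used to separate the foreground from the input, see at least ¶0142-0143, the background encoder then encodes the outputted background and generates a background bitstream output of the layer, i.e. background feature map)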

generating the composite digital image based on the foreground feature map and the background feature map using a decoder of image analysis system. (See Fig. 1, ¶0043, using processed image data from elements 106, 108, 110, the element 112 combines and processes the encoded foreground/background processed data along with mask data to generate an output image file that is a composite of said various data)

Sigmard discloses a trained machine learning system configured to segment and separate/detect foreground objects from an input image, but does not explicitly disclose that the mechanism is a multi-level fusion neural network.

Wang, however, in a related field of endeavor (image analysis), discloses that a fusion convolution (multi-layer) neural network is used to detect and process foreground features in images/video frames for various purposes. (See at least Abstract, Fig. 1 and its description, which show a fusion CNN used to separate and process foreground from background)

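It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement the features of the trained image analysis system of Sigmard using the fusion convolution neural network mechanism as suggested by Wang’s disclosure. The implementation would advantageously provide benefits such as strong robustness and high accuracy in object detection (Wang, Abstract).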


As to claim 11:
Sigmard discloses a system comprising: at least one memory device (¶0184, system) comprising a network trained to generate composite digital images, the network comprising a foreground encoder, a background encoder, and a decoder; (See Fig. 1, elements 108, 110, 112, respectively)

at least one server device (¶0029, server) that causes the system to:

identify a foreground image and a background image; (¶0013, foreground and background images, segmenting mask are produced)


generate a segmentation mask based on the foreground image utilizing a foreground segmentation neural network; (¶0013, segmenting mask is produced as the image is segmented, i.e. by a segmentation NN, segmenting mask is a binary image that has pixels reflecting the corresponding foreground elements against background)


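generate a foreground feature map based on the foreground image and the segmentation mask utilizing the foreground encoder; (¶0037, the segmenting mask is used to separate the foreground from the input, see at least ¶0134-0137, the foreground encoder then encodes the outputted foreground and generates a foreground bitstream output of the layer, i.e. foreground feature map)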

generate a background feature map based on the background image and the segmentation mask utilizing the background encoder of the image analysis system; (¶0037, the segmenting mask is used to separate the foreground from the input, see at least ¶0142-0143, the background encoder then encodes the outputted background and generates a background bitstream output of the layer, i.e. background feature map)


combine the foreground feature map and the background feature map to generate a combined feature map; and generate a composite digital image based on the combined feature map using the decoder of image analysis system. (See Fig. 1, ¶0043, using processed image data from elements 106, 108, 110, the element 112 combines and processes the encoded foreground/background processed data along with mask data to generate an output image file that is a composite of said various data)

Sigmard discloses a trained machine learning system configured to segment and separate/detect foreground objects from an input image, but does not explicitly disclose that the mechanism is a multi-level fusion neural network.

Wang, however, in a related field of endeavor (image analysis), discloses that a fusion convolution (multi-layer) neural network is used to detect and process foreground features in images/video frames for various purposes. (See at least Abstract, Fig. 1 and its description, which show a fusion CNN used to separate and process foreground from background)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement the features of the trained image analysis system of Sigmard using the fusion convolution neural network mechanism as suggested by Wang’s disclosure. The implementation would advantageously provide benefits such as strong robustness and high accuracy in object detection (Wang, Abstract).

As to claim 18:
Sigmard discloses, in a digital medium environment for editing digital images, a computer-implemented method (Abstract) comprising:

identifying a foreground image, a background image, and a segmentation mask corresponding to the foreground image; and (¶0013, foreground and background images and a segmenting mask are produced; the segmenting mask is produced as the image is segmented, i.e. by a segmentation NN; the segmenting mask is a binary image that has pixels reflecting the corresponding foreground elements against background)


(See Fig. 1, ¶0043, using processed image data from elements 106, 108, 110, the element 112 combines and processes the encoded foreground/background processed data along with mask data to generate an output image file that is a composite of said various data)
Sigmard discloses a trained machine learning system configured to segment and separate/detect foreground objects from an input image, but does not explicitly disclose that the mechanism is a multi-level fusion neural network.

Wang, however, in a related field of endeavor (image analysis), discloses that a fusion convolution (multi-layer) neural network is used to detect and process foreground features in images/video frames for various purposes. (See at least Abstract, Fig. 1 and its description, which show a fusion CNN used to separate and process foreground from background)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement the features of the trained image analysis system of Sigmard using the fusion convolution neural network mechanism as suggested by Wang’s disclosure. The implementation would advantageously provide benefits such as strong robustness and high accuracy in object detection (Wang, Abstract).

As to claims 3 and 19:
Sigmard in view of Wang discloses all limitations of claims 1 and 18, wherein the instructions, when executed by the at least one processor, cause the computing device to identify the segmentation (Sigmard, ¶0013, the segmenting mask is produced as the image is segmented, i.e. by a segmentation NN; the segmenting mask is a binary image that has pixels reflecting the corresponding foreground elements against background)

As to claim 7:
Sigmard in view of Wang discloses all limitations of claim 1, wherein the composite digital image comprises a foreground object from the foreground image portrayed against a scene from the background image. (See Sigmard, ¶0178, Fig. 6, the recombined image is identical to or a close approximation of the original image, and per Fig. 6, the foreground object is portrayed against a background scene)
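
For illustration only, the general notion of a foreground object portrayed against a background scene can be sketched with conventional mask-based blending. The sketch below is an assumption-laden example: the function name `composite`, the nested-list image representation, and the sample values are hypothetical, and the blending rule shown is a generic technique, not the layered decoding of Sigmard or the claimed neural network.

```python
def composite(foreground, background, mask):
    """Blend two equally sized grayscale images using a binary mask.

    Where the mask is 1 the foreground pixel is kept; where it is 0 the
    background pixel is kept. (Hypothetical helper for illustration.)
    """
    return [
        [m * f + (1 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(foreground, background, mask)
    ]


# Hypothetical 2x2 sample data.
foreground = [[10, 20], [30, 40]]
background = [[1, 2], [3, 4]]
mask = [[1, 0], [0, 1]]

result = composite(foreground, background, mask)
```

With the sample values above, the pixels at the two mask positions set to 1 come from the foreground and the remaining pixels come from the background.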


Claims 2 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sigmard et al. (US 2003/0202697) in view of Wang et al. (CN 109671102) and in further view of Zhang et al. (US 2015/0237324).
As to claims 2 and 16:
Sigmard in view of Wang discloses all limitations of claims 1 and 11. Regarding the limitation: generate an inverted segmentation mask based on the segmentation mask corresponding to the foreground image, wherein generating the background feature map based on the background image and the segmentation mask comprises generating the background feature map based on the background image and the inverted segmentation mask:
Sigmard discloses using the segmenting mask to separate the foreground from the input (Sigmard, ¶0037; see at least ¶0134-0137, the foreground encoder then encodes the outputted foreground and generates a foreground bitstream output of the layer, i.e. foreground feature map), but is silent on the use of an inverted segmentation mask as claimed.


The examiner asserts, however, that the use of an inverted segmentation mask is well known in the object recognition field of endeavor. Zhang, in a related field of endeavor, discloses obtaining an inverted segmentation mask (i.e., by negation processing of the segmentation mask), which allows for pixel-by-pixel comparison and, in turn, for a best matching partition (Zhang, ¶0006).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Sigmard and Wang to incorporate the use of an inverted segmentation mask as outlined by Zhang. Such an implementation would advantageously allow for improved depth analysis (foreground vs. background) and a best matching partition between a partition type and the segmentation mask.
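
For illustration only, negation of a binary segmentation mask can be sketched as follows. The sketch assumes a mask stored as nested lists with 1 marking foreground and 0 marking background; the helper names `invert_mask` and `apply_mask` and the sample values are hypothetical and are not drawn from Zhang, Sigmard, or the application.

```python
def invert_mask(mask):
    """Return the negation of a binary mask (1 becomes 0, 0 becomes 1)."""
    return [[1 - value for value in row] for row in mask]


def apply_mask(image, mask):
    """Keep pixels where the mask is 1 and zero out all other pixels."""
    return [
        [pixel * m for pixel, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# Hypothetical 2x3 sample data.
mask = [[0, 1, 1],
        [0, 1, 0]]
image = [[5, 6, 7],
         [8, 9, 10]]

inverted = invert_mask(mask)
foreground_only = apply_mask(image, mask)      # pixels selected by the mask
background_only = apply_mask(image, inverted)  # pixels selected by its negation
```

Under these assumptions, the mask and its negation select complementary sets of pixels, so every pixel of the input falls in exactly one of the two results.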



Claims 4, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sigmard et al. (US 2003/0202697) in view of Wang et al. (CN 109671102) and in further view of Lin (US 2018/0232887).

As to claims 4 and 20:
Sigmard in view of Wang discloses all limitations of claims 3 and 19, but is silent on instructions that, when executed by the at least one processor, cause the computing device to modify the segmentation mask corresponding to the foreground image utilizing a mask refinement neural network.

However, the examiner asserts that updating a learning model is routine in the art to ensure optimized operations. In a related field of endeavor, Lin discloses that segmentation masks can be refined using one or more neural networks. (See at least ¶0058, 0060, 0065, and claim 5 of Lin)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Sigmard and Wang to incorporate Lin’s feature of refining segmentation masks using one or more neural networks. The implementation would advantageously avoid existing issues such as false positives or failure to detect foreground, and various other issues described in ¶0007-0009 of Lin.

As to claim 17:
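Sigmard in view of Wang discloses all limitations of claim 11, but is silent on the at least one server device causing the system to modify a boundary of a foreground object portrayed in the segmentation mask based on the foreground image and the segmentation mask utilizing a mask refinement neural network.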

However, the examiner asserts that updating a learning model, as well as foreground detection results, is routine in the art to ensure optimized operations. In a related field of endeavor, Lin discloses that segmentation masks can be refined using one or more neural networks, which can result in modifying a boundary of a foreground object portrayed in the segmentation mask based on the foreground image and the segmentation mask utilizing a mask refinement neural network. (See at least ¶0058, 0060, 0065, 0075, and claim 5 of Lin)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Sigmard and Wang to incorporate Lin’s feature of refining segmentation masks and object detection results using one or more neural networks. The implementation would advantageously avoid existing issues such as false positives or failure to detect foreground, and various other issues described in ¶0007-0009 of Lin.

Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Sigmard et al. (US 2003/0202697) in view of Wang et al. (CN 109671102) and in further view of Liu et al. (US 2019/0295228).
As to claim 5:
Sigmard in view of Wang discloses all limitations of claim 1, and Sigmard further discloses:

identify a first layer-specific feature map generated by the foreground encoder and a second layer-specific feature map generated by the background encoder; (See ¶0042, 0014, Fig. 1, foreground layer bitstream associated with the foreground and background layer bitstream associated with the background are produced by respective encoders)

provide the first layer-specific feature map and the second layer-specific feature map to a layer of the decoder of the multi-level fusion neural network via skip links, (see Fig. 1 or 5, the outputted bitstreams from each encoder are provided via respective links to the combiner)

wherein generating the composite digital image based on the foreground feature map and the background feature map using the decoder of the multi-level fusion neural network comprises generating the composite digital image further based on the first layer-specific feature map and the second layer-specific feature map using the decoder. (See Fig. 1, ¶0043, using processed image data from elements 106, 108, 110, the element 112 combines and processes the encoded foreground/background processed data along with mask data to generate an output image file that is a composite of said various data)

However, neither Sigmard nor Wang discloses the use of skip links between encoder and decoder.

The Examiner asserts that the use of skip links is well known in the art, as evidenced by Liu’s disclosure (Abstract, ¶0008, 0135), which describes skip links by which encoders communicate with other elements of the system.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Sigmard and Wang to incorporate the use of skip links for

As to claim 6:
Sigmard in view of Wang and Liu discloses all limitations of claim 5, wherein a layer of the foreground encoder corresponding to the first layer-specific feature map is at a same encoder level as a layer of the background encoder corresponding to the second layer-specific feature map. (See Fig. 1 of Sigmard, all encoders are parallel in terms of process flow, and not in sequential order)

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Sigmard et al. (US 2003/0202697) in view of Wang et al. (CN 109671102) and in further view of Kanazawa et al. (US 2020/0218961).

As to claim 8:
Sigmard in view of Wang discloses all limitations of claim 1, but is silent on the foreground image comprising a training foreground image and the background image comprising a training background image; and further comprising instructions that, when executed by the at least one processor, cause the computing device to train the multi-level fusion neural network to generate composite digital images by: comparing the composite digital image to a target composite digital image to determine a loss; and modifying parameters of the multi-level fusion neural network based on the determined loss.

Kanazawa, however, discloses a back-propagation procedure in which characteristics of the results of segmentation analysis are compared against a ground truth mask to determine a loss, and by which the learning model can be updated based on the loss. (See Kanazawa, ¶0040, 0049, 0096, 0125)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the system of Sigmard and Wang to include the loss-based refinement of neural networks of Kanazawa. Such an implementation allows continuous optimization of the model so that the system improves its performance over time. (See ¶0040, 0086 of Kanazawa)
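
For illustration only, the general pattern of comparing an output to a target to determine a loss and then modifying a parameter based on that loss can be sketched as below. This is a minimal sketch under stated assumptions: the one-parameter linear model, the mean squared error loss, the learning rate, and the names `mse_loss` and `train_step` are all hypothetical, and the example does not reproduce the procedure of Kanazawa or the training recited in the claims.

```python
def mse_loss(predictions, targets):
    """Mean squared error between predicted and target values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def train_step(weight, inputs, targets, learning_rate):
    """One gradient-descent update of a single weight for the model y = weight * x."""
    n = len(inputs)
    # Derivative of the mean squared error with respect to the weight.
    gradient = sum(2 * (weight * x - t) * x for x, t in zip(inputs, targets)) / n
    return weight - learning_rate * gradient


# Hypothetical training data generated by the relation y = 2 * x.
inputs = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]

weight = 0.0
loss_before = mse_loss([weight * x for x in inputs], targets)
for _ in range(50):
    weight = train_step(weight, inputs, targets, learning_rate=0.05)
loss_after = mse_loss([weight * x for x in inputs], targets)
```

In this toy setting, each update moves the weight toward the value that generated the targets, so the loss after training is lower than the loss before training.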

As to claim 9:
Sigmard in view of Wang and Kanazawa discloses all limitations of claim 8, wherein the instructions, when executed by the at least one processor, cause the computing device to identify the foreground image by generating the foreground image utilizing the multi-level fusion neural network. (in view of Wang as applied in claim 1, See Sigmard, ¶0013, identifying and producing a foreground image)


As to claim 10:

Allowable Subject Matter
Claims 12-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUAN M HUA whose telephone number is (571)270-7232.  The examiner can normally be reached on 10:30-6:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Anthony Addy, can be reached on 571-272-7795. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/QUAN M HUA/Primary Examiner, Art Unit 2645