DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 1 and 16 are objected to because of the following informalities: in line 3 of each of claims 1 and 16, the phrase “an first set” is grammatically incorrect and should read “a first set.” Appropriate correction is required.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-2, 6, 9-10, 13, 15-16, and 18-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11,037,341 B1. Although the claims at issue are not identical, they are not patentably distinct from each other because every limitation of claim 1 of the instant application is anticipated by claim 10 of the patent.
Comparison of Application No. 17/317,246 with U.S. Patent No. 11,037,341:

Application, claim 1: 1. A method for generating shapes implemented by at least one computing device, the method comprising:
Patent, claim 10: 10. A method for generating shapes implemented by at least one computing device, the method comprising:

Application, claim 1: encoding, by the at least one computing device, an first set of visual elements to generate a latent representation;
Patent, claim 10: transforming, by the at least one computing device, a set visual elements into distance field representations of the visual elements;
generating, by the at least one computing device, a latent representation of the visual elements based on the distance field representations, said generating including:
training, by the at least one computing device, a machine learning model by inputting the distance field representations into the machine learning model; and
generating the latent representation by the trained machine learning model as distance field representations of geometric relationships between the visual elements in each set of visual elements;

Application, claim 1: decoding, by the at least one computing device, the latent representation to generate a set of decoded visual elements;
Patent, claim 10: decoding, by the at least one computing device, the latent representation to generate a set of decoded visual elements that are each comprised of a set of parameters that describe geometric aspects of each decoded visual element;

Application, claim 1: calculating, by the at least one computing device, an accuracy probability for individual decoded visual elements of the set of decoded visual elements;
removing, by that at least one computing device, a particular decoded visual element based on determining that an accuracy probability for the particular decoded visual element is below a threshold accuracy probability; and
Patent, claim 10: removing, by the at least one computing device, at least one decoded visual element by predicting an accuracy probability for each decoded visual element, and determining that an accuracy probability for the at least one decoded visual element is below an accuracy probability threshold; and

Application, claim 1: generating, by the at least one computing device, a shape utilizing remaining decoded visual elements.
Patent, claim 10: generating, by the at least one computing device, a shape utilizing the remaining decoded visual elements and based the set of parameters for each remaining decoded visual element.

Application, claim 2: 2. A method as described in claim 1, further comprising transforming the first set of visual elements into distance field representations, and wherein said encoding comprises encoding the distance field representations to generate the latent representation.
Patent, claim 10: transforming, by the at least one computing device, a set visual elements into distance field representations of the visual elements;
generating, by the at least one computing device, a latent representation of the visual elements based on the distance field representations, said generating including:

Application, claim 6: 6. A method as described in claim 1, wherein said calculating an accuracy probability for individual decoded visual elements is performed by a machine learning model and comprises configuring a bidirectional chamfer distance of a loss function to enable the machine learning model to calculate the accuracy probability based on an effect of individual decoded visual elements on evaluation of the loss function.
Patent, claim 12: 12. A method as described in claim 10, wherein the machine learning model utilizes a loss function, and wherein said predicting an accuracy probability for each decoded visual element comprises configuring a bidirectional chamfer distance of the loss function to enable the machine learning model to predict an accuracy probability based on an affect effect of each decoded visual element on evaluation of the loss function.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-15 are rejected under 35 U.S.C. 103 as being unpatentable over Deng (US 20190318261 A1), in view of Doran (US 20170039739 A1), and further in view of Yumer (US 20180253869 A1).
Regarding claim 1, Deng discloses a method for generating shapes implemented by at least one computing device (Fig. 1; [0040]: process the latent representation; output a score; [0046]: generate and display various icons and symbols, i.e., shapes, to the user; Fig. 11A; [0107]: perform image captioning; Fig. 11A; [0110]: generate and obtain the image captioning results using the NSE, ALISE, and ALISE+NSE approaches), the method comprising:
encoding, by the at least one computing device, an first set of visual elements to generate a latent representation (Fig. 1; [0040]: process the latent representation; Fig. 2; [0055]: perform sequence encoding and decoding using an encoder-decoder framework for labeling data, such as an RNN encoder-decoder; [0056]: the feature encoders 204, 206 and the feature decoder 210 are configured as a convolutional neural network for image data; Fig. 2; [0057]: the first feature encoder 204 receives the input and maps the input to a first latent representation 212; the first feature encoder generates a first latent representation; [0064]: generate the first latent representation 212 and the second latent representation);
decoding, by the at least one computing device, the latent representation to generate a set of decoded visual elements ([0056]: decode encoded inputs from the first feature encoder; Fig. 2; [0057]: the feature decoder 210 accepts the first latent representation 212 as a conditional input and sequentially predicts each token; the feature decoder 210 thus provides for a set of target output data for the labeled data; Fig. 3; [0063]: the encoder-decoder framework);
calculating, by the at least one computing device, an accuracy probability for individual decoded visual elements of the set of decoded visual elements ([0057]: RNN calculates and models a generative probability; obtain and calculate the training loss of the sequence learning process by counting the differences between the predicted sequence and the ground truth labels; [0060]: minimize the loss of the first feature encoder 204; [0064]: calculate and provide similarity scores for the labeled data and unlabeled data to already-labeled data; [0065]: the processor minimizes the loss and updates the parameters of the feature encoders 204, 206 and the feature decoder 210.);
generating, by the at least one computing device, a shape ([0085]: cause a prompt, i.e., including a shape, to be displayed to the user on the first electronic device; Fig. 11A; [0107]: perform image captioning; Fig. 11A-E; [0110]: obtain and generate the image captioning result, i.e., shapes, using the NSE, ALISE, and ALISE+NSE approaches; [0111]: words include shapes).
Deng fails to explicitly disclose:
removing, by that at least one computing device, a particular decoded visual element based on determining that an accuracy probability for the particular decoded visual element is below a threshold accuracy probability; and
generating, by the at least one computing device, a shape utilizing remaining decoded visual elements.
In the same field of endeavor, Doran teaches:
removing, by that at least one computing device, a particular decoded visual element based on determining that an accuracy probability for the particular decoded visual element is below a threshold accuracy probability ([0072]: discard and remove points based on determining that the points fall outside the finite portion of the canonical curve, i.e., the accuracy probability is zero; Fig. 2; [0235]: discard and remove one or more of the input Bezier curves 2, 3, 4, 5, 6, 7, 8, 9 which are clearly further away than others, i.e., the accuracy probability is very low; the input curves 3, 9 to the left of the glyph 1 are discarded; [0248]: discard and remove sample points as falling outside the glyph, i.e., the accuracy probability is zero);
generating, by the at least one computing device, a shape utilizing remaining decoded visual elements (Fig. 1; [0177]: generate a render output; the output includes shapes as illustrated in Fig. 1; Fig. 2; [0235]: generate and make up the glyph 1, i.e., a shape, using only a subset of the multiple input Bezier curves by discarding one or more of the input Bezier curves which are clearly further away than others; [0248]: render the sample position as falling inside the glyph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Deng to include removing, by that at least one computing device, a particular decoded visual element based on determining that an accuracy probability for the particular decoded visual element is below a threshold accuracy probability as taught by Doran. The motivation for doing so would have been to discard points which fall outside the finite portion of the canonical curve; to determine the distance between the transformed sampling point and the closest point on the canonical curve to the transformed sampling point in the canonical space; to discard one or more of the input Bezier curves 2, 3, 4, 5, 6, 7, 8, 9 which are clearly further away than others as taught by Doran in paragraphs [0072], [0074], and [0235].
Deng in view of Doran fails to explicitly disclose a shape.
In the same field of endeavor, Yumer teaches:
generating, by the at least one computing device, a shape (Fig. 4; [0087]: generate the synthesized target image 422 by replacing and removing the material property set 410 with the training target property 418; [0141]: 40 shapes per category for joint category fine-tuning; [0143]: experimenters randomly selected an image corresponding to a shape as input; the neural network generated a modified digital image reflecting the target material property set.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Deng in view of Doran to include generating, by the at least one computing device, a shape as taught by Yumer. The motivation for doing so would have been to improve efficiency of training neural networks; to generate the synthesized target image 422, i.e. a car shape, by replacing the material property set 410 with the training target property; to improve the accuracy of generating modified digital images as taught by Yumer in paragraphs [0030], [0087], and [0140]. 
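For illustration only, the two limitations of claim 1 that remove a decoded visual element whose accuracy probability is below a threshold, and then generate a shape from the remaining decoded elements, can be sketched as follows. The names (`DecodedElement`, `prune_elements`, `generate_shape`), the representation of a shape as a list of parameter vectors, and the 0.5 default threshold are all hypothetical; they are not taken from the application, Deng, Doran, or Yumer.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DecodedElement:
    """Hypothetical decoded visual element: geometric parameters plus the
    accuracy probability predicted for it."""
    params: Sequence[float]
    accuracy_probability: float


def prune_elements(elements: List[DecodedElement], threshold: float) -> List[DecodedElement]:
    """Remove every element whose accuracy probability is below `threshold`.

    Elements at or above the threshold are kept in their original order.
    """
    return [e for e in elements if e.accuracy_probability >= threshold]


def generate_shape(elements: List[DecodedElement], threshold: float = 0.5) -> List[Sequence[float]]:
    """Build a 'shape' (here simply the list of parameter vectors) from the
    elements that remain after pruning."""
    remaining = prune_elements(elements, threshold)
    return [e.params for e in remaining]


if __name__ == "__main__":
    decoded = [
        DecodedElement((0, 0, 1, 1), 0.92),
        DecodedElement((1, 1, 2, 0), 0.31),  # below 0.5, so it is removed
        DecodedElement((2, 0, 3, 1), 0.77),
    ]
    print(generate_shape(decoded, threshold=0.5))  # [(0, 0, 1, 1), (2, 0, 3, 1)]
```

The comparison is strict ("below a threshold"), so an element whose probability equals the threshold is retained.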

Regarding claim 2, Deng in view of Doran and Yumer discloses a method as described in claim 1, further comprising transforming the first set of visual elements into distance field representations (Doran; [0008]: store the pre-calculated signed distance field offline; Fig. 1; [0177]: calculate the signed distance field for each individual glyph; Fig. 6; [0240]: a graphical representation of an exemplary signed distance field is shown in FIG. 6 for the glyphs representing the characters; [0243]: generate the signed distance fields; Fig. 8; [0251]: the SDF generator produces a signed distance field for each glyph), and wherein said encoding comprises encoding the distance field representations to generate the latent representation (Doran; Fig. 1; [0177]: calculate the signed distance field for each individual glyph; Fig. 7; [0242]: generate a graphical representation of the signed distance field for each of the glyphs in a font represented graphically as a texture atlas 29; Fig. 8; [0252]: access the values in the signed distance field and combine the signed distance field for the glyph, along with information relating to the paths).
Deng in view of Doran and Yumer further discloses wherein said encoding comprises encoding the distance field representations to generate the latent representation (Deng; [0057]: the first feature encoder 204 receives the input x^L and maps the input to a first latent representation 212; generate a first latent representation; [0064]: generate the first latent representation 212 and the second latent representation).
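As a point of reference for the distance field limitation of claim 2, a signed distance field of the general kind Doran describes for glyphs can be sketched as below. This is a brute-force illustration under assumed conventions: the outline is a closed polygon (Doran's glyph outlines use Bezier curves), samples are taken at integer grid points, and distances are negative inside and positive outside. It is not Doran's SDF generator or the applicant's transform, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment ab."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the infinite line, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside_polygon(p: Point, poly: List[Point]) -> bool:
    """Even-odd (ray casting) point-in-polygon test."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def signed_distance_field(poly: List[Point], width: int, height: int) -> List[List[float]]:
    """Signed distance to a closed polygon, sampled at integer points.

    Indexed as field[y][x]; negative inside, positive outside.
    """
    n = len(poly)
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            p = (float(x), float(y))
            d = min(point_segment_distance(p, poly[i], poly[(i + 1) % n]) for i in range(n))
            row.append(-d if inside_polygon(p, poly) else d)
        field.append(row)
    return field
```

For a square with corners (1, 1) and (4, 4) sampled on a 6 x 6 grid, the point (2, 2) is one unit inside the nearest edge and receives -1.0, while (5, 2) is one unit outside and receives 1.0.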

Regarding claim 3, Deng in view of Doran and Yumer discloses a method as described in claim 1, wherein said decoding comprises decoding the latent representation based on a complexity variable to generate the set of decoded visual elements to include more visual elements than the first set of visual elements (Deng; [0055]: a long short-term memory (LSTM) encoder-decoder, a recurrent neural network (RNN) encoder-decoder, or other encoder-decoder framework used for purposes such as natural language understanding and slot filling, image recognition, image captioning, or other purposes; [0057]: the feature decoder 210 accepts the first latent representation 212 as a conditional input and sequentially predicts each token in y^P; the decoder 210 outputs each token y_t based on the input of the t-th step and the memory vector maintained by the RNN; multiple steps generate more data; Fig. 9A; Fig. 9B; [0097]).

Regarding claim 4, Deng in view of Doran and Yumer discloses a method as described in claim 3, wherein the complexity variable is user specified (Deng; [0042]: an application receives input data from a user; [0060]: manual labeling by a user and image captioning by a user; Fig. 9A; Fig. 9B; [0097]: sample numbers are specified by users; [0106]: the 82,783 images of the training set are used as the data pool for active learning and query selection).

Regarding claim 5, Deng in view of Doran and Yumer discloses a method as described in claim 3, wherein the complexity variable specifies a number of visual elements to be included in the decoded visual elements (Yumer; Fig. 5B; [0098]: the architecture 510 utilizes an input digital image volume of size 3×256×256; the architecture 510 is composed of seven convolutional layers where the output of each layer is half the size of its input; Fig. 5C; [0099]: the architecture 520 utilizes an input digital image volume of size 3×256×256 and 14 layers of various dimensionality to generate an output illumination environment map of size 3×64×128.).

Regarding claim 6, Deng in view of Doran and Yumer discloses a method as described in claim 1, wherein said calculating an accuracy probability for individual decoded visual elements is performed by a machine learning model (Deng; [0057]: RNN, i.e., a machine learning model, calculates and models a generative probability; the training loss of the sequence learning process is obtained by counting the differences between the predicted sequence and the ground truth labels; [0060]: minimizing the loss of the first feature encoder 204, the second feature encoder 206, and the feature decoder 210 and minimizing the loss of the adversarial discriminator 216; [0065]: the processor minimizes the loss and updates the parameters of the feature encoders 204, 206 and the feature decoder 210) and comprises configuring a bidirectional chamfer distance of a loss function (Yumer; define a joint loss function that evaluates both the predictions of individual prediction facilities and images synthesized by the rendering layer) to enable the machine learning model to calculate the accuracy probability based on an effect of individual decoded visual elements on evaluation of the loss function (Deng; [0057]: a generative probability is modeled and calculated by an RNN; the training loss of the sequence learning process is obtained by counting the differences between the predicted sequence and the ground truth labels; [0060]: minimizing the loss of the first feature encoder 204, the second feature encoder 206, and the feature decoder 210 and minimizing the loss of the adversarial discriminator 216; [0064]: calculate and provide similarity scores for the labeled data and unlabeled data to already-labeled data; [0065]: the processor minimizes the loss and updates the parameters of the feature encoders 204, 206 and the feature decoder 210; [0104]: a METEOR accuracy indicator).
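Claim 6 (like claim 12 of the patent) recites configuring a bidirectional chamfer distance of a loss function so that a model can score each decoded element by its effect on evaluation of the loss. The sketch below shows one common form of that distance (the mean squared nearest-neighbor distance taken in both directions) together with a leave-one-out measure of each element's effect. The squared-distance convention, the leave-one-out scoring, and all function names are assumptions for illustration only; none of them is asserted to be the applicant's method or to appear in Deng or Yumer.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def one_sided_chamfer(a: List[Point], b: List[Point]) -> float:
    """Mean, over the points of a, of the squared distance to the nearest point of b."""
    return sum(min((ax - bx) ** 2 + (ay - by) ** 2 for bx, by in b) for ax, ay in a) / len(a)


def bidirectional_chamfer(a: List[Point], b: List[Point]) -> float:
    """Symmetric chamfer distance: the a->b term plus the b->a term."""
    return one_sided_chamfer(a, b) + one_sided_chamfer(b, a)


def leave_one_out_effects(elements: List[List[Point]], target: List[Point]) -> List[float]:
    """Drop in loss when each element (a list of sample points) is removed.

    Positive: removing the element lowers the loss, i.e. it hurts the fit.
    Negative: removing it raises the loss, i.e. it helps.
    Assumes at least two elements, each with at least one point.
    """
    def flatten(groups: List[List[Point]]) -> List[Point]:
        return [p for g in groups for p in g]

    full_loss = bidirectional_chamfer(flatten(elements), target)
    effects = []
    for i in range(len(elements)):
        rest = flatten(elements[:i] + elements[i + 1:])
        effects.append(full_loss - bidirectional_chamfer(rest, target))
    return effects
```

With target points (0, 0) and (1, 0) and predicted elements at (0, 0), (1, 0), and (5, 5), only the stray element at (5, 5) has a positive effect, which is the kind of signal a model could map to a low accuracy probability.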

Regarding claim 7, Deng in view of Doran and Yumer discloses a method as described in claim 1, wherein said decoding is performed by a decoder module that is trained using the first set of visual elements to generate the set of decoded visual elements (Deng; [0057]: a generative probability is modeled and calculated by an RNN; the training loss of the sequence learning process is obtained by counting the differences between the predicted sequence y^P and the ground truth labels; [0105]: this dataset includes 82,783 images for training, 40,504 images for validation, and 40,775 images for testing; [0106]: the 82,783 images of the training set are used as the data pool for active learning and query selection), and wherein said calculating an accuracy probability is performed by an accuracy predictor module that is trained using the set of decoded visual elements to perform said calculating the accuracy probability (Deng; [0057]: a generative probability is modeled and calculated by an RNN; the training loss of the sequence learning process is obtained by counting the differences between the predicted sequence y^P and the ground truth labels y^L; [0060]: minimizing the loss of the first feature encoder 204, the second feature encoder 206, and the feature decoder 210 and minimizing the loss of the adversarial discriminator 216; [0065]: the processor minimizes the loss and updates the parameters of the feature encoders 204, 206 and the feature decoder 210.).

Regarding claim 8, Deng in view of Doran and Yumer discloses a method as described in claim 1, further comprising:
receiving an input interaction to manipulate a visual element of the generated shape (Yumer; [0004]: digital image editing systems; [0028]: conventional digital editing systems; Fig. 1; [0038]; Fig. 2; [0046]: an input digital image 202; Fig. 8; [0157]: the target property manager 810 receives user input in relation to the visual indication; [0158]; [0171]: the client device 902a sends a request to the server(s) 906 to edit one or more digital images);
processing the manipulated visual element to adjust the manipulated visual element to a defined shape manifold for the generated shape (Yumer; Fig. 2; [0046]: the digital neural network rendering system provides the input digital image 202 to the neural network 200 and the neural network generates the modified digital image 206; [0159]: the digital image manager 812 receives user input and selects one or more input digital images to provide to the neural network 802; [0171]: the server(s) 906 provides the selected digital image as an input digital image to a neural network with a rendering layer), wherein the shape manifold is based on the latent representation (Deng; Fig. 2; [0057]: the first feature encoder 204 receives the input and maps the input to a first latent representation 212; the first feature encoder generates a first latent representation; Yumer; [0159]: the digital image manager 812 generates an array comprising a plurality of input digital images and modified digital images); and
generating an edited shape based on the adjustment to the manipulated visual element (Yumer; [0004]: digital image editing systems; [0028]: conventional digital editing systems; Fig. 8; [0157]: based on the user input, the target property manager 810 identifies a target property to apply to an input digital image in generating a modified digital image; [0158]).
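Claim 8 recites adjusting a manipulated visual element to a shape manifold that is based on the latent representation. One simple way to picture that step is an encode-then-decode round trip: the edited shape parameters are mapped into the latent space and back, which places them on the set of shapes the decoder can produce. The sketch below assumes a linear latent space (an orthonormal basis plus a mean, as in principal component analysis), for which the round trip is an orthogonal projection; a learned nonlinear encoder-decoder would only approximate such a projection. The function names and the example basis are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def encode(x: Sequence[float], mean: Sequence[float], basis: List[Vector]) -> Vector:
    """Latent code: coordinates of (x - mean) along each orthonormal basis vector."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * b for c, b in zip(centered, vec)) for vec in basis]


def decode(z: Sequence[float], mean: Sequence[float], basis: List[Vector]) -> Vector:
    """Map a latent code back to parameter space: mean + sum_k z_k * basis_k."""
    out = list(mean)
    for zk, vec in zip(z, basis):
        out = [o + zk * b for o, b in zip(out, vec)]
    return out


def snap_to_manifold(edited: Sequence[float], mean: Sequence[float], basis: List[Vector]) -> Vector:
    """Adjust an edited shape to the nearest point on the linear shape manifold."""
    return decode(encode(edited, mean, basis), mean, basis)


if __name__ == "__main__":
    mean = [0.0, 0.0, 0.0]
    # Manifold spanned by the first two coordinate axes (hypothetical example).
    basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(snap_to_manifold([2.0, -1.0, 3.0], mean, basis))  # [2.0, -1.0, 0.0]
```

Because the projection is idempotent, snapping an already-snapped shape leaves it unchanged, so repeated edits do not drift off the manifold.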

Regarding claim 9, Deng discloses a method for generating shapes implemented by at least one computing device (Fig. 1; [0040]: process the latent representation; output a score; [0046]: generate and display various icons and symbols, i.e., shapes, to the user; Fig. 11A; [0107]: perform image captioning; Fig. 11A; [0110]: generate and obtain the image captioning results using the NSE, ALISE, and ALISE+NSE approaches), the method comprising:
encoding, by the at least one computing device, a set of visual elements to generate a latent representation (Fig. 1; [0040]: process the latent representation; Fig. 2; [0055]: perform sequence encoding and decoding using an encoder-decoder framework for labeling data, such as an RNN encoder-decoder; [0056]: the feature encoders 204, 206 and the feature decoder 210 are configured as a convolutional neural network for image data; Fig. 2; [0057]: the first feature encoder 204 receives the input and maps the input to a first latent representation 212; the first feature encoder generates a first latent representation; [0064]: generate the first latent representation 212 and the second latent representation);
decoding, by the at least one computing device, the latent representation to generate a set of decoded visual elements ([0056]: decode encoded inputs from the first feature encoder; Fig. 2; [0057]: the feature decoder 210 accepts the first latent representation 212 as a conditional input and sequentially predicts each token; the feature decoder 210 thus provides for a set of target output data for the labeled data; Fig. 3; [0063]: the encoder-decoder framework);
generating, by the at least one computing device, a shape ([0085]: cause a prompt, i.e., including a shape, to be displayed to the user on the first electronic device; Fig. 11A; [0107]: perform image captioning; Fig. 11A-E; [0110]: obtain and generate the image captioning result, i.e., shapes, using the NSE, ALISE, and ALISE+NSE approaches; [0111]: words include shapes);
wherein the shape manifold is based on the latent representation (Deng; Fig. 2; [0057]: the first feature encoder 204 receives the input and maps the input to a first latent representation 212; the first feature encoder generates a first latent representation; Yumer; [0159]: the digital image manager 812 generates an array comprising a plurality of input digital images and modified digital images).
Deng fails to explicitly disclose:
generating, by the at least one computing device, a shape utilizing the decoded visual elements;
receiving an input interaction to manipulate a visual element of the generated shape;
processing the manipulated visual element to adjust the manipulated visual element to a defined shape manifold for the generated shape, wherein the shape manifold is based on the latent representation; and 
generating an edited shape based on the adjustment to the manipulated visual element.
In the same field of endeavor, Doran teaches:
generating, by the at least one computing device, a shape utilizing the decoded visual elements (Fig. 1; [0177]: generate a render output; the output includes shapes as illustrated in Fig. 1; Fig. 2; [0235]: generate and make up the glyph 1, i.e., a shape, using only a subset of the multiple input Bezier curves by discarding one or more of the input Bezier curves which are clearly further away than others; [0248]: render the sample position as falling inside the glyph);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Deng to include generating, by the at least one computing device, a shape utilizing the decoded visual elements as taught by Doran. The motivation for doing so would have been to discard points which fall outside the finite portion of the canonical curve; to determine the distance between the transformed sampling point and the closest point on the canonical curve to the transformed sampling point in the canonical space; to discard one or more of the input Bezier curves 2, 3, 4, 5, 6, 7, 8, 9 which are clearly further away than others as taught by Doran in paragraphs [0072], [0074], and [0235].
Deng in view of Doran fails to explicitly disclose: 
receiving an input interaction to manipulate a visual element of the generated shape;
processing the manipulated visual element to adjust the manipulated visual element to a defined shape manifold for the generated shape, wherein the shape manifold is based on the latent representation; and 
generating an edited shape based on the adjustment to the manipulated visual element.
In the same field of endeavor, Yumer teaches:
generating, by the at least one computing device, a shape (Fig. 4; [0087]: generate the synthesized target image 422 by replacing and removing the material property set 410 with the training target property 418; [0141]: 40 shapes per category for joint category fine-tuning; [0143]: experimenters randomly selected an image corresponding to a shape as input; the neural network generated a modified digital image reflecting the target material property set.);
receiving an input interaction to manipulate a visual element of the generated shape (Yumer; [0004]: digital image editing systems; [0028]: conventional digital editing systems; Fig. 1; [0038]; Fig. 2; [0046]: an input digital image 202; Fig. 8; [0157]: the target property manager 810 receives user input in relation to the visual indication; [0158]; [0171]: the client device 902a sends a request to the server(s) 906 to edit one or more digital images);
processing the manipulated visual element to adjust the manipulated visual element to a defined shape manifold for the generated shape (Yumer; Fig. 2; [0046]: the digital neural network rendering system provides the input digital image 202 to the neural network 200 and the neural network generates the modified digital image 206; [0159]: the digital image manager 812 receives user input and select one or more input digital images to provide the neural network 802; [0171]: the server(s) 906 provides the selected digital image as an input digital image to a neural network with a rendering layer); and 
generating an edited shape based on the adjustment to the manipulated visual element (Yumer; [0004]: digital image editing systems; [0028]: conventional digital editing systems; Fig. 8; [0157]: based on the user input, the target property manager 810 identifies a target property to apply to an input digital image in generating a modified digital image; [0158]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Deng in view of Doran to include generating, by the at least one computing device, a shape; receiving an input interaction to manipulate a visual element of the generated shape; processing the manipulated visual element to adjust the manipulated visual element to a defined shape manifold for the generated shape; generating an edited shape based on the adjustment to the manipulated visual element as taught by Yumer. The motivation for doing so would have been to improve efficiency of training neural networks; to generate the synthesized target image 422, i.e. a car shape, by replacing the material property set 410 with the training target property; to improve the accuracy of generating modified digital images as taught by Yumer in paragraphs [0030], [0087], and [0140].

Regarding claim 10, the claim limitations are similar to those recited in claim 2. Therefore, the same rationale used to reject claim 2 is also used to reject claim 10.

Regarding claim 11, the claim limitations are similar to those recited in claim 3. Therefore, the same rationale used to reject claim 3 is also used to reject claim 11.

Regarding claim 12, Deng in view of Doran and Yumer discloses a method as described in claim 11, wherein said receiving an input interaction to manipulate a visual element is based on a user input to a graphical user interface (Yumer; [0004]: digital image editing systems; [0028]: conventional digital editing systems; Fig. 1; [0038]; Fig. 2; [0046]: an input digital image 202; Fig. 8; [0157]: the target property manager 810 receives user input in relation to the visual indication; [0158]; [0171]: the client device 902a sends a request to the server(s) 906 to edit one or more digital images), and where the complexity variable is determined based on a different user input to the graphical user interface to specify the complexity variable (Yumer; Fig. 5B; [0098]: the architecture 510 utilizes an input digital image volume of size 3×256×256; the architecture 510 is composed of seven convolutional layers where the output of each layer is half the size of its input; Fig. 5C; [0099]: the architecture 520 utilizes an input digital image volume of size 3×256×256 and 14 layers of various dimensionality to generate an output illumination environment map of size 3×64×128).

Regarding claim 13, Deng in view of Doran and Yumer discloses a method as described in claim 9, further comprising:
calculating, by the at least one computing device, an accuracy probability for individual decoded visual elements of the set of decoded visual elements (Deng; [0057]: RNN calculates and models a generative probability; obtain and calculate the training loss of the sequence learning process by counting the differences between the predicted sequence and the ground truth labels; [0060]: minimize the loss of the first feature encoder 204; [0064]: calculate and provide similarity scores for the labeled data and unlabeled data to already-labeled data; [0065]: the processor minimizes the loss and updates the parameters of the feature encoders 204, 206 and the feature decoder 210.); and
removing, by the at least one computing device, a particular decoded visual element based on determining that an accuracy probability for the particular decoded visual element is below a threshold accuracy probability (Doran; [0072]: discard and remove points based on determining that points fall outside the finite portion of the canonical curve, i.e. accuracy probability is zero; Fig. 2; [0235]: discard and remove one or more of the input Bezier curves 2, 3, 4, 5, 6, 7, 8, 9 which are clearly further away than others, i.e. accuracy probability is very low; the input curves 3, 9 to the left of the glyph 1 are discarded; [0248]: discard and remove sample points as falling outside the glyph, i.e. accuracy probability is zero), wherein said generating a shape utilizes remaining decoded visual elements (Doran; Fig. 1; [0177]: generate a render output; output includes shapes as illustrated in Fig. 1; Fig. 2; [0235]: generate and make up the glyph 1, i.e. shape, only using a sub-set of all of the multiple input Bezier curves by discarding one or more of the input Bezier curves which are clearly further away than others; [0248]: render the sample position as falling inside the glyph).

Regarding claim 14, Deng in view of Doran and Yumer discloses a method as described in claim 9, wherein the shape manifold comprises a topological space that defines different configurations of decoded visual elements according to the latent representation (Yumer; Fig. 4; [0087]: generates the synthesized target image 422 by replacing the material property set 410 with the training target property 418; Fig. 6; [0132]: the array 620 includes a plurality of modified digital images; Fig. 6; Fig. 6B; [0133]: The digital neural network rendering system generates multiple different modified digital images).

Regarding claim 15, Deng in view of Doran and Yumer discloses a method as described in claim 9, wherein said processing the manipulated visual element to adjust the manipulated visual element comprises one or more of automatically repositioning or resizing the visual element to conform to the latent representation (Yumer; [0028]: a rendering layer models image formation from physical properties; Fig. 5C; [0099]: the architecture 520 includes seven convolutional layers where the output of each layer is half the size of its input; the convolutional layers are followed by two fully connected layers and a sequence of deconvolutional layers to generate an illumination environment map of size 64×128).

Claims 16-17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Deng (US 20190318261 A1) in view of Yumer (US 20180253869 A1).
Regarding claim 16, Deng discloses a system for generating shapes (Fig. 1; [0039]: an electronic device 101 is included in the network environment 100; Fig. 1; [0040]: process the latent representation; output a score; [0046]: generate and display various icons, and symbols, i.e. shapes, to the user; Fig. 11A; [0107]: perform image captioning; Fig. 11A; [0110]: generate and obtain the image captioning results using the NSE, ALISE, and ALISE+NSE approaches), the system comprising:
an encoder module implemented at least partially in hardware of at least one computing device to encode an first set of visual elements to generate a latent representation (Fig. 1; [0039]: an electronic device 101 is included in the network environment 100; Fig. 1; [0040]: process the latent representation; Fig. 2; [0055]: perform sequence encoding and decoding using an encoder-decoder framework for labeling data, such as RNN encoder-decoder; [0056]: the feature encoders 204, 206 and the feature decoder 210 are configured as a convolutional neural network for image data; Fig. 2; [0057]: the first feature encoder 204 receives the input and maps the input to a first latent representation 212; the first feature encoder generates a first latent representation; [0064]: generate the first latent representation 212 and the second latent representation);
a decoder module implemented at least partially in the hardware of the at least one computing device (Fig. 1; [0039]: an electronic device 101 is included in the network environment 100; Fig. 1; [0040]: process the latent representation) to decode the latent representation based on a complexity variable to generate a set of decoded visual elements ([0056]: decode encoded inputs from the first feature encoder; Fig. 2; [0057]: the feature decoder 210 accepts the first latent representation 212 as a conditional input and sequentially predicts each token; the feature decoder 210 thus provides for a set of target output data for the labeled data; Fig. 3; [0063]: the encoder-decoder framework) that includes more visual elements than the first set of visual elements (Deng; [0055]: a long short-term memory (LSTM) encoder-decoder, a recurrent neural network (RNN) encoder-decoder, or other encoder-decoder framework used for purposes such as natural language understanding and slot filling, image recognition, image captioning, or other purposes; [0057]: the feature decoder 210 accepts the first latent representation 212 as a conditional input and sequentially predicts each token in y^P; the decoder 210 outputs each token y_t based on the input of the t-th step and the memory vector maintained by the RNN; multiple steps generate more data; Fig. 9A; Fig. 9B; [0097]); and
Deng fails to explicitly disclose:
an object editor module implemented at least partially in the hardware of the at least one computing device to generate a shape utilizing the set of decoded visual elements.
In the same field of endeavor, Yumer teaches:
generating, by the at least one computing device, a shape (Fig. 4; [0087]: generate the synthesized target image 422 by replacing and removing the material property set 410 with the training target property 418);
an object editor module implemented at least partially in the hardware of the at least one computing device to generate a shape utilizing the set of decoded visual elements ([0004]: digital image editing systems; [0028]: conventional digital editing systems; Fig. 4; [0087]: generate the synthesized target image 422 by replacing and removing the material property set 410 with the training target property 418; [0141]: 40 shapes per each category for joint category fine-tuning; [0143]: experimenters randomly selected an image corresponding to a shape as input; the neural network generated a modified digital image reflecting the target material property set; Fig. 8; [0157]: based on the user input, the target property manager 810 identifies a target property to apply to an input digital image in generating a modified digital image; [0158]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Deng to include generating, by the at least one computing device, a shape; an object editor module implemented at least partially in the hardware of the at least one computing device to generate a shape utilizing the set of decoded visual elements as taught by Yumer. The motivation for doing so would have been to improve efficiency of training neural networks; to generate the synthesized target image 422, i.e. a car shape, by replacing the material property set 410 with the training target property; to improve the accuracy of generating modified digital images as taught by Yumer in paragraphs [0030], [0087], and [0140].

Regarding claim 17, Deng and Yumer disclose a system as described in claim 16, wherein the decoder module is implemented to receive the complexity variable as a value received via user input (Deng; [0042]: an application receives input data from a user; Fig. 2; [0057]: the feature decoder 210 accepts the first latent representation 212 as a conditional input and sequentially predicts each token; the feature decoder 210 thus provides for a set of target output data for the labeled data; [0060]: manual labeling by a user and image captioning by a user; Fig. 9A; Fig. 9B; [0097]: sample numbers are specified by users; [0106]: the 82,783 images of the training set are used as the data pool for active learning and query selection).

Regarding claim 20, the claim limitations are similar to the claim limitations recited in claim 8. Therefore, the same rationale used to reject claim 8 is also used to reject claim 20.

Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Deng (US 20190318261 A1), in view of Yumer (US 20180253869 A1), and further in view of Doran (US 20170039739 A1).
Regarding claim 18, Deng in view of Yumer discloses a system as described in claim 16, further comprising an accuracy predictor module implemented at least partially in the hardware of the at least one computing device to:
calculate an accuracy probability for individual decoded visual elements of the set of decoded visual elements (Deng; [0057]: RNN calculates and models a generative probability; obtain and calculate the training loss of the sequence learning process by counting the differences between the predicted sequence and the ground truth labels; [0060]: minimize the loss of the first feature encoder 204; [0064]: calculate and provide similarity scores for the labeled data and unlabeled data to already-labeled data; [0065]: the processor minimizes the loss and updates the parameters of the feature encoders 204, 206 and the feature decoder 210.); and
Deng in view of Yumer fails to explicitly disclose:
remove a particular decoded visual element based on determining that an accuracy probability for the particular decoded visual element is below a threshold accuracy probability.
In the same field of endeavor, Doran teaches:
remove a particular decoded visual element based on determining that an accuracy probability for the particular decoded visual element is below a threshold accuracy probability ([0072]: discard and remove points based on determining that points fall outside the finite portion of the canonical curve, i.e. accuracy probability is zero; Fig. 2; [0235]: discard and remove one or more of the input Bezier curves 2, 3, 4, 5, 6, 7, 8, 9 which are clearly further away than others, i.e. accuracy probability is very low; the input curves 3, 9 to the left of the glyph 1 are discarded; [0248]: discard and remove sample points as falling outside the glyph, i.e. accuracy probability is zero).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Deng and Yumer to include removing a particular decoded visual element based on determining that an accuracy probability for the particular decoded visual element is below a threshold accuracy probability as taught by Doran. The motivation for doing so would have been to discard points which fall outside the finite portion of the canonical curve; to determine the distance between the transformed sampling point and the closest point on the canonical curve to the transformed sampling point in the canonical space; to discard one or more of the input Bezier curves 2, 3, 4, 5, 6, 7, 8, 9 which are clearly further away than others as taught by Doran in paragraphs [0072], [0074], and [0235].

Regarding claim 19, the claim limitations are similar to the claim limitations recited in claim 7. Therefore, the same rationale used to reject claim 7 is also used to reject claim 19.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hai Tao Sun whose telephone number is (571)272-5630. The examiner can normally be reached 9:00AM-6:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached on 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HAI TAO SUN/Primary Examiner, Art Unit 2616