Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective 

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (KR 20200047272 A), hereafter referred to as Lee, in view of Ishiguro et al. (JP 6099099 B2), hereafter referred to as Ishiguro.

Regarding claim 1, 
Lee teaches 
a method, comprising: 
developing a joint latent variable model having a first variable, a second variable, and a joint latent variable representing common information between the first and second variables (Lee: ¶[0083], “the value of the latent variable (Z) learned by the first variable cycle autoencoder (VAE1) and the second variable cycle autoencoder (VAE2) is shared. Accordingly, the first variable cycle autoencoder (VAE1) and the second variable cycle autoencoder (VAE2) learn and calculate the value of the latent variable Z under the influence of each other”, here, the latent variable Z represents a joint latent variable, and the sharing between the first variable and the second variable under the influence of each other represents the common information between the first and the second variable);

generating a variational posterior of the joint latent variable model (Lee: ¶[0035-0037], “The variable cycle autoencoder is a kind of unsupervised learning and is widely used in dimensional reduction and generation models. The key to the variable cycle autoincoder is to learn that the latent variable (Z) follows a normal distribution (diagonal Gaussian) of mean (μ) and variance (σ). Since the posterior distribution p(z|x) is difficult to calculate (Intractable), we approximate q(z|x) and p(z|x) using Variational Inference. Using Kullback-Leibler Divergence, we can induce as follows to minimize the difference between q(z|x) and p(z|x)

    [equation image: media_image1.png]
”, here, [equation image: media_image2.png] is representing as variational posterior which is calculated using the above equation);

training the variational posterior (Lee: ¶[0045], “the encoding LSTMs 210 and 220 and the decoding LSTMs 230, 240 and 250 of the variable cycle autoencoder 200 may perform a training operation. The encoding LSTMs 210 and 220 calculate a latent variable by encoding each of a plurality of preset keyword sets, and perform training by performing decoding on the calculated latent variable. Accordingly, a value of an appropriate latent variable may be calculated for a plurality of preset keyword sets”. ¶[0073], “autoencoder VAE1 learns the value of the latent variable Z through a training process of encoding and decoding a preset predetermined keyword set”, here, the latent variable z includes the posterior distribution p(z|x) and q(z|x) which is representing as a variational posterior); and

performing inference of the first variable from the second variable based on the variational posterior (Lee: ¶[0035], “Since the posterior distribution p(z|x) is difficult to calculate (Intractable), we approximate q(z|x) and p(z|x) using Variational Inference”. ¶[0083], “the first variable cyclic autoencoder (VAE1) is based on the value of the latent variable (Z) learned and calculated by the second variable cyclic autoencoder (VAE2) as well as the first variable cyclic autoencoder (VAE1)”, here, ranked content items in an order is representing as sequence of historical actions).

wherein performing the inference comprises conditionally generating the first variable from the second variable (Lee: ¶[0035], “we approximate q(z|x) and p(z|x) using Variational Inference. Using Kullback-Leibler Divergence, we can induce as follows to minimize the difference between q(z|x) and p(z|x)”. ¶[0083], “the first variable cyclic autoencoder (VAE1) is based on the value of the latent variable (Z) learned and calculated by the second variable cyclic autoencoder (VAE2)”, here, variational inference is performed using Kullback-Leibler Divergence and the first variable is learned and calculated by the second variable represents generating from the second variable).

Although Lee, in ¶[0035], describes variational inference of the posterior distribution functions q(z|x) and p(z|x), Lee does not distinctly disclose:
performing inference of the first variable from the second variable based on the variational posterior.

However, Ishiguro teaches:
performing inference of the first variable from the second variable based on the variational posterior (Ishiguro: ¶[0050], “determines whether or not the inference of the variational posterior distribution q (Z) by the CVB learning device 200 has converged based on the change amount of the ACVB posterior distribution r (Z) calculated by the ACVB posterior distribution calculation unit 22”, here, inference is performed on the posterior distribution r(z)).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of developing a joint latent variable model from the first and second variables of Lee with performing inference based on the variational posterior of Ishiguro to generate the variational posterior of the joint latent variable model.

One would be motivated to do so in order to take a weighted average of the variational posterior distribution q (Z) calculated repeatedly, examine the fluctuation of the value, and make a … (Ishiguro: ¶[0064]).

Regarding claim 2, 
Lee in view of Ishiguro teaches the method of claim 1 as discussed above and Lee further teaches:
further comprising extracting common information between the first variable and the second variable (Lee: ¶[0083], “the value of the latent variable (Z) learned by the first variable cycle autoencoder (VAE1) and the second variable cycle autoencoder (VAE2) is shared”, here, the first variable and the second variable is shared is representing as common information between the first and the second variable).

Regarding claim 3, 
Lee in view of Ishiguro teaches the method of claim 2 as discussed above and Lee further teaches:
wherein extracting the common information comprises adding a regularization term to a loss function (Lee: ¶[0039], “In the above equation, [equation image: media_image3.png] is always greater than or equal to 0. When [equation image: media_image4.png] is maximized, it is the Evidence Lower Bound that makes [equation image: media_image5.png] the maximum and becomes the Objective Function”, here, [equation image: media_image3.png] is representing as a regularization term and [equation image: media_image4.png] is representing as a loss function).

Regarding claim 4, 
Lee in view of Ishiguro teaches the method of claim 1 as discussed above and Lee further teaches:
further comprising adding local randomness to the joint latent variable model (Lee: ¶[0050], “Referring to FIG. 4, keywords of a keyword set generated by the morpheme analyzer 100 are sequentially input to Xt-1 and Xt. In the embodiment of FIG. 4, "this year" may be input to Xt-1, and "corporation tax" may be input to Xt. ht-1 and ht are encoded vector values, and finally output ht may correspond to the latent variable Z of FIG. 3”, local randomness is an assumption that the number of bits is generated sequentially in a pseudorandom sequence, here, Xt-1 and Xt are representing as local randomness).

Regarding claim 5, 
Lee in view of Ishiguro teaches the method of claim 4 as discussed above and Lee further teaches:
wherein adding the local randomness comprises separating the joint latent variable into a common latent variable and a local latent variable (Lee: ¶[0050], “Referring to FIG. 4, keywords of a keyword set generated by the morpheme analyzer 100 are sequentially input to Xt-1 and Xt. In the embodiment of FIG. 4, "this year" may be input to Xt-1, and "corporation tax" may be input to Xt. ht-1 and ht are encoded vector values, and finally output ht may correspond to the latent variable Z of FIG. 3”, here, Xt-1 and Xt are representing as separated local latent variable and ht is representing as a common latent variable).

Regarding claim 7, 
Lee in view of Ishiguro teaches the method of claim 4 as discussed above and Lee further teaches:
(Lee: ¶[0071], “Referring to FIG. 5, the morpheme analysis unit 1100 generates a keyword set by receiving a user's natural language in the form of text or voice”. ¶[0080], “second encoding units 1211 and 1221 for encoding a predetermined image set in advance or an image input of the user; And second decoding units 1231 and 1241 for decoding a value of the latent variable Z generated by the second encoding units 1211 and 1221 encoding a predetermined image”, here, a value of latent variable from the natural language text, voice or predetermined image by second encoding unit is representing as style for the first or second variable).

Regarding claim 8, 
Lee in view of Ishiguro teaches the method of claim 1 as discussed above and Lee further teaches:
wherein training the variational posterior comprises training a decoder in the joint latent variable model with a full approximate posterior of the joint latent variable model (Lee: ¶[0013], “the value of the latent variable is learned through a training process of encoding and decoding a preset predetermined keyword set using a first variable cycle auto-encoder. Learning a value of a latent variable shared with a latent variable learned in the first variable cycle autoencoder through a training process of encoding and decoding a set predetermined image using a second variable cycle autoencoder; Calculating a value of a latent variable corresponding to a user's input by using the first cyclical autoencoder or the second cyclical autoencoder”. ¶[0035], “we approximate q(z|x) and p(z|x) using Variational Inference”, here, q(z|x) is representing as full approximate variational posterior of the joint latent variable).

Regarding claim 9, 
Lee in view of Ishiguro teaches the method of claim 8 as discussed above and Lee further teaches:
wherein training the variational posterior further comprises fixing parameters of the decoder and training the marginal variational posterior with the trained decoder (Lee: ¶[0045], “the variable cycle autoencoder 200 may perform a training operation…and perform training by performing decoding on the calculated latent variable”. ¶[0039], “In the above equation, [equation image: media_image3.png] is always greater than or equal to 0. When [equation image: media_image4.png] is maximized, it is the Evidence Lower Bound that makes [equation image: media_image5.png] the maximum and becomes the Objective Function”, here, θ and φ are representing as parameters and [equation image: media_image6.png] is representing as the marginal variational posterior).

Regarding claim 10, 
Lee in view of Ishiguro teaches the method of claim 1 as discussed above and Lee further teaches:
wherein training the variational posterior comprises training the joint latent variable model, a full approximate posterior, and the marginal variational posterior jointly using a hyperparameter (Lee: ¶[0045], “the variable cycle autoencoder 200 may perform a training operation…and perform training by performing decoding on the calculated latent variable”. ¶[0035], “we approximate q(z|x) and p(z|x) using Variational Inference”. ¶[0039], “In the above equation, [equation image: media_image3.png] is always greater than or equal to 0. When [equation image: media_image4.png] is maximized, it is the Evidence Lower Bound that makes [equation image: media_image5.png] the maximum and becomes the Objective Function”, here, q(z|x) is representing as full approximate posterior of the joint latent variable z, [equation image: media_image6.png] is representing as the marginal variational posterior, and φ are representing as hyperparameter when φ is greater than 0).

Regarding claim 11, 
Lee teaches a system, comprising: 
at least one decoder; 
at least one encoder; and 
a processor configured to: 
develop a joint latent variable model having a first variable, a second variable, and a joint latent variable representing common information between the first and second variables (Lee: ¶[0044], “the variable cycle auto-encoder 200 includes encoding Long Short Term Memory (LSTM; encoding units; 210, 220)… It may include a decoding LSTM (decoder; 230, 240, 250)”. ¶[0089], Fig. 7, “the specialized field response service system 2 includes an input/output unit 2100, an index system 2200, a search unit 2300, and a database 2400”. ¶[0083], “the value of the latent variable (Z) learned by the first variable cycle autoencoder (VAE1) and the second variable cycle autoencoder (VAE2) is shared. Accordingly, the first variable cycle autoencoder (VAE1) and the second variable cycle autoencoder (VAE2) learn and calculate the value of the latent variable Z under the influence of each other”, here, the latent variable Z represents a joint latent variable, and the sharing between the first variable and the second variable under the influence of each other represents the common information between the first and the second variable; and in Fig. 7, the system represents a computer or processor);

generate a variational posterior of the joint latent variable model (Lee: ¶[0035-0037], “The variable cycle autoencoder is a kind of unsupervised learning and is widely used in dimensional reduction and generation models. The key to the variable cycle autoincoder is to learn that the latent variable (Z) follows a normal distribution (diagonal Gaussian) of mean (μ) and variance (σ). Since the posterior distribution p(z|x) is difficult to calculate (Intractable), we approximate q(z|x) and p(z|x) using Variational Inference. Using Kullback-Leibler Divergence, we can induce as follows to minimize the difference between q(z|x) and p(z|x)

    [equation image: media_image1.png]
”, here, [equation image: media_image2.png] is representing as variational posterior which is calculated using the above equation);

train the variational posterior (Lee: ¶[0045], “the encoding LSTMs 210 and 220 and the decoding LSTMs 230, 240 and 250 of the variable cycle autoencoder 200 may perform a training operation. The encoding LSTMs 210 and 220 calculate a latent variable by encoding each of a plurality of preset keyword sets, and perform training by performing decoding on the calculated latent variable. Accordingly, a value of an appropriate latent variable may be calculated for a plurality of preset keyword sets”. ¶[0073], “autoencoder VAE1 learns the value of the latent variable Z through a training process of encoding and decoding a preset predetermined keyword set”, here, the latent variable z includes the posterior distribution p(z|x) and q(z|x) which is representing as a variational posterior); and

perform inference of the first variable from the second variable based on the variational posterior (Lee: ¶[0035], “Since the posterior distribution p(z|x) is difficult to calculate (Intractable), we approximate q(z|x) and p(z|x) using Variational Inference”. ¶[0083], “the first variable cyclic autoencoder (VAE1) is based on the value of the latent variable (Z) learned and calculated by the second variable cyclic autoencoder (VAE2) as well as the first variable cyclic autoencoder (VAE1)”, here, ranked content items in an order is representing as sequence of historical actions), 
by conditionally generating the first variable from the second variable (Lee: ¶[0035], “we approximate q(z|x) and p(z|x) using Variational Inference. Using Kullback-Leibler Divergence, we can induce as follows to minimize the difference between q(z|x) and p(z|x)”. ¶[0083], “the first variable cyclic autoencoder (VAE1) is based on the value of the latent variable (Z) learned and calculated by the second variable cyclic autoencoder (VAE2)”, here, variational inference is performed using Kullback-Leibler Divergence and the first variable is learned and calculated by the second variable represents generating from the second variable).

Although Lee, in ¶[0035], describes variational inference of the posterior distribution functions q(z|x) and p(z|x), Lee does not distinctly disclose:
perform inference of the first variable from the second variable based on the variational posterior.
However, Ishiguro teaches 
perform inference of the first variable from the second variable based on the variational posterior as cited above in claim 1.

Regarding claim 12, 
Lee in view of Ishiguro teaches the system of claim 11 as discussed above and Lee further teaches:
wherein the processor is further configured to extract the common information between the first variable and the second variable (Lee: ¶[0083], “the value of the latent variable (Z) learned by the first variable cycle autoencoder (VAE1) and the second variable cycle autoencoder (VAE2) is shared”, here, the first variable and the second variable is shared is representing as common information between the first and the second variable).

Regarding claim 13, 
Lee in view of Ishiguro teaches the system of claim 12 as discussed above and Lee further teaches:
(Lee: ¶[0039], “In the above equation, [equation image: media_image3.png] is always greater than or equal to 0. When [equation image: media_image4.png] is maximized, it is the Evidence Lower Bound that makes [equation image: media_image5.png] the maximum and becomes the Objective Function”, here, [equation image: media_image3.png] is representing as a regularization term and [equation image: media_image4.png] is representing as a loss function).

Regarding claim 14, 
Lee in view of Ishiguro teaches the system of claim 11 as discussed above and Lee further teaches:
wherein the processor is further configured to add local randomness to the joint latent variable model (Lee: ¶[0050], “Referring to FIG. 4, keywords of a keyword set generated by the morpheme analyzer 100 are sequentially input to Xt-1 and Xt. In the embodiment of FIG. 4, "this year" may be input to Xt-1, and "corporation tax" may be input to Xt. ht-1 and ht are encoded vector values, and finally output ht may correspond to the latent variable Z of FIG. 3”, local randomness is an assumption that the number of bits is generated sequentially in a pseudorandom sequence, here, Xt-1 and Xt are representing as local randomness).

Regarding claim 15, 
Lee in view of Ishiguro teaches the system of claim 14 as discussed above and Lee further teaches:
wherein the processor is further configured to add the local randomness by separating the joint latent variable into a common latent variable and a local latent variable (Lee: ¶[0050], “Referring to FIG. 4, keywords of a keyword set generated by the morpheme analyzer 100 are sequentially input to Xt-1 and Xt. In the embodiment of FIG. 4, "this year" may be input to Xt-1, and "corporation tax" may be input to Xt. ht-1 and ht are encoded vector values, and finally output ht may correspond to the latent variable Z of FIG. 3”, here, Xt-1 and Xt are representing as separated local latent variable and ht is representing as a common latent variable).

Regarding claim 17, 
Lee in view of Ishiguro teaches the system of claim 14 as discussed above and Lee further teaches:
wherein the processor is further configured to perform the inference by generating a style for the first variable or the second variable (Lee: ¶[0071], “Referring to FIG. 5, the morpheme analysis unit 1100 generates a keyword set by receiving a user's natural language in the form of text or voice”. ¶[0080], “second encoding units 1211 and 1221 for encoding a predetermined image set in advance or an image input of the user; And second decoding units 1231 and 1241 for decoding a value of the latent variable Z generated by the second encoding units 1211 and 1221 encoding a predetermined image”, here, a value of latent variable from the natural language text, voice or predetermined image by second encoding unit is representing as style for the first or second variable).

Regarding claim 18, 
Lee in view of Ishiguro teaches the system of claim 11 as discussed above and Lee further teaches:
wherein the processor is further configured to train the variational posterior by training a decoder in the joint latent variable model with a full approximate posterior of the joint latent variable model (Lee: ¶[0013], “the value of the latent variable is learned through a training process of encoding and decoding a preset predetermined keyword set using a first variable cycle auto-encoder. Learning a value of a latent variable shared with a latent variable learned in the first variable cycle autoencoder through a training process of encoding and decoding a set predetermined image using a second variable cycle autoencoder; Calculating a value of a latent variable corresponding to a user's input by using the first cyclical autoencoder or the second cyclical autoencoder”. ¶[0035], “we approximate q(z|x) and p(z|x) using Variational Inference”, here, q(z|x) is representing as full approximate variational posterior of the joint latent variable).

Regarding claim 19, 
Lee in view of Ishiguro teaches the system of claim 18 as discussed above and Lee further teaches:
wherein the processor is further configured to train the variational posterior by fixing parameters of the decoder and training the marginal variational posterior with the trained decoder (Lee: ¶[0045], “the variable cycle autoencoder 200 may perform a training operation…and perform training by performing decoding on the calculated latent variable”. ¶[0039], “In the above equation, [equation image: media_image3.png] is always greater than or equal to 0. When [equation image: media_image4.png] is maximized, it is the Evidence Lower Bound that makes [equation image: media_image5.png] the maximum and becomes the Objective Function”, here, θ and φ are representing as parameters and [equation image: media_image6.png] is representing as the marginal variational posterior).

Regarding claim 20, 
Lee in view of Ishiguro teaches the system of claim 11 as discussed above and Lee further teaches:
wherein the processor is further configured to train the variational posterior by training the joint latent variable model, a full approximate posterior, and the marginal variational posterior jointly using a hyperparameter (Lee: ¶[0045], “the variable cycle autoencoder 200 may perform a training operation…and perform training by performing decoding on the calculated latent variable”. ¶[0035], “we approximate q(z|x) and p(z|x) using Variational Inference”. ¶[0039], “In the above equation, [equation image: media_image3.png] is always greater than or equal to 0. When [equation image: media_image4.png] is maximized, it is the Evidence Lower Bound that makes [equation image: media_image5.png] the maximum and becomes the Objective Function”, here, q(z|x) is representing as full approximate posterior of the joint latent variable z, [equation image: media_image6.png] is representing as the marginal variational posterior, and φ are representing as hyperparameter when φ is greater than 0).

Response to Arguments
Applicant's arguments filed on 07/21/2021 have been fully considered but they are not persuasive.
Applicant asserts 
“Contrary to the rejection, paragraph [0083] of Lee fails to teach or suggest "developing a joint latent variable model having a first variable, a second variable, and a joint latent variable representing common information between the first and second variables," as recited in amended independent Claims 1 and 11. That is, in amended independent Claims 1 and 11, there is one joint VAE (which encodes the first and the second variables together into the joint latent variable), while Lee has two separate VAEs. While Lee describes "the first variable cycle autoencoder (VAE1) and the second variable cycle autoencoder (VAE2) learn and calculate the value of the latent variable Z under the influence of each other," Lee does not clearly explain how VAE1 and VAE2 influence each other, but it is at least clear that they use two VAEs.
Accordingly, Lee fails to teach or suggest "developing a joint latent variable model having a first variable, a second variable, and a joint latent variable representing (Remarks, p. 6)

Examiner’s response:
The examiner respectfully disagrees. 

The examiner understands the applicant’s assertion “in amended independent Claims 1 and 11, there is one joint VAE (which encodes the first and the second variables together into the joint latent variable), while Lee has two separate VAEs.”

However, the claim just says “developing a joint latent variable model having a first variable, a second variable, and a joint latent variable representing common information between the first and second variables,” but it does not say a specific structure and/or configuration and/or type, etc. of the joint latent variable model and/or the joint latent variable. Thus, there is nothing that prevents a model with two VAEs of Lee from reading on the claimed “joint latent variable model”. In addition, e.g., the latent variable (Z) may read on “joint latent variable”.

For more details, see the rejections. Thus, the examiner’s rejections are reasonable and proper.

Applicant asserts 
“Further, Lee in view of Ishiguro fails to teach or suggest that performing the inference based on the variational posterior comprises conditionally generating the first variable from the second variable. 

More specifically, in amended independent Claims 1 and 11, the first variable is inferred from the second variable based on the variational posterior, by conditionally generating the first variable from the second variable, i.e., conditional generation of the first variable, given the second variable, while Lee merely describes using the latent variable for indexing of the input. That is, there is no conditional generation in Lee.
…
As shown above, there is nothing in paragraphs [0035] and [0083] of Lee that teaches or suggests the first variable is inferred from the second variable based on the variational posterior, by conditionally generating the first variable from the second variable. That is, there is nothing in Lee that teaches or suggests conditionally generating.
…
As described above, Lee in view of Ishiguro fails to teach or suggest all of the recitations of amended independent Claims 1 and 11. Therefore, based at least on the foregoing, it is respectfully submitted that amended independent Claims 1 and 11 are patentably distinct over Lee in view of Ishiguro, and that the rejection should be withdrawn.” (Remarks, pp. 6-8)

Examiner’s response:
The examiner respectfully disagrees. 

The examiner understands the applicant’s assertion “there is nothing in paragraphs [0035] and [0083] of Lee that teaches or suggests the first variable is 

However, first of all, there is no definition of “conditionally generating” in the specification of the present application (e.g., in par 51). In addition, according to the American Heritage Dictionary (AHD), “conditional” is defined as “Imposing, depending on, or containing a condition.” Thus, under the broadest reasonable interpretation (BRI), “conditionally generating” may be interpreted as generating based on a condition. As rejected on claim 1 under Claim Rejections - 35 USC § 103, e.g., “the first variable cyclic autoencoder (VAE1) is based on the value of the latent variable (Z) learned and calculated by the second variable cyclic autoencoder (VAE2) as well as the first variable cyclic autoencoder (VAE1)” may read on “conditionally generating” since the first variable cyclic autoencoder (VAE1) is calculated based on a condition of “the value of the latent variable (Z) learned and calculated by the second variable cyclic autoencoder (VAE2) as well as the first variable cyclic autoencoder (VAE1).” Furthermore, conditional probabilities are used as well. 

For more details, see the rejections. Thus, the examiner’s rejections are reasonable and proper.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. These include:
US 2020/0401916 A1 which describes systems and method for training generative machine learning models.
US 2019/0036795 A1 which describes method and system for proactive anomaly detection in devices and networks.
KR 102070049 B1 which describes apparatus and method for collaborative filtering using auxiliary information based on conditional variational autoencoder.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/S.K./Examiner, Art Unit 2129



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129