DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on 09/23/2022. Claims 1-16 are pending and have been examined. Hence, this action has been made FINAL.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments and Amendments
Amendments to the claims by the Applicant have been considered and addressed below. 
With respect to the rejections under 35 U.S.C. §§ 112 and 103, the Applicant presents several arguments, to which the Examiner responds below.
Claim Rejections - 35 U.S.C. § 112:
Arguments: 
The office action includes a rejection to claims 7 and 15 for lack of antecedent basis for the term "the aggregate loss values." Claims 7 and 15 have been amended to remove the "the" and recite "... with aggregate loss values of..." Applicant trusts that the Examiner agrees that claims 7 and 15 now comply with 35 USC 112. 
Reconsideration and withdrawal of all rejections under 35 USC 112 are respectfully requested.

Examiner response to Arguments:
Applicant’s arguments with respect to 35 U.S.C. § 112 have been fully considered and are persuasive. The 35 U.S.C. § 112 rejection of claims 7 and 15 has been withdrawn. 
Claim Rejections - 35 U.S.C. § 103:
Arguments: 
1-3, 5 and 13 
The office action includes a rejection to claims 1-3, 5 and 13 as being unpatentable over the Chen reference and further in view of DeFelice (US20190236148). Applicant respectfully requests reconsideration. 
The office action states at pages 6 to 7 that the Chen reference teaches "the semantics mean vector constrained to a simplex associated with a semantics Gaussian posterior and learned during training of a sentence model." The office action references Section 3.1 Parameterizations and Subsection vMF Distribution, and Section 4.1 Paraphrase Reconstruction Loss in the Chen reference. At page 7, the office action states, "Here, the 'probability simplex' is interpreted as the mean/variance data obtained from the Gaussian / VAE model and the sentence model is interpreted as associated with the paraphrasing model." Applicant respectfully requests reconsideration of this interpretation. 
[1] Applicant respectfully submits that the interpretation that reads the simplex as the mean/variance data obtained from the Gaussian/VAE model is incorrect for two reasons. a) The present application explicitly constrains the mean to be on the simplex. Without the explicit constraint, the mean vectors from the Gaussian/VAE will not fill the simplex (see FIG. 5B), which would lead to inconsistent decoding during generation. b) Without the explicit constraints, neither the mean/variance (log-variance) nor the Gaussian sample themselves need to be on the simplex, at its interior, or vertices. They are by default 'unbounded' whereas the simplex is a convex and bounded space in which the mean is forced to be located. 
[2] Therefore, in the Chen reference, the semantics mean vector is NOT constrained to a simplex. As such, the Chen reference does not teach this limitation recited in claims 1 and 9. DeFelice does not teach this either. Therefore, the cited combination does not teach each limitation in claims 1 and 9. Since claims 2-3, 5 depend upon claim 1, and claim 13 depends upon claim 9, these claims are patentable over the cited art for at least that reason. 
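For illustration only, the following Python sketch shows the geometric distinction drawn in [1]. It assumes a softmax projection as the explicit constraint (the application's actual constraining mechanism is not stated in this record), and the function names are hypothetical. A mean produced by an unconstrained linear output can contain negative entries and need not sum to one, whereas a mean explicitly projected onto the probability simplex always satisfies both conditions.

```python
import math

def unconstrained_mean(logits):
    """Identity output layer (hypothetical): entries may be negative and need not sum to 1."""
    return list(logits)

def simplex_mean(logits):
    """Softmax projection (assumed for illustration): entries are positive and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def on_simplex(vec, tol=1e-9):
    """True if every entry is non-negative and the entries sum to 1."""
    return all(v >= -tol for v in vec) and abs(sum(vec) - 1.0) <= tol

logits = [2.0, -1.5, 0.7]
print(on_simplex(unconstrained_mean(logits)))  # False
print(on_simplex(simplex_mean(logits)))        # True
```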

4 and 12 
The office action includes a rejection to claims 4 and 12 as being unpatentable over the Chen reference in combination with DeFelice and further in view of Devarajan (US20160350664). Applicant respectfully requests reconsideration. 
Applicant notes that Devarajan does not cure the deficiencies of Chen in combination with DeFelice with respect to claims 1 and 9. As such, the cited art does not teach each limitation in claims 1 and 9. [3] Since claims 4 and 12 depend upon claims 1 and 9, respectively, they are patentable over the cited art for at least that reason. 

6 and 14 
The office action includes a rejection to claims 6 and 14 as being unpatentable over the Chen reference in combination with DeFelice and further in view of Moore (US20040044530). Applicant respectfully requests reconsideration. 
Applicant notes that Moore does not cure the deficiencies of Chen in combination with DeFelice with respect to claims 1 and 9. As such, the cited art does not teach each limitation in claims 1 and 9. [4] Since claims 6 and 14 depend upon claims 1 and 9, respectively, they are patentable over the cited art for at least that reason. 

7-8 and 15-16 
The office action includes a rejection to claims 7-8 and 15-16 as being unpatentable over the Chen reference in combination with DeFelice and Moore and further in view of Norton (US20200279017). Applicant respectfully requests reconsideration. 
Applicant notes that Norton does not cure the deficiencies of Chen in combination with DeFelice with respect to claims 1 and 9. As such, the cited art does not teach each limitation in claims 1 and 9. [5] Since claims 7-8 and 15-16 depend upon claims 1 and 9, respectively, they are patentable over the cited art for at least that reason. 
Reconsideration and withdrawal of all rejections under 35 USC 103 are respectfully requested.

Examiner response to Arguments:
[1]-[2]: Applicant notes that “the interpretation that reads the simplex as the mean/variance data obtained from the Gaussian/VAE model is incorrect for two reasons. a) The present application explicitly constrains the mean to be on the simplex. Without the explicit constraint, the mean vectors from the Gaussian/VAE will not fill the simplex (see FIG. 5B), which would lead to inconsistent decoding during generation. b) Without the explicit constraints, neither the mean/variance (log-variance) nor the Gaussian sample themselves need to be on the simplex, at its interior, or vertices. They are by default 'unbounded' whereas the simplex is a convex and bounded space in which the mean is forced to be located.” Applicant also argues that, “in the Chen reference, the semantics mean vector is NOT constrained to a simplex. As such, the Chen reference does not teach this limitation recited in claims 1 and 9. DeFelice does not teach this either. Therefore, the cited combination does not teach each limitation in claims 1 and 9.”
Applicant's arguments have been fully considered but they are not persuasive. The Examiner respectfully disagrees and notes that the Applicant's assertions that “a) The present application explicitly constrains the mean to be on the simplex. Without the explicit constraint, the mean vectors from the Gaussian/VAE will not fill the simplex (see FIG. 5B), which would lead to inconsistent decoding during generation” and “b) Without the explicit constraints, neither the mean/variance (log-variance) nor the Gaussian sample themselves need to be on the simplex, at its interior, or vertices. They are by default 'unbounded' whereas the simplex is a convex and bounded space in which the mean is forced to be located,” although valid, are not recited in the claim language as drafted. Claim 1 recites: “process the original semantics component through a semantics VAE to receive a semantics mean vector and a semantics covariance matrix, the semantics mean vector constrained to a simplex associated with a semantics Gaussian posterior and learned during training of a sentence model.” Hence, the Examiner notes that, under the broadest reasonable interpretation, the “probability simplex” is still interpreted as associated with the mean/variance data obtained from the Gaussian/VAE model in Chen et al. (Section 3 “Proposed Approach” and Section 4.1 “Paraphrase Reconstruction Loss” of Chen et al.). In summary, Chen et al. discloses the use of a Gaussian distribution to define a posterior over latent variables [i.e., associated with a semantics Gaussian posterior], wherein there is a Gaussian distribution with two parameters [i.e., constraints]: mean direction and variance. Therefore, the Examiner notes that Chen et al. does teach this limitation recited in claims 1 and 9 (incorporated below for reference). Please see relevant citations below.
3.1 Parameterizations 
“VGVAE uses two distribution families in defining the posterior over latent variables, namely, the von Mises-Fisher (vMF) distribution and the Gaussian distribution.” 
vMF Distribution.
“vMF can be regarded as a Gaussian distribution on a hypersphere with two parameters: µ and κ. µ ∈ ℝ^m is a normalized vector (i.e., ‖µ‖₂ = 1) defining the mean direction. κ ∈ ℝ≥0 is often referred to as a concentration parameter analogous to the variance in a Gaussian distribution… since we will evaluate our semantic representations in the context of modeling paraphrases…”
4.1 Paraphrase Reconstruction Loss
“To impose such constraints, PRL is defined as… That is, we swap the semantic variables, keep the syntactic variables, and attempt to reconstruct the sentences” 
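As a hedged illustration of the normalization quoted above from Chen et al. (a sketch, not code from that reference; the helper names are hypothetical), the Python snippet below scales an arbitrary vector to unit L2 norm, which is the condition ‖µ‖₂ = 1 placed on the vMF mean direction µ.

```python
import math

def l2_norm(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def normalize_direction(v):
    """Scale v to unit L2 norm, the condition placed on the vMF mean direction mu."""
    norm = l2_norm(v)
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return [x / norm for x in v]

mu = normalize_direction([3.0, -4.0])
print(mu)  # [0.6, -0.8]
```

Only the direction of the input survives the normalization; the magnitude is discarded, and the concentration parameter κ is modeled separately, as stated in the quoted passage.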


[3]-[5]: Please refer to response to arguments [1]-[2].

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Mingda, et al. ("A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations." arXiv e-prints (2019): arXiv-1904. https://arxiv.org/pdf/1904.01173.pdf; hereinafter referred to as Chen et al.) and further in view of DeFelice; Michael (US 20190236148 A1; hereinafter referred to as DeFelice).

As to independent claim 1, Chen et al. teaches a system for machine text generation (see Introduction, par. 2: “To this end, we propose a generative model of a sentence which makes use of both semantic and syntactic latent variables…”), 
process an original sentence structure through an encoder neural network to decompose the original sentence structure into an original semantics component and an original syntax component (see Figure 1, Section 3 Proposed Approach, and Subsection Inference and Generative Models: “Our goal is to extract the disentangled semantic and syntactic information from sentence representations. To achieve this, we introduce the vMF Gaussian Variational Autoencoder (VGVAE). As shown in Figure 1, VGVAE assumes a sentence is generated by conditioning on two independent variables: semantic variable y and syntactic variable z. […] The inference models qφ(y|x) and qφ(z|x) are two independent word averaging encoders with additional linear feedforward neural networks…” Here, qφ(y|x) and qφ(z|x) are interpreted as associated with the semantic variable y and the syntactic variable z, respectively, as shown in Eq. 1.);
process the original syntax component through a syntax variational autoencoder (VAE) to receive a syntax mean vector and a syntax covariance matrix (see Figure 1, Section 3 Proposed Approach, and Subsection RNN: “To achieve this, we introduce the vMF Gaussian Variational Autoencoder (VGVAE). As shown in Figure 1, VGVAE assumes a sentence is generated by conditioning on two independent variables: semantic variable y and syntactic variable z. […] The inference model qφ(y|x) is still a word averaging encoder, but qφ(z|x) is parameterized by a bidirectional LSTM, where we concatenate the forward and backward hidden states and then take the average. The output of the LSTM is then used as input to a feedforward network with one hidden layer for producing µ(x) and σ(x) (or κ(x)).” Here, the VGVAE, which assumes sentences are generated by conditioning on a syntactic variable, is interpreted as associated with a syntax VAE. Also, qφ(y|x) and qφ(z|x) are interpreted as associated with the semantic variable y and the syntactic variable z, respectively, as shown in Eq. 1.); 
obtain a sampled syntax value from a syntax Gaussian posterior parameterized by the syntax mean vector and the syntax covariance matrix (see Section 3 Proposed Approach and Subsections 3.1 Parameterizations and Inference and Generative Models: “To achieve this, we introduce the vMF Gaussian Variational Autoencoder (VGVAE). As shown in Figure 1, VGVAE assumes a sentence is generated by conditioning on two independent variables: semantic variable y and syntactic variable z. […] To perform inference, we assume a factored posterior qφ(y, z|x) = qφ(y|x)qφ(z|x) […] VGVAE uses two distribution families in defining the posterior over latent variables, namely, the von Mises-Fisher (vMF) distribution and the Gaussian distribution. […] The inference models qφ(y|x) and qφ(z|x) are two independent word averaging encoders with additional linear feedforward neural networks for producing µ(x) and σ(x) (or κ(x)).” Here, the syntax Gaussian posterior is interpreted as associated with the posterior defined by the VGVAE Gaussian distribution, while the syntax mean vector and syntax covariance matrix are interpreted as µ(x) and σ(x) (or κ(x)), respectively.); 
process the original semantics component through a semantics VAE to receive a semantics mean vector and a semantics covariance matrix (see Figure 1, Section 3 Proposed Approach, and Subsection RNN: “The inference model qφ(y|x) is still a word averaging encoder, but qφ(z|x) is parameterized by a bidirectional LSTM, where we concatenate the forward and backward hidden states and then take the average. The output of the LSTM is then used as input to a feedforward network with one hidden layer for producing µ(x) and σ(x) (or κ(x)).” Here, the VGVAE, which assumes sentences are generated by conditioning on a semantic variable, is interpreted as associated with a semantics VAE. Also, qφ(y|x) and qφ(z|x) are interpreted as associated with the semantic variable y and the syntactic variable z, respectively, as shown in Eq. 1.), the semantics mean vector constrained to a simplex associated with a semantics Gaussian posterior and learned during training of a sentence model (see Section 3.1 Parameterizations and Subsection vMF Distribution, and Section 4.1 Paraphrase Reconstruction Loss: “vMF can be regarded as a Gaussian distribution on a hypersphere with two parameters: µ and κ. µ ∈ ℝ^m is a normalized vector (i.e., ‖µ‖₂ = 1) defining the mean direction. κ ∈ ℝ≥0 is often referred to as a concentration parameter analogous to the variance in a Gaussian distribution. vMF has been used for modeling similarity between two sentences (Guu et al., 2018), which is particularly suited to our purpose here, since we will evaluate our semantic representations in the context of modeling paraphrases (See Sections 4.1 and 4.2 for more details). […] Our first loss is a paraphrase reconstruction loss (PRL). The key assumption underlying the PRL is that for a paraphrase pair x1, x2, the semantic information is equivalent between the two sentences and only the syntactic information varies. To impose such constraints, PRL is defined as [see Eq. 4]. That is, we swap the semantic variables, keep the syntactic variables, and attempt to reconstruct the sentences (shown in Figure 3). While instead of using a multi-task objective we could directly model paraphrases x1 and x2 as being generated by the same y (which naturally suggests a product-of-experts style posterior, as in Wu and Goodman (2018)), we found that for the purposes of our downstream tasks training with the multi-task loss gave superior results.” Here, the “probability simplex” is interpreted as the mean/variance data obtained from the Gaussian/VAE model and the sentence model is interpreted as associated with the paraphrasing model.); 
obtain a sampled semantics vector from the Gaussian semantics posterior parameterized by the semantics mean vector and the semantics covariance matrix (see Section 3 Proposed Approach and Subsections 3.1 Parameterizations and Inference and Generative Models: “To achieve this, we introduce the vMF Gaussian Variational Autoencoder (VGVAE). As shown in Figure 1, VGVAE assumes a sentence is generated by conditioning on two independent variables: semantic variable y and syntactic variable z. […] To perform inference, we assume a factored posterior qφ(y, z|x) = qφ(y|x)qφ(z|x) […] VGVAE uses two distribution families in defining the posterior over latent variables, namely, the von Mises-Fisher (vMF) distribution and the Gaussian distribution. […] The inference models qφ(y|x) and qφ(z|x) are two independent word averaging encoders with additional linear feedforward neural networks for producing µ(x) and σ(x) (or κ(x)).” Here, the semantics Gaussian posterior is interpreted as associated with the posterior defined by the VGVAE Gaussian distribution, while the semantics mean vector and semantics covariance matrix are interpreted as µ(x) and σ(x) (or κ(x)), respectively.); and 
process the sampled syntax vector and the sampled semantics vector through a decoder neural network to compose a new sentence (see Figure 3, Section 4 Multi-Task Training, Subsection 4.1 Paraphrase Reconstruction Loss, and Section 8 Discussion: “That is, we swap the semantic variables, keep the syntactic variables, and attempt to reconstruct the sentences (shown in Figure 3). […] We also conducted experiments using LSTM encoders and decoders as recurrent neural networks are a natural way to capture syntactic information in a sentence. We found this approach to give us additional benefits for both disentangling semantics and syntax and achieving better results overall.” Here, y are the semantic variables and z are the syntactic variables.).
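For reference, the encode, sample, and swap steps mapped above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of Chen et al.: the two-dimensional toy embeddings, the hand-set weights, and all function names are hypothetical, and predicting a log-variance (so that σ stays positive) is a common convention assumed here. It shows a word-averaging encoder producing a mean vector and a diagonal standard deviation, a reparameterized draw from the resulting Gaussian posterior, and the paraphrase reconstruction loss pairing, in which the semantic variables of a paraphrase pair are swapped while the syntactic variables are kept.

```python
import math
import random

# Hypothetical 2-dimensional word embeddings, for illustration only.
EMBED = {
    "the": [0.1, 0.3], "cat": [0.9, -0.2], "sat": [0.4, 0.8],
    "a": [0.2, 0.2], "feline": [0.8, -0.1], "rested": [0.5, 0.7],
}

def word_average(sentence):
    """Bag-of-words encoder: the mean of the word embeddings."""
    vecs = [EMBED[w] for w in sentence.split()]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def linear(x, weights, bias):
    """One affine layer; weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def encode(sentence, w_mu, b_mu, w_logvar, b_logvar):
    """Return (mu, sigma) of a diagonal Gaussian posterior q(.|x)."""
    h = word_average(sentence)
    mu = linear(h, w_mu, b_mu)
    logvar = linear(h, w_logvar, b_logvar)
    sigma = [math.exp(0.5 * lv) for lv in logvar]  # log-variance keeps sigma > 0
    return mu, sigma

def sample(mu, sigma, rng):
    """Reparameterized draw: mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def prl_pairs(y1, z1, y2, z2):
    """Decoder inputs for the PRL: swap semantics (y), keep syntax (z)."""
    return [((y2, z1), "x1"), ((y1, z2), "x2")]

rng = random.Random(0)
ident, swap = [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]
zeros, low = [0.0, 0.0], [-4.0, -4.0]
# Separate (hypothetical) parameters stand in for the semantics and syntax encoders.
y1 = sample(*encode("the cat sat", ident, zeros, ident, low), rng)
z1 = sample(*encode("the cat sat", swap, zeros, ident, low), rng)
y2 = sample(*encode("a feline rested", ident, zeros, ident, low), rng)
z2 = sample(*encode("a feline rested", swap, zeros, ident, low), rng)
for (y, z), target in prl_pairs(y1, z1, y2, z2):
    print(target, [round(v, 2) for v in y + z])
```

The reparameterized form keeps the sample a differentiable function of µ and σ, which is why VAE-style models usually draw latent variables this way; a trained decoder would then be asked to reconstruct each target sentence from its swapped input pair.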
However, Chen et al. does not explicitly teach the system comprising:
at least one processor; and 
a memory comprising instructions, which, when executed by the processor, configure the processor to [perform all the previous limitations].
DeFelice does teach the system comprising:
at least one processor (see ¶ [0061]: “In this embodiment, the named entity recognition component receives each sentence or group of sentences and uses a processor to tag the words according to the part of speech (4a) and identify particular noun phrases (4b) within the input.”); and 
a memory comprising instructions (see ¶ [0061]: “At this point, any new information not included in the storage 370 is identified, and flow returns to the correlation component 310 so that any information newly identifiable via a canonical representation can be associated with the prospect and stored…”) which, when executed by the processor, configure the processor to [perform all the previous limitations, taught by Chen et al.].
Chen et al. and DeFelice are both considered to be analogous to the claimed invention because they are in the same field of endeavor of text generation. 
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen et al. to incorporate the teachings of DeFelice. All of the elements of the claim are disclosed in Chen et al. and DeFelice. The only difference is the combination of the elements into a single device, i.e., a computing device comprising a processor and a memory. It would have been obvious to one of ordinary skill in the art to use such a computing device to perform all of the instructions disclosed in Chen et al.

Regarding claims 2 and 10, Chen et al. in combination with DeFelice teach all of the limitations as in claim 1, above.
Chen et al. further teaches: 
wherein the at least one processor is configured to: 
receive a semantics input value defining a variation parameter used to vary the original sentence (see Figure 3, Section 4 Multi-Task Training and Subsection 4.1 Paraphrase Reconstruction Loss: “That is, we swap the semantic variables, keep the syntactic variables, and attempt to reconstruct the sentences (shown in Figure 3).” Here, the “reconstruction” of the sentence is associated with “varying” the original sentence using semantic variables (i.e., semantic input value).).

Regarding claims 3 and 11, Chen et al. in combination with DeFelice teach all of the limitations as in claim 2, above.
DeFelice further teaches:
wherein the semantics input value comprises at least one of: 
a sentiment value; or a topic value (see ¶ [0066]: “Categorization component 360 labels data or groups of data according to different categories, usually relating to either topical or sentiment information.”).
Chen et al. and DeFelice are both considered to be analogous to the claimed invention because they are in the same field of endeavor of text generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen et al. to incorporate the teachings of DeFelice of the semantics input value comprising at least one of: a sentiment value; or a topic value, which provides the benefit of allowing a broader understanding of a prospect or detection of presumed emotional signals (DeFelice ¶ [0066]).

Regarding claims 5 and 13, Chen et al. in combination with DeFelice teach all of the limitations as in claim 1, above.
DeFelice further teaches:
wherein the at least one processor is configured to: 
output the new sentence (see ¶ [0045]: “…The generated text 203 is used in a campaign 230 where targeted readers interact with the text (shown as two-way arrow 232) and respond, either implicitly approving or disapproving the generated text 203.” Here, the generated text is interpreted as the new sentence.).
Chen et al. and DeFelice are both considered to be analogous to the claimed invention because they are in the same field of endeavor of text generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen et al. to incorporate the teachings of DeFelice of outputting the new sentence, which provides the benefit of a higher-quality future “translation” of the source text into an effective generated text (DeFelice ¶ [0045]).

As to independent claim 9, DeFelice further teaches:
 a computer-implemented method for machine text generation (see ¶ [0061]: “In this embodiment, the named entity recognition component receives each sentence or group of sentences and uses a processor to tag the words according to the part of speech (4a) and identify particular noun phrases (4b) within the input […] At this point, any new information not included in the storage 370 is identified, and flow returns to the correlation component 310 so that any information newly identifiable via a canonical representation can be associated with the prospect and stored…”), the method comprising the limitations disclosed in claim 1. 

Claims 4 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Mingda, et al. ("A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations." arXiv e-prints (2019): arXiv-1904. https://arxiv.org/pdf/1904.01173.pdf; hereinafter referred to as Chen et al.) in combination with DeFelice; Michael (US 20190236148 A1; hereinafter referred to as DeFelice) and further in view of Devarajan; Ravinder et al. (US 20160350664 A1; hereinafter referred to as Devarajan et al.).
 
Regarding claims 4 and 12, Chen et al. in combination with DeFelice teach all of the limitations as in claim 3, above.
However, Chen et al. in combination with DeFelice does not explicitly teach: wherein the at least one processor is configured to: 
display options for the sentiment value and the topic value.
Devarajan et al. does teach:
wherein the at least one processor is configured to: 
display options for the sentiment value and the topic value (see ¶ [0005]: “[…] The method can include transmitting graphical information configured to cause a display to output a graphical user interface visually indicating at least a portion of: the plurality of sentiments, the plurality of sentiment pattern groups, the plurality of semantic tags, or the plurality of topic sets.”).
Chen et al. in combination with DeFelice and Devarajan et al. are all considered to be analogous to the claimed invention because they are in the same field of endeavor of text/data processing/generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen et al. in combination with DeFelice to incorporate the teachings of Devarajan et al. of displaying options for the sentiment value and the topic value, which provides the benefit of improving the accuracy of the system (Devarajan et al. ¶ [0110]).

Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Mingda, et al. ("A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations." arXiv e-prints (2019): arXiv-1904. https://arxiv.org/pdf/1904.01173.pdf; hereinafter referred to as Chen et al.) in combination with DeFelice; Michael (US 20190236148 A1; hereinafter referred to as DeFelice) and further in view of Moore, Robert C. (US 20040044530 A1; hereinafter referred to as Moore).

Regarding claims 6 and 14, Chen et al. in combination with DeFelice teach all of the limitations as in claims 1-3, 5, and 13, above.
Chen et al. further teaches:
wherein the at least one processor is configured to: 
receive a plurality of training sentence structures (see Figure 3: “Figure 3: Diagram showing the training process when using the discriminative paraphrase loss (DPL; dotted lines) and paraphrase reconstruction loss (PRL; dashdotted lines). The pair (x1, x2) is a sentential paraphrase pair, the y’s are the semantic variables corresponding to each x, and the z’s are syntactic variables.”); and 
for each training sentence structure (see Figure 3: “Figure 3: Diagram showing the training process when using the discriminative paraphrase loss (DPL; dotted lines) and paraphrase reconstruction loss (PRL; dashdotted lines). The pair (x1, x2) is a sentential paraphrase pair, the y’s are the semantic variables corresponding to each x, and the z’s are syntactic variables.” Here, x1, x2 are interpreted as training sentence structures.): 
process that training sentence structure through an encoder neural network to decompose that training sentence structure into a training semantics component and a training syntax component (see Figure 1, Section 3 Proposed Approach, and Subsection Inference and Generative Models citations as in claim 1. Here, it is interpreted that the same process as in claim 1 and Figure 1 (element x = sentence) of Chen et al. is performed in training sentences (i.e., x1, x2).);
process the training syntax component through the syntax VAE to receive a training syntax mean vector and a training syntax covariance matrix (see Figure 1, Section: 3 Proposed approach, and Subsection RNN citations as in claim 1. Here, it is interpreted that the same process as in claim 1 and Figure 1 (element x = sentence) of Chen et al. is performed in training sentences (i.e., x1, x2).); 
obtain a training sampled syntax vector from a syntax Gaussian posterior parameterized by the training syntax mean vector and the training syntax covariance matrix (see Section 3 Proposed Approach and Subsections: 3.1 Parametrizations and Inference and Generative Models citations as in claim 1. Here, it is interpreted that the same process as in claim 1 and Figure 1 (element x = sentence) of Chen et al. is performed in training sentences (i.e., x1, x2).); 
process the training semantics component through the semantics VAE to receive a training semantics mean vector and a training semantics covariance matrix (see Figure 1, Section: 3 Proposed approach, and Subsection RNN citations as in claim 1. Here, it is interpreted that the same process as in claim 1 and Figure 1 (element x = sentence) of Chen et al. is performed in training sentences (i.e., x1, x2).); 
obtain a training sampled semantics vector from a semantics Gaussian posterior parameterized by the training semantics mean vector and the training semantics covariance matrix (see Section 3 Proposed Approach and Subsections: 3.1 Parametrizations and Inference and Generative Models citations as in claim 1. Here, it is interpreted that the same process as in claim 1 and Figure 1 (element x = sentence) of Chen et al. is performed in training sentences (i.e., x1, x2).); 
determine a reconstruction loss value for the training sentence using the sampled syntax vector and the sampled semantics vector (see Section 4.1 Paraphrase Reconstruction Loss: “Our first loss is a paraphrase reconstruction loss (PRL). The key assumption underlying the PRL is that for a paraphrase pair x1, x2, the semantic information is equivalent between the two sentences and only the syntactic information varies. To impose such constraints, PRL is defined as [see Eq. 4].” Here, it is interpreted that the same process as in claim 1 and Figure 1 (element x = sentence) of Chen et al. is performed in training sentences (i.e., x1, x2).); and 
determine a reconstruction loss value, a KL divergence value, a regularization loss value and a structured reconstruction loss value (see Section 4.1 Paraphrase Reconstruction Loss citation as in the previous limitation, and Section 3.1 Parameterizations / Subsection Gaussian distribution, 4.2 Discriminative Paraphrase Loss and 4.3 Word Position Loss: “3.1 Parameterizations / Subsection Gaussian distribution: Since we only consider a diagonal covariance matrix, the KL divergence term KL(qφ(z|x) ‖ pθ(z)) can also be computed efficiently […] 4.2 Discriminative Paraphrase Loss: Our second loss is a discriminative paraphrase loss (DPL). The DPL explicitly encourages the similarity of paraphrases x1, x2 to be scored higher than the dissimilar sentences n1, n2 (i.e., negative samples; see Sec. 5 for more details) by a given margin δ. As shown in Figure 3, the similarity function in this loss only uses the semantic variables in the sentences. The loss is defined as [see Eq. 5] […] 4.3 Word Position Loss: …To guide the syntactic variable to represent word order, we introduce a word position loss (WPL). Although our word averaging encoders only have access to the bag of words of the input, using this loss can be viewed as a denoising autoencoder where we have maximal input noise (i.e., an orderless representation of the input) and the encoders need to learn to reconstruct the ordering. For both word averaging encoders and LSTM encoders, WPL is parameterized by a three-layer feedforward neural network f(·) with input from the concatenation of the samples of the syntactic variable z and the embedding vector ei at input position i; we then attempt to predict a one-hot vector representing the position i.” Here, as mentioned in the previous limitation, PRL is interpreted as associated with the reconstruction loss value. Also, DPL is interpreted as associated with the regularization loss value (similarity), while WPL is interpreted as associated with the structured reconstruction loss value.)
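The KL divergence value and the discriminative paraphrase loss mapped above can be sketched in Python as follows. The sketch assumes a standard normal prior pθ(z) = N(0, I), for which the diagonal-covariance KL term has the closed form ½ Σ (σ² + µ² − 1 − log σ²), and it assumes a hinge form for the DPL in which the cosine similarity of the paraphrase pair must exceed that of each negative pair by the margin δ. The margin value of 0.4 and the function names are hypothetical; Eq. 5 of Chen et al. governs the exact definition.

```python
import math

def kl_diag_gaussian_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def discriminative_paraphrase_loss(y1, y2, n1, n2, delta=0.4):
    """Assumed hinge form: paraphrase similarity should beat each negative by delta."""
    pos = cosine(y1, y2)
    return max(0.0, delta - pos + cosine(y1, n1)) + max(0.0, delta - pos + cosine(y2, n2))

print(kl_diag_gaussian_to_standard_normal([0.0, 0.0], [1.0, 1.0]))  # 0.0
print(discriminative_paraphrase_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]))  # 0.0
```

Both usage lines return zero: the posterior equals the prior in the first case, and in the second the paraphrase pair already out-scores both negatives by more than the margin.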

However, Chen et al. in combination with DeFelice does not explicitly teach:
wherein the at least one processor is configured to: 
apply optimization methods over iterations of each training sentence to determine the simplex.
Moore does teach:
wherein the at least one processor is configured to: 
apply optimization methods over iterations of each training sentence to determine the simplex (see ¶ [0047]: “In previous length-based alignment systems, p(l_f | l_e) has been modeled as a Gaussian distribution based on the log of the ratio of length l_e to length l_f. A Gaussian distribution includes two hidden parameters, a mean and a variance, for each length l_e. Although the mean of the Gaussian distribution could be estimated based on the average lengths of sentence found in each corpus, the variances could not be estimated without having aligned corpora. As a result, prior art systems had to use an Expectation-Maximization (EM) algorithm to identify those parameters by iteratively estimating the parameters and then using the estimates to form estimated alignments. In particular, the prior art first estimated the means and variances of the Gaussian model, then used the estimated model to identify a likely alignment. Based on this alignment, the means and variances of the Gaussian model would be updated. The updated model would then be used to identify a new alignment. This would continue until the model became stable at which point the final means and variances would be selected for the model.” Here, the “simplex” is interpreted as the mean/variance data obtained from the Gaussian model, “each training sentence” is interpreted as associated with the sentences of each corpus, and the optimization methods are interpreted as associated with the update/iterative estimation of the mean/variance parameters.).
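Moore describes estimating Gaussian means and variances, using the estimates to form a new alignment, updating the parameters, and repeating until the model is stable. As a generic illustration of that estimate-then-update loop (not Moore's sentence-alignment model), the Python sketch below fits a one-dimensional, two-component Gaussian mixture by expectation-maximization. The mixture setup, the initial values, the variance floor, and the function names are assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, means, variances, weights, iters=100, tol=1e-8):
    """Fit a 1-D, two-component Gaussian mixture by expectation-maximization."""
    means, variances, weights = list(means), list(variances), list(weights)
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step: responsibility of each component for each point, under current estimates.
        resp, ll = [], 0.0
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor keeps the variance positive
        # Stop once the log-likelihood stops improving (the model is "stable").
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return means, variances, weights

data = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]
means, variances, weights = em_two_gaussians(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print([round(m, 2) for m in means], [round(v, 3) for v in variances])
```

Each pass alternates a soft assignment of points to components with a re-estimation of the means and variances, mirroring the iterate-until-stable procedure described in the quoted paragraph.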

Chen et al. in combination with DeFelice and Moore are all considered to be analogous to the claimed invention because they are in the same field of endeavor in text/data processing/generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen et al. in combination with DeFelice to incorporate the teachings of Moore of applying optimization methods over iterations of each training sentence to determine the simplex, which provides the benefit of producing scores reflecting actual probability estimates for the states in the initial search space, which allows more pruning of the search space without sacrificing accuracy (¶ [0105] Moore).


Claims 7-8 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Mingda, et al. ("A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations." arXiv e-prints (2019): arXiv-1904. https://arxiv.org/pdf/1904.01173.pdf; hereinafter referred to as Chen et al.) in combination with DeFelice; Michael (US 20190236148 A1; hereinafter referred to as DeFelice) and Moore, Robert C. (US 20040044530 A1; hereinafter referred to as Moore); and further in view of Norton; R. David et al. (US 20200279017 A1; hereinafter referred to as Norton et al.)

Regarding claims 7 and 15, Chen et al. in combination with DeFelice and Moore teach all of the limitations as in claims 6 and 14, above.
Chen et al. further teaches:
the aggregate loss values of each reconstructed training sentence (see Table 1 and Section 7. Results and Analysis and Subsection 7.1 Semantic Similarity: “Finally, we see that when the bag-of-words VGVAE model is used with all of the multi-task losses (“ALL”), we observe a large gap between the performance of the semantic and syntactic latent variables, as well as strong performance on the STS tasks that outperforms all baselines.” Here, the combination of all the multi-task losses is interpreted as the aggregate of loss values.);
However, Chen et al. in combination with DeFelice and Moore do not explicitly teach: wherein the at least one processor is configured to: 
display classification groupings associated with [aggregate] loss values of each reconstructed training sentence; and
assign a label for each classification grouping.
Norton et al. does teach:
wherein the at least one processor is configured to: 
display classification groupings associated with [aggregate (disclosed by Chen et al. as discussed above)] loss values of each reconstructed training sentence (see ¶ [0133-0136, 0159]: “[0133]: To implement the loss function 508, in some embodiments, the intelligent-text-insight system 106 compares the textual-quality-training score 506 and the ground-truth-quality score 510 in a mean-absolute-error function. In the alternative to a mean-absolute-error function, the intelligent-text-insight system 106 uses an L2-loss function, cross-entropy-loss function, a mean-squared-error-loss function, a root-mean-squared-error function, or other suitable loss function as the loss function 508. Upon determining a loss from the loss function 508, the intelligent-text-insight system 106 adjusts network parameters (e.g., weights or values) of the text-quality classifier 208 to decrease a loss for the loss function 508 in a subsequent training iteration. For example, the intelligent-text-insight system 106 may increase or decrease weights or values of the text-quality classifier 208 to minimize the loss in a subsequent training iteration. [0135]: In addition (or in the alternative) to training the text-quality classifier 208, the intelligent-text-insight system 106 applies the text-quality classifier 208, computational sentiment analysis, and other text-based determinations to generate representative-textual responses for a set of textual responses. […] [0136]: […] For the textual responses 512n-512n, the intelligent-text-insight system 106 further (and respectively) determines relevancy parameters 520a-520n, topics 522a-522n, and sentiment indicators 524a-524n. […]. [0159]: As further shown in FIG. 6A, the intelligent-text-insight system 106 provides representative-response indicators that correspond to (and describe) the representative-textual responses 620a-620c within the graphical user interface 604a. 
For instance, the representative-textual responses 620a-620c correspond to a summary caption 616a indicating the selected topic and time period. The representative-textual responses 620a-620c also respectively correspond to sentiment scores 618a-618c shown in the graphical user interface 604a. Each of the sentiment scores 618a-618c in turn correspond to a sentiment label indicated on a sentiment-label key 608. As the sentiment scores 618a-618c and the sentiment-label key 608 indicate, in some embodiments, the intelligent-text-insight system 106 selects each of the representative-textual responses 620a-620c from a corresponding response group of textual responses. Such response groups may include textual responses corresponding to a range of sentiment scores or to a particular sentiment label (e.g., positive sentiment, neutral sentiment, negative sentiment).” Here, the classification groupings are interpreted as associated with the response groups (i.e., sentiment scores or labels) displayed in the GUI. These response groups (i.e., classification groupings) are also associated with the loss function implemented by the intelligent-text-insight system, which is interpreted as associated with the claimed loss values. Accordingly, the “aggregate loss values” of Chen et al. can be combined with Norton et al.’s display of groupings associated with loss values.); and 
assign a label for each classification grouping (see ¶ [0159]: “Such response groups may include textual responses corresponding to a range of sentiment scores or to a particular sentiment label (e.g., positive sentiment, neutral sentiment, negative sentiment).” Here, the labels assigned to the classification groupings (i.e., response groups) are the positive, neutral, or negative sentiments.).
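For illustration only, the grouping of responses by sentiment-score range and the assignment of a sentiment label to each group, as described in the quoted Norton et al. ¶ [0159], can be sketched as follows. The score thresholds, the function names, and the input format are hypothetical assumptions and are not taken from Norton et al.

```python
from collections import defaultdict


def sentiment_label(score: float, neg_max: float, pos_min: float) -> str:
    """Map a sentiment score to a label using hypothetical range thresholds."""
    if score <= neg_max:
        return "negative"
    if score >= pos_min:
        return "positive"
    return "neutral"


def group_by_sentiment(responses, neg_max: float = -0.25, pos_min: float = 0.25):
    """Group (text, score) pairs into labeled response groups.

    The default thresholds are illustrative assumptions only.
    """
    groups = defaultdict(list)
    for text, score in responses:
        groups[sentiment_label(score, neg_max, pos_min)].append(text)
    return dict(groups)
```

Each key of the returned dictionary serves as the label assigned to its classification grouping; a display layer could then render each group under its label.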

Chen et al. in combination with DeFelice and Moore, and Norton et al., are all considered to be analogous to the claimed invention because they are in the same field of endeavor in text/data processing/generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen et al. in combination with DeFelice and Moore to incorporate the teachings of Norton et al. of displaying classification groupings and labeling them, which provides the benefit of improving the efficiency and flexibility of the system (¶ [0040] Norton et al.).

Regarding claims 8 and 16, Chen et al. in combination with DeFelice and Moore teach all of the limitations as in claims 6 and 15, above.
However, Chen et al. in combination with DeFelice and Moore do not explicitly teach: wherein the classification groupings comprise at least one of: 
sentence sentiment or sentence topic.
Norton et al. does teach:
wherein the classification groupings comprise at least one of: 
sentence sentiment or sentence topic (see the ¶ [0159] citation in the last limitation of claims 7 and 15: “Such response groups may include textual responses corresponding to a range of sentiment scores or to a particular sentiment label (e.g., positive sentiment, neutral sentiment, negative sentiment).” Here, the classification groupings are interpreted as associated with the response groups (i.e., sentiment scores or labels).).
Chen et al. in combination with DeFelice and Moore, and Norton et al., are all considered to be analogous to the claimed invention because they are in the same field of endeavor in text/data processing/generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen et al. in combination with DeFelice and Moore to incorporate the teachings of Norton et al. of displaying classification groupings and labeling them, which provides the benefit of improving the efficiency and flexibility of the system (¶ [0040] Norton et al.).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Keisha Y. Castillo-Torres
Examiner
Art Unit 2659


/Keisha Y. Castillo-Torres/Examiner, Art Unit 2659
/Paras D Shah/Primary Examiner, Art Unit 2659
10/06/2022