Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-7, 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Devlin et al. (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”) in view of Haoming et al. (“SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization”).

Regarding claim 1, Devlin discloses a method performed on a computing device, the method comprising:
		providing a machine learning model having one or more mapping layers, including at least a first mapping layer configured to map components of pretraining examples into first representations in a space (Devlin discusses a multi-layer machine learning model whose layers are mapping layers, as discussed on page 3, section “Model Architecture”, and in Figures 1 and 2; the Vaswani reference gives further details on the architecture of the Devlin system. As discussed in sections 3.2 and 3.3 of Devlin, “Input Representation” and “Pre-training Tasks”, the text sentences that form the pretraining examples can be represented in a space as vectors or embeddings);
		performing a pretraining stage on the one or more mapping layers using the pretraining examples (Devlin, section 3.1 and section 3.3, “Pre-training Tasks”, discuss pre-training the multi-layer machine learning model BERT).
		Devlin does not disclose wherein the pretraining stage comprises: adding noise to the first representations of the components of the pretraining examples to obtain noise-adjusted first representations; and performing a self-supervised learning process to pretrain the one or more mapping layers using at least the first representations and the noise-adjusted first representations of the components of the pretraining examples.

		Haoming discloses wherein the pretraining stage comprises:
		adding noise (Haoming, page 2179, right-hand column, third paragraph: “inject a small perturbation..to xi”) to the first representations of the components of the pretraining examples to obtain noise-adjusted first representations (Haoming, page 2179, left-hand column, third paragraph: “xi’s denote the embedding of the input sentence”); and
		performing a self-supervised learning process to pretrain the one or more mapping layers using at least the first representations and the noise-adjusted first representations of the components of the pretraining examples (Haoming discusses using the first representation, or embedding, and a noise-adjusted representation of the pretraining example on page 2179, from the last paragraph of the left-hand column into the right-hand column: f(x̃_i; θ) and f(x_i; θ), which are applied within the context of the BERT model. Also see the third paragraph of the left-hand column of page 2179. The framework utilized by Haoming is at least partially self-supervised, as discussed in the second paragraph of the right-hand column of page 2179).

It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have modified the invention of Devlin to include wherein the pretraining stage comprises: adding noise to the first representations of the components of the pretraining examples to obtain noise-adjusted first representations; and performing a self-supervised learning process to pretrain the one or more mapping layers using at least the first representations and the noise-adjusted first representations of the components of the pretraining examples as taught by Haoming. The suggestion/motivation for doing so would have been that training using noisy samples along with non-noisy samples results in a learning model which is more effective under various conditions. Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have combined Haoming with Devlin.
 




Regarding claim 5, the combination of Devlin and Haoming discloses the method of claim 1, further comprising: after the pretraining stage, performing a supervised learning process on a classification layer and the one or more mapping layers (Devlin discusses an “output layer for classification” in section 3.2; see Fig. 1 for the mapping layers).

Regarding claim 6, the combination of Devlin and Haoming discloses the method of claim 5, wherein the supervised learning process is performed using adversarial training or virtual adversarial training (Haoming discusses using adversarial training on page 2178, left hand column, item “(I)” and also in section 3.1).

Regarding claim 7, the combination of Devlin and Haoming discloses the method of claim 5, wherein the classification layer is selected from a group comprising a single-sentence classification layer, a pairwise text similarity layer, and a pairwise text classification layer (As discussed in the second paragraph of section 3.2 and in section 4 of Devlin, there are several task-specific classification layers, such as those for sentence pair classification, text pair classification, hypothesis-premise pairs, and question-passage pairs).

Regarding claim 11, the combination of Devlin and Haoming discloses the method of claim 1, wherein the adding noise comprises regularizing a training objective using virtual adversarial training (As discussed in section 3.1 of the Haoming reference, adversarial regularization is the goal of injecting noise, or a perturbation, into the system’s training).

Regarding claim 12, the combination of Devlin and Haoming discloses the method of claim 11, wherein the training objective encourages a smooth output distribution of the machine learning model for pairs of first representations and corresponding noise-adjusted first representations of the components of the pretraining examples (As discussed in section 3.1 of the Haoming reference, the system of Haoming performs Smoothness-Inducing Adversarial Regularization, and the objective is a smooth output distribution for the pair consisting of a first representation and its associated noise-adjusted representation).
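
For illustration only, the kind of smoothness objective recited in claim 12 can be sketched as follows. This is a minimal sketch and is not code taken from Haoming or Devlin: the toy linear `model`, the noise scale `epsilon`, and the use of random rather than adversarially chosen noise are all assumptions made here for brevity. The sketch computes a symmetric KL divergence between the output distribution for a first representation and the output distribution for its noise-adjusted counterpart; a small value corresponds to a smooth output distribution.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def model(embedding, weights):
    """Hypothetical toy classifier: one linear layer followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    return softmax(logits)


def add_noise(embedding, epsilon, rng):
    """Return a noise-adjusted copy of the embedding (uniform noise in [-epsilon, epsilon])."""
    return [x + rng.uniform(-epsilon, epsilon) for x in embedding]


def symmetric_kl(p, q):
    """Symmetric KL divergence, KL(p||q) + KL(q||p), written so each term is non-negative."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def smoothness_penalty(embedding, weights, epsilon, rng):
    """Compare model outputs on the clean and the noise-adjusted representation."""
    noisy = add_noise(embedding, epsilon, rng)
    return symmetric_kl(model(embedding, weights), model(noisy, weights))


rng = random.Random(0)
weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]  # assumed toy parameters
embedding = [1.0, 2.0, -1.0]                    # assumed toy first representation
penalty = smoothness_penalty(embedding, weights, 0.01, rng)
```

In a training loop, a penalty of this form would typically be added to the task loss with a weighting coefficient, so that parameters producing similar outputs for the clean and noise-adjusted inputs are favored.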

Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Devlin et al. (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”) in view of Haoming et al. (“SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization”), further in view of Pathak et al. (“Context Encoders: Feature Learning by Inpainting”).

Regarding claim 8, the combination of Devlin and Haoming discloses the method of claim 5, but does not disclose wherein the pretraining examples comprise images or video, and the one or more mapping layers include a convolutional layer.

Pathak discloses wherein the pretraining examples comprise images or video (Pathak, “5. Evaluation” discusses using images for a pretraining step for image classification), and the one or more mapping layers include a convolutional layer (In the Pathak reference, section “3.1 Encoder-decoder pipeline, Encoder” discusses using multiple convolutional layers).
It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have modified the invention of Devlin and Haoming to include wherein the pretraining examples comprise images or video, and the one or more mapping layers include a convolutional layer as taught by Pathak. The suggestion/motivation for doing so would have been that using images or videos as pretraining examples and using a convolutional layer for the mapping layer is routinely done and would have been well-known to one of ordinary skill in the art. Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have combined Pathak with Haoming and Devlin.


Regarding claim 9, the combination of Devlin, Haoming and Pathak discloses the method of claim 8, wherein the supervised learning process trains the classification layer to predict classifications of objects in the images or video (Pathak, “5.2.1 Classification pre-training” discusses using supervised learning to train the AlexNet classifier with a classification layer for image classification).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Devlin et al. (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”) in view of Haoming et al. (“SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization”), further in view of Erhan et al. (“The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training”).

Regarding claim 10, the combination of Devlin and Haoming discloses the method of claim 1, but does not disclose performing one or more initial training iterations of the self-supervised learning process without the noise-adjusted first representations; and performing one or more subsequent training iterations of the self-supervised learning process with the noise-adjusted first representations.

Erhan discloses performing one or more initial training iterations of a learning process to train the machine learning model with the first representations and performing one or more subsequent training iterations of the learning process to train the machine learning model with the noise-adjusted first representations (Erhan, last paragraph of section “3. Experimental Methodology” on page 155. The system of Erhan uses 50 initial pretraining iterations of a learning process and 50 subsequent fine-tuning iterations of the learning process to train the learning model).

It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have modified the invention of Devlin and Haoming to include performing one or more initial training iterations of a learning process to train the machine learning model with the first representations and performing one or more subsequent training iterations of the learning process to train the machine learning model with the noise-adjusted first representations as taught by Erhan. The suggestion/motivation for doing so would have been that using multiple initial pretraining iterations and multiple subsequent pretraining iterations improves accuracy of the learning model. Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have combined Erhan with Haoming and Devlin.
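
For illustration only, the two-phase schedule recited in claim 10 can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Erhan, Haoming, or Devlin: the function name `pretraining_schedule` and the parameter `warmup_iterations` are hypothetical. The sketch marks which training iterations would use the noise-adjusted first representations, with the initial iterations running without them.

```python
def pretraining_schedule(total_iterations, warmup_iterations):
    """Return one flag per iteration: True when noise-adjusted representations are used.

    The first `warmup_iterations` iterations run without noise; every later
    iteration runs with noise. Both names are hypothetical.
    """
    if not 0 <= warmup_iterations <= total_iterations:
        raise ValueError("warmup_iterations must lie between 0 and total_iterations")
    return [step >= warmup_iterations for step in range(total_iterations)]


# Assumed toy values: 6 iterations in total, the first 2 without noise.
schedule = pretraining_schedule(total_iterations=6, warmup_iterations=2)
```

A training loop would consult the flag for the current iteration and include the noise-based term of the objective only when the flag is set.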


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 13 and 14 are rejected under 35 U.S.C. 102(a)(1) and/or 102(a)(2) as being anticipated by Haoming et al. (“SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization”).

Regarding claim 13, Haoming discloses a system comprising:
		a hardware processing unit; and
		a storage resource storing computer-readable instructions which, when executed by the hardware processing unit, cause the hardware processing unit to:
		receive input data (The system of Haoming receives input data as discussed in part (I) of the left-hand column of page 2178. This is also discussed in section 3.1);
		process the input data using a machine learning model having a first layer and a second layer to obtain a result (The system of Haoming has a top layer, which is the first layer, and a task-specific layer, which is the second layer. This is discussed in the last paragraph of the right-hand column of page 2177, the last paragraph of the right-hand column of page 2178, and also in section 3.1), the first layer having been pretrained in a pretraining stage using virtual adversarial training for a self-supervised learning task (As discussed in section 3.1 of the Haoming reference, the first layer uses adversarial training in the form of smoothness-inducing adversarial regularization, which is at least partially self-supervised to reduce costs); and
		output the result (As discussed in part (I) in the second paragraph of the left-hand column of page 2178 and in section 3.1, an output of the model is not changed substantially as a result of the SMART framework proposed by Haoming).

Regarding claim 14, Haoming discloses the system of claim 13, wherein the virtual adversarial training used in the pretraining stage involves adding noise to representations of components of pretraining examples that are used to adjust parameters of the first layer (A perturbation (noise) is injected into the adversarial training system of Haoming in order to fine-tune the parameters, as discussed in section 3.1 and the first paragraph of page 2178).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Haoming et al. (“SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization”) in view of Rehling et al. (US 8,463,595 B1).

Regarding claim 15, Haoming discloses the system of claim 14, but does not disclose wherein the input data comprises reviews, the result characterizes sentiments associated with the reviews as predicted by the machine learning model, and the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: determine whether to output individual reviews in response to a request for negative reviews based at least on the sentiments predicted by the machine learning model.

Rehling discloses wherein the input data comprises reviews, the result characterizes sentiments associated with the reviews as predicted by the machine learning model (As discussed in col. 10, lines 10-15 of the Rehling reference, the inputted reviews are characterized based on the sentiment they contain with the cracked cylinder review considered a negative sentiment in regards to the automobile dealer. However, the sentiment is rated neutral in regards to an automobile repair shop) and the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: determine whether to output individual reviews in response to a request for negative reviews based at least on the sentiments predicted by the machine learning model (Based on the negative sentiment that has been associated with the cracked cylinder review in regards to the automobile dealer and the neutral sentiment that has been associated with the cracked cylinder review in regards to the automobile repair shop, if both the automobile dealer and the automobile repair shop have requested negative reviews, only the automobile dealer will receive the review since the review is only considered a negative review in regards to the automobile dealer and not in regards to the automobile repair shop).

It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have modified the invention of Haoming to include wherein the input data comprises reviews, the result characterizes sentiments associated with the reviews as predicted by the machine learning model, and the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: determine whether to output individual reviews in response to a request for negative reviews based at least on the sentiments predicted by the machine learning model as taught by Rehling. The suggestion/motivation for doing so would have been that utilizing learning models to determine the party to which the sentiment in a review is directed, and then using that information to identify who needs to see the review based on their user preferences/demands, results in a more efficient system that does not waste a user’s time by sending irrelevant/unrequested reviews. Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have combined Rehling with Haoming.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Haoming et al. (“SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization”) in view of Srivastava (US 2018/0053233 A1).

Regarding claim 16, Haoming discloses the system of claim 14, but does not disclose wherein the input data comprises a query, the result reflects similarities of the query to a plurality of documents as output by the machine learning model, and the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: rank the plurality of documents relative to the query based at least on the similarities output by the machine learning model.

Srivastava discloses ranking the plurality of documents relative to the query based at least on the similarities output by the machine learning model (Srivastava discusses reranking by the machine learning model based on similarity to the query in paragraph 77).

It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have modified the invention of Haoming to include ranking the plurality of documents relative to the query based at least on the similarities output by the machine learning model as taught by Srivastava. The suggestion/motivation for doing so would have been that ranking based on similarity to a query using a machine learning model leads to more useful results for a user. Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have combined Srivastava with Haoming.
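
For illustration only, ranking documents relative to a query based on similarities output by a model, as recited in claim 16, can be sketched as follows. This is a minimal sketch and is not code from Srivastava or Haoming: the function name `rank_documents`, the document identifiers, and the similarity scores are hypothetical stand-ins for the output of a machine learning model.

```python
def rank_documents(similarities):
    """Return document identifiers ordered from most to least similar to the query.

    `similarities` maps a document identifier to a similarity score that a
    model is assumed to have produced for the query.
    """
    ordered = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _score in ordered]


# Assumed toy scores standing in for model output.
scores = {"doc_a": 0.42, "doc_b": 0.91, "doc_c": 0.17}
ranking = rank_documents(scores)
```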





Claim Rejections - 35 USC § 103

Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Devlin et al. (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”) in view of Haoming et al. (“SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization”), further in view of Erhan et al. (“The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training”).


Regarding claim 17, Devlin discloses a computer-readable storage medium storing instructions which, when executed by one or more processing devices, cause the one or more processing devices to perform acts comprising: 

		providing a machine learning model having one or more mapping layers, including at least a first mapping layer configured to map components of pretraining examples into first representations in a space (Devlin discusses a multi-layer machine learning model whose layers are mapping layers, as discussed on page 3, section “Model Architecture”, and in Figures 1 and 2; the Vaswani reference gives further details on the architecture of the Devlin system. As discussed in sections 3.2 and 3.3 of Devlin, “Input Representation” and “Pre-training Tasks”, the text sentences that form the pretraining examples can be represented in a space as vectors or embeddings);
		performing a pretraining stage on the one or more mapping layers using the pretraining examples (Devlin, section 3.1 and section 3.3, “Pre-training Tasks”, discuss pre-training the multi-layer machine learning model BERT).
		Devlin does not disclose wherein the pretraining stage comprises: adding noise to the first representations of the components of the pretraining examples to obtain noise-adjusted first representations; and performing a self-supervised learning process to pretrain the one or more mapping layers using at least the first representations and the noise-adjusted first representations of the components of the pretraining examples.

		Haoming discloses wherein the pretraining stage comprises:
		adding noise (Haoming, page 2179, right-hand column, third paragraph: “inject a small perturbation..to xi”) to the first representations of the components of the pretraining examples (Haoming, page 2179, left-hand column, third paragraph: “xi’s denote the embedding of the input sentence”) to obtain noise-adjusted first representations; and
		performing a self-supervised learning process to pretrain the one or more mapping layers using at least the first representations and the noise-adjusted first representations of the components of the pretraining examples (Haoming, page 2179, left-hand column, last paragraph, and section 3.1 discuss using the first representation, or embedding, and a noise-adjusted representation of the pretraining example).

It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have modified the invention of Devlin to include wherein the pretraining stage comprises: adding noise to the first representations of the components of the pretraining examples to obtain noise-adjusted first representations; and performing a self-supervised learning process to pretrain the one or more mapping layers using at least the first representations and the noise-adjusted first representations of the components of the pretraining examples as taught by Haoming. The suggestion/motivation for doing so would have been that training using noisy samples along with non-noisy samples results in a learning model which is more effective under various conditions. Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have combined Haoming with Devlin.
 
The combination of Haoming and Devlin does not disclose performing one or more initial pretraining iterations of a learning process to train the machine learning model with the first representations and performing one or more subsequent pretraining iterations of the learning process to train the machine learning model with the noise-adjusted first representations.

Erhan discloses performing one or more initial pretraining iterations of a learning process to train the machine learning model with the first representations and performing one or more subsequent pretraining iterations of the learning process to train the machine learning model with the noise-adjusted first representations (Erhan, last paragraph of section “3. Experimental Methodology” on page 155. The system of Erhan uses 50 initial pretraining iterations of a learning process and 50 subsequent fine-tuning iterations of the learning process to train the learning model).

It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have modified the invention of Devlin and Haoming to include performing one or more initial pretraining iterations of a learning process to train the machine learning model with the first representations and performing one or more subsequent pretraining iterations of the learning process to train the machine learning model with the noise-adjusted first representations as taught by Erhan. The suggestion/motivation for doing so would have been that using multiple initial pretraining iterations and multiple subsequent pretraining iterations improves accuracy of the learning model. Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have combined Erhan with Haoming and Devlin.

Regarding claim 18, the combination of Devlin, Haoming and Erhan discloses the computer-readable storage medium of claim 17, wherein the first representations comprise embedding vectors (As discussed in sections 3.2 and 3.3 of Devlin, “Input Representation” and “Pre-training Tasks”, the text sentences that form the pretraining examples can be represented in a space as vectors or embeddings. See also page 3, section “Model Architecture”, and Figures 1 and 2 of Devlin, which discuss the multi-layer machine learning model whose layers are mapping layers; the Vaswani reference gives further details on the architecture of the Devlin system), and the noise-adjusted first representations comprise noise-adjusted embedding vectors (Haoming, page 2179, right-hand column, third paragraph: “inject a small perturbation..to xi”, where, per the third paragraph of the left-hand column of page 2179, “xi’s denote the embedding of the input sentence”).

Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Devlin et al. (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”) in view of Haoming et al. (“SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization”), in view of Erhan et al. (“The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training”), further in view of Shi et al. (“Adaptive iterative attack towards explainable adversarial robustness”).

Regarding claim 19, the combination of Devlin, Haoming and Erhan discloses the computer-readable storage medium of claim 18, but does not disclose wherein the performing noise adjustment comprises determining an adversarial direction in which to perform the noise adjustment.

Shi discloses wherein the performing noise adjustment comprises determining an adversarial direction in which to perform the noise adjustment (Shi is an adversarial system and based on the noise-adding it determines what the goal of attack is and an adversarial direction in which to perform the adjustment as discussed in section 3.2 of the Shi reference).

It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have modified the invention of Devlin, Haoming and Erhan to include wherein the performing noise adjustment comprises determining an adversarial direction in which to perform the noise adjustment as taught by Shi. The suggestion/motivation for doing so would have been that determining the adversarial direction in which to perform the adjustment improves robustness of the learning model. Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to have combined Shi with Devlin, Haoming and Erhan.

Regarding claim 20, the combination of Devlin, Haoming, Erhan and Shi discloses the computer-readable storage medium of claim 19, wherein the one or more subsequent pretraining iterations encourage the machine learning model to produce a smooth output distribution for predictions made using the embedding vectors and the noise-adjusted embedding vectors (As discussed in section 3.1 of the Haoming reference, the system of Haoming performs Smoothness-Inducing Adversarial Regularization, and the objective is a smooth output distribution for the pair consisting of a first representation and its associated noise-adjusted representation).


Allowable Subject Matter
Claim 2 and its dependent claims 3 and 4 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.




Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ELISA M RICE whose telephone number is (571)270-1582. The examiner can normally be reached M-F 11am to 7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ELISA M RICE/Examiner, Art Unit 2669
/JOHN B STREGE/Primary Examiner, Art Unit 2669