DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendments and Arguments
Regarding the rejection of claims 1-20 under 35 U.S.C. §102(a)(1), applicant argued (Remarks, page 6) that the Tripathi reference was not prior art because its publication date is within one year of the provisional filing date. Applicant further argued that the prior art exceptions under 35 U.S.C. §102(b)(1) apply here (disclosures made by an inventor, joint inventor, or another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor, 1 year or less before the effective filing date of the claimed invention).

In response, the examiner notes that the cited Tripathi reference was co-authored by Aanchan Mohan, who is NOT an inventor of the instant application. The cited reference and the instant application have different inventive entities. Tripathi qualifies as prior art under 35 U.S.C. §102(a)(1), and the exception under §102(b)(1) does not apply to the cited reference. If applicant believes that Aanchan Mohan obtained the subject matter from the instant inventors, applicant must provide evidence or a formal statement to that effect.

	Applicant further argued that April 15, 2018 indicates when the conference proceeding started, but does not indicate that Tripathi et al. was actually published or publicly disclosed on that date.

	In response, the examiner provided prima facie evidence that Tripathi was publicly available on April 15, 2018 based on the ICASSP conference starting date. The burden now shifts to the applicant to provide factual evidence showing that the Tripathi paper was not publicly available on April 15, 2018.

As a comment, the examiner participated in the annual ICASSP conference several times while doing research at a university. The ICASSP conference proceedings are distributed to all participants upon registration for the conference. Therefore, the conference proceedings are publicly available on or before the conference starting date. Accordingly, the Tripathi paper was publicly available on or before April 15, 2018 and qualifies as a prior art reference under 35 U.S.C. §102(a)(1).

	The arguments are not persuasive. The rejection under §102(a)(1) is maintained.
	 
	Regarding the rejection under 35 U.S.C. §103, applicant argued that Ganin et al. discloses using a domain adversarial neural network (DANN) in connection with images, not speech, and that a person of ordinary skill in the art would not look to adapt the DANN of Ganin et al. to function with speech information (emphasis in the remarks).

	In response, the examiner notes that artificial neural network models are a type of machine learning model and can be applied in different technical areas. Ganin discloses a learning approach for domain adaptation that can be applied to document sentiment analysis and image classification. These are just a few application examples. Domain-adversarial neural networks (DANN) are mathematical models / algorithms that could be used for any type of input / output data (Section 4). Palaz is directed to convolutional neural networks (CNN) applied to a speech recognition application. Both Ganin and Palaz use neural network models to model input / output relationships. Speech data are just numbers derived from speech signals. Image data are just numbers derived from images. The Ganin reference is reasonably pertinent art. It is not required that the reference be from the same field of endeavor as the claimed invention, in light of the Supreme Court's instruction that "[w]hen a work is available in one field of endeavor, design incentives and other market forces can prompt variations of it, either in the same field or a different one." Id. at 417, 82 USPQ2d 1396. Rather, a reference is analogous art to the claimed invention if: (1) the reference is from the same field of endeavor as the claimed invention (even if it addresses a different problem); or (2) the reference is reasonably pertinent to the problem faced by the inventor (even if it is not in the same field of endeavor as the claimed invention).

	Applicant further argued (Remarks, page 7) that “Second, there is no teaching, suggestion, or motivation stated anywhere in Ganin et al. regarding extending or adapting the DANN of Ganin et al. to process (much less handle) speech information. Similarly, there is no teaching, suggestion, or motivation stated anywhere in Palaz et al. regarding extending or adapting the system of Palaz et al. to process or handle image information. Because of this, a person of ordinary skill in the art, given the teachings of the two references, would not look to combine them.”

	In response, the examiner notes that known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art (MPEP 2143). Ganin discloses that its training algorithms were evaluated using different data, including the sentiment analysis problem in natural language processing (page 3), and demonstrated the success of domain-adversarial learning (Abstract). A person of ordinary skill would have been motivated to use the domain adaptation methods described in Ganin because of their successful application in different areas.

	Applicant further argued that Ganin handles a particular type of input information (images), while Palaz handles speech information. Applicant argued “it is unclear whether the system of Palaz et al. could be re-trained or reconfigured to handle the image data of Ganin et al. Because of these apparent technical hurdles, it is unclear whether a person of ordinary skill in the art could undertake such a modification in the first instance.”
	
	The examiner notes that both image data and speech data are digitized data, i.e., numbers. Neural network models simply take input data (numbers), regardless of whether the original data were digitized from speech or from images. "The test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference.... Rather, the test is what the combined teachings of those references would have suggested to those of ordinary skill in the art." In re Keller, 642 F.2d 413, 425, 208 USPQ 871, 881 (CCPA 1981). See also In re Sneed, 710 F.2d 1544, 1550, 218 USPQ 385, 389 (Fed. Cir. 1983) ("[I]t is not necessary that the inventions of the references be physically combinable to render obvious the invention under review."); and In re Nievelt, 482 F.2d 965, 179 USPQ 224, 226 (CCPA 1973) ("Combining the teachings of references does not involve an ability to combine their specific structures.").
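	For illustration only, the point that a network layer operates on numbers irrespective of their origin can be sketched as follows. This sketch is not taken from Ganin, Palaz, or the instant application; the function names, the kernel, and the sample values are hypothetical and chosen purely for the example.

```python
def conv1d(signal, kernel):
    """Valid-mode sliding dot product, the core operation of a convolutional layer."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(n - k + 1)]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


# Hypothetical inputs: both are plain lists of numbers.
speech_samples = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]  # e.g. digitized waveform amplitudes
image_row = [0.1, 0.1, 0.9, 0.9, 0.1, 0.1]        # e.g. one row of pixel intensities

kernel = [-1.0, 0.0, 1.0]  # hypothetical filter weights

# The same layer code runs unchanged on either input.
speech_features = relu(conv1d(speech_samples, kernel))
image_features = relu(conv1d(image_row, kernel))
```

	The same two functions process both inputs without modification; only the numeric values differ.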

	For the reasons explained above, the arguments regarding the rejection under 35 U.S.C. §103 are not persuasive. The rejections are maintained.

	


Claim Rejections - 35 USC § 102
Claims 1-20 are rejected under 35 U.S.C. §102(a)(1) as being anticipated by Tripathi et al. (“Adversarial learning of raw speech features for domain invariant speech recognition”, IEEE ICASSP, published on April 15, 2018, hereinafter referred to as Tripathi).

Examiner notes: Tripathi was published on 04/15/2018, which is earlier than the filing date of the provisional application (62/659,584, filed on 04/18/2018). Although Tripathi was co-authored by the instant inventors, the Tripathi reference has a different inventive entity and qualifies as a prior art reference under 35 U.S.C. §102(a)(1). All claimed features defined by the instant claims 1-20 are based on the disclosure (Fig. 1 and Spec., pages 7-11) of the instant application. Fig. 1 of the instant application is exactly the same as Fig. 1 of the cited Tripathi reference. The equations in the Spec. (pages 7-11) are the same as those in the cited Tripathi reference (Section 3).

Regarding claims 1 and 11, Tripathi discloses a system (and a method) for automatic speech recognition by training a neural network to learn features from raw speech (Fig. 1, Section 4.2, speech recognition experiments using neural network trained using raw speech), comprising: 
a neural network executing on a computer system and comprising a feature extractor, a label classifier, and a domain classifier (Fig. 1, Section 3.2, pages 5960-5961), wherein: 
the feature extractor processes raw speech data and generates a first output data (Fig. 1, section 3.1, pages 5960-5961); 
the label classifier processes the first output data and generates a second output data (Fig. 1, section 3.2, pages 5960-5961); 
the domain classifier processes the first output data and generates a third output data (Fig. 1, section 3.2, pages 5960-5961); 
the neural network calculates first loss data based on the second output, and second loss data based on the third output (Fig. 1, section 3.2, pages 5960-5961); and 
the neural network is trained to minimize a cross-entropy cost of the label classifier and to maximize a cross-entropy cost of the domain classifier using the first loss data and the second loss data (Fig. 1, section 3.2, pages 5960-5961).
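
For illustration only, the relationship between the first loss data and the second loss data recited above can be sketched as below. This is a minimal sketch and is not code from Tripathi or the instant application; the function names and the trade-off weight `lam` are assumptions introduced for the example. The sign convention follows the limitation that the label classifier cost is minimized while the domain classifier cost is maximized.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy cost of one sample against its true class index."""
    return -math.log(softmax(logits)[target_index])


def adversarial_objective(label_logits, label_target,
                          domain_logits, domain_target, lam=1.0):
    """Return (objective, first_loss, second_loss).

    first_loss  : cross-entropy of the label classifier output (second output data).
    second_loss : cross-entropy of the domain classifier output (third output data).
    objective   : first_loss - lam * second_loss; lowering it lowers the label
                  cost and raises the domain cost. `lam` is a hypothetical weight.
    """
    first_loss = cross_entropy(label_logits, label_target)
    second_loss = cross_entropy(domain_logits, domain_target)
    return first_loss - lam * second_loss, first_loss, second_loss


# Hypothetical scores for one sample: 3 label classes, 2 domains.
objective, first_loss, second_loss = adversarial_objective(
    [2.0, 0.5, -1.0], 0, [0.0, 0.0], 1, lam=0.5)
```

In this sketch the domain scores are uninformative (equal), so the second loss equals log 2, the cost of a domain classifier that cannot tell the two domains apart.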

Regarding claims 2 and 12, Tripathi discloses a gradient reversal layer, wherein, prior to the domain classifier processing the first output data, the gradient reversal layer processes the first output data and feeds the processed first output data into the domain classifier (Fig. 1, section 3.2, pages 5960-5961).
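
For illustration only, a gradient reversal layer of the kind recited can be sketched as a layer that passes the first output data through unchanged in the forward direction and flips the sign of the gradient in the backward direction. The class name, method names, and the scaling factor `lam` are hypothetical and are not taken from Tripathi or the instant application.

```python
class GradientReversal:
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical scaling factor for the reversed gradient

    def forward(self, features):
        # The feature extractor output reaches the domain classifier unchanged.
        return list(features)

    def backward(self, upstream_grad):
        # The gradient flowing back to the feature extractor is sign-flipped,
        # so a descent step on the extractor increases the domain loss.
        return [-self.lam * g for g in upstream_grad]


grl = GradientReversal(lam=2.0)
passed_through = grl.forward([1.0, -2.0])
reversed_grad = grl.backward([0.5, -1.0])
```

Because the forward pass is the identity, the domain classifier receives the same values as the label classifier; only the direction of the parameter update for the feature extractor changes.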

Regarding claims 3 and 13, Tripathi discloses the gradient reversal layer uses a standard stochastic gradient descent based approach to process the first output data (Fig. 1, section 3.2, pages 5960-5961).

Regarding claims 4 and 14, Tripathi discloses the feature extractor is a multi-layer convolutional neural network ("CNN") comprising a convolutional layer, an average pooling step, and a rectified linear unit ("ReLU") (Fig. 1, section 3.1 and 3.2, pages 5960-5961).

Regarding claims 5 and 15, Tripathi discloses the label classifier comprises a linear step, a ReLU, and a softmax function (Fig. 1, section 3.1 and 3.2, pages 5960-5961).

Regarding claims 6 and 16, Tripathi discloses the domain classifier comprises a linear step, a ReLU, and a softmax function (Fig. 1, section 3.1 and 3.2, pages 5960-5961).

Regarding claims 7 and 17, Tripathi discloses computing the first loss over labeled samples (Fig. 1, section 3.1 and 3.2, pages 5960-5961).

Regarding claims 8 and 18, Tripathi discloses computing the second loss over labeled samples and unlabeled samples (Fig. 1, section 3.1 and 3.2, pages 5960-5961).

Regarding claims 9 and 19, Tripathi discloses the label classifier optimizes one or more parameters of the feature extractor and the label predictor using the first loss data (Fig. 1, section 3.1 and 3.2, pages 5960-5961).

Regarding claims 10 and 20, Tripathi discloses the one or more parameters are used as a saddle point during training of the neural network (Fig. 1, section 3.1 and 3.2, pages 5960-5961).

Claim Rejections - 35 USC § 103
Examiner Notes: the instant application uses a domain adversarial neural network (DANN) described by Ganin for speech recognition (see Ganin, Fig. 1 on page 12, which is similar to the neural network illustrated in Fig. 1 of the instant application). Ganin illustrates DANN in different applications, such as image processing, sentiment analysis in natural language processing, etc. The feature of using raw speech as input is disclosed by Palaz.

Regarding claims 1 and 11, Ganin discloses a system (and a method) (Section 4, fig. 1), comprising: 
a neural network executing on a computer system and comprising a feature extractor, a label classifier, and a domain classifier (page 12, Fig. 1), wherein: 
the feature extractor processes data and generates a first output data (Fig. 1, pages 12, 20, a feature extractor has two or three convolution layers, CNN network); 
the label classifier processes the first output data and generates a second output data (fig. 1, page 3, “We provide an experimental evaluation of the proposed domain-adversarial learning idea over a range of deep architectures and applications. We first consider the simplest DANN architecture where the three parts (label predictor, domain classifier and feature extractor) are linear, and demonstrate the success of domain-adversarial learning for such architecture. The evaluation is performed for synthetic data as well as for the sentiment analysis problem in natural language processing”, page 13, a label predictor generating a second output); 
the domain classifier processes the first output data and generates a third output data (fig. 1, page 12); 
the neural network calculates first loss data based on the second output, and second loss data based on the third output (fig. 1, page 8,  page 12); and 
the neural network is trained to minimize a cross-entropy cost of the label classifier and to maximize a cross-entropy cost of the domain classifier using the first loss data and the second loss data (page 2, “While the parameters of the classifiers are optimized in order to minimize their error on the training set, the parameters of the underlying deep feature mapping are optimized in order to minimize the loss of the label classifier and to maximize the loss of the domain classifier”; page 21, cross-entropy).

Ganin discloses domain adaptation using a domain-adversarial neural network (DANN) for applications such as sentiment analysis or image classification (Abstract, section 4). Ganin does not disclose using DANN in a speech recognition application and fails to disclose using raw speech data. 

Palaz discloses using convolutional neural network (CNN) with raw input speech data (Palaz, Abstract, section 2.1 and 2.2). 

It would have been obvious to a person having ordinary skill in the art at the time the invention was made to combine Ganin’s teaching with Palaz’s teaching to apply a DANN in a speech application using raw speech input. One having ordinary skill in the art would have been motivated to make such a modification to obtain a better 

Regarding claims 2 and 12, Ganin in view of Palaz further discloses a gradient reversal layer, wherein, prior to the domain classifier processing the first output data, the gradient reversal layer processes the first output data and feeds the processed first output data into the domain classifier (Ganin, page 12, Fig. 1).

Regarding claims 3 and 13, Ganin in view of Palaz further discloses the gradient reversal layer uses a standard stochastic gradient descent based approach to process the first output data (Ganin, page 3, page 10).

Regarding claims 4 and 14, Ganin in view of Palaz further discloses the feature extractor is a multi-layer convolutional neural network ("CNN") comprising a convolutional layer, an average pooling step, and a rectified linear unit ("ReLU") (Ganin, Fig. 4, page 21, algorithm 1).

Regarding claims 5 and 15, Ganin in view of Palaz further discloses the label classifier comprises a linear step, a ReLU, and a softmax function (Ganin, Fig. 4, page 8 and page 21, algorithm 1).

Regarding claims 6 and 16, Ganin in view of Palaz further discloses the domain classifier comprises a linear step, a ReLU, and a softmax function (Ganin, Fig. 4, page 8 and page 21, algorithm 1).

Regarding claims 7 and 17, Ganin in view of Palaz further discloses computing the first loss over labeled samples (Ganin, Fig. 1, pages 11-12).

Regarding claims 8 and 18, Ganin in view of Palaz further discloses computing the second loss over labeled samples and unlabeled samples (Ganin, page 5 and page 8; fig. 2).

Regarding claims 9 and 19, Ganin in view of Palaz further discloses the label classifier optimizes one or more parameters of the feature extractor and the label predictor using the first loss data (Ganin, page 2, page 8, page 11).

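Regarding claims 10 and 20, Ganin in view of Palaz further discloses the one or more parameters are used as a saddle point during training of the neural network (Ganin, page 11, page 13).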

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359.  The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached on (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/JIALONG HE/Primary Examiner, Art Unit 2659