United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 15/633,931
Filing Date: 27 Jun 2017
Appellant(s): Daniel, Adrien



__________________
Mark A. Wilson, Reg. No. 43,994
For Appellant


EXAMINER’S ANSWER





This is in response to the appeal brief filed November 30, 2021.

(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated June 30, 2021 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”
The following ground(s) of rejection are applicable to the appealed claims.

Claim Interpretation
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “storage unit” in claim 15.
The “storage unit” of claim 15 is being construed to be embodied on a computer comprising a processor and a memory.  See specification pp. 9 (“The systems and methods described herein may … be embodied by a computer program … which may exist … in a single computer system….  [T]he term ‘computer’ refers to any electronic device comprising a processor….”), 10 (“The term ‘processor’ … refers to a data processing unit … that manipulates signals … based on operational instructions that are stored in a memory.”).
See specification p. 5 (“The storage unit … may be any memory [that] is suitable for integration into the system….”).
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
Claims 1, 3, 5-9, and 12-13 are rejected under 35 U.S.C. 103 as being obvious over Fakotakis et al., “High Performance Text-Independent Speaker Recognition System Based on Voiced/Unvoiced Segmentation and Multiple Neural Nets” (“Fakotakis”) in view of Launay et al., “Towards Knowledge-Based Features for HMM Based Large Vocabulary Automatic Speech Recognition,” in 1 IEEE Int’l Conf. Acoustics, Speech, and Signal Proc. I-817-20 (2002) (“Launay”) and further in view of Donahue et al., “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition,” in Int’l Conf. Machine Learning 647-55 (2014) (“Donahue”).
Regarding claim 1, Fakotakis teaches “[a] method for facilitating the detection of one or more time series patterns, comprising building one or more artificial neural networks, wherein, for at least one time series pattern to be detected, a specific one of said one or more artificial neural networks is built (text-independent speaker recognition system based on the voiced segments of the speech signal [time-series pattern] is presented; a large number of MLPs [artificial neural networks] are used for classification – Fakotakis, abstract; speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network – id. at sec. 2.3, first paragraph; see also Fig. 1 [each speaker is modeled with a separate ANN]), wherein each time series pattern to be detected represents a class of a detection task (speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network – Fakotakis, sec. 2.3, first paragraph [classes = speaker 1, speaker 2, etc.]), wherein a separate artificial neural network is evolved for each class of the detection task (speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network – Fakotakis, sec. 2.3, first paragraph [classes = speaker 1, speaker 2, etc.]) …, and wherein, for each speaker to be authenticated, a [separate] artificial neural network is built for detecting … speech segments of said speaker (see Fakotakis Fig. 1 and note that each of speakers 1, …, N has his own ANN)….”
Fakotakis appears not to disclose explicitly the further limitations of the claim.  However, Launay discloses that “a separate artificial neural network is evolved for each class of the detection task for distinction between vowels and consonants in a speech signal (multi-layer perceptrons are used to learn the mapping between the acoustic space and the distinctive feature space; the distinctive feature space consists of commonly used articulatory features, plus some broad phonetic classes; rather than training a single neural network to learn the whole mapping, an MLP is trained individually on each feature, leading to a set of 60 distinct NNs – Launay, sec. 2.1, first paragraph; see also Table 1 (showing that the phonetic features each of which corresponds to a separate MLP correspond, inter alia, to vowels and consonants)), [and] … a first artificial neural network is built for detecting voiced speech segments …, and a second artificial neural network is built for detecting unvoiced speech segments (multi-layer perceptrons are used to learn the mapping between the acoustic space and the distinctive feature space; the distinctive feature space consists of commonly used articulatory features, plus some broad phonetic classes; rather than training a single neural network to learn the whole mapping, an MLP is trained individually on each feature, leading to a set of 60 distinct NNs – Launay, sec. 2.1, first paragraph; see also Table 1 (showing that the phonetic features each of which corresponds to a separate MLP include, inter alia, voiced sounds and unvoiced sounds))….”
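Fakotakis and Launay both relate to automatic speech recognition and are analogous.  Fakotakis discloses a system in which a separate neural network is developed for each speaker, and Launay discloses a system in which separate neural networks recognize voiced and unvoiced parts of speech and are trained to distinguish between consonants and vowels.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fakotakis such that each speaker’s ANN disclosed therein is further split up such that, for each speaker, separate ANNs are developed for detecting voiced and unvoiced speech segments of a voice signal and for distinguishing between consonants and vowels, as disclosed by Launay.  An ordinary artisan could reasonably expect to have made such a combination successfully and would have been motivated to do so because doing so would allow the system to detect features commonly encountered in speech directly without the added complexity involved with training a single neural network on every possible feature.  See Launay, sec. 2.1, first paragraph.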
Neither Fakotakis nor Launay appears to disclose explicitly the further limitations of the claim.  However, Donahue discloses that “one of said one or more artificial neural networks has multiple inputs and multiple outputs to output a feature vector to be fed into a subsequent support vector machine (SVM) (features extracted from the activation of a deep convolutional neural network trained in a fully supervised fashion can be repurposed to novel generic tasks – Donahue, abstract; inputs for CNN are mean-centered raw RGB pixel intensity values of a 224 x 224 image [i.e., the network has multiple inputs] – id. at penultimate paragraph before sec. 3.2; activations of the nth hidden layer of the network are called feature DeCAFn, where DeCAF7 denotes features taken from the final hidden layer [features taken from the nth hidden layer = feature vector comprising multiple outputs] – id. at sec. 4, first paragraph; top performing method trains a linear SVM on DeCAF6 with dropout – id. at sec. 4.1, third paragraph; see also Fig. 4 (showing that a standard SVM can be trained on DeCAF5-7 and an SVM with dropout can also be trained on DeCAF7)).”
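Donahue and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Fakotakis and Launay to feed the output of one of the neural networks into an SVM, as disclosed by Donahue, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow a user to adapt a new architecture to new tasks for which there may not be sufficient labeled or unlabeled data to train the new architecture conventionally.  See Donahue, abstract.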

Regarding claim 3, Fakotakis, as modified by Launay and Donahue, discloses that “the one or more artificial neural networks are stored for subsequent use in the detection task (speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network; for speaker verification the selected feature vectors are applied to the speaker model of the speaker to be verified, and a measure is calculated that this speaker generated the feature vectors; if this measure exceeds a given decision threshold, the speaker is verified [detected] – Fakotakis, sec. 2.3 [note that the networks would have to be stored before being used for detection]).”
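For illustration only, the verification step summarized above (one stored model per speaker, a measure computed from the selected feature vectors, and a decision threshold) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Fakotakis: the single-layer scoring function, the parameter values, the averaging of scores, and the names `speaker_models`, `mlp_score`, and `verify` are all hypothetical.

```python
import math

def mlp_score(weights, bias, features):
    """Hypothetical single-layer stand-in for a speaker's MLP:
    sigmoid of a weighted sum of the feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# One stored model per enrolled speaker (illustrative parameters only).
speaker_models = {
    "speaker_1": {"weights": [2.0, -1.0], "bias": 0.0},
    "speaker_2": {"weights": [-2.0, 1.0], "bias": 0.0},
}

def verify(claimed_speaker, feature_vectors, threshold=0.5):
    """Apply the feature vectors to the claimed speaker's stored model and
    accept the claim if the averaged measure exceeds the threshold."""
    model = speaker_models[claimed_speaker]
    scores = [mlp_score(model["weights"], model["bias"], fv)
              for fv in feature_vectors]
    measure = sum(scores) / len(scores)
    return measure > threshold, measure

# Feature vectors that resemble speaker_1 under the toy parameters above.
accepted, measure = verify("speaker_1", [[1.0, 0.0], [1.0, 0.5]])
```

The sketch reads the models from a stored collection before scoring, which mirrors the note that the networks would have to be stored before being used for detection.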

Regarding claim 5, Fakotakis, as modified by Launay and Donahue, teaches that “said time series patterns are audio patterns (text-independent ASR system that is suitable for identification and verification purposes separates voiced part of speech [audio] signal to form feature vectors – Fakotakis, sec. 1, penultimate paragraph).”

Regarding claim 6, Fakotakis, as modified by Launay and Donahue, discloses that “a raw time series signal is provided as an input to each artificial neural network that is built (feature vectors are extracted from the input signal [raw time series signal] through signal processing, and the system separates the voiced part of the speech signal using a voiced/unvoiced decision module; the last module performs speaker classification – Fakotakis, sec. 2, first paragraph; see also Figure 1; note that Examiner construes the claim as covering the situation in which the raw time series signal is input to a signal processor, whose output is the direct input to the networks).”

Regarding claim 7, Fakotakis, as modified by Launay and Donahue, teaches that “the audio patterns include at least one audio pattern selected from the group consisting of: voiced speech, unvoiced speech, user-specific speech, contextual sound, and a sound event (text-independent ASR system that is suitable for identification and verification purposes separates voiced part of speech [audio] signal to form feature vectors – Fakotakis, sec. 1, penultimate paragraph).”

Regarding claim 8, Fakotakis, as modified by Launay and Donahue, discloses that “the detection of the time series patterns forms part of a speaker authentication function (task of a speaker recognition system is to verify whether a speaker is the person he claims to be – Fakotakis, sec. 1, first paragraph; text-independent automatic speech recognition system suitable both for identification and verification [authentication] purposes is presented – id. at sec. 1, penultimate paragraph).”

Regarding claim 9, Fakotakis, as modified by Launay and Donahue, discloses that “for each speaker to be authenticated, at least one artificial neural network is built for detecting speech segments of said speaker (speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network – Fakotakis, sec. 2.3, first paragraph).”

Regarding claim 12, Fakotakis, as modified by Launay and Donahue, discloses “[a] non-transitory computer-readable medium containing a computer program comprising instructions which, when executed, carry out or control a method as claimed in claim 1 (MLP classifiers for individual speaker models are trained using as a supervised training procedure [computer program] a fast version of the back propagation algorithm – Fakotakis, sec. 2.3, first full paragraph after Eq. 5; system exhibited low memory requirements – id. at sec. 4 [implying the existence of memory/a non-transitory computer-readable medium on which to store the program]).”

Regarding claim 13, Fakotakis teaches “[a] system for facilitating the detection of one or more time series patterns, comprising a processor configured to build one or more artificial neural networks, wherein, for at least one time series pattern to be detected, the processor is configured to build a specific one of said one or more artificial neural networks (text-independent speaker recognition system based on the voiced segments of the speech signal [time-series pattern] is presented; a large number of MLPs [artificial neural networks] are used for classification – Fakotakis, abstract; speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network – id. at sec. 2.3, first paragraph; see also Fig. 1 [each speaker is modeled with a separate ANN]; note that the circuitry that trains the individual networks is deemed to be the processor), wherein each time series pattern to be detected represents a class of a detection task (speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network – Fakotakis, sec. 2.3, first paragraph [classes = speaker 1, speaker 2, etc.]), wherein a separate artificial neural network is evolved for each class of the detection task (speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network – Fakotakis, sec. 2.3, first paragraph [classes = speaker 1, speaker 2, etc.]) …, [and] wherein, for each speaker to be authenticated, a [separate] artificial neural network is built for detecting … speech segments of said speaker (see Fakotakis Fig. 1 and note that each of speakers 1, …, N has his own ANN)….”
Fakotakis appears not to disclose explicitly the further limitations of the claim.  However, Launay discloses that “a separate artificial neural network is evolved for each class of the detection task for distinction between vowels and consonants in a speech signal (multi-layer perceptrons are used to learn the mapping between the acoustic space and the distinctive feature space; the distinctive feature space consists of commonly used articulatory features, plus some broad phonetic classes; rather than training a single neural network to learn the whole mapping, an MLP is trained individually on each feature, leading to a set of 60 distinct NNs – Launay, sec. 2.1, first paragraph; see also Table 1 (showing that the phonetic features each of which corresponds to a separate MLP correspond, inter alia, to vowels and consonants)), [and] … a first artificial neural network is built for detecting voiced speech segments …, and a second artificial neural network is built for detecting unvoiced speech segments (multi-layer perceptrons are used to learn the mapping between the acoustic space and the distinctive feature space; the distinctive feature space consists of commonly used articulatory features, plus some broad phonetic classes; rather than training a single neural network to learn the whole mapping, an MLP is trained individually on each feature, leading to a set of 60 distinct NNs – Launay, sec. 2.1, first paragraph; see also Table 1 (showing that the phonetic features each of which corresponds to a separate MLP include, inter alia, voiced sounds and unvoiced sounds))….”
Fakotakis and Launay both relate to automatic speech recognition and are analogous.  Fakotakis discloses a system in which a separate neural network is developed for each speaker, and Launay discloses a system in which separate neural networks recognize voiced and unvoiced parts of speech and are trained to distinguish between consonants and vowels.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fakotakis such that each speaker’s ANN disclosed therein is further split up such that, for each speaker, separate ANNs are developed for detecting voiced and unvoiced speech segments of a voice signal and for distinguishing between consonants and vowels, as disclosed by Launay.  An ordinary artisan could reasonably expect to have made such a combination successfully and would have been motivated to do so because doing so would allow the system to detect features commonly encountered in speech directly without the added complexity involved with training a single neural network on every possible feature.  See Launay, sec. 2.1, first paragraph.
Neither Fakotakis nor Launay appears to disclose explicitly the further limitations of the claim.  However, Donahue discloses that “one of said one or more artificial neural networks has multiple inputs and multiple outputs to output a feature vector to be fed into a subsequent support vector machine (SVM) (features extracted from the activation of a deep convolutional neural network trained in a fully supervised fashion can be repurposed to novel generic tasks – Donahue, abstract; inputs for CNN are mean-centered raw RGB pixel intensity values of a 224 x 224 image [i.e., the network has multiple inputs] – id. at penultimate paragraph before sec. 3.2; activations of the nth hidden layer of the network are called feature DeCAFn, where DeCAF7 denotes features taken from the final hidden layer [features taken from the nth hidden layer = feature vector comprising multiple outputs] – id. at sec. 4, first paragraph; top performing method trains a linear SVM on DeCAF6 with dropout – id. at sec. 4.1, third paragraph; see also Fig. 4 (showing that a standard SVM can be trained on DeCAF5-7 and an SVM with dropout can also be trained on DeCAF7)).”
Donahue and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Fakotakis and Launay to feed the output of one of the neural networks into an SVM, as disclosed by Donahue, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow a user to adapt a new architecture to new tasks for which there may not be sufficient labeled or unlabeled data to train the new architecture conventionally.  See Donahue, abstract.
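For illustration only, the arrangement relied upon above (a network with multiple inputs whose hidden-layer activations form a multi-element feature vector that is then passed to a linear SVM) can be sketched as follows. This is a minimal sketch under stated assumptions, not the DeCAF network: the layer sizes, the weights `HIDDEN_W` and `SVM_W`, and the function names are hypothetical, and the SVM weights are set by hand here rather than trained.

```python
def dense(weights, biases, inputs):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

# Hypothetical network with three inputs and two hidden units.
HIDDEN_W = [[1.0, 0.0, -1.0],
            [0.0, 1.0, 1.0]]
HIDDEN_B = [0.0, 0.0]

def feature_vector(inputs):
    """Hidden-layer activations, used as the feature vector (multiple outputs)."""
    return relu(dense(HIDDEN_W, HIDDEN_B, inputs))

# Hypothetical linear SVM parameters (hand-set for illustration, not trained).
SVM_W = [1.0, -1.0]
SVM_B = 0.0

def svm_decide(features):
    """Linear SVM decision rule: sign of the margin w . f + b."""
    margin = sum(w * f for w, f in zip(SVM_W, features)) + SVM_B
    return 1 if margin >= 0 else -1

label = svm_decide(feature_vector([3.0, 0.0, 1.0]))
```

In this sketch the network is used only as a feature extractor, and the classification decision is made by the subsequent SVM.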

Claims 2 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Fakotakis in view of Launay and Donahue and further in view of Stanley et al., “Evolving Neural Networks through Augmenting Topologies” (“Stanley”).
Regarding claim 2, neither Fakotakis, Launay, nor Donahue appears to disclose explicitly the further limitations of the claim.  However, Stanley discloses that “building said artificial neural networks comprises employing neuroevolution of augmenting topologies (novel method of neuroevolution of neural networks called NeuroEvolution of Augmenting Topologies is designed to take advantage of structure as a way of minimizing the dimensionality of the search space of the connection weights – Stanley, p. 100, penultimate paragraph).”
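Fakotakis, Launay, Donahue, and Stanley all relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Fakotakis, Launay, and Donahue to build the neural networks with NEAT, as disclosed by Stanley, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would eliminate the need to decide by trial-and-error how many hidden nodes are needed, thereby increasing efficiency.  See Stanley, p. 100, fourth paragraph.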

Regarding claim 14, Fakotakis, as modified by Launay, Donahue, and Stanley, discloses that “the processor is configured to employ neuroevolution of augmenting topologies for building said artificial neural networks (novel method of neuroevolution of neural networks called NeuroEvolution of Augmenting Topologies is designed to take advantage of structure as a way of minimizing the dimensionality of the search space of the connection weights – Stanley, p. 100, penultimate paragraph).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Fakotakis/Launay/Donahue to build the neural networks with NEAT, as disclosed by Stanley, and an ordinary artisan could reasonably expect to do so successfully.  Doing so would eliminate the need to decide by trial-and-error how many hidden nodes are needed, thereby increasing efficiency.  See Stanley, p. 100, fourth paragraph.
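For illustration only, the way NEAT grows hidden nodes during evolution, rather than requiring their number to be fixed in advance, can be sketched with a single structural mutation. This is a minimal sketch assuming the add-node mutation as it is generally described for NEAT (an existing connection is disabled and replaced by two new connections through a new node); the `ConnectionGene` layout and the `add_node` helper are hypothetical names and are not taken from Stanley or from the application.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ConnectionGene:
    src: int          # source node id
    dst: int          # destination node id
    weight: float
    enabled: bool
    innovation: int   # historical marking of the gene

def add_node(genome, index, new_node_id, next_innovation):
    """Split the connection at `index` by inserting a new hidden node.

    The old connection is disabled; the connection into the new node gets
    weight 1.0 and the connection out of it inherits the old weight, so the
    mutated network initially behaves much like the original one.
    """
    old = genome[index]
    disabled = replace(old, enabled=False)
    into_new = ConnectionGene(old.src, new_node_id, 1.0, True, next_innovation)
    out_of_new = ConnectionGene(new_node_id, old.dst, old.weight, True,
                                next_innovation + 1)
    return genome[:index] + [disabled] + genome[index + 1:] + [into_new, out_of_new]

# A one-connection genome (input node 0 -> output node 1) gains hidden node 2.
genome = [ConnectionGene(0, 1, 0.7, True, 1)]
mutated = add_node(genome, 0, new_node_id=2, next_innovation=2)
```

Because hidden nodes are added by mutation and kept only when selection favors them, the number of hidden nodes need not be chosen by trial and error beforehand.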

Regarding claim 15, Fakotakis, as modified by Launay, Donahue, and Stanley, discloses “a storage unit, wherein the processor is further configured to store the artificial neural networks in said storage unit for subsequent use in a detection task (speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network; for speaker verification the selected feature vectors are applied to the speaker model of the speaker to be verified, and a measure is calculated that this speaker generated the feature vectors; if this measure exceeds a given decision threshold, the speaker is verified [detected] – Fakotakis, sec. 2.3; system exhibited low memory requirements – id. at sec. 4 [implying that the networks are stored in a memory/storage unit]).”

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159.  See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1 and 8; 2-3; 5-6; 7; 9; and 13 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1; 5-6; 8-9; 10; and 12 (respectively) of U.S. Patent No. 10,529,339 (“reference patent”) in view of Fakotakis and Launay and further in view of Donahue.1  A comparison chart of the claims follows, followed by an analysis.
Instant Application: 1. A method for facilitating the detection of one or more time series patterns, comprising building one or more artificial neural networks, wherein, for at least one time series pattern to be detected, a specific one of said one or more artificial neural networks is built, wherein each time series pattern to be detected represents a class of said detection task, wherein a separate neural network is evolved for each class of the detection task for distinction between vowels and consonants in a speech signal, wherein, for each speaker to be authenticated, a first artificial neural network is built for detecting voiced speech segments of said speaker, and a second artificial neural network is built for detecting unvoiced speech segments of said speaker, and wherein[] one of said one or more artificial neural networks has multiple inputs and
Instant Application: 8. A method as claimed in claim 1, wherein the detection of the time series patterns forms part of a speaker authentication function.
Reference Patent: building one or more artificial neural networks, wherein, for at least one time series pattern to be detected, a specific one of said artificial neural networks is built, the specific one of said artificial neural networks being configured to produce a decision output and a reliability output, wherein the reliability output is indicative of the reliability of the decision output, wherein the detection of the time series patterns forms part of a speaker authentication function; wherein, for each speaker to be authenticated, at least one artificial neural network is built for detecting speech segments of said speaker; and wherein, for each speaker to be authenticated, an artificial neural network is built for detecting

Reference Patent: 5. A method as claimed in claim 1, wherein building said artificial neural networks comprises employing neuroevolution of augmenting topologies.

Instant Application: 3. A method as claimed in claim 1, wherein the one or more artificial neural networks are stored for subsequent use in the detection task.
Reference Patent: 6. A method as claimed in claim 1, wherein the artificial neural networks are stored for subsequent use in a detection task.

Instant Application: 5. A method as claimed in claim 1, wherein said time series patterns are audio patterns.
Reference Patent: 8. A method as claimed in claim 1, wherein said time series patterns are audio patterns.

Instant Application: 6. A method as claimed in claim 1, wherein a raw time series signal is provided as an input to each artificial neural network that is built.
Reference Patent: 9. A method as claimed in claim 1, wherein a raw time series signal is provided as an input to each artificial neural network that is built.

Instant Application: 7. A method as claimed in claim 5, wherein the audio patterns include at least one audio pattern selected from the group consisting of: voiced speech, unvoiced speech, user-specific speech, contextual sound, and a sound event.

Instant Application: 9. A method as claimed in claim 7, wherein, for each speaker to be authenticated, at least one

Instant Application: 13. A system for facilitating the detection of one or more time series patterns, comprising a processor configured to build one or more artificial neural networks, wherein, for at least one time series pattern to be detected, the processor is configured to build a specific one of said one or more artificial neural networks, wherein each time series pattern to be detected represents a class of said detection task, wherein a separate neural network is evolved for each class of the detection task for distinction between vowels and consonants in a speech signal, wherein, for each speaker to be authenticated, a first artificial neural network is built for detecting voiced speech segments of said speaker, and a second artificial neural network is built for detecting unvoiced speech segments of said speaker, and wherein[] one of said one or more artificial neural networks has multiple inputs and multiple outputs to output a feature vector to be fed into a subsequent support vector machine (SVM).
Reference Patent: 12. A system for facilitating detection of one or more time series patterns, comprising a network building unit configured to build one or more artificial neural networks, wherein, for at least one time series pattern to be detected, the network building unit is configured to build a specific one of said artificial neural networks, the specific one of said artificial neural networks being configured to produce a decision output and a reliability output, wherein the reliability output is indicative of the reliability of the decision output wherein the detection of the time series patterns forms part of a speaker authentication function; wherein, for each speaker to be authenticated, at least one artificial neural network is built for detecting speech segments of said speaker; and wherein, for each speaker to be authenticated, an artificial neural network is built for detecting voiced speech segments of said speaker, and another artificial neural network is built for detecting unvoiced speech segments of said speaker.


	Instant independent claims 1 and 13 differ from their counterparts in the reference patent insofar as the instant claims now contain the following limitations, taught by Fakotakis:  “each time series pattern to be detected represents a class of a detection task (speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network – Fakotakis, sec. 2.3, first paragraph [classes = speaker 1, speaker 2, etc.]), and … a separate artificial neural network is evolved for each class of the detection task (speaker classification stage of the system is implemented by modeling each speaker with an individual feedforward MLP network – Fakotakis, sec. 2.3, first paragraph [classes = speaker 1, speaker 2, etc.]).”
	The reference patent and Fakotakis both relate to the use of neural networks for audio classification and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the reference patent to use a separate neural network for each class of the detection task, as taught by Fakotakis, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would reduce the training time relative to the time spent training a single large network.  See Fakotakis, sec. 2.3, first paragraph.
Furthermore, the reference patent and the instant application differ in the following limitation, taught by Launay: “a separate artificial neural network is evolved for each class of the detection task for distinction between vowels and consonants in a speech signal (multi-layer perceptrons are used to learn the mapping between the acoustic space and the distinctive feature space; the distinctive feature space consists of commonly used articulatory features, plus some broad phonetic classes; rather than training a single neural network to learn the whole mapping, an MLP is trained individually on each feature, leading to a set of 60 distinct NNs – Launay, sec. 2.1, first paragraph; see also Table 1 (showing that the phonetic features each of which corresponds to a separate MLP correspond, inter alia, to vowels and consonants))….”
The reference patent, Fakotakis, and Launay all relate to automatic speech recognition and are analogous.  Fakotakis discloses a system in which a separate neural network is developed for each speaker, and Launay discloses a system in which separate neural networks are trained to distinguish between consonants and vowels.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of the reference patent and Fakotakis ….  See Launay, sec. 2.1, first paragraph.
Furthermore, instant claims 1 and 13 differ from their counterparts in the reference patent in containing the following limitation, taught by Donahue: “one of said one or more artificial neural networks has multiple inputs and multiple outputs to output a feature vector to be fed into a subsequent support vector machine (SVM) (features extracted from the activation of a deep convolutional neural network trained in a fully supervised fashion can be repurposed to novel generic tasks – Donahue, abstract; inputs for CNN are mean-centered raw RGB pixel intensity values of a 224 x 224 image [i.e., the network has multiple inputs] – id. at penultimate paragraph before sec. 3.2; activations of the nth hidden layer of the network are called feature DeCAFn, where DeCAF7 denotes features taken from the final hidden layer [features taken from the nth hidden layer = feature vector comprising multiple outputs] – id. at sec. 4, first paragraph; top performing method trains a linear SVM on DeCAF6 with dropout – id. at sec. 4.1, third paragraph; see also Fig. 4 (showing that a standard SVM can be trained on DeCAF5-7 and an SVM with dropout can also be trained on DeCAF7)).”
Donahue and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of the reference patent, Fakotakis, and Launay to feed the output of one of the neural networks into an SVM, as disclosed by Donahue, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow a user to adapt a new architecture to new tasks for which there may not be sufficient labeled or unlabeled data to train the new architecture conventionally.  See Donahue, abstract.

(2) Response to Argument
Appellant makes two substantive arguments that the rejections under 35 USC § 103 are allegedly improper: that Launay in view of Fakotakis and Donahue allegedly does not disclose that “one of said one or more artificial neural networks has multiple inputs and multiple outputs to output a feature vector to be fed into a subsequent support vector machine (SVM)” and that the reasoning to combine the references is allegedly not based on a rational underpinning.  Appellant’s Brief dated November 30, 2021 (“Br.”) 7-11.  Neither argument should convince the Board, for the reasons delineated below.
Launay in View of Fakotakis and Donahue Teaches the Disputed Limitation
Statement of Law
“The legal concept of prima facie obviousness is a procedural tool of examination which applies broadly to all arts.”  MPEP § 2142.  “The examiner bears the initial burden of factually supporting any prima facie conclusion of obviousness. If the examiner does not produce a prima facie case, the applicant is under no obligation to submit secondary evidence to show nonobviousness. If, however, the examiner does produce a prima facie case, the burden of coming forward with evidence or arguments shifts to the applicant who may submit additional evidence of nonobviousness….”  Id.  “‘The determination of obviousness is dependent on the facts of each case.’ Sanofi-Synthelabo v. Apotex, Inc., 550 F.3d 1075, 1089, 89 USPQ2d 1370, 1379 (Fed. Cir. 2008).”  Id.  “The rationale to modify or combine the prior art does not have to be expressly stated in the prior art; the rationale may be expressly or impliedly contained in the prior art or it may be reasoned from knowledge generally available to one of ordinary skill in the art, established scientific principles, or legal precedent established by prior case law.”  MPEP § 2144(I) (citations omitted).

Appellant Has Waived Its Right Independently to Challenge Examiner’s Rejection of the Dependent Claims
Examiner notes that, among the claims Examiner rejected under 35 USC § 103, Appellant has entered a substantive argument only against the rejection of the independent claims.  Therefore, if the Board finds that Launay in view of Fakotakis and Donahue renders the independent claims obvious, it should summarily sustain the rejection of the dependent claims, because Appellant has entered substantive arguments against neither the rejections of these claims nor the rejections of any of the claims on which they depend other than the independent claims.  See 37 CFR § 41.37(c)(1)(iv).
Donahue Teaches the Disputed Limitation
Appellant’s argument that Donahue does not teach the limitation reproduced above should not convince the Board.  As an initial matter, Examiner notes that Appellant does little more than make a bald assertion that the instant claims are somehow relevantly different from Donahue, without specifically pointing out how the instant claims are different from the disclosure of Donahue.  This alone should be sufficient ground for the Board to reject Appellant’s argument.  See In re Lovin, 652 F.3d 1349, 1357 (Fed. Cir. 2011) (“we hold that the Board reasonably interpreted Rule 41.37 to require more substantive arguments in an appeal brief than a mere recitation of the claim elements and a naked assertion that the corresponding elements were not found in the prior art.”).
Nonetheless, Examiner will explain how Donahue meets the limitation.  Regarding the assertion that Donahue fails to teach that the input neural network has multiple inputs and multiple outputs, Br. at 7-8, Examiner explained in the Advisory Action of September 8, 2021 (“Advisory Action”) that the inputs for the CNN (the network whose outputs are ultimately fed into an SVM) are mean-centered raw RGB pixel intensity values of a 224 x 224 image.  Donahue, penultimate paragraph before section 3.2.  Insofar as the claim does not define “multiple inputs,” nor is the term defined in the specification, it is reasonable to assume that each pixel of each input image qualifies as a separate input.  Moreover, because there is more than one input image, this too means that there must be “multiple inputs.”  Note that Appellant did not claim that one of the artificial neural networks has multiple input nodes or neurons.  Appellant merely claimed multiple inputs, which there clearly are.  Similarly, regarding “multiple outputs,” the Advisory Action explains that Donahue section 4, first paragraph, discloses that the features DeCAFn taken as activations of the nth hidden layer of the network are used to train another network, which may be an SVM.  Notably, that paragraph gives the example of DeCAF7, which “denotes features taken from the final hidden layer”.  (Emphasis added.)  That is, the outputs of the network that are fed into the SVM are also multiple in number.  Again, it is not necessary that the network have multiple output nodes for the reference to read on the limitation.
Regarding the assertion that Donahue fails to teach “output[ting] a feature vector to be fed into a subsequent support vector machine (SVM),” Br. at 8, the above-mentioned first paragraph of section 4 of Donahue indicates that the features of DeCAFn, which again represent the outputs of the nth hidden layer of the CNN, are evaluated on multiple standard computer vision benchmarks.  Among those benchmarks, as the third paragraph of section 4.1 notes, is a linear SVM trained on DeCAF6.  In other words, a feature vector comprising the outputs of the sixth hidden layer of the CNN is input as a training datum into a linear SVM, which is as claimed.
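By way of illustration only, the arrangement described above (a network with multiple inputs whose hidden-layer activations form a feature vector that is then supplied to a linear SVM) can be sketched as follows.  This is a minimal sketch under stated assumptions and is not Donahue's implementation: a single small fully connected layer stands in for the deep CNN, the weights and data are invented toy values, the linear SVM is fitted by plain subgradient descent on the hinge loss, and the names `hidden_features`, `train_linear_svm`, `svm_predict`, `HIDDEN_WEIGHTS`, and `HIDDEN_BIASES` are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def hidden_features(inputs, weights, biases):
    """Return the hidden-layer activations (the feature vector) for one input."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def train_linear_svm(features, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit a linear SVM by subgradient descent on the regularized hinge loss."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            margin = y * (sum(wi * fi for wi, fi in zip(w, f)) + b)
            if margin < 1.0:
                # Margin violated: step along the hinge-loss subgradient.
                w = [wi - lr * (reg * wi - y * fi) for wi, fi in zip(w, f)]
                b += lr * y
            else:
                # Margin satisfied: only the regularizer contributes.
                w = [wi - lr * reg * wi for wi in w]
    return w, b

def svm_predict(w, b, f):
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b >= 0.0 else -1

# Hypothetical fixed ("already trained") layer: 4 inputs -> 3 hidden units.
HIDDEN_WEIGHTS = [[1.0, -1.0, 0.0, 0.0],
                  [-1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5]]
HIDDEN_BIASES = [0.0, 0.0, 0.0]

# Toy labeled inputs, invented for illustration (labels are +1 / -1).
samples = [[1.0, 0.0, 0.2, 0.1],
           [0.9, 0.1, 0.0, 0.3],
           [0.0, 1.0, 0.1, 0.2],
           [0.1, 0.8, 0.3, 0.0]]
labels = [1, 1, -1, -1]

# Multiple inputs in, multiple outputs (a feature vector) out.
features = [hidden_features(x, HIDDEN_WEIGHTS, HIDDEN_BIASES) for x in samples]

# The feature vectors, not the raw inputs, are the SVM's training data.
svm_w, svm_b = train_linear_svm(features, labels)
predictions = [svm_predict(svm_w, svm_b, f) for f in features]
```

In this sketch the network's parameters are left untouched while the SVM is fitted; only the SVM learns from the small labeled set, which is the sense in which the extracted features are "repurposed" for a new task.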

The Stated Motivation to Combine the References is Proper
Statement of Law
MPEP § 2143 states that:
The Supreme Court in KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007) identified a number of rationales to support a conclusion of obviousness which are consistent with the proper "functional approach" to the determination of obviousness as laid down in Graham. The key to supporting any rejection under 35 U.S.C. 103 is the clear articulation of the reason(s) why the claimed invention would have been obvious. The Supreme Court in KSR noted that the analysis supporting a rejection under 35 U.S.C. 103  should be made explicit.

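To reject a claim based on a teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention, Office personnel must articulate the following:

(1) a finding that there was some teaching, suggestion, or motivation, either in the references themselves or in the knowledge generally available to one of ordinary skill in the art, to modify the reference or to combine reference teachings;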
(2) a finding that there was reasonable expectation of success; and 
(3) whatever additional findings based on the Graham factual inquiries may be necessary, in view of the facts of the case under consideration, to explain a conclusion of obviousness.

Id. at § 2143(I)(G).
The Motivation to Combine is Proper
Appellant’s argument that there is no rational underpinning for the determination that it would have been obvious to an ordinary artisan before the effective filing date to modify the combination of Fakotakis and Launay to feed the results of a multiple-input, multiple-output neural network into an SVM is unconvincing.  As an initial matter, Appellant’s statement that “the Final Office Action fails to explain how the teachings of Fakotakis, Launay and Donahue would be combined as suggested in the Final Office Action”, Br. at 9, ignores the fact that an explanation of the exact manner in which the prior art references may be combined is unnecessary to support a conclusion of obviousness based on the teaching, suggestion, or motivation (“TSM”) rationale.  All that is necessary is a finding of a teaching, suggestion, or motivation to modify the prior art reference; a finding of a reasonable expectation of success; and any other findings necessary to resolve the Graham factors.
Appellant’s argument is, in effect, that Examiner has not properly articulated a finding of a TSM because the motivation given by Examiner, namely that feeding output data of a first neural network into an SVM would alleviate the problem of finding insufficient training data to train the SVM properly by generating training data that do not need to be found externally, is insufficient because Donahue does not explicitly say that generating training data with a separate network reduces the need to find training data elsewhere.  Br. at 9-10.  In making this argument, however, Appellant unduly restricts the flexible obviousness analysis by insisting that every element of the motivation be explicitly found in the art.  As noted above, and as Appellant explicitly concedes, Br. at 10, the “teaching, suggestion, or motivation” may also be found “in the knowledge generally available to one of ordinary skill in the art”.  MPEP § 2143(I)(G).  The problem of finding insufficient labeled training data to train a supervised machine learning model was part of that generally available knowledge.  See Wikipedia, Semi-Supervised Learning, screenshot captured Nov. 9, 2015, http://web.archive.org/web/20151109160151/https://en.wikipedia.org/wiki/Semi-supervised_learning (“The acquisition of labeled data for a learning problem often requires a skilled human agent (e.g. to transcribe an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). The cost associated with the labeling process thus may render a fully labeled training set infeasible….”).
Moreover, the abstract of Donahue explicitly says that features are extracted from a CNN for input into the SVM because “there may be insufficient labeled or unlabeled training data to conventionally train or adapt a deep architecture to … new tasks.”  In other words, in Donahue, feeding the features generated by the CNN into the SVM facilitates transfer learning in situations where there may be insufficient training data to train the SVM directly.  As applied to the instant application, feeding the output of a neural network into an SVM is useful in situations where the SVM cannot be trained directly due to insufficient labeled data.
Examiner would also note that the specification as originally filed never articulates why the output of the original neural network is fed into an SVM.  Indeed, the idea of feeding the output of the original network into the SVM is given only a single perfunctory mention in the first full paragraph of page 9 of the specification as originally filed.  While it is not necessary that the motivation provided be the same as that provided by the instant application, see MPEP § 2144(IV), here the complete absence of any explanation of the reasoning for feeding the network’s output into an SVM means that any advantage of making this modification that an ordinary artisan would have envisioned before the effective filing date will suffice.  Here, given that both Donahue itself and knowledge generally available to ordinary artisans suggest that feeding the output of the neural network into an SVM would facilitate training the SVM when there are insufficient training data to train the SVM without generating training data, there is a rational basis for combining these teachings of Donahue with those of Fakotakis and Launay.  The rejection is proper.
For the above reasons, it is believed that the rejections should be sustained.


/R.C.V./Examiner, Art Unit 2125
Conferees:
/ALAN CHEN/Primary Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125

Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.

        1 Note that only a one-way test for distinctness is required even though the instant application is the earlier-filed application because Applicant could have filed the claims of the reference patent in the instant application.  See MPEP § 804(II)(B)(2)(b).