DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restriction
This application contains claims directed to the following patentably distinct species:
Group I: Claims 1-20. These claims are directed towards an apparatus and method describing specific structure and functionalities of a “teacher network” and “student network.” With reference to the drawings filed 01/08/2018, the above claims are shown and described, at least in part, in Figures 5-6, which show the structure of the student-teacher network with knowledge bridges.
Group II: Claim 21. This claim is directed towards a method of forming a physical wafer (e.g. chip) which has Group I’s “student-teacher network”. Further Claim 21 is directed towards the testing of the “apparatus” which includes optical converters and optical splitters. 
Group III: Claim 22. This claim is directed towards “a method of constructing a integrated circuit.” Group III is patentably distinct from Group II because it describes the method by which a wafer is formed; namely, by “generating a mask layout.”

The species are independent or distinct because the claims to the different species recite the mutually exclusive characteristics of such species. For example:
While Group II recites the “student-teacher network,” the specific functionalities required of Group I (e.g. specific equations) are NOT required in Group II. Similarly, Group I does NOT require that the “student-teacher network” be on a chip, nor does it require the physical testing of a physical device using the specific testing methods as claimed in Group II.
Group III, too, recites the general characteristics of the “student-teacher network” as required by Group I, but similar to Group II does NOT require the specific functionalities of Group I. Group III is further distinct from Group II and Group I because Group III claims a specific method of constructing an integrated circuit; this specific method is not required by Group I nor Group II. 
 In addition, these species are not obvious variants of each other based on the current record.
Applicant is required under 35 U.S.C. 121 to elect a single disclosed species, or a single grouping of patentably indistinct species, for prosecution on the merits to which the claims shall be restricted if no generic claim is finally held to be allowable. Currently, no claims are generic.
There is a search and/or examination burden for the patentably distinct species as set forth above because at least the following reason(s) apply:  
The species or groupings of patentably indistinct species have acquired a separate status in the art in view of their different classification(s): 
Group I is drawn towards at least the CPC Classification G06N3/08 which is defined as learning methods.
Group II is drawn towards at least the CPC Classification H01L 21/00 which is defined as “processes or apparatus adapted for the manufacture or treatment of semiconductor or solid state devices or parts thereof” and/or H01L 22/00 which is defined as “testing or measuring during manufacture or treatment [of semiconductors]…”
Group III is drawn towards at least the CPC Classification G06F 30/39 which is defined as “circuit design at the physical level” and/or G06F 30/398 which is defined as “design verification or optimization…” 
The species or groupings of patentably indistinct species require a different field of search:
Group I would be most directed towards a search including neural networks, knowledge bridges, and the mathematical ideas behind certain machine learning algorithms. This is different from…
Group II, which would be most directed towards a search including methods of manufacturing silicon wafers and the testing of those wafers.
Group III would be most directed towards a search of “mask layouts”, “macros for compliance”, and methods of “constructing an integrated circuit.”

Applicant is advised that the reply to this requirement to be complete must include (i) an election of a species to be examined even though the requirement may be traversed (37 CFR 1.143) and (ii) identification of the claims encompassing the elected species or grouping of patentably indistinct species, including any claims subsequently added. An argument that a claim is allowable or that all claims are generic is considered nonresponsive unless accompanied by an election.
The election may be made with or without traverse. To preserve a right to petition, the election must be made with traverse. If the reply does not distinctly and specifically point out supposed errors in the election of species requirement, the election shall be treated as an election without traverse. Traversal must be presented at the time of election in order to be considered timely. Failure to timely traverse the requirement will result in the loss of right to petition under 37 CFR 1.144. If claims are added after the election, applicant must indicate which of these claims are readable on the elected species or grouping of patentably indistinct species.
Should applicant traverse on the ground that the species, or groupings of patentably indistinct species from which election is required, are not patentably distinct, applicant should submit evidence or identify such evidence now of record showing them to be obvious variants or clearly admit on the record that this is the case. In either instance, if the examiner finds one of the species unpatentable over the prior art, the evidence or admission may be used in a rejection under 35 U.S.C. 103 or pre-AIA  35 U.S.C. 103(a) of the other species.


A telephone call was made to attorney of record Christian LaPense on 03/17/2021 to request an oral election to the above restriction requirement. After a conversation with the applicant, the attorney of record called back and elected Group I (Claims 1-20) without traverse. 

For clarity of record, the claims examined in the instant action are Group I. Group II and Group III are NOT examined in the instant action. 

Claim Objections
Claims 1-2 and 12 are objected to because of the following informalities:  
Claims 1-2 and 12 recite multiple limitations which use the word “device.” Claim 1 for example recites “…a loss function device.” While definite, it is unclear what the purpose of the word “device” is, as each limitation which uses the word “device” refers to mathematical equations or layers within a neural network. 
The examiner respectfully requests that the applicant provide clarification on the word “device” or cancel such language.
For example, it appears that claim 2 should instead recite:
“The apparatus of claim 1,
wherein each of the teacher network and the student network comprises:
a 9x9 convolutional layer
a maximum pooling layer
a 3x1 convolutional layer
a dimension reduction
at least one long short term memory (LSTM) layer
a soft maximum layer”

Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 10 and 11-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 10 recites: 
…a collection of distant speech received by a microphone in a feature network of the student network, and where the teacher network receives close proximity speech received by a microphone…


“When a subjective term is used in the claim, the examiner should determine whether the specification supplies some standard for measuring the scope of the term, similar to the analysis for a term of degree. Some objective standard must be provided in order to allow the public to determine the scope of the claim. A claim that requires the exercise of subjective judgment without restriction may render the claim indefinite.” 

Because no objective standard has been provided or identified, the use of “distant speech” and/or “close proximity speech” is indefinite. The examiner notes that paragraph [0005] provides the best description of what the applicant considers “distant speech” and recites: “There has been a great effort to improve distant (e.g. far-field) speech recognition…” However, this does not provide the required “objective standard.” In other words, what distance away from a microphone is considered “distant”? Similarly, how close does speech need to be to be considered “close proximity”? Because neither Claim 10 nor the as-filed specification provides an objective standard for what would be considered “distant speech” and/or “close proximity speech,” the metes and bounds of the claim cannot be established and therefore the claim is indefinite. Appropriate correction is required.
Claim 20 recites similar language and therefore is similarly rejected. 

Claim 11 recites, at least in part: “providing hints to a student network by a plurality of knowledge bridges…” The term “hints” is considered a subjective term and thus renders the claim indefinite.  The examiner suggests amending claim 11 to recite 
The examiner notes that Claims 12-20 are rejected due to their dependency on Claim 11.

Allowable Subject Matter
Claims 3-9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter: 
Claim 3 recites, at least in part:
“wherein each of the teacher network and the student network is a recursive teacher network and a recursive student network respectively, wherein each of the recursive teacher network and the recursive student network is: 
$$m_t^n = g\left( W_1\, i_t^n(x_t) + W_2\, f_t^n\left(s_t^{n-1}\right) + b \right)$$”

The prior art of record does not teach or even fairly suggest:
	1. That each of the teacher network and the student network is a recursive teacher network and a recursive student network respectively. 
	2. The specific mathematical equation which defines the network(s). 
More specifically, the best prior art of record is Romero et al. ("FitNets: Hints for Thin Deep Nets" NPL 2015). However, while Romero teaches the “student-teacher” network (See Rejection under 35 U.S.C. 102 below), Romero does not disclose the 
	For this feature, the examiner turns to Zhu et al. (“A re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network”, NPL 2015). While Zhu, arguably, discloses a Recursive Convolutional Neural Network, Zhu does not teach that this structure of a neural network can be used in the “Student-Teacher” network paradigm of the instant invention. Further, even if a person of ordinary skill in the art could find that the structure of Zhu (i.e. Recursive Convolutional Neural Network) could be used with the Student-Teacher network of Romero, neither reference, alone or in combination, would disclose the specific mathematical equation of Claim 3 which defines the “recursive teacher network” and “recursive student network” of the instant invention. 
	For at least the reason(s) above, Claim 3 is allowable over the prior art of record.
	The examiner notes that Claims 4-5 are allowable merely due to being dependent upon allowable Claim 3.
	The examiner notes that Claims 6-9 are similarly allowable for at least the reasons above.  
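For illustration only, the Claim 3 expression, read as m_t^n = g(W_1 i_t^n(x_t) + W_2 f_t^n(s_t^{n-1}) + b), can be evaluated numerically as in the minimal sketch below. The sketch is not taken from the application or from the references of record: the choice of tanh for g, the identity functions standing in for i_t^n and f_t^n, and every name and value are assumptions made solely for the example.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def recursive_output(x_t, s_prev, W1, W2, b, i_fn, f_fn, g=math.tanh):
    """Evaluate m = g(W1 * i_fn(x_t) + W2 * f_fn(s_prev) + b) element-wise.

    i_fn and f_fn are hypothetical stand-ins for i_t^n and f_t^n; the
    document does not define them, so the caller supplies them.
    """
    a = matvec(W1, i_fn(x_t))      # input contribution
    c = matvec(W2, f_fn(s_prev))   # contribution of the previous recursion state
    return [g(ai + ci + bi) for ai, ci, bi in zip(a, c, b)]


def identity(v):
    """Assumed placeholder for i_t^n and f_t^n."""
    return v


eye = [[1.0, 0.0], [0.0, 1.0]]
m = recursive_output(
    x_t=[0.5, -1.0],      # hypothetical input at time t
    s_prev=[0.0, 0.0],    # hypothetical state from recursion level n-1
    W1=eye, W2=eye, b=[0.0, 0.0],
    i_fn=identity, f_fn=identity,
)
```

With identity weights, a zero bias, and a zero prior state, the output reduces to tanh applied element-wise to the input, which makes the role of each term easy to check by hand.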
Claims 13-19 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.


The examiner notes that Claims 14-15 are allowable merely due to being dependent upon allowable Claim 13.
The examiner notes that Claims 16-19 are similarly allowable for at least the reasons above.  
Claims 10 and 20 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Claim Rejections - 35 USC § 102
For clarity of record and ease of reading, the examiner notes the following: 
Any text that is bolded is a limitation of a claim. 
The “teaching” or reference citation, along with any necessary examiner notes are contained within the parentheses “()” following the bolded claim language. 
Any text that is underlined is emphasized language from reference(s) used and/or particular important examiner notes. While NOT fully reflective of the rejection as a whole, these underlined passages are indicative or otherwise reflective of key evidence.   

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 11 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Romero et al. ("FitNets: Hints for Thin Deep Nets" NPL 2015).
With respect to Claim 1, Romero teaches an apparatus, comprising: a teacher network… (c.f. Figure 1(a) Note teacher and student network)...a student network (c.f. Figure 1(a) Note teacher and student network)… a plurality of knowledge bridges between the teacher network and the student network, where each of the plurality of knowledge bridges provides a hint about a function being learned (Pg. 3 Section 2.2 "Hint based training" "In order to help the training of deep FitNets (deeper than their teacher), we introduce hints from the teacher network. A hint is defined as the output of a teacher’s hidden layer responsible for guiding the student’s learning process. Analogously, we choose a hidden layer of the FitNet, the guided layer, to learn from the teacher’s hint layer. We want the guided layer to be able to predict the output of the hint layer." The examiner notes that the "hint layer" or "guided layer" teaches "a plurality of knowledge bridges between the teacher network and the student network, where each of the plurality of knowledge bridges provides a hint about a function being learned".)…where a hint includes a mean square error or a probability (Pg. 4 The examiner notes that the “prediction error” and/or “output probability” teaches “where a hint includes a mean square error or a probability”)… a loss function device connected to the plurality of knowledge bridges and the student network (Note Line 4 and/or Line 6 in Algorithm 1. Note especially the Loss function in each; Pg. 3 Equation (3) "Then, we train the FitNet parameters from the first layer up to the guided layer as well as the regressor parameters by minimizing the following loss function").
With respect to Claim 11, Romero teaches a method, comprising: training a teacher network (c.f. Figure 1(a) Note teacher and student network. Further note algorithm 1 “FitNet Stage-wise Training” and especially “The second stage is a [Knowledge Distillation] KD training of the whole network…”)…providing hints to a student network by a plurality of knowledge bridges between the teacher network and the student network…(Pg. 3 Section 2.2 "Hint based training" "In order to help the training of deep FitNets (deeper than their teacher), we introduce hints from the teacher network. A hint is defined as the output of a teacher’s hidden layer responsible for guiding the student’s learning process. Analogously, we choose a hidden layer of the FitNet, the guided layer, to learn from the teacher’s hint layer. We want the guided layer to be able to predict the output of the hint layer." The examiner notes that the "hint layer" or "guided layer" teaches "a plurality of knowledge bridges between the teacher network and the student network, where each of the plurality of knowledge bridges provides a hint about a function being learned".)...determining a loss function from outputs of the plurality of knowledge bridges and the student network (Note Line 4 and/or Line 6 in Algorithm 1. Note especially the Loss function in each; Pg. 3 Equation (3) "Then, we train the FitNet parameters from the first layer up to the guided layer as well as the regressor parameters by minimizing the following loss function").
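For illustration only, the hint-based training described in the quoted passages of Romero (a guided student layer trained to predict the output of a teacher hint layer by minimizing a loss function) can be sketched as a squared-error loss between the teacher's hint and a regressed version of the student's guided-layer output. This is a minimal sketch under stated assumptions, not a reproduction of Romero's Equation (3): the linear regressor, the factor of one half, and all names and values are hypothetical and should be checked against the reference itself.

```python
def linear_regressor(guided, W_r, b_r):
    """Map the student's guided-layer output to the size of the teacher's hint.

    A plain linear map is an assumption made for this sketch.
    """
    return [sum(w * g for w, g in zip(row, guided)) + b
            for row, b in zip(W_r, b_r)]


def hint_loss(teacher_hint, student_guided, W_r, b_r):
    """Half the squared L2 distance between the hint and the regressed guided output."""
    predicted = linear_regressor(student_guided, W_r, b_r)
    return 0.5 * sum((t - p) ** 2 for t, p in zip(teacher_hint, predicted))


# Hypothetical values: a 2-unit teacher hint and a 1-unit student guided layer.
teacher_hint = [1.0, 2.0]
student_guided = [1.0]
W_r = [[1.0], [1.0]]   # regressor weights (2 x 1)
b_r = [0.0, 0.0]       # regressor bias

loss = hint_loss(teacher_hint, student_guided, W_r, b_r)
```

Here the regressor predicts [1.0, 1.0], so only the second component contributes to the loss, giving 0.5 * (2.0 - 1.0)^2 = 0.5.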
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Romero et al. ("FitNets: Hints for Thin Deep Nets" NPL 2015) in view of Sainath et al. ("Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks", NPL 2015) and further in view of Gajhede et al. ("Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples" NPL 2016).
With respect to Claim 2, Romero teaches all of the limitations of Claim 1 as discussed above. 
Romero further teaches a soft maximum device (Pg. 3 "Let S be a student network with parameters Ws and output probability PS = softmax(aS ), where aS is the student’s pre-softmax output"). 
Romero, however, does not appear to explicitly disclose: 
wherein each of the teacher network and the student network comprises: a 9x9 convolutional layer device
a maximum pooling layer device
a 3x1 convolutional layer device
a dimension reduction device
at least one long short term memory (LSTM) layer device

Sainath, however, does teach wherein each of the teacher network and the student network comprises: a 9x9 convolutional layer device…(Pg. 4581 Sainath Fig. 1. Section 2.1 Col. 1 "Specifically, we use 2 convolutional layers, each with 256 feature maps. We use a 9x9 frequency-time filter for the first convolutional layer, followed by a 4x3 filter for the second convolutional layer, and these filters are shared across the entire time-frequency space. Our pooling strategy is to use non-overlapping max pooling...")… a maximum pooling layer device…(Pg. 4581 Sainath Fig. 1. Section 2.1 Col. 1 "Specifically, we use 2 convolutional layers, each with 256 feature maps. We use a 9x9 frequency-time filter for the first convolutional layer, followed by a 4x3 filter for the second convolutional layer, and these filters are shared across the entire time-frequency space. Our pooling strategy is to use non-overlapping max pooling...")…a dimension reduction device…(Pg. 4581 Sainath Fig. 1. Section 2.1 Col. 1 “The dimension of the last layer of the CNN is large, due to the number of feature-maps×time×frequency context. Thus, we add a linear layer to reduce feature dimension, before passing this to the LSTM layer, as indicated in Figure 1…”)… at least one long short term memory (LSTM) layer device…(Pg. 4581 Sainath Fig. 1. Section 2.1 Col. 1 “The dimension of the last layer of the CNN is large, due to the number of feature-maps×time×frequency context. Thus, we add a linear layer to reduce feature dimension, before passing this to the LSTM layer, as indicated in Figure 1…”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the student teacher network as taught by Romero modified with the specific layers as taught by Sainath because this would reduce frequency variance within the input signal, thus improving the accuracy of the network (Sainath Pg. 4581 Col. 1). 
The combination of Romero and Sainath, however, does not appear to explicitly disclose: 
a 3x1 convolutional layer device
Gajhede, however, does teach a 3x1 convolutional layer device (Pg. 113 Section 2.5 "The third and final convolutional layer consists of 64 feature maps generated by 3x3 kernels using ReLU, max-pooling of 3x1..." The examiner notes that a convolutional layer which includes a max-pooling of 3x1 teaches “a 3x1 convolutional layer device.”).

With respect to Claim 12, Romero teaches all of the limitations of Claim 11 as discussed above. 
Romero further teaches a soft maximum device (Pg. 3 "Let S be a student network with parameters Ws and output probability PS = softmax(aS ), where aS is the student’s pre-softmax output"). 
Romero, however, does not appear to explicitly disclose: 
wherein each of the teacher network and the student network comprises: a 9x9 convolutional layer device
a maximum pooling layer device
a 3x1 convolutional layer device
a dimension reduction device
at least one long short term memory (LSTM) layer device
Sainath, however, does teach wherein each of the teacher network and the student network comprises: a 9x9 convolutional layer device…(Pg. 4581 Sainath Fig. 1. Section 2.1 Col. 1 "Specifically, we use 2 convolutional layers, each with 256 feature maps. We use a 9x9 frequency-time filter for the first convolutional layer, followed by a 4x3 filter for the second convolutional layer, and these filters are shared across the entire time-frequency space. Our pooling strategy is to use non-overlapping max pooling...")… a maximum pooling layer device…(Pg. 4581 Sainath Fig. 1. Section 2.1 Col. 1 "Specifically, we use 2 convolutional layers, each with 256 feature maps. We use a 9x9 frequency-time filter for the first convolutional layer, followed by a 4x3 filter for the second convolutional layer, and these filters are shared across the entire time-frequency space. Our pooling strategy is to use non-overlapping max pooling...")…a dimension reduction device…(Pg. 4581 Sainath Fig. 1. Section 2.1 Col. 1 “The dimension of the last layer of the CNN is large, due to the number of feature-maps×time×frequency context. Thus, we add a linear layer to reduce feature dimension, before passing this to the LSTM layer, as indicated in Figure 1…”)… at least one long short term memory (LSTM) layer device…(Pg. 4581 Sainath Fig. 1. Section 2.1 Col. 1 “The dimension of the last layer of the CNN is large, due to the number of feature-maps×time×frequency context. Thus, we add a linear layer to reduce feature dimension, before passing this to the LSTM layer, as indicated in Figure 1…”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the student teacher network as taught by Romero modified with the specific layers as taught by Sainath because this would reduce frequency variance within the input signal, thus improving the accuracy of the network (Sainath Pg. 4581 Col. 1). 
The combination of Romero and Sainath, however, does not appear to explicitly disclose: 
a 3x1 convolutional layer device
Gajhede, however, does teach a 3x1 convolutional layer device (Pg. 113 Section 2.5 "The third and final convolutional layer consists of 64 feature maps generated by 3x3 kernels using ReLU, max-pooling of 3x1..." The examiner notes that a convolutional layer which includes a max-pooling of 3x1 teaches “a 3x1 convolutional layer device.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the specific layers and student-teacher network as taught by the combination of Romero and Sainath modified with the 3x1 convolutional layer device as taught by Gajhede because this would increase the accuracy of the network, thus improving the accuracy of the classifier.
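For illustration only, the layer sequence recited in Claims 2 and 12 (9x9 convolution, maximum pooling, 3x1 convolution, dimension reduction, LSTM) can be traced as a sequence of feature-map sizes. The sketch below is a minimal shape calculation under assumptions that do not come from the application or the cited references: a 40x11 frequency-by-time input, unpadded stride-1 convolutions, non-overlapping pooling of 3 along frequency only, "3x1" read as 3 in frequency by 1 in time, 256 feature maps, and a reduced dimension of 256.

```python
def conv_out(size, kernel):
    """Output length of an unpadded, stride-1 convolution along one axis."""
    return size - kernel + 1


def pool_out(size, window):
    """Output length of non-overlapping max pooling along one axis."""
    return size // window


def layer_shapes(freq=40, time=11, feature_maps=256, reduced_dim=256):
    """Trace (frequency, time) sizes through the assumed layer stack."""
    shapes = {}
    freq, time = conv_out(freq, 9), conv_out(time, 9)   # 9x9 convolutional layer
    shapes["conv9x9"] = (freq, time)
    freq = pool_out(freq, 3)                            # max pooling along frequency (assumed)
    shapes["maxpool"] = (freq, time)
    freq, time = conv_out(freq, 3), conv_out(time, 1)   # 3x1 convolutional layer
    shapes["conv3x1"] = (freq, time)
    shapes["flattened"] = freq * time * feature_maps    # size fed to dimension reduction
    shapes["reduced"] = reduced_dim                     # size passed to the LSTM layer
    return shapes


shapes = layer_shapes()
```

Under these assumed sizes the flattened convolutional output has 8 * 3 * 256 = 6144 values, which shows why a dimension reduction step before the LSTM layer is useful.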

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
1. Moniz, Joel et al. “Convolutional Residual Memory Networks.” NPL 2016. Similar neural network structure. See for example Fig. 2. 
2. Zhao, Yuanyuan et al. “Multidimensional Residual Learning based on Recurrent Neural Networks for Acoustic Modeling.” NPL 2016. Similar neural network structure, see Figure 1. 
3. Deming, Laura et al. “Genetic Architect: Discovering Genomic Structure with Learned Neural Architectures.” NPL 2016. Note Figure 1 and specific layers of “PromoterNet.”
4. Wang, Yisen et al. “Residual Convolutional CTC network for Automatic Speech Recognition.” NPL February 2017. Note the description of the residual Convolutional network in section 3. 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FEN TAMULONIS whose telephone number is (571)272-0934.  The examiner can normally be reached on 7:30AM-5:30PM MON-FRI EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571)-272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/F.C.T./ Examiner, Art Unit 2126
/MICHAEL J HUNTLEY/ Primary Examiner, Art Unit 2116