DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Claims 1-10 are presented for examination.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in Japan on December 13, 2016. It is noted, however, that applicant has not filed a certified copy of the Japanese Patent Application No. 2016-241629 as required by 37 CFR 1.55.

Specification
The disclosure is objected to because of the following informalities: 
In Para. [0055], line 7, “with a decrease in the degree and is decreased…” should read “with a decrease in the degree and are decreased…”
In Para. [0062], line 1, “the CPU sets, initial values…” should read “the CPU sets initial values…”  (omit comma)
Appropriate correction is required.
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  Correction of the following is required: Claim 3 references “correspondence values” which are not referenced or defined in the specification.

Claim Rejections - 35 USC § 103
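In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.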
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Roh et al. (A fuzzy ensemble of parallel polynomial neural networks with information granules formed by fuzzy clustering), hereinafter Roh, in further view of Shankar et al. (Refining Architectures of Deep Convolutional Neural Networks), hereinafter Shankar.

Regarding claim 1, Roh discloses an information processing device (Abstract: “In this paper, we introduce a new category of fuzzy models called a fuzzy ensemble of parallel polynomial neural network (FEP2N2), which consist of a series of polynomial neural networks weighted by activation levels of information granules formed with the use of fuzzy clustering.” The fuzzy model is the information processing device) comprising: identifying, based on input-output characteristics of the divided neural networks (Sec. 4 Para. 1: “In Ref. [11], the concept of a fuzzy multi model was proposed where the entire system was divided into R sub-spaces and ‘‘R” local models represent the relationship of input and output variable. The number of the sub-spaces is predefined and is equal to the number of the clusters. In this way we can obtain a sort of a multi model by structuring PNNs in parallel. Functionally this means that each PNN represents a certain relationship formed in the corresponding input sub-space.” Here, the R sub-spaces are the divided neural networks with the relationship of input and output variable being the input-output characteristics), parameters of each of polynomial neural networks corresponding to each of the divided neural networks (Sec. 4.1 Para. 1: “A local learning algorithm is used to estimate the values of the coefficients of polynomials, which are consequent parts of the fuzzy model. The parameters and the structure of the local model for the decoupled fuzzy rule can be estimated without considering its relationship with other fuzzy rules. Put it differently: a local learning algorithm can estimate the parameters of the local model given only the corresponding local input space.” Here, the parameters of the PNNs are their coefficients and structure), and output another neural network generated by linking the identified polynomial neural networks (Sec. 4 Para. 1: “To overcome this shortcoming which is inherent to the increasing number of the layers of PNNs being added in a successive manner, we connect some PNNs in parallel.”).
Roh fails to disclose an information processing device comprising: a memory; and a processor coupled to the memory and the processor configured to: acquire a neural network, divide the neural network into divided neural networks.
Shankar teaches an information processing device comprising: a memory; and a processor coupled to the memory (Sec. 4 Para. 1: “We evaluate our approach on SUN Attributes Dataset (SAD) [16] and Cambridge-MIT Natural Scenes Attributes Dataset (CAMIT-NSAD) [20]. Both the datasets have classes of natural scenes attributes, who’s listing can be found in Fig 3.” Sec. 4 Para. 7: “We choose GoogleNet [22] and VGG-11 [21] as the base CNN architectures, which we intend to alter using our approach.” Shankar uses known, publicly available digital datasets which are stored in some form of memory and apply them to well-known CNN architectures indicating the presence of a processor, which is running the CNNs, which is coupled to the memory storing the datasets) and the processor configured to: acquire a neural network (Sec. 3 Para. 2: “For a given dataset and a base CNN architecture, we first train the CNN on the dataset using a given loss function (such as softmax loss, sigmoid cross entropy loss, etc. [9]”) and divide the neural network into divided neural networks (Abstract: “We use two operations for architecture refinement, viz. stretching and symmetrical splitting. Stretching increases the number of hidden units (nodes) in a given CNN layer, while a symmetrical split of say K between two layers separates the input and output channels into K equal groups, and connects only the corresponding input-output channel groups.” Figure 1: The figure illustrates the splitting operation where a neural network is divided into two neural networks).
Shankar and the instant application are analogous because they both speak to dividing neural networks into multiple different neural networks. It would have been obvious to one of ordinary skill in the art to modify Roh to include using an information processing device to acquire and divide a neural network in order to reduce redundancy in the networks and increase their accuracy (Shankar Sec. 3 Para. 4: “As we will discuss in the next subsection and Section 4, both the stretch and split operations applied to the same layer helps us to optimally reduce the model size and increase accuracy.”).
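For illustration only, the symmetrical split quoted from the Shankar abstract (separating the input and output channels into K equal groups and connecting only the corresponding groups) can be sketched as follows. This is a minimal sketch and not Shankar's implementation; the function names `symmetric_split` and `connection_count` are hypothetical, and the sketch assumes both channel counts are divisible by K.

```python
def symmetric_split(n_in, n_out, k):
    """Pair up k equal groups of input and output channel indices.

    Hypothetical helper: group g of the inputs is connected only to
    group g of the outputs, as in the quoted description of a split of K.
    """
    if k <= 0 or n_in % k or n_out % k:
        raise ValueError("channel counts must be divisible by a positive k")
    in_size, out_size = n_in // k, n_out // k
    return [
        (list(range(g * in_size, (g + 1) * in_size)),
         list(range(g * out_size, (g + 1) * out_size)))
        for g in range(k)
    ]


def connection_count(n_in, n_out, k):
    """Count input-output channel connections that remain after the split."""
    return sum(len(ins) * len(outs) for ins, outs in symmetric_split(n_in, n_out, k))


if __name__ == "__main__":
    print(symmetric_split(4, 6, 2))
    print(connection_count(4, 6, 1), connection_count(4, 6, 2))
```

Under these assumptions, a split of K leaves 1/K of the original channel connections, which is consistent with the model-size reduction that the quoted passage attributes to the split operation.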

Regarding claim 3, incorporating the rejection of claim 1, Roh discloses wherein the processor changes the parameters in a case where correspondence values corresponding to the respective polynomial neural networks do not satisfy a predetermined condition (Sec. 4.2 Para. 1: “The pseudocode used to build the fuzzy ensemble of parallel polynomial neural networks combined by activation levels is covered in Fig. 5.” Sec. 4.2 Step 6.4: “The termination condition that controls the growth of the model consists of two components such as the performance index and the size of the networks. Here, the termination condition deals with the number of layers being reached. In other words, the growth of the network stops once we have reached the predetermined maximum L…” Sec. 4.2 Step 6.5: “If the termination condition has not been satisfied, the outputs of the retained nodes serve as new inputs to the next layer.” Here, the parameter being changed is the structure of the PNN. If the termination (predetermined) condition of the network is not reached, the network continues to grow, changing its structure. The correspondence values are the number of layers in the networks. The fact that Polynomial Neural Networks are models generated and run in a computing environment and the presence pseudocode in the excerpt indicates a processor is being used to execute the steps in the reference.).

Regarding claim 9, Roh discloses an information processing method (Sec. 2 Para. 1: “In what follows, we consider the FCM method as a viable algorithmic vehicle of information granulation.”) comprising: identifying, based on input-output characteristics of the divided neural networks (Sec. 4 Para. 1: “In Ref. [11], the concept of a fuzzy multi model was proposed where the entire system was divided into R sub-spaces and ‘‘R” local models represent the relationship of input and output variable. The number of the sub-spaces is predefined and is equal to the number of the clusters. In this way we can obtain a sort of a multi model by structuring PNNs in parallel. Functionally this means that each PNN represents a certain relationship formed in the corresponding input sub-space.” Here, the R sub-spaces are the divided neural networks with the relationship of input and output variable being the input-output characteristics), parameters of each of polynomial neural networks corresponding to each of the divided neural networks (Sec. 4.1 Para. 1: “A local learning algorithm is used to estimate the values of the coefficients of polynomials, which are consequent parts of the fuzzy model. The parameters and the structure of the local model for the decoupled fuzzy rule can be estimated without considering its relationship with other fuzzy rules. Put it differently: a local learning algorithm can estimate the parameters of the local model given only the corresponding local input space.” Here, the parameters of the PNNs are their coefficients and structure), and output another neural network generated by linking the identified polynomial neural networks (Sec. 4 Para. 1: “To overcome this shortcoming which is inherent to the increasing number of the layers of PNNs being added in a successive manner, we connect some PNNs in parallel.”).
Roh fails to disclose executing by a computer: acquiring a neural network and dividing the neural network into divided neural networks.
Shankar teaches executing by a computer (Sec. 4 Para. 1: “We evaluate our approach on SUN Attributes Dataset (SAD) [16] and Cambridge-MIT Natural Scenes Attributes Dataset (CAMIT-NSAD) [20]. Both the datasets have classes of natural scenes attributes, who’s listing can be found in Fig 3.” Sec. 4 Para. 7: “We choose GoogleNet [22] and VGG-11 [21] as the base CNN architectures, which we intend to alter using our approach.” Shankar uses known, publicly available digital datasets which are stored in some form of memory and apply them to well-known CNN architectures indicating the presence of a processor, which is running the CNNs, which is coupled to the memory storing the datasets): acquiring a neural network (Sec. 3 Para. 2: “For a given dataset and a base CNN architecture, we first train the CNN on the dataset using a given loss function (such as softmax loss, sigmoid cross entropy loss, etc. [9]”) and dividing the neural network into divided neural networks (Abstract: “We use two operations for architecture refinement, viz. stretching and symmetrical splitting. Stretching increases the number of hidden units (nodes) in a given CNN layer, while a symmetrical split of say K between two layers separates the input and output channels into K equal groups, and connects only the corresponding input-output channel groups.” Figure 1: The figure illustrates the splitting operation where a neural network is divided into two neural networks).
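It would have been obvious to one of ordinary skill in the art to modify Roh to include using a computer to acquire and divide a neural network in order to reduce redundancy in the networks and increase their accuracy (Shankar Sec. 3 Para. 4: “As we will discuss in the next subsection and Section 4, both the stretch and split operations applied to the same layer helps us to optimally reduce the model size and increase accuracy.”).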

Regarding claim 10, Roh discloses identifying, based on input-output characteristics of the divided neural networks (Sec. 4 Para. 1: “In Ref. [11], the concept of a fuzzy multi model was proposed where the entire system was divided into R sub-spaces and ‘‘R” local models represent the relationship of input and output variable. The number of the sub-spaces is predefined and is equal to the number of the clusters. In this way we can obtain a sort of a multi model by structuring PNNs in parallel. Functionally this means that each PNN represents a certain relationship formed in the corresponding input sub-space.” Here, the R sub-spaces are the divided neural networks with the relationship of input and output variable being the input-output characteristics), parameters of each of polynomial neural networks corresponding to each of the divided neural networks (Sec. 4.1 Para. 1: “A local learning algorithm is used to estimate the values of the coefficients of polynomials, which are consequent parts of the fuzzy model. The parameters and the structure of the local model for the decoupled fuzzy rule can be estimated without considering its relationship with other fuzzy rules. Put it differently: a local learning algorithm can estimate the parameters of the local model given only the corresponding local input space.” Here, the parameters of the PNNs are their coefficients and structure), and outputting another neural network generated by linking the identified polynomial neural networks (Sec. 4 Para. 1: “To overcome this shortcoming which is inherent to the increasing number of the layers of PNNs being added in a successive manner, we connect some PNNs in parallel.”).

Shankar teaches a non-transitory computer-readable medium storing a program that causes a computer to execute a process comprising (Sec. 4 Para. 1: “We evaluate our approach on SUN Attributes Dataset (SAD) [16] and Cambridge-MIT Natural Scenes Attributes Dataset (CAMIT-NSAD) [20]. Both the datasets have classes of natural scenes attributes, who’s listing can be found in Fig 3.” Sec. 4 Para. 7: “We choose GoogleNet [22] and VGG-11 [21] as the base CNN architectures, which we intend to alter using our approach.” Shankar uses known, publicly available digital datasets which are stored in some form of computer-readable medium and apply them to well-known CNN architectures indicating the presence of a processor, which is running the CNNs, which is coupled to the memory storing the datasets): acquiring a neural network (Sec. 3 Para. 2: “For a given dataset and a base CNN architecture, we first train the CNN on the dataset using a given loss function (such as softmax loss, sigmoid cross entropy loss, etc. [9]”) and dividing the neural network into divided neural networks (Abstract: “We use two operations for architecture refinement, viz. stretching and symmetrical splitting. Stretching increases the number of hidden units (nodes) in a given CNN layer, while a symmetrical split of say K between two layers separates the input and output channels into K equal groups, and connects only the corresponding input-output channel groups.” Figure 1: The figure illustrates the splitting operation where a neural network is divided into two neural networks).
It would have been obvious to one of ordinary skill in the art to modify Roh to include using an information processing device to acquire and divide a neural network in order to reduce redundancy in the networks and increase their accuracy (Shankar Sec. 3 Para. 4: “As we will discuss in the next subsection and Section 4, both the stretch and split operations applied to the same layer helps us to optimally reduce the model size and increase accuracy.”).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Roh, in further view of Shankar and Hampel et al. (U.S. Patent No. 3,922,536), hereinafter Hampel.  
Regarding claim 2, Roh, in view of Shankar, discloses the information processing device according to claim 1. Roh further discloses wherein a process of identifying the parameters includes arithmetic operations of a multivariable polynomial (Sec. 4.2 Step 3: “We use the training dataset to estimate the coefficients of the polynomials and evaluate each PN using the training dataset. The approximation ability can be quantified in the form $AC_k = \frac{1}{N}(Y - Xa_k)^{T} D_k^{q} (Y - Xa_k)$”).
 Roh, in view of Shankar, does not disclose arithmetic operations of a multivariable polynomial performed, in parallel arithmetic, by addition circuits and multiplication circuits.
Hampel teaches arithmetic operations of a multivariable polynomial performed, in parallel arithmetic (Col. 1 Lines 30-34: “This disclosure describes an invention which is more efficiently adapted to the rapid solution of high-order multivariable polynomials and which can be implemented to produce machines with high level artificial intelligence.” Col. 1 Lines 54-55: “FIG. 6 is a block diagram of a floating point bit-parallel arithmetic processor.”), by addition circuits and multiplication circuits (Col. 8 Line 67- Col. 9 Line 7: “A circuit for evaluating arbitrarily complex multinomial expressions comprising in combination: a plurality of multiplier-added cells, each cell having three input ports for receiving electrical signals representing, respectively, a multiplicand, a multiplier, and an addend, and an output port for producing electrical output signals representing the product of the multiplier and multiplicand which product is added to the addend…” Here, Hampel describes a circuit composed of cells capable of multiplication and addition.).
Hampel is analogous to the instant application because both implement multiplication and addition circuits for the purpose of solving polynomial expressions in computing environments. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claim to use the parallel addition and multiplication circuits of Hampel to perform the multivariable polynomial operations of Roh in order to solve high-order multivariable polynomials more rapidly (Hampel Col. 1 Lines 30-34: “This disclosure describes an invention which is more efficiently adapted to the rapid solution of high-order multivariable polynomials and which can be implemented to produce machines with high level artificial intelligence.”).
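For illustration only, the approximation-ability measure quoted from Roh Sec. 4.2 Step 3, AC_k = (1/N)(Y − X a_k)^T D_k^q (Y − X a_k), can be sketched as a sequence of multiplications and additions. This is a minimal sketch and not Roh's implementation. It assumes D_k is a diagonal matrix of activation levels (consistent with the “diagonal partition matrix” language Roh uses in Sec. 4.1), so that D_k^q has diagonal entries d_i^q; the function and argument names are hypothetical.

```python
def approximation_ability(y, x_rows, a_k, d_k, q):
    """Weighted squared error (1/N) * sum_i d_i**q * (y_i - x_i . a_k)**2.

    y      : list of N target outputs
    x_rows : list of N rows of regressor (polynomial term) values
    a_k    : list of polynomial coefficients for the k-th local model
    d_k    : list of N diagonal entries of D_k (assumed diagonal)
    q      : exponent applied to the diagonal entries
    """
    n = len(y)
    total = 0.0
    for y_i, row, d_i in zip(y, x_rows, d_k):
        # Polynomial output: multiply coefficients by terms, then add.
        predicted = sum(coef * term for coef, term in zip(a_k, row))
        residual = y_i - predicted
        total += (d_i ** q) * residual ** 2
    return total / n


if __name__ == "__main__":
    y = [1.0, 2.0]
    x_rows = [[1.0, 0.0], [1.0, 1.0]]
    print(approximation_ability(y, x_rows, a_k=[1.0, 0.5], d_k=[1.0, 0.5], q=2))
```

Every step in the loop is a multiply or an add, which is the kind of arithmetic the claim 2 analysis maps onto multiplication and addition circuits.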

 Claims 4, 5, 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Roh, in further view of Shankar and Karnavas et al. (Excitation Control of a Synchronous Machine Using Polynomial Neural Networks), hereinafter Karnavas.  

Regarding claim 4, Roh, in view of Shankar, discloses the information processing device according to claim 1. Roh further discloses an information processing device wherein the parameters include a degree and a number of intermediate elements of each of the polynomial neural networks (Sec. 3 Para. 5: “The essence of the design is such that simple functions are combined at all nodes of each layer of the PNN, which leads to more complex dependencies. The outputs obtained from each of the nodes of the same layer are combined to form a higher order polynomial. The degree of the polynomial itself increases in proportion to the number of the selected inputs and the number of the layers of the network.”), and the degree and the number of intermediate elements of each of the polynomial neural networks are identified without exceeding an upper limit number of the number of intermediate elements (Sec. 4.2 Step 6.1: “Determine input variables of each polynomial neuron and the degree of polynomial: We determine the number of input variables which can be used as inputs to each node. The number of polynomial neurons which can be generated in the current layer is calculated using $S = \frac{m!}{(m-K)!\,K!} \cdot T$. T is the number of types of polynomial (refer to Table 1, in this paper, T = 4) as shown on Table 1. S is the number of PNs, m is the number of the output variables of the previous layer which are used as input variables in the current layer (if the current layer is the first layer, m is the number of system input variables. K is the maximum number of input variables which can be used as inputs to each node).” Sec. 4.2 Step 6.4: “Here, the termination condition deals with the number of layers being reached. In other words, the growth of the network stops once we have reached the predetermined maximum L (in this paper, this value is set to 3, L = 3).” In these excerpts, the types of polynomials refer to polynomials of different degrees. The number of intermediate elements refers to the number of layers with the upper limit being the predetermined maximum L.).
Roh, in view of Shankar, does not disclose intermediate elements including addition circuits and multiplication circuits.
Karnavas teaches intermediate elements including addition circuits and multiplication circuits (Sec. 4.2 Para. 1: “Figure 8(a) shows the proposed PSN controller for the present work. The input x is an N dimensional vector and xk is the kth component of x. The inputs are weighted and fed to a layer of K linear summing units, where K is the desired order of the network. Sec. 4.2.1 Para. 1: There are a total of (N +1)K adjustable weights and thresholds for each output unit, since there are N + 1 weights associated with each summing unit. The learning rule is a randomized version of the gradient descent procedure. Since the output yi is a function of the product of all the hji's, we do not have to adjust all the variable weights at each learning cycle.” Fig. 9: Here the figure shows the intermediate elements of the network including the summing units (addition circuits) and product units (multiplication circuits) Sec. 6 Para. 1: “It is emphasized that the hardware implementation for such kind of controllers is easier than FLC ones and the computational time needed for real-time applications is drastically reduced.”).
Karnavas and the instant application are analogous because they speak to applications of polynomial neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claim to modify Roh to include a hardware implementation of the polynomial neural networks because it would significantly reduce the computational time needed for real-time applications (Karnavas Sec. 6 Para. 1: “It is emphasized that the hardware implementation for such kind of controllers is easier than FLC ones and the computational time needed for real-time applications is drastically reduced.”).
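For illustration only, the neuron count quoted from Roh Sec. 4.2 Step 6.1, S = m! / ((m − K)! K!) · T, is the number of ways to choose K of the m available inputs multiplied by the number of polynomial types T. A minimal sketch follows; the function name `candidate_neuron_count` is hypothetical, and the example values other than T = 4 (stated in the quoted passage) are assumptions chosen for the arithmetic.

```python
from math import comb


def candidate_neuron_count(m, k, t):
    """Return S = C(m, k) * t, the number of candidate polynomial neurons.

    m : number of input variables available to the current layer
    k : number of input variables fed to each node
    t : number of polynomial types
    """
    if not 0 < k <= m:
        raise ValueError("k must satisfy 0 < k <= m")
    if t <= 0:
        raise ValueError("t must be positive")
    return comb(m, k) * t


if __name__ == "__main__":
    # With 4 inputs taken 2 at a time and T = 4 polynomial types: 6 * 4 = 24.
    print(candidate_neuron_count(4, 2, 4))
```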

Regarding claim 5, Roh, in view of Shankar and Karnavas, discloses the information processing device according to claim 4.  Roh further discloses wherein based on the input-output characteristics of the divided neural networks, loss functions of the respective polynomial neural networks are calculated (Sec. 1 Para. 3: “When we consider global approximation ability (approximation ability or modeling ability, for brief) of the fuzzy model concerning the overall input-output space, the fuzzy model produced with the use of the local learning algorithm is inferior to the model developed with the aid of the global learning algorithm. This result is self-evident. The minimized objective function (performance index) of the global learning algorithm focuses on the minimization of an overall error encountered between the outputs of the resulting model and the corresponding experimental data.” The minimized objective function from the excerpt is the loss function), it is determined whether or not each of the calculated loss functions are greater than a threshold value, and in a case where one of the calculated loss functions is greater than the threshold value, a corresponding one of the degrees and a corresponding one of the numbers of intermediate elements are changed (Sec. 3 Para. 4: “Here one estimates the values of the parameters of the PN by invoking the weighted least square method and using available training data. In this way, we choose the optimal model forming the 1st layer… Afterwards, we take another pair of new input variables, and repeat the construction of the successive PNs until some given stopping criterion has been satisfied… Furthermore, all the nodes of the previous layers that do not exhibit any influence on the estimated output node are also removed by tracing the data flow at each iteration of the design process.” Sec. 3 Para. 5: “The essence of the design is such that simple functions are combined at all nodes of each layer of the PNN, which leads to more complex dependencies. The outputs obtained from each of the nodes of the same layer are combined to form a higher order polynomial.” Using the weighted least square method [loss function] the algorithm decides which polynomial neurons are part of the optimal solution and discards those that are not. The polynomial neurons that are retained are combined to form a higher order polynomial meaning that the degree and coefficients are changed based on the results of the least square method. The stopping criterion functions as a threshold value.).

Regarding claim 7, Roh, in view of Shankar and Karnavas, discloses the information processing device according to claim 5.  Roh further discloses wherein, in a case where one of the loss functions after changing a corresponding one of the degrees and a corresponding one of the numbers of intermediate elements is greater than the threshold value, a corresponding one of the divided polynomial neural networks is further divided (Sec. 3 Para. 4: Afterwards, we take another pair of new input variables, and repeat the construction of the successive PNs until some given stopping criterion has been satisfied. Once the final layer has been constructed, the node characterized by the best performance is selected as the output node. All remaining nodes in that layer are then discarded. Furthermore, all the nodes of the previous layers that do not exhibit any influence on the estimated output node are also removed by tracing the data flow at each iteration of the design process. Here the further division occurs in the form of the nodes of the previous layers that are not connected to the output node being separated from the polynomial neural network and removed. The changing of the corresponding degrees and numbers comes from the nodes of the final layer being discarded. The stopping criterion functions as a threshold value, as above.). 

Regarding claim 8, incorporating the rejection of claim 5, Roh discloses wherein the linking is a process for linking the polynomial neural networks (Sec. 4.2 Para. 1: The pseudocode used to build the fuzzy ensemble of parallel polynomial neural networks combined by activation levels is covered in Fig. 5. This is the process for linking the PNNs) identified based on the parameters for which the loss functions are less than or equal to the threshold value (Sec. 3 Para. 4: In this way, we choose the optimal model forming the 1st layer. In the sequel, we construct new PNs using intermediate variables (for example, zm) being generated at the current iteration. Afterwards, we take another pair of new input variables, and repeat the construction of the successive PNs until some given stopping criterion has been satisfied. Once the final layer has been constructed, the node characterized by the best performance is selected as the output node. Sec. 4.1 Para. 1: Given the nature of the fuzzy clustering, the condition of disjointness of the sub-spaces is not met and this calls for use of the weighted LSE. A local learning algorithm is used to estimate the values of the coefficients of polynomials, which are consequent parts of the fuzzy model. Sec. 4.1 Para. 5: For the local learning algorithm, the objective function is defined as a linear combination of error, which is expresses a difference between output data and the output of the local model of each fuzzy rule, based on a modified diagonal partition matrix (Dj)q by a linguistic modifier q. Here, the networks that are being combined to form the model are identified based on the weighted LSE [loss function]. The networks were formed by meeting the stopping criterion which functions as the threshold as above.).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Roh, in further view of Shankar, Karnavas, and Dognin et al. (U.S. Patent No. 9,626,621), hereinafter Dognin.
Regarding claim 6, Roh, in view of Shankar and Karnavas, discloses the information processing device according to claim 5. Roh fails to disclose wherein the parameters include weighting parameters, and the weighting parameters are calculated based on gradients calculated from the loss functions.
Dognin teaches wherein the parameters include weighting parameters, and the weighting parameters are calculated based on gradients calculated from the loss functions (Col. 7 Lines 7-10: For example, in accordance with an embodiment of the present invention, a gradient of the loss is computed on a sample portion of the training data and a solution is found. Col. 7 Lines 14-17: Weighting is performed to calculate the contribution of the gradient from the first and second iterations, when finding the solution in the second iteration. Col. 13 Lines 5-10: Weighting a gradient may comprise dynamically estimating a weight assigned to the gradient to provide a loss function gradient before an iteration takes place. Weights assigned to gradients may be based on a tunable parameter that controls exponentiation of the weights across the plurality of subsets.).
Dognin is analogous to the instant application because both pertain to the functionality of neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claim to modify Roh to include a weighting parameter calculated based on gradients calculated from a loss function in order to converge on the best solution by incorporating information calculated from the gradients (Col. 6 Lines 57-58: Effectively, SAG re-uses gradient information computed at previous iterations to help the convergence of its solution. Col. 6 Lines 64-67: Embodiments of the present invention extend and combine the concept of SAG with HF sequence training, which is referred to as dynamic stochastic average gradient with Hessian-free (DSAG-HF) optimization. Col. 7 Lines 14-17: Weighting is performed to calculate the contribution of the gradient from the first and second iterations, when finding the solution in the second iteration.).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Kim, J., Park, Y., Kim, G., & Hwang, S. J. (2017, July). SplitNet: Learning to semantically split deep networks for parameter reduction and model parallelization. In International Conference on Machine Learning
Oh, Sung-Kwun, Witold Pedrycz, and Byoung-Jun Park. "Polynomial neural networks architecture: analysis and design." Computers & Electrical Engineering 29.6 (2003): 703-725. (Teaches the formation of polynomial neural networks)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEREMY SCOTT COOPER whose telephone number is (313)446-6643.  The examiner can normally be reached on M-F 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


                                                                                                                                                                                         
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125