Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The filing date of the present application is 2/13/2018.
This action is in response to amendments and/or remarks filed on 5/11/2021. In the current amendments, claims 1-2, 4-5, 8-9, 11-12 and 15-18 have been amended. Claims 1-20 are pending and have been examined. 
Notwithstanding Applicant’s amendments and/or remarks, the amended specification has not been modified with regard to reference numbers 901, 903, 904 and 905. The drawings are therefore further objected to for the following reason: the drawings do not include the following reference signs mentioned in the description: 901, 903, 904 and 905 of Paragraph 0064.
In view of Applicant’s amendments and/or remarks, the objections to the specification made in the previous Office Action have been withdrawn.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-8, 15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shaji et al. (US 2018/0039879 A1) in view of Tai et al. (“Convolutional Neural Networks with Low-Rank Regularization”), and further in view of Doumbouya et al. (US 2018/0157916 A1).

Regarding claim 1
Shaji teaches 
- deploying a neural network (NN) model on an electronic device (Shaji: [Abstract] “The method includes providing a base neural network for generating learned features.”), 
- the NN model being generated by training a first NN architecture on a first dataset wherein a first function defines a first layer of the first NN architecture, (Shaji: [0065] “The base neural network is trained on a first set of training images and the base neural network includes two or more layers comprising one or more initial layers and one or more final layers.”; the base neural network in 0065 reads on the claimed “a first NN architecture”, one or more initial layers in 0065 reads on “a first layer”), 
- the first function being constructed based on a second function applied by a second layer of a second NN architecture (Shaji: [0067] “the initial layers of the second personalized neural network corresponds to the initial layers of the personalized neural network.”; one of the initial layers reads on the claimed “a second layer”);
- enabling retraining of the NN model on the electronic device using a second data set (Shaji: [0073] “Updating the base neural network comprises re-training the final layers of the base neural network with the second set of images and keeping the initial layers of the base neural network”).
	Shaji does not distinctly disclose:
- approximating a second function applied by a second layer of a second NN architecture
	However, Tai teaches: 
- approximating a second function applied by a second layer of a second NN architecture (Tai: [Section 3], “The goal is to find an approximation W˜ of W that facilitates more efficient computation while maintaining the classification accuracy of the CNN”, “Based on the approximation criterion introduced in the previous section, the objective function to be minimized is:”, [Section 3] discloses how the function is approximated.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the personalized aesthetic scoring neural network system of Shaji with the function approximation of Tai in order to obtain an exact solution efficiently (Tai: [Section 1] line 34-35).
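For illustration only, the general idea of replacing a weight matrix W with a cheaper approximation W~ can be sketched as below. This is a minimal sketch under stated assumptions: it uses a rank-1 approximation computed by power iteration as a simple stand-in, not the decomposition scheme of Tai, and the names `rank1_approx` and `reconstruct` and the example matrix are hypothetical.

```python
# Illustrative sketch only: rank-1 approximation of a weight matrix by power
# iteration. A stand-in for "find an approximation W~ of W"; it is NOT the
# specific low-rank scheme described in Tai.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

def rank1_approx(W, iters=100):
    """Return (sigma, u, v) such that sigma * u * v^T approximates W."""
    Wt = transpose(W)
    v = [1.0] * len(W[0])  # hypothetical starting vector
    for _ in range(iters):
        u = matvec(W, v)
        nu = norm(u)
        u = [x / nu for x in u]
        v = matvec(Wt, u)
        sigma = norm(v)
        v = [x / sigma for x in v]
    return sigma, u, v

def reconstruct(sigma, u, v):
    return [[sigma * ui * vj for vj in v] for ui in u]

# Hypothetical 3x2 weight matrix that happens to be exactly rank 1.
W = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
sigma, u, v = rank1_approx(W)
W_tilde = reconstruct(sigma, u, v)
max_err = max(abs(a - b) for ra, rb in zip(W, W_tilde) for a, b in zip(ra, rb))
```

In this toy case the factors (one scale, a 3-vector and a 2-vector) store the same information as the six entries of W; for matrices that are only approximately low rank, the reconstruction would carry some error.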
	Shaji as modified by Tai does not distinctly disclose:
- wherein the first layer of the first NN architecture replaces the second layer of the second NN architecture
	However, Doumbouya teaches
- wherein the first layer of the first NN architecture replaces the second layer of the second NN architecture (Doumbouya: [0187] “For example, in an example embodiment in which the modified second CNN is created by replacing its first N layers with M layers of the initial first CNN, on a subsequent iteration, layer M to layer M−a of the modified second CNN may be replaced with layer N to layer N−b of the initial second CNN, where each of a and b is an integer of at least zero and, in at least one example embodiment, each of a and b equals 0.”; Under the broadest reasonable interpretation, when a=b=0, M=2 and N=1, “layer M to layer M−a of the modified second CNN may be replaced with layer N to layer N−b of the initial second CNN” becomes “layer 2 of the modified second CNN may be replaced with layer 1 of the initial second CNN”; “modified second CNN” reads on “the second NN architecture” and “initial second CNN” reads on “the first NN architecture”)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the personalized aesthetic scoring neural network system of Shaji and Tai with the layer replacement of Doumbouya to use the reduced NN architecture thereby achieving significant savings in data storage (Doumbouya: [0151]).
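For illustration only, the retraining cited from Shaji [0073] (re-training the final layers with a second set of data while keeping the initial layers) can be sketched as below. This is a minimal sketch under stated assumptions: a two-parameter toy model stands in for a neural network, and the names `forward`, `retrain_final_layer`, `mse` and all numeric values are hypothetical rather than taken from any cited reference.

```python
# Illustrative sketch only: retrain the "final" parameter on a second data set
# while the "initial" parameter stays frozen. A toy stand-in, not Shaji's system.

def forward(params, x):
    """Initial layer (frozen) followed by final layer (retrainable)."""
    hidden = max(0.0, params["initial"] * x)  # ReLU-style initial layer
    return params["final"] * hidden, hidden

def mse(params, data):
    return sum((forward(params, x)[0] - y) ** 2 for x, y in data) / len(data)

def retrain_final_layer(params, data, lr=0.1, epochs=50):
    """Gradient descent on the final layer only; the initial layer is untouched."""
    new_params = dict(params)
    for _ in range(epochs):
        for x, y in data:
            pred, hidden = forward(new_params, x)
            grad_final = 2.0 * (pred - y) * hidden  # d(squared error)/d(final)
            new_params["final"] -= lr * grad_final
    return new_params

# Hypothetical base model and hypothetical second data set (y = 1.5 * x).
base = {"initial": 0.5, "final": 1.0}
second_dataset = [(1.0, 1.5), (2.0, 3.0)]
retrained = retrain_final_layer(base, second_dataset)
```

Here only `retrained["final"]` moves toward the value that fits the second data set, while `retrained["initial"]` keeps the value it had in the base model.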

Regarding claim 6
Shaji as modified by Tai and Doumbouya teaches all of the limitations of claim 1 as cited above and Shaji further teaches
- wherein retraining of the first NN architecture is not tied to a particular dataset (Shaji: [0073] “Updating the base neural network comprises re-training the final layers of the base neural network with the second set of images”; [0076] “Updating the personalized neural network comprises re-training the final layers of the personalized neural network with the third set of images and keeping the initial layers of the personalized neural network”; [0073] shows re-training is not done with the specific dataset.)

Regarding claim 7
Shaji as modified by Tai and Doumbouya teaches all of the limitations of claim 1 as cited above and Shaji further teaches
- wherein the electronic device comprises a mobile electronic device (Shaji: [0036] “In some embodiments, the training of the personalized layer might be done in mobile device, smartphone, or other low-powered portable device.”)

Regarding claim 8
Shaji teaches 
- a memory storing instructions (Shaji: [0065]; see the structure of a device) 
- at least one processor executing the instructions including a process configured to: (Shaji: [0065]; see the structure of a device) 
- deploying a neural network (NN) model on an electronic device (Shaji: [Abstract] “The method includes providing a base neural network for generating learned features.”), 
- the NN model being generated by training a first NN architecture on a first dataset wherein a first function defines a first layer of the first NN architecture, (Shaji: [0065] “The base neural network is trained on a first set of training images and the base neural network includes two or more layers comprising one or more initial layers and one or more final layers.”; the base neural network in 0065 reads on the claimed “a first NN architecture”, one or more initial layers in 0065 reads on “a first layer”), 
- the first function being constructed based on a second function applied by a second layer of a second NN architecture (Shaji: [0067] “the initial layers of the second personalized neural network corresponds to the initial layers of the personalized neural network.”; one of the initial layers reads on the claimed “a second layer”);
- enabling retraining of the NN model on the electronic device using a second data set (Shaji: [0073] “Updating the base neural network comprises re-training the final layers of the base neural network with the second set of images and keeping the initial layers of the base neural network”).
	Shaji does not distinctly disclose:
- approximating a second function applied by a second layer of a second NN architecture
	However, Tai teaches: 
- approximating a second function applied by a second layer of a second NN architecture (Tai: [Section 3], “The goal is to find an approximation W˜ of W that facilitates more efficient computation while maintaining the classification accuracy of the CNN”, “Based on the approximation criterion introduced in the previous section, the objective function to be minimized is:”, [Section 3] discloses how the function is approximated.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the personalized aesthetic scoring neural network system of Shaji with the function approximation of Tai in order to obtain an exact solution efficiently (Tai: [Section 1] line 34-35).
	Shaji as modified by Tai does not distinctly disclose:
- wherein the first layer of the first NN architecture replaces the second layer of the second NN architecture
	However, Doumbouya teaches
- wherein the first layer of the first NN architecture replaces the second layer of the second NN architecture (Doumbouya: [0187] “For example, in an example embodiment in which the modified second CNN is created by replacing its first N layers with M layers of the initial first CNN, on a subsequent iteration, layer M to layer M−a of the modified second CNN may be replaced with layer N to layer N−b of the initial second CNN, where each of a and b is an integer of at least zero and, in at least one example embodiment, each of a and b equals 0.”; Under the broadest reasonable interpretation, when a=b=0, M=2 and N=1, “layer M to layer M−a of the modified second CNN may be replaced with layer N to layer N−b of the initial second CNN” becomes “layer 2 of the modified second CNN may be replaced with layer 1 of the initial second CNN”; “modified second CNN” reads on “the second NN architecture” and “initial second CNN” reads on “the first NN architecture”)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the personalized aesthetic scoring neural network system of Shaji and Tai with the layer replacement of Doumbouya to use the reduced NN architecture thereby achieving significant savings in data storage (Doumbouya: [0151]).

Regarding claim 15
Shaji teaches 
- deploying a neural network (NN) model on an electronic device (Shaji: [Abstract] “The method includes providing a base neural network for generating learned features.”), 
- the NN model being generated by training a first NN architecture on a first dataset wherein a first function defines a first layer of the first NN architecture, (Shaji: [0065] “The base neural network is trained on a first set of training images and the base neural network includes two or more layers comprising one or more initial layers and one or more final layers.”; the base neural network in 0065 reads on the claimed “a first NN architecture”, one or more initial layers in 0065 reads on “a first layer”),
- the first function being constructed based on a second function applied by a second layer of a second NN architecture (Shaji: [0067] “the initial layers of the second personalized neural network corresponds to the initial layers of the personalized neural network.”; one of the initial layers reads on the claimed “a second layer”);
- enabling retraining of the NN model on the electronic device using a second data set (Shaji: [0073] “Updating the base neural network comprises re-training the final layers of the base neural network with the second set of images and keeping the initial layers of the base neural network”).
	Shaji does not distinctly disclose:
- approximating a second function applied by a second layer of a second NN architecture
	However, Tai teaches: 
- approximating a second function applied by a second layer of a second NN architecture (Tai: [Section 3], “The goal is to find an approximation W˜ of W that facilitates more efficient computation while maintaining the classification accuracy of the CNN”, “Based on the approximation criterion introduced in the previous section, the objective function to be minimized is:”, [Section 3] discloses how the function is approximated.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the personalized aesthetic scoring neural network system of Shaji with the function approximation of Tai in order to obtain an exact solution efficiently (Tai: [Section 1] line 34-35).
	Shaji as modified by Tai does not distinctly disclose:
- wherein the first layer of the first NN architecture replaces the second layer of the second NN architecture
	However, Doumbouya teaches
- wherein the first layer of the first NN architecture replaces the second layer of the second NN architecture (Doumbouya: [0187] “For example, in an example embodiment in which the modified second CNN is created by replacing its first N layers with M layers of the initial first CNN, on a subsequent iteration, layer M to layer M−a of the modified second CNN may be replaced with layer N to layer N−b of the initial second CNN, where each of a and b is an integer of at least zero and, in at least one example embodiment, each of a and b equals 0.”; Under the broadest reasonable interpretation, when a=b=0, M=2 and N=1, “layer M to layer M−a of the modified second CNN may be replaced with layer N to layer N−b of the initial second CNN” becomes “layer 2 of the modified second CNN may be replaced with layer 1 of the initial second CNN”; “modified second CNN” reads on “the second NN architecture” and “initial second CNN” reads on “the first NN architecture”)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the personalized aesthetic scoring neural network system of Shaji and Tai with the layer replacement of Doumbouya to use the reduced NN architecture thereby achieving significant savings in data storage (Doumbouya: [0151]).

Regarding claim 20
Shaji as modified by Tai and Doumbouya teaches all of the limitations of claim 15 as cited above and Shaji further teaches
- wherein the electronic device comprises a mobile electronic device (Shaji: [0036] “In some embodiments, the training of the personalized layer might be done in mobile device, smartphone, or other low-powered portable device.”)


Claims 2-3, 9-10, 13-14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Shaji in view of Tai in view of Doumbouya as shown above, further in view of Guo et al. (US 2020/0167654 A1).

Regarding claim 2
Shaji as modified by Tai and Doumbouya teaches all of the limitations of claim 1 as cited above, but does not distinctly disclose:
-wherein approximating the second function is based on a sketching operation performed on parameters of the second function and a number of parameters for the second function are less than a number of parameters for the first function
However, Guo teaches 
- wherein approximating the second function is based on a sketching operation performed on parameters of the second function (Guo: [0246] “Considering that the fully-connected layers of AlexNet contain more than 95% of its parameters, sketching them to an extreme can be attempted, namely 1 bit.”, Fig. 5 also discloses sketching) and a number of parameters for the second function are less than a number of parameters for the first function (Guo: [0250] “TABLE 6: Network sketching technique generates binary-weight ResNets with the ability to make faithful inference and roughly 7.4× fewer parameters than its reference (in bits).”; “reference” reads on “the first function”)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the personalized aesthetic scoring system as taught by Shaji, Tai and Doumbouya to include function sketching as taught by Guo in order to implement a more flexible network and obtain a better trade-off between model efficiency and accuracy (Guo: [0251] line 3-8).

Regarding claim 3
Shaji as modified by Tai, Doumbouya and Guo teaches all of the limitations of claim 2 as cited above and Guo further teaches 
- wherein the sketching operation is performed along different dimensions of a tensor space for generating multiple different first functions that are combined to form the first layer (Guo: [0213] “As described above, a first goal is to find a binary expansion of W that approximates it well (as illustrated in FIG. 16), which means
$W \approx \langle B, a \rangle = \sum_{j=0}^{m-1} a_j B_j$
in which B ∈ {+1, −1}^(c×w×h×m) and a ∈ R^m are the concatenations of m binary tensors {B_0, . . . , B_(m−1)} and the same number of scale factors {a_0, . . . , a_(m−1)}, respectively”, tensors {B_0, . . . , B_(m−1)} reads on “different dimensions of a tensor space”, Fig. 15 also shows how multiple subsets of first layers are formed.)
Same motivation as claim 2
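For illustration only, a binary expansion of the form W ≈ Σ a_j B_j, as quoted from Guo [0213], can be sketched as below. This is a minimal sketch under stated assumptions: it uses a greedy residual fit (sign of the residual as B_j, mean absolute residual as a_j) on a flat list of weights rather than a c×w×h tensor, and the function names and example weights are hypothetical rather than Guo's actual procedure.

```python
# Illustrative sketch only: greedy binary expansion W ~ sum_j a_j * B_j,
# with each B_j in {+1, -1}. Flat lists stand in for weight tensors.

def binary_expansion(weights, m):
    """Return (scales, signs): m scale factors a_j and m binary vectors B_j."""
    residual = list(weights)
    scales, signs = [], []
    for _ in range(m):
        b = [1 if r >= 0 else -1 for r in residual]
        # For b = sign(residual), the least-squares scale is the mean |residual|.
        a = sum(abs(r) for r in residual) / len(residual)
        scales.append(a)
        signs.append(b)
        residual = [r - a * bi for r, bi in zip(residual, b)]
    return scales, signs

def reconstruct(scales, signs):
    n = len(signs[0])
    return [sum(a * b[i] for a, b in zip(scales, signs)) for i in range(n)]

def squared_error(w, w_hat):
    return sum((x - y) ** 2 for x, y in zip(w, w_hat))

# Hypothetical weights of one layer.
w = [0.9, -0.4, 0.1, -0.7]
err_m1 = squared_error(w, reconstruct(*binary_expansion(w, 1)))
err_m3 = squared_error(w, reconstruct(*binary_expansion(w, 3)))
```

With this greedy fit, each additional term cannot increase the squared error, so `err_m3` is no larger than `err_m1`; each B_j costs one bit per weight plus one shared scale factor.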

Regarding claim 9
Shaji as modified by Tai and Doumbouya teaches all of the limitations of claim 8 as cited above, but does not distinctly disclose:
-wherein approximating the second function is based on a sketching operation performed on parameters of the second function and a number of parameters for the second function are less than a number of parameters for the first function
However, Guo teaches 
- wherein approximating the second function is based on a sketching operation performed on parameters of the second function (Guo: [0246] “Considering that the fully-connected layers of AlexNet contain more than 95% of its parameters, sketching them to an extreme can be attempted, namely 1 bit.”, Fig. 5 also discloses sketching) and a number of parameters for the second function are less than a number of parameters for the first function (Guo: [0250] “TABLE 6: Network sketching technique generates binary-weight ResNets with the ability to make faithful inference and roughly 7.4× fewer parameters than its reference (in bits).”; “reference” reads on “the first function”)
Same motivation as claim 2.

Regarding claim 10
Shaji as modified by Tai, Guo and Doumbouya teaches all of the limitations of claim 9 as cited above and Guo further teaches: 
- wherein the sketching operation is performed along different dimensions of a tensor space for generating multiple different first functions that are combined to form the first layer (Guo: [0213] “As described above, a first goal is to find a binary expansion of W that approximates it well (as illustrated in FIG. 16), which means
$W \approx \langle B, a \rangle = \sum_{j=0}^{m-1} a_j B_j$
in which B ∈ {+1, −1}^(c×w×h×m) and a ∈ R^m are the concatenations of m binary tensors {B_0, . . . , B_(m−1)} and the same number of scale factors {a_0, . . . , a_(m−1)}, respectively”, tensors {B_0, . . . , B_(m−1)} reads on “different dimensions of a tensor space”, Fig. 15 also shows how multiple subsets of first layers are formed.)
Same motivation as claim 2.

Regarding claim 13
Shaji as modified by Tai, Guo and Doumbouya teaches all of the limitations of claim 9 as cited above and Shaji further teaches: 
- wherein the second dataset is a personal data set on the electronic device (Shaji: [0027], “Further, in some embodiments, the personalization layer (or layers) (i.e., those layers that are updated based on the personalized data set) may be comprised of a linear and/or non-linear multi dimensionality reduction functions.”)

Regarding claim 14
Shaji as modified by Tai, Guo and Doumbouya teaches all of the limitations of claim 9 as cited above and Shaji further teaches:
- wherein the electronic device comprises a mobile electronic device (Shaji: [0036] “In some embodiments, the training of the personalized layer might be done in mobile device, smartphone, or other low-powered portable device.”)

Regarding claim 16
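Shaji as modified by Tai and Doumbouya teaches all of the limitations of claim 15 as cited above, but does not distinctly disclose:
- wherein approximating the second function is based on a sketching operation performed on parameters of the second function and a number of parameters for the second function are less than a number of parameters for the first function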
However, Guo teaches 
- wherein approximating the second function is based on a sketching operation performed on parameters of the second function (Guo: [0246] “Considering that the fully-connected layers of AlexNet contain more than 95% of its parameters, sketching them to an extreme can be attempted, namely 1 bit.”, Fig. 5 also discloses sketching) and a number of parameters for the second function are less than a number of parameters for the first function (Guo: [0250] “TABLE 6: Network sketching technique generates binary-weight ResNets with the ability to make faithful inference and roughly 7.4× fewer parameters than its reference (in bits).”; “reference” reads on “the first function”)
Same motivation as claim 2.

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Shaji in view of Tai in view of Doumbouya as shown above, further in view of Kisilev et al. (US 2018/0060719 A1).

Regarding claim 4
Shaji as modified by Tai and Doumbouya teaches all of the limitations of claim 1 as cited above and Shaji further teaches
- wherein the second dataset is a personal data set on the electronic device (Shaji: [0027], “Further, in some embodiments, the personalization layer (or layers) (i.e., those layers that are updated based on the personalized data set) may be comprised of a linear and/or non-linear multi dimensionality reduction functions.”)
-the first NN architecture is trained on the first dataset … ([0065] “The base neural network is trained on a first set of training images …”; “The base neural network” reads on “the first NN architecture”)
	Shaji as modified by Tai and Doumbouya does not distinctly disclose
- after the second NN architecture is reduced to the first NN architecture
	However, Kisilev teaches 
- after the second NN architecture is reduced to the first NN architecture ([0027] “The reduced network may be trained on the data in that stage.”; “The reduced network” reads on the “first NN architecture” reduced from the second NN architecture.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the personalized aesthetic scoring system as taught by Shaji, Tai and Doumbouya to include network reduction as taught by Kisilev in order to train the neural network architecture after reduction, thereby improving the speed of training (Kisilev: [0027]).

Regarding claim 5
Shaji as modified by Tai, Doumbouya and Kisilev teaches all of the limitations of claim 4 as cited above and Kisilev further teaches
- wherein the first NN architecture is computationally reduced from the second NN architecture (Kisilev: [0027] “A dropout layer of processing may be performed to prevent overfitting. In dropout processing, individual nodes may be either “dropped out” of the neural network with probability 1-p or kept with probability p, so that a reduced network is left. Likewise, incoming and outgoing edges to a dropped-out node may also be removed.”; [0027] discloses how the neural network is computationally reduced.)
	Same motivation as claim 4.
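For illustration only, the dropout processing quoted from Kisilev [0027] (each node kept with probability p, with the edges of dropped nodes removed) can be sketched as below. This is a minimal sketch under stated assumptions: a single weight matrix stands in for a network layer, and the names `dropout_mask` and `reduce_weights`, the seed and the weights are hypothetical rather than taken from Kisilev.

```python
# Illustrative sketch only: keep each node with probability p and remove the
# incoming and outgoing edges of dropped nodes, leaving a reduced layer.
import random

def dropout_mask(num_nodes, keep_prob, rng):
    """True means the node is kept; each node is kept with probability keep_prob."""
    return [rng.random() < keep_prob for _ in range(num_nodes)]

def reduce_weights(weights, keep_inputs, keep_outputs):
    """weights[i][j] is the edge from input node j to output node i.

    Rows of dropped output nodes and columns of dropped input nodes are removed.
    """
    return [
        [w for w, keep in zip(row, keep_inputs) if keep]
        for row, keep_row in zip(weights, keep_outputs)
        if keep_row
    ]

rng = random.Random(0)  # hypothetical seed, for repeatability only
weights = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # hypothetical 2x3 layer
keep_inputs = dropout_mask(3, 0.5, rng)
keep_outputs = dropout_mask(2, 0.5, rng)
reduced = reduce_weights(weights, keep_inputs, keep_outputs)
```

The reduced matrix has one row per kept output node and one column per kept input node, so training on it touches fewer parameters than training on the full layer.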

Claims 11-12 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Shaji in view of Tai in view of Doumbouya in view of Guo as shown above, further in view of Kisilev.

Regarding claim 11
Shaji as modified by Tai, Doumbouya and Guo teaches all of the limitations of claim 9 as cited above and Shaji further teaches: 
- wherein the second dataset is a personal data set on the electronic device (Shaji: [0027], “Further, in some embodiments, the personalization layer (or layers) (i.e., those layers that are updated based on the personalized data set) may be comprised of a linear and/or non-linear multi dimensionality reduction functions.”)
-the first NN architecture is trained on the first dataset … ([0065] “The base neural network is trained on a first set of training images …”; “The base neural network” reads on “the first NN architecture”)
Shaji as modified by Tai, Doumbouya and Guo does not distinctly disclose
- after the second NN architecture is reduced to the first NN architecture
	However, Kisilev teaches 
- after the second NN architecture is reduced to the first NN architecture ([0027] “The reduced network may be trained on the data in that stage.”; “The reduced network” reads on the “first NN architecture” reduced from the second NN architecture.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the personalized aesthetic scoring system as taught by Shaji, Tai, Doumbouya and Guo to include network reduction as taught by Kisilev in order to train the neural network architecture after reduction, thereby improving the speed of training (Kisilev: [0027]).

Regarding claim 12
Shaji as modified by Tai, Doumbouya, Guo and Kisilev teaches all of the limitations of claim 11 as cited above and Kisilev further teaches
- wherein the first NN architecture is computationally reduced from the second NN architecture (Kisilev: [0027] “A dropout layer of processing may be performed to prevent overfitting. In dropout processing, individual nodes may be either “dropped out” of the neural network with probability 1-p or kept with probability p, so that a reduced network is left Likewise, incoming and outgoing edges to a dropped-out node may also be removed.”; [0027] discloses how the neural network is computationally reduced.)
	Same motivation as claim 11.

Regarding claim 17
Shaji as modified by Tai, Doumbouya and Guo teaches all of the limitations of claim 16 as cited above and Shaji further teaches: 
- wherein the second dataset is a personal data set on the electronic device (Shaji: [0027], “Further, in some embodiments, the personalization layer (or layers) (i.e., those layers that are updated based on the personalized data set) may be comprised of a linear and/or non-linear multi dimensionality reduction functions.”)
-the first NN architecture is trained on the first dataset … ([0065] “The base neural network is trained on a first set of training images …”; “The base neural network” reads on “the first NN architecture”)
	Shaji as modified by Tai, Doumbouya and Guo does not distinctly disclose
- after the second NN architecture is reduced to the first NN architecture
	However, Kisilev teaches 
- after the second NN architecture is reduced to the first NN architecture ([0027] “The reduced network may be trained on the data in that stage.”; “The reduced network” reads on the “first NN architecture” reduced from the second NN architecture.)
	Same motivation as claim 11.

Regarding claim 18
Shaji as modified by Tai, Doumbouya, Guo and Kisilev teaches all of the limitations of claim 17 as cited above and Kisilev further teaches
- wherein the first NN architecture is computationally reduced from the second NN architecture (Kisilev: [0027] “A dropout layer of processing may be performed to prevent overfitting. In dropout processing, individual nodes may be either “dropped out” of the neural network with probability 1-p or kept with probability p, so that a reduced network is left. Likewise, incoming and outgoing edges to a dropped-out node may also be removed.”; [0027] discloses how the neural network is computationally reduced.)
	Same motivation as claim 17.

Regarding claim 19
Shaji as modified by Tai, Doumbouya, Guo and Kisilev teaches all of the limitations of claim 18 as cited above and Shaji further teaches:
- wherein retraining of the first NN architecture is not tied to a particular dataset (Shaji: [0073] “Updating the base neural network comprises re-training the final layers of the base neural network with the second set of images”; [0076] “Updating the personalized neural network comprises re-training the final layers of the personalized neural network with the third set of images and keeping the initial layers of the personalized neural network”; [0073] shows re-training is not done with the specific dataset.)

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  


Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUNG WON LEE whose telephone number is 571-272-8508.  The examiner can normally be reached on Mon-Fri 0730-1730.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 





/SUNG W LEE/Examiner, Art Unit 2123       
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123