DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 04/22/2020 and 08/01/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Drawings
The drawings filed on 12/02/2019 are accepted by the Examiner.
Specification
The disclosure filed on 12/02/2019 is accepted by the Examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
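A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.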

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Romero ("FitNets: Hints for Thin Deep Nets", ICLR 2015, pages 1-13) in view of Warden ("How to Quantize Neural Networks with TensorFlow", Pete Warden, 2016, pages 1-6).
Examiner’s Note: The Examiner agrees with the rejections in the first Office action from the Chinese Patent Office provided in the IDS.
Regarding claims 1, 12 and 19, Romero discloses inputting a to-be-processed image into a neural network, wherein the neural network is trained based on guidance information, and during the training process, the neural network is taken as a student neural network; and the guidance information comprises a difference between discrete feature data formed by a teacher neural network for an image sample and discrete feature data formed by the student neural network for the image sample (see Abstract and Sections 1-2; Romero specifically discloses “image classification” … “The framework compresses an ensemble of deep networks (teacher) into a student network of similar depth” … “minimizing the following loss function:

$$\mathcal{L}_{HT}(\mathbf{W}_{Guided}, \mathbf{W}_r) = \frac{1}{2}\left\| u_h(\mathbf{x}; \mathbf{W}_{Hint}) - r\big(v_g(\mathbf{x}; \mathbf{W}_{Guided}); \mathbf{W}_r\big) \right\|^2,$$

where $u_h$ and $v_g$ are the teacher/student deep nested functions up to their respective hint/guided layers with parameters $\mathbf{W}_{Hint}$ and $\mathbf{W}_{Guided}$, $r$ is the regressor function on top of the guided layer with parameters $\mathbf{W}_r$. Note that the outputs of $u_h$ and $r$ have to be comparable, i.e., $u_h$ and $r$ must be the same non-linearity”; Romero concerns thin (“slender”) deep neural networks and involves image processing, i.e., it is equivalent to an image processing method).
… what is obvious to try is also obvious, such as where "there is a design need or market pressure to solve a problem, and there are a finite number of identified, predictable solutions, a person of ordinary skill has good reason to pursue the known options within his or her technical grasp. If this leads to the anticipated success, it is likely the product not of innovation but of ordinary skill and common sense." Regarding hindsight, the Court found that "[r]igid preventive rules that deny fact finders recourse to common sense . . . are neither necessary under our case law nor consistent with it." The Court stated that "familiar items may have obvious uses beyond their primary purposes," analogizing an obvious invention to the fitting together of pieces of a puzzle. The Court in this regard further stated that the person of ordinary skill is also a person of ordinary creativity, and not "an automaton." See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398 (2007).

Romero specifically discloses the following (see Sections 1 and 2): deep neural networks are commonly used for image classification and object detection; performance requirements often lead to networks with a large number of parameters that occupy substantial computing and storage resources, making them unsuitable for memory- or time-constrained applications. Romero proposes deep and thin (“slender”) student networks obtained through two training stages, in which an intermediate layer of the teacher neural network is used as the hint layer to guide training of the corresponding layer of the student network, taking the difference between the output …
Regarding claims 2 and 13, Romero and Warden disclose claims 1 and 12, and Warden further discloses forming floating-point feature data of the to-be-processed image via the neural network, and quantizing the floating-point feature data into the discrete feature data of the to-be-processed image (see page 2: a network model may occupy a large amount of disk space, e.g., many megabytes. The initial motivation for quantizing a network is to reduce the size of the model file: when the network weights are saved to file, the minimum and maximum values of each layer are stored, and each floating-point value is then represented by an 8-bit integer. Another motivation for quantization is to reduce the computational resource requirements of the prediction process, which requires implementing the full computation in 8-bit (i.e., quantizing both the file data and the data involved in the computation process)).
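For illustration only, the min/max scheme summarized from Warden (store each layer's minimum and maximum, then represent every floating-point value as an 8-bit integer) can be sketched as below. This is a minimal sketch under stated assumptions, not TensorFlow's implementation: the linear mapping of [min, max] onto codes 0 to 255, the rounding, and the clamping are assumptions, and the function names `quantize_min_max` and `dequantize_min_max` are hypothetical.

```python
from typing import List, Tuple


def quantize_min_max(values: List[float]) -> Tuple[List[int], float, float]:
    """Map floats linearly onto 8-bit codes 0..255 using the stored [min, max] range."""
    lo, hi = min(values), max(values)
    # Avoid division by zero when every value is identical.
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    # Round to the nearest code and clamp to the 8-bit range.
    codes = [min(255, max(0, int(round((v - lo) / scale)))) for v in values]
    return codes, lo, hi


def dequantize_min_max(codes: List[int], lo: float, hi: float) -> List[float]:
    """Recover approximate floats from the 8-bit codes and the stored range."""
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    layer = [-1.0, -0.2, 0.0, 0.5, 1.0]
    codes, lo, hi = quantize_min_max(layer)
    print(codes, lo, hi)
    print(dequantize_min_max(codes, lo, hi))
```

Only the two floats `lo` and `hi` plus one byte per value need to be stored, which is the file-size motivation described in the passage; the reconstruction error per value is at most about half of one quantization step.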
Regarding claims 3 and 14, Romero and Warden disclose claims 2 and 13, and Warden further discloses extracting floating-point feature data from the to-be-processed image via the neural network, and converting the extracted floating-point feature data into floating-point feature data satisfying a predetermined requirement to form the floating-point feature data of the to-be-processed image (see page 2: a network model may occupy a large amount of disk space, e.g., many megabytes. The initial motivation for quantizing a network is to reduce the size of the model file: when the network weights are saved to file, the minimum and maximum values of each layer are stored, and each floating-point value is then represented by an 8-bit integer. Another motivation for quantization is to reduce the computational resource requirements of the prediction process, which requires implementing the full computation in 8-bit (i.e., quantizing both the file data and the data involved in the computation process)).
Regarding claims 4 and 15, Romero and Warden disclose claims 3 and 14, and Warden further discloses at least one of: converting the floating-point feature data into floating-point feature data with a predetermined number of channels; or converting the floating-point feature data into floating-point feature data with a predetermined size (see page 2: a network model may occupy a large amount of disk space, e.g., many megabytes. The initial motivation for quantizing a network is to reduce the size of the model file: when the network weights are saved to file, the minimum and maximum values of each layer are stored, and each floating-point value is then represented by an 8-bit integer. Another motivation for quantization is to reduce the computational resource requirements of the prediction process, which requires implementing the full computation in 8-bit (i.e., quantizing both the file data and the data involved in the computation process)).
Regarding claims 5 and 16, Romero and Warden disclose claims 1 and 12, and Romero further discloses performing corresponding vision task processing on the to-be-processed image via the neural network according to the discrete feature data of the to-be-processed image, wherein the guidance information further comprises: a difference between a vision task processing result output by the student neural network for the image sample and tagging information of the image sample (see Abstract and Sections 1-2; Romero specifically discloses “image classification” … “The framework compresses an ensemble of deep networks (teacher) into a student network of similar depth” … “minimizing the following loss function:

$$\mathcal{L}_{HT}(\mathbf{W}_{Guided}, \mathbf{W}_r) = \frac{1}{2}\left\| u_h(\mathbf{x}; \mathbf{W}_{Hint}) - r\big(v_g(\mathbf{x}; \mathbf{W}_{Guided}); \mathbf{W}_r\big) \right\|^2,$$

where $u_h$ and $v_g$ are the teacher/student deep nested functions up to their respective hint/guided layers with parameters $\mathbf{W}_{Hint}$ and $\mathbf{W}_{Guided}$, $r$ is the regressor function on top of the guided layer with parameters $\mathbf{W}_r$. Note that the outputs of $u_h$ and $r$ have to be comparable, i.e., $u_h$ and $r$ must be the same non-linearity”; Romero concerns thin (“slender”) deep neural networks and involves image processing).
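For illustration only, guidance information that combines (i) a difference between the teacher's and the student's discrete feature data and (ii) a difference between the student's classification result and the tagging (label) information of the image sample can be sketched as a single training loss. This is a minimal sketch under stated assumptions: the mean squared difference for term (i), the cross-entropy for term (ii), and the weighting factor `alpha` are choices made for illustration and are not taken from Romero or from the claims as quoted; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_difference(teacher_codes: Sequence[int], student_codes: Sequence[int]) -> float:
    """Mean squared difference between teacher and student discrete feature codes."""
    if len(teacher_codes) != len(student_codes):
        raise ValueError("teacher and student features must have the same size")
    return sum((t - s) ** 2 for t, s in zip(teacher_codes, student_codes)) / len(teacher_codes)


def task_difference(student_logits: Sequence[float], label: int) -> float:
    """Cross-entropy between the student's class probabilities and the tagged label."""
    return -math.log(softmax(student_logits)[label])


def guidance_loss(teacher_codes: Sequence[int], student_codes: Sequence[int],
                  student_logits: Sequence[float], label: int, alpha: float = 1.0) -> float:
    """Hypothetical combined guidance: feature term plus alpha times the task term."""
    return (feature_difference(teacher_codes, student_codes)
            + alpha * task_difference(student_logits, label))


if __name__ == "__main__":
    loss = guidance_loss([10, 20, 30], [12, 20, 27], [2.0, 0.5, -1.0], label=0, alpha=0.5)
    print(f"combined guidance loss: {loss:.4f}")
```

Minimizing this sum pushes the student both toward the teacher's discrete features and toward the correct label; the relative emphasis is governed entirely by the assumed `alpha`.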
Regarding claims 6 and 17, Romero and Warden disclose claims 5 and 16, and Romero further discloses performing classification processing on the to-be-processed image via the neural network according to the discrete feature data of the to-be-processed image; or performing object detection processing on the to-be-processed image according to the discrete feature data of the to-be-processed image, wherein the guidance information further comprises: a difference between a classification processing result output by the student neural network for the image sample and classification tagging information of the image sample; or a difference between an object detection processing result output by the student neural network for the image sample and detection box tagging information of the image sample (see Abstract and Sections 1-2; Romero specifically discloses “image classification” … “The framework compresses an ensemble of deep networks (teacher) into a student network of similar depth” … “minimizing the following loss function:

$$\mathcal{L}_{HT}(\mathbf{W}_{Guided}, \mathbf{W}_r) = \frac{1}{2}\left\| u_h(\mathbf{x}; \mathbf{W}_{Hint}) - r\big(v_g(\mathbf{x}; \mathbf{W}_{Guided}); \mathbf{W}_r\big) \right\|^2,$$

where $u_h$ and $v_g$ are the teacher/student deep nested functions up to their respective hint/guided layers with parameters $\mathbf{W}_{Hint}$ and $\mathbf{W}_{Guided}$, $r$ is the regressor function on top of the guided layer with parameters $\mathbf{W}_r$. Note that the outputs of $u_h$ and $r$ have to be comparable, i.e., $u_h$ and $r$ must be the same non-linearity”; Romero concerns thin (“slender”) deep neural networks and involves image processing).
Regarding claim 8, Romero and Warden disclose claim 7, and Romero further discloses performing vision task processing on the image sample via the student neural network according to the feature data of the image sample, and performing supervised learning on the student neural network by using, as guidance information, the difference between the discrete feature data formed by the teacher neural network for the image sample and the discrete feature data formed by the student neural network for the image sample, and a difference between a vision task processing result output by the student neural network and tagging information of the image sample (see Abstract and Sections 1-2; Romero specifically discloses “image classification” … “The framework compresses an ensemble of deep networks (teacher) into a student network of similar depth” … “minimizing the following loss function:

$$\mathcal{L}_{HT}(\mathbf{W}_{Guided}, \mathbf{W}_r) = \frac{1}{2}\left\| u_h(\mathbf{x}; \mathbf{W}_{Hint}) - r\big(v_g(\mathbf{x}; \mathbf{W}_{Guided}); \mathbf{W}_r\big) \right\|^2”).


Regarding claim 9, Romero and Warden disclose claim 7. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to use a successfully trained floating-point teacher neural network configured to form floating-point feature data for an input image and to perform vision task processing on the input image according to the floating-point feature data, and to convert the floating-point feature data formed by the floating-point teacher neural network into discrete feature data and provide the discrete feature data to the floating-point teacher neural network, so that the floating-point teacher neural network performs vision task processing on the input image according to the discrete feature data (see Romero, Abstract and Sections 1-2; Romero specifically discloses “image classification” … “The framework compresses an ensemble of deep networks (teacher) into a student network of similar depth” … “minimizing the following loss function:

$$\mathcal{L}_{HT}(\mathbf{W}_{Guided}, \mathbf{W}_r) = \frac{1}{2}\left\| u_h(\mathbf{x}; \mathbf{W}_{Hint}) - r\big(v_g(\mathbf{x}; \mathbf{W}_{Guided}); \mathbf{W}_r\big) \right\|^2,$$

where $u_h$ and $v_g$ are the teacher/student deep nested functions up to their respective hint/guided layers with parameters $\mathbf{W}_{Hint}$ and $\mathbf{W}_{Guided}$, $r$ is the regressor function on top of the guided layer with parameters $\mathbf{W}_r$. Note that the outputs of $u_h$ and $r$ have to be comparable, i.e., $u_h$ and $r$ must be the same non-linearity”; Romero concerns thin (“slender”) deep neural networks and involves image processing).

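Regarding claim 11, Romero and Warden disclose claim 9. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to input an image sample into a to-be-trained floating-point teacher neural network, extract floating-point feature data of the image sample via the to-be-trained floating-point teacher neural network, perform vision task processing on the image sample according to the floating-point feature data, and perform supervised learning on the to-be-trained floating-point teacher neural network by using a difference between the vision task processing result and tagging information of the image sample as guidance information (see Romero, Abstract and Sections 1-2; Romero specifically discloses “image classification” … “The framework compresses an ensemble of deep networks (teacher) into a student network of similar depth” … “minimizing the following loss function:

$$\mathcal{L}_{HT}(\mathbf{W}_{Guided}, \mathbf{W}_r) = \frac{1}{2}\left\| u_h(\mathbf{x}; \mathbf{W}_{Hint}) - r\big(v_g(\mathbf{x}; \mathbf{W}_{Guided}); \mathbf{W}_r\big) \right\|^2,$$

where $u_h$ and $v_g$ are the teacher/student deep nested functions up to their respective hint/guided layers with parameters $\mathbf{W}_{Hint}$ and $\mathbf{W}_{Guided}$, $r$ is the regressor function on top of the guided layer with parameters $\mathbf{W}_r$. Note that the outputs of $u_h$ and $r$ have to be comparable, i.e., $u_h$ and $r$ must be the same non-linearity”; Romero concerns thin (“slender”) deep neural networks and involves image processing).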
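Regarding claim 20, Romero and Warden disclose claim 1. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art that the student neural network is trained by using the teacher neural network forming discrete feature data, such that the knowledge of the teacher neural network can be transferred to the student neural network, and that the network parameters of the student neural network are not limited to fixed-point network parameters and the student neural network is configured to perform floating-point arithmetic, such that after the student neural network is successfully trained, the neural network is not limited by a specific instruction set and a specific device, thereby facilitating improvement in an application range of the neural network; and that the floating-point feature data obtained by the floating-point arithmetic are converted into discrete feature data by quantization while maintaining high accuracy, thereby facilitating matching between the discrete feature data output by the teacher neural network and the discrete feature data output by the student neural network and transfer of the knowledge of the teacher neural network to the student neural network (see Romero, Abstract and Sections 1-2; Romero specifically discloses “image classification” … “The framework compresses an ensemble of deep networks (teacher) into a student network of similar depth” … “minimizing the following loss function:

$$\mathcal{L}_{HT}(\mathbf{W}_{Guided}, \mathbf{W}_r) = \frac{1}{2}\left\| u_h(\mathbf{x}; \mathbf{W}_{Hint}) - r\big(v_g(\mathbf{x}; \mathbf{W}_{Guided}); \mathbf{W}_r\big) \right\|^2,$$

where $u_h$ and $v_g$ are the teacher/student deep nested functions up to their respective hint/guided layers with parameters $\mathbf{W}_{Hint}$ and $\mathbf{W}_{Guided}$, $r$ is the regressor function on top of the guided layer with parameters $\mathbf{W}_r$. Note that the outputs of $u_h$ and $r$ have to be comparable, i.e., $u_h$ and $r$ must be the same non-linearity”; Romero concerns thin (“slender”) deep neural networks and involves image processing).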
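For illustration only, the hint-based objective quoted from Romero, $\mathcal{L}_{HT} = \frac{1}{2}\| u_h(\mathbf{x}) - r(v_g(\mathbf{x}))\|^2$, can be sketched in Python as below. This is a minimal sketch under stated assumptions and not Romero's implementation: features are plain lists of floats for a single input, and the regressor $r$ is assumed to be a single linear map with no non-linearity (so the teacher hint output is likewise assumed to be taken without a non-linearity). The helper names `linear_regressor` and `hint_loss` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear_regressor(features: Vector, weights: Matrix) -> Vector:
    """Hypothetical regressor r: maps student guided-layer features to the
    teacher hint-layer dimension with a plain linear map (no non-linearity)."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def hint_loss(teacher_hint: Vector, student_guided: Vector, w_r: Matrix) -> float:
    """L_HT = 1/2 * || u_h(x) - r(v_g(x); W_r) ||^2 for a single input x."""
    regressed = linear_regressor(student_guided, w_r)
    if len(regressed) != len(teacher_hint):
        raise ValueError("outputs of u_h and r must have comparable shapes")
    return 0.5 * sum((t - s) ** 2 for t, s in zip(teacher_hint, regressed))


if __name__ == "__main__":
    # Teacher hint output u_h(x) has 3 features; student guided output v_g(x) has 2.
    u_h = [1.0, 0.0, 2.0]
    v_g = [0.5, 1.0]
    # W_r maps the 2 student features onto the 3 teacher features.
    w_r = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(hint_loss(u_h, v_g, w_r))  # 0.625
```

The regressor is what lets a thinner student layer be compared against a wider teacher layer, which is why the quoted passage requires the outputs of $u_h$ and $r$ to be comparable.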
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li (US 20190287515 A1) in view of Warden ("How to Quantize Neural Networks with TensorFlow", Pete Warden, 2016, pages 1-6).
Regarding claims 1, 12 and 19, Li discloses inputting a to-be-processed image into a neural network, wherein the neural network is trained based on guidance information, and during the training process, the neural network is taken as a student neural network; and the guidance information comprises a difference between discrete feature data formed by a teacher neural network for an image sample and discrete feature data formed by the student neural network for the image sample …


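Li does not specifically disclose forming discrete feature data of the to-be-processed image via the neural network. Warden discloses forming discrete feature data of the to-be-processed image via the neural network (see page 2: a network model may occupy a large amount of disk space, e.g., many megabytes. The initial motivation for quantizing a network is to reduce the size of the model file: when the network weights are saved to file, the minimum and maximum values of each layer are stored, and each floating-point value is then represented by an 8-bit integer. Another motivation for quantization is to reduce the computational resource requirements of the prediction process, which requires implementing the full computation in 8-bit (i.e., quantizing both the file data and the data involved in the computation process)).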
Regarding claims 7 and 18, Li discloses inputting an image sample into a student neural network and a teacher neural network, respectively, and performing supervised learning on the student neural network according to guidance information, wherein the guidance information comprises a difference between discrete feature data formed by the teacher neural network for the image sample and discrete feature data formed by the student neural network for the image sample (see Abstract, Figures 3-5 and 8-9, and paragraphs [0037]-[0080]; Li specifically discloses “The results from the speech recognition models 204, 208 are compared by an output comparator 304, and the differences between the two result sets are fed back into the student model 208 to update the student model 208” … “The outputs from the speech recognition models 204, 208 are compared by the output comparator 304, and the differences or similarities in the predicted words/phonemes/senones posteriors are fed back into the student model 208 to update the student model 208, according to one of various machine learning techniques or schemes to more accurately identify speech in accord with the outputs from the teacher model 204”).
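Regarding claims 2 and 13, Li and Warden disclose claims 1 and 12, and Warden further discloses forming floating-point feature data of the to-be-processed image via the neural network, and quantizing the floating-point feature data into the discrete feature data of the to-be-processed image (see page 2: a network model may occupy a large amount of disk space, e.g., many megabytes. The initial motivation for quantizing a network is to reduce the size of the model file: when the network weights are saved to file, the minimum and maximum values of each layer are stored, and each floating-point value is then represented by an 8-bit integer. Another motivation for quantization is to reduce the computational resource requirements of the prediction process, which requires implementing the full computation in 8-bit (i.e., quantizing both the file data and the data involved in the computation process)).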
Regarding claims 3 and 14, Li and Warden disclose claims 2 and 13, and Warden further discloses extracting floating-point feature data from the to-be-processed image via the neural network, and converting the extracted floating-point feature data into floating-point feature data satisfying a predetermined requirement to form the floating-point feature data of the to-be-processed image (see page 2: a network model may occupy a large amount of disk space, e.g., many megabytes. The initial motivation for quantizing a network is to reduce the size of the model file: when the network weights are saved to file, the minimum and maximum values of each layer are stored, and each floating-point value is then represented by an 8-bit integer. Another motivation for quantization is to reduce the computational resource requirements of the prediction process, which requires implementing the full computation in 8-bit (i.e., quantizing both the file data and the data involved in the computation process)).
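Regarding claims 4 and 15, Li and Warden disclose claims 3 and 14, and Warden further discloses at least one of: converting the floating-point feature data into floating-point feature data with a predetermined number of channels; or converting the floating-point feature data into floating-point feature data with a predetermined size (see page 2: a network model may occupy a large amount of disk space, e.g., many megabytes. The initial motivation for quantizing a network is to reduce the size of the model file: when the network weights are saved to file, the minimum and maximum values of each layer are stored, and each floating-point value is then represented by an 8-bit integer. Another motivation for quantization is to reduce the computational resource requirements of the prediction process, which requires implementing the full computation in 8-bit (i.e., quantizing both the file data and the data involved in the computation process)).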
Regarding claims 5 and 16, Li and Warden disclose claims 1 and 12, and Li further discloses performing corresponding vision task processing on the to-be-processed image via the neural network according to the discrete feature data of the to-be-processed image, wherein the guidance information further comprises: a difference between a vision task processing result output by the student neural network for the image sample and tagging information of the image sample (see Abstract, Figures 3-5 and 8-9, and paragraphs [0037]-[0080]; Li specifically discloses “The results from the speech recognition models 204, 208 are compared by an output comparator 304, and the differences between the two result sets are fed back into the student model 208 to update the student model 208” … “The outputs from the speech recognition models 204, 208 are compared by the output comparator 304, and the differences or similarities in the predicted words/phonemes/senones posteriors are fed back into the student model 208 to update the student model 208, according to one of various machine learning techniques or schemes to more accurately identify speech in accord with the outputs from the teacher model 204”).
Regarding claims 6 and 17, Li and Warden disclose claims 5 and 16, and Li further discloses performing classification processing on the to-be-processed image via the neural network according to the discrete feature data of the to-be-processed image; or performing object detection processing on the to-be-processed image according to the discrete feature data of the to-be-processed image, wherein the guidance information further comprises: a difference between a classification processing result output by the student neural network for the image sample and classification tagging information of the image sample; or a difference between an object detection processing result output by the student neural network for the image sample and detection box tagging information of the image sample (see Abstract, Figures 3-5 and 8-9, and paragraphs [0037]-[0080]; Li specifically discloses “The results from the speech recognition models 204, 208 are compared by an output comparator 304, and the differences between the two result sets are fed back into the student model 208 to update the student model 208” … “The outputs from the speech recognition models 204, 208 are compared by the output comparator 304, and the differences or similarities in the predicted words/phonemes/senones posteriors are fed back into the student model 208 to update the student model 208, according to one of various machine learning techniques or schemes to more accurately identify speech in accord with the outputs from the teacher model 204”).
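For illustration only, the teacher-student comparison quoted from Li (the outputs of the two models are compared and the differences are fed back to update the student model) can be sketched as follows. This is a minimal sketch under stated assumptions, not Li's training scheme: the difference is assumed to be measured as the KL divergence between teacher and student posterior distributions, and the "update" is assumed to be one gradient step applied directly to the student's output logits. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits into posterior probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_teacher: Sequence[float], p_student: Sequence[float]) -> float:
    """KL(p_teacher || p_student): how far the student posteriors are from the teacher's."""
    return sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)


def update_student(student_logits: Sequence[float], p_teacher: Sequence[float],
                   lr: float = 0.5) -> List[float]:
    """Feed the difference back as one gradient step on the student logits.

    For softmax outputs, the gradient of cross-entropy(p_teacher, softmax(z))
    with respect to z is softmax(z) - p_teacher.
    """
    p_student = softmax(student_logits)
    return [z - lr * (ps - pt) for z, ps, pt in zip(student_logits, p_student, p_teacher)]


if __name__ == "__main__":
    teacher_posteriors = softmax([2.0, 0.5, -1.0])
    student_logits = [0.0, 0.0, 0.0]
    for step in range(3):
        gap = kl_divergence(teacher_posteriors, softmax(student_logits))
        print(f"step {step}: KL = {gap:.4f}")
        student_logits = update_student(student_logits, teacher_posteriors)
```

Each step moves the student's posteriors toward the teacher's, so the printed divergence shrinks; in a real network the same gradient would be back-propagated into the student's parameters rather than applied to the logits directly.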
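Regarding claim 8, Li and Warden disclose claim 7, and Li further discloses performing vision task processing on the image sample via the student neural network according to the feature data of the image sample, and performing supervised learning on the student neural network by using, as guidance information, the difference between the discrete feature data formed by the teacher neural network for the image sample and the discrete feature data formed by the student neural network for the image sample, and a difference between a vision task processing result output by the student neural network and tagging information of the image sample (see Abstract, Figures 3-5 and 8-9, and paragraphs [0037]-[0080]; Li specifically discloses “The results from the speech recognition models 204, 208 are compared by an output comparator 304, and the differences between the two result sets are fed back into the student model 208 to update the student model 208” … “The outputs from the speech recognition models 204, 208 are compared by the output comparator 304, and the differences or similarities in the predicted words/phonemes/senones posteriors are fed back into the student model 208 to update the student model 208, according to one of various machine learning techniques or schemes to more accurately identify speech in accord with the outputs from the teacher model 204”).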
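Regarding claim 9, Li and Warden disclose claim 7. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to use a successfully trained floating-point teacher neural network configured to form floating-point feature data for an input image and to perform vision task processing on the input image according to the floating-point feature data, and to convert the floating-point feature data formed by the floating-point teacher neural network into discrete feature data and provide the discrete feature data to the floating-point teacher neural network, so that the floating-point teacher neural network performs vision task processing on the input image according to the discrete feature data (see Li, Abstract, Figures 3-5 and 8-9, and paragraphs [0037]-[0080]; Li specifically discloses “The results from the speech recognition models 204, 208 are compared by an output comparator 304, and the differences between the two result sets are fed back into the student model 208 to update the student model 208” … “The outputs from the speech recognition models 204, 208 are compared by the output comparator 304, and the differences or similarities in the predicted words/phonemes/senones posteriors are fed back into the student model 208 to update the student model 208, according to one of various machine learning techniques or schemes to more accurately identify speech in accord with the outputs from the teacher model 204”).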
Regarding claim 10, Li and Warden disclose claim 9. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to input an image sample into a successfully trained floating-point teacher neural network, extract floating-point feature data of the image sample via the successfully trained floating-point teacher neural network, convert the floating-point feature data into discrete feature data via the quantization auxiliary unit, perform vision task processing on the image sample via the successfully trained floating-point teacher neural network according to the discrete feature data, and perform network parameter adjustment on the successfully trained floating-point teacher neural network by using a difference between the vision task processing result and tagging information of the image sample as guidance information (see Li, Abstract, Figures 3-5 and 8-9, and paragraphs [0037]-[0080]; Li specifically discloses “The results from the speech recognition models 204, 208 are compared by an output comparator 304, and the differences between the two result sets are fed back into the student model 208 to update the student model 208” … “The outputs from the speech recognition models 204, 208 are compared by the output comparator 304, and the differences or similarities in the predicted words/phonemes/senones posteriors are fed back into the student model 208 to update the student model 208, according to one of various machine learning techniques or schemes to more accurately identify speech in accord with the outputs from the teacher model 204”).
Regarding claim 11, Li and Warden disclose claim 9. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to input an image sample into a to-be-trained floating-point teacher neural network, extract floating-point feature data of the image sample via the to-be-trained floating-point teacher neural network, perform vision task processing on the image sample according to the floating-point feature data, and perform supervised learning on the to-be-trained floating-point teacher neural network by using a difference between the vision task processing result and tagging information of the image sample as guidance information (see Li, Abstract, Figures 3-5 and 8-9, and paragraphs [0037]-[0080]; Li specifically discloses “The results from the speech recognition models 204, 208 are compared by an output comparator 304, and the differences between the two result sets are fed back into the student model 208 to update the student model 208” … “The outputs from the speech recognition models 204, 208 are compared by the output comparator 304, and the differences or similarities in the predicted words/phonemes/senones posteriors are fed back into the student model 208 to update the student model 208, according to one of various machine learning techniques or schemes to more accurately identify speech in accord with the outputs from the teacher model 204”).
Regarding claim 20, Li and Warden disclose claim 1. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art that the student neural network is trained by using the teacher neural network forming discrete feature data, such that the knowledge of the teacher neural network can be transferred to the student neural network, and that the network parameters of the student neural network are not limited to fixed-point network parameters and the student neural network is configured to perform floating-point arithmetic, such that after the student neural network is successfully trained, the neural network is not limited by a specific instruction set and a specific device, thereby facilitating improvement in an application range of the neural network; and that the floating-point feature data obtained by the floating-point arithmetic are converted into discrete feature data by quantization while maintaining high accuracy, thereby facilitating matching between the discrete feature data output by the teacher neural network and the discrete feature data output by the student neural network and transfer of the knowledge of the teacher neural network to the student neural network (see Li, Abstract, Figures 3-5 and 8-9, and paragraphs [0037]-[0080]; Li specifically discloses “The results from the speech recognition models 204, 208 are compared by an output comparator 304, and the differences between the two result sets are fed back into the student model 208 to update the student model 208” … “The outputs from the speech recognition models 204, 208 are compared by the output comparator 304, and the differences or similarities in the predicted words/phonemes/senones posteriors are fed back into the student model 208 to update the student model 208, according to one of various machine learning techniques or schemes to more accurately identify speech in accord with the outputs from the teacher model 204”).
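For illustration only, the arrangement described for claim 20 (both networks compute floating-point features, which are then quantized into discrete feature data so that the teacher's and student's discrete features can be matched) can be sketched as below. This is a minimal sketch under stated assumptions: a uniform quantizer over a fixed, shared range `[lo, hi]` with `levels` codes and a mean absolute code difference are choices made for illustration only, and the function names are hypothetical.

```python
from typing import List, Sequence


def quantize_fixed_range(values: Sequence[float], lo: float, hi: float,
                         levels: int = 256) -> List[int]:
    """Uniformly quantize floats to integer codes 0..levels-1 over [lo, hi].

    Values outside [lo, hi] are clipped before quantization.
    """
    if hi <= lo or levels < 2:
        raise ValueError("need hi > lo and at least two levels")
    step = (hi - lo) / (levels - 1)
    return [int(round((min(max(v, lo), hi) - lo) / step)) for v in values]


def discrete_feature_difference(teacher_feats: Sequence[float], student_feats: Sequence[float],
                                lo: float = 0.0, hi: float = 1.0, levels: int = 256) -> float:
    """Quantize both floating-point feature vectors with the same range, then compare the codes."""
    t_codes = quantize_fixed_range(teacher_feats, lo, hi, levels)
    s_codes = quantize_fixed_range(student_feats, lo, hi, levels)
    return sum(abs(t - s) for t, s in zip(t_codes, s_codes)) / len(t_codes)


if __name__ == "__main__":
    teacher = [0.10, 0.52, 0.97]
    student = [0.12, 0.40, 1.30]  # 1.30 is clipped to 1.0 before quantization
    print(discrete_feature_difference(teacher, student))
```

Using one shared range for both networks is what makes the teacher's and student's codes directly comparable; the student itself still computes in floating point, and only its features are discretized for matching.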
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Kang (US 20170083829 A1) discloses a model training method and apparatus, and a data recognizing method.
Choi (US 20180268292 A1) discloses learning efficient object detection models with knowledge distillation.
Kim (US 20180336465 A1) discloses an apparatus and method for a student-teacher transfer learning network using a knowledge bridge.
Oh (US 20190034764 A1) discloses a method and apparatus for generating training data to train a student model using a teacher model.
Li (US 20190051290 A1) discloses domain adaptation in speech recognition via teacher-student learning.
Chung (US 20190325308 A1) discloses multi-task learning using knowledge distillation.
Gupta (US 20200349435 A1) discloses secure training of a multi-party deep neural network.
Kim (US 20200357384 A1) discloses a model training method and apparatus.
Keshwani (US 20200380313 A1) discloses a machine learning device and method.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUAN A TORRES whose telephone number is (571) 272-3119.  The examiner can normally be reached on M-F 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kenneth N Vanderpuye can be reached on (571) 272-3078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/JUAN A TORRES/           Primary Examiner, Art Unit 2636