DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is responsive to communications filed on 03/29/19. Claims 1-20 are pending in the instant application. Claims 1, 10 and 19 are independent. An Office Action on the merits follows.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/29/19 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.


Claim 18 is rejected under 35 U.S.C. 112(d) as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Claim 18 depends from claim 1, which is directed to a method rather than a system, so the Examiner assumes that claim 18 contains a typographical error and should instead depend from independent claim 10. For purposes of examination, claim 18 has been rejected on the same basis as claim 9, which it mirrors. Notwithstanding, applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements. Appropriate correction/clarification is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-5, 9-14 and 18-20 are rejected under 35 U.S.C. 102(a)(1) and/or (a)(2) as being anticipated by Karam et al. (US 20190303720 A1).
Regarding Claim 1: Karam discloses a method for training a machine learning model (Refer to para [002]; “systems and methods for feature corrections and regeneration for robust sensing, transmission, computer vision, recognition, and classification.”) comprising: acquiring, by one or more processing units (Refer to para [107]; “The Generative Sensing Framework 100 includes a bus 101 (i.e., interconnect), at least one processor 102 or other computing element…”) a training data (Refer to para [031]; “referred to herein as the DeepCorrect framework, the aforementioned questions are addressed by first evaluating the effect of image distortions like blur, noise, and adversarial perturbations on the outputs of pre-trained convolutional filters…”) and training, by one or more processing units, the machine learning model based on the training data, the training comprising: optimizing, by one or more processing units (Refer to para [036]; “Two popular image classification datasets were used: 1) CIFAR-100 (Krizhevsky and Hinton, 2009) which has fixed size images of 32×32 pixels and 2) ImageNet (Deng et al, 2009) (ILSVRC2012) which has images of varying resolution, usually greater than 128×128 pixels. CIFAR-100 consists of 50,000 training images and 10,000 testing images covering 100 different object categories. The CIFAR-100 training set was split into two sets: 45,000 images for training and 5,000 for validation. The ImageNet dataset consists of around 1.3 million training images covering 1,000 object classes and 50,000 validation images, with 50 validation images per class. The validation set of ImageNet was used for evaluating performance.”) the machine learning model based on stochastic gradient descent (SGD) (Refer to para [041]; “The version of a fully-convolutional network shown in FIG. 3A is based on the All-Convolutional Net proposed by Springenberg et al (2014), with the addition of batch normalization units after each convolutional layer.
This network, which serves as the baseline model for CIFAR-100, was trained using stochastic gradient descent. The AlexNet DNN proposed by Krizhevsky et al (2012), shown in FIG. 3B, was used as a baseline model for the ImageNet dataset.”) by adding a dynamic noise to a gradient of a model parameter of the machine learning model calculated by the SGD (Refer to para [053 and 058]; “For a given layer, let ϕm(u) be the output (activation map) of the mth convolutional filter with kernel weights represented by the tensor (3-dimensional array) Wm for an input u. Let em=ϕm(u+r)−ϕm(u) be the additive noise (perturbation) that is caused in the output activation map ϕm(u) as a result of applying an additive perturbation r to the input u. It can be shown that em is bounded as follows… The DeepCorrect framework and the most obvious fine-tuning baseline models were evaluated to distortion affected distributions on the two datasets and architectures described herein. The DNN models were trained and tested using the python-based deep learning framework Keras and a single Nvidia Titan-X GPU. The classification performance was evaluated independently for each type of distortion. Although the classification accuracy for each distortion level was reported separately, for both the DeepCorrect framework and fine-tuning baseline models, a single model that was used to classify images affected by different levels of distortion including the undistorted images was learnt. Unlike common image denoising and deblurring methods like BM3D and NCSR which expect the distortion/noise severity to be known or other preprocessing methods that assume knowledge of the sensor characteristics/image formation model or learning-based methods that train separate models for each noise/distortion severity, the DeepCorrect framework-based models were totally blind to the severity of distortion added to a test image.”).
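For illustration only, the limitation of optimizing by SGD while adding a dynamic noise to the gradient of a model parameter can be sketched as follows. The sketch is not taken from the instant application or from Karam. The function name sgd_with_gradient_noise, the linear least-squares model, the Gaussian form of the noise, and the decaying schedule sigma0 / (1 + t)^decay are all assumptions chosen only to make the example concrete.

```python
import numpy as np

def sgd_with_gradient_noise(X, y, lr=0.05, epochs=50, batch=16,
                            sigma0=0.1, decay=0.55, seed=0):
    """Minibatch SGD for least squares with step-dependent noise added to each gradient.

    The noise schedule (sigma0, decay) is a hypothetical choice, not a claimed value.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])          # model parameter being trained
    step = 0
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch):
            idx = order[start:start + batch]
            residual = X[idx] @ w - y[idx]
            # Minibatch gradient of 0.5 * mean squared error with respect to w.
            grad = X[idx].T @ residual / len(idx)
            # "Dynamic" here means the noise scale changes with the step count (assumption).
            sigma_t = sigma0 / (1.0 + step) ** decay
            grad = grad + rng.normal(0.0, sigma_t, size=w.shape)
            w -= lr * grad            # ordinary SGD update using the noisy gradient
            step += 1
    return w

# Synthetic, noiseless regression data used only to exercise the sketch.
rng = np.random.default_rng(42)
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w_hat = sgd_with_gradient_noise(X, y)
```

In this sketch the noise is added to the gradient of the model parameter, not to the input data, and it is dynamic only in the sense that its standard deviation changes from step to step; other readings of the term are possible.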

Regarding Claim 2: Karam discloses the machine learning model is a convolutional neural networks (CNN) or a recurrent neural network (RNN) (Refer to para [031]; “The present disclosure observes that for every layer of convolutional filters in the DNN, certain filters are far more susceptible to input distortions than others and that correcting the activations of these filters can help recover lost performance. Metrics are disclosed herein to rank the convolutional filters in order of the highest gain in classification accuracy upon correction. In one embodiment, the features can be corrected by appending correction units taking the form of small blocks of stacked convolutional layers such as a residual block with a single skip connection or other CNN-based block or other blocks implementing a transformation, at the output of select filters and train them to correct the worst distortion-affected filter activations using a target-oriented loss or other desired loss function, whilst leaving the rest of the pre-trained filter outputs in the network unchanged.”).

Regarding Claim 3: Karam discloses the training data is selected from the group consisting: pathological data; autopilot data; medical experimental data; biological data; internet of things (IoT) data; social network data; and e-commerce data (Refer to para [096 and 097]; “The baseline AlexNet that has been pre-trained for object recognition on the relatively high-quality visible-wavelength (RGB) ImageNet dataset has been adopted for the baseline pre-trained DNN ø of FIG. 18 in order to show the performance of the generative sensing framework 100 and its ability to generalize to different tasks (e.g., face recognition, scene recognition), different input modalities (e.g., RGB, NIR, and IR), and different sensor sizes/resolutions. For varying tasks and input modalities, the RGB-IR SCface face recognition dataset and the EPFL RGB-NIR scene recognition dataset are used. Unlike the task of face recognition, where the aim is to assign the face in a test image to one of the known subjects in a database, the goal of scene recognition is to classify the entire scene of the image. The SCface dataset was primarily designed for surveillance-based face recognition.”).

Regarding Claim 4: Karam discloses the optimizing further comprises minimizing a loss function of the machine learning model (Refer to para [031, 094, 095]; “the features can be corrected by appending correction units taking the form of small blocks of stacked convolutional layers such as a residual block with a single skip connection or other CNN-based block or other blocks implementing a transformation, at the output of select filters and train them to correct the worst distortion-affected filter activations using a target-oriented loss or other desired loss function, whilst leaving the rest of the pre-trained filter outputs in the network unchanged.”).

Regarding Claim 5: Karam discloses the added dynamic noise is selected from a predefined noise set (Refer to para [090 and 092]; “A visible wavelength image sensor comprises a core part of digital cameras and smart phones. The cost of image sensors has been aggressively scaled largely due to the high-volume consumer market products. In addition to the cost, the technical specification of image sensors, including the number of pixels, color contrast, dynamic range, and power consumption, meet almost all demands of consumer market products. IR sensors, on the other hand, have been mostly used for specific needs such as surveillance- and tracking-based applications in the military domain, yet some started penetrating the consumer market recently. IR sensors typically cost significantly higher than image sensors to produce IR images at an equivalent resolution primarily due to the relatively low volume market. Generally those sensors show a trend of “the higher the cost is, the better the delivered performance.””).
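For illustration only, selecting the added noise from a predefined noise set can be sketched as follows. The sketch is not taken from the instant application or from Karam. The three scale values in NOISE_SCALES, the function name pick_noise_scale, and the rule of keeping the candidate with the lowest loss are hypothetical choices; the claim language itself does not state how the selection is made.

```python
import numpy as np

# Hypothetical predefined noise set: three Gaussian standard deviations.
NOISE_SCALES = (0.01, 0.1, 1.0)

def pick_noise_scale(loss_fn, w, grad, lr, scales, rng):
    """Try one noisy update per scale and keep the one with the lowest loss.

    Returns the chosen scale and the corresponding updated parameter.
    The lowest-loss rule is an assumption made for this sketch.
    """
    best = None
    for s in scales:
        noise = rng.normal(0.0, s, size=w.shape)   # noise drawn at this scale
        candidate = w - lr * (grad + noise)        # SGD step with the noisy gradient
        loss = loss_fn(candidate)
        if best is None or loss < best[0]:
            best = (loss, s, candidate)
    return best[1], best[2]

# Toy quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
loss_fn = lambda v: 0.5 * float(v @ v)
rng = np.random.default_rng(1)
w = np.array([1.0, -1.0])
scale, w_next = pick_noise_scale(loss_fn, w, w.copy(), 0.1, NOISE_SCALES, rng)
```

A random draw from the set, or a fixed rotation through it, would fit the same interface; the sketch takes no position on which rule the claim covers.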

Regarding Claim 9: Karam discloses the noise is a Gaussian noise (Refer to Table 002; “Top-1 accuracy of pre-trained networks for images distorted by additive white Gaussian noise.”).
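For illustration only, additive white Gaussian noise of the kind named in the quoted table caption can be sketched as follows. The sketch is not taken from the instant application or from Karam. The function name add_awgn, the 8-bit image format, the random test image, and the value sigma=10.0 are assumptions.

```python
import numpy as np

def add_awgn(image, sigma, rng):
    """Add zero-mean, independent Gaussian noise to every pixel of an 8-bit image."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    # Round and clip back to the valid 8-bit range before converting.
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
# Random stand-in for a 32x32 RGB image (the CIFAR-100 image size quoted above).
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
noisy = add_awgn(image, sigma=10.0, rng=rng)
```

Here the noise is applied to the image itself; the same rng.normal call with a parameter-shaped size would produce Gaussian noise suitable for adding to a gradient.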

Regarding Claim 10: Karam discloses a computer system (Refer to para [002]; “systems and methods for feature corrections and regeneration for robust sensing, transmission, computer vision, recognition, and classification.”) comprising: a processor (Refer to para [107]; “The Generative Sensing Framework 100 includes a bus 101 (i.e., interconnect), at least one processor 102 or other computing element…”) a computer-readable memory coupled to the processor (Refer to para [107]; “a main memory 104, a removable storage media 105, a read-only memory 106, and a mass storage device 107.”) the memory comprising instructions that when executed by the processor perform actions of: acquiring a training data (Refer to para [031]; “referred to herein as the DeepCorrect framework, the aforementioned questions are addressed by first evaluating the effect of image distortions like blur, noise, and adversarial perturbations on the outputs of pre-trained convolutional filters…”) and training a machine learning model based on the training data (Refer to para [036]; “Two popular image classification datasets were used: 1) CIFAR-100 (Krizhevsky and Hinton, 2009) which has fixed size images of 32×32 pixels and 2) ImageNet (Deng et al, 2009) (ILSVRC2012) which has images of varying resolution, usually greater than 128×128 pixels. CIFAR-100 consists of 50,000 training images and 10,000 testing images covering 100 different object categories. The CIFAR-100 training set was split into two sets: 45,000 images for training and 5,000 for validation. The ImageNet dataset consists of around 1.3 million training images covering 1,000 object classes and 50,000 validation images, with 50 validation images per class. The validation set of ImageNet was used for evaluating performance.”) wherein the training comprises: optimizing the machine learning model based on stochastic gradient descent (SGD) (Refer to para [041]; “The version of a fully-convolutional network shown in FIG. 
3A is based on the All-Convolutional Net proposed by Springenberg et al (2014), with the addition of batch normalization units after each convolutional layer. This network, which serves as the baseline model for CIFAR-100, was trained using stochastic gradient descent. The AlexNet DNN proposed by Krizhevsky et al (2012), shown in FIG. 3B, was used as a baseline model for the ImageNet dataset.”) by adding a dynamic noise to a gradient of a model parameter of the machine learning model calculated by the SGD (Refer to para [053 and 058]; “For a given layer, let ϕm(u) be the output (activation map) of the mth convolutional filter with kernel weights represented by the tensor (3-dimensional array) Wm for an input u. Let em=ϕm(u+r)−ϕm(u) be the additive noise (perturbation) that is caused in the output activation map ϕm(u) as a result of applying an additive perturbation r to the input u. It can be shown that em is bounded as follows… The DeepCorrect framework and the most obvious fine-tuning baseline models were evaluated to distortion affected distributions on the two datasets and architectures described herein. The DNN models were trained and tested using the python-based deep learning framework Keras and a single Nvidia Titan-X GPU. The classification performance was evaluated independently for each type of distortion. Although the classification accuracy for each distortion level was reported separately, for both the DeepCorrect framework and fine-tuning baseline models, a single model that was used to classify images affected by different levels of distortion including the undistorted images was learnt. 
Unlike common image denoising and deblurring methods like BM3D and NCSR which expect the distortion/noise severity to be known or other preprocessing methods that assume knowledge of the sensor characteristics/image formation model or learning-based methods that train separate models for each noise/distortion severity, the DeepCorrect framework-based models were totally blind to the severity of distortion added to a test image.”).

Regarding Claim 11: Karam discloses the machine learning model is a convolutional neural networks (CNN) or a recurrent neural network (RNN) (Refer to para [031]; “The present disclosure observes that for every layer of convolutional filters in the DNN, certain filters are far more susceptible to input distortions than others and that correcting the activations of these filters can help recover lost performance. Metrics are disclosed herein to rank the convolutional filters in order of the highest gain in classification accuracy upon correction. In one embodiment, the features can be corrected by appending correction units taking the form of small blocks of stacked convolutional layers such as a residual block with a single skip connection or other CNN-based block or other blocks implementing a transformation, at the output of select filters and train them to correct the worst distortion-affected filter activations using a target-oriented loss or other desired loss function, whilst leaving the rest of the pre-trained filter outputs in the network unchanged.”).

Regarding Claim 12: Karam discloses the training data is selected from the group consisting: pathological data; autopilot data; medical experimental data; biological data; internet of things (IoT) data; social network data; e-commerce data (Refer to para [096 and 097]; “The baseline AlexNet that has been pre-trained for object recognition on the relatively high-quality visible-wavelength (RGB) ImageNet dataset has been adopted for the baseline pre-trained DNN ø of FIG. 18 in order to show the performance of the generative sensing framework 100 and its ability to generalize to different tasks (e.g., face recognition, scene recognition), different input modalities (e.g., RGB, NIR, and IR), and different sensor sizes/resolutions. For varying tasks and input modalities, the RGB-IR SCface face recognition dataset and the EPFL RGB-NIR scene recognition dataset are used. Unlike the task of face recognition, where the aim is to assign the face in a test image to one of the known subjects in a database, the goal of scene recognition is to classify the entire scene of the image. The SCface dataset was primarily designed for surveillance-based face recognition.”).

Regarding Claim 13: Karam discloses the optimizing further comprises minimizing a loss function of the machine learning model (Refer to para [031, 094, 095]; “the features can be corrected by appending correction units taking the form of small blocks of stacked convolutional layers such as a residual block with a single skip connection or other CNN-based block or other blocks implementing a transformation, at the output of select filters and train them to correct the worst distortion-affected filter activations using a target-oriented loss or other desired loss function, whilst leaving the rest of the pre-trained filter outputs in the network unchanged.”).

Regarding Claim 14: Karam discloses the added dynamic noise is selected from a predefined noise set (Refer to para [090 and 092]; “A visible wavelength image sensor comprises a core part of digital cameras and smart phones. The cost of image sensors has been aggressively scaled largely due to the high-volume consumer market products. In addition to the cost, the technical specification of image sensors, including the number of pixels, color contrast, dynamic range, and power consumption, meet almost all demands of consumer market products. IR sensors, on the other hand, have been mostly used for specific needs such as surveillance- and tracking-based applications in the military domain, yet some started penetrating the consumer market recently. IR sensors typically cost significantly higher than image sensors to produce IR images at an equivalent resolution primarily due to the relatively low volume market. Generally those sensors show a trend of “the higher the cost is, the better the delivered performance.””).

Regarding Claim 18: Karam discloses the noise is a Gaussian noise (Refer to Table 002; “Top-1 accuracy of pre-trained networks for images distorted by additive white Gaussian noise.”).

Regarding Claim 19: Karam discloses a computer program product for training a machine learning model (Refer to para [107]; “a main memory 104, a removable storage media 105, a read-only memory 106, and a mass storage device 107.”) comprising a computer readable storage medium having program instructions embodied therewith (Refer to para [104]; “Main memory 104 can be Random Access Memory (RAM) or any other dynamic storage device(s) commonly known in the art. Read-only memory 106 can be any static storage device(s) such as Programmable Read-Only Memory (PROM) chips for storing static information such as instructions for processor 102. Mass storage device 107 can be used to store information and instructions. For example, hard disks such as the Adaptec® family of Small Computer Serial Interface (SCSI) drives, an optical disc, an array of disks such as Redundant Array of Independent Disks (RAID), such as the Adaptec® family of RAID drives, or any other mass storage devices, may be used.”) the program instructions executable by a processor to cause the processor to: acquiring, by one or more processing units (Refer to para [107]; “a main memory 104, a removable storage media 105, a read-only memory 106, and a mass storage device 107.”) a training data (Refer to para [031]; “referred to herein as the DeepCorrect framework, the aforementioned questions are addressed by first evaluating the effect of image distortions like blur, noise, and adversarial perturbations on the outputs of pre-trained convolutional filters…”)  and training, by one or more processing units (Refer to para [107]; “a main memory 104, a removable storage media 105, a read-only memory 106, and a mass storage device 107.”)  the machine learning model based on the training data (Refer to para [036]; “Two popular image classification datasets were used: 1) CIFAR-100 (Krizhevsky and Hinton, 2009) which has fixed size images of 32×32 pixels and 2) ImageNet (Deng et al, 2009) (ILSVRC2012) which has images 
of varying resolution, usually greater than 128×128 pixels. CIFAR-100 consists of 50,000 training images and 10,000 testing images covering 100 different object categories. The CIFAR-100 training set was split into two sets: 45,000 images for training and 5,000 for validation. The ImageNet dataset consists of around 1.3 million training images covering 1,000 object classes and 50,000 validation images, with 50 validation images per class. The validation set of ImageNet was used for evaluating performance.”) the training comprising: optimizing the machine learning model based on stochastic gradient descent (SGD) (Refer to para [041]; “The version of a fully-convolutional network shown in FIG. 3A is based on the All-Convolutional Net proposed by Springenberg et al (2014), with the addition of batch normalization units after each convolutional layer. This network, which serves as the baseline model for CIFAR-100, was trained using stochastic gradient descent. The AlexNet DNN proposed by Krizhevsky et al (2012), shown in FIG. 3B, was used as a baseline model for the ImageNet dataset.”) by adding a dynamic noise to a gradient of a model parameter of the machine learning model calculated by the SGD (Refer to para [053 and 058]; “For a given layer, let ϕm(u) be the output (activation map) of the mth convolutional filter with kernel weights represented by the tensor (3-dimensional array) Wm for an input u. Let em=ϕm(u+r)−ϕm(u) be the additive noise (perturbation) that is caused in the output activation map ϕm(u) as a result of applying an additive perturbation r to the input u. It can be shown that em is bounded as follows… The DeepCorrect framework and the most obvious fine-tuning baseline models were evaluated to distortion affected distributions on the two datasets and architectures described herein. The DNN models were trained and tested using the python-based deep learning framework Keras and a single Nvidia Titan-X GPU. 
The classification performance was evaluated independently for each type of distortion. Although the classification accuracy for each distortion level was reported separately, for both the DeepCorrect framework and fine-tuning baseline models, a single model that was used to classify images affected by different levels of distortion including the undistorted images was learnt. Unlike common image denoising and deblurring methods like BM3D and NCSR which expect the distortion/noise severity to be known or other preprocessing methods that assume knowledge of the sensor characteristics/image formation model or learning-based methods that train separate models for each noise/distortion severity, the DeepCorrect framework-based models were totally blind to the severity of distortion added to a test image.”).

Regarding Claim 20: Karam discloses the training data is selected from the group consisting of: pathological data; autopilot data; medical experimental data; biological data; internet of things (IoT) data; social network data; e-commerce data (Refer to para [096 and 097]; “The baseline AlexNet that has been pre-trained for object recognition on the relatively high-quality visible-wavelength (RGB) ImageNet dataset has been adopted for the baseline pre-trained DNN ø of FIG. 18 in order to show the performance of the generative sensing framework 100 and its ability to generalize to different tasks (e.g., face recognition, scene recognition), different input modalities (e.g., RGB, NIR, and IR), and different sensor sizes/resolutions. For varying tasks and input modalities, the RGB-IR SCface face recognition dataset and the EPFL RGB-NIR scene recognition dataset are used. Unlike the task of face recognition, where the aim is to assign the face in a test image to one of the known subjects in a database, the goal of scene recognition is to classify the entire scene of the image. The SCface dataset was primarily designed for surveillance-based face recognition.”).
Allowable Subject Matter
Claims 6-8 and 15-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The prior art made of record, either singly or in combination, does not expressly disclose “machine learning model as a CNN, and a predefined noise set comprises noises with three different scales and the training data are labeled pathological images.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20200105377 A1 discloses “a machine learning model for classification or regression consisting of multiple layers of linear transformations followed by element-wise nonlinearities typically trained via stochastic gradient descent and back-propagation.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MIA M THOMAS whose telephone number is (571)270-1583. The examiner can normally be reached M-Th 8:30am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Edward (Ed) Urban, can be reached at 571-272-7899. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MIA M. THOMAS
Primary Examiner
Art Unit 2665



/MIA M THOMAS/
Primary Examiner
Art Unit 2665