Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after April 25, 2019, is being examined under the first inventor to file provisions of the AIA.
Claims 1-2, 4-9, 11-17, and 19-20 are pending.
Claims 3, 10, and 18 are cancelled.

Drawings
The amended drawing received on 3/10/2022 is acceptable.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.


Claims 4, 11, and 17 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which they depend, or for failing to include all the limitations of the claim upon which they depend. Each of these claims recites ‘wherein the plurality of features are identified using a reconstruction error value’. However, independent claims 1, 8, and 15, from which claims 4, 11, and 17 depend, already recite ‘identifying a plurality of features based on the reconstruction model, wherein the reconstruction model decodes the plurality of features by calculating a reconstruction error value’. The dependent claims therefore add no further limitation. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 15-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter, namely a signal per se.

	Regarding claim 15, the claim recites “a computer program product for identifying feature importance in deep learning models, comprising: one or more computer-readable tangible storage media and program instructions stored on at least one of the one or more computer-readable tangible storage media”. However, the specification fails to provide clear support for the claimed computer-readable tangible storage media. Without clear support for “computer-readable tangible storage media”, it is unclear whether applicant intends the claimed media to be broader than, e.g., RAM, ROM, CD-ROM, and disks, so as to cover signals, carrier waves, and other forms of transmission media. Therefore, the claim is not limited to statutory subject matter and is thus non-statutory.

	Claims 16-20 depend from claim 15, inherit all of its limitations, and are likewise non-statutory. Therefore, claims 16-20 are rejected.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	
Claims 1-2, 4-6, 8-9, 11-13, 15-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wojna (Wojna et al, 2017, “The Devil is in the Decoder”), in view of Jordan (Jordan, 03/2018, "Introduction to autoencoders"), in view of Do (Do et al, 2016, “Learning to Hash with Binary Deep Neural Network”), in view of Chen (US 20200334867 A1), and further in view of Qin (Qin et al, 2018, “Convolutional Neural Networks and Hash Learning for Feature Extraction and of Fast Retrieval of Pulmonary Nodules”).

Regarding claim 1, Wojna teaches a method for identifying feature importance in deep learning models, the method comprising: 
building a reconstruction model ([Wojna, 1 Introduction, Figure 1] “Models for such applications are usually composed of a feature extractor that decreases spatial resolution while learning high-dimensional representation and a decoder that recovers the original input resolution”); 
processing the output of the trained model using the reconstruction model ([Wojna, Figure 1] the output of the encoder (i.e., the trained model) is processed by the decoder (i.e., the reconstruction model)); 
wherein the reconstruction model is built using the convolution decoder ([Wojna, 2 Decoder Architecture, lines 9-13] “Decoder architectures were previously studied only in the context of the single problem: analyzes 5×5 transposed convolution and proposes equivalent convolutional operation but faster...”. As suggested by Wojna, using transposed convolution to build a decoder is common practice); 
Wojna does not specifically disclose intercepting an output of a trained prediction model at a bottleneck layer, computing one or more hash codes at the bottleneck layer to train a convolution decoder to map the one or more hash codes to one or more input features, identifying a plurality of features based on the reconstruction model, wherein the reconstruction model decodes the plurality of features by calculating a reconstruction error value, ranking an importance of the one or more input features, based on the calculated reconstruction error value produced by the reconstruction model, wherein a lower reconstruction error value signifies a greater importance of the one or more input features, and wherein a higher reconstruction error value signifies a lesser importance of the one or more input features.
Jordan teaches intercepting an output of a trained prediction model at a bottleneck layer ([Jordan, page 9, ‘Sparse autoencoders’, and the first figure] “Sparse autoencoders offer us an alternative method for introducing an information bottleneck without requiring a reduction in the number of nodes at our hidden layers. Rather, we'll construct our loss function such that we penalize activations within a layer”; Jordan discloses the bottleneck structure of generic autoencoders and the flow of data between the encoder (i.e., the trained prediction model) and the decoder (i.e., the reconstruction model));
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Jordan's interception of data at the bottleneck layer into the method of Wojna. The modification would have been obvious because it is well known in the art that an autoencoder has a bottleneck layer between its encoder and decoder, and that data is intercepted at the bottleneck layer and processed through the decoder (i.e., the reconstruction model), as suggested by Jordan. The motivation to do so is to increase the efficiency of unsupervised learning.
Wojna in view of Jordan does not specifically disclose computing one or more hash codes at the bottleneck layer to train a convolution decoder to map the one or more hash codes to one or more input features, identifying a plurality of features based on the reconstruction model, wherein the reconstruction model decodes the plurality of features by calculating a reconstruction error value, ranking an importance of the one or more input features, based on the calculated reconstruction error value produced by the reconstruction model, wherein a lower reconstruction error value signifies a greater importance of the one or more input features, and wherein a higher reconstruction error value signifies a lesser importance of the one or more input features.
Do teaches computing one or more hash codes at the bottleneck layer to train a convolution decoder to map the one or more hash codes to one or more input features ([Do, page 4, Fig 1] “Fig. 1. The illustration of our network (D = 4;L = 2). In our proposed network design, the outputs of layer n-1 are constrained to {-1,1} and are used as the binary codes. During training, these codes are used to reconstruct the input samples at the final layer”; the reconstruction layer in this figure is equated to the convolution decoder. [Do, page 2, second paragraph] “Furthermore, a good hashing method should produce binary codes with these properties [5]: (i) similarity preserving, i.e., (dis)similar inputs should likely have (dis)similar binary codes; (ii) independence, i.e., different bits in the binary codes are independent to each other; (iii) balance, i.e., each bit has a 50% chance of being 1 or -1”, which discloses that the hashing method produces binary codes; and [Do, page 3, 2.1 Formulation of UH-BDNN, line 5 – page 4, second paragraph, line 7] “Our idea is to learn the network such that the output values of the penultimate layer (layer n-1) can be used as the binary codes. We introduce constraints in the learning algorithm such that the output values at the layer n-1 have the following desirable properties: (i) belonging to {-1, 1}; (ii) similarity preserving; (iii) independent and (iv) balancing. Figure 1 illustrates our network for the case D = 4; L = 2 … The first term of (1) makes sure that the binary code gives a good reconstruction of X. It is worth noting that the reconstruction criterion has been used as an indirect way for preserving the similarity in state-of-the-art unsupervised hashing methods [6,22,21], i.e., it encourages (dis)similar inputs map to (dis)similar binary codes”); 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Wojna, Jordan, and Do, to use Do's computation of hash codes at the bottleneck layer to implement the feature importance identification method of Wojna and Jordan. The suggestion and/or motivation for doing so is to compress the input data that passes through the encoder.
Wojna in view of Jordan, and further in view of Do, does not specifically disclose identifying a plurality of features based on the reconstruction model, wherein the reconstruction model decodes the plurality of features by calculating a reconstruction error value, ranking an importance of the one or more input features, based on the calculated reconstruction error value produced by the reconstruction model, wherein a lower reconstruction error value signifies a greater importance of the one or more input features, and wherein a higher reconstruction error value signifies a lesser importance of the one or more input features.
Chen teaches identifying a plurality of features based on the reconstruction model, wherein the reconstruction model decodes the plurality of features by calculating a reconstruction error value ([Chen, 0059] “In some implementations, to generate identity-preserving face images, a feature reconstruction loss function may be used to train the third sub-network 330 so as to encourage the synthesized image x′ and the image $x^s$ … In some implementations, assuming that $f_C(x)$ represents the feature of the image x extracted from at least one layer of the fourth sub-network 340, the feature reconstruction loss function may be constructed to measure the difference between the feature extracted from the synthesized image $x^s$ and the feature extracted from the original image x′ ”); 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Wojna, Jordan, Do, and Chen, to use Chen's identification of a plurality of features based on a reconstruction loss to implement the method of identifying feature importance in deep learning models of Wojna, Jordan, and Do. The suggestion and/or motivation for doing so is to obtain the data (features) needed to measure feature importance before calculating it.
Wojna in view of Jordan, in view of Do, and further in view of Chen does not specifically teach ranking an importance of the one or more input features, based on the calculated reconstruction error value produced by the reconstruction model, wherein a lower reconstruction error value signifies a greater importance of the one or more input features, and wherein a higher reconstruction error value signifies a lesser importance of the one or more input features.
Qin teaches ranking an importance of the one or more input features, based on the calculated reconstruction error value produced by the reconstruction model, wherein a lower reconstruction error value signifies a greater importance of the one or more input features, and wherein a higher reconstruction error value signifies a lesser importance of the one or more input features ([Qin, page 524, 1st paragraph] “Fine Retrieval. Given a query image $I_q$ and candidate pool P, an image of top-K is retrieved and sorted in candidate pool P using the features extracted from FC1. Let $V_q$ represent eigenvector of the query image q, $V_i^P$ represent the image $I_i^c$ in the candidate pool P and the Euclidean distance on the corresponding eigenvector of the image i in $I_q$ and P Euclidean distance.
$dist = \lVert V_q - V_i^{P} \rVert$
The smaller the Euclidean distance, the more similar the images are. Each is sorted by similarity, so an image of top-K is retrieved”, [Qin, page 527, 1st paragraph] “The retrieval performance has also been evaluated using mean average precision. Fig. 9 and 10 show the retrieved images, of query image from normal and nodule classes respectively. The retrieval results are shown in a ranked order Top-10, where the most relevant image found after feature comparison is presented first. The retrieved results demonstrate the interclass variance”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Wojna, Jordan, Do, Chen, and Qin, to use Qin's ranking, in which a smaller error (distance) value is ranked as more relevant, to rank the reconstruction error values in the method of identifying feature importance in deep learning models of Wojna, Jordan, Do, and Chen. The suggestion and/or motivation for doing so is to measure and select the most important features among the data.

Regarding claim 8, Wojna in view of Jordan, in view of Do, in view of Chen, and further in view of Qin teaches a computer system for identifying feature importance in deep learning models, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage media, and program instructions stored on at least one of the one or more computer-readable tangible storage media for execution by at least one of the one or more processors via at least one of the one or more computer-readable memories, wherein the computer system is capable of performing a method ([Wojna, 3 Tasks and Experimental Setup] the tasks and experimental setup described by Wojna indicate that the entire experimental process was performed by a generic computer, which inherently includes a processor and memory for performing computer processes). Claim 8 is a system claim having limitations similar to those of method claim 1. Therefore, it is rejected using the same rationale as claim 1 above.

Regarding claim 15, Wojna in view of Jordan, in view of Do, in view of Chen, and further in view of Qin teaches a computer program product for identifying feature importance in deep learning models, comprising: one or more computer-readable tangible storage media and program instructions stored on at least one of the one or more computer-readable tangible storage media, the program instructions executable by a processor to cause the processor to perform a method ([Wojna, 3 Tasks and Experimental Setup] the tasks and experimental setup described by Wojna indicate that the entire experimental process was performed by a generic computer, which inherently includes a processor and memory for performing computer processes). Claim 15 is a computer program product claim having limitations similar to those of system claim 8. Therefore, it is rejected using the same rationale as claim 8 above.

Regarding claim 2, Wojna teaches the method further comprising: transforming the two datasets into two sets of spatial images ([Wojna, Depth To Space, page 4; Figure 4] “Depth to Space operation [30] (also called subpixel convolution) shifts the feature channels into the spatial domain as illustrated in Figure 4 … To be comparable with other upsampling layers which have learnable parameters, before depth to space transformation we are applying a convolution with four times more output channels than for other upsampling layers”; transforming the datasets into spatial images is the process of placing data into the spatial domain, and Wojna discloses shifting feature channels into the spatial domain); and training the prediction model using the two sets of spatial images ([Wojna, Depth To Space, page 4, and Figure 4] “Depth to Space operation [30] (also called subpixel convolution) shifts the feature channels into the spatial domain as illustrated in Figure 4 … To be comparable with other upsampling layers which have learnable parameters, before depth to space transformation we are applying a convolution with four times more output channels than for other upsampling layers”; Wojna discloses converting data into the spatial domain to process it through the encoder (i.e., the prediction model)).
Wojna does not specifically disclose receiving two datasets.
Wojna in view of Jordan, in view of Do, in view of Chen, and further in view of Qin teaches further comprising: receiving two datasets ([Jordan, page 13, Denoising autoencoders] “Another approach towards developing a generalizable model is to slightly corrupt the input data but still maintain the uncorrupted data as our target output”; Jordan receives two datasets, an input dataset and a target dataset, to develop a generalizable model. The claim language does not specify how the invention uses the two datasets).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Jordan's receiving of two datasets into the method of Wojna. The modification would have been obvious because it is well known in the art to use two datasets, namely input data and target data, as suggested by Jordan. The motivation to do so is to increase the accuracy of the prediction output of an unsupervised learning algorithm.
Claim 9 is a system claim having limitations similar to those of method claim 2. Therefore, it is rejected using the same rationale as claim 2 above.
Claim 16 is a computer program product claim having limitations similar to those of method claim 2. Therefore, it is rejected using the same rationale as claim 2 above.

Regarding claim 4, Wojna in view of Jordan, in view of Do, in view of Chen, and further in view of Qin teaches wherein the plurality of features are identified using a reconstruction error value ([Chen, 0059] “In some implementations, to generate identity-preserving face images, a feature reconstruction loss function may be used to train the third sub-network 330 so as to encourage the synthesized image x′ and the image $x^s$ … In some implementations, assuming that $f_C(x)$ represents the feature of the image x extracted from at least one layer of the fourth sub-network 340, the feature reconstruction loss function may be constructed to measure the difference between the feature extracted from the synthesized image $x^s$ and the feature extracted from the original image x′ ”).
Claim 11 is a system claim having limitations similar to those of method claim 4. Therefore, it is rejected using the same rationale as claim 4 above.
Claim 17 is a computer program product claim having limitations similar to those of method claim 4. Therefore, it is rejected using the same rationale as claim 4 above.

Regarding claim 5, Wojna in view of Jordan, in view of Do, in view of Chen, and further in view of Qin teaches wherein the prediction model includes a prediction model encoder and a prediction model decoder ([Wojna, 2 Decoder Architecture, line 1 and Figure 1] “Dense problems which require per pixel predictions are typically addressed with an encoder-decoder architecture (see Figure 1). First, a feature extractor downsamples the spatial resolution (usually by a factor 8-32) while increasing the number of channels. Afterward, a ‘decoder’ upsamples the representation back to the original input size”), wherein the prediction model encoder includes one or more inception blocks and the prediction model decoder includes one or more transposed convolutional layers.
Claim 12 is a system claim having limitations similar to those of method claim 5. Therefore, it is rejected using the same rationale as claim 5 above.
Claim 19 is a computer program product claim having limitations similar to those of method claim 5. Therefore, it is rejected using the same rationale as claim 5 above.

Regarding claim 6, Wojna in view of Jordan, in view of Do, in view of Chen, and further in view of Qin teaches wherein the two datasets include an input dataset and a target dataset ([Jordan, page 13, Denoising autoencoders] “Another approach towards developing a generalizable model is to slightly corrupt the input data but still maintain the uncorrupted data as our target output”; Jordan receives two datasets, a corrupted input and a target, and compares the output produced from the corrupted input with the target output to develop a generalizable model. The claim language does not specify how the invention uses the input and target data).
Claim 13 is a system claim having limitations similar to those of method claim 6. Therefore, it is rejected using the same rationale as claim 6 above.
Claim 20 is a computer program product claim having limitations similar to those of method claim 6. Therefore, it is rejected using the same rationale as claim 6 above.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Wojna (Wojna et al, 2017, “The Devil is in the Decoder”), in view of Jordan (Jordan, 03/2018, "Introduction to autoencoders"), in view of Do (Do et al, 2016, “Learning to Hash with Binary Deep Neural Network”), in view of Chen (US 20200334867 A1), in view of Qin (Qin et al, 2018, “Convolutional Neural Networks and Hash Learning for Feature Extraction and of Fast Retrieval of Pulmonary Nodules”), and further in view of Suwardi (Suwardi et al, 2015, “Geohash Index Based Spatial Data Model for Corporate”).

Regarding claim 7, Wojna in view of Jordan teaches the method of claim 2, wherein a spatial embedding is used to transform the dataset into the spatial image ([Wojna, Depth To Space, page 4; Figure 4] “Depth to Space operation [30] (also called subpixel convolution) shifts the feature channels into the spatial domain as illustrated in Figure 4 … ”; transforming the datasets into spatial images is the process of placing data into the spatial domain, and Wojna discloses shifting feature channels into the spatial domain), wherein the spatial image allows the dataset to conform to a convolutional encoder-decoder ([Wojna, page 10, 4 Results; Figure 4] “For depth prediction, all layers except bilinear upsampling have good performance, whereas adding skip layers to these results in equal performance except for depth-to-space, where it slightly lowers performance.”; the dataset is converted into the spatial domain to be processed by the depth predictor (i.e., an encoder-decoder)). Wojna fails to teach two datasets. 
However, Wojna in view of Jordan, in view of Do, in view of Chen, and further in view of Qin teaches two datasets ([Jordan, page 13, Denoising autoencoders] “Another approach towards developing a generalizable model is to slightly corrupt the input data but still maintain the uncorrupted data as our target output”; Jordan receives two datasets, a corrupted input and a target, and compares the output produced from the corrupted input with the target output to develop a generalizable model. The claim language does not specify how the invention uses the two datasets). 
Wojna in view of Jordan, in view of Do, in view of Chen, and further in view of Qin fails to teach wherein the spatial embedding uses geohashes as image pixels.
Suwardi teaches wherein the spatial embedding uses geohashes as image pixels ([Suwardi, page 4-5, V. Geohash Spatial Index; Figure 5] “In order to map the coordinate data into single shorter string, we use the geocoding algorithm called Geohash [10] … for example, the decimal coordinates is -45.995 -41.728, we can sub-dividing the space until we get the more detail level”; Suwardi discloses dividing a map into a plurality of divisions (pixels) and embedding geohash values into the map).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the geohash-based spatial embedding of Suwardi into the method of Wojna and Jordan. The motivation for doing so would be to map geometric information easily, which is a benefit of using Geohash as suggested by Suwardi.
Claim 14 is a system claim having limitations similar to those of method claim 7. Therefore, it is rejected using the same rationale as claim 7 above.
	

Response to Arguments
Applicant’s arguments filed 3/10/2022 have been fully considered but they are not persuasive.
Applicant argues that the 35 U.S.C. 101 rejection is inapplicable to the claims. This argument is not persuasive because it remains unclear whether the recited “tangible” storage media differ from transitory media. It is still unclear whether applicant intends the claimed media to be broader than, e.g., RAM, ROM, CD-ROM, and disks, so as to cover signals, carrier waves, and other forms of transmission media. Therefore, the claim is not limited to statutory subject matter and is thus non-statutory.
Applicant’s arguments with respect to the 35 U.S.C. 103 rejection of claims 1-2, 4-9, 11-17, and 19-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record.

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached on 7:30 AM - 5:30 PM. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/JUN KWON/
Examiner, Art Unit 2127
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126