DETAILED ACTION
This action is in response to communications filed on 11/01/2018 in which claims 1-20 are still
pending.
This action is non-final.

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Priority
Applicant’s claim for the benefit of prior-filed U.S. Provisional Application 62/553,177, filed on September 1, 2017, is acknowledged.

Drawings
The drawings were received on 08/31/2018.  These drawings are acceptable.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
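Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.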
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that use the word “means” or “step” but are nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph because the claim limitation(s) recite(s) sufficient structure, materials, or acts to entirely perform the recited function.  Such claim limitation(s) is/are listed below, where the generic placeholder is in bold and the functional language is italicized:
Claim 1:
an aggregator combining the plurality of indications based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input data; and 
an error module determining an error between the reconstruction and the input data, wherein the controller is responsive to the error to adjust the weighting vector to reduce the error.
The specification details the following as supporting structures and flow charts for performing the claimed functions:
Fig. 2 and Fig. 3 depict the aggregator and error module.
pg. 13: “…In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function…”
Because this/these claim limitation(s) is/are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.
If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to remove the structure, materials, or acts that performs the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.




Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. MPEP 2106.03: 
According to the first part of the Alice analysis, in the instant case, the claims were determined to be directed to one of the four statutory categories: an article of manufacture (claim 20), a method/process (claims 10-19), a machine/system/product (claims 1-9), and a composition of matter (none). Because the claims fall within one of the four categories (i.e., process, machine, manufacture, or composition of matter), Step 1 is satisfied.

Independent claims 1, 10, and 20:
Step 2A Prong One:  This part of the eligibility analysis evaluates whether the claim recites a judicial exception. 
Regarding independent claim 1, the claim recites a judicial exception: Mental process, see MPEP 2106.04(a)(2)(III), as analyzed below:
…processing the input dataset to each produce a respective one of a plurality of indications of an association of each of the plurality of data points …; (Mental process of making observations by noting input data, and of making judgements that produce indications for the observed data points through evaluations that associate each indication with each observed data point, e.g. in a set of 3 data points)
… provide a weighting vector including a plurality of weighting values each associated with a respective one of the plurality of indications; (Mental process of making evaluations and judgements to provide a set of weights in a vector associated with the plurality of judgement indicators, by evaluating the weighting values and their association with the indicators produced in the evaluation process)
… combining the plurality of indications based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input data; (Mental process of making evaluations and judgements to combine the determined indications, based on an evaluation using the weighting values as a vector, to reconstruct the input data of 3 points into a reconstructed order or evaluated opinion)
and … determining an error between the reconstruction and the input data, … is responsive to the error to adjust the weighting vector to reduce the error. (Mental process of making evaluations and judgements to provide an outcome that is evaluated and determined to reduce a noted error between the observations and the evaluated opinion)
The limitations as analyzed include concepts directed to the “mental process” groupings of abstract ideas; the "mental processes" abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions (see MPEP § 2106.04(a)(2), subsection III). Thus, limitations noted above also fall into the “mental process” groupings of abstract ideas.
Regarding claims 10 and 20:
Claim 10 limitations:
	processing an input dataset including a plurality of data points in each of a plurality of autoencoders, wherein each autoencoder produces a respective one of a plurality of indications of an association between each of the data points and one of a plurality of structures; producing a weighting vector responsive to the input dataset, wherein the weighting vector includes a plurality of weighting values and each of the plurality of weighting values is associated with a respective one of the plurality of indications; (Mental process of making observations by noting input data, of making judgements that produce indications for the observed data points through evaluations that associate each indication with each observed data point, e.g. in a set of 3 data points, and of evaluating the indications to produce weighting judgements)
combining the plurality of indications produced by the plurality of autoencoders based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input dataset; (Mental process of making evaluations and judgements to combine the determined indications, based on an evaluation using the weighting values as a vector, to reconstruct the input data of 3 points into a reconstructed order or evaluated opinion)
determining an error between the reconstruction and the input dataset; and adjusting the weighting vector to reduce the error. (Mental process of making evaluations and judgements to provide an outcome that is evaluated and determined to reduce a noted error between the observations and the evaluated opinion)
The limitations as analyzed include concepts directed to the “mental process” groupings of abstract ideas; the "mental processes" abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions (see MPEP § 2106.04(a)(2), subsection III). Thus, limitations noted above also fall into the “mental process” groupings of abstract ideas.
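For reference, the sequence of steps recited in claim 10 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the "autoencoders" are hypothetical fixed scaling functions, the error is a squared error, and the adjustment is a plain gradient step. None of these choices is taken from the application or from any cited reference.

```python
# Illustrative sketch only. The toy "autoencoders", the squared-error measure,
# and the gradient-step update are all hypothetical stand-ins.

def make_autoencoder(scale):
    """Return a toy 'autoencoder' that maps x to scale * x."""
    return lambda x: [scale * v for v in x]

def combine(indications, weights):
    """Weighted combination of the per-autoencoder outputs."""
    n = len(indications[0])
    return [sum(w * ind[i] for w, ind in zip(weights, indications))
            for i in range(n)]

def squared_error(reconstruction, x):
    """Squared error between the reconstruction and the input."""
    return sum((r - v) ** 2 for r, v in zip(reconstruction, x))

def adjust_weights(weights, indications, x, lr=0.01):
    """One gradient-descent step on the squared error w.r.t. the weights."""
    recon = combine(indications, weights)
    residual = [r - v for r, v in zip(recon, x)]
    grads = [2 * sum(res * val for res, val in zip(residual, ind))
             for ind in indications]
    return [w - lr * g for w, g in zip(weights, grads)]

x = [1.0, 2.0, 3.0]                              # input dataset of three data points
autoencoders = [make_autoencoder(0.5), make_autoencoder(1.5)]
indications = [ae(x) for ae in autoencoders]     # one indication per autoencoder
weights = [0.9, 0.1]                             # initial weighting vector

error_before = squared_error(combine(indications, weights), x)
for _ in range(50):
    weights = adjust_weights(weights, indications, x)
error_after = squared_error(combine(indications, weights), x)
```

In this toy setting the error shrinks on every step because the learning rate is small relative to the size of the inputs; a larger rate could overshoot and increase the error.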

Claim 20 limitations are similar to those rejected in claim 10 and are therefore rejected under the same rationale.

Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception.
The judicial exception is not integrated into a practical application. In particular, the claims recite the following additional limitations, analyzed below:

The preamble is deemed insufficient to transform the judicial exception into a patentable invention because the recited components (i.e. apparatus, non-transitory computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out a method) are recited at such a high level of generality that they represent no more than mere instructions to apply the judicial exception in a technology/computing environment, see MPEP 2106.05(f), or nothing more than an attempt to generally link the use of the judicial exception to the technology environment of a computer, see MPEP 2106.05(h). The MPEP explains that limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application. See MPEP 2106.06.
a plurality of autoencoders, each of the plurality of autoencoders receiving an input dataset including a plurality of data points and each of the plurality of autoencoders processing the input data… with a respective one of a plurality of structures; plurality of autoencoders, wherein each autoencoder produces a respective one of a plurality of indications … (Deemed insufficient to transform the judicial exception into a patentable invention because the recited components (i.e. a plurality of autoencoders, plurality of structures) are recited at such a high level of generality that they represent no more than mere instructions to apply the judicial exception in a technology/computing environment, see MPEP 2106.05(f), or nothing more than an attempt to generally link the use of the judicial exception to the technology environment of a computer, see MPEP 2106.05(h))
a controller responsive to the input dataset to provide … (Deemed insufficient to transform the judicial exception into a patentable invention because the recited components (i.e. controller) are recited at such a high level of generality that they represent no more than mere instructions to apply the judicial exception in a technology/computing environment, see MPEP 2106.05(f), or nothing more than an attempt to generally link the use of the judicial exception to the technology environment of a computer, see MPEP 2106.05(h))
an aggregator combining … (Deemed insufficient to transform the judicial exception into a patentable invention because the recited components (i.e. aggregator) are recited at such a high level of generality that they represent no more than mere instructions to apply the judicial exception in a technology/computing environment, see MPEP 2106.05(f), or nothing more than an attempt to generally link the use of the judicial exception to the technology environment of a computer, see MPEP 2106.05(h))
…an error module… wherein the controller is responsive … (Deemed insufficient to transform the judicial exception into a patentable invention because the recited components (i.e. error module and controller) are recited at such a high level of generality that they represent no more than mere instructions to apply the judicial exception in a technology/computing environment, see MPEP 2106.05(f), or nothing more than an attempt to generally link the use of the judicial exception to the technology environment of a computer, see MPEP 2106.05(h)).
When viewed in combination or as a whole, the recited additional elements do no more than automate the mental process, including evaluating and making judgements on dataset observations, as recited in the judicial exception, using the computer components/technology environment as a tool.
Thus, independent claims 1, 10, and 20 are directed to an abstract idea.

Step 2B:  This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. MPEP 2106.05.
Regarding independent claims 1, 10, and 20, the limitations do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as discussed above. 
The additional elements are recited at such a high level of generality that they represent no more than mere instructions to apply the judicial exception in a technology/computing environment, see MPEP 2106.05(f), or nothing more than an attempt to generally link the use of the judicial exception to the technology environment of a computer, see MPEP 2106.05(h), as noted in Step 2A Prong Two. Mere instructions to apply an exception using generic computer components or generally linking the exception to a field of use cannot provide an inventive concept. 
Thus, independent claims 1, 10, and 20, as examined individually and as an ordered combination (e.g. as a whole), do not recite what the courts have identified as “significantly more”.

Furthermore, regarding dependent claims 2-9 and 11-19, which depend on claims 1 and 10 respectively, the claims are further directed to a judicial exception (i.e. an abstract idea enumerated in the 2019 PEG, a law of nature, or a natural phenomenon) without significantly more, as highlighted below by evaluating the claim limitations under Steps 2A and 2B:

Regarding claim 2 limitation(s): wherein the input dataset represents a mixture of a plurality of features, each of the plurality of features including at least one of the plurality of structures; and wherein each of the plurality of autoencoders is configured to produce a respective one of the plurality of indications to indicate an association of one or more of the data points of the input dataset with a respective one of the plurality of structures. (Mental process for producing indications as judgements associated with observations in the data)
Incorporates abstract idea recited in claim 1.
The additional limitations simply link the judicial exception to a field of use (e.g. input dataset represents a mixture of a plurality of features, each of the plurality of features including at least one of the plurality of structures; the plurality of autoencoders is configured to produce a respective… indications … with a respective one of the plurality of structures) and/or technology environment, see MPEP 2106.05(h); limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Regarding claim 3 limitation(s): wherein the controller comprises a machine learning network responsive to the input dataset and the error to learn the association of the plurality of data points of the input dataset with the plurality of structures and to process a second dataset including a second plurality of data points to assign each of the second plurality of data points to one of a plurality of data clusters representing a sparse representation of the second dataset. (Mental process for producing indications as judgements associated with observations in the data and make evaluations and judgements regarding data associations with observed structures)
Incorporates abstract idea recited in claim 1.
The additional limitations simply link the judicial exception to a field of use (e.g. controller comprises a machine learning network; the second plurality of data points to one of a plurality of data clusters representing a sparse representation of the second dataset) and/or technology environment, see MPEP 2106.05(h); limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
The limitation “a machine learning network responsive to the input dataset and the error to learn…” is deemed mere instructions to apply the judicial exception, see MPEP 2106.05(f); limitations directed to mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Regarding claim 4 limitation(s): wherein the machine learning network comprises a convolutional neural network.
Incorporates abstract idea recited in claim 3.
The additional limitations simply link the judicial exception to a field of use (e.g. wherein the machine learning network comprises a convolutional neural network) and/or technology environment, see MPEP 2106.05(h); limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Regarding claim 5 limitation(s): wherein the machine learning network comprises a deep learning network with a softmax output.
Incorporates abstract idea recited in claim 3.
The additional limitations simply link the judicial exception to a field of use (e.g. the machine learning network comprises a deep learning network with a softmax output) and/or technology environment, see MPEP 2106.05(h); limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
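For context on the softmax output recited in claim 5, the sketch below shows how a softmax function turns arbitrary scores into values that are positive and sum to one. The input scores are hypothetical, and treating the softmax output as the weighting values is an assumption made for illustration rather than a statement of how the application implements it.

```python
import math

def softmax(scores):
    """Map raw scores to positive values that sum to one."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores, one per autoencoder.
weights = softmax([2.0, 0.5, -1.0])
```

The largest score receives the largest weight, and every weight stays strictly between zero and one.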
Regarding claim 6 limitation(s): wherein each of the plurality of weighting values of the weighting vector corresponds to a weight to be applied to a respective one of the plurality of indications produced by the plurality of autoencoders. (Mental process for making evaluations for considering weights for a collection of observation models to consider in a plurality of model observations as a collective set)
Incorporates abstract idea recited in claim 3.
The additional limitations simply recite mere instructions to apply the judicial exception (e.g. each of the plurality of weighting values … to be applied to a respective one of the plurality of indications produced by the plurality of autoencoders), see MPEP 2106.05(f); limitations directed to mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Regarding claim 7 limitation(s): wherein one of the plurality of weighting values of the weighting vector has a value of one and the other ones of the plurality of weighting values have values of zero, thereby providing a one-hot vector as the weighting vector. (Mental process for making evaluations for considering weights for a collection of observations using values of zero)
Incorporates abstract idea recited in claim 1.
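To make the one-hot weighting vector of claim 7 concrete, the sketch below shows that a weighted combination with a one-hot vector simply selects one of the indications. The helper names and the numeric indications are hypothetical and chosen only for illustration.

```python
def one_hot(index, length):
    """Weighting vector with a one at `index` and zeros elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(length)]

def combine(indications, weights):
    """Weighted combination of the indications."""
    n = len(indications[0])
    return [sum(w * ind[i] for w, ind in zip(weights, indications))
            for i in range(n)]

# Hypothetical indications produced by three autoencoders.
indications = [[0.5, 1.0, 1.5], [1.5, 3.0, 4.5], [1.0, 2.0, 3.0]]
weights = one_hot(2, len(indications))
reconstruction = combine(indications, weights)   # equals indications[2]
```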

Regarding claim 8 limitation(s): wherein the apparatus is operative to provide functions comprising semi-supervised classification, representation learning, and unsupervised clustering.
Incorporates abstract idea recited in claim 1.
The additional limitations simply link the judicial exception to a field of use (e.g. the apparatus is operative to provide functions comprising semi-supervised classification, representation learning, and unsupervised clustering) and/or technology environment, see MPEP 2106.05(h); limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Regarding claim 9 limitation(s): further comprising a processor processing information to be provided to a user responsive to the sparse representation to adapt the information provided to the user. (Mental process for evaluations and judgements used to adapt information made as observations)
Incorporates abstract idea recited in claim 1.
The limitation “comprising a processor processing information to be provided to a user responsive to … provided to the user” simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h); limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Regarding claim 13 limitation(s): wherein producing the weighting vector, learning the association and processing the second dataset occur in a machine learning network. (Mental process for evaluations and judgements used to adapt information made as observations of weighing vectors)
Incorporates abstract idea recited in claim 10.
The additional limitations simply link the judicial exception to a field of use (e.g. learning the association and processing the second dataset occur in a machine learning network) and/or technology environment, see MPEP 2106.05(h); limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Regarding claims 11-12, and 14-19, the limitations are similar to claims 2-9 and are therefore rejected under the same rationale.
Regarding dependent claims 2-9 and 11-19, the limitations do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as discussed above. 


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 6, 8-13, 16 and 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Miotto et al. (US Pub. No. 2020/0327404, hereinafter ‘Ric’).

Regarding independent claim 1, Ric teaches an apparatus comprising:
a plurality of autoencoders, each of the plurality of autoencoders receiving an input dataset including a plurality of data points and each of the plurality of autoencoders processing the input dataset to each produce a respective one of a plurality of indications of an association of each of the plurality of data points with a respective one of a plurality of structures; (Ric teaches as depicted in Fig. 1 and Figs. 6A-B, in 0025-0033: … The plurality of sparse vectors is applied to a network architecture that includes a plurality of denoising autoencoders and a post processor engine. The plurality of denoising autoencoders [i.e. a plurality of autoencoders, each of the plurality of autoencoders receiving an input dataset including a plurality of data points and each of the plurality of autoencoders processing the input dataset] includes an initial denoising autoencoder and a final denoising autoencoder. Responsive to a respective sparse vector in the plurality of sparse vectors, the initial denoising autoencoder receives as input the elements in the respective sparse vector. Each respective denoising autoencoder, other than the final denoising autoencoder, feeds intermediate values, as an instance of a function of (i) a weight coefficient matrix and bias vector associated with the respective denoising autoencoder and (ii) input values received by the respective denoising autoencoder, into another denoising autoencoder in the plurality of denoising autoencoders… a network architecture 64 that includes a plurality of denoising autoencoders, each respective denoising autoencoders 66 in the plurality of denoising autoencoders having input values 68, a function 70, and output values 72 [i.e. each produce a respective one of a plurality of indications of an association of each of the plurality of data points with a respective one of a plurality of structures]…)
a controller responsive to the input dataset to provide a weighting vector including a plurality of weighting values each associated with a respective one of the plurality of indications; (Ric teaches in 0062: … To accomplish this, the sparse vector 60 representation of the test entity is obtained and run through the network architecture 64, each denoising autoencoder 66 of which now has its weight coefficient matrix W [i.e. a controller responsive to the input dataset to provide a weighting vector including a plurality of weighting values each associated with a respective one of the plurality of indications] and bias vector b trained from the initial plurality of entities. This results in a dense vector corresponding to the test entity which can be applied to the trained post processor engine to predict a future change in a value for the feature in a test entity. Examiner notes controller as instructions for performing claimed functions, in 0035: In some implementations, one or more of the above identified data elements or modules of the analysis computer system 100 are stored in one or more of the previously disclosed memory devices, and correspond to a set of instructions for performing a function described above. The above identified data, modules or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations [i.e. a controller responsive to the input dataset] …; and claim 1)
an aggregator combining the plurality of indications based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input data; and (Ric teaches the instructions for claimed combining process as depicted in Figs 2C-D, and in 0057:  …The latent representation  
    PNG
    media_image1.png
    37
    23
    media_image1.png
    Greyscale
 is then mapped back (with a decoder) to a reconstructed vector 
    PNG
    media_image2.png
    32
    91
    media_image2.png
    Greyscale
. Referring to element 242 of FIG. 2C, in some embodiments, the recon- structed vector 
    PNG
    media_image3.png
    34
    33
    media_image3.png
    Greyscale
 has the form: 
    PNG
    media_image4.png
    37
    203
    media_image4.png
    Greyscale
…Accordingly, responsive to a respective sparse vector in the plurality of sparse vectors, the initial denoising autoencoder in the network architecture 64 receives as input the elements in the respective sparse vector. Each respective denoising autoencoder 66, other than the final denoising autoencoder, feeds intermediate values, as a function of (i) [i.e. an aggregator combining the plurality of indications based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input data]… the weight coefficient matrix 
    PNG
    media_image5.png
    36
    240
    media_image5.png
    Greyscale
associated with the respective denoising autoencoder [i.e. the plurality of indications based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input data] and (ii) input values received by the respective denoising autoencoder, into another denoising autoencoder 66 in the plurality of denoising autoencoders . In some embodiments, this function is 
    PNG
    media_image6.png
    54
    216
    media_image6.png
    Greyscale

    PNG
    media_image7.png
    30
    507
    media_image7.png
    Greyscale
expectation is that the code 
    PNG
    media_image8.png
    27
    22
    media_image8.png
    Greyscale
 is a distributed representation that captures the coordinates along the main factors of variation in the data. Accordingly, responsive to a respective sparse vec­tor in the plurality of sparse an aggregator combining the plurality of indications based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input data], as a function of (i)  the weight coefficient matrix W and bias vector b associated with the respective denoising autoencoder and (ii) input values received by the respective denoising autoencoder, into another denoising autoencoder 66 in the plurality of denoising autoencoders…; And using claimed reconstriction of the input data (autoencoder reconstruction and use of  encoding function as wherein the weighted combination corresponds to a reconstruction of the input data), in 0011…: 
[image: media_image9.png]
)
an error module determining an error between the reconstruction and the input data, (Ric teaches the use of a loss function for determining an error between the reconstruction and the input data, as in 
[image: media_image9.png]
 )
wherein the controller is responsive to the error to adjust the weighting vector to reduce the error. (in 0060-0062: 
[image: media_image10.png]
 over the input sparse vectors 60, which constitute a training set, to minimize the average reconstruction error [i.e. wherein the controller is responsive to the error to adjust the weighting vector to reduce the error], …. The trained post processor engine can then be used to predict a future change in a value for the feature in a test entity. To accomplish this, the sparse vector 60 representation of the test entity is obtained and run through the network archi­tecture 64, each denoising autoencoder 66 of which now has its weight coefficient matrix 
[image: media_image11.png]
 [i.e. wherein the controller is responsive to the error to adjust the weighting vector to reduce the error.] trained from the initial plurality of entities.)
(Examiner notes that the controller, aggregator, and error module are interpreted as instructions for performing the respective claimed functions, in 0028-0035: Turning to FIG. 1 with the foregoing in mind, an analysis computer system 100 comprises one or more processing units (CPU's) 74, a network or other communications interface 84, a user interface (e.g., including a display 82 and keyboard 80 or other form of input device) a memory 92 (e.g., random access memory), one or more magnetic disk storage and/or persistent devices 90 optionally accessed by one or more controllers 88 [i.e. controller], one or more… In other words, some data stored in memory 92 and/or memory 90 may in fact be hosted on computers that are external to analysis computer system 100 but that can be electronically accessed by the analysis computer system over an Internet, intranet, or other form of network or electronic cable using network interface 84…: In some implementations, one or more of the above identified data elements or modules of the analysis computer system 100 are stored in one or more of the previously disclosed memory devices, and correspond to a set of instructions [i.e. a controller responsive to the input dataset; and an aggregator combining the plurality of indications; and an error module] for performing a function described above. The above identified data, modules or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations [i.e. a controller responsive to the input dataset; and an aggregator combining the plurality of indications; and an error module] …; and claim 1)

Regarding claim 2, the rejection of claim 1 is incorporated and Ric further teaches the apparatus of claim 1 wherein the controller comprises a machine learning network responsive to the input dataset and the error to learn the association of the plurality of data points of the input dataset with the plurality of structures and to process a second dataset including a second plurality of data points to assign each of the second plurality of data points to one of a plurality of data clusters representing a sparse representation of the second dataset.
 ( Ric teaches as depicted in Figs. 2C-D and, in 0060-0062: 
[image: media_image10.png]
 over the input sparse vectors 60, which constitute a training set, to minimize the average reconstruction error [i.e. wherein the controller comprises a machine learning network responsive to the input dataset and the error to learn the association of the plurality of data points of the input dataset with the plurality of structures],… Referring to element 252 of FIG. 2D, in some embodiments, optimization is carried out by mini-batch stochastic gradient descent, which iterates through small subsets of the training patients and modifies the parameters in the opposite direction of the gradient of the loss function to minimize the reconstruction error; …Referring to element 254 of FIG. 2E, the plurality of dense vectors is provided to a post processor engine 68. Each dense vector corresponds to an entity 58 with some known features. Thus, the plurality of dense vectors can be used to train the post processor engine 68 to predict a future change in a value for a feature, or combination of features. The trained post processor engine can then be used to predict a future change in a value for the feature in a test entity. To accomplish this, the sparse vector 60 representation [i.e. and to process a second dataset including a second plurality of data points to assign each of the second plurality of data points to one of a plurality of data clusters representing a sparse representation of the second dataset] of the test entity is obtained and run through the network architecture 64, each denoising autoencoder 66 of …In some embodiments, the future change in the value for the feature in a test entity is the onset of a predetermined disease or other clinical indication in a predetermined time frame (e.g., the next three months, the next six months, the next year, etc.) [Ric teaches, as depicted in Figs. 3A-B, clusters associated with a plurality of predetermined diseases in a time frame as including the claimed second plurality of data points to assign each of the second plurality of data points to one of a plurality of data clusters representing a sparse representation of the second dataset]. Examples of predetermined diseases include, but are not limited to, the diseases listed in FIG. 3. In such embodiments, the value is binary and changes, for instance, from zero (does not exhibit the disease) to one (exhibits the disease) ...).)
	

	Regarding claim 3, the rejection of claim 1 is incorporated and Ric further teaches the apparatus of claim 1, wherein the controller comprises a machine learning network responsive to the input dataset and the error to learn the association of the plurality of data points of the input dataset with the plurality of structures and to process a second dataset including a second plurality of data points to assign each of the second plurality of data points to one of a plurality of data clusters representing a sparse representation of the second dataset. (Ric teaches as depicted in Figs. 2C-D and, in 0060-0062: 
[image: media_image10.png]
 over the input sparse vectors 60, which constitute a training set, to minimize the average reconstruction error [i.e. wherein the controller comprises a machine learning network responsive to the input dataset and the error to learn the association of the plurality of data points of the input dataset with the plurality of structures],… Referring to element 252 of FIG. 2D, in some embodiments, optimization is carried out by mini-batch stochastic gradient descent, which iterates through small subsets of the training patients and modifies the parameters in the opposite direction of the gradient of the loss function to minimize the reconstruction error; …Referring to element 254 of FIG. 2E, the plurality of dense vectors is provided to a post processor engine 68. Each dense vector corresponds to an entity 58 with some known features. Thus, the plurality of dense vectors can be used to train the post processor engine 68 to predict a future change in a value for a feature, or combination of features. The trained post processor engine can then be used to predict a future change in a value for the feature in a test entity. To accomplish this, the sparse vector 60 representation [i.e. and to process a second dataset including a second plurality of data points to assign each of the second plurality of data points to one of a plurality of data clusters representing a sparse representation of the second dataset] of the test entity is obtained and run through the network architecture 64, each denoising autoencoder 66 of …In some embodiments, the future change in the value for the feature in a test entity is the onset of a predetermined disease or other clinical indication in a predetermined time frame (e.g., the next three months, the next six months, the next year, etc.) [Ric teaches, as depicted in Figs. 3A-B, clusters associated with a plurality of predetermined diseases in a time frame as including the claimed second dataset including a second plurality of data points to assign each of the second plurality of data points to one of a plurality of data clusters representing a sparse representation of the second dataset]. Examples of predetermined diseases include, but are not limited to, the diseases listed in FIG. 3. In such embodiments, the value is binary and changes, for instance, from zero (does not exhibit the disease) to one (exhibits the disease) ...)
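
For illustration only, the mini-batch stochastic gradient descent quoted above from Ric 0060-0062 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the reference: the tied-weight linear autoencoder with a single hidden unit, the squared-error loss, the learning rate, and the names `reconstruct`, `reconstruction_error`, `batch_gradient`, `train`, `DATA` and `W0` are all hypothetical, since the loss function of the reference appears in this action only as an image.

```python
import random

def reconstruct(w, x):
    """Encode x to a scalar code h = w . x, then decode to h * w (tied weights)."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return [h * wi for wi in w]

def reconstruction_error(w, data):
    """Mean squared reconstruction error over a list of input vectors."""
    total = 0.0
    for x in data:
        z = reconstruct(w, x)
        total += sum((zi - xi) ** 2 for zi, xi in zip(z, x))
    return total / len(data)

def batch_gradient(w, batch):
    """Gradient of the mean squared reconstruction error over one mini-batch."""
    grad = [0.0] * len(w)
    for x in batch:
        h = sum(wi * xi for wi, xi in zip(w, x))
        r = [h * wi - xi for wi, xi in zip(w, x)]  # residual: reconstruction minus input
        rw = sum(ri * wi for ri, wi in zip(r, w))
        for k in range(len(w)):
            grad[k] += 2.0 * (x[k] * rw + h * r[k])
    return [g / len(batch) for g in grad]

def train(data, w, lr=0.02, epochs=200, batch_size=2, seed=0):
    """Iterate over small shuffled subsets and step against the gradient."""
    rng = random.Random(seed)
    w = list(w)
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            g = batch_gradient(w, batch)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical training set: points lying on a single line through the origin.
DATA = [[t, 2.0 * t] for t in (-1.0, -0.5, -0.25, 0.25, 0.5, 1.0)]
W0 = [0.5, 0.5]  # hypothetical initial weights
```

Calling `train(DATA, W0)` returns weights whose reconstruction error on `DATA` is lower than that of `W0`, which is the "adjust the weights to reduce the error" behavior the mapping relies on.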

Regarding claim 6, the rejection of claim 1 is incorporated and Ric further teaches the apparatus of claim 1, wherein each of the plurality of weighting values of the weighting vector corresponds to a weight to be applied to a respective one of the plurality of indications produced by the plurality of autoencoders. (Ric teaches as depicted in Figs. 2B-C; and 
[image: media_image12.png]
[i.e. each of the plurality of weighting values of the weighting vector corresponds to a weight to be applied to a respective one of the plurality of indications produced by the plurality of autoencoders.] in 0057
[image: media_image13.png]
 )

Regarding claim 8, the rejection of claim 1 is incorporated and Ric further teaches the apparatus of claim 1, wherein the apparatus is operative to provide functions comprising semi-supervised classification, representation learning, and unsupervised clustering. (in 0078: Referring to FIG. 1, a deep neural network archi­tecture 64 comprising a stack of denoising autoencoders 66 was used to process EHRs in an unsupervised manner that captured stable structures and regular patterns in the data, which, grouped together, compose the deep patient repre­sentation. Deep patient is domain free (i.e., not related to any specific task), does not require any additional human effort, and can be easily applied to different predictive applications, both supervised and unsupervised [i.e. wherein the apparatus is operative to provide functions comprising semi-supervised classification, representation learning, and unsupervised clustering ]; And in 0069: Referring to element 256 of FIG. 2E, for purposes of training the post processor engine 68, in some embodi­ments the post processor engine 68 subjects the plurality of dense vectors to a random forest classifier, a decision tree, a multiple additive regression tree, a clustering algorithm [i.e. representation learning, and unsupervised clustering], a principal component analysis, a nearest neighbor analysis [i.e. representation learning], a linear discriminant analysis, a quadratic discriminant analysis, a support vector machine [i.e. representation learning] , an evolutionary method, a projection pursuit, or ensembles thereof…; And in 0098: K-means groups unlabeled data into k clusters [i.e. representation learning, and unsupervised clustering], in such a way that each data point belongs to the cluster with the closest mean. In feature learning, the centroids of the cluster are used to produce features, i.e., each feature value is the distance of the data point from the corresponding cluster centroid.)
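
For illustration only, the K-means feature learning quoted above from Ric 0098 (each feature value is the distance of the data point from the corresponding cluster centroid) can be sketched as follows. This is a minimal sketch assuming Euclidean distance and centroids that have already been computed; the function names and sample values are hypothetical and do not come from the reference.

```python
import math

def centroid_distance_features(point, centroids):
    """Return one feature per centroid: the Euclidean distance from the point to it."""
    return [math.dist(point, c) for c in centroids]

def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to the point (its cluster assignment)."""
    features = centroid_distance_features(point, centroids)
    return min(range(len(features)), key=features.__getitem__)
```

For example, with hypothetical centroids `[[3, 4], [0, 1]]`, the point `[0, 0]` yields the features `[5.0, 1.0]` and is assigned to cluster index 1.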

Regarding claim 9, the rejection of claim 1 is incorporated and Ric further teaches the apparatus of claim 1, further comprising a processor processing information to be provided to a user responsive to the sparse representation to adapt the information provided to the user. (Ric teaches in 0052: Referring to FIG. 2A at 218, in some embodiments, each respective entity in the plurality of entities is a respec­tive human subject, and an element in each sparse vector 60 in the plurality of sparse vectors represents a presence or absence of a diagnosis [i.e. a processor processing information to be provided to a user responsive to the sparse representation to adapt the information provided to the user], where the diagnosis is one of a plurality of general disease definitions ( e.g., between 50 and 150 disease definitions) that is identified by the ICD code in the medical record. Such embodiments are advantageous because different codes can refer to the same disease. Thus, in one specific embodiment, ICD codes in medical records are mapped to the codes in a disease categorization structure which groups ICD-9s into a vocabulary of general disease definitions [i.e. a processor processing information to be provided to a user responsive to the sparse representation to adapt the information provided to the user] … Accordingly, in some embodiments, each sparse vector 60 includes an element for each of the diseased provided in FIG. 3. Referring to element 220 of FIG. 2B, in some embodiments, each respective entity 58 in the plurality of entities is a respective human subject [i.e. a processor processing information to be provided to a user responsive to the sparse representation to adapt the information provided to the user]. Further, each respec­tive human subject is associated with one or more medical records. 
An element in a first sparse vector 60 in the plurality of sparse vectors corresponds to a free text clinical note in a medical record of the human subject corresponding to the first sparse vector [i.e. a processor processing information to be provided to a user responsive to the sparse representation to adapt the information provided to the user].)
Examiner notes process including instructions for performing respective claimed functions, in 0028-0035: Turning to FIG. 1 with the foregoing in mind, an analysis computer system 100 comprises one or more pro­cessing units (CPU's) 74  [i.e. a processor processing information], a network or other communica­tions interface 84, a user interface ( e.g., including a display 82 and keyboard 80 or other form of input device) a memory 92 (e.g., random access memory), one or more magnetic disk storage and/or persistent devices 90 optionally accessed by one or more controllers 88, one or more… In other words, some data stored in memory 92 and/or memory 90 may in fact be hosted on computers that are external to analysis computer system 100 but that can be electronically accessed by the analysis computer system over an Internet, intranet, or other form of network or electronic cable using network interface 84… In some implementations, one or more of the above identified data elements or modules of the analysis computer system 100 [i.e. a processor processing information] are stored in one or more of the previously disclosed memory devices, and correspond to a set of instructions for performing a function described above. The above identified data, modules or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations …)

Regarding independent claim 10, Ric teaches a method comprising:
processing an input dataset including a plurality of data points in each of a plurality of autoencoders, wherein each autoencoder produces a respective one of a plurality of indications of an association between each of the data points and one of a plurality of structures; (Ric teaches as depicted in Fig. 1 and Figs. 6A-B, in 0025-0033: … The plurality of sparse vectors is applied to a network architecture that includes a plurality of denoising autoen­coders and a post processor engine. The plurality of denois­ing autoencoders [i.e. processing an input dataset including a plurality of data points in each of a plurality of autoencoders, wherein each autoencoder produces a respective one of a plurality of indications of an association between each of the data points and one of a plurality of structures] includes an initial denoising autoencoder and a final denoising autoencoder. Responsive to a respec­tive sparse vector in the plurality of sparse vectors, the initial denoising autoencoder receives as input the elements in the respective sparse vector. Each respective denoising autoen­coder, other than the final denoising autoencoder, feeds intermediate values, as an instance of a function of (i) a weight coefficient matrix and bias vector associated with the respective denoising autoencoder and (ii) input values received by the respective denoising autoencoder, into another denoising autoencoder in the plurality of denoising autoencoders…a network architecture 64 that includes a plural­ity of denoising autoencoders, each respective denois­ing autoencoders 66 in the plurality of denoising auto­encoders having input values 68, a function 70, and output values 72 [i.e. wherein each autoencoder produces a respective one of a plurality of indications of an association between each of the data points and one of a plurality of structures]… )
producing a weighting vector responsive to the input dataset, wherein the weighting vector includes a plurality of weighting values and each of the plurality of weighting values is associated with a respective one of the plurality of indications; (Ric teaches in 0062: … To accomplish this, the sparse vector 60 representation of the test entity is obtained and run through the network archi­tecture 64, each denoising autoencoder 66 of which now has its weight coefficient matrix W [i.e. producing a weighting vector responsive to the input dataset, wherein the weighting vector includes a plurality of weighting values and each of the plurality of weighting values is associated with a respective one of the plurality of indications] and bias vector b trained from the initial plurality of entities [i.e. producing a weighting vector responsive to the input dataset]. This results in a dense vector corresponding to the test entity which can be applied to the trained post processor engine to predict a future change in a value for the feature in a test entity)
combining the plurality of indications produced by the plurality of autoencoders based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input dataset; (Ric teaches the instructions for claimed combining process as depicted in Figs 2C-D, and in 0057:  …The latent representation  
[image: media_image1.png]
 is then mapped back (with a decoder) to a reconstructed vector 
[image: media_image2.png]
. Referring to element 242 of FIG. 2C, in some embodiments, the reconstructed vector 
[image: media_image3.png]
 has the form: 
[image: media_image4.png]
…Accordingly, responsive to a respective sparse vector in the plurality of sparse vectors, the initial denoising autoencoder in the network architecture 64 receives as input the elements in the respective sparse vector. Each respective denoising autoencoder 66, other than the final denoising autoencoder, feeds intermediate values, as a function of (i) [i.e. combining the plurality of indications produced by the plurality of autoencoders based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input dataset]… the weight coefficient matrix 
[image: media_image5.png]
associated with the respective denoising autoencoder [i.e. combining the plurality of indications produced by the plurality of autoencoders based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input dataset] and (ii) input values received by the respective denoising autoencoder, into another denoising autoencoder 66 in the plurality of denoising autoencoders . In some embodiments, this function is 
[image: media_image6.png]
[image: media_image7.png]
expectation is that the code 
[image: media_image8.png]
 is a distributed representation that captures the coordinates along the main factors of variation in the data. Accordingly, responsive to a respective sparse vector in the plurality of sparse vectors, the initial denoising autoencoder in the network architecture 64 receives as input the elements in the respective sparse vector. Each respective denoising autoencoder 66, other than the final denoising autoencoder, feeds intermediate values [i.e. combining the plurality of indications produced by the plurality of autoencoders based on the weighting vector to provide a weighted combination of the plurality of indications, wherein the weighted combination corresponds to a reconstruction of the input dataset], as a function of (i) the weight coefficient matrix W and bias vector b associated with the respective denoising autoencoder and (ii) input values received by the respective denoising autoencoder, into another denoising autoencoder 66 in the plurality of denoising autoencoders…; and using the claimed reconstruction of the input data (autoencoder reconstruction and use of the encoding function as the claimed "wherein the weighted combination corresponds to a reconstruction of the input data"), in 0011…: 
[image: media_image9.png]
)
determining an error between the reconstruction and the input dataset; (Ric teaches the use of a loss function for determining an error between the reconstruction and the input dataset, as depicted in Figs. 2C-D and in 0011: …
[image: media_image9.png]
 )
adjusting the weighting vector to reduce the error. (in 0060-0062: 
[image: media_image10.png]
 over the input sparse vectors 60, which constitute a training set, to minimize the average reconstruction error [i.e. adjusting the weighting vector to reduce the error], …. The trained post processor engine can then be used to predict a future change in a value for the feature in a test entity. To accomplish this, the sparse vector 60 representation of the test entity is obtained and run through the network archi­tecture 64, each denoising autoencoder 66 of which now has its weight coefficient matrix 
[image: media_image11.png]
 [i.e. adjusting the weighting vector to reduce the error.] trained from the initial plurality of entities.)
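
For illustration only, the stacked arrangement quoted above, in which each denoising autoencoder feeds intermediate values into the next as a function of its weight coefficient matrix W, its bias vector b, and its input values, can be sketched as follows. This is a minimal sketch under stated assumptions: the sigmoid activation is assumed, because the function of the reference appears in this action only as an image, and the names `sigmoid`, `encode_layer` and `encode_stack` are hypothetical.

```python
import math

def sigmoid(v):
    """Logistic function, assumed here as the layer nonlinearity."""
    return 1.0 / (1.0 + math.exp(-v))

def encode_layer(W, b, x):
    """Compute one layer's intermediate values s(W x + b) for input vector x."""
    return [
        sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]

def encode_stack(layers, x):
    """Feed x through each (W, b) pair in order; the last output is the dense vector."""
    for W, b in layers:
        x = encode_layer(W, b, x)
    return x
```

With hypothetical all-zero weights and biases, every layer outputs 0.5 for each unit, so a 4-element input passed through layers of width 3 and 2 yields `[0.5, 0.5]`.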

Regarding claims 11-12, the claims recite method limitations similar to those of claims 2-3, respectively, and are rejected under the same rationale. 

Regarding claim 13, the rejection of claim 10 is incorporated and Ric further teaches the method of claim 10, wherein producing the weighting vector, learning the association and processing the second dataset occur in a machine learning network. (Ric teaches as depicted in Figs. 2C-D and, in 0060-0062: ….,… Referring to element 252 of FIG. 2D, in some embodiments, optimization is carried out by mini-batch stochastic gradient descent, which iterates through small subsets of the training patients and modifies the parameters in the opposite direction of the gradient of the loss function to minimize the reconstruction error.;  …Referring to element 254 of FIG. 2E, the plurality of dense vectors is provided to a post processor engine 68. Each dense vector corresponds to an entity 58 with some known features. Thus, the plurality of dense vectors can be used to train the post processor engine 68 to predict a future change in a value for a feature, or combination of features. The trained post processor engine can then be used to predict a future change in a value for the feature in a test entity. To accomplish this, the sparse vector 60 representation [i.e. and to process a second dataset including a second plurality of data points to assign each of the second plurality of data points to one of a plurality of data clusters representing a sparse representation of the second dataset] of the test entity is obtained and run through the network architecture 64 [i.e. wherein producing the weighting vector, learning the association and processing the second dataset occur in a machine learning network], each denoising autoencoder 66 [i.e. 
wherein producing the weighting vector, learning the association and processing the second dataset occur in a machine learning network] of …In some embodiments, the future change in the value for the feature in a test entity is the onset of a predetermined disease or other clinical indication in a predetermined time frame (e.g., the next three months, the next six months, the next year, etc.). Ric teaches as depicted in Figs. 2B-C; and 
[image: media_image12.png]
 [i.e. wherein producing the weighting vector, learning the association and processing the second dataset occur in a machine learning network], in 
[image: media_image13.png]
)

Regarding claims 16 and 18-19, the claims recite method limitations similar to those of claims 6 and 8-9, respectively, and are rejected under the same rationale.

Regarding independent claim 20, Ric teaches a non-transitory computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out a method comprising: (in 0028-0035: Turning to FIG. 1 with the foregoing in mind, an analysis computer system 100 comprises one or more processing units (CPU's) 74, a network or other communications interface 84, a user interface (e.g., including a display 82 and keyboard 80 or other form of input device) a memory 92 (e.g., random access memory), one or more magnetic disk storage and/or persistent devices 90 [i.e. a non-transitory computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out a method] optionally accessed by one or more controllers 88, one or more… In other words, some data stored in memory 92 and/or memory 90 may in fact be hosted on computers that are external to analysis computer system 100 but that can be electronically accessed by the analysis computer system over an Internet, intranet, or other form of network or electronic cable using network interface 84…: In some implementations, one or more of the above identified data elements or modules of the analysis computer system 100 are stored in one or more of the previously disclosed memory devices, and correspond to a set of instructions for performing a function described above. The above identified data, modules or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations…; and claim 1) 
The limitations of claim 20 are similar to those of claim 10 and are rejected under the same rationale.

	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
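A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.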

Claims 4-5, 7, 14-15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Miotto et al. (US Pub. No. 2020/0327404, hereinafter ‘Ric’) in view of Shrikumar et al. (US Pub. No. 2017/0249547, hereinafter ‘Ava’).	
	Regarding claim 4, the rejection of claim 3 is incorporated. While Ric teaches the plurality of autoencoders as disclosed previously, as depicted in Figs. 2C-D, 3A-B, and in 0006: … In the present disclosure, the vectors are applied to a deep neural network, which is a stack of neural networks [i.e. wherein the machine learning network comprises a … neural network] in which the output of one neural network serves as the input to another of the neural networks. For instance, in some embodiments, the deep neural network comprises a plurality of denoising autoencoders. In such embodiments, each respective denois­ing autoencoder, other than the final denoising autoencoder, in this plurality of denoising autoencoders feeds intermedi­ate values as a function of (i) a weight coefficient matrix and bias vector associated with the respective autoencoder and (ii) input values received by the autoencoder, into another autoencoder. The final layer of the deep neural network outputs a dense vector, consisting of less than 1000 ele­ments, for each sparse vector inputted into the deep neural network thereby forming a plurality of dense vectors. A post processor engine is trained on the plurality of dense vectors. In this way, the post processor engine can be used for a variety of predictive applications (e.g., predicting a future change in a value for a feature for a test entity).
Ric does not expressly teach the use of a convolutional neural network as claimed, i.e., wherein the machine learning network comprises a convolutional neural network. 
Ava does teach wherein the machine learning network comprises a convolutional neural network. (Ava teaches in 0149: Furthermore, unsupervised learning can also be used to aid clustering processes. An example of such unsu­pervised learning includes (but is not limited to) a convo­lutional autoencoder [i.e. wherein the machine learning network comprises a convolutional neural network] that learns a low-dimensional repre­sentations of the segments that may be easier to cluster, or a variational autoencoder on a vector of scores representing the strengths of the match of the segment to some pre­defined set of patterns (such a vector of scores can be obtained by methods that include but are not limited to the feature location identification processes described below). The autoencoders may involve regularization to encourage sparsity. In some embodiments, the objective function of a convolutional autoencoder [i.e. wherein the machine learning network comprises a convolutional neural network] can be modified to reward correct reconstruction of true segments and penalize correct recon­struction of segments identified randomly, thereby encour­aging the autoencoder to learn patterns that are unique to the true segments…; Examiner notes Ava also teaches the use of neural networks including controllers, in 0042: … Before discussing the specifics of the processes utilized to perform holistic feature extraction from neural networks, an overview of the computing platform and software architectures that can be utilized to implement holistic feature extraction systems in accordance with many embodiments of the invention will be provided. Neural network feature controller architectures, including software architectures that can be utilized in holistic feature extraction, are discussed below.)
The Ric and Ava references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information retrieval and processing system and methods using machine learning based on artificial neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for learning uses convolutional neural networks as autoencoders for feature extractions and detections as disclosed by Ava with the method of information retrieving and processing using machine learning algorithms based on artificial neural networks as disclosed by Ric.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Ava and Ric in order to “build neural networks in a computationally efficient manner that provide information regarding features of inputs that contribute to the ability of the neural network to generate the correct outputs” (Ava, 0037). Doing so allows for developing neural network based learning techniques that “can extract similar information concerning important features within input data from existing neural networks and can enable determinations of the importance of specific features with respect to generation of particular outputs” (Ava, 0037).

	
	Regarding claim 5, the rejection of claim 3 is incorporated and Ric in combination with Ava further teaches the apparatus of claim 1, wherein the machine learning network comprises a deep learning network with a … output. ()
	While Ric teaches the use of deep neural networks as disclosed, Ric does not expressly teach with a softmax output.
	Ava does expressly teach a softmax output. (Ava teaches the claimed softmax output in 0051: … In the illustrated process, input quantities and (optionally) reference input quantities as well as reference input can be received (502) by the neural network. The activation of neurons as well as reference activation are not pre-specified in the neural network can be calculated (504). In some embodiments, these activations can be calculated using a wide variety of activation functions including (but not limited to) identity, binary step, soft step, tanh, arctan, softsign, rectified linear unit (ReLU), leaky rectified linear unit, parameteric rectified linear unit, randomized leaky rectified linear unit, exponential linear unit, s-shaped rectified linear activation unit, adaptive piecewise linear, softplus, bent identity, softexponential, sinusoid, sine, gaussian, softmax [i.e. the machine learning network comprises a deep learning network with a softmax output], maxout, and/or a combination of activation functions.; And in 0103: In various embodiments, in the case of softmax [i.e. the machine learning network comprises a deep learning network with a softmax output] or sigmoid outputs, it may be preferred to compute contributions to the linear layer preceding the final nonlinearity rather than the final nonlinearity itself…)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Ric and Ava for the same reasons disclosed above.
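For illustration only, the softmax output referred to in the cited passages of Ava can be sketched as below. This is a minimal sketch of the generally known softmax function, not code taken from Ric, Ava, or the application; the function name `softmax` and the sample scores are assumptions chosen for the example.

```python
import math


def softmax(logits):
    """Map raw output scores to probabilities that sum to 1 (hypothetical helper)."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Sample scores are illustrative assumptions, not values from any reference.
probs = softmax([2.0, 1.0, 0.1])
```

The resulting values are non-negative, sum to one, and preserve the ordering of the input scores, which is why a softmax layer is commonly placed at the output of a classification network.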

	
Regarding claim 7, the rejection of claim 1 is incorporated. While Ric teaches the plurality of autoencoders as disclosed previously, as depicted in Figs. 2C-D, 3A-B, and in 0006: … In the present disclosure, the vectors are applied to a deep neural network, which is a stack of neural networks in which the output of one neural network serves as the input to another of the neural networks. For instance, in some embodiments, the deep neural network comprises a plurality of denoising autoencoders. In such embodiments, each respective denoising autoencoder, other than the final denoising autoencoder, in this plurality of denoising autoencoders feeds intermediate values as a function of (i) a weight coefficient matrix [i.e. wherein one of the plurality of weighting values of the weighting vector has a value of one and the other ones of the plurality of weighting values] and bias vector associated with the respective autoencoder...
Ric does not expressly teach the limitation wherein one of the plurality of weighting values of the weighting vector has a value of one and the other ones of the plurality of weighting values have values of zero, thereby providing a one-hot vector as the weighting vector.
Ava teaches the limitation wherein one of the plurality of weighting values of the weighting vector has a value of one and the other ones of the plurality of weighting values have values of zero, thereby providing a one-hot vector as the weighting vector. (Ava teaches in 0101: In many embodiments, y can be a neuron with some subset of inputs SY that are constrained such that [equation image: media_image14.png] x=c (for example, one-hot encoded input satisfies the constraint [equation image: media_image15.png], and a convolutional neuron operating on one-hot encoded [i.e. wherein one of the plurality of weighting values of the weighting vector has a value of one and the other ones of the plurality of weighting values have values of zero, thereby providing a one-hot vector as the weighting vector] channels has one constraint per channel that it sees). Let the weights from x to y be denoted wxy and let by be the bias of y. It is advisable to use normalized weights [equation image: media_image16.png] is the mean over all wxy for which [equation image: media_image17.png] This can maintain the output of the neural net because,… The normalization can be desirable because, for affine functions, the multipliers [equation image: media_image18.png] can be equal to the weights wxy and can thus be sensitive to μ. To take the example of a convolutional neuron operating on one-hot encoded rows [i.e. wherein one of the plurality of weighting values of the weighting vector has a value of one and the other ones of the plurality of weighting values have values of zero, thereby providing a one-hot vector as the weighting vector]: by mean-normalizing wxy for each channel in the filter, one can ensure that the contributions [equation image: media_image19.png] from some channels are not systematically overestimated or underestimated relative to the contributions from other channels, particularly in the case where a reference of all zeros [i.e. wherein one of the plurality of weighting values of the weighting vector has a value of one and the other ones of the plurality of weighting values have values of zero, thereby providing a one-hot vector as the weighting vector] is chosen.)
The Ric and Ava references would have been recognized by those of ordinary skill in the art as useful for applicant’s purpose in developing information retrieval and processing system and methods using machine learning based on artificial neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for learning uses convolutional neural networks as autoencoders for feature extractions and detections as disclosed by Ava with the method of information retrieving and processing using machine learning algorithms based on artificial neural networks as disclosed by Ric.
One of ordinary skill in the art would have been motivated to combine the disclosed methods of Ava and Ric in order to “build neural networks in a computationally efficient manner that provide information regarding features of inputs that contribute to the ability of the neural network to generate the correct outputs” (Ava, 0037). Doing so allows for developing neural network based learning techniques that “can extract similar information concerning important features within input data from existing neural networks and can enable determinations of the importance of specific features with respect to generation of particular outputs” (Ava, 0037).
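For illustration only, the one-hot weighting vector recited in claim 7 (one weighting value equal to one, all others equal to zero) can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Ric, Ava, or the application; the helper names `one_hot`, `is_one_hot`, and `weighted_sum` and the sample values are hypothetical.

```python
def one_hot(index, size):
    """Build a vector of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def is_one_hot(vec):
    """Return True when exactly one entry is one and every other entry is zero."""
    return vec.count(1) == 1 and vec.count(0) == len(vec) - 1


def weighted_sum(weights, values):
    """Combine values using the weighting vector (plain dot product)."""
    return sum(w * v for w, v in zip(weights, values))


# Sample values are illustrative assumptions only.
weights = one_hot(2, 4)
selected = weighted_sum(weights, [10.0, 20.0, 30.0, 40.0])
```

Applied as a weighting vector, a one-hot vector selects exactly one of the weighted values and suppresses the rest, so `selected` here equals the third value.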

Regarding claims 14-15 and 17, Ric in combination with Ava teaches a method having limitations similar to those of claims 4-5 and 7, and these claims are rejected under the same rationale.


Conclusion
The prior art made of record and not relied upon, listed below, is considered pertinent to applicant's disclosure:
Fan et al. (NPL: “Low-level structure feature extraction for image processing via stacked sparse denoising autoencoder”): teaches the use of convolutional neural networks as autoencoders.


Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/OLUWATOSIN O ALABI/Examiner, Art Unit 2129