DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Claims 1, 5-8, 12, 14-17, and 19-20 are currently amended. Claims 4, 11, 13, 18, and 21 are canceled. Claims 22 and 23 are new. Claims 1-3, 5-10, 12, 14-17, 19-20, and 22-23 are pending and have been considered. 
Note: The status of claim 6 states “Original” but it should state “Currently amended”. Examiner is examining claim 6 as amended.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10/28/2021 has been entered.
 
Drawings
The drawings were received on 09/23/2021.  These drawings are acceptable.

Claim Objections
Claim 8 is objected to because of the following informalities: in the final line, the semicolon should be a period.  Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 8-10, 12, and 14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because the claims recite software per se. 
Claim 8 recites an apparatus comprising a convolutional neural network, a first dataset, a first set of weights, a second dataset, a second set of weights, and a final set of weights. These are software elements without any physical or tangible form, so the claim recites software per se. See MPEP § 2106.03, subsection I:
“Non-limiting examples of claims that are not directed to any of the statutory categories include: Products that do not have a physical or tangible form, such as… a computer program per se (often referred to as "software per se") when claimed as a product without any structural recitations”

Claims 9-10, 12, and 14 are rejected for failing to cure the deficiencies of claim 8 upon which they depend. The claims are not eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner 

Claims 1-3, 5-10, 12, 14-17, 19-20 and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Yosinski et al. (“How transferable are features in deep neural networks?”) in view of O’Shea et al. (“An Introduction to Convolutional Neural Networks”), and further in view of El-Yaniv et al. (US 20170286830 A1).

Regarding CLAIM 1, Yosinski teaches: A method, comprising: 
at a training computer: (Yosinski, in Supplementary Material p. 1, ¶ 2, last line, teaches the networks were trained on a NVidia K20 GPU.)
	training a neural network on a first dataset to generate a first set of weights, the neural network including a plurality of layers; (A neural network is interpreted as the network baseA in Fig. 1 on p. 4. Yosinski states on p. 3, ¶ 2, lines 2-4: “We train one eight-layer convolutional network on A and another on B. These networks, which we call baseA and baseB, are shown in the top two rows of Figure 1.” A and B are image datasets (p. 3, ¶ 2, lines 1-2).)
	*identifying one or more fixed layers and two or more programmable layers of the neural network, the fixed layers having a fixed set of weights, the programmable layers having a programmable set of weights; (*Steps in a method claim may be reordered. The network in p. 4, Fig. 1, row 4 is called A3B and it has 3 frozen (“fixed”) layers and 5 unfrozen (“programmable”) layers. On p. 3, the second bullet point states: “A transfer network A3B: the first 3 layers are copied from baseA and frozen. The five higher layers (4–8) are initialized randomly…”)
	*training the neural network on a second dataset to generate a second set of weights; (*Steps in a method claim may be reordered. On p. 3, the second bullet point states: “A transfer network A3B: … and trained toward dataset B.”)
at an inference computer: (Yosinski indicates that a GPU was used for inferencing the network on p. 6, in footnote 4. Processing is disclosed by the experimental results discussed on p. 5, § 4.1: “The results of all A/B transfer learning experiments on randomly split (i.e. similar) datasets are shown in Figure 2”. Lines 1-2 in the caption state: “Top: Each marker in the figure represents the average accuracy over the validation set for a trained network”)
Yosinski does not explicitly teach: training the neural network on the second dataset to generate a final set of weights including at least one of: quantizing at least a portion of the fixed set of weights, and pruning at least a portion of the fixed set of weights;
Yosinski also does not disclose which types of layers are used or which layers generate feature maps, so Yosinski does not explicitly teach: processing, by a first fixed layer of the neural network, input data to generate intermediate feature map data;
processing, by a first programmable layer of the neural network, the intermediate feature map data to generate output feature map data;
processing, by a second programmable layer of the neural network, the output feature map data to generate output data; and
outputting the output data.
But O’Shea teaches: processing, by a first fixed layer of the neural network, input data to generate intermediate feature map data; (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, the input image is processed by the first two convolution layers and the pooling layer. The pooling layer generates intermediate feature map data.)
processing, by a first programmable layer of the neural network, the intermediate feature map data to generate output feature map data; (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, “intermediate feature map data” generated by the first pooling layer is input to a series of additional convolutional and pooling layers. The broadest reasonable interpretation of a “layer,” in light of the specification, is a group of nodes, so a number of consecutive layers can be interpreted as a single layer.)
processing, by a second programmable layer of the neural network, the output feature map data to generate output data; and (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, the fully-connected layer generates output data.)
outputting the output data. (In Fig. 5, output data is output as a digit between 0 and 9.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have processed inferencing inputs using the following layers in Yosinski’s network: convolutional layers for layers 1 to 2, a pooling layer for layer 3, convolutional layers for layers 4 to 6, a pooling layer for layer 7, and a fully-connected layer for layer 8. The figure below shows Yosinski’s A3B network annotated with the layers taught by O’Shea.

[Figure (media_image1.png): Yosinski’s A3B network annotated with the layers taught by O’Shea.]

A motivation for the combination is that this architecture using stacked layers improves the network in several ways. First, stacking multiple convolutional layers allows for more complex features of the input vector to be selected (O’Shea p. 9, top paragraph) and second, splitting large convolutional layers up into many smaller sized convolutional layers reduces the amount of computational complexity within a given convolutional layer (O’Shea p. 9, second paragraph). The numbered list in O’Shea, end of p. 4 to top of p. 5 teaches that the convolutional layers will determine the output of neurons that are connected to local regions of the input through the calculation of the scalar product between their weights and the region connected to the input volume; the pooling layer will then simply perform downsampling along the spatial dimensionality of the given input, further reducing the number of parameters within that activation; and the fully-connected layers will then perform the same duties found in standard ANNs and attempt to produce class scores from the activations, to be used for classification.
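For illustration only, the three layer roles summarized from O’Shea can be sketched in a few lines of Python. This is a simplified one-dimensional sketch and is not code from O’Shea or Yosinski: the function names, the use of max pooling, the window size, and the example values are all assumptions chosen for readability.

```python
def conv1d(signal, kernel):
    """Convolutional layer: scalar product of the kernel weights with each
    local region of the input (stride 1, no padding)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(values, size=2):
    """Pooling layer: downsample by keeping the maximum of each
    non-overlapping window (max pooling is an assumption here)."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


def fully_connected(values, weight_rows):
    """Fully-connected layer: one class score per row of weights."""
    return [sum(w * v for w, v in zip(row, values)) for row in weight_rows]


# Hypothetical input and weights, chosen only to make the arithmetic easy.
feature_map = conv1d([1, 3, 2, 5, 4, 6], [1, 1])      # [4, 5, 7, 9, 10]
pooled = max_pool1d(feature_map)                       # [5, 9]
scores = fully_connected(pooled, [[1, 0], [0, 1], [1, -1]])
```

In this sketch the pooled values play the role of intermediate feature map data and the final scores play the role of output data.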
However, neither Yosinski nor O’Shea explicitly teaches: training the neural network on the second dataset to generate a final set of weights including at least one of: quantizing at least a portion of the fixed set of weights, and pruning at least a portion of the fixed set of weights;
But El-Yaniv teaches: training the neural network on the second dataset to generate a final set of weights including at least one of: quantizing at least a portion of the fixed set of weights, and (El-Yaniv at ¶ [0065], lines 1-2 and 6-13 states: “Reference is now made to an exemplary description of the training phase… A binarization function, referred to herein as Binarize( ), binarizes (e.g. stochastically or deterministically) a floating point value of a neuron (floating point activation values) and/or a connection weight value (floating point weights). This function may be replaced with a quantization function that outputs a finite set of outcomes based on a floating point value of a neuron and/or a connection weight value.” El-Yaniv at ¶ [0050], lines 2-3 discloses the neural network may be a convolutional neural network.)
pruning at least a portion of the fixed set of weights; (Examiner is not required to map prior art because this limitation is an alternative to “quantizing”.)
El-Yaniv is in the same field of endeavor as the claimed invention, namely training neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied El-Yaniv’s quantizing training technique to the fixed set of weights in Yosinski/O’Shea’s network. A motivation for binarizing neuron weights is to improve the network’s efficiency. (El-Yaniv ¶ [0038], lines 1-10)
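For illustration only, applying a binarization or quantization function to the fixed (frozen) weights while leaving the programmable weights untouched could look like the sketch below. It is not code from El-Yaniv: the deterministic sign rule, the nearest-level rule, the layer names, and the weight values are assumptions, and El-Yaniv’s Binarize( ) may also operate stochastically.

```python
def binarize(w):
    """Deterministic binarization: map a floating point weight to +1.0 or -1.0."""
    return 1.0 if w >= 0.0 else -1.0


def quantize(w, levels):
    """Quantization: map a floating point weight to the nearest value
    in a finite set of outcomes."""
    return min(levels, key=lambda q: abs(q - w))


def quantize_fixed_layers(weights, frozen, fn):
    """Apply fn only to the weights of layers named in `frozen`;
    programmable layers are copied unchanged."""
    return {
        name: [fn(w) for w in ws] if name in frozen else list(ws)
        for name, ws in weights.items()
    }


# Hypothetical weights: layer1 is frozen, layer4 is programmable.
weights = {"layer1": [0.3, -0.7], "layer4": [0.2, -0.1]}
binarized = quantize_fixed_layers(weights, {"layer1"}, binarize)
ternary = quantize_fixed_layers(
    weights, {"layer1"}, lambda w: quantize(w, (-1.0, 0.0, 1.0))
)
```

Only the frozen layer changes in either result, which mirrors the claimed “quantizing at least a portion of the fixed set of weights.”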

	Regarding CLAIM 2, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The method as claimed in claim 1, 
Yosinski teaches: further comprising identifying similarities of the first set of weights and the second set of weights. (The weight values for the first three layers are identical in both the first and second sets of weights. On p. 3, the second bullet point states: “A transfer network A3B: the first 3 layers are copied from baseA and frozen. The five higher layers (4–8) are initialized randomly…”)
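For illustration only, the relationship between baseA and the transfer network A3B, and the resulting identity of the first three layers’ weights, can be sketched as follows. This is not Yosinski’s code: the helper names, the two-weight layers, the random range, and the seed are assumptions, and the subsequent training of layers 4–8 toward dataset B is omitted.

```python
import random


def make_transfer_network(base_layers, n_frozen, seed=0):
    """Copy the first n_frozen layers from the base network and
    randomly initialize the remaining layers."""
    rng = random.Random(seed)
    layers = []
    for index, layer in enumerate(base_layers):
        if index < n_frozen:
            layers.append(list(layer))  # copied from the base network and frozen
        else:
            layers.append([rng.uniform(-1.0, 1.0) for _ in layer])  # random init
    return layers


def identical_layers(first, second):
    """Return the 1-based indices of layers whose weights are identical."""
    return [i + 1 for i, (a, b) in enumerate(zip(first, second)) if a == b]


# Hypothetical eight-layer base network with two weights per layer.
base_a = [[i + 1.0, i + 1.5] for i in range(1, 9)]
a3b = make_transfer_network(base_a, n_frozen=3)
shared = identical_layers(base_a, a3b)
```

Here `shared` lists the layers whose weights match between the two sets, which is one simple way to identify similarities of the first and second sets of weights.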

	Regarding CLAIM 3, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The method as claimed in claim 1, 
Yosinski teaches: further comprising determining that the first dataset is the same domain as the second dataset. (Yosinski states that Dataset A and Dataset B contain images on p. 3, ¶ 2: “To create tasks A and B, we randomly split the 1000 ImageNet classes into two groups each containing 500 classes and approximately half of the data, or about 645,000 examples each. We train one eight-layer convolutional network on A and another on B.”)

Regarding CLAIM 5, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The method as claimed in claim 1, where:
Further, Yosinski teaches: the neural network is a convolutional neural network (CNN); and (P. 3, ¶ 2: “We train one eight-layer convolutional network on A and another on B. These networks, which we call baseA and baseB, are shown in the top two rows of Figure 1.”)
	However, neither Yosinski nor El-Yaniv explicitly teaches: the first fixed layer is a convolutional layer.
	But O’Shea teaches: the first fixed layer is a convolutional layer. (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, the input image is processed by a first convolutional layer.)

Regarding CLAIM 6, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The method as claimed in claim 5, where:
	However, neither Yosinski nor El-Yaniv teaches: the first programmable layer is a convolutional layer; and the second programmable layer is a fully-connected layer.
	But O’Shea teaches: the first programmable layer is a convolutional layer; and (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. The broadest reasonable interpretation of a “layer,” in light of the specification, is a group of nodes, so consecutive layers 4-7 can be interpreted as a single layer.)
the second programmable layer is a fully-connected layer. (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. In Fig. 5, the fully-connected layer generates output data.)

	Regarding CLAIM 7, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The method as claimed in claim 1, 
Yosinski teaches: further comprising: identifying selected layers of the neural network having particular connectivity properties; and (On p. 3, the second bullet point states for the network A3B (Fig. 1, row 4), the first 3 layers are copied from baseA and frozen, while the five higher layers (4–8) are initialized randomly and trained toward dataset B. Layers 4-8 have the properties of having unfrozen weights.)
updating the identified selected layers of the neural network. (The network is trained with stochastic gradient descent (SGD), according to p. 1 of Yosinski’s Supplementary Material, § A, ¶ 2. SGD involves updating weights.)
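For illustration only, a single stochastic gradient descent update that changes the unfrozen layers and leaves the frozen layers alone can be sketched as below. This is not Yosinski’s training code: the layer names, the gradients, and the learning rate are assumptions.

```python
def sgd_step(weights, grads, frozen, lr=0.1):
    """One SGD update: w <- w - lr * g for unfrozen layers only.
    Layers named in `frozen` keep their weights unchanged."""
    updated = {}
    for name, layer_weights in weights.items():
        if name in frozen:
            updated[name] = list(layer_weights)
        else:
            updated[name] = [
                w - lr * g for w, g in zip(layer_weights, grads[name])
            ]
    return updated


# Hypothetical two-layer example: layer3 is frozen, layer4 is unfrozen.
weights = {"layer3": [1.0], "layer4": [1.0]}
grads = {"layer3": [0.5], "layer4": [0.5]}
new_weights = sgd_step(weights, grads, frozen={"layer3"})
```

After the step, the layer3 weight is still 1.0 while the layer4 weight has moved to approximately 0.95.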

Regarding CLAIM 8, Yosinski teaches: An apparatus, comprising: a convolutional neural network including at least a first layer, a second layer and a third layer, (The network in p. 4, Fig. 1, row 4 is called A3B and it has 3 frozen (“fixed”) layers and 5 unfrozen (“programmable”) layers. See p. 3, second bullet point for details on A3B. The broadest reasonable interpretation of a “layer,” in light of the specification, is a group of nodes, so a number of consecutive layers can be interpreted as a single layer. Therefore, Examiner interprets the claimed “first layer” as comprising A3B’s layers 1 to 4, the claimed “second layer” as comprising A3B’s layers 5 to 7, and the claimed “third layer” as comprising A3B’s layer 8. Below, Yosinski’s A3B network has been annotated to show how the network’s layers correspond to the claimed layers.)

[Figure (media_image2.png): Yosinski’s A3B network annotated to show how the network’s layers correspond to the claimed layers.]

the first layer including a fixed… layer portion (Claimed “first layer” includes frozen layer portion, A3B layers 1 to 3 from p. 4, Fig. 1, row 4.)
the first layer including a programmable… layer portion (Claimed “first layer” includes unfrozen layer portion, A3B layer 4 from p. 4, Fig. 1, row 4.)
the second layer including a programmable… layer (Claimed “second layer” includes unfrozen weights from p. 4, Fig. 1, row 4.)
the third layer including at least one programmable… layer (Claimed “third layer” includes unfrozen weights from p. 4, Fig. 1, row 4.)
a first dataset that is used to train the convolutional neural network; (Dataset A. Yosinski states on p. 3, ¶ 2, lines 2-4: “We train one eight-layer convolutional network on A and another on B. These networks, which we call baseA and baseB, are shown in the top two rows of Figure 1.” A and B are image datasets (p. 3, ¶ 2, lines 1-2).)
a first set of weights associated with the layers of the convolutional neural network, the first set of weights generated based on the training of the convolutional neural network on the first dataset; (The broadest reasonable interpretation of “associated,” in light of the specification, is that A3B’s layers correspond to baseA’s layers, which have a first set of weights after training on dataset A. The claim limitations have been met. See p. 3, ¶ 2, lines 2-4 and p. 3, second bullet point for evidence.)
a second dataset that is used to train the convolutional neural network; (Dataset B. On p. 3, the second bullet point states: “A transfer network A3B: the first 3 layers are copied from baseA and frozen. The five higher layers (4–8) are initialized randomly and trained toward dataset B”)
a second set of weights associated with the layers of the convolutional neural network, the second set of weights generated based on the training of the convolutional neural network on the second dataset; (Broadly interpreted as any given set of weights generated during the training described at p. 3, second bullet point.)
a final set of weights associated with the layers of the convolutional neural network, the fixed convolutional layer portion having a fixed set of weights, the programmable convolutional layer portion, the programmable convolutional layer and the programmable fully-connected layer having a programmable set of weights, (Final set of weights upon completing the training described at p. 3, second bullet point. The claimed “first layer” has frozen weights, and the claimed “first layer,” “second layer,” and “third layer” have unfrozen weights.)
Yosinski does not disclose which types of layers are used or which layers generate feature maps, so Yosinski does not explicitly teach: 
the first layer including a fixed convolutional layer portion generating one or more intermediate maps and a programmable convolutional layer portion generating one or more concatenate maps, 
the second layer including a programmable convolutional layer configured to receive the one or more intermediate maps and the one or more concatenate maps and generate one or more output feature maps,
the third layer including at least one programmable fully-connected layer configured to receive the output feature maps and generate output data;
But O’Shea teaches: 
the first layer including a fixed convolutional layer portion generating one or more intermediate maps and a programmable convolutional layer portion generating one or more concatenate maps, (The broadest reasonable interpretation of “intermediate maps” and “concatenate maps,” in light of specification ¶ [0047] - [0048], is the output of a convolution layer or a pooling layer.
O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, the input image is processed by the first two convolution layers, a pooling layer, and another convolution layer. The output of the first four layers is an intermediate map or a concatenate map, especially since the pooling layer concatenates the data from the first two convolutional layers.)
the second layer including a programmable convolutional layer configured to receive the one or more intermediate maps and the one or more concatenate maps and generate one or more output feature maps, (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, “intermediate feature map” generated by the third convolutional layer is input to a series of additional convolutional and pooling layers.)
the third layer including at least one programmable fully-connected layer configured to receive the output feature maps and generate output data; (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, the fully-connected layer generates output data.)
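It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have processed inferencing inputs using the following layers in Yosinski’s network: convolutional layers for layers 1 to 2, a pooling layer for layer 3, convolutional layers for layers 4 to 6, a pooling layer for layer 7, and a fully-connected layer for layer 8.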
A motivation for the combination is that this architecture using stacked layers improves the network in several ways. First, stacking multiple convolutional layers allows for more complex features of the input vector to be selected (O’Shea p. 9, top paragraph) and second, splitting large convolutional layers up into many smaller sized convolutional layers reduces the amount of computational complexity within a given convolutional layer (O’Shea p. 9, second paragraph). The numbered list in O’Shea, end of p. 4 to top of p. 5 teaches that the convolutional layers will determine the output of neurons that are connected to local regions of the input through the calculation of the scalar product between their weights and the region connected to the input volume; the pooling layer will then simply perform downsampling along the spatial dimensionality of the given input, further reducing the number of parameters within that activation; and the fully-connected layers will then perform the same duties found in standard ANNs and attempt to produce class scores from the activations, to be used for classification.
However, neither Yosinski nor O’Shea explicitly teaches: the final set of weights generated based on additional training of the convolutional neural network on the second dataset, the additional training including at least one of: quantizing at least a portion of the fixed set of weights, and pruning at least a portion of the fixed set of weights;
But El-Yaniv teaches: the final set of weights generated based on additional training of the convolutional neural network on the second dataset, the additional training including at least one of: quantizing at least a portion of the fixed set of weights, and (El-Yaniv at ¶ [0065], lines 1-2 and 6-13 states: “Reference is now made to an exemplary description of the training phase… A binarization function, referred to herein as Binarize( ), binarizes (e.g. stochastically or deterministically) a floating point value of a neuron (floating point activation values) and/or a connection weight value (floating point weights). This function may be replaced with a quantization function that outputs a finite set of outcomes based on a floating point value of a neuron and/or a connection weight value.” El-Yaniv at ¶ [0050], lines 2-3 discloses the neural network may be a convolutional neural network.)
pruning at least a portion of the fixed set of weights; (Examiner is not required to map prior art because this limitation is an alternative to “quantizing”.)
El-Yaniv is in the same field of endeavor as the claimed invention, namely training neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied El-Yaniv’s quantizing training technique to the fixed set of weights in Yosinski/O’Shea’s network. A motivation for binarizing neuron weights is to improve the network’s efficiency. (El-Yaniv ¶ [0038], lines 1-10)

	Regarding CLAIM 9, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The apparatus as claimed in claim 8, 
Yosinski teaches: where the first set of weights and the second set of weights have identified similarities. (The weight values for the first three layers are identical in both the first and second sets of weights. On p. 3, the second bullet point states: “A transfer network A3B: the first 3 layers are copied from baseA and frozen. The five higher layers (4–8) are initialized randomly…”)

	Regarding CLAIM 10, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The apparatus as claimed in claim 8, 
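Yosinski teaches: where the first dataset is the same domain as the second dataset. (Yosinski states that Dataset A and Dataset B contain images on p. 3, ¶ 2: “To create tasks A and B, we randomly split the 1000 ImageNet classes into two groups each containing 500 classes and approximately half of the data, or about 645,000 examples each. We train one eight-layer convolutional network on A and another on B.”)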

Regarding CLAIM 12, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The apparatus as claimed in claim 8, 
Yosinski teaches: where the fixed convolutional layer portion, the programmable convolutional layer portion, the programmable convolutional layer and the programmable fully-connected layer are identified during training. (The broadest reasonable interpretation of “identifying” layers and layer portions during training, in light of specification ¶ [0084] - [0085] is that the layers and layer portions are identified as being either frozen (“fixed”) or unfrozen (“programmable”). The network identifies these portions as discussed on p. 3, in the second bullet point: “A transfer network A3B: the first 3 layers are copied from baseA and frozen. The five higher layers (4–8) are initialized randomly and trained toward dataset B.”)

Regarding CLAIM 14, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The apparatus as claimed in claim 8, 
Yosinski teaches: where selected layers of the convolutional neural network have particular connectivity properties. (On p. 3, the second bullet point states for the network A3B (Fig. 1, row 4), the first 3 layers are copied from baseA and frozen, while the five higher layers (4–8) are initialized randomly and trained toward dataset B. Layers 4-8 have the properties of having unfrozen weights.)

Regarding CLAIM 15, Yosinski teaches: A system, comprising: 
a training computer including a memory and a processor, coupled to the memory, that executes instructions stored in the memory, the instructions comprising: (Yosinski, in Supplementary Material p. 1, ¶ 2, last line, teaches the networks were trained on a NVidia K20 GPU. One of ordinary skill in the art at the filing date of the claimed invention would understand that the experimental results (p. 4, § 4) are evidence of a memory storing instructions.)
training a neural network on a first dataset to generate a first set of weights, the neural network including a plurality of layers; (A neural network is interpreted as the network baseA in Fig. 1 on p. 4. Yosinski states on p. 3, ¶ 2, lines 2-4: “We train one eight-layer convolutional network on A and another on B. These networks, which we call baseA and baseB, are shown in the top two rows of Figure 1.” A and B are image datasets (p. 3, ¶ 2, lines 1-2).)
*identifying one or more fixed layers and two or more programmable layers of the neural network, the fixed layers having a fixed set of weights, the programmable layers having a programmable set of weights; (*The claim limitations do not preclude this instruction step from happening before the “training” limitation below. The network in p. 4, Fig. 1, row 4 is called A3B and it has 3 frozen (“fixed”) layers and 5 unfrozen (“programmable”) layers. On p. 3, the second bullet point states: “A transfer network A3B: the first 3 layers are copied from baseA and frozen. The five higher layers (4–8) are initialized randomly…”)
	*training the neural network on a second dataset to generate a second set of weights; (*The claim limitations do not preclude this instruction step from happening before the “identifying” limitation above. On p. 3, the second bullet point states: “A transfer network A3B: the first 3 layers are copied from baseA and frozen. The five higher layers (4–8) are initialized randomly and trained toward dataset B.”)
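an inference computer including a memory and a processor, coupled to the memory, that executes instructions stored in the memory, the instructions comprising: (Yosinski indicates that a GPU was used for inferencing the network on p. 6, in footnote 4. Processing is disclosed by the experimental results discussed on p. 5, § 4.1: “The results of all A/B transfer learning experiments on randomly split (i.e. similar) datasets are shown in Figure 2”. Lines 1-2 in the caption state: “Top: Each marker in the figure represents the average accuracy over the validation set for a trained network”)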
Yosinski does not explicitly teach: training the neural network on the second dataset to generate a final set of weights including at least one of: quantizing at least a portion of the fixed set of weights, and pruning at least a portion of the fixed set of weights;
Yosinski also does not disclose which types of layers are used or which layers generate feature maps, so Yosinski does not explicitly teach the processing and outputting limitations mapped below.
But O’Shea teaches: processing, by a first fixed layer of the neural network, input data to generate intermediate feature map data; (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, the input image is processed by the first two convolution layers and the pooling layer. The pooling layer generates intermediate feature map data.)
processing, by a first programmable layer of the neural network, the intermediate feature map data to generate output feature map data; (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, “intermediate feature map data” generated by the first pooling layer is input to a series of additional convolutional and pooling layers. The broadest reasonable interpretation of a “layer,” in light of the specification, is a group of nodes, so a number of consecutive layers can be interpreted as a single layer.)
processing, by a second programmable layer of the neural network, the output feature map data to generate output data; and (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, the fully-connected layer generates output data.)
outputting the output data. (In Fig. 5, output data is output as a digit between 0 and 9.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have processed inferencing inputs using the following layers in Yosinski’s network: convolutional layers for layers 1 and 2, a pooling layer for layer 3, convolutional layers for layers 4 to 6, a pooling layer for layer 7, and a fully-connected layer for layer 8. The figure below shows Yosinski’s A3B network annotated with the layers taught by O’Shea. 

[Figure: Yosinski’s A3B network annotated with the layers taught by O’Shea (media_image1.png, greyscale).]
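For illustration only, the layer assignment proposed above can be written out as a small data structure. This is a sketch under stated assumptions: the names `LAYER_TYPES`, `FROZEN`, and `group_layers` are hypothetical and do not appear in Yosinski or O’Shea, and the grouping of layers 4 to 7 into a single "first programmable layer" simply follows the broadest reasonable interpretation of "layer" stated in this rejection.

```python
# Illustrative sketch; all names are hypothetical, not taken from any cited reference.
# Layer types for the eight layers, per the combination proposed in the rejection.
LAYER_TYPES = {
    1: "conv", 2: "conv", 3: "pool",
    4: "conv", 5: "conv", 6: "conv", 7: "pool",
    8: "fully_connected",
}

# In Yosinski's A3B network, layers 1-3 are copied from baseA and frozen.
FROZEN = {1, 2, 3}


def group_layers(layer_types, frozen):
    """Group the numbered layers into the three stages mapped in the rejection."""
    fixed = [n for n in sorted(layer_types) if n in frozen]
    trainable = [n for n in sorted(layer_types) if n not in frozen]
    # The last trainable layer is read as the second programmable layer;
    # the remaining trainable layers are read together as the first one.
    return {
        "fixed": fixed,
        "first_programmable": trainable[:-1],
        "second_programmable": trainable[-1:],
    }


groups = group_layers(LAYER_TYPES, FROZEN)
```

Under these assumptions, `groups["fixed"]` holds layers 1 to 3 (producing the intermediate feature map data), `groups["first_programmable"]` holds layers 4 to 7, and `groups["second_programmable"]` holds the fully-connected layer 8.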

A motivation for the combination is that this architecture using stacked layers improves the network in several ways. First, stacking multiple convolutional layers allows for more complex features of the input vector to be selected (O’Shea p. 9, top paragraph) and second, splitting large convolutional layers up into many smaller sized convolutional layers reduces the amount of computational complexity within a given convolutional layer (O’Shea p. 9, second paragraph). The numbered list in O’Shea, end of p. 4 to top of p. 5 teaches that the convolutional layers will determine the output of neurons of which are connected to local regions of the input through the calculation of the scalar product between their weights and the region connected to the input volume; the pooling layer will then simply perform downsampling along the spatial dimensionality of the given input, further reducing the number of parameters within that activation; and the fully-connected layers will then perform the same duties 
However, neither Yosinski nor O’Shea explicitly teaches: training the neural network on the second dataset to generate a final set of weights including at least one of: quantizing at least a portion of the fixed set of weights, and pruning at least a portion of the fixed set of weights;
But El-Yaniv teaches: training the neural network on the second dataset to generate a final set of weights including at least one of: quantizing at least a portion of the fixed set of weights, and (El-Yaniv at ¶ [0065], lines 1-2 and 6-13 states: “Reference is now made to an exemplary description of the training phase… A binarization function, referred to herein as Binarize( ), binarizes (e.g. stochastically or deterministically) a floating point value of a neuron (floating point activation values) and/or a connection weight value (floating point weights). This function may be replaced with a quantization function that outputs a finite set of outcomes based on a floating point value of a neuron and/or a connection weight value.” El-Yaniv at ¶ [0050], lines 2-3 discloses the neural network may be a convolutional neural network.)
pruning at least a portion of the fixed set of weights; (Examiner is not required to map prior art because this limitation is an alternative to “quantizing”.)
El-Yaniv is in the same field of endeavor as the claimed invention, namely training neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied El-Yaniv’s quantizing training technique to the fixed set of weights in Yosinski/O’Shea’s network. A motivation for binarizing neuron weights is to improve the network’s efficiency. (El-Yaniv ¶ [0038], lines 1-10)
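For illustration only, the binarization and quantization functions described in the quoted passage of El-Yaniv can be sketched as follows. This is a minimal sketch, not El-Yaniv’s implementation: the function names are hypothetical, the clipped-linear probability used for the stochastic variant is an assumption (the quoted passage does not specify one), and the uniform quantizer is just one example of a function that "outputs a finite set of outcomes."

```python
import random


def binarize_deterministic(w):
    """Map a floating-point weight to +1.0 or -1.0 by its sign (zero maps to +1.0)."""
    return 1.0 if w >= 0.0 else -1.0


def binarize_stochastic(w, rng):
    """Return +1.0 with probability clip((w + 1) / 2, 0, 1), otherwise -1.0.

    The probability formula is an assumption made for this sketch.
    """
    p = min(1.0, max(0.0, (w + 1.0) / 2.0))
    return 1.0 if rng.random() < p else -1.0


def quantize_uniform(w, levels, lo=-1.0, hi=1.0):
    """Snap a weight to the nearest of `levels` evenly spaced values in [lo, hi]."""
    w = min(hi, max(lo, w))
    step = (hi - lo) / (levels - 1)
    return lo + round((w - lo) / step) * step


# Hypothetical example values standing in for a fixed set of weights.
fixed_weights = [0.73, -0.12, 0.05, -0.98]
binary = [binarize_deterministic(w) for w in fixed_weights]
ternary = [quantize_uniform(w, levels=3) for w in fixed_weights]
```

Both results take values from a finite set: `binary` contains only +1.0 and -1.0, and `ternary` contains only -1.0, 0.0, and 1.0.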

	Regarding CLAIM 16, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The system as claimed in claim 15, 
where the training computer instructions further comprise identifying similarities of the first set of weights and the second set of weights. (The weight values for the first three layers are identical in both the first and second sets of weights. On p. 3, the second bullet point states: “A transfer network A3B: the first 3 layers are copied from baseA and frozen. The five higher layers (4–8) are initialized randomly…”)
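For illustration only, the copy-and-freeze construction quoted from Yosinski, and the resulting identity between the first three layers of the two weight sets, can be sketched as follows. This is a toy sketch under stated assumptions: networks are represented as plain dictionaries of weight lists, and the names `make_network`, `make_transfer_network`, and `identical_layers` are hypothetical and do not come from Yosinski.

```python
import random


def make_network(num_layers, weights_per_layer, rng):
    """Create a toy network as a dict: layer number -> list of random weights."""
    return {
        layer: [rng.uniform(-1.0, 1.0) for _ in range(weights_per_layer)]
        for layer in range(1, num_layers + 1)
    }


def make_transfer_network(base, num_copied, rng):
    """Copy the first `num_copied` layers from `base`; re-initialize the rest."""
    weights = {}
    frozen = set()
    for layer, base_weights in base.items():
        if layer <= num_copied:
            weights[layer] = list(base_weights)  # copied, then held fixed
            frozen.add(layer)
        else:
            weights[layer] = [rng.uniform(-1.0, 1.0) for _ in base_weights]
    return weights, frozen


def identical_layers(first, second):
    """Return the layers whose weights are identical in both weight sets."""
    return [layer for layer in sorted(first) if first[layer] == second[layer]]


rng = random.Random(0)
base_a = make_network(num_layers=8, weights_per_layer=4, rng=rng)
a3b, frozen_layers = make_transfer_network(base_a, num_copied=3, rng=rng)
shared = identical_layers(base_a, a3b)
```

In this sketch, `shared` lists layers 1 to 3, which are the same layers recorded in `frozen_layers`.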

Regarding CLAIM 17, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The system as claimed in claim 15, 
Yosinski teaches: where the training computer instructions further comprise determining that the first dataset is the same domain as the second dataset. (Yosinski states that Dataset A and Dataset B contain images on p. 3, ¶ 2: “To create tasks A and B, we randomly split the 1000 ImageNet classes into two groups each containing 500 classes and approximately half of the data, or about 645,000 examples each. We train one eight-layer convolutional network on A and another on B.”)

Regarding CLAIM 19, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The system as claimed in claim 15, where: 
Yosinski teaches: the neural network is a convolutional neural network (CNN); and (P. 3, ¶ 2: “We train one eight-layer convolutional network on A and another on B. These networks, which we call baseA and baseB, are shown in the top two rows of Figure 1.”)
	However, neither Yosinski nor El-Yaniv explicitly teaches: the first fixed layer is a convolutional layer.
But O’Shea teaches: the first fixed layer is a convolutional layer. (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, the input image is processed by a first convolutional layer.)

Regarding CLAIM 20, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The system as claimed in claim 19, where: 
However, neither Yosinski nor El-Yaniv teaches: the first programmable layer is a convolutional layer; and the second programmable layer is a fully-connected layer.
But O’Shea teaches: the first programmable layer is a convolutional layer; and (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. The broadest reasonable interpretation of a “layer,” in light of the specification, is a group of nodes, so consecutive layers 4-7 can be interpreted as a single layer.)
the second programmable layer is a fully-connected layer. (O’Shea teaches on pages 4-5 that CNNs have four types of layers: input layer, convolutional layer, pooling layer, and fully-connected layer. Fig. 5 on p. 9 shows the various layers stacked in a common CNN architecture, and Fig. 5 is described at the top of p. 9. In Fig. 5, the fully-connected layer generates output data.)

Regarding CLAIM 22, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The method as claimed in claim 1, 
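Further, Yosinski teaches: where the training computer and the inference computer are the same computer. (Yosinski, in Supplementary Material p. 1, ¶ 2, last line, teaches the networks were trained on a NVidia K20 GPU. Yosinski indicates that a GPU was used for inferencing the network on p. 6, in footnote 4. Since a GPU was used for both training and inferencing, the limitations have been met.)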

Regarding CLAIM 23, the combination of Yosinski, O’Shea, and El-Yaniv teaches: The system as claimed in claim 15, 
Further, Yosinski teaches: where the training computer and the inference computer are the same computer. (Yosinski, in Supplementary Material p. 1, ¶ 2, last line, teaches the networks were trained on a NVidia K20 GPU. Yosinski indicates that a GPU was used for inferencing the network on p. 6, in footnote 4. Since a GPU was used for both training and inferencing, the limitations have been met.)

Response to Arguments
Examiner will respond to the remarks, claim amendments, specification amendments, and replacement drawings filed 09/23/2021.

Objections to the Specification: The objections to the abstract are withdrawn due to the amendments to the specification.

Objections to the Claims: The objections are withdrawn due to the claim amendments.

Claim Rejections Under 35 U.S.C. § 102: Applicant’s arguments with respect to claims 1-3, 5-10, 12, 14-17, and 19-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ASHER H. JABLON/Examiner, Art Unit 2127
/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127