DETAILED ACTION
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	Examiner notes that the Applicant’s request to participate in the Patent Prosecution Highway (PPH) program and the petition under 37 CFR 1.102(a), filed 28 May 2020, to make the application special were GRANTED on 26 June 2020.
3.	This correspondence is in response to the Applicant’s submission filed 25 May 2021 [hereinafter Response], where:
Claims 1, 3, 12, and 13 have been amended. 
Claims 2, 4-11, and 14-18 have been cancelled.
New claims 19-23 are presented for consideration. 
Claims 1, 3, 12, 13, and 19-23 are pending.
Claims 1, 3, 12, 13, and 19-23 are rejected.
Examiner notes that Applicant claims foreign priority in the instant application to Chinese Appl’n No. 201711219332.9, filed 28 November 2017. A certified copy of the priority document was filed in the instant application on 28 May 2020.
Specification
4.	The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested:
METHOD, DEVICE, AND TERMINAL IN A FEATURE MAP DETERMINATION FOR A CONVOLUTIONAL LAYER
5.	The objection to the abstract because of undue length is withdrawn in view of the Applicant’s amendment thereto.
Claim Objections
6.	Claims 1, 12, and 13 are objected to because of the following informalities: 
Claim 1, lines 12-13, “alterative feature maps” should read --alternative feature maps--.
Claim 1, line 13, “the first convolutional later” should read --the first convolutional layer--.
Claim 1, lines 13-14, “a first slice selection module” should read --the first slice selection module--.
Claim 12, lines 8-9, “alterative feature maps” should read --alternative feature maps--.
Claim 12, line 9, “the first convolutional later” should read --the first convolutional layer--.
Claim 13, lines 8-9, “alterative feature maps” should read --alternative feature maps--.
Claim 13, line 9, “the first convolutional later” should read --the first convolutional layer--.
Appropriate correction is required.
Claim Rejections - 35 U.S.C. § 112(b)
7.	The following is a quotation of 35 U.S.C. § 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
8.	The rejection of claims 13, 17, and 18 is withdrawn in view of the amendment to claim 13 to recite “a processor.” Examiner notes that claims 17 and 18 have been cancelled, rendering the rejection moot as to those claims.
9.	Claims 1, 3, 12, 13, and 19-23 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 1, line 16, recites “determining the alternative feature maps as the target feature map,” but it is unclear how this determining is achieved. The manner of arriving at a singular “target feature map” from a plurality of “alternative feature maps” is not discernible from the claims. Although the Specification recites that the “first slice selection module may determine the target feature map from the alternative feature maps based on the one-to-one correspondence between the weight values in the feature map weight vector and the alternative feature maps” (PGPUB ¶ 0074), the claims do not recite such teaching.
Claim 3, line 1, improperly depends from a cancelled claim because the amended claim now recites dependency from cancelled claim 2. For purposes of examination, the Examiner will treat claim 3 as depending from independent claim 1.
Claim 3, line 4, recites the limitation "the adjusted feature map weight vector.” There is insufficient antecedent basis for this limitation in the claim.
Claim 12, line 12, recites “determining the alternative feature maps as the target feature map,” but it is unclear how this determining is achieved. The manner of arriving at a singular “target feature map” from a plurality of “alternative feature maps” is not discernible from the claims. Although the Specification recites that the “first slice selection module may determine the target feature map from the alternative feature maps based on the one-to-one correspondence between the weight values in the feature map weight vector and the alternative feature maps” (PGPUB ¶ 0074), the claims do not recite such teaching.
Claim 13, line 12, recites “determining the alternative feature maps as the target feature map,” but it is unclear how this determining is achieved. The manner of arriving at a singular “target feature map” from a plurality of “alternative feature maps” is not discernible from the claims. Although the Specification recites that the “first slice selection module may determine the target feature map from the alternative feature maps based on the one-to-one correspondence between the weight values in the feature map weight vector and the alternative feature maps” (PGPUB ¶ 0074), the claims do not recite such teaching.
Claim 19, line 9, recites the limitation "the adjusted feature map weight vector.” There is insufficient antecedent basis for this limitation in the claim.
Claim 20, line 9, recites the limitation "the adjusted feature map weight vector.” There is insufficient antecedent basis for this limitation in the claim.
Claim 21, line 4, recites the limitation "the adjusted feature map weight vector.” There is insufficient antecedent basis for this limitation in the claim.
Claim 22, line 9, recites the limitation "the adjusted feature map weight vector.” There is insufficient antecedent basis for this limitation in the claim.
Claim 23, line 4, recites the limitation "the adjusted feature map weight vector.” There is insufficient antecedent basis for this limitation in the claim.
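For illustration only, and without construing the claims or importing limitations from the Specification, the selection described at PGPUB ¶ 0074 might be sketched as follows. The function name determine_target_feature_maps, the threshold parameter, and the rule of keeping each map whose weight exceeds the threshold are assumptions made for this sketch; the Specification states only that the selection is based on the one-to-one correspondence between the weight values and the alternative feature maps.

```python
from typing import List, Optional, Sequence

# Hypothetical representation: one 2-D feature map as rows of values.
FeatureMap = List[List[float]]


def determine_target_feature_maps(
    alternative_maps: Sequence[FeatureMap],
    weight_vector: Optional[Sequence[float]] = None,
    threshold: float = 0.5,  # assumed cut-off; not recited in the claims
) -> List[FeatureMap]:
    """Return the target feature map(s) chosen from the alternative feature maps."""
    # Layer without a slice selection module: every alternative map is kept.
    if weight_vector is None:
        return list(alternative_maps)
    if len(weight_vector) != len(alternative_maps):
        raise ValueError("one weight value is required per alternative feature map")
    # One-to-one correspondence: weight i decides whether map i is kept.
    return [m for m, w in zip(alternative_maps, weight_vector) if w > threshold]


maps = [[[1.0]], [[2.0]], [[3.0]]]
selected = determine_target_feature_maps(maps, weight_vector=[0.9, 0.1, 0.7])
passthrough = determine_target_feature_maps(maps)
```

Under a rule of this kind the result may be several maps rather than a singular target feature map, which is the ambiguity noted above for claims 1, 12, and 13.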
Claim Rejections - 35 U.S.C. § 103
10.	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
11.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
12.	This application currently names joint inventors. In considering patentability of the claims the Examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the Examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
13.	Claims 1, 3, 12, and 13 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20180025257 to Oord et al. [hereinafter Oord], and in view of US Published Application 20190266387 to Sun et al. [hereinafter Sun].
Regarding claim 1, Oord teaches [a] method (Oord ¶ 0013 teaches a process for generating an output image from a neural network input) for processing image, comprising:
receiving, by a first convolutional layer, output data of a previous convolutional layer (Oord ¶ 0045 teaches [t]o ensure that the convolutional neural network layers are conditioned only on the already generated output values, each convolutional neural network layer (that is receiving, by a first convolutional layer, output data of a previous convolutional layer) is configured to apply a convolution that is masked; Oord, Figure 1, teaches:

    [Oord, Figure 1: greyscale image (media_image1.png), not reproduced]

Oord ¶ 0032 teaches the initial neural network layers 110 can process a current output image 140 to generate an alternative representation 142 [sic] of the current output image 140 (that is, receiving, by a first convolutional layer, output data of a previous convolutional layer); Oord ¶ 0039 teaches the alternative representation [144] is a feature map);
obtaining, by the first convolutional layer, alternative feature maps based on the output data (Oord ¶ 0032 teaches the initial neural network layers 110 can process a current output image 140 to generate an alternative representation 142 of the current output image 140; Oord ¶ 0039 teaches the alternative representation is a feature map that includes features for each color channel of each pixel in the output image; Oord ¶ 0083 teaches when the initial neural network layers are a fully convolutional neural network, the processing necessary for the initial neural network layers to generate the alternative representations (that is, alternative feature maps based on the output data) can be done in parallel rather than sequentially because the entire output image is available from the beginning of the computation);
determining whether the first convolutional layer comprises a first slice selection module (Oord ¶ 0039 teaches the alternative representation is a feature map that includes features for each color channel (that is, the channel being a “slice,” which is determining whether the first convolutional layer comprises a first slice selection module));
determining a target feature map based on a feature map weight vector and the alterative feature maps in response to that the first convolutional layer comprises a first slice selection module (Oord ¶ 0003 teaches [e]ach layer of the network generates an output from a received input in accordance with current values of a respective set of parameters (that is, set of parameters are a feature map weight vector); Oord ¶ 0083 teaches when the initial neural network layers are a fully convolutional neural network, the processing necessary for the initial neural network layers to generate the alternative representations can be done in parallel rather than sequentially because the entire output image is available from the beginning of the computation), wherein the feature map weight vector is generated by the first slice selection module based on the alternative feature maps (Oord ¶ 0081 teaches the system can also perform the processes 200 [for generating an output image from a neural network input] and 300 [for generating a color value for a given color channel of a given pixel of an output image] on neural network inputs in a set of training data, i.e., a set of inputs for which the output image that should be generated by the system is known, in order to train the initial neural network layers and, if the output layers have parameters, the output layers, i.e., to determine trained values for the parameters of the initial neural network layers and, optionally, the output layers (that is, parameters are a feature map weight vector).
The processes 200 and 300 can be performed repeatedly on inputs selected from a set of training data as part of a conventional machine learning training technique to train the initial neural network layers, e.g., a stochastic gradient descent with backpropagation training technique (that is, by training a neural network, the parameters are weight vectors, wherein the feature map weight vector is generated by the first slice selection module based on the alternative feature maps));
* * *
and obtaining output data of the first convolutional layer based on the target feature map (Oord ¶ 0062 teaches when the initial neural network layers 110 are convolutional layers (that is, the convolutional layers include the first convolutional layer), some or all of the layers have a gated activation function in place of a conventional activation function. In a gated activation function, the output of an element-wise non-linearity, i.e., of a conventional activation function, is element-wise multiplied by a gate vector that is generated by applying an element-wise non-linearity to the output of a convolution (that is, obtaining output data of the first convolutional layer based on the target feature map)).
Though Oord teaches a slice selection module (that is, color channels) in that a neural network system generates an output image that includes a predetermined number of pixels having a respective color value for each of multiple color channels, Oord, however, does not explicitly teach -
* * *
determining the alternative feature maps as the target feature map in response to that the first convolutional layer does not comprise the first slice selection module; and
* * *
But Sun teaches -
* * *
determining the alternative feature maps as the target feature map in response to that the first convolutional layer does not comprise the first slice selection module (Sun ¶ 0084 teaches the first processed data may be processed (that is, inputting the output data of the previous convolutional layer) by the second convolution processing unit 120 of the device 100 using a second type of convolutional layers in the CNN (that is, into the first convolutional layer) to generate second processed data, wherein the second type of convolutional layers comprise convolution kernels having shared weights (that is, with shared weights of the second convolution processing unit 120, is in response to that the first convolutional layer does not comprise the first slice selection module)); and
* * *
Oord and Sun are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches parameter sharing solutions in convolutional layers on a depth slice basis that may be a singular slice. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Oord pertaining to alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions of Sun.
The motivation for doing so is that the solution effectively utilizes structured information of a human face while taking image results of the human face in different environments into account, such as via the layer depth of a convolutional kernel, thereby realizing rapid and accurate recognition of facial features (Sun ¶ 0072).
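As an informal illustration of the masked convolution cited from Oord ¶ 0045 for claim 1, the sketch below applies a one-dimensional convolution whose kernel is masked so that each output position depends only on earlier input positions. The one-dimensional setting, the function name masked_conv1d, and the zero padding are simplifying assumptions made here; Oord's layers operate on images, and this is not presented as Oord's implementation.

```python
from typing import List, Sequence


def masked_conv1d(values: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Causal 1-D convolution: output[i] uses only values[j] with j < i."""
    out: List[float] = []
    for i in range(len(values)):
        total = 0.0
        for t, k in enumerate(kernel):
            j = i - 1 - t  # strictly earlier positions only (the mask)
            if j >= 0:     # positions before the start are treated as zero
                total += k * values[j]
        out.append(total)
    return out


result = masked_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 0.5])
# Changing the last input leaves every output unchanged, because no
# output position is allowed to see its own or any later input value.
same = masked_conv1d([1.0, 2.0, 3.0, 99.0], [1.0, 0.5])
```

The mask is what conditions each generated value only on values already generated, which is the property relied on in the mapping above.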
Regarding claim 3, the combination of Oord and Sun teaches all of the limitations of claim 1, from which claim 3 is treated as depending for purposes of examination, as described above.
Oord teaches -
wherein said obtaining output data of the first convolutional layer in response to that the first convolutional layer comprises the first slice selection module comprises:
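determining the target feature map based on the adjusted feature map weight vector by the first convolutional layer (Oord ¶ 0003 teaches [e]ach layer of the network generates an output from a received input in accordance with current values of a respective set of parameters (that is, set of parameters are a feature map weight vector); Oord ¶ 0083 teaches when the initial neural network layers are a fully convolutional neural network, the processing necessary for the initial neural network layers to generate the alternative representations can be done in parallel rather than sequentially because the entire output image is available from the beginning of the computation);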
and obtaining the output data of the first convolutional layer based on the target feature map (Oord ¶ 0062 teaches when the initial neural network layers 110 are convolutional layers (that is, the convolutional layers include the first convolutional layer), some or all of the layers have a gated activation function in place of a conventional activation function. In a gated activation function, the output of an element-wise non-linearity, i.e., of a conventional activation function, is element-wise multiplied by a gate vector that is generated by applying an element-wise non-linearity to the output of a convolution (that is, obtaining output data of the first convolutional layer based on the target feature map)).
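The gated activation function cited from Oord ¶ 0062 can be sketched, for illustration only, as an element-wise product of an activation output and a gate vector. The choice of tanh for the conventional activation and of a sigmoid for the gate non-linearity, and the names sigmoid and gated_activation, are assumptions of this sketch; the cited passage refers only to element-wise non-linearities.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function, used here as the assumed gate non-linearity."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_activation(features: Sequence[float], gate_inputs: Sequence[float]) -> List[float]:
    """Element-wise tanh(feature) multiplied by sigmoid(gate input)."""
    if len(features) != len(gate_inputs):
        raise ValueError("features and gate_inputs must have the same length")
    # gate_inputs stands in for the output of a convolution feeding the gate.
    return [math.tanh(f) * sigmoid(g) for f, g in zip(features, gate_inputs)]


gated = gated_activation([0.0, 1.0], [0.0, 0.0])
```

With a gate input of zero the sigmoid gate equals 0.5, so each activation output is halved; larger gate inputs pass more of the activation through.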
Regarding claim 12, Oord teaches [a] terminal (Oord cl. 1), comprising: a memory, a processor (Oord ¶ 0148 teaches [a]ccording to yet another aspect of the present application, provided is a terminal which includes a memory, a processor and an image processing program which is stored on the memory and may run on the processor, and when the image processing program is executed by the processor, any of the image processing methods in the present application are implemented) and an image processing program stored in the memory and executable on the processor (Oord ¶ 0007 teaches [f]or a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation . . . (that is, an image processing program stored in the memory and executable on the processor)), wherein the processor is configured to execute the image processing program in the memory (the combination of Oord and Sun teaches all of the limitations of claim 1, as described in detail above) to perform following process:
receiving, by a first convolutional layer, output data of a previous convolutional layer (Oord ¶ 0045 teaches [t]o ensure that the convolutional neural network layers are conditioned only on the already generated output values, each convolutional neural network layer (that is receiving, by a first convolutional layer, output data of a previous convolutional layer) is configured to apply a convolution that is masked; Oord, Figure 1, teaches:

    [Oord, Figure 1: greyscale image (media_image1.png), not reproduced]

Oord ¶ 0032 teaches the initial neural network layers 110 can process a current output image 140 to generate an alternative representation 142 [sic] of the current output image 140 (that is, receiving, by a first convolutional layer, output data of a previous convolutional layer); Oord ¶ 0039 teaches the alternative representation [144] is a feature map);
obtaining, by the first convolutional layer, alternative feature maps based on the output data (Oord ¶ 0032 teaches the initial neural network layers 110 can process a current output image 140 to generate an alternative representation 142 of the current output image 140; Oord ¶ 0039 teaches the alternative representation is a feature map that includes features for each color channel of each pixel in the output image. In these implementations, when generating the color value for a given channel of a given pixel, the output layer uses the corresponding portion of the alternative representation . . . ; Oord ¶ 0083 teaches when the initial neural network layers are a fully convolutional neural network, the processing necessary for the initial neural network layers to generate the alternative representations (that is, alternative feature maps based on the output data) can be done in parallel rather than sequentially because the entire output image is available from the beginning of the computation);
determining a target feature map based on a feature map weight vector and the alterative feature maps in response to that the first convolutional later comprises a first slice selection module (Oord ¶ 0003 teaches [e]ach layer of the network generates an output from a received input in accordance with current values of a respective set of parameters (that is, set of parameters are a feature map weight vector); Oord ¶ 0083 teaches when the initial neural network layers are a fully convolutional neural network, the processing necessary for the initial neural network layers to generate the alternative representations can be done in parallel rather than sequentially because the entire output image is available from the beginning of the computation), wherein the feature map weight vector is generated by the first slice selection module based on the alternative feature maps (Oord ¶ 0081 teaches the system can also perform the processes 200 [for generating an output image from a neural network input] and 300 [for generating a color value for a given color channel of a given pixel of an output image] on neural network inputs in a set of training data, i.e., a set of inputs for which the output image that should be generated by the system is known, in order to train the initial neural network layers and, if the output layers have parameters, the output layers, i.e., to determine trained values for the parameters of the initial neural network layers and, optionally, the output layers (that is, parameters are a feature map weight vector).
The processes 200 and 300 can be performed repeatedly on inputs selected from a set of training data as part of a conventional machine learning training technique to train the initial neural network layers, e.g., a stochastic gradient descent with backpropagation training technique (that is, by training a neural network, the parameters are weight vectors, wherein the feature map weight vector is generated by the first slice selection module based on the alternative feature maps));
* * *
and obtaining output data of the first convolutional layer based on the target feature map (Oord ¶ 0062 teaches when the initial neural network layers 110 are convolutional layers (that is, the convolutional layers include the first convolutional layer), some or all of the layers have a gated activation function in place of a conventional activation function. In a gated activation function, the output of an element-wise non-linearity, i.e., of a conventional activation function, is element-wise multiplied by a gate vector that is generated by applying an element-wise non-linearity to the output of a convolution (that is, obtaining output data of the first convolutional layer based on the target feature map)).
Though Oord teaches a slice selection module (that is, color channels) in that a neural network system generates an output image that includes a predetermined number of pixels having a respective color value for each of multiple color channels, Oord, however, does not explicitly teach -
* * *
determining the alternative feature maps as the target feature map in response to that the first convolutional layer does not comprise the first slice selection module; and
* * *
But Sun teaches -
* * *
determining the alternative feature maps as the target feature map in response to that the first convolutional layer does not comprise the first slice selection module (Sun ¶ 0084 teaches the first processed data may be processed (that is, inputting the output data of the previous convolutional layer) by the second convolution processing unit 120 of the device 100 using a second type of convolutional layers in the CNN (that is, into the first convolutional layer) to generate second processed data, wherein the second type of convolutional layers comprise convolution kernels having shared weights (that is, with shared weights of the second convolution processing unit 120, is in response to that the first convolutional layer does not comprise the first slice selection module)); and
* * *
Oord and Sun are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches parameter sharing solutions in convolutional layers on a depth slice basis that may be a singular slice. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Oord pertaining to alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions of Sun.
The motivation for doing so is that the solution effectively utilizes structured information of a human face while taking image results of the human face in different environments into account, such as via the layer depth of a convolutional kernel, thereby realizing rapid and accurate recognition of facial features (Sun ¶ 0072).
Regarding claim 13, Oord teaches [a] non-transitory computer readable storage medium, wherein an image processing program is stored in the computer readable storage medium (Oord ¶ 0144 teaches [e]mbodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus.), a method for processing image is implemented when the image processing program is executed by a processor (Oord ¶ 0013 teaches a process for generating an output image from a neural network input), the method comprises:
receiving, by a first convolutional layer, output data of a previous convolutional layer (Oord ¶ 0045 teaches [t]o ensure that the convolutional neural network layers are conditioned only on the already generated output values, each convolutional neural network layer (that is receiving, by a first convolutional layer, output data of a previous convolutional layer) is configured to apply a convolution that is masked; Oord, Figure 1, teaches:

    [Oord, Figure 1: greyscale image (media_image1.png), not reproduced]

Oord ¶ 0032 teaches the initial neural network layers 110 can process a current output image 140 to generate an alternative representation 142 [sic] of the current output image 140 (that is, receiving, by a first convolutional layer, output data of a previous convolutional layer); Oord ¶ 0039 teaches the alternative representation [144] is a feature map);
obtaining, by the first convolutional layer, alternative feature maps based on the output data (Oord ¶ 0032 teaches the initial neural network layers 110 can process a current output image 140 to generate an alternative representation 142 of the current output image 140; Oord ¶ 0039 teaches the alternative representation is a feature map that includes features for each color channel of each pixel in the output image. In these implementations, when generating the color value for a given channel of a given pixel, the output layer uses the corresponding portion of the alternative representation . . . ; Oord ¶ 0083 teaches when the initial neural network layers are a fully convolutional neural network, the processing necessary for the initial neural network layers to generate the alternative representations (that is, alternative feature maps based on the output data) can be done in parallel rather than sequentially because the entire output image is available from the beginning of the computation);
determining a target feature map based on a feature map weight vector and the alterative feature maps in response to that the first convolutional later comprises a first slice selection module (Oord ¶ 0003 teaches [e]ach layer of the network generates an output from a received input in accordance with current values of a respective set of parameters (that is, set of parameters are a feature map weight vector); Oord ¶ 0083 teaches when the initial neural network layers are a fully convolutional neural network, the processing necessary for the initial neural network layers to generate the alternative representations can be done in parallel rather than sequentially because the entire output image is available from the beginning of the computation), wherein the feature map weight vector is generated by the first slice selection module based on the alternative feature maps (Oord ¶ 0081 teaches the system can also perform the processes 200 [for generating an output image from a neural network input] and 300 [for generating a color value for a given color channel of a given pixel of an output image] on neural network inputs in a set of training data, i.e., a set of inputs for which the output image that should be generated by the system is known, in order to train the initial neural network layers and, if the output layers have parameters, the output layers, i.e., to determine trained values for the parameters of the initial neural network layers and, optionally, the output layers (that is, parameters are a feature map weight vector).
The processes 200 and 300 can be performed repeatedly on inputs selected from a set of training data as part of a conventional machine learning training technique to train the initial neural network layers, e.g., a stochastic gradient descent with backpropagation training technique (that is, by training a neural network, the parameters are weight vectors, wherein the feature map weight vector is generated by the first slice selection module based on the alternative feature maps));
* * *
and obtaining output data of the first convolutional layer based on the target feature map (Oord ¶ 0062 teaches when the initial neural network layers 110 are convolutional layers (that is, the convolutional layers include the first convolutional layer), some or all of the layers have a gated activation function in place of a conventional activation function. In a gated activation function, the output of an element-wise non-linearity, i.e., of a conventional activation function, is element-wise multiplied by a gate vector that is generated by applying an element-wise non-linearity to the output of a convolution (that is, obtaining output data of the first convolutional layer based on the target feature map)).
Though Oord teaches a slice selection module (that is, color channels) in that a neural network system generates an output image that includes a predetermined number of pixels having a respective color value for each of multiple color channels, Oord, however, does not explicitly teach -
* * *
determining the alternative feature maps as the target feature map in response to that the first convolutional layer does not comprise the first slice selection module; and
* * *
But Sun teaches -
* * *
determining the alternative feature maps as the target feature map in response to that the first convolutional layer does not comprise the first slice selection module (Sun ¶ 0084 teaches the first processed data may be processed (that is, inputting the output data of the previous convolutional layer) by the second convolution processing unit 120 of the device 100 using a second type of convolutional layers in the CNN (that is, into the first convolutional layer) to generate second processed data, wherein the second type of convolutional layers comprise convolution kernels having shared weights (that is, with shared weights of the second convolution processing unit 120, is in response to that the first convolutional layer does not comprise the first slice selection module)); and
* * *
Oord and Sun are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches parameter sharing solutions in convolutional layers on a depth slice basis that may be a singular slice. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Oord pertaining to alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions of Sun.
The motivation for doing so is that the solution effectively utilizes structured information of a human face while taking image results of the human face in different environments into account, such as via the layer depth of a convolutional kernel, thereby realizing rapid and accurate recognition of facial features (Sun ¶ 0072).
14.	Claims 19-23 are rejected under 35 U.S.C. § 103 as being unpatentable over US Published Application 20180025257 to Oord et al. [hereinafter Oord], in view of US Published Application 20190266387 to Sun et al. [hereinafter Sun] and Lawrence et al., “Face Recognition: A Convolutional Neural-Network Approach,” IEEE Transactions on Neural Networks (January 1997) [hereinafter Lawrence], and further in view of Kim et al., “A Novel Zero Weight/Activation-Aware Hardware Architecture of Convolutional Neural Network,” IEEE (March 2017) [hereinafter Kim].
Regarding claim 19, the combination of Oord and Sun teaches all of the limitations of claim 1, as described above in detail.
Sun teaches -
wherein said determining a target feature map, in response to that the first convolutional layer comprises the first slice selection module, comprises:
generating the feature map weight vector by the first slice selection module based on the alternative feature maps (Sun ¶ 0033 teaches In the CNN 1, various convolution kernels may be repeated over the entire field of view (i.e., in wide and height planes). These repeated units may share the same parameters (weight vectors and offsets) and form a feature map (that is, generating a feature map weight vector . . . based on the output data of the previous convolutional layer)), . . . ;
* * *
Oord and Sun are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches parameter sharing solutions in convolutional layers on a depth slice basis that may be a singular slice. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Oord pertaining to alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions of Sun.
The motivation for doing so is that the solution effectively utilizes structured information of a human face while taking image results of the human face in different environments into account, such as via the layer depth of a convolutional kernel, thereby realizing rapid and accurate recognition of facial features. (Sun ¶ 0072).
Though the combination of Oord and Sun teaches the features of alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions, the combination of Oord and Sun does not explicitly teach -
. . . wherein each point in the feature map weight vector corresponds to one of the alternative feature maps and a weight value;
* * *
But Lawrence teaches -
. . . wherein each point in the feature map weight vector corresponds to one of the alternative feature maps and a weight value (Lawrence, left column of p. 103, “IV. System Components - E. Convolutional Networks”, third paragraph, teaches [t]he weights forming the receptive field for a plane are forced to be equal to all points in the plane (that is, each point in the feature map weight vectors). Each plane can be considered as a feature map . . . which is scanned over the planes in the previous layer. Multiple planes are usually used in each [convolutional] layer so that multiple features can be detected (that is, each point . . . corresponds to one of the feature maps in the first convolutional layer and a weight value));
* * *
Oord, Sun and Lawrence are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches convolutional operations where it is sometimes convenient to fill in the input data with 0s at the edge of the input data to control spatial size of the output data. Lawrence teaches convolutional networks using self-organizing maps (SOM) for convolutional network pre-processing. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Oord and Sun pertaining to convolutional operations with alternative representations (that is, feature maps) with the pre-processing of Lawrence.
The motivation for doing so is to provide dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. (Lawrence, Abstract).
Though Oord, Sun, and Lawrence together teach the features of alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions and zero padding for spatial control, the combination of Oord, Sun, and Lawrence does not explicitly teach -
* * *
determining a number N of target features based on a preset acceleration ratio;
adjusting weight values of other points except a first N points in the feature map weight vector to 0, and inputting adjusted feature map weight vector into the first convolutional layer, wherein the alternative feature maps corresponding to the first N points is the target feature map.
But Kim teaches -
* * *
determining a number N of target features based on a preset acceleration ratio (Kim, left column of p. 1463, “III. Basic Idea of Exploiting Zero Values - A. Architecture design”, third paragraph, teaches a proposed architecture in Fig. 1b:

[Figure omitted: Kim, Fig. 1b (proposed architecture)]

in which each [processing element (PE)] of our architecture performs a single convolution . . . [that] allows each PE to individually skip multiplications and accumulation associated with either zero weights or activations (that is, acceleration); Kim, right column of p. 1465, “V. Zero-Aware Kernel Allocation”, second full paragraph, teaches that Fig. 5a shows the ratio of non-zero weights per kernel tile. . . . [T]he PE with the largest ratio of non-zero kernel weights (that is, preset acceleration ratio) . . . determines the runtime of each sub-[working group] (that is, determining a number N of target features based on a preset acceleration ratio));
adjusting weight values of other points except a first N points in the feature map weight vector to 0 (Kim, left column of p. 1462, “I. Introduction”, third paragraph, teaches pruning techniques [11] increase the portion of zero weights (that is, adjusting weight values of other points . . . to 0) without losing the quality of CNN result (because the quality of the CNN result is retained, this is except a first N points in the feature map weight vector)), and inputting adjusted feature map weight vector into the first convolutional layer, wherein the alternative feature maps corresponding to the first N points is the target feature map (Kim, left column of p. 1464, “IV. Architecture - A. Architecture Overview”, first paragraph, teaches [a]ctivations and weights are broadcast (that is, inputting) from on-chip SRAM to [processing elements]; Kim, left column of p. 1465, “IV. Architecture - C. Zero-aware PE Architecture”, first column, teaches the [microarchitecture of a zero-aware PE] determines which entries to read from the Act and Weight buffers . . . . This allows us to skip the multiplications whose results will be zero).
Oord, Sun, Lawrence, and Kim are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches convolutional operations where it is sometimes convenient to fill in the input data with 0s at the edge of the input data to control spatial size of the output data. Lawrence teaches convolutional networks using self-organizing maps (SOM) for convolutional network pre-processing. Kim teaches a novel hardware architecture for accelerating convolution operations. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Oord, Sun and Lawrence pertaining to convolutional operations having zero padding for spatial size control and convolutional pre-processing with the convolutional acceleration of Kim.
The motivation for doing so is the need to accelerate convolutional neural networks (CNNs) due to their ever-widening application areas from server, mobile to IoT devices. (Kim, Abstract).
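For illustration only, the two limitations of claim 19 addressed above (determining a number N from a preset acceleration ratio, then setting all but N weight values in the feature map weight vector to 0) can be sketched as follows. The sketch is not drawn from the claims, the Specification, or any cited reference. The function name is hypothetical, and two points are assumptions: that N is the channel count divided by the acceleration ratio (rounded up), and that the "first N points" are the N largest weight values.

```python
import math


def select_target_feature_maps(weight_vector, acceleration_ratio):
    """Zero all but N weights; return the masked vector and the kept indices.

    Assumptions (hypothetical, for illustration only):
      * N = ceil(len(weight_vector) / acceleration_ratio)
      * the "first N points" are the N largest weight values
    """
    if acceleration_ratio < 1:
        raise ValueError("acceleration_ratio must be >= 1")
    channels = len(weight_vector)
    n = max(1, math.ceil(channels / acceleration_ratio))
    # Rank channel indices by weight, largest first (ties keep the lower index).
    ranked = sorted(range(channels), key=lambda i: -weight_vector[i])
    kept = sorted(ranked[:n])
    kept_set = set(kept)
    # Every point outside the kept set has its weight value adjusted to 0.
    masked = [w if i in kept_set else 0.0 for i, w in enumerate(weight_vector)]
    return masked, kept


# One weight per alternative feature map; a ratio of 2 keeps half of them.
weights = [0.1, 0.7, 0.3, 0.9]
masked, kept = select_target_feature_maps(weights, acceleration_ratio=2)
print(masked, kept)
```

Under these assumptions, the alternative feature maps at the kept indices would play the role of the target feature map, and the masked vector is what would be input to the first convolutional layer.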
Regarding claim 20, the combination of Oord and Sun teaches all of the limitations of claim 12, as described in detail above.
Sun teaches -
wherein the processor is configured to execute the image processing program to generate a target feature map, in response to that the first convolutional layer comprises the first slice selection module, by:
generating the feature map weight vector by the first slice selection module based on the alternative feature maps (Sun ¶ 0033 teaches In the CNN 1, various convolution kernels may be repeated over the entire field of view (i.e., in wide and height planes). These repeated units may share the same parameters (weight vectors and offsets) and form a feature map (that is, generating a feature map weight vector . . . based on the output data of the previous convolutional layer)), . . . ;
* * *
Oord and Sun are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches parameter sharing solutions in convolutional layers on a depth slice basis that may be a singular slice. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Oord pertaining to alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions of Sun.
The motivation for doing so is that the solution effectively utilizes structured information of a human face while taking image results of the human face in different environments into account, such as via the layer depth of a convolutional kernel, thereby realizing rapid and accurate recognition of facial features. (Sun ¶ 0072).
Though the combination of Oord and Sun teaches the features of alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions, the combination of Oord and Sun does not explicitly teach -
. . . wherein each point in the feature map weight vector corresponds to one of the alternative feature maps and a weight value;
* * *
But Lawrence teaches -
. . . wherein each point in the feature map weight vector corresponds to one of the alternative feature maps and a weight value (Lawrence, left column of p. 103, “IV. System Components - E. Convolutional Networks”, third paragraph, teaches [t]he weights forming the receptive field for a plane are forced to be equal to all points in the plane (that is, each point in the feature map weight vectors). Each plane can be considered as a feature map . . . which is scanned over the planes in the previous layer. Multiple planes are usually used in each [convolutional] layer so that multiple features can be detected (that is, each point . . . corresponds to one of the feature maps in the first convolutional layer and a weight value));
* * *
Oord, Sun and Lawrence are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches convolutional operations where it is sometimes convenient to fill in the input data with 0s at the edge of the input data to control spatial size of the output data. Lawrence teaches convolutional networks using self-organizing maps (SOM) for convolutional network pre-processing. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Oord and Sun pertaining to convolutional operations with alternative representations (that is, feature maps) with the pre-processing of Lawrence.
The motivation for doing so is to provide dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. (Lawrence, Abstract).
Though Oord, Sun, and Lawrence together teach the features of alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions and zero padding for spatial control, the combination of Oord, Sun, and Lawrence does not explicitly teach -
* * *
determining a number N of target features based on a preset acceleration ratio;
adjusting weight values of other points except a first N points in the feature map weight vector to 0, and inputting adjusted feature map weight vector into the first convolutional layer, wherein the alternative feature maps corresponding to the first N points is the target feature map.
But Kim teaches -
* * *
determining a number N of target features based on a preset acceleration ratio (Kim, left column of p. 1463, “III. Basic Idea of Exploiting Zero Values - A. Architecture design”, third paragraph, teaches a proposed architecture in Fig. 1b:

[Figure omitted: Kim, Fig. 1b (proposed architecture)]

in which each [processing element (PE)] of our architecture performs a single convolution . . . [that] allows each PE to individually skip multiplications and accumulation associated with either zero weights or activations (that is, acceleration); Kim, right column of p. 1465, “V. Zero-Aware Kernel Allocation”, second full paragraph, teaches that Fig. 5a shows the ratio of non-zero weights per kernel tile. . . . [T]he PE with the largest ratio of non-zero kernel weights (that is, preset acceleration ratio) . . . determines the runtime of each sub-[working group] (that is, determining a number N of target features based on a preset acceleration ratio));
adjusting weight values of other points except a first N points in the feature map weight vector to 0 (Kim, left column of p. 1462, “I. Introduction”, third paragraph, teaches pruning techniques [11] increase the portion of zero weights (that is, adjusting weight values of other points . . . to 0) without losing the quality of CNN result (because the quality of the CNN result is retained, this is except a first N points in the feature map weight vector)), and inputting adjusted feature map weight vector into the first convolutional layer, wherein the alternative feature maps corresponding to the first N points is the target feature map (Kim, left column of p. 1464, “IV. Architecture - A. Architecture Overview”, first paragraph, teaches [a]ctivations and weights are broadcast (that is, inputting) from on-chip SRAM to [processing elements]; Kim, left column of p. 1465, “IV. Architecture - C. Zero-aware PE Architecture”, first column, teaches the [microarchitecture of a zero-aware PE] determines which entries to read from the Act and Weight buffers . . . . This allows us to skip the multiplications whose results will be zero).
Oord, Sun, Lawrence, and Kim are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches convolutional operations where it is sometimes convenient to fill in the input data with 0s at the edge of the input data to control spatial size of the output data. Lawrence teaches convolutional networks using self-organizing maps (SOM) for convolutional network pre-processing. Kim teaches a novel hardware architecture for accelerating convolution operations. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Oord, Sun and Lawrence pertaining to convolutional operations having zero padding for spatial size control and convolutional pre-processing with the convolutional acceleration of Kim.
The motivation for doing so is the need to accelerate convolutional neural networks (CNNs) due to their ever-widening application areas from server, mobile to IoT devices. (Kim, Abstract).
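For illustration only, the zero-skipping behavior that the rejection attributes to Kim (a processing element skipping the multiplication and accumulation associated with either a zero weight or a zero activation) can be sketched in software as follows. This is a simplified analogy and not Kim's hardware design; the function name and the count of performed operations are hypothetical additions used to make the savings visible.

```python
def zero_aware_dot(activations, weights):
    """Multiply-accumulate that skips pairs containing a zero operand.

    Returns the accumulated sum and the number of multiplications actually
    performed. A software analogy only; not taken from Kim's architecture.
    """
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have equal length")
    total = 0.0
    performed = 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            # The product would be zero, so the multiply and add are skipped.
            continue
        total += a * w
        performed += 1
    return total, performed


acts = [1.0, 0.0, 2.0, 3.0]
wts = [0.5, 4.0, 0.0, 2.0]
total, performed = zero_aware_dot(acts, wts)
print(total, performed)
```

In this example only two of the four products are computed, which is the sense in which a larger portion of zero weights reduces the work per convolution.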
Regarding claim 21, the combination of Oord, Sun, Lawrence, and Kim teaches all of the limitations of claim 20, as discussed above in detail.
Oord teaches -
wherein the processor is configured to execute the image processing program to obtain output data of the first convolutional layer in response to that the first convolutional layer comprises the first slice selection module, by:
determining the target feature map based on the adjusted feature map weight vector by the first convolutional layer (Oord ¶ 0003 teaches [e]ach layer of the network generates an output from a received input in accordance with current values of a respective set of parameters (that is, the set of parameters is a feature map weight vector); Oord ¶ 0083 teaches when the initial neural network layers are a fully convolutional neural network, the processing necessary for the initial neural network layers to generate the alternative representations can be done in parallel rather than sequentially because the entire output image is available from the beginning of the computation); and
obtaining the output data of the first convolutional layer based on the target feature map (Oord ¶ 0062 teaches when the initial neural network layers 110 are convolutional layers (that is, the convolutional layers include the first convolutional layer), some or all of the layers have a gated activation function in place of a conventional activation function. In a gated activation function, the output of an element-wise non-linearity, i.e., of a conventional activation function, is element-wise multiplied by a gate vector that is generated by applying an element-wise non-linearity to the output of a convolution (that is, obtaining output data of the first convolutional layer based on the target feature map)).
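For illustration only, the gated activation function described in the cited passage of Oord ¶ 0062 (the output of an element-wise non-linearity multiplied element-wise by a gate vector, the gate vector itself being an element-wise non-linearity applied to the output of a convolution) can be sketched as follows. The choice of tanh for the activation and a sigmoid for the gate is an assumption, since the passage quoted above does not name the non-linearities, and the function name and inputs are hypothetical.

```python
import math


def gated_activation(conv_out, gate_conv_out):
    """Element-wise gated activation: tanh(f) * sigmoid(g).

    conv_out and gate_conv_out stand in for the outputs of two convolutions.
    The tanh/sigmoid pairing is an assumption made for this sketch.
    """
    if len(conv_out) != len(gate_conv_out):
        raise ValueError("inputs must have equal length")
    gated = []
    for f, g in zip(conv_out, gate_conv_out):
        gate = 1.0 / (1.0 + math.exp(-g))  # gate value in (0, 1)
        gated.append(math.tanh(f) * gate)  # activation scaled by its gate
    return gated


print(gated_activation([1.0, -2.0, 0.0], [0.0, 3.0, 5.0]))
```

A gate value near 0 suppresses the corresponding activation and a gate value near 1 passes it through largely unchanged.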
Regarding claim 22, the combination of Oord and Sun teaches all of the limitations of claim 13, as described in detail above.
Sun teaches -
wherein said generating a target feature map in response to that the first convolutional layer comprises the first slice selection module, comprises:
generating the feature map weight vector by the first slice selection module based on the alternative feature maps (Sun ¶ 0033 teaches In the CNN 1, various convolution kernels may be repeated over the entire field of view (i.e., in wide and height planes). These repeated units may share the same parameters (weight vectors and offsets) and form a feature map (that is, generating a feature map weight vector . . . based on the output data of the previous convolutional layer)), . . . ;
* * *
Oord and Sun are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches parameter sharing solutions in convolutional layers on a depth slice basis that may be a singular slice. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify Oord pertaining to alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions of Sun.
The motivation for doing so is that the solution effectively utilizes structured information of a human face while taking image results of the human face in different environments into account, such as via the layer depth of a convolutional kernel, thereby realizing rapid and accurate recognition of facial features. (Sun ¶ 0072).
Though the combination of Oord and Sun teaches the features of alternative feature map and parameter generation for a convolutional layer with convolutional layer parameter sharing solutions, the combination of Oord and Sun does not explicitly teach -
. . . wherein each point in the feature map weight vector corresponds to one of the alternative feature maps and a weight value;
* * *
But Lawrence teaches -
. . . wherein each point in the feature map weight vector corresponds to one of the alternative feature maps and a weight value (Lawrence, left column of p. 103, “IV. System Components - E. Convolutional Networks”, third paragraph, teaches [t]he weights forming the receptive field for a plane are forced to be equal to all points in the plane (that is, each point in the feature map weight vectors). Each plane can be considered as a feature map . . . which is scanned over the planes in the previous layer. Multiple planes are usually used in each [convolutional] layer so that multiple features can be detected (that is, each point . . . corresponds to one of the feature maps in the first convolutional layer and a weight value));
* * *
Oord, Sun and Lawrence are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches convolutional operations where it is sometimes convenient to fill in the input data with 0s at the edge of the input data to control spatial size of the output data. Lawrence teaches convolutional networks using self-organizing maps (SOM) for convolutional network pre-processing. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Oord and Sun pertaining to convolutional operations with alternative representations (that is, feature maps) with the pre-processing of Lawrence.
The motivation for doing so is to provide dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. (Lawrence, Abstract).
Though Oord, Sun, and Lawrence together teach the features of alternative feature map and parameter generation for a convolutional layer with the convolutional layer parameter sharing solutions and zero padding for spatial control, the combination of Oord, Sun, and Lawrence does not explicitly teach -
* * *
determining a number N of target features based on a preset acceleration ratio;
adjusting weight values of other points except a first N points in the feature map weight vector to 0, and inputting adjusted feature map weight vector into the first convolutional layer, wherein the alternative feature maps corresponding to the first N points is the target feature map.
But Kim teaches -
* * *
determining a number N of target features based on a preset acceleration ratio (Kim, left column of p. 1463, “III. Basic Idea of Exploiting Zero Values - A. Architecture design”, third paragraph, teaches a proposed architecture in Fig. 1b:

[Figure omitted: Kim, Fig. 1b (proposed architecture)]

in which each [processing element (PE)] of our architecture performs a single convolution . . . [that] allows each PE to individually skip multiplications and accumulation associated with either zero weights or activations (that is, acceleration); Kim, right column of p. 1465, “V. Zero-Aware Kernel Allocation”, second full paragraph, teaches that Fig. 5a shows the ratio of non-zero weights per kernel tile. . . . [T]he PE with the largest ratio of non-zero kernel weights (that is, preset acceleration ratio) . . . determines the runtime of each sub-[working group] (that is, determining a number N of target features based on a preset acceleration ratio));
adjusting weight values of other points except a first N points in the feature map weight vector to 0 (Kim, left column of p. 1462, “I. Introduction”, third paragraph, teaches pruning techniques [11] increase the portion of zero weights (that is, adjusting weight values of other points . . . to 0) without losing the quality of CNN result (because the quality of the CNN result is retained, this is except a first N points in the feature map weight vector)), and inputting adjusted feature map weight vector into the first convolutional layer, wherein the alternative feature maps corresponding to the first N points is the target feature map (Kim, left column of p. 1464, “IV. Architecture - A. Architecture Overview”, first paragraph, teaches [a]ctivations and weights are broadcast (that is, inputting) from on-chip SRAM to [processing elements]; Kim, left column of p. 1465, “IV. Architecture - C. Zero-aware PE Architecture”, first column, teaches the [microarchitecture of a zero-aware PE] determines which entries to read from the Act and Weight buffers . . . . This allows us to skip the multiplications whose results will be zero).
Oord, Sun, Lawrence, and Kim are from the same or similar field of endeavor. Oord teaches the generation of alternative feature maps and parameter values of a convolutional layer used in image processing. Sun teaches convolutional operations where it is sometimes convenient to fill in the input data with 0s at the edge of the input data to control spatial size of the output data. Lawrence teaches convolutional networks using self-organizing maps (SOM) for convolutional network pre-processing. Kim teaches a novel hardware architecture for accelerating convolution operations. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to modify the combination of Oord, Sun and Lawrence pertaining to convolutional operations having zero padding for spatial size control and convolutional pre-processing with the convolutional acceleration of Kim.
The motivation for doing so is the need to accelerate convolutional neural networks (CNNs) due to their ever-widening application areas from server, mobile to IoT devices. (Kim, Abstract).
Regarding claim 23, the combination of Oord, Sun, Lawrence, and Kim teaches all of the limitations of claim 22, as described above in detail.
Oord teaches -
wherein said obtaining output data of the first convolutional layer in response to that the first convolutional layer comprises the first slice selection module comprises:
determining the target feature map based on the adjusted feature map weight vector by the first convolutional layer (Oord ¶ 0003 teaches [e]ach layer of the network generates an output from a received input in accordance with current values of a respective set of parameters (that is, the set of parameters is a feature map weight vector); Oord ¶ 0083 teaches when the initial neural network layers are a fully convolutional neural network, the processing necessary for the initial neural network layers to generate the alternative representations can be done in parallel rather than sequentially because the entire output image is available from the beginning of the computation); and
obtaining the output data of the first convolutional layer based on the target feature map (Oord ¶ 0062 teaches when the initial neural network layers 110 are convolutional layers (that is, the convolutional layers include the first convolutional layer), some or all of the layers have a gated activation function in place of a conventional activation function. In a gated activation function, the output of an element-wise non-linearity, i.e., of a conventional activation function, is element-wise multiplied by a gate vector that is generated by applying an element-wise non-linearity to the output of a convolution (that is, obtaining output data of the first convolutional layer based on the target feature map)).
Response to Arguments
15.	Applicant submits that the rejection under Section 101 has been overcome by amending the claim to now recite “non-transitory” computer readable storage medium.
Examiner notes the Applicant’s amendment; however, as further explained above, the claims, read in light of the Specification, do not expressly exclude transitory computer-readable storage media.
16.	With respect to the rejection under Section 103, Applicant argues “Sun, an optional pooling layer is used to downsample output data of the preceding convolutional layer. However, Sun is silent on obtaining alternative feature maps based on the output data of the preceding convolutional layer before down-sampling. In the amended claim 1, before the first slice selection module determining the target feature map, obtaining, by the first convolutional layer, alternative feature maps based on the output data, then determining whether the first convolutional layer comprises a first slice selection module (which is also not disclosed by Sun), if yes, subsequently processing is implemented based on the alternative feature maps in the first slice selection module to obtain the output data of the first convolutional layer.” (Response at p. 10).
Examiner respectfully disagrees. Applicant appears to argue claim limitations that are not set out in the claims. For example, the claims merely recite limitations of “determining a target feature map based on a feature map weight vector and the alterative feature maps in response to that the first convolutional layer comprises a first slice selection module.” (See, e.g., representative claim 1). The claims do not set out other neural network layers, other than a first convolutional layer, and a previous convolutional layer. Moreover, Examiner notes that the specification recites a pooling feature, in that the output data of the previous convolutional layer is processed by adopting a global-average-pooling algorithm. (See PGPUB ¶ 0024).
With respect to Applicant’s argument that Sun does not teach the feature of “alternative feature maps,” Examiner agrees. Oord is cited as teaching this feature, as set out in the rejections hereinabove in detail.
17.	Applicant argues “Sun merely discloses that repeated convolution kernels (convolutional layers) may share same weight vectors and offsets and form a feature map but does not disclose/suggest that the weight vectors are generated by the pooling layer.” (Response at p. 11).
Examiner respectfully disagrees. Applicant appears to argue limitations that are not recited by Applicant’s claims. For example, the claims merely recite “wherein the feature map weight vector is generated by the first slice selection module based on the alternative feature maps.” (See, e.g., representative claim 1). The claims do not define the nature of the generation with specificity. Also, Examiner now cites to Oord as teaching this feature, as is set out in the rejections hereinabove in detail.
Conclusion
18.	Applicant's amendment necessitated the new grounds of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
19.	The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Jacob V. Bouvrie, “Hierarchical Learning: Theory with Applications in Speech and Vision,” MIT (2009) (Thesis), teaches that the definition of the derived kernel can be modified to define alternative feature maps while preserving the same architecture.
20.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI, can be reached at 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO 

/K.L.S./
Examiner, Art Unit 2122

/BABOUCARR FAAL/Primary Examiner, Art Unit 2184