DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.


Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

35 U.S.C. 101 requires that a claimed invention must fall within one of the four eligible categories of invention (i.e., process, machine, manufacture, or composition of matter) and must not be directed to subject matter encompassing a judicially recognized exception as interpreted by the courts. MPEP 2106. The four eligible categories of invention include: (1) process which is an act, or a series of acts or steps, (2) machine which is a concrete thing, consisting of parts, or of certain devices and combination of devices, (3) manufacture which is an article produced from raw or prepared materials by giving to these materials new forms, qualities, properties, or combinations, whether by 

Claims 8-20 recite a computer readable storage medium and are analyzed under 35 U.S.C. 101 to determine whether they fall within one of the four statutory categories of invention, because the broadest reasonable interpretation of the instant claims in light of the specification encompasses transitory signals. The instant specification discloses in paragraphs [0019-0020]: “present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention and the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only . A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire”.
Therefore, the specification explicitly excludes transitory signals, and transitory signals are not within one of the four statutory categories (i.e., non-statutory subject matter). See MPEP 2106(I).
However, the Examiner suggests amending claims 8-20 to recite a “non-transitory computer-readable storage medium”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US Pub No. 20150117760 A1, as provided in IDS) in view of Lin et al. (US Pub No. 20160035078 A1, as provided in IDS).  


Regarding Claim 1,
Wang discloses a method comprising:
receiving, by a computing device, an image; (Wang, Abstract, Fig. 8, discloses deep convolutional neural networks receive local and global representations of images as inputs and learn the best representation for a particular feature through multiple convolutional and fully connected layers; images are input using computing system devices)

determining, by the computing device, feature vectors from the image; (Wang, [0037], discloses neural network component 536 is further configured to merge at least one layer of the first column with at least one layer of the second column into a fully connected layer. The fully connected layers may have 1000 and 256 neurons respectively. Neural network component 536 is further configured to learn or classify at least one feature for the image. The 256×1 vectors may be concatenated from each of the fc256 layers and the weights may be jointly trained in the final layer. Interaction between the two columns in convolutional layers of the DCNN is avoided because they are in different spatial scales; features are extracted from input images)


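determining, by the computing device, a first padding value and a first stride value by inputting the feature vectors into a deep neural network; (Wang, [0007], [0046], [0053], advantages of the system may include one or more of the following. The system provides fast object detection using powerful neural network features and Regionlets object detection framework. Our system extracts shift invariant neural patterns from deep CNN and achieves excellent performance in object detection. The system is a new example of transfer learning, i.e., transferring the knowledge learned from large-scale image classification (in this case, ImageNet image classification) to generic object detection. The system transfers the knowledge learned from a classification task to object detection by trickling high-level information in top convolutional layers in a deep CNN down to low-level image patches. As a result, a typical PASCAL VOC image only needs to run the neural network several times to produce DNPs for the whole image depending on the required feature stride, promising low computational cost for feature extraction. To adapt our features for the Regionlets framework, we build normalized histograms of DNPs inside each sub-region of arbitrary resolution within the detection window and add these histograms to the feature pool for the boosting learning process. DNPs can also be easily combined with traditional features in the Regionlets framework; discloses now that a feature vector can be computed and localized, dense neural patterns can be obtained by network-convolution. This process is shown in FIG. 4 where dense feature maps obtained by shifting the classification window and extract neural patterns at center positions. Producing dense neural patterns to a high-resolution image could be trivial by shifting the deep CNN model with 224×224 input over the larger image. However, deeper convolutional networks are usually geometrically constrained. For instance, they require extra padding to ensure the map sizes and borders work with strides and pooling of the next layer. Therefore, the activation of a neuron on the fifth convolutional layer may have been calculated on zero padded values. This creates the inhomogeneous problem among neural patterns, implying that the same image patch may produce different activations. Although this might cause tolerable inaccuracies for image classification, the problem could be detrimental to object detectors, which is evaluated by localization accuracy. To rectify this concern, we only retain central 5×5 feature points of the feature map square. In this manner, each model convolution generates 25 feature vectors with a 16×16 pixel stride. In order to produce the dense neural patterns map for the whole image using the fifth convolutional layer, we convolve the deep CNN model every 80 pixels in both x and y direction. Given a 640×480 image, it outputs 40×30 feature points which involves 8×6 model convolutions; the dense neural patterns feature points in both horizontal and vertical directions. As illustrated in Fig. 6, a regionlet can cover multiple feature points or no feature point. The illustration of FIG. 6 shows feature points, a detection window, regions, and regionlets. Blue points represent dense neural patterns extracted in each spatial location. FIG. 6 shows that a regionlet can spread across multiple feature points, or no feature point; pad and stride values of deep neural network are determined to extract features of relevance from input images)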


needs to run the neural network several times to produce DNPs for the whole image depending on the required feature stride, promising low computational cost for feature extraction.  To adapt our features for the Regionlets framework, we build normalized histograms of DNPs inside each sub-region of arbitrary resolution within the detection window and add these histograms to the feature pool for the boosting learning process.  DNPs can also be easily combined with traditional features in the Regionlets framework;  discloses now that a feature vector can be computed and localized, dense neural patterns can be obtained by network-convolution.  This process is shown in FIG. 4 where dense feature maps obtained by shifting the classification window and extract neural patterns at center positions.  Producing dense neural patterns to a high-resolution image could be trivial by shifting the deep CNN model with 224.times.224 input over the larger image.  However, deeper convolutional networks are usually 
feature points in both horizontal and vertical directions.  As illustrated in Fig. 6, a regionlet can cover multiple feature points or no feature point.  The illustration of FIG. 6 shows feature points, a detection window, regions, and regionlets.  Blue points represent dense neural patterns extracted in each spatial location.  FIG. 6 shows that a regionlet can spread across multiple feature points, or no feature point; pad and stride values of deep neural network are determined to extract features of relevance from input images)

Wang does not explicitly disclose a regression model, or determining, by the computing device, padding by averaging the first padding value and the second padding value; determining, by the computing device, stride by averaging the first stride value and the second stride value; and classifying, by the computing device, the image using a convolutional neural network using the padding and the stride.


Lin discloses a regression model (Lin, [0022], [0051], Fig. 1, at step 616, a probability of each input being assigned to a class for a particular feature is calculated. Results associated with each input associated with the image are averaged, at step 618. At step 620, the class with the highest probability is selected. In one embodiment, one or more features may be extracted from the image at one of the fully-connected layers. In one embodiment, the last layer of the deep convolutional neural network may be replaced with a regression (i.e., a continuous output between 0 and 1). In this instance, the cost function is the sum of L² distance between the predicted network output NN(x) and the ground truth label y; an exemplary diagram 100 of an original image 110 as well as various global and local representations in accordance with embodiments of the present invention is depicted. Several different transformations may be considered to normalize image sizes utilizing the original high-resolution image 110 to create a global view or global input. A center-crop (g_c) transformation 120 regression model is disclosed)

determining, by the computing device, padding by averaging the first padding value and the second padding value; (Lin, [0051], discloses a probability of each input being assigned to a class for a particular feature is calculated. Results associated with each input associated with the image are averaged, at step 618. At step 620, the class with the highest probability is selected. In one embodiment, one or more features may be extracted from the image at one of the fully-connected layers. In one embodiment, the last layer of the deep convolutional neural network may be replaced with a regression (i.e., a continuous output between 0 and 1). In this instance, the cost function is the sum of L² distance between the predicted network output NN(x) and the ground truth label y; averaging of relevant features values from input images are determined to classify image)

determining, by the computing device, stride by averaging the first stride value and the second stride value; (Lin, [0051], discloses a probability of each input being assigned to a class for a particular feature is calculated. Results associated with each input associated with the image are averaged, at step 618. At step 620, the class with the highest probability is selected. In one embodiment, one or more features may be extracted from the image at one of the fully-connected layers. In one embodiment, the last layer of the deep convolutional neural network may be replaced with a regression (i.e., a continuous output between 0 and 1). In this instance, the cost function is the sum of L² distance between the predicted network output NN(x) and the ground truth label y; averaging of relevant features values from input images are determined to classify image) and 

classifying, by the computing device, the image using a convolutional neural network using the padding and the stride. (Lin, [0006], discloses providing automatic feature learning and image assessment using deep convolutional neural networks. A double-column deep convolutional neural network (DCNN) is implemented and trained to learn and classify features for a set of images. A global image representation of an image is extracted as a global input to a first column of the DCNN. A local image representation of the image is extracted as a fine-grained input to a second column of the DCNN. At least one layer of the first column is merged with at least one layer of the images are classified according to the features extracted from determined padding and stride values of input images)

Accordingly, it would have been obvious to one of ordinary skill in the art to modify Wang with Lin to classify an input image by extracting feature vectors through deep neural network and regression models. One would be motivated to modify Wang by the teachings of Lin to accurately classify an input image by determining padding and stride values of deep neural layers substituted with a regression model, in order to accurately determine padding and stride values and average them to classify images into categories (see Lin, paragraph [0006]). Therefore, it would have been obvious to combine Wang and Lin to obtain the invention recited in Claim 1.
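
For illustration only, the averaging limitations of Claim 1 discussed above can be sketched as follows. This is a minimal sketch and not the applicant's or either reference's implementation: the function name and the rounding of the averaged values to integers are assumptions that appear neither in the claims nor in Wang or Lin.

```python
def average_hyperparameters(first_padding, second_padding, first_stride, second_stride):
    """Average two candidate padding values and two candidate stride values.

    Hypothetical helper: the "first" values stand for the deep neural network
    outputs and the "second" values for the regression model outputs.
    """
    padding = (first_padding + second_padding) / 2
    stride = (first_stride + second_stride) / 2
    # Assumption: convolution layers take integer hyperparameters, so the
    # averages are rounded (Python rounds exact halves to the nearest even
    # integer), with padding kept at 0 or more and stride at 1 or more.
    return max(0, round(padding)), max(1, round(stride))


print(average_hyperparameters(1, 3, 2, 4))  # (2, 3)
```

The returned pair would then be supplied to the convolutional neural network used for classification.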

Regarding Claim 2, 
The combination of Wang and Lin further discloses wherein the computing device receives the image from a mobile device. (Lin, Fig. 10, discloses I/O ports). Additionally, the rationale and motivation to combine the references Wang and Lin as applied in claim 1 apply to this claim.

Regarding Claim 3, 
The combination of Wang and Lin further discloses wherein the feature vectors include a feature vector of a person in the image and a feature vector of an object in the image. (Wang, [0002], detecting generic objects in high-resolution images is one of the most valuable pattern recognition tasks, useful for large-scale image labeling, scene objects with varying features (objects, persons) are detected). Additionally, the rationale and motivation to combine the references Wang and Lin as applied in claim 1 apply to this claim.

Regarding Claim 4, 
The combination of Wang and Lin further discloses training, by the computing device, the deep neural network using a training data set including descriptors and corresponding padding and stride values. (Wang, [0046], [0053], discloses now that a feature vector can be computed and localized, dense neural patterns can be obtained by network-convolution. This process is shown in FIG. 4 where dense feature maps obtained by shifting the classification window and extract neural patterns at center positions. Producing dense neural patterns to a high-resolution image could be trivial by shifting the deep CNN model with 224×224 input over the larger image. However, deeper convolutional networks are usually geometrically constrained. For instance, they require extra padding to ensure the map sizes and borders work with strides and pooling of the next layer. Therefore, the activation of a neuron on the fifth pad and stride values of deep neural are determined). Additionally, the rationale and motivation to combine the references Wang and Lin as applied in claim 1 apply to this claim.

Regarding Claim 5, 
The combination of Wang and Lin further discloses training, by the computing device, the at least one multiple regression model using a training data set including descriptors and corresponding padding and stride values. (Wang, [0051], [0056], at step 616, a probability of each input being assigned to a class for a particular feature is calculated. Results associated with each input associated with the image are averaged, at step 618. At step 620, the class with the highest probability is selected. In one embodiment, one or more features may be extracted from the image at one of the fully-connected layers. In one embodiment, the last layer of the deep convolutional neural network may be replaced with a regression (i.e., a continuous output between 0 and 1). In this instance, the cost function is the sum of L² distance between the predicted network output NN(x) and the ground truth label y; an image is received, at step 812, from the set of images. In one embodiment, a global image representation of the image is extracted as one or more global inputs to a second feature column of the RDCNN. The image may be resized to create the global image representation. In one embodiment, the image is resized by warping the image into a normalized input with a fixed size. In one embodiment, the image is resized by normalizing its shorter side to a normalized input with a fixed length s and center-cropping the normalized input to generate a s×s×3 input; an architecture associated with each column in the RDCNN may comprise: a first convolutional layer that filters a 224×224×3 patch with 64 kernels of size 11×11×3 with a stride of 2 pixels; a second convolutional layer that filters output of the first convolutional layer with 64 kernels of pad and stride values for regression model are determined). Additionally, the rationale and motivation to combine the references Wang and Lin as applied in claim 1 apply to this claim.

Regarding Claim 6, 
The combination of Wang and Lin further discloses masking, by the computing device, a portion of the image. (Wang, [0007], discloses the system provides fast object detection using powerful neural network features and Regionlets object detection framework. Our system extracts shift invariant neural patterns from deep CNN and achieves excellent performance in object detection. The system is a new example of transfer learning, i.e., transferring the knowledge learned from large-scale image classification (in this case, ImageNet image classification) to generic object detection. The system transfers the knowledge learned from a classification task to object detection by trickling high-level information in top convolutional layers in a deep CNN down to low-level image patches. As a result, a typical PASCAL VOC image only needs to run the neural network several times to produce DNPs for the whole image depending on the required feature stride, promising low computational cost for feature extraction. To adapt our features for the Regionlets framework, we build normalized histograms of DNPs inside each sub-region of arbitrary resolution within the sub-region features are extracted and processed and rest of image regions features are masked for processing). Additionally, the rationale and motivation to combine the references Wang and Lin as applied in claim 1 apply to this claim.

Regarding Claim 7, 
The combination of Wang and Lin further discloses wherein the classifying the image using the convolutional neural network comprises convolving a filter across the masked portion of the image. (Lin, [0036], discloses by setting s as 256, the size of I_t is 256×256×3. To alleviate overfitting in network training, for each normalized input I_t, a random 224×224×3 patch I_p or its horizontal reflection is extracted to be the input patch to the network. The DCNN may include four convolutional layers, including a first convolutional layer that filters the 224×224×3 patch with 64 kernels of size 11×11×3 with a stride of two pixels. A second convolutional layer may filter the output of the first convolutional layer with 64 kernels of size 5×5×64. Each of the third and fourth convolutional layers may have 64 kernels of size 3×3×64). Additionally, the rationale and motivation to combine the references Wang and Lin as applied in claim 1 apply to this claim.
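
For illustration only, the relationship between kernel size, stride, padding, and output map size in the first convolutional layer quoted from Lin, [0036], can be sketched with the standard convolution output-size formula. The function name is hypothetical, and the zero-padding default is an assumption, since the quoted passage does not state a padding value for this layer.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution along one dimension.

    Standard formula: floor((input + 2 * padding - kernel) / stride) + 1.
    padding=0 is an assumption; the quoted passage gives no padding value.
    """
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 224-pixel side, 11-pixel kernel, stride of two pixels, no padding assumed.
print(conv_output_size(224, 11, 2))  # 107
```

Changing the padding or stride argument changes the output map size, which is why the two values are treated as parameters of the convolutional neural network.
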
Claims 8, 9, 13, 14 and 15 recite system with elements corresponding to the method steps recited in Claims 1, 2, 3, 6 and 7 respectively. Therefore, the recited elements of the system Claims 8, 9, 13, 14 and 15 are mapped to the proposed 
Furthermore, the combination of Wang and Lin further discloses A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device to cause the computing device (Wang, [0059], Fig. 8, discloses computer for Object Detection.  The system may be implemented in hardware, firmware or software, or a combination of the three.  Preferably the system is implemented in a computer program executed on a programmable computer having a processor, a data storage system, volatile and non-volatile memory and/or storage elements, at least one input device and at least one output device).

Regarding Claim 10, 
The combination of Wang and Lin further discloses the program instructions further being executable by the computing device to cause the computing device to determine feature vectors from the image. (Wang, [0019], discloses the system efficiently incorporates a deep neural network into conventional object detection frameworks using the Dense Neural Pattern (DNP), a local feature densely extracted from an image with arbitrary resolution using a well-trained deep convolutional neural network. The DNPs not only encode high-level features learned from a large image data-set, but are also local and flexible like other dense local features. It is easy to integrate DNPs into the conventional detection frameworks. More specifically, the feature vectors are extracted from input images). Additionally, the rationale and motivation to combine the references Wang and Lin as applied in claim 1 apply to this claim.

Regarding Claim 11,
The combination of Wang and Lin further discloses wherein the determining the first padding value and the first stride value comprises inputting the feature vectors into the neural network modeling padding and stride. (Wang, [0007], [0046], [0053], advantages of the system may include one or more of the following. The system provides fast object detection using powerful neural network features and Regionlets object detection framework. Our system extracts shift invariant neural patterns from deep CNN and achieves excellent performance in object detection. The system is a new example of transfer learning, i.e., transferring the knowledge learned from large-scale image classification (in this case, ImageNet image classification) to generic object detection. The system transfers the knowledge learned from a classification task to object detection by trickling high-level information in top convolutional layers in a deep CNN down to low-level image patches. As a result, a typical PASCAL VOC image only needs to run the neural network several times to produce DNPs for the whole image depending on the required feature stride, promising low computational cost for feature extraction. To adapt our features for the Regionlets framework, we build normalized histograms of DNPs inside each sub-region of arbitrary resolution within the detection window and add these histograms to the feature pool for the boosting learning process. DNPs can also be easily combined with traditional features in the Regionlets framework; discloses now that a feature vector can be computed and localized, dense neural patterns can be obtained by network-convolution. This process is shown in FIG. 4 where dense feature maps obtained by shifting the classification window and extract neural patterns at center positions. Producing dense neural patterns to a high-resolution image could be trivial by shifting the deep CNN model with 224×224 input over the larger image. However, deeper convolutional networks are usually geometrically constrained. For instance, they require extra padding to ensure the map sizes and borders work with strides and pooling of the next layer. Therefore, the activation of a neuron on the fifth convolutional layer may have been calculated on zero padded values. This creates the inhomogeneous problem among neural patterns, implying that the same image patch may produce different activations. Although this might cause tolerable inaccuracies for image classification, the problem could be detrimental to object detectors, which is evaluated by localization accuracy. To rectify this concern, we only retain central 5×5 feature points of the feature map square. In this manner, each model convolution generates 25 feature vectors with a 16×16 pixel stride. In order to produce the dense neural patterns map for the whole image using the fifth convolutional layer, we convolve the deep CNN model every 80 pixels in both x and y direction. Given a 640×480 image, it outputs 40×30 feature points which involves 8×6 model convolutions; the dense neural patterns feature points in both horizontal and vertical directions. As illustrated in Fig. 6, a regionlet can cover multiple feature points or no feature point. The illustration of FIG. 6 shows feature points, a detection window, regions, and regionlets. Blue points represent dense neural patterns extracted in each spatial location. FIG. 6 shows that a regionlet can spread across multiple feature points, or no feature point; pad and stride values of deep neural network are determined to extract features of relevance from input images). Additionally, the rationale and motivation to combine the references Wang and Lin as applied in claim 1 apply to this claim.
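
For illustration only, the arithmetic in the passage of Wang quoted above (5 retained feature points per side, a 16-pixel stride, a model shift of 80 pixels, and a 640 by 480 image yielding 40 by 30 feature points from 8 by 6 model convolutions) can be checked with the sketch below. The function and parameter names are hypothetical, and the floor division for image sizes that are not multiples of the shift is an assumption not taken from Wang.

```python
def dense_pattern_grid(width, height, points_per_side=5, point_stride=16):
    """Reproduce the dense-neural-pattern grid arithmetic quoted from Wang.

    Each model convolution keeps points_per_side x points_per_side feature
    points spaced point_stride pixels apart, so the model is shifted by
    points_per_side * point_stride pixels between convolutions.
    """
    shift = points_per_side * point_stride  # 5 * 16 = 80 pixels
    # Assumption: any partial window at the image border is dropped.
    conv_x, conv_y = width // shift, height // shift
    feature_points = (conv_x * points_per_side, conv_y * points_per_side)
    return shift, (conv_x, conv_y), feature_points


print(dense_pattern_grid(640, 480))  # (80, (8, 6), (40, 30))
```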

Regarding Claim 12, 
The combination of Wang and Lin further discloses wherein the determining the second padding value comprises inputting the feature vectors into the regression model for padding, and the determining the second stride value comprises inputting the feature vectors into the regression model for stride. (Lin, [0022], [0051], Fig. 1, at step 616, a probability of each input being assigned to a class for a particular feature is calculated. Results associated with each input associated with the image are averaged, at step 618. At step 620, the class with the highest probability is selected. In one embodiment, one or more features may be extracted from the image at one of the fully-connected regression model is disclosed)
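
For illustration only, steps 616 through 620 of Lin as quoted above (per-input class probabilities, averaging across inputs, and selection of the class with the highest probability) can be sketched as follows. The function name and the list-of-lists data layout are assumptions and are not taken from Lin.

```python
def classify_by_averaging(per_input_probs):
    """Average class probabilities over several inputs and pick the top class.

    per_input_probs: one list of class probabilities per input (for example,
    per image patch); this layout is a hypothetical choice for the sketch.
    """
    num_inputs = len(per_input_probs)
    num_classes = len(per_input_probs[0])
    # Step 618 analogue: average the per-input results class by class.
    averaged = [
        sum(probs[c] for probs in per_input_probs) / num_inputs
        for c in range(num_classes)
    ]
    # Step 620 analogue: select the class with the highest averaged probability.
    best_class = max(range(num_classes), key=lambda c: averaged[c])
    return best_class, averaged


best, avg = classify_by_averaging([[0.2, 0.8], [0.6, 0.4]])
print(best)  # 1
```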

Claims 16, 17, 18, 19 and 20 recite computer readable medium with program instructions corresponding to the method steps recited in Claims 1, 3, 4, 5, (6+7) 

		Furthermore, the combination of Wang and Lin further discloses A system with a hardware processor, a computer readable memory, and a computer readable storage medium associated with a computing device; program instructions to receive an image (Wang, [0059], Fig. 8, discloses computer for Object Detection.  The system may be implemented in hardware, firmware or software, or a combination of the three.  Preferably the system is implemented in a computer program executed on a programmable computer having a processor, a data storage system, volatile and non-volatile memory and/or storage elements, at least one input device and at least one output device). 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20170262735 A1
US 20180129899 A1
US 20200074625 A1

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PINALBEN V PATEL whose telephone number is (571)270-5872.  The examiner can normally be reached on M-F: 10am - 8pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached on (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Pinalben Patel/Examiner, Art Unit 2661