DETAILED ACTION
Claims 1-20 have been examined.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Objections
Claims 2, 6, 9, 13, 16, and 20 are objected to because of the following informalities:
Claims 2, 9, and 16 recite “the first and second input documents.” Claims 6, 13, and 20 recite “the second input document.” There is insufficient antecedent basis for the second input document because parent claims 1, 8, and 15 do not recite a second input document.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 5, 6, 8, 9, 12, 13, 15, 16, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al., “A Convolutional Neural Network Cascade for Face Detection” (hereinafter Li), in view of Erhan, further in view of Majumdar, further in view of Zhou.

As per claim 1, Li teaches the invention as claimed, including a computerized method of training and utilizing deep convolutional neural networks (DCNNs), the method comprising:
training a first DCNN to output a value for a first metric by inputting a plurality of sample documents to the first DCNN (i.e., training the 12-calibration-net with sample images, the 12-calibration-net outputs a vector of confidence scores, see at least pages 5329-5330, section 3.4);
feeding a plurality of possible transformations of a first input document to the first DCNN (i.e., detection windows are processed by the 12-calibration-net, see at least page 5327, sections 3.1, 3.2.2), the first DCNN producing a value for the first metric for each of the plurality of possible transformations (i.e., the 12-calibration-net outputs a vector of confidence scores, see at least page 5327, section 3.2.2);
selecting, by an optimizing problem solving component separate from the first DCNN, a first transformation from the plurality of possible transformations based on the values for the first metric for each of the plurality of possible transformations output by the first DCNN (i.e., Non-maximum suppression is applied to eliminate highly overlapped detection windows, iteratively select the detection window with the highest confidence score, see at least page 5329, section 3.2.7); 
training a second DCNN to output a transformation for a document by inputting the selected first transformation (i.e., 24-net is trained with mined negative training samples, detection windows from the 12-calibration-net are cropped out and resized into 24x24 images 
feeding a first input document to the second DCNN, obtaining a second transformation of the first input document (i.e., the 24-net further reduces the number of detection windows, we also feed the input in 12 x 12 resolution to a sub-structure same as the 12-net in 24-net, see at least page 5327, section 3.2.2, page 5328, section 3.2.3).
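For illustration only, the non-maximum suppression cited above for the selecting step (iteratively selecting the detection window with the highest confidence score and eliminating highly overlapped windows) can be sketched as follows. The sketch is not taken from Li; the function names `iou` and `non_max_suppression`, the `(x1, y1, x2, y2, score)` window format, and the 0.5 overlap threshold are assumptions made for the example.

```python
def iou(a, b):
    """Intersection-over-union of two windows given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(windows, iou_threshold=0.5):
    """Keep the highest-scoring window, drop windows that overlap it, repeat.

    windows: list of (x1, y1, x2, y2, score) tuples (assumed format).
    iou_threshold: assumed overlap level above which a window is eliminated.
    """
    remaining = sorted(windows, key=lambda w: w[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # window with the highest confidence score
        kept.append(best)
        # Eliminate windows that highly overlap the selected one.
        remaining = [w for w in remaining if iou(best, w) <= iou_threshold]
    return kept


windows = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (20, 20, 30, 30, 0.7)]
selected = non_max_suppression(windows)
```

In this example the second window overlaps the first by more than the assumed threshold and is eliminated, while the third window does not overlap and is retained.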
Li does not explicitly teach the sample documents having been labeled with a value for the first metric.
Erhan teaches sample documents having been labeled with a value for the first metric (i.e., updating values of parameters of the neural network using the bounding box data and the object location data associated with the training image, bounding box data including data that defines the predetermined number of candidate bounding boxes within the training image and the confidence score, see at least column 4, line 29 - column 5, line 14).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li such that the sample documents have been labeled with a value for the first metric as similarly taught by Erhan because it would have been obvious to use a known technique to improve similar methods in the same way. Erhan teaches using bounding boxes, confidence scores, and data locations for the training image to train a neural network (see at least column 4, line 29 - column 6, line 67 of Erhan), and thus it would have been obvious to use the known technique taught by Erhan in Li to train Li’s neural network such that the neural network is optimized (see at least column 4, line 29 - column 6, line 67 of Erhan).

Li does not explicitly teach inputting the corresponding value for the first metric to the second DCNN.
Majumdar teaches inputting the corresponding value for the first metric to the second DCNN (i.e., a first classifier outputs a vector of scores indicating the confidence, a second classifier receives the output of the first classifier, see at least [0093], [0096]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to input the corresponding value for the first metric to the second DCNN as similarly taught by Majumdar because it would have been obvious to use a known technique to improve similar methods in the same way. Majumdar teaches that output from one classifier, such as a confidence score, is useful in training another classifier (see at least [0093], [0096] of Majumdar), and thus it would have been obvious to use the known technique taught by Majumdar in Li such that the confidence score from one neural network is inputted into another neural network to improve the functions of the neural networks.
Li does not explicitly teach a second DCNN to perform a transformation.
Zhou teaches a DCNN to perform a transformation (i.e., neural network level produces a cropped face image as output, see at least [0009]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li such that a second DCNN performs a transformation as similarly taught by Zhou. Li teaches that the 24-net outputs detection windows and that, given a detection window, a region is cropped out. While Li does not teach that the 24-net performs the cropping, Zhou teaches that it is known in the art that a neural network can crop images (see at least [0009]). It would have been obvious to one of ordinary skill in the art that the cropping could have been performed by the 24-net as it is using a known technique to improve 

As per claim 2, Li teaches wherein the sample documents and the first and second input documents are images (see at least pages 5327-5329, sections 3.1, 3.2).

As per claim 5, Li teaches wherein the plurality of possible transformations are different croppings of an image (i.e., detection windows are cropped out, see at least pages 5327-5329, sections 3.1, 3.2).

As per claim 6, Li teaches feeding a second plurality of possible transformations of a first input document to the first DCNN, obtaining a value for the first metric for each of the second plurality of possible transformations, the second plurality of possible transformations being of a different transformation type than the plurality of possible transformations (see at least pages 5326-5330, section 3);
selecting a third transformation from the second plurality of possible transformations based on the values for the first metric for each of the second plurality of possible transformations (see at least pages 5326-5330, section 3);
training a third DCNN to output another transformation for a document by inputting the selected third transformation to the second DCNN (see at least pages 5326-5330, section 3); and
feeding the second input document to the third DCNN, obtaining a fourth transformation of the second input document (see at least pages 5326-5330, section 3).
As per claims 8, 9, 12, and 13, these are the system claims of claims 1, 2, 5, and 6.  Therefore, claims 8, 9, 12, and 13 are rejected using the same reasons as claims 1, 2, 5, and 6.

As per claims 15, 16, 19, and 20, these are the non-transitory machine-readable storage medium claims of claims 1, 2, 5, and 6.  Therefore, claims 15, 16, 19, and 20 are rejected using the same reasons as claims 1, 2, 5, and 6.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Li, in view of Erhan, further in view of Majumdar, further in view of Zhou, further in view of Pilly et al. (US 2017/0316311, hereinafter Pilly).

As per claim 7, Li teaches wherein the second DCNN comprises a different convolutional layer and pooling layer (see at least page 5327, section 2, page 5328, Figure 2).
Li does not explicitly teach wherein the second DCNN comprises multiple stages, each stage containing a different convolutional layer, nonlinearity layer, and pooling layer.
Pilly teaches a DCNN comprising multiple stages, each stage containing a different convolutional layer, nonlinearity layer, and pooling layer (see at least [0015], [0058]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li such that the second DCNN comprises multiple stages, each stage containing a different convolutional layer, nonlinearity layer, and pooling layer as similarly taught by Pilly because Li does not limit the structure of its DCNN and it would have been obvious to utilize alternative implementations of a DCNN known in the art.
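For illustration only, a multi-stage structure in which each stage contains its own convolutional layer, nonlinearity layer, and pooling layer can be sketched as follows. This is a minimal sketch and is not the architecture of Li or Pilly; the function names, the use of ReLU as the nonlinearity, 2 x 2 max pooling, and the example kernels are all assumptions made for the example.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Nonlinearity layer (ReLU assumed): clamp negative values to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window (assumed)."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def run_stages(image, kernels):
    """Apply one stage (convolution, nonlinearity, pooling) per kernel."""
    feature_map = image
    for kernel in kernels:  # each stage has its own convolutional layer
        feature_map = max_pool(relu(conv2d(feature_map, kernel)))
    return feature_map


image = [[1] * 10 for _ in range(10)]
kernels = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # stage 1: 3 x 3 kernel (hypothetical)
    [[1, 1], [1, 1]],                   # stage 2: 2 x 2 kernel (hypothetical)
]
output = run_stages(image, kernels)
```

With the 10 x 10 all-ones input, stage 1 produces a 4 x 4 map and stage 2 reduces it to a single value.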

As per claim 14, this is the system claim of claim 7.  Therefore, claim 14 is rejected using the same reasons as claim 7.

Allowable Subject Matter
Claims 3, 4, 10, 11, 17, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments
Rejection of claims under §103: 
Applicant argued that Li fails to teach or suggest “feeding the first input document to the second DCNN, obtaining a second transformation of the first input document.” Applicant argued that the only inputs to the 24-net are cropped detection windows from the 12-calibration-net, which are resized to 24x24 images.
Applicant’s arguments have been fully considered, and Examiner respectfully disagrees. The limitation of “feeding the first input document” does not require the whole first input document to be fed, and does not exclude the interpretation that part of the first input document is fed.  A cropped and resized section of an image is still part of the image.  Thus, feeding a cropped and resized section of a test image is sufficient to meet the recited limitation of “feeding the first input document.”  Alternatively, Li discloses the 12-calibration-net receives cropped regions resized to 12 x 12 as the input images (see at least page 5327, section 3.2.2), and 12x12 resolution is also fed to a sub-structure same as 12-net in 24-net (see at least page 5328, section 

Applicant argued that Li does not perform a transformation of an input document, but merely selects another possible transformation of the transformed image passed to it. Applicant argued that the 24-net only performs the evaluation of the already cropped images, and does not actually perform the cropping itself. Applicant argued that there is no optimization of the cropping process itself in Li, whereas in the present claims, the neural network itself performs the transformation and is trained to perform that transformation, resulting in an optimized transformation process.
Applicant’s arguments have been fully considered. Examiner respectfully disagrees that the claim as recited requires an optimized transformation process, and that Li must disclose optimization in the cropping itself to meet the claim limitation. In Li, the neural network is trained to output detection windows, which are used to crop the image. The detection windows specify the transformation to be performed on an image, and cropping an image using a detection window is performing a transformation. Prior art Zhou shows that a neural network can perform the cropping of images, and thus Li is modifiable by Zhou such that Li’s neural network also performs the cropping of an image using the coordinates of the detection window.
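For illustration only, cropping an image using the coordinates of a detection window and resizing the cropped region to 24 x 24 can be sketched as follows. The sketch is not taken from Li or Zhou; the function names `crop` and `resize_nearest`, the `(x1, y1, x2, y2)` window format, and the use of nearest-neighbor resampling are assumptions made for the example.

```python
def crop(image, window):
    """Cut the region named by a detection window out of a 2D image.

    image: 2D list of pixel values, indexed image[row][column].
    window: (x1, y1, x2, y2) with x2 and y2 exclusive (assumed format).
    """
    x1, y1, x2, y2 = window
    return [row[x1:x2] for row in image[y1:y2]]


def resize_nearest(image, out_h, out_w):
    """Resize a 2D image by nearest-neighbor sampling (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Hypothetical 100 x 100 test image whose pixel value encodes its position.
image = [[r * 100 + c for c in range(100)] for r in range(100)]
window = (10, 20, 58, 68)            # hypothetical 48 x 48 detection window
patch = crop(image, window)          # the cropped part of the input image
resized = resize_nearest(patch, 24, 24)
```

Every pixel of `resized` is drawn from inside the detection window of the original image, which is the sense in which a cropped and resized section remains part of the image.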

Applicant argued that it would not have been obvious to combine Erhan with Li to teach the concept of the sample documents having been labeled with a value for the first metric. 
Applicant’s arguments have been fully considered, but Examiner respectfully disagrees. Erhan discloses updating parameters of the neural network using bounding box data, which includes a confidence value (see at least column 4, lines 29-43). Thus, the confidence value is used in training a neural network.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jue Louie whose telephone number is 571-270-1655. The examiner can normally be reached on M-F 9:30 am - 5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



/Jue Louie/
Primary Examiner
Art Unit 2121