Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Status of Claims
            The amendment filed on February 28, 2022 in response to the November 30, 2021 non-final Office action has been entered. The status of the claims is as follows:
Claims 1-3, 5-10, and 12-20 are currently pending of which claims 1-2, 8-9, and 15-16 are amended, and claims 4, 11, and 18 are canceled in the amendment.  
 
Response to Arguments
            The amendment and arguments filed on February 28, 2022 have been fully considered. The examiner’s response is delineated as follows.
(a)       Response to Arguments Regarding Rejection of Claims under 35 U.S.C. § 112(b): The rejections of claims 2, 6, 9, 13, and 16 under 35 U.S.C. § 112(b) in the previous non-final Office action are withdrawn in view of the amendment to the claims.
(b)       Response to Arguments Regarding Rejection of Claims under 35 U.S.C. § 101:
Applicant’s arguments have been considered but are not persuasive. Therefore, the rejections of claims 1-3, 5-10, and 12-20 are maintained. 
(1)       Regarding Applicant’s argument that “[a] court case can be utilized as an example for the analysis herein. The case is Speedtrack Inc. v. Amazon.com, Inc. (N.D. Cal. 2017) (‘Speedtrack’)”, the examiner disagrees and notes that patent eligibility was not at issue in the Nov. 21, 2017 opinion issued under Case No. 09-cv-04479-JSW by the U.S. District Court for the Northern District of California. The examiner further notes that, on appeal, the Federal Circuit stated that “Cross-Appellants also argued that the claims recite patent-ineligible subject matter …. But we need not here. Cross-Appellants state that if we ‘affirm[] the judgment of non-infringement, [they] will voluntarily dismiss their cross-appeal, because the ‘360 patent expired more than six years ago.’ Cross-Appellants’ Br. at 59. Therefore, we do not reach the cross-appeal.” Therefore, the cited case does not support Applicant’s arguments.
(2)       Applicant cites ¶ [0061] of the present disclosure and argues that “[i]n the present claims, computers operate much more efficiently” and because “the claims feature additional elements reflecting an improvement in the functioning of a computer, or an improvement to other technology or technical field”. The examiner finds this argument unpersuasive.  More specifically, what ¶ [0061] actually describes is that “[a] CNN model, if running at half of the original image size, can gain a remarkable computational savings of 75%.” That is, ¶ [0061] merely describes a CNN model running at half of the original image size with an asserted 75% savings in computation. The claimed invention, however, recites a first CNN running on “a downscaled input” and a second CNN running on the original input.  Even assuming arguendo that the claimed “downscaled input” refers to half of the original image size as described in ¶ [0061], an assumption with which the examiner disagrees given the breadth of the claim language, ¶ [0061] nevertheless fails to address the second CNN running on the original input. Further, the overall computational resources used by the claimed first and second CNNs together are unknown. Thus, the question of whether the present claims actually enable computers to operate much more efficiently, and hence “feature additional elements reflecting an improvement in the functioning of a computer, or an improvement to other technology or technical field” as alleged, remains open and unanswered by the present disclosure, let alone by ¶ [0061], upon which Applicant’s arguments rely. Further, ¶ [0061] explicitly describes that the aforementioned 75% computational savings come at the price of accuracy. The examiner notes that trading accuracy for computational savings can hardly support the assertion that the claimed invention improves the functioning of a computer or another technology.
Therefore, Applicant’s arguments that the claimed invention satisfies step 2 of the Alice framework are not persuasive.
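For illustration only, and without limiting the foregoing analysis, the arithmetic behind the 75% figure quoted from ¶ [0061] can be sketched as follows. The sketch assumes, as a simplification that is not taken from the present disclosure, that the cost of a convolutional layer scales with the number of output positions (height × width), so that halving both dimensions leaves one quarter of the work. The function name conv_macs, the image sizes, and the channel counts are hypothetical.

```python
def conv_macs(height, width, in_ch, out_ch, kernel=3):
    """Approximate multiply-accumulate count for one stride-1, same-padded conv layer."""
    return height * width * in_ch * out_ch * kernel * kernel


# Hypothetical layer: 3 input channels, 16 output channels, 3x3 kernel.
full = conv_macs(224, 224, 3, 16)   # original image size
half = conv_macs(112, 112, 3, 16)   # each dimension halved

savings = 1 - half / full
print(f"single network at half size: {savings:.0%} fewer operations")

# Adding a second network on the ORIGINAL input changes the total.
# Its width (out_ch) is an assumption; the total depends entirely on it.
for second_out_ch in (4, 16):
    second = conv_macs(224, 224, 3, second_out_ch)
    ratio = (half + second) / full
    print(f"second network with {second_out_ch} channels: {ratio:.2f}x the baseline cost")
```

Under these assumed sizes, the combined cost of the two networks is 0.50x or 1.25x the single full-size baseline depending on the assumed size of the second network, which is consistent with the observation above that the overall computational resources cannot be determined from ¶ [0061] alone.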
(3)       Regarding Applicant’s argument that the claimed invention integrates the judicial exception into a practical application without any attempt to monopolize the judicial exception by drafting efforts, the examiner notes that the claim language generally recites running a first CNN on a downscaled input and running a second CNN on an original input, both of which, under their broadest reasonable interpretation, cover sweeping mathematical concepts, principles, and/or operations but for the recitation of the insignificant, generic, additional element – processor. The limitation downsampling the original input into a downsampled input, but for the recitation of the insignificant, generic, additional element of a processor, broadly covers down-sampling operations. The limitation merging outputs of the first and second CNNs, but for the recitation of the insignificant, generic, additional element of a processor, broadly covers any mathematical operations that combine outputs of two CNNs. Therefore, the broad claim language, contrary to Applicant’s argument, encompasses any such mathematical concepts, principles, and/or operations pertaining to respectively running two CNNs on data having two different resolutions, as well as any mathematical operations that combine outputs of two CNNs, with insignificant additional elements. Therefore, the examiner notes that Applicant’s arguments are not supported by what is actually recited in the claims and are thus not persuasive.
(4)       Regarding Applicant’s argument that there is a clear defining of the technological problem, and that the claims present a technical solution by utilizing two CNNs of varying complexity to process inputs, which fully integrates the present claims as a whole into a practical application, the examiner notes that the search for a technological solution to a technological problem as in DDR Holdings and Amdocs does not, in and of itself, confer patent eligibility.  Rather, the U.S. Supreme Court clearly indicates that determining eligibility considers whether the claims purport to improve the functioning of the computer or other technology or technical field.  See Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 225, 110 USPQ2d 1976, 1984 (2014).  As the examiner noted immediately above, neither the claimed invention nor ¶ [0061] relied upon by Applicant shows that the claimed invention, as a whole, improves the functioning of a computer or another technical field.  Therefore, Applicant’s arguments are not persuasive.
(c)      Response to Arguments Concerning Rejections of claims under 35 U.S.C. § 103:
(1)       Regarding Applicant’s argument that “Li discloses a CNN pipeline that feds forward images (example) for processing. However, there is no disclose [sic] as to a merger of results from the multiple CNN pipelines shown”, and that “[t]he outputs of the CNN appears to be fed into a higher level CNN in the pipeline instead of a merger taking place”, the examiner disagrees. 
The examiner first notes that what was recited in the original claim is “merging, by the processor, the output of the first CNN with the output of the second CNN”. As explicitly pointed out in the non-final Office action, Li’s processing the downscaled input (e.g., the “remaining detection windows” after over 92% rejection by Li’s 12-net and 12-calibration-net) with its second convolutional neural network (e.g., 24-net) teaches running a first CNN on the downscaled input, and Li’s processing a test image first at its 12-net teaches running a second CNN on the original input.  That is, the non-final Office action clearly indicates that Li’s 12-net and 12-calibration-net render the claimed first CNN obvious, and that Li’s 24-net (which includes a separate 12-net and thus has more layers than the 12-net or the first CNN) renders the claimed second CNN obvious.
Further, Applicant’s argument that the output of the CNN appears to be fed into a higher level CNN in the pipeline is not persuasive. More specifically, Li’s 24-net receives the output from its 12-net and 12-calibration-net as input, resizes such output into resized output, and uses the resized output as input to the 24-net and to the 12-net as explicitly shown in FIG. 2.  The output of the 12-net is then provided together with the output of the max pooling layer of the 24-net to the fully-connected layer in the 24-net to compute the output of the 24-net. Therefore, although Li does not use the word “merging” or “merger” as Applicant’s argument appears to require, Li nevertheless renders obvious the limitation of merging the output of the first CNN with the output of the second CNN. Therefore, Applicant’s arguments are not persuasive.
(2)       Regarding Applicant’s argument that “without the description of a merger taking place, there is no disclosure of a groupwise merger present in the Li reference”, the examiner notes that this newly added claimed limitation necessitates a revised search and is addressed in the rejection of claims under 35 U.S.C. § 103 below. 
(3)       Regarding Applicant’s argument that “Roblek and Chen fail to cure the deficiencies in Li”, the examiner notes that Applicant’s conclusory arguments merely constitute attorney’s arguments without any supporting factual evidence and are thus not persuasive. 
 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
 
Claims 1-3, 5-10, and 12-20 stand rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.
Step 1: Claims 1-7 are directed to the statutory category of processes; claims 8-14 are directed to the statutory category of machines; and claims 15-20 are directed to the statutory category of manufactures. 
 
                     Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Judicial Exceptions: Step 2A – Prong One: 
But for the recitation of insignificant additional elements that are analyzed below in Additional Elements – Step 2A Prong Two & Step 2B, claim 1, under its broadest reasonable interpretation, recites the following judicial exceptions:
 
downsampling, by the processor, the original input into a downscaled input; (Mathematical concept / principle / algorithm: The examiner notes that this limitation is directed to the judicial exception of an abstract idea such as a mathematical principle or formula/formulae.  More specifically, this limitation is directed to a generic operation of resizing the width and/or height of an image or filtering high-frequency audio signal components with a low-pass filter and keeping a subset of the filtered audio signal components by an integer factor.  Both operations are directed to a respective mathematical concept/formula(e).  The U.S. Supreme Court in Bilski v. Kappos, 561 U.S. 593, 611, 95 USPQ2d 1001, 1010 (2010) held that "Diehr explained that while an abstract idea, law of nature, or mathematical formula could not be patented, ‘an application of a law of nature or mathematical formula to a known structure or process may well be deserving of patent protection.’"  Therefore, this limitation is directed to an abstract idea and thus fails Prong One of Step 2A. See MPEP § 2106.04(a).)
 
running, by the processor, a first convolutional neural network (“CNN”) on the downscaled input; (Mathematical concept / principle / algorithm: The examiner notes that this limitation is directed to a generic matrix or tensorial operation of sliding a kernel across the input tensor and computing the tensorial product of the kernel and the input tensor.  This limitation is thus directed to a mathematical concept/formula(e).  The U.S. Supreme Court in Bilski v. Kappos, 561 U.S. 593, 611, 95 USPQ2d 1001, 1010 (2010) held that "Diehr explained that while an abstract idea, law of nature, or mathematical formula could not be patented, ‘an application of a law of nature or mathematical formula to a known structure or process may well be deserving of patent protection.’"  Therefore, this limitation is directed to an abstract idea and thus fails Prong One of Step 2A. See MPEP § 2106.04(a).)
 
running, by the processor, a second CNN on the original input, (Mathematical concept / principle / algorithm: The examiner notes that this limitation is directed to a generic matrix or tensorial operation of sliding a kernel across the input tensor and computing the tensorial product of the kernel and the input tensor.  This limitation is thus directed to a mathematical concept/formula(e).  The U.S. Supreme Court in Bilski v. Kappos, 561 U.S. 593, 611, 95 USPQ2d 1001, 1010 (2010) held that "Diehr explained that while an abstract idea, law of nature, or mathematical formula could not be patented, ‘an application of a law of nature or mathematical formula to a known structure or process may well be deserving of patent protection.’"  Therefore, this limitation is directed to an abstract idea and thus fails Prong One of Step 2A. See MPEP § 2106.04(a).)
 
 
merging, by the processor, the output of the first CNN with the output of the second CNN, (mental process: The examiner notes that this limitation, under its broadest reasonable interpretation, can be performed by a human. For example, a human can combine two results. This mental process has been held to be insufficient to satisfy Step 2A Prong One. See MPEP § 2106.04(a).)
 
wherein the merging is performed as a groupwise merger; and (Mathematical concept / principle / algorithm: The examiner notes that this limitation, under its broadest reasonable interpretation, is directed to a mathematical concept or algorithm of concatenating two pieces of data.  This limitation is thus directed to a mathematical concept/formula(e).  The U.S. Supreme Court in Bilski v. Kappos, 561 U.S. 593, 611, 95 USPQ2d 1001, 1010 (2010) held that "Diehr explained that while an abstract idea, law of nature, or mathematical formula could not be patented, ‘an application of a law of nature or mathematical formula to a known structure or process may well be deserving of patent protection.’"  Therefore, this limitation is directed to an abstract idea and thus fails Prong One of Step 2A. See MPEP § 2106.04(a).)
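For illustration only, the operations characterized above as mathematical concepts can be sketched in a few lines of plain Python. The sketch is a simplified assumption and not the claimed implementation: downsampling is shown as plain decimation (keeping every second row and column), each "network" is reduced to a single sliding-kernel pass, and merging is shown as concatenation, following the reading of the merging limitation given above. All function names and sample values are hypothetical.

```python
def downsample(image, factor=2):
    """Keep every `factor`-th row and column (plain decimation, no filtering)."""
    return [row[::factor] for row in image[::factor]]


def conv2d(image, kernel):
    """Slide `kernel` across `image` (stride 1, no padding), summing the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def flatten(feature_map):
    """Turn a 2-D feature map into a flat list of values."""
    return [value for row in feature_map for value in row]


def merge(first_output, second_output):
    """Concatenate two flat outputs end to end."""
    return first_output + second_output


# Hypothetical 4x4 input and 2x2 all-ones kernel.
original = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
kernel = [[1, 1], [1, 1]]

downscaled = downsample(original)                # 2x2 input
first_out = flatten(conv2d(downscaled, kernel))  # pass over the downscaled input
second_out = flatten(conv2d(original, kernel))   # pass over the original input
result = merge(first_out, second_out)
print(result)
```

Every step in the sketch consists of index selection, multiplication, addition, or list concatenation.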

Additional Elements – Step 2A Prong Two: 
receiving, by a processor, an original input; (Insignificant extra-solution activity: The examiner notes that this additional element, when analyzed individually, merely constitutes extra-solution activity (e.g., receiving an input) that has been held by the Federal Circuit as insufficient to integrate the claimed judicial exception into a practical application in CyberSource v. Retail Decisions, Inc., 654 F.3d 1366, 1375, 99 USPQ2d 1690, 1694 (Fed. Cir. 2011). See MPEP § 2106.05(b)(III).)
 
where the second CNN has fewer layers than the first CNN; (Mental process - an observation, judgment, and/or opinion by a human: The examiner notes that this limitation merely recites an abstract idea – mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review network representations to count which network representation has fewer layers. See MPEP § 2106.04(a)(2)(III).)
 
providing a result, by the processor, following the merging of the outputs. (Insignificant extra-solution activity: The examiner notes that this additional element, when analyzed individually and again as a whole, merely constitutes extra-solution activity (e.g., providing an output) that has been held by the Federal Circuit as insufficient to integrate the claimed judicial exception into a practical application in CyberSource v. Retail Decisions, Inc., 654 F.3d 1366, 1375, 99 USPQ2d 1690, 1694 (Fed. Cir. 2011). See MPEP § 2106.05(b)(III).)
 
Additional Elements – Step 2A Prong Two & Step 2B: 
The examiner asserts that the additional elements do not amount to significantly more than the aforementioned judicial exception because the additional elements are mere well-known, routine, conventional computer functions and/or components and further because mere physicality or tangibility of an additional element or elements is not a relevant consideration in Step 2B. 
          More specifically, these additional elements, when analyzed as an ordered combination, merely recite generic computer components (e.g., “processor”) at a high level of generality (e.g., “by the processor”).  Therefore, these additional elements, when considered as an ordered combination, “[a]dd nothing … that is not already present when the steps are considered separately” and simply recite the judicial exception “as performed by a generic computer”, which has been held by the U.S. Supreme Court to be insufficient to amount to significantly more than the claimed judicial exceptions in Alice Corp., 573 U.S. at 225 (citing Mayo, 566 U.S. at 79, 101 USPQ2d at 1972).
          For example, the claimed limitations receiving, by a processor, an original input and providing a result, by the processor, following the merging of the outputs merely recite the additional elements at a high level of generality (e.g., “by the processor” as recited) to perform the respective extra-solution activities of receiving input and generating output. This has been held by the Federal Circuit as insufficient to amount to significantly more than the claimed judicial exception in CyberSource v. Retail Decisions, Inc., 654 F.3d 1366, 1375, 99 USPQ2d 1690, 1694 (Fed. Cir. 2011).
          The remaining limitations of independent claim 1 merely respectively recite applying the judicial exceptions of generic computer functions of selecting a subset of data (e.g., downsampling) and performing repeated calculations (e.g., looping) of additions and multiplications pertaining to matrix or tensor elements (e.g., running convolutional neural networks and merging outputs) “by the processor”.  Therefore, these additional elements (“the processor”) add nothing to the claimed invention of claim 1 that is not already present when the steps of claim 1 are considered separately and simply recite a mathematical concept/formula(e) as performed by a generic computer processor.  This has been held by the U.S. Supreme Court to be insufficient to amount to significantly more than the claimed judicial exceptions in Alice Corp., 573 U.S. at 225 (citing Mayo, 566 U.S. at 79, 101 USPQ2d at 1972).
Therefore, these additional elements, when analyzed as an ordered combination, merely constitute respective instructions to implement the claimed judicial exception to a generic computer and thus fail to amount to significantly more than the claimed judicial exception and thus fail to satisfy Step 2B.
Therefore, claim 1 is rejected under 35 U.S.C. § 101 for at least the foregoing reasons. 

                     Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Judicial Exceptions: Step 2A – Prong One: 
Claim 8, as amended, recites the following limitations:
a memory; 
a processor communicatively coupled to the memory, the processor operable to execute instructions stored in the memory, the instructions causing the processor to: 
receive an original input; 
downsample the original input into a downscaled input; 
run a first convolutional neural network (“CNN”) on the downscaled input; 
run a second CNN on the original input, where the second CNN has fewer layers than the first CNN; 
merge the output of the first CNN with the output of the second CNN, 
wherein the merging is performed as a groupwise merger; and 
provide a result following the merging of the outputs. 
As can be clearly seen from the reproduced claim limitations, independent claim 8 recites identical or substantially similar limitations as independent claim 1, with the exception of a memory and a processor operable to execute instructions stored in the memory, which are separately analyzed under Prong Two of Step 2A and Step 2B below.
Therefore, the claimed invention of independent claim 8 is, like independent claim 1, also clearly directed to an abstract idea such as a mental process that could be performed by a human analog, with or without a physical aid and thus fails to satisfy prong one of step 2A.
Regarding the additional elements a memory and a processor operable to execute instructions stored in the memory, the examiner notes that, like claim 1, these additional elements of independent claim 8, when considered individually, merely constitute instructions to implement the respective judicial exceptions on the corresponding generic computer components (e.g., a processor communicatively coupled to the memory, the processor operable to execute instructions stored in the memory, the instructions causing the processor to: [perform judicial exceptions]) or merely recite extra-solution activities (e.g., receive an original input and provide a result) and thus fail to integrate the respectively claimed judicial exceptions into a practical application to satisfy Prong Two of Step 2A. See MPEP § 2106.05(f).
Furthermore, like claim 1, the additional elements of independent claim 8, when analyzed as an ordered combination, constitute mere instructions to implement the respective judicial exceptions on the corresponding generic computer components.
For example, independent claim 8 merely recites a processor communicatively coupled to the memory, the processor operable to execute instructions stored in the memory, the instructions causing the processor to perform generic computer functions of selecting a subset of data (e.g., downsample), performing repeated calculations (e.g., looping) of additions and multiplications pertaining to matrix or tensor elements (e.g., run convolutional neural networks and merge outputs in a groupwise merger), and extra-solution activities (e.g., receive an original input and provide a result following the merging of the outputs).  Therefore, these additional elements (“the processor”) add nothing to the claimed invention of claim 8 that is not already present when the steps of claim 8 are considered separately and simply recite a mathematical concept/formula(e) as performed by a generic computer processor.  This has been held by the U.S. Supreme Court to be insufficient to amount to significantly more than the claimed judicial exceptions to satisfy step 2B in Alice Corp., 573 U.S. at 225 (citing Mayo, 566 U.S. at 79, 101 USPQ2d at 1972). See MPEP § 2106.05(I)(a)(iii).
Therefore, independent claim 8 is also rejected under 35 U.S.C. § 101 for at least the identical or substantially similar reasons as those for claim 1 presented above.
 
                     Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claim 15, as amended, recites the following limitations:
receiving, by a processor, an original input; 
downsampling, by the processor, the original input into a downscaled input; 
running, by the processor, a first convolutional neural network (“CNN”) on the downscaled input; 
running, by the processor, a second CNN on the original input, where the second CNN has fewer layers than the first CNN; 
merging, by the processor, the output of the first CNN with the output of the second CNN, 
wherein the merging is performed as a groupwise merger; and 
providing a result, by the processor, following the merging of the outputs. 
As can be clearly seen from the reproduced claim limitations, independent claim 15 recites identical or substantially similar limitations as independent claim 1.
Therefore, the claimed invention of independent claim 15 is, like independent claim 1, also clearly directed to an abstract idea such as a mental process that could be performed by a human analog, with or without a physical aid and thus fails to satisfy prong one of step 2A.
Regarding the additional elements a processor and a computer readable storage medium having program instructions embodied therewith, the examiner notes that, like claim 1, these additional elements of independent claim 15, when considered individually, merely constitute instructions to implement the respective judicial exceptions on a generic computer or a generic processor or merely recite a generic processor performing extra-solution activities and thus fail to integrate the respectively claimed judicial exceptions into a practical application to satisfy Prong Two of Step 2A. See MPEP § 2106.05(f).
Furthermore, like claim 1, the additional elements of independent claim 15, when analyzed as an ordered combination, constitute mere instructions to implement the respective judicial exceptions on the corresponding generic computer components.
For example, independent claim 15 merely recites a processor at a high level of generality (e.g., “by a processor”) to perform respective, generic computer functions of selecting a subset of data (e.g., downsampling), performing repeated calculations (e.g., looping) of additions and multiplications pertaining to matrix or tensor elements (e.g., running convolutional neural networks and merging outputs), and extra-solution activities (e.g., receiving an original input and providing a result following the merging of the outputs). 
Therefore, these additional elements (“the processor”) add nothing to the claimed invention of claim 15 that is not already present when the steps of claim 15 are analyzed separately and simply recite a mathematical concept/formula(e) as performed by a generic computer processor.  This has been held by the U.S. Supreme Court to be insufficient to amount to significantly more than the claimed judicial exceptions to satisfy step 2B in Alice Corp., 573 U.S. at 225 (citing Mayo, 566 U.S. at 79, 101 USPQ2d at 1972). See MPEP § 2106.05(I)(a)(iii).
Therefore, independent claim 15 is also rejected under 35 U.S.C. § 101 for at least the identical or substantially similar reasons as those for claim 1 presented above.
 
         With respect to claim 2, claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 2 merely further narrows the data (the input) and recites the following limitation:
wherein the input comprises image data representing an image.
Judicial Exceptions: Step 2A – Prong One: 
wherein the input comprises image data representing an image. (mental process - an observation, judgment, and/or opinion by a human: The examiner notes that claim 2, under its broadest reasonable interpretation, merely further narrows the data (the original input) by reciting what the original input represents (“an image”). Nonetheless, the examiner asserts that merely describing what the original input is or represents (“an image”) does not render the ineligible judicial exceptions claimed in the base claim 1 eligible. See MPEP § 2106.04(a)(2)(III).)
Therefore, claim 2 also fails Prong One of Step 2A.

Additional Elements – Step 2A Prong Two & Step 2B: 
Claim 2 does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application or that amount to significantly more than the claimed judicial exception.  Claim 2 is thus not patent eligible.
 
                     Regarding claims 9 and 16, claims 9 and 16 respectively recite identical or substantially similar limitations as those of claim 2.  Therefore, claims 9 and 16 are also rejected under 35 U.S.C. § 101, the same rationale presented immediately above applying.  
 
                     Regarding claim 3, claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Additional Elements – Step 2A Prong Two & Step 2B: 
Claim 3 recites the following additional elements:
providing the output of the first CNN as an input to the second CNN. (receiving, storing, or transmitting data over a network: More specifically, the additional elements recited in claim 3, when analyzed individually, merely generically recite transmitting an output of one network as the input to another network at a high level of generality and thus merely use a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply add a general-purpose computer or computer components after the fact to an abstract idea, which has been held by the Federal Circuit to be insufficient to integrate the claimed judicial exception into a practical application in TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016).  In addition, claim 3’s generic recitation of the transmission of an output of one network as the input to another network at a high level of generality is similar to receiving and transmitting data over a network, which has been held by the Federal Circuit to be insufficient to integrate the claimed judicial exception into a practical application in buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014). See MPEP § 2106.05(f)(2). Therefore, these additional elements fail to satisfy Prong Two of Step 2A.)
The examiner further asserts that the additional elements fail to amount to significantly more than the aforementioned judicial exception and thus fail to satisfy Step 2B.  More specifically, the additional elements recited in claim 3, when analyzed as an ordered combination, merely generically recite transmitting an output of one network as the input to another network at a high level of generality and thus merely use a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply add a general-purpose computer or computer components after the fact to an abstract idea, which has been held by the Federal Circuit to be routine, conventional, and well-known activities previously known to the industry and thus insufficient to amount to significantly more than the claimed judicial exception in TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016).  See MPEP § 2106.05(d)(II). 
            Therefore, claim 3 is also rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
 
                     Regarding claims 10 and 17, claims 10 and 17 respectively recite identical or substantially similar limitations as those of claim 3.  Therefore, claims 10 and 17 are also rejected under 35 U.S.C. § 101, the same rationale presented immediately above applying.  
 
                     Regarding claim 5, claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Judicial Exceptions: Step 2A – Prong One: 
 Claim 5 recites the following limitations: 
wherein the result is an identification of an object. (Mental process - an observation, judgment, and/or opinion by a human: The examiner notes that claim 5 merely recites an abstract idea – mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review a result and determine what the result is (e.g., identification of an object therein). See MPEP § 2106.04(a)(2)(III).)

Additional Elements – Step 2A Prong Two & Step 2B: 
The examiner notes that claim 5 does not recite any additional element, much less ones that satisfy Step 2A Prong Two or Step 2B. 
            Therefore, claim 5 is also rejected under 35 U.S.C. §101 for at least the foregoing reasons.
 
                     Regarding claims 12 and 19:
Claims 12 and 19 respectively recite identical or substantially similar limitations as those of claim 5.  Therefore, claims 12 and 19 are also rejected under 35 U.S.C. § 101, same rationale presented immediately above applying.  
 
                     Regarding claim 6, claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Judicial Exceptions: Step 2A – Prong One: 
Claim 6 recites the following limitations:
wherein the input comprises audio data presenting an audio input. (Mental process - an observation, judgment, and/or opinion by a human: The examiner notes that claim 6 merely recites an abstract idea – mental process such as an observation, judgment, and/or opinion by a human.  For example, a human can review the input to identify that the input includes audio data. See MPEP § 2106.04(a)(2)(III).)

Additional Elements – Step 2A Prong Two & Step 2B: 
The examiner notes that claim 6 does not recite any additional element, much less ones that satisfy Step 2A Prong Two or Step 2B. 
            Therefore, claim 6 is also rejected under 35 U.S.C. §101 for at least the foregoing reasons.
 
         Regarding claim 13, claim 13 recites identical or substantially similar limitations as those of claim 6.  Therefore, claim 13 is also rejected under 35 U.S.C. § 101, same rationale presented immediately above applying.  
 
         Regarding claim 7, claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Judicial Exceptions: Step 2A – Prong One: 
Claim 7 recites the following limitations:
wherein the second CNN has a smaller feature map than the first CNN. (Mental Process:  The examiner notes that determining sizes of feature maps is also directed to a mental process that can be performed by a human, for example, by sliding a kernel (e.g., a 3x3 kernel) across an input (e.g., 12x12) at a stride value (e.g., stride 1).  The dimension of the resulting feature map can be mentally calculated as a 10x10 activation map.  Therefore, the claimed limitation of claim 7 is like the “mathematical equation in the repetitively calculating step” that has been held to constitute a mental process by the U.S. Supreme Court in Mayo, 566 U.S. 66, 75-77, 101 USPQ2d 1961, 1967-68 (2012). See MPEP § 2106.04(a)(2)(III).)
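For illustration only, the 10x10 figure in the example above is consistent with the standard output-size relation for a convolution. The relation below assumes no zero-padding (P = 0), which is an assumption of this sketch and is not recited in claim 7; the symbols W_in, K, S, and P are introduced here for the example only.

```latex
\[
  W_{\text{out}}
    = \left\lfloor \frac{W_{\text{in}} + 2P - K}{S} \right\rfloor + 1
    = \left\lfloor \frac{12 + 2\cdot 0 - 3}{1} \right\rfloor + 1
    = 10
\]
```

Under these assumptions, a 3x3 kernel at stride 1 over a 12x12 input yields a 10x10 map along each spatial dimension.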

Additional Elements – Step 2A Prong Two & Step 2B: 
The examiner notes that claim 7 does not recite any additional element, much less ones that satisfy Step 2A Prong Two or Step 2B. 
            Therefore, claim 7 is also rejected under 35 U.S.C. §101 for at least the foregoing reasons.
 
                     Regarding claims 14 and 20, claims 14 and 20 respectively recite identical or substantially similar limitations as those of claim 7.  Therefore, claims 14 and 20 are also rejected under 35 U.S.C. § 101, same rationale presented immediately above applying.  
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
 
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
 
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
 
                     Claims 1-3, 5-10, 12-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al., A Convolutional Neural Network Cascade for Face Detection (2015) (hereinafter Li) in view of Roblek et al., US PGPub 2017/0330586 with publication date of Nov. 16, 2017 (hereinafter Roblek), and further in view of Chen et al., Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs (June 7, 2016) (hereinafter Chen).
With respect to claim 1, Li teaches a computer-implemented method comprising: receiving an original input; (Li, ¶ 2, § 3.1, p. 5327: “Given a test image, the 12-net scans the whole image densely across different scales to quickly reject more than 90% of the detection windows.” Table 1 lists the cascade neural networks with the total number of “sliding window” and the “global NMS”.  The examiner notes that Li’s receiving a test image at the 12-net CNN teaches this limitation.)
downsampling, by the processor, the original input into a downscaled input; (Li, p. 5328, § 3.2.2, Last paragraph: “In our experiment, we observe that the 12-net and 12-calibration-net reject 92.7% detection windows while keeping 94.8% recall on FDDB (see Table 1).”   ¶ 1, § 3.2.3, p. 5328: “24-net is an intermediate binary classiﬁcation CNN to further reduce the number of detection windows. Remaining detection windows from the 12-calibration-net are cropped out and resized into 24 × 24 images and evaluated by the 24-net.”  The examiner notes that Li’s rejecting 92.7% of the total number of sliding windows in the test image with its 12-net and 12-calibration-net and forwarding the remaining 7.3% detection windows to the next convolutional neural network (e.g., 24-net) teaches down-sampling the original input (having 100% detection windows at Li’s 12-net) into 7.3% out of the total 100% detection windows and hence a downscaled input.)
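The window-rejection reading relied upon above can be sketched as follows. This is a minimal illustration and not Li’s implementation: the function names `cascade_stage` and `resize_spec`, the integer scores, and the threshold value are all hypothetical and were chosen only so that the rejected fraction matches the 92.7% figure quoted from Li.

```python
# Illustrative only: integer scores in thousandths stand in for a detector's
# confidence output; they are NOT values reported by Li.
def cascade_stage(scored_windows, threshold):
    """Return the windows whose score meets the threshold; the rest are rejected."""
    return [window for window, score in scored_windows if score >= threshold]


def resize_spec(window, side):
    """Describe the crop-and-resize handed to the next network (no pixel data here)."""
    x, y, w, h = window
    return {"crop": (x, y, w, h), "resized_to": (side, side)}


# 1000 hypothetical 12x12 sliding windows with scores 0..999 (thousandths).
scored = [((i % 40, i // 40, 12, 12), i) for i in range(1000)]

# Hypothetical threshold: windows scoring below 927 are rejected.
survivors = cascade_stage(scored, threshold=927)

# Each surviving window is cropped and resized to 24x24 for the next stage.
next_inputs = [resize_spec(w, 24) for w in survivors]

rejected_fraction = 1 - len(survivors) / len(scored)
print(f"rejected {rejected_fraction:.1%}, forwarded {len(survivors)} windows")
```

In this sketch 73 of the 1000 hypothetical windows survive, i.e., 7.3% of the windows are forwarded to the next stage.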
 
running, by the processor, a first convolutional neural network ("CNN") on the downscaled input; (Li, ¶ 1, § 3.2.3, p. 5328: “24-net is an intermediate binary classiﬁcation CNN to further reduce the number of detection windows. Remaining detection windows from the 12-calibration-net are cropped out and resized into 24 × 24 images and evaluated by the 24-net.”  The examiner notes that Li’s processing the downscaled input (e.g., the “remaining detection windows” after over 92% rejection by Li’s 12-net and 12-calibration-net) with its second convolutional neural network (e.g., 24-net) teaches the above limitation.)
 
running, by the processor, a second CNN on the original input, where the second CNN has fewer layers than the first CNN; (Li, ¶ 2, § 3.1, p. 5327: “Given a test image, the 12-net scans the whole image densely across different scales to quickly reject more than 90% of the detection windows.” ¶ 2, § 3.2.3, p. 5328: “A similar shallow structure is chosen for time efficiency. Besides, we adopt a multi-resolution structure in the 24-net. In additional to the 24 × 24 input, we also feed the input in 12 × 12 resolution to a sub-structure same as the 12-net in 24-net.” The examiner notes that Li’s processing a test image first at its 12-net teaches running a second CNN on the original input. The examiner further notes that the inclusion in Li’s 24-net of a sub-structure that is the same as the 12-net shows that Li’s second CNN (e.g., 12-net) has fewer layers than the first CNN (e.g., 24-net).)
merging, by the processor, the output of the first CNN with the output of the second CNN, (Li, “12-net” and “24-net” in FIG. 2, p. 5328:

[Figure: media_image1.png, greyscale reproduction of Li, FIG. 2]

The examiner notes that Li’s 24-net reproduced above receives the output from Li’s 12-net + 12-calibration-net, resizes the output, and uses the resized output as the input to the 24-net as well as the 12-net as shown in the upper portion of FIG. 2 reproduced above.  The output of the 12-net sub-structure in Li’s 24-net is then provided to the fully-connected layer (together with the output of the max-pooling layer) in the 24-net.  Therefore, Li teaches this limitation.)
providing a result, by the processor, following the merging of the outputs. (FIG. 2, “Labels 2 Classes face/non-face” in “24-net”.  The examiner notes that Li’s providing the classification output labels with its 24-net teaches this limitation.)
          Li thus teaches receiving an original input.  
In the same field of endeavor, Roblek further teaches:
receiving, by a processor, an original input (Roblek, ¶ [0011]: “Another innovative aspect of the subject matter described in this specification can be embodied in methods for processing data input through each of a plurality of layers of a neural network”. ¶ [0107]: “The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.” ¶ [0108]: “Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.”)
Li and Roblek are analogous art because both pertain to object recognition with cascade neural networks.  It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to combine Li’s “convolutional neural network cascade” (Li at Abstract) with Roblek’s “frequency domain features” (Roblek at ¶ [0003]) so that Li modified by Roblek can “achieve improved accuracy and more reliable feature extraction over other neural network systems since the system does not require that a hard choice be made about particular resolution for a given task.” (Roblek at ¶ [0030]).

Li modified by Roblek does not appear to explicitly teach: 
wherein the merging is performed as a groupwise merger; and 
 
Chen does, however, teach: 
wherein the merging is performed as a groupwise merger; and (Chen, p. 6, § 4.3, ¶ 1: “Specifically, we attach to the input image and the output of each of the ﬁrst four max pooling layers a two-layer MLP (ﬁrst layer: 128 3x3 convolutional ﬁlters, second layer: 128 1x1 convolutional ﬁlters) whose feature map is concatenated to the main network’s last layer feature map.” 
The examiner notes that according to ¶ [0058] of the present disclosure, merging includes adding original images to the merged information, and groupwise merging further includes concatenating features of multiple networks and, if needed, subsequently applying a convolution to fuse the features.  Chen’s attaching the input images and the output of each of a plurality of layers to a two-layer network whose output (feature map) is concatenated to the main network’s last layer feature map thus teaches a groupwise merger.)
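The concatenate-then-fuse operation described above can be sketched as follows. This is a minimal, pure-Python illustration under stated assumptions and is neither Chen’s network nor the implementation of the present disclosure: feature maps are nested lists indexed as [channel][row][column], and the names `concatenate_channels` and `conv1x1`, the sample values, and the weights are hypothetical.

```python
def concatenate_channels(*feature_maps):
    """Stack feature maps along the channel axis; each map is [channel][row][col]."""
    height = len(feature_maps[0][0])
    width = len(feature_maps[0][0][0])
    merged = []
    for fmap in feature_maps:
        for channel in fmap:
            # Concatenation requires matching spatial sizes.
            if len(channel) != height or any(len(row) != width for row in channel):
                raise ValueError("all feature maps must share the same height and width")
            merged.append(channel)
    return merged


def conv1x1(feature_map, weights):
    """Fuse channels with a 1x1 convolution; weights is [out_channel][in_channel]."""
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [
            [sum(w * feature_map[c][r][col] for c, w in enumerate(out_weights))
             for col in range(width)]
            for r in range(height)
        ]
        for out_weights in weights
    ]


# Hypothetical one-channel 2x2 outputs from two networks.
main_features = [[[1.0, 2.0], [3.0, 4.0]]]
side_features = [[[10.0, 20.0], [30.0, 40.0]]]

merged = concatenate_channels(main_features, side_features)  # 2 channels
fused = conv1x1(merged, weights=[[0.5, 0.5]])                # back to 1 channel
print(len(merged), fused)
```

Here the two single-channel maps are concatenated into a two-channel map, and the hypothetical 1x1 weights average the two channels at each pixel.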
Li, Roblek, and Chen are analogous art because all three references pertain to digital object recognition with cascade neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Li’s “convolutional neural network cascade” (Li at Abstract) in view of Roblek’s “frequency domain features” (Roblek at ¶ [0003]) to incorporate Chen’s groupwise merger (Chen, supra). The modification increases the boundary localization accuracy and improves localization performance (Chen, p. 6, § 4.3, ¶ 1: “Following the promising recent results of (Hariharan et al., 2014a; Long et al., 2014) we have also explored a multi-scale prediction method to increase the boundary localization accuracy”; and “As discussed in the experimental section, introducing these extra direct connections from fine-resolution layers improves localization performance, yet the effect is not as dramatic as the one obtained with the fully-connected CRF.”)
 
With respect to claim 2, Li teaches the computer-implemented method of claim 1, and Li further teaches:
wherein the input comprises image data representing an image.   (Li, ¶ 2, § 3.1, p. 5327: “Given a test image, the 12-net scans the whole image densely across different scales to quickly reject more than 90% of the detection windows.”)
 
With respect to claim 3, Li teaches the computer-implemented method of claim 1, and Li further teaches:
providing the output of the first CNN as an input to the second CNN. (Li, ¶ 1, § 3.4.2, p. 5330: “The detection nets 12-net, 24-net and 48-net are trained following the cascade structure. We resize all training faces into 12 x 12 and randomly sample 200,000 non-face patches from the background images to train the 12-net. We then apply a 2-stage cascade consists of the 12-net and 12-calibration-net on a subset of the AFLW images to choose a threshold T1 at 99% recall rate.” ¶ 2, § 3.4.2, p. 5330: “Then we densely scan all background images with the 2-stage cascade. All detection windows with confidence score larger than T1 become the negative training samples for the 24-net. 24-net is trained with the mined negative training samples and all training faces in 24 x 24. After that, we follow the same process for the 4-stage cascade consists of the 12-net, 12-calibration-net, 24-net and 24-calibration-net. We set the threshold T2 to keep 97% recall rate.”
 “24-net” in FIG. 2, p. 5328. 

[Figure: media_image2.png, greyscale reproduction of Li, FIG. 2]

The examiner notes that Li’s approach starts with the 12-net in the 2-stage cascade (12-net and 12-calibration net) then proceeds to another 2-stage cascade (24-net and 24-calibration-net). More importantly, after the 2-stage cascade of 24-net and 24-calibration net, Li’s approach proceeds to the 4-stage cascade starting with the 12-net again.  Therefore, Li’s output of the 24-net (the first CNN) during the 2-stage cascade training of the 24-net + 24-cal-net is forwarded to the input of its 12-net in the subsequent (“After that”) 4-stage cascade that begins with Li’s 12-net again. As such, the examiner asserts that Li teaches the above limitation.)
 
With respect to claim 5, Li teaches the computer-implemented method of claim 1, and Li further teaches:
wherein the result is an identification of an object.  (Li, Abstract: “In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we propose a cascade architecture built on convolutional neural networks (CNNs) with very powerful discriminative capability, while maintaining high performance.” The examiner notes that Li’s detecting faces with its CNNs teaches the claimed limitation.)
 
With respect to claim 7, Li teaches the computer-implemented method of claim 1, and Li further teaches:
wherein the second CNN has a smaller feature map than the first CNN. (Li, FIG. 2, 12-net with “3 channels 12 x 12” input window of image that is processed by “16 3x3 filters stride 1”.  24-net with “3 channels 24x24” input window of image that is processed by “64 5x5 filters stride 1”. The examiner notes that for 12-net, applying a 3x3 filter to a 12x12x3 image (w x h x depth) at stride 1 produces a 10x10 activation map (feature map).  Therefore, applying 16 3x3 filters at stride 1 produces 16 10x10 feature maps stacked. For 24-net, applying a 5x5 filter to a 24x24x3 image (w x h x depth) at stride 1 produces a 20x20 activation map (feature map).  Therefore, applying 64 5x5 filters at stride 1 produces 64 20x20 feature maps stacked.  Therefore, the second CNN (the 12-net that processes the original input) has a smaller feature map than the first CNN (the 24-net that processes the downscaled input).)
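The feature-map sizes discussed above can be checked with a short sketch. The function name `conv_output_size` is hypothetical, and the computation assumes no zero-padding, which is an assumption of this example rather than a detail confirmed by the quoted portions of Li; only the kernel sizes, input sizes, and stride come from the FIG. 2 labels quoted above.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size of a convolution output along one dimension."""
    padded = input_size + 2 * padding
    if kernel_size > padded:
        raise ValueError("kernel does not fit inside the (padded) input")
    return (padded - kernel_size) // stride + 1


# Values taken from the FIG. 2 labels quoted above; padding=0 is assumed.
twelve_net_map = conv_output_size(12, 3, stride=1)       # 3x3 filters on a 12x12 input
twenty_four_net_map = conv_output_size(24, 5, stride=1)  # 5x5 filters on a 24x24 input

print(f"12-net: 16 feature maps of {twelve_net_map}x{twelve_net_map}")
print(f"24-net: 64 feature maps of {twenty_four_net_map}x{twenty_four_net_map}")
```

Under the no-padding assumption, the first convolutional layer of the 12-net yields a smaller map per filter than that of the 24-net.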
 
With respect to claim 8, Li teaches:
A system to: receive an original input; (Li, ¶ 4, § 4.3, p. 5332: “a personal workstation”.  ¶ 2, § 3.1, p. 5327: “Given a test image, the 12-net scans the whole image densely across different scales to quickly reject more than 90% of the detection windows.” Table 1 lists the cascade neural networks with the total number of “sliding window” and the “global NMS”.  The examiner notes that Li’s receiving a test image at the 12-net CNN teaches this limitation.)
 
downsample the original input into a downscaled input; (Li, Last paragraph, § 3.2.2, p. 5328: “In our experiment, we observe that the 12-net and 12-calibration-net reject 92.7% detection windows while keeping 94.8% recall on FDDB (see Table 1).”   ¶ 1, § 3.2.3, p. 5328: “24-net is an intermediate binary classiﬁcation CNN to further reduce the number of detection windows. Remaining detection windows from the 12-calibration-net are cropped out and resized into 24 × 24 images and evaluated by the 24-net.”  The examiner notes that Li’s rejecting 92.7% of the total number of sliding windows in the test image with its 12-net and 12-calibration-net and forwarding the remaining 7.3% detection windows to the next convolutional neural network (e.g., 24-net) teaches down-sampling the original input (having 100% detection windows at Li’s 12-net) into 7.3% out of the total 100% detection windows and hence a downscaled input.)
 
run a first convolutional neural network ("CNN") on the downscaled input; (Li, ¶ 1, § 3.2.3, p. 5328: “24-net is an intermediate binary classiﬁcation CNN to further reduce the number of detection windows. Remaining detection windows from the 12-calibration-net are cropped out and resized into 24 × 24 images and evaluated by the 24-net.”  The examiner notes that Li’s processing the downscaled input (e.g., the “remaining detection windows” after over 92% rejection by Li’s 12-net and 12-calibration-net) with its second convolutional neural network (e.g., 24-net) teaches the above limitation.)
 
run a second CNN on the original input, where the second CNN has fewer layers than the first CNN; (Li, ¶ 2, § 3.1, p. 5327: “Given a test image, the 12-net scans the whole image densely across different scales to quickly reject more than 90% of the detection windows.” ¶ 2, § 3.2.3, p. 5328: “A similar shallow structure is chosen for time efficiency. Besides, we adopt a multi-resolution structure in the 24-net. In additional to the 24 × 24 input, we also feed the input in 12 × 12 resolution to a sub-structure same as the 12-net in 24-net.” The examiner notes that Li’s processing a test image first at its 12-net teaches running a second CNN on the original input. The examiner further notes that the inclusion in Li’s 24-net of a sub-structure that is the same as the 12-net shows that Li’s second CNN (e.g., 12-net) has fewer layers than the first CNN (e.g., 24-net).)
 
merge the output of the first CNN with the output of the second CNN, (Li, “12-net” and “24-net” in FIG. 2, p. 5328:

[Figure: media_image2.png, greyscale reproduction of Li, FIG. 2]

The examiner notes that Li’s 24-net reproduced above receives the output from Li’s 12-net + 12-calibration-net, resizes the output, and uses the resized output as the input to the 24-net as well as the 12-net as shown in the upper portion of FIG. 2 reproduced above.  The output of the 12-net sub-structure in Li’s 24-net is then provided to the fully-connected layer (together with the output of the max-pooling layer) in the 24-net.  Therefore, Li teaches this limitation.)
 
provide a result following the merging of the outputs. (Li, FIG. 2, “Labels 2 Classes face/non-face” in “24-net”.  The examiner notes that Li’s providing the classification output labels with its 24-net teaches this limitation.)
          Li teaches a system to receive an original input.  
In the same field of endeavor, Roblek teaches:
A system comprising: a memory; a processor communicatively coupled to the memory, the processor operable to execute instructions stored in the memory, the instructions causing the processor to: receive an original input; (Roblek,  ¶ [0011]: “Another innovative aspect of the subject matter described in this specification can be embodied in methods for processing data input through each of a plurality of layers of a neural network”. ¶ [0107]: “The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.” ¶ [0108]: “Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.”)
Li and Roblek are analogous art because both pertain to object recognition with cascade neural networks.  It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to combine Li’s “convolutional neural network cascade” (Li at Abstract) with Roblek’s “frequency domain features” (Roblek at ¶ [0003]) so that Li modified by Roblek can “achieve improved accuracy and more reliable feature extraction over other neural network systems since the system does not require that a hard choice be made about particular resolution for a given task.” (Roblek at ¶ [0030]).
Li modified by Roblek does not appear to explicitly teach: 
wherein the merging is performed as a groupwise merger; and 
 
Chen does, however, teach: 
wherein the merging is performed as a groupwise merger; and (Chen, p. 6, § 4.3, ¶ 1: “Specifically, we attach to the input image and the output of each of the ﬁrst four max pooling layers a two-layer MLP (ﬁrst layer: 128 3x3 convolutional ﬁlters, second layer: 128 1x1 convolutional ﬁlters) whose feature map is concatenated to the main network’s last layer feature map.” 
The examiner notes that according to ¶ [0058] of the present disclosure, merging includes adding original images to the merged information, and groupwise merging further includes concatenating features of multiple networks and, if needed, subsequently applying a convolution to fuse the features.  Chen’s attaching the input images and the output of each of a plurality of layers to a two-layer network whose output (feature map) is concatenated to the main network’s last layer feature map thus teaches a groupwise merger.)
Li, Roblek, and Chen are analogous art because all three references pertain to digital object recognition with cascade neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Li’s “convolutional neural network cascade” (Li at Abstract) in view of Roblek’s “frequency domain features” (Roblek at ¶ [0003]) to incorporate Chen’s groupwise merger (Chen, supra). The modification increases the boundary localization accuracy and improves localization performance (Chen, p. 6, § 4.3, ¶ 1: “Following the promising recent results of (Hariharan et al., 2014a; Long et al., 2014) we have also explored a multi-scale prediction method to increase the boundary localization accuracy”; and “As discussed in the experimental section, introducing these extra direct connections from fine-resolution layers improves localization performance, yet the effect is not as dramatic as the one obtained with the fully-connected CRF.”)
 
With respect to claim 9, it is substantially similar to claim 2 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 10, it is substantially similar to claim 3 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 12, it is substantially similar to claim 5 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 13, it is substantially similar to claim 6 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 14, it is substantially similar to claim 7 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 15, Li teaches:
a computer program for multiscale representation of image data, comprising: receiving an original input; (Li, Abstract: “we propose a cascade architecture built on convolutional neural networks (CNNs) with very powerful discriminative capability”. ¶ 2, § 3.1, p. 5327: “Given a test image, the 12-net scans the whole image densely across different scales to quickly reject more than 90% of the detection windows.” Table 1 lists the cascade neural networks with the total number of “sliding window” and the “global NMS”.  The examiner notes that Li’s receiving a test image at the 12-net CNN teaches this limitation.)
 
downsampling, by the processor, the original input into a downscaled input; (Li, Last paragraph, § 3.2.2, p. 5328: “In our experiment, we observe that the 12-net and 12-calibration-net reject 92.7% detection windows while keeping 94.8% recall on FDDB (see Table 1).”   ¶ 1, § 3.2.3, p. 5328: “24-net is an intermediate binary classiﬁcation CNN to further reduce the number of detection windows. Remaining detection windows from the 12-calibration-net are cropped out and resized into 24 × 24 images and evaluated by the 24-net.”  The examiner notes that Li’s rejecting 92.7% of the total number of sliding windows in the test image with its 12-net and 12-calibration-net and forwarding the remaining 7.3% detection windows to the next convolutional neural network (e.g., 24-net) teaches down-sampling the original input (having 100% detection windows at Li’s 12-net) into 7.3% out of the total 100% detection windows and hence a downscaled input.)
running, by the processor, a first convolutional neural network ("CNN") on the downscaled input; (Li, ¶ 1, § 3.2.3, p. 5328: “24-net is an intermediate binary classiﬁcation CNN to further reduce the number of detection windows. Remaining detection windows from the 12-calibration-net are cropped out and resized into 24 × 24 images and evaluated by the 24-net.”  The examiner notes that Li’s processing the downscaled input (e.g., the “remaining detection windows” after over 92% rejection by Li’s 12-net and 12-calibration-net) with its second convolutional neural network (e.g., 24-net) teaches the above limitation.)
 
running, by the processor, a second CNN on the original input, where the second CNN has fewer layers than the first CNN; (Li, ¶ 2, § 3.1, p. 5327: “Given a test image, the 12-net scans the whole image densely across different scales to quickly reject more than 90% of the detection windows.” ¶ 2, § 3.2.3, p. 5328: “A similar shallow structure is chosen for time efficiency. Besides, we adopt a multi-resolution structure in the 24-net. In additional to the 24 × 24 input, we also feed the input in 12 × 12 resolution to a sub-structure same as the 12-net in 24-net.” The examiner notes that Li’s processing a test image first at its 12-net teaches running a second CNN on the original input. The examiner further notes that the inclusion in Li’s 24-net of a sub-structure that is the same as the 12-net shows that Li’s second CNN (e.g., 12-net) has fewer layers than the first CNN (e.g., 24-net).)
merging, by the processor, the output of the first CNN with the output of the second CNN, (Li, “12-net” and “24-net” in FIG. 2, p. 5328:

[Figure: media_image2.png, greyscale reproduction of Li, FIG. 2]

The examiner notes that Li’s 24-net reproduced above receives the output from Li’s 12-net + 12-calibration-net, resizes the output, and uses the resized output as the input to the 24-net as well as the 12-net as shown in the upper portion of FIG. 2 reproduced above.  The output of the 12-net sub-structure in Li’s 24-net is then provided to the fully-connected layer (together with the output of the max-pooling layer) in the 24-net.  Therefore, Li teaches this limitation.)
providing a result, by the processor, following the merging of the outputs.  (Li, FIG. 2, “Labels 2 Classes face/non-face” in “24-net”.  The examiner notes that Li’s providing the classification output labels with its 24-net teaches this limitation.)
          Li thus teaches a computer program for multiscale representation of image data, comprising: receiving an original input.  Li does not appear to explicitly teach:
A computer program product, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer, to cause the computer to perform a method comprising:
In the same field of endeavor, Roblek does, however, teach:
A computer program product, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer, to cause the computer to perform a method comprising: (Roblek, ¶ [0103]: “Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus.” “The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.”)
In addition, Roblek further teaches:
receiving, by a processor, an original input (Roblek, ¶ [0011]: “Another innovative aspect of the subject matter described in this specification can be embodied in methods for processing data input through each of a plurality of layers of a neural network”. ¶ [0107]: “The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.” ¶ [0108]: “Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.”)
Li and Roblek are analogous art because both pertain to object recognition with cascade neural networks.  It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to combine Li’s “convolutional neural network cascade” (Li at Abstract) with Roblek’s “frequency domain features” (Roblek at ¶ [0003]) so that Li modified by Roblek can “achieve improved accuracy and more reliable feature extraction over other neural network systems since the system does not require that a hard choice be made about particular resolution for a given task.” (Roblek at ¶ [0030]).
Li modified by Roblek does not appear to explicitly teach: 
wherein the merging is performed as a groupwise merger; and 
 
Chen does, however, teach: 
wherein the merging is performed as a groupwise merger; and (Chen, p. 6, § 4.3, ¶ 1: “Specifically, we attach to the input image and the output of each of the first four max pooling layers a two-layer MLP (first layer: 128 3x3 convolutional filters, second layer: 128 1x1 convolutional filters) whose feature map is concatenated to the main network’s last layer feature map.” 
The examiner notes that according to ¶ [0058] of the present disclosure, merging includes adding original images to the merged information, and groupwise merging further includes concatenating features of multiple networks and, if needed, subsequently applying a convolution to fuse the features. Chen’s attaching of the input image and the output of each of a plurality of layers to a two-layer network whose output (feature map) is concatenated to the main network’s last layer feature map thus teaches a groupwise merger.)
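For illustration only, the concatenate-then-fuse operation discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Chen or from the present disclosure: the function names `concat_channels` and `conv1x1`, the feature-map shapes, and the weights are all hypothetical, and a 1x1 convolution is assumed as the fusing convolution.

```python
# Illustrative sketch only: names, shapes, and weights are hypothetical and
# are not taken from Chen or from the application under examination.

def concat_channels(feature_maps):
    """Concatenate feature maps along the channel axis.

    Each feature map is a list of channels; each channel is an H x W
    list of lists. All maps are assumed to share the same H and W.
    """
    merged = []
    for fmap in feature_maps:
        merged.extend(fmap)
    return merged


def conv1x1(channels, weights):
    """Fuse channels with a 1x1 convolution (a per-pixel weighted sum).

    weights[o][c] is the weight from input channel c to output channel o.
    """
    height = len(channels[0])
    width = len(channels[0][0])
    fused = []
    for out_weights in weights:
        out = [[sum(w * ch[y][x] for w, ch in zip(out_weights, channels))
                for x in range(width)]
               for y in range(height)]
        fused.append(out)
    return fused


# One channel from a "main" network and two channels from a side branch.
main_features = [[[1.0, 2.0], [3.0, 4.0]]]
side_features = [[[0.5, 0.5], [0.5, 0.5]],
                 [[2.0, 0.0], [0.0, 2.0]]]

merged = concat_channels([main_features, side_features])  # 3 channels
fused = conv1x1(merged, [[1.0, 2.0, 0.5]])                # 1 fused channel
```

In this sketch the concatenation step only stacks channels, while the subsequent convolution mixes them into a single fused feature map.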
Li, Roblek, and Chen are analogous art because all three references pertain to digital object recognition with cascade neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Li’s “convolutional neural network cascade” (Li at Abstract) in view of Roblek’s “frequency domain features” (Roblek at ¶ [0003]) to incorporate Chen’s groupwise merger (Chen, supra). The modification increases the boundary localization accuracy and improves localization performance (Chen, p. 6, § 4.3, ¶ 1: “Following the promising recent results of (Hariharan et al., 2014a; Long et al., 2014) we have also explored a multi-scale prediction method to increase the boundary localization accuracy”; and “As discussed in the experimental section, introducing these extra direct connections from fine-resolution layers improves localization performance, yet the effect is not as dramatic as the one obtained with the fully-connected CRF.”)
 
With respect to claim 16, it is substantially similar to claim 2 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 17, it is substantially similar to claim 3 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 19, it is substantially similar to claim 5 and is rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 20, it is substantially similar to claim 7 and is rejected in the same manner, the same art and reasoning applying. 
 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
         Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks (2012) teaches a pairwise merger and describes a net that includes eight layers with weights, where the first five layers are convolutional layers and the remaining three layers are fully connected, and an intermediate fully connected layer performs tensorial operations on the input with the corresponding weights to compute the sum of weighted inputs and forwards the sum to the following convolutional or fully connected layer as an input that is further multiplied with their respective weights for computing a successive sum of weighted inputs. The output of the last fully-connected layer is then fed to a 1000-way softmax which produces a distribution over the 1000 class labels.
         Ghodrati et al., DeepProposals: Hunting Objects and Actions by Cascading Deep Convolutional Layers (March 15, 2017) teaches an inverse cascade that, going backward from the later to the earlier convolutional layers of the CNN, selects the most promising locations and refines them in a coarse-to-fine manner.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERICH C. TZOU whose telephone number is (571)272-9852. The examiner can normally be reached Monday-Friday 6:00AM-5:00PM PST with alternate Fridays off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached at 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/E.C.T./Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126