DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-10 and 21-25 are presented for examination in the amendment dated 11/30/2021. 
Election/Restrictions
Newly submitted claims 21-22, 24 and 25 are directed to an invention that is independent or distinct from the invention originally claimed for the following reason(s):
In response to the restriction requirement dated 05/20/2021, Applicant elected claims 1-11 without traverse and withdrew claims 12-20.
Original claim 12, which was not elected, reads as follows:

12. 	A computing system comprising:

one or more processors; and

at least one tangible, non-transitory computer-readable medium that stores a convolutional neural network implemented by the one or more processors, the convolutional neural network comprising:

a plurality of convolutional blocks, each of the plurality of convolutional blocks configured to receive an input and generate an output, at least one of the plurality of convolutional blocks comprising:

a projection separable convolutional layer configured to apply a depthwise convolution and a pointwise convolution during processing of an input to the projection separable convolutional layer to generate an output of the projection separable convolutional layer, and wherein the output of the projection separable convolutional layer has a depth dimension that is less than a depth dimension of the input of the projection separable convolutional layer;

an activation layer configured to receive the output of the projection separable convolutional layer and generate an input for an expansion separable convolutional layer;

the expansion separable convolutional layer configured to apply a depthwise convolution and a pointwise convolution during processing of the input for the expansion separable convolutional layer to generate an output of the expansion separable convolutional layer, and wherein the output of the expansion separable convolutional layer has a depth dimension that is greater than a depth dimension of the input of the expansion separable convolutional layer, and wherein the depthwise convolution of at least one of the projection separable convolutional layer or the expansion separable convolutional layer is applied with a kernel size that is greater than 3 x 3; and

a residual shortcut connection from the input of the projection separable convolutional layer to the output of expansion separable convolutional layer.

The Examiner notes, however, that newly added claims 21-22, 24 and 25 reintroduce the subject matter of non-elected claims 12-20 into the claims.  
Apart from the question of whether the original disclosure supports combining two separate embodiments into one, this practice is improper under the provisions governing restriction by original presentation (see at least MPEP § 821.03). 
Since Applicant has received an action on the merits for the originally presented invention, this invention has been constructively elected by original presentation for prosecution on the merits.  Accordingly, claims 21-22, 24 and 25 are withdrawn from consideration as being directed to a non-elected invention.  See 37 CFR 1.142(b) and MPEP § 821.03.  Future amendments in the instant application will be carefully screened for similar issues to ensure proper compliance. 
Response to Arguments
Applicant's arguments filed 11/30/2021 have been fully considered but they are not persuasive. 
In the arguments, Applicant challenges the Fu reference (WO 2019/213459) with respect to the limitation “kernel size that is greater than 3 x 3”.
Fu discloses, in at least ¶0078, an example kernel size of 3x3 and states expressly that “it should be understood that kernel size is not limited to 3 and can be of any suitable kernel size”.
Applicant asserts that the USPTO should consider what constitutes a “suitable” size, and further asserts that conventional wisdom favors a small kernel size of 3x3 to maximize processing speed.
Applicant cites ¶0021 of the instant Specification as evidence for this assertion and concludes that Fu’s “suitable” kernel size should be 3 or less.
The Examiner respectfully disagrees with this line of reasoning.
It is improper to import the instant Specification for the purpose of interpreting the claims or the prior art.  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
If Applicant wishes to argue that a “suitable” size means 3 or less, the evidence should come from the Fu reference itself.  Absent such objective evidence, the assertion above is merely Applicant’s own subjective opinion.  
Likewise, if Applicant were to assert that Fu teaches away from a kernel size “greater than 3 x 3”, the evidence must come from Fu itself, and NOT from Applicant’s own Specification.  Applicant is reminded that "the prior art’s mere disclosure of more than one alternative does not constitute a teaching away from any of these alternatives because such disclosure does not criticize, discredit, or otherwise discourage the solution claimed…." In re Fulton, 391 F.3d 1195, 1201, 73 USPQ2d 1141, 1146 (Fed. Cir. 2004).
Even granting the proposition that conventional knowledge favors a small 3x3 kernel for processing speed, that preference serves only the processing-speed goal and not necessarily other goals, such as quality-related goals (loss, consistency, accuracy); it therefore should not be taken as a general preference for all intents and purposes.
In fact, larger kernel sizes have their own advantages, which may suit a particular designer’s preferences or design goals.  A larger kernel may be necessary when the input dimensions are large enough that a smaller kernel would miss features.
For example, Chao et al. (Large Kernel Matters – Improve Semantic Segmentation by Global Convolution Network, 2017; see Abstract and section 4.1.1) found that larger kernel sizes yield improved performance.
As such, the term “suitable” in Fu should be interpreted broadly, in view of the many individual design goals a practitioner may pursue, rather than as limited to “processing speed”. 
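To make the speed-versus-kernel-size trade-off discussed above concrete, the following Python sketch counts multiply-accumulate operations (MACs) for a single depthwise separable layer under the commonly used cost model (depthwise cost H*W*C_in*k*k, pointwise cost H*W*C_in*C_out). The function name and layer dimensions are hypothetical and are not taken from Howard, Fu, or the instant application; the sketch assumes stride 1 and 'same' padding so that the spatial size is unchanged.

```python
def separable_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> dict:
    """Approximate MAC count for one depthwise separable layer.

    Assumes stride 1 and 'same' padding, so the output keeps the h x w size.
    """
    depthwise = h * w * c_in * k * k  # one k x k filter per input channel
    pointwise = h * w * c_in * c_out  # 1 x 1 convolution mixing channels
    return {"depthwise": depthwise, "pointwise": pointwise, "total": depthwise + pointwise}


# Hypothetical layer: 32 x 32 feature map, 48 input and 48 output channels.
for k in (3, 5, 7):
    macs = separable_conv_macs(32, 32, 48, 48, k)
    share = macs["depthwise"] / macs["total"]
    print(f"k={k}: total={macs['total']:,} MACs, depthwise share={share:.1%}")
```

Under this model only the depthwise term grows with k*k, while the pointwise term is independent of k; the sketch therefore quantifies the processing-speed cost of a larger kernel but says nothing about the quality-related goals noted above.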
The argument is thus not persuasive.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over Howard et al. (US 2019/0147318) in view of Fu et al. (WO 2019/213459).
As to claim 1:
Howard discloses a computing system (Fig. 4) comprising:
at least one processor;
at least one tangible, non-transitory computer-readable medium (See ¶0013, a computing system with a processor and CRM) that stores a convolutional neural network implemented by the one or more processors, the convolutional neural network configured to receive an input that describes an image (See ¶0014, input image) and, in response to receipt of the input, generate an output that describes content depicted in the image (See Fig. 4 and all sub-figures of Fig. 3, ¶0040, 0043: an output for the input image is produced at the end block of the system);
the convolutional neural network comprising:
a plurality of convolutional blocks, each of the plurality of convolutional blocks configured to receive an input and generate an output (Fig. 4, ¶0101; Fig. 3A, ¶0086-0088: at least blocks 304 and 306 among the plurality of blocks, each of which receives an input from the preceding block and generates an output, as shown);
at least one of the plurality of convolutional blocks (Fig. 3A, ¶0086-0088; also Fig. 4, ¶0101) comprising:
one or more separable convolutional layers (Abstract, separable layers) configured to apply a depthwise convolution and a pointwise convolution during processing of the respective input to generate the output (See ¶0087, 0088, 0092: pointwise and depthwise layers 306 and 304, respectively, in Fig. 3A; also Fig. 4, ¶0101), and wherein the depthwise convolution is applied with a kernel size (See ¶0110: a standard kernel size can be used);
and
a residual shortcut connection from the respective input of at least one of the plurality of convolutional blocks to the output of the at least one of the plurality of convolutional blocks. (See Abstract, ¶0013, 0100: a residual shortcut connection exists between the convolutional blocks to pass information from the preceding block to the subsequent block’s output, for example shortcut 408.)

Howard, however, does not explicitly disclose that the input image contains a face to be processed by the system to generate an output describing the face.  Nor does Howard disclose a kernel size greater than 3 x 3.

Fu, in the same field of endeavor, discloses a cascaded convolutional network structure (See Abstract, ¶0012) in which an input image containing a face (¶0004, 0179, Figs. 1A and 1B) is processed to output facial landmarks, which describe the face.  Fu also discloses that the kernel size can, similar to Howard, be 1x1 or 3 x 3, but is not limited to a size of 3; in fact, it can be any suitable kernel size as the operator sees fit (See at least ¶0078).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Howard’s system for facial image processing with a kernel size greater than 3 x 3.  Howard in fact discloses object extraction in general (¶0128), which encompasses facial recognition/extraction.  Fu makes clear that such an object can be a face (¶0004); thus, applying Howard’s system to facial recognition would have been appreciated given the prevalence and popularity of facial recognition in, for example, biometric authentication or personnel tracking.  Regarding the kernel size, the Examiner asserts that the choice of kernel size is merely routine in the art rather than an inventive step, since Fu makes clear that the size can be other than the commonly used size so long as the operator finds it suitable (¶0078); thus, this is a matter of preference and/or design constraint. 

As to claim 2:
Howard in view of Fu discloses all limitations of claim 1, wherein the one or more separable
convolutional layers of the at least one of the plurality of convolutional blocks comprises a
single separable convolutional layer and the residual shortcut connection is connected from
the input of the single separable convolutional layer to the output of the single separable
convolutional layer. (See Howard, Abstract: one or more layers, so a block can consist of a single layer.  Regarding Fig. 4 and the Abstract, in the case of a single layer, the residual shortcut runs from the input to the output of that layer.)

As to claim 3:
Howard in view of Fu discloses all limitations of claim 1, wherein the respective one or more
separable convolutional layers of multiple of the plurality of convolutional blocks comprise
respective single separable convolutional layers such that the respective residual shortcut
connection is connected from the respective input of the single separable convolutional
layer to the respective output of the single separable convolutional layer, and wherein the
multiple of the plurality of convolutional blocks are arranged in a stack one after the other
such that a respective output of at least one of the multiple convolutional blocks is received
as a respective input for at least another of the multiple convolutional blocks. (See Howard, Fig. 3A or Fig. 4: a plurality of blocks arranged in cascading order, i.e., the output of one block is the input of another.  See also ¶0042: “(…) further include the passing of residual information between layers (e.g., between linear bottleneck layers) via a residual shortcut connection”.)


As to claim 4:
Howard in view of Fu discloses all limitations of claim 1, wherein the plurality of convolutional
blocks is arranged in a stack with the convolutional blocks sequentially connected one after
the other. (See Howard, Fig. 3A or Fig. 4: a plurality of blocks arranged in cascading order, i.e., the output of one block is the input of another.)

As to claim 5:
Howard in view of Fu discloses all limitations of claim 1, wherein the respective one or more
separable convolutional layers of at least one of the plurality of convolutional blocks
comprises:
a first separable convolutional layer; a second separable convolutional layer; (See Howard, ¶0101, blocks 402 and 406.  See also Fig. 3A)
an activation layer configured to receive the output of the first separable
convolutional layer and generate an input for the second separable convolutional layer. (See Howard, Fig. 4: block 404 can be regarded as the activation layer, as it receives the output of block 402 and processes it to generate an input for block 406.)

As to claim 6:
Howard in view of Fu discloses all limitations of claim 5, wherein Fu further discloses that the activation layer is configured to perform a parametric operation comprising one or more learned parameters (Fu, ¶0123: a convolutional network can use a parametric warping process for facial pose recognition; ¶0070: the network is trained according to a given model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a parametric operation with trained parameters in the system of Howard, as suggested by Fu.  Fu describes such an implementation as faster than real-time facial alignment (¶0123).

As to claim 7:
Howard in view of Fu discloses all limitations of claim 6, wherein the output of the first separable convolutional layer has a depth dimension that is less than a depth dimension of an input of the first separable convolutional layer. (See Fu, ¶0015: the output may have fewer channels for a given layer.)

As to claim 8:
Howard in view of Fu discloses all limitations of claim 7, wherein an output of the second
separable convolutional layer has a depth dimension that is greater than a depth dimension of
an input of the second separable convolutional layer. (See Fu, ¶0079: the pointwise convolution specifically creates more features, i.e., more channels and thus a larger depth dimension.)
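The projection/activation/expansion structure recited in claims 5-8 (and, with the residual shortcut, in original claim 12) can be illustrated with a minimal NumPy sketch: the projection separable layer reduces depth (claim 7), a PReLU-style parametric activation sits between the layers (claims 5-6), and the expansion separable layer increases depth back to the block's input depth (claim 8). The sketch assumes channels-last arrays, stride 1, 'same' zero padding, odd kernel sizes, and a fixed activation slope standing in for a learned parameter; all function and parameter names are hypothetical and are not the implementation of Howard, Fu, or the instant application.

```python
import numpy as np


def depthwise_conv(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Depthwise convolution (cross-correlation), stride 1, 'same' zero padding.

    x: (H, W, C) input; kernels: (k, k, C), one k x k filter per channel, k odd.
    """
    k = kernels.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each channel is weighted only by its own filter tap (no channel mixing).
            out += xp[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out


def pointwise_conv(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """1 x 1 convolution: weights (C_in, C_out) mix channels at every pixel."""
    return x @ weights


def separable_conv(x, dw, pw):
    """Depthwise convolution followed by pointwise convolution."""
    return pointwise_conv(depthwise_conv(x, dw), pw)


def prelu(x, alpha):
    """Parametric ReLU; alpha (one slope per channel) would be learned in training."""
    return np.where(x > 0, x, alpha * x)


def projection_expansion_block(x, p):
    y = separable_conv(x, p["proj_dw"], p["proj_pw"])  # projection: depth C -> C_mid < C
    y = prelu(y, p["alpha"])                           # activation between the layers
    y = separable_conv(y, p["exp_dw"], p["exp_pw"])    # expansion: depth C_mid -> C
    return x + y                                       # residual shortcut


rng = np.random.default_rng(0)
H, W, C, C_mid, k = 16, 16, 24, 8, 5  # hypothetical sizes; k = 5 exceeds 3 x 3
params = {
    "proj_dw": rng.standard_normal((k, k, C)),
    "proj_pw": rng.standard_normal((C, C_mid)),
    "alpha": np.full(C_mid, 0.25),
    "exp_dw": rng.standard_normal((k, k, C_mid)),
    "exp_pw": rng.standard_normal((C_mid, C)),
}
x = rng.standard_normal((H, W, C))
print(projection_expansion_block(x, params).shape)  # (16, 16, 24)
```

Because the expansion layer restores the input depth C, the residual addition is shape-compatible; if the expansion layer's pointwise weights were all zero, the block would reduce to the identity.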

As to claim 9:
Howard in view of Fu discloses all limitations of claim 1.  Regarding the limitation wherein each of the respective outputs of the plurality of convolutional blocks has a size of 4 x 4 or greater, the Examiner asserts that the output size of a convolutional block is, as is standard in the art, determined by the formula: output size = [(input dimension - kernel size) + 2*padding]/stride + 1, with the division rounded down when it is not exact.  
Because the output size depends on several variables (input image dimension, kernel size, padding and stride), an output of 4x4 or greater can be achieved for any given kernel size, padding and stride of a system simply by using an input image of sufficiently high resolution.  Therefore, this limitation is met merely by virtue of using a sufficiently high-resolution image, rather than by any novel feature.  Note that both Fu and Howard disclose kernel sizes of 3x3 or greater.  For example, an image of the common resolution 128x128 processed with a kernel size of merely 4x4, the standard stride of 1, and no padding yields an output of (128 - 4 + 0)/1 + 1 = 125, i.e., 125x125, which easily satisfies the claimed 4x4 or greater.
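The output-size formula above can be checked with a short Python sketch.  It assumes square inputs and kernels and the usual floor rounding when the stride does not divide evenly; the function name is hypothetical.

```python
def conv_output_size(input_size: int, kernel_size: int, padding: int = 0, stride: int = 1) -> int:
    """Output side length: floor((input - kernel + 2*padding) / stride) + 1."""
    if kernel_size > input_size + 2 * padding:
        raise ValueError("kernel does not fit in the padded input")
    return (input_size - kernel_size + 2 * padding) // stride + 1


# Example from the discussion above: 128x128 input, 4x4 kernel, stride 1, no padding.
side = conv_output_size(128, 4)
print(f"{side}x{side}")  # 125x125
print(side >= 4)         # True: meets "4 x 4 or greater"
```

Varying the kernel size, padding, or stride only shifts the result by a bounded amount, so for a sufficiently large input the output remains well above 4x4.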


As to claim 10:
Howard in view of Fu discloses all limitations of claim 1, wherein the at least one non-transitory
computer-readable medium further stores a pyramid pooling model configured to apply a
plurality of feature maps to data describing at least one respective output of the plurality of
convolutional blocks, and wherein the feature maps have respective resolutions of 4 x 4 or
greater. (See Howard, at least ¶0132: feature maps describe the extracted features of each block.  An output stride of 16 or 8 can be used, so a feature map of 4x4 or greater can be achieved with a sufficiently large input image, for example 128x128, which yields 8x8 or 16x16 feature maps, respectively.  See also the discussion of claim 9.)
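Assuming the usual definition of output stride as the ratio of the input image resolution to the resolution of the final feature map, and 'same' padding so that partial windows round up, the arithmetic for claim 10 can be sketched as follows.  The function name is hypothetical, and the 128x128 input is the example used above.

```python
def feature_map_side(input_side: int, output_stride: int) -> int:
    """Side length of the final feature map, assuming 'same' padding (ceiling division)."""
    return -(-input_side // output_stride)


for output_stride in (16, 8):
    side = feature_map_side(128, output_stride)
    print(f"output stride {output_stride}: {side}x{side}, 4x4 or greater: {side >= 4}")
```

Ceiling division is used because repeated stride-2 stages with 'same' padding round up at each stage, which compounds to a single ceiling over the total stride.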

Allowable Subject Matter
Newly added claim 23 contains the subject matter of original claim 1 and original claim 11.  Claim 11 was objected to as being dependent upon a rejected base claim but was indicated to be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  As such, claim 23 is allowed.
Conclusion
The following prior art made of record is considered pertinent to applicant's disclosure: 
Chao et al. (Large Kernel Matters – Improve Semantic Segmentation by Global Convolution Network), 2017: “(…) in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues for the semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-art performance on two public benchmarks and significantly outperforms previous results, 82.2% (vs 80.2%) on PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on Cityscapes dataset.”
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  


Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUAN M HUA whose telephone number is (571)270-7232.  The examiner can normally be reached on 10:30-6:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Anthony Addy can be reached on 571-272-7795.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR 






/QUAN M HUA/Primary Examiner, Art Unit 2645