DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-5 and 13-15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections
Claim 18 is objected to because of the following informalities:  
Claim 18 reads “the plurality of scales raging from a full image to a fine-grained region of the full image”. The term “raging” should be “ranging”.   
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3 and 23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 3 recites “wherein determining the category of the object in the image based at least in part on the second local feature further comprises: determining the category of the object in the image further based on”
Claim 23 recites “zooming in on the first attention region until the first attention region is a same size as the image”. The act of zooming in on a particular region does not change the size of that particular region or the image as a whole. Further support, such as describing the use of image interpolation techniques, is required to adequately describe the effective resizing of the region in the image.
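The kind of interpolation-based resizing referred to above can be illustrated with a minimal sketch. The helper names (`crop`, `resize_bilinear`, `zoom_to_image_size`) and the choice of bilinear interpolation are assumptions made purely for illustration; neither the claims nor the cited references are asserted to use this particular technique.

```python
def crop(image, top, left, height, width):
    """Return the rectangular sub-image starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_bilinear(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) by bilinear interpolation."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        # Map the output row index back to a fractional source row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            # Map the output column index back to a fractional source column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four neighbouring source pixels.
            upper = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            lower = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(upper * (1 - wy) + lower * wy)
        out.append(row)
    return out


def zoom_to_image_size(image, top, left, height, width):
    """Crop a region and enlarge it to the size of the full image (hypothetical helper)."""
    region = crop(image, top, left, height, width)
    return resize_bilinear(region, len(image), len(image[0]))


# A 4x4 image whose pixel values are 0..15; the 2x2 region is enlarged to 4x4.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
zoomed = zoom_to_image_size(image, top=1, left=1, height=2, width=2)
```

In this sketch the region is enlarged by computing new pixel values, which is the step the claim language leaves unstated.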

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-5, 13, and 15-29 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao Bo et al. (NPL: “A survey on deep learning-based fine-grained object classification and semantic segmentation”), hereinafter Zhao, in view of Jiang et al. (US-2020/0117954-A1), hereinafter Jiang.

Regarding claim 1, Zhao teaches a device, comprising: a processing unit; and a memory coupled to the processing unit and having instructions stored thereon which, when executed by the processing unit, cause the device to perform acts comprising:
(Zhao, pages 119-135: Zhao surveys fine-grained recognition methods. A processing unit and a memory storing instructions to perform a method are implicit in the implementation of any computer-implemented method.)

extracting a global feature of an image using a first sub-network of a first learning network, wherein the first learning network, a second learning network, and a third learning network form a stacked learning network that is implemented in an image recognition module of the device; 
(Zhao, pages 123-124, Section 2.2.5: Zhao teaches that a global feature is a feature computed from the input image. Any neural network computes features from the input image to obtain its result. The localization sub-network is therefore equated to the first learning network, in which a first part computes features (the first sub-network) and a second part computes the output, namely the coordinates of the found part (the second sub-network);
Zhao, Figure 9: [image: media_image1.png])

extracting a first local feature of the first attention region using a first sub-network of [[a]] the second learning network; 
(Zhao, Page 124, Figure 9: The second learning network found in this application is equated to Zhao’s teaching of the classification sub-network where part of it calculates features for the input image, which is the part or first attention region in this case, and determines the category label. )

determining a first attention region of the image based on the global feature using a second sub-network of the first learning network, the first attention region comprising a discriminative portion of an object in the image; 
(Zhao, Page 124, Section 2.2.5: [image: media_image2.png]; Zhao teaches an equivalent to the first attention region. The localized part in Figure 9 (Page 124) is equated to the “first attention region” of claim 1.)

and determining a category of the object in the image based on at least one of the first local feature or the second local feature.
(Zhao, Page 124, Figure 9: The second learning network found in this application is equated to Zhao’s teaching of the classification sub-network where part of it calculates features for the input image, which is the part or first attention region in this case, and determines the category label.)

Zhao does not expressly disclose: extracting a global feature of an image using a first sub-network of a first learning network, wherein the first learning network, a second learning network, and a third learning network form a stacked learning network that is implemented in an image recognition module of the device; 
Jiang teaches:
extracting a global feature of an image using a first sub-network of a first learning network, wherein the first learning network, a second learning network, and a third learning network form a stacked learning network that is implemented in an image recognition module of the device;
(Jiang, Para 30: [image: media_image3.png]; Jiang describes a network flow of increasing granularity between three DNN networks, the networks forming a ‘stack’ structure.)
(Jiang, Figure 1: [image: media_image4.png]; Jiang visualizes the network flow of the three DNN ‘stack’.)

Zhao does not expressly disclose:
[[and]] determining a second attention region of the image based on the first local feature using a second sub-network of the second learning network, the second attention region being comprised in the first attention region and comprising a discriminative sub-portion of the object in the image; extracting a second local feature of the second attention region using the third learning network;
Jiang teaches:
[[and]] determining a second attention region of the image based on the first local feature using a second sub-network of the second learning network, the second attention region being comprised in the first attention region and comprising a discriminative sub-portion of the object in the image; extracting a second local feature of the second attention region using the third learning network;
(Jiang, Figure 1: [image: media_image5.png]; Jiang displays the framework in which a second attention region, comprised within the first attention region, is determined by a second DNN and passed to a third DNN for extraction.)

It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the image recognition device of Zhao such that it utilized a connection of networks in a stacked structure, as taught by Jiang. It would have been further obvious to pair the stacked structure of Jiang with the sub-network structures of Zhao, thus having a stacked structure where each DNN contains its own respective extraction and classification network.
The suggestion/motivation for doing so would have been to arrange networks in a fashion that would lead to a gradual increase in granularity as the output of one network is passed to the next network in the stack to be processed. 
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Therefore, it would have been obvious to combine Zhao with Jiang to obtain the invention as specified in claim 1.
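The coarse-to-fine flow described for the combination can be pictured with a toy sketch. Everything in it is an assumption for illustration: `mean_intensity` stands in for a feature-extraction sub-network, `brightest_quadrant` stands in for an attention sub-network (and, unlike the claimed step, works on raw pixels rather than on the extracted feature), and `run_stack` is a hypothetical driver. It is not the architecture of Zhao, Jiang, or the application.

```python
def crop(region, top, left, height, width):
    """Return the rectangular sub-region starting at (top, left)."""
    return [row[left:left + width] for row in region[top:top + height]]


def mean_intensity(region):
    """Toy stand-in for a feature-extraction sub-network: average pixel value."""
    values = [v for row in region for v in row]
    return sum(values) / len(values)


def brightest_quadrant(region):
    """Toy stand-in for an attention sub-network: pick the brightest quadrant.

    Returns (top, left, height, width) relative to the given region.
    """
    h, w = len(region) // 2, len(region[0]) // 2
    candidates = [(t, l, h, w) for t in (0, h) for l in (0, w)]
    return max(candidates, key=lambda box: mean_intensity(crop(region, *box)))


def run_stack(image, levels=3):
    """Pass progressively smaller regions through `levels` stacked stages.

    Assumes the image side lengths are at least 2 ** (levels - 1).
    Returns the per-level features and the attention boxes in image coordinates.
    """
    region, offset = image, (0, 0)
    features, boxes = [], []
    for level in range(levels):
        features.append(mean_intensity(region))  # global, then local features
        if level < levels - 1:
            top, left, h, w = brightest_quadrant(region)
            boxes.append((offset[0] + top, offset[1] + left, h, w))
            region = crop(region, top, left, h, w)  # next stage sees only this part
            offset = (offset[0] + top, offset[1] + left)
    return features, boxes


# 4x4 image with a bright pixel in the bottom-right corner.
image = [[0] * 4 for _ in range(4)]
image[2][2] = 1
image[3][3] = 9
features, boxes = run_stack(image)
```

Each later box lies inside the previous one, which mirrors the requirement that the second attention region be comprised in the first attention region.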

Regarding claim 3, Zhao in view of Jiang teaches the device of claim [[2]] 1, wherein determining the category of the object in the image based at least in part on the second local feature further comprises: determining the category of the object in the image further based on 
(Zhao, Pages 121-123, Sections 2.2.1 and 2.3.4: A refinement of the part localization, or the multiple granularity CNN are presented.
Zhao, Figure 5: [image: media_image6.png])

Regarding claim 4, Zhao in view of Jiang teaches the device of claim 1, wherein extracting the first local feature of the first attention region using the first sub-network of the second learning network comprises: 
zooming in on the first attention region; and 
(Zhao, Page 124, Section 2.2.5: [image: media_image7.png]; This scaling is equated to the zooming found in the present application.)
extracting the first local feature of the zoomed-in first attention region using the first sub-network of the second learning network.
(Zhao, Pages 123-124, Section 2.2.4: Zhao demonstrates the relationship between the alignment sub-network and the localization and classification modules.)

Regarding claim 5, Zhao in view of Jiang teaches the device of claim 1, wherein determining the first attention region of the image based on the global feature comprises: 
determining, based on the global feature, a location parameter indicative of a location of the first attention region in the image using the second sub-network of the first learning network; and determining the first attention region from the image based on the location parameter.  
(Zhao, Page 124, Section 2.2.5, and Figure 9: [image: media_image2.png]; In Figure 9, it is shown that ‘part’ in the above sentence is referring to the localized part, a characteristic part of the object on which the fine-grained classification is based.)

Regarding claims 13 and 15-16, they recite the device of claims 1 and 4-5, respectively, as a method. Zhao discloses the method:
(Zhao, Page 121, Section 2.2: [image: media_image8.png]; Zhao describes the introduction of part detection methods and further incorporates a multitude of methods by reference throughout the reference.)
With respect to the remaining limitations of claims 13 and 15-16, the analyses in rejecting claims 1 and 4-5 are equally applicable to claims 13 and 15-16, respectively. 
 
Regarding claims 17 and 22, they recite the device of claims 1 and 4, respectively, as a system. Zhao discloses the system:
(Zhao, Page 123, Section 2.2.5: [image: media_image9.png]; Zhao describes the use of various systems for localization, alignment, and classification.)
With respect to the remaining limitations of claims 17 and 22, the analyses in rejecting claims 1 and 4 are equally applicable to claims 17 and 22, respectively.

Regarding claim 18, Zhao in view of Jiang teaches the system of claim 17, wherein the first sub-network and the second sub-network process the image at a different scale of a plurality of scales, the plurality of scales raging from a full image to a fine-grained region of the full image.
(Zhao, Page 126, Figure 13: [image: media_image10.png]; Figure 13 demonstrates neural networks at different granularity scales ranging from class to species, equivalent to the “ranging from a full image to a fine-grained region of the full image” of claim 18.)

Regarding claim 19, Zhao in view of Jiang teaches the system of claim 17, wherein the global feature indicates feature information of the object or a background of the image.
(Zhao, Page 121, Section 2.2: [image: media_image11.png]; Zhao describes extracting features within co-occurring patterns of the object being imaged.)

Regarding claim 20, Zhao in view of Jiang teaches the system of claim 19, wherein the feature information is at least one of: 
a color; a profile; an edge; or a line.  
(Zhao, Page 122, Section 2.2.2: [image: media_image12.png]; Zhao describes extracting edge box crops from each image.)

Regarding claim 21, Zhao in view of Jiang teaches the system of claim 17, wherein the second sub-network is comprised of a convolutional neural network including one or more convolutional layers, activation layers, or pooling layers N for extracting feature maps.
(Zhao, Page 124, Section 2.2.5: [image: media_image2.png]; Zhao describes the localization sub-network as a convolutional neural network containing multiple convolutional layers.)

Regarding claim 23, Zhao in view of Jiang teaches the system of claim 22, wherein zooming in on the first attention region comprises: 
zooming in on the first attention region until the first attention region is a same size as the image.
(Zhao, Page 128, Section 2.4.3: [image: media_image13.png])

Zhao discloses substantially the claimed invention as set forth in the discussion above for claim 23.
Zhao does not disclose expressly zooming in on the first attention region until the first attention region is a same size as the image.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to zoom in on the first attention region until the first attention region is a same size as the image.
Applicant has not disclosed that zooming in on the first attention region until the first attention region is a same size as the image provides an advantage, is used for a particular purpose or solves a stated problem. One of ordinary skill in the art, furthermore, would have expected Applicant’s invention to perform equally well with either the resizing taught by Zhao or the claimed zooming in on the first attention region until the first attention region is a same size as the image because both zooming actions perform the same function of resizing the image.
Therefore, it would have been obvious to one of ordinary skill in this art to modify Zhao to obtain the invention as specified in claim 23.

Regarding claim 24, Zhao in view of Jiang teaches the system of claim 17, wherein the second learning network further comprises a fully-connected (FC) layer for mapping the local feature to a feature vector.
Jiang teaches:
wherein the second learning network further comprises a fully-connected (FC) layer for mapping the local feature to a feature vector.
(Jiang, Para 31: [image: media_image14.png])

Regarding claim 25, Zhao in view of Jiang teaches the system of claim 24,
Jiang teaches:
  wherein the feature vector includes one or more elements corresponding to one or more object categories.
(Jiang, Para 66: [image: media_image15.png]; Jiang describes feature vectors including data about the category of food in the imaged food region.)

Regarding claim 26, Zhao in view of Jiang teaches the system of claim 24, wherein the second learning network further comprises a multinomial logistic regression layer for determining an object category from the one or more object categories.
(Zhao, Page 128, Section 2.4.3: [image: media_image16.png]; Zhao describes the use of a Softmax layer. Softmax is well known in the art as a multinomial logistic regression method. Applicant discloses the use of this method at [034].)
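For reference, the softmax function named above maps a vector of scores to probabilities that sum to one. The following is a minimal sketch; the function name `softmax` and the example scores are illustrative assumptions and are not taken from Zhao, Jiang, or the application.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three object categories.
probabilities = softmax([2.0, 1.0, 0.1])
```

Larger scores receive larger probabilities, and equal scores receive equal probabilities.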

Regarding claim 27, Zhao in view of Jiang teaches the system of claim 26, wherein determining the object category from the one or more object categories is based on an output of the FC layer.
(Zhao, Figure 17: [image: media_image17.png])

Regarding claim 28, Zhao in view of Jiang teaches the system of claim 27, wherein the multinomial logistic regression layer converts the output of the FC layer into respective probabilities for the one or more object categories. 
(Zhao, Page 128, Section 2.4.4: [image: media_image18.png])

Regarding claim 29, Zhao in view of Jiang teaches the system of claim 28, wherein determining the category of the object in the image based on at least one of the first local feature or the second local feature comprises: 
identifying the object category has a highest probability of the one or more object categories; and selecting the object category as the category of the object in the image based on the highest probability.
(Zhao, Page 128, Section 2.4.4: [image: media_image19.png])
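The selection step recited in claim 29 amounts to choosing the category with the largest probability. A minimal sketch follows; the function name `select_category`, the category labels, and the probability values are hypothetical and serve only to illustrate the step.

```python
def select_category(probabilities, categories):
    """Return the category with the highest probability, together with that probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return categories[best], probabilities[best]


# Hypothetical per-category probabilities for one image.
label, confidence = select_category([0.1, 0.7, 0.2], ["sparrow", "finch", "wren"])
```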


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US-2019/0318405-A1
US-2018/0025249-A1
US-11321593-B2

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MYCHAL J GIBBENS whose telephone number is (571)272-5553. The examiner can normally be reached Monday - Friday 8:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SUMATI LEFKOWITZ can be reached on 571-272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/MYCHAL J GIBBENS/Examiner, Art Unit 2662/



/GANDHI THIRUGNANAM/Primary Examiner, Art Unit 2662