Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is in response to the amendments filed on 01/29/2021 and to the interview of 04/06/2021, in which the examiner’s amendments were authorized by the attorney and confirmed with the applicant.

EXAMINER'S AMENDMENT
An examiner's amendment to the record appears below. Authorization for this examiner's amendment was given in an interview with Attorney Justin Swindells on 04/06/2021. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 C.F.R. § 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

The application has been amended as follows (NOTE: Only amended claims listed):

1. (Currently Amended) A method of training an image semantic annotation apparatus, comprising: 
a. providing a plurality of training images, wherein semantics and visual attribution description of respective training images are known, by automatically parsing, through a computer device, webpages including images to obtain therefrom the plurality of training images, the known semantics, and the known visual attribute descriptions of respective ones of the training images, the known semantics include coarse-grained semantics and fine-grained semantics that are not completely identical to the corresponding coarse-grained semantics for a 
b. inputting the training images each including a given fine-grained classification object to a locator of the image semantic annotation apparatus, wherein the locator is configured to determine a coordinate on the training image, determine a local area on the training image based on the coordinate, and determine a part of the fine-grained classification object within the local area as the feature part;
c. determining, by the locator, a plurality of local areas of each input training image, wherein a location of the at least one local area on the input training image is determined by probability distribution sampling of the locator outputs of the plurality of feature parts, wherein each of the determined local areas of each input training image comprises one feature part of the given fine-grained classification object included in the each input training image, and the different determined local areas in the each input training image comprise different features parts of the given fine-grained classification object, and inputting the determined respective local areas into an attribute predictor of the image semantic annotation apparatus, the plurality of local areas including a coordinate on the input training image and having a size less than a size of the input training image; 

e. training the locator and the attribute predictor according to the obtained visual attribute prediction result of each local area and a known visual attribute description of the corresponding training image, and for each of the feature parts to be located on the corresponding training image, repeating steps a to e until convergence to complete training of the locator and of the attribute predictor, wherein the known visual attribute description of the corresponding training image comprises known visual attribute descriptions of the feature parts of the given fine-grained classification object in the corresponding training image; 
f. selecting at least part of training images from the plurality of training images; 
g. by the trained locator, locating, on the training image, the plurality of feature parts of the given fine-grained classification object corresponding to each of the selected training images by processing each of the selected training images, wherein the locating comprises determining the coordinate on the each training image, and determining the feature part of the given fine-grained classification object based on the coordinate on the each training image;
h. inputting the feature parts located for the given fine-grained classification object of each of the selected training images and the known fine-grained semantic of the given fine-grained classification object in the each training image into a classifier of the image semantics annotation apparatus to train the classifier.
6. (Currently Amended) The method according to claim 1, wherein the step e comprises: for each of the local areas, calculating a loss function according to the visual attribute prediction result of the local area and the visual attribute description of the corresponding training image, for training the locator and the attribute predictor.  
11. (Currently Amended) A computer device that can train itself, comprising: a processor and a memory, the processor being configured to: 

input the training images each including a given fine-grained classification object to a locator, wherein the locator is configured to determine a coordinate on the training image, determine a local area on the training image based on the coordinate, and determine a part of the fine-grained classification object within the local area as the feature part;
determine, by the locator, a plurality of local areas of each input training image, wherein a location of the at least one local area on the input training image is determined by probability distribution sampling of the locator outputs of the plurality of feature parts, wherein each of the determined local areas of each input training image comprises one feature part of the given fine-grained classification object, and the different determined local areas in the each input training image comprise different features parts of the given fine-grained classification 
obtain, with the each local area determined, a visual attribute prediction result of each input local area, wherein the visual attribute prediction result of each input local area comprises a visual attribute description of the feature part located in the each input local area; 
train the computer device according to the obtained visual attribute prediction result of each local area and a known visual attribute description of the corresponding training image, and for each of the feature parts to be located on the corresponding training image, perform the aforementioned operations until convergence to complete training of the computer device, wherein the known visual attribute description of the corresponding training image comprises known visual attribute descriptions of the feature parts of the given fine-grained classification object in the corresponding training image; 
select at least part of training images from the plurality of training images; 
locate the plurality of feature parts of the given fine-grained classification object corresponding to each of the selected training images by processing each of the selected training images, wherein the locating comprises determining the coordinate on the each training image, and determining the feature part of the given fine-grained classification object based on the coordinate on the each training image; and
input the feature parts located for the given fine-grained classification object of each of the selected training images and the known fine-grained semantic of the given fine-grained classification object in the each training image into a classifier to train the computer device.
16. (Currently Amended) The computer device according to claim 11, wherein the processor is further configured to: for each of the local areas, calculate a loss function according to the visual attribute prediction result of the local area and the visual attribute description of the corresponding training image, for training the computer device.  

a. providing a plurality of training images, wherein semantics and visual attribution description of respective training images are known, by automatically parsing, through a computer device, webpages including images to obtain therefrom the plurality of training images, the known semantics, and the known visual attribute descriptions of respective ones of the training images, the known semantics include coarse-grained semantics and fine-grained semantics that are not completely identical to the corresponding coarse-grained semantics for a respective one of the plurality of training images, wherein the coarse-grained semantics corresponds to a coarse-grained classification object, and the different fine-grained semantics corresponds to different fine-grained classification objects belonging to the same coarse-grained classification object, each of the fine-grained classification objects including a plurality of feature parts, the visual attribute descriptions being divided into different groups based on their corresponding feature parts, each of the visual attribute descriptions expressing a local visual appearance of corresponding ones of the feature parts, wherein the webpages include textual data relating to the images, and wherein the known semantics and known visual attribute descriptions result from capturing the textual data;
b. inputting the training images each including a given fine-grained classification object to a locator of the image semantic annotation apparatus, wherein the locator is configured to determine a coordinate on the training image, determine a local area on the training image based on the coordinate, and determine a part of the fine-grained classification object within the local area as the feature part; 
c. determining, by the locator, a plurality of local areas of each input training image, wherein a location of the at least one local area on the input training image is determined by probability distribution sampling of the locator outputs of the plurality of feature parts, wherein 
d. obtaining a visual attribute prediction result of each input local area from the attribute predictor, wherein the visual attribute prediction result of each input local area comprises a visual attribute description of the feature part located in the each input local area; 
e. training the locator and the attribute predictor according to the obtained visual attribute prediction result of each local area and a known visual attribute description of the corresponding training image, and for each of the feature parts to be located on the corresponding training image, repeating steps a to e until convergence to complete training of the locator and of the attribute predictor, wherein the known visual attribute description of the corresponding training image comprises known visual attribute descriptions of the feature parts of the given fine-grained classification object in the corresponding training image; 
f. selecting at least part of training images from the plurality of training images;
g. via the trained locator, locating, on the training image, the plurality of feature parts of the given fine-grained classification object corresponding to each of the selected training images by processing each of the selected training images, wherein the locating comprises determining the coordinate on the each training image, and determining the feature part of the given fine-grained classification object based on the coordinate on the each training image; 
h. inputting the feature parts located for the given fine-grained classification object of each of the selected training images and the known fine-grained semantic of the fine-grained 

Allowable Subject Matter

	Claims 1, 6-7, 9-11, and 16-21 are allowed.
	The following is a statement of reasons for the indication of allowable subject matter:

	Regarding independent claims 1, 11, and 18:

	The prior art fails to teach or suggest all of the limitations in combination. Further, the examiner cannot determine a reasonable motivation, either in the prior art or the existing case law, to combine the known elements to render the claimed invention.

	Thus, claims 1, 11, and 18 are allowable.

	Claims 6-7, 9-10, 16-17, and 19-21 depend upon claims 1, 11, and 18, respectively, and are thus allowable.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JENQ-KANG (Kang) CHU whose telephone number is (571)270-7396.  The examiner can normally be reached on M-F 8-6 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Padmanabhan, can be reached at 571-272-8352.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JENQ-KANG CHU/Examiner, Art Unit 2176
/KAVITA STANLEY/Supervisory Patent Examiner, Art Unit 2176