Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Jinggao Li on 04/21/2022.

The application has been amended as follows: 

Claim 1. (Currently Amended) A visual relationship detection method based on adaptive clustering learning, comprising, executed by a processor, the following steps: 
detecting visual objects from an input image and recognizing the visual objects by a contextual message passing mechanism to obtain context representations of the visual objects; 
embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations; 
embedding the context representations of pair-wise visual objects into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representations; and then performing regularization to the preliminary visual relationship enhancing representations by clustering-driven attention mechanisms; and 
fusing the visual relationship sharing representations, the regularized visual relationship enhancing representations and a prior distribution over 

Claim 4. (Currently Amended) The visual relationship detection method based on adaptive clustering learning according to claim 1, wherein the step of obtaining the visual relationship sharing representations is specifically: 
obtaining a first product of a joint subject mapping matrix and the context representations of the visual object of the subject, obtaining a second product of a joint object mapping matrix and the context representations of the visual object of the object; 
subtracting the second product from the first product, and dot-multiplying the difference value and convolutional features of a visual relationship candidate region; 
wherein, the joint subject mapping matrix and the joint object mapping matrix are mapping matrices that map the visual objects context representations to a joint subspace; and the visual relationship candidate region is the minimum rectangle box that can fully cover the corresponding visual object candidate regions of the subject and object; 
the convolutional features are extracted from the visual relationship candidate region by a convolutional neural network.
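
For readers who want to see the arithmetic recited in claim 4 in concrete form, the following is a minimal sketch in plain Python, offered as an illustration only and not as the applicant's implementation. The function names, the toy dimensions, and the reading of "dot-multiplying" as an element-wise product are assumptions that do not come from the record.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("matrix column count must match vector length")
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sharing_representation(
    w_subj: Matrix,     # hypothetical joint subject mapping matrix
    ctx_subj: Vector,   # context representation of the subject
    w_obj: Matrix,      # hypothetical joint object mapping matrix
    ctx_obj: Vector,    # context representation of the object
    conv_feat: Vector,  # convolutional features of the candidate region
) -> Vector:
    first = matvec(w_subj, ctx_subj)    # "first product"
    second = matvec(w_obj, ctx_obj)     # "second product"
    diff = [a - b for a, b in zip(first, second)]
    if len(diff) != len(conv_feat):
        raise ValueError("difference and convolutional features must align")
    # Assumption: "dot-multiplying" is taken as an element-wise product.
    return [d * f for d, f in zip(diff, conv_feat)]


result = sharing_representation(
    [[1, 0, 1], [0, 1, 0]], [1, 2, 3],
    [[0, 1, 0], [1, 0, 0]], [2, 1, 0],
    [0.5, 2.0],
)
print(result)  # [1.5, 0.0]
```

In the toy call, the first product is [4, 2], the second is [1, 2], and their difference [3, 0] is scaled element-wise by the region features.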
Claim 8. (Currently Amended) The visual relationship detection method based on adaptive clustering learning according to claim 6, wherein the step of "fusing the visual relationship sharing representations and the regularized visual relationship enhancing representations with a prior distribution over 
inputting a predicted category label of visual object of subject and a predicted category label of visual object of object into the visual relationship prior function to obtain a prior distribution over the category label of visual relationship predicate; and 
obtaining a seventh product of the visual relationship sharing mapping matrix and the visual relationship sharing representations, obtaining an eighth product of the visual relationship enhancing mapping matrix and the regularized visual relationship enhancing representations; summing the seventh product, the eighth product and the prior distribution over the category label of visual relationship predicate, and then substituting the result into the softmax function.
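
The fusion step recited in claim 8 can be sketched, under stated assumptions, as a sum of two matrix-vector products and a prior vector passed through softmax. All names and toy values below are hypothetical. The claim text shown here does not state whether the prior enters as raw probabilities or in a transformed form, so the sketch simply adds the vector it is given.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("matrix column count must match vector length")
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_predict(
    w_share: Matrix, r_share: Vector,  # sharing mapping matrix and representation
    w_enh: Matrix, r_enh: Vector,      # enhancing mapping matrix and representation
    prior: Vector,                     # prior over predicate category labels
) -> Vector:
    seventh = matvec(w_share, r_share)  # "seventh product"
    eighth = matvec(w_enh, r_enh)       # "eighth product"
    if not (len(seventh) == len(eighth) == len(prior)):
        raise ValueError("all three terms must have one entry per predicate")
    logits = [a + b + p for a, b, p in zip(seventh, eighth, prior)]
    return softmax(logits)


probs = fuse_and_predict(
    [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0],
    [[0.0, 1.0], [1.0, 0.0]], [0.0, 1.0],
    [0.0, 0.0],
)
predicted_index = max(range(len(probs)), key=probs.__getitem__)
```

With these toy inputs both products equal [1, 0], so the summed scores are [2, 0] and the first predicate receives the larger probability.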

Claim 9. (Currently Amended) A system for a visual relationship detection method based on adaptive clustering learning, the system comprising: 
a processor configured for: 
detecting visual objects from an input image and recognizing the visual objects by a contextual message passing mechanism to obtain context representations of the visual objects; 
embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations; 
embedding the context representations of pair-wise visual objects into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representations; and then performing regularization to the preliminary visual relationship enhancing representations by clustering-driven attention mechanisms; and 
fusing the visual relationship sharing representations, the regularized visual relationship enhancing representations and a prior distribution over 

Claim 10. (Currently Amended) The system according to claim [[1]] 9, wherein the method further comprises: 
calculating empirical distribution of the visual relationships from training set samples of a visual relationship data set to obtain a visual relationship prior function.
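
One way to read the empirical-distribution step of claim 10 is as relative-frequency counting over annotated training triples. The sketch below illustrates that reading only. The triple layout (subject, predicate, object), the function name `build_prior`, the sample data, and the uniform fallback for unseen pairs are all assumptions not found in the record.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject label, predicate label, object label)


def build_prior(
    triples: Iterable[Triple], predicates: List[str]
) -> Callable[[str, str], List[float]]:
    """Return a function mapping (subject, object) labels to predicate frequencies."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1

    def prior(subj: str, obj: str) -> List[float]:
        seen = counts.get((subj, obj))
        if not seen:
            # Assumption: fall back to a uniform distribution for unseen pairs.
            return [1.0 / len(predicates)] * len(predicates)
        total = sum(seen.values())
        return [seen[p] / total for p in predicates]

    return prior


training = [
    ("person", "rides", "horse"),
    ("person", "rides", "horse"),
    ("person", "next to", "horse"),
    ("cup", "on", "table"),
]
labels = ["on", "rides", "next to"]
prior_fn = build_prior(training, labels)
person_horse = prior_fn("person", "horse")
```

Here `person_horse` holds the relative frequencies of each predicate among the three person-horse training samples, in the order given by `labels`.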

Claim 11. (Currently Amended) The system according to claim [[1]] 9, wherein the method further comprises: constructing an initialized visual relationship detection model, and training the model by the training data of the visual relationship data set.

Claim 12. (Currently Amended) The system according to claim [[1]] 9, wherein the step of obtaining the visual relationship sharing representations is specifically: 
obtaining a first product of a joint subject mapping matrix and the context representations of the visual object of the subject, obtaining a second product of a joint object mapping matrix and the context representations of the visual object of the object; 
subtracting the second product from the first product, and dot-multiplying the difference value and convolutional features of a visual relationship candidate region; 
wherein, the joint subject mapping matrix and the joint object mapping matrix are mapping matrices that map the visual objects context representations to a joint subspace; and 
the visual relationship candidate region is the minimum rectangle box that can fully cover the corresponding visual object candidate regions of the subject and object; 
the convolutional features are extracted from the visual relationship candidate region by a convolutional neural network.
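
The "minimum rectangle box" recited in claims 4 and 12 can be illustrated as the smallest axis-aligned box enclosing both candidate regions. The sketch below assumes boxes are given as (x1, y1, x2, y2) corner coordinates with x1 <= x2 and y1 <= y2. That convention and the function name are assumptions, not taken from the record.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed convention


def union_box(subject_box: Box, object_box: Box) -> Box:
    """Smallest axis-aligned rectangle that fully covers both input boxes."""
    sx1, sy1, sx2, sy2 = subject_box
    ox1, oy1, ox2, oy2 = object_box
    return (min(sx1, ox1), min(sy1, oy1), max(sx2, ox2), max(sy2, oy2))


region = union_box((10, 20, 50, 60), (30, 10, 80, 40))
print(region)  # (10, 10, 80, 60)
```

Convolutional features for the relationship candidate would then be extracted from this enclosing region rather than from either object box alone.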

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: 

“contextual message passing mechanism” and “clustering-driven attention mechanisms” in claims 1 and 9.

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Allowable Subject Matter
Claims 1-16 are allowed.

The following is an examiner’s statement of reasons for allowance:

Regarding claim 1, the prior art made of record fails to teach a visual relationship detection method based on adaptive clustering learning, comprising, executed by a processor, the following steps: 
detecting visual objects from an input image and recognizing the visual objects by a contextual message passing mechanism to obtain context representations of the visual objects; 
embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations; 
embedding the context representations of pair-wise visual objects into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representations; and then performing regularization to the preliminary visual relationship enhancing representations by clustering-driven attention mechanisms; and 
fusing the visual relationship sharing representations, the regularized visual relationship enhancing representations and a prior distribution over category labels of visual relationship predicates, to predict visual relationship predicates by synthetic relational reasoning.

Regarding claim 9, the prior art made of record fails to teach a system for a visual relationship detection method based on adaptive clustering learning, the system comprising: 
a processor configured for: 
detecting visual objects from an input image and recognizing the visual objects by a contextual message passing mechanism to obtain context representations of the visual objects; 
embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations; 
embedding the context representations of pair-wise visual objects into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representations; and then performing regularization to the preliminary visual relationship enhancing representations by clustering-driven attention mechanisms; and 
fusing the visual relationship sharing representations, the regularized visual relationship enhancing representations and a prior distribution over category labels of visual relationship predicates, to predict visual relationship predicates by synthetic relational reasoning.

Jung et al., “Visual Relationship Detection with Language prior and Softmax” (published in 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), pages 143-148, December 2018) generally teaches a visual relationship detection method (see Jung Abstract) comprising: 

detecting visual objects from an input image and recognizing the visual objects by a neural network (see Section V, D, second paragraph) to obtain context representations of the visual objects (see Table 1); 

embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations (see section II, B, first paragraph and section V, F, Figure 4);  

clustering the context representations (see section V, F, first paragraph);

fusing results to predict visual relationship predicates by synthetic relational reasoning (see section V, first paragraph).

However, Jung does not teach or suggest the features highlighted above. 

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Han et al., “Visual Relationship Detection Based on Local Feature and Context Feature” (published in 2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC), pages 420-424, August 2018) generally teaches a visual relationship detection method (see Han Abstract) comprising detecting visual objects from an input image and recognizing the visual objects by a neural network to obtain context representations of the visual objects (see section 3.2, first paragraph); and embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations (see section 3.2, final two paragraphs) for predicting predicates (see section 4.3, second paragraph). 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CASEY L KRETZER, whose telephone number is (571)272-5639. The examiner can normally be reached M-F 10:00 AM-7:00 PM PDT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DAVID C PAYNE, can be reached at (571)272-3024. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/CASEY L KRETZER/
Examiner, Art Unit 2637