DETAILED ACTION

Applicant's arguments filed 02/18/2022 have been fully considered, but they are either not persuasive or moot in view of the new grounds of rejection.

Applicant has argued as follows: Interpretation under 35 U.S.C. §112(f) or 35 U.S.C. §112 (pre-AIA), sixth paragraph. The Examiner indicates that some claim limitations invoke interpretation under 35 U.S.C. §112(f). By the present amendments, alleged generic placeholders have been replaced by recitations of sufficient structures to perform the claimed functions. In particular: 
I. the recitation of "segmenting unit" in the claim 5 is replaced with "partitioner"; 
II. the recitation of "extracting unit" in the claim 5 is replaced with "extractor"; 
III. the recitation of "first generation unit" in the claim 5 is replaced with "first generator"; 
IV. the recitation of "second generation unit" in the claim 5 is replaced with "second generator"; 
V. the recitation of "determining unit" in the claim 5 is replaced with "determiner"; and 
VI. the recitation of "displaying unit" in the claim 5 is replaced with "display".
Examiner’s Response: Claim 5 is still considered to invoke interpretation under 35 U.S.C. 112(f). The new terms that have been substituted for the previous units are still considered to be generic placeholders that are not modified by sufficient structure.

Applicant has argued as follows: As mentioned above, amended independent claims 1 and 5 require an image object detection method and device, which include, inter alia, "determining the degree of spatial association between each image region and remaining image regions in the at least one image region, and generating the spatial distribution relation graph based on the degree of spatial association" (emphasis added). 
In contrast, neither Liang nor Li discloses, teaches or suggests determination of a degree of spatial association and generation of the spatial distribution relation graph based on the degree of spatial association as required by present claims 1 and 5.
Examiner’s Response: A new reference, Mi, has been added to the rejection, as set forth in the rejection of claim 1. Mi discloses using the distance between two boxes in order to determine the edges of a spatial relation graph. The spatial distance between boxes can be considered to be a “degree of spatial association.” This fits well with Liang, since Liang already discloses determining spatial features that include the distance between the coordinates of two boxes as well as the coordinates of each box. Additionally, both references have similar Graph Attention Networks. 

CLAIM INTERPRETATION

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
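(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
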
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
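This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. 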
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1 and 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over Liang et al. (“Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection”) in view of Mi et al. (“Hierarchical Graph Attention Network for Visual Relationship Detection”).
Regarding claim 1, Liang discloses an image object detection method, including: performing region segmentation on a target image to obtain at least one image region; performing feature extraction on each image region in the at least one image region to obtain at least one feature map; 
(See Liang p. 4 left column section 3.2, “First, the RPN generates (hundreds) of subject and object proposals. Thus, for an image I, the ith human bounding-box bi h and the jth object bounding-box bhi are used to extract latent features from Faster-RCNNs last fully-connected layer.”)
generating a semantic relation graph (See Liang p. 5 right column section 3.3.2, “In the semantic graph, Word2vec latent representations of the class labels of detected objects are used to instantiate the graph’s nodes.”)
and a spatial distribution relation graph based on the at least one feature map and the at least one image region; (See Liang p. 5 left column section 3.3.1, “The visual graph instantiates a node vi from the latent features hv of each of detected objects. Then, edge eij is constructed from the spatial features sij from Sec. 3.2.”)
generating an image region relation graph based on the semantic relation graph and spatial distribution relation graph; (See Liang p. 5 right column section 3.4, “To jointly leverage the dynamic information of both the visual (Gv) and the semantic (Gs) GATs, it is necessary to fuse them as illustrated in the “Combined Graph” (Gc) of Fig. 2.”)
determining a target image region from the at least one image region based on the image region relation graph; (See Liang p. 5 right column section 3.5, “The last step is to infer the interaction label for a predicate as part of our original triplet <subject, predicate, object>. Note that a person can concurrently perform different actions with each of the available target objects.”)

Liang discloses the above limitations and, in Section 3.2, discloses that "Spatial feature will be used to (i) build the edges in the Visual graph," but fails to disclose the following limitations, which concern using spatial distance to obtain the edges.
However, Mi discloses, wherein generating the semantic relation graph and the spatial distribution relation graph based on the at least one feature map and the at least one image region, comprises: determining a degree of spatial association between each image region and remaining image regions in the at least one image region, and generating the spatial distribution relation graph based on the degree of spatial association. (See Mi p. 13890 Section 3.3.1, “Two factors are considered in establishing the graph: spatial correlation and semantic correlation. We use $dis(b_i, b_j)$ and $iou(b_i, b_j)$ to evaluate the spatial correlation of two object proposals. The spatial graph can be defined as: $e^{sp}_{ij} = \begin{cases} 1, & dis(b_i, b_j) < t_1 \text{ or } iou(b_i, b_j) > t_2 \\ 0, & \text{otherwise} \end{cases}$ (4) where $t_1$ and $t_2$ are two thresholds which we set as 0.5 in our experiments.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Liang’s spatial graph, which also has edges between boxes and uses spatial features that include the coordinates of the boxes, so that the edges of the spatial graph are determined using the distance between boxes as suggested by Mi. This can be done using known engineering techniques, with a reasonable expectation of success. The motivation for doing so is to accurately determine the structure of a spatial graph.
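
For illustration only, the thresholded edge rule quoted from Mi (equation (4)) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of either reference: the quoted passage does not define dis, so the center distance normalized by the image diagonal used here is an assumption, as are the box format (x1, y1, x2, y2), the exclusion of self-pairs, and all function names.

```python
from math import hypot

# A box is (x1, y1, x2, y2) in pixels; this format is an assumption.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dis(a: Box, b: Box, diag: float) -> float:
    """ASSUMPTION: distance between box centers, normalized by the image
    diagonal. The quoted passage does not define dis; this is a stand-in."""
    cax, cay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cbx, cby = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return hypot(cax - cbx, cay - cby) / diag


def spatial_graph(boxes: list[Box], width: float, height: float,
                  t1: float = 0.5, t2: float = 0.5) -> list[list[int]]:
    """Adjacency matrix with e_ij = 1 if dis < t1 or iou > t2, else 0.
    Self-pairs are skipped (an assumption), so each region is compared
    only with the remaining regions."""
    diag = hypot(width, height)
    n = len(boxes)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dis(boxes[i], boxes[j], diag) < t1 or iou(boxes[i], boxes[j]) > t2:
                adj[i][j] = 1
    return adj


if __name__ == "__main__":
    example = [(0, 0, 10, 10), (1, 1, 11, 11), (90, 90, 100, 100)]
    print(spatial_graph(example, width=100, height=100))
```

In this sketch, the binary value assigned to each pair of boxes plays the role of the "degree of spatial association," and the resulting adjacency matrix is the spatial relation graph.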

Regarding claim 5, Liang and Mi disclose an image object detection device, comprising: a partitioner, configured to perform region segmentation on a target image to obtain at least one image region; an extractor, configured to perform feature extraction on each image region in the at least one image region to obtain at least one feature map; a first generator, configured to generate a semantic relation graph and a spatial distribution relation graph based on the at least one feature map and the at least one image region; a second generator, configured to generate an image region relation graph based on the semantic relation graph and spatial distribution relation graph; a determiner, configured to determine a target image region from the at least one image region based on the image region relation graph; a display, configured to display the target image regions wherein the first generator is further configured to determine a degree of spatial association between each image region and remaining image regions in the at least one image region, and generate the spatial distribution relation graph based on the degree of spatial association. (See the rejection of claim 1, which is equally applicable to claim 5.)

Regarding claim 6, Liang and Mi disclose an electronic equipment, including: one or more processors; a storage device, on which one or more programs are stored; when one or more programs are executed by one or more processors, the one or more processors are made to implement the method stated in claim 1. (The image processing will inherently be implemented by a computer containing a processor and a storage device that contains a program.) 

Regarding claim 7, Liang and Mi disclose a computer-readable medium with a computer program stored thereon, wherein, when the program is executed by a processor, the method described in claim 1 is implemented. (The image processing will inherently be implemented by a computer containing a processor and a computer-readable medium that contains a program.) 

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Liang et al. (“Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection”) in view of Mi et al. (“Hierarchical Graph Attention Network for Visual Relationship Detection”), and further in view of Li et al. (“Visual-Semantic Graph Reasoning for Pedestrian Attribute Recognition”).
Regarding claim 2, Liang and Mi disclose the method according to claim 1, but fail to disclose the following limitations. 
However, Li discloses, wherein, generating a semantic relation graph and a spatial distribution relation graph based on the at least one feature map and the at least one image region, comprises: determining a degree of semantic association between each image region and remaining image regions in the at least one image region according to the following formula:

    [Formula image: media_image1.png]

 f represents a semantic association calculation function, 
and generating a semantic relation graph based on the degree of semantic association.  (See Li p. 8636 right column 2nd paragraph, “The first sub-graph measures visual similarity in image feature space. Assuming that the input of the spatial graph $X = (x_1, x_2, \ldots, x_M)^T$ consists of the visual features extracted from a convolutional neural network, where $M$ denotes the number of locations of the convolutional feature maps and $x_i$ corresponds to i-th image region. The pairwise similarity between every two part regions can be computed by the function, $F_s(x_i, x_j) = \phi_s(x_i)^T \phi'_s(x_j)$ (3)
$\phi_s(x) = w_s x$ and $\phi'_s(x) = w'_s x$ denote two different linear transformations of the visual features. The weight matrices $w_s \in \mathbb{R}^{d \times d}$ and $w'_s \in \mathbb{R}^{d \times d}$ can be learned through back propagation.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the pairwise similarity equation that Li uses to create a graph to the semantic graph of Liang and Mi, using known engineering techniques, with a reasonable expectation of success. The motivation for doing so, according to Li (see p. 8636, right column), is that “the learned spatial context is then embedded into semantic space to guide relationship learning between attributes on the directed semantic graph.”
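
For illustration only, the pairwise similarity quoted from Li (equation (3)), i.e., the inner product of two different linear transformations of the region features, can be sketched as follows. This is a minimal sketch, not Li's implementation: the function names are hypothetical, and the weight matrices are fixed example values, whereas Li states that they are learned through back propagation.

```python
Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Linear transformation w x of a d-dimensional feature vector."""
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in w]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_similarity(features: list[Vector], w_s: Matrix,
                        w_s_prime: Matrix) -> Matrix:
    """Return the M x M matrix whose (i, j) entry is
    (w_s x_i)^T (w_s_prime x_j), one entry per pair of image regions."""
    left = [matvec(w_s, x) for x in features]          # first transformation
    right = [matvec(w_s_prime, x) for x in features]   # second transformation
    return [[dot(l_i, r_j) for r_j in right] for l_i in left]


if __name__ == "__main__":
    # Two regions with 2-dimensional features; the weights are hypothetical.
    feats = [[1.0, 0.0], [0.0, 1.0]]
    w_a = [[1.0, 0.0], [0.0, 1.0]]
    w_b = [[2.0, 0.0], [0.0, 3.0]]
    print(pairwise_similarity(feats, w_a, w_b))
```

Each entry of the returned matrix can be read as a degree of association between two image regions, from which graph edges may then be derived.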

Regarding claim 5, Liang, Mi, and Li disclose an image object detection device, comprising: a segmenting unit, configured to perform region segmentation on a target image to obtain at least one image region; an extracting unit, configured to perform feature extraction on each image region in the at least one image region to obtain at least one feature map; a first generating unit, configured to generate a semantic relation graph and a spatial distribution relation graph based on the at least one feature map and the at least one image region; a second generating unit, configured to generate an image region relation graph based on the semantic relation graph and spatial distribution relation graph; a determining unit, configured to determine a target image region from the at least one image region based on the image region relation graph; and a displaying unit, configured to display the target image region.  (See the rejection of claim 1, which is equally applicable to claim 5.)


	Allowable Subject Matter
Claims 3 and 4 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
	Regarding claims 3 and 4, the prior art of record fails to disclose the limitations of these claims.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 


Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAVID PERLMAN/Primary Examiner, Art Unit 2662