DETAILED ACTION
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. This action is responsive to Applicant’s submission filed 05 May 2022 [hereinafter Response], in which:
Claims 1-10 and 12-22 have been amended.
Claims 1-22 are pending.
Claims 1-22 are rejected.
Claim Rejections - 35 U.S.C. § 112(b)
4.	The rejection of claim 21 under Section 112(b) is withdrawn in view of Applicant’s amendment to the claim.
Claim Rejections - 35 U.S.C. § 101
5.	35 U.S.C. § 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
6.	Claims 1-22 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more. 
Claim 1 recites:
A computer-implemented method, comprising:
obtaining, using a camera of a computing device, a two-dimensional (2D) image of an environment including a first object and a second object;
receiving, from a server, an identification of the first object . . . ;
obtaining, using one or more sensors of the computing device, additional data of the first object;
obtaining . . . additional data of the environment; 
determining, based on the additional data of the environment, a spatial relationship between the first object and the second object in the surrounding environment;
generating a knowledge graph including (i) the additional data of the first object associated with the identification of the first object and (ii) the additional data of the environment also associated with the identification of the first object, wherein the knowledge graph is organized in a hierarchical semantic manner to illustrate the spatial relationship between the first object and the second object;
obtaining a second two-dimensional image including at least part of the first object; and
identifying, using the knowledge graph, the first object in the second two-dimensional image.
Under Step 1, the instant claim recites a method, which falls within one of the four statutory categories of Section 101.
Step 2A Prong One of the eligibility analysis evaluates whether the claim recites a judicial exception. The claim recites the steps of:
determining . . . a spatial relationship between the first object and the second object in the environment;
generating a knowledge graph . . . ; and
identifying, using the knowledge graph, the first object in the second two-dimensional image.
The claim recites the steps of determining relationships and generating a knowledge graph for identifying objects in a two-dimensional image, which are acts of evaluating information that can practically be performed in the human mind. MPEP § 2106.04(a)(2). The claim is directed to an abstract idea of processing data to form a knowledge graph representing relations between objects of an image.
Step 2A Prong Two of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. The claim recites additional claim elements beyond the identified judicial exception that include
“a camera of a computing device,” which is a conventional component of a conventional computing device, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a server,” which is a generic computer component, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016));
“one or more sensors of the computing device,” which is a generic computer component to collect data, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a knowledge graph,” which is a generic method of gathering and organizing data to show connections between data (see Wu1 1:59-65).
The claim recites activities that include:
obtaining . . . a two-dimensional (2D) image of an environment including a first object and a second object;
receiving . . . an identification of the first object . . . ;
obtaining . . . additional data of the first object;
obtaining . . . additional data of the environment; 
obtaining a second two-dimensional image including at least part of the first object.
These generic steps of obtaining and receiving data, executed as instructions using generic computer components, do not change the character of the claim from an abstract idea into a practical application. See In re Board of Trustees of the Leland Stanford Junior University, 991 F.3d 1245, 2021 U.S.P.Q.2d 361 (Fed. Cir. 2021); see also MPEP § 2106.07(a).II.
Step 2B of the eligibility analysis evaluates whether the claim recites additional elements that amount to significantly more than the recited judicial exception (i.e., an inventive concept). The claim recites the additional elements of
“a camera of a computing device,” which is a conventional component of a conventional computing device, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a server,” which is a generic computer device, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016));
“one or more sensors of the computing device,” which is a generic computer component to collect data, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a knowledge graph,” which is a generic method of gathering and organizing data to show connections between the data (Wu2 1:59-65).
Receiving data, storing data, and processing data using conventional components and functions generic to the technology are well-understood, routine, and conventional. In re Board of Trustees of the Leland Stanford Junior University, 991 F.3d 1245, 2021 U.S.P.Q.2d 361 (Fed. Cir. 2021); Free Stream Media Corp. v. Alphonso Inc., 996 F.3d 1355, 2021 U.S.P.Q.2d 521 (Fed. Cir. 2021). See also MPEP § 2106.05(a).II. Thus, claim 1 is directed to non-eligible subject matter.
Claim 2 recites the “computer-implemented method of claim 1, wherein the one or more sensors of the computing device include the camera of the computing device, and wherein obtaining the additional data of the object comprises obtaining images of the first object from additional points of view.” The claim merely recites more details or specifics of the abstract idea by way of the obtaining of images in claim 1, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 3 recites the “computer-implemented method of claim 1, wherein the one or more sensors of the computing device include a depth camera of the computing device, and wherein obtaining the additional data of the object comprises obtaining depth images of the first object.” The claim merely recites more details or specifics of the abstract idea by way of the obtaining of images of an object in claim 1, and accordingly, is merely more specific to the abstract idea. Also, although the claim recites the additional claim element that “the one or more sensors . . . include a depth camera,” such detail does not add significantly more than the judicial exception because the “depth camera” is used only for its intended purpose of “obtaining depth images of the object.” Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 4 recites the “computer-implemented method of claim 1, wherein the one or more sensors of the computing device include a microphone of the computing device, and wherein obtaining the additional data of the first object comprises obtaining, using the microphone, audio from the environment.” The claim merely recites more details or specifics of the abstract idea by way of the obtaining of additional data of the first object in claim 1, and accordingly, is merely more specific to the abstract idea. Also, although the claim recites the additional claim element that “the one or more sensors . . . include a microphone,” such detail does not add significantly more than the judicial exception because the “microphone” is used only for its intended purpose of “obtaining, using the microphone, audio.” Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 5 recites the “computer-implemented method of claim 1, wherein generating the knowledge graph comprises storing the additional data of the first object with a label indicating the identification of the first object.” 
Under Step 2A Prong One of the eligibility evaluation, the claim recites the step of:
“storing the additional data of the first object with a label indicating the identification of the first object.” 
The step is directed to the abstract idea of storing information, which, but for its computer-implemented context, can practically be performed in the human mind.
Under Step 2A Prong Two of the eligibility evaluation, the claim recites in the context of a “computer-implemented method” the activity of
“storing the additional data of the first object with a label . . . .” 
Although this is an additional element recited in the claim beyond the judicial exception, the “storing” of the “additional data of the first object” is a generic computer function recited at a high level of generality. Accordingly, this generic computing function does not meaningfully limit the claim.
Under Step 2B of the eligibility evaluation, the additional element of “storing the additional data of the first object with a label” is used for storing the result of gathering and analyzing information, which is simply using conventional techniques to store the result. Storing and retrieving information in memory is a well-understood, routine, and conventional function. MPEP § 2106.05(d).II.iv. Thus, the claim is directed to a judicial exception as also discussed in detail above with respect to claim 1.
Claim 6 recites the “computer-implemented method of claim 1, wherein the computing device is a robotic device operable to move throughout an environment, and wherein obtaining, using the one or more sensors of the computing device, the additional data of the first object comprises: the robotic device moving around the object to collect data of the object using the one or more sensors on-board the robotic device.” 
Under Step 2A Prong Two of the eligibility evaluation, the claim recites the additional claim element of 
“the computing device is a robotic device using the one or more sensors on-board the robotic device”
This is an additional element recited in the claim beyond the judicial exception; however, the robotic device “moving around the first object to collect data of the object” is a generic function of the robotic device and its sensors, recited at a high level of generality, namely gathering data. Accordingly, this generic data-gathering function does not meaningfully limit the claim.
Under Step 2B of the eligibility evaluation, the additional element of “the robotic device moving around the first object to collect data of the object using the one or more sensors on-board the robotic device” is used for gathering information, which is simply using conventional techniques to collect data. Storing and retrieving information in memory is a well-understood, routine, and conventional function. MPEP § 2106.05(d).II.iv. Thus, the claim is directed to a judicial exception as is also discussed in detail above with respect to claim 1.
Claim 7 recites the “computer-implemented method of claim 1, wherein obtaining, using the one or more sensors of the computing device, the additional data of the first object comprises obtaining the additional data at different times of day.” The claim merely recites more details or specifics of the abstract idea by way of the obtaining of additional data of the first object in claim 1, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 8 recites the “computer-implemented method of claim 1, wherein obtaining the additional data of the first object comprises obtaining audio including speech, and the method further comprises: receiving, from another server, an output of speech recognition of the speech; and assigning an entity identification to the knowledge graph using the output of the speech recognition of the speech, wherein the entity identification indicates an identification of a node of the knowledge graph.” Under Step 2A Prong One of the eligibility evaluation, the claim recites the additional step of:
“assigning an entity identification to the knowledge graph” by 
“using the output of the speech recognition of the speech”
The step is directed to the abstract idea of processing information, albeit information received from another server, which can practically be performed in the human mind.
Under Step 2A Prong Two of the eligibility evaluation, the claim recites the additional claim element of 
receiving, from “another server,” an output of speech recognition of the speech.
This is an additional element recited in the claim beyond the judicial exception; however, receiving the output of speech recognition via “another server” is a generic function recited at a high level of generality, namely receiving and analyzing data. Accordingly, this generic data-gathering function does not meaningfully limit the claim.
Under Step 2B of the eligibility evaluation, the additional element of “another server” is used for data analysis, which is simply using conventional techniques. MPEP § 2106.05(d).II.iv. Performing data analysis with a generic computer is well-understood, routine, and conventional. MPEP § 2106.04(a)(2).III.C. Thus, the claim is directed to a judicial exception as is also discussed in detail above with respect to claim 1.
Claim 9 recites the “computer-implemented method of claim 1, wherein obtaining, using the camera of a computing device, the 2D image of the first object comprises obtaining the 2D image at a particular time, and the method further comprises: obtaining, from the computing device, a log of sensor data indicative of the environment of the first object during a time period prior to the particular time; and generating the knowledge graph to include the log of sensor data associated with the identification of the first object.” The claim merely recites more details or specifics of the abstract idea by way of the “obtaining additional data” of claim 1, directed to “a particular time,” and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 10 recites the “computer-implemented method of claim 1, wherein the additional data of the environment indicates a spatial layout between the at least one item represented by the additional data of the environment and the first object, and wherein the method further comprises: generating the knowledge graph to include information indicating the spatial layout . . . .” The claim merely recites more details or specifics of the abstract idea by way of the obtaining of additional data of the environment in claim 1, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 11 recites the “computer-implemented method of claim 1, further comprising: receiving outputs of the one or more sensors of the computing device; and accessing the knowledge graph to determine an identification of one or more objects represented by one or more of the outputs of the one or more sensors of the computing device.” The claim merely recites more details or specifics of the abstract idea by way of the obtaining and identifying steps of claim 1, and accordingly, is merely more specific to the abstract idea, in that the “obtaining, using the one or more sensors of the computing device, additional data of the first object” of claim 1 is performed by “receiving outputs” and the “identifying” of claim 1 is performed by “accessing the knowledge graph.” Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 12 recites the “computer-implemented method of claim 1, further comprising: determining a person associated with the first object; and generating the knowledge graph to include information indicating the person associated with the first object to label the first object as belonging to the person.” The claim merely recites more details or specifics of the abstract idea by way of the generating of the knowledge graph in claim 1, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 13 recites the “computer-implemented method of claim 1, further comprising: receiving, from another server, information indicating an activity related to a scene in the 2D image of the object; and generating the knowledge graph to include the information indicating the activity associated with the identification of the first object.” The claim merely recites more details or specifics of the abstract idea by way of the generating of the knowledge graph in claim 1, and accordingly, is merely more specific to the abstract idea, in that the knowledge graph further includes information, received from another server, indicating an activity associated with the identification of the first object. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 14 recites the “computer-implemented method of claim 1, further comprising: determining whether the first object is stationary or movable; and generating the knowledge graph to include information indicating whether the first object is stationary or movable.” The claim merely recites more details or specifics of the abstract idea by way of determining characteristics of the first object identified in claim 1, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 21 recites the “computer-implemented method of claim 1, further comprising: determining a contextual relationship between the first object and the second object, the contextual relationship being a personal association, an activity, or a chronology of events, wherein the knowledge graph is generated based on the contextual relationship between the first object and the second object.” The claim merely recites more details or specifics of the abstract idea by way of the relationship between the first object and the second object recited in claim 1, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 22 recites the “computer-implemented method of claim 1, wherein identifying the first object in the second two-dimensional image comprises applying an object classifier that was trained using the knowledge graph to the second two-dimensional image.” The claim merely recites more details or specifics of the abstract idea by way of identifying the first object of claim 1 with an object classifier, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 1.
Claim 15 recites:
A computing device comprising:
a camera;
one or more sensors;
at least one processor;
memory; and
program instructions, stored in the memory, that upon execution by the at least one processor cause the computing device to perform operations comprising:
obtaining, using the camera, a two-dimensional (2D) image of an environment comprising a first object and a second object;
receiving, from a server, an identification of the first object based on the 2D image of the first object;
obtaining, using the one or more sensors, additional data of the first object;
obtaining . . . additional data of the environment; 
determining . . . a spatial relationship between the first object and the second object in the environment; 
generating a knowledge graph including (i) the additional data of the first object associated with the identification of the first object and (ii) the additional data of the environment also associated with the identification of the object, wherein the knowledge graph is organized in a hierarchical semantic manner to illustrate the spatial relationship between the first object and the second object;
obtaining a second two-dimensional image including at least part of the first object; and
identifying, using the knowledge graph, the first object in the second two-dimensional image.
Under Step 1, the instant claim recites a computing device, which falls within one of the four statutory categories of Section 101.
Step 2A Prong One of the eligibility analysis evaluates whether the claim recites a judicial exception. The claim recites the steps of:
determining . . . a spatial relationship between the first object and the second object in the environment;
generating a knowledge graph . . . ; and
identifying, using the knowledge graph, the first object in the second two-dimensional image.
The steps of determining relationships and generating a knowledge graph for identifying objects in a two-dimensional image are acts of evaluating information that can practically be performed in the human mind. MPEP § 2106.04(a)(2). The claim is directed to an abstract idea of processing data to form a knowledge graph representing relations between objects of an image.
Step 2A Prong Two of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. The claim recites additional claim elements beyond the identified judicial exception that include
“a camera,” which is a conventional component of a conventional computing device, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a server,” which is a generic computer component, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016));
“one or more sensors,” which is a generic computer component to collect data, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“at least one processor,” which is a generic computer component, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016));
“memory,” which is a generic computer component, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016)); and
“a knowledge graph,” which is a generic method of gathering and organizing data to show connections between data (see Wu3 1:59-65).
The claim recites activities that include:
obtaining . . . a two-dimensional (2D) image of an environment comprising a first object and a second object;
receiving . . . an identification of the first object . . . ;
obtaining . . . additional data of the first object;
obtaining . . . additional data of the environment;
obtaining a second two-dimensional image including at least part of the first object.
These generic steps of obtaining and receiving data, executed as instructions using generic computer components, do not change the character of the claim from an abstract idea into a practical application. See In re Board of Trustees of the Leland Stanford Junior University, 991 F.3d 1245, 2021 U.S.P.Q.2d 361 (Fed. Cir. 2021); see also MPEP § 2106.07(a).II.
Step 2B of the eligibility analysis evaluates whether the claim recites additional elements that amount to significantly more than the recited judicial exception (i.e., an inventive concept). The claim recites the additional elements of
“a camera,” which is a conventional component of a conventional computing device, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a server,” which is a generic computer device, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016));
“one or more sensors,” which is a generic computer component to collect data, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“at least one processor,” which is a generic computer component, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016));
“memory,” which is a generic computer component, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016)); and
“a knowledge graph,” which is a generic method of gathering and organizing data to show connections between the data (Wu4 1:59-65).
Receiving data, storing data, and processing data using conventional components and functions generic to the technology are well-understood, routine, and conventional. In re Board of Trustees of the Leland Stanford Junior University, 991 F.3d 1245, 2021 U.S.P.Q.2d 361 (Fed. Cir. 2021); Free Stream Media Corp. v. Alphonso Inc., 996 F.3d 1355, 2021 U.S.P.Q.2d 521 (Fed. Cir. 2021). See also MPEP § 2106.05(a).II. Thus, claim 15 is directed to non-eligible subject matter.
Claim 16 recites the “computing device of claim 15, wherein the one or more sensors of the computing device include a microphone of the computing device, and wherein obtaining the additional data of the first object comprises: obtaining . . . images of the object from additional points of view; obtaining, using the microphone, audio from the environment of the first object; and obtaining . . . the additional data of the first object comprises obtaining the additional data at different times of day.” The claim merely recites more details or specifics of the abstract idea by way of the obtaining of additional data of the first object in claim 15, and accordingly, is merely more specific to the abstract idea. Also, although the claim recites the additional claim element that “the one or more sensors . . . include a microphone,” such detail does not add significantly more than the judicial exception because the “microphone” is used only for its intended purpose of “obtaining, using the microphone, audio.” Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 15.
Claim 17 recites the “computing device of claim 15, wherein the program instructions further comprise instructions, stored in the memory, that upon execution by the at least one processor cause the computing device to perform operations comprising: obtaining . . . the 2D image of the first object at a particular time; obtaining . . . a log of sensor data indicative of the environment during a time period prior to the particular time; and generating the knowledge graph to include the log of sensor data . . . .” The claim merely recites more details or specifics of the abstract idea by way of the obtaining of the 2D image and the generating of the knowledge graph in claim 15, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to significantly more than the recited judicial exception (i.e., an inventive concept). Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 15.
Claim 18 recites:
A non-transitory computer-readable medium having stored therein instructions, that when executed by a computing device, cause the computing device to perform functions comprising:
obtaining, using a camera of the computing device, a two-dimensional (2D) image of an environment comprising a first object and a second object;
receiving, from a server, an identification of the first object based on the 2D image of the first object;
obtaining, using one or more sensors of the computing device, additional data of the first object;
obtaining . . . additional data of a surrounding environment of the first object; 
determining, based on the additional data of the environment, a spatial relationship between the first object and the second object in the environment;
generating a knowledge graph including (i) the additional data of the object associated with the identification of the first object and (ii) the additional data of the environment also associated with the identification of the first object, wherein the knowledge graph is organized in a hierarchical semantic manner to illustrate the contextual relationship between the first object and the second object;
obtaining a second two-dimensional image including at least part of the first object; and
identifying, using the knowledge graph, the first object in the second two-dimensional image.
Under Step 1, the instant claim recites a non-transitory computer-readable medium, which falls within one of the four statutory categories of Section 101.
Step 2A Prong One of the eligibility analysis evaluates whether the claim recites a judicial exception. The claim recites the steps of:
determining . . . a spatial relationship between the first object and the second object in the environment;
generating a knowledge graph . . . ; and
identifying, using the knowledge graph, the first object in the second two-dimensional image.
The claim recites the steps of determining relationships and generating a knowledge graph for identifying objects in a two-dimensional image, which are acts of evaluating information that can practically be performed in the human mind. MPEP § 2106.04(a)(2). The claim is directed to an abstract idea of processing data to form a knowledge graph representing relations between objects of an image.
Step 2A Prong Two of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. The claim recites additional claim elements beyond the identified judicial exception that include
“a camera of a computing device,” which is a conventional component of a conventional computing device, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a server,” which is a generic computer component, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016));
“one or more sensors of the computing device,” which is a generic computer component to collect data, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a knowledge graph,” which is a generic method of gathering and organizing data to show connections between data (see Wu 1:59-65).
The claim recites activities that include:
obtaining . . . a two-dimensional (2D) image of an object;
receiving . . . an identification of the object . . . ;
obtaining . . . additional data of the object;
obtaining . . . additional data of a surrounding environment of the object.
These generic steps of obtaining data, generating a knowledge graph to illustrate contextual relationships between objects, and identifying an object using the knowledge graph, executed as instructions on generic computer components, do not change the character of the claim from an abstract idea into a practical application. See In re Board of Trustees of the Leland Stanford Junior University, 991 F.3d 1245, 2021 U.S.P.Q.2d 361 (Fed. Cir. 2021); see also MPEP § 2106.07(a).II.
Step 2B of the eligibility analysis evaluates whether the claim recites additional elements that amount to an inventive concept (also known as “significantly more”) than the recited judicial exception. The claim recites the additional elements of
“a camera of a computing device,” which is a conventional component of a conventional computing device, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a server,” which is a generic computer device, (TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 614-15, 118 USPQ2d 1744, 1749-50 (Fed. Cir. 2016));
“one or more sensors of the computing device,” which is a generic computer component to collect data, (Yu v. Apple Inc., 2021 U.S.P.Q.2d 632 (Fed. Cir. 2021));
“a knowledge graph,” which is a generic method of gathering and organizing data to show connections between the data (Wu 1:59-65).
Receiving data, storing data, and processing data using conventional components and functions generic to the technology are well-understood, routine, and conventional activities. In re Board of Trustees of the Leland Stanford Junior University, 991 F.3d 1245, 2021 U.S.P.Q.2d 361 (Fed. Cir. 2021); Free Stream Media Corp. v. Alphonso Inc., 996 F.3d 1355, 2021 U.S.P.Q.2d 521 (Fed. Cir. 2021). See also MPEP § 2106.05(a).II. Thus, claim 18 is directed to non-eligible subject matter.
Claim 19 recites the “non-transitory computer-readable medium of claim 18, wherein the additional data of the environment indicates a spatial layout between the at least one item represented by the additional data of the environment and the first object, and wherein the functions further comprise: generating the knowledge graph to include information indicating the spatial layout between the at least one item represented by the additional data of the environment and the first object.” The claim merely recites more details or specifics of the abstract idea of obtaining additional data of claim 18, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to an inventive concept (also known as “significantly more”) than the recited judicial exception. Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 18.
Claim 20 recites the “non-transitory computer-readable medium of claim 18, wherein the functions further comprise: determining a person associated with the first object; and generating the knowledge graph to include information indicating the person associated with the first object to label the first object as belonging to the person.” The claim merely recites more details or specifics of the abstract idea of obtaining additional data of claim 18, and accordingly, is merely more specific to the abstract idea. Also, there are no additional elements set out that amount to an inventive concept (also known as “significantly more”) than the recited judicial exception. Thus, the claim is directed to a judicial exception as discussed in detail above with respect to claim 18.
Claim Rejections - 35 U.S.C. § 103
7.	The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
8.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. 	Determining the scope and contents of the prior art.
2. 	Ascertaining the differences between the prior art and the claims at issue.
3. 	Resolving the level of ordinary skill in the pertinent art.
4. 	Considering objective evidence present in the application indicating obviousness or nonobviousness.
9.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
10.	Claims 1, 5, 7, 9-15, and 17-22 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 10467290 to Wu et al. [hereinafter Wu] in view of Dillon et al., “Cite-Scene Understanding and Object Recognition,” Advances in Computer Vision & Machine Intelligence (1997) [hereinafter Dillon] and Hueting et al., “MCGraph: Multi-criterion representation for scene understanding,” ACM (2014) [hereinafter Hueting].
Regarding claim 1, Wu teaches [a] computer-implemented method (Wu 18:60-63 teaches computer 700 might also include computer-readable storage media for performing any of the other computer-implemented operations described herein), comprising:
obtaining, using a camera of a computing device, a two-dimensional (2D) image of an environment including a first object and a second object (Wu 5:23-25 teaches images 112 may be obtained from video of the user (e.g., taken by a smartphone or camera of the user) (that is, obtaining, using a camera of a computing device, a two-dimensional (2D) image of an environment including a first object and a second object));
receiving, from a server, an identification of the first object based on the 2D image of the first object (Wu 5:6-7 teaches the image analysis service 120 may be performed on the computing device 102 and/or some other computing device (that is, from a server); Wu 6:16-19 teaches the image analysis service 120 generates tags based on results of scene understanding and/or facial recognition that are performed on each image 112 (that is, receiving, from a server, an identification of the first object based on the 2D image of the first object));
obtaining, using one or more sensors of the computing device, additional data of the first object (Wu 5:25-39 teaches [a] digital image, such as one of the images 112, often includes a set of metadata (meaning data about the image). For example, a digital image 112 may include . . . focal length (e.g., 4mm); 35 mm focal length (e.g., 33); dimensions of the image; horizontal resolution; vertical resolution; bit depth (e.g., 24); color representation (e.g., sRGB); camera model (e.g., iPhone 6); F-stop; . . . GPS (Global Positioning System) latitude . . . longitude . . . and altitude (that is, obtaining, using one or more sensors of the computing device, additional data of the first object));
obtaining, using the one or more sensors of the computing device, additional data of the environment (Wu 6:1-15 teaches a piece of software program code . . . can be used to read the metadata and tags from the images (that is, “tags” are additional data of a surrounding environment of the object). In some configurations, the image analysis service 120 normalizes the tags of the retrieved images. For example, both “dusk” and “twilight” tags are changed to “sunset.” The image analysis service 120 may also generate additional tags for each image. . . . In some examples, the image analysis service 120 sends the GPS coordinates within the GeoTag to a map service server requesting for a location corresponding to the GPS coordinates (that is, obtaining, using the one or more sensors of the computing device, additional data of the environment)); 
determining, based on the additional data of the environment, a . . . relationship (Wu, 5:51-56, teaches a “contextual relationship” in that, [f]or example, a “family” tag (that is, a “tag” is determining, based on the additional data of the surrounding environment) indicates that the image is a family image, a “wedding” tag indicates that the image is a wedding image, a “subset” tag indicates that the image is a sunset scene image (that is, “sunset scene image” shows objects in a relationship), a “Santa Monica beach” tag indicates that the image is a taken at Santa Monica beach, etc.) between the first object and the second object in the environment (Wu, 5:59-63, teaches user tags may be used to identify recognized individuals in the images 112, other tags may identify particular animals in the images 112, other tags may identify objects within the images (e.g., cars, buildings, tables, chairs)); 
generating a knowledge graph (Wu 8:12-13 teaches [a]fter identifying the related words, the [Knowledge Graph] manager 130 generates the dense [Knowledge Graph] 124 (that is, generating a knowledge graph); Wu Fig. 1 teaches (Examiner notations added):

    [Image: Wu Fig. 1, with Examiner notations]

Wu 4:18-20 teaches depicting an illustrative operating environment in which a knowledge graph is generated and used with tagged images of a user) including (i) the additional data of the first object associated with the identification of the first object and (ii) the additional data of the environment also associated with the identification of the first object (Wu Fig. 2 illustrates generating and refining a knowledge graph; Wu 7:41-43 teaches [g]enerally, the dense KG 124 is used in conjunction with the tags and metadata 118A for the images 112 of a user when identifying the objects and concepts within an image (that is, the tags and metadata 118A are both additional data of the first object and additional data of the environment)), wherein the knowledge graph is organized in a hierarchical semantic manner to illustrate the . . . relationship between the first object and the second object (Wu Fig. 3 teaches (Examiner notations added):

    [Image: Wu Fig. 3, with Examiner notations]

Wu 1:39-40 teaches that FIG. 3 is a block diagram shown an illustrative dense knowledge graph generated from an exemplary image 302; Wu 12:36-44 teaches [Knowledge Graph] 304A shows an outdoors category 306A that includes tags identified within the image and included within a generated dense knowledge graph 124 for the outdoors category 304A. In the current example, the outdoors category 306A includes an identification of “bay, soil, nature, beach, scenery, sand, ocean, coast, water, and sea.” The plant category 306B graph shows a classification of “tree and palm tree”. The field category 306C graph includes “land and island”; Wu 3:2-3 teaches [t]he created [Knowledge Graph] can have a large number of hierarchical levels (that is, organized in a hierarchical semantic manner to illustrate the . . . relationship between the first object and the second object)) 
* * *
Though Wu teaches an image analysis service that can access and search knowledge graph data to identify different categories depicted by objects within an image, including positional data, Wu does not explicitly teach -
* * *
obtaining a second two-dimensional image including at least part of the first object; and
identifying, using the knowledge graph, the first object in the second two-dimensional image.
But Dillon teaches -
* * *
obtaining a second two-dimensional image (Dillon, Table 4.6, teaches results of images (that is, a plurality of “images,” which is obtaining a second two-dimensional image) of four scenarios:

    [Image: Dillon Table 4.6]

Dillon at p. 173, “4.10 System Performance and Results,” first paragraph, teaches [t]he knowledge bases for the scenarios were constructed from a collection of 85 images (that is, the “85 images” include a second two-dimensional image)) including at least part of the first object (Dillon, at p. 124, “4.2 World Knowledge,” first paragraph, teaches [o]ne of the most important data representations in scene understanding and object recognition systems is the representation of world knowledge. Cite represents world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents (that is, “a part-of” is including at least part of the first object)); and
identifying, using the knowledge graph, the first object in the second two-dimensional image (Dillon, Fig. 4.2, which teaches:

    [Image: Dillon Fig. 4.2]

Dillon at p. 124, “4.2 World Knowledge,” first paragraph, teaches that [w]ithin each node is stored information about the optimal segmentation, feature extraction, and matching algorithms that are used to recognize this object in an image).
Wu and Dillon are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify Wu pertaining to a knowledge graph for object identification with the semi-restricted graph of Dillon.
The motivation for doing so is to provide a hierarchical knowledge base that yields a rich scene description that can include contextual, taxonomic and deep decomposition information. (Dillon at p. 122, “4.1.1. Proposed Theory,” fourth partial paragraph).
Though Wu and Dillon teach the feature of object relationships represented by a knowledge graph, the combination of Wu and Dillon does not explicitly teach that the “relationship” illustrated by the knowledge graph is a “spatial relationship.”
But Hueting teaches a “knowledge graph is organized in a hierarchical semantic manner to illustrate the spatial relationship between the first object and the second object” (Hueting, left column of p. 4, “3.3 Knowledge Graph,” second paragraph, teaches that the graph stores knowledge units. They can be simple labels, but also more complex concepts such as primitive proxies, or spatial relations; Hueting Fig. 4 teaches:

    [Image: Hueting Fig. 4]

Hueting, Fig. 4 caption, teaches relationships can exist in the abstraction graph as well as in the knowledge graph).
Wu, Dillon, and Hueting are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Hueting teaches a unified multi-criterion data representation for understanding and processing of large-scale 3D scenes. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify the combination of Wu and Dillon pertaining to a knowledge graph for object identification with the multi-criterion data representation of Hueting.
The motivation for doing so is to take scene understanding to the next level of performance by taking into account a unified representation of knowledge simultaneously. (Hueting, Abstract).
Regarding claim 5, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Wu teaches wherein generating the knowledge graph comprises storing the additional data of the object with a label indicating the identification of the first object (Wu 5:48-51 teaches [t]he images 112 can also include one or more tags (that is, a label) embedded in the image, or possibly stored separately from the image, as metadata. The tags describe and indicate the characteristics of the image; Wu 7:41-43 teaches [g]enerally, the dense [Knowledge Graph] 124 is used in conjunction with the tags and metadata 118A for the images 112 of a user when identifying the objects and concepts within an image (that is, generating the knowledge graph comprises storing the additional data of the object with a label indicating the identification of the object)).
Regarding claim 7, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Wu teaches wherein obtaining, using the one or more sensors of the computing device, the additional data of the first object comprises obtaining the additional data at different times of day (Wu 6:24-33 teaches [i]n some examples, the image creation time determined from the metadata associated with the image 112 may be used to assist scene understanding. For example, when the scene type is determined to be “beach” and the creation time is 6:00 PM for an image, both beach and sunset beach may be tags for the scene types of the image 112. As an additional example, a dusk scene image and a sunset scene image of a same location or structure may appear to be similar. In such a case, the image creation time helps to determine the scene type, i.e., a dusk scene or a sunset scene (that is, the additional data of the first object comprises obtaining the additional data at different times of day)).
Regarding claim 9, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Wu teaches wherein obtaining, using the camera of a computing device, the 2D image of the first object comprises obtaining the 2D image at a particular time, and the method further comprises:
obtaining, from the computing device, a log of sensor data indicative of the environment during a time period prior to the particular time (Wu 2:33-38 teaches the parser may modify (e.g., remove, add, edit) the related words received from the data source such that the words include the desired classes for objects identified within images. In some configurations, the parser modifies the tags to match existing tags that are used to identify objects within digital images (that is, obtaining, from the computing device, a log of sensor data); Wu 6:61-68 teaches [t]he creation time might also be used to identify a special event for the user. For example, the event might be a birthday, a wedding, a graduation, or the like. In some configurations, the image analysis service 120 may access calendar data for the user (or other users recognized within the image 112) to determine whether that day is associated with an event identified by the user (that is, indicative of the environment during a time period prior to the particular time); see also Wu 6:34-38, which teaches [t]he date of the creation time and geolocation of the image may also be considered in determining the scene type. For example, the sun disappears out of sight from the sky at different times in different seasons of the year. Moreover, sunset times are different for different locations (that is, a time period prior to the particular time)); and
generating the knowledge graph to include the log of sensor data associated with the identification of the first object (Wu 6:24-33 teaches [i]n some examples, the image creation time determined from the metadata associated with the image 112 may be used to assist scene understanding. For example, when the scene type is determined to be “beach” and the creation time is 6:00 PM for an image (that is, a particular time), both beach and sunset beach may be tags for the scene types of the image 112. As an additional example, a dusk scene image and a sunset scene image of a same location or structure may appear to be similar. In such a case, the image creation time helps to determine the scene type, i.e., a dusk scene or a sunset scene).
Regarding claim 10, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Wu teaches wherein the additional data of the environment indicates a spatial layout between the at least one item represented by the additional data of the surrounding environment and the object, and wherein the method further comprises:
generating the knowledge graph to include information indicating the spatial layout between the at least one item represented by the additional data of the environment and the first object (Wu 12:45-55 teaches [t]he display element 308 shows a portion of an exemplary listing of tags and their association with the outdoors category 306A. In the current example, the display element 308 shows that the sea identification has a rating or confidence level (that is, information indicating) of 0.866 (out of 1), whereas the sand identification has a 0.4086 rating, the beach identification has a rating of 0.405 and the water identification has a rating of 0.3747 (that is, information indicating the spatial layout). As discussed above, a user, such as a tester or user of the image software product 106 may make modifications to the tags and/or the graphs 304A-304C that may then be used by the KG manager 130 to update the affected KGs 124 (that is, the rating or confidence level is information indicating the spatial layout between the at least one item represented by the additional data of the environment and the first object)).
Regarding claim 11, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Wu teaches -
receiving outputs of the one or more sensors of the computing device (Wu 4:33-35 teaches an application, such as an image software product 106 executing on the computing device 102 communicates with the service provider network 104; Wu 5:26-27 teaches a digital . . . often includes a set of metadata (that is, receiving outputs of the one or more sensors of the computing device)); and
accessing the knowledge graph to determine an identification of one or more additional objects represented by one or more of the outputs of the one or more sensors of the computing device (Wu 8:47-51 teaches [a]fter creation of the KG 124 by the knowledge graph manager 130, or some other component or device, the knowledge graph manager 130 may generate a GUI that includes user interface (“UI”) elements to represent the identified tags and a generated dense KG 124 to view (that is, accessing the knowledge graph to determine an identification of one or more additional objects represented by one or more of the outputs of the one or more sensors of the computing device)).
Regarding claim 12, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Wu teaches -
determining a person associated with the object (Wu 5:59-63 teaches user tags may be used to identify recognized individuals in the images 112 (that is, determining a person associated with the object), other tags may identify particular animals in the images 112, other tags may identify objects within the images (e.g., cars, buildings, tables, chairs)); and
generating the knowledge graph to include information indicating the person associated with the first object to label the first object as belonging to the person (Wu 7:41-43 teaches the dense KG 124 is used in conjunction with the tags and metadata 118A for the images 112 of a user (that is, indicating . . . to label the object as belonging to the person) when identifying the objects and concepts within an image (that is, generating the knowledge graph to include information indicating the person associated with the first object to label the first object as belonging to the person)).
Regarding claim 13, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Wu teaches further comprising:
receiving, from another server (Wu 4:53-65 teaches the service provider network 104 may include a collection of rapidly provisioned and, potentially, released computing resources hosted in connection with the electronic marketplace . . . . computing resources may correspond to both virtual machine instances and physical computing devices (that is, from another server)), information indicating an activity related to a scene in the 2D image of the object (Wu 6:56-63 teaches the image analysis service 120 may also generate an event tag for special days and/or events. For example, the event might be a birthday, a wedding, a graduation (that is, indicating an activity related to a scene), or the like (that is, receiving . . . information indicating an activity related to a scene in the 2D image of the object)); and
generating the knowledge graph to include the information indicating the activity associated with the identification of the first object (Wu 7:41-43 teaches the dense KG 124 is used in conjunction with the tags and metadata 118A for the images 112 of a user when identifying the objects and concepts within an image (that is, generating the knowledge graph to include information indicating the activity associated with the identification of the first object)).
Regarding claim 14, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Wu teaches -
determining whether the first object is stationary or movable (Wu 5:48-63 teaches images 112 can also include one or more tags embedded in the image, or possibly stored separately from the image, as metadata. . . . [U]ser tags may be used to identify recognized individuals in the images 112, other tags may identify particular animals in the images 112, other tags may identify objects within the images (e.g., cars, buildings, tables, chairs) (that is, determining whether the first object is stationary or movable); Examiner points out that objects in static images appear as stationary, while the tags of Wu may indicate additional attributes of the object (such as movable); accordingly, the BRI of the claim as presented reads on the teachings of Wu); and
generating the knowledge graph to include information indicating whether the first object is stationary or movable (Wu 7:41-43 teaches the dense KG 124 is used in conjunction with the tags and metadata 118A for the images 112 of a user when identifying the objects and concepts within an image (that is, generating the knowledge graph to include information indicating whether the first object is stationary or movable)).
Regarding claim 15, Wu teaches [a] computing device comprising:
a camera (Wu 5:23-25 teaches images 112 may be obtained from video of the user (e.g., taken by a smartphone or camera of the user));
one or more sensors (Wu 5:25-39 teaches [a] digital image, such as one of the images 112, often includes a set of metadata (meaning data about the image). For example, a digital image 112 may include . . . focal length (e.g., 4mm); 35 mm focal length (e.g., 33); dimensions of the image; horizontal resolution; vertical resolution; bit depth (e.g., 24); color representation (e.g., sRGB); camera model (e.g., iPhone 6); F-stop; . . . GPS (Global Positioning System) latitude . . . longitude . . . and altitude (that is, one or more sensors));
at least one processor (Wu 16:65-67 teaches [t]he CPUs 704 may be standard programmable processors that perform arithmetic and logical operations necessary for the operation of the computer 700 (that is, at least one processor));
memory (Wu 17:17-18 teaches [t]he chipset 706 may provide an interface to a RAM 708, used as the main memory in the computer 700 (that is, memory)); and
program instructions, stored in the memory, that upon execution by the at least one processor cause the computing device to perform operations (Wu 18:50-54 teaches . . . encoded with computer-executable instructions that, when loaded into the computer 700, transform the computer into a special-purpose computer capable of implementing the examples described herein (that is, program instructions, stored in the memory, that upon execution by the at least one processor cause the computing device to perform operations)) comprising:
obtaining, using the camera, a two-dimensional (2D) image of an environment comprising a first object and a second object (Wu 5:23-25 teaches images 112 may be obtained from video of the user (e.g., taken by a smartphone or camera of the user) (that is, obtaining, using a camera of a computing device, a two-dimensional (2D) image of an environment comprising a first object and a second object));
receiving, from a server, an identification of the first object based on the 2D image of the first object (Wu 5:6-7 teaches the image analysis service 120 may be performed on the computing device 102 and/or some other computing device (that is, from a server); Wu 6:16-19 teaches the image analysis service 120 generates tags based on results of scene understanding and/or facial recognition that are performed on each image 112 (that is, receiving, from a server, an identification of the first object based on the 2D image of the first object));
obtaining, using the one or more sensors, additional data of the first object (Wu 5:25-39 teaches [a] digital image, such as one of the images 112, often includes a set of metadata (meaning data about the image). For example, a digital image 112 may include . . . focal length (e.g., 4mm); 35 mm focal length (e.g., 33); dimensions of the image; horizontal resolution; vertical resolution; bit depth (e.g., 24); color representation (e.g., sRGB); camera model (e.g., iPhone 6); F-stop; . . . GPS (Global Positioning System) latitude . . . longitude . . . and altitude (that is, obtaining, using one or more sensors of the computing device, additional data of the first object));
obtaining, using the one or more sensors, additional data of an environment (Wu 6:1-15 teaches a piece of software program code . . . can be used to read the metadata and tags from the images (that is, “tags” are additional data of a surrounding environment of the object). In some configurations, the image analysis service 120 normalizes the tags of the retrieved images. For example, both “dusk” and “twilight” tags are changed to “sunset.” The image analysis service 120 may also generate additional tags for each image. . . . In some examples, the image analysis service 120 sends the GPS coordinates within the GeoTag to a map service server requesting for a location corresponding to the GPS coordinates (that is, obtaining, using the one or more sensors of the computing device, additional data of the environment)); 
determining, based on the additional data of the environment, a . . . relationship (Wu, 5:51-56, teaches a “relationship” in that, [f]or example, a “family” tag (that is, a “tag” is determining, based on the additional data of the environment) indicates that the image is a family image, a “wedding” tag indicates that the image is a wedding image, a “subset” tag indicates that the image is a sunset scene image (that is, “sunset scene image” is a relationship), a “Santa Monica beach” tag indicates that the image is a taken at Santa Monica beach, etc.) between the first object and the second object in the environment (Wu, 5:59-63, teaches user tags may be used to identify recognized individuals in the images 112, other tags may identify particular animals in the images 112, other tags may identify objects within the images (e.g., cars, buildings, tables, chairs (that is, between the first object and the second object in the environment)); 
generating a knowledge graph (Wu 8:12-13 teaches [a]fter identifying the related words, the [Knowledge Graph] manager 130 generates the dense [Knowledge Graph] 124 (that is, generating a knowledge graph); Wu Fig. 1 teaches (Examiner notations added):

[Image: Wu Fig. 1, with Examiner notations]

Wu 4:18-20 teaches depicting an illustrative operating environment in which a knowledge graph is generated and used with tagged images of a user) including (i) the additional data of the first object associated with the identification of the first object and (ii) the additional data of the environment also associated with the identification of the first object (Wu Fig. 2 teaches generating and refining a knowledge graph; Wu 7:41-43 teaches [g]enerally, the dense KG 124 is used in conjunction with the tags and metadata 118A for the images 112 of a user when identifying the objects (that is, the tags and metadata 118A are additional data of the first object) and concepts (that is, the tags and metadata 118A are additional data of the environment also associated with the identification of the first object) within an image), wherein the knowledge graph is organized in a hierarchical semantic manner to illustrate the . . . relationship between the first object and the second object (Wu Fig. 3 teaches (Examiner notations added):

[Image: Wu Fig. 3, with Examiner notations]

Wu 1:39-40 teaches that FIG. 3 is a block diagram showing an illustrative dense knowledge graph generated from an exemplary image 302; Wu 12:36-44 teaches [Knowledge Graph] 304A shows an outdoors category 306A that includes tags identified within the image and included within a generated dense knowledge graph 124 for the outdoors category 304A. In the current example, the outdoors category 306A includes an identification of “bay, soil, nature, beach, scenery, sand, ocean, coast, water, and sea.” The plant category 306B graph shows a classification of “tree and palm tree”. The field category 306C graph includes “land and island”; Wu 3:2-3 teaches [t]he created [Knowledge Graph] can have a large number of hierarchical levels (that is, organized in a hierarchical semantic manner to illustrate relationships between the first object and the second object));
* * *
Though Wu teaches that an image analysis service can access and search knowledge graph data to identify different categories depicted by objects within an image, Wu does not explicitly teach -
* * *
obtaining a second two-dimensional image including at least part of the first object; and
identifying, using the knowledge graph, the first object in the second two-dimensional image. 
But Dillon teaches -
* * *
obtaining a second two-dimensional image (Dillon, Table 4.6, teaches results of images (that is, a plurality of “images,” which is obtaining a second two-dimensional image) of four scenarios:

[Image: Dillon Table 4.6]

Dillon at p. 173, “4.10 System Performance and Results,” first paragraph, teaches [t]he knowledge bases for the scenarios were constructed from a collection of 85 images (that is, the “85 images” include a second two-dimensional image)) including at least part of the first object (Dillon, at p. 124, “4.2 World Knowledge,” first paragraph, teaches [o]ne of the most important data representations in scene understanding and object recognition systems is the representation of world knowledge. Cite represents world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents (that is, “a part-of” is including at least part of the first object)); and
identifying, using the knowledge graph, the first object in the second two-dimensional image (Dillon, Fig. 4.2, which teaches:

[Image: Dillon Fig. 4.2]

Dillon at p. 124, “4.2 World Knowledge,” first paragraph, teaches that [w]ithin each node is stored information about the optimal segmentation, feature extraction, and matching algorithms that are used to recognize this object in an image).
Wu and Dillon are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify Wu pertaining to a knowledge graph for object identification with the semi-restricted graph of Dillon.
The motivation for doing so is to provide a hierarchical knowledge base that provides a rich scene description that can include contextual, taxonomic and deep decomposition information. (Dillon at p. 122, “4.1.1. Proposed Theory,” fourth partial paragraph).
Though Wu and Dillon teach object relationships represented by a knowledge graph, the combination of Wu and Dillon does not explicitly teach that the “relationship” illustrated by the knowledge graph is a “spatial relationship.”
But Hueting teaches a “knowledge graph is organized in a hierarchical semantic manner to illustrate the contextual spatial relationship between the first object and the second object” (Hueting, left column of p. 4, “3.3 Knowledge Graph,” second paragraph, teaches that the graph stores knowledge units. They can be simple labels, but also more complex concepts such as primitive proxies, or spatial relations; Hueting Fig. 4 teaches:

[Image: Hueting Fig. 4]

Hueting, Fig. 4 caption, teaches relationships can exist in the abstraction graph as well as in the knowledge graph).
Wu, Dillon, and Hueting are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Hueting teaches a unified multi-criterion data representation for understanding and processing of large-scale 3D scenes. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify the combination of Wu and Dillon pertaining to a knowledge graph for object identification with the multi-criterion data representation of Hueting.
The motivation for doing so is to take scene understanding to the next level of performance by taking into account a unified representation of knowledge simultaneously. (Hueting, Abstract).
Regarding claim 17, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 15, as described above in detail.
Wu teaches -
wherein the program instructions further comprise instructions, stored in the memory, that upon execution by the at least one processor cause the computing device to perform operations comprising:
obtaining, using the camera of a computing device, the 2D image of the first object at a particular time; obtaining, from the computing device, a log of sensor data indicative of the environment during a time period prior to the particular time (Wu 2:33-38 teaches the parser may modify (e.g., remove, add, edit) the related words received from the data source such that the words include the desired classes for objects identified within images. In some configurations, the parser modifies the tags to match existing tags that are used to identify objects within digital images (that is, obtaining, from the computing device, a log of sensor data); Wu 6:61-68 teaches [t]he creation time might also be used to identify a special event for the user. For example, the event might be a birthday, a wedding, a graduation, or the like. In some configurations, the image analysis service 120 may access calendar data for the user (or other users recognized within the image 112) to determine whether that day is associated with an event identified by the user (that is, indicative of the environment during a time period prior to the particular time); see also Wu 6:34-38, which teaches [t]he date of the creation time and geolocation of the image may also be considered in determining the scene type. For example, the sun disappears out of sight from the sky at different times in different seasons of the year. Moreover, sunset times are different for different locations (that is, a time period prior to the particular time)); and
generating the knowledge graph to include the log of sensor data associated with the identification of the object (Wu 6:24-33 teaches [i]n some examples, the image creation time determined from the metadata associated with the image 112 may be used to assist scene understanding. For example, when the scene type is determined to be “beach” and the creation time is 6:00 PM for an image (that is, a particular time), both beach and sunset beach may be tags for the scene types of the image 112. As an additional example, a dusk scene image and a sunset scene image of a same location or structure may appear to be similar. In such a case, the image creation time helps to determine the scene type, i.e., a dusk scene or a sunset scene).
Regarding claim 18, Wu teaches [a] non-transitory computer-readable medium (Wu 18:15-18 teaches that computer-readable storage media is any available media that provides for the non-transitory storage of data and that may be accessed by the computer 700) having stored therein instructions, that when executed by a computing device, cause the computing device to perform functions (Wu 18:15-18 teaches that computer-readable storage media is any available media that provides for the non-transitory storage of data and that may be accessed by the computer 700) comprising:
obtaining, using a camera of the computing device, a two-dimensional (2D) image of an environment comprising a first object and a second object (Wu 5:23-25 teaches images 112 may be obtained from video of the user (e.g., taken by a smartphone or camera of the user) (that is, obtaining, using a camera of a computing device, a two-dimensional (2D) image of an environment comprising a first object and a second object));
receiving, from a server, an identification of the first object based on the 2D image of the first object (Wu 5:6-7 teaches the image analysis service 120 may be performed on the computing device 102 and/or some other computing device (that is, from a server); Wu 6:16-19 teaches the image analysis service 120 generates tags based on results of scene understanding and/or facial recognition that are performed on each image 112 (that is, receiving, from a server, an identification of the first object based on the 2D image of the first object));
obtaining, using one or more sensors of the computing device, additional data of the first object (Wu 5:25-39 teaches [a] digital image, such as one of the images 112, often includes a set of metadata (meaning data about the image). For example, a digital image 112 may include . . . focal length (e.g., 4mm); 35 mm focal length (e.g., 33); dimensions of the image; horizontal resolution; vertical resolution; bit depth (e.g., 24); color representation (e.g., sRGB); camera model (e.g., iPhone 6); F-stop; . . . GPS (Global Positioning System) latitude . . . longitude . . . and altitude (that is, obtaining, using one or more sensors of the computing device, additional data of the first object));
obtaining, using the one or more sensors of the computing device, additional data of the environment (Wu 6:1-15 teaches a piece of software program code . . . can be used to read the metadata and tags from the images (that is, “tags” are additional data of a surrounding environment of the object). In some configurations, the image analysis service 120 normalizes the tags of the retrieved images. For example, both “dusk” and “twilight” tags are changed to “sunset.” The image analysis service 120 may also generate additional tags for each image. . . . In some examples, the image analysis service 120 sends the GPS coordinates within the GeoTag to a map service server requesting for a location corresponding to the GPS coordinates (that is, obtaining, using the one or more sensors of the computing device, additional data of the environment)); 
determining, based on the additional data of the environment, a . . . relationship (Wu, 5:51-56, teaches a “contextual relationship” in that, [f]or example, a “family” tag (that is, a “tag” is determining, based on the additional data of the environment) indicates that the image is a family image, a “wedding” tag indicates that the image is a wedding image, a “sunset” tag indicates that the image is a sunset scene image (that is, “sunset scene image” is a relationship), a “Santa Monica beach” tag indicates that the image is taken at Santa Monica beach, etc.) between the first object and the second object in the environment (Wu, 5:59-63, teaches user tags may be used to identify recognized individuals in the images 112, other tags may identify particular animals in the images 112, other tags may identify objects within the images (e.g., cars, buildings, tables, chairs) (that is, between the first object and the second object in the environment));
generating a knowledge graph (Wu 8:12-13 teaches [a]fter identifying the related words, the [Knowledge Graph] manager 130 generates the dense [Knowledge Graph] 124 (that is, generating a knowledge graph); Wu Fig. 1 teaches (Examiner notations added):

[Image: Wu Fig. 1, with Examiner notations]

Wu 4:18-20 teaches depicting an illustrative operating environment in which a knowledge graph is generated and used with tagged images of a user) including (i) the additional data of the first object associated with the identification of the first object and (ii) the additional data of the environment also associated with the identification of the first object (Wu Fig. 2 teaches generating and refining a knowledge graph; Wu 7:41-43 teaches [g]enerally, the dense KG 124 is used in conjunction with the tags and metadata 118A for the images 112 of a user when identifying the objects (that is, the tags and metadata 118A are additional data of the first object) and concepts (that is, the tags and metadata 118A are additional data of the environment) within an image), wherein the knowledge graph is organized in a hierarchical semantic manner to illustrate the . . . relationship between the first object and the second object (Wu Fig. 3 teaches (Examiner notations added):

[Image: Wu Fig. 3, with Examiner notations]

Wu 1:39-40 teaches that FIG. 3 is a block diagram showing an illustrative dense knowledge graph generated from an exemplary image 302; Wu 12:36-44 teaches [Knowledge Graph] 304A shows an outdoors category 306A that includes tags identified within the image and included within a generated dense knowledge graph 124 for the outdoors category 304A. In the current example, the outdoors category 306A includes an identification of “bay, soil, nature, beach, scenery, sand, ocean, coast, water, and sea.” The plant category 306B graph shows a classification of “tree and palm tree”. The field category 306C graph includes “land and island”; Wu 3:2-3 teaches [t]he created [Knowledge Graph] can have a large number of hierarchical levels (that is, organized in a hierarchical semantic manner to illustrate the . . . relationship between the first object and the second object)); 
* * *
Though Wu teaches that an image analysis service can access and search knowledge graph data to identify different categories depicted by objects within an image, Wu does not explicitly teach -
* * *
obtaining a second two-dimensional image including at least part of the first object; and
identifying, using the knowledge graph, the first object in the second two-dimensional image.
But Dillon teaches -
* * *
obtaining a second two-dimensional image (Dillon, Table 4.6, teaches results of images (that is, a plurality of “images,” which is obtaining a second two-dimensional image) of four scenarios:

[Image: Dillon Table 4.6]

Dillon at p. 173, “4.10 System Performance and Results,” first paragraph, teaches [t]he knowledge bases for the scenarios were constructed from a collection of 85 images (that is, the “85 images” include a second two-dimensional image)) including at least part of the first object (Dillon, at p. 124, “4.2 World Knowledge,” first paragraph, teaches [o]ne of the most important data representations in scene understanding and object recognition systems is the representation of world knowledge. Cite represents world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents (that is, “a part-of” is including at least part of the first object)); and
identifying, using the knowledge graph, the first object in the second two-dimensional image (Dillon, Fig. 4.2, which teaches:

[Image: Dillon Fig. 4.2]

Dillon at p. 124, “4.2 World Knowledge,” first paragraph, teaches that [w]ithin each node is stored information about the optimal segmentation, feature extraction, and matching algorithms that are used to recognize this object in an image).
Wu and Dillon are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify Wu pertaining to a knowledge graph for object identification with the semi-restricted graph of Dillon.
The motivation for doing so is to provide a hierarchical knowledge base that provides a rich scene description that can include contextual, taxonomic and deep decomposition information. (Dillon at p. 122, “4.1.1. Proposed Theory,” fourth partial paragraph).
Though Wu and Dillon teach object relationships represented by a knowledge graph, the combination of Wu and Dillon does not explicitly teach that the “relationship” illustrated by the knowledge graph is a “spatial relationship.”
But Hueting teaches a “knowledge graph is organized in a hierarchical semantic manner to illustrate the contextual spatial relationship between the first object and the second object” (Hueting, left column of p. 4, “3.3 Knowledge Graph,” second paragraph, teaches that the graph stores knowledge units. They can be simple labels, but also more complex concepts such as primitive proxies, or spatial relations; Hueting Fig. 4 teaches:

[Image: Hueting Fig. 4]

Hueting, Fig. 4 caption, teaches relationships can exist in the abstraction graph as well as in the knowledge graph).
Wu, Dillon, and Hueting are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Hueting teaches a unified multi-criterion data representation for understanding and processing of large-scale 3D scenes. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify the combination of Wu and Dillon pertaining to a knowledge graph for object identification with the multi-criterion data representation of Hueting.
The motivation for doing so is to take scene understanding to the next level of performance by taking into account a unified representation of knowledge simultaneously. (Hueting, Abstract).
Regarding claim 19, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 18, as described above in detail.
Wu teaches -
wherein the additional data of the environment indicates a spatial layout between the at least one item represented by the additional data of the environment and the first object, and wherein the functions further comprise:
generating the knowledge graph to include information indicating the spatial layout between the at least one item represented by the additional data of the environment and the first object (Wu 12:45-55 teaches [t]he display element 308 shows a portion of an exemplary listing of tags and their association with the outdoors category 306A. In the current example, the display element 308 shows that the sea identification has a rating or confidence level (that is, information indicating) of 0.866 (out of 1), whereas the sand identification has a 0.4086 rating, the beach identification has a rating of 0.405 and the water identification has a rating of 0.3747 (that is, information indicating the spatial layout). As discussed above, a user, such as a tester or user of the image software product 106 may make modifications to the tags and/or the graphs 304A-304C that may then be used by the KG manager 130 to update the affected KGs 124 (that is, the rating or confidence level is information indicating the spatial layout between the at least one item represented by the additional data of the environment and the first object)).
Regarding claim 20, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 18, as described above in detail.
Wu teaches wherein the functions further comprise:
determining a person associated with the first object (Wu 5:59-63 teaches user tags may be used to identify recognized individuals in the images 112 (that is, determining a person associated with the first object), other tags may identify particular animals in the images 112, other tags may identify objects within the images (e.g., cars, buildings, tables, chairs)); and
generating the knowledge graph to include information indicating the person associated with the first object to label the first object as belonging to the person (Wu 7:41-43 teaches that the dense KG 124 is used in conjunction with the tags and metadata 118A for the images 112 of a user (that is, indicating . . . to label the first object as belonging to the person) when identifying the objects and concepts within an image (that is, generating the knowledge graph to include information indicating the person associated with the first object to label the first object as belonging to the person)).
Regarding claim 21, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Wu teaches -
determining a contextual relationship between the first object and the second object (Wu, 5:51-56, teaches a “contextual relationship” in that, [f]or example, a “family” tag (that is, a “tag” is determining, based on the additional data of the surrounding environment) indicates that the image is a family image, a “wedding” tag indicates that the image is a wedding image, a “sunset” tag indicates that the image is a sunset scene image (that is, “sunset scene image” shows objects in a relationship), a “Santa Monica beach” tag indicates that the image is taken at Santa Monica beach, etc.), the contextual relationship being a personal association (Wu 6:44-46 teaches the image analysis service 120 may also perform facial recognition to recognize faces and determine facial expressions of individuals within the images 112 (that is, the contextual relationship is a personal association); see also, Wu 5:50-54 teaches [t]he tags describe and indicate the characteristics of the image. For example, a “family” tag indicates the image is a family image), an activity (Wu 5:50-54 teaches [t]he tags describe and indicate the characteristics of the image. For example, . . . a “wedding” tag indicates the image is a wedding image (that is, “wedding” is an activity)), or a chronology of events (Wu 6:24-26 teaches the image creation time determined from the metadata associated with the image 112 may be used to assist scene understanding (that is, “image creation time” is a chronology of events)), wherein the knowledge graph is generated based on the contextual relationship between the first object and the second object (Wu 4:18-20 & Fig. 1 teaches depicting an illustrative operating environment in which a knowledge graph is generated and used with tagged images of a user (that is, the knowledge graph is generated based on the contextual relationship between the first object and the second object)).
Regarding claim 22, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Dillon teaches -
wherein identifying the first object in the second two-dimensional image comprises applying an object classifier (Dillon at p. 151, “4.6 Hypothesis Generation,” first paragraph, teaches Cite generates classification labels and grouping hypotheses in the visual interpretation and scene interpretation structures based on segmentation results (that is, applying an object classifier)) that was trained using the knowledge graph to the second two-dimensional image (Dillon, at p. 154, “4.6.2 Generating Visual Instance Nodes from the Visual Instance Graph,” first paragraph, teaches Intermediate parent VI nodes can be generated directly from the image data and VI graphs (that is, a “visual instance graph” is a knowledge graph) if the connected components operator, ProcessGroupingCC, is activated. Connected components grouping is only used during early training (that is, an object classifier that was trained using the knowledge graph) when the system is shown single objects and has not yet learned enough to reliably run the clique resolving process).
11.	Claims 2, 4, 6, 8, and 16 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 10467290 to Wu et al. [hereinafter Wu] in view of Dillon et al., “Cite-Scene Understanding and Object Recognition,” Advances in Computer Vision & Machine Intelligence (1997) [hereinafter Dillon], Hueting et al., “MCGraph: Multi-criterion representation for scene understanding,” ACM (2014) [hereinafter Hueting], and US Published Application 20190056726 to Weldermariam et al. [hereinafter Weldermariam].
Regarding claim 2, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Though Wu, Dillon, and Hueting teach the feature of generating knowledge graphs from image capture, the combination of Wu, Dillon, and Hueting, however, does not explicitly teach -
wherein the one or more sensors of the computing device include the camera of the computing device, and wherein obtaining the additional data of the object comprises obtaining images of the first object from additional points of view.
But Weldermariam teaches -
wherein the one or more sensors of the computing device include the camera of the computing device, and wherein obtaining the additional data of the object comprises obtaining images of the first object from additional points of view (Weldermariam ¶ 0027 teaches the system and/or method of the present disclosure may be applied to more than one drone, for example, a drone swarm (that is, a drone swarm being a computing device such that each is obtaining, using a camera of a computing device, a two-dimensional (2D) image of an object, where each drone of a drone swarm is obtaining images of the first object from additional points of view)).
Wu, Dillon, Hueting, and Weldermariam are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Hueting teaches a unified multi-criterion data representation for understanding and processing of large-scale 3D scenes. Weldermariam teaches linking a database with a growing knowledge graph according to previously carried out services, tests performed, and/or other information relating to image capture. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify the combination of Wu, Dillon, and Hueting pertaining to knowledge graph generation based on image capture with the varying point-of-view image capture of the drone device, or drone swarm, of Weldermariam.
The motivation for doing so is to provide a knowledge graph relating to proving the drone capabilities through a database linked with a growing knowledge graph of previously carried out services, tests performed, and/or other drone information. (Weldermariam ¶ 0022).
Regarding claim 4, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail. 
Though Wu, Dillon, and Hueting teach the feature of generating knowledge graphs from image capture, the combination of Wu, Dillon, and Hueting, however, does not explicitly teach -
wherein the one or more sensors of the computing device include a microphone of the computing device, and wherein obtaining the additional data of the first object comprises obtaining, using the microphone, audio from the environment.
But Weldermariam teaches -
wherein the one or more sensors of the computing device include a microphone of the computing device, and wherein obtaining the additional data of the object comprises obtaining, using the microphone, audio from the surrounding environment of the object (Weldermariam ¶ 0048 teaches [t]he system and/or method in one embodiment may measure noise (that is, obtaining, using the microphone, audio from the surrounding environment of the first object) with a microphone (that is, the one or more sensors . . . include a microphone of the computing device)).
Wu, Dillon, Hueting, and Weldermariam are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Hueting teaches a unified multi-criterion data representation for understanding and processing of large-scale 3D scenes. Weldermariam teaches linking a database with a growing knowledge graph according to previously carried out services, tests performed, and/or other information relating to image capture. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify the combination of Wu, Dillon, and Hueting pertaining to knowledge graph generation based on image capture with the microphone-based noise measurement of the drone device of Weldermariam.
The motivation for doing so is to provide a knowledge graph relating to proving the drone capabilities through a database linked with a growing knowledge graph of previously carried out services, tests performed, and/or other drone information. (Weldermariam ¶ 0022).
Regarding claim 6, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Though Wu, Dillon, and Hueting teach the feature of generating knowledge graphs from image capture, the combination of Wu, Dillon, and Hueting does not explicitly teach -
wherein the computing device is a robotic device operable to move throughout an environment, and wherein obtaining, using the one or more sensors of the computing device, the additional data of the first object comprises:
the robotic device moving around the first object to collect data of the first object using the one or more sensors on-board the robotic device.
But Weldermariam teaches - 
wherein the computing device is a robotic device operable to move throughout an environment, and wherein obtaining, using the one or more sensors of the computing device, the additional data of the object comprises:
the robotic device moving around the object to collect data of the object using the one or more sensors on-board the robotic device (Weldermariam ¶ 0025 teaches present[ing] the drone (that is, a robotic device operable to move throughout an environment) with a test or puzzle. For instance, the system and/or method in one embodiment may transmit an automated signal that asks the drone to fly in a specified pattern, test its voice recognition system and/or its collision avoidance system may be tested. Other examples of tests may include but are not limited to, any one or more of imaging abilities and resolution, infrared (IR) camera tests, the drone's ability to safely drop a package, and/or others. Other systems can be tested such as thermal camera abilities, video camera abilities, and/or or a multispectral camera ability. For example, dome drones may need properly functioning hyperspectral cameras, designed specifically for use on unmanned aerial vehicles/systems (UAV/UAS), or remotely operated vehicles (ROV). For instance, drone-supported multispectral imaging can help monitor crops and plants (that is, the robotic device moving around the first object to collect data of the first object using the one or more sensors on-board the robotic device)).
Wu, Dillon, Hueting, and Weldermariam are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Hueting teaches a unified multi-criterion data representation for understanding and processing of large-scale 3D scenes. Weldermariam teaches linking a database with a growing knowledge graph according to previously carried out services, tests performed, and/or other information relating to image capture. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify the combination of Wu, Dillon, and Hueting pertaining to knowledge graph generation based on image capture with the varying point-of-view image capture of the drone device, or drone swarm, of Weldermariam.
The motivation for doing so is to provide a knowledge graph relating to proving drone capabilities through a database linked with a growing knowledge graph of previously carried out services, tests performed, and/or other drone information (Weldermariam ¶ 0022).
Regarding claim 8, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Though Wu, Dillon, and Hueting teach the feature of generating knowledge graphs from image capture, the combination of Wu, Dillon, and Hueting does not explicitly teach -
wherein obtaining the additional data of the first object comprises obtaining audio including speech, and the method further comprises:
receiving, from another server, an output of speech recognition of the speech; and
assigning an entity identification to the knowledge graph using the output of the speech recognition of the speech, wherein the entity identification indicates an identification of a node of the knowledge graph.
But Weldermariam teaches - 
wherein obtaining the additional data of the object comprises obtaining audio including speech (Weldermariam ¶ 0018 teaches [r]eceiving information from the drone may be based on information related to: . . . demonstrating speech recognition ability of a desired level, for example, responding to several verbal commands (that is, wherein obtaining the additional data of the first object comprises obtaining audio including speech)), and the method further comprises:
receiving, from another server, an output of speech recognition of the speech (Weldermariam Fig. 4 teaches:

[Image: Weldermariam Fig. 4]

Weldermariam ¶ 0028 teaches [a]s another example, the drone may communicate a spoken command or speech via wireless communication (that is, receiving, from another server, an output of speech recognition of the speech)); and
assigning an entity identification to the knowledge graph using the output of the speech recognition of the speech, wherein the entity identification indicates an identification of a node of the knowledge graph (Weldermariam ¶ 0044 teaches challenges or tests may be context dependent . . . . In one embodiment, as the drone flies, the drone may continually broadcast a signal indicating that it has passed the test. The signal may be in the form of an encrypted communication and may also include a drone ID (for example, a universally unique identifier (UUID), which may be a 128-bit number used to identify information in computer systems or the like). For example, during the first communication, the drone responds with UUID, and forthcoming communication payloads always include the UUID; Weldermariam ¶ 0048 teaches a CAPTCHA test generator may generate a challenge to test the drone with different noise characteristics, for example, including a voice command with a level of distortion that simulates noisy environment; Weldermariam ¶ 0053 teaches that [t]he database is further linked with growing knowledge graph according to previously carried out services, tests performed, and/or others. The system fetches context and type of the service from the database using the requested order payload (that is, through the UUID and CAPTCHA test, assigning an entity identification to the knowledge graph using the output of the speech recognition of the speech, wherein the entity identification indicates an identification of a node of the knowledge graph); Examiner notes the UUID of Weldermariam is an identification of a node of the knowledge graph).
Wu, Dillon, Hueting, and Weldermariam are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Hueting teaches a unified multi-criterion data representation for understanding and processing of large-scale 3D scenes. Weldermariam teaches linking a database with a growing knowledge graph according to previously carried out services, tests performed, and/or other information relating to image capture. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify the combination of Wu, Dillon, and Hueting pertaining to knowledge graph generation based on image capture with the varying point-of-view image capture of the drone device, or drone swarm, of Weldermariam.
The motivation for doing so is to provide a knowledge graph relating to proving drone capabilities through a database linked with a growing knowledge graph of previously carried out services, tests performed, and/or other drone information (Weldermariam ¶ 0022).
Regarding claim 16, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 15, as described above in detail.
Wu teaches - 
* * *
and obtaining, using the one or more sensors of the computing device, the additional data of the first object comprises obtaining the additional data at different times of day (Wu 6:24-33 teaches [i]n some examples, the image creation time determined from the metadata associated with the image 112 may be used to assist scene understanding. For example, when the scene type is determined to be “beach” and the creation time is 6:00 PM for an image, both beach and sunset beach may be tags for the scene types of the image 112. As an additional example, a dusk scene image and a sunset scene image of a same location or structure may appear to be similar. In such a case, the image creation time helps to determine the scene type, i.e., a dusk scene or a sunset scene (that is, the additional data of the first object comprises obtaining the additional data at different times of day)).
Though Wu, Dillon, and Hueting teach the feature of generating knowledge graphs from image capture, the combination of Wu, Dillon, and Hueting does not explicitly teach -
. . . wherein the one or more sensors of the computing device include a microphone of the computing device, and wherein obtaining the additional data of the first object comprises:
obtaining, using the camera, images of the first object from additional points of view;
obtaining, using the microphone, audio from the environment of the first object; and
* * *
But Weldermariam teaches -
. . . wherein the one or more sensors of the computing device include a microphone of the computing device (Weldermariam ¶ 0048 teaches a microphone), and wherein obtaining the additional data of the object comprises:
obtaining, using the camera, images of the first object from additional points of view (Weldermariam ¶ 0027 teaches the system and/or method of the present disclosure may be applied to more than one drone, for example, a drone swarm (that is, a drone swarm in which each drone is a computing device obtaining, using a camera of the computing device, a two-dimensional (2D) image of an object, such that the drones of the swarm obtain, using the camera, images of the first object from additional points of view));
obtaining, using the microphone, audio from the environment of the first object (Weldermariam ¶ 0048 teaches [t]he system and/or method in one embodiment may measure noise (that is, obtaining, using the microphone, audio from the environment of the object) with a microphone (that is, the one or more sensors . . . include a microphone of the computing device)); and
* * *
Wu, Dillon, Hueting, and Weldermariam are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Hueting teaches a unified multi-criterion data representation for understanding and processing of large-scale 3D scenes. Weldermariam teaches linking a database with a growing knowledge graph according to previously carried out services, tests performed, and/or other information relating to image capture. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify the combination of Wu, Dillon, and Hueting pertaining to knowledge graph generation based on image capture with the varying point-of-view image capture of the drone device, or drone swarm, of Weldermariam.
The motivation for doing so is to provide a knowledge graph relating to proving drone capabilities through a database linked with a growing knowledge graph of previously carried out services, tests performed, and/or other drone information (Weldermariam ¶ 0022).
12.	Claim 3 is rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 10467290 to Wu et al. [hereinafter Wu] in view of Dillon et al., “Cite-Scene Understanding and Object Recognition,” Advances in Computer Vision & Machine Intelligence (1997) [hereinafter Dillon], Hueting et al., “MCGraph: Multi-criterion representation for scene understanding,” ACM (2014) [hereinafter Hueting], and US Published Application 20160189365 to Lee et al. [hereinafter Lee].
Regarding claim 3, the combination of Wu, Dillon, and Hueting teaches all of the limitations of claim 1, as described above in detail.
Though Wu, Dillon, and Hueting teach generating knowledge graphs from image capture, where the image capture includes metadata such as “focal length”, the combination of Wu, Dillon, and Hueting does not explicitly teach -
wherein the one or more sensors of the computing device include a depth camera of the computing device, and wherein obtaining the additional data of the first object comprises obtaining depth images of the object.
But Lee teaches -
wherein the one or more sensors of the computing device include a depth camera of the computing device, and wherein obtaining the additional data of the object comprises obtaining depth images of the object (Lee ¶ 0085 teaches the identifying module 320 may identify a location of at least one object by using a focal length, a principal point, i.e., image coordinates of a point at which an optical axis meets the image sensor, or an asymmetric coefficient, i.e., a degree of inclination of the image sensor, included in the intrinsic parameter of the image sensor (that is, wherein the one or more sensors of the computing device include a depth camera of the computing device, and wherein obtaining the additional data of the first object comprises obtaining depth images of the object); Examiner notes that the plain and ordinary meaning of “depth camera” includes the use of algorithms for determining an object’s “depth,” which is not inconsistent with the specification. For example, the specification recites that “[d]epth camera 137 may be configured to recover information regarding depth of objects in an environment, such as three-dimensional (3D) characteristics of the objects. For example, depth camera 137 may be or include an RGB-infrared (RGB-IR) camera that is configured to capture one or more images of a projected infrared pattern, and provide the images to a processor that uses various algorithms to triangulate and extract 3D data and outputs one or more RGBD images.” (PGPUB ¶ 0057)).
Wu, Dillon, Hueting, and Lee are from the same or similar field of endeavor. Wu teaches an image analysis service that includes knowledge graph data to identify categories depicted by objects within an image. Dillon teaches world knowledge as a semi-restricted graph in which each node represents an object or visual concept which is either a part-of, a view-of or a type-of its parent or parents. Hueting teaches a unified multi-criterion data representation for understanding and processing of large-scale 3D scenes. Lee teaches identifying location information about one or more objects included in the image by using metadata such as focal length. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of Applicant’s invention to modify the combination of Wu, Dillon, and Hueting pertaining to knowledge graph generation from image capture including metadata such as “focal length” with the distance algorithm determinations of Lee.
The motivation for doing so is to enable an electronic device to overcome a decrease in location accuracy due to reflection and diffraction of electronic waves in an environment (Lee ¶ 0005).
Response to Arguments
13.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action with regard to the reliance on the features taught by the prior art reference of Hueting, as set out in detail hereinabove. 
14.	With regard to the rejections under Section 101, Applicant argues that “the Office Action states that the ‘claim recites steps ... that can practically be performed in the human mind’ and is thus directed to an abstract idea in view of Alice Step 2A Prong 1. However, the independent claims 1, 15, and 18 each recite a combination of elements that cannot be performed in the human mind. Thus, when considered as a whole, the claims do not recite an abstract idea and, even if an abstract idea were recited, the additional elements that cannot be performed in the human mind integrate any underlying abstract idea into a practical application.” (Response at pp. 11-12).
Examiner clarifies the rejection under Section 101 as set out above.
Under Step 2A, prong 1, the claims are identified as reciting an abstract idea, or judicial exception, of processing data to form a knowledge graph representing relations between objects of an image.
Under Step 2A, prong 2, whether the claim as a whole integrates the judicial exception into a practical application of the exception is considered - the further steps of “obtaining” or “receiving” do not integrate the exception into a practical application, nor do the claim elements beyond the identified judicial exception of “a camera,” “a server,” “sensors,” or a “knowledge graph” serve to integrate the exception, for the reasons set out in the rejections above.
Under Step 2B, whether the claim recites additional elements that amount to significantly more than the recited judicial exception is considered - the additional elements of “a camera,” “a server,” “sensors,” or a “knowledge graph” as recited by the claims are well-understood, routine, and conventional, and are used as such within the claims; accordingly, they do not provide an inventive concept, as set out in detail in the rejections above. (See also, e.g., claims 1, 15, and 18 and those claims depending directly or indirectly therefrom).
With respect to language that may overcome the rejection under Section 101, Examiner points to the guidance from the Office’s Example 37 - Relocation of Icons on a Graphical User Interface, and Example 39 - Method for Training a Neural Network for Facial Detection. Examiner also points Applicant to the relatively recent guidance provided by the Federal Circuit in CosmoKey Solutions v. Duo Security, 15 F.4th 1091, 2021 USPQ2d 1003 (Fed. Cir. 2021). In this regard, Examiner generally notes that aspects of Applicant’s specification recite “training a new object classifier” via a knowledge graph (Specification ¶ 0070), or a graph framework for use with “machine learning or unsupervised learning” for node addition to the graph (Specification ¶ 0085), etc.
15.	Applicant argues that, in view of Applicant’s amended claims, neither “Wu [nor Dillon] teach determining, based on additional data, a spatial relationship between a first and a second object. . . . The secondary reference, Dillon, also fails to teach ‘determining, based on the additional data . . . a spatial relationship . . . .’” (Response at pp. 12-13 (emphasis added)).
Examiner agrees. The prior art reference of Hueting is cited in the rejections above as teaching features covered by the “spatial relationship” as set out by Applicant’s amended claims. 
Moreover, the rejections above clearly set forth which claim limitations are taught by each of the prior art references, and the reason why it would be obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s invention to combine their teachings.
Conclusion
16.	Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.L.S./
Examiner, Art Unit 2122
/BRIAN M SMITH/Primary Examiner, Art Unit 2122

        1 US Patent 10467290 to Wu et al. [hereinafter Wu].
        2 US Patent 10467290 to Wu et al. [hereinafter Wu].
        3 US Patent 10467290 to Wu et al. [hereinafter Wu].
        4 US Patent 10467290 to Wu et al. [hereinafter Wu].
        5 US Patent 10467290 to Wu et al. [hereinafter Wu].
        6 US Patent 10467290 to Wu et al. [hereinafter Wu].