DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the Request for Continued Examination and Amendments filed on 12/30/2020.
Claim(s) 4, 8, 14, 18 is/are canceled.
Claim(s) 1, 5-7, 10-11, 15-17 is/are amended.
Claim(s) 1, 3, 5-7, 9-13, 15-17, 19-21 is/are pending in this Office Action.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/30/2020 has been entered.
Information Disclosure Statement
The Information Disclosure Statement submitted by Applicant on 5/24/2021 has been considered by the examiner. 
Claim Objections
Upon further consideration of the claims, Claim 1 is objected to because “an image sensor” is referred to in both lines 5 and 6. Appropriate correction is required.
Claim Rejections - 35 USC § 112
Applicant's amendments filed 12/30/2020, hereafter referred to as Applicant's amendments, to overcome the 35 USC 112(b) rejections of the final rejection mailed 11/17/2020, hereafter referred to as the final rejection, have been approved. The rejections have been withdrawn.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.





Claim(s) 1, 3, 5-7, 9, 11-13, 15-17, 19-20 is/are rejected under 35 U.S.C. 103 as being obvious over Goto et al. (US 2006/0064203 A1), hereafter referred to as Goto, in view of Tsai et al. (US 2012/0120196 A1), hereafter referred to as Tsai, further in view of Zhao et al. (US 8,170,280 B2), hereafter referred to as Zhao.
Regarding claim 1, Goto teaches a robot (“mobile unit 301”, Fig. 26) comprising: 
a body (see block-like body of “mobile unit 301” in Fig. 26); 
a memory (“objective person storage part 112”, Fig. 1A) configured to store a person shape model (“information that can identify the person”, see para. 0110 citation below) including a leg pair (“The data of the objective person 101 inputted from the camera 103a of the objective person measurement device 103 is exemplified as the image data of the whole body or the lower half of the body of the objective person 101 as shown in FIG. 12. Moreover, besides this, the data of the objective person 101 inputted from the input device 901 may include information that can identify the person, such as the physical features (face, figure, shoes size, etc.) of the objective person 101, the features of shoes and clothes (color, size, and shape of the shoes, color, size, and shape of clothes, etc.), data of the walking state (footstep etc.), and so on. It is acceptable to preparatorily store these data in the objective person storage part 112 of the tracking unit 1000”, para. 0110);
a motor (“driving devices 119b such as motors”, para. 0093, Fig. 1A) configured to move the robot through a region (“coordinate system”, para. 0081, Fig. 1B)
(“the mobile unit 301 has a pair of running wheels 119a, a pair of driving devices 119b such as motors for forwardly and reversely driving the pair of running wheels 119a, and a pair of drivers 119c for controlling the driving of the pair of driving devices 119b on the basis of the movement command 118”, para. 0093, “FIG. 1B shows the coordinate system used in the present invention with illustrations of an objective person 101 to be tracked, an arrow 102 indicating the walking state (for example, landing of a foot of the objective person) of the objective person 101, and the mobile unit 301.”, para. 0081); 
an image sensor (“objective person measurement device 103”, Fig. 1A) configured to obtain image information (“captured image”, see para. 0083 citation below) about a portion of the region (“an image capturing device 103a such as a camera for detecting the walking state 102…and an image processing part 103b for processing the captured image or being constructed of a sensor part that serves as a signal receiver”, para. 0083); and 
a processor (“walking state determination part 109” and “objective person discrimination part 114”, “movement commanding part 117”, Fig. 1A) configured to: 
extract a leg pair (see pair of legs of “objective person 101” which comprises “feet 302 and 303”, Fig. 13, and “lower half of body”, para. 0110 citation below) which is respectively associated with a foot sole region (“feet 302 and 303”, Fig. 26 and para. 0179 citation below) in the region based on the obtained image information
(“The operation steps are carried out principally in the order of an objective person specifying process in step S100, an objective person walking state data obtainment process in step S200, an objective person confirmation process in step S300, an objective person walking speed and angle calculation process in step S400, and an objective person tracking process in step S500”, para. 0096, 
“The step S100 of the process for specifying the objective person 101 is described in detail with reference to the flow chart of FIG. 3 and FIGS. 10, 11, and 12”, para. 0108, “First of all, the data of the objective person 101 to be accompanied by the mobile unit 301 is inputted to the objective person storage part 112, the walking state obtainment part 105, and the objective person discrimination part 114 of the tracking unit 1000 from the objective person measurement device 103”, para. 0109, “The data of the objective person 101 inputted from the camera 103a of the objective person measurement device 103 is exemplified as the image data of the whole body or the lower half of the body of the objective person 101 as shown in FIG. 12”, para. 0110, “The tracking unit 1000 first carries out the process for specifying the objective person 101 (step S100). That is, the images of both feet (the right foot of the objective person 101 is denoted by reference numeral 302, and the left foot is denoted by reference numeral 303) of the objective person 101 are obtained by the camera 103a”, para. 0179),
obtain a distance value (“distance”, see para. 0107 citation below) between each of the leg pair and the image sensor
(“FIG. 12 is a view showing an image 880 obtained by the camera 103a in the walking start state when t=t0 (state of FIG. 10, i.e., state in which both feet are landed) and an image region 890, to which attention is paid, in the image 880. It is noted that the dashed quadrangular frames 881 in the image 880 are added to the image after measurement in order to measure a distance between the mobile unit 301 and the objective person 101”, para. 0107), 
select, as a follow target, the leg pair (“The step S100 of the process for specifying the objective person 101”, para. 0110, this selection occurs in “step S100”, see Fig. 3, see also where the “walking state 102” of the “objective person” is measured in subsequent “S200” of Fig. 4, “The objective person measurement device 103 measures the information of the walking state 102 of the objective person 101 (step S201)…Next, the information of the walking state 102 measured in step S201”, para. 0114-0115), 
determine a moving point (“walk vector 111”, Fig. 26, “It is further assumed that a line segment, which connects a midpoints 501 of both feet when t=t0 with a midpoint 502 of both feet when t=t1, is assumed to be the walk vector 111.”, para. 0199, Fig. 29) based on a moving direction (“arrow 102”, Fig. 26) of the foot sole region (“feet 302 and 303”, Fig. 26) of the selected leg pair and a relative position (“midpoint”, see para. 0185 citation below) between a left foot sole region (“left foot 303”) and a right foot sole region (“right foot 302”) included in the foot sole region (“feet 302 and 303”),
(“After the process for specifying the objective person 101 (step S100) ends to specify the objective person 101, the walking state data obtainment process for obtaining the walking state data of the objective person 101 (step S200) is carried out…Then, the walking state determination part 109 calculates the predictive walking range 601 on the basis of the obtained information of the position coordinates of both feet 302 and 303…When walking is started, the predictive walking range 601 is predicted on the basis of the previously stored information until two or more steps are taken.”, para. 0182,
“Next,…In the objective person tracking process (step S500), the movement command 118 is transferred from the movement commanding part 117 of the tracking unit 1000 to the driving part 119 of the mobile unit 301 by the objective person confirmation data 116 and the walk vector 111 when t=t0, and the mobile unit 301 tracks the objective person 101 while keeping a prescribed distance (minimum distance) to the objective person 101…At this time, the position, which becomes the movement target of the mobile unit 301, is assumed to be located at, for example, the position coordinates of the midpoint between the position coordinates of the right foot 302 and the position coordinates of the left foot 303 of the objective person 101.”, para. 0185), and
control the motor to move the body to the moving point [Serial No. 15/883,841; Docket No. HI-1380]
(“Next, the driving part 119 of the mobile unit 301 starts driving on the basis of the movement command 118, and the mobile unit 301 is moved or braked so that the mobile unit 301 tracks the objective person 101 while keeping a prescribed distance (minimum distance) to the objective person 101 (step S505).”, para. 0175),
wherein the processor is further configured to: 
extract a person candidate region (“image region 890”, Fig. 12) based on the obtained image (“FIG. 12 is a view showing an image 880 obtained by the camera 103a in the walking start state when t=t0 (state of FIG. 10, i.e., state in which both feet are landed) and an image region 890, to which attention is paid, in the image 880”, para. 0107). 
Goto does not explicitly teach wherein the “walking state determination part 109”, “objective person discrimination part 114”, and “movement commanding part 117” are one component (i.e., Applicant’s “a processor”). However, one of ordinary skill in the art at the time of filing would have found it obvious to make these parts one part, as it has been held that making components integral involves only routine skill in the art. In re Larson, 340 F.2d 965, 968, 144 USPQ 347, 349 (CCPA 1965); Schenck v. Nortron Corp., 713 F.2d 782, 218 USPQ 698 (Fed. Cir. 1983). In this case, combining these parts would require only ordinary skill in the art at the time of filing. 
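For illustration only, the midpoint and walk vector relied upon above (Goto, para. 0185 and 0199) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Goto's implementation: the function names `midpoint` and `walk_vector`, the two-dimensional coordinate representation, and the sample foot coordinates are all hypothetical.

```python
from typing import Tuple

# (x, y) position in a ground-plane coordinate system (assumed representation)
Point = Tuple[float, float]


def midpoint(right_foot: Point, left_foot: Point) -> Point:
    """Midpoint between the right-foot and left-foot position coordinates."""
    return ((right_foot[0] + left_foot[0]) / 2.0,
            (right_foot[1] + left_foot[1]) / 2.0)


def walk_vector(mid_t0: Point, mid_t1: Point) -> Point:
    """Vector from the midpoint of both feet at t0 to the midpoint at t1."""
    return (mid_t1[0] - mid_t0[0], mid_t1[1] - mid_t0[1])


# Hypothetical foot coordinates (right foot, left foot) at two instants
m0 = midpoint((0.25, 0.0), (-0.25, 0.0))   # both feet landed at t0
m1 = midpoint((0.25, 0.5), (-0.25, 1.0))   # after the feet have advanced at t1
v = walk_vector(m0, m1)
```

In this sketch the midpoint serves as the movement target and the vector between successive midpoints indicates the moving direction, which parallels the mapping of the claimed "relative position" and "moving direction" above.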

Goto does not explicitly teach wherein the image sensor includes a depth sensor, the extracted leg pair is a plurality of leg pairs, and wherein the processor is further configured to: 
obtain a plurality of distance values, and
select, as a follow target, a leg pair, of the plurality of leg pairs, that is located closest to the depth sensor based on the plurality of distance values.

However, Tsai teaches an image counting method and apparatus comprising:
a processor (“processor 420”, Fig. 4) configured to:
obtain a plurality of distance values (“z coordinate therein usually represents the depth of an object”, see para. 0018 citation below, “z value”, see para. 0021 citation below) between each of a plurality of “persons m1, m2, m3” (see para. 0018, 0021 citations below and Fig. 2-3) and a depth sensor (“3D camera 410”, Fig. 4, “depth camera”, see para. 0018 citation below)
(“FIG. 2 is a diagram showing an embodiment of the 3D camera to illustrate step S102. Different from the 2D photography technique, the present invention employs the 3D camera to acquire 3D images of every object, and further acquires x, y, z coordinates and pixel data of a plurality of pixels in the 3D image. The 3D camera may use 3D cameras, or depth cameras able to obtain spatial coordinates of objects with infrared light or laser…Taking FIG. 2 for example, the z coordinate therein usually represents the depth of an object (distance between the object and the camera) in the region.”, para. 0018), and
determine an object (“person m3”, Fig. 3), of the plurality of objects, that is located closest to the depth sensor based on the plurality of distance values (“lowest z value”, see para. 0021 citation below)
(“FIGS. 3A, 3B and 3C are diagrams illustrating step S104 of establishing a spatial correlative coordinate (x, z, t). The term "t" represents the number of pixels at the same coordinate value (x, z) which have pixel data lower than a threshold in a y direction.”, para. 0020,
“FIG. 3A is a 2D image showing three specific objects (i.e., three persons in this embodiment) m1, m2 and m3 which are respectively away from the camera from a far point to a near point. In addition to these three specific objects, other objects and backgrounds are shown in FIG. 3A. In this embodiment, the pixel data is the RGB value. FIG. 3B shows images of FIG. 3A with a gray level, where the gray level values in FIG. 3B can be obtained by gray-scaling the RGB value based on the previously described technique. In FIG. 3B, the darkest area corresponds to the background, the less darkest area corresponds to the person m1 who is farthest away from the camera and has the highest z coordinate, the slightly bright area corresponds to the person m2 who is second farthest from the camera and has the second highest z coordinate, and the brightest area corresponds to the person m3 who is closest to the camera and has the lowest z value. ”, para. 0021).
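For illustration only, selecting the object located closest to the depth sensor from a plurality of distance values can be sketched as picking the candidate with the lowest z value. This is a minimal Python sketch, not Tsai's implementation: the function name `select_closest`, the dictionary layout, and the depth values assigned to m1, m2, and m3 are hypothetical.

```python
from typing import Dict, List


def select_closest(candidates: List[Dict]) -> Dict:
    """Return the candidate whose depth (z) value is lowest, i.e. nearest the sensor."""
    if not candidates:
        raise ValueError("no candidates to select from")
    return min(candidates, key=lambda c: c["z"])


# Hypothetical depth values: m1 farthest, m3 nearest
persons = [
    {"id": "m1", "z": 4.0},
    {"id": "m2", "z": 2.5},
    {"id": "m3", "z": 1.0},
]
closest = select_closest(persons)
```

With the assumed values, the selected candidate is m3, consistent with the quoted statement that the person closest to the camera has the lowest z value.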

Goto teaches controlling a robot to move based on a moving direction of a foot sole region of a leg pair, and Tsai teaches determining a person of a plurality of persons is closest to a depth sensor by 

Goto in view of Tsai does not explicitly teach wherein the processor is further configured to: 
compare the person shape model stored in the memory with the person candidate region,
determine that the extracted person candidate region is a person region when a matching rate between the extracted person candidate region and the person shape model is equal to or greater than a predetermined matching rate, and 
determine that the extracted person candidate region is an obstacle region when the extracted person candidate region is not the person region.

However, comparing a tracked object to a model, measuring similarity between the object and the model, and comparing the similarity against a threshold are known in the art. Zhao teaches a method and apparatus for pattern tracking, comprising:
a memory (“database”, see C7, lines 39-44 and C12, lines 32-33 citations below) configured to store a person shape model (“model”, Fig. 3) (“As will be understood and will become apparent below, embodiments of the present system may be operated in a computer environment including databases and other storage apparatuses, servers, processors, terminals and displays, computer-readable media, algorithms, and other computer-related components”, C7, lines 39-44, “the model is stored as a file in a database within the present system”, C12, lines 32-33); and
a processor (“processors”, see C7, lines 39-44 citation above) configured to:
extract a person candidate region (“features extracted”, see C15, lines 46-56 citation below) based on an obtained image (“shot or scene”, see C15, lines 46-56 citation below) (“Referring now to FIG. 5, a flowchart is shown describing an object tracking procedure 500 according to one embodiment of the present system. While the object tracking procedure 500 shown in FIG. 5 is for a singular shot or scene…At step 505, local features are extracted from an initial frame (usually the first frame in a shot), and a local object model is initialized for the given object”, C15, lines 46-56), 
compare the person shape model stored in the memory with the person candidate region (“If…the shot is complete, then an average similarity score is calculated between the optimal object images in each group and the global object models stored in the system (process 520)”, C17, lines 4-7, Fig. 5),
determine that the extracted person candidate region is a person region (“corresponding to the identified model”, “530”, Fig. 5) when a matching rate (“similarity”, “525”, Fig. 5) between the extracted person candidate region and the person shape model is equal to or greater than a predetermined matching rate (“predetermined threshold”, “525”, Fig. 5) (“YES” at “525”, Fig. 5), and 
determine that the extracted person candidate region is an obstacle region (“unknown”, “535”, Fig. 5) when the extracted person candidate region is not the person region (“NO” at “525”, Fig. 5) (“At step 525, each average confidence score or similarity measure is compared to a predetermined threshold. In one embodiment, the predetermined threshold is set by a system operator based on a desired accuracy and/or precision of recognition and/or tracking”, C17, lines 16-21, Fig. 5, “At step 525, if the calculated average similarity exceeds the predetermined threshold value, then the system automatically labels the object group as corresponding to the identified model (step 530)”, C17, lines 40-43, “If, however, the average similarity does not exceed the threshold, then the object group is labeled as unknown or unidentified (step 535)”, C17, lines 45-47; Zhao teaches wherein “the object image comprises an image of a face”, C5, lines 3-4, and thus “YES” at “525” (Fig. 5) results in the detected object being labeled as a face (i.e., a person) and “NO” at “525” results in the detected object not being a face).
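For illustration only, the threshold comparison mapped above can be sketched in Python. This is a minimal sketch under stated assumptions, not Zhao's implementation: the function name `label_region`, the label strings, the sample scores, and the threshold value are hypothetical. The sketch uses the "equal to or greater than" comparison recited in the claim, whereas the quoted passage of Zhao refers to the average similarity exceeding the threshold.

```python
from statistics import mean
from typing import Sequence


def label_region(similarity_scores: Sequence[float], threshold: float) -> str:
    """Label a candidate region by comparing its average similarity to a threshold."""
    average = mean(similarity_scores)  # average similarity against the stored model
    if average >= threshold:
        return "person region"
    # Any candidate that is not a person region is treated as an obstacle region
    return "obstacle region"


# Hypothetical similarity scores and threshold
label_a = label_region([0.9, 0.8], threshold=0.75)
label_b = label_region([0.5, 0.25], threshold=0.75)
```

Under the assumed values, the first candidate is labeled a person region and the second an obstacle region.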

Zhao states, “Recognition of objects within videos plays an important role for many video-related purposes, such as indexing and retrieval of videos based on identified objects, security and surveillance, and other similar functions. As used herein, the term "object" shall refer to a definable image within a video, such as a face, automobile, article of clothing, or virtually any other type of object. For example, FIG. 1 illustrates a sample frame of a video scene. Exemplary objects that are capable of being recognized within the illustrated video include characters' faces, a plant in a vase, a shoe, and an automobile tire, each of which is shown within a dashed box to indicate its detection and recognition within the frame. As will be understood, however, virtually any image may be detected and recognized within a given video” (C1, lines 27-40). Both Goto in view of Tsai and Zhao teach processors configured to perform object tracking and extract person candidate regions, and memories configured to store models. Thus, it would have been obvious to one of ordinary skill in the art at the time of filing to modify the invention of Goto in view of Tsai with the teachings of Zhao by implementing the similarity score comparison as taught by Zhao (“520”-“530”, Fig. 5). The motivation for doing so would be to “[identify] 

Regarding claim 11, Goto teaches a robot (“mobile unit 301”, Fig. 26) comprising: 
a memory (“objective person storage part 112”, Fig. 1A) configured to store a person shape model (“information that can identify the person”, see para. 0110 citation below) including a leg pair (“The data of the objective person 101 inputted from the camera 103a of the objective person measurement device 103 is exemplified as the image data of the whole body or the lower half of the body of the objective person 101 as shown in FIG. 12. Moreover, besides this, the data of the objective person 101 inputted from the input device 901 may include information that can identify the person, such as the physical features (face, figure, shoes size, etc.) of the objective person 101, the features of shoes and clothes (color, size, and shape of the shoes, color, size, and shape of clothes, etc.), data of the walking state (footstep etc.), and so on. It is acceptable to preparatorily store these data in the objective person storage part 112 of the tracking unit 1000”, para. 0110);
a motor (“driving devices 119b such as motors”, para. 0093, Fig. 1A) configured to move the robot through a region (“coordinate system”, para. 0081, Fig. 1B)
(“the mobile unit 301 has a pair of running wheels 119a, a pair of driving devices 119b such as motors for forwardly and reversely driving the pair of running wheels 119a, and a pair of drivers 119c for controlling the driving of the pair of driving devices 119b on the basis of the movement command 118”, para. 0093, “FIG. 1B shows the coordinate system used in the present invention with illustrations of an objective person 101 to be tracked, an arrow 102 indicating the walking state (for example, landing of a foot of the objective person) of the objective person 101, and the mobile unit 301.”, para. 0081); 
a sensor (“objective person measurement device 103”, Fig. 1A) configured to identify respective distances (“distance”, see para. 0107 citation below) between the robot and one or more legs (see pair of legs of “objective person 101” which comprises “feet 302 and 303”, Fig. 13, and “lower half of body”, para. 0110 citation below) within the region, wherein the legs are associated with at least one person (“objective person 101”, Fig. 13)
(“The operation steps are carried out principally in the order of an objective person specifying process in step S100, an objective person walking state data obtainment process in step S200, an objective person confirmation process in step S300, an objective person walking speed and angle calculation process in step S400, and an objective person tracking process in step S500”, para. 0096, 
“The step S100 of the process for specifying the objective person 101 is described in detail with reference to the flow chart of FIG. 3 and FIGS. 10, 11, and 12”, para. 0108, “First of all, the data of the objective person 101 to be accompanied by the mobile unit 301 is inputted to the objective person storage part 112, the walking state obtainment part 105, and the objective person discrimination part 114 of the tracking unit 1000 from the objective person measurement device 103”, para. 0109, “The data of the objective person 101 inputted from the camera 103a of the objective person measurement device 103 is exemplified as the image data of the whole body or the lower half of the body of the objective person 101 as shown in FIG. 12”, para. 0110, “The tracking unit 1000 first carries out the process for specifying the objective person 101 (step S100). That is, the images of both feet (the right foot of the objective person 101 is denoted by reference numeral 302, and the left foot is denoted by reference numeral 303) of the objective person 101 are obtained by the camera 103a”, para. 0179); and
a processor (“walking state determination part 109” and “objective person discrimination part 114”, “movement commanding part 117”, Fig. 1A) configured to control the motor to move the robot
(“The reference numeral 119 denotes a driving part of the walk tracking type mobile unit 301 equipped with the tracking unit 1000, for operating as follows by executing movement command 118 obtained from the movement commanding part 117 of the tracking unit 1000”, para. 0093),
wherein the processor is further configured to: 
extract a leg pair (see pair of legs of “objective person 101” which comprises “feet 302 and 303”, Fig. 13, and “lower half of body”, para. 0110 citation above) which is respectively associated with a foot sole region (“feet 302 and 303”, Fig. 26 and para. 0179 citation above) in the region using the sensor, wherein the foot sole region includes a left foot sole region (“foot 303”) and a right foot sole region (“foot 302”),
obtain a distance value (“distance”, see para. 0107 citation below) between each of the leg pair and the image sensor
(“FIG. 12 is a view showing an image 880 obtained by the camera 103a in the walking start state when t=t0 (state of FIG. 10, i.e., state in which both feet are landed) and an image region 890, to which attention is paid, in the image 880. It is noted that the dashed quadrangular frames 881 in the image 880 are added to the image after measurement in order to measure a distance between the mobile unit 301 and the objective person 101”, para. 0107), 
select, as a follow target, the leg pair (“The step S100 of the process for specifying the objective person 101”, para. 0110, this selection occurs in “step S100”, see Fig. 3, see also where the “walking state 102” of the “objective person” is measured in subsequent “S200” of Fig. 4, “The objective person measurement device 103 measures the information of the walking state 102 of the objective person 101 (step S201)…Next, the information of the walking state 102 measured in step S201”, para. 0114-0115), 
determine a moving point of the person (“walk vector 111”, Fig. 26, “It is further assumed that a line segment, which connects a midpoints 501 of both feet when t=t0 with a midpoint 502 of both feet when t=t1, is assumed to be the walk vector 111.”, para. 0199, Fig. 29) based on a moving direction (“arrow 102”, Fig. 26) of the foot sole region (“feet 302 and 303”, Fig. 26) of the selected leg pair and a relative position (“midpoint”, see para. 0185 citation below) between a left foot sole region (“left foot 303”) and a right foot sole region (“right foot 302”) included in the foot sole region (“feet 302 and 303”),
(“After the process for specifying the objective person 101 (step S100) ends to specify the objective person 101, the walking state data obtainment process for obtaining the walking state data of the objective person 101 (step S200) is carried out…Then, the walking state determination part 109 calculates the predictive walking range 601 on the basis of the obtained information of the position coordinates of both feet 302 and 303…When walking is started, the predictive walking range 601 is predicted on the basis of the previously stored information until two or more steps are taken.”, para. 0182,
“Next,…In the objective person tracking process (step S500), the movement command 118 is transferred from the movement commanding part 117 of the tracking unit 1000 to the driving part 119 of the mobile unit 301 by the objective person confirmation data 116 and the walk vector 111 when t=t0, and the mobile unit 301 tracks the objective person 101 while keeping a prescribed distance (minimum distance) to the objective person 101…At this time, the position, which becomes the movement target of the mobile unit 301, is assumed to be located at, for example, the position coordinates of the midpoint between the position coordinates of the right foot 302 and the position coordinates of the left foot 303 of the objective person 101.”, para. 0185), and
extract a person candidate region (“image region 890”, Fig. 12) based on the obtained image (“FIG. 12 is a view showing an image 880 obtained by the camera 103a in the walking start state when t=t0 (state of FIG. 10, i.e., state in which both feet are landed) and an image region 890, to which attention is paid, in the image 880”, para. 0107). 
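Goto does not explicitly teach wherein the “walking state determination part 109”, “objective person discrimination part 114”, and “movement commanding part 117” are one component (i.e., Applicant’s “a processor”). However, one of ordinary skill in the art at the time of filing would have found it obvious to make these parts one part, as it has been held that making components integral involves only routine skill in the art. In re Larson, 340 F.2d 965, 968, 144 USPQ 347, 349 (CCPA 1965); Schenck v. Nortron Corp., 713 F.2d 782, 218 USPQ 698 (Fed. Cir. 1983). In this case, combining these parts would require only ordinary skill in the art at the time of filing. 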

Goto does not explicitly teach wherein the sensor is a depth sensor, wherein the extracted leg pair is a plurality of leg pairs, and wherein the processor is further configured to: 
obtain a plurality of distance values, and
select, as a follow target, a leg pair, of the plurality of leg pairs, that is located closest to the depth sensor based on the plurality of distance values.

However, Tsai teaches an image counting method and apparatus comprising a depth sensor (“3D camera 410”, Fig. 4, “depth camera”, see para. 0018 citation below), and a processor (“processor 420”, Fig. 4) configured to:
obtain a plurality of distance values (“z coordinate therein usually represents the depth of an object”, see para. 0018 citation below, “z value”, see para. 0021 citation below) between each of a plurality of “persons m1, m2, m3” (see para. 0018, 0021 citations below and Fig. 2-3) and the depth sensor 
(“FIG. 2 is a diagram showing an embodiment of the 3D camera to illustrate step S102. Different from the 2D photography technique, the present invention employs the 3D camera to acquire 3D images of every object, and further acquires x, y, z coordinates and pixel data of a plurality of pixels in the 3D image. The 3D camera may use 3D cameras, or depth cameras able to obtain spatial coordinates of objects with infrared light or laser…Taking FIG. 2 for example, the z coordinate therein usually represents the depth of an object (distance between the object and the camera) in the region.”, para. 0018), and
determine an object (“person m3”, Fig. 3), of the plurality of objects, that is located closest to the depth sensor based on the plurality of distance values (“lowest z value”, see para. 0021 citation below)
(“FIGS. 3A, 3B and 3C are diagrams illustrating step S104 of establishing a spatial correlative coordinate (x, z, t). The term "t" represents the number of pixels at the same coordinate value (x, z) which have pixel data lower than a threshold in a y direction.”, para. 0020,
“FIG. 3A is a 2D image showing three specific objects (i.e., three persons in this embodiment) m1, m2 and m3 which are respectively away from the camera from a far point to a near point. In addition to these three specific objects, other objects and backgrounds are shown in FIG. 3A. In this embodiment, the pixel data is the RGB value. FIG. 3B shows images of FIG. 3A with a gray level, where the gray level values in FIG. 3B can be obtained by gray-scaling the RGB value based on the previously described technique. In FIG. 3B, the darkest area corresponds to the background, the less darkest area corresponds to the person m1 who is farthest away from the camera and has the highest z coordinate, the slightly bright area corresponds to the person m2 who is second farthest from the camera and has the second highest z coordinate, and the brightest area corresponds to the person m3 who is closest to the camera and has the lowest z value. ”, para. 0021).
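For illustration only, the spatial correlative coordinate (x, z, t) quoted from Tsai (para. 0020) can be sketched as a count, for each (x, z) pair, of the pixels whose pixel data falls below a threshold along the y direction. This is a minimal Python sketch, not Tsai's implementation: the function name `spatial_counts`, the pixel tuple layout, and the sample pixel values and threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (x, y, z, value): image column, image row, depth, and gray-level pixel data (assumed layout)
Pixel = Tuple[int, int, int, int]


def spatial_counts(pixels: Iterable[Pixel], threshold: int) -> Dict[Tuple[int, int], int]:
    """For each (x, z), count the pixels along y whose value is lower than the threshold."""
    counts: Counter = Counter()
    for x, _y, z, value in pixels:
        if value < threshold:
            counts[(x, z)] += 1
    return dict(counts)


# Hypothetical pixels: three share (x=0, z=5), one of which is above the threshold
pixels = [
    (0, 0, 5, 10),
    (0, 1, 5, 12),
    (0, 2, 5, 200),
    (1, 0, 7, 20),
]
counts = spatial_counts(pixels, threshold=50)
```

Each resulting entry pairs an (x, z) location with its count t, which is the form of coordinate the quoted passage describes.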

Goto teaches controlling a robot to move based on a moving direction of a foot sole region of a leg pair, and Tsai teaches determining a person of a plurality of persons is closest to a depth sensor by measuring depth from the person to the sensor. Tsai further teaches the person may be stationary or 

Goto in view of Tsai do not explicitly teach wherein the processor is further configured to: 
compare the person shape model stored in the memory with the person candidate region,
determine that the extracted person candidate region is a person region when a matching rate between the extracted person candidate region and the person shape model is equal to or greater than a predetermined matching rate, and 
determine that the extracted person candidate region is an obstacle region when the extracted person candidate region is not the person region.

However, comparing a tracked object to a model, measuring similarity between the object and model, and comparing the similarity against a threshold are known in the art. Zhao teaches a method and apparatus for pattern tracking, comprising:
a memory (“database”, see C7, lines 39-44 and C12, lines 32-33 citations below) configured to store a person shape model (“model”, Fig. 3) (“As will be understood and will become apparent below, embodiments of the present system may be operated in a computer environment including databases and other storage apparatuses, servers, processors, terminals and displays, computer-readable media, algorithms, and other computer-related components”, C7, lines 39-44, “the model is stored as a file in a database within the present system”, C12, lines 32-33); and
a processor (“processors”, see C7, lines 39-44 citation above) configured to:
extract a person candidate region (“features extracted”, see C15, lines 46-56 citation below) based on an obtained image (“shot or scene”, see C15, lines 46-56 citation below) (“Referring now to FIG. 5, a flowchart is shown describing an object tracking procedure 500 according to one embodiment of the present system. While the object tracking procedure 500 shown in FIG. 5 is for a singular shot or scene…At step 505, local features are extracted from an initial frame (usually the first frame in a shot), and a local object model is initialized for the given object”, C15, lines 46-56), 
compare the person shape model stored in the memory with the person candidate region (“If…the shot is complete, then an average similarity score is calculated between the optimal object images in each group and the global object models stored in the system (process 520)”, C17, lines 4-7, Fig. 5),
determine that the extracted person candidate region is a person region (“corresponding to the identified model”, “520”, Fig. 5) when a matching rate (“similarity”, “525”, Fig. 5) between the extracted person candidate region and the person shape model is equal to or greater than a predetermined matching rate (“predetermined threshold”, “525”, Fig. 5) (“YES” at “525”, Fig. 5), and 
determine that the extracted person candidate region is an obstacle region (“unknown”, “535”, Fig. 5) when the extracted person candidate region is not the person region (“NO” at “525”, Fig. 5) (“At step 525, each average confidence score or similarity measure is compared to a predetermined threshold. In one embodiment, the predetermined threshold is set by a system operator based on a desired accuracy and/or precision of recognition and/or tracking”, C17, lines 16-21, Fig. 5, “At step 525, if the calculated average similarity exceeds the predetermined threshold value, then the system automatically labels the object group as corresponding to the identified model (step 530)”, C17, lines 40-43, “If, however, the average similarity does not exceed the threshold, then the object group is labeled as unknown or unidentified (step 535)”, C17, lines 45-47, Zhao teaches wherein “the object image comprises an image of a face”, C5, lines 3-4, and thus “YES” at “525” (Fig. 5) results in the detected object being labeled as a face (i.e., a person) and “NO” at “525” results in the detected object not being a face).
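For illustration only, the threshold comparison mapped above (average similarity compared against a predetermined threshold, with the group labeled as the identified model or as unknown) can be sketched as follows. This is a minimal sketch and not Zhao's implementation; the function name `label_candidate_region`, the labels "person" and "unknown", and the `inclusive` parameter are assumptions introduced here. The `inclusive` flag is included only because the claim language recites "equal to or greater than" while the quoted Zhao passage recites "exceeds".

```python
from typing import Sequence


def label_candidate_region(
    similarities: Sequence[float],
    threshold: float,
    inclusive: bool = True,
) -> str:
    """Label a candidate region from its similarity scores against stored models.

    The scores are averaged and the average is compared to the threshold.
    inclusive=True uses ">=" (the claim wording);
    inclusive=False uses ">" (the "exceeds" wording in the quoted passage).
    """
    if not similarities:
        raise ValueError("at least one similarity score is required")
    average = sum(similarities) / len(similarities)
    matched = average >= threshold if inclusive else average > threshold
    return "person" if matched else "unknown"
```

The two comparison modes differ only when the average equals the threshold exactly; in every other case they return the same label.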

Zhao states, “Recognition of objects within videos plays an important role for many video-related purposes, such as indexing and retrieval of videos based on identified objects, security and surveillance, and other similar functions. As used herein, the term "object" shall refer to a definable image within a video, such as a face, automobile, article of clothing, or virtually any other type of object. For example, FIG. 1 illustrates a sample frame of a video scene. Exemplary objects that are capable of being recognized within the illustrated video include characters' faces, a plant in a vase, a shoe, and an automobile tire, each of which is shown within a dashed box to indicate its detection and recognition within the frame. As will be understood, however, virtually any image may be detected and recognized within a given video” (C1, lines 27-40). Both Goto in view of Tsai and Zhao teach processors configured to perform object tracking, extract person candidate regions, and memories configured to store models. Thus, it would have been obvious to one of ordinary skill in the art at the time of filing to modify the invention of Goto in view of Tsai with the teachings of Zhao by implementing the similarity score comparison as taught by Zhao (“520”-“530”, Fig. 5). The motivation for doing so would be to “[identify] 

Regarding claims 3 and 13, Goto further teaches wherein the processor (“image processing part 103b”, “walking state obtainment part 105”, “walking state determination part 109”, “objective person discrimination part 114”, Fig. 1A) obtains a point (“midpoint”, also referred to as “SrO”, Fig. 19) spaced apart from a center of a line (see center of “line segment OhLOFs”, Fig. 13), connecting a center (“Oh-LO”, Fig. 19) of the left foot sole region (“left foot 303”, Fig. 26) and a center (“Oh-RO”, Fig. 19) of the right foot sole region (“right foot 302”, Fig. 26), by a certain distance in a direction opposite to the moving direction (“arrow 102”, Fig. 26), and determines the point as the moving point
(“Sr0 is the movement target of the mobile unit 301 and herein allowed to have the position coordinates of the midpoint between the position coordinates OhRO of the right foot 302 and the position coordinates OhLO of the left foot 303”, para. 0107, “the objective person tracking process (step S500) is started…the position, which becomes the movement target of the mobile unit 301, is assumed to be located at, for example, the position coordinates of the midpoint between the position coordinates of the right foot 302 and the position coordinates of the left foot 303 of the objective person 101.”, para. 0185).
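For illustration only, the geometry of the claimed moving point (a point offset from the midpoint between the two foot centers by a certain distance opposite to the moving direction) can be sketched as follows. This is a minimal sketch under stated assumptions and not Goto's implementation: the function name `moving_point`, the 2D tuple representation, and the `offset` parameter are introduced here. With `offset` set to zero the result reduces to the midpoint described in the Goto citation above.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def moving_point(left: Point, right: Point, direction: Point, offset: float) -> Point:
    """Return the midpoint of the two foot centers, shifted by `offset`
    in the direction opposite to the (non-zero) moving direction."""
    mid_x = (left[0] + right[0]) / 2.0
    mid_y = (left[1] + right[1]) / 2.0
    norm = math.hypot(direction[0], direction[1])
    if norm == 0.0:
        raise ValueError("moving direction must be non-zero")
    unit_x, unit_y = direction[0] / norm, direction[1] / norm
    # Subtracting moves the point backwards along the direction of travel.
    return (mid_x - offset * unit_x, mid_y - offset * unit_y)
```

For foot centers at (0, 0) and (2, 0) with travel along the positive y axis and an assumed offset of 0.5, the sketch yields a point half a unit behind the midpoint (1, 0).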

Regarding claims 5 and 15, Goto further teaches wherein the moving regions (“frames 881”, Fig. 12, 14-16, 18) include a first moving region (“frames 881” which comprise “image region 890”) that is associated with the person region (“image region 890”, Fig. 12, 14-16, 18) and a second moving region (portions of “frames 881” which do not comprise “image region 890”) that is not associated with the person region, and the processor (“image processing part 103b”, “walking state obtainment part 105”, “walking state determination part 109”, “objective person discrimination part 114”, Fig. 1A): 
identifies the second moving region as an obstacle region (“an obstacle is detected in the image picked-up from the camera”, see para. 0120 citation below), and 
controls the motor so that the body avoids the obstacle region (“the angle of the predictive walking range predicting at that time may temporarily be 180 degrees.”, see para. 0120 citation below)
(“In a case where generation of a sudden movement of the objective person 101 is predicted, for example, when an obstacle is detected in the image picked-up from the camera, the angle of the predictive walking range predicting at that time may temporarily be 180 degrees.”, para. 0120).

Regarding claim 6, Tsai further teaches wherein the image sensor is a depth sensor (“3D camera 410”, Fig. 4, “depth camera”, see rejection to claim 1).
Regarding claim 16, Goto further teaches wherein the sensor generates image information (“captured image”, see para. 0083 citation below) about a portion of the region (“an image capturing device 103a such as a camera for detecting the walking state 102…and an image processing part 103b for processing the captured image or being constructed of a sensor part that serves as a signal receiver”, para. 0083).
Regarding claims 6 and 16, Tsai further teaches wherein the depth sensor includes at least one receptor to detect light reflected (“infrared light or laser”, “The 3D camera may use 3D cameras, or depth cameras able to obtain spatial coordinates of objects with infrared light or laser.”, para. 0018) from the portion of a region (“region”, “In various embodiments, the camera lens may be disposed around or integrated into a display for obtaining the spatial coordinate of a user facing the display. Taking FIG. 2 for example, the z coordinate therein usually represents the depth of an object (distance between the object and the camera) in the region.”, para. 0018), the receptor of the depth sensor being selectively activated during different time periods (multiple time periods is inherent, as Tsai teaches the “3D camera” acquires multiple “3D images”, wherein the “3D camera” has a “camera lens”, i.e., to capture multiple images with one lens, the camera must be activated at different times), and
wherein the depth sensor generates the image information based on the respective amounts of light detected by the at least one receptor during the different time periods (“FIG. 2 is a diagram showing an embodiment of the 3D camera to illustrate step S102. Different from the 2D photography technique, the present invention employs the 3D camera to acquire 3D images of every object, and further acquires x, y, z coordinates and pixel data of a plurality of pixels in the 3D image.”, para. 0018).

Regarding claims 7 and 17, Tsai further teaches wherein the robot further obtains color image information (“the present invention employs the 3D camera to acquire 3D images of every object, and further acquires x, y, z coordinates and pixel data of a plurality of pixels in the 3D image”, para. 0018, “Generally, the pixel data can be RGB values (value of Red color channel, value of Green color channel, and value of Blue color channel) or gray level values obtained by mapping from the RGB values.”, para. 0019) from the “3D camera” about the portion of the region, but
Goto in view of Tsai do not explicitly teach an RGB sensor in addition to the “3D sensor”, or wherein when the moving region cannot be extracted through the image information obtained by the depth sensor, the processor extracts the moving region by using the color image information obtained by the RGB sensor.
However, it would have been obvious to one of ordinary skill in the art at the time of filing to utilize one type of sensor as a back-up to, or in addition to, another sensor. It is well-known in the art to equip autonomous devices with more than one sensor type configured to perform the same function. 

Regarding claims 9 and 19, Goto further teaches wherein the sensor is provided on an upper front section (see Fig. 26 with “image processing part 103b” and “camera 103a” both on an upper front section) of the body. 

Regarding claim 20, Tsai further teaches wherein the “3D camera 410” detects a location (“y and z coordinate values”, see para. 0029 citation below) of the “persons m1, m2 m3” (see rejection to claim 11) within a region (“region”, see para. 0029 citation below)
(“The 3D camera 410 is used for acquiring 3D images from the region, where the 3D images comprise a plurality of pixels, and the pixels have x, y and z coordinate values and pixel data. The mapping unit 422 of the present invention is used for mapping the x, y and z coordinate values and the pixel data of the pixels to a plurality of correlative coordinate values of a spatial correlative coordinate represented as (x, z, t)”, para. 0029), 
wherein the processor further: 
generates a map (“mapping the x, y and z coordinate values”, see para. 0029 citation above) based on the location of the obstacle within the region, but Tsai does not explicitly teach a sensor in St. Regis Paper Co. v. Bemis Co., 193 USPQ 8.
Further, Goto teaches detecting an obstacle (“obstacle”, see para. 0120 citation below) within the region and directing the robot toward the moving point based on the obstacle detection (“In a case where generation of a sudden movement of the objective person 101 is predicted, for example, when an obstacle is detected in the image picked-up from the camera, the angle of the predictive walking range predicting at that time may temporarily be 180 degrees. Here, a case where an obstacle is a moving matter or a case where the objective person 101 does not become aware of an obstacle is conceivable as a case where occurrence of a sudden movement of the objective person 101 is predicted.”, para. 0120). 
Thus, since Tsai teaches mapping object locations, and Goto teaches maneuvering the “mobile unit 301” based on obstacle detection, it would have been obvious to one of ordinary skill in the art at the time of filing to further modify the invention of Goto in view of Tsai further in view of Zhao with these teachings of Tsai by directing the “mobile robot 301” of Goto toward the moving point based on a map as taught by Tsai (para. 0029). This would achieve the predictable result of maneuvering the robot based on a map stored in memory. KSR International Co. v. Teleflex Inc. (KSR), 550 U.S. 398, 82 USPQ2d 1385 (2007).
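For illustration only, the mapping of pixel coordinates to a spatial correlative coordinate (x, z, t) quoted from Tsai above, where t counts the pixels at the same (x, z) whose pixel data is lower than a threshold in the y direction, can be sketched as follows. This is a minimal sketch and not Tsai's implementation; the function name `map_to_xzt`, the `(x, y, z, value)` tuple layout, and the sample values in the usage note are assumptions introduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pixel = Tuple[int, int, int, float]  # hypothetical layout: (x, y, z, pixel value)


def map_to_xzt(pixels: Iterable[Pixel], threshold: float) -> Dict[Tuple[int, int], int]:
    """Return {(x, z): t}, where t is the number of pixels sharing that (x, z)
    whose pixel value is lower than the threshold (counted over all y)."""
    counts: Counter = Counter()
    for x, _y, z, value in pixels:
        if value < threshold:
            counts[(x, z)] += 1
    return dict(counts)
```

For example, three assumed pixels in the column x = 0, z = 1 with values 10, 20, and 200 and a threshold of 100 would give t = 2 for that (x, z) pair.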


Claim(s) 10 and 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goto et al. (US 2006/0064203 A1), in view of Tsai et al. (US 2012/0120196 A1), further in view of Zhao et al. (US 8,170,280 B2), further in view of Schultink et al. (US 2018/0000305 A1), hereafter referred to as Schultink.
Regarding claims 10 and 21, Goto in view of Tsai do not explicitly teach wherein the robot further includes a suction head configured to suck in dust, and 
wherein the suction head is separated from the cleaner body and is connected to the cleaner body by a suction hose.
However, Schultink teaches an autonomously driven floor vacuum cleaner comprising:
a cleaner body (“canister module”, see para. 0028 citation below); 
a suction head (“cleaning head module”, see para. 0028 citation below) that sucks in dust from a floor, wherein the suction head (“cleaning head module”) is separated from the cleaner body (“canister module”) and is connected to the cleaner body by a suction hose (“suction hose”, see para. 0028 citation below)
(“The cleaning head module is in this respect separated from the canister module either at the hose or together with the hose and is replaced with a cleaning head or cleaning tool, etc. manually operable by a user. This cleaning head connected to the canister module…also comprises a suction hose via which the connection to the canister module takes place.”, para. 0028); 
an image sensor (“sensor”, see para. 0035 citation below) that obtains image information about a portion (“surrounding space”, see para. 0035 citation below) of the region 
(“The control contained in the autonomously operable vacuum cleaner preferably has at least one sensor for mapping the surrounding space, in particular at least one camera sensor, sonar sensor, lidar sensor, infrared sensor or 3D scanner sensor”, para. 0035);
a motor (“drive mechanism”, see para. 0009 citation below) that is selectively activated to move the cleaner body through a region (“wherein both the cleaning head module and the canister module each have a drive mechanism that provides independent mobility to the respective modules”, para. 0009); and 
a processor (“control”, para. 0032) that controls the motor to move the cleaner body (“canister module”) to a moving point (“a user”, see para. 0033 citation below)
(“The user thus has the choice whether he sets the canister module into a quasi-autonomous operating state in which the canister module follows the user; this function can also be called a "follow me" function. In this case, the navigation function is configured such that it recognizes a user and follows him in the event that the user, for example, moves over a predefined distance away from the canister module.”, para. 0033).
Goto states in para. 0008, “the present invention has the object of solving the issues of the prior art and providing a method for making a mobile unit accompany an objective person, capable of easily predicting and then tracking the walk of the objective person without any special tool or the like regardless of the age and the movement of the objective person”.
Schultink teaches separating a suction head from a cleaner body in an autonomous cleaner. Schultink states, “The vacuum cleaner that is autonomously operable in accordance with the invention is now characterized in that the cleaning head module and the canister module can be separated from one another. Such a separation of the individual modules from one another makes possible a diverse spectrum of further usage options of the autonomously operable vacuum cleaner…Both the canister module and the cleaning module can in this respect and independently of one another have three or four wheels…The canister module and the cleaning module can in particular be driven independently of one another. They can move in different directions, for example. One of the two modules can also not be moved while the other is moved.”, para. 0011-0014. 
Thus, since Goto in view of Tsai further in view of Zhao teach a mobile unit which is configured to follow a user, and Schultink teaches a vacuum which is configured to follow a user, it would have been obvious to one of ordinary skill in the art at the time of filing to modify the invention of Goto in view of Tsai further in view of Zhao with the teachings of Schultink by implementing the methodologies of Goto . 
Response to Arguments







Applicant's arguments filed 12/30/2020 have been fully considered but they are not persuasive. 
Applicant states, pg. 12,
“Thus, GOTO et al. generally compares captured foot data or walking gate data with stored data for a particular objective user 101 to verify whether the particular objective user 101 is being followed, and does not perform any type of analysis based on comparing an extracted person candidate region to a person shape model stored in the memory to determine whether the extracted person candidate region is a person or an obstacle, as recited in claim 1…Thus, TSAI et al. generally relates to processing pixels in threedimensional image of an region to identifies objects within the region, and does not perform any type of analysis based on comparing an extracted person candidate region to a person shape model stored in the memory to determine whether the extracted person candidate region is a person or an obstacle, as recited in claim 1.”

As stated above in this Office action, while Goto teaches an “objective person storage part” (Fig. 1A) configured to store “information that can identify a person”, and a processor configured to extract an “image region” (Fig. 12) based on the “captured image” (para. 0083), Zhao is relied upon, as necessitated by Applicant’s amendments, to teach the “extract”, “compare”, and two “determine” steps of the amendments of the independent claims. 

Applicant further states, pg. 13,

“…describes a "follow me" function in which a cleaner moves to follow a user, SCHULTINK et al. does not discuss how this function is accomplished or how the robot vacuum cleaner would differentiate a followed person from an obstacle.”

As stated above in this Office action and in the final rejection, Schultink is relied upon to teach the limitations of claims 10 and 21. The examiner parallels the similarities between the robots of Goto in view of Tsai further in view of Zhao and that of Schultink (i.e., a body, an image sensor, a motor, and a processor configured to move the robot to a point) to support the combination of the suction head of Schultink with the robot of Goto in view of Tsai further in view of Zhao. Schultink is not relied upon to teach the limitations of the independent claims related to the extraction of a “person candidate region”. 
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:  
Guan et al. (US 9,665,767 B2) teaches “comparing [a] new model and a reference model, determining a position whose new model generates a highest similarity score, determining whether that similarity score is greater than a predetermined threshold, and wherein if it is determined that the similarity score is greater than the predetermined threshold, the object is tracked” (Abstract).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMELIA VORCE whose telephone number is (313) 446-4917.  The examiner can normally be reached on Monday-Friday, 8AM- 5PM, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christian Chace can be reached at (571) 272-4190.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.	
	/A.V./               Examiner, Art Unit 3665                                                                                                                                                                                       
/CHRISTIAN CHACE/Supervisory Patent Examiner, Art Unit 3665