DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is responsive to the correspondence filed on 8/30/19.
Claims 1-13 are presented for examination.

IDS Considerations

The information disclosure statements (IDS) submitted on 4/13/20 and 8/30/19 are being considered by the examiner, as the submissions are in compliance with the provisions of 37 CFR 1.97.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-6 and 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Mallinson (U.S. Pub. No. 20190077007 A1), in view of Shavit (U.S. Pub. No. 20170368413 A1).

Regarding claims 1 and 9:

1. Mallinson teaches an artificial intelligence learning method comprising: (Mallinson [0010] the system includes a robot communicating with a server hosted on a cloud system, over a network. The robot includes artificial intelligence (AI) logic that is configured to detect presence of a user in a geo-location in which the robot is present and to identify an activity that the user is scheduled to perform at a current time. The AI logic includes a plurality of logic modules that communicate with the server to obtain relevant information)
receiving data acquired through an image acquisition unit comprising one or more cameras and a sensor unit comprising one or more sensors; (Mallinson [0060] FIG. 3A illustrates one such representation where the sensor fusion module 117a of the AI logic 114 receives data from various sensors (motion sensors, proximity sensors, cameras, image capturing devices, etc.,) as input, processes the data and outputs one or more models of the user)
generating, when a user’s action is detected from image data including a user, acquired by the one or more cameras, an on-screen label based on the image data including the user; (Mallinson [0028] FIG. 1 illustrates a system in which a robot is used to track activity of a user 101 and to provide feedback to the user 101, in accordance with an implementation. A user 101 may be interested in performing an activity, such as an exercise routine, in a geo-location 100 and a robot 110 in the geo-location 100 that is associated with the user, is used to detect the user 101 [on-screen label] performing the various exercises in the exercise routine, move into position proximal to the user to capture images of the user performing the exercise routine and provide feedback to enable the user to improve their posture when performing the exercises in the exercise routine)
generating an off-screen label based on sensing data acquired by the sensor unit when the image data including the user is acquired by the one or more cameras; and (Mallinson [0060] FIG. 3A illustrates one such representation where the sensor fusion module 117a of the AI logic 114 receives data from various sensors (motion sensors, proximity sensors, cameras, image capturing devices, etc.,) as input, processes the data and outputs one or more models of the user. For example, the logic in the sensor fusion module 117a processes the data to generate models for a posture held by the user [off-screen label] when performing an exercise. The models that are generated for a single posture may identify the posture from different angles or view points. The models that are generated are stick-figure models that substantially mimic a skeletal outline of the posture held by the user [image data excluding the user]. The stick-figure models include dots representing various pivot points and lines that represent the limbs or body parts on which the pivot points are located)
artificial intelligence based on the on-screen label and the off-screen label. (Mallinson [0035] the robot 110 may then position itself in one or more areas in the geo-location proximate to the user so that the robot may be able to capture images of the user performing the exercises in the exercise routine. The images of the user performing the exercise may be compared against one or more virtual models and artificial intelligence (AI) logic within the robot 110 may be used to provide feedback to the user. Mallinson [0028], [0060] FIG. 3A)

Mallinson does not explicitly teach training an artificial intelligence configured to recognize human actions.

However, Shavit teaches training an artificial intelligence configured to recognize human actions. (Shavit FIG. 30-31 [0921] 5. artificial intelligence or machine learning algorithms: Finding Y may be based on the use of learning and classification algorithms, Machine Learning, Deep Learning and alike. [0923] i. The algorithm has a pre-defined and/or configurable set of attributes. This set may be, for example, any group of system-inputs or training measures as defined above, any feature in the user or in the user data or environment)

It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Mallinson by further incorporating the teachings of Shavit, both references being in video/camera technology. One would have been motivated to do so in order to incorporate training an artificial intelligence configured to recognize human actions. This functionality would improve efficiency.
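
For illustration only, the labeling limitations of claim 1 as mapped above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation disclosed by the application, Mallinson, or Shavit. The names Frame, stub_detector, and generate_labels are hypothetical, and the action detector is a placeholder supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Frame:
    """One capture: camera pixels plus the sensor readings taken at the
    same time (hypothetical structure, not taken from any cited reference)."""
    image: List[List[int]]
    sensing: List[float]


def stub_detector(image: List[List[int]]) -> Optional[str]:
    # Hypothetical stand-in for action detection: reports "wave" whenever
    # any pixel is non-zero, otherwise reports that no action was detected.
    return "wave" if any(any(row) for row in image) else None


def generate_labels(
    frames: List[Frame],
    detect_action: Callable[[List[List[int]]], Optional[str]],
) -> Tuple[list, list]:
    """Pair each detected action with the image (on-screen label) and with
    the sensing data acquired at the same time (off-screen label)."""
    on_screen, off_screen = [], []
    for frame in frames:
        action = detect_action(frame.image)
        if action is None:
            continue  # no user action detected, so no labels are generated
        on_screen.append((frame.image, action))
        off_screen.append((frame.sensing, action))
    return on_screen, off_screen
```

Under these assumptions, both label lists could then be passed to whatever training procedure is used; the sketch does not address the training step itself.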

Regarding claim 2:

2. Mallinson teaches the artificial intelligence learning method according to claim 1, wherein, in the generating the on-screen label, (Mallinson [0028] FIG. 1 illustrates a system in which a robot is used to track activity of a user 101 and to provide feedback to the user 101, in accordance with an implementation. A user 101 may be interested in performing an activity, such as an exercise routine, in a geo-location 100 and a robot 110 in the geo-location 100 that is associated with the user, is used to detect the user 101 [on-screen label] performing the various exercises in the exercise routine, move into position proximal to the user to capture images of the user performing the exercise routine and provide feedback to enable the user to improve their posture when performing the exercises in the exercise routine) the on-screen label is generated based on the image data including the user and the sensing data. (Mallinson [0034] the external camera 109 captures images of the user and transmits the captured images to the robot 110 using wireless communication protocol. The robot 110 may use the user attributes captured by the sensors and the images of the user captured by the one or more external cameras 109 to verify the user. In one implementation, the robot may have been previously associated with a particular user and the verification is to ensure that the user detected in the geo-location [sensing data] is the particular user that was previously associated with the robot)

Regarding claim 3:

3. Mallinson teaches the artificial intelligence learning method according to claim 1, wherein, in the generating the off-screen label, the off-screen label is generated based on the sensing data and image data excluding the user. (Mallinson [0060] FIG. 3A illustrates one such representation where the sensor fusion module 117a of the AI logic 114 receives data from various sensors (motion sensors, proximity sensors, cameras, image capturing devices, etc.,) as input, processes the data and outputs one or more models of the user. For example, the logic in the sensor fusion module 117a processes the data to generate models for a posture held by the user [off-screen label] when performing an exercise. The models that are generated for a single posture may identify the posture from different angles or view points. The models that are generated are stick-figure models that substantially mimic a skeletal outline of the posture held by the user [image data excluding the user]. The stick-figure models include dots representing various pivot points and lines that represent the limbs or body parts on which the pivot points are located)

Regarding claim 4:

4. Mallinson teaches the artificial intelligence learning method according to claim 3. Mallinson does not explicitly teach further comprising generating the image data excluding the user by removing image data corresponding to the user from the image data including the user.

However, Shavit teaches further comprising generating the image data excluding the user by removing image data corresponding to the user from the image data including the user. (Shavit [0082] the system may monitor and track the location of certain body parts of the user, the user's direction of motion, acceleration and other kinematic measurements. This data may be incorporated with other sensors data to achieve for example identification of the exact exercise the user is doing or creating an image or skeleton diagram or alike representing the user and its motion. This image or diagram can be 2D or 3D. Shavit [0413] At S2850, the collection of pixels or shapes or other form of machine representation ingredients representing the user in 3D/2D space is attained from various sensors and filtered from the background as described herein above. Alternatively, or additionally data from other sensors such as accelerometers or other sensors mentioned in this disclosure or in the references or known in the art, which measure movement and or location and or orientation and alike is obtained. At S2860, the candidate filtered possible reference skeleton or body maps or exercises postures are projected on this stage attained 3D/2D representations of the user in 3D space. Alternatively, or additionally data movement and or location and or orientation and alike is obtained from additional sensors is compared to the locations in the reference skeleton or body maps. [0053] The sensors may be a camera or an optical sensor that can produce 2D or 3D still or video images. The sensors may further include acoustical LIDAR or RADAR sensors that can produce 2D or 3D still or moving images or mapping in any method known in the art. Applying the same computer-generated background to any image would have been obvious in view of Shavit, because Shavit [0361] two or more of the sensors can produce 3D images of the target/s in methods reviewed above or other methods known in the art. The first steps toward creating body or skeleton mapping can be the steps from the previous paragraphs (Separating from background, reducing noise . . . ) for every 3D sensor. One preferable step is separating the collection of points and/or shapes representing the target/s from the background. The separated background does not include the user.)
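
For illustration only, the claim 4 limitation of removing image data corresponding to the user can be sketched as a masking operation. The sketch assumes that a per-pixel user mask is already available from some segmentation step; neither the mask source nor the function name remove_user comes from the application or the cited references.

```python
from typing import List


def remove_user(
    image: List[List[int]],
    user_mask: List[List[bool]],
    fill: int = 0,
) -> List[List[int]]:
    """Return a copy of `image` in which every pixel flagged in `user_mask`
    is replaced by `fill`, leaving only background pixels unchanged.

    `user_mask` is assumed (hypothetically) to be produced by a separate
    segmentation step and to have the same shape as `image`.
    """
    same_rows = len(image) == len(user_mask)
    if not same_rows or any(len(r) != len(m) for r, m in zip(image, user_mask)):
        raise ValueError("image and user_mask must have the same shape")
    return [
        [fill if is_user else pixel for pixel, is_user in zip(row, mask_row)]
        for row, mask_row in zip(image, user_mask)
    ]
```

The input image is left unmodified, so the image data including the user and the image data excluding the user remain available side by side.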

Regarding claim 5:

5. Mallinson teaches the artificial intelligence learning method according to claim 3, wherein the image data excluding the user is image data at a different point in time from a point in time when the image data including the user is acquired. (Mallinson [0064] FIG. 4 the presence of the user in the geo-location activates the AI logic, which then tracks the user's movement by activating various sensors and cameras of the robot. The activated sensors and cameras capture the image of the user while the user is performing an activity, such as an exercise (e.g., a jumping jack exercise). The AI logic then performs sensor fusion by merging the data received from the various sensors and builds one or more models of the user holding different postures while performing the exercise. The AI logic then queries database of virtual models to identify virtual models [image data excluding the user at a different point in time] for the exercise routine. The AI logic then retrieves one or more virtual models for the exercise and compares the retrieved virtual models with the model(s) generated from the user's posture and generates feedback. The feedback may be in the form of an image that identifies areas where the posture needs to be corrected by the user. The image to correct any mistakes in the posture is provided for rendering on the TV monitor or via HMD)

Mallinson does not explicitly teach acquired from the same background.

However, Shavit teaches acquired from the same background. (Applying the same computer-generated background to any image would have been obvious in view of Shavit, because Shavit [0361] two or more of the sensors can produce 3D images of the target/s in methods reviewed above or other methods known in the art. The first steps toward creating body or skeleton mapping can be the steps from the previous paragraphs (Separating from background, reducing noise . . . ) for every 3D sensor. One preferable step is separating the collection of points and/or shapes representing the target/s from the background. [0360] References like US Patent Application publication 20150319424 by Haimovitch-Yogev et al. or U.S. Pat. No. 9,011,293B2, both incorporated herein by reference. Haimovitch-Yogev [0189] pre-event volumetric reconstruction of the background and live event volumetric reconstruction of the foreground in and of itself is not CGI but rather photo-realistic rendering based on underlying images of physical universe of event 10. [0212] CEM method 220 preferably executes on CEM module 221 to create an environment model 223 used in subsequent reconstruction of environment 16 associated with event 10. Environment model 223 may be understood to be the background 3D model or the background data representation)

Regarding claim 6:

6. Mallinson teaches the artificial intelligence learning method according to claim 1, further comprising detecting the user's action by deducing vertices of respective body regions of the user included in the image data acquired by the one or more cameras through a skeleton technique. (Mallinson [0060] FIG. 3A illustrates one such representation where the sensor fusion module 117a of the AI logic 114 receives data from various sensors (motion sensors, proximity sensors, cameras, image capturing devices, etc.,) as input, processes the data and outputs one or more models of the user. For example, the logic in the sensor fusion module 117a processes the data to generate models for a posture held by the user when performing an exercise. The models that are generated for a single posture may identify the posture from different angles or view points. The models that are generated are stick-figure models that substantially mimic a skeletal outline of the posture held by the user [image data excluding the user]. The stick-figure models include dots representing various pivot points and lines that represent the limbs or body parts on which the pivot points are located)

Regarding claim 8:

8. Mallinson teaches the artificial intelligence learning method according to claim 1, further comprising: receiving data for recognition; recognizing the user’s action based on the on-screen label, (Mallinson [0028] FIG. 1 illustrates a system in which a robot is used to track activity of a user 101 and to provide feedback to the user 101, in accordance with an implementation. A user 101 may be interested in performing an activity, such as an exercise routine, in a geo-location 100 and a robot 110 in the geo-location 100 that is associated with the user, is used to detect the user 101 [on-screen label] performing the various exercises in the exercise routine, move into position proximal to the user to capture images of the user performing the exercise routine and provide feedback to enable the user to improve their posture when performing the exercises in the exercise routine) when the data for recognition comprises the image data including the user, acquired through the one or more cameras; and (Mallinson [0034] the external camera 109 captures images of the user and transmits the captured images to the robot 110 using wireless communication protocol. The robot 110 may use the user attributes captured by the sensors and the images of the user captured by the one or more external cameras 109 to verify the user. In one implementation, the robot may have been previously associated with a particular user and the verification is to ensure that the user detected in the geo-location [sensing data] is the particular user that was previously associated with the robot)
recognizing the user’s action based on the off-screen label, when the data for recognition does not comprise the image data including the user, acquired through the one or more cameras. (Mallinson [0060] FIG. 3A illustrates one such representation where the sensor fusion module 117a of the AI logic 114 receives data from various sensors (motion sensors, proximity sensors, cameras, image capturing devices, etc.,) as input, processes the data and outputs one or more models of the user. For example, the logic in the sensor fusion module 117a processes the data to generate models for a posture held by the user [off-screen label] when performing an exercise. The models that are generated for a single posture may identify the posture from different angles or view points. The models that are generated are stick-figure models that substantially mimic a skeletal outline of the posture held by the user [image data excluding the user]. The stick-figure models include dots representing various pivot points and lines that represent the limbs or body parts on which the pivot points are located)
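
For illustration only, the two recognition branches recited in claim 8 can be sketched as a simple dispatch. The sketch assumes two already-trained recognizers and a user-presence check, all passed in as callables; the names recognize, user_in_image, on_screen_model, and off_screen_model are hypothetical and are not drawn from the application or the cited references.

```python
from typing import Callable, List, Optional

Image = List[List[int]]
Sensing = List[float]


def recognize(
    image: Optional[Image],
    sensing: Sensing,
    user_in_image: Callable[[Image], bool],
    on_screen_model: Callable[[Image], str],
    off_screen_model: Callable[[Sensing], str],
) -> str:
    """Select the recognizer according to whether the data for recognition
    comprises image data including the user."""
    if image is not None and user_in_image(image):
        # Image data including the user is present: on-screen branch.
        return on_screen_model(image)
    # No image, or the user is not in the image: off-screen branch,
    # which relies on the sensing data alone.
    return off_screen_model(sensing)
```

Passing the recognizers as parameters keeps the sketch independent of any particular model; it shows only the branch selection, not how either recognizer is trained.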

Claims 7 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Mallinson (U.S. Pub. No. 20190077007 A1), in view of Shavit (U.S. Pub. No. 20170368413 A1), further in view of Merler (U.S. Pub. No. 20190289372 A1).

Regarding claim 7:

7. Mallinson teaches the artificial intelligence learning method according to claim 1. Mallinson does not explicitly teach wherein, in the training the artificial intelligence, self-supervised learning is performed using each of the on-screen label and the off-screen label as input data.

However, Merler teaches wherein, in the training the artificial intelligence, self-supervised learning is performed using each of the on-screen label and the off-screen label as input data. (Merler [0061] scenario 604 illustrates self-supervised learning of a player action recognition classifier 217 by on-screen overlay recognition classifier 216. Mallinson [0028], [0060] FIG. 3A)
The motivation for combining Mallinson and Shavit as set forth in claim 1 is equally applicable to claim 7. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Mallinson by further incorporating the teachings of Shavit and Merler, all references being in video/camera technology. One would have been motivated to do so in order to incorporate, in the training the artificial intelligence, self-supervised learning performed using each of the on-screen label and the off-screen label as input data. This functionality would improve user experience.

Regarding claim 10:

10. Mallinson teaches the operating method according to claim 9. Mallinson does not explicitly teach wherein, in the training the artificial intelligence: self-supervised learning is performed using each of the on-screen label and the off-screen label as input data; or the artificial intelligence is updated by receiving artificial intelligence-related data acquired by performing the self-supervised learning using each of the on-screen label and the off-screen label as the input data.

However, Merler teaches wherein, in the training the artificial intelligence: self-supervised learning is performed using each of the on-screen label (Merler Fig. 3 [0060] Scenario 603 illustrates self-supervised learning of the facial recognition [on-screen label] classifier 215 by metadata associated with a segment.) and the off-screen label as input data; or the artificial intelligence is updated by receiving artificial intelligence-related data acquired by performing the self-supervised learning using each of the on-screen label and the off-screen label as the input data. (Merler [0061] scenario 604 illustrates self-supervised learning of a player action [off-screen label] recognition classifier 217 by on-screen overlay recognition classifier 216. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art that identifying the posture [off-screen label] as in Mallinson [0060], FIG. 3A, would be applicable with predictable results. Mallinson [0028], [0060] FIG. 3A)

Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Mallinson (U.S. Pub. No. 20190077007 A1), in view of Shavit (U.S. Pub. No. 20170368413 A1), further in view of Cho (U.S. Pub. No. 20170134694 A1).

Regarding claim 11:

11. Mallinson teaches the operating method according to claim 9, further comprising performing a corresponding motion based on the recognized action. (Mallinson [0028] FIG. 1 illustrates a system in which a robot is used to track activity of a user 101 and to provide feedback to the user 101, in accordance with an implementation. A user 101 may be interested in performing an activity, such as an exercise routine, in a geo-location 100 and a robot 110 in the geo-location 100 that is associated with the user, is used to detect the user 101 performing the various exercises in the exercise routine, move into position proximal to the user to capture images of the user performing the exercise routine and provide feedback to enable the user to improve their posture when performing the exercises in the exercise routine)

Alternatively, Cho teaches further comprising performing a corresponding motion (Cho [0128] the second electronic device 101-2 may perform a motion by using the received information 610. The second electronic device 101-2 may receive the motion data, and may drive a motor based on the received motion data. Alternatively, the second electronic device 101-2 may acquire motion data by parsing the synthesized data which is the received information 610, and may drive the motor on the basis of the motion data obtained by parsing the synthesized data. For example, as illustrated in FIG. 6A, the second electronic device 101-2 may drive the motor on the basis of the motion data, and thereby may move a left arm of the second electronic device 101-2 to the vicinity of a relative lower side of a head portion) based on the recognized action. (Cho [0123] the first electronic device 101-1 may estimate a motion of the user on the basis of a difference between the multiple images 610. The first electronic device 101-1 may generate motion data of an electronic device corresponding to the estimated motion of the user. For example, the first electronic device 101-1 may analyze the multiple images 610, and thereby may estimate a motion of the user expressing that the user, who exists outside of the first electronic device 101-1, puts the left hand over the mouth)

The motivation for combining Mallinson and Shavit as set forth in claim 1 is equally applicable to claim 11. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Mallinson by further incorporating the teachings of Shavit and Cho, all references being in video/camera technology. One would have been motivated to do so in order to incorporate performing a corresponding motion based on the recognized action. This functionality would enhance capabilities.

Regarding claim 12:

12. Mallinson teaches the operating method according to claim 11, based on the action recognized based on the off-screen label. (Mallinson [0060] FIG. 3A illustrates one such representation where the sensor fusion module 117a of the AI logic 114 receives data from various sensors (motion sensors, proximity sensors, cameras, image capturing devices, etc.,) as input, processes the data and outputs one or more models of the user. For example, the logic in the sensor fusion module 117a processes the data to generate models for a posture held by the user when performing an exercise. The models that are generated for a single posture may identify the posture from different angles or view points. The models that are generated are stick-figure models that substantially mimic a skeletal outline of the posture held by the user [image data excluding the user]. The stick-figure models include dots representing various pivot points and lines that represent the limbs or body parts on which the pivot points are located)

Mallinson does not explicitly teach wherein, in the performing the corresponding motion based on the recognized action, a top cover is rotated so that one surface thereof provided with an operation unit and a first display disposed thereon faces the user.

However, Cho teaches wherein, in the performing the corresponding motion based on the recognized action, (Cho [0128] the second electronic device 101-2 may perform a motion by using the received information 610. The second electronic device 101-2 may receive the motion data, and may drive a motor based on the received motion data. Alternatively, the second electronic device 101-2 may acquire motion data by parsing the synthesized data which is the received information 610, and may drive the motor on the basis of the motion data obtained by parsing the synthesized data. For example, as illustrated in FIG. 6A, the second electronic device 101-2 may drive the motor on the basis of the motion data, and thereby may move a left arm of the second electronic device 101-2 to the vicinity of a relative lower side of a head portion. Cho [0123] the first electronic device 101-1 may estimate a motion of the user on the basis of a difference between the multiple images 610. The first electronic device 101-1 may generate motion data of an electronic device corresponding to the estimated motion of the user. For example, the first electronic device 101-1 may analyze the multiple images 610, and thereby may estimate a motion of the user expressing that the user, who exists outside of the first electronic device 101-1, puts the left hand over the mouth) a top cover (Cho [0059] FIG. 1B the head portion 190 includes a front cover 161 corresponding to the shape of the face of the human being) is rotated so that one surface thereof provided with an operation unit (Cho [0062] a driving unit 191 may include at least one motor that enables the head portion 190 to move, and may change, for example, a direction of the head portion 190. The driving unit 191 may be used to mechanically change a movement and other elements) and a first display disposed thereon faces the user. (Cho [0111] By using signal-processed data, middleware 430 may recognize a three-dimensional (3D) gesture of the user (as indicated by reference numeral 431); may detect or track a location of the face of the user, or may perform authentication through face recognition (as indicated by reference numeral 432); [0165] FIG. 12B an electronic device 1201 may be docked (as indicated by reference numeral 1210) to a robot 1202)

Allowable Subject Matter

Regarding claim 13:

Claim 13 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NASIM N NIRJHAR whose telephone number is (571) 272-3792.  The examiner can normally be reached on Monday - Friday, 8 am to 5 pm ET.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christopher Kelley, can be reached on (571) 272-7331. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NASIM N NIRJHAR/Primary Examiner, Art Unit 2482