DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
In response to the amendment filed 5/10/2021, claims 1 - 38 are pending.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 - 5, 13 - 23, and 31 - 38 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 2018/0314689 A1) in view of Yang (US 2007/0055523 A1) and Sahin (US 2015/0099946 A1).
Re claims 1, 19:
1. Wang teaches a computing device (Wang, Abstract) comprising: 
a camera; a microphone; a speaker; a display device; memory accessible by at least one processor, the memory storing one or more program modules executable by the at least one processor; wherein the one or more program modules, when executed by the at least one processor (Wang, [0085]; [0129]; [0537]), is capable of: 
receiving audio data from said microphone and recognizing the speech associated with said user of said computing device based on said received audio data (Wang, fig. 3, “User Audio Input”; [0075], “automatic speech recognition engine”; [0223]; [0245]; [0248] – [0250]); 
receiving video data from said camera and determining a direction of eye gaze of a particular user of said computing device based on said received video data (Wang, fig. 3; [0277], “an eye gaze detection”; [0279]; [0301]; [0310]); 
identifying an object based on the received video data from said camera using a character recognition technique (Wang, [0266], “the complex event recognition engine 1950 may interface with an automated speech recognition (ASR) system 1966 and/or an optical character recognition (OCR) system 1968 … The OCR system 1968 may recognize text that is present in a visual scene of the video and provide the recognized text to the complex event recognition engine 1950”) or an object recognition technique;
causing a first audio information to be transmitted from said speaker of said computing device in response to the received audio data and the received video data (Wang, [0047] – [0048]; [0168]; [0128]), 
wherein the first audio information comprises audio identifying the object in order to cause an audio response from said user of said computing device (Wang, fig. 3; [0065], “A multi-modal virtual personal assistant can also accept visual input, including video or still images, and determine information such as facial expressions, gestures, and iris biometrics (e.g., characteristics of a person's eyes)”; [0085], “a virtual personal assistant 150 can interact with the smartphone 102 using various sensory input, such as audio input 110, image input 120, and/or tactile input 130”; [0167], “the dialog inputs captured and processed by the multi-modal user interface 1012 may be in the form of audio, images, text, some other natural language inputs, or a combination of inputs”; [0300], “Human conversation can also be dynamic, and people's intentions and emotional states can change over the course of a conversation”; Wang teaches a conversation between a human and a virtual personal assistant, wherein the virtual assistant accepts different inputs from a user (audio, image, text (extracted from image)… ) and outputs audio information in response to said user inputs). 

Multi-Modal Cues (or eye movements, 306) cause an audio response from the user (302, 314).
[Figure: Wang, fig. 3 (media_image1.png), annotated: multi-modal cues, including eye movement, cause an audio response from the user.]

19. A computer-implemented method for education comprising the steps of: receiving audio input data from a microphone connected to a computing device; 
receiving video input data from a camera connected to the computing device; 
processing the audio input data and video input data by the computing device; 
sending, by an audio output module, audio output data from a speaker connected to the computing device; 
sending, by a video output module, video output data from a display device connected to the computing device;
wherein the step of processing includes recognizing, by the computing device, speech associated with a particular user of the computing device based on the received audio input data; 
wherein the step of processing includes determining, by the computing device, a direction of eye gaze of said user of the computing device based on the received video input data; and 
wherein the step of processing includes identifying, by the computing device, an object that is disposed within the direction of said eye gaze based on the received video input data using a character recognition technique or an object recognition technique; and 
wherein the speaker connected to the computing device is configured to transmit a first audio information caused by the computing device in response to the received audio input data and the received video input data, 
wherein the first audio information comprises audio identifying the object disposed within the direction of said eye gaze in order to cause an audio response from said user of said computing device;
wherein the step of processing includes determining, by the computing device, that the audio response from said user of said computing device corresponds to the first audio information by matching at least one of a vocabulary word and a pronunciation between the audio response and the first audio information; and 
wherein the display device connected to the computing device is configured to display a celebration animation caused by the computing device and the speaker connected to the computing device is configured to transmit a second audio information caused by the computing device based on said step of processing including determining, by the computing device, that the audio response from said user of said computing device corresponds to the first audio information to provide positive feedback (See claim 1 rejection above). 

Wang teaches receiving video data from said camera and determining a direction of eye gaze of a particular user of said computing device based on said received video data (Wang, fig. 3; [0277], “an eye gaze detection”; [0279]; [0301]; [0310]); identifying an object based on the received video data from said camera using a character recognition technique (Wang, [0266], “the complex event recognition engine 1950 may interface with an automated speech recognition (ASR) system 1966 and/or an optical character recognition (OCR) system 1968 … The OCR system 1968 may recognize text that is present in a visual scene of the video and provide the recognized text to the complex event recognition engine 1950”); wherein the audio information comprises audio identifying the object in order to cause a response from said user of said computing device (Wang, fig. 3; [0065], “A multi-modal virtual personal assistant can also accept visual input, including video or still images, and determine information such as facial expressions, gestures, and iris biometrics (e.g., characteristics of a person's eyes)”; [0085], “a virtual personal assistant 150 can interact with the smartphone 102 using various sensory input, such as audio input 110, image input 120, and/or tactile input 130”; [0167], “the dialog inputs captured and processed by the multi-modal user interface 1012 may be in the form of audio, images, text, some other natural language inputs, or a combination of inputs”; [0300], “Human conversation can also be dynamic, and people's intentions and emotional states can change over the course of a conversation”; Wang teaches a conversation between a human and a virtual personal assistant, wherein the virtual assistant accepts different inputs from a user (audio, image, text (extracted from image)… ) and outputs audio information in response to said user inputs). 

Wang does not explicitly disclose determining that the audio response from said user of said computing device corresponds to the first audio information by matching at least one of a vocabulary word and a pronunciation between the audio response and the first audio information; and causing a celebration to be displayed on the display device of said computing device to provide positive feedback.

Yang (US 2007/0055523 A1) teaches a pronunciation training system that extracts pronunciation features from various pronunciation samples, links pronunciation features with corresponding muscle movements and diagram representations, displays related waveforms and pronunciation processes, and marks the differences between different waveforms and different pronunciation processes for helping a user to distinguish different sounds.  Yang further cures Wang’s deficiency; specifically, Yang teaches a method that includes determining that the audio response from said user of said computing device corresponds to the first audio information by matching at least one of a vocabulary word and a pronunciation between the audio response and the first audio information (Yang, fig. 5; [0128], “the system can track user's eye movement. According to where the eyes focus, when the eyes returns to left side from right side, or to top from bottom, and the contents on monitor, the system can judge what the user is reading now. Third, the system can also provide interface for a user to indicate what the user is reading”; [0131], “The system finds the pronunciation feature deviations between the pronunciation features extracted from the pronunciation samples of a user and the pronunciation features extracted from the pronunciation samples of a native speaker”); and causing a celebration to be displayed on the display device of said computing device to provide positive feedback (Yang, [0138], “the system performs various statistical analyses on user's pronunciation samples, mistakes made, etc. to generate a new profile about user's pronunciation on trouble sounds, how good for pronouncing a particular sound, etc”; [0139], “The system can also display the progress in proper diagrams. The system can further provide proper encourage messages saved in database for predefined situations”).  
Therefore, in view of Yang, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method/system described in Wang, by providing the pronunciation training system/method as taught by Yang, in order to provide a system for helping people to improve the ability to discriminate different sounds and to produce correct sounds (Yang, [0001]).  The modification would also provide interfaces for experts to define pronunciation features, extract and compare pronunciation features, build links between pronunciation features and pronunciation processes … and display related waveforms for helping a user to enhance the user's awareness of different sounds. The system can further increase the user's awareness on how a sound relates to a pronunciation feature and the muscle movements of a pronunciation organ by providing interfaces for a user to create different sounds by modifying the existing sounds on its loudness, tone, duration, and pace, by modifying the features in time domain or frequency domain, and by modifying the muscle movements of related pronunciation organs (Yang, Abstract).

The combination of Wang and Yang does not explicitly disclose causing a celebration animation to be displayed, nor does it disclose causing a second audio information to be transmitted from said speaker of said computing device based on said determining that the audio response from said user of said computing device corresponds to the first audio information to provide positive feedback.  

Sahin (US 2015/0099946 A1) teaches a system, environment, and methods for evaluation of an individual for ASD (Sahin, Abstract).  Sahin further teaches causing a celebration animation to be displayed on the display device of said computing device and a second audio information to be transmitted from said speaker of said computing device based on said determining that the audio response from said user of said computing device corresponds to the first audio information to provide positive feedback (Sahin, [0210], “Positive reinforcement feedback may include an enjoyable or celebratory sound, such as a fanfare, cheering, or happy music. Verbal positive feedback, such as the words "success", "hooray", "good job", or "way to go" may be audibly or visually presented to the user. The positive reinforcement feedback may include a color, image, animation, or other pleasing visual representation presented, for example, in the heads-up display of the wearable data collection device”; [0219]).  Therefore, in view of Sahin, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method/system described in Wang, by providing the animation and audio positive feedback as taught by Sahin, in order to provide positive feedback and encouragement to the user (Sahin, [0210]). 

Wang does not explicitly disclose identifying an object that is disposed within the direction of said eye gaze based on the received video data from said camera using a character recognition technique.   Wang does not explicitly disclose wherein the audio information comprises audio identifying the object disposed within the direction of said eye gaze.  

Sahin cures Wang’s deficiency; specifically, Sahin teaches identifying an object that is disposed within the direction of said eye gaze based on the received video data from said camera using a character recognition technique (Sahin, [0125], “may use one or more optical character recognition modules to identify that text has been captured within the video recording data 116b”); wherein the audio information comprises audio identifying the object disposed within the direction of said eye gaze (Sahin, [0129], “single word reading algorithm 532b may identify definitions, pronunciations, graphic or video illustrations, audio snippets, and other rich information associated with an identified word of phrase. The single word reading algorithm 532b may then present enhanced information to the individual 502 regarding the presented text, automatically or upon selection. In a particular illustration, the single word reading algorithm 532b provides the individual 502 with the opportunity to select a word or phrase within the text for additional information, such as pronunciation, definition, and/or graphic illustration (e.g., what does a crested gecko look like, what is the pronunciation of "inchoate", or what does "lethargy" mean)”; [0139], “presentation of verbal information regarding an object identified through the machine vision language tutor algorithm 532a”).   Therefore, in view of Sahin, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method/system described in Wang, by identifying an object based on eye gaze as taught by Sahin, in order to provide an individual with the opportunity to select a word or phrase within the text for additional information, such as pronunciation, definition, and/or graphic illustration (Sahin, [0129]). 
Sahin further suggests “the machine vision language tutor algorithm may include, for example, the ability to identify encoded objects within the vicinity of the wearable data collection device. For example, the machine vision language tutor algorithm may scan the immediate vicinity of the individual wearing the wearable data collection device to identify objects encoded with standardized index elements” (Sahin, [0102]).  

Re claims 2 – 3, 20 – 21:
2. The computing device of claim 1, wherein said one or more program modules, when executed by the at least one processor, is capable of receiving video data from said camera sufficient to determine the physical gestures of said user of said computing device based on said received video data (Wang, [0062]; [0112]). 

20. The method of claim 19, wherein the device is capable of receiving video data from said camera sufficient to determine the physical gestures of said user of the device based on the received video data (See claim 2 rejection above). 

3. The computing device of claim 2, wherein said video data comprises the facial expressions of said user of said computing device (Wang, [0065]; [0114]; [0166]). 

21. The method of claim 20, wherein the video data comprises the facial expressions of said user of the computing device (See claim 3 rejection above). 

Re claims 4, 22:
4. The computing device of claim 1, further comprising a touch pad device (Wang, [0165]). 
22. The method of claim 19, further comprising receiving information from a touch pad device connected to the computing device (See claim 4 rejection above). 

Re claims 5, 23:
5. The computing device of claim 1, further comprising a sensor device, wherein said sensor device operates to track the movement of said user (Wang, [0166]; [0167]; [0441]). 
23. The method of claim 19, further comprising receiving information from a sensor device connected to the computing device, wherein said sensor device operates to track the movement of said user (See claim 5 rejection above). 

Re claims 13 – 17:
13. The computing device of claim 1, further comprising a projector device attached to said computing device, wherein said projector device projects images or data for the user in response to the operation of said program modules executed by the at least one processor.
14. The computing device of claim 2, further comprising a projector device attached to said computing device, wherein said projector device projects images or data for the user in response to the operation of said program modules executed by the at least one processor.
15. The computing device of claim 3, further comprising a projector device attached to said computing device, wherein said projector device projects images or data for the user in response to the operation of said program modules executed by the at least one processor.
16. The computing device of claim 4, further comprising a projector device attached to said computing device, wherein said projector device projects images or data for the user in response to the operation of said program modules executed by the at least one processor.
17. The computing device of claim 5, further comprising a projector device attached to said computing device, wherein said projector device projects images or data for the user in response to the operation of said program modules executed by the at least one processor (Wang, [0461]; [0116] – [0117]; [0523]). 

Re claims 31 – 35:
31. The method of claim 19, further comprising transmitting information from a projector device connected to the computing device, wherein said projector device projects images or data for the user in response to one or more of the processing steps.
32. The method of claim 20, further comprising transmitting information from a projector device connected to the computing device, wherein said projector device projects images or data for the user in response to one or more of the processing steps.
33. The method of claim 21, further comprising transmitting information from a projector device connected to the computing device, wherein said projector device projects images or data for the user in response to one or more of the processing steps.
34. The method of claim 22, further comprising transmitting information from a projector device connected to the computing device, wherein said projector device projects images or data for the user in response to one or more of the processing steps.
35. The method of claim 23, further comprising transmitting information from a projector device connected to the computing device, wherein said projector device projects images or data for the user in response to one or more of the processing steps (Wang, [0461]; [0116] – [0117]; [0523]).

Re claims 18 and 36:
18.  The computing device of claim 6, further comprising a projector device attached to said computing device, wherein said projector device projects images or data for the user in response to the operation of said program modules executed by the at least one processor.   36. The method of claim 24, further comprising transmitting information from a projector device connected to the computing device, wherein said projector device projects images or data for the user in response to one or more of the processing steps (Wang, [0117]; [0143]; [1059]).

Re claims 37 – 38:
37.  The computing device of claim 1, wherein the object that is disposed within the direction of said eye gaze comprises a text.  38. The method of claim 19, wherein the object that is disposed within the direction of said eye gaze comprises a text (Yang, fig. 5; [0128], “the system can track user's eye movement. According to where the eyes focus, when the eyes returns to left side from right side, or to top from bottom, and the contents on monitor, the system can judge what the user is reading now. Third, the system can also provide interface for a user to indicate what the user is reading”; [0131], “The system finds the pronunciation feature deviations between the pronunciation features extracted from the pronunciation samples of a user and the pronunciation features extracted from the pronunciation samples of a native speaker”).

Claims 6 – 12 and 24 – 30 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 2018/0314689 A1) in view of Yang (US 2007/0055523 A1) and Sahin (US 2015/0099946 A1) as applied to claims 1 and 19 above, and further in view of Vu et al. (US 2007/0192910 A1).
Re claims 7 – 11 and 25 – 29:
Wang does not explicitly disclose an output device attached to said computing device, wherein said output device is movable in response to the operation of said program modules executed by the at least one processor.  Vu teaches a mobile robot guest for interacting with a human resident that performs a room-traversing search procedure prior to interacting with the resident and may verbally query whether the resident being sought is present (Vu, Abstract).  Vu further teaches a movable head of a robot (Vu, [0124]; [0166]; fig. 6B; figs. 5A - 5C).  Therefore, in view of Vu, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the invention described in Wang, by providing the movable head as taught by Vu, since Vu states “the robot may perform "politely and graciously," moving out of the way of the resident if it becomes apparent the resident is beginning to move in a certain direction-the robot's recording of the position of doors and/or traffic paths and/or household obstacles would facilitate keeping out of the way. This is an example of the robot simulating social interaction via robot expression motion” (Vu, [0214]).

Re claims 6 and 24:
Wang teaches a system that uses a user’s user name and password for identification (Wang, [0087]).  However, Wang does not explicitly disclose a passive RFID reader device.  Vu teaches a mobile robot guest for interacting with a human resident.  Vu further teaches a passive RFID reader device (Vu, [0260]).  Therefore, in view of Vu, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the device/method described in Wang, by providing the RFID tag as taught by Vu, since it was known in the art that RFID provides a secure and convenient way to authenticate a user.

Re claims 12 and 30:
Wang teaches an output device attached to said computing device (Wang, figs. 35 - 37) and a service robot (Wang, fig. 37).  Wang does not explicitly disclose said output device is movable in response to the operation of said program modules executed by the at least one processor.  Vu teaches a moveable robot (“said output device is movable in response to one or more of the processing steps”) (Vu, [0124]; [0166]; fig. 6B; figs. 5A - 5C).  Therefore, in view of Vu, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the device/method described in Wang, by providing the moveable display as taught by Vu, in order to provide shrugging, nodding, head shaking, looking away (change of subject) and other gestural cues; the mobility system permits approach and recede motions (personal space and conversational attention cues); and a facial matrix panel may permit a full range of facial expressions (Vu, [0105]).

Response to Arguments
Applicant’s arguments with respect to claims 1 – 38 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACK YIP, whose telephone number is (571) 270-5048.  The examiner can normally be reached Monday through Friday, 9:00 AM - 5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, XUAN THAI, can be reached at (571) 272-7147.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JACK YIP/Primary Examiner, Art Unit 3715