DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Claims 1-20 are pending in the present application, with claims 1, 10, and 20 being independent.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 15 September 2022 has been considered by the examiner.
Specification
The use of the term “PowerPoint” in paragraph 68, which is a trade name or a mark used in commerce, has been noted in this application. The term should be accompanied by the generic terminology; furthermore, the term should be capitalized wherever it appears or, where appropriate, include a proper symbol indicating use in commerce such as ™, SM, or ® following the term.
Although the use of trade names and marks used in commerce (i.e., trademarks, service marks, certification marks, and collective marks) is permissible in patent applications, the proprietary nature of the marks should be respected and every effort made to prevent their use in any manner which might adversely affect their validity as commercial marks.
Appropriate correction is required.
Claim Interpretation
The following interpretations are being applied to the claimed limitations.
“User characteristics” is being interpreted as characteristic information in the voice information that reflects the user identity.
Paragraph 35 sets forth “According to the embodiments of the disclosure, the user characteristics are characteristic information in the voice information that reflects the user identity. The user characteristics may be semantic characteristics converted from the voice information into a voice text, for example, a name, an address, a company name, a position, a title, a nickname, etc., or may also be tone characteristics in the voice information, etc.”
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: an audio acquisition module and an image acquisition module in claim 10.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Objections
Claim(s) 9 and 14 is/are objected to because of the following informalities:  
Claim 9 appears as though it should recite “prompt information sent by a client is received; and the prompt information…” or “prompt information sent by a client is received [[, wherein the prompt information…”.  
Claim 14 appears as though it should recite “acquire voice associated information associated with the semantic keyword; and display…”
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim(s) 20 is/are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because claim 20 recites a nonvolatile computer-readable storage medium. In addition to setting forth non-limiting examples of storage mediums in paragraph 120, the disclosure sets forth that “storage mediums include any mediums in which information may be stored or transmitted in a way of being read by devices (for example, computers)” and does not set forth a definition for computer-readable storage medium, see paragraph 120.
Since the specification does not explicitly exclude transitory propagating signals from being included in the computer-readable media, under the broadest reasonable interpretation consistent with the specification and the state of the art at the time of the invention, the full scope of “computer-readable media” would cover both non-transitory tangible media (e.g., RAM, ROM, hard drive) and transitory propagating signals (e.g., carrier waves, signals) per se. Transitory propagating signals do not fall within the definition of a process, machine, manufacture or composition of matter and therefore must be rejected under 35 U.S.C. 101 as covering non-statutory subject matter (see In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter) and Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101, Aug. 24, 2009, p. 2). The examiner suggests amending the claim to exclude transitory propagating signals by adding a modifier, such as “non-transitory,” to the claimed medium.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yao (US PG Publication 2020/0075011) in view of Cheng et al. (US PG Publication 2019/0147889).
Regarding claim 1, Yao teaches an AR-based information displaying method (see for instance, paragraphs 101-109 and figs. 1-4), comprising: acquiring voice information and a user image of a user (voice information and video information are collected in real time, see for instance, paragraph 98. The user wears the AR glasses and, during a conversation with the speaking object A and the speaking object B (persons), the user’s AR glasses capture video information of the conversation process of the speaking object A and the speaking object B, and also collect voice messages sent by both of them, see paragraph 101 and figs. 1 and 3); 
identifying the voice information and extracting user characteristics (In an implementation step S102, which speaking object the currently collected voice information belongs to may be determined by recognizing a person currently speaking, see paragraph 104. When voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information and the speaking object is determined, see paragraph 109); and 
when the user image matches the user characteristics, displaying, by an AR displaying device, target information associated with the user at a display position corresponding to the user image, wherein the target information comprises at least one of user information and voice associated information (superimposing and displaying an augmented reality AR sign language animation corresponding to the voice information on a gesture area corresponding to the speaking object to obtain a sign language video, see paragraphs 111 and 112).
Yao does not appear to teach that user information is output.
In the same art of graphics and voice, Cheng teaches that voiceprint recognition is the process by which a speaker is determined based on the voice uttered by the speaker, i.e., a recognition technique that uses the voice for identity authentication, see for instance, paragraphs 66-67. User information matching an extracted acoustic feature is acquired and outputted, see for instance, paragraphs 83-88. After a voice is received, the voice may be converted into text, and the user information of the speaker may be displayed simultaneously in front of the converted text after the user is identified, see paragraph 77. The received voice information can be translated into another language, see paragraph 77.
It would have been obvious to one of ordinary skill in the art, having the teachings of Yao and Cheng in front of them before the effective filing date of the claimed invention, to incorporate user identification as taught by Cheng into Yao’s AR sign language system, as identifying a user based on their acoustic information and outputting the information to the display, such as described by Cheng, was well known before the effective filing date of the claimed invention and would have yielded predictable results in combination with Yao. 
The modification of Yao with Cheng would have explicitly allowed the user information to be outputted. 
The motivation for combining Yao with Cheng would have been to use a known technique (voiceprint recognition), improve the user experience, and enhance functionality, see for instance, Cheng, paragraph 2.
Regarding claim 2, Yao in view of Cheng teach the method according to claim 1 and further teach wherein when the user image matches the user characteristics, before the step of displaying, by an AR displaying device, target information associated with the user at a display position corresponding to the user image, the method further comprises: searching a preset user library for a standard user image according to the user characteristics (An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information) corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109. The pre-stored face set may include multiple face images, and each face image is associated with at least one pre-stored sound attribute information, see Yao, paragraph 109. User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88); and when the user image matches the standard user image, confirming that the user image matches the user characteristics (A historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109. The pre-stored face set may include multiple face images, and each face image is associated with at least one pre-stored sound attribute information, see Yao, paragraph 109. User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88. By comparing the obtained sound attribute information with the pre-stored sound attribute information, sound attribute information matching the sound attribute information corresponding to the voice information is found in the pre-stored sound attribute information, so that the corresponding face image is taken as the historical face image, see for instance, Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 1.
Regarding claim 3, Yao in view of Cheng teach the method according to claim 2, and further teach wherein when the user characteristics comprise a semantic keyword characteristic (Semantic recognition may be performed on the voice information to obtain text information, see Yao, paragraph 113. The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation corresponding to the gesture semantic is obtained, see Yao, paragraph 113), the preset user library comprises a corresponding relation between the standard user image and an identity keyword characteristic (User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information) corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109); the step of searching a preset user library for a standard user image according to the user characteristics comprises: searching the preset user library for the standard user image corresponding to a target identity keyword characteristic, wherein the target identity keyword characteristic matches the semantic keyword characteristic (User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88. User information includes a user’s name, a user image, a user’s job title, and the like, see for instance, Cheng, paragraph 72. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information) corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 1.
Regarding claim 4, Yao in view of Cheng teach the method according to claim 2 and further teach wherein when the user characteristics comprise a tone characteristic (The acoustic feature extraction performs voice information parameterization on the input voice – voice parameters include one or more pitch periods, see Cheng, paragraph 68. Since timbre of each person is usually not the same, the acoustic feature corresponding to the same piece of textual content will also be different, see Cheng, paragraph 69. Sound information includes amplitude information, audio information, and/or accent cycle information, see for instance, Yao, paragraph 109), the preset user library comprises a corresponding relation between the standard user image and a standard tone characteristic (User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information) corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109); the step of searching a preset user library for a standard user image according to the user characteristics comprises: searching the preset user library for the standard user image corresponding to a target tone characteristic, wherein the target tone characteristic matches the tone characteristic (User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88. User information includes a user’s name, a user image, a user’s job title, and the like, see for instance, Cheng, paragraph 72. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information) corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 1.
Regarding claim 5, Yao in view of Cheng teach the method according to claim 1, and further teach wherein when the target information comprise the voice associated information, the step of displaying, by an AR displaying device, target information associated with the user at a display position corresponding to the user image comprises: identifying the voice information and extracting a semantic keyword (Semantic recognition may be performed on the voice information to obtain text information, see Yao, paragraph 113. The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation corresponding to the gesture semantic is obtained, see Yao, paragraph 113); acquiring voice associated information associated with the semantic keyword (The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation corresponding to the gesture semantic is obtained, see Yao, paragraph 113. It can be understood that the at least one AR gesture animation is sequentially stitched in an order of the voice text information to obtain the sign language AR animation, see Yao, paragraph 113); and displaying, by the AR displaying device, voice associated information at the display position corresponding to the user image (superimposing and displaying an augmented reality AR sign language animation corresponding to the voice information on a gesture area corresponding to the speaking object (such as around the face area) to obtain a sign language video, see Yao, paragraphs 111, 112, and 114). 
The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 1.
Regarding claim 6, Yao in view of Cheng teach the method according to claim 5 and further teach wherein, the method further comprises: searching for contents associated with the semantic keyword from preset multimedia contents, and determining a searched result as the voice associated information (Semantic recognition may be performed on the voice information to obtain text information, see Yao, paragraph 113. The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation (multimedia content) corresponding to the gesture semantic is obtained, see Yao, paragraph 113). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 1.
Regarding claim 7, Yao in view of Cheng teach the method according to claim 5, and further teach wherein, the method further comprises: performing retrieving according to the semantic keyword, and determining the retrieved contents as the voice associated information (Semantic recognition may be performed on the voice information to obtain text information, see Yao, paragraph 113. The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation corresponding to the gesture semantic is obtained, see Yao, paragraph 113). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 1.
Regarding claim 8, Yao in view of Cheng teach the method according to claim 1, and further teach wherein when the target information comprises the user information, the step of displaying, by an AR displaying device, target information associated with the user at a display position corresponding to the user image comprises: searching the preset user library for user information corresponding to the standard user image to display the user information by the AR displaying device at the display position corresponding to the user image (An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information) corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109. User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88), wherein the user information comprises at least one of location information, a name, a company name, a position, interest, a photo and organization information (User information includes a user’s name, a user image, a user’s job title, and the like, see for instance, Cheng, paragraph 72 and Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 1.
Regarding claim 9, Yao in view of Cheng teach the method according to claim 1, wherein the method further comprises: prompt information sent by a client is received (The prompting portion may be configured to generate a prompt message for inputting user information, wherein the prompt message is used for the new user to input user information, see for instance, Cheng, paragraph 122); the prompt information is displayed by the AR displaying device (The prompting portion may be configured to generate a prompt message for inputting user information, wherein the prompt message is used for the new user to input user information, see for instance, Cheng, paragraph 122. The prompt message may be a voice or text prompt message, for example, displaying a text message of “please input the name, head portrait or the like of the speaker”, see Cheng, paragraphs 91-93. The acoustic feature and corresponding user information are stored in a preset file when user information input by a user based on the prompt message is received, see Cheng, paragraph 93. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 1.
Regarding claim 10, Yao teaches an AR apparatus (see for instance, paragraphs 101, 161, and 167), comprising an AR displaying device (see for instance, paragraphs 168 and 172), an audio acquisition module (see for instance, paragraphs 168 and 173), an image acquisition module (see for instance, paragraphs 168 and 172) and a processor (see for instance, paragraphs 168 and 169 and figs. 8 and 9); 
wherein the audio acquisition module is configured to acquire voice information of a user (voice information and video information are collected in real time, see for instance, paragraph 98. The user wears the AR glasses and, during a conversation with the speaking object A and the speaking object B (persons), the user’s AR glasses capture video information of the conversation process of the speaking object A and the speaking object B, and also collect voice messages sent by both of them, see paragraph 101 and figs. 1 and 3. The audio component is configured to output and/or input audio signals (such as through a microphone), see for instance, paragraph 173); 
the image acquisition module is configured to acquire a user image of the user (voice information and video information are collected in real time, see for instance, paragraph 98. The user wears the AR glasses and, during a conversation with the speaking object A and the speaking object B (persons), the user’s AR glasses capture video information of the conversation process of the speaking object A and the speaking object B, and also collect voice messages sent by both of them, see paragraph 101 and figs. 1 and 3. The multimedia component includes a front and/or rear camera, see for instance, paragraph 172); 
the processor is configured to identify the voice information and extract user characteristics (In an implementation step S102, which speaking object the currently collected voice information belongs to may be determined by recognizing a person currently speaking, see paragraph 104. When voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information and the speaking object is determined, see paragraph 109); and 
the AR displaying device is configured to, when the user image matches the user characteristics, display, by the AR displaying device, target information associated with the user at a display position corresponding to the user image, wherein the target information comprises at least one of user information and voice associated information (superimposing and displaying an augmented reality AR sign language animation corresponding to the voice information on a gesture area corresponding to the speaking object to obtain a sign language video, see paragraphs 111 and 112).
Yao does not appear to teach that user information is output.
In the same art of graphics and voice, Cheng teaches that voiceprint recognition is the process by which a speaker is determined based on the voice uttered by the speaker, a recognition technique that uses the voice for identity authentication, see for instance, paragraphs 66-67. User information matching an extracted acoustic feature is acquired and outputted, see for instance, paragraphs 83-88. After a voice is received, the voice may be converted into text, and the user information of the speaker may be displayed in front of the converted text simultaneously after the user is identified, see paragraph 77. The received voice information can be translated into another language, see paragraph 77.
It would have been obvious to one of ordinary skill in the art having the teachings of Yao and Cheng in front of them before the effective filing date of the claimed invention to incorporate user identification as taught by Cheng into Yao’s AR sign language system, as identifying a user based on their acoustic information and outputting the information to the display, such as described by Cheng, was well known before the effective filing date of the claimed invention and would have yielded predictable results in combination with Yao.
The modification of Yao with Cheng would have explicitly allowed the user information to be outputted. 
The motivation for combining Yao with Cheng would have been to use a known technique (voiceprint recognition), improve the user experience, and enhance functionality, see for instance, Cheng, paragraph 2.
Regarding claim 11, Yao in view of Cheng teach the AR apparatus according to claim 10, wherein the processor is further configured to: search a preset user library for a standard user image according to the user characteristics (An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information), corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109. The pre-stored face set may include multiple face images, and each face image is associated with at least one pre-stored sound attribute information, see Yao, paragraph 109. User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88); and when the user image matches the standard user image, confirm that the user image matches the user characteristics (A historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109. The pre-stored face set may include multiple face images, and each face image is associated with at least one pre-stored sound attribute information, see Yao, paragraph 109. User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96.
User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88. By comparing the obtained sound attribute information with the pre-stored sound attribute information, sound attribute information matching the sound attribute information corresponding to the voice information is found in the pre-stored sound attribute information, so that the corresponding face image is taken as the historical face image, see for instance, Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 10.
Regarding claim 12, Yao in view of Cheng teach the AR apparatus according to claim 11, wherein when the user characteristics include a semantic keyword characteristic (Semantic recognition may be performed on the voice information to obtain text information, see Yao, paragraph 113. The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation corresponding to the gesture semantic is obtained, see Yao, paragraph 113), the preset user library comprises a corresponding relation between the standard user image and an identity keyword characteristic (User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109.
Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information), corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109); the processor is further configured to: search the preset user library for the standard user image corresponding to a target identity keyword characteristic, wherein the target identity keyword characteristic matches the semantic keyword characteristic (User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88. User information includes a user’s name, a user image, a user’s job title, and the like, see for instance, Cheng, paragraph 72. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information), corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 10.
Regarding claim 13, Yao in view of Cheng teach the AR apparatus according to claim 11, wherein when the user characteristics include a tone characteristic (The acoustic feature extraction performs voice information parameterization on the input voice – voice parameters include one or more pitch periods, see Cheng, paragraph 68. Since the timbre of each person is usually not the same, the acoustic feature corresponding to the same piece of textual content will also be different, see Cheng, paragraph 69. Sound information includes amplitude information, audio information, and/or accent cycle information, see for instance, Yao, paragraph 109), the preset user library comprises a corresponding relation between the standard user image and a standard tone characteristic (User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109.
Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information), corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109); the processor is further configured to: search the preset user library for the standard user image corresponding to a target tone characteristic, wherein the target tone characteristic matches the tone characteristic (User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88. User information includes a user’s name, a user image, a user’s job title, and the like, see for instance, Cheng, paragraph 72. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information), corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 10.
Regarding claim 14, Yao in view of Cheng teach the AR apparatus according to claim 10, wherein when the target information comprises voice associated information, the AR displaying device is further configured to: identify the voice information and extract a semantic keyword (Semantic recognition may be performed on the voice information to obtain text information, see Yao, paragraph 113. The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation corresponding to the gesture semantic is obtained, see Yao, paragraph 113);
acquire voice associated information associated with the semantic keyword (The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation corresponding to the gesture semantic is obtained, see Yao, paragraph 113. It can be understood that the at least one AR gesture animation is sequentially stitched in an order of the voice text information to obtain the sign language AR animation, see Yao, paragraph 113);
display, by an AR displaying device, voice associated information at the display position corresponding to the user image (superimposing and displaying an augmented reality AR sign language animation corresponding to the voice information on a gesture area corresponding to the speaking object (such as around the face area) to obtain a sign language video, see Yao, paragraphs 111, 112, and 114). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 10.
Regarding claim 15, Yao in view of Cheng teach the AR apparatus according to claim 14, wherein the processor is further configured to search preset multimedia contents for contents associated with the semantic keyword, and determine the searched result as the voice associated information (Semantic recognition may be performed on the voice information to obtain text information, see Yao, paragraph 113. The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation (multimedia content) corresponding to the gesture semantic is obtained, see Yao, paragraph 113). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 10.
Regarding claim 16, Yao in view of Cheng teach the AR apparatus according to claim 14, wherein the processor is further configured to retrieve according to the semantic keyword, and determine the retrieved result as the voice associated information (Semantic recognition may be performed on the voice information to obtain text information, see Yao, paragraph 113. The voice text information may be understood as a semantic of the voice information…it can be understood that each pre-stored AR gesture animation has a corresponding gesture semantic, and in a case where the gesture semantic and voice text information match, the AR gesture animation corresponding to the gesture semantic is obtained, see Yao, paragraph 113). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 10.
Regarding claim 17, Yao in view of Cheng teach the AR apparatus according to claim 10, wherein when the target information comprises user information, the processor is further configured to search the preset user library for user information corresponding to the standard user image to display the user information by the AR displaying device at the display position corresponding to the user image (An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109. Specifically, the sound attribute information (such as amplitude information, audio information, and/or accent cycle information), corresponding to the voice information may be obtained first, …then, a historical face image corresponding to the sound attribute information is determined in a pre-stored face set, see for instance, Yao, paragraph 109. User information and acoustic features corresponding to the input voice can be stored in a preset file, see for instance, Cheng, paragraphs 93-96. User information matching an extracted acoustic feature is acquired and outputted, see for instance, Cheng, paragraphs 83-88), wherein the user information comprises at least one of location information, a name, a company name, a position, interest, a photo and organization information (User information includes a user’s name, a user image, a user’s job title, and the like, see for instance, Cheng, paragraph 72 and Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 10.
Regarding claim 18, Yao in view of Cheng teach the AR apparatus according to claim 10 and further teach wherein the processor is further configured to: prompt information sent by a client is received (The prompting portion may be configured to generate a prompt message for inputting user information, wherein the prompt message is used for the new user to input user information, see for instance, Cheng, paragraph 122); wherein, the prompt information is displayed by the AR displaying device (The prompting portion may be configured to generate a prompt message for inputting user information, wherein the prompt message is used for the new user to input user information, see for instance, Cheng, paragraph 122. The prompt message may be a voice or text prompt message, for example, displaying a text message of “please input the name, head portrait or the like of the speaker”, see Cheng, paragraphs 91-93. The acoustic feature and corresponding user information are stored in a preset file when user information input by a user based on the prompt message is received, see Cheng, paragraph 93. An acquaintance face corresponding to a sound may be pre-recorded, and then, when voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information, and then the speaking object is determined according to the face, see Yao, paragraph 109). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 10.
Regarding claim 19, Yao in view of Cheng further teach an electronic device, comprising a processor, a memory and a program or instruction which is stored in the memory and is capable of being run in the processor, wherein when the program or instruction is executed by the processor, the AR-based information displaying method according to claim 1 is executed (see for instance, Yao, paragraphs 168-179 and figs. 8 and 9). The motivation to combine Yao in view of Cheng is the same as that which was set forth with respect to claim 1.
Regarding claim 20, Yao teaches a nonvolatile computer-readable storage medium (see for instance, paragraphs 178 and 179), wherein a computer program code in the storage medium is executable by a processor of an electronic device, whereby the electronic device is configured to perform operations (see for instance, paragraphs 178 and 179) comprising: 
acquiring voice information and a user image of a user (voice information and video information are collected in real time, see for instance, paragraph 98. The user wears the AR glasses and during a conversation with the speaking object A and the speaking object B (persons), the user’s AR glasses capture video information of the conversation process of the speaking object A and the speaking object B, and also collect voice messages sent by both of them, see paragraph 101 and figs. 1 and 3);
identifying the voice information and extracting user characteristics (In an implementation step S102, which speaking object the currently collected voice information belongs to may be determined by recognizing a person currently speaking, see paragraph 104. When voice information corresponding to an existing sound is collected, the acquaintance face is searched in the video information as the face corresponding to the voice information and the speaking object is determined, see paragraph 109); and 
when the user image matches the user characteristics, displaying, by an AR displaying device, target information associated with the user at a display position corresponding to the user image, wherein the target information comprises at least one of user information and voice associated information (superimposing and displaying an augmented reality AR sign language animation corresponding to the voice information on a gesture area corresponding to the speaking object to obtain a sign language video, see paragraphs 111 and 112).
Yao does not appear to teach that user information is output.
In the same art of graphics and voice, Cheng teaches that voiceprint recognition is the process by which a speaker is determined based on the voice uttered by the speaker, a recognition technique that uses the voice for identity authentication, see for instance, paragraphs 66-67. User information matching an extracted acoustic feature is acquired and outputted, see for instance, paragraphs 83-88. After a voice is received, the voice may be converted into text, and the user information of the speaker may be displayed in front of the converted text simultaneously after the user is identified, see paragraph 77. The received voice information can be translated into another language, see paragraph 77.
It would have been obvious to one of ordinary skill in the art having the teachings of Yao and Cheng in front of them before the effective filing date of the claimed invention to incorporate user identification as taught by Cheng into Yao’s AR sign language system, as identifying a user based on their acoustic information and outputting the information to the display, such as described by Cheng, was well known before the effective filing date of the claimed invention and would have yielded predictable results in combination with Yao.
The modification of Yao with Cheng would have explicitly allowed the user information to be outputted. 
The motivation for combining Yao with Cheng would have been to use a known technique (voiceprint recognition), improve the user experience, and enhance functionality, see for instance, Cheng, paragraph 2.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J COBB whose telephone number is (571)270-3875. The examiner can normally be reached Monday - Friday, 11am - 7pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHAEL J COBB/  Primary Examiner, Art Unit 2613