Detailed Action
This action is in response to the RCE filed on 03/26/2021.
This application was filed on 03/15/2019. 
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.
Claims 1-20 are rejected.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/26/2021 has been entered.
 
Applicant’s Response
In Applicant’s Response dated 03/26/2021, Applicant amended claims 1, 4, 9, 12, 17, and 20.  Applicant argued against all rejections previously set forth in the Office Action mailed 01/13/2021. 
In light of Applicant’s amendments and remarks, all double patenting rejections of the claims previously set forth are withdrawn.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 04/13/2021 and 04/13/2021 are in compliance with the provisions of 37 CFR 1.97.
Accordingly, the information disclosure statements are being considered by the examiner.

Examiner Notes
Examiner cites particular columns, paragraphs, figures and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 103

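The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
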
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Arakawa et al. (US 2018/0246569 A1, hereinafter referred to as D1) in view of Vozila et al. (US 2019/0272902 A1, hereinafter referred to as D2).

As per claim 1, D1 discloses, 
An apparatus comprising, (D1, title, abstract, figure 13).  
one or more processors; and one or more memories storing instructions which, when processed by the one or more processors, cause, (D1, title, abstract, figure 13).  
receiving a media content item representing captured content from a discussion of an electronic document by one or more users, (D1, title, abstract, figure 4, 0023-0030, 0034, 0036 discloses receiving user voice audio data and video data (gaze) discussing and viewing document portions).
identifying, from the media content item, portions of media content corresponding to content suggestions for the electronic document, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036 discloses receiving user voice audio data and video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data corresponding, and attaching the text as annotation to specific portions of a document). 
for each portion of media content of the portions of the media content corresponding to content suggestions for the one or more electronic documents: analyzing the portion of the media content to identify a document portion, from the one or more electronic documents, that corresponds to the portion of the media content, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036, 0063 discloses receiving user voice audio data and continuous video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data, and attaching the text as annotation to specific portions of a document based on the analysis of video media (e.g. eye gaze)).
generating annotation that represents the portion of media content, and associating the annotation to a location corresponding to the document portions within a particular electronic document from the one or more electronic documents, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036 discloses receiving user voice audio data and video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data, and attaching the text as annotation to specific portions of a document). 
wherein the annotation identifies a user speaking within the portion of the media content, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036, 0078 discloses receiving user voice audio data and video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data, and attaching the text as annotation to specific portions of a document, where the annotation may include data such as a meeting name, meeting place, meeting date and time, or a user ID that identifies a speaking user within the portion of media content. For instance, the user may utter “this is John, the author of this annotation... etc.”, and the annotation will include such data.).
and displaying, in electronic form within a display window, the one or more electronic documents with their corresponding one or more generated annotations from the portions of media content, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036 discloses receiving user voice audio data and video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data, and attaching the text as annotation to specific portions of a document).
As noted above, D1 arguably discloses analyzing the portion of media content to identify a document portion; nevertheless, for the sake of clarity and completeness, D2 also and/or alternatively discloses the above limitation.
D2 (0083) explicitly discloses adding comments/text to various portions of a document based on identifying the sections of the document according to analysis of user utterances/audio data. Thus, D2 additionally discloses for each portion of media content of the portions of the media content corresponding to content suggestions for the one or more electronic documents: analyzing the portion of the media content to identify a document portion, from the one or more electronic documents, that corresponds to the portion of the media content.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention, as disclosed in D1, to include the teachings of D2 as noted above. This would have resulted in the system of D1 having the ability to detect user speech and extract relevant content for inclusion in particular document portions based on audio analysis of the user’s spoken words. This would have been obvious for the purpose of assisting annotators or document reviewers in adding specific content or text to a document without having to type it out, as disclosed by D2.
 	
As per claim 2, the rejection of claim 1 further incorporated, D1 discloses,
wherein the media content item is one of an audio file, a video file, a captured screenshot, or an interactive whiteboard file that contains a series of coordinates corresponding to received input representing generated marks on an interactive whiteboard, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036 discloses receiving user voice audio data and video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data, and attaching the text as annotation to specific portions of a document).

As per claim 3, the rejection of claim 1 further incorporated, D1 discloses,
generating updated one or more electronic documents that each include their associated annotations corresponding to identified portion, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036 discloses receiving user voice audio data and video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data, and attaching the text as annotation to specific portions of a document).

As per claim 4, the rejection of claim 1 further incorporated, D1 discloses,
wherein generating the annotation that represents the portion of media content and associating the annotation to the location corresponding to the document portion within the particular electronic document of the one or more electronic documents, comprises, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036 discloses receiving user voice audio data and video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data, and attaching the text as annotation to specific portions of a document).
 generating the annotation comprising a text transcription of the portion of media content and an electronic link to the media content item containing the portion of media content, wherein the electronic link to the media content item is queued to play the portion of media content; and associating the annotation to the location corresponding to the document portion within the particular electronic document of the one or more electronic documents, (D1, title, 

As per claim 5, the rejection of claim 1 further incorporated, D1 discloses,
wherein the annotation that represents the portion of media content includes one or more meeting details that include at least one of a meeting name, meeting place, and meeting date and time; or user ID that identifies the user speaking within the portion of media content, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036, 0078 discloses receiving user voice audio data and video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data, and attaching the text as annotation to specific portions of a document, where the annotation further includes an annotation output button/link, and selection of the button plays the audio content. The examiner notes that the content of the annotation is at the whim of the user, and may include any data, including non-functional data such as a meeting name, meeting place, meeting date and time, or a user ID that identifies a speaking user within the portion of media content. For instance, the user may utter “this is John, the author of this annotation... etc.”, and the annotation will include such data.).

As per claim 6, the rejection of claim 1 further incorporated, D1 discloses,
identifying the portions of media content that correspond to phrases indicating the content suggestions for the electronic document, (D1, title, abstract, figure 4, figure 10, 0023-0030, 0034, 0036 discloses receiving user voice audio data and video data (gaze) (collectively construed as “media content”) discussing and viewing document portions, where based on the media content, identifying/generating text from audio data, and attaching the text as annotation to specific portions of a document).
D1 fails to expressly disclose - using a machine-learning model… wherein the machine-learning model has been trained using an input data set of media content items that have identified content suggestion speech.
D2 (0064-0065, 0082-0083) discloses using a machine-learning model… wherein the machine-learning model has been trained using an input data set of media content items that have identified content suggestion speech (e.g. prior transcripts and medical reports are used as training data to generate models for mapping transcripts and medical reports). 
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention, as disclosed in D1, to include using a machine-learning model… wherein the machine-learning model has been trained using an input data set of media content items that have identified content suggestion speech. This would have resulted in the system of D1 having the ability to detect user speech and extract relevant content. This would have been obvious for the purpose of assisting annotators or document reviewers in adding specific content or text to a document without having to type it out, as disclosed by D2.

As per claim 7:
The rejection of claim 6 further incorporated. 
D1 fails to expressly disclose - cause: using the machine-learning model, determining a content suggestion type for each of the portions of media content, wherein the content suggestion type is one of a comment or a suggested edit; upon generating the annotation that represents the portion of media content, determining that the annotation for the portion of media content corresponds to a suggested edit to the document portion; calculating a confidence score for the suggested edit, wherein the confidence score represents a level of confidence that the portion of media content corresponds to the suggested edit to the document portion; determining that the confidence score for the suggested edit is above a confidence score threshold for automatically editing the document portion; and automatically editing the document portion to reflect changes proposed in the suggested edit.
D2 (0057-0059, 0064-0065, 0069, 0070, 0082-0083, 0117, figure 8) discloses using the machine-learning model, determining a content suggestion type for each of the portions of media content, wherein the content suggestion type is one of a comment or a suggested edit (e.g. D2 discloses that non-relevant content is not included in the medical report, while conversation content relevant to the medical report is included); upon generating the annotation that represents the portion of media content, determining that the annotation for the portion of media content corresponds to a suggested edit to the document portion; calculating a confidence score for the suggested edit, wherein the confidence score represents a level of confidence that the portion of media content corresponds to the suggested edit to the document portion; determining that the confidence score for the suggested edit is above a confidence score threshold for automatically editing the document portion; and automatically editing the document portion to reflect changes proposed in the suggested edit (e.g. the medical report is edited with content from transcripts based on content confidence being above a threshold).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention, as disclosed in D1, to include the teachings of D2 as noted above. This would have resulted in the system of D1 having the ability to detect user speech and extract relevant content for inclusion in a particular document. This would have been obvious for the purpose of assisting annotators or document reviewers in adding specific content or text to a document without having to type it out, as disclosed by D2.
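
For illustration only, the confidence-gated editing step recited in claim 7 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, D1, or D2: the names Suggestion, apply_suggestion, and AUTO_EDIT_THRESHOLD, the 0.8 threshold value, and the plain string replacement are all hypothetical, and the confidence score is assumed to be supplied by some upstream model.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical threshold; neither the claim nor the cited art gives a value.
AUTO_EDIT_THRESHOLD = 0.8


@dataclass
class Suggestion:
    kind: str          # "comment" or "suggested_edit" (content suggestion type)
    target: str        # text of the document portion the suggestion refers to
    replacement: str   # text proposed in the portion of media content
    confidence: float  # score in [0, 1], assumed to come from an upstream model


def apply_suggestion(document: str, s: Suggestion,
                     threshold: float = AUTO_EDIT_THRESHOLD) -> Tuple[str, bool]:
    """Return (document, edited).

    The document is edited only when the suggestion is a suggested edit,
    its confidence is above the threshold, and the target text is present.
    """
    if s.kind != "suggested_edit":
        return document, False      # comments are never auto-applied
    if s.confidence <= threshold:
        return document, False      # not confident enough to auto-edit
    if s.target not in document:
        return document, False      # no matching document portion
    # Replace only the first occurrence of the target text.
    return document.replace(s.target, s.replacement, 1), True
```

In this sketch a comment, or a suggested edit whose score is at or below the threshold, leaves the document unchanged and would remain an annotation only.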

As per claim 8:
The rejection of claim 1 further incorporated. 
D1 fails to expressly disclose - using a first machine-learning model, identifying document portions within the one or more electronic documents based upon a determined document types associated with the one or more electronic documents and combinations of words within the one or more electronic documents, wherein the first machine-learning model has been trained using a plurality of documents of different document types; and using a second machine-learning model, correlating the document portion of the document portions to the portion of media content based upon a relative position determined for the portion of media content and a text transcription of the portion of media content, wherein the second machine-learning model has been trained using a plurality of document portions from a plurality of electronic documents and corresponding content suggestions for the plurality of document portions from the plurality of electronic documents.
D2 (0057-0059, 0064-0065, 0070, 0082-0083) discloses using a first machine-learning model, identifying document portions within the one or more electronic documents based upon a determined document types (e.g. medical report associated with a particular physician, institution, etc.) associated with the electronic document and combinations of words within the electronic document (e.g. link document report portions to audio/transcript based on word mapping), wherein the first machine-learning model has been trained using a plurality of documents of different document types; and using a second machine-learning model (e.g. system of D2 trained using previous transcripts and medical reports), correlating the document portion of the document portions to the portion of media content based upon a relative position determined for the portion of media content and a text transcription of the portion of media content (e.g. map medical report portions to transcript “linkage”), wherein the second machine-learning model has been trained using a plurality of document portions from a plurality of electronic documents and corresponding content suggestions for the plurality of document portions from the plurality of electronic documents (e.g. system of D2 trained using previous transcripts and medical reports), where the first machine-learning model and the second machine-learning model may be the same model.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention, as disclosed in D1, to include the teachings of D2 as noted above. This would have resulted in the system of D1 having the ability to detect user speech and extract relevant content for inclusion in a particular document. This would have been obvious for the purpose of assisting annotators or document reviewers in adding specific content or text to a document without having to type it out, as disclosed by D2.

As per claims 9-20:
Claims 9-20 are media and method claims corresponding to apparatus claims 1-8 and are of substantially the same scope.
Accordingly, claims 9-20 are rejected under the same rationale as set forth for claims 1-8.

Response to Arguments
Applicant’s arguments filed on 03/26/2021 have been fully considered, but they are not persuasive and/or are moot in view of the modified grounds of rejection.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. 
See form 892 for additional cited prior art.
In particular see:
US 8091028 B2 
TITLE: Method and apparatus for annotating a line-based document
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MUSTAFA A AMIN whose telephone number is (571)270-3181.  The examiner can normally be reached on Monday-Friday 8am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Scott Baderman, can be reached at 571-272-3644. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.