DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is sent in response to Applicant’s Communication received 8/5/2021 for application number 16/786,783. 
Claims 1-20 are pending for examination. Claims 1, 17 and 20 are independent claims. 

Priority
Acknowledgement is made of applicant’s claim for the benefit of US Provisional Patent Application Nos. 62/803,965 and 62/888,852, filed on 2/11/2019 and 8/19/2019, respectively. Thus, the effective filing date of the claims is considered to be 2/11/2019.

Claim Rejections - 35 USC § 103
  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-14 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Dohring et al. (US Patent Application 2013/0130212; hereinafter Dohring) in view of Tesch et al. (US Patent Application 2014/0164507; hereinafter Tesch), further in view of August et al. (US Patent Application 2003/0028378; hereinafter August).

As to independent claim 1, Dohring teaches a system comprising:
a non-transitory computer-readable storage medium storing computer-executable instructions; and
one or more hardware processors in communication with the computer-readable memory, wherein the executable instructions, when executed by the one or more hardware processors, cause the one or more hardware processors to at least:
generate user interface data that, when executed by a user device, causes the user device to display a user interface comprising a plurality of selectable sound identifiers, wherein [Para 0018 - wherein said interface allows a learner to optionally access a visual representation and an auditory representation of each said phoneme in said taxonomy; and a software module for providing an interface for practicing each said phoneme in the context of the beginning, middle, and end of words of said target language];
process an indication that a first sound identifier in the plurality of sound identifiers is selected [Para 0067 - the software module for providing translation of voiceover and/or text produces a voiced translation by selecting an appropriate audio data file from among a collection of stored data files];
determine a set of video files that are each associated with the first sound identifier [Para 0041 - a visual representation includes, by way of non-limiting examples, images, videos, animations, and illustrations associated with the sound of the phoneme];
update the user interface data to form updated user interface data, wherein the updated user interface data, when executed by the user device, causes the user device to update the user interface to display the set of video files [Fig. 2, Para 0047 - FIG. 2 depicts a non-limiting example of such a module for practicing the English phoneme /m/. In further embodiments, the software module includes a visual representation of the phoneme currently practiced (e.g., /m/) and a plurality of words that begin with that phoneme (e.g., map, moon, and man)];
process a second indication that a second sound identifier in the plurality of sound identifiers is selected [Para 0067 - the software module for providing translation of voiceover and/or text produces a voiced translation by selecting an appropriate audio data file from among a collection of stored data files – Examiner note: Dohring allows a user to select a different audio data file].
Dohring does not appear to teach:
determine a subset of the set of video files that are each associated with the first sound identifier and the second sound identifier; and
update the updated user interface data to form second updated user interface data, wherein the second updated user interface data, when executed by the user device, causes the user device to update the user interface to display the subset of the set of video files in place of the set of video files.
However, Tesch teaches in the same field of endeavor:
determine a subset of the set of video files that are each associated with the first sound identifier and the second sound identifier [Para 0071 - The recommendation component 108 operates to further narrow searching or identification of media content portions within media content. Because the volume of media content can be large from multiple different data stores and continue to grow, the recommendations component 108 can further focus the generation of media content portion 107 to a subset of recommended media content portions from a larger set of media content portions], wherein each video file in the subset illustrates at least how the first and second sounds are pronounced [Abs - The media content portions are identified and extracted among media content based on predetermined criteria that can include a match of audio content with the words or phrases of the received message inputs; Para 0124 - a matching of media content portions of the set of media content portions from the media content identified with a set of words or phrases, a matching audio clip or portion within the set of media content portions and/or a matching action to the words or phrases can also be part of the set of predetermined criteria by which the media extraction component 808 can extract portions of video/audio content from media content files or recordings; Fig. 20, Para 0184 - a message input having a set of words or phrases for generating a multimedia message. At 2004, the method includes determining, from media content, a first media content portion that includes a first audio content portion of a first video content portion and a second media content portion that includes a second audio content portion of a second video content portion, wherein the first media content portion and the second media content portion correspond to the set of words or phrases of the message input based on a set of predetermined criteria, for example]; and
update the updated user interface data to form second updated user interface data, wherein the second updated user interface data, when executed by the user device, causes the user device to update the user interface to display the subset of the set of video files in place of the set of video files [Abs - The media content portions correspond to the words or phrases of the message inputs and can further be recommended to a user based on additional criteria. The media content portions that are recommended can be included in multimedia message, which can be further communicated].
It would have been obvious to one of ordinary skill in the art, having the teachings of Dohring and Tesch at the time of filing, to modify the digital processing device and computer program that creates a language phoneme practice engine disclosed by Dohring to include the concept of generating media content portions that match a message input having words or phrases, as taught by Tesch, to overcome problems with conventional systems [Tesch, Para 0004].
Dohring and Tesch do not appear to teach:
wherein each video file in the set illustrates a person demonstrating at least how the first sound is pronounced;
wherein each video file in the subset illustrates a person demonstrating at least how the first and second sounds are pronounced;
However, August teaches in the same field of endeavor:
wherein each video file in the set illustrates a person demonstrating at least how the first sound is pronounced;
wherein each video file in the subset illustrates a person demonstrating at least how the first and second sounds are pronounced;  [Fig. 2, Para 0056 - The animation module 34 provides visual aid to a student. The module 34 is synchronized with the TTS module 30, retrieves text files and, together with the TTS module or engine, pronounces the word for the student through an animated image of a human head and face. Preferably, the animated image of the face and human head portrays a three-dimensional perspective and the image has the capability of being rotated, tilted, etc. for full view from various angles. Accordingly, the student can observe characteristics of facial and mouth movements, and placement of the tongue, lips and teeth during speech examples]
It would have been obvious to one of ordinary skill in the art, having the teachings of Dohring, Tesch and August at the time of filing, to modify the digital processing device and computer program that creates a language phoneme practice engine disclosed by Dohring, and the system for generating media content portions that match a message input having words or phrases taught by Tesch, to include the concept of interactive language instruction taught by August.
One of ordinary skill in the art would have been motivated to make this modification in order to have available a system that selectively incorporates facial animation to assist a student in the learning process [August, Para 0009].

As to dependent claim 2, Dohring, Tesch and August teach the system of claim 1. 
Tesch further teaches: wherein the executable instructions, when executed, further cause the one or more hardware processors to at least:
process a third indication that a first video file and a second video file from the subset of the set of video files are selected;
merge the first and second video files to form an assignment video file, wherein the assignment video file is associated with one or more users; and
store the video assignment file for subsequent access by the one or more users [Para 0185 - At 2006, the first audio content portion is combined with the second video content portion to form a third media content portion, and at 2008 a multimedia message is generated that includes the third media content portion].

As to dependent claim 3, Dohring, Tesch and August teach the system of claim 2. 
Tesch further teaches: wherein the first video file is selected prior to the second video file [Fig. 20, Para 0184 - At 2004, the method includes determining, from media content, a first media content portion that includes a first audio content portion of a first video content portion and a second media content portion that includes a second audio content portion of a second video content portion], and wherein the executable instructions, when executed, further cause the one or more hardware processors to at least concatenate the second video file to the end of the first video file to form the assignment video file [Para 0185].

As to dependent claim 4, Dohring, Tesch and August teach the system of claim 1. 
Tesch further teaches: wherein the executable instructions, when executed, further cause the one or more hardware processors to at least:
process a third indication that a first video file and a second video file from the subset of the set of video files are selected [Para 0087 - recommendations component 108 can further focus the generation of media content portion 107 to a subset of recommended media content portions from a larger set of media content portions. In this way, various types of refined preferences can be used for various types of objectives];
assign the first and second video files to a first user [Para 0176 - a personal media content preference selected to correspond with the set of media content from a personal video or audio stored in a data store, such as a characteristic pertaining to the media content portions; Para 0151 - public stores can be used for other parts of the multimedia message, and a personal data store used for yet another part of the multimedia message being created – Examiner note: personal data store indicates that multimedia messages are assigned to a user]; and 
stream the first and second video files to a second user device associated with the first user [Para 0155 - a set of shared media content portions are published via a network to provide access to the set of shared media content portions at a social network data store based on a set of user preferences or a set of classification criteria. At 1406, the multimedia message is generated with the set of shared media content portions to correspond to a set of words or phrases receive].

As to dependent claim 5, Dohring, Tesch and August teach the system of claim 1. 
Dohring further teaches: wherein the first sound comprises one of a consonant, a vowel, a consonant blend, a vocalic sound, or a phoneme [Abs - a software module for providing an interface for practicing each said phoneme in said taxonomy].

As to dependent claim 6, Dohring, Tesch and August teach the system of claim 1. 
Tesch further teaches: wherein the executable instructions, when executed, further cause the one or more hardware processors to at least:
process a third indication that a clip type is selected [Para 0314 - The thumbnail component 4412 generates a display of a representation of each media content portion (e.g., video clips) with an indicator of the type of message the media content portion expresses]; and 
[Para 0315 - the viewing pane 4800 can include various classifications of various media content portions, such as alphabetical orderings, popular phrases, type of content or categories of words or phrases, quotes, effects and others, which can include sound effects, stage effects, video effects, dramatic actions, expressions, shouts, etc.,].

As to dependent claim 7, Dohring, Tesch and August teach the system of claim 6. 
Tesch further teaches: wherein the clip type comprises at least one of an isolated sound, a word, a phrase, a sentence, or a tongue twister [Para 0314 - classifying component 4708 and/or an index component 4710, and further according to media content corresponding to the phrases, words, and/or images that meet a set of classification criteria].

As to dependent claim 8, Dohring, Tesch and August teach the system of claim 1. 
Dohring further teaches: wherein the first sound identifier is selected in association with a sound placement [Para 0006 - the taxonomy of phonemes includes one or more phonemes represented by a single letter and phonemes represented by one or more combinations of letters. In some embodiments, the taxonomy of phonemes includes one or more phonemes represented by an image. In some embodiments, the taxonomy is comprehensive and comprises all sounds in said target language. In other embodiments, the taxonomy is partial and comprises some of the sounds in said target language. In some embodiments, the module for providing an interface for practicing each said phoneme in the context of the beginning, middle, and end of words identifies a selected phoneme in each word. In some embodiments, the visual representation of each said word comprises a photographic image or an illustration].

As to dependent claim 9, Dohring, Tesch and August teach the system of claim 8. 
Dohring further teaches: wherein the sound placement comprises one of an initial placement, a medial placement, or a final placement [Para 0031 - the engine includes a software module that provides an interface for learners to practice each phoneme in the context of the beginning, middle, and end of words of the target language].

As to dependent claim 10, Dohring, Tesch and August teach the system of claim 8. 
Dohring further teaches: wherein each video file in the set of video files illustrates the person demonstrating at least how the first sound is pronounced when positioned at the sound placement in a word or phrase [Para 0031 - the interfaces allow learners to optionally access visual and auditory representations of each word and each phoneme in each word].

As to dependent claim 11, Dohring, Tesch and August teach the system of claim 10. 
[Para 0006 - the taxonomy of phonemes includes one or more phonemes represented by a single letter and phonemes represented by one or more combinations of letters. In some embodiments, the taxonomy of phonemes includes one or more phonemes represented by an image. In some embodiments, the taxonomy is comprehensive and comprises all sounds in said target language. In other embodiments, the taxonomy is partial and comprises some of the sounds in said target language. In some embodiments, the module for providing an interface for practicing each said phoneme in the context of the beginning, middle, and end of words identifies a selected phoneme in each word. In some embodiments, the visual representation of each said word comprises a photographic image or an illustration].

As to dependent claim 12, Dohring, Tesch and August teach the system of claim 11. 
Dohring further teaches: wherein each video file in the subset of the set of video files illustrates the person demonstrating at least how the first sound is pronounced when positioned at the sound placement in a word or phrase and how the second sound is pronounced when positioned at the second sound placement in the word or phrase [Para 0006 - said interface allows a learner to optionally access a visual representation and an auditory representation of each said phoneme in said taxonomy; and a software module for providing an interface for practicing each said phoneme in the context of the beginning, middle, and end of words of said target language, wherein said interface allows a learner to optionally access a visual and an auditory representation of each said word and each said phoneme in each said word].

As to dependent claim 13, Dohring, Tesch and August teach the system of claim 1. 
Dohring further teaches: wherein the executable instructions, when executed, further cause the one or more hardware processors to at least:
process a third indication that a phoneme structure is selected [Para 0041 - the software module for practicing phonemes provides a GUI that allows a learner to optionally access a visual representation and an auditory representation of each said phoneme in said taxonomy]; and 
Tesch further teaches:
update the second updated user interface data to form third updated user interface data, wherein the third updated user interface data, when executed by the user device, causes the user device to update the user interface to display a second subset of the set of video files in place of the subset of the set of video files, wherein each video file in the second subset is associated with the phoneme structure and illustrates the person demonstrating at least how the first and second sounds are pronounced in a word or phrase corresponding to the phoneme structure [Para 0315 - the viewing pane 4800 can include various classifications of various media content portions, such as alphabetical orderings, popular phrases, type of content or categories of words or phrases, quotes, effects and others, which can include sound effects, stage effects, video effects, dramatic actions, expressions, shouts, etc.,].

As to dependent claim 14, Dohring, Tesch and August teach the system of claim 13. 
Dohring further teaches: wherein the phoneme structure comprises an ordered placement of consonant and vowel sounds [Para 0038 - a taxonomy of phonemes organizes phonemes into groups for consonant phonemes and vowel phonemes].

As to dependent claim 16, Dohring, Tesch and August teach the system of claim 1. 
Tesch further teaches: wherein the set of video files is ordered alphabetically according to a title field associated with each video file in the set [Para 0315 - the viewing pane 4800 can include various classifications of various media content portions, such as alphabetical orderings, popular phrases, type of content or categories of words or phrases, quotes, effects and others, which can include sound effects, stage effects, video effects, dramatic actions, expressions, shouts, etc.; Para 0176 - The set of classifications include at least one of a set of themes selected to correspond with the set of media content, a set of song artists selected to correspond with the set of media content, a set of actors selected to correspond with the set of media content, a set of titles].

As to independent claims 17 and 20, the claims are substantially similar to claim 1 and are rejected on the same ground. 

As to dependent claim 18, the claim is substantially similar to claim 2 and is rejected on the same ground. 

As to dependent claim 19, the claim is substantially similar to claim 3 and is rejected on the same ground. 

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Dohring in view of Tesch and August, further in view of Kalinowski et al. (US Patent Application 2005/0255430; hereinafter Kalinowski).

As to dependent claim 15, Dohring, Tesch and August teach the system of claim 13. 
Dohring, Tesch and August do not appear to teach: wherein the first video file comprises a transparent overlay showing at least one of movement of an inner facial bone structure of a speaker, movement and placement of a tongue of the speaker, or teeth of the speaker.
However, Kalinowski teaches in the same field of endeavor:
wherein the first video file comprises a transparent overlay showing at least one of movement of an inner facial bone structure of a speaker, movement and placement of a tongue of the speaker, or teeth of the speaker [Para 0028 - Animated sound profile screen 116 comprise a partially hidden facial profile 118 of a simulated person 120. Partially hidden facial profile 118 depicts the internal position and orientation of upper jaw 122, lower jaw 124, upper teeth 126, lower teeth 128, upper lip 130, lower lip 132, tongue 134 and throat 136. Interactive program 102 contains an audio file corresponding to the selected sound tab, in the present case, "mom" from sound tab 114a such that the partially hidden facial profile 118 essentially "speaks" the word mom as the upper jaw 122, lower jaw 124, upper teeth 126, lower teeth 128, upper lip 130, lower lip 132, tongue 134 and throat 136 move in conjunction with the sound of the word].
It would have been obvious to one of ordinary skill in the art, having the teachings of Dohring, Tesch, August and Kalinowski at the time of filing, to modify the digital processing device and computer program that creates a language phoneme practice engine disclosed by Dohring, the system for generating media content portions that match a message input having words or phrases taught by Tesch, and the interactive language instruction taught by August to include the concept of a speech instruction method and apparatus taught by Kalinowski.
One of ordinary skill in the art would have been motivated to make this modification to allow a user to selectively choose the particular sound to be practiced [Kalinowski, Para 0012].

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Marmorstein et al. (US Patent Application 2005/0048449 A1) – teaches a system and method for language instruction.
  
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  


Any inquiry concerning this communication or earlier communications from the examiner should be directed to SANG H KIM whose telephone number is (571)270-5285.  The examiner can normally be reached on M-F 9am-6pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley, can be reached at (571) 272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.