DETAILED ACTION
	This action is responsive to applicant’s supplemental communication filed 05/19/2022.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
	Claims 1, 4, 6-7, 11-12, and 14-16 are rejected under 35 U.S.C. 103.

Response to Arguments
	Due to the amendments, the non-statutory double patenting rejections made in the prior office action over co-pending application 16/983,341 have been withdrawn.

	Applicant’s arguments in the Remarks filed 05/19/2022 have been fully considered but are moot in view of the new grounds of rejection necessitated by the amendment.

Applicant argues at the end of Page 14 of the Remarks filed 05/11/2022 that Goel does not describe that analysis data, which includes a time series of notes played in a first period and a time series of notes that are expected to be played in a second period, is input to the RNN for generating control data to control a behavior of a virtual object. The examiner respectfully disagrees. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
In this case, and as discussed in part by the applicant on Pages 13-14 of the Remarks, Goel teaches producing a control output for a virtual avatar from a trained machine learning model based on input musical data. Ryyanen teaches predicting a next sequence of musical notes based on a first sequence of musical notes. Ryyanen therefore teaches musical data which would be compatible with the learned model of Goel and would yield predictable results. It therefore would have been obvious for the musical data of Ryyanen, which includes both a current period of notes and a future period of notes, to be input into the learned model of Goel for producing an animation of a virtual avatar.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4, 6-7, 11-12, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Ryyanen (US 2019/0156807 A1) in view of Takahashi (US 2016/0104469 A1) and further in view of Goel (US 2019/0043239 A1), Villa (US 2009/0100988 A1), Rigiroli (US 10,535,174 B1), and Kishi (US 2019/0026932 A1).

Regarding Claim 1, Ryyanen teaches an information processing method comprising: generating performance data from a sound signal of a sound sounded in a performance of a performer… (“signals of a plurality of the instruments 110 are combined to the received audio signal. The combining is performed e.g. acoustically by capturing with one microphone sound produced by plural instruments… The real-time audio signal of the played music is received” Paragraphs 0052-53. See Figure 2 step 210. Performance data from one or more instruments is captured using an input sensor, such as a microphone, and performance data is generated by combining the captured audio signals from the plurality of instruments, which are being played by at least one performer.)
generating analysis data based on the sequentially supplied performance data, wherein the analysis data includes a time series of notes played in a first period (“recognising 230 from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly detecting repetitions in the played music” Paragraph 0049. See Figure 2 steps 220-230. See Figure 4 and Paragraph 0068, which illustrate an output of the analysis represented by a timeline. A first period precedes a current time indicated on the representation. A time series of notes, including a repeated sequence of notes, is displayed within the first period. See Paragraphs 0060-64, which discuss an algorithm for analyzing the performance data.)
and a time series of notes that are expected to be played in a second period; (“The predicting of the at least one of chords; notes; and drum sounds can be performed by detecting self-similarity in the played music” Paragraph 0056. See Figure 2 step 240. See Figure 4 and Paragraphs 0068-69, which illustrate an output of the analysis represented by a timeline. A second period occurs after a current time indicated on the representation. A time series of notes that are expected to be played is displayed within the second period.)
Ryyanen does not teach sequentially supplying the performance data to a performance device for an automatic performance of the performance device.
However, Takahashi, which is also directed to analysis of musical data, teaches sequentially supplying the performance data to a performance device for an automatic performance of the performance device (“The MIDI message and the date/time information are performance information of a player, and correspond to a result of the performance of the player. Besides, the control section 101 controls the communication section 105 to acquire a MIDI message, date/time information and the like stored in the server device 20. The control section 101 can also conduct an automatic performance by controlling the drive section 108 in accordance with MIDI messages and date/time information” Paragraph 0022. See Figure 2, which shows an embodiment of an automatic performance device. Messages can be sent to a driver on the system in order to conduct an automatic performance.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the acquiring and prediction of musical notes from audio data taught by Ryyanen by incorporating the method of sending messages to an automatic performance device in order to conduct an automatic performance taught by Takahashi. Since Ryyanen also teaches use of MIDI signals as outputs (Paragraphs 0015, 0045, 0053, 0073-0074), the combination would yield predictable results. It would have been obvious to provide the acquired performance data to an automatic performance device for the device to automatically play the musical performance. Such an implementation would also further the goal taught by Ryyanen (Paragraph 0068) of assisting an amateur musician in learning a musical piece.
	Ryyanen in view of Takahashi does not teach generating control data to control a behavior of a virtual object by inputting the analysis data into a learned model… and controlling, based on the control data, a movement of a virtual object such that the movement of the virtual object is operatively associated with the automatic performance of the performance device, wherein the virtual object represents a performer.
However, Goel, which is directed to animating an avatar representative of a performer, teaches generating control data to control a behavior of a virtual object by inputting the analysis data into a learned model, (“a biomechanical model simulates human movements to create the avatar movement(s) in a manner that displays the enacted emotion, in which the model includes details associated with particular musical styles, particular tempos and/or the particular emotion of the musician… the avatar response generator 100 retrieves audio data from at least one of the musicians 102 and/or the audio data storage 104 (the audio source selected based upon an input to the user interface 106) and invokes a machine learning model trained by the avatar response generator 100. In such examples, the machine learning model generates an audio and/or visual response to be applied to at least one of the avatars 108” Paragraphs 0019-22. Control data is generated from audio data, which takes the form of MIDI tracks, as discussed in Paragraph 0021. The MIDI performance data is first processed [analyzed] before being input into the machine learning model for generating the control data.
Also see Paragraph 0009 of provisional application 62/614,477 (hereinafter “the provisional application”), which teaches that avatar movements are controlled based on musical input, and Paragraphs 0013-0015, which describe the process of acquiring audio data, extracting musical features, and controlling the avatar. Paragraphs 0013-0015 describe an LSTM network which is trained on one or more audio input samples to extract features used for controlling the behavior of an animated avatar.)
and controlling, based on the control data, a movement of a virtual object (“Examples disclosed herein modify and/or otherwise control (e.g., generate) one or more audio and/or visual characteristics of an avatar based on a musical input (e.g., input from a musical instrument digital interface (MIDI) protocol/interface) associated with at least one of stored musical data and/or a live musical presentation passed through a model trained utilizing machine learning techniques” Paragraph 0016. “motion profiles 114 such as an example first motion profile 114A and/or an example second motion profile 114B are associated with the example first avatar 108A and the example second avatar 108B, respectively, and generated via movement instructions generated by the example avatar response generator 100” Paragraph 0027. An avatar is a virtual object controlled using the generated control data. Also see Paragraphs 0009 and 0015 of the provisional application.)
such that the movement of the virtual object is operatively associated with the automatic performance of the performance device, wherein the virtual object represents a performer. (“The example avatars 108 of the example avatar environment 101 are digital representations of musicians. In some examples, the avatar(s) 108A, 108B include a graphical representation of a musician in addition to an audio representation of the instrument played by the musician. In such examples, one or more characteristics of the graphical representation (e.g., positioning of the avatars 108, motion of the avatars 108, etc.) of the avatars 108 can correspond to one or more characteristics of the audio representation of the instrument played by the musician.” Paragraph 0025. The movements of the avatar are determined based on music being played in real-time by musicians. In view of Takahashi, which teaches the automatic performance device, it would have been obvious for the generated movements to therefore correspond to a musician that would be playing the automatic performance device based on the analyzed musical performance data. 
Also see Paragraph 0009 of the provisional application: “a virtual avatar responds to the music input by enacting an action of playing an instrument (e.g., a guitar) as a response to the music input. In some examples, a biomechanical model simulates human movements to create the avatar in a manner that includes details associated with particular musical styles, particular tempos and/or particular emotions of the musician.” The avatar represents a musician and the movement of the avatar is operatively associated with the playing of a musical instrument. The automatic performance device taught by Takahashi is a musical instrument.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the analysis of musical performance data and the playing of an automatic performance device taught by Ryyanen in view of Takahashi by incorporating the display and control of a virtual avatar representative of a performer as taught by Goel. Since Goel (Paragraphs 0014-0018) teaches analysis of acquired performance data using a neural network in order to control the movement of the avatar, it would have been obvious for the performance and analysis data taught by Ryyanen to be input into the control system for animating an avatar, along with the automatic performance device of Takahashi. It would have been further obvious for the avatar of Goel, which is representative of a musician, to be representative of a musician that would be playing the automatic performance device.
Ryyanen in view of Takahashi and Goel does not explicitly teach controlling a display device to display the virtual object concurrently with the automatic performance of the performance device.
However, Villa, which is directed to control of audio and video effects for a musical instrument, teaches controlling a display device to display the virtual object concurrently with the automatic performance of the performance device. (“the image 810 comprises avatars for each of the users 804, 806 and 808. By connecting to the network 802, the users 804, 806 and 808 may participate in a virtual jamming session where each user may control the sound produced by his or her musical instrument and the specific movements of his or her avatar.” Paragraph 0069. A virtual object such as an avatar representative of a user playing a musical instrument is displayed on a display concurrently with the musical instrument being played by the user in a performance. See Figure 5A and Paragraph 0054: “The console 106 generates an image 502 that is representative of the user 102 playing the guitar 104. The image 502 is displayed on the display 112. In the illustrated embodiment, the image 502 comprises an avatar playing a representation of the guitar.” Also see Paragraphs 0059-63, which teach controlling an avatar based on the sound produced by a musical instrument, including manipulating the movement of the avatar in response to a specific note or chord being played.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the animation of a virtual avatar representative of a performer and a musical instrument being automatically played taught by Ryyanen in view of Takahashi and Goel by concurrently displaying the virtual avatar during a musical performance by a user as taught by Villa. In view of Takahashi, it would have been obvious for the musical performance to be provided by an automatic performance device and the avatar to be displayed concurrently with the performance of the automatic performance device. Since Villa (Paragraphs 0059-0063) is also directed to controlling a virtual avatar based on a sound sounded in a performance, the combination would have yielded predictable results. As taught by Villa (Paragraphs 0066, 0071), controlling the movement of an image responsive to the necessary manipulations required to produce the sound being played may be performed in real-time, providing a richer virtual performance.
Ryyanen in view of Takahashi, Goel, and Villa does not teach a plurality of control points represents a skeleton of the virtual object, and the control data includes… coordinates indicating a position of each of the plurality of control points; minimizing a temporal change between a first control point of the plurality of control points and a second control point of the plurality of control points, based on an optimization of a plurality of coefficients of the learned model; wherein the learned model outputs the control data based on the optimized plurality of coefficients.
	However, Rigiroli, which is directed to a realistic animation of a character model, teaches a plurality of control points represents a skeleton of the virtual object, (“FIG. 3 illustrates an embodiment of positions of a character model 150. The character model 150 can be movable and is illustrated in a first pose 160A and a second pose 160B. The character model 150 can include a mesh (not shown) and the illustrated skeleton frame, which includes a plurality of elements, such as joints 154 and rigid bodies 156.” Column 8:58-65. See Figure 3, which shows a skeleton comprising a plurality of control points, such as joints 154.)
and the control data includes… coordinates indicating a position of each of the plurality of control points; (“The IK system can iteratively solve the problem in order to arrive at the final position of each element of the character model 170, as illustrated in FIG. 4D.” Column 9:35-40. The IK system outputs control data which determines the new positions of the control points that make up the character model.)
minimizing a temporal change between a first control point of the plurality of control points and a second control point of the plurality of control points, (“In order to reduce the computation time and/or generate more realistic poses, the IK system can generate an estimated pose for the model, such as illustrated in FIG. 4C. In FIG. 4C, joint 173 moves to predictive position 178C. The predictive pose can be calculated based on a set of rules that define how a joint moves based on the movement of another joint.” Column 9:37-40. Figure 4B shows the end position of a control point with respect to another control point. Figure 4C shows that the change between the two points is reduced based on various rules and constraints in order to create a more realistic movement of the body part.
“each joint can have a defined range of movement. A joint can be coupled with one or more connectors. Generally, the connectors are rigid segments that are a defined length.” Column 10:17-20. The connector is an interval between the first control point and the second control point. Since the segments are rigid, they remain the same length during motion of the character model.
“During the final pose generation phase 920, the particle solver 924 can interact with the DNN 922 in order to generate a pose that satisfies the constraints associated with the respective elements of the character model. The iterative calculations performed by the particle solver can be done in accordance with one or more DNNs, which can help to smooth out the animation so that it becomes more realistic.” Column 15:49-56. The DNN, or machine learning model, therefore minimizes the changes between the joints of the skeletal model in order to produce a smooth animation.)
based on an optimization of a plurality of coefficients of the learned model; wherein the learned model outputs the control data based on the optimized plurality of coefficients (“The parameters 862 and weights 864 can be updated and modified during the model generation phase to generate the prediction model 860. In some embodiments, weights may be applied to the parameter functions or prediction models themselves. For example, the mathematical complexity or the number of parameters included in a particular prediction model 860 may affect a weight for the particular prediction model 860, which may impact the generation of the model and/or a selection algorithm or a selection probability that the particular prediction model 860 is selected.” Column 12:59-67. “For example, FIG. 9A illustrates a DNN 912 associated with the torso of the character model that uses shoulder and chest joint positions in order to generate joint positions for the spine, neck, and collar. The nodes within the DNN can generate the output joint positions by applying the parameters, constraints, and weights determined during the model generation process to the received input data” Column 15:20-25. The weights are equivalent to the claimed coefficients. Since the weights are being updated to produce the best prediction of the updated position of a virtual avatar, the weights are being optimized. The weights are then applied to the learned model to produce an output.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the machine learning model for producing an animation of an avatar responsive to musical input data taught by Goel by incorporating the method of updating coefficients of the machine learning model in order to reduce the variation between the joints of a skeletal model as taught by Rigiroli. Since both references teach machine learning models for animating an avatar, the combination would have yielded predictable results. As taught by Rigiroli (Column 9:60-63), such a process “can be helpful in better approximating realistic poses.”
Ryyanen in view of Takahashi, Goel, Villa, and Rigiroli does not teach wherein the virtual object is displayed in a two-dimensional coordinate space… and the control data includes normalized coordinates.
However, Kishi, which is directed to a CG character model animated according to a tempo, teaches wherein the virtual object is displayed in a two-dimensional coordinate space… and the control data includes normalized coordinates. (“When a CG model is placed in three-dimensional space, the model is modeled on the coordinate system for the model (local coordinate system), with reference to the location of the root. Then, the modeled CG model is mapped onto a coordinate system representing the whole three-dimensional space… The coordinates of the components (joint points, bones, etc.) may be expressed as absolute coordinates or relative coordinates in three-dimensional space, or may be expressed as relative angles of predetermined bones with respect to a predetermined joint point. For example, when the coordinates are represented using relative angles of bones, the reference posture of the CG model may be made a posture to resemble the letter T, as shown in FIG. 4A, or the angle of each bone in the posture may be made 0°” Paragraphs 0070-71. The coordinates of the avatar model are normalized with respect to a reference position. While the avatar is displayed in a three-dimensional space, mapping the avatar onto a two-dimensional space would have been obvious to one of ordinary skill in the art.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the movement of an avatar representing a performer according to musical notes corresponding to obtained real-time performance data taught by Ryyanen in view of Takahashi, Goel, Villa, and Rigiroli by displaying the avatar in a two-dimensional coordinate space and normalizing the coordinates used to represent the avatar as taught by Kishi. Since Kishi is also directed to animating an avatar responsive to musical input data, the combination would have yielded predictable results. Kishi is also directed to creating an animation with natural movements (Paragraph 0029) similar to Rigiroli. Since Rigiroli also teaches a skeletal model of an animated avatar with joint positions that move responsive to an input, it would have been obvious for the joint positions to be normalized with respect to some reference.

Regarding Claim 4, Ryyanen in view of Takahashi, Goel, Villa, Rigiroli, and Kishi further teaches wherein the control data is data for controlling the movement of the virtual object (Goel, “a biomechanical model simulates human movements to create the avatar movement(s) in a manner that displays the enacted emotion, in which the model includes details associated with particular musical styles, particular tempos and/or the particular emotion of the musician.” Paragraph 0019. Control data for controlling the movement of an avatar is generated by a biomechanical model.)
at a time of playing a musical instrument (Ryyanen, “The producing of the real-time output includes displaying a timeline with indication of events placed on the timeline such that the timeline comprises several rows on the screen” Paragraph 0068. The analysis of the audio signal captured in real-time from a plurality of instruments (Figure 1) is also performed in real time, i.e. at a time of playing a musical instrument, and an output is provided in real-time. 
Paragraph 0018 of Goel also teaches the musical instrument being played in real time. Also see the provisional application: Paragraph 0009, “a virtual avatar responds to the music input by enacting an action of playing an instrument,” and Paragraphs 0011 and 0013, which describe receiving audio input from a microphone. Receiving audio input from a microphone would occur in real time or “at a time of playing a musical instrument”, which the virtual avatar is being controlled to imitate.
Paragraph 0066 of Villa also teaches real-time manipulation of a virtual avatar in response to received sound data.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art for the movement of the virtual object according to analyzed musical notes taught by Goel to be generated at a time of playing of a musical instrument given the teachings of Ryyanen. Ryyanen (Paragraph 0068) further teaches that an advantage of producing a real-time output with a visualization is that such an implementation would “allow an amateur musician to play along with a song even though they would not know the song in advance or would not be able to predict "by ear" what should be played at a next time instant.” Producing a virtual representation similar to Goel or Villa would therefore help an amateur musician learn how to perform the musical piece.

Regarding Claim 6, Ryyanen teaches an information processing device comprising: a control device including at least one processor configured to (See Figure 5 processor 520 and Paragraph 0074.)
generate performance data from a sound signal of a sound sounded in a performance of a performer… (“signals of a plurality of the instruments 110 are combined to the received audio signal. The combining is performed e.g. acoustically by capturing with one microphone sound produced by plural instruments… The real-time audio signal of the played music is received” Paragraphs 0052-53. See Figure 2 step 210. Performance data from one or more instruments is captured using an input sensor, such as a microphone.)
generate analysis data based on the sequentially supplied performance data, wherein the analysis data includes a time series of notes played in a first period (“recognising 230 from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly detecting repetitions in the played music” Paragraph 0049. See Figure 2 steps 220-230. See Figure 4 and Paragraph 0068, which illustrate an output of the analysis represented by a timeline. A first period precedes a current time indicated on the representation. A time series of notes, including a repeated sequence of notes, is displayed within the first period. See Paragraphs 0060-64, which discuss an algorithm for analyzing the performance data.)
and a time series of notes that are expected to be played in a second period; (“The predicting of the at least one of chords; notes; and drum sounds can be performed by detecting self-similarity in the played music” Paragraph 0056. See Figure 2 step 240. See Figure 4 and Paragraphs 0068-69, which illustrate an output of the analysis represented by a timeline. A second period occurs after a current time indicated on the representation. A time series of notes that are expected to be played is displayed within the second period.)
Ryyanen does not teach sequentially supply the performance data to a performance device for an automatic performance of the performance device.
However, Takahashi, which is also directed to analysis of musical data, teaches sequentially supply the performance data to a performance device for an automatic performance of the performance device (“The MIDI message and the date/time information are performance information of a player, and correspond to a result of the performance of the player. Besides, the control section 101 controls the communication section 105 to acquire a MIDI message, date/time information and the like stored in the server device 20. The control section 101 can also conduct an automatic performance by controlling the drive section 108 in accordance with MIDI messages and date/time information” Paragraph 0022. See Figure 2, which shows an embodiment of an automatic performance device. Messages can be sent to a driver on the system in order to conduct an automatic performance.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the acquiring and prediction of musical notes from audio data taught by Ryyanen by incorporating the method of sending messages to an automatic performance device in order to conduct an automatic performance taught by Takahashi. Since Ryyanen also teaches use of MIDI signals as outputs (Paragraphs 0015, 0045, 0053, 0073-0074), the combination would yield predictable results. It would have been obvious to provide the acquired performance data to an automatic performance device for the device to automatically play the musical performance. Such an implementation would also further the goal taught by Ryyanen (Paragraph 0068) of assisting an amateur musician in learning a musical piece.
Ryyanen in view of Takahashi does not teach generate control data to control a behavior of a virtual object by inputting the analysis data into a learned model; and control, based on the control data, a movement of a virtual object such that the movement of the virtual object is operatively associated with the automatic performance of the performance device, wherein the virtual object represents a performer.
However, Goel, which is directed to animating an avatar representative of a performer, teaches generate control data to control a behavior of a virtual object by inputting the analysis data into a learned model, (“a biomechanical model simulates human movements to create the avatar movement(s) in a manner that displays the enacted emotion, in which the model includes details associated with particular musical styles, particular tempos and/or the particular emotion of the musician… the avatar response generator 100 retrieves audio data from at least one of the musicians 102 and/or the audio data storage 104 (the audio source selected based upon an input to the user interface 106) and invokes a machine learning model trained by the avatar response generator 100. In such examples, the machine learning model generates an audio and/or visual response to be applied to at least one of the avatars 108” Paragraphs 0019-22. Control data is generated from audio data, which takes the form of MIDI tracks, as discussed in Paragraph 0021. The MIDI performance data is first processed [analyzed] before being input into the machine learning model for generating the control data.
Also see Paragraph 0009 of provisional application 62/614,477 (hereinafter “the provisional application”), which teaches that avatar movements are controlled based on musical input, and Paragraphs 0013-0015, which describe the process of acquiring audio data, extracting musical features, and controlling the avatar. Paragraphs 0013-0015 describe an LSTM network which is trained on one or more audio input samples to extract features used for controlling the behavior of an animated avatar.)
and control, based on the control data, a movement of a virtual object (“Examples disclosed herein modify and/or otherwise control (e.g., generate) one or more audio and/or visual characteristics of an avatar based on a musical input (e.g., input from a musical instrument digital interface (MIDI) protocol/interface) associated with at least one of stored musical data and/or a live musical presentation passed through a model trained utilizing machine learning techniques” Paragraph 0016. “motion profiles 114 such as an example first motion profile 114A and/or an example second motion profile 114B are associated with the example first avatar 108A and the example second avatar 108B, respectively, and generated via movement instructions generated by the example avatar response generator 100” Paragraph 0027. An avatar is a virtual object controlled using the generated control data. Also see Paragraphs 0009 and 0015 of the provisional application.)
such that the movement of the virtual object is operatively associated with the automatic performance of the performance device, wherein the virtual object represents a performer. (“The example avatars 108 of the example avatar environment 101 are digital representations of musicians. In some examples, the avatar(s) 108A, 108B include a graphical representation of a musician in addition to an audio representation of the instrument played by the musician. In such examples, one or more characteristics of the graphical representation (e.g., positioning of the avatars 108, motion of the avatars 108, etc.) of the avatars 108 can correspond to one or more characteristics of the audio representation of the instrument played by the musician.” Paragraph 0025. The movements of the avatar are determined based on music being played in real-time by musicians. In view of Takahashi, which teaches the automatic performance device, it would have been obvious for the generated movements to therefore correspond to a musician playing the automatic performance device based on the analyzed musical performance data.
Also see Paragraph 0009 of the provisional application: “a virtual avatar responds to the music input by enacting an action of playing an instrument (e.g., a guitar) as a response to the music input. In some examples, a biomechanical model simulates human movements to create the avatar in a manner that includes details associated with particular musical styles, particular tempos and/or particular emotions of the musician.” The avatar represents a musician and the movement of the avatar is operatively associated with the playing of a musical instrument. The automatic performance device taught by Takahashi is a musical instrument.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the analysis of musical performance data and the playing of an automatic performance device taught by Ryyanen in view of Takahashi by incorporating the display and control of a virtual avatar representative of a performer as taught by Goel. Since Goel (Paragraphs 0014-0018) teaches analysis of acquired performance data using a neural network in order to control the movement of the avatar, it would have been obvious for the performance and analysis data taught by Ryyanen to be input into the control system for animating an avatar, along with the automatic performance device of Takahashi. It would have been further obvious for the avatar of Goel, which is representative of a musician, to be representative of a musician playing the automatic performance device.
Ryyanen in view of Takahashi and Goel does not explicitly teach control a display device to display the virtual object concurrently with the automatic performance of the performance device.
However, Villa, which is directed to control of audio and video effects for a musical instrument, teaches controlling a display device to display the virtual object concurrently with the automatic performance of the performance device. (“the image 810 comprises avatars for each of the users 804, 806 and 808. By connecting to the network 802, the users 804, 806 and 808 may participate in a virtual jamming session where each user may control the sound produced by his or her musical instrument and the specific movements of his or her avatar.” Paragraph 0069. A virtual object such as an avatar representative of a user playing a musical instrument is displayed on a display concurrently with the musical instrument being played by the user in a performance. See Figure 5A and Paragraph 0054: “The console 106 generates an image 502 that is representative of the user 102 playing the guitar 104. The image 502 is displayed on the display 112. In the illustrated embodiment, the image 502 comprises an avatar playing a representation of the guitar.” Also see Paragraphs 0059-63, which teach controlling an avatar based on the sound produced by a musical instrument, including manipulating the movement of the avatar in response to a specific note or chord being played.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the animation of a virtual avatar representative of a performer and a musical instrument being automatically played taught by Ryyanen in view of Takahashi and Goel by concurrently displaying the virtual avatar during a musical performance by a user as taught by Villa. In view of Takahashi, it would have been obvious for the musical performance to be provided by an automatic performance device and the avatar to be displayed concurrently with the performance of the automatic performance device. Since Villa (Paragraphs 0059-63) is also directed to controlling a virtual avatar based on a sound sounded in a performance, the combination would have yielded predictable results. As taught by Villa (Paragraphs 0066, 0071), controlling the movement of an image responsive to the necessary manipulations required to produce the sound being played may be performed in real-time, providing a richer virtual performance.
Ryyanen in view of Takahashi, Goel, and Villa does not teach a plurality of control points represents a skeleton of the virtual object, and the control data includes… coordinates indicating a position of each of the plurality of control points; minimizing a temporal change between a first control point of the plurality of control points and a second control point of the plurality of control points, based on an optimization of a plurality of coefficients of the learned model; wherein the learned model outputs the control data based on the optimized plurality of coefficients.
	However, Rigiroli, which is directed to a realistic animation of a character model, teaches a plurality of control points represents a skeleton of the virtual object, (“FIG. 3 illustrates an embodiment of positions of a character model 150. The character model 150 can be movable and is illustrated in a first pose 160A and a second pose 160B. The character model 150 can include a mesh (not shown) and the illustrated skeleton frame, which includes a plurality of elements, such as joints 154 and rigid bodies 156.” Column 8:58-65. See Figure 3, which shows a skeleton comprising a plurality of control points, such as joints 154.)
and the control data includes… coordinates indicating a position of each of the plurality of control points; (“The IK system can iteratively solve the problem in order to arrive at the final position of each element of the character model 170, as illustrated in FIG. 4D.” Column 9:35-40. The IK system outputs control data which determines the new positions of the control points that make up the character model.)
minimize a temporal change between a first control point of the plurality of control points and a second control point of the plurality of control points, (“In order to reduce the computation time and/or generate more realistic poses, the IK system can generate an estimated pose for the model, such as illustrated in FIG. 4C. In FIG. 4C, joint 173 moves to predictive position 178C. The predictive pose can be calculated based on a set of rules that define how a joint moves based on the movement of another joint.” Column 9:37-40. Figure 4B shows the end position of a control point with respect to another control point. Figure 4C shows that the change between the two points is reduced based on various rules and constraints in order to create a more realistic movement of the body part.
“each joint can have a defined range of movement. A joint can be coupled with one or more connectors. Generally, the connectors are rigid segments that are a defined length.” Column 10:17-20. The connector is an interval between the first control point and the second control point. Since the segments are rigid, they remain the same length during motion of the character model.
“During the final pose generation phase 920, the particle solver 924 can interact with the DNN 922 in order to generate a pose that satisfies the constraints associated with the respective elements of the character model. The iterative calculations performed by the particle solver can be done in accordance with one or more DNNs, which can help to smooth out the animation so that it becomes more realistic.” Column 15:49-56. The DNN, or machine learning model, therefore minimizes the changes between the joints of the skeletal model in order to produce a smooth animation.)
based on an optimization of a plurality of coefficients of the learned model; wherein the learned model outputs the control data based on the optimized plurality of coefficients (“The parameters 862 and weights 864 can be updated and modified during the model generation phase to generate the prediction model 860. In some embodiments, weights may be applied to the parameter functions or prediction models themselves. For example, the mathematical complexity or the number of parameters included in a particular prediction model 860 may affect a weight for the particular prediction model 860, which may impact the generation of the model and/or a selection algorithm or a selection probability that the particular prediction model 860 is selected.” Column 12:59-67. “For example, FIG. 9A illustrates a DNN 912 associated with the torso of the character model that uses shoulder and chest joint positions in order to generate joint positions for the spine, neck, and collar. The nodes within the DNN can generate the output joint positions by applying the parameters, constraints, and weights determined during the model generation process to the received input data” Column 15:20-25. The weights are equivalent to the claimed coefficients. Since the weights are being updated to produce the best prediction of the updated position of a virtual avatar, the weights are being optimized. The weights are then applied to the learned model to produce an output.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the machine learning model for producing an animation of an avatar responsive to musical input data taught by Goel by incorporating the method of updating coefficients of the machine learning model in order to reduce the variation between the joints of a skeletal model as taught by Rigiroli. Since both references teach machine learning models for animating an avatar, the combination would have yielded predictable results. As taught by Rigiroli (Column 9:60-63), such a process “can be helpful in better approximating realistic poses.”
Ryyanen in view of Takahashi, Goel, Villa, and Rigiroli does not teach wherein the virtual object is displayed in a two-dimensional coordinate space… and the control data includes normalized coordinates.
However, Kishi, which is directed to a CG character model animated according to a tempo, teaches wherein the virtual object is displayed in a two-dimensional coordinate space… and the control data includes normalized coordinates. (“When a CG model is placed in three-dimensional space, the model is modeled on the coordinate system for the model (local coordinate system), with reference to the location of the root. Then, the modeled CG model is mapped onto a coordinate system representing the whole three-dimensional space… The coordinates of the components (joint points, bones, etc.) may be expressed as absolute coordinates or relative coordinates in three-dimensional space, or may be expressed as relative angles of predetermined bones with respect to a predetermined joint point. For example, when the coordinates are represented using relative angles of bones, the reference posture of the CG model may be made a posture to resemble the letter T, as shown in FIG. 4A, or the angle of each bone in the posture may be made 0°” Paragraphs 0070-71. The coordinates of the avatar model are normalized with respect to a reference position. While the avatar is displayed in a three-dimensional space, mapping the avatar onto a two-dimensional space would have been obvious to one of ordinary skill in the art.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the movement of an avatar representing a performer according to musical notes corresponding to obtained real-time performance data taught by Ryyanen in view of Takahashi, Goel, Villa, and Rigiroli by displaying the avatar in a two-dimensional coordinate space and normalizing the coordinates used to represent the avatar as taught by Kishi. Since Kishi is also directed to animating an avatar responsive to musical input data, the combination would have yielded predictable results. Kishi is also directed to creating an animation with natural movements (Paragraph 0029) similar to Rigiroli. Since Rigiroli also teaches a skeletal model of an animated avatar with joint positions that move responsive to an input, it would have been obvious for the joint positions to be normalized with respect to some reference.

Regarding Claim 7, Ryyanen in view of Takahashi, Goel, Villa, Rigiroli, and Kishi further teaches wherein the control data is data to control the movement of the virtual object (Goel, “a biomechanical model simulates human movements to create the avatar movement(s) in a manner that displays the enacted emotion, in which the model includes details associated with particular musical styles, particular tempos and/or the particular emotion of the musician.” Paragraph 0019. Control data for controlling the movement of an avatar is generated by a biomechanical model.)
at a time of playing a musical instrument (Ryyanen, “The producing of the real-time output includes displaying a timeline with indication of events placed on the timeline such that the timeline comprises several rows on the screen” Paragraph 0068. The analysis of the audio signal captured in real-time from a plurality of instruments (Figure 1) is also performed in real time, i.e. at a time of playing a musical instrument, and an output is provided in real-time.
Paragraph 0018 of Goel also teaches the musical instrument being played in real time. Also see the provisional application: Paragraph 0009, “a virtual avatar responds to the music input by enacting an action of playing an instrument,” and Paragraphs 0011 and 0013, which describe receiving audio input from a microphone. Receiving audio input from a microphone would occur in real time or “at a time of playing a musical instrument”, which the virtual avatar is being controlled to imitate.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art for the movement of the virtual object according to analyzed musical notes taught by Goel to be generated at a time of playing of a musical instrument given the teachings of Ryyanen. Ryyanen (Paragraph 0068) further teaches an advantage of producing a real-time output with a visualization is that such an implementation would “allow an amateur musician to play along with a song even though they would not know the song in advance or would not be able to predict "by ear" what should be played at a next time instant.” Producing a virtual representation similar to Goel would therefore help an amateur musician learn how to perform the musical piece. 

Regarding Claim 11, Ryyanen teaches a performance system comprising… a sound collecting device configured to obtain a sound signal of a sound sounded in a performance of a performer; (“signals of a plurality of the instruments 110 are combined to the received audio signal. The combining is performed e.g. acoustically by capturing with one microphone sound produced by plural instruments 110 and/or electrically by combining electric signals representing outputs of different instruments 120. The real-time audio signal of the played music is received e.g. using the internal microphone 122, external microphone 130 and/or an instrument input such as MIDI or electric guitar input.” Paragraphs 0052-53. The real-time audio signal (performance data) is generated from a plurality of instruments being played in real-time, the sounds of which are captured with a microphone or other audio input.)
and an information processing device including at least one processor configured to sequentially obtain performance data including sounding of a musical note on a time axis, wherein the performance data is sequentially obtained based on the sound signal… (“signals of a plurality of the instruments 110 are combined to the received audio signal. The combining is performed e.g. acoustically by capturing with one microphone sound produced by plural instruments… The real-time audio signal of the played music is received” Paragraphs 0052-53. See Figure 2 step 210. The real-time audio signal corresponds to performance data. See Figure 4 and Paragraph 0068, which illustrate an output of the subsequent analysis on a timeline representation. Since the audio signal is captured in real-time, the sounding of the musical notes would occur on a time axis.)
set an analysis period in the obtained performance data, wherein the analysis period includes a predetermined time, (“the detecting of the repetitions in the played music comprises detecting that latest L frames are very similar to a sequence of frames that happened X seconds earlier.” Paragraph 0064. The latest L frames correspond to an analysis period, with L being a predetermined value.)
a first period preceding the predetermined time, and a second period succeeding the predetermined time, (“repetition may be detected if the similarity is above a given threshold for the pair of representations R at times T and T-X, for the pair at times T-1 and T-X-1, and so forth until the pair at times T-L and T-X-L. When repetition is detected, the next development in the played music can be predicted for coming frames from current time T onwards.” Paragraph 0064. The first period is the time period of “T-X-L” to “T-X”, which precedes the analysis period. The second period succeeding the analysis period is “the coming frames from the current time T onwards”.)
sequentially generate, from the sequentially supplied performance data, analysis data including a time series of musical notes included in the first period (“recognising 230 from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly detecting repetitions in the played music” Paragraph 0049. See Figure 2 steps 220-230. See Figure 4 and Paragraph 0068, which illustrate an output of the analysis represented by a timeline. A first period precedes a current time indicated on the representation. A time series of notes, including a repeated sequence of notes, is displayed within the first period.)
and a time series of musical notes included in the second period, wherein the time series of the musical notes included in the second period is predicted from the time series of the musical notes in the first period, (“The predicting of the at least one of chords; notes; and drum sounds can be performed by detecting self-similarity in the played music” Paragraph 0056. See Figure 2 step 240. See Figure 4 and Paragraphs 0068-69, which illustrate an output of the analysis represented by a timeline. A second period occurs after a current time indicated on the representation. A time series of notes that are expected to be played is displayed within the second period.)
Ryyanen does not teach an automatic performance device … sequentially supply the obtained performance data to the automatic performance device for an automatic performance of the automatic performance device.
However, Takahashi, which is also directed to analysis of musical data, teaches an automatic performance device… sequentially supply the obtained performance data to the automatic performance device for an automatic performance of the automatic performance device (“The MIDI message and the date/time information are performance information of a player, and correspond to a result of the performance of the player. Besides, the control section 101 controls the communication section 105 to acquire a MIDI message, date/time information and the like stored in the server device 20. The control section 101 can also conduct an automatic performance by controlling the drive section 108 in accordance with MIDI messages and date/time information” Paragraph 0022. See Figure 2, which shows an embodiment of an automatic performance device. Messages can be sent to a driver on the system in order to conduct an automatic performance.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the acquiring and prediction of musical notes from audio data taught by Ryyanen by incorporating the method of sending messages to an automatic performance device in order to conduct an automatic performance taught by Takahashi. Since Ryyanen also teaches use of MIDI signals as outputs (Paragraphs 0015, 0045, 0053, 0073-74), the combination would yield predictable results. It would have been obvious to provide the acquired performance data to an automatic performance device for the device to automatically play the musical performance. Such an implementation would also further the goal taught by Ryyanen (Paragraph 0068) of assisting an amateur musician in learning a musical piece.
Ryyanen in view of Takahashi does not teach a display device… sequentially generate control data to control a behavior of a virtual object by inputting the analysis data into a learned model; and control the display device to display a virtual object… wherein the virtual object represents the performer; and control, based on the control data, a movement of the virtual object such that the movement of the virtual object is operatively associated with the automatic performance of the automatic performance device.
However, Goel, which is directed to animating an avatar representative of a performer, teaches sequentially generate control data to control a behavior of a virtual object by inputting the analysis data into a learned model, (“a biomechanical model simulates human movements to create the avatar movement(s) in a manner that displays the enacted emotion, in which the model includes details associated with particular musical styles, particular tempos and/or the particular emotion of the musician… the avatar response generator 100 retrieves audio data from at least one of the musicians 102 and/or the audio data storage 104 (the audio source selected based upon an input to the user interface 106) and invokes a machine learning model trained by the avatar response generator 100. In such examples, the machine learning model generates an audio and/or visual response to be applied to at least one of the avatars 108” Paragraphs 0019-22. Control data is generated from audio data, which are MIDI tracks, as discussed in Paragraph 0021. The MIDI performance data is first processed [analyzed] before being input into the machine learning model for generating the control data. 
Also see Paragraph 0009 of provisional application 62/614,477 (hereinafter “the provisional application”), which teaches that avatar movements are controlled based on musical input, and Paragraphs 0013-0015, which describe the process of acquiring audio data, extracting musical features, and controlling the avatar. Paragraphs 0013-15 describe an LSTM network which is trained on one or more audio input samples to extract features used for controlling the behavior of an animated avatar.)
a display device… and control the display device to display a virtual object… wherein the virtual object represents the performer; (“the graphical portion(s) of the avatars 108 are output via the displays 111. The displays 111 may be, but are not limited to, LCD screens, LED screens, OLED screens, projection screens, any display capable of displaying video” Paragraph 0026. The avatars are representative of musicians, as discussed in Paragraph 0025. Also see Paragraph 0026 of the provisional application, which describes display devices, and Paragraphs 0010 and 0015, which describe that the virtual avatar representative of the musician is displayed.)
and control, based on the control data, a movement of the virtual object (“Examples disclosed herein modify and/or otherwise control (e.g., generate) one or more audio and/or visual characteristics of an avatar based on a musical input (e.g., input from a musical instrument digital interface (MIDI) protocol/interface) associated with at least one of stored musical data and/or a live musical presentation passed through a model trained utilizing machine learning techniques” Paragraph 0016. “motion profiles 114 such as an example first motion profile 114A and/or an example second motion profile 114B are associated with the example first avatar 108A and the example second avatar 108B, respectively, and generated via movement instructions generated by the example avatar response generator 100” Paragraph 0027. An avatar is a virtual object controlled using the generated control data. Also see Paragraphs 0009 and 0015 of the provisional application.)
such that the movement of the virtual object is operatively associated with the automatic performance of the automatic performance device. (“The example avatars 108 of the example avatar environment 101 are digital representations of musicians. In some examples, the avatar(s) 108A, 108B include a graphical representation of a musician in addition to an audio representation of the instrument played by the musician. In such examples, one or more characteristics of the graphical representation (e.g., positioning of the avatars 108, motion of the avatars 108, etc.) of the avatars 108 can correspond to one or more characteristics of the audio representation of the instrument played by the musician.” Paragraph 0025. The movements of the avatar are determined based on music being played in real-time by musicians. In view of Takahashi, which teaches the automatic performance device, it would have been obvious for the generated movements to therefore correspond to a musician playing the automatic performance device based on the analyzed musical performance data.
Also see Paragraph 0009 of the provisional application: “a virtual avatar responds to the music input by enacting an action of playing an instrument (e.g., a guitar) as a response to the music input. In some examples, a biomechanical model simulates human movements to create the avatar in a manner that includes details associated with particular musical styles, particular tempos and/or particular emotions of the musician.” The avatar represents a musician and the movement of the avatar is operatively associated with the playing of a musical instrument. The automatic performance device taught by Takahashi is a musical instrument.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the analysis of musical performance data and the playing of an automatic performance device taught by Ryyanen in view of Takahashi by incorporating the display and control of a virtual avatar representative of a performer as taught by Goel. Since Goel (Paragraphs 0014-0018) teaches analysis of acquired performance data using a neural network in order to control the movement of the avatar, it would have been obvious for the performance and analysis data taught by Ryyanen to be input into the control system for animating an avatar, along with the automatic performance device of Takahashi. It would have been further obvious for the avatar of Goel, which is representative of a musician, to be representative of a musician playing the automatic performance device.
Ryyanen in view of Takahashi and Goel does not explicitly teach control the display device to display a virtual object concurrently with the automatic performance of the automatic performance device.
However, Villa, which is directed to control of audio and video effects for a musical instrument, teaches control the display device to display a virtual object concurrently with the automatic performance of the automatic performance device (“the image 810 comprises avatars for each of the users 804, 806 and 808. By connecting to the network 802, the users 804, 806 and 808 may participate in a virtual jamming session where each user may control the sound produced by his or her musical instrument and the specific movements of his or her avatar.” Paragraph 0069. A virtual object such as an avatar representative of a user playing a musical instrument is displayed on a display concurrently with the musical instrument being played by the user in a performance. See Figure 5A and Paragraph 0054: “The console 106 generates an image 502 that is representative of the user 102 playing the guitar 104. The image 502 is displayed on the display 112. In the illustrated embodiment, the image 502 comprises an avatar playing a representation of the guitar.” Also see Paragraphs 0059-63, which teach controlling an avatar based on the sound produced by a musical instrument, including manipulating the movement of the avatar in response to a specific note or chord being played.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the animation of a virtual avatar representative of a performer and a musical instrument being automatically played taught by Ryyanen in view of Takahashi and Goel by concurrently displaying the virtual avatar during a musical performance by a user as taught by Villa. In view of Takahashi, it would have been obvious for the musical performance to be provided by an automatic performance device and the avatar to be displayed concurrently with the performance of the automatic performance device. Since Villa (Paragraphs 0059-63) is also directed to controlling a virtual avatar based on a sound sounded in a performance, the combination would have yielded predictable results. As taught by Villa (Paragraphs 0066, 0071), controlling the movement of an image responsive to the necessary manipulations required to produce the sound being played may be performed in real-time, providing a richer virtual performance.
Ryyanen in view of Takahashi, Goel, and Villa does not teach a plurality of control points represents a skeleton of the virtual object, and the control data includes… coordinates indicating a position of each of the plurality of control points; minimizing a temporal change between a first control point of the plurality of control points and a second control point of the plurality of control points, based on an optimization of a plurality of coefficients of the learned model; wherein the learned model outputs the control data based on the optimized plurality of coefficients.
	However, Rigiroli, which is directed to a realistic animation of a character model, teaches a plurality of control points represents a skeleton of the virtual object, (“FIG. 3 illustrates an embodiment of positions of a character model 150. The character model 150 can be movable and is illustrated in a first pose 160A and a second pose 160B. The character model 150 can include a mesh (not shown) and the illustrated skeleton frame, which includes a plurality of elements, such as joints 154 and rigid bodies 156.” Column 8:58-65. See Figure 3, which shows a skeleton comprising a plurality of control points, such as joints 154.)
and the control data includes… coordinates indicating a position of each of the plurality of control points; (“The IK system can iteratively solve the problem in order to arrive at the final position of each element of the character model 170, as illustrated in FIG. 4D.” Column 9:35-40. The IK system outputs control data which determines the new positions of the control points that make up the character model.)
minimize a temporal change between a first control point of the plurality of control points and a second control point of the plurality of control points, (“In order to reduce the computation time and/or generate more realistic poses, the IK system can generate an estimated pose for the model, such as illustrated in FIG. 4C. In FIG. 4C, joint 173 moves to predictive position 178C. The predictive pose can be calculated based on a set of rules that define how a joint moves based on the movement of another joint.” Column 9:37-40. Figure 4B shows the end position of a control point with respect to another control point. Figure 4C shows that the change between the two points is reduced based on various rules and constraints in order to create a more realistic movement of the body part.
“each joint can have a defined range of movement. A joint can be coupled with one or more connectors. Generally, the connectors are rigid segments that are a defined length.” Column 10:17-20. The connector is an interval between the first control point and the second control point. Since the segments are rigid, they remain the same length during motion of the character model.
“During the final pose generation phase 920, the particle solver 924 can interact with the DNN 922 in order to generate a pose that satisfies the constraints associated with the respective elements of the character model. The iterative calculations performed by the particle solver can be done in accordance with one or more DNNs, which can help to smooth out the animation so that it becomes more realistic.” Column 15:49-56. The DNN, or machine learning model, therefore minimizes the changes between the joints of the skeletal model in order to produce a smooth animation.)
based on an optimization of a plurality of coefficients of the learned model; wherein the learned model outputs the control data based on the optimized plurality of coefficients (“The parameters 862 and weights 864 can be updated and modified during the model generation phase to generate the prediction model 860. In some embodiments, weights may be applied to the parameter functions or prediction models themselves. For example, the mathematical complexity or the number of parameters included in a particular prediction model 860 may affect a weight for the particular prediction model 860, which may impact the generation of the model and/or a selection algorithm or a selection probability that the particular prediction model 860 is selected.” Column 12:59-67. “For example, FIG. 9A illustrates a DNN 912 associated with the torso of the character model that uses shoulder and chest joint positions in order to generate joint positions for the spine, neck, and collar. The nodes within the DNN can generate the output joint positions by applying the parameters, constraints, and weights determined during the model generation process to the received input data” Column 15:20-25. The weights are equivalent to the claimed coefficients. Since the weights are being updated to produce the best prediction of the updated position of a virtual avatar, the weights are being optimized. The weights are then applied to the learned model to produce an output.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the machine learning model for producing an animation of an avatar responsive to musical input data taught by Goel by incorporating the method of updating coefficients of the machine learning model in order to reduce the variation between the joints of a skeletal model as taught by Rigiroli. Since both references teach machine learning models for animating an avatar, the combination would have yielded predictable results. As taught by Rigiroli (Column 9:60-63), such a process “can be helpful in better approximating realistic poses.”
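For illustration only, the relationship discussed above between a rigid connector and the change over time of the interval between two control points can be sketched as follows. This is a minimal sketch under stated assumptions, not a characterization of Rigiroli's solver or of the claimed learned model: the function names, the two-dimensional points, and the sample trajectories are all hypothetical.

```python
import math

def segment_length(p, q):
    """Euclidean distance between two control points given as (x, y) tuples."""
    return math.dist(p, q)

def temporal_change(first_track, second_track):
    """Sum of frame-to-frame changes in the interval between two control points.

    Each track is a list of (x, y) positions, one entry per animation frame.
    """
    lengths = [segment_length(p, q) for p, q in zip(first_track, second_track)]
    return sum(abs(b - a) for a, b in zip(lengths, lengths[1:]))

# Hypothetical joint trajectories over three frames (not taken from any reference).
shoulder = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
elbow_rigid = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]     # connector keeps length 1
elbow_stretchy = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.0)]  # connector length varies

rigid_change = temporal_change(shoulder, elbow_rigid)
stretchy_change = temporal_change(shoulder, elbow_stretchy)
```

Under these assumptions, a connector of fixed length yields zero temporal change in the interval, whereas a connector whose length varies between frames yields a positive value that a smoothing process would seek to reduce.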
Ryyanen in view of Takahashi, Goel, Villa, and Rigiroli does not teach wherein the virtual object is displayed in a two-dimensional coordinate space… and the control data includes normalized coordinates.
However, Kishi, which is directed to a CG character model animated according to a tempo, teaches wherein the virtual object is displayed in a two-dimensional coordinate space… and the control data includes normalized coordinates. (“When a CG model is placed in three-dimensional space, the model is modeled on the coordinate system for the model (local coordinate system), with reference to the location of the root. Then, the modeled CG model is mapped onto a coordinate system representing the whole three-dimensional space… The coordinates of the components (joint points, bones, etc.) may be expressed as absolute coordinates or relative coordinates in three-dimensional space, or may be expressed as relative angles of predetermined bones with respect to a predetermined joint point. For example, when the coordinates are represented using relative angles of bones, the reference posture of the CG model may be made a posture to resemble the letter T, as shown in FIG. 4A, or the angle of each bone in the posture may be made 0°” Paragraphs 0070-71. The coordinates of the avatar model are normalized with respect to a reference position. While the avatar is displayed in a three-dimensional space, mapping the avatar onto a two-dimensional space would have been obvious to one of ordinary skill in the art.)
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the movement of an avatar representing a performer according to musical notes corresponding to obtained real-time performance data taught by Ryyanen in view of Takahashi, Goel, Villa, and Rigiroli by displaying the avatar in a two-dimensional coordinate space and normalizing the coordinates used to represent the avatar as taught by Kishi. Since Kishi is also directed to animating an avatar responsive to musical input data, the combination would have yielded predictable results. Kishi is also directed to creating an animation with natural movements (Paragraph 0029) similar to Rigiroli. Since Rigiroli also teaches a skeletal model of an animated avatar with joint positions that move responsive to an input, it would have been obvious for the joint positions to be normalized with respect to some reference.
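For illustration only, one way joint coordinates expressed relative to a reference could be mapped onto a two-dimensional space and normalized is sketched below. This is a minimal sketch under stated assumptions and is not drawn from Kishi or from the application: the orthographic projection (dropping the depth coordinate), the scaling into the range [-1, 1], the function name, and the sample pose are all hypothetical.

```python
def normalize_joints(joints_3d, root="root"):
    """Project 3-D joints onto the x-y plane and normalize them.

    Coordinates are expressed relative to the root joint and scaled so that
    the largest absolute coordinate equals 1 (an assumed convention).
    """
    rx, ry, _ = joints_3d[root]
    # Orthographic projection: drop z and express each joint relative to the root.
    relative = {name: (x - rx, y - ry) for name, (x, y, _z) in joints_3d.items()}
    # Fall back to 1.0 so a pose collapsed onto the root does not divide by zero.
    scale = max(max(abs(x), abs(y)) for x, y in relative.values()) or 1.0
    return {name: (x / scale, y / scale) for name, (x, y) in relative.items()}

# Hypothetical pose; joint names and values are illustrative only.
pose = {"root": (2.0, 1.0, 0.5), "head": (2.0, 3.0, 0.5), "hand": (4.0, 1.0, 1.5)}
normalized = normalize_joints(pose)
```

In this sketch the root joint maps to the origin and every other joint falls within the unit square around it, regardless of where the model is placed in the original three-dimensional space.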

Regarding Claim 12, Ryyanen in view of Takahashi, Goel, Villa, Rigiroli, and Kishi further teaches wherein the at least one processor is further configured to: obtain the sound signal from the sound collecting device (Ryyanen, see Paragraphs 0052-53, Figure 5 and Paragraph 0074. A communication unit 530 communicates with the processor 520 and includes the sound collecting device 532. The real-time audio signal of the played music corresponds to the performance data, which is obtained from the microphone capturing sounds produced by the plurality of instruments. See Figure 2: since analysis (steps 230-240) follows receipt of the audio signal (step 210), an “analysis” component of the code being executed by the processor must receive the audio signal from a “performance control” component.)

Regarding Claim 14, Ryyanen in view of Takahashi, Goel, Villa, Rigiroli, and Kishi further teaches wherein the learned model has learnt a relationship between the analysis data and the control data (Goel, “the aforementioned model is generated utilizing machine learning techniques in connection with a large amount of musical data… musical data can be generated in real time by an individual and/or group of individuals with musical instruments. Using the stored musical data or dynamically generated musical data, machine learning (e.g., deep learning) techniques can be used to generate an audio (e.g., musical) response and/or a visual (e.g., emotional, movement, etc.) response to a portion of the stored data… Once converted, the audio and visual response can be applied to a digital avatar (e.g., an avatar of a musician) for display in real time.” Paragraph 0018. Also see Paragraphs 0020 and 0027: A machine learning model is trained to learn a relationship between the musical data and the visual response, such as movement, of an avatar. Also see Paragraphs 0013 and 0020 of the provisional application, which discuss applying audio inputs to trained neural networks.)

Regarding Claim 15, Ryyanen in view of Takahashi, Goel, Villa, Rigiroli, and Kishi further teaches wherein the learned model includes: a convolutional neural network for generating a feature vector based on the input of the analysis data, wherein the feature vector indicates a feature of the analysis data, (Goel, “The example feature extractor 206 of FIG. 2 retrieves the output of the example machine learning engine 216 and/or the example audio data coder 204 as (musical) note sequences and extracts one or more features contained therein. Features, in some examples, are associated with one or more characteristics (e.g., tempo, note type, octave, note duration, pitch, velocity (e.g., volume), etc.) of the one or more notes (e.g., tones) included in the note sequence.” Paragraph 0039. A machine learning engine analyzes audio data to determine musical note sequences, which are then input into a feature extractor in order to extract features, such as tempo and pitch. Also see Paragraph 0014 of the provisional application, which describes the feature extractor.
“The example machine learning engine 216 provides a trained model for use by at least one of the example feature extractor 206 and/or the example avatar behavior controller 218 of FIG. 2.” Paragraph 0050. See Paragraph 0013, which discusses use of convolutional neural networks as a machine learning algorithm. Also see Paragraph 0007 of the provisional application, which discusses deep neural networks that use convolution operations. The feature extractor would therefore use a trained machine learning model, such as a CNN.)
and a recurrent neural network for generating the control data based on the feature vector (Goel, “the feature extractor 206 distributes the features to the example biomechanical model engine 220” Paragraph 0039. “The example biomechanical model engine 220 of FIG. 2 applies the emotion of at least one of the avatars 108 as determined by the feature extractor 206 (e.g., retrieved from the example emotional response lookup table 212) to the static 3D model of at least one of the avatars 108 stored in the visual data storage 210 as movement instructions.” Paragraph 0047. Based on an extracted feature, a model generates control data for controlling the movement of an avatar. Also see Paragraph 0015 of the provisional application, which discusses the avatar behavior controller.
“The example machine learning engine 216 provides a trained model for use by at least one of the example feature extractor 206 and/or the example avatar behavior controller 218 of FIG. 2.” Paragraph 0050. See Paragraph 0014, which discusses use of recurrent neural networks as a machine learning algorithm. The avatar behavior controller, which includes the biomechanical model, uses a trained machine learning model, such as an RNN. Also see Paragraph 0012 of the provisional application, which discusses that the avatar animator uses a neural network such as an RNN.)

Regarding Claim 16, Ryyanen in view of Takahashi, Goel, Villa, Rigiroli, and Kishi further teaches variably controlling a timing of output of the performance data to the performance device (Ryyanen, “the respective timing based on the estimated time of the next beat need not be limited to defining the time on the next beat. Instead, the next time to play the predicted development may be timed at an offset of some fraction of the time between beats from the next beat. The offset may be anything from k to I beats, wherein k=-1 and I is greater than or equal to 0, for example 0; N/8, N/16, N/32 wherein N is an integer greater or equal to 1. For example, the offset could be 5/8 or 66/16 beats i.e. more than one beats ahead but not necessarily with the same beat division as the base beat” Paragraph 0066. The timing of the generated real-time output (steps 240-250 of Figure 2) is variable. Also see Paragraph 0073, which teaches that the jamming assistant can produce an output to be played by a synthesizer. It would have been obvious for the output, including the variable timing of the beats of the musical performance, to instead be output to the automatic playing piano of Takahashi.)
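For illustration only, the variable timing described in the quoted passage of Ryyanen can be read as simple arithmetic: the output time is the estimated time of the next beat plus an offset expressed as a fraction of the beat period. The following is a minimal sketch under stated assumptions, not Ryyanen's implementation: the function name, parameter names, and the 120 BPM example are hypothetical, and the lower bound of -1 beat reflects the quoted "k=-1".

```python
def next_play_time(last_beat_time, beat_period, offset_beats):
    """Time (in seconds) at which to output the predicted notes.

    offset_beats is measured from the next beat; per the quoted passage it
    may be as early as -1 beat, and may be zero or a positive fraction.
    """
    if offset_beats < -1:
        raise ValueError("offset may not precede the next beat by more than one beat")
    next_beat = last_beat_time + beat_period
    return next_beat + offset_beats * beat_period

# Hypothetical example: 120 BPM gives a 0.5 s beat period; last beat at t = 10.0 s.
on_beat = next_play_time(10.0, 0.5, 0)
five_eighths_late = next_play_time(10.0, 0.5, 5 / 8)
```

Changing only the offset changes when the same predicted notes are output, which is the sense in which the timing is variable.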


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kim (US 2017/0206695 A1) teaches animation of 3D characters, including representing the characters with skeletal models with multiple control points. (Figs. 1, 5, ¶¶ 42, 75)
Pai (US 2016/0203630 A1) teaches minimizing the difference between an animation parameter between a current time step and previous time step. (¶ 146)
Witkin (US 8,358,311 B1) teaches animation, including determining the difference between control points of a model at different time points. (Fig. 3E) 


Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAMI RAFAT OKASHA whose telephone number is (571)272-0675. The examiner can normally be reached M-F 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kieu Vu, can be reached on (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/R.R.O./Examiner, Art Unit 2173

/KIEU D VU/Supervisory Patent Examiner, Art Unit 2173