DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.   
This action is in response to applicant’s amendment/response filed on 9/28/2021, which has been entered and made of record. Claims 1, 12, and 16 have been amended. Claims 1-5, 7, and 10-16 are pending in the application. The claim interpretation under 35 USC 112(f) is maintained.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 10-12, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goossens et al. (US 20110296324) in view of Szirtes et al. (US 20180158093).
Regarding claim 1, Goossens discloses An apparatus for configuring an avatar responsive to a content (Goossens, fig.1, “[0004] When an avatar of a user is presented on the user interface of a device, the avatar's expressions and/or body language are changed according to the user states that are detected on the device based on occurrences of these different trigger events”), comprising:
an authoring unit (Goossens, fig.2, the avatar server 102) configured to: detect one or more events; categorise one or more of the detected events; and generate and store an event track for the content (Goossens, “[0011] the method further includes the actions of creating and storing a user state definition for the user state based on the first user input, the user state definition specifying the one or more trigger events indicating presence of the user state and referencing the instance of the avatar associated with the user state. [0044] The particular emotion can be defined by the user as a user state. When trigger events indicating the particular emotion are detected on a device associated with the user at a particular moment, the avatar instance associated with that emotion can be used to represent the user at that moment. [0065] A preset user state can be a user emotion, such as "Happy," "Surprised," or "Angry." [0099] the user state can be defined by a photograph of the user showing a particular emotion. For example, the user can upload a photograph showing a big smile to the avatar server, and use the photograph as a trigger event for a "Happy" user state. Subsequently, when a camera on the user device captures a digital image of the user in real-time and the captured image shows the user having the same facial expression as that shown in the uploaded photograph, the avatar server can determine that the "Happy" user state has occurred”. Therefore, the user state in real-time corresponds to an event track).
a user input unit configured to receive user input data and generate a user profile characterizing behavioral patterns of a user, wherein the user profile is updated in response to the user input data received for the user (Goossens, “[0071] in some implementations, the user can specify or modify a presentation theme for the avatar instance. [0077] An extraverted person may adjust the slider to increase the intensity of the expression shown on the preset avatar instance such that the expression on the preset avatar instance ; 
a modelling unit configured to adapt, in response to the user profile, an avatar configuration model defining a plurality of predetermined avatar configurations, wherein each of the plurality of predetermined avatar configurations is respectively mapped to a corresponding event categorisation and the modelling unit is configured to adapt the mapping in response to the user profile (Goossens, “[0003] When a user interacts with others in various communication contexts (e.g., in online chat sessions, emails, etc), the user can sometimes enter textual strings or preset emotional icons ("emoticons") in a text message to reflect his or her current emotional state (e.g., happy, angry, sad, etc.) to other users. [0056] If user A is an extraverted person and user B is an introverted person, user A is likely to have a more exaggerated facial expression than user B when they both feel the same emotion as expressed by the emoticon ":-)". To have their individual avatars more accurately reflect their emotions, user A can create an avatar instance that shows more excitement with a smile (e.g., with enlarged eyes, raised eye brows, and wide open mouth). The two avatar instances are stored in the avatar definition database 106 in association with their respective users, and are referenced by the user state definition triggered by the emoticon ":-)". [0063] Once the trigger events associated with the user state is detected on a user's device, the avatar presented on a user interface of the device can be updated with the instance of the avatar associated with the user state (206). [0089] The user can enter the word, phrase, emoticon, punctuation, or text format as , 
a selecting unit configured to select a predetermined avatar configuration in accordance with the avatar configuration model in response to a respective categorised event of the event track (Goossens, “[0065] FIG. 3 is a flow diagram of an exemplary process 300 for customizing an avatar instance for a user state. In some implementations, the avatar server can provide a number of preset user states that are common among many users. A preset user state can be a user emotion, such as "Happy," "Surprised," or "Angry." The trigger events for these preset user states can be commonly defined for a large number of users. For example, the emoticon ":-)" can be a default trigger event for a present "Happy" user state for all users. [0131] Avatar editing environment 914 can provide the user interfaces for selection and creation of user states and avatar instances described in reference to FIGS. 1-8”. Therefore, the user state corresponds to an event track that comprises categorized events (e.g., happy, surprised, or angry)); and
an output generator configured to generate control data to configure an avatar in response to the selected predetermined avatar configuration (Goossens, fig.3&4, “[0067] For each of the preset user states presented to the user, a respective user input is received from the user for generating a customized avatar instance for the preset user state (304). [0079] In some implementations, the avatar server 102 can generate these preset avatar instances based on the user's individual avatar and common characteristics of facial expressions shown on the customized avatar instances of other users that are associated with the same preset user state”).
On the other hand, Goossens fails to explicitly disclose but Szirtes discloses a content comprising at least one of a video and an audio signal containing at least one of video image data and audio data (Szirtes, “[0064] FIG. 2 is a schematic data flow diagram that illustrates how information is processed and transformed in one or more embodiments of the invention. The process flow 200 begins with a raw data input 202 for a user j. The raw data may be any suitable source of data that is indicative of a user's ongoing response to a piece of media content. In the specific example give below, the raw data input is image data collected by a webcam on the user's computer. In other examples, the raw data may be any type of self-reported, behavioral or physiological data collected for the user. For example, audio data from the user can be recorded using a microphone, and physiological data can be collected using a wearable device or appropriate sensor (e.g., electromyography sensors, electrodermal activity sensors, LUX light sensors, electrocardiogram sensors)”);
an authoring unit configured to: detect, in the content, one or more events based on properties of, and analysis taken directly from, the at least one of the video image data and the audio data; categorise one or more of the detected events; and generate and store an event track for the content, the event track comprising one or more of the categorised events each associated with a respective time within the content, and the content being media content (Szirtes, “[0064] The raw data may be any suitable source of data that is indicative of a user's ongoing response to a piece of media content. In the specific example give below, the raw data input is image data collected by a webcam on the user's computer. In other examples, the raw data may be any type of self-reported, behavioral or physiological data collected for the user. For example, audio data from the user can be recorded using a microphone [0065] The raw data input 202 is used to generate one or more time series signals from which a predictive parameter that correlates with a desired output can be calculated. [0066] In the example shown in FIG. 2, the process flow 200 then extracts various descriptor data 204 from .
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Szirtes and Goossens to include all limitations of claim 1; that is, to apply the descriptor data extraction and emotional state data of Szirtes to generate the event track used to configure the avatar of Goossens. The motivation/suggestion would have been that the predictive parameter is a property of dynamic information relating to a user's emotional state, which can provide a significant improvement in performance prediction over previously used static parameters (Szirtes, [0021]).
Regarding claim(s) 12, it is interpreted and rejected for the same reasons set forth in claim(s) 1.
Regarding claim 16, it is similar in scope to claim 1, except that claim 16 further recites “A non-transitory, computer-readable storage medium containing computer software which, when executed by a computer, causes the computer to perform a method”. Goossens discloses this additional limitation:
A non-transitory, computer-readable storage medium containing computer software which, when executed by a computer, causes the computer to perform a method (Goossens, “claim 18, A computer-readable medium having instructions stored thereon, which, when executed by one or more processors, cause the processors to perform operations”).
Regarding claim 10, Goossens in view of Szirtes discloses An apparatus according to claim 1. 
Goossens further discloses the avatar configuration model is adapted for the user in response to the user profile by identifying the behavioural patterns of the user, and generating one or more new avatar configurations according to one or more behavioural patterns (Goossens, “[0027] In addition, users are allowed to define their own trigger events for different existing user states and create new user states. For example, in addition to preexisting emoticons, the users can create their own emoticons as trigger events for a particular user state that has special meaning to only a small group of friends, and create an avatar instance expressing the special meaning. Therefore, user states that are unique to an individual user or a group of associated users having common interests and experiences can be created”).
Regarding claim 11, Goossens in view of Szirtes discloses An apparatus according to claim 1. 
Goossens further discloses that when the behaviour expected for the user for a categorised event of the event track most closely matches a first avatar configuration defined by the avatar configuration model, the selecting unit is configured to select the configuration of the avatar to correspond to a second avatar configuration (Goossens, “[0063] Once the trigger events associated with the user state is detected on a user's device, the avatar presented on .
Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goossens et al. (US 20110296324) in view of Szirtes et al. (US 20180158093), and further in view of Marks et al. (US 20080001951).
Regarding claim 5, Goossens in view of Szirtes discloses An apparatus according to claim 1.
On the other hand, Goossens in view of Szirtes fails to explicitly disclose but Marks discloses the output generator is configured to generate the control data for the content in real time (Marks, “[0109] This can be seen in the window 2204 because a car 2212, "driven" by player 2 is visible to player 1. In substantially real time player 2, who has nothing in front of him, as seen in the screen 2210, can see the avatar of player 1 grimace as a result of the real-life grimace of player 1. [0111] the camera and associated software can be used to monitor a real-world user for changes in facial expression, head movement, and hand movement and continuously update the avatar representation of the real-world user in substantially real-time”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Goossens, Szirtes, and Marks; that is, to apply the real-time updating of Marks to control the avatar of Goossens and Szirtes. The motivation/suggestion would have been that some or all of this information can be received by the computing .
Claim(s) 2, 13, 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goossens et al. (US 20110296324) in view of Szirtes et al. (US 20180158093), further in view of Kubo et al. (US 20180165863).
Regarding claim 2, Goossens in view of Szirtes discloses An apparatus according to claim 1.
On the other hand, Goossens in view of Szirtes fails to explicitly disclose but Kubo discloses an image generator configured to generate, based on the control data, images including a virtual representation of the avatar having a selected configuration, for display to a user by a head mountable display (Kubo, “[0147] The field-of-view region 17A is an image displayed on a monitor 130A of the HMD 120A. The avatar object 6B of the user 5B is displayed in the field-of-view region 17A. [0177] the processor 210 in the HMD system 110A may generate motion data defining motion of each face part of the avatar object 6B based on the face tracking data on the user 5B received as the avatar information”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Kubo with the combination of Goossens and Szirtes to include all limitations of claim 2; that is, to apply the HMD of Kubo to display the avatar of the system of Goossens and Szirtes. The motivation/suggestion would have been that the server 600 may communicate to/from another computer 200 for providing virtual reality to the HMD 120 used by another user. For example, when a plurality of users play a participatory game in an amusement facility, each computer 200 communicates to/from another computer 200 via 
Regarding claim(s) 14, it is interpreted and rejected for the same reasons set forth in claim(s) 2.
Regarding claim 13, Goossens in view of Szirtes discloses a method according to claim 12.
On the other hand, Goossens in view of Szirtes fails to explicitly disclose but Kubo discloses transmitting or streaming the content to a receiver in association with one or more from a list comprising: i. event track data; and ii. control data (Kubo, “[0057] The processor 210 transmits a signal for providing a virtual space to the HMD 120 via the input/output interface 240. The HMD 120 displays a video on the monitor 130 based on the signal. [0151] This avatar information contains information on an avatar such as motion information, face tracking data, and sound data. [0154] Next, the HMD sets 110A to 110C execute processing of Step S1330A to Step S1330C, respectively, based on the integrated pieces of avatar information transmitted from the server 600 to the HMD sets 110A to 110C.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Kubo with the combination of Goossens and Szirtes to include all limitations of claim 13; that is, to add the transmitting step of Kubo to the content and control data of Goossens and Szirtes. The motivation/suggestion would have been that the server 600 may communicate to/from another computer 200 for providing virtual reality to the HMD 120 used by another user. For example, when a plurality of users play a participatory game in an amusement facility, each computer 200 communicates to/from another computer 200 .
Claim(s) 3, 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goossens et al. (US 20110296324) in view of Szirtes et al. (US 20180158093), further in view of Connor et al. (US 20150075303).
Regarding claim 3, Goossens in view of Szirtes discloses An apparatus according to claim 1.
On the other hand, Goossens in view of Szirtes fails to explicitly disclose but Connor discloses a robot control unit configured to control, based on the control data, one or more actuators of a robot representation of the avatar to configure the robot to have a selected configuration (Connor, fig.2, “[0115] The bottom half of FIG. 2 shows an example of how this device can be used to animate a physical object. In an example, this can be a telerobotics application. Pressure information from multiple tubes (including tube 103) is transmitted wirelessly through wireless signal 202 to robot 204. This pressure information is then used to control actuators within robot 204 which cause robot 204 to imitate the movements of the running woman 201.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Connor with the combination of Goossens and Szirtes to include all limitations of claim 3; that is, to replace the configured avatar of Goossens and Szirtes with the configured robot of Connor. The motivation/suggestion would have been that the device can be used to animate a physical object (Connor, [0115]).
Regarding claim(s) 15, it is interpreted and rejected for the same reasons set forth in claim(s) 3.
Claim(s) 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goossens et al. (US 20110296324) in view of Szirtes et al. (US 20180158093), further in view of Li (US 20170047096).
Regarding claim 4, Goossens in view of Szirtes discloses An apparatus according to claim 1.
On the other hand, Goossens in view of Szirtes fails to explicitly disclose but Li discloses the authoring unit is configured to categorise a detected event according to one or more of the list comprising: i. an audio classification; ii. an image classification; and iii. supplementary descriptive data associated with the content, and apply an event marker to the event track with a timestamp derived according to the detected event (Li, fig.6, “[0020] If the facial expression identification module 32 of the processing device 30 determines that the child's face expression is a smiling face, the emotion analysis module 34 analyzes the emotion of the child as a happy emotion. The emotion tag generation module 36 generates the emotion tag corresponding to the child's face image. If the voice is noisy (for example, the default values of the voice frequency and the voice volume can be used as a judging criteria), the emotion analysis module 34 analyzes the emotion of the child as an exciting emotion. And, the emotion tag generation module 36 generates the emotion tag corresponded to the child's face image. Therefore, the user can use the emotion tag to further classify the video file, edit the video file or apply the effect to the video file. [0035] the processing device 30 can generate multiple emotion tags corresponding to each time points along with the emotion change of a person in the video file”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Li into the combination of Goossens and Szirtes, .
Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Goossens et al. (US 20110296324) in view of Szirtes et al. (US 20180158093), further in view of Westmoreland (US 20120054281).
Regarding claim 7, Goossens in view of Szirtes discloses An apparatus according to claim 1.
Goossens further discloses the user input data comprises image data associated with the user captured by a camera (Goossens, “[0099] For example, the user can upload a photograph showing a big smile to the avatar server, and use the photograph as a trigger event for a "Happy" user state. Subsequently, when a camera on the user device captures a digital image of the user in real-time and the captured image shows the user having the same facial expression as that shown in the uploaded photograph, the avatar server can determine that the "Happy" user state has occurred”).
On the other hand, Goossens in view of Szirtes fails to explicitly disclose but Westmoreland discloses the user input data comprises at least one of data entered by the user using an input device in response to one or more questions associated with the content (Westmoreland, “[0023] At various points in the common experiential environment defined by the set of content (e.g., along the path or waterway), questions may be presented to users controlling the avatars. The users may be encouraged and/or required to answer the questions. Answers to the questions may be received via client platform, and/or kept locally by the users. .
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Westmoreland with the combination of Goossens and Szirtes to include all limitations of claim 7; that is, to add the input answers of Westmoreland, together with the captured image, to the user input data of Goossens and Szirtes. The motivation/suggestion would have been to provide a virtual space for users associated with a specific entity to enhance group innovation by interacting with spatially separate areas for teambuilding, idea generation, and collaboration, and/or other activities that encourage users to work together to create solutions, services, and/or products (Westmoreland, [0001]).
Response to Arguments
Applicant's arguments filed 9/28/2021 have been fully considered but they are not persuasive. 
The applicant submits: Szirtes (U.S. 2018/0158093) discloses a system for obtaining physiological reactions of a user who is consuming media content. The system of Szirtes (U.S. 2018/0158093) manifestly does not disclose detecting one or more events in the content based on properties of, and analysis taken directly from, the at least one of the video image data and the audio data. Indeed, Szirtes (U.S. 2018/0158093) discloses something entirely different, which is to detect a user's personal and subjective reaction to watching media content (Remarks, page 9, 2nd paragraph).
The examiner respectfully disagrees. Szirtes discloses a system for obtaining raw input data of a user and generating time series signals of emotional state data from the raw input data. In particular, Szirtes discloses “[0047] The raw input data may be image data captured at each of raw input data (e.g., video image data and the audio data).
The applicant submits: Neither Goossens nor Szirtes uses the term "event track for a content" and by using this term in the final paragraph on page 16 of the Office Action the Examiner is attempting to equate certain features in Goossens with certain features in Szirtes which are in fact obtained in different ways and used for different purposes in the respective prior art citations, and which therefore are technically different from each other and would not have been equated with each other by the person skilled in the art when reading the respective prior art disclosures without knowledge of the presently claimed subject-matter (Remarks, page 9, 5th paragraph). 
As explained in each of our previous responses, the feature of an event track that is generated for a content and used for configuring an avatar is not disclosed in either Goossens or Szirtes and the Examiner is instead attempting to rely on a combination of prior art documents for this feature in claim 1. It is therefore entirely inappropriate for the Examiner to state on page 16 of the Office Action that Goossens and Szirtes each disclose an event track (Remarks, page 10, 1st paragraph).
The examiner respectfully disagrees. Goossens teaches configuring an avatar in response to a categorized event of an event track (Goossens, “[0065] FIG. 3 is a flow diagram of an exemplary process 300 for customizing an avatar instance for a user state. In some implementations, the avatar server can provide a number of preset user states that are common among many users. A preset user state can be a user emotion, such as "Happy," "Surprised," or "Angry." [0131] Avatar editing environment 914 can provide the user interfaces for selection and creation of user states and avatar instances described in reference to FIGS. 1-8. [0099] In some implementations, the user state can be defined by a photograph of the user showing a particular emotion. For example, the user can upload a photograph showing a big smile to the avatar server, and use the photograph as a trigger event for a "Happy" user state. Subsequently, when a camera on the user device captures a digital image of the user in real-time and the captured image shows the user having the same facial expression as that shown in the uploaded photograph, the avatar server can determine that the "Happy" user state has occurred”). Therefore, the user states defined in real-time correspond to an event track and comprise categorized events (e.g., happy, surprised, or angry). That is, the user state changes in accordance with the images of the user captured in real-time.
Szirtes likewise teaches an event track associated with user states comprising one or more of the categorized events associated with a respective time (Szirtes, “[0072] The method 300 continues with a step 304 of extracting descriptor data points (i.e., a time 
Since both Goossens and Szirtes teach a track of categorized user emotional states, it would have been obvious to one of ordinary skill in the art to combine Szirtes and Goossens to include all limitations of claim 1; that is, to apply the extraction of descriptor data from input video/audio data of Szirtes to generate emotional state data used to configure the avatar of Goossens. The motivation/suggestion would have been that the predictive parameter is a property of dynamic information relating to a user's emotional state, which can provide a significant improvement in performance prediction over previously used static parameters (Szirtes, [0021]).
The applicant submits: Whilst Goossens does refer to configuring an avatar, this is actually achieved in Goossens by detecting a trigger event (e.g. an emoticon in a text message) that is associated with a user state that is in turn associated with an avatar instance. Goossens therefore teaches that an avatar is updated upon detecting the trigger event on the user's device (see [0063] of Goossens). There is no disclosure in Goossens of detecting audio and/or video events for a content (as acknowledged by the Examiner), and there is certainly no disclosure of generating an event track for the content so that the event track comprises detected events that have been categorized associated with a respective time so that an avatar can be configured using the event categorizations and timings of the event track (Remarks, page 10, 2nd paragraph). 

Goossens discloses “[0099] In some implementations, the user state can be defined by a photograph of the user showing a particular emotion. For example, the user can upload a photograph showing a big smile to the avatar server, and use the photograph as a trigger event for a "Happy" user state. Subsequently, when a camera on the user device captures a digital image of the user in real-time and the captured image shows the user having the same facial expression as that shown in the uploaded photograph, the avatar server can determine that the "Happy" user state has occurred”. Therefore, the user states defined in real-time correspond to an event track and comprise categorized events (e.g., happy, surprised, or angry). That is, Goossens discloses generating an event track for the content so that the event track comprises detected events that have been categorized and associated with a respective time (e.g., in real-time).
The applicant submits: Given that Szirtes does not disclose any technique for configuring an avatar, it is clear that Szirtes does not disclose any feature that can be considered to be technically the same as the event track as defined in claim 1  (Remarks, page 10, 3rd paragraph). 
Turning to Szirtes, the aim of this document is to gather raw data regarding a user's behavioural, physical and emotional state whilst the user is consuming a piece of media content so as to predict performance data for that piece of media content. There is no teaching at all in relation to configuring an avatar and, as explained above, the skilled person could not have adapted Goossens using any teaching from Szirtes so as to control an avatar in a different way to what is disclosed in Goossens. For at least this reason, the combination of Goossens and Szirtes does not teach the subject-matter of claim 1 (Remarks, page 13, 2nd paragraph).
The examiner respectfully disagrees. Szirtes is relied upon only to teach categorizing user emotional states and generating a user emotional state track from audio/video signals. Since Goossens already discloses configuring an avatar based on a user emotional state track, the combination of Goossens and Szirtes teaches all limitations of claim 1.
The applicant submits: Consequently, Applicant requests that the Examiner uses the terminology used in Goossens and the terminology used in Szirtes when explaining how the skilled person would have combined the two disclosures rather than importing terminology from claim 1, so as to avoid the use of hindsight. Therefore, the Examiner's statement on page 16 of the Office Action can be more appropriately worded as: "Goossens teaches configuring an avatar based on user states, while Szirtes teaches a time series of emotional state data based on video and/or audio signals. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Goosens and Szirtes by replacing the user states of Goosens with the time series of emotional state data of Szirtes ". Even this statement is not an accurate assessment of the prior art citation Goossens, because the Examiner has failed to appreciate that Goossens actually teaches a technique in which trigger events (e.g. an emoticon or a word in a text message) must be detected on a user's device in order to update the avatar (Remarks, page 10, 4th paragraph).
The examiner respectfully disagrees. Goossens and Szirtes are analogous art in that both relate to user emotional states. Goossens teaches configuring an avatar based on user emotional states in real-time, while Szirtes teaches user emotional states are 
The applicant submits: Applicant fails to see how replacing the user state in Goossens with the time series of emotional state data in Szirtes results in an arrangement falling within the scope of claim 1, and moreover Applicant fails to see how this combination can be considered to be an obvious adaptation for the person skilled in the art given the different purposes of Goossens and Szirtes. It will be appreciated that adapting Goossens so that a trigger event is associated with the time series of emotional state data in Szirtes does not fall within the scope of claim 1 at least because the trigger events in Goossens do not anticipate the detected audio and/or video events as required by claim 1, as acknowledged by the Examiner (Remarks, page 11, 2nd paragraph).
The examiner respectfully disagrees. Goossens discloses “[0099] In some implementations, the user state can be defined by a photograph of the user showing a particular emotion. For example, the user can upload a photograph showing a big smile to the avatar server, and use the photograph as a trigger event for a "Happy" user state. Subsequently, when a camera on the user device captures a digital image of the user in real -time and the captured image shows the user having the same facial expression as that shown in the uploaded photograph, the avatar server can determine that the "Happy" user state has occurred”. Therefore, the user states defined in real time correspond to an event track and comprise categorized events (e.g., happy, surprised, or angry). The digital image of the user captured in real time in Goossens corresponds to a video signal, which is analogous to the input video content of Szirtes. Since Goossens teaches that an avatar is configured based on emotional states associated with the time series, and Szirtes teaches 
The applicant submits: As acknowledged by the Examiner on page 3 of the Office Action, Goossens actually teaches that: i) a user must manually specify the properties of a trigger event by entering a word, phrase or emoticon for example (this differs from the technique used in the present arrangement which instead detects audio and/or video events for a content) and ii) the user must provide an input to associate a user state with a particular avatar instance (see paragraph [0053] of Goossens). It will be appreciated that claim 1 actually requires that the modelling unit is configured to adapt an avatar configuration model in response to the generated user profile characterising behavioural patterns of a user. This is technically very different from the user-input based technique in Goossens in which a user enters a word, phrase or emoticon to specify a trigger event which is then associated with a user state and an avatar instance by a user providing further user inputs. This difference has not been acknowledged by the Examiner and it is not clear from the Examiner's analysis which feature in Goossens is considered to anticipate the modelling unit, the avatar configuration model and the feature of adapting the avatar configuration model in response to a user profile generated for a user. A user profile generated for a user to characterise behavioural patterns for the user is not anticipated by the feature of a user input in Goossens (Remarks, page 11, 3rd paragraph).
The examiner respectfully disagrees. A user manually specifying the properties of a trigger event by entering a word, phrase or emoticon is disclosed in some embodiments of Goossens; however, previous Office actions never asserted that the user MUST manually specify the properties of a trigger event. In fact, Goossens discloses in paragraph [0099] that the trigger event is generated based on digital images of the user captured in real time, which does not require the user to manually enter a word, phrase, etc.
Goossens discloses a user profile generated for a user to characterize behavioral patterns for the user (Goossens, “[0056] If user A is an extraverted person and user B is an introverted person, user A is likely to have a more exaggerated facial expression than user B when they both feel the same emotion as expressed by the emoticon ":-)". To have their individual avatars more accurately reflect their emotions, user A can create an avatar instance that shows more excitement with a smile (e.g., with enlarged eyes, raised eye brows, and wide open mouth). The two avatar instances are stored in the avatar definition database 106 in association with their respective users, and are referenced by the user state definition triggered by the emoticon ":-)". [0063] Once the trigger events associated with the user state is detected on a user's device, the avatar presented on a user interface of the device can be updated with the instance of the avatar associated with the user state (206)”). Therefore, Goossens clearly reads on the modelling unit. The phrase “anticipate the modelling unit” does not appear in claim 1 and is therefore unrelated to the claim mapping.
The applicant submits: The aim of Goossens is to create and use avatars to reflect user states by allowing individual users to associate individualized avatar expressions and/or body language with trigger events for user states that are associated with particular emotions detected on the user's device [0004]. Note that whereas Goossens aims to update the avatar to reflect a user's current state, the present arrangement provides a technique in which an avatar is configured responsive to the event categorizations and timings of an event track generated for a content. In the present arrangement, an avatar configuration is selected to correspond to a respective categorized event of the event track so that an avatar can be animated to react to the events in the event track and this is technically very different to what is disclosed in Goossens (Remarks, page 12, 2nd paragraph).
The examiner respectfully disagrees. Goossens ([0099]) discloses configuring an avatar in accordance with an image captured in real time, and Goossens does not exclude configuring an avatar in accordance with a categorized event of an event track in a time series. Technically, processing “captured images in real-time” and processing a “categorized event of an event track in time series” are the same to a computer; thus, there is no difference between the prior art and claim 1 in this aspect.
The applicant submits: Starting from the teaching in Goossens of configuring an avatar according to a user's current state by configuring an avatar upon detecting a trigger event, the skilled person could not have adapted Goossens to abandon the use of the trigger events which are manually specified by the user, and therefore could not have adapted Goossens using Szirtes to arrive at an arrangement in which an event track comprising detected audio and/or video events that have been categorized is generated and stored for a content for use in configuring an avatar for the content (Remarks, page 12, 3rd paragraph).
The examiner respectfully disagrees. A user manually specifying the properties of a trigger event by entering a word, phrase or emoticon is disclosed in some embodiments of Goossens; however, previous Office actions never asserted that the user MUST manually specify the properties of a trigger event. In fact, Goossens discloses in paragraph [0099] that the trigger event is generated based on digital images of the user captured in real time, which does not require the user to manually enter a word, phrase, etc. Both Goossens and Szirtes disclose processing a time series 
The applicant submits: Consequently, Goossens (U.S. 2011/0296324), which fails to disclose the event track, cannot possibly disclose all of the subject-matter defining the modelling unit and the selecting unit. There is a clear relationship between the avatar configuration model and the event track which has not been considered in the Examiner’s analysis (Remarks, page 14, 5th paragraph).
The examiner respectfully disagrees. Goossens does disclose generating an event track for the content (Goossens, “[0044] The particular emotion can be defined by the user as a user state. When trigger events indicating the particular emotion are detected on a device associated with the user at a particular moment, the avatar instance associated with that emotion can be used to represent the user at that moment. [0065] A preset user state can be a user emotion, such as "Happy," "Surprised," or "Angry." [0099] In some implementations, the user state can be defined by a photograph of the user showing a particular emotion. For example, the user can upload a photograph showing a big smile to the avatar server, and use the photograph as a trigger event for a "Happy" user state. Subsequently, when a camera on the user device captures a digital image of the user in real -time and the captured image shows the user having the same facial expression as that shown in the uploaded photograph, the avatar server can determine that the "Happy" user state has occurred”). Therefore, the user state in real time (e.g., a user emotion such as “Happy”, “Surprised”, or “Angry”) corresponds to an event track.
The applicant submits: Szirtes (U.S. 2018/0158093) is silent in relation to any technique for configuring an avatar responsive to a content. Problems associated with appropriately configuring an avatar according to audio and or video events in a content are not considered by Szirtes (U.S. 2018/0158093). Szirtes (U.S. 2018/0158093) does not disclose detecting events in a content comprising at least one of a video and audio signal so as to generate an event track for the content comprising categorised detected events so that the event track can be used to configure an avatar responsive to the content (Remarks, page 15, 2nd paragraph).
The examiner respectfully disagrees. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). It is the combination of Goossens and Szirtes that teaches claim 1. Goossens already teaches “configuring an avatar responsive to a content”, and Szirtes is relied upon only to teach “detect, in a content comprising at least one of a video and an audio signal one or more events based on properties of at least one of the video signal and the audio signal; categorise one or more of the detected events and generate an event track for the content, the event track comprising one or more of the categorised events associated with a respective time”, as addressed above in this Office action.
The applicant submits: Neither the performance data nor the emotional state data points in Szirtes (U.S. 2018/0158093) are used for configuring an avatar and there is no feature in Szirtes (U.S. 2018/0158093) which can be considered to be the same as the event track in the pending independent claims. In fact, it will be appreciated that rather than analysing the media content itself, Szirtes (U.S. 2018/0158093) actually teaches a technique for analysing raw input data for a user watching a media content item obtain performance data for the media content and this is completely different from what is presently claimed (Remarks, page 15, 3rd paragraph).
Therefore, Szirtes teaches an event track comprising a time series of emotional state data.
In response to applicant's argument that Szirtes teaches a different technique from the claimed invention, a recitation of the intended use of the claimed invention must result in a structural difference between the claimed invention and the prior art in order to patentably distinguish the claimed invention from the prior art.  If the prior art structure is capable of performing the intended use, then it meets the claim. 
The applicant submits: Consequently, the techniques in Szirtes (U.S. 2018/0158093) for predicting performance data for a media content could not be combined with the disclosure of Goossens (U.S. 2011/0296324) by the person of ordinary skill in the art to thereby arrive at an arrangement in which an event track is generated for a content and an avatar configuration model respectively maps a plurality of predetermined avatar configurations to the event categorisations used in the event track so as to allow selection of a predetermined avatar configuration in response to a categorised event of the event track (Remarks, page 15, 4th paragraph).
The examiner respectfully disagrees. Since Goossens teaches configuring an avatar based on an event track comprising user states, while Szirtes teaches generating an event track comprising a time series of emotional state data based on video and/or audio signals, it would 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GRACE Q LI whose telephone number is (571)270-0497. The examiner can normally be reached Monday - Friday, 8:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached on (571)-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.





/GRACE Q LI/
Examiner, Art Unit 2611
10/6/2021