United States Patent and Trademark Office    

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov











BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 15/728,649
Filing Date: 10 Oct 2017
Appellant(s): Forouhar et al.



__________________
David S. Kim
Reg. No. 64,609
For Appellant


EXAMINER’S ANSWER





This is in response to the appeal brief filed 10/27/21.

(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated 3/30/21 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”
(2) Response to Argument	
Appellant's arguments (section VI., sub-sections A.-G., pages 6-21) filed 10/27/21 have been fully considered but they are not persuasive.
VI.	ARGUMENT

A.	The rejection of claims 1, 13-16, 21, 22, and 25 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li is improper.

Appellants state on page 7 (emphasis added):

“Applicant respectfully submits that the instant Section 103 rejection is improper at least because no combination of Bonito and Li describes or suggests an apparatus as recited in claim 1 or a system as recited in claim 13. For example, no combination of Bonito and Li describes or suggests analyzing one or more spatial and temporal patterns to determine one or more sporting activity events of a sporting activity within context of the sporting activity and one or more event characterizations that provide meaning to the one or more sporting activity events within context of the sporting activity. Indeed, the Office acknowledges that Bonito does not describe "analyz[ing] the one or more spatial and temporal patterns to determine one or more sporting activity events of the sporting activity within context of the sporting activity and one or more event characterizations that provide meaning to the one or more sporting activity events within context of the sporting activity." (Office Action dated 03/30/2021, p. 29 (markings omitted)). The Office cites Li to allegedly cure the noted deficiencies of Bonito. (Id., pp. 33-34 and 39-40).”


around 10 people watching a golf game on grass under the sky in a golf park used to retrieve or search for corresponding video. Thus, Li teaches: citing US 8,655,030 first, via columns, lines, then citing provisional application 61/635,034 next, via page, lines:
analyzing (via fig. 5:154: “Shot data” analyzed into segments such as fig. 5: “Crowd shot” and “Game shot”) one or more spatial and temporal patterns (in figures 16-19: kicked ball mapped to fig. 5: “Game shot”) to determine one or more sporting activity events (via “identifies the…game”, c.12,ll. 26-29 or pg. 20,l.30 to pg. 21,l.2 or “recognized…event”, c.19,ll. 54,55 or “recognized… event”, pg.33,ll.5,6) of a sporting activity (or a crowd watching the game event) within (i.e., in the field, sphere, or scope of) context of the sporting activity (since the recognized game event is within the field, sphere, or scope of “contextually connected shots”,c.7,ll.11-14 or pg.12, ll.1-3, i.e., the identified game event is contextually within the field, sphere, or scope of an identified crowd watching the game event) and one or more event characterizations (or determined image features being the basis for a text string used for video retrieval: “text string… descriptions of features”, c.7,ll.46,47 or pg. 12,ll. 27-30) that provide meaning to the one or more sporting activity events within context of the sporting activity.
Thus, Bonito as combined with Li will result in said “around 10 people watching a golf game on grass under the sky in a golf park” retrieving golf videos as taught by Bonito.
VI.	ARGUMENT

A.	The rejection of claims 1, 13-16, 21, 22, and 25 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li is improper.

1. The cited references do not describe or suggest each and every claim feature.

Appellants state on page 10 of the brief (Emphasis Added):
	“(Emphasis added). While the temporal and spatial information described in Li may be used to recognize features shown in the image sequence 310, like sky, grass, and even people, the level of recognition described in Li is devoid of any context in regard to a sporting activity and, thus, cannot be used to determine one or more sporting activity events of a sporting activity within context of the sporting activity and one or more event characterizations that provide meaning to the one or more sporting activity events within context of the sporting activity.” 

The examiner respectfully disagrees since Li teaches as claimed in claim 1, filed 12/18/20 (note that the first “context” in 14 font is directed more to the remark itself than claim 1):…











within (i.e., in the field, sphere, or scope of: Dictionary.com, definition 13, modifying the claimed “events” or said football game modified as being in the field, sphere, or scope of) context (via “high-level…contextual descriptions”, c.7,ll.35-37 or pg. 12,ll. 20-22, or “sports…high-level context-level descriptions”, c.7,ll. 51,52 or pg. 13,ll.1-3, contextually describing said football game mapped to said contextually identified, connected shots of the kicked ball and the watching crowd) of the sporting activity and one or more event characterizations (via said “determination of…features” and “recognition features…used to…characterize”, c.4,ll.58-62 or pg.8,ll.8-11) that provide meaning (comprised by said “text string…descriptions of features”, c.7,ll.46,47 or pg. 12,ll. 29,30) to the one or more sporting activity events (via said game) within context (or the game is in scope of said “sports…context-level descriptions” “presented as text strings”,c.7,l.52 or pg. 13,ll. 2,3; such as “ ‘around 10 people (Adam, Brian . . . ) watching a live Elton John show on grass under the sky in the Queen's Park.’ ”,c.7,ll.41,42 or pg. 12,ll. 24,25; that “identifies…sports”, c.7,ll. 1-3 or pg. 11,ll. 25-27 and contextually describes said game such as said crowd or people or said grass or sky as shown in fig. 12:clouds) of the sporting activity (said punt or crowd via Li US 8,655,030 or corresponding redundant provisional application 61/635,034:

“The feedback generated by the video codec 103 can take on many different forms. For example, while temporal and spatial information is used by video codec 103 to remove redundancy, this information can also be used by pattern recognition module 125 to detect or recognize features like sky, grass, sea, wall, buildings and building features such as the type of building, the number of building stories, etc., moving vehicles and animals (including people). Temporal feedback in the form of motion vectors estimated in encoding or retrieved in decoding (or motion information gotten by optical flow for very low resolution) can be used by pattern recognition module 125 for motion-based pattern partition or recognition via a variety of moving group algorithms. In addition, temporal information can be used by pattern recognition module 125 to improve recognition by temporal noise filtering, providing multiple picture candidates to be selected from for recognition of the best image in an image sequence, as well as for recognition of temporal features over a sequence of images. Spatial information such as statistical information, like variance, frequency components and bit consumption estimated from input YUV or retrieved for input streams, can be used for texture based pattern partition and recognition by a variety of different classifiers. More recognition features, like structure, texture, color and motion characters can be used for precise pattern partition and recognition. For instance, line structures can be used to identify and characterize manmade objects such as building and vehicles. Random motion, rigid motion and relative position motion are effective to discriminate water, vehicles and animal respectively. 
Shot transition information from encoding or decoding that identifies transitions between video shots in an image sequence can be used to start new pattern detecting and reorganization and provide points of demarcation for temporal recognition across a plurality of images.”






















“Index data 115 can include a text string that identifies a pattern of interest for use in video storage and retrieval, and particularly to find videos of interest (e.g. relating to sports or cooking), locate videos containing certain scenes (e.g. a man and a woman on a beach), certain subject matter (e.g. regarding the American Civil War), certain venues (e.g. the Eiffel Tower) certain objects (e.g. a Patek Phillipe watch), certain themes (e.g. romance, action, horror), etc. Video indexing can be subdivided into five steps: modeling based on domain-specific attributes, segmentation, extraction, representation, organization. Some functions, like shot (temporally and visually connected frames) and scene (temporally and contextually connected shots) segmentation, used in encoding can likewise be used in visual indexing.”

c.7,ll. 29-53:
“Consider an example where video signal 110 contains a video broadcast. Index data 115 that indicates anchor shots and field shots show alternately could indicate a news broadcast; crowd shots and sports shots shown alternately could indicate a sporting event. Scene information can also be used for rate control, like quantization parameter (QP) initialization at shot transition in encoding. Index data 115 can be used to generate more high-level motive and contextual descriptions via manual review by human personnel. For instance, based on results mentioned above, operators could process index data 115 to provide additional descriptors for an image sequence 310 to, for example, describe an image sequence as ‘around 10 people (Adam, Brian . . . ) watching a live Elton John show on grass under the sky in the Queen's Park.’
The indexing data 115 can contain pattern recognition data 156 and other hierarchical indexing information like: frame-level temporal and spatial information including variance, global motion and bit number etc.; shot-level objects and text string or other descriptions of features such as text regions of a video, human and action description, object information and background texture description etc.; scene-level represents such as video category (news cast, sitcom, commercials, movie, sports or documentary etc.), and high-level context-level descriptions and presentations presented as text strings, numerical classifiers or other data descriptors.”

c.11,ll. 13-22:
	“In another example of operation, the video processing system 102 is part of a web server, teleconferencing system security system or set top box that generates indexing data 115 with recognition of human action. In this fashion and region of human action can be determined along with the determination of human action descriptions such as a number of people, body sizes and features, pose types, position, velocity and actions such as kick, throw, catch, run, walk, fall down, loiter, drop an item, etc. can be detected and recognized.”







“FIG. 5 presents a temporal block diagram representation of shot data 154 in accordance with a further embodiment of the present invention. In the example, presented a video signal 110 includes an image sequence 310 of a sporting event such as a football game that is processed by shot segmentation module 150 into shot data 154. Coding feedback data 300 from the video codec 103 includes shot transition data that indicates which images in the image sequence fall within which of the four shots that are shown. A first shot in the temporal sequence is a commentator shot, the second and fourth shots are shots of the game and the third shot is a shot of the crowd.”

c.19,ll. 32-56:
“For example, the video server 80 or other video system employs the video processing system 102 to generate a plurality of text strings that describe the videos of the video library 82 in conjunction with the encoding/decoding and/or transcoding these videos. A memory 88, coupled to the video processing system 102, stores a searchable index 162 that includes the plurality of text strings. The search module 86 identifies matching video from the video library 82 by comparing the search terms 398 or other input text strings to the plurality of text strings of the searchable index 162. Because the video processing system 102 generates the plurality of text strings to correspond to particular shots of the videos of video library 82, the search module 86 can further identify matching shots in the matching videos that contain the images that correspond to the search terms 398. In this fashion, a user can use search terms to search on particular, people, faces, text, human actions or other recognized objects, events, places or other things in the video library 82 and not only generate particular videos of the video library 82 that correspond to these search terms, but also be directed to the particular shot or shots in these matching videos that contain the recognized person, face, text, human action or other recognized object, event, place or other thing specified via the search terms 398.”










pages 7,8 of provisional application 61/635,034:
“The feedback generated by the video codec 103 can take on many different forms. For example, while temporal and spatial information is used by video codec 103 to remove redundancy, this information can also be used by pattern recognition module 125 to detect or recognize features like sky, grass, sea, wall, buildings and building features such as the type of building, the number of building stories, etc., moving vehicles and animals (including people). Temporal feedback in the form of motion vectors estimated in encoding or retrieved in decoding (or motion information gotten by optical flow for very low resolution) can be used by pattern recognition module 125 for motion-based pattern partition or recognition via a variety of moving group algorithms. In addition, temporal information can be used by pattern recognition module 125 to improve recognition by temporal noise filtering, providing multiple picture candidates to be selected from for recognition of the best image in an image sequence, as well as for recognition of temporal features over a sequence of images. Spatial information such as statistical information, like variance, frequency components and bit consumption estimated from input YUV or retrieved for input streams, can be used for texture based pattern partition and recognition by a variety of different classifiers. More recognition features, like structure, texture, color and motion characters can be used for precise pattern partition and recognition. For instance, line structures can be used to identify and characterize manmade objects such as building and vehicles. Random motion, rigid motion and relative position motion are effective to discriminate water, vehicles and animal respectively. 
Shot transition information from encoding or decoding that identifies transitions between video shots in an image sequence can be used to start new pattern detecting and reorganization and provide points of demarcation for temporal recognition across a plurality of images.”

pages 11,12:
“Index data 115 can include a text string that identifies a pattern of interest for use in video storage and retrieval, and particularly to find videos of interest (e.g. relating to sports or cooking), locate videos containing certain scenes (e.g. a man and a woman on a beach), certain subject matter (e.g. regarding the American Civil War), certain venues (e.g. the Eiffel Tower) certain objects (e.g. a Patek Phillipe watch), certain themes (e.g. romance, action, horror), etc. Video indexing can be subdivided into five steps: modeling based on domain-specific attributes, segmentation, extraction, representation, organization. Some functions, like shot (temporally and visually connected frames) and scene (temporally and contextually connected shots) segmentation, used in encoding can likewise be used in visual indexing.”





“Consider an example where video signal 110 contains a video broadcast. Index data 115 that indicates anchor shots and field shots show alternately could indicate a news broadcast; crowd shots and sports shots shown alternately could indicate a sporting event. Scene information can also be used for rate control, like quantization parameter (QP) initialization at shot transition in encoding. Index data 115 can be used to generate more high-level motive and contextual descriptions via manual review by human personnel. For instance, based on results mentioned above, operators could process index data 115 to provide additional descriptors for an image sequence 310 to, for example, describe an image sequence as “around 10 people (Adam, Brian...) watching a live Elton John show on grass under the sky in the Queen’s Park.”
The indexing data 115 can contain pattern recognition data 156 and other hierarchical indexing information like: frame-level temporal and spatial information including variance, global motion and bit number etc.; shot-level objects and text string or other descriptions of features such as text regions of a video, human and action description, object information and background texture description etc.; scene-level represents such as video category (news cast, sitcom, commercials, movie, sports or documentary etc.), and high-level context-level descriptions and presentations presented as text strings, numerical classifiers or other data descriptors.”

pages 18,19:
	“In another example of operation, the video processing system 102 is part of a web server, teleconferencing system security system or set top box that generates indexing data 115 with recognition of human action. In this fashion and region of human action can be determined along with the determination of human action descriptions such as a number of people, body sizes and features, pose types, position, velocity and actions such as kick, throw, catch, run, walk, fall down, loiter, drop an item, etc. can be detected and recognized.”

page 20, ll. 13-20:
“FIG. 5 presents a temporal block diagram representation of shot data 154 in accordance with a further embodiment of the present invention. In the example, presented a video signal 110 includes an image sequence 310 of a sporting event such as a football game that is processed by shot segmentation module 150 into shot data 154. Coding feedback data 300 from the video codec 103 includes shot transition data that indicates which images in the image sequence fall within which of the four shots that are shown. A first shot in the temporal sequence is a commentator shot, the second and fourth shots are shots of the game and the third shot is a shot of the crowd.”





“For example, the video server 80 or other video system employs the video processing system 102 to generate a plurality of text strings that describe the videos of the video library 82 in conjunction with the encoding/decoding and/or transcoding these videos. A memory 88, coupled to the video processing system 102, stores a searchable index 162 that includes the plurality of text strings. The search module 86 identifies matching video from the video library 82 by comparing the search terms 398 or other input text strings to the plurality of text strings of the searchable index 162. Because the video processing system 102 generates the plurality of text strings to correspond to particular shots of the videos of video library 82, the search module 86 can further identify matching shots in the matching videos that contain the images that correspond to the search terms 398. In this fashion, a user can use search terms to search on particular, people, faces, text, human actions or other recognized objects, events, places or other things in the video library 82 and not only generate particular videos of the video library 82 that correspond to these search terms, but also be directed to the particular shot or shots in these matching videos that contain the recognized person, face, text, human action or other recognized object, event, place or other thing specified via the search terms 398.”













In response to appellant's argument that the references fail to show certain features of appellant’s invention, it is noted that the features upon which appellant relies (i.e., “down, distance, field position, score, time remaining, or any other context in regard to the sporting activity” via page 10:
“(Emphasis added). Here, Li describes generating pattern recognition data 156 including human action descriptors such as "football player", "kick", "punt", and other descriptors that characterize the human action shown in the image sequence 310, as well as information regarding the distance, height, and trajectory of the ball. (17:61-18:31). However, as long as an image sequence 310 shows a player punting a football, Li's pattern recognition data 156 would include the same human action descriptors (e.g., football player, kick, punt) regardless of down, distance, field position, score, time remaining, or any other context in regard to the sporting activity.” 

) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
	If these Markush alternatives were to be claimed in the future, Li teaches the Markush alternative:
any other context in regard to the sporting activity (via said text string, such as “ ‘around 10 people (Adam, Brian . . . ) watching a live Elton John show on grass under the sky in the Queen's Park.’ ”, used to identify sports, such as using the query “people…on grass under the sky”).






In response to appellant's argument that the references fail to show certain features of appellant’s invention, it is noted that the features upon which appellant relies (i.e.,  “recognize[s] specific activities and/or events to intelligently account for situations that occur in games” and “good fit within a particular team's strategy and playing schemes” and “a ‘good’ outcome or a ‘bad’ outcome”, page 11, emphasis added:
“In contrast, the present application describes technology that "recognize[s] specific activities and/or events to intelligently account for situations that occur in games." (11:2-3). By way of example and not limitation, page 11, lines 4-27 of the present application explains:…
Unlike the claimed invention, Li is not concerned with whether the football player shown in FIGS. 16-19 is a "good fit within a particular team's strategy and playing schemes" as is described at page 26, lines 12-20 of the present application and/or whether the kick or punt shown in FIGS. 16-19 resulted in a "good" outcome or a "bad" outcome as is described at page 11, lines 18-27 and page 23, lines 22-30 of the present application. Frankly, nothing in Li describes or suggests that the pattern recognition module 125 is even capable of determining a sporting activity event within context of the sporting activity.”

) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).









Appellants state on pages 11-12 (emphasis added):
“Unlike the claimed invention, Li is not concerned with whether the football player shown in FIGS. 16-19 is a "good fit within a particular team's strategy and playing schemes" as is described at page 26, lines 12-20 of the present application and/or whether the kick or punt shown in FIGS. 16-19 resulted in a "good" outcome or a "bad" outcome as is described at page 11, lines 18-27 and page 23, lines 22-30 of the present application. Frankly, nothing in Li describes or suggests that the pattern recognition module 125 is even capable of determining a sporting activity event within context of the sporting activity.”

The examiner disagrees since Li teaches determining a sporting activity event (via said recognized, comprising “determine”, event) within (“within”, i.e., in the field, sphere, or scope of: Dictionary.com: definition 13, a preposition modifying “event” not “determining” and consistent with claim 1, line 15’s “events within context”) context (said event within the scope of said contextual text string describing said event and used to find said event as indicated in figs. 7, 8, 22:162: “Searchable index”) of the sporting activity (wherein “recognized” of said recognized event is defined via Dictionary.com:
recognize
verb (used with object), rec·og·nized, rec·og·niz·ing.
1	to identify as something or someone previously seen, known, etc.:
He had changed so much that one could scarcely recognize him.
wherein “identify” is defined:
identify
verb (used with object), i·den·ti·fied, i·den·ti·fy·ing.
1	to recognize or establish as being a particular person or thing; verify the identity of:
to identify handwriting; to identify the bearer of a check.
wherein “verify” is defined:
verify
verb (used with object), ver·i·fied, ver·i·fy·ing.
2	to ascertain the truth or correctness of, as by examination, research, or comparison:
to verify a spelling.
wherein “ascertain” is defined:
ascertain
verb (used with object)
1	to find out definitely; learn with certainty or assurance; determine:
to ascertain the facts.
In response to appellant's argument that the references fail to show certain features of appellant’s invention, it is noted that the features upon which appellant relies (i.e., “context in regard to the activity itself” and “ing” of “analyzing” via pages 12-13, emphasis added (“context” is directed more to the remark itself than claim 1):
“(Emphasis added). Even with "more high-level motive and contextual descriptions," like "grass" and "sky," Li still does not describe or suggest providing any context in regard to the activity itself. Contrary to the assertions at pages 33 and 39 of the Office Action, even if the purported "context" taught by Li (i.e., "grass", "sky") were combined with the purported sporting activity shown in FIGS. 16-19 (e.g., the football game), Li would not describe or suggest analyzing one or more spatial and temporal patterns to determine one or more sporting activity events of the sporting activity within context of the sporting activity, let alone one or more event characterizations that provide meaning to the one or more sporting activity events within context of the sporting activity. For instance, the description provided at col. 7, lines 41-42 may be modified to describe the image sequence 310 shown in FIGS. 16-19 as follows: 

Original description					Modified description 

“around 10 people (Adam, Brian... )			“around 1 person (the kicker)
watching a live Elton John show on grass		kicking a football on grass
under the sky in the Queen's Park.”			under the sky in the football stadium.”

As with the pattern recognition data 156, Li's index data 115 may be able to describe what features are shown in the image sequence 310 (e.g., football player, kick, punt, grass, sky), but the description is devoid of any context in regard to the sporting activity.” 

) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
	If “context in regard to the activity itself” is claimed in the future, Li teaches context (or surrounding or included crowd and commentator of the game) in regard to the activity itself (said kicking the ball being watched by context: the included crowd and the commentator as shown in Li’s fig. 5: “Commentator shot” “Game shot” “Crowd shot”).

Appellants state on pages 12-13 (emphasis added):
“(Emphasis added). Even with "more high-level motive and contextual descriptions," like "grass" and "sky," Li still does not describe or suggest providing any context in regard to the activity itself. Contrary to the assertions at pages 33 and 39 of the Office Action, even if the purported "context" taught by Li (i.e., "grass", "sky") were combined with the purported sporting activity shown in FIGS. 16-19 (e.g., the football game), Li would not describe or suggest analyzing one or more spatial and temporal patterns to determine one or more sporting activity events of the sporting activity within context of the sporting activity, let alone one or more event characterizations that provide meaning to the one or more sporting activity events within context of the sporting activity. For instance, the description provided at col. 7, lines 41-42 may be modified to describe the image sequence 310 shown in FIGS. 16-19 as follows: 

Original description					Modified description 
“around 10 people (Adam, Brian... )			“around 1 person (the kicker)
watching a live Elton John show on grass		kicking a football on grass
under the sky in the Queen's Park.”			under the sky in the football stadium.”

As with the pattern recognition data 156, Li's index data 115 may be able to describe what features are shown in the image sequence 310 (e.g., football player, kick, punt, grass, sky), but the description is devoid of any context in regard to the sporting activity.” 
	
	The examiner disagrees since Li teaches as claimed in claim 9:
analyze (via “pattern…analyzes”: note that the “analyzing” instead of the claimed “analyze” can be used: see below sub-section C) the one or more spatial and temporal patterns (resulting in “recognition data 156” contained by figs. 3,4,7:115: “Indexing data”) to determine (via “manual review”, includes “ascertain”, of said fig. 6: “Indexing data”) one or more sporting activity events (from said recognized event) of the sporting activity within (i.e., in the field, sphere, or scope of, modifying “events” not “analyze” or “determine”) context (via said contextually connected shots “used to generate … context…via manual review” resulting in said “Modified description”) of the sporting activity and one or more event characterizations (via said “recognition features”) that provide meaning (comprised in said “text string…descriptions of features”) to the one or more sporting activity events within (or in scope or view) context of the sporting activity (or the crowd watching the event via Li:

“Consider an example where video signal 110 contains a video broadcast. Index data 115 that indicates anchor shots and field shots show alternately could indicate a news broadcast; crowd shots and sports shots shown alternately could indicate a sporting event. Scene information can also be used for rate control, like quantization parameter (QP) initialization at shot transition in encoding. Index data 115 can be used to generate more high-level motive and contextual descriptions via manual review by human personnel. For instance, based on results mentioned above, operators could process index data 115 to provide additional descriptors for an image sequence 310 to, for example, describe an image sequence as ‘around 10 people (Adam, Brian . . . ) watching a live Elton John show on grass under the sky in the Queen's Park.’ ”

wherein “review” is defined via Dictionary.com:
review, noun
4	a general survey of something, especially in words; a report or account of something.

wherein “survey” is defined:
survey, verb (used with object)
2	to view in detail, especially to inspect, examine, or appraise formally or officially in order to ascertain condition, value, etc.; 

c.6,ll. 35-42:
“The pattern recognition module 125 includes a shot segmentation module 150 that segments the image sequence 310 into shot data 154 corresponding to the plurality of shots, based on the coding feedback data 300. A pattern detection module 175 analyzes the shot data 154 and generates pattern recognition data 156 that identifies at least one pattern of interest in conjunction with at least one of the plurality of shots.”

or redundantly, via said provisional application:

pg. 12, ll. 16-25:
“Consider an example where video signal 110 contains a video broadcast. Index data 115 that indicates anchor shots and field shots show alternately could indicate a news broadcast; crowd shots and sports shots shown alternately could indicate a sporting event. Scene information can also be used for rate control, like quantization parameter (QP) initialization at shot transition in encoding. Index data 115 can be used to generate more high-level motive and contextual descriptions via manual review by human personnel. For instance, based on results mentioned above, operators could process index data 115 to provide additional descriptors for an image sequence 310 to, for example, describe an image sequence as ‘around 10 people (Adam, Brian...) watching a live Elton John show on grass under the sky in the Queen’s Park.’ ”; and

pgs. 10,11:
“The pattern recognition module 125 includes a shot segmentation module 150 that segments the image sequence 310 into shot data 154 corresponding to the plurality of shots, based on the coding feedback data 300. A pattern detection module 175 analyzes the shot data 154 and generates pattern recognition data 156 that identifies at least one pattern of interest in conjunction with at least one of the plurality of shots.”



Appellants state on pages 12-13, emphasis added:
“(Emphasis added). Even with "more high-level motive and contextual descriptions," like "grass" and "sky," Li still does not describe or suggest providing any context in regard to the activity itself. Contrary to the assertions at pages 33 and 39 of the Office Action, even if the purported "context" taught by Li (i.e., "grass", "sky") were combined with the purported sporting activity shown in FIGS. 16-19 (e.g., the football game), Li would not describe or suggest analyzing one or more spatial and temporal patterns to determine one or more sporting activity events of the sporting activity within context of the sporting activity, let alone one or more event characterizations that provide meaning to the one or more sporting activity events within context of the sporting activity. For instance, the description provided at col. 7, lines 41-42 may be modified to describe the image sequence 310 shown in FIGS. 16-19 as follows: 

Original description: “around 10 people (Adam, Brian... ) watching a live Elton John show on grass under the sky in the Queen's Park.”
Modified description: “around 1 person (the kicker) kicking a football on grass under the sky in the football stadium.”

As with the pattern recognition data 156, Li's index data 115 may be able to describe what features are shown in the image sequence 310 (e.g., football player, kick, punt, grass, sky), but the description is devoid of any context in regard to the sporting activity.” 
	
	The examiner disagrees since Li teaches said Original description, which may be modified as shown above to correspond to said sports high-level context description, as manually determined or ascertained via said manual review. That description includes “context” such as “sky”, “grass” and “stadium” that “surround” the “event” or game, represented as “the kicker”:
Original description: “around 10 people (Adam, Brian... ) watching a live Elton John show on grass under the sky in the Queen's Park.”
Modified description: “around 1 person (the kicker) kicking a football on grass under the sky in the football stadium.”

wherein “context” is defined via Dictionary.com:
context
noun
2	the set of circumstances or facts that surround a particular event, situation, etc.


VI.	ARGUMENT

A.	The rejection of claims 1, 13-16, 21, 22, and 25 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li is improper.

2. The resultant combination would not describe or suggest the claimed invention.

Appellants state in pages 14,15, emphasis added:
“As described above, Li describes using temporal and spatial information to determine when and where certain features are shown within an image sequence 310. While Li's pattern recognition data 156 may be able to describe what features are shown in the image sequence 310 (e.g., football player, kick, punt, grass, sky), the description is devoid of any context in regard to the sporting activity. Therefore, even if, arguendo, the combination of the local server 105, advertiser server 117, email server 119, portal server 121, personal computer 123, host server 129, and external communication network 113 disclosed in Bonito were interpreted as a unit and such unit were modified to recognize or describe patterns of interest "or concern or importance relative to a person, such as a person concerned with said 'grass' or 'sky' or a celebrity," as is suggested at page 42 of the Office Action, Applicant respectfully submits that the resultant combination would merely use Li's pattern recognition data 156 and/or index data 115 to recognize what is being shown in each shot of an image sequence (e.g., electronic score card, views on the golf course, golfer's used equipment, his or her swing) at a technical, image-specific level. (See, e.g., Li at 4:23-25).” 

The examiner disagrees. The combination, presented in the Office action of 3/30/21 at page 42, results in a manually determined or ascertained contextual descriptor of a golf game at “sports…high-level context-level descriptions…presented as text strings”, as shown in said Original or Modified description, via Li: 
c.7,ll. 43-53:
“The indexing data 115 can contain pattern recognition data 156 and other hierarchical indexing information like: frame-level temporal and spatial information including variance, global motion and bit number etc.; shot-level objects and text string or other descriptions of features such as text regions of a video, human and action description, object information and background texture description etc.; scene-level represents such as video category (news cast, sitcom, commercials, movie, sports or documentary etc.), and high-level context-level descriptions and presentations presented as text strings, numerical classifiers or other data descriptors.”

Appellants state in page 15:
“Accordingly, Applicant respectfully submits that Li does not overcome the noted deficiencies of Bonito because any analyzing of spatial and temporal patterns by the resultant combination would be limited to recognizing what is shown in the image (e.g., grass, sky, celebrity) and, thus, would not determine one or more sporting activity events of a sporting activity within context of the sporting activity and one or more event characterizations that provide meaning to the one or more sporting activity events within context of the sporting activity, as is recited in claims 1 and 13.” 

The examiner disagrees since Li teaches “recognition data 156” from a low-level (i.e., “frame-level temporal and spatial information including variance, global motion and bit number etc.”) to a high-level (i.e., “high-level context-level descriptions” as manually determined by reviewing the contextually connected shots or indexing data, as shown in fig. 6, such as said kicking of the ball watched by the contextually connected crowd and contextually connected commentator) via Li: 
c.7,ll. 43-53:
“The indexing data 115 can contain pattern recognition data 156 and other hierarchical indexing information like: frame-level temporal and spatial information including variance, global motion and bit number etc.; shot-level objects and text string or other descriptions of features such as text regions of a video, human and action description, object information and background texture description etc.; scene-level represents such as video category (news cast, sitcom, commercials, movie, sports or documentary etc.), and high-level context-level descriptions and presentations presented as text strings, numerical classifiers or other data descriptors.”

Thus, the combination is not limited to any one recognition level.







VI.	ARGUMENT
A.	The rejection of claims 1, 13-16, 21, 22, and 25 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li is improper.
3. There is no valid motivation or rationale to combine the cited references.

Appellants state in page 16, emphasis added:
“However, Applicant respectfully submits that this reasoning is insufficient because the purported advantage provided by Li (i.e., using pattern recognition data to determine contextual descriptors, such as "grass" and "sky") is not needed and not relevant to the mobile terminal 101 described in Bonito. That is, contrary to the assertion at page 42 of the Office Action, it is not reasonable to expect that one of ordinary skill in the art would have looked to combine Bonito and Li "because Li's location/position 'indicates the pattern or patterns of interest’ or concern or importance relative to a person, such as a person concerned with said 'grass' or 'sky' or a celebrity, and Li's face-profile data set is a 'more accurate...feature' that is used to present 'video content ...of interest...or other objects/features of interest’ or concern or importance relative to a person such as said fig. 16:382 corresponding to Elton John within the context of said 'grass' and 'sky'.’ ”

The examiner disagrees since Bonito and Li, as well as appellant’s invention, are pertinent or relevant to the field of endeavor of features or characteristics and video retrieval. Before the invention was filed, one of skill in the art of features and video retrieval would have reasonably looked to Bonito and Li for a teaching of features and retrieval, and would have combined them as shown in the combination. The inquiry under 35 U.S.C. § 103 is what would have been obvious, not what is needed.
The combination renders the advantages of a desirable and accurate feature used for video retrieval. If Bonito teaches a feature but is silent about the feature being desirable and accurate, then the desirable and accurate feature is obvious to combine to acquire those advantages for retrieval. If Bonito teaches video retrieval (“retrieve” “information” or “video”) but is silent regarding the details of video retrieval, then Li’s detailed teaching of video retrieval, via the desirable video features or characteristics, becomes obvious to combine to one of ordinary skill in the art of video retrieval via: 

“[0011] While the foregoing systems provide a variety of independent and integrated features associated with a game of golf, none of the aforementioned systems provide an automatic reporting mechanism to provide the golfer feedback relating to his round of golf, without requiring the golfer to physically transfer the information by carrying a portable memory device or other tangible item (e.g., a piece of paper bearing the golf scores). Further, none of the prior art systems include electronic submission of golf scores in accordance with U.S. Golf Association (USGA) rules, which require the attestation of a player's score by another player in the group. Still further, none of the prior art systems provide a password-accessible portal for use by golfers at which the golfers can retrieve information pertaining to various rounds of golf played at courses employing mobile computer generated scoring and advertising, as well as optionally select for presentation and otherwise customize the arrangement of certain information on the mobile golf car display. Still further, none of the prior art systems provide automatic email reporting of electronically generated golf scores, as well as optionally supplying information related to advertisements or other information for which the golfer has, or has shown, interest.”. 

[0058] For example, if "JIM" had customized his display prior to commencing play, then when "JIM" logs into the mobile terminal 101 through use of his RFID card, the mobile terminal processing device 205 sends a wireless message to the local server 105 requesting Jim's customized display. The local server 205 then accesses the portal server 121 over the external communication network 113 (e.g., the Internet) and retrieves Jim's display information (e.g., the supplemental information to be displayed as selected by Jim together with control information containing instructions for arranging the supplemental information on the display 209 in accordance with Jim's stored arrangement). Upon acquiring Jim's display information, the local server 105 transmits the display information to the mobile terminal 101, which in turn stores the display information in the memory 207. Upon determining that Jim is the active terminal user (e.g., by detecting that the name "JIM" has been pressed on the touch screen display 209 or otherwise selected using the mobile terminal's user interface 211), the mobile terminal processing device 205 retrieves Jim's customized profile and display information along with the display control instructions from memory 207, and presents the display information on the display 209 in accordance with the control instructions. To inform the golfers as to which golfer is presently active, the name of the active golfer may be highlighted, shown flashing or blinking, shown in bold or a predetermined color, or otherwise emphasized on the score card 301.







information or supplemental information associated with the player-related information to an email server 119 hosting an email account of the player. For example, the local server 105 may send the electronic scorecard 301, ad images or other information related to advertisers sponsoring or identified in ads that were selected by the player during play of the round of golf (e.g., hyperlinks to web sites of the advertisers or to web sites at which the advertiser's products or service may be purchased), identifications of clubs selected and used by the player at particular times during play (e.g., based on distances to the pin), video of the player's swing as captured by the video camera 111, notes taken by the player during play, final tournament standings, tournament video or pictorial images, and other information of interest to the player's email account via the external communication network 113. Alternatively, the scorecard 301 and/or supplemental information may be sent by the local server 105 to the host server 129, which in turn sends the received information to the golfer's email account. In a preferred embodiment, communication of the scores and/or other player-related or supplemental information to the player's email account occurs after completion of the round of golf. Alternatively, communication of the scores and other information to the player's email account may occur at any other time, including in real time, as so desired by the player, the golf course operator, the applicable association or membership, or the network operator.”













Appellants state in pages 16,17, emphasis added:
“As acknowledged at page 42 of the Office Action, Bonito already describes golf features, such as "features within the holes (e.g., greens and hazards)" and "distance to pin or other features on the golf hole." For example, Bonito describes a comprehensive golf scoring, marketing and reporting system in which golf scores and/or other information are automatically reported and/or communicated to a golfer upon completion of or during a round of golf. (Abstract; see ¶[0019]; see also ¶¶[0005]-[0011] (disclosing features of well-known computer-based golf car systems, including "a method for mapping the perimeter of the holes and of features within the holes (e.g., greens and hazards)" (¶[0007]) and "showing distance to pin or other features on the golf hole" (¶[0009]))). Moreover, as described above, recognizing objects like "grass," "sky," or a celebrity using the teachings of Li (e.g., pattern recognition data 156) would be devoid of any context in regard to a sporting activity. Therefore, the addition of "indicat[ing] the pattern or patterns of interest' or concern or importance relative to a person, such as a person concerned with said 'grass' or 'sky" or a celebrity'" to Bonito's mobile terminal 101, as suggested at page 42 of the Office Action, would serve no purpose and, thus, is not needed and not relevant to the mobile terminal 101 described in Bonito.” 

The examiner disagrees for the same reasons above (pages 5-9 or this Office action’s response to appellant’s section VI., subsection A., sub-sub section 1, response paragraph 2) in the response to appellant’s remark regarding Li’s teaching of the contextual text string, comprising “grass”, “sky” and celebrity golfer, used to find or identify sports videos. Therefore, the addition of "indicat[ing] the pattern or patterns of interest' or concern or importance relative to a person, such as a person concerned with said 'grass' or 'sky" or a celebrity'" to Bonito's mobile terminal 101, as suggested at page 42 of the Office Action, would serve a purpose and, thus, is relevant to the mobile terminal 101 described in Bonito.




VI.	ARGUMENT

B. The rejection of claims 3 and 17 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li and Abbott is improper. 

Appellants state in page 17:
“Accordingly, Applicant respectfully submits that the instant Section 103 rejection is improper at least because no combination of Bonito, Li, and Abbott describes or suggests an apparatus as recited in claim 1 or a system as recited in claim 13.” 

The examiner disagrees for the same reasons as presented above.
VI.	ARGUMENT

C. The rejection of claims 7-11, 19, and 20 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li and Campbell is improper.

Appellants state in page 18, emphasis added:
“As described above, claim 13 and all claims dependent therefrom are patentable over Bonito in view of Li at least because no combination of Bonito and Li describes or suggests a system as recited in claim 13. Moreover, independent claim 7 includes recitations similar to those of claims 1 and 13 in regard to analyzing one or more spatial temporal patterns to determine one or more sporting activity events of the sporting activity within context of the sporting activity and one or more event characterizations that provide meaning to the one or more sporting activity events within context of the sporting activity. Therefore, Applicant respectfully submits that claim 7 and all claims dependent therefrom are patentable over Bonito in view of Li for at least the same reasons as claims 1 and 13. Campbell, which is cited to allegedly describe an overlay, does not overcome the noted deficiencies of Bonito and Li.”

The examiner disagrees for the same reasons as presented above.





In response to appellant's argument that the references fail to show certain features of appellant’s invention, it is noted that the features upon which appellant relies (i.e., “ing” of “analyzing” via appellant’s remarks, page 18, emphasis added:
“As described above, claim 13 and all claims dependent therefrom are patentable over Bonito in view of Li at least because no combination of Bonito and Li describes or suggests a system as recited in claim 13. Moreover, independent claim 7 includes recitations similar to those of claims 1 and 13 in regard to analyzing one or more spatial temporal patterns to determine one or more sporting activity events of the sporting activity within context of the sporting activity and one or more event characterizations that provide meaning to the one or more sporting activity events within context of the sporting activity. Therefore, Applicant respectfully submits that claim 7 and all claims dependent therefrom are patentable over Bonito in view of Li for at least the same reasons as claims 1 and 13. Campbell, which is cited to allegedly describe an overlay, does not overcome the noted deficiencies of Bonito and Li.”

) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
In contrast, claim 13 claims “analyze the one or more spatial and temporal patterns” directed more to the “action” than the “result” of the claimed “verb” analyze via Dictionary.com:
-ing1
1	a suffix of nouns formed from verbs, expressing the action of the verb or its result, product, material, etc. (the art of building; a new building; cotton wadding). It is also used to form nouns from words other than verbs (offing; shirting). Verbal nouns ending in -ing are often used attributively (the printing trade) and in forming compounds (drinking song). In some compounds (sewing machine), the first element might reasonably be regarded as the participial adjective, -ing2, the compound thus meaning “a machine that sews,” but it is commonly taken as a verbal noun, the compound being explained as “a machine for sewing.”




A.	analyze (in the present tense[1]) one or more spatial temporal patterns to determine one or more sporting activity events (this phrase is in the present tense given analyze[1]) of the sporting activity within (a preposition modifying the claimed “events”[2] and not the claimed “analyze”) context of the sporting activity and one or more event characterizations (in the present tense given analyze[1,3]) that provide meaning to the one or more sporting activity events within context of the sporting activity; and
B.	analyzing (expressing action or result of analyze[1,3]) one or more spatial temporal patterns to determine one or more sporting activity events (this phrase is expressing action or result of analyze[1,3]) of the sporting activity within (a preposition modifying “events”[2] and not “analyzing”) context of the sporting activity and one or more event characterizations (“determine…characterizations” expressing action or result of analyze[1,3]) that provide meaning to the one or more sporting activity events within (modifying “events”) context of the sporting activity.
[1]	Appellant’s corresponding published application (US 2018/0137364 A1), emphasis added:

“[0079] This video coupling technology has numerous appealing applications. The user can search, parse, retrieve and transfer any activity of interest inside or alongside video. For example, for any event or outcome of interest identified using the analytics software, the video clip of that particular event would be readily available to the user for further analysis. This technology could also make the video itself the vehicle for the analytics, such as, for example, by making the video data searchable via a database, whereby the event profile data, and quantitative and qualitative event attribute data therein, is embedded within or graphically overlaid on the video media.”

[2]	Appellant’s corresponding published application, emphasis added:
“[0013] Figure 4 is an illustration of a two-dimensional animation of a sporting event including additional contextual data according to one embodiment of the invention.”

[3]	Appellant’s corresponding published application, emphasis added:
“Abstract

A system for enhanced sports analytics and/or content creation includes: an object tracking system that generates coordinate data corresponding to object motion in a sports event; a data processing module that receives the coordinate data from the object tracking system, analyzes the coordinate data with an event recognition algorithm that identifies and characterizes events and outcomes of interest, and catalogs the data in accordance with the identified events and outcomes into event profile data; a database that receives and stores the event profile data generated by the data processing module; a user application that accesses the event profile data from the database; and at least one processing unit that executes instructions stored in at least one non-transitory medium to implement at least one of the object tracking system, the data processing module, or the user application.”

	The claimed preposition “within” in claim 1 can grammatically modify the claimed verb “analyze” in claim 1, line 12, including the claimed “to determine” in claim 1, line 13 and “meaning” in claim 1, line 15, such that claim 1 would read as:
a)	“analyze…to determine…within context”; or
b)	“provide meaning…within context of the sporting activity”. 
However, in view of the above footnotes 1-3, reading the claimed “within” as modifying each of the phrases:
a)	“analyze…to determine…within context”; and 
b)	“provide meaning…within context of the sporting activity” 
is not consistent with footnotes 1-3.


VI.	ARGUMENT

D. The rejection of claim 12 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li, Campbell, and Abbott is improper. 

Appellants state in pages 18,19:
“As described above, claim 7 and all claims dependent therefrom are patentable over Bonito in view of Li and Campbell at least because no combination of Bonito, Li, and Campbell describes or suggests an apparatus as recited in claim 7. Abbott, which is cited to allegedly describe sensitivity analysis for removing GPS errors, does not overcome the noted deficiencies of Bonito, Li, and Campbell.
Accordingly, Applicant respectfully submits that the instant Section 103 rejection is improper at least because no combination of Bonito, Li, Campbell, and Abbott describes or suggests an apparatus as recited in claim 7.”

The examiner disagrees for the same reasons as discussed above.
VI.	ARGUMENT

E. The rejection of claim 18 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li and Wilk is improper. 

Appellants state in page 19:

“As described above, claim 13 and all claims dependent therefrom are patentable over Bonito in view of Li at least because no combination of Bonito and Li describes or suggests a system as recited in claim 13. Wilk, which is cited to allegedly describe a golf park, does not overcome the noted deficiencies of Bonito and Li. 
Accordingly, Applicant respectfully submits that the instant Section 103 rejection is improper at least because no combination of Bonito, Li, and Abbott describes or suggests a system as recited in claim 13.”

The examiner disagrees for the same reasons as discussed above.










VI.	ARGUMENT

F. The rejection of claim 23 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li and Chatterjee is improper.

Appellants state in page 20:
“As described above, claim 1 and all claims dependent therefrom are patentable over Bonito in view of Li at least because no combination of Bonito and Li describes or suggests an apparatus as recited in claim 1. Chatterjee, which is cited to allegedly describe determining one or more outcome probabilities, does not overcome the noted deficiencies of Bonito and Li. 
Accordingly, Applicant respectfully submits that the instant Section 103 rejection is improper at least because no combination of Bonito, Li, and Chatterjee describes or suggests an apparatus as recited in claim 1.” 

The examiner disagrees for the same reasons as discussed above.


VI.	ARGUMENT

G. The rejection of claim 24 under 35 U.S.C. § 103 as being unpatentable over Bonito in view of Li and Berg is improper.

Appellants state in page 20:
“As described above, claim 1 and all claims dependent therefrom are patentable over Bonito in view of Li at least because no combination of Bonito and Li describes or suggests an apparatus as recited in claim 1. Berg, which is cited to allegedly describe determining recommended sporting activity events likely to lead to a desired outcome, does not overcome the noted deficiencies of Bonito and Li. 
Accordingly, Applicant respectfully submits that the instant Section 103 rejection is improper at least because no combination of Bonito, Li, and Berg describes or suggests an apparatus as recited in claim 1.”

The examiner disagrees for the same reasons as discussed above.





Respectfully submitted,
/DR/
Dennis Rosario
24 January 2022
Conferees:
/MATTHEW C BELLA/ Supervisory Patent Examiner, Art Unit 2667
/SUMATI LEFKOWITZ/ Supervisory Patent Examiner, Art Unit 2662

Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.