Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED OFFICE ACTION

Status of Claims

Claims 1-5, 7-26 and 28-43 are allowed.
Claims 6 and 27 have been cancelled.

Reasons for Allowance

1.	The following is an examiner’s statement of reasons for allowance: 
The prior art made of record fails to teach the limitations underlined within the independent claims set forth below.


Regarding Claim 1,
A fully-automatic system for classifying the spatial format of a video file  comprising: a video pre-processing stage, splitting and resizing an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D, spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame; and a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage, wherein the convolutional neural network is trained to identify optical distortions associated with the one or more video formats. 
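For illustration only, the discriminator and tagging stages recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the label order in FORMATS, the function names, and the "spatial_format" metadata key are hypothetical and do not appear in the claims.

```python
from typing import Dict, List, Sequence

# Hypothetical label order; the claim names the three formats but not an ordering.
FORMATS = ("2D", "spherical", "stereo_spherical")


def mean_confidence(per_frame: Sequence[Sequence[float]]) -> List[float]:
    """Sum the per-frame confidence vectors and divide by the frame count."""
    if not per_frame:
        raise ValueError("at least one frame is required")
    totals = [0.0] * len(FORMATS)
    for frame in per_frame:
        if len(frame) != len(FORMATS):
            raise ValueError("each frame needs one confidence value per format")
        for i, value in enumerate(frame):
            totals[i] += value
    return [total / len(per_frame) for total in totals]


def discriminate(per_frame: Sequence[Sequence[float]]) -> str:
    """Return the format whose mean confidence is highest."""
    means = mean_confidence(per_frame)
    best = max(range(len(FORMATS)), key=lambda i: means[i])
    return FORMATS[best]


def tag(metadata: Dict[str, str], spatial_format: str) -> Dict[str, str]:
    """Return a copy of the metadata with the determined format added."""
    tagged = dict(metadata)
    tagged["spatial_format"] = spatial_format  # hypothetical key name
    return tagged


# Three one-hot encoded frames: one voted 2D, two voted spherical.
frames = [[1, 0, 0], [0, 1, 0], [0, 1, 0]]
result = tag({"name": "clip.mp4"}, discriminate(frames))
```

Averaging over all frames means a single misclassified frame does not change the format assigned to the whole file.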

Regarding Claim 7,
A fully automatic system for classifying the spatial format of a video file, comprising: a video pre-processing stage, splitting and resizing an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D, spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame; and a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage; wherein the convolutional neural network is trained to identify optical distortions present in equirectangular and stereo equirectangular projections.  

Regarding Claim 10,
A fully automatic system for classifying the spatial format of a video file, comprising: a video pre-processing stage, splitting and resizing an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D, spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame, and  a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage; wherein the video pre-processing stage selects a subset of the total frames of video from the input video file; and wherein the video pre-processing stage for training the network generates additional video frames from the selected subset to incorporate a range of viewport variances into the set of video frames sent to the convolutional neural network.  

Regarding Claim 11,
 A fully automatic system  for classifying the spatial format of a video file, comprising: a video pre-processing stage, splitting and resizing an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D, spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame; and a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage; wherein the video pre-processing stage selects a subset of the total frames of video from the input video file; and wherein the video pre-processing stage for training the network generates additional video frames from the selected subset to incorporate a range of intra-class variances into the set of video frames sent to the convolutional neural network. 

Regarding Claim 15,
 A fully automatic system for classifying the spatial format of a video file, comprising: a video pre-processing stage, splitting and resizing an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D, spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame; and a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage; wherein the video pre-processing stage selects a subset of the total frames of video from the input video file; wherein the convolutional neural network predictor is trained using pre-labeled video frames; wherein the convolutional neural network trains using a learning rate value that decays over time; and wherein the convolutional neural network trains using a 0.01 learning rate value that decays over time. 
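For illustration only: the claim recites a 0.01 learning rate value that decays over time but does not recite a decay schedule. The following minimal sketch assumes exponential decay; the function name and the decay_rate and decay_steps values are hypothetical and are not taken from the claims.

```python
def decayed_learning_rate(step: int,
                          initial_rate: float = 0.01,
                          decay_rate: float = 0.96,
                          decay_steps: int = 1000) -> float:
    """Exponentially decay the learning rate as training steps accumulate.

    initial_rate matches the 0.01 value recited in the claim; decay_rate
    and decay_steps are assumed values chosen only for this example.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return initial_rate * decay_rate ** (step / decay_steps)


# The rate starts at 0.01 and shrinks by a factor of decay_rate
# every decay_steps training steps.
schedule = [decayed_learning_rate(s) for s in (0, 1000, 2000)]
```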

Regarding Claim 16,
 A fully automatic system for classifying the spatial format of a video file, comprising: a video pre-processing stage, splitting and resizing an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D, spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame; and a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage; wherein the pre-processing stage provides a uniform size and channel count to each frame provided to the convolutional neural network predictor. 
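For illustration only, one way a pre-processing stage could give every frame a uniform size and channel count is sketched below. It assumes frames are nested Python lists of pixels, nearest-neighbour sampling, and a three-channel target; none of these choices, and none of the names used, are recited in the claims.

```python
def to_uniform(frame, height=2, width=2, channels=3):
    """Resize a frame by nearest-neighbour sampling and normalise its channels.

    frame is a list of rows; each pixel is either a single number
    (grayscale) or a tuple of channel values. All defaults are
    hypothetical values chosen for this example.
    """
    src_h, src_w = len(frame), len(frame[0])
    out = []
    for y in range(height):
        src_row = frame[y * src_h // height]
        row = []
        for x in range(width):
            pixel = src_row[x * src_w // width]
            if isinstance(pixel, (int, float)):
                pixel = (pixel,)
            pixel = tuple(pixel)
            if len(pixel) == 1:
                pixel = pixel * channels      # replicate grayscale value
            elif len(pixel) > channels:
                pixel = pixel[:channels]      # drop extra channels, e.g. alpha
            elif len(pixel) < channels:
                raise ValueError("unsupported channel count")
            row.append(pixel)
        out.append(row)
    return out


# A 2x4 grayscale frame becomes a 2x2 frame with three channels per pixel.
gray = [[0, 10, 20, 30], [40, 50, 60, 70]]
uniform = to_uniform(gray)
```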

Regarding Claim 17,
 A fully automatic system for classifying the spatial format of a video file, comprising: a video pre-processing stage, splitting and resizing an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D, spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame; and a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage; wherein the system does not utilize computer vision heuristics, and is built around a deep convolutional neural network architecture. 

Regarding Claim 18,
 A fully automatic system for classifying the spatial format of a video file, comprising: a video pre-processing stage, splitting and resizing an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D, spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame; and a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage; wherein the convolutional neural network architecture utilizes convolution layers to break down the frames into smaller filters that contain patterns associated with one or more distortion types. 

Regarding Claim 22,
A method for classifying the format of a video file, comprising the steps of: splitting an input video file into a set of frames; generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format, a spherical video format and/or a stereo spherical video format; and determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; developing metadata associated with the video file corresponding to the determined format of the video; and training the convolutional neural network to identify optical distortions associated with the one or more video formats. 

Regarding Claim 28,
A method for classifying the format of a video file, comprising the steps of: splitting an input video file into a set of frames; generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format, a spherical video format and/or a stereo spherical video format; and  determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; developing metadata associated with the video file corresponding to the determined format of the video; and  training the convolutional neural network to identify optical distortions present in equirectangular and stereo equirectangular projections. 

Regarding Claim 31,
A method for classifying the format of a video file, comprising the steps of: splitting an input video file into a set of frames; generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format, a spherical video format and/or a stereo spherical video format; and determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; developing metadata associated with the video file corresponding to the determined format of the video; and a video pre-processing step that selects a subset of the total frames of video from the input video file; wherein the video pre-processing step generates additional video frames from the selected subset to incorporate a range of viewport variances into the set of video frames sent to the convolutional neural network. 

Regarding Claim 32,
 A method for classifying the format of a video file, comprising the steps of: splitting an input video file into a set of frames; generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format, a spherical video format and/or a stereo spherical video format; and determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; developing metadata associated with the video file corresponding to the determined format of the video; and a video pre-processing step that selects a subset of the total frames of video from the input video file; wherein the video pre-processing step generates additional video frames from the selected subset to incorporate a range of intra-class variances into the set of video frames sent to the convolutional neural network. 

Regarding Claim 36,
A method for classifying the format of a video file, comprising the steps of: splitting an input video file into a set of frames;  generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format, a spherical video format and/or a stereo spherical video format; and determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; developing metadata associated with the video file corresponding to the determined format of the video; a video pre-processing step that selects a subset of the total frames of video from the input video file; and training the convolutional neural network using pre-labeled video frames; wherein the training step trains using a learning rate value that decays over time; and wherein the training step trains using a 0.01 learning rate value that decays over time.  

Regarding Claim 37,
A method for classifying the format of a video file, comprising the steps of: splitting an input video file into a set of frames; generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format, a spherical video format and/or a stereo spherical video format; and determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; developing metadata associated with the video file corresponding to the determined format of the video; and  a pre-processing step that provides a uniform size and channel count to each frame provided to the convolutional neural network. 

Regarding Claim 38,
A method for classifying the format of a video file, comprising the steps of: splitting an input video file into a set of frames;  generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format, a spherical video format and/or a stereo spherical video format; and determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; and developing metadata associated with the video file corresponding to the determined format of the video; wherein the method does not utilize computer vision heuristics, and is built around a deep convolutional neural network architecture. 

Regarding Claim 39,
 A method for classifying the format of a video file, comprising the steps of: splitting an input video file into a set of frames; generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format, a spherical video format and/or a stereo spherical video format; and determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; and developing metadata associated with the video file corresponding to the determined format of the video; wherein the convolutional neural network architecture utilizes convolution layers to break down the frames into smaller filters that contain patterns associated with one or more distortion types. 

Regarding Claim 43,
 A system for classifying the spatial format of a video file: a video pre-processing stage, splitting an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values signify one or more spatial video formats detected per frame; and a discriminator stage, receiving the per-frame confidence values from the convolutional neural network and determining the spatial format of video based upon an overall confidence value calculated from the plurality of per-frame confidence values; and an action stage that takes appropriate action with respect to the video file based upon the determined spatial format of the video, the appropriate action including meta-tagging the video file according to the determined spatial format, storing the video file according to the determined spatial format, playing the video file according to the determined spatial format, transmitting the video file according to the determined spatial format, and/or displaying the result of the determined spatial format to a user.   

Regarding Claim 1: Claim 1 was previously rejected over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) and further in view of Mathieu et al. (USPUB 20180137389). The combination teaches a fully-automatic system for classifying the spatial format of a video file comprising: a video pre-processing stage, splitting and resizing an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D, spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame (the detailed rejection of the claim is set forth in the Office Action dated 03/03/2022), but does not teach the following limitations (the limitation of claim 6 previously objected to as allowable in the Office Action dated 03/03/2022), as now amended into claim 1: “a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage, wherein the convolutional neural network is trained to identify optical distortions associated with the one or more video formats.”

Regarding Claims 7, 10, 11, 15, 16, 17, 18, 28, 31, 32, 36, 37, 38 and 39: These claims were objected to as containing allowable subject matter in the Office Action issued on 03/03/2022. Applicant has amended the claims into independent form including the allowable subject matter (underlined above within the claims). The prior art of record does not teach the underlined limitations; therefore, these claims are allowed.
Regarding Claim 22: Claim 22 was previously rejected over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) and further in view of Turcot et al. (USPUB 20180189581). The combination teaches a method for classifying the format of a video file, comprising the steps of: splitting an input video file into a set of frames; generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format (the detailed rejection of the claim is set forth in the Office Action dated 03/03/2022), but does not teach the following limitations (the limitation of claim 27 previously objected to as allowable in the Office Action dated 03/03/2022), as now amended into claim 22: “a spherical video format and/or a stereo spherical video format; and determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; developing metadata associated with the video file corresponding to the determined format of the video; and training the convolutional neural network to identify optical distortions associated with the one or more video formats.”



Regarding Claim 43: Claim 43 was previously rejected over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) and further in view of Mathieu et al. (USPUB 20180137389). The combination teaches a system for classifying the spatial format of a video file: a video pre-processing stage, splitting an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values signify one or more spatial video formats detected per frame; and a discriminator stage, receiving the per-frame confidence values from the convolutional neural network and determining the spatial format of video based upon an overall confidence value calculated from the plurality of per-frame confidence values, but does not teach the following limitations (the limitation of claim 6 previously objected to as allowable in the Office Action dated 03/03/2022), as now amended into claim 43: “an action stage that takes appropriate action with respect to the video file based upon the determined spatial format of the video, the appropriate action including meta-tagging the video file according to the determined spatial format, storing the video file according to the determined spatial format, playing the video file according to the determined spatial format, transmitting the video file according to the determined spatial format, and/or displaying the result of the determined spatial format to a user.”

Conclusion


2.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to OMAR S ISMAIL, whose telephone number is (571) 272-9799 and whose fax number is (571) 273-9799. The examiner can normally be reached M-F, 9:00am-6:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at
http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David C. Payne, can be reached at (571) 272-3024. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OMAR S ISMAIL/
Primary Examiner, Art Unit 2637