Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED OFFICE ACTION

Status of Claims

Claims 1-42 are pending in this Office Action.
Claim 43 is allowed.



Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

1.	Claims 1, 9, 12, 13, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) and further in view of Mathieu et al. (USPUB 20180137389).

As per Claim 1, Nikolaos Zioulis teaches a fully-automatic system for classifying the spatial format of a video file (Page 4 – “…demonstrated in classification and single variable regression problems. In addition, they are also applied in the spectral domain while we formulate our network design for the spatial image domain….” AND Page 2 – “…we train a CNN to learn to estimate a scene’s depth given an omnidirectional (equirectangular) image as input….”): a video pre-processing stage (preprocessing block taught within Page 9 – Fig. 4 and Page 8 – Fig. 3), splitting and resizing an input video file into a set of frames (Page 10 – “…We split our dataset into corresponding train and tests sets as follows: (i) Initially we remove 1 complete area from Stanford2D3D, 3 complete buildings from Matterport3D and 3 CAD scenes from SunCG for our test set totaling 1,298 samples….”); spherical (equirectangular) or stereo spherical (top-bottom equirectangular) projections detected per frame (equirectangular images and inputs taught within Page 3 – “…2. We offer a dataset consisting of 360° color images paired with ground truth 360° depth maps in equirectangular format. The dataset is available online. 3. We propose and validate, a CNN auto-encoder architecture specifically designed for estimating depth directly on equirectangular images. 4. We show how monocular depth estimation methods trained on traditional 2D images fall short or produce low quality results when applied to equirectangular inputs, highlighting the need for learning directly on the 360° domain….”);
Nikolaos Zioulis does not explicitly teach a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values are one-hot encoded confidence values signifying either 2D; a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage.
However, within analogous art, Liu et al. teaches a convolutional neural network (CNN architectures taught within Paragraph [0059]), receiving the set of frames and predicting one or more confidence values for each of the frames respectively (confidence coefficient and the predicting of learnt data from neural network taught within Paragraphs [0062-0063]), wherein the confidence values are one-hot encoded confidence values signifying either 2D (one-hot pose code taught within Paragraphs [0057] and [0059]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Liu et al. with the teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas) because Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) provides a system and method for efficiently identifying objects within video images using a neural network.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Disentangled representation learning generative adversarial network for pose-invariant face recognition of Liu et al. within the OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas of Nikolaos Zioulis in order to provide a system and method for efficiently identifying objects within video images using a neural network.
	The combination of Nikolaos Zioulis and Liu et al. does not explicitly teach a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values; and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage.
	However, within analogous art, Mathieu et al. teaches a discriminator stage, receiving the confidence values from the convolutional neural network and determining the spatial format of video based upon a mean confidence value calculated from the sum of per-frame confidence values (discriminating process within the neural network for the input image frames within Paragraphs [0029-0035]); and a tagging stage adds the correct metadata associated with the video file corresponding to the format of the video determined at the discriminator stage (Paragraph [0096] – “…interacting with content, tagging or being tagged in images, joining groups, listing and confirming attendance at events, checking-in at locations, liking particular pages, creating pages, and performing other tasks that facilitate social action. In particular embodiments, social-networking system 860 may calculate a coefficient based on the user's actions with particular types of content….” AND Paragraph [0097] – “…[i]f a user is tagged in a first photo, but merely likes a second photo, social-networking system 860 may determine that the user has a higher coefficient with respect to the first photo than the second photo because having a tagged-in-type relationship with content may be assigned a higher weight and/or rating than having a like-type relationship with content. In particular embodiments, social-networking system 860 may calculate a coefficient for a first user based on the relationship one or more second users have with a particular object. …”).
	One of ordinary skill in the art would have been motivated to combine the teaching of Mathieu et al. with the combined teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas) and Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) because Mathieu et al. (Deep Multi-Scale Video Prediction) provides a system and method for training models that accurately predict and construct image progressions.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Deep Multi-Scale Video Prediction of Mathieu et al. within the combined teaching of Nikolaos Zioulis and Liu et al. in order to provide a system and method for training models that accurately predict and construct image progressions.
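
For illustration only (the following is not drawn from the application or from any cited reference), the discriminator and tagging stages recited in claim 1 can be sketched in Python. The sketch assumes each frame's confidence values arrive as a three-element list ordered as 2D, spherical and stereo spherical, and that the video's metadata is a plain dictionary; FORMATS, discriminate, tag and the spatial_format key are hypothetical names.

```python
from typing import Dict, Sequence

# Hypothetical class order, taken from the three formats named in claim 1.
FORMATS = ("2d", "spherical", "stereo_spherical")


def discriminate(per_frame_confidences: Sequence[Sequence[float]]) -> str:
    """Return the format with the highest mean confidence across all frames."""
    if not per_frame_confidences:
        raise ValueError("at least one frame is required")
    n_frames = len(per_frame_confidences)
    # Sum each class's confidence over all frames, then divide by the frame count.
    sums = [sum(frame[i] for frame in per_frame_confidences) for i in range(len(FORMATS))]
    means = [total / n_frames for total in sums]
    best = max(range(len(FORMATS)), key=lambda i: means[i])
    return FORMATS[best]


def tag(metadata: Dict[str, str], video_format: str) -> Dict[str, str]:
    """Return a copy of the metadata with a hypothetical 'spatial_format' field set."""
    tagged = dict(metadata)
    tagged["spatial_format"] = video_format
    return tagged
```

For example, three frames scored [0.1, 0.8, 0.1], [0.2, 0.6, 0.2] and [0.7, 0.2, 0.1] give mean confidences of about 0.33, 0.53 and 0.13, so discriminate returns "spherical" even though one frame individually favors 2D.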

As per Claim 9, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 1.
The combination of Nikolaos Zioulis and Mathieu et al. does not explicitly teach wherein the video pre-processing stage selects a subset of the total frames of video from the input video file.
Within analogous art, Liu et al. teaches wherein the video pre-processing stage selects a subset of the total frames of video from the input video file (Paragraph [0046] – “…various pre-processing procedures may be applied to the received images, including filtering, enhancing, combining, or separating various features, portions, or components of the images….” AND the frames of the video taught within Paragraph [0109]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Liu et al. with the combined teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas) and Mathieu et al. (Deep Multi-Scale Video Prediction) because Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) provides a system and method for efficiently identifying objects within video images using a neural network.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Disentangled representation learning generative adversarial network for pose-invariant face recognition of Liu et al. within the combined teaching of Nikolaos Zioulis and Mathieu et al. in order to provide a system and method for efficiently identifying objects within video images using a neural network.
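
As a hypothetical illustration of the claim 9 limitation (selecting a subset of the total frames), a pre-processing stage could pick evenly spaced frame indices. Uniform spacing and the name sample_frame_indices are assumptions; neither the claim nor the quoted paragraph [0046] of Liu et al. specifies a sampling rule.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick up to num_samples evenly spaced frame indices out of total_frames.

    Uniform spacing is an illustrative assumption; the claim only requires
    that some subset of the frames be selected.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    # Integer arithmetic avoids floating-point drift: index i maps to the
    # start of the i-th of num_samples equal-width segments.
    return [i * total_frames // num_samples for i in range(num_samples)]
```

For a 100-frame video and four samples, the sketch selects frames 0, 25, 50 and 75.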

As per Claim 12, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 9.
The combination of Nikolaos Zioulis and Mathieu et al. does not explicitly teach wherein the convolutional neural network predictor is trained using pre-labeled video frames.
Within analogous art, Liu et al. teaches wherein the convolutional neural network predictor is trained using pre-labeled video frames (labeling of the video frames within the neural network broadly taught within Paragraphs [0084-0085]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Liu et al. with the combined teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas) and Mathieu et al. (Deep Multi-Scale Video Prediction) because Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) provides a system and method for efficiently identifying objects within video images using a neural network.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Disentangled representation learning generative adversarial network for pose-invariant face recognition of Liu et al. within the combined teaching of Nikolaos Zioulis and Mathieu et al. in order to provide a system and method for efficiently identifying objects within video images using a neural network.

As per Claim 13, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 12.
The combination of Nikolaos Zioulis and Mathieu et al. does not explicitly teach wherein the convolutional neural network trains on a multitude of batches containing sixty-four pre-labeled video frames.
Within analogous art, Liu et al. teaches wherein the convolutional neural network trains on a multitude of batches containing sixty-four pre-labeled video frames (the batch size within the convolutional neural network is broadly taught within Paragraph [0090] – “…optimization strategies, all models were trained with a batch size of 64. All weights were initialized from a zero-centered normal distribution with a standard deviation of 0.02….”).
	One of ordinary skill in the art would have been motivated to combine the teaching of Liu et al. with the combined teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas) and Mathieu et al. (Deep Multi-Scale Video Prediction) because Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) provides a system and method for efficiently identifying objects within video images using a neural network.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Disentangled representation learning generative adversarial network for pose-invariant face recognition of Liu et al. within the combined teaching of Nikolaos Zioulis and Mathieu et al. in order to provide a system and method for efficiently identifying objects within video images using a neural network.
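
The batch limitation of claim 13 can be pictured with a short, hypothetical Python sketch that groups pre-labeled frames into consecutive batches of sixty-four, the batch size quoted from paragraph [0090] of Liu et al. Keeping a smaller final batch rather than discarding it, and the name batches, are assumptions made here for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(samples: Sequence[T], batch_size: int = 64) -> Iterator[List[T]]:
    """Yield consecutive batches of labeled samples; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])
```

With 150 labeled frames, for example (frame_id, label) pairs, the sketch yields batches of 64, 64 and 22 frames.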

As per Claim 14, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 12.
The combination of Nikolaos Zioulis and Liu et al. does not explicitly teach wherein the convolutional neural network trains using a learning rate value that decays over time.
Within analogous art, Mathieu et al. teaches wherein the convolutional neural network trains using a learning rate value that decays over time (learning rate taught within Paragraphs [0042-0043] and decay factor taught within Paragraph [0095]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Mathieu et al. with the combined teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas) and Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) because Mathieu et al. (Deep Multi-Scale Video Prediction) provides a system and method for training models that accurately predict and construct image progressions.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Deep Multi-Scale Video Prediction of Mathieu et al. within the combined teaching of Nikolaos Zioulis and Liu et al. in order to provide a system and method for training models that accurately predict and construct image progressions.
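
A learning rate that decays over time, as recited in claim 14, is commonly implemented as exponential decay. The sketch below assumes a staircase schedule in which the rate is multiplied by a fixed decay factor every decay_steps training steps; the function name, parameter names and schedule are illustrative assumptions, not taken from Mathieu et al. or the application.

```python
def decayed_learning_rate(initial_rate: float, decay_factor: float,
                          step: int, decay_steps: int) -> float:
    """Staircase exponential decay: multiply by decay_factor every decay_steps steps."""
    if not 0.0 < decay_factor <= 1.0:
        raise ValueError("decay_factor must be in (0, 1]")
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    # Integer division makes the rate constant within each block of decay_steps steps.
    return initial_rate * decay_factor ** (step // decay_steps)
```

With an initial rate of 0.01, a decay factor of 0.5 and decay_steps of 100, the rate is 0.01 for steps 0 to 99, 0.005 for steps 100 to 199, and 0.0025 for steps 200 to 299.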

As per Claim 20, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 1.
Nikolaos Zioulis teaches wherein the convolutional neural network architecture is extendable to any number of classes (distortion types) (Page 4 – “…CNNs as well as achieve invariance to the viewpoint’s rotation, the alternative pursued by [24] is based on graph-based deep learning. Specifically, they model distortion directly into the graph’s structure and apply it to a classification task. A novel approach taken in [25] is to learn appropriate convolution weights for equirectangular projected spherical images by transferring them from an existing network trained on traditional 2D images. …are very well suited for learning the distortion model of spherical images…”).

2.	Claims 2, 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) and further in view of Mathieu et al. (USPUB 20180137389) and Qiu et al. (USPUB 20190034709).

As per Claim 2, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 1.
Within analogous art, Qiu et al. teaches wherein the convolutional neural network architecture utilizes less than five convolution layers and at least one dropout layer (multiple convolutional layers within Paragraphs [0074], [0141-0142], [0317] and [0341]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Qiu et al. with the combined teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas), Mathieu et al. (Deep Multi-Scale Video Prediction) and Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) because Qiu et al. (Method and apparatus for expression recognition) provides a system and method for implementing a convolutional neural network architecture for in depth image recognition.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Method and apparatus for expression recognition of Qiu et al. within the combined teaching of Nikolaos Zioulis, Mathieu et al. and Liu et al. in order to provide a system and method implementing a convolutional neural network architecture for in depth image recognition.

As per Claim 3, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 1.
The combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. does not explicitly teach wherein the convolutional neural network architecture utilizes a plurality of convolution layers and at least three dropout layers.
Within analogous art, Qiu et al. teaches wherein the convolutional neural network architecture utilizes a plurality of convolution layers and at least three dropout layers (multiple convolutional layers and dropout layers within Paragraphs [0074], [0141-0142], [0317] and [0341]).

As per Claim 4, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 1.
The combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. does not explicitly teach wherein the convolutional neural network architecture utilizes five convolution layers and three dropout layers.
Within analogous art, Qiu et al. teaches wherein the convolutional neural network architecture utilizes five convolution layers and three dropout layers (multiple convolutional layers and dropout layers within Paragraphs [0074], [0141-0142], [0317] and [0341]).

3.	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) and further in view of Mathieu et al. (USPUB 20180137389) and Zia et al. (USPUB 20180130355).


As per Claim 5, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 1.
The combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. does not explicitly teach wherein the convolutional neural network architecture utilizes batch normalization on each layer.
Within analogous art, Zia et al. teaches wherein the convolutional neural network architecture utilizes batch normalization on each layer (Paragraph [0117] – “…Our network resembles a VGG neural network and includes deeply stacked 3×3 convolutional layers. However, unlike VGG, we remove local spatial pooling and couple each convolutional layer with batch normalization and ReLU…”).
	One of ordinary skill in the art would have been motivated to combine the teaching of Zia et al. with the combined teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas), Mathieu et al. (Deep Multi-Scale Video Prediction) and Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) because Zia et al. (Advanced driver-assistance system with landmark localization on objects in images using convolutional neural networks) provides a system and method for advanced localization of objects within images using neural networks.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Advanced driver-assistance system with landmark localization on objects in images using convolutional neural networks of Zia et al. within the combined teaching of Nikolaos Zioulis, Mathieu et al. and Liu et al. in order to provide a system and method for advanced localization of objects within images using neural networks.
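
Batch normalization, as recited in claim 5 and described in the quoted paragraph [0117] of Zia et al., normalizes each activation over the current batch and then applies a learned scale and shift. The pure-Python sketch below handles a single feature in training mode only; gamma, beta and eps are the conventional parameter names, batch_norm is a hypothetical function name, and the running statistics that frameworks keep for inference are deliberately omitted.

```python
import math
from typing import List, Sequence


def batch_norm(values: Sequence[float], gamma: float = 1.0,
               beta: float = 0.0, eps: float = 1e-5) -> List[float]:
    """Normalize one feature across a batch to zero mean and unit variance, then scale and shift."""
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    # Biased (population) variance over the batch, as used during training.
    var = sum((v - mean) ** 2 for v in values) / n
    inv_std = 1.0 / math.sqrt(var + eps)  # eps guards against division by zero
    return [gamma * (v - mean) * inv_std + beta for v in values]
```

Applied after every convolution layer, this step keeps each layer's inputs on a similar scale from batch to batch.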

4.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) and further in view of Mathieu et al. (USPUB 20180137389) and WANG et al. (USPUB 20180165548).

As per Claim 8, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 1.
The combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. does not explicitly teach wherein the confidence value predictions comprise one-hot encoded label.
Within analogous art, WANG et al. teaches wherein the confidence value predictions comprise one-hot encoded label (one-hot operation and encoder-decoder network taught within Paragraphs [0062] and [0069] – “…label map 410 may be first transformed to a score map 420, e.g., by one-hot operation, and the score of each pixel may be embedded into a 32-dimensional feature vector 434…” and confidence range taught within Paragraph [0078]).
	One of ordinary skill in the art would have been motivated to combine the teaching of WANG et al. with the combined teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas), Mathieu et al. (Deep Multi-Scale Video Prediction) and Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) because WANG et al. (Systems and methods for object tracking) provides a system and method for detecting and tracking objects within video images using a neural network architecture.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Systems and methods for object tracking of WANG et al. within the combined teaching of Nikolaos Zioulis, Mathieu et al. and Liu et al. in order to detect and track objects within video images using a neural network architecture.
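
To illustrate the one-hot encoded label recited in claim 8, the hypothetical sketch below encodes a format label as a vector containing a single 1 and decodes a confidence vector by taking its highest-scoring entry. The label order (2D, spherical, stereo spherical) follows the formats named in claim 1, but the names LABELS, one_hot and decode are assumptions.

```python
from typing import List, Sequence

# Hypothetical label order, following the three formats named in claim 1.
LABELS = ("2d", "spherical", "stereo_spherical")


def one_hot(label: str, labels: Sequence[str] = LABELS) -> List[int]:
    """Encode a label as a vector with a single 1 at the label's index."""
    index = labels.index(label)  # raises ValueError for an unknown label
    return [1 if i == index else 0 for i in range(len(labels))]


def decode(confidences: Sequence[float], labels: Sequence[str] = LABELS) -> str:
    """Map a confidence vector back to the label with the highest score."""
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return labels[best]
```

Training targets would be one-hot vectors such as [0, 1, 0] for a spherical frame, while the network's per-frame predictions are soft scores that decode maps back to a single label.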


5.	Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) and further in view of Mathieu et al. (USPUB 20180137389) and Nirenberg et al. (USPUB 20140355861).

As per Claim 21, the combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. teaches claim 1.
The combination of Nikolaos Zioulis, Liu et al. and Mathieu et al. does not explicitly teach wherein the system is built as a fully automatic pipeline for video format classification, thereby allowing easy human computer interaction (HCI) when uploading videos to a video sharing application.
Within analogous art, Nirenberg et al. teaches wherein the system is built as a fully automatic pipeline for video format classification, thereby allowing easy human computer interaction (HCI) when uploading videos to a video sharing application (computer-human interaction and video streaming taught within Paragraphs [0006-0007]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Nirenberg et al. with the combined teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas), Mathieu et al. (Deep Multi-Scale Video Prediction) and Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) because Nirenberg et al. (Retinal encoder for machine vision) provides a system and method for efficient computer vision image processing with encoded data.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Retinal encoder for machine vision of Nirenberg et al. within the combined teaching of Nikolaos Zioulis, Mathieu et al. and Liu et al. in order to provide efficient computer vision image processing with encoded data.

6.	Claims 22, 30, 33, 34 and 41 are rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) and further in view of Turcot et al. (USPUB 20180189581).

As per Claim 22, Nikolaos Zioulis teaches a method for classifying the format of a video file (Page 4 – “…demonstrated in classification and single variable regression problems. In addition, they are also applied in the spectral domain while we formulate our network design for the spatial image domain….” AND Page 2 – “…we train a CNN to learn to estimate a scene’s depth given an omnidirectional (equirectangular) image as input….”), comprising the steps of: splitting an input video file into a set of frames (Page 10 – “…We split our dataset into corresponding train and tests sets as follows: (i) Initially we remove 1 complete area from Stanford2D3D, 3 complete buildings from Matterport3D and 3 CAD scenes from SunCG for our test set totaling 1,298 samples….”); a spherical video format and/or a stereo spherical video format (equirectangular images and inputs taught within Page 3 – “…2. We offer a dataset consisting of 360° color images paired with ground truth 360° depth maps in equirectangular format. The dataset is available online. 3. We propose and validate, a CNN auto-encoder architecture specifically designed for estimating depth directly on equirectangular images. 4. We show how monocular depth estimation methods trained on traditional 2D images fall short or produce low quality results when applied to equirectangular inputs, highlighting the need for learning directly on the 360° domain….”);
Nikolaos Zioulis does not explicitly teach generating, by a convolutional neural network, one or more confidence value predictions for each of the frames in the set respectively with respect to one or more of a two-dimensional video format; determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; and developing metadata associated with the video file corresponding to the determined format of the video.
However, within analogous art, Liu et al. teaches generating, by a convolutional neural network (CNN architectures taught within Paragraph [0059]), one or more confidence value predictions for each of the frames in the set respectively (confidence coefficient and the predicting of learnt data from neural network taught within Paragraphs [0062-0063]), with respect to one or more of a two-dimensional video format (Paragraphs [0057] and [0059]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Liu et al. with the teaching of Nikolaos Zioulis (OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas) because Liu et al. (Disentangled representation learning generative adversarial network for pose-invariant face recognition) provides a system and method for efficiently identifying objects within video images using a neural network.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the Disentangled representation learning generative adversarial network for pose-invariant face recognition of Liu et al. within the OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas of Nikolaos Zioulis in order to provide a system and method for efficiently identifying objects within video images using a neural network.
	Combination of Nikolaos Zioulis and Liu et al. does not explicitly teach determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions; and developing metadata associated with the video file corresponding to the determined format of the video.  
	However, within analogous art, Turcot et al.  teaches determining the format of the video file based upon a mean confidence value calculated from the confidence value predictions ( confidence and the predicting within bounding box of neural network taught within Paragraphs [0109] and [0118]) ; and developing metadata associated with the video file corresponding to the determined format of the video (metadata and the video format taught within  Paragraphs [0110-0111]) .
	One of ordinary skill in the art would have been motivated to combine the teaching of Turcot et al.  within the combined  modified teaching of the OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas mentioned by Nikolaos Zioulis  and the Disentangled representation learning generative adversarial network for pose-invariant face recognition mentioned by Liu et al. because the Vehicle manipulation using convolutional image processing  mentioned by Turcot et al. provides a system and method for implementing image analysis of  input video image by utilizing convolutional neural network. 
	Therefore, it would have been obvious for one in the ordinary skills in the art before the effective filing date of the claimed invention to implement the Vehicle manipulation using convolutional image processing  mentioned by Turcot et al. within the combined modified teaching of the OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas mentioned by Nikolaos Zioulis  and the Disentangled representation learning generative adversarial network for pose-invariant face recognition mentioned by Liu et al. for implementation of a system and method for image analysis of  input video image by utilizing convolutional neural network.

As per Claim 30, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 22.
The combination of Nikolaos Zioulis and Turcot et al. does not explicitly teach further comprising a video pre-processing step that selects a subset of the total frames of video from the input video file.
However, within analogous art, Liu et al. teaches further comprising a video pre-processing step that selects a subset of the total frames of video from the input video file (Paragraph [0046] - “…various pre-processing procedures may be applied to the received images, including filtering, enhancing, combining, or separating various features, portions, or components of the images….” AND the frames of the video taught within Paragraph [0109]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Liu et al. with the combined teaching of Nikolaos Zioulis (“OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas”) and Turcot et al. (“Vehicle manipulation using convolutional image processing”) because Liu et al. (“Disentangled representation learning generative adversarial network for pose-invariant face recognition”) provides a system and method for efficiently identifying objects within video images using a neural network.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Liu et al. within the combined teaching of Nikolaos Zioulis and Turcot et al. in order to provide a system and method for efficiently identifying objects within video images using a neural network.

As per Claim 33, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 30.
The combination of Nikolaos Zioulis and Turcot et al. does not explicitly teach further comprising the step of training the convolutional neural network using pre-labeled video frames.
However, within analogous art, Liu et al. teaches further comprising the step of training the convolutional neural network using pre-labeled video frames (labeling of the video frames within a neural network broadly taught within Paragraphs [0084-0085]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Liu et al. with the combined teaching of Nikolaos Zioulis (“OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas”) and Turcot et al. (“Vehicle manipulation using convolutional image processing”) because Liu et al. (“Disentangled representation learning generative adversarial network for pose-invariant face recognition”) provides a system and method for efficiently identifying objects within video images using a neural network.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Liu et al. within the combined teaching of Nikolaos Zioulis and Turcot et al. in order to provide a system and method for efficiently identifying objects within video images using a neural network.

As per Claim 34, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 33.
The combination of Nikolaos Zioulis and Turcot et al. does not explicitly teach wherein the training step trains on a multitude of batches containing sixty-four pre-labeled video frames.
However, within analogous art, Liu et al. teaches wherein the training step trains on a multitude of batches containing sixty-four pre-labeled video frames (the batch size used in training the convolutional neural network is broadly taught within Paragraph [0090] - “…optimization strategies, all models were trained with a batch size of 64. All weights were initialized from a zero-centered normal distribution with a standard deviation of 0.02….”).
	One of ordinary skill in the art would have been motivated to combine the teaching of Liu et al. with the combined teaching of Nikolaos Zioulis (“OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas”) and Turcot et al. (“Vehicle manipulation using convolutional image processing”) because Liu et al. (“Disentangled representation learning generative adversarial network for pose-invariant face recognition”) provides a system and method for efficiently identifying objects within video images using a neural network.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Liu et al. within the combined teaching of Nikolaos Zioulis and Turcot et al. in order to provide a system and method for efficiently identifying objects within video images using a neural network.

As per Claim 41, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 22.
Nikolaos Zioulis teaches wherein the convolutional neural network architecture is extendable to any number of classes (distortion types) ( Page 4 – “…CNNs as well as achieve invariance to the viewpoint’s rotation, the alternative pursued by [24] is based on graph-based deep learning. Specifically, they model distortion directly into the graph’s structure and apply it to a classification task. A novel approach taken in [25] is to learn appropriate convolution weights for equirectangular projected spherical images by transferring them from an existing network trained on traditional 2D images. …are very well suited for learning the distortion model of spherical images…”).

7.	Claims 23, 24 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) in further view of Turcot et al. (USPUB 20180189581) and Qiu et al. (USPUB 20190034709).

As per Claim 23, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 22.
The combination of Nikolaos Zioulis, Liu et al., and Turcot et al. does not explicitly teach wherein the convolutional neural network utilizes less than five convolution layers and at least one dropout layer.
However, within analogous art, Qiu et al. teaches wherein the convolutional neural network utilizes less than five convolution layers and at least one dropout layer (multiple convolutional layers within Paragraphs [0074], [0141-0142], [0317] and [0341]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Qiu et al. with the combined teaching of Nikolaos Zioulis (“OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas”), Liu et al. (“Disentangled representation learning generative adversarial network for pose-invariant face recognition”), and Turcot et al. (“Vehicle manipulation using convolutional image processing”) because Qiu et al. (“Method and apparatus for expression recognition”) provides a system and method for implementing a convolutional neural network architecture for in-depth image recognition.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Qiu et al. within the combined teaching of Nikolaos Zioulis, Liu et al., and Turcot et al. in order to provide a system and method implementing a convolutional neural network architecture for in-depth image recognition.


As per Claim 24, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 22.
The combination of Nikolaos Zioulis, Liu et al., and Turcot et al. does not explicitly teach wherein the convolutional neural network utilizes a plurality of convolution layers and at least three dropout layers.
However, within analogous art, Qiu et al. teaches wherein the convolutional neural network utilizes a plurality of convolution layers and at least three dropout layers (multiple convolutional layers and dropout layers within Paragraphs [0074], [0141-0142], [0317] and [0341]).

As per Claim 25, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 22.
The combination of Nikolaos Zioulis, Liu et al., and Turcot et al. does not explicitly teach wherein the convolutional neural network utilizes five convolution layers and three dropout layers.
However, within analogous art, Qiu et al. teaches wherein the convolutional neural network utilizes five convolution layers and three dropout layers (multiple convolutional layers and dropout layers within Paragraphs [0074], [0141-0142], [0317] and [0341]).


8.	Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) in further view of Turcot et al. (USPUB 20180189581) and Zia et al. (USPUB 20180130355).

As per Claim 26, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 22.
The combination of Nikolaos Zioulis, Liu et al., and Turcot et al. does not explicitly teach wherein the convolutional neural network utilizes batch normalization on each layer in the generating step.
However, within analogous art, Zia et al. teaches wherein the convolutional neural network utilizes batch normalization on each layer in the generating step (Paragraph [0117] - “…Our network resembles a VGG neural network and includes deeply stacked 3×3 convolutional layers. However, unlike VGG, we remove local spatial pooling and couple each convolutional layer with batch normalization and ReLU…”).
	One of ordinary skill in the art would have been motivated to combine the teaching of Zia et al. with the combined teaching of Nikolaos Zioulis (“OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas”), Liu et al. (“Disentangled representation learning generative adversarial network for pose-invariant face recognition”), and Turcot et al. (“Vehicle manipulation using convolutional image processing”) because Zia et al. (“Advanced driver-assistance system with landmark localization on objects in images using convolutional neural networks”) provides a system and method for advanced localization of objects within images using neural networks.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Zia et al. within the combined teaching of Nikolaos Zioulis, Liu et al., and Turcot et al. in order to provide a system and method for advanced localization of objects within images using neural networks.


9.	Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) in further view of Turcot et al. (USPUB 20180189581) and WANG et al. (USPUB 20180165548).


As per Claim 29, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 22.
The combination of Nikolaos Zioulis, Liu et al., and Turcot et al. does not explicitly teach wherein the confidence value predictions comprise one-hot encoded labels.

However, within analogous art, WANG et al. teaches wherein the confidence value predictions comprise one-hot encoded labels (one-hot operation and encoder-decoder network taught within Paragraphs [0062] and [0069] - “… label map 410 may be first transformed to a score map 420, e.g., by one-hot operation, and the score of each pixel may be embedded into a 32-dimensional feature vector 434…”; and confidence range taught within Paragraph [0078]).
	One of ordinary skill in the art would have been motivated to combine the teaching of WANG et al. with the combined teaching of Nikolaos Zioulis (“OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas”), Liu et al. (“Disentangled representation learning generative adversarial network for pose-invariant face recognition”), and Turcot et al. (“Vehicle manipulation using convolutional image processing”) because WANG et al. (“Systems and methods for object tracking”) provides a system and method for detecting and tracking objects within video images using a neural network architecture.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of WANG et al. within the combined teaching of Nikolaos Zioulis, Liu et al., and Turcot et al. in order to detect and track objects within video images using a neural network architecture.

10.	Claim 35 is rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) in further view of Turcot et al. (USPUB 20180189581) and Mathieu et al. (USPUB 20180137389).

As per Claim 35, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 33.
The combination of Nikolaos Zioulis, Liu et al., and Turcot et al. does not explicitly teach wherein the training step trains using a learning rate value that decays over time.
However, within analogous art, Mathieu et al. teaches wherein the training step trains using a learning rate value that decays over time (learning rate taught within Paragraphs [0042-0043] and decay factor taught within Paragraph [0095]).
	One of ordinary skill in the art would have been motivated to combine the teaching of Mathieu et al. with the combined teaching of Nikolaos Zioulis (“OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas”), Liu et al. (“Disentangled representation learning generative adversarial network for pose-invariant face recognition”), and Turcot et al. (“Vehicle manipulation using convolutional image processing”) because Mathieu et al. (“Deep Multi-Scale Video Prediction”) provides a system and method for training models that accurately predict and construct image progressions.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Mathieu et al. within the combined teaching of Nikolaos Zioulis, Liu et al., and Turcot et al. in order to provide a system and method for training models that accurately predict and construct image progressions.


11.	Claim 42 is rejected under 35 U.S.C. 103 as being unpatentable over Nikolaos Zioulis (NPL Doc.: “OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas,” 25 July 2018, Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 1-14) in view of Liu et al. (USPUB 20200265219) in further view of Turcot et al. (USPUB 20180189581) and Nirenberg et al. (USPUB 20140355861).

As per Claim 42, the combination of Nikolaos Zioulis, Liu et al., and Turcot et al. teaches claim 22.
The combination of Nikolaos Zioulis, Liu et al., and Turcot et al. does not explicitly teach further comprising the step of uploading the video to a video sharing application, wherein the system is built as a fully automatic pipeline for video format classification, thereby allowing easy human computer interaction (HCI) during the uploading step.
However, within analogous art, Nirenberg et al. teaches further comprising the step of uploading the video to a video sharing application, wherein the system is built as a fully automatic pipeline for video format classification, thereby allowing easy human computer interaction (HCI) during the uploading step (human-computer interaction and video streaming taught within Paragraphs [0006-0007]).
One of ordinary skill in the art would have been motivated to combine the teaching of Nirenberg et al. with the combined teaching of Nikolaos Zioulis (“OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas”), Liu et al. (“Disentangled representation learning generative adversarial network for pose-invariant face recognition”), and Turcot et al. (“Vehicle manipulation using convolutional image processing”) because Nirenberg et al. (“Retinal encoder for machine vision”) provides a system and method for efficient computer vision image processing using encoded data.
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the teaching of Nirenberg et al. within the combined teaching of Nikolaos Zioulis, Liu et al., and Turcot et al. in order to provide efficient computer vision image processing using encoded data.




 
Examiner’s Notes

12.	The Examiner acknowledges the following prior art as pertinent to the current application's claim limitations and inventive concept. Although the prior art listed below was not relied upon to address the claim limitations, it is analogous art addressing key points of the inventive concept (convolutional neural networks, 3D video image processing, image processing, confidence values, neural network layers, equirectangular video, etc.).


1)	Jiachen Yang, “3D Panoramic Virtual Reality Video Quality Assessment Based on 3D Convolutional Neural Networks,” July 11, 2018, IEEE Access (Volume 6), pages 38669-38676.

2)	Hsien-Tzu Cheng, “Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos,” 8 July 2016, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018, pages 1420-1426.

3)	Muzammal Naseer, “Indoor Scene Understanding in 2.5/3D for Autonomous Agents: A Survey,” 12 December 2018, IEEE Access (Volume 7), pages 1859-1879.

4)	Haiman Tian, “Multimodal deep representation learning for video classification,” 3 May 2018, World Wide Web (2019) 22:1325–1341, https://doi.org/10.1007/s11280-018-0548-3, pages 1326-1333.

Allowable Subject Matter

13.	Claims 6, 7, 10, 11, 15, 16, 17, 18, 19, 27, 28, 31, 32, 36, 37, 38, 39 and 40 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

14.	The following is an examiner’s statement of reasons for the indication of allowable subject matter:

As to claims 6 and 27, the prior art of record does not teach or suggest the limitation recited in claims 6 and 27: “the convolutional neural network is trained to identify optical distortions associated with the one or more video formats.”

As to claims 7 and 28, the prior art of record does not teach or suggest the limitation recited in claims 7 and 28: “…the convolutional neural network is trained to identify optical distortions present in equirectangular and stereo equirectangular projections.”

As to claims 10 and 31, the prior art of record does not teach or suggest the limitation recited in claims 10 and 31: “…the video pre-processing stage for training the network generates additional video frames from the selected subset to incorporate a range of viewport variances into the set of video frames sent to the convolutional neural network.”

As to claims 11 and 32, the prior art of record does not teach or suggest the limitation recited in claims 11 and 32: “…the video pre-processing stage for training the network generates additional video frames from the selected subset to incorporate a range of intra-class variances into the set of video frames sent to the convolutional neural network.”

As to claims 15 and 36, the prior art of record does not teach or suggest the limitation recited in claims 15 and 36: “…the convolutional neural network trains using a 0.01 learning rate value that decays over time.”

As to claims 16 and 37, the prior art of record does not teach or suggest the limitation recited in claims 16 and 37: “…the pre-processing stage provides a uniform size and channel count to each frame provided to the convolutional neural network….”

As to claims 17 and 38, the prior art of record does not teach or suggest the limitation recited in claims 17 and 38: “…does not utilize computer vision heuristics, and is built around a deep convolutional neural network architecture.”

As to claims 18 and 39, the prior art of record does not teach or suggest the limitation recited in claims 18 and 39: “…the convolutional neural network architecture utilizes convolution layers to break down the frames into smaller filters that contain patterns associated with one or more distortion types.”

As to claim 19, claim 19 depends on claim 18, which is objected to but contains allowable subject matter; therefore, claim 19 is likewise considered allowable over the prior art.

As to claim 40, claim 40 depends on claim 39, which is objected to but contains allowable subject matter; therefore, claim 40 is likewise considered allowable over the prior art.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”


Reasons for Allowance


15.	The following is an examiner’s statement of reasons for allowance: 
The prior art made of record fails to teach the limitations underlined within the independent claim discussed below.

Regarding Claim 43, A system for classifying the spatial format of a video file: a video pre-processing stage, splitting an input video file into a set of frames; a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values signify one or more spatial video formats detected per frame; and a discriminator stage, receiving the per-frame confidence values from the convolutional neural network and determining the spatial format of video based upon an overall confidence value calculated from the plurality of per-frame confidence values; and an action stage that takes appropriate action with respect to the video file based upon the determined spatial format of the video, the appropriate action including meta-tagging the video file according to the determined spatial format, storing the video file according to the determined spatial format, playing the video file according to the determined spatial format, transmitting the video file according to the determined spatial format, and/or displaying the result of the determined spatial format to a user.


Regarding Claim 43: Nikolaos Zioulis teaches a system for classifying the spatial format of a video file (Page 4 - “… demonstrated in classification and single variable regression problems. In addition, they are also applied in the spectral domain while we formulate our network design for the spatial image domain….” AND Page 2 - “…we train a CNN to learn to estimate a scene’s depth given an omnidirectional (equirectangular) image as input….”): a video pre-processing stage (preprocessing block taught within Page 9 - Fig. 4 and Page 8 - Fig. 3), splitting an input video file into a set of frames (Page 10 - “…We split our dataset into corresponding train and tests sets as follows: (i) Initially we remove 1 complete area from Stanford2D3D, 3 complete buildings from Matterport3D and 3 CAD scenes from SunCG for our test set totaling 1,298 samples….”);
Nikolaos Zioulis does not explicitly teach a convolutional neural network, receiving the set of frames and predicting one or more confidence values for each of the frames respectively, wherein the confidence values signify one or more spatial video formats detected per frame; and a discriminator stage, receiving the per-frame confidence values from the convolutional neural network and determining the spatial format of video based upon an overall confidence value calculated from the plurality of per-frame confidence values.
However, within analogous art, Liu et al. teaches a convolutional neural network (CNN architectures taught within Paragraph [0059]), receiving the set of frames and predicting one or more confidence values for each of the frames respectively (confidence coefficient and the prediction of learnt data from a neural network taught within Paragraphs [0062-0063]), wherein the confidence values signify one or more spatial video formats detected per frame (one-hot pose code taught within Paragraphs [0057] and [0059]).
	The combination of Nikolaos Zioulis and Liu et al. does not explicitly teach a discriminator stage, receiving the per-frame confidence values from the convolutional neural network and determining the spatial format of video based upon an overall confidence value calculated from the plurality of per-frame confidence values.
	However, within analogous art, Mathieu et al. teaches a discriminator stage, receiving the per-frame confidence values from the convolutional neural network and determining the spatial format of video based upon an overall confidence value calculated from the plurality of per-frame confidence values (discriminating process within the neural network for the input image frames within Paragraphs [0029-0035]). Mathieu et al. does not, however, teach the limitation recited in the claim: “an action stage that takes appropriate action with respect to the video file based upon the determined spatial format of the video, the appropriate action including meta-tagging the video file according to the determined spatial format, storing the video file according to the determined spatial format, playing the video file according to the determined spatial format, transmitting the video file according to the determined spatial format, and/or displaying the result of the determined spatial format to a user.”

Conclusion

16.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to OMAR S ISMAIL, whose telephone number is (571) 272-9799 and whose fax number is (571) 273-9799. The examiner can normally be reached M-F, 9:00am-6:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at
http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David C. Payne, can be reached at (571) 272-3024. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OMAR S ISMAIL/
Primary Examiner, Art Unit 2637