DETAILED ACTION
Notice of Pre-AIA or AIA Status
Claims 1-20 are pending in this application. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

35 U.S.C. § 112 Sixth Paragraph - Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an 
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: “unit” in claims 1-20. 
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 18 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. The term “extraction unit” renders the claim indefinite because it lacks antecedent basis: no extraction unit is previously recited in the claim. Rather, the claim recites that a Neural Network is used, so the scope that the applicant intends is unclear.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-20 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Yan et al. (US PGPub 2016/0117587, and US Patent US 10,387,773 B2, hereinafter referred to as “Yan”). For examination purposes, sections of the US Patent are cited below for applicant’s convenience.


Yan teaches: 
1. An image identification apparatus, comprising: / 18. An image identification method, comprising: / 19. A training apparatus, comprising: / 20. A neural network (NN) (Yan: abstract, Hierarchical branching deep convolutional neural networks (HD-CNNs) improve existing convolutional neural network (CNN) technology. In a HD-CNN, classes that can be easily distinguished are classified in a higher layer coarse category CNN, while the most difficult classifications are done on lower layer fine category CNNs. Multinomial logistic loss and a novel temporal sparsity penalty may be used in HD-CNN training. The use of multinomial logistic loss and a temporal sparsity penalty causes each branching component to deal with distinct subsets of categories. Column 3 lines 4-21, Figure 1)
1. an extraction unit configured to extract a feature value of an image from image data using a Neural Network (NN); / 18. extracting a feature value of an image from image data using a Neural Network (NN); / 19. a training unit configured to train a Neural Network (NN), (Yan: Column 3 lines 4-21, Figure 1, column 4 lines 34-42, Figure 2, FIG. 2 is a block diagram illustrating components of the HD-CNN server 130, according to some example embodiments. The HD-CNN server 130 is shown as including a communication module 210, a coarse category identification module 220, a pretrain module 230, a fine-tune module 240, a classification module 250, and a storage module 260 all configured to communicate with each other (e.g., via a bus, shared memory, or a switch).)
1. and a processing unit configured to identify the image based on the feature value extracted by the extraction unit, wherein the NN comprises a plurality of calculation (Yan: Column 3 lines 4-21, Figure 1 FIG. 1 is a network diagram illustrating a network environment 100 suitable for creating and using hierarchical deep CNNs for image classification, according to some example embodiments. Columns 3-4)
1. and wherein the NN includes a plurality of sub-neural networks for performing processing of calculation layers after a specific calculation layer, / 18. and wherein the NN includes a plurality of sub-neural networks for performing processing of calculation layers after a specific calculation layer, / 19. wherein the NN includes a plurality of sub-neural networks for performing processing of calculation layers after a specific calculation layer, / 20. wherein the NN includes a plurality of sub-neural networks for performing processing of calculation layers after a specific calculation layer, (Yan: column 6 lines 38-60, FIG. 5 is a block diagram illustrating relationships between components of the classification module 250, according to some example embodiments. A single standard deep CNN can be used as the building block of the fine prediction components of an HD-CNN. As shown in FIG. 5, a coarse category CNN 520 predicts the probabilities over coarse categories. Multiple branching CNNs 540-550 are independently added. In some example embodiments, branching CNNs 540-550 share the branching shallow layers 530. The coarse category CNN 520 and the multiple branching CNNs 540-550 each receive the input image and operate on it in parallel. Although each branching CNN 540-550 receives the input image and gives a probability distribution over the full set of fine categories, the result of each branching CNN 540-550 is only valid for a subset of categories. The multiple full predictions from branching CNNs 540-550 are linearly combined by the probabilistic averaging layer 560 to form the final fine category prediction, weighted by the corresponding coarse category probabilities.)
1. and wherein mutually different data from an output of the specific calculation layer are respectively inputted to the plurality of sub-neural networks. / 18. and wherein mutually different data from an output of the specific calculation layer are respectively inputted to the plurality of sub-neural networks. / 19. and wherein mutually different data from an output of the specific calculation layer are respectively inputted to the plurality of sub-neural networks, / 20. and wherein mutually different data from an output of the specific calculation layer are respectively inputted to the plurality of sub-neural networks. (Yan: column 5 lines 43-57, FIG. 3 is a block diagram illustrating components of the device 150, according to some example embodiments. The device 150 is shown as including an input module 310, a camera module 320, and a communication module 330, all configured to communicate with each other ( e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware ( e.g., a processor of a machine). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.)
(Yan: column 7 lines 15-40, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers…. An overly large number of parameters in the model will increase the likelihood of overfitting. Third, both the computational cost and memory consumption of the HD-CNN are also reduced by sharing the shallow layers, which is of practical significance to deploy HD-CNN in real applications.)

2. Yan teaches: The image identification apparatus according to claim 1, wherein the output of the specific calculation layer is expressed as data having a three-dimensional structure, and each of the sub-neural networks is configured to be inputted with data, that is within a range limited in relation to at least one dimensional direction from the three-dimensional structure, from the output of the specific calculation layer. (Yan: column 7 lines 15-40, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers. Shallow layers are the layers of a CNN that are closest to the original inputs while deep layers are the layers closer to the final output. Sharing parameters in shallow layers may provide the following benefits. First, in shallow layers, each CNN may extract primitive low-level features (e.g., blobs, corners) which are useful for classifying all fine categories. Accordingly, the shallow layers may be shared between the branching components even though each branching component is focused on a different set of fine categories. Second, sharing parameters in shallow layers greatly reduces the total number of parameters in the HD-CNN, which may aid in the success of training the HD-CNN model. If each branching fine category component is trained completely independently from the others, the number of free parameters in the HD-CNN will be linearly proportional to the number of coarse categories. An overly large number of parameters in the model will increase the likelihood of overfitting. Third, both the computational cost and memory consumption of the HDCNN are also reduced by sharing the shallow layers, which is of practical significance to deploy HD-CNN in real applications.)
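For illustration only, the claim 2 limitation can be read as slicing a three-dimensional layer output (height x width x channels) along one dimension so that each sub-neural network receives a different range of the data. The following is a minimal sketch under that assumption. The function name split_along_channels, the nested-list layout, and the even split are hypothetical and are not taken from Yan or from the application.

```python
def split_along_channels(feature_map, num_branches):
    """Split an H x W x C nested list into num_branches channel ranges.

    Assumption of this sketch: the channel count divides evenly.
    """
    channels = len(feature_map[0][0])
    if channels % num_branches != 0:
        raise ValueError("channel count must divide evenly among branches")
    step = channels // num_branches
    parts = []
    for b in range(num_branches):
        lo, hi = b * step, (b + 1) * step
        # Keep every spatial position, but only channels lo..hi-1.
        parts.append([[pixel[lo:hi] for pixel in row] for row in feature_map])
    return parts


# A 2 x 2 spatial grid with 4 channels at each position.
fmap = [[[1, 2, 3, 4], [5, 6, 7, 8]],
        [[9, 10, 11, 12], [13, 14, 15, 16]]]
branch_inputs = split_along_channels(fmap, 2)
```

Here each element of branch_inputs keeps the full spatial extent but only half of the channels, so the two branches receive mutually different data from the same layer output.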

3. Yan teaches: The image identification apparatus according to claim 1, wherein the NN is configured to be inputted with image data, and each of the sub-neural networks is configured to be inputted with data, that is within a limited range with respect to an image region, from the output of the specific calculation layer. (Yan: column 7 lines 1-40, FIG. 5, HD-CNN mainly comprises three parts, namely a single coarse category component B ( corresponding to the coarse category CNN 520), multiple branching fine category components {P}, for j in the range of 1 to C' (corresponding to the branching CNNs 540-550), and a single probabilistic averaging layer (corresponding to the probabilistic averaging layer 560). The single coarse category CNN 520 receives raw image pixel data as input and outputs a probability distribution over coarse categories. The coarse category probabilities are used by the probabilistic averaging layer 560 to assign weights to the full predictions made by the branching CNNs 540-550.)

4. Yan teaches: The image identification apparatus according to claim 3, wherein a first sub-neural network of the plurality of sub-neural networks is inputted with data from the output of the specific calculation layer corresponding to a first region of the image, a second sub-neural (Yan: column 7 all, specifically lines 15-40, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers. Shallow layers are the layers of a CNN that are closest to the original inputs while deep layers are the layers closer to the final output. Sharing parameters in shallow layers may provide the following benefits. First, in shallow layers, each CNN may extract primitive low-level features (e.g., blobs, corners) which are useful for classifying all fine categories. Accordingly, the shallow layers may be shared between the branching components even though each branching component is focused on a different set of fine categories. Second, sharing parameters in shallow layers greatly reduces the total number of parameters in the HD-CNN, which may aid in the success of training the HD-CNN model. If each branching fine category component is trained completely independently from the others, the number of free parameters in the HD-CNN will be linearly proportional to the number of coarse categories. An overly large number of parameters in the model will increase the likelihood of overfitting. Third, both the computational cost and memory consumption of the HD-CNN are also reduced by sharing the shallow layers, which is of practical significance to deploy HD-CNN in real applications. Column 8 lines 18-25, FIG. 6 is a flowchart illustrating operations of the HD-CNN server 130 in performing a process 600 of identifying coarse categories, according to some example embodiments. The process 600 includes operations 610, 620, 630, 640, and 650. By way of example only and not limitation, the operations 610-650 are described as being performed by the modules 210-260.)

5. Yan teaches: The image identification apparatus according to claim 1, wherein the plurality of sub-neural networks have the same hierarchical structure and different calculation parameters. (Yan: Column 3 lines 4-21, Figure 1 FIG. 1 is a network diagram illustrating a network environment 100 suitable for creating and using hierarchical deep CNNs for image classification, according to some example embodiments. Columns 3-4 column 7 all, specifically lines 15-40, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers. Shallow layers are the layers of a CNN that are closest to the original inputs while deep layers are the layers closer to the final output. Sharing parameters in shallow layers may provide the following benefits. First, in shallow layers, each CNN may extract primitive low-level features (e.g., blobs, corners) which are useful for classifying all fine categories. Accordingly, the shallow layers may be shared between the branching components even though each branching component is focused on a different set of fine categories.)
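As a rough illustration of branches that share a structure but differ in parameters, the sketch below computes a shared shallow stage once and then feeds its output to two branches built from the same template with different weights. All names (shallow, make_branch) and the toy arithmetic are hypothetical; this is not Yan's implementation and not the applicant's.

```python
def shallow(x):
    """Shared shallow stage (toy stand-in): computed once for all branches."""
    return [2.0 * v for v in x]


def make_branch(weight, bias):
    """Build a branch; every branch has the same structure, only parameters differ."""
    def branch(features):
        return sum(weight * f for f in features) + bias
    return branch


# Two branches with identical structure and different (hypothetical) parameters.
branches = [make_branch(1.0, 0.0), make_branch(0.5, 1.0)]

shared = shallow([1.0, 2.0])            # shared features: [2.0, 4.0]
outputs = [b(shared) for b in branches]
```

Because the shallow stage is evaluated only once, adding a branch adds only that branch's own parameters and computation.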

6. Yan teaches: The image identification apparatus according to claim 1, wherein the plurality of sub-neural networks have different hierarchical structure with each other. (Yan: Column 3 lines 4-21, Figure 1, FIG. 1 is a network diagram illustrating a network environment 100 suitable for creating and using hierarchical deep CNNs for image classification, according to some example embodiments. Columns 3-4, column 7 lines 15-40, FIG. 5, Column 8 lines 18-25, FIG. 6 is a flowchart illustrating operations of the HD-CNN server 130 in performing a process 600 of identifying coarse categories, according to some example embodiments. The process 600 includes operations 610, 620, 630, 640, and 650. By way of example only and not limitation, the operations 610-650 are described as being performed by the modules 210-260.)

7. Yan teaches: The image identification apparatus according to claim 1, wherein at least one of the sub-neural networks has a first portion for performing processing with a part of the output of the specific calculation layer as an input, and second and third portions for performing processing with mutually different data from an output of the first portion as inputs. (Yan: column 7 all, specifically lines 15-62, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers. Shallow layers are the layers of a CNN that are closest to the original inputs while deep layers are the layers closer to the final output. The probabilistic averaging layer 560 receives all branching CNN 540-550 predictions as well as the coarse category CNN 520 prediction and produces a weighted average as the final prediction for image i, p(x_i), as shown by the equation below…. Both the coarse category CNN 520 and the branching CNNs 540-550 can be implemented as any end-to-end deep CNN model, which takes a raw image as input and returns probabilistic prediction over categories as output).

8. Yan teaches: The image identification apparatus according to claim 1, wherein the plurality of sub-neural networks are configured so that processing can be performed independently without mutually exchanging calculation results at intermediate layers. (Yan: column 7 all, specifically lines 15-62, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers. Shallow layers are the layers of a CNN that are closest to the original inputs while deep layers are the layers closer to the final output. The probabilistic averaging layer 560 receives all branching CNN 540-550 predictions as well as the coarse category CNN 520 prediction and produces a weighted average as the final prediction for image i, p(x_i), as shown by the equation below…. Both the coarse category CNN 520 and the branching CNNs 540-550 can be implemented as any end-to-end deep CNN model, which takes a raw image as input and returns probabilistic prediction over categories as output).

9. Yan teaches: The image identification apparatus according to claim 1, further comprising a combining calculation layer configured to combine outputs from the plurality of sub-neural networks. (Yan: column 7 all, specifically lines 15-62, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers. Shallow layers are the layers of a CNN that are closest to the original inputs while deep layers are the layers closer to the final output. The probabilistic averaging layer 560 receives all branching CNN 540-550 predictions as well as the coarse category CNN 520 prediction and produces a weighted average as the final prediction for image i, p(x_i), as shown by the equation below…. Both the coarse category CNN 520 and the branching CNNs 540-550 can be implemented as any end-to-end deep CNN model, which takes a raw image as input and returns probabilistic prediction over categories as output).
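The weighted averaging described in the cited passage can be sketched as follows. This is an illustrative reading only, assuming one coarse-category probability per branch and a normalized weighted sum. The function name probabilistic_average and the sample numbers are hypothetical, and the equation elided from the quotation above is not reproduced here.

```python
def probabilistic_average(coarse_probs, branch_preds):
    """Combine per-branch fine-category predictions, weighted by coarse probabilities.

    coarse_probs: one weight per branch (assumed non-negative).
    branch_preds: one list of fine-category probabilities per branch.
    """
    if len(coarse_probs) != len(branch_preds):
        raise ValueError("need exactly one coarse probability per branch")
    total = sum(coarse_probs)
    if total <= 0:
        raise ValueError("coarse probabilities must sum to a positive value")
    num_fine = len(branch_preds[0])
    return [
        sum(w * pred[j] for w, pred in zip(coarse_probs, branch_preds)) / total
        for j in range(num_fine)
    ]


coarse = [0.75, 0.25]                       # hypothetical coarse-category output
branches = [[1.0, 0.0, 0.0],                # hypothetical branch 1 prediction
            [0.0, 0.5, 0.5]]                # hypothetical branch 2 prediction
final = probabilistic_average(coarse, branches)
```

With these sample values the combined prediction is [0.75, 0.125, 0.125], which still sums to 1 because each branch prediction and the weights are themselves normalized.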

10. Yan teaches: The image identification apparatus according to claim 1, further comprising: a calculation unit configured to calculate a cost of the NN; and a determination unit configured to determine parameters of the NN by performing training so as to reduce the cost. (Yan: column 7 lines 1-40, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers…. An overly large number of parameters in the model will increase the likelihood of overfitting. Third, both the computational cost and memory consumption of the HD-CNN are also reduced by sharing the shallow layers, which is of practical significance to deploy HD-CNN in real applications. Columns 7-8, Column 8 lines 56-67, In operation 640, low-dimensional feature representations {f_i}, for i in the range of 1 to C, are obtained for the fine categories. For example, the Laplacian eigenmap may be used for this purpose. The low-dimensional feature representations preserve local neighborhood information on a low-dimensional manifold and are used to cluster fine categories into coarse categories.)

11. Yan teaches: The image identification apparatus according to claim 10, wherein the calculation unit is further configured to calculate respective sub-costs of the plurality of sub-neural networks, and to calculate the cost of the NN based on the sub-costs. (Yan: column 7 lines 1-40, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers…. An overly large number of parameters in the model will increase the likelihood of overfitting. Third, both the computational cost and memory consumption of the HD-CNN are also reduced by sharing the shallow layers, which is of practical significance to deploy HD-CNN in real applications. Columns 7-8, Column 8 lines 56-67, In operation 640, low-dimensional feature representations {f_i}, for i in the range of 1 to C, are obtained for the fine categories. For example, the Laplacian eigenmap may be used for this purpose. The low-dimensional feature representations preserve local neighborhood information on a low-dimensional manifold and are used to cluster fine categories into coarse categories.)

12. Yan teaches: The image identification apparatus according to claim 11, wherein the calculation unit is further configured to calculate the cost of the NN using a weighted addition of the sub-costs with a weight for each of the plurality of sub-neural networks. (Yan: column 7 lines 1-40, The single coarse category CNN 520 receives raw image pixel data as input and outputs a probability distribution over coarse categories. The coarse category probabilities are used by the probabilistic averaging layer 560 to assign weights to the full predictions made by the branching CNNs 540-550. FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers…. An overly large number of parameters in the model will increase the likelihood of overfitting. Third, both the computational cost and memory consumption of the HD-CNN are also reduced by sharing the shallow layers, which is of practical significance to deploy HD-CNN in real applications. Columns 7-8, Column 8 lines 56-67, In operation 640, low-dimensional feature representations {f_i}, for i in the range of 1 to C, are obtained for the fine categories. For example, the Laplacian eigenmap may be used for this purpose. The low-dimensional feature representations preserve local neighborhood information on a low-dimensional manifold and are used to cluster fine categories into coarse categories.)
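For reference, the claimed "weighted addition of the sub-costs" can be illustrated with a short sketch. It illustrates the claim language only and is not drawn from Yan. The function name total_cost and the sample values are hypothetical.

```python
def total_cost(sub_costs, weights):
    """Cost of the whole network as a weighted sum of per-branch sub-costs."""
    if len(sub_costs) != len(weights):
        raise ValueError("need exactly one weight per sub-cost")
    return sum(w * c for w, c in zip(weights, sub_costs))


# Hypothetical sub-costs for two sub-neural networks and their weights.
cost = total_cost([2.0, 4.0], [0.5, 0.25])   # 0.5*2.0 + 0.25*4.0
```

With these values the total cost is 2.0. In the claim 13 variant the weights would themselves be determined by training rather than fixed in advance.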

13. Yan teaches: The image identification apparatus according to claim 12, wherein the weight for each of the plurality of sub-neural networks is determined by the training. (Yan: column 7 lines 1-40, The single coarse category CNN 520 receives raw image pixel data as input and outputs a probability distribution over coarse categories. The coarse category probabilities are used by the probabilistic averaging layer 560 to assign weights to the full predictions made by the branching CNNs 540-550. FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers…. An overly large number of parameters in the model will increase the likelihood of overfitting. Third, both the computational cost and memory consumption of the HD-CNN are also reduced by sharing the shallow layers, which is of practical significance to deploy HD-CNN in real applications. Columns 7-8, Column 8 lines 56-67, In operation 640, low-dimensional feature representations {f_i}, for i in the range of 1 to C, are obtained for the fine categories. For example, the Laplacian eigenmap may be used for this purpose. The low-dimensional feature representations preserve local neighborhood information on a low-dimensional manifold and are used to cluster fine categories into coarse categories.)

14. Yan teaches: The image identification apparatus according to claim 10, wherein the calculation unit is further configured to calculate the cost of the NN based on a result of combining outputs from the plurality of sub-neural networks. (Yan: column 7 lines 1-40, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers…. An overly large number of parameters in the model will increase the likelihood of overfitting. Third, both the computational cost and memory consumption of the HD-CNN are also reduced by sharing the shallow layers, which is of practical significance to deploy HD-CNN in real applications. Columns 7-8, Column 8 lines 56-67, In operation 640, low-dimensional feature representations {f_i}, for i in the range of 1 to C, are obtained for the fine categories. For example, the Laplacian eigenmap may be used for this purpose. The low-dimensional feature representations preserve local neighborhood information on a low-dimensional manifold and are used to cluster fine categories into coarse categories.)

15. Yan teaches: The image identification apparatus according to claim 1, wherein the plurality of sub-neural networks independently output respective results. (Yan: column 7 all, specifically lines 15-62, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers. Shallow layers are the layers of a CNN that are closest to the original inputs while deep layers are the layers closer to the final output. The probabilistic averaging layer 560 receives all branching CNN 540-550 predictions as well as the coarse category CNN 520 prediction and produces a weighted average as the final prediction for image i, p(x_i), as shown by the equation below…. Both the coarse category CNN 520 and the branching CNNs 540-550 can be implemented as any end-to-end deep CNN model, which takes a raw image as input and returns probabilistic prediction over categories as output).

16. Yan teaches: The image identification apparatus according to claim 1, wherein the plurality of sub-neural networks are respectively inputted with independent data and respectively outputs independent data after the specific calculation layer. (Yan: column 7 all, specifically lines 15-62, FIG. 5 also shows a set of branching CNNs 540-550, each of which makes a prediction over the full set of fine categories. In some example embodiments, the branching CNNs 540-550 share parameters in shallow layers 530 but have independent deep layers. Shallow layers are the layers of a CNN that are closest to the original inputs while deep layers are the layers closer to the final output. The probabilistic averaging layer 560 receives all branching CNN 540-550 predictions as well as the coarse category CNN 520 prediction and produces a weighted average as the final prediction for image i, p(x_i), as shown by the equation below…. Both the coarse category CNN 520 and the branching CNNs 540-550 can be implemented as any end-to-end deep CNN model, which takes a raw image as input and returns probabilistic prediction over categories as output.)

17. Yan teaches: The image identification apparatus according to claim 1, wherein the NN comprises a single neural network before the specific calculation layer. (Yan: column 5 lines 43-67, Figure 3, column 6 lines 38-57, Figure 5, FIG. 5 is a block diagram illustrating relationships between components of the classification module 250, according to some example embodiments. A single standard deep CNN can be used as the building block of the fine prediction components of an HD-CNN. As shown in FIG. 5, a coarse category CNN 520 predicts the probabilities over coarse categories. Multiple branching CNNs 540-550 are independently added. In some example embodiments, branching CNNs 540-550 share the branching shallow layers 530. The coarse category CNN 520 and the multiple branching CNNs 540-550 each receive the input image and operate on it in parallel. Although each branching CNN 540-550 receives the input image and gives a probability distribution over the full set of fine categories, the result of each branching CNN 540-550 is only valid for a subset of categories. The multiple full predictions from branching CNNs 540-550 are linearly combined by the probabilistic averaging layer 560 to form the final fine category prediction, weighted by the corresponding coarse category probabilities.)
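
For illustration only, the linear combination described in the passage of Yan quoted above (full fine-category predictions from each branching CNN, weighted by the corresponding coarse category probabilities) might be sketched as follows. This sketch is not taken from Yan or from the application and does not reproduce the equation elided from the quotation; the function name probabilistic_average, the variable names, and the numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def probabilistic_average(coarse_probs, branch_preds):
    """Weighted sum of branch predictions (illustrative assumption only).

    coarse_probs: one probability per coarse category (hypothetical values).
    branch_preds: one list per branch, each a distribution over the full
                  set of fine categories.
    """
    if len(coarse_probs) != len(branch_preds):
        raise ValueError("need exactly one branch prediction per coarse category")
    n_fine = len(branch_preds[0])
    final = [0.0] * n_fine
    for weight, pred in zip(coarse_probs, branch_preds):
        if len(pred) != n_fine:
            raise ValueError("every branch must predict over all fine categories")
        for j, p in enumerate(pred):
            # Each branch contributes in proportion to its coarse probability.
            final[j] += weight * p
    return final


# Two hypothetical coarse categories and four hypothetical fine categories.
coarse = [0.75, 0.25]
branches = [
    [0.5, 0.5, 0.0, 0.0],  # branch associated with coarse category 0
    [0.0, 0.0, 0.5, 0.5],  # branch associated with coarse category 1
]
final = probabilistic_average(coarse, branches)
print(final)
```

In this hypothetical example, each branch places its probability mass only on its own subset of fine categories, so the combined prediction remains a probability distribution over all four fine categories.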

Conclusion
The prior art made of record in form PTO-892 and not relied upon is considered pertinent to applicant's disclosure. 
XIAO; Ying et al., US 20180025079 A1, VIDEO SEARCH METHOD AND APPARATUS
SMOLIC; Aljoscha et al., US 20170011264 A1, SYSTEMS AND METHODS FOR AUTOMATIC KEY FRAME EXTRACTION AND STORYBOARD INTERFACE GENERATION FOR VIDEO
Liu; Shujie et al., US 20160358628 A1, HIERARCHICAL SEGMENTATION AND QUALITY MEASUREMENT FOR VIDEO EDITING
SHI; Jianping et al., US 20200394414 A1, KEYFRAME SCHEDULING METHOD AND APPARATUS, ELECTRONIC DEVICE, PROGRAM AND MEDIUM
Gonzalez Aguirre; David I. et al., US 20190314984 A1, Automatic Robot Perception Programming by Imitation Learning
Tang; Xiaoou et al., US 9530047 B1, Method and system for face image recognition
WANG; Xiaogang et al., US 20190122035 A1, METHOD AND SYSTEM FOR POSE ESTIMATION
Nakano; Shunsuke et al., US 10699102 B2, Image identification apparatus and image identification method
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAHMINA ANSARI whose telephone number is 571-270-3379.  The examiner can normally be reached on IFP Flex - Monday through Friday 9 to 5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SUMATI LEFKOWITZ can be reached on 571-272-3638.  The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications. TC 2600’s customer service number is 571-272-2600.





/Tahmina Ansari/

April 10, 2021

/TAHMINA N ANSARI/ Primary Examiner, Art Unit 2662