DETAILED ACTION
 
Notice of Pre-AIA  or AIA  Status
            The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
 
Status of Claims
            The following claim(s) is/are pending in this Office action: 1-20.
            Claim(s) 1-20 are rejected.  This rejection is NON-FINAL.
 
Claim Objections
            Claims 1, 6, and 9-12 stand objected to because of the following informalities:
(a)      Claim 1: The examiner suggests amending the limitation “for each segment, calculate a descriptor of the segment using values associated with the one or more artificial neurons of the segment; and” to recite “for each segment of the plurality of segments, calculate a first descriptor of the segment using values associated with the one or more artificial neurons of the segment; and”.  The examiner also suggests amending the limitation “compile a descriptor of the artificial neural network using at least part of the plurality of calculated descriptors of the segments.” to recite “compile a second descriptor of the artificial neural network using at least part of the plurality of calculated first descriptors of the segments.”
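(b)      Claim 6: The examiner suggests amending the limitation “wherein the at least one processor is further configured to compare the descriptor of the artificial neural network with a descriptor of a second artificial neural network to obtain a matching score” to recite “wherein the at least one processor is further configured to compare the second descriptor of the artificial neural network with a third descriptor of a second artificial neural network to obtain a matching score.”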
(c)      Claim 9: Claim 9 contains clerical informalities.  The examiner suggests amending “calculating descriptor of the segment …” to recite “calculating a segment descriptor of the segment …” and amending “using at least part of the plurality of calculated descriptors of the segments” to recite “using at least part of segment descriptors of the segments”.
(d)      Claim 10: Claim 10 contains a clerical informality. The examiner suggests amending “wherein the values associated with the one or more artificial neurons comprises outputs of the one or more artificial neurons for selected inputs” into “wherein the values associated with the one or more artificial neurons comprise outputs of the one or more artificial neurons for selected inputs”.
(e)      Claim 11: Claim 11 contains a clerical informality. The examiner suggests amending “wherein the values associated with the one or more artificial neurons comprises parameters of the one or more artificial neurons” into “wherein the values associated with the one or more artificial neurons comprise parameters of the one or more artificial neurons.”
(f)      Claim 12: Claim 12 contains a clerical informality. The examiner suggests amending “wherein the values associated with the one or more artificial neurons comprises hyper-parameters …” into “wherein the values associated with the one or more artificial neurons comprise hyper-parameters …”. Appropriate correction is required.
second network descriptor of a second artificial neural network to obtain a matching score” to distinguish from the limitation “compiling a first network descriptor of the first artificial neural network using at least part of the plurality of calculated descriptors of the segments” recited in base claim 9. 

 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
 
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
 
            Claims 1-18 and 20 stand rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
(a)      Claim 1:
(1)      The limitations “the plurality of the calculated descriptors” and “the segments” in “the plurality of the calculated descriptors of the segments” lack proper antecedent basis.

(c)      Claim 5: The limitation “the graph nodes” lacks proper antecedent basis.
(d)      Claims 7 and 17: The limitation “wherein the second artificial neural network is a result of using a machine learning algorithm to update the artificial neural network” is indefinite because it is unclear what the claimed “a result of using a machine learning algorithm to update the artificial neural network” refers to.  More specifically, it is unclear whether the claimed limitation refers to devising a separate neural network based on updating the artificial neural network or to modifying the artificial neural network into the claimed “second artificial neural network”.  Clarification is thus required.
(e)       Claim 9:
(1) The limitation “the artificial neural network” lacks proper antecedent basis.  For purpose of examination, this limitation is interpreted as “a first artificial neural network”.
(f)      Claims 10-18 depend from claim 9 and thus inherit the above defects from claim 9.  Claims 10-18 are thus rejected under 35 U.S.C. § 112(b) for at least the foregoing reasons, the same rationale(s) applying.
(g)      Claim 15: The limitation “the graph nodes” lacks proper antecedent basis and is thus separately rejected under 35 U.S.C. § 112(b) for this reason.
(h)       Claim 16:
(1) The claimed “a second artificial neural network” is unclear due to the absence of “a first artificial neural network” in base independent claim 9. For purposes of examination, the limitation “the artificial neural network” recited in claim 9 is interpreted as “a first artificial neural network”, and the above limitation is interpreted as “a second artificial neural network” as recited in claim 16. Alternatively, the limitation “the artificial neural network” recited in claim 9 is interpreted as “an artificial neural network”, and the above limitation is interpreted as “a separate artificial neural network”.
(i)       Claim 18: The limitation “comparing the descriptors …” is indefinite because claim 16 and its base claims recite at least three different descriptors (each of which is referred to as a “descriptor”).  Therefore, it is unclear which “descriptors” are being compared despite the recitation of the adverbial phrase “based on a statistical distance between the first distribution and the second distribution”.
(j)      Claim 20:
(1) The limitation “the segments” lacks proper antecedent basis. For purpose of examination, this limitation is interpreted as “the plurality of segments of the artificial neural network”.
(2) The limitations “descriptor of the segment” and “a descriptor of the artificial neural network” are indefinite.  For the purpose of examination, “descriptor of the segment” is interpreted as “a segment descriptor of the segment”; and “a descriptor of the artificial neural network” is interpreted as “a first network descriptor of the artificial neural network”.

 
Claim Rejections - 35 USC § 101
            35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
 
Claims 1-20 stand rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.

Step 1: Claims 1-8 are directed to a machine; claims 9-19 are directed to a process; and claim 20 is directed to a manufacture.
 
Step 2A – Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04(II) and the October 2019 Update, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim.
With respect to claim 1, claim 1 recites the abstract idea as shown in the following judicial exception(s).  Claims 9 and 20 also respectively recite identical and/or substantially similar limitations of claim 1 and are thus rejected in the same manner, the same art and reasoning applying. 
analyze the artificial neural network to obtain a plurality of segments of the artificial neural network, each segment comprises one or more artificial neurons;
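for each segment, calculate a descriptor of the segment using values associated with the one or more artificial neurons of the segment; and
compile a descriptor of the artificial neural network using at least part of the plurality of calculated descriptors of the segments.
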
Claim 1 further recites the following additional elements that are analyzed under Step 2A Prong Two and Step 2B below. 
at least one storage device configured to store an artificial neural network; and
at least one processor configured to:
 (a)       Step 2A Prong One:
(1)      Regarding the limitation analyze the artificial neural network to obtain a plurality of segments of the artificial neural network, each segment comprises one or more artificial neurons: (
          Mental Process: The examiner notes that the above limitation is merely analyzing collected information (e.g., an artificial neural network) to obtain certain results of the analysis (e.g., topological or architectural characteristic such as segments of the artificial neural network) and thus recites a mental process.  See MPEP § 2106.04-(III)-(A). Therefore, the above limitation fails to satisfy Step 2A Prong One.)
 
(2)      Regarding the limitation for each segment, calculate a descriptor of the segment using values associated with the one or more artificial neurons of the segment; (mathematical principle / relationship: According to ¶ [0185] of the present disclosure, the examiner notes that this limitation merely recites a simple mathematical operation of computing an average (involving additions and a division on neuron input values) or a weighted average (e.g., involving element-wise multiplications, additions, and a division to compute an average of neuron input values weighted by the respective weights of the corresponding connections) of values associated with one or more neurons in a portion of a neural network and is thus directed to an abstract idea of a basic mathematical principle / concept. See MPEP § 2106.04-(I).  Therefore, the above limitation fails to satisfy Step 2A Prong One.)
 
(3)      Regarding the limitation compile a descriptor of the artificial neural network using at least part of the plurality of calculated descriptors of the segments: (mathematical principle / relationship: According to ¶ [0186] of the present disclosure, the examiner notes that this limitation merely recites a simple mathematical principle of aggregating or combining a plurality of values into a set of values and is thus directed to an abstract idea of a basic mathematical principle / concept and fails Step 2A Prong One. See MPEP § 2106.04-(I).   Therefore, the above limitation fails to satisfy Step 2A Prong One.)
 
(b)       Step 2A Prong Two:
       Regarding the additional element at least one processor configured to: (The examiner notes that this additional element, when analyzed individually, merely concerns a field of use of a processor and/or merely recites an instruction to implement the above judicial exception on a processor (“apply it”) at a high level of generality.  Therefore, the above limitation, when analyzed individually, fails to satisfy Step 2A Prong Two.  See MPEP § 2106.05(b)-(III) and § 2106.05(f).)
(c)	Step 2B: 
(1)	Regarding the additional element at least one storage device configured to store an artificial neural network; and: (Well-understood, routine, conventional activity: The examiner notes that this additional element is not only recited in a generic manner but also merely recites storing information in memory such as a storage device. Merely storing information in memory, recited in a generic manner, has been found to constitute a well-understood, routine, and conventional function that fails to amount to significantly more than the claimed judicial exception to satisfy Step 2B. See MPEP § 2106.05-(II)-(iv).)
(2)	Regarding the additional element at least one processor configured to: [functional limitations omitted]: (The examiner notes that this additional element merely pertains to an instruction to implement the above judicial exception on a processor (“apply it”) recited at a high level of generality.  Therefore, the above limitation, when analyzed individually, fails to satisfy Step 2B.  See MPEP § 2106.05(h).)
          Therefore, the examiner asserts that claim 1 is directed to a judicial exception without additional elements that amount to significantly more than the claimed judicial exception and is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
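For illustration only, the operations as characterized in the analysis above (a per-segment average or weighted average, followed by collecting the per-segment results into one set of values) can be sketched in Python. This is a minimal sketch under stated assumptions and not a reproduction of the applicant's disclosure: the function names, the tuple output, and the sample values are all hypothetical.

```python
from statistics import fmean


def segment_descriptor(values, weights=None):
    """Plain or weighted average of the values tied to one segment's neurons."""
    if weights is None:
        return fmean(values)
    # Weighted average: element-wise products, a sum, and one division.
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def compile_network_descriptor(segment_values):
    """Collect the per-segment descriptors into one tuple for the network."""
    return tuple(segment_descriptor(v) for v in segment_values)


# Hypothetical segments: each inner list holds values for that segment's neurons.
segments = [[1.0, 3.0], [2.0, 4.0, 6.0]]
network_descriptor = compile_network_descriptor(segments)
```

With the sample values shown, each segment contributes a single number, so the compiled descriptor has one entry per segment.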
 
          With respect to claim 2, claim 2 recites the abstract idea as shown in the following judicial exception(s). Claims 10-12 also respectively recite identical and/or substantially similar corresponding portions of the limitations of claim 2 and are thus rejected in the same manner, the same art and reasoning applying. 
wherein the values associated with the one or more artificial neurons comprises at least one of outputs of the one or more artificial neurons for selected inputs, parameters of the one or more artificial neurons, and hyper-parameters associated with the one or more artificial neurons.
(a)       Step 2A Prong One:
The examiner notes that this limitation merely recites an observation of what the values encompass (e.g., parameters such as weights, hyper-parameters for a neural network, values corresponding to the outputs of neurons, etc.) and is thus directed to a mental process that can be performed in the mind of a human.  Therefore, claim 2 is directed to an abstract idea (a mental process) and thus fails to satisfy Step 2A Prong One. See MPEP § 2106.04(a)(3).
 
(b)       Step 2A Prong Two:

 
          With respect to claim 3, claim 3 recites the abstract idea as shown in the following judicial exception(s).  Claim 13 also recites identical and/or substantially similar limitations of claim 3 and is thus rejected in the same manner, the same art and reasoning applying. 
wherein calculating the descriptor of the segment comprises using a hash function of at least part of the values associated with the one or more artificial neurons.
(a)       Step 2A Prong One:
The examiner notes that this limitation merely recites performing a basic hash function on part of the values and is thus directed to a mathematical concept / principle.  Therefore, this limitation fails to satisfy Step 2A Prong One.
The examiner further notes that this limitation merely recites applying a hash function that maps at least part of the values to another value and is thus directed to a mental process that can be performed in the mind of a human.  Therefore, claim 3 is directed to an abstract idea (a mental process) and thus fails to satisfy Step 2A Prong One. See MPEP § 2106.04(a)(3).
 
(b)       Step 2A Prong Two:
The examiner notes that claim 3 merely recites a judicial exception (mathematical concept and/or mental process) but does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or additional elements that amount to significantly more than the judicial exception to satisfy Step 2B. Claim 3 is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
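For illustration only, a hash function applied to at least part of the values associated with a segment's neurons might look like the following Python sketch. Claim 3 recites only “a hash function”; the choice of SHA-256, the big-endian packing of the values as doubles, and the function name are assumptions made for this example.

```python
import hashlib
import struct


def hash_descriptor(values):
    """Hex digest of the values packed as big-endian 64-bit floats.

    SHA-256 is an arbitrary, assumed choice of hash function.
    """
    payload = struct.pack(f">{len(values)}d", *values)
    return hashlib.sha256(payload).hexdigest()


# Hypothetical neuron values for one segment.
digest = hash_descriptor([0.5, -1.25, 3.0])
```

The same input values always map to the same digest, which is the property a descriptor of this kind would rely on.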
 
          With respect to claim 4, claim 4 recites the abstract idea as shown in the following judicial exception(s). The non-bolded limitations denote additional elements that are analyzed under Step 2A Prong Two and Step 2B below.  Claim 14 also recites identical and/or substantially similar limitations of claim 4 and is thus rejected in the same manner, the same art and reasoning applying. 
wherein calculating the descriptor of the segment comprises identifying a property of a distribution of at least part of the values associated with the one or more artificial neurons.
(a)       Step 2A Prong One:
The examiner further notes that this limitation merely recites an observation of a distribution of data (e.g., a distribution of some captured data such as noise, colors, errors, etc.) and is thus directed to a mental process that can be performed in the mind of a human, with or without a physical aid.  Therefore, claim 4 is directed to an abstract idea (a mental process) and thus fails to satisfy Step 2A Prong One. See MPEP § 2106.04(a)(3).
 
(b)       Step 2A Prong Two:
The examiner notes that claim 4 merely recites a judicial exception (mental process) but does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or additional elements that amount to significantly more than the judicial exception to satisfy Step 2B. Claim 4 is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
 
          With respect to claim 5, claim 5 recites the abstract idea as shown in the following bolded limitations. The non-bolded limitations denote additional elements that are analyzed under Step 2A Prong Two and Step 2B below.  Claim 15 also recites identical and/or substantially similar limitations of claim 5 and is thus rejected in the same manner, the same art and reasoning applying. 
wherein compiling a descriptor of the artificial neural network comprises constructing a graph, where the graph nodes correspond to the plurality of segments.
(a)       Step 2A Prong One:
The examiner notes that this limitation merely recites performing a basic mathematical concept / principle of building a graph with multiple nodes and is thus directed to a mathematical concept / principle.  Therefore, this limitation fails to satisfy Step 2A Prong One.
 
(b)       Step 2A Prong Two:
The examiner notes that claim 5 merely recites a judicial exception (mathematical concept) but does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or additional elements that amount to significantly more than the judicial exception to satisfy Step 2B. Claim 5 is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
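For illustration only, constructing a graph whose nodes correspond to segments can be sketched in Python as an adjacency mapping. Claim 5 specifies only that the graph nodes correspond to the plurality of segments; the directed edges between segments, the segment identifiers, and the function name are assumptions introduced for this example.

```python
def build_segment_graph(segment_ids, connections):
    """Adjacency mapping with exactly one node per segment.

    `connections` is an assumed list of (source, destination) segment pairs.
    """
    graph = {segment: set() for segment in segment_ids}
    for source, destination in connections:
        graph[source].add(destination)
    return graph


# Hypothetical three-segment network connected in a chain.
graph = build_segment_graph(["s0", "s1", "s2"], [("s0", "s1"), ("s1", "s2")])
```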
 
          With respect to claim 6, claim 6 recites the abstract idea as shown in the following judicial exception(s). The non-bolded limitations denote additional elements that are analyzed under Step 2A Prong Two and Step 2B below.  Claim 16 also recites identical and/or substantially similar limitations of claim 6 and is thus rejected in the same manner, the same art and reasoning applying. 
wherein the at least one processor is further configured to compare the descriptor of the artificial neural network with a descriptor of a second artificial neural network to obtain a matching score.
Claim 6 further recites the additional elements “the at least one processor is further configured to …” that will be analyzed in Step 2A Prong Two and Step 2B below. 
(a)       Step 2A Prong One:

The examiner further notes that this limitation merely recites comparing two values (e.g., descriptors) that can be performed in the mind of a human, with or without physical aid. 
Therefore, claim 6 is directed to an abstract idea (a mathematical concept / principle and/or a mental process) and thus fails to satisfy Step 2A Prong One. See MPEP § 2106.04(a)(3).
 
(b)       Step 2A Prong Two & Step 2B:
          Therefore, claim 6 merely recites a judicial exception without reciting any additional elements, let alone additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or additional elements that amount to significantly more than the claimed judicial exception to satisfy Step 2B and is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
         
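          With respect to claim 7, claim 7 recites the abstract idea as shown in the following judicial exception(s).  Claim 17 also recites identical and/or substantially similar limitations of claim 7 and is thus rejected in the same manner, the same art and reasoning applying. 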
wherein the second artificial neural network is a result of using a machine learning algorithm to update the artificial neural network.
(a)       Step 2A Prong One:
The examiner notes that this limitation merely recites a mathematical concept / principle of performing an algorithm to update another model that is generically recited at a high level of generality.  Therefore, this limitation is merely directed to generic mathematical calculations or relationships (e.g., a machine learning algorithm) to perform generic updates on a neural network. The examiner notes that this has been found to be an abstract idea.  See MPEP § 2106.04(a)(2)(I).
 
(b)       Step 2A Prong Two:
The examiner notes that the additional elements of updating a neural network by using a result of a generically recited machine learning algorithm, when analyzed individually and again in an ordered combination, merely recite an idea of a solution or outcome (e.g., an idea of using a generic machine learning algorithm to update a neural network recited in a generic manner) to cover any solution that uses any machine learning algorithm to generically update a neural network without restricting how the result is accomplished.
Therefore, claim 7 fails to satisfy Step 2A Prong Two.
(c)       Step 2B:
The examiner notes that these additional elements, when analyzed individually and again as an ordered combination, merely generally recite an effect or outcome (to update the second artificial neural network) of the claimed judicial exception (mathematical calculations or relationships).  Generically reciting an effect or output (e.g., a result of using a machine learning algorithm as recited in claim 7) of a claimed judicial exception (e.g., a machine learning algorithm to update the artificial neural network as recited in claim 7) has been found to be merely adding the words “apply it” to the judicial exception and thus failing to amount to significantly more than the claimed judicial exception to satisfy Step 2B.  See MPEP § 2106.05(f)(3). 
          Therefore, claim 7 merely recites a judicial exception (mathematical concept) but does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or additional elements that amount to significantly more than the judicial exception to satisfy Step 2B, and is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
         
          With respect to claim 8, claim 8 recites the abstract idea as shown in the following judicial exception(s). The additional elements are analyzed under Step 2A Prong Two and Step 2B below.  Claim 19 also recites identical and/or substantially similar limitations of claim 8 and is thus rejected in the same manner, the same art and reasoning applying. 
(a)       Step 2A Prong One:
compare the matching score with at least one threshold; (abstract idea: mathematical concept / principle.  The examiner notes that this limitation merely recites a basic mathematical operation of comparing two values and is thus directed to an abstract idea of a basic mathematical concept / principle.  In addition, the examiner notes that this limitation of comparing two values can be performed in the mind of a human, with or without a physical aid.  Therefore, this limitation is also directed to a mental process and fails to satisfy Step 2A Prong One. Further, this limitation does not recite any additional elements and thus requires no further analysis.)
decide, based on the comparison of the matching score and the at least one threshold, to utilize the second artificial neural network; and (abstract idea: mental process.  The examiner notes that this limitation merely recites making an observation, opinion, or judgment as to whether a model or a process is good based on a result of comparing two values, which can be performed in the mind of a human, with or without a physical aid.  For example, a human can perform the aforementioned comparison between a value generated by an algorithm and a threshold to determine whether the algorithm is suitable for use.  Therefore, this limitation is directed to a mental process and fails to satisfy Step 2A Prong One. Further, this limitation does not recite any additional elements and thus requires no further analysis.)
Therefore, claim 8 is directed to the aforementioned judicial exception and thus fails to satisfy Step 2A Prong One.
 (b)       Step 2B:
receive the artificial neural network from an external device using a communication device; (Well-understood, routine, and conventional activity:  The examiner notes that receiving an artificial neural network is merely directed to receiving data from an external device over a network (e.g., via a communication device) that is recited at a high level of generality and is previously known in the art. More specifically, this limitation at issue does not specify how interactions (e.g., receiving data) with the claimed external device are manipulated to yield a result that overrides the routine and conventional sequence of events that are ordinarily known in the art. This has been found to be insufficient to satisfy Step 2B.  See MPEP § 2106.05(d) (I)(2).)
based on the decision to utilize the second artificial neural network, transmit the second artificial neural network to the external device. (well-understood, routine, and conventional functions: The examiner notes that this additional element, when analyzed individually or as an ordered combination, merely recites transmitting data to an external device at a high level of generality and is thus similar to receiving or transmitting data over a network.  More specifically, this limitation at issue does not specify how interactions (e.g., transmitting data) with the claimed external device are manipulated to yield a result that overrides the routine and conventional sequence of events that are ordinarily known in the art. This has been found to constitute well-understood, routine, and conventional functions that fail to amount to significantly more than the claimed judicial exception to satisfy Step 2B.  See MPEP § 2106.05(d)(II)(i).)
Therefore, claim 8 merely recites a judicial exception (mental process) but does not recite any additional elements, much less additional elements that integrate the claimed judicial exception into a practical application to satisfy Prong Two of Step 2A or any additional elements that amount to significantly more than the judicial exception to satisfy Step 2B and is thus rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
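For illustration only, the comparison of two network descriptors to obtain a matching score (claim 6) and the comparison of that score with a threshold to reach a decision (claim 8) can be sketched in Python. The claims do not define the score or the direction of the threshold test; the score formula based on Euclidean distance, the “greater than or equal” rule, and all names below are assumptions made for this example.

```python
import math


def matching_score(descriptor_a, descriptor_b):
    """Assumed score in (0, 1]; 1.0 means the two descriptors are identical."""
    return 1.0 / (1.0 + math.dist(descriptor_a, descriptor_b))


def decide_to_utilize(score, threshold):
    """Assumed rule: utilize the second network when the score meets the threshold."""
    return score >= threshold


# Hypothetical descriptors of a first and a second network.
score = matching_score((0.0, 0.0), (3.0, 4.0))
use_second_network = decide_to_utilize(score, threshold=0.8)
```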
 
          With respect to claim 18, claim 18 recites the abstract idea as shown in the following judicial exception(s). The additional elements are analyzed under Step 2A Prong Two and Step 2B below. 
(a)       Step 2A Prong One:
(mental process: The examiner notes that the claimed limitation is directed to observation of a distribution of values (e.g., a distribution of values of the claimed descriptor) that can be performed in the human mind and is merely an ineligible mental process.  See MPEP § 2106.04(a)(3).)
the descriptor of the second artificial neural network comprises a second distribution, and (mental process: The examiner similarly notes that the claimed limitation is directed to observation of a distribution of values (e.g., a distribution of values of the claimed descriptor) that can be performed in the human mind and is merely an ineligible mental process.  See MPEP § 2106.04(a)(3).)
comparing the descriptors is based on a statistical distance between the first distribution and the second distribution. (mathematical concept / principle: The examiner notes that this limitation merely recites performing a basic mathematical concept / principle of representing how close two entities are (hence comparison between the two entities) with a distance between the two entities.  Therefore, this limitation is directed to an abstract idea (mathematical concept / principle) and thus fails to satisfy Step 2A Prong One.)
Therefore, claim 18 is directed to the aforementioned judicial exception and thus fails to satisfy Step 2A Prong One.
(b)       Step 2A Prong Two and Step 2B:
The examiner notes that claim 18 merely recites a judicial exception without reciting any additional elements, let alone additional elements that integrate the claimed judicial exception into a practical application to satisfy Step 2A Prong Two or additional elements that amount to significantly more than the claimed judicial exception to satisfy Step 2B.
Therefore, claim 18 is rejected under 35 U.S.C. § 101 for at least the foregoing reasons.
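For illustration only, a statistical distance between a first distribution and a second distribution can be sketched in Python. Claim 18 recites only “a statistical distance”; the total variation distance over discrete distributions used here, the dictionary representation, and the sample values are assumptions made for this example.

```python
def total_variation_distance(p, q):
    """Half the sum of absolute differences between two discrete distributions.

    `p` and `q` map outcomes to probabilities; missing outcomes count as 0.
    """
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in outcomes)


# Hypothetical distributions standing in for two network descriptors.
first = {"a": 0.5, "b": 0.5}
second = {"a": 0.25, "b": 0.75}
distance = total_variation_distance(first, second)
```

A distance of 0 indicates identical distributions, and a distance of 1 indicates distributions with no overlap.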
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
 
7.            Claim(s) 1-2, 6-7, 9-12, 16-17, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jaech et al. US PGPub 2018 / 0349477 A1 effectively filed on June 6, 2017 (hereinafter Jaech) in view of Chopra et al., Learning a Similarity Metric Discriminatively, with Application to Face Verification (2005) (hereinafter Chopra).
With respect to claim 1, Jaech teaches:
A system for generating descriptors of artificial neural networks, the system comprising: (Jaech at ¶ [0058]: “Huang et al. [17] introduced the first Deep Neural Network architectures for Web search that operated on (query, title) pairs, using a so-called siamese architecture [23], in which two feed-forward networks NNQ and NND map the query q and the title of a given web document d, respectively, into fixed-length representations: Φ(q,d)=cos(NNQ(q),NND(d)), The social-networking system may, in this approach, then rank the final documents based on their similarity to the query in this space computed using cosine similarity.”  The examiner notes that Jaech’s multiple neural networks (NNQ and NND) that rank objects based on their similarity teaches a system for generating descriptors of artificial neural networks.)
at least one storage device configured to store an artificial neural network; and at least one processor configured to: (Jaech at ¶ [0139]: “In particular embodiments, computer system 1200 includes a processor 1202, memory 1204, storage 1206, an input/output (I/O) interface 1208, a communication interface 810, and a bus 812.” ¶ [0058]: “Huang et al. [17] introduced the first Deep Neural Network architectures for Web search that operated on (query, title) pairs, using a so-called siamese architecture [23], in which two feed-forward networks NNQ and NND map the query q and the title of a given web document d, respectively, into fixed-length representations: Φ(q,d)=cos(NNQ(q),NND(d)), The social-networking system may, in this approach, then rank the final documents based on their similarity to the query in this space computed using cosine similarity.”)
 
 
analyze the artificial neural network to obtain a plurality of segments of the artificial neural network, each segment comprises one or more artificial neurons; (Jaech at ¶ [0058]: “Huang et al. [17] introduced the first Deep Neural Network architectures for Web search that operated on (query, title) pairs, using a so-called siamese architecture [23], in which two feed-forward networks NNQ and NND map the query q and the title of a given web document d, respectively, into fixed-length representations: Φ(q,d)=cos(NNQ(q),NND(d)), The social-networking system may, in this approach, then rank the final documents based on their similarity to the query in this space computed using cosine similarity. The application of convolutional neural networks, in lieu of feed-forward-networks, by Shen et al. [41] marks the next notable advancement using the same siamese architecture.”  ¶ [0104]: “Semantic Similarity Model (SSM):” ¶ [0105]: “A model using the siamese network architecture based on the Semantic Similarity Models (SSM) appearing in other work [17, 34, 41] has been constructed. Detailed procedure for the SSM model is shown in FIG. 8.” The examiner notes that Jaech’s use of two neural networks NNQ and NND each having a plurality of neurons as shown in FIG. 8 or the use of a Siamese architecture having multiple segments with a plurality of neurons teaches the above limitation.)
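For illustration only, the cosine-similarity ranking described in the quoted passage of Jaech, Φ(q,d)=cos(NNQ(q),NND(d)), can be sketched as follows. The sketch assumes that the fixed-length representations have already been produced by the two networks; the function names `cosine` and `rank_documents` and the sample vectors are hypothetical and do not appear in Jaech.

```python
import math

def cosine(u, v):
    """Cosine similarity between two fixed-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def rank_documents(query_vec, doc_vecs):
    """Return document names ordered from most to least similar to the query."""
    return sorted(doc_vecs, key=lambda name: cosine(query_vec, doc_vecs[name]), reverse=True)

# Hypothetical stand-ins for NNQ(q) and NND(d).
query = [1.0, 0.0]
documents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = rank_documents(query, documents)
```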
 
Jaech does not appear to explicitly teach:
for each segment, calculate a descriptor of the segment using values associated with the one or more artificial neurons of the segment; and 
compile a descriptor of the artificial neural network using at least part of the plurality of calculated descriptors of the segments. 
 
Chopra does, however, teach:
for each segment, calculate a descriptor of the segment using values associated with the one or more artificial neurons of the segment; and (Chopra, FIG. 1 (reproduction omitted) and p. 3, § 2.2 “The energy function of the EBM”, ¶ 2: “Let X1 and X2 be a pair of images shown to our learning machine. Let Y be a binary label of the pair, Y = 0 if the images X1 and X2 belong to the same person (a ‘genuine pair’) and Y = 1 otherwise (an ‘imposter pair’). Let W be the shared parameter vector that is subject to learning, and let GW(X1) and GW(X2) be the two points in the low-dimensional space that are generated by mapping X1 and X2. Then our system can be viewed as a scalar ‘energy function’ EW(X1, X2) that measures the compatibility between X1, X2. It is defined as: 

EW(X1, X2) = || GW(X1) - GW(X2) ||     (1)”
p. 3, § 2.2, ¶ 3: “Given a genuine pair from the training set (X1, X2), and an impostor pair from the training set (X1, X2’), the machine behaves in a desirable manner if the following condition holds:

[Condition 1: equation image media_image2.png not reproduced]”
The examiner notes that Chopra’s two neural networks teach two segments, that these two networks respectively receiving X1 and X2 as inputs teaches values associated with neurons in the segment, and that these two neural networks respectively computing GW(X1) and GW(X2) based on the corresponding inputs X1 and X2 teaches calculating a descriptor for each segment as claimed.)
 
compile a descriptor of the artificial neural network using at least part of the plurality of calculated descriptors of the segments. (Chopra, FIG. 1 (reproduction omitted) and p. 3, § 2.2 “The energy function of the EBM”, ¶ 2: “Let X1 and X2 be a pair of images shown to our learning machine. Let Y be a binary label of the pair, Y = 0 if the images X1 and X2 belong to the same person (a ‘genuine pair’) and Y = 1 otherwise (an ‘imposter pair’). Let W be the shared parameter vector that is subject to learning, and let GW(X1) and GW(X2) be the two points in the low-dimensional space that are generated by mapping X1 and X2. Then our system can be viewed as a scalar ‘energy function’ EW(X1, X2) that measures the compatibility between X1, X2. It is defined as:

EW(X1, X2) = || GW(X1) - GW(X2) ||     (1)”
The examiner notes that Chopra’s energy EW(X1, X2) that is assigned to each configuration of variables modeled in Chopra is obtained by compiling GW(X1) and GW(X2) to determine EW(X1, X2) and thus teaches a descriptor for the neural network as also evidenced by FIG. 1).
 
Jaech and Chopra are analogous art because both pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech’s identification of a plurality of segments by analyzing a neural network (Jaech, supra) with Chopra’s calculating a segment descriptor for each segment of the neural network and compiling a network descriptor from the calculated segment descriptors (Chopra, supra). The modification applies Chopra’s energy-based models (EBM) to neural network configurations to determine segment descriptors in a low-dimensional space and to compile those descriptors into a network descriptor that is used in measuring compatibility and/or similarity (Chopra, p. 3, § 2.2, ¶ 2: “Let W be the shared parameter vector that is subject to learning, and let GW(X1) and GW(X2) be the two points in the low-dimensional space that are generated by mapping X1 and X2. Then our system can be viewed as a scalar ‘energy function’ EW(X1, X2) that measures the compatibility between X1, X2.” p. 3, § 2, ¶ 1: “While probabilistic models assign a normalized probability to every possible configuration of the variables being modeled, energy-based models (EBM) assign an unnormalized energy to those configurations [18, 9].” p. 3, § 2, ¶ 2: “The advantage of EBMs over traditional probabilistic models, particularly generative models, is that there is no need for estimating normalized probability distributions over the input space. The absence of normalization saves us from computing partition functions that may be intractable. It also gives us considerably more freedom in the choice of architectures for the model [9].”).
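For illustration only, the energy function of Equation (1) as quoted from Chopra, EW(X1, X2) = || GW(X1) - GW(X2) ||, can be sketched as follows. The sketch makes several assumptions that are not taken from Chopra or from the claims: the shared mapping GW is reduced to a single linear layer followed by tanh, the norm is taken to be Euclidean, and the names `g_w` and `energy` and all numeric values are hypothetical.

```python
import math

def g_w(x, w):
    """Hypothetical shared mapping G_W: one linear layer followed by tanh."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]

def energy(x1, x2, w):
    """E_W(X1, X2) = || G_W(X1) - G_W(X2) || (Euclidean norm assumed)."""
    a = g_w(x1, w)  # both inputs pass through the same parameters W
    b = g_w(x2, w)
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Hypothetical shared parameters and inputs.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
x_a = [1.0, 2.0, 3.0]
x_b = [1.0, 2.0, 3.0]   # identical to x_a
x_c = [-3.0, 0.5, 2.0]  # different from x_a

same_energy = energy(x_a, x_b, W)
diff_energy = energy(x_a, x_c, W)
```

Because both inputs are mapped with the same W, identical inputs yield an energy of zero in this sketch, while differing inputs yield a positive value.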
 
With respect to claim 2, Jaech modified by Chopra teaches the system of claim 1, and Jaech further teaches:
wherein the values associated with the one or more artificial neurons comprises at least one of outputs of the one or more artificial neurons for selected inputs, parameters of the one or more artificial neurons, and hyper-parameters associated with the one or more artificial neurons. (Jaech at ¶ [0105]: “A model using the siamese network architecture based on the Semantic Similarity Models (SSM) appearing in other work [17, 34, 41] has been constructed. Detailed procedure for the SSM model is shown in FIG. 8. A query embedding is constructed by concatenating the last output from each of the forward and backward directions of the query bi-LSTM. A document embedding is constructed by max-pooling over the output bi-LSTM states across the entire document.”; and “The model parameters and hyper-parameters were optimized on the same dataset as the Match-Tensor model.” ¶ [0113]: “To improve on the query-agnostic pooling schemes of SSM, an attention pooling mechanism for the document embeddings as an alternative to max pooling is implemented”; and “Attention weights are determined by taking the dot product between these vectors and normalized using the Softmax function. The attention-pooled document embedding is the weighted combination of the bi-LSTM outputs.”
The examiner notes that Jaech’s weights and/or model parameters teach parameters of the one or more artificial neurons.  The examiner also notes that Jaech’s hyper-parameters teach the hyper-parameters associated with the one or more artificial neurons.  The examiner further notes that Jaech’s output of each sub-network as shown in FIG. 8 teaches outputs of the one or more artificial neurons.)
 
With respect to claim 6, Jaech modified by Chopra teaches the system of claim 1, and Chopra further teaches:
wherein the at least one processor is further configured to compare the descriptor of the artificial neural network with a descriptor of a second artificial neural network to obtain a matching score. (Chopra at p. 7, § 4.1, ¶ 3: “The likelihood that a test image is genuine, genuine, is found by evaluating the normal density of the test image on the model of the concerned subject”; “The probability that the given image is genuine is given by 
[equation image media_image3.png not reproduced]
”; and “The values of the percentage of falsely rejected images and the falsely accepted images are plotted for all possible values of the threshold probability. The optimal threshold probability is the value that partitions the test set into genuine and impostor pairs and minimizes FA and FR rates.” The examiner notes that Chopra’s first convolutional network output GW(X1) and second convolutional network output GW(X2) as illustrated in FIG. 1 teaches two descriptors.  The examiner further notes that Chopra’s comparing the aforementioned outputs to determine the probability of test image (e.g., X1 or X2 in FIG. 1) being genuine (e.g., the test image and the target image being from the same person) teaches a matching score.) 
Jaech and Chopra are analogous art because both pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech’s identification of a plurality of segments by analyzing a neural network (Jaech, supra) with Chopra’s comparison between descriptors of corresponding neural networks (Chopra, supra). The modification verifies the comparison result in terms of the falsely rejected genuine pairs and the falsely accepted imposter pairs (Chopra, pp. 5-6, § 3.2, last paragraph: “The performance of the network was measured by a calculation of the percentage of impostor pairs accepted (FA), and the percentage of genuine pairs rejected (FR).”; and p. 6, § 4.1, ¶ 4: “The values of the percentage of falsely rejected images and the falsely accepted images are plotted for all possible values of the threshold probability. The optimal threshold probability is the value that partitions the test set into genuine and impostor pairs and minimizes FA and FR rates.” § 4, last paragraph: “The AT&T dataset is relatively small, and our system required only 5000 training samples to achieve very high performance on the test set.”).
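For illustration only, the FA and FR percentages and the threshold sweep described in the quoted passages of Chopra can be sketched as follows. The sketch assumes that each test pair already has a score representing the probability that the pair is genuine, and it assumes that "minimizes FA and FR rates" can be approximated by minimizing the sum FA + FR; that combination rule, the function names, and the sample data are hypothetical and are not taken from Chopra.

```python
def fa_fr(scores, is_genuine, threshold):
    """Return (FA, FR) in percent for a given acceptance threshold.

    FA: percentage of impostor pairs accepted.
    FR: percentage of genuine pairs rejected.
    """
    accepted = [s >= threshold for s in scores]
    impostor_accepts = [a for a, g in zip(accepted, is_genuine) if not g]
    genuine_accepts = [a for a, g in zip(accepted, is_genuine) if g]
    fa = 100.0 * sum(impostor_accepts) / len(impostor_accepts)
    fr = 100.0 * sum(1 for a in genuine_accepts if not a) / len(genuine_accepts)
    return fa, fr

def best_threshold(scores, is_genuine):
    """Pick the observed score that minimizes FA + FR (an assumed criterion)."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda t: sum(fa_fr(scores, is_genuine, t)))

# Hypothetical test set: three genuine pairs and three impostor pairs.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [True, True, True, False, False, False]
chosen = best_threshold(scores, labels)
```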
 
With respect to claim 7, Jaech modified by Chopra teaches the system of claim 6, and Chopra further teaches:  
wherein the second artificial neural network is a result of using a machine learning algorithm to update the artificial neural network. (Chopra at p. 3, § 2.2, ¶ 3: “Given a genuine pair from the training set (X1, X2), and an impostor pair from the training set (X1, X2’), the machine behaves in a desirable manner if the following condition holds:

[Condition 1: equation image media_image2.png not reproduced]”
The examiner notes that Chopra’s training a neural network illustrated in FIG. 1 with a training set having genuine pairs and impostor pairs teaches that one of the sub-networks is a result of using a machine learning algorithm, and that Chopra’s Siamese architecture that learns an imposter pair (X1, X2’) and a genuine pair (X1, X2) from the training set teaches updating the artificial neural network as claimed.)
Jaech and Chopra are analogous art because both pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech’s identification of a plurality of segments by analyzing a neural network (Jaech, supra) to incorporate Chopra’s updating a neural network using a machine learning algorithm (Chopra, supra). The modification utilizes Chopra’s energy-based models to impose a condition on a Siamese architecture so that the Siamese architecture behaves in a desired manner (e.g., correctly distinguishing a genuine pair from an imposter pair as represented by Condition 1) by training and updating a neural network to satisfy the aforementioned condition (Chopra, p. 3, § 2.2, ¶ 2: “Then our system can be viewed as a scalar ‘energy function’ EW(X1, X2) that measures the compatibility between X1, X2. It is defined as EW(X1, X2) = || GW(X1) - GW(X2) ||    (1)”. p. 3, § 2.2, ¶ 3: “Given a genuine pair from the training set (X1, X2), and an impostor pair from the training set (X1, X2’), the machine behaves in a desirable manner if the following condition holds:
[Condition 1: equation image not reproduced]”).

With respect to claim 9, Jaech teaches a method for generating descriptors of artificial neural networks, the method comprising: (Jaech at ¶ [0058]: “Huang et al. [17] introduced the first Deep Neural Network architectures for Web search that operated on (query, title) pairs, using a so-called siamese architecture [23], in which two feed-forward networks NNQ and NND map the query q and the title of a given web document d, respectively, into fixed-length representations: Φ(q,d)=cos(NNQ(q),NND(d)), The social-networking system may, in this approach, then rank the final documents based on their similarity to the query in this space computed using cosine similarity.” The examiner notes that Jaech’s using multiple neural networks (NNQ and NND) to rank objects based on their similarity teaches a method for generating descriptors of artificial neural networks.)
 
analyze the artificial neural network to obtain a plurality of segments of the artificial neural network, each segment comprises one or more artificial neurons; (Jaech at ¶ [0058]: “Huang et al. [17] introduced the first Deep Neural Network architectures for Web search that operated on (query, title) pairs, using a so-called siamese architecture [23], in which two feed-forward networks NNQ and NND map the query q and the title of a given web document d, respectively, into fixed-length representations: Φ(q,d)=cos(NNQ(q),NND(d)), The social-networking system may, in this approach, then rank the final documents based on their similarity to the query in this space computed using cosine similarity. The application of convolutional neural networks, in lieu of feed-forward-networks, by Shen et al. [41] marks the next notable advancement using the same siamese architecture.”  ¶ [0104]: “Semantic Similarity Model (SSM):” ¶ [0105]: “A model using the siamese network architecture based on the Semantic Similarity Models (SSM) appearing in other work [17, 34, 41] has been constructed. Detailed procedure for the SSM model is shown in FIG. 8.” The examiner notes that Jaech’s use of two neural networks NNQ and NND each having a plurality of neurons as shown in FIG. 8 or the use of a Siamese architecture having multiple segments with a plurality of neurons teaches the above limitation.)
 
Jaech does not appear to explicitly teach:
for each segment, calculating descriptor of the segment using values associated with the one or more artificial neurons of the segment; and
compiling a descriptor of the artificial neural network using at least part of the plurality of calculated descriptors of the segments.
 
Chopra does, however, teach:
for each segment, calculating descriptor of the segment using values associated with the one or more artificial neurons of the segment; and (Chopra at p. 3, § 2.2, ¶ 2: “Let X1 and X2 be a pair of images shown to our learning machine. Let Y be a binary label of the pair, Y = 0 if the images X1 and X2 belong to the same person (a ‘genuine pair’) and Y = 1 otherwise (an ‘imposter pair’). Let W be the shared parameter vector that is subject to learning, and let GW(X1) and GW(X2) be the two points in the low-dimensional space that are generated by mapping X1 and X2. Then our system can be viewed as a scalar ‘energy function’ EW(X1, X2) that measures the compatibility between X1, X2. It is defined as: 

EW(X1, X2) = || GW(X1) - GW(X2) ||     (1)”
p. 3, § 2.2, ¶ 3: “Given a genuine pair from the training set (X1, X2), and an impostor pair from the training set (X1, X2’), the machine behaves in a desirable manner if the following condition holds:

[Condition 1: equation image media_image2.png not reproduced]”
The examiner notes that Chopra’s two neural networks teach two segments, that these two networks respectively receiving X1 and X2 as inputs teaches values associated with neurons in the segment, and that these two neural networks respectively computing GW(X1) and GW(X2) based on the corresponding inputs X1 and X2 teaches calculating a descriptor for each segment as claimed.)
 
compiling a descriptor of the artificial neural network using at least part of the plurality of calculated descriptors of the segments. (Chopra at p. 3, § 2.2, ¶ 2: “Let X1 and X2 be a pair of images shown to our learning machine. Let Y be a binary label of the pair, Y = 0 if the images X1 and X2 belong to the same person (a ‘genuine pair’) and Y = 1 otherwise (an ‘imposter pair’). Let W be the shared parameter vector that is subject to learning, and let GW(X1) and GW(X2) be the two points in the low-dimensional space that are generated by mapping X1 and X2. Then our system can be viewed as a scalar ‘energy function’ EW(X1, X2) that measures the compatibility between X1, X2. It is defined as:

EW(X1, X2) = || GW(X1) - GW(X2) ||     (1)”
The examiner notes that Chopra’s energy EW(X1, X2) that is assigned to each configuration of variables modeled in Chopra is obtained by compiling GW(X1) and GW(X2) and thus teaches a descriptor for the neural network as also evidenced by FIG. 1).
Jaech and Chopra are analogous art because both pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech’s identification of a plurality of segments by analyzing a neural network (Jaech, supra) with Chopra’s calculating a segment descriptor for each segment of the neural network and compiling a network descriptor from the calculated segment descriptors  (Chopra, supra). The modification applies Chopra’s energy-based models (EBM) to neural network configurations and thus saves time and resources for computing intractable partition functions by setting aside the need for estimating normalized probability distributions while allowing more freedom in the choice of architectures for a neural network model (Chopra, p. 3, § 2, ¶ 1: “While probabilistic models assign a normalized probability to every possible configuration of the variables being modeled, energy-based models (EBM) assign an unnormalized energy to those configurations [18, 9].” P. 3, § 2, ¶ 2: “The advantage of EBMs over traditional probabilistic models, particularly generative models, is that there is no need for estimating normalized probability distributions over the input space. The absence of normalization saves us from computing partition functions that may be intractable. It also gives us considerably more freedom in the choice of architectures for the model [9].”).

 
With respect to claim 10, claim 10 recites identical or substantially similar limitations that are a subset of the claimed limitations of claim 2 and is thus rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 11, claim 11 recites identical or substantially similar limitations that are a subset of the claimed limitations of claim 2 and is thus rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 12, claim 12 recites identical or substantially similar limitations that are a subset of the claimed limitations of claim 2 and is thus rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 16, claim 16 recites identical or substantially similar limitations of claim 6 and is thus rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 17, claim 17 recites identical or substantially similar limitations of claim 7 and is thus rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 20, Jaech teaches:
A non-transitory computer readable medium storing data and computer implementable instructions for carrying out a method for generating descriptors of artificial neural networks, the method comprising: (Jaech at ¶ [0146]: “A computer-readable non-transitory storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile, where appropriate”.  ¶ [0058]: “Huang et al. [17] introduced the first Deep Neural Network architectures for Web search that operated on (query, title) pairs, using a so-called siamese architecture [23], in which two feed-forward networks NNQ and NND map the query q and the title of a given web document d, respectively, into fixed-length representations: Φ(q,d)=cos(NNQ(q),NND(d)), The social-networking system may, in this approach, then rank the final documents based on their similarity to the query in this space computed using cosine similarity.” The examiner notes that Jaech’s using multiple neural networks (NNQ and NND) to rank objects based on their similarity teaches a method for generating descriptors of artificial neural networks. ) 
 
obtaining a plurality of segments of an artificial neural network, each segment comprises one or more artificial neurons; (Jaech at ¶ [0058]: “Huang et al. [17] introduced the first Deep Neural Network architectures for Web search that operated on (query, title) pairs, using a so-called siamese architecture [23], in which two feed-forward networks NNQ and NND map the query q and the title of a given web document d, respectively, into fixed-length representations: Φ(q,d)=cos(NNQ(q),NND(d)), The social-networking system may, in this approach, then rank the final documents based on their similarity to the query in this space computed using cosine similarity. The application of convolutional neural networks, in lieu of feed-forward-networks, by Shen et al. [41] marks the next notable advancement using the same siamese architecture.”  ¶ [0104]: “Semantic Similarity Model (SSM):” ¶ [0105]: “A model using the siamese network architecture based on the Semantic Similarity Models (SSM) appearing in other work [17, 34, 41] has been constructed. Detailed procedure for the SSM model is shown in FIG. 8.” The examiner notes that Jaech’s use of two neural networks NNQ and NND each having a plurality of neurons as shown in FIG. 8 or the use of a Siamese architecture having multiple segments with a plurality of neurons teaches the above limitation.)
 
Jaech does not appear to explicitly teach:
for each segment, calculating descriptor of the segment using values associated with the one or more artificial neurons of the segment; and
compiling a descriptor of the artificial neural network using at least part of the plurality of calculated descriptors of the segments.
 
Chopra does, however, teach:
for each segment, calculating descriptor of the segment using values associated with the one or more artificial neurons of the segment; and (Chopra at p. 3, § 2.2, ¶ 2: “Let X1 and X2 be a pair of images shown to our learning machine. Let Y be a binary label of the pair, Y = 0 if the images X1 and X2 belong to the same person (a ‘genuine pair’) and Y = 1 otherwise (an ‘imposter pair’). Let W be the shared parameter vector that is subject to learning, and let GW(X1) and GW(X2) be the two points in the low-dimensional space that are generated by mapping X1 and X2. Then our system can be viewed as a scalar ‘energy function’ EW(X1, X2) that measures the compatibility between X1, X2. It is defined as: 

EW(X1, X2) = || GW(X1) - GW(X2) ||     (1)”
p. 3, § 2.2, ¶ 3: “Given a genuine pair from the training set (X1, X2), and an impostor pair from the training set (X1, X2’), the machine behaves in a desirable manner if the following condition holds:

[Condition 1: equation image media_image2.png not reproduced]”
The examiner notes that Chopra’s two neural networks teach two segments, that these two networks respectively receiving X1 and X2 as inputs teaches values associated with neurons in the segment, and that these two neural networks respectively computing GW(X1) and GW(X2) based on the corresponding inputs X1 and X2 teaches calculating a descriptor for each segment as claimed.)
 
compiling a descriptor of the artificial neural network using at least part of the plurality of calculated descriptors of the segments. (Chopra at p. 3, § 2.2, ¶ 2: “Let X1 and X2 be a pair of images shown to our learning machine. Let Y be a binary label of the pair, Y = 0 if the images X1 and X2 belong to the same person (a ‘genuine pair’) and Y = 1 otherwise (an ‘imposter pair’). Let W be the shared parameter vector that is subject to learning, and let GW(X1) and GW(X2) be the two points in the low-dimensional space that are generated by mapping X1 and X2. Then our system can be viewed as a scalar ‘energy function’ EW(X1, X2) that measures the compatibility between X1, X2. It is defined as:

EW(X1, X2) = || GW(X1) - GW(X2) ||     (1)”
The examiner notes that Chopra’s energy EW(X1, X2) that is assigned to each configuration of variables modeled in Chopra is obtained by compiling GW(X1) and GW(X2) and thus teaches a descriptor for the neural network as also evidenced by FIG. 1).
 
Jaech and Chopra are analogous art because both pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech’s identification of a plurality of segments by analyzing a neural network (Jaech, supra) to incorporate Chopra’s calculating a segment descriptor for each segment of the neural network and compiling a network descriptor from the calculated segment descriptors (Chopra, supra). The modification applies Chopra’s energy-based models (EBM) to neural network configurations and thus saves time and resources for computing intractable partition functions by setting aside the need for estimating normalized probability distributions while allowing more freedom in the choice of architectures for a neural network model (Chopra, p. 3, § 2, ¶ 1: “While probabilistic models assign a normalized probability to every possible configuration of the variables being modeled, energy-based models (EBM) assign an unnormalized energy to those configurations [18, 9].” p. 3, § 2, ¶ 2: “The advantage of EBMs over traditional probabilistic models, particularly generative models, is that there is no need for estimating normalized probability distributions over the input space. The absence of normalization saves us from computing partition functions that may be intractable. It also gives us considerably more freedom in the choice of architectures for the model [9].”).
 
            Claim(s) 3-4, 13-14, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jaech et al. US PGPub 2018 / 0349477 A1 effectively filed on June 6, 2017 (hereinafter Jaech) in view of Chopra et al., Learning a Similarity Metric Discriminatively, with Application to Face Verification (2005) (hereinafter Chopra) and further in view of Ioannidou et al., Deep Learning Advances in Computer Vision with 3D Data: A Survey (June 2017) (hereinafter Ioannidou).
          With respect to claim 3, Jaech modified by Chopra teaches the system of claim 1 but does not appear to explicitly teach:
wherein calculating the descriptor of the segment comprises using a hash function of at least part of the values associated with the one or more artificial neurons. 
 
Ioannidou does, however, teach:
wherein calculating the descriptor of the segment comprises using a hash function of at least part of the values associated with the one or more artificial neurons. (Ioannidou at p. 20:14, § 3.3, ¶ 3: “Except for the computational time, significant work has been done toward reducing the storage cost of DNNs”; “Chen et al. [2015a] used the Hashing Trick to compress a network’s parameters and introduced a novel deep architecture called HashedNets. In their network, a low-cost hash function was utilized for grouping the weights into hash buckets, therefore all parameters belonging to the same bucket had the same weight value.” The examiner notes that Ioannidou’s weights teach at least part of values associated with neurons, and that Ioannidou’s hashing weights to compress a network’s parameters teaches the above limitation.)
Jaech, Chopra, and Ioannidou are analogous art because all three references pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech in view of Chopra to incorporate Ioannidou’s using a hash function of one or more values associated with one or more neurons (Ioannidou, supra). The modification not only compresses a neural network’s parameters but also reduces the number of distinct weights that must be stored by grouping weights into hash buckets and using the same weight value for each bucket of weights (Ioannidou at p. 20:14, § 3.3, ¶ 3: “Chen et al. [2015a] used the Hashing Trick to compress a network’s parameters and introduced a novel deep architecture called HashedNets. In their network, a low-cost hash function was utilized for grouping the weights into hash buckets, therefore all parameters belonging to the same bucket had the same weight value.”).
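For illustration only, the weight grouping described in the quoted passage of Ioannidou (a low-cost hash function assigns each weight position to a bucket, and all positions in a bucket share one value) can be sketched as follows. The sketch uses CRC32 from the Python standard library purely as a stand-in hash function; the choice of hash, the function names, and the numeric values are assumptions and are not taken from Ioannidou or from Chen et al.

```python
import zlib

def bucket_index(i, j, num_buckets):
    """Map weight position (i, j) to a bucket using CRC32 as a stand-in hash."""
    key = f"{i},{j}".encode("utf-8")
    return zlib.crc32(key) % num_buckets

def virtual_weight(i, j, shared):
    """Look up the shared value for position (i, j); no per-position weight is stored."""
    return shared[bucket_index(i, j, len(shared))]

# Hypothetical example: a 4x4 layer (16 positions) backed by 3 stored values.
shared_values = [0.25, -0.5, 1.0]
layer = [[virtual_weight(i, j, shared_values) for j in range(4)] for i in range(4)]
distinct = {w for row in layer for w in row}
```

In this sketch the 16 weight positions are served by only three stored values, which is the storage saving the quoted passage attributes to the hashing approach.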
 
With respect to claim 4, Jaech modified by Chopra teaches the system of claim 1 but does not appear to explicitly teach:
wherein calculating the descriptor of the segment comprises identifying a property of a distribution of at least part of the values associated with the one or more artificial neurons. 
Ioannidou does, however, teach:
wherein calculating the descriptor of the segment comprises identifying a property of a distribution of at least part of the values associated with the one or more artificial neurons.  (Ioannidou at p. 20:13, § 3.1, last paragraph: “Stochastic pooling [Zeiler and Fergus 2013] is also a method for regularizing large CNNs. At first, probabilities are computed for all the activations within a region by normalizing them. Based on these probabilities, a multinomial distribution is formed and finally used to randomly sample the activation that the pooling operation will return. Stochastic pooling is applied to the convolutional layers and it can be combined with any other technique, such as weight decay and dropout.” The examiner notes that Ioannidou’s computed distributions of probabilities for activations and/or the subsequently formed multinomial distribution teaches a property of a distribution of at least part of the values associated with one or more neurons as claimed.)
Jaech, Chopra, and Ioannidou are analogous art because all three references pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech in view of Chopra to incorporate Ioannidou’s identification of a property of a distribution associated with neuron value(s) using Ioannidou’s stochastic pooling (Ioannidou, supra).  The modification performs regularization techniques on neural networks to reduce or prevent overfitting (Ioannidou at p. 20:13, § 3.2, ¶ 1: “Stochastic pooling [Zeiler and Fergus 2013] is also a method for regularizing large CNNs. At first, probabilities are computed for all the activations within a region by normalizing them. Based on these probabilities, a multinomial distribution is formed and finally used to randomly sample the activation that the pooling operation will return. Stochastic pooling is applied to the convolutional layers and it can be combined with any other technique, such as weight decay and dropout.”).
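For illustration only, the stochastic pooling steps described in the quoted passage of Ioannidou (normalize the activations in a region into probabilities, form a multinomial distribution, and sample the activation to return) can be sketched as follows. The sketch assumes non-negative activations (for example, after a ReLU) and falls back to uniform probabilities when every activation in the region is zero; those assumptions, the function names, and the sample values are hypothetical and are not taken from Ioannidou.

```python
import random

def pooling_probabilities(region):
    """Normalize non-negative activations in a pooling region into probabilities."""
    total = sum(region)
    if total == 0:
        # Assumed fallback: all activations are zero, so sample uniformly.
        return [1.0 / len(region)] * len(region)
    return [a / total for a in region]

def stochastic_pool(region, rng):
    """Sample one activation from the region according to its probability."""
    probs = pooling_probabilities(region)
    return rng.choices(region, weights=probs, k=1)[0]

# Hypothetical pooling region (flattened 2x2 window) and a seeded generator.
rng = random.Random(0)
region = [1.0, 2.0, 1.0, 0.0]
probabilities = pooling_probabilities(region)
pooled = stochastic_pool(region, rng)
```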
 
With respect to claim 13, claim 13 recites identical or substantially similar limitations of claim 3 and is thus rejected in the same manner, the same art and reasoning applying. 
 
With respect to claim 14, claim 14 recites identical or substantially similar limitations of claim 4 and is thus rejected in the same manner, the same art and reasoning applying. 
 
Jaech modified by Chopra teaches the method of claim 16, and Chopra further teaches:
comparing the descriptors is based on a statistical distance between the first descriptor and the second descriptor. (Chopra at p. 3, § 2.1, last paragraph: “Our approach is to build a trainable system that nonlinearly maps the raw images of faces to points in a low dimensional space so that the distance between these points is small if the images belong to the same person and large otherwise.  Learning the similarity metric is realized by training a network that consists of two identical convolutional networks that share the same set of weights - a Siamese Architecture [4] (see figure 1).” The examiner notes that Chopra’s points mapped from respective raw images into a low-dimension space teaches a first descriptor and a second descriptor. The examiner further notes that Chopra’s distance between two of the aforementioned points teaches a statistical distance, and that Chopra’s determining whether a distance between two points in a low-dimension space is sufficiently small to determine similarity of two digital objects teaches comparing the descriptors based on a statistical distance between the first descriptor and the second descriptor.)
Jaech and Chopra are analogous art because both pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech’s descriptor determination with Chopra’s comparison between descriptors of corresponding neural networks (Chopra, supra). The modification allows the comparison result to be verified in terms of the falsely accepted (FA) and falsely rejected (FR) rates (Chopra, pp. 5-6, § 3.2, last paragraph: “The performance of the network was measured by a calculation of the percentage of impostor pairs accepted (FA), and the percentage of genuine pairs rejected (FR).”; and p. 6, § 4.1, ¶ 4: “The values of the percentage of falsely rejected images and the falsely accepted images are plotted for all possible values of the threshold probability. The optimal threshold probability is the value that partitions the test set into genuine and impostor pairs and minimizes FA and FR rates.”).
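For illustration only, comparing two descriptors by the distance between the points to which they are mapped can be sketched as follows. This is a minimal sketch under stated assumptions: the Euclidean distance, the function names `descriptor_distance` and `is_match`, and the threshold parameter are hypothetical and are not asserted to be the distance measure used in Chopra or recited in the claims.

```python
import math


def descriptor_distance(first, second):
    """Euclidean distance between two equal-length descriptor vectors.

    Assumption: Euclidean distance is used here only as a stand-in for
    whichever distance measure is actually employed.
    """
    if len(first) != len(second):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(first, second)))


def is_match(first, second, threshold):
    """Treat the descriptors as similar when their distance is small."""
    return descriptor_distance(first, second) <= threshold
```

A small distance indicates that the two inputs are mapped to nearby points in the low-dimensional space, and a large distance indicates that they are not.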
 
Jaech modified by Chopra does not appear to explicitly teach:
wherein the first descriptor of the artificial neural network comprises a first distribution, 
the second descriptor of the second artificial neural network comprises a second distribution, and
 
Ioannidou does, however, teach:
wherein the first descriptor of the artificial neural network comprises a first distribution, (Ioannidou at p. 20:13, § 3.1, last paragraph: “Stochastic pooling [Zeiler and Fergus 2013] is also a method for regularizing large CNNs. At first, probabilities are computed for all the activations within a region by normalizing them. Based on these probabilities, a multinomial distribution is formed and finally used to randomly sample the activation that the pooling operation will return. Stochastic pooling is applied to the convolutional layers and it can be combined with any other technique, such as weight decay and dropout.” The examiner notes that Ioannidou’s computing the distribution of probabilities for all activations and/or the determination of a multinomial distribution teaches computing first probabilities and hence a first distribution for first activation(s) of a first sub-network (e.g., one of the two sub-networks cited in claim 1 above) or for the entire network cited in claim 1 above.)
the second descriptor of the second artificial neural network comprises a second distribution, and (Ioannidou at p. 20:13, § 3.1, last paragraph: “Stochastic pooling [Zeiler and Fergus 2013] is also a method for regularizing large CNNs. At first, probabilities are computed for all the activations within a region by normalizing them. Based on these probabilities, a multinomial distribution is formed and finally used to randomly sample the activation that the pooling operation will return. Stochastic pooling is applied to the convolutional layers and it can be combined with any other technique, such as weight decay and dropout.” The examiner notes that Ioannidou’s computing probabilities for all activations teaches computing second probabilities and hence a second distribution for second activation(s) of a second sub-network (e.g., the other sub-network cited in claim 1 above) as claimed.)
 
Jaech, Chopra, and Ioannidou are analogous art because all three references pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech in view of Chopra to incorporate Ioannidou’s determination of probability distributions and multinomial distributions using Ioannidou’s stochastic pooling (Ioannidou, supra).  The modification not only directly evaluates the probability of an output but also estimates the input-output joint probability distribution to encompass the capabilities of both the discriminative deep learning methods and the generative deep learning methods (Ioannidou, p. 20:7, § 3.1, ¶ 1: “Over the last few years, a large number of DL approaches has been presented. These methods can be divided into two general categories based on how they are used [Deng 2014]: (1) discriminative and (2) generative. Discriminative methods directly evaluate the probability of an output given a certain input, while generative methods estimate the input-output joint probability distribution.”).

            Claim(s) 5 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jaech et al. US PGPub 2018 / 0349477 A1 effectively filed on June 6, 2017 (hereinafter Jaech) in view of Chopra et al., Learning a Similarity Metric Discriminatively, with Application to Face Verification (2005) (hereinafter Chopra) and further in view of Scarselli et al., The graph neural network model (2009) (hereinafter Scarselli).
          With respect to claim 5, Jaech modified by Chopra teaches the system of claim 1 but does not appear to explicitly teach:
wherein compiling a descriptor of the artificial neural network comprises constructing a graph, where the graph nodes correspond to the plurality of segments. 
 
Scarselli does, however, teach:
wherein compiling a descriptor of the artificial neural network comprises constructing a graph, where the graph nodes correspond to the plurality of segments. (Scarselli at p. 65, FIG. 3 Caption: “Graph (on the top), the corresponding encoding network (in the middle), and the network obtained by unfolding the encoding network (at the bottom). The nodes (the circles) of the graph are replaced, in the encoding network, by units computing fw and gw (the squares). When fw and gw are implemented by feedforward neural networks, the encoding network is a recurrent neural network. In the unfolding network, each layer corresponds to a time instant and contains a copy of all the units of the encoding network. Connections between layers depend on encoding network connectivity.” The examiner notes that reproduction of FIG. 3 is omitted due to its size. The examiner further notes that Scarselli’s representing the neural network shown at the bottom of FIG. 3 as a graph shown at the top of FIG. 3 with the encoding network shown in the middle of FIG. 3 teaches constructing a graph having nodes that correspond to a plurality of segments such as Scarselli’s fw and gw as claimed.)
 
Jaech, Chopra, and Scarselli are analogous art because all three references pertain to recognition of digital contents using neural networks and probabilities.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech in view of Chopra to incorporate Scarselli’s graph neural network construction (Scarselli, supra).  This combination provides the additional capability of processing data represented in graph domains while preserving the graph-structured nature of the data and encoding the topological relationships among the nodes, because relationships among data in areas of science and engineering such as computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining can be represented in terms of graphs (Scarselli, p. 61, Abstract: “Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains.”; and pp. 61-62, § 1, ¶ 4: “More recently, there have been various approaches [17], [18] attempting to preserve the graph structured nature of the data for as long as required before the processing phase. The idea is to encode the underlying graph structured data using the topological relationships among the nodes of the graph, in order to incorporate graph structured information in the data processing step.”).
 
With respect to claim 15, claim 15 recites identical or substantially similar limitations of claim 5 and is thus rejected in the same manner, the same art and reasoning applying. 
 
10.            Claim(s) 8 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jaech et al. US PGPub 2018 / 0349477 A1 effectively filed on June 6, 2017 (hereinafter Jaech) in view of Chopra et al., Learning a Similarity Metric Discriminatively, with Application to Face Verification (2005) (hereinafter Chopra) and further in view of Dean et al., Large Scale Distributed Deep Networks (2012) (hereinafter Dean).
          With respect to claim 8, Jaech modified by Chopra teaches the system of claim 6, and Chopra further teaches:
compare the matching score with at least one threshold; (Chopra at p. 7, § 4.1, ¶ 4: “The likelihood that a test image is genuine, genuine, is found by evaluating the normal density of the test image on the model of the concerned subject”; “The probability that the given image is genuine is given by 
[equation reproduced as image media_image3.png in the original Office action]
”; and “The values of the percentage of falsely rejected images and the falsely accepted images are plotted for all possible values of the threshold probability. The optimal threshold probability is the value that partitions the test set into genuine and impostor pairs and minimizes FA and FR rates.” The examiner notes that Chopra’s threshold probability teaches at least one threshold. The examiner further notes that Chopra’s plotting the values of the probability of FA and FR for a threshold probability, which is determined by minimizing FA and FR rates, teaches the above limitation.)
 
decide, based on the comparison of the matching score and the at least one threshold, to utilize the second artificial neural network; and (Chopra at p. 7, § 4.1, ¶ 5: “The verification rates obtained from testing the AT&T database and the AR/Purdue database are strikingly different (see table 1 and figures 6 and 7), underlining the differences in difficulty in the two databases. The AT&T dataset is relatively small, and our system required only 5000 training samples to achieve very high performance on the test set. The AR/Purdue dataset is very large and diverse, with huge variations in expression, lighting, and added occlusions. Our higher error rates reflect this level of difficulty.” The examiner notes that Chopra’s verification based on the aforementioned probabilities that has a high error rate of its network on datasets having “huge variations” teaches less favorable utilization or non-utilization of its network for such datasets, and that Chopra’s successful verification based on the aforementioned probabilities of its network with “very high performance” on datasets having relatively small variations teaches utilize the artificial neural network as claimed.)

Jaech and Chopra are analogous art because both pertain to determining similarity between digital contents by using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech in view of Chopra to further incorporate Chopra’s comparison between a matching score (e.g., probability of a pair being a genuine pair based on the descriptors of corresponding neural networks) and a threshold and decision to utilize the neural network based on the comparison (Chopra, supra). The modification improves the performance (e.g., accuracy) of the neural network’s recognizing genuine pairs and imposter pairs of inputs by comparing the aforementioned matching score with a threshold value that is determined in such a manner as to minimize the error rate (e.g., the falsely accepted (FA) and falsely rejected (FR) rates for partitioning a dataset into genuine and imposter pairs) (Chopra, p. 6, § 4.1, ¶ 4: “The values of the percentage of falsely rejected images and the falsely accepted images are plotted for all possible values of the threshold probability. The optimal threshold probability is the value that partitions the test set into genuine and impostor pairs and minimizes FA and FR rates.”; and p. 5, § 3.2, last paragraph – p. 6, § 3.2, first paragraph: “The performance of the network was measured by a calculation of the percentage of impostor pairs accepted (FA), and the percentage of genuine pairs rejected (FR).”).
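For illustration only, the selection of a threshold from FA and FR rates described in the quoted Chopra passages can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the convention that a score at or above the threshold is accepted as genuine, and the choice to minimize the sum FA + FR are hypothetical, since the quoted passage states only that the optimal threshold "minimizes FA and FR rates".

```python
def fa_fr_rates(genuine_scores, impostor_scores, threshold):
    """Return (FA, FR) as fractions for a given acceptance threshold.

    FA: fraction of impostor pairs accepted (score >= threshold).
    FR: fraction of genuine pairs rejected (score < threshold).
    """
    fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    fr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return fa, fr


def best_threshold(genuine_scores, impostor_scores, candidates):
    """Pick the candidate threshold with the smallest FA + FR.

    Assumption: the two error rates are combined by simple addition.
    """
    return min(
        candidates,
        key=lambda t: sum(fa_fr_rates(genuine_scores, impostor_scores, t)),
    )
```

A matching score would then be compared against the selected threshold to decide whether a pair is treated as genuine.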
 
Jaech modified by Chopra does not appear to explicitly teach:
wherein the at least one processor is further configured to: receive the artificial neural network from an external device using a communication device; 
based on the decision to utilize the second artificial neural network, transmit the second artificial neural network to the external device. 
Dean does, however, teach:
wherein the at least one processor is further configured to: receive the artificial neural network from an external device using a communication device; (Dean at p. 4, FIG. 2:

[FIG. 2 of Dean reproduced as image media_image4.png in the original Office action]

 
p. 2, § 1, ¶ 1: “Within this framework, we have designed and implemented two novel methods for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure which leverages adaptive learning rates and supports a large number of model replicas, and (ii) Sandblaster L-BFGS, a distributed implementation of L-BFGS that uses both data and model parallelism.” p. 4, § 4.1, ¶ 2: “The models communicate updates through a centralized parameter server, which keeps the current state of all parameters for the model, sharded across many machines (e.g., if we have 10 parameter server shards, each shard is responsible for storing and applying updates to 1/10th of the model parameters) (Figure 2).”
The examiner notes that a “model replica” of an entire neural network (e.g., Dean’s DistBelief model) teaches an artificial neural network, and that Dean’s instantiating multiple replicas of a neural network (e.g., the “Model replicas” shown in the left-hand portion of FIG. 2 reproduced above) on “many machines” (e.g., multiple “parameter servers”) via “distributed implementation of L-BFGS” or model parallelism teaches receiving an artificial neural network from an external device (e.g., the “centralized parameter server”).)
 
based on the decision to utilize the second artificial neural network, transmit the second artificial neural network to the external device. (Dean at p. 5, § 4.1, ¶ 2: “For synchronous SGD, if one machine fails, the entire training process is delayed; whereas for asynchronous SGD, if one machine in a model replica fails, the other model replicas continue processing their training data and updating the model parameters via the parameter servers.” p. 4, § 4.1, ¶ 3: “In the simplest implementation, before processing each mini-batch, a model replica asks the parameter server service for an updated copy of its model parameters. Because DistBelief models are themselves partitioned across multiple machines, each machine needs to communicate with just the subset of parameter server shards that hold the model parameters relevant to its partition.”
The examiner notes that Dean’s non-failing replicas’ updating the model parameters via the parameter servers renders the above limitation obvious. More specifically, Dean explicitly teaches that each replica updates the parameters via the parameter server(s) during the training process. Therefore, the multiple model replicas transmit all the updated parameters for the neural network (from which each of these multiple replicas was originally instantiated) to the parameter server(s) and hence teach transmitting the second neural network to the external device (e.g., a “parameter server”).)
 
Jaech, Chopra, and Dean are analogous art because all three references pertain to recognition of digital contents using neural networks.  
It would have been obvious for a person of ordinary skill in the art prior to the effective filing date to have modified Jaech in view of Chopra to incorporate Dean’s transmitting and receiving a neural network to and from an external device (Dean, supra). The modification not only supports a large number of replicas for a neural network but also provides more robust systems for running the neural network with a failover mechanism (Dean, p. 5, § 4.1, ¶ 2: “Downpour SGD is more robust to machines failures than standard (synchronous) SGD. For synchronous SGD, if one machine fails, the entire training process is delayed; whereas for asynchronous SGD, if one machine in a model replica fails, the other model replicas continue processing their training data and updating the model parameters via the parameter servers.”).
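For illustration only, the exchange between a model replica and a parameter server described in the quoted Dean passages (a replica requests a copy of the parameters, then sends updates back) can be sketched as follows. This is a minimal single-process sketch under stated assumptions: the class names `ParameterServer` and `ModelReplica`, the plain gradient-descent update, and the fixed learning rate are hypothetical, and the sketch omits the sharding, adaptive learning rates, and asynchrony of the system Dean describes.

```python
class ParameterServer:
    """Holds the current state of the model parameters (no sharding here)."""

    def __init__(self, params, learning_rate=0.1):
        self.params = dict(params)
        self.learning_rate = learning_rate

    def fetch(self):
        # A replica receives a copy of the current parameters.
        return dict(self.params)

    def push(self, gradients):
        # A replica transmits its update; the server applies it.
        for name, grad in gradients.items():
            self.params[name] -= self.learning_rate * grad


class ModelReplica:
    """One replica that trains against a shared parameter server."""

    def __init__(self, server):
        self.server = server

    def step(self, gradient_fn):
        local_params = self.server.fetch()      # receive from the server
        gradients = gradient_fn(local_params)   # compute on local data
        self.server.push(gradients)             # transmit back to the server
```

In this sketch, if one replica stops calling `step`, the remaining replicas can continue to fetch and push against the same server, which mirrors the robustness property described in the quoted passage.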

With respect to claim 19, claim 19 recites identical or substantially similar limitations of claim 8 and is thus rejected in the same manner, the same art and reasoning applying. 
 
Conclusion
         The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
(a)      Bromley et al., Signature Verification using a "Siamese" Time Delay Neural Network (August 1993) teaches an algorithm for verification of signatures written on a pen-input tablet. The algorithm is based on a novel, artificial neural network, called a "Siamese" neural network. This network consists of two identical sub-networks joined at their outputs. During training the two sub-networks extract features from two signatures, while the joining neuron measures the distance between the two feature vectors. Verification consists of comparing an extracted feature vector with a stored feature vector for the signer. Signatures closer to this stored representation than a chosen threshold are accepted, all other signatures are rejected as forgeries.
(b)            Zeiler et al., Stochastic Pooling for Regularization of Deep Convolutional Neural Networks (16 Jan 2013) teaches a simple and effective method for regularizing large convolutional neural networks that replaces the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given 
(c)            Chen et al., Compressing Neural Networks with the Hashing Trick (19 Apr 2015) teaches a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. HashedNets uses a low-cost hash function to randomly group connection weights into hash buckets, and all connections within the same hash bucket share a single parameter value. These parameters are tuned to adjust to the HashedNets weight sharing architecture with standard backprop during training. The hashing procedure introduces no additional memory overhead, and the novel network architecture demonstrates on several benchmark data sets that HashedNets shrink the storage requirements of neural networks substantially while mostly preserving generalization performance.
 
         Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERICH C. TZOU whose telephone number is (571)272-9852.  The examiner can normally be reached on Monday-Friday 7:30AM-5:00PM EST with alternative Fridays off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo can be reached on 571-272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
 
 
 
/E.C.T./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126