DETAILED ACTION
Notice of Pre-AIA or AIA Status

Claims 1-13 are pending in this application. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
 (b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Independent Claims 1, 12 and 13 recite the limitation “learning data” several times, and it is unclear whether the same data is being used as an input to “a machine learning model”, “a learning data analysis unit”, “a learning method determination unit” and “a learning unit”, or whether the data is transformed in some way during the course of operations, such that the learning data passed to subsequent operations comprises the transformed learning data. For purposes of examination, any data that is used in a learning system is being interpreted as “learning data” under the broadest reasonable interpretation that one of ordinary skill in the art at the time of the invention would have given the term. Appropriate correction is required to distinguish and clarify the intended scope of the claimed features for further prosecution on the merits of the claims.

35 U.S.C. § 112 Sixth Paragraph - Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that use the word “unit” in claims 1-12 or “step” in claim 13, which are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-4, 6-7 and 11-13 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Atorre et al. (US 2019/0035431 A1), hereinafter referred to as “Atorre”.
Consider Claim 1.
Atorre teaches: 
1. A learning apparatus that inputs learning data to a machine learning model including a plurality of layers for analyzing an input image and learns the machine learning model, / 12. A non-transitory computer-readable storage medium storing an operation program of a learning apparatus that inputs learning data to a machine learning model including a plurality of layers for analyzing an input image and learns the machine learning model, / 13. An operation method of a learning apparatus that inputs learning data to a machine learning model including a plurality of layers for analyzing an input image and learns the machine learning model,  (Atorre: [0092]-[0095], Figure 1, A computer-implemented method includes receiving a target digital content item that includes a plurality of frames, identifying a set of candidate host frames for inserting source digital content items from the plurality of frames based on one or more attributes of the target digital content item, determining a candidate score for each respective candidate host frame of the candidate host frames, and generating host time defining data including identifications and the candidate scores of the candidate host frames, where the candidate score indicates a degree of transition of the target digital content item at the candidate host frame.)
1. the machine learning model being a model for performing semantic segmentation of determining a plurality of classes in the input image in units of pixels by / 12. the machine learning model being a model for performing semantic segmentation of determining a plurality of classes in the input image in units of pixels by / 13. the machine learning model being a model for performing semantic segmentation of determining a plurality of classes in the input image in units of pixels by (Atorre: [0011] The system is operable to cause a processor to detect or predict the host time from the target digital content item based on one or more of: (4) the detection or prediction of language or symbols in the target digital content's visual data, which indicate that there has been or will be a transition in the scenes or stories in the target digital content item; (5) the detection or prediction of language in the target digital content's audio data, which indicates that there has been or will be a transition in the scenes or stories in the target digital content item; [0231] FIG. 14 illustrates an example of a method of host time identification using a combination of audio, visual data, and metadata from target digital content in accordance with some embodiments. In step 1402, the content integration system 100 is configured to train a deep Siamese neural network (which is a neural network that can predict an output score of similarity between two things) using pairs of feature representations of shots (where the features include a key frame and a vector representation capturing: the HSY color histogram, semantic features including the content vector output of a neural network trained on images, MFCC audio features, the beginning and end frame index (or time) of each shot, and metadata captions) and labels indicating whether those shots are from a same scene.)
1. extracting, for each layer, features which are included in the input image and have different frequency bands of spatial frequencies, the learning apparatus comprising: / 12. extracting, for each layer, features which are included in the input image and have different frequency bands of spatial frequencies, the operation program causing a computer to function as: / 13. extracting, for each layer, features which are included in the input image and have different frequency bands of spatial frequencies, the operation method comprising: (Atorre: [0077] the host time identification module can be configured to identify, as host times: (1) times that represent a shift or change in visual data, such as pixel values; (2) times that represent a shift or change in audio data, such as frequency values [0126], (f) spectral centroids, which are the center of gravity of the spectrum and are calculated as the weighted mean of the frequencies present in the signal determined using a Fourier transform, with their magnitudes as the weights ("spectral centroids"); (g) spectral spread, which is a measure of the bandwidth of the spectrum ("spectral spread"); (h) spectral entropy, which is the entropy of the normalized spectral energies for a set of sub-frames ("spectral entropy");)
1. at least one memory and at least one processor which function as: a learning data analysis unit that analyzes at least the frequency bands included in an annotation image of the learning data, the learning data being a pair of a learning input image and the annotation image in which each class region included in the learning input image is indicated by an annotation; / 12. a learning data analysis unit that analyzes at least the frequency bands included in an annotation image of the learning data, the learning data being a pair of a learning input image and the annotation image in which each class region included in the learning input image is indicated by an annotation; / 13. a learning data analysis step of analyzing at least the frequency bands included in an annotation image of the learning data, the learning data being a pair of a learning input image and the annotation image in which each class region included in the learning input image is indicated by an annotation; (Atorre: [0130], where a preliminary step in host time identification involves parsing the target digital content into its visual components and its audio components, the host time identification module 106 can be configured to parse the target digital content into a list of frames including the visual aspect of the target digital content; [0156] In some embodiments where the host time identification module 106 is configured to identify and/or score host times or candidate host times in the target digital content item by analyzing the attributes of the visual aspects of the target digital content item or its components, searching through all frames for consecutive pairs of frames or sequences of successive frames whose attributes (e.g. pixel values) reflect a level of change that exceeds a predetermined threshold (indicating a transition), and then selecting as host times or candidate host times one or more times or frame numbers associated with the transition (e.g., the frame number of the frame at the start of a new shot, scene, or story), the level of change observed is also used to determine a candidate score for those host times or candidate host times.)
1. a learning method determination unit that determines a learning method using the learning data based on an analysis result of the frequency bands by the learning data analysis unit; / 12. a learning method determination unit that determines a learning method using the learning data based on an analysis result of the frequency bands by the learning data analysis unit; / 13. a learning method determination step of determining a learning method using the learning data based on an analysis result of the frequency bands in the learning data analysis step; (Atorre: [0096] The camera motion classification module 108 is configured to use a machine learning model trained on samples of content with different types and levels of camera motion in digital content to predict the type or level of camera motion in a given scene of the target digital content item. The camera motion classification module 108 can output the classification information to the host time identification module 106 to aid the identification of host times. [0219] In some embodiments, host time locations in the target digital content item are predicted using a machine learning classifier trained on visual and/or audio attributes and/or the metadata from examples of host times labelled as positive or negative examples, the target digital content item or its components, and/or the metadata about the target digital content item.) 
1. and a learning unit that learns the machine learning model via the determined learning method using the learning data. / 12. and a learning unit that learns the machine learning model via the determined learning method using the learning data. / 13. and a learning step of learning the machine learning model via the determined learning method using the learning data. (Atorre: [0220]-[0222] and [0225]; [0220]: In some embodiments where host times in the target digital content item are predicted using any type of machine learning classifier or neural network model, after the prediction, the host times or candidate host times may be additionally scored or have their candidate score weighted or adjusted based on factors, such as user data about most viewed portions of the target digital content item or digital content in general, the proximity of the candidate host time to the start of the content (which is useful when inserting advertisements into videos, so as to ensure that the advertisement is likely to be shown before some percentage of viewers drop off), or, for the advertising use-case, advertisement time insertion policies or historical data tracking which portions of the digital content can host the most effective or most lucrative advertisement insertions.)
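For illustration only, and not as a finding regarding the reference, the spectral centroid defined in the quoted portion of Atorre [0126] (the weighted mean of the frequencies obtained from a Fourier transform, with their magnitudes as the weights) can be sketched as follows. The function names, the sample rate, and the use of a direct discrete Fourier transform over a one-dimensional signal are assumptions made for this sketch; they are not taken from Atorre or from the present application.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Return DFT magnitudes for bins 0..N//2 of a real-valued signal."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean of the bin frequencies, in Hz (hypothetical helper)."""
    n = len(signal)
    mags = magnitude_spectrum(signal)
    total = sum(mags)
    if total == 0:
        return 0.0  # a silent signal has no meaningful centroid
    freqs = [k * sample_rate / n for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / total


if __name__ == "__main__":
    rate = 64  # assumed sample rate in Hz
    tone = [math.sin(2 * math.pi * 8 * t / rate) for t in range(rate)]
    print(spectral_centroid(tone, rate))  # close to 8.0 for a pure 8 Hz tone
```

The sketch operates on a one-dimensional signal such as audio samples, which is the context in which the quoted passage discusses the spectral centroid.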

2. The learning apparatus according to claim 1, wherein the learning data analysis unit specifies the frequency band for which extraction of the feature is of relatively high necessity among the frequency bands, as a necessary band, by analyzing the frequency bands included in the annotation image, and the learning method determination unit reconfigures the machine learning model based on the specified necessary band. (Atorre: [0096] The camera motion classification module 108 is configured to use a machine learning model trained on samples of content with different types and levels of camera motion in digital content to predict the type or level of camera motion in a given scene of the target digital content item. The camera motion classification module 108 can output the classification information to the host time identification module 106 to aid the identification of host times. [0219] In some embodiments, host time locations in the target digital content item are predicted using a machine learning classifier trained on visual and/or audio attributes and/or the metadata from examples of host times labelled as positive or negative examples, the target digital content item or its components, and/or the metadata about the target digital content item. 
[0220]-[0222] and [0225]; [0220]: In some embodiments where host times in the target digital content item are predicted using any type of machine learning classifier or neural network model, after the prediction, the host times or candidate host times may be additionally scored or have their candidate score weighted or adjusted based on factors, such as user data about most viewed portions of the target digital content item or digital content in general, the proximity of the candidate host time to the start of the content (which is useful when inserting advertisements into videos, so as to ensure that the advertisement is likely to be shown before some percentage of viewers drop off), or, for the advertising use-case, advertisement time insertion policies or historical data tracking which portions of the digital content can host the most effective or most lucrative advertisement insertions.)

3. The learning apparatus according to claim 2, wherein, in the reconfiguration, the learning method determination unit determines, among the plurality of layers, a necessary layer which is necessary for learning and an optional layer which is optional in learning, based on the specified necessary band, and reduces a processing amount of the optional layer to be smaller than a processing amount of the necessary layer. (Atorre: [0096] The camera motion classification module 108 is configured to use a machine learning model trained on samples of content with different types and levels of camera motion in digital content to predict the type or level of camera motion in a given scene of the target digital content item. The camera motion classification module 108 can output the classification information to the host time identification module 106 to aid the identification of host times. [0219] In some embodiments, host time locations in the target digital content item are predicted using a machine learning classifier trained on visual and/or audio attributes and/or the metadata from examples of host times labelled as positive or negative examples, the target digital content item or its components, and/or the metadata about the target digital content item. 
[0220]-[0222] and [0225]; [0220]: In some embodiments where host times in the target digital content item are predicted using any type of machine learning classifier or neural network model, after the prediction, the host times or candidate host times may be additionally scored or have their candidate score weighted or adjusted based on factors, such as user data about most viewed portions of the target digital content item or digital content in general, the proximity of the candidate host time to the start of the content (which is useful when inserting advertisements into videos, so as to ensure that the advertisement is likely to be shown before some percentage of viewers drop off), or, for the advertising use-case, advertisement time insertion policies or historical data tracking which portions of the digital content can host the most effective or most lucrative advertisement insertions.)

4. The learning apparatus according to claim 1, wherein the learning data analysis unit specifies the frequency band for which extraction of the feature is of relatively high necessity among the frequency bands, as a necessary band, by analyzing the frequency bands included in the annotation image, and the learning method determination unit matches a range of the frequency bands included in the annotation image with a range of the analyzable frequency bands in the machine learning model, by lowering a resolution of the learning input image based on the specified necessary band. (Atorre: [0096] The camera motion classification module 108 is configured to use a machine learning model trained on samples of content with different types and levels of camera motion in digital content to predict the type or level of camera motion in a given scene of the target digital content item. The camera motion classification module 108 can output the classification information to the host time identification module 106 to aid the identification of host times. [0219] In some embodiments, host time locations in the target digital content item are predicted using a machine learning classifier trained on visual and/or audio attributes and/or the metadata from examples of host times labelled as positive or negative examples, the target digital content item or its components, and/or the metadata about the target digital content item. 
[0220]-[0222] and [0225]; [0220]: In some embodiments where host times in the target digital content item are predicted using any type of machine learning classifier or neural network model, after the prediction, the host times or candidate host times may be additionally scored or have their candidate score weighted or adjusted based on factors, such as user data about most viewed portions of the target digital content item or digital content in general, the proximity of the candidate host time to the start of the content (which is useful when inserting advertisements into videos, so as to ensure that the advertisement is likely to be shown before some percentage of viewers drop off), or, for the advertising use-case, advertisement time insertion policies or historical data tracking which portions of the digital content can host the most effective or most lucrative advertisement insertions.)

6. The learning apparatus according to claim 3, wherein the learning data analysis unit specifies the frequency band for which extraction of the feature is of relatively high necessity among the frequency bands, as a necessary band, by analyzing the frequency bands included in the annotation image, and the learning method determination unit matches a range of the frequency bands included in the annotation image with a range of the analyzable frequency bands in the machine learning model, by lowering a resolution of the learning input image based on the specified necessary band. (Atorre: [0096] The camera motion classification module 108 is configured to use a machine learning model trained on samples of content with different types and levels of camera motion in digital content to predict the type or level of camera motion in a given scene of the target digital content item. The camera motion classification module 108 can output the classification information to the host time identification module 106 to aid the identification of host times. [0219] In some embodiments, host time locations in the target digital content item are predicted using a machine learning classifier trained on visual and/or audio attributes and/or the metadata from examples of host times labelled as positive or negative examples, the target digital content item or its components, and/or the metadata about the target digital content item. 
[0220]-[0222] and [0225]; [0220]: In some embodiments where host times in the target digital content item are predicted using any type of machine learning classifier or neural network model, after the prediction, the host times or candidate host times may be additionally scored or have their candidate score weighted or adjusted based on factors, such as user data about most viewed portions of the target digital content item or digital content in general, the proximity of the candidate host time to the start of the content (which is useful when inserting advertisements into videos, so as to ensure that the advertisement is likely to be shown before some percentage of viewers drop off), or, for the advertising use-case, advertisement time insertion policies or historical data tracking which portions of the digital content can host the most effective or most lucrative advertisement insertions.)

7. The learning apparatus according to claim 1, wherein the learning data analysis unit analyzes the frequency bands included in the annotation image and the learning input image, and the learning method determination unit determines the learning method based on an analysis result of the annotation image and an analysis result of the learning input image. (Atorre: [0096] The camera motion classification module 108 is configured to use a machine learning model trained on samples of content with different types and levels of camera motion in digital content to predict the type or level of camera motion in a given scene of the target digital content item. The camera motion classification module 108 can output the classification information to the host time identification module 106 to aid the identification of host times. [0219] In some embodiments, host time locations in the target digital content item are predicted using a machine learning classifier trained on visual and/or audio attributes and/or the metadata from examples of host times labelled as positive or negative examples, the target digital content item or its components, and/or the metadata about the target digital content item. 
[0220]-[0222] and [0225]; [0220]: In some embodiments where host times in the target digital content item are predicted using any type of machine learning classifier or neural network model, after the prediction, the host times or candidate host times may be additionally scored or have their candidate score weighted or adjusted based on factors, such as user data about most viewed portions of the target digital content item or digital content in general, the proximity of the candidate host time to the start of the content (which is useful when inserting advertisements into videos, so as to ensure that the advertisement is likely to be shown before some percentage of viewers drop off), or, for the advertising use-case, advertisement time insertion policies or historical data tracking which portions of the digital content can host the most effective or most lucrative advertisement insertions.)

11. The learning apparatus according to claim 1, wherein the learning data analysis unit analyzes the frequency bands based on a learning data group including a plurality of pieces of the learning data. (Atorre: [0096] The camera motion classification module 108 is configured to use a machine learning model trained on samples of content with different types and levels of camera motion in digital content to predict the type or level of camera motion in a given scene of the target digital content item. The camera motion classification module 108 can output the classification information to the host time identification module 106 to aid the identification of host times. [0219] In some embodiments, host time locations in the target digital content item are predicted using a machine learning classifier trained on visual and/or audio attributes and/or the metadata from examples of host times labelled as positive or negative examples, the target digital content item or its components, and/or the metadata about the target digital content item. [0220]-[0222] and [0225]; [0220]: In some embodiments where host times in the target digital content item are predicted using any type of machine learning classifier or neural network model, after the prediction, the host times or candidate host times may be additionally scored or have their candidate score weighted or adjusted based on factors, such as user data about most viewed portions of the target digital content item or digital content in general, the proximity of the candidate host time to the start of the content (which is useful when inserting advertisements into videos, so as to ensure that the advertisement is likely to be shown before some percentage of viewers drop off), or, for the advertising use-case, advertisement time insertion policies or historical data tracking which portions of the digital content can host the most effective or most lucrative advertisement insertions.)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Claims 5, 8 and 10 are rejected under 35 U.S.C. 103(a) as being unpatentable over Atorre et al. (US 2019/0035431 A1), hereinafter referred to as “Atorre”, in view of Iwami et al. (US PGPub US 2014/0111711 A1), hereinafter referred to as “Iwami”.
Consider Claims 5, 8 and 10. 
Claims 5, 8 and 10 are rejected for the same reasoning presented below. 
Atorre teaches the learning apparatus according to claims 1 and 2, comprising a learning data analysis unit.
Atorre does not teach: 
5. The learning apparatus according to claim 2, wherein the learning data analysis unit specifies the frequency band for which extraction of the feature is of relatively high necessity among the frequency bands, as a necessary band, by analyzing the frequency bands included in the annotation image, and the learning method determination unit matches a range of the frequency bands included in the annotation image with a range of the analyzable frequency bands in the machine learning model, by lowering a resolution of the learning input image based on the specified necessary band.
8. The learning apparatus according to claim 1, wherein the learning data analysis unit generates an image pyramid including a plurality of images which are obtained by gradually lowering the resolution of the annotation image and have different resolutions, and analyzes the frequency bands included in the annotation image based on the image pyramid.
Iwami teaches: 
1. An apparatus for analyzing an input image, (Iwami: abstract, [0019])
1. the apparatus performing semantic segmentation of determining a plurality of classes in the input image in units of pixels by (Iwami: [0125])
1. extracting features which are included in the input image and have different frequency bands of spatial frequencies, the apparatus comprising: (Iwami: [0024] Furthermore, the mesh pattern preferably is formed based on output image data obtained by carrying out the following data generating process. Namely, the process comprises a selection step of selecting plural positions from within a predetermined two-dimensional image region, a generation step of generating image data that represent a pattern of the mesh pattern based on the selected plural positions, a calculation step of calculating a quantified evaluation value concerning noise characteristics of the mesh pattern, based on the generated image data, and a determination step of determining one of the image data as the output image data, based on the calculated evaluation value and predetermined evaluation conditions. [0248] FIG. 21B is a distribution diagram of a power spectrum Spc (hereinafter referred to simply as a spectrum Spc) obtained by implementing FFT on the image data Img of FIG. 21A. The horizontal axis of the distribution diagram indicates the spatial frequency in the x-axis direction, whereas the vertical axis indicates the spatial frequency in the y-axis direction. Further, as the displayed density within each spatial frequency band becomes thinner, the intensity level (spectral value) becomes smaller, and as the displayed density becomes denser, the intensity level becomes greater. In the example shown in the diagram, the spectral distribution of the spectrum Spc is isotropic and has two annular peaks.)
5. The learning apparatus according to claim 2, wherein the learning data analysis unit specifies the frequency band for which extraction of the feature is of relatively high necessity among the frequency bands, as a necessary band, by analyzing the frequency bands included in the annotation image, and the learning method determination unit matches a range of the frequency bands included in the annotation image with a range of the analyzable frequency bands in the machine learning model, by lowering a resolution of the learning input image based on the specified necessary band. (Iwami: [0018] Furthermore, the mesh pattern preferably satisfies at least one of the following first and second conditions. First condition: In relation to a centroid position distribution power spectrum of the mesh shapes, an average intensity on a spatial frequency side higher than a predetermined spatial frequency is greater than an average intensity on a spatial frequency band side lower than the predetermined spatial frequency. Second condition: In a convolution integral of a power spectrum of the mesh pattern and human standard visual response characteristics, respective integral values thereof within a spatial frequency band greater than or equal to ¼ of and less than or equal to ½ of a spatial frequency corresponding to an average line width of the thin metal wires, are greater than an integral value at zero spatial frequency.)
8. The learning apparatus according to claim 1, wherein the learning data analysis unit generates an image pyramid including a plurality of images which are obtained by gradually lowering the resolution of the annotation image and have different resolutions, and analyzes the frequency bands included in the annotation image based on the image pyramid. (Iwami: [0275] In the foregoing manner, in a convolution integral between the spectrum Spc as viewed in plan and a standard human visual response characteristic (VTF), respective integral values {Noise Intensity NP(Ux, Uy)} within a spatial frequency band greater than or equal to ¼ of and less than or equal to ½ of the Nyquist frequency (i.e., the spatial frequency corresponding to the average wire width of the thin metal wires 16), are greater than the integral value {Noise Intensity NP(0, 0)}. Therefore, compared to the low spatial frequency band side, the noise amount on the side of the high spatial frequency band is relatively large. Although human visual perception has a high response characteristic in a low spatial frequency band, in mid to high spatial frequency bands, properties of the response characteristic decrease rapidly, and thus, the sensation of noise as perceived visually by humans tends to decrease. In accordance with this phenomenon, the sensation of granular noise caused by the pattern of the conductive sheet 10 is lowered, and the visibility of objects to be observed can be significantly enhanced. Further, since plural polygonal meshes are provided, the cross sectional shape of the respective wires after cutting is substantially constant, and thus the conductive sheet exhibits a stable conducting capability.)
10. The learning apparatus according to claim 1, wherein the input image is a cell image in which cells appear. (Iwami: [0444] vinyl alcohols (PVA), polyvinyl pyrolidones (PVP), polysaccharides such as starches, celluloses and derivatives thereof, polyethylene oxides, polyvinylamines, chitosans, polylysines, polyacrylic acids, polyalginic acids, polyhyaluronic acids, and carboxycelluloses, etc. Such binders exhibit neutral, anionic, or cationic properties depending on the ionic properties of the functional group.)
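For illustration only, the image pyramid recited in claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Atorre, or Iwami: the function names (`downsample`, `upsample`, `build_pyramid`, `band_energies`) are hypothetical, the image is assumed to be a single-channel list of rows with even dimensions at every level, resolution is lowered by 2x2 averaging, and the detail lost between consecutive levels is used as a rough proxy for the content of each frequency band.

```python
def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def upsample(img):
    """Double the resolution by nearest-neighbour repetition."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(img, levels):
    """Return [full resolution, 1/2, 1/4, ...] with `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def band_energies(pyramid):
    """Mean squared detail lost between consecutive levels.

    Hypothetical proxy: entry i approximates the content of the
    frequency band that disappears when level i is reduced to level i+1.
    """
    energies = []
    for fine, coarse in zip(pyramid, pyramid[1:]):
        approx = upsample(coarse)
        diff = [(f - a) ** 2
                for fine_row, approx_row in zip(fine, approx)
                for f, a in zip(fine_row, approx_row)]
        energies.append(sum(diff) / len(diff))
    return energies
```

Under these assumptions, a one-pixel checkerboard would show all of its energy in the finest band and none in the coarser band, which is the kind of observation from which a "necessary band" as recited in claim 5 might be identified.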
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute Iwami’s multi-resolution datasets into the overall learning architecture of Atorre, as both references are directed towards the overall technology of image analysis and processing. The determination of obviousness is predicated upon the following findings: One skilled in the art would have been motivated to modify Atorre’s overall learning architecture in order to apply its powerful classification efficiency to the multi-resolution input data of Iwami. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and programming techniques, without changing a “fundamental” operating principle of Atorre, while the teaching of Iwami continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of improving the overall analysis and classification of multi-resolution input data for a variety of fields including cellular and medical image data. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claims in question.

Allowable Subject Matter
Claim 9 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and amended to overcome the rejections under 35 U.S.C. 112, second paragraph, for indefiniteness.
Claim 9 is not rejected because the prior art fails to teach the method of Claim 7 and the device of Claim 21, which specifically comprises the following features in combination with other recited limitations:
9. The learning apparatus according to claim 1, wherein the machine learning model is configured with a convolutional neural network including an encoder network and a decoder network, the encoder network being a network that performs convolution processing of extracting an image feature map representing features which are included in the input image and have different frequency bands by performing convolution computation using a filter, performs pooling processing of outputting the image feature map having a reduced image size by calculating local statistics of the image feature map extracted by the convolution processing and compressing the image feature map, and extracts the image feature map for each layer by repeatedly performing, in the next layer, the convolution processing and the pooling processing on the image feature map which is output in the previous layer and is reduced, the decoder network being a network that generates an output image in which each class region is segmented by repeatedly performing upsampling processing and merging processing, the upsampling processing being processing of, from the minimum-size image feature map which is output in the encoder network, gradually enlarging an image size of the image feature map by upsampling, and the merging processing being processing of combining the image feature map which is gradually enlarged with the image feature map which is extracted for each layer of the encoder network and has the same image size.
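For illustration only, the data flow recited in claim 9 (convolution, pooling, upsampling, and merging with the same-size encoder feature map) can be sketched as follows. This is a simplified sketch under stated assumptions, not the applicant's implementation: all function names are hypothetical, the feature maps are single-channel lists of rows with even dimensions at every layer, the filter is fixed rather than learned, 2x2 max pooling stands in for the "local statistics", and averaging stands in for the merging step, which in practice would more likely concatenate channels and convolve again.

```python
def conv3x3(img, kernel):
    """Same-size 3x3 filtering (cross-correlation) with zero padding."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out


def max_pool2(img):
    """2x2 max pooling: a local statistic that halves the image size."""
    return [[max(img[2 * y][2 * x], img[2 * y][2 * x + 1],
                 img[2 * y + 1][2 * x], img[2 * y + 1][2 * x + 1])
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]


def upsample2(img):
    """Double the image size by nearest-neighbour repetition."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge(a, b):
    """Combine two same-size maps (assumption: simple averaging)."""
    return [[(p + q) / 2.0 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def encoder(img, kernel, depth):
    """Return the minimum-size map and the per-layer feature maps."""
    skips, x = [], img
    for _ in range(depth):
        x = conv3x3(x, kernel)
        skips.append(x)          # feature map extracted for this layer
        x = max_pool2(x)         # reduced map passed to the next layer
    return x, skips


def decoder(bottom, skips):
    """Gradually enlarge the map and merge with same-size encoder maps."""
    x = bottom
    for skip in reversed(skips):
        x = merge(upsample2(x), skip)
    return x
```

With a 4x4 input and a depth of 2, the encoder in this sketch yields feature maps of size 4x4 and 2x2 plus a 1x1 minimum-size map, and the decoder restores a 4x4 output.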

Conclusion
The prior art made of record in form PTO-892 and not relied upon is considered pertinent to applicant's disclosure. 


Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAHMINA ANSARI whose telephone number is 571-270-3379.  The examiner can normally be reached on IFP Flex - Monday through Friday 9 to 5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SUMATI LEFKOWITZ, can be reached on 571-272-3638. The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications. TC 2600’s customer service number is 571-272-2600.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist whose telephone number is 571-272-2600.

/Tahmina Ansari/

June 30, 2022

/TAHMINA N ANSARI/ Primary Examiner, Art Unit 2662