DETAILED ACTION
This action is in response to the amendments filed on June 7th, 2021. A summary of this action:
Claims 2-6, 8-12, 14-21 have been presented for examination.
Claims 2, 14, and 19-21 have been amended.
Claims 2-5, 8-11, 14-17, 19-21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shibuya et al., US 2012/0290879 in view of Maeda et al., US 2012/0041575 and in further view of Zhang et al., “KRNN: k Rare-class Nearest Neighbour classification”, 2016
Claims 6, 12, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shibuya et al., US 2012/0290879 in view of Maeda et al., US 2012/0041575 and in further view of Zhang et al., “KRNN: k Rare-class Nearest Neighbour classification”, 2016 in further view of Skand, “kNN(k-Nearest Neighbour) Algorithm in R”, 2017
This action is Final

	Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment/Arguments
Regarding the claim objections
	In light of the applicant’s amendments, the objections to the claims are withdrawn. 

Regarding the § 112(a) Rejection
	In light of the applicant’s amendments to the claims, the § 112(a) rejection is withdrawn. 
	For clarity of record, the present amendments draw support from the original claims as filed on Feb. 27th, 2018. 

Regarding the objection to the specification 
	The objection is withdrawn – the newly amended ¶ 43 draws support from ¶ 42 of “The contribution may be further refined by determining the relative density of the label in the neighborhood to the density of the label in the model overall 616, determining a contribution of the signal to the condition represented by the labeled point 618, and sorting the signals by contribution 622”, i.e. ¶ 43 is a rephrasing of the recitation in ¶ 42. 

Regarding the § 103 Rejection
	The rejection is maintained. 
	The applicant submits (Remarks, page 17, emphasis by applicant):
	Nothing in the cited portion of Shibuya discusses breaking a feature space into feature subspaces or having means and distributions of the learning data as feature values, as alluded to in the Office Action, but it does not matter. 
Even assuming that an "event signal" could indicate a condition or a label, nothing in these cited portion of Shibuya discloses "receiving a labeled point comprising, for a range of time, a first time series from each of the plurality of signals, each first time series being associated with at least one feature vector, each of the plurality of signals having a corresponding signal projection subspace" to which the labeled point and thus the at least one feature vector is to be projected, as recited in Claim 19.

	See the rejection – at least page 24, wherein ¶ 88 is cited – the feature space is generated by extracting the features from the sensor signals. Each sensor signal is transformed/extracted to a “feature vector... x the number of sensor”. 	
See the rejection – clearly, a feature vector for each signal in the series of signals is part of this, wherein the “feature vector” term used by Shibuya is a vector of vectors, i.e. see above, as it has a “window width” and “number of sensors”.
 	In regards to the feature subspaces – this is merely a matter of a reasonable interpretation of the claims in light of the specification.  
	A feature space comprises feature subspaces – mathematically, a subspace is merely a subspace of a space. 
	As used in the claims, the “subspace” is merely a subspace in the feature space for each signal, i.e. each signal has a subspace, wherein the plurality of signals has a feature space. The subspaces as recited in the claims merely convey that they are subspaces of a feature space associated with the plurality of signals. 
	Obviously, this encompasses Shibuya as relied upon – see the rejection for clarification. Clearly, Shibuya’s feature space includes various subspaces, one for each of “the number of sensor”, i.e. for 5 sensors there are 5 subspaces, and these subspaces are part of the joint feature space, as stated in the rejection.
	In regards to the “alluding” argument – see the rejection, specifically see the citations to Shibuya.
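	For illustration only, the interpretation above (one subspace per sensor within a joint feature space) can be sketched in Python. The sketch assumes a joint feature vector stored sensor by sensor, each sensor contributing a block whose length equals the window width. The storage layout, the function name split_into_subspaces, and the sample values are assumptions made for this example; they are not taken from Shibuya or from the claims.

```python
from typing import Dict, List


def split_into_subspaces(joint_vector: List[float],
                         sensor_names: List[str],
                         window_width: int) -> Dict[str, List[float]]:
    """Split a joint feature vector into one block of coordinates per sensor.

    Assumed (hypothetical) layout: the joint vector has
    window_width * number_of_sensors components, stored sensor by sensor.
    """
    expected = window_width * len(sensor_names)
    if len(joint_vector) != expected:
        raise ValueError(f"expected {expected} components, got {len(joint_vector)}")
    return {
        name: joint_vector[i * window_width:(i + 1) * window_width]
        for i, name in enumerate(sensor_names)
    }


# Two hypothetical sensors, window width 3, so a 6-component joint vector.
joint = [1.0, 1.1, 1.2, 20.0, 20.5, 21.0]
subspaces = split_into_subspaces(joint, ["temperature", "pressure"], 3)
```

	Under these assumptions each sensor's block of coordinates is a subspace of the joint feature space, and the number of blocks equals the number of sensors.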

The applicant further submits (Remarks, page 18):
It is unclear to Applicant what "newly received labelled point" referred to in the Office Action is. It may mean the signal data to be evaluated for anomaly against the "normal models". Regardless, nothing in these cited portion of Shibuya discloses projecting whatever is considered to correspond to a "condition point" or whatever is considered to correspond to a "labeled point" in Shibuya into "each signal projection subspace of the plurality of signal projection subspaces", each corresponding to a signal generated by one or more sensors for the same "range of time" associated with the "condition point" or "labeled point", as recited in Claim 19...

The applicant’s arguments and supporting amendments have been fully considered and are not persuasive. 
As per the rejection, page 24: “the system of Shibuya is obviously receiving a continuous stream of data segmented into “modes”, i.e. this is for anomaly detection, it would have been obvious that a newly received “mode”, e.g. “start”, and the associated signal data/feature vectors would have been encompassed by the labelled point, i.e. the “mode” “start” obviously is a label which describes a condition of the machine system starting, wherein this “mode” comprises the time series data from the signals for that particular “mode”, and wherein the system extracts feature vectors for that “mode””.
To further clarify this: the data is being received continuously and divided into blocks according to the “modes”. As time progresses, there are numerous blocks of the sensor data for each mode, i.e. see figure 2C – there are at least 2 blocks of the sensor data for the “Normally off” and “normally on” modes, as well as the “start” mode. Shibuya, as relied upon, is a system which accumulates the data over time.

But again, this is a system that is continuously receiving this data. 
As such, newly received data for a mode is an example of a labelled point – the previously received data, to which the newly received data is compared, forms the model condition points, i.e. for each mode/label there is a collection of “learning data” which is previously received sensor data and this data is used to evaluate newly received data. 
See figures 2C, 4, and 9A – 9B for more clarification as cited in the rejection (at least on pages 23-24). 
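For illustration only, the division of a continuously received signal into mode-labelled blocks, as described above, can be sketched in Python. The sketch assumes that each event carries a start time and a mode name and that a mode lasts until the next event. The function name divide_by_mode, the data layout, and the sample values are hypothetical and are not drawn from Shibuya.

```python
from collections import defaultdict
from typing import List, Tuple


def divide_by_mode(samples: List[Tuple[int, float]],
                   events: List[Tuple[int, str]]) -> List[Tuple[str, List[float]]]:
    """Cut a time-stamped signal into consecutive blocks, one per mode interval.

    `events` is a time-sorted list of (start_time, mode); each mode is assumed
    to last until the next event begins.
    """
    blocks = []
    for idx, (start, mode) in enumerate(events):
        end = events[idx + 1][0] if idx + 1 < len(events) else float("inf")
        values = [value for t, value in samples if start <= t < end]
        blocks.append((mode, values))
    return blocks


samples = [(t, float(t)) for t in range(8)]
events = [(0, "normal OFF"), (2, "start"), (4, "normal ON"), (6, "start")]
blocks = divide_by_mode(samples, events)

# Earlier blocks play the role of accumulated learning data for each mode;
# the last block plays the role of the newly received, labelled data.
*history, newest = blocks
learning = defaultdict(list)
for mode, values in history:
    learning[mode].append(values)
```

In this sketch the most recent “start” block is compared against the earlier “start” blocks held in learning, which mirrors the relationship between newly received data and previously received data described above.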

The applicant submits (Remarks, page 18):
Fig. 5 of the present application might discuss projection with respect to a feature axis, but that is further projection after the projection of a "condition point" or "labeled point" into a "signal projection subspace", as recited in Claim 19. Any projection into a feature space that might be discussed in the cited portion of Shibuya has nothing to do with "projecting each of a plurality of model condition points [or labeled point] into each signal projection subspace of the plurality of signal projection subspaces", as recited in Claim 19.

The applicant’s arguments and supporting amendments have been fully considered and are not persuasive. 
	See the applicant’s remarks, page 14, from August 6th, 2020: Fig. 5 of the application, for example, illustrates that the projection of a "time series" for a signal of a "labeled point" 502 or a "modeled condition point" within 504 ( either point having a time series for each of a plurality 
	See the applicant’s remarks, page 12, from Feb. 3rd, 2020: “Projecting the labeled point [comprising ... a time series from each of a set plurality of signals] into each signal projection subspace" means no more than retrieving the time series of the "labeled point" corresponding to each of the plurality of signals (because no other time series of the labeled point corresponds to the signal) and projecting the time series to a "subspace" for the signal. It is unrelated to training any classifier system. Fig. 5, for example, illustrates how each data point (or specifically the time series of feature vectors from a signal) is projected into a one- or multi-dimensional subspace for the signal where each axis of the subspace represents one of the "signal features" of the signal.”
	The prior art relied upon, as taken in combination, teaches the presently claimed invention. The applicant’s argument is moot – it is an unreasonable interpretation of their own claims and specification, as evidenced by the previous arguments submitted in regards to figure 5. 

The applicant further submits (Remarks, page 19):
Nothing in these cited portion of Shibuya discusses breaking each of a plurality of time series associated with a condition associated with a "range of time" into individual signals and determine which signals and thus sensors contribute more to a particular condition. Therefore, the cited portion of Shibuya does not disclose "calculating a contribution of each of the plurality of signals to the first condition to form a sorted list of signals and contributions", as recited in Claim 19.

The applicant’s arguments and supporting amendments have been fully considered and are not persuasive. 
	See the rejection – the prior art as relied upon teaches this.
	As per the rejection: 
Shibuya, ¶ 117 “Further, since it is considered that a signal having a large deviation [contribution to the first condition/label] when the anomaly occurs contributes to the anomaly judgment, when the signals are displayed in the order of the large deviation from the top, it is easily verified which sensor signal has the anomaly. In addition, when a past case of the cause event is displayed in the same manner as the presented result event, it is easy to accept the same phenomenon to trust the advance notice of the result event.”, wherein this is used to detect an anomaly occurring in the received “event” (¶ 120) wherein the “event”, as per at least figure 19 as cited above, is the mode label from the event signal – in other words the system determines for each newly received mode/“cause event” [including the first label] the deviation of each signal compared to the “normal” signal/model(s) and sorts the signals into an ordered list by the “order of large deviation from the top” to determine “which sensor signal has the anomaly” [which sensor signal contributes the most to the first label/cause event] – to clarify, this obviously forms a sorted list of signals based on the contribution [e.g., the “deviation”] of the signal to the event label/mode of the received data, wherein the deviation is a measure of how much the present labelled point/associated time series/associated feature vectors vary/deviate from the normal case, i.e. this shows how much each signal contributes to the first condition/label.
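For illustration only, the ordering of signals “in the order of the large deviation from the top” can be sketched in Python. The sketch assumes that the deviation of a signal is the absolute difference between an observed value and a single normal value for that signal. That measure, the function name rank_signals_by_deviation, and the sample values are assumptions made for this example; the deviation actually computed in Shibuya may differ.

```python
from typing import Dict, List, Tuple


def rank_signals_by_deviation(observed: Dict[str, float],
                              normal: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (signal, deviation) pairs sorted with the largest deviation first.

    Deviation is taken here as |observed - normal|, a stand-in measure
    assumed for this sketch.
    """
    deviations = {name: abs(observed[name] - normal[name]) for name in observed}
    return sorted(deviations.items(), key=lambda item: item[1], reverse=True)


# Hypothetical observed and normal values for three signals.
observed = {"temperature": 80.0, "pressure": 1.5, "flow": 10.0}
normal = {"temperature": 70.0, "pressure": 1.0, "flow": 10.0}
ranking = rank_signals_by_deviation(observed, normal)
```

The first entry of ranking is the signal that deviates most from the normal case, which corresponds to the signal treated above as contributing most to the condition.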

The applicant further submits (Remarks, page 19):
Just like Shibuya, Maeda does not consider any "signal projection subspace", as recited in Claim 19, but at most a feature space. In addition, nothing in the cited portion of Maeda discusses an analysis neighborhood for the new (sensor) signal data that has been considered to correspond to the "labeled point in the Office Action ( or for anything else in Maeda that could be considered to correspond to the "labeled point") to be evaluated for an anomaly, as recited in Claim 19.
The applicant’s arguments and supporting amendments have been fully considered and are not persuasive. 
See above in regards to the “subspace” – subspace is merely a subspace of a feature space, i.e. the subspace for a sensor in a feature space which comprises all of the sensors.
The prior art as relied upon teaches this claimed feature – clearly there is a feature space in the prior art, and clearly this feature space comprises subspaces for each of the sensors, i.e. it is by the “number of sensor” (Shibuya, ¶ 88). 
In other words, for each sensor there is a subspace, the aggregate data for all of the sensors forms the feature space, as taught by the prior art. 
In regards to the analysis neighborhood – obviously, this limitation encompasses the use of a nearest neighbors method, e.g. “k-NN” or a similar method, wherein at least the kNN method uses “distance with a feature space” (see the rejection, pages 34-35). 
And as per the abstract of Maeda – this is being used for identifying an abnormal “sensor signal”, e.g. see figure 4 of Maeda which shows the “time series data” for the “gas flow rate” and draws circles on the “anomaly” and other portions of the data – clearly, it would have been obvious in light of Maeda and Shibuya to use the kNN classifier in each signal subspace, as it is trying to identify the “anomaly”, and other conditions [see the citations], for each signal in each sensor’s data, after the “Feature extraction”. 

    [Figure: media_image1.png]


The applicant further submits (Remarks, page 20):
The Abstract and Section 5.2 of Zhang cited in the Office Action describes merely taking into account the local versus global class imbalance levels by considering the (A) "positive odds ratio in the local neighborhood" and the (B) "positive odds in the global population". The cited portion of Zhang does not disclose "for each signal projection subspace of a plurality of signal project subspaces, calculating a first ratio" ... and "for each signal projection subspace, calculating a second ratio", as recited in Claim 19.
The applicant’s arguments and supporting amendments have been fully considered and are not persuasive. 
In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
	Zhang is relied upon for “a modification to the kNN’s probability estimation to account for ‘rare classes’” (rejection, page 38). 
	In other words, Zhang is modifying the kNN of Maeda as used in each signal subspace.
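	For illustration only, the two quantities recited in the claims (the share of same-label points within the analysis neighborhood, and the share of same-label points in the model as a whole) can be sketched in Python for a single signal subspace. Dividing the first by the second follows the “relative density” wording of ¶ 42 of the specification quoted above. The sketch is not Zhang's probability estimate; the function names and sample labels are hypothetical.

```python
from typing import List


def label_fraction(labels: List[str], target: str) -> float:
    """Fraction of entries in `labels` that equal `target`."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for label in labels if label == target) / len(labels)


def relative_density(neighborhood_labels: List[str],
                     model_labels: List[str],
                     target: str) -> float:
    """Local same-label fraction divided by the global same-label fraction."""
    global_fraction = label_fraction(model_labels, target)
    if global_fraction == 0.0:
        raise ValueError("target label does not occur in the model")
    return label_fraction(neighborhood_labels, target) / global_fraction


# Hypothetical labels: 2 of 10 model points and 2 of 4 neighbors are "anomaly".
model_labels = ["anomaly"] * 2 + ["normal"] * 8
neighborhood_labels = ["anomaly", "anomaly", "normal", "normal"]
first_fraction = label_fraction(neighborhood_labels, "anomaly")
second_fraction = label_fraction(model_labels, "anomaly")
density = relative_density(neighborhood_labels, model_labels, "anomaly")
```

	A value of density above 1 indicates that the label is more common near the labelled point than in the model overall, which is the kind of local versus global comparison at issue for a rare class.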
	
The applicant further submits (Remarks, page 21):
However, Shibuya clearly discusses supervised learning (apparently referred to as "specifically-oriented learning" in Zhang), applied to determine a label for new data based on existing, trained "normal models", while Zhang clearly discusses unsupervised learning (apparently referred to as "generally-oriented learning"), where the given data is directly classified or labeled. The Office Action provides no detail on how Shibuya could benefit from the Zhang technique and thus no motivation on why the two references should be combined.

	The applicant’s arguments and supporting amendments have been fully considered and are not persuasive. 
	As per the rejection: “The motivation to combine would have been that the technique of Zhang would have biased the classification towards the rare class, e.g. an anomaly/outlying class, and as such Zhang would have improved the system’s ability to determine if a newly received set of time series from signals was anomalous [i.e., a rare class, it is abnormal], i.e. 
	The applicant’s arguments regarding the supervised/unsupervised learning are not persuasive. These terms are not used by Zhang, nor is the applicant’s inference regarding the differences between “generality-oriented” and “specificity-oriented” reasonable. 
	These terms are clearly described by Zhang, and not in the manner inferred by the applicant.
	In addition, as per Zhang, page 33, col. 2, ¶ 2 “KNN is a computationally simple specificity-oriented algorithm.”
	Clearly, the applicant’s argument that “Zhang clearly discusses unsupervised learning (apparently referred to as "generally-oriented learning"),” has ignored this portion of Zhang in its entirety. Instead, the applicant’s arguments draw from the applicant’s own inferences from Zhang on what these terms mean, and then from their own inferences draw a conclusion that is contrary to the explicit recitation in Zhang. 
	For more evidence of this – see the abstract of Zhang: “We research local strategies for the specificity-oriented learning algorithms like the k Nearest Neighbour (KNN) to address the within-class imbalance issue of positive data sparsity.”

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-5, 8-11, 14-17, 19-21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shibuya et al., US 2012/0290879 in view of Maeda et al., US 2012/0041575 and in further view of Zhang et al., “KRNN: k Rare-class Nearest Neighbour classification”, 2016

Regarding Claim 19.
Shibuya teaches: 
	A control method for a machine system comprising sensors generating a plurality of signals over time, the method comprising: (Shibuya, abstract, teaches an anomaly detection system for a “facility”, e.g. see ¶ 2-3 – this is for facilities that include machine systems such as a “windmill”, a “nuclear reactor”, etc. – then see figure 9A which shows an example embodiment in which 4 signals are received wherein ¶ 105 teaches these signals are “sensor signal”  and see figure 1 – this is to “diagnose” an anomaly in a machine system based on signals received from sensors)

    [Figure: media_image2.png]

	receiving a labeled point comprising, for a range of time, a first time series from each of the plurality of signals, each first time series being associated with at least one feature vector , each feature vector having components corresponding to one or more signal features, the labeled point having a first label describing a first condition of a plurality of conditions of the machine system, each of the plurality of signals having a corresponding signal projection subspace; (Shibuya, see figure 19 and ¶ 169 – 170 – this is a system which combines both the “first embodiment” and the “second embodiment” wherein sensor signal(s) are received from a facility along with an “event signal” wherein the “event signal” is used for a “mode dividing unit”, i.e. the signals from each sensor are divided into a plurality of time series due to the “mode” – see ¶174 “Meanwhile, the mode dividing unit 1908 performs mode dividing of to clarify – the system receives a plurality of sensor signals over a period of time, segments the signals into time series for each mode [each label], and extracts feature vectors for each cycle/mode, i.e. the system receives from this process a labelled point for each mode comprising time series from each signal and the associated feature vectors for each time series for each mode, e.g. for the mode “start” the system uses the label of this mode to divide/segment the received sensors, and for the segment of time series received during the “start” mode the system performs a feature extraction into a feature space, i.e. “data is extracted....every period” – see figures 9A and 9B for the plurality of signals and associated features
for more clarification also see ¶ 72 teaches “The event signal 103 is a signal indicating an operation, a failure, or a warning of the facility which is output irregularly and is constituted by a character string indicating the time and the operation, the failure, or the warning.”, e.g. a “normal OFF”, a “start”, a “normal ON”, and a “stop”, and see ¶ 82 “An example of the sensor 
in regards to claim interpretation – the system of Shibuya is obviously receiving a continuous stream of data segmented into “modes”, i.e. this is for anomaly detection, it would have been obvious that a newly received “mode”, e.g. “start”, and the associated signal data/feature vectors would have been encompassed by the labelled point, i.e. the “mode” “start” obviously is a label which describes a condition of the machine system starting, wherein this “mode” comprises the time series data from the signals for that particular “mode”, and wherein the system extracts feature vectors for that “mode” 
to clarify and in regards to the feature space– see ¶ 88 “The feature amount extraction is considered using the sensor signal as it is. A window of ±1, ±2, etc., is set with respect to a predetermined time and a feature indicating a time variation of data may be extracted by a feature vector of a window width (3, 5, etc.,) x the number of sensor”, in other words a feature vector for all the signals for a “window” of time [e.g., the operating cycle such as “start”] is extracted wherein the feature vector comprises a separate feature vector for each of the “number of sensor” [e.g. see figure 9B] wherein each sensor has its own feature space [as it is obviously a feature vector that was extracted forming a feature space for each vector, also see figure 13 for an example], in other words there is a joint feature space divided into feature subspaces for each of “the number of sensors” wherein for each sensor there is a feature space for the sensor with at least one associated feature vector, also see ¶ 143 and figure 13 for an example – a “feature” space is created for each signal, e.g. the “daily mean” of the signal as one feature and the “Daily distribution” of the signal, in other words “the mean and the 

    [Figure: media_image3.png]


    [Figure: media_image4.png]


	projecting each of a plurality of model condition points into each signal projection subspace of the plurality of signal projection subspaces, each model condition point comprising a second time series from each of the plurality of signals, each second time series being associated with at least one feature vector , each feature vector having components corresponding to one or more signal features, the model condition point having a second label describing one of the plurality of conditions of the machine system, the projecting comprising evaluating at least one feature vector associated with the second time series from each of the plurality of signals in the corresponding signal projection subspace; (Shibuya, see figure 19 as cited above and as detailed above, specifically see the # 1903 for the “Learning-Data selecting unit” for the model creation – this selects the “learning data” wherein ¶ 173 teaches “The sensor signal 102 output from the facility 101 is accumulated for learning in advance. The feature amount extraction unit 1901 inputs the accumulated sensor signal 102 and performs feature amount extraction to acquire the feature vector. The feature-selection unit 1902 performs data check of the feature vector output from the feature amount extraction unit 1901 and selects a feature to be used. The learning data selecting unit 1903 performs data check of the feature vector configured by the selected feature and check of the event signal 103 and selects the learning data used to create the normal model”, in other words the system accumulates sensor(s) signal data over a period of time for learning, and uses these plurality of time series as model condition points for creating a “normal model” – to clarify see figure 5 and ¶ 83-86 – the “normal model” is created by using the accumulated signals from the same sensors “during a predetermined period” and dividing these signals for the “mode”, i.e. 
¶ 91 “the normal-model creation unit 106 classifies the learning data selected in step S503 for each of the modes divided by the mode dividing unit 104 and creates the normal model for each mode in step S505.”, and then see ¶ 179 “When the normal model is created for each mode, the anomaly measurement is computed by using the normal models of all the modes and the minimum value is acquired.” and ¶ 161-163 which teach creating “plural normal models” [model condition points] for each “cycle”/mode by “random sampling” for “several cycles” and see ¶ 178 “The feature amount extraction unit 1901 inputs the sensor signal 102 and performs the same feature amount extraction as that at the learning time to acquire the feature vector”, in other words the system obtains model condition points [plural normal models] for each mode [points associated with labels for a plurality of conditions, e.g. start/stop/on/off] wherein each model condition point comprises time series from each of the sensors [random sampling over several cycles for accumulating sensor signal data] wherein each of these time series has at least one feature vector [using the “same feature...extraction”] as the labelled point and wherein these model condition points are then used for classifying (fig. 5, “classify data for each mode”) using the feature vectors – in regards to this being in the signal projection subspace see above – the system is using the same feature extraction method for both the labelled input data and the learning data, so obviously these are in the same signal projection feature spaces, e.g. 
figure 13 shows an example – obviously as figure 13 is an example comprising a plurality of points for “every one day” (¶ 143) this shows not just the feature space/feature vectors for the most recent data [the labelled point for the mode] but also shows the learning data that was previously accumulated, and ¶ 143 then clarifies, as cited above, that the period/cycle for this would also obviously be by mode, e.g. “when the starting/stopping time are known” [starting/stopping modes], to clarify – the system extracts features, e.g. the mean for each mode over several cycles of each mode, to form a feature space for each signal, i.e. each signal has a feature space which comprises the received data for each mode, including the labelled point [e.g., the point being the data for the most recent “start” cycle] wherein the system then projects the “learning data” into the same feature space for each signal as the system stores the “learning data” and extracts the feature vectors using the “same” algorithm as for the labelled point, e.g. see figure 13, also see ¶ 88 as cited above, and also see figure 6 and ¶ 93 which provides an example of a “3D feature space” and clarifies that “the dimension of the feature space may be...higher”, i.e. the feature space for each signal, in figure 6 the “evaluation data” is the projected feature vectors from “the number of the learning data” (see ¶ 93), and in regards to evaluating the at least one feature vector in the subspace see ¶ 93-99 which provides an example of evaluating the feature vectors using a “local sub-space classifier” which creates a subspace in the feature space, e.g., for a 3D space it creates a 2D plane to evaluate the “distance between the evaluation data [learning data] and the point b [the labelled point]” for the normal model, e.g. 
this evaluation is using the “mean...and covariance matrix...of the learning data” for the feature space, also see ¶ 99 for other methods of creating “the normal model” in the feature space such as “a nearest method” or a “similarity base model” )	

    [Figure: media_image5.png]

	projecting the labeled point into each signal projection subspace, comprising evaluating the at least one feature vector associated with the first time series from each of the plurality of signals in the corresponding signal projection subspace; (Shibuya, as cited above, e.g. see figures 5 and 13, also see figure 17 and ¶ 158 – the “features” from both the “learning data” and the newly received labelled point [i.e. a plurality of time series labelled with a model from the facility sensors] are projected into the same feature subspace for each signal wherein ¶ 88 as cited above provides an example of feature extraction for forming a feature vector for each signal, wherein these feature vectors for each signal are then turned into “a feature vector of a window width (3, 5, etc.,) x the number of sensors.”, i.e. there is a feature vector for the in other words the system evaluates the feature vectors of both the model condition points/learning data and the labelled point, e.g. an “anomaly” to find a distance between the feature vectors to determine the “deviation” of each signal)

    [Figure: media_image6.png]

	[...]
calculating a contribution of each of the plurality of signals to the first condition..., to form a sorted list of signals and contributions; (Shibuya, ¶ 117 “Further, since it is considered that a signal having a large deviation [contribution to the first condition/label] when the anomaly occurs contributes to the anomaly judgment, when the signals are displayed in the order of the large deviation from the top, it is easily verified which sensor signal has the  wherein this is used to detect an anomaly occurring in the received “event” (¶ 120) wherein the “event”, as per at least figure 19 as cited above, is the mode label from the event signal – in other words the system determines for each newly received mode/“cause event” [including the first label] the deviation of each signal compared to the “normal” signal/model(s) and sorts the signals into an ordered list by the “order of large deviation from the top” to determine “which sensor signal has the anomaly” [which sensor signal contributes the most to the first label/cause event] – to clarify, this obviously forms a sorted list of signals based on the contribution [e.g., the “deviation”] of the signal to the event label/mode of the received data, wherein the deviation is a measure of how much the present labelled point/associated time series/associated feature vectors vary/deviate from the normal case, i.e. this shows how much each signal contributes to the first condition/label)

Shibuya does not explicitly teach:
	constraining an analysis neighborhood for the labeled point to a set number of model condition points closest to the labeled point as projected in each signal projection subspace;
	for each signal projection subspace, calculating a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood
	for each signal projection subspace, calculating a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;
	from the first percent and the second percent calculated for the corresponding signal projection subspace
	and adapting the machine system's behavior based on the sorted list. 

Maeda teaches: 
constraining an analysis neighborhood for the labeled point to a set number of model condition points closest to the labeled point as projected in each signal projection subspace; (As an initial matter, the inventors for both Shibuya as relied upon above and Maeda are the same, and then see Maeda the abstract – this is for identifying an abnormal “sensor signal” and see figures 3-4, Maeda is for a related invention to the Shibuya invention, then see figure 11 and the description starting in ¶ 86 – Maeda is using a method similar to a “k-NN method”, i.e. the “k” [a set number of model condition points] which are the “nearest neighbors” wherein kNN uses a “distance within a feature space” wherein ¶ 87 then teaches that this is to find the “k pieces of data with highest similarities to the unit for deciding normal range” and then see ¶ 88 and figure 13 – the system is finding “a number k” of the nearest/most similar signal data [based on “distance within a feature space”] to a newly “observed sensor signal” and based on the k-nearest neighbors of the sensor signal determines if the “observed sensor signal” is “an anomaly” and provides calculation of “a deviance of observation data”, in other words it would have been obvious that, as taken in combination with Shibuya as relied upon above, the system would have used a kNN algorithm, or a “similar method”, to constrain an analysis neighborhood for each signal/each signal’s subspace to a set number “k” of model condition points that are closest in “distance within [the] feature space” [and obviously, the subspace for the sensor/signal] in order to determine the “deviance” of a “observation data” [a labelled point] from the normal points, i.e. this detects if a newly received labelled point comprising signal data for an event period, e.g. “starting” [see Shibuya] is “an anomaly” by checking the deviation for the new data to the k nearest neighbors of that data from the “learning data” “within [the] feature space” for the signal [the feature subspace for the particular signal])
	and adapting the machine system's behavior based on the sorted list. (Maeda, as cited above, and as taken in combination with Shibuya teaches detecting an anomalous event and forming a list of signals wherein the list is sorted by which signals have the most deviation at the top – then see Maeda, ¶ 171 – the system is used for “condition-based maintenance” wherein “parts replacement is performed [the system is adjusted] in accordance with the conditions of the device” wherein this is based on the “normal and anomalous data” of the devices as “it is important [for condition-based maintenance] to detect outliers from normal data”, it would have been obvious to use the sorted list of signals and their deviations/contributions to adapt the machine system, e.g. by implementing a parts replacement to correct for anomalous data)

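The motivation to combine would have been that 1) Maeda provides a computationally efficient means of anomaly detection using kNN wherein Maeda’s technique also provides “an anomaly explanation message” (Maeda, ¶ 86) in addition to the “deviance” and 2) Shibuya, ¶ 99 suggests using a “nearest method” technique [e.g., kNN], and 3) for “condition-based maintenance..., in many cases, anomalous data is rarely collected and the bigger the facility, the more difficult it is to collect anomalous data. Therefore, it is important to detect outliers from normal data.” (Maeda, ¶ 171)
In addition, both references are by the same inventors in a similar time period, i.e. it would have been obvious that these references were describing different aspects of the same overall system. 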
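
For clarity of record, the neighborhood constraint discussed above may be illustrated with a minimal sketch. The sketch is not taken from Shibuya, Maeda, or the instant application; the function name `k_nearest`, the dictionary layout of the model condition points, the use of Euclidean distance, and the sample values are all assumptions made only for illustration of a generic k-nearest-neighbor selection within one signal's feature subspace.

```python
import math

def k_nearest(labeled_point, model_points, k):
    """Return the k model condition points closest to labeled_point.

    Distance is plain Euclidean distance in the (assumed) per-signal
    feature subspace; the references do not mandate this metric.
    """
    ranked = sorted(
        model_points,
        key=lambda p: math.dist(p["features"], labeled_point),
    )
    return ranked[:k]

# Hypothetical model condition points for one signal: (mean, spread) features
# with the mode label under which the data was recorded.
model_points = [
    {"features": (0.0, 0.0), "label": "start"},
    {"features": (1.0, 1.0), "label": "stop"},
    {"features": (0.2, 0.1), "label": "start"},
    {"features": (5.0, 5.0), "label": "stop"},
]

# Analysis neighborhood constrained to the k = 2 closest points.
neighborhood = k_nearest((0.1, 0.1), model_points, k=2)
```

In this sketch the neighborhood is recomputed per signal subspace, so the same labeled point may have a different set of neighbors for each signal.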

Shibuya, as taken in combination with Maeda does not explicitly teach: 
for each signal projection subspace, calculating a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood
	for each signal projection subspace, calculating a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;
	from the first percent and the second percent calculated for the corresponding signal projection subspace


Zhang teaches:
for each signal projection subspace, calculating a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood
 (Zhang, abstract, teaches a modification to a KNN “to address the within-class imbalance” in order to “bias classification towards the rare class” and then see § 5.2 which teaches that “A strategy is desired to compute the true positive propensity for these regions so as to distinguish and effectively rank the corresponding query instances. To this end, if the local positive interval for a query instance t (Eq. (2)) is higher than the global positive interval (Eq. (1)), intuitively it indicates that t has higher posterior positive probability than the positive prior based on the observed positive frequency in the training population...The positive posterior probability estimation for t in Eq. (3) therefore should be adjusted. Let λ denote the positive odds (P:N) in the query neighbourhood over the positive odds in the global population. As an example, if P:N = 1:5 in the query region [first percent, this is the percent in the neighborhood – the recitation of percent is an obvious variation of the ratio] and P:N=1:10  [second percent, this is the percent in the total population] in the global training population, then λ = 2....” wherein “λ takes into account the local versus global class imbalance levels and the positive odds ratio in the local neighbourhood versus the global population indicates the positive propensity for the query instance...”, and then see table 1 which also calculates the “Pos frequency” as a variation to P:N, the first ratio is merely the frequency “in the query region”, e.g. 1/6)
	for each signal projection subspace, calculating a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;  (Zhang, § 5.2 as cited above, “P:N=1:10 in the global training population” and then see table 1 which also calculates the “Pos frequency” as a variation to P:N, the second ratio is merely the frequency “in the global population region”, e.g. 1/11)
	from the first percent and the second percent calculated for the corresponding signal projection subspace (Zhang, the λ above is the ratio of the first to the second ratio, e.g. (1/5)/(1/10) = “2”, wherein this forms the probability of the labelled point being in the neighborhood/”query region” adjusted for the class imbalance, wherein this adjusts the probability of the prediction)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Shibuya, as modified above, on a system which uses a kNN for finding anomalous signals  with the teachings from Zhang on a modification to the kNN’s probability estimation to account for “rare classes”. The motivation to combine would have been that the technique of Zhang would have biased the classification towards the rare class, e.g. an anomaly/outlying class, and as such Zhang would have improved the system’s ability to determine if a newly received set of time series from signals was anomalous [i.e., a rare class, it is abnormal], i.e. “These strategies more accurately characterise the rare-class distribution for accurate classification.” (Zhang, §8)
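
For clarity of record, the arithmetic in the Zhang example cited above may be checked with a minimal sketch. The function names `odds` and `odds_ratio` are hypothetical and are not from Zhang; only the example counts (P:N = 1:5 in the query region, P:N = 1:10 in the global training population) and the resulting frequencies are taken from the passages cited above.

```python
from fractions import Fraction

def odds(positives, negatives):
    """Positive odds P:N expressed as an exact fraction."""
    return Fraction(positives, negatives)

def odds_ratio(local_pos, local_neg, global_pos, global_neg):
    """Local positive odds divided by global positive odds (the cited lambda)."""
    return odds(local_pos, local_neg) / odds(global_pos, global_neg)

# Example from the cited section: 1:5 locally, 1:10 globally.
lam = odds_ratio(1, 5, 1, 10)

# The same counts expressed as frequencies (positives out of all points),
# which is the form closest to the claimed "percent".
local_frequency = Fraction(1, 1 + 5)
global_frequency = Fraction(1, 1 + 10)
```

Here `lam` evaluates to 2, and the two frequencies evaluate to 1/6 and 1/11, matching the values relied upon in the mapping above.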
	

Regarding Claim 2
Zhang teaches:
The control method of claim 19, wherein calculating the contribution of each the signal to the first condition comprises:
	if the first percent is less than or equal to the second percent, setting the contribution of the signal to (first percent /second percent ) - 1; (Zhang, as cited above, renders this obvious, i.e. this is “λ” -1 which is an obvious variant, and in terms of using λ as the contribution rate this also would have been obvious, this is the odds that the model condition points in the neighborhood have the same label as the labelled point, divided by the odds that the model condition points in the model have the same label as the labeled point)
	and otherwise, setting the contribution of the signal to (first percent  - second percent )/(1 - second percent ). (This is also an obvious variant from Zhang using a simple rearrangement of λ, and additionally this limitation is contingent)
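
For clarity of record, the two-branch calculation recited in claim 2 may be written out as a minimal sketch. The function name `contribution` is hypothetical, and the sketch assumes that both percents are expressed as fractions strictly between 0 and 1 (an assumption suggested by the recited term "1 - second percent"); it illustrates the claim language only and is not asserted to be code from any cited reference.

```python
def contribution(first_percent, second_percent):
    """Contribution of one signal per the two recited branches.

    first_percent:  share of same-label points in the analysis neighborhood.
    second_percent: share of same-label points in the whole model.
    Both are assumed to be fractions strictly between 0 and 1.
    """
    if first_percent <= second_percent:
        # Label is no denser locally than globally: value in [-1, 0].
        return first_percent / second_percent - 1
    # Label is denser locally than globally: value in (0, 1].
    return (first_percent - second_percent) / (1 - second_percent)

under_represented = contribution(0.25, 0.5)
over_represented = contribution(0.75, 0.5)
```

Under these assumptions the first branch is the ratio of the two percents shifted by one, so equal percents give a contribution of zero and both branches meet at that point.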

Regarding Claim 3
Maeda teaches: 
	The control method of claim 19, wherein constraining the analysis neighborhood for the labeled point to the set number of model condition points closest to the labeled point in the signal projection subspace comprises sorting the model condition points by a distance from the labeled point and limiting the analysis neighborhood to a number of the model condition points having the closest distance. (Maeda, ¶ 86-90, as cited above, this uses a kNN algorithm, or a similar algorithm, to select the k nearest points by “distance” such as for when k = 5 this is “the five highest pieces”, in other words obviously a kNN algorithm constrains the neighborhood of the k nearest neighbors to the closest k neighbors based on “distance within a feature space” [e.g., the subspace] wherein the closest/nearest are sorted by distance [e.g. highest] from the labelled point) 

Regarding Claim 4
Maeda teaches: 
	The control method of claim 3, wherein the distance comprises a distance between a projection of the labeled point and projections of the model condition points on a signal feature axis in the signal projection subspace. (Maeda, ¶ 86 – “distance within a feature space” is used, obviously this includes the distance between the labelled point and the model condition points in each signals feature space, wherein the distance is on a signal feature axis [it’s a distance in a coordinate system, obviously this includes the distance on the feature axis])

Regarding Claim 5
Maeda teaches: 
	The control method of claim 19, wherein adapting the machine system's behavior comprises remediating a faulty component. (Maeda, ¶ 171 teaches using the system for determining “parts replacement” [example of remediating a faulty component])



Regarding Claim 20.
Shibuya teaches: 

A machine system monitoring and alert apparatus, comprising: (Shibuya, abstract, teaches an anomaly detection system for a “facility”, e.g. see ¶ 2-3 – this is for facilities that include machine systems such as a “windmill”, a “nuclear reactor”, etc. – then see figure 9A which shows an example embodiment in which 4 signals are received wherein ¶ 105 teaches these signals are “sensor signal”  and see figure 1 – this is to “diagnose” an anomaly in a machine system based on signals received from sensors)
	a computer system comprising a plurality of inputs to receive time series signals from a plurality of sensors; (Shibuya, as cited above)

the computer system comprising a processor and a memory adapted with instructions forming:  (Shibuya, as cited above)
	logic to receive a labeled point comprising, for a range of time, a first time series from each of the plurality of signals, each first time series being associated with at least one feature vector , each feature vector having components corresponding to one or more signal features, the labeled point having a first label describing a first condition of a plurality of conditions of the machine system, each of the plurality of signals having a corresponding signal projection subspace;(Shibuya, see figure 19 and ¶ 169 – 170 – this is a system which combines both the “first embodiment” and the “second embodiment” wherein sensor signal(s) are received from a facility along with an “event signal” wherein the “event signal” is used for a “mode dividing unit”, i.e. the signals from each sensor are divided into a plurality of time series due to the “mode” – see ¶174 “Meanwhile, the mode dividing unit 1908 performs mode dividing of dividing the time for each operating state based on the event signal 103” wherein ¶ 143 clarifies “As described above, when the operating cycle is regular, e.g., the operation starts and stops at the determined time of one day, data is extracted every fixed period, e.g., one day to compute the mean and the distribution. Although the period is not one day, the same applies thereto. When the operation starting/stopping time is known, data in a period which can be regarded as a normal operation is carried out may be extracted to compute the mean and the distribution [example of feature extraction] and this method may be applied even though the operating cycle is irregular.” and see ¶ 183 “In the learning data selection processing in the learning-data selection unit 1903, the method using the event signal is considered in addition to the same method as the example described by using FIG. 12.”, to clarify – the system receives a plurality of sensor signals over a period of time, segments the signals into time series for each mode [each label], and extracts feature vectors for each cycle/mode, i.e. the system receives from this process a labelled point for each mode comprising time series from each signal and the associated feature vectors for each time series for each mode, e.g. for the mode “start” the system uses the label of this mode to divide/segment the received sensors, and for the segment of time series received during the “start” mode the system performs a feature extraction into a feature space, i.e. “data is extracted....every period” – see figures 9A and 9B for the plurality of signals and associated features
for more clarification also see ¶ 72 teaches “The event signal 103 is a signal indicating an operation, a failure, or a warning of the facility which is output irregularly and is constituted by a character string indicating the time and the operation, the failure, or the warning.”, e.g. a “normal OFF”, a “start”, a “normal ON”, and a “stop”, and see ¶ 82 “An example of the sensor signal 102 is shown in FIG. 4. The sensor signal 102 is plural time-series signals...” and see figures 2C and figures 9A to 9B 
in regards to claim interpretation – the system of Shibuya is obviously receiving a continuous stream of data segmented into “modes”, i.e. this is for anomaly detection , it would have been obvious that a newly received “mode”, e.g. “start”, and the associated signal data/feature vectors would have been encompassed by the labelled point, i.e. the “mode” “start” obviously is a label which describes a condition of the machine system starting, wherein this “mode” comprises the time series data from the signals for that particular “mode”, and wherein the system extracts feature vectors for that “mode” 
to clarify and in regards to the feature space– see ¶ 88 “The feature amount extraction is considered using the sensor signal as it is. A window of ±1, ±2, etc., is set with respect to a predetermined time and a feature indicating a time variation of data may be extracted by a feature vector of a window width (3, 5, etc.,) x the number of sensor”, in other words a feature vector for all the signals for a “window” of time [e.g., the operating cycle such as “start”] is extracted wherein the feature vector comprises a separate feature vector for each of the “number of sensor” [e.g. see figure 9B] wherein each sensor has its own feature space [as it is obviously a feature vector that was extracted forming a feature space for each vector, also see figure 13 for an example], in other words there is a joint feature space divided into feature subspaces for each of “the number of sensors” wherein for each sensor there is a feature space for the sensor with at least one associated feature vector, also see ¶ 143 and figure 13 for an example – a “feature” space is created for each signal, e.g. the “daily mean” of the signal as one feature and the “Daily distribution” of the signal, in other words “the mean and the distribution” are computed for each mode for each signal, obviously forming a feature space for the signal wherein each feature space comprises the feature vectors for each mode, as the features, e.g. mean, are extracted for each cycle such as “starting/stopping”
	logic to projecting each of a plurality of model condition points into each signal projection subspace of the plurality of signal projection subspaces, each model condition point comprising a second time series from each of the plurality of signals, each second time series being associated with at least one feature vector , each feature vector having components corresponding to one or more signal features, the model condition point having a second label describing one of the plurality of conditions of 7Ser. No. 15/906,702 filed 2/27/2018 Gregory Olsen, et al. - GAU 2128 (Hopkins) Docket No. 60363-0027 the machine system, the projecting comprising evaluating the at least one feature vector associated with the second time series from each of the plurality of signals in the corresponding signal projection subspace; (Shibuya, see figure 19 as cited above and as detailed above, specifically see the # 1903 for the “Learning-Data selecting unit” for the model creation – this selects the “learning data” wherein ¶ 173 teaches “The sensor signal 102 output from the facility 101 is accumulated for learning in advance. The feature amount extraction unit 1901 inputs the accumulated sensor signal 102 and performs feature amount extraction to acquire the feature vector. The feature-selection unit 1902 performs data check of the feature vector output from the feature amount extraction unit 1901 and selects a feature to be used. 
The learning data selecting unit 1903 performs data check of the feature vector configured by the selected feature and check of the event signal 103 and selects the learning data used to create the normal model”, in other words the system accumulates sensor(s) signal data over a period of time for learning, and uses these plurality of time series as model condition points for creating a “normal model” – to clarify see figure 5 and ¶ 83-86 – the “normal model” is created by using the accumulated signals from the same sensors “during a predetermined period” and dividing these signals for the “mode”, i.e. ¶ 91 “the normal-model creation unit 106 classifies the learning data selected in step S503 for each of the modes divided by the mode dividing unit 104 and creates the normal model for each mode in step S505.”, and then see ¶ 179 “When the normal model is created for each mode, the anomaly measurement is computed by using the normal models of all the modes and the minimum value is acquired.” and ¶ 161-163 which teaches creating “plural normal models” [model condition points] for each “cycle”/mode by “random sampling” for “several cycles” and see ¶178 “The feature amount extraction unit 1901 inputs the sensor signal 102 and performs [...]”, in other words the system obtains model condition points [plural normal models] for each mode [points associated with labels for a plurality of conditions, e.g. start/stop/on/off] wherein each model condition point comprises time series from each of the sensors [random sampling over several cycles for accumulating sensor signal data] wherein each of these time series has at least one feature vector [using the “same feature...extraction”] as the labelled point and wherein these model condition points are then used for classifying (fig. 5, “classify data for each mode”) using the feature vectors – in regards to this being in the signal projection subspace see above – the system is using the same feature extraction method for both the labelled input data and the learning data, so obviously these are in the same signal projection feature spaces, e.g. figure 13 shows an example – obviously as figure 13 is an example comprising a plurality of points for “every one day” (¶ 143) this shows not just the feature space/feature vectors for the most recent data [the labelled point for the mode] but also shows the learning data that was previously accumulated, and ¶ 143 then clarifies, as cited above, that the period/cycle for this would also obviously be by mode, e.g. “when the starting/stopping time are known” [starting/stopping modes], to clarify – the system extracts features, e.g. the mean for each mode over several cycles of each mode, to form a feature space for each signal, i.e. each signal has a feature space which comprises the received data for each mode, including the labelled point [e.g., the point being the data for the most recent “start” cycle] wherein the system then projects the “learning data” into the same feature space for each signal as the system stores the “learning data” and extracts the feature vectors using the “same” algorithm as for the labelled point, e.g. see figure 13, also see ¶ 88 as cited above, and also see figure 6 and ¶ 93 which provides an example of a “3D feature space” and clarifies that “the dimension of the feature space may be...higher”, i.e. the feature space for each signal, in figure 6 the “evaluation data” is the projected feature vectors from the “the number of the learning data” (see ¶ 93), and in regards to evaluating the at least one feature vector in the subspace see ¶ 93-99 which provides an example of evaluating the feature vectors using a “local sub-space classifier” which creates a subspace in the feature space, e.g., for a 3D space it creates a 2D plane to evaluate the “distance between the evaluation data [learning data] and the point b [the labelled point]” for the normal model, e.g. this evaluation is using the “mean...and covariance matrix...of the learning data” for the feature space, also see ¶ 99 for other methods of creating “the normal model” in the feature space such as “a nearest method” or a “similarity base model”)
	logic to project the labeled point into each signal projection subspace, comprising evaluating the at least one feature vector associated with the first time series from each of the plurality of signals in the corresponding signal projection subspace; (Shibuya, as cited above, e.g. see figures 5 and 13, also see figure 17 and ¶ 158 – the “features” from both the “learning data” and the newly received labelled point [i.e. a plurality of time series labelled with a mode from the facility sensors] are projected into the same feature subspace for each signal wherein ¶ 88 as cited above provides an example of feature extraction for forming a feature vector for each signal, wherein the feature vectors for each signal are then turned into “a feature vector of a window width (3, 5, etc.,) x the number of sensors.”, i.e. there is a feature vector for the plurality of signals, and within that feature vector there is a feature vector for the “number of sensors”, e.g. the mean (fig 13) of each signal wherein ¶ 93-¶99 teaches then evaluating the [...], in other words the system evaluates the feature vectors of both the model condition points/learning data and the labelled point, e.g. an “anomaly” to find a distance between the feature vectors to determine the “deviation” of each signal)
logic to calculate a contribution of each of the plurality of signals to the first condition ... to form a sorted list of signals and contributions;(Shibuya, ¶ 117 “Further, since it is considered that a signal having a large deviation [contribution to the first condition/label] when the anomaly occurs contributes to the anomaly judgment, when the signals are displayed in the order of the large deviation from the top, it is easily verified which sensor signal has the anomaly. In addition, when a past case of the cause event is displayed in the same manner as the presented result event, it is easy to accept the same phenomenon to trust the advance notice of the result event.”, wherein this is used to detect an anomaly occurring in the received “event” (¶ 120) which, as per at least figure 19 as cited above, is the mode label from the event signal – in other words the system determines for each newly received mode/”cause [...], to clarify this obviously forms a sorted list of signals based on the contribution [e.g., the “deviation”] of the signal to the event label/mode of the received data, wherein the deviation is a measure of how much the present labelled point/associated time series/associated feature vectors vary/deviate from the normal case, i.e. this shows how much each signal contributes to the first condition/label)
	and logic to display signals that are higher in the sorted list of signals as the most likely contributors for the machine system condition corresponding to the labeled point. (Shibuya, as cited above, e.g. ¶ 117 – the signals are displayed in a sorted list, in order of each signal's contribution to the labeled point)
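
For clarity of record, the ordering described in Shibuya ¶ 117 (signals displayed "in the order of the large deviation from the top") may be illustrated with a minimal sketch. The function name `rank_signals`, the sensor names, and the deviation values are hypothetical and are not drawn from Shibuya; the sketch only shows a descending sort of signals by a per-signal deviation score.

```python
def rank_signals(deviations):
    """Return (signal, deviation) pairs, largest deviation first."""
    return sorted(deviations.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-signal deviation scores for one labeled point.
deviations = {"sensor_a": 0.12, "sensor_b": 0.87, "sensor_c": 0.45}

ranked = rank_signals(deviations)

# Signals nearest the top of the list are presented as the most likely
# contributors to the detected condition.
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```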


Shibuya does not explicitly teach:
	logic to constrain an analysis neighborhood for the labeled point to a set number of model condition points closest to the labeled point as projected in each signal projection subspace; 
logic to calculate, for each signal projection subspace, a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood;
logic to calculate, for each signal projection subspace, a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;
...from the first percent and the second percent calculated for the corresponding signal projection subspace...


Maeda teaches: 
	logic to constrain an analysis neighborhood for the labeled point to a set number of model condition points closest to the labeled point as projected in each signal projection subspace; (As an initial matter, the inventors for both Shibuya as relied upon above and Maeda are the same, and then see Maeda the abstract – this is for identifying an abnormal “sensor signal” and see figures 3-4, Maeda is for a related invention to the Shibuya invention, then see figure 11 and the description starting in ¶ 86 – Maeda is using a method similar to a “k-NN method”, i.e. the “k” [a set number of model condition points] which are the “nearest neighbors” wherein kNN uses a “distance within a feature space” wherein ¶ 87 then teaches that this is to find the “k pieces of data with highest similarities to the unit for deciding normal range” and then see ¶ 88 and figure 13 – the system is finding “a number k” of the nearest/most similar signal data [based on “distance within a feature space”] to a newly “observed sensor signal” and based on the k-nearest neighbors of the sensor signal determines if the “observed sensor signal” is “an anomaly” and provides calculation of “a deviance of observation data”, in other words it would have been obvious that, as taken in combination with Shibuya as relied upon above, the system would have used a kNN algorithm, or a “similar method”, to constrain an analysis neighborhood for each signal/each signal’s subspace to a set number “k” of model condition points that are closest in “distance within [the] feature space” [and obviously, the subspace for the sensor/signal] in order to determine the “deviance” of a “observation data” [a labelled point] from the normal points, i.e. this detects if a newly received labelled point comprising signal data for an event period, e.g. “starting” [see Shibuya] is “an anomaly” by checking the deviation for the new data to the k nearest neighbors of that data from the “learning data” “within [the] feature space” for the signal [the feature subspace for the particular signal])
The motivation to combine would have been that 1) Maeda provides a computationally efficient means of anomaly detection using kNN wherein Maeda’s technique also provides “an anomaly explanation message” (Maeda, ¶ 86) in addition to the “deviance” and 2) Shibuya, ¶ 99 suggests using a “nearest method” technique [e.g., kNN], and 3) for “condition-based maintenance..., in many cases, anomalous data is rarely collected and the bigger the facility, the more difficult it is to collect anomalous data. Therefore, it is important to detect outliers from normal data.” (Maeda, ¶ 171)
	In addition, both references are by the same inventors in a similar time period, i.e. it would have been obvious that these references were describing different aspects of the same overall system. 

Shibuya, as taken in combination with Maeda does not explicitly teach: 
logic to calculate, for each signal projection subspace, a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood;
	logic to calculate, for each signal projection subspace, a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;
...from the first percent and the second percent calculated for the corresponding signal projection subspace...

Zhang teaches:
logic to calculate, for each signal projection subspace, a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood; (Zhang, abstract, teaches a modification to a KNN “to address the within-class imbalance” in order to “bias classification towards the rare class” and then see § 5.2 which teaches that “A strategy is desired to compute the true positive propensity for these regions so as to distinguish and effectively rank the corresponding query instances. To this end, if the local positive interval for a query instance t (Eq. (2)) is higher than the global positive interval (Eq. (1)), intuitively it indicates that t has higher posterior positive probability than the positive prior based on the observed positive frequency in the training population...The positive posterior probability estimation for t in Eq. (3) therefore should be adjusted. Let λ denote the positive odds (P:N) in the query neighbourhood over the positive odds in the global population. As an example, if P:N = 1:5 in the query region [first ratio, this is the ratio in the neighborhood] and P:N=1:10  [second ratio, this is the ratio in the total population] in the global training population, then λ = 2....” wherein “λ takes into account the local versus global class imbalance levels and the positive odds ratio in the local neighbourhood versus the global population indicates the positive propensity for the query instance...”, and then see table 1 which also calculates the “Pos frequency” as a variation to P:N, the first ratio is merely the frequency “in the query region”, e.g. 1/6)
	logic to calculate, for each signal projection subspace, a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;(Zhang, § 5.2 as cited above, “P:N=1:10 in the global training population” and then see table 1 which also calculates the “Pos frequency” as a variation to P:N, the second ratio is merely the frequency “in the global population region”, e.g. 1/11)
...from the first percent and the second percent calculated for the corresponding signal projection subspace...  (Zhang, the λ above is the ratio of the first to the second ratio, e.g. (1/5)/(1/10) = “2”, wherein this forms the probability of the labelled point being in the neighborhood/”query region” adjusted for the class imbalance, wherein this adjusts the probability of the prediction)

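It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Shibuya, as modified above, on a system which uses a kNN for finding anomalous signals with the teachings from Zhang on a modification to the kNN’s probability estimation to account for “rare classes”. The motivation to combine would have been that the technique of Zhang would have biased the classification towards the rare class, e.g. an anomaly/outlying class, and as such Zhang would have improved the system’s ability to determine if a newly received set of time series from signals was anomalous [i.e., a rare class, it is abnormal], i.e. “These strategies more accurately characterise the rare-class distribution for accurate classification.” (Zhang, §8)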

Regarding Claim 8
Zhang teaches:
	The apparatus of claim 20, wherein calculating the contribution of each the signal to the first condition comprises:
	if the first ratio is less than or equal to the second ratio, setting the contribution of the signal to (first ratio/second ratio) - 1; (Zhang, as cited above, renders this obvious, i.e. this is “λ” -1 which is an obvious variant, and in terms of using λ as the contribution rate this also would have been obvious, this is the odds that the model condition points in the neighborhood have the same label as the labelled point, divided by the odds that the model condition points in the model have the same label as the labeled point)
	and otherwise, setting the contribution of the signal to (first ratio - second ratio)/(1 - second ratio). (This is also an obvious variant from Zhang using a simple rearrangement of λ, and additionally this limitation is contingent)

Regarding Claim 9
Maeda teaches: 
	The apparatus of claim 20, wherein constraining the analysis neighborhood for the labeled point to the set number of model condition points closest to the labeled point in the signal projection subspace comprises sorting the model condition points by a distance from the labeled point and limiting the analysis neighborhood to a number of the model condition points having the closest distance.  (Maeda, ¶ 86-90, as cited above, this uses a kNN algorithm, or a similar algorithm, to select the k nearest points by “distance” such as for when k = 5 this is “the five highest pieces”, in other words obviously a kNN algorithm constrains the neighborhood of the k nearest neighbors to the closest k neighbors based on “distance within a feature space” [e.g., the subspace] wherein the closest/nearest are sorted by distance [e.g. highest] from the labelled point) 

Regarding Claim 10.
Maeda teaches: 
	The apparatus of claim 9, wherein the distance comprises a distance between a projection of the labeled point and projections of the model condition points on a signal feature axis in the signal projection subspace. (Maeda, ¶ 86 – “distance within a feature space” is used, obviously this includes the distance between the labelled point and the model condition points in each signal’s feature space, wherein the distance is on a signal feature axis [it’s a distance in a coordinate system, obviously this includes the distance on the feature axis])

Regarding Claim 11.
Maeda, as taken in combination with Shibuya, teaches: 
	The apparatus of claim 20, wherein the first condition comprises a faulty component.  (Maeda, ¶ 171 teaches using the system for determining “parts replacement” [example of remediating a faulty component] and Maeda, as well as Shibuya, teaches that the system is to detect anomalies/abnormal events, it would have been obvious that an abnormal event would have comprised a faulty component, e.g. Shibuya ¶ 12 wherein the anomaly includes “or a repairing operation such as component replacement”)

Regarding Claim 21.
Shibuya teaches: 
	A non-transitory machine-readable storage medium storing instructions that when executed by a processor, cause the processor to execute a control method for a machine system comprising sensors generating a plurality of signals over time, the method comprising: (Shibuya, abstract, teaches an anomaly detection system for a “facility”, e.g. see ¶ 2-3 – this is for facilities that include machine systems such as a “windmill”, a “nuclear reactor”, etc. – then see figure 9A which shows an example embodiment in which 4 signals are received wherein ¶ 105 teaches these signals are “sensor signal”  and see figure 1 – this is to “diagnose” an anomaly in a machine system based on signals received from sensors)
	receiving a labeled point comprising, for a range of time, a first time series from each of the plurality of signals, each first time series being associated with at least one feature vector, each feature vector having components corresponding to one or more signal features, the labeled point having a first label describing a first condition of a plurality of conditions of the machine system, each of the plurality of signals having a corresponding signal projection subspace; (Shibuya, see figure 19 and ¶ 169 – 170 – this is a system which combines both the “first embodiment” and the “second embodiment” wherein sensor signal(s) are received from a facility along with an “event signal” wherein the “event signal” is used for a “mode dividing unit”, i.e. the signals from each sensor are divided into a plurality of time series due to the “mode” – see ¶174 “Meanwhile, the mode dividing unit 1908 performs mode dividing of dividing the time for each operating state based on the event signal 103” wherein ¶ 143 clarifies “As described above, when the operating cycle is regular, e.g., the operation starts and stops at the determined time of one day, data is extracted every fixed period, e.g., one day to compute the mean and the distribution. Although the period is not one day, the same applies thereto. When the operation starting/stopping time is known, data in a period which can be regarded as a normal operation is carried out may be extracted to compute the mean and the distribution [example of feature extraction] and this method may be applied even though the operating cycle is irregular.” and see ¶ 183 “In the learning data selection processing in the learning-data selection unit 1903, the method using the event signal is considered in addition to the same method as the example described by using FIG. 12.”, to clarify – the system receives a plurality of sensor signals over a period of time, segments the signals into time series for each mode [each label], and extracts feature vectors for each cycle/mode, i.e. the system receives from this process a labelled point for each mode comprising time series from each signal and the associated feature vectors for each time series for each mode, e.g. for the  – see figures 9A and 9B for the plurality of signals and associated features
for more clarification also see ¶ 72 teaches “The event signal 103 is a signal indicating an operation, a failure, or a warning of the facility which is output irregularly and is constituted by a character string indicating the time and the operation, the failure, or the warning.”, e.g. a “normal OFF”, a “start”, a “normal ON”, and a “stop”, and see ¶ 82 “An example of the sensor signal 102 is shown in FIG. 4. The sensor signal 102 is plural time-series signals...” and see figures 2C and figures 9A to 9B 
in regards to claim interpretation – the system of Shibuya is obviously receiving a continuous stream of data segmented into “modes”, i.e. this is for anomaly detection , it would have been obvious that a newly received “mode”, e.g. “start”, and the associated signal data/feature vectors would have been encompassed by the labelled point, i.e. the “mode” “start” obviously is a label which describes a condition of the machine system starting, wherein this “mode” comprises the time series data from the signals for that particular “mode”, and wherein the system extracts feature vectors for that “mode” 
to clarify and in regards to the feature space– see ¶ 88 “The feature amount extraction is considered using the sensor signal as it is. A window of ±1, ±2, etc., is set with respect to a predetermined time and a feature indicating a time variation of data may be extracted by a feature vector of a window width (3, 5, etc.,) x the number of sensor”, in other words a feature vector for all the signals for a “window” of time [e.g., the operating cycle such as “start”] is in other words there is a joint feature space divided into feature subspaces for each of “the number of sensors” wherein for each sensor there is a feature space for the sensor with at least one associated feature vector, also see ¶ 143 and figure 13 for an example – a “feature” space is created for each signal, e.g. the “daily mean” of the signal as one feature and the “Daily distribution” of the signal, in other words “the mean and the distribution” are computed for each mode for each signal, obviously forming a feature space for the signal wherein each feature space comprises the feature vectors for each mode, as the features, e.g. mean, are extracted for each cycle such as “starting/stopping”
	projecting each of a plurality of model condition points into each signal projection subspace of the plurality of signal projection subspaces, each model condition point comprising a second time series from each of the plurality of signals, each second time series being associated with at least one feature vector, each feature vector having components corresponding to one or more signal features, the model condition point having a second label describing one of the plurality of conditions of the machine system, the projecting comprising evaluating the at least one feature vector associated with the second time series from each of the plurality of signals in the corresponding signal projection subspace; (Shibuya, see figure 19 as cited above and as detailed above, specifically see the # 1903 for the “Learning-Data selecting unit” for the model creation – this selects the “learning data” wherein ¶ 173 teaches “The sensor signal 102 output from the facility 101 is accumulated for learning in advance.”, in other words the system accumulates sensor(s) signal data over a period of time for learning, and uses this plurality of time series as model condition points for creating a “normal model” – to clarify see figure 5 and ¶ 83-86 – the “normal model” is created by using the accumulated signals from the same sensors “during a predetermined period” and dividing these signals for the “mode”, i.e. ¶ 91 “the normal-model creation unit 106 classifies the learning data selected in step S503 for each of the modes divided by the mode dividing unit 104 and creates the normal model for each mode in step S505.”, and then see ¶ 179 “When the normal model is created for each mode, the anomaly measurement is computed by using the normal models of all the modes and the minimum value is acquired.” and ¶ 161-163 which teaches creating “plural normal models” [model condition points] for each “cycle”/mode by “random sampling” for “several cycles” and see ¶178 “The feature amount extraction unit 1901 inputs the sensor signal 102 and performs the same feature amount extraction as that at the learning time to acquire the feature vector”, in other words the system obtains model condition points [plural normal models] for each mode [points associated with labels for a plurality of conditions, e.g. start/stop/on/off] wherein each model condition point comprises time series from each of the sensors [random sampling over several cycles for accumulating sensor signal data] wherein each of these time series has at least one feature vector [using the “same feature...extraction”] as the labelled point and wherein these model condition points are then used for classifying (fig. 5, “classify data for each mode”) using the feature vectors – in regards to this being in the signal projection subspace see above – the system is using the same feature extraction method for both the labelled input data and the learning data, so obviously these are in the same signal projection feature spaces, e.g. 
figure 13 shows an example – obviously as figure 13 is an example comprising a plurality of points for “every one day” (¶ 143) this shows not just the feature space/feature vectors for the most recent data [the labelled point for the mode] but also shows the learning data that was previously accumulated, and ¶ 143 then clarifies, as cited above, that the period/cycle for this would also obviously be by mode, e.g. “when the starting/stopping time are known” [starting/stopping modes], to clarify – the system extracts features, e.g. the mean for each mode over several cycles of each mode, to form a feature space for each signal, i.e. each signal has a feature space which comprises the received data for each mode, including the labelled point [e.g., the point being the data for the most recent “start” cycle] wherein the system then projects the “learning data” into the same feature space for each signal as the system stores the “learning data” and extracts the feature vectors using the “same” algorithm as for the labelled point, e.g. see figure 13, also see ¶ 88 as cited above, and also see figure 6 and ¶ 93 which provides an example of a “3D feature space” and clarifies that “the dimension of the feature space may be...higher”, i.e. the feature space for each signal, in figure 6 the “evaluation data” is the projected feature vectors from “the number of the learning data” (see ¶ 93), and in regards to evaluating the at least one feature vector in the subspace see ¶ 93-99 which provides an example of evaluating the feature vectors using a “local sub-space 
	projecting the labeled point into each signal projection subspace, comprising evaluating the at least one feature vector associated with the first time series from each of the plurality of signals in the corresponding signal projection subspace; (Shibuya, as cited above, e.g. see figures 5 and 13, also see figure 17 and ¶ 158 – the “features” from both the “learning data” and the newly received labelled point [i.e. a plurality of time series labelled with a mode from the facility sensors] are projected into the same feature subspace for each signal wherein ¶ 88 as cited above provides an example of feature extraction for forming a feature vector for each signal, wherein these feature vectors for each signal are then turned into “a feature vector of a window width (3, 5, etc.,) x the number of sensors.”, i.e. there is a feature vector for the plurality of signals, and within that feature vector there is a feature vector for the “number of sensors”, e.g. the mean (fig 13) of each signal wherein ¶ 93-¶99 teaches then evaluating the feature vectors for both the model condition points [the learning data/evaluation data] and the labelled point in the feature space for “anomaly measurement”, e.g. evaluating the “distance between the evaluation data [the model condition points] and the point b [the labelled point]”, e.g. ¶ 116 “The distance between the feature vector at the time of anomaly judgment and each of the representative vectors is examined and a cause event X corresponding to the nearest...”, in other words the system evaluates the feature vectors of both the model condition points/learning data and the labelled point, e.g. an “anomaly” to find a distance between the feature vectors to determine the “deviation” of each signal)
[...]
calculating a contribution of each of the plurality of signals to the first condition..., to form a sorted list of signals and contributions; (Shibuya, ¶ 117 “Further, since it is considered that a signal having a large deviation [contribution to the first condition/label] when the anomaly occurs contributes to the anomaly judgment, when the signals are displayed in the order of the large deviation from the top, it is easily verified which sensor signal has the anomaly. In addition, when a past case of the cause event is displayed in the same manner as the presented result event, it is easy to accept the same phenomenon to trust the advance notice of the result event.”, wherein this is used to detect an anomaly occurring in the received “event” (¶ 120) which, as per at least figure 19 as cited above, is the mode label from the event signal – in other words the system determines for each newly received mode/“cause event” [including the first label] the deviation of each signal compared to the “normal” signal/model(s) and sorts the signals into an ordered list by the “order of large deviation from the top” to determine “which sensor signal has the anomaly” [which sensor signal contributes the most to the first label/cause event] – to clarify this obviously forms a sorted list of signals based on the contribution [e.g., the “deviation”] of the signal to the event label/mode of the received data, wherein the deviation is a measure of how much the present labelled point/associated time series/associated feature vectors vary/deviate from the normal case, i.e. this shows how much each signal contributes to the first condition/label)
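The ordering described in Shibuya ¶ 117, signals displayed “in the order of the large deviation from the top”, can be sketched as below. This is a minimal illustration only: the absolute difference used as the deviation measure, the function name `rank_signals_by_deviation`, and the sample values are assumptions and are not taken from Shibuya.

```python
def rank_signals_by_deviation(observed, normal):
    """Return (signal, deviation) pairs, largest deviation first.

    observed, normal: dicts mapping a signal name to one feature value.
    The absolute difference used here is an assumed deviation measure.
    """
    deviations = {name: abs(observed[name] - normal[name]) for name in observed}
    # Sort descending so the most deviating signal is at the top of the list.
    return sorted(deviations.items(), key=lambda item: item[1], reverse=True)

# Hypothetical feature values for three sensor signals.
observed = {"temperature": 71.0, "pressure": 9.0, "vibration": 3.5}
normal = {"temperature": 70.0, "pressure": 5.0, "vibration": 1.5}
ranking = rank_signals_by_deviation(observed, normal)
```

With these hypothetical values the list is headed by "pressure", followed by "vibration" and then "temperature".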


Shibuya does not explicitly teach:
	constraining an analysis neighborhood for the labeled point to a set number of model condition points closest to the labeled point as projected in each signal projection subspace;
	for each signal projection subspace, calculating a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood;
	for each signal projection subspace, calculating a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;
	from the first percent and the second percent calculated for the corresponding signal projection subspace
	and adapt the physical system's behavior based on the sorted list.

Maeda teaches: 
constraining an analysis neighborhood for the labeled point to a set number of model condition points closest to the labeled point as projected in each signal projection subspace; (As an initial matter, the inventors for both Shibuya as relied upon above and Maeda are the same, and then see Maeda the abstract – this is for identifying an abnormal “sensor signal” and see figures 3-4, Maeda is for a related invention to the Shibuya invention, then see figure 11 and the description starting in ¶ 86 – Maeda is using a method similar to a “k-NN method”, i.e. the “k” [a set number of model condition points] which are the “nearest neighbors” wherein kNN uses a “distance within a feature space” wherein ¶ 87 then teaches that this is to find the “k pieces of data with highest similarities to the unit for deciding normal range” and then see ¶ 88 and figure 13 – the system is finding “a number k” of the nearest/most similar signal data [based on “distance within a feature space”] to a newly “observed sensor signal” and based on the k-nearest neighbors of the sensor signal determines if the “observed sensor signal” is “an anomaly” and provides calculation of “a deviance of observation data”, in other words it would have been obvious that, as taken in combination with Shibuya as relied upon above, the system would have used a kNN algorithm, or a “similar method”, to constrain an analysis neighborhood for each signal/each signal’s subspace to a set number “k” of model condition points that are closest in “distance within [the] feature space” [and obviously, the subspace for the sensor/signal] in order to determine the “deviance” of an “observation data” [a labelled point] from the normal points, i.e. this detects if a newly received labelled point comprising signal data for an event period, e.g. “starting” [see Shibuya] is “an anomaly” by checking the deviation for the new data to the k nearest neighbors of that data from the “learning data” “within [the] feature space” for the signal [the feature subspace for the particular signal])
and adapting the machine system's behavior based on the sorted list. (Maeda, as cited above, and as taken in combination with Shibuya teaches detecting an anomalous event and forming a list of signals wherein the list is sorted by which signals have the most deviation at the top – then see Maeda, ¶ 171 – the system is used for “condition-based maintenance” wherein “parts replacement is performed [the system is adjusted] in accordance with the conditions of the device” wherein this is based on the “normal and anomalous data” of the devices as “it is important [for condition-based maintenance] to detect outliers from normal data”, it would have been obvious to use the sorted list of signals and their deviations/contributions to adapt the machine system, e.g. by implementing a parts replacement to correct for anomalous data)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from the system of Shibuya for “detecting advance signs of anomalies” such as by determining the deviation of each sensor signal from the normal [from the learning data] with the teachings from Maeda on using learning data about normal cases for anomaly detection using an algorithm similar to K-NN and applying the system for condition based maintenance. The motivation to combine would have been that 1) Maeda provides a computationally efficient means of anomaly detection using kNN wherein Maeda’s technique also provides “an anomaly explanation message” (Maeda, ¶ 86) in addition to the “deviance” and 2) Shibuya, ¶ 99 suggests using a “nearest method” technique [e.g., kNN], and 3) for “condition-based maintenance..., in many cases, anomalous data is rarely collected 
	In addition, both references are by the same inventors in a similar time period, i.e. it would have been obvious that these references were describing different aspects of the same overall system. 

Shibuya, as taken in combination with Maeda does not explicitly teach: 
for each signal projection subspace, calculating a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood;
	for each signal projection subspace, calculating a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;
	from the first percent and the second percent calculated for the corresponding signal projection subspace

Zhang teaches:
	for each signal projection subspace, calculating a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood (Zhang, abstract, teaches a modification to a KNN “to address the within-class imbalance” in order to “bias classification towards the rare class” and then see § 5.2 which teaches that “A  Let λ denote the positive odds (P:N) in the query neighbourhood over the positive odds in the global population. As an example, if P:N = 1:5 in the query region [first ratio, this is the ratio in the neighborhood] and P:N=1:10  [second ratio, this is the ratio in the total population] in the global training population, then λ = 2....” wherein “λ takes into account the local versus global class imbalance levels and the positive odds ratio in the local neighbourhood versus the global population indicates the positive propensity for the query instance...”, and then see table 1 which also calculates the “Pos frequency” as a variation to P:N, the first ratio is merely the frequency “in the query region”, e.g. 1/6)
for each signal projection subspace, calculating a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;(Zhang, § 5.2 as cited above, “P:N=1:10 in the global training population” and then see table 1 which also calculates the “Pos frequency” as a variation to P:N, the second ratio as claimed is merely an obvious variant of Zhang, i.e. there is 1 point in Zhang’s teaching with the same label in the dataset, with 11 points and with 6 points in the neighborhood, the second ratio is merely the 1 point in the neighborhood and in the model to the 6 points in the neighborhood, i.e. its 1:6, also this is obvious as this second ratio is merely describing the odds 
	from the first percent and the second percent calculated for the corresponding signal projection subspace (Zhang, the λ above is the ratio of the first to the second ratio, e.g. (1/5)/(1/10) = “2”, wherein this forms the probability of the labelled point being in the neighborhood/“query region” adjusted for the class imbalance, wherein this adjusts the probability of the prediction, this claim is merely an obvious variant of Zhang with a simple rearrangement of the odds being determined and used)
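For clarity of the record, the arithmetic in the Zhang § 5.2 example quoted above can be reproduced as follows. The function names `odds_ratio` and `frequency` are hypothetical, and the sketch only restates the quoted numbers (P:N = 1:5 in the query region, 1:10 globally); it is not Zhang's implementation.

```python
from fractions import Fraction

def odds_ratio(local_pos, local_neg, global_pos, global_neg):
    """Local positive odds (P:N) divided by global positive odds."""
    local_odds = Fraction(local_pos, local_neg)
    global_odds = Fraction(global_pos, global_neg)
    return local_odds / global_odds

def frequency(pos, neg):
    """Positive frequency: positives out of all points in the region."""
    return Fraction(pos, pos + neg)

lam = odds_ratio(1, 5, 1, 10)       # (1/5) / (1/10)
local_freq = frequency(1, 5)        # 1 positive out of 6 points
global_freq = frequency(1, 10)      # 1 positive out of 11 points
```

Here `lam` evaluates to 2, matching the quoted example, while the corresponding positive frequencies are 1/6 in the query region and 1/11 in the global population.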

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Shibuya, as modified above, on a system which uses a kNN for finding anomalous signals  with the teachings from Zhang on a modification to the kNN’s probability estimation to account for “rare classes”. The motivation to combine would have been that the technique of Zhang would have biased the classification towards the rare class, e.g. an anomaly/outlying class, and as such Zhang would have improved the system’s ability to determine if a newly received set of time series from signals was anomalous [i.e., a rare class, it is abnormal], i.e. “These strategies more accurately characterise the rare-class distribution for accurate classification.” (Zhang, §8)
	
Regarding Claim 14.
Zhang teaches:
The machine-readable storage medium of claim 21, wherein calculating the contribution of each the signal to the first condition comprises:
	if the first percent is less than or equal to the second percent, setting the contribution of the signal to (first percent/second percent) - 1; (Zhang, as cited above, renders this obvious, i.e. this is “λ” - 1, which is an obvious variant; using λ as the contribution rate also would have been obvious, as λ is the odds that the model condition points in the neighborhood have the same label as the labelled point, divided by the odds that the model condition points in the model have the same label as the labeled point)
	and otherwise, setting the contribution of the signal to (first percent - second percent)/(1 - second percent).  (This is also an obvious variant from Zhang using a simple rearrangement of λ; additionally, this limitation is contingent)

Regarding Claim 15.
Maeda teaches: 
	The machine-readable storage medium of claim 21, wherein constraining the analysis neighborhood for the labeled point to the set number of model condition points closest to the labeled point in the signal projection subspace comprises sorting the model condition points by a distance from the labeled point and limiting the analysis neighborhood to a number of the condition points having the closest distance. (Maeda, ¶ 86-90, as cited above, this uses a kNN algorithm, or a similar algorithm, to select the k nearest points by “distance” such as for when k = 5 this is “the five highest pieces”, in other words obviously a kNN algorithm constrains the neighborhood of the k nearest neighbors to the closest k neighbors based on “distance within a feature space” [e.g., the subspace] wherein the closest/nearest are sorted by distance [e.g. highest] from the labelled point) 

Regarding Claim 16.
Maeda teaches: 
	The machine-readable storage medium of claim 15, wherein the distance comprises a distance between a projection of the labeled point and projections of the model condition points on a signal feature axis in the signal projection subspace. (Maeda, ¶ 86 – “distance within a feature space” is used, obviously this includes the distance between the labelled point and the model condition points in each signal’s feature space, wherein the distance is on a signal feature axis [it’s a distance in a coordinate system, obviously this includes the distance on the feature axis])

Regarding Claim 17.
Maeda teaches: 
	The machine-readable storage medium of claim 21, wherein adapting the physical system's behavior comprises remediating a faulty component. (Maeda, ¶ 171 teaches using the system for determining “parts replacement” [example of remediating a faulty component])

Claims 6, 12, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shibuya et al., US 2012/0290879 in view of Maeda et al., US 2012/0041575 and in further view of Zhang et al., “KRNN: k Rare-class Nearest Neighbour classification”, 2016 in further view of Skand, “kNN(k-Nearest Neighbour) Algorithm in R”, 2017

Regarding Claim 6
Shibuya, as modified above, does not explicitly teach: 
	The control method of claim 19, wherein the set number of the model condition points comprises a square root of a total number of model condition points.  

Skand teaches: 
	The control method of claim 19, wherein the set number of the model condition points comprises a square root of a total number of model condition points.  (Skand, section “Requirements for kNN”, #1 teaches “Generally k gets decided on the square root of number of data points”)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Shibuya, as modified above, on a system which uses a kNN algorithm with the teachings from Skand on using the “square root” of the number of data points [model condition points]. The motivation to combine would have been that this value provides a k-value which reduces the “variance” while avoiding a “bias” (Skand, # 1, as cited above)
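The heuristic quoted from Skand, that k is generally chosen as the square root of the number of data points, can be illustrated as below. Rounding to the nearest integer and the floor of 1 are assumptions for the sketch; neither the claim nor the quoted passage states how a non-integer square root is handled.

```python
import math

def neighborhood_size(total_points: int) -> int:
    """Choose k as the square root of the number of model condition points."""
    if total_points < 1:
        raise ValueError("total_points must be at least 1")
    # Assumption: round to the nearest integer and never return less than 1.
    return max(1, round(math.sqrt(total_points)))
```

For example, 100 model condition points give k = 10 and 50 points give k = 7 under this rounding assumption.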

Regarding Claim 12.
Shibuya, as modified above, does not explicitly teach: 
	The apparatus of claim 20, wherein the set number of the model condition points comprises a square root of a total number of model condition points. 

Skand teaches: 
	...wherein the set number of the model condition points comprises a square root of a total number of model condition points.  (Skand, section “Requirements for kNN”, #1 teaches “Generally k gets decided on the square root of number of data points”)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Shibuya, as modified above, on a system which uses a kNN algorithm with the teachings from Skand on using the “square root” of the number of data points [model condition points]. The motivation to combine would have been that this value provides a k-value which reduces the “variance” while avoiding a “bias” (Skand, # 1, as cited above)



Regarding Claim 18.
Shibuya, as modified above, does not explicitly teach: 
	The machine-readable storage medium of claim 21, wherein the set number of the model condition points comprises a square root of a total number of model condition points. 
logic to project the labeled point into each signal projection subspace, comprising evaluating the at least one feature vector associated with the first time series from each of the plurality of signals in the corresponding signal projection subspace;
	logic to constrain an analysis neighborhood for the labeled point to a set number of model condition points closest to the labeled point as projected in each signal projection subspace;
	logic to calculate, for each signal projection subspace, a first percent of model condition points in the analysis neighborhood having a same label as the labeled point in the analysis neighborhood out of the set number of model condition points in the analysis neighborhood;
	logic to calculate, for each signal projection subspace, a second percent of model condition points in the model having a same label as the labeled point out of the plurality of model condition points;
	logic to calculate a contribution of each of the plurality of signals to the first condition from the first percent and the second percent calculated for the corresponding signal projection subspace, to form a sorted list of signals and contributions;
	and logic to display signals that are higher in the sorted list of signals as the most likely contributors for the machine system condition corresponding to the labeled point. 
Skand teaches: 
	...wherein the set number of the model condition points comprises a square root of a total number of model condition points.  (Skand, section “Requirements for kNN”, #1 teaches “Generally k gets decided on the square root of number of data points”)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings from Shibuya, as modified above, on a system which uses a kNN algorithm with the teachings from Skand on using the “square root” of the number of data points [model condition points]. The motivation to combine would have been that this value provides a k-value which reduces the “variance” while avoiding a “bias” (Skand, # 1, as cited above)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Rashid et al., “Times-series data augmentation and deep learning for construction equipment activity recognition”, 2019, see page 2 col. 1, ¶ 1 “recurrent neural networks (RNN) are able to automatically extract high-level representative features that consider temporal dynamics among consecutive time steps of the sensor data” – for pertinence see ¶ 25 in the instant specification 
Nguyen et al., US 2018/0322394- see the abstract, see ¶ 1-4 – this is a recurrent neural network for feature extraction from time series – for pertinence see ¶ 25 in the instant specification 
Song et al., US 2019/0034497  - see the abstract, this is a system which uses a recurrent neural network for feature extraction from multi-variate time series– for pertinence see ¶ 25 in the instant specification 
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID A. HOPKINS whose telephone number is (571)272-0537.  The examiner can normally be reached on Monday to Friday, 8:30AM to 5 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571) 272-2589.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/D.A.H./Examiner, Art Unit 2128                                                                                                                                                                                                        
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128