DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2022-09-23 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Response to Amendment
The amendment filed 2022-10-21 has been entered.  Applicant’s amendments to the claims have overcome each and every objection and rejection under 35 USC 101 set forth in the previous office action.  The status of claims is as follows:
Claims 1-25 are pending in the application.
Claims 1-5 and 7-25 are amended.

Response to Arguments
Applicant’s arguments with respect to rejections under 35 USC 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.  The rejections now rely on a combination with a new piece of art, Dai et al., as necessitated by the amendment.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 16, and 20-25 are rejected under 35 U.S.C. 103 as being unpatentable over Smith et al. (“CoCoA: A General Framework for Communication-Efficient Distributed Optimization”; hereinafter “Smith”) in view of Zhang et al. (“SLAQ: Quality-Driven Scheduling for Distributed Machine Learning”; hereinafter “Zhang”) and Dai et al. (“Toward Understanding the Impact of Staleness in Distributed Machine Learning”; hereinafter “Dai”).
As per Claim 1, Smith teaches a computer-implemented method of parallel training of a machine learning model on a computerized system, whose computing tasks can be assigned to multiple workers of the system, and wherein the method comprises:
accessing training data (Smith, Page 11 Top Paragraph, discloses:  “From a practical perspective, this version of the framework will typically imply that the data is distributed by training point”).
starting a parallel training of the machine learning model based on the accessed training data, the training distributed through a first number K of workers, K > 1 (As shown above, Smith, Page 11 Top Paragraph, discloses that the data is “distributed” by training point.  This distribution is among workers (“machines”), as Smith discloses on Page 7 Section 3.1:  “The goal of the CoCoA framework is to find a global minimizer of the objective (A), while distributing computation based on the partitioning of the dataset A across machines”, wherein one of ordinary skill in the art will appreciate that calculating a minimizer of an objective function describes training.  Smith, at the top of Page 11, explicitly recites training points:  “From a practical perspective, this version of the framework will typically imply that the data is distributed by training point”.
Examiner notes that Applicant has stated the instant invention is based on CoCoA in Specification [0051]:  “The present elastic framework is a distributed, auto-elastic ML system, which was developed based on the state-of-the-art CoCoA framework”).
However, Smith does not teach and in response to detecting a change in a temporal evolution of a quantity indicative of a convergence rate of the parallel training, the change reflecting a deterioration of the convergence rate, scaling-in the parallel training of the machine learning model, so as for the parallel training to be subsequently distributed through a second number K' of workers, where K > K' > 1.
Zhang teaches and in response to detecting a change in a temporal evolution of a quantity indicative of a convergence rate of the parallel training, the change reflecting a deterioration of the convergence rate and providing an indicator for deciding whether to scale-in the training [for improving the convergence rate in training the machine learning model], scaling-in the parallel training of the machine learning model, so as for the parallel training to be subsequently distributed through a second number K' of workers, where K > K' > 1[, K' being determined to improve the convergence rate of the parallel training of the machine learning model]. (Zhang, Page 391 Para 3, discloses:  “SLAQ dynamically allocates resources based on job resource demands, intermediate model quality, and the system’s workload.”  Here, Zhang discloses allocating resources based on “intermediate model quality”.  Zhang suggests that this is determined based on a change in the temporal evolution of a convergence rate, as in the previous paragraph, Page 391 Para 2, Zhang states:  “During a burst of job submissions, equal resources will be allocated to jobs that are in their early stages and could benefit significantly from extra resources as those that have nearly converged and cannot improve much further. This is not efficient”.  Here, Zhang indicates that, while the model is still converging, the convergence rate over time is slowing down as each training iteration gives diminishing returns.  This is reinforced in Zhang Page 390 Para 3:  “It generates a low-quality model at the beginning and improves the model’s quality through a sequence of training iterations until it converges. In general, the quality improvement diminishes as more iterations are completed.”  Here, “quality improvement diminishes” refers to a deterioration of the convergence rate.  
Zhang uses this as motivation to re-allocate resources, as Page 391 Para 3, concludes:  “The intuition behind SLAQ is that in the context of approximate ML training, more resources should be allocated to jobs that have the most potential for quality improvement.”  Zhang follows this up on Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.  Finally, on Page 398 Para 1, Zhang discloses:  “When including job j at allocation a_j, we are paying cost of a_j and receiving value of Δl_j = Loss_j(a_j, t) − Loss_j(a_j, t + T). The scheduler prefers jobs with highest value of Δl_j/a_j; i.e., we want to receive the largest gain in loss reduction normalized by resource spent.”  Here, Zhang is disclosing allocating resources based on the highest convergence rate (the delta change of the loss).  The delta change of the loss is an indicator of whether or not the training should be scaled in, thus “reclaiming” resources to allocate to something else (rather than this training) that has the “most potential for quality improvement”.  Thus, during a reallocation of resources, if more workers are assigned to other jobs with higher convergence rates, then fewer workers (“reclaims workers”) are allocated to the current training.
Examiner notes that Applicant acknowledges that SLAQ also allocates resources based on model feedback on convergence in Specification [0066]:  “SLAQ is a cluster scheduler for ML applications. SLAQ also relies on feedback from ML applications. However, instead of optimizing the time to arbitrary accuracy for one application, SLAQ tries to minimize the time to low accuracy for many applications at the same time, by shifting resources from applications with low convergence rates to those with high rates, assuming that resources can be used more effectively there.”)
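The preference quoted from Zhang above, for the largest loss reduction per unit of resource spent, can be illustrated with a minimal sketch.  The function names, the diminishing-returns loss model, and the one-worker-at-a-time greedy loop are assumptions made only for illustration; they are not taken from Zhang's implementation or from the application.

```python
def predicted_loss_drop(potential, workers):
    """Hypothetical model: the predicted loss reduction of a job saturates
    as more workers are assigned to it (diminishing returns)."""
    return potential * (1.0 - 0.5 ** workers)


def greedy_allocate(potentials, total_workers):
    """Hand out workers one at a time to the job whose predicted loss
    reduction grows the most from receiving one more worker.

    `potentials` maps a job name to an assumed remaining potential for
    quality improvement (large for early-stage jobs, small near convergence).
    """
    alloc = {job: 0 for job in potentials}
    for _ in range(total_workers):
        def marginal_gain(job):
            p = potentials[job]
            a = alloc[job]
            return predicted_loss_drop(p, a + 1) - predicted_loss_drop(p, a)

        best = max(alloc, key=marginal_gain)
        alloc[best] += 1
    return alloc


# Early on, both jobs have the same potential for improvement.
early = greedy_allocate({"A": 1.0, "B": 1.0}, total_workers=8)
# Later, job A has nearly converged, so workers are reclaimed from it.
late = greedy_allocate({"A": 0.1, "B": 1.0}, total_workers=8)
```

Under these assumed numbers, job A holds 4 workers in the early allocation and 2 in the later one, which corresponds to a scale-in of job A from K = 4 to K' = 2 once its loss reduction has diminished.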
Zhang and Smith are analogous art because they are both in the field of endeavor of distributed machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the resource reallocation of Zhang with the distributed training of Smith.  One of ordinary skill in the art would be motivated to do so in order to maximize the cost effectiveness of training resources (Zhang, Page 398 Para 3:  “Maximizing the total loss reduction targets the cost effectiveness of cluster resources. This is desirable not only on clusters used by a single company which may have high resource contention, but potentially even on multi-tenant clusters (clouds) in which revenue could be directly associated with the total quality progress (loss reduction) of ML jobs.”)
	Examiner notes that Zhang teaches reallocating workers from training that will not provide much benefit for convergence, in order to use those resources elsewhere.  However, Zhang does not teach that using fewer workers actually improves the convergence rate of the training itself. Thus, the combination of Zhang and Smith does not teach scale-in the training and determine K' to improve the convergence rate of the parallel training of the machine learning model.
Dai teaches scale-in the training and determine K' to improve the convergence rate of the parallel training of the machine learning model.  (Recall above that Zhang teaches scale-in the training.  Dai teaches that this scaling-in improves the convergence rate of the parallel training of the machine learning model.  Dai, Page 5 “Effects of More Workers”, discloses:  “The impact of staleness is amplified by the number of workers. In the case of MF, Fig. 3(b) shows that the convergence slowdown in terms of the number of batches (normalized by the convergence for s = 0) on 8 workers is more than twice of the slowdown on 4 workers. For example, in Fig. 3(b) the slowdown at s = 15 is ~3.4, but the slowdown at the same staleness level on 8 workers is ~8.2. Similar observations can be made for CNNs (Fig. 3). This can be explained by the fact that additional workers amplifies the effect of staleness by (1) generating updates that will be subject to delays, and (2) missing updates from other workers that are subject to delays.”)
Dai and the combination of Smith and Zhang are analogous art because they are both in the field of endeavor of parallel training of machine learning models.
It would have been obvious before the effective filing date of the claimed invention to combine the elastic parallel training of Smith and Zhang with the suggestion of scaling-in of Dai.  Zhang themselves acknowledge the issue of “staleness slowdown on convergence rate” on Page 401 Right Column, Third Paragraph:  “The convergence progress of the underlying ML training algorithms is typically robust to a certain degree of fluctuation and slack, so the efficiency improvement obtained from the parallelism outweighs the staleness slowdown on convergence rate.”  One of ordinary skill in the art would be motivated to combine Dai with Zhang for a ML model that does not exhibit “typical” robustness to a “certain degree of fluctuation and slack” mentioned by Zhang, as in such a case Dai’s scaling in the training will actually result in an efficiency improvement as opposed to throwing more workers at the training, thus saving time (Dai, Page 5:  “The impact of staleness is amplified by the number of workers. In the case of MF, Fig. 3(b) shows that the convergence slowdown in terms of the number of batches (normalized by the convergence for s = 0) on 8 workers is more than twice of the slowdown on 4 workers.”)

As per Claim 2, the combination of Smith, Zhang, and Dai teaches the method according to Claim 1.  Smith teaches wherein the machine learning model is a generalized linear model.  (Smith, Page 4 Top, discloses:  “In this paper we develop a general framework for minimizing problems of the following form…This formulation includes many popular methods in machine learning and signal processing, such as support vector machines, linear and logistic regression, lasso and sparse logistic regression, and many others.”)

As per Claim 3, the combination of Smith, Zhang, and Dai teaches the method according to Claim 2.  Smith teaches wherein the quantity is a duality-gap measuring a distance between a primal formulation of a training objective for the training and a dual formulation of this training objective. (Smith, Page 3 Top, discloses:  “Using primal-dual information in this manner not only allows for efficient methods (achieving, e.g., up to 50x speedups compared to state-of-the-art), but also allows for strong primal-dual convergence guarantees and practical benefits such as computation of the duality gap for use as an accuracy certificate and stopping criterion.”)
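For illustration only, the notion of a duality gap as a distance between a primal and a dual formulation of the same training objective can be sketched for one standard case.  The hinge-loss SVM formulation below is a textbook example chosen by assumption; it is not Smith's general CoCoA formulation, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def w_of_alpha(alpha, X, y, lam):
    """Primal weights induced by dual variables alpha (each in [0, 1])."""
    n, dim = len(X), len(X[0])
    w = [0.0] * dim
    for a, xi, yi in zip(alpha, X, y):
        for k in range(dim):
            w[k] += a * yi * xi[k] / (lam * n)
    return w


def primal(w, X, y, lam):
    """L2-regularized average hinge loss."""
    hinge = sum(max(0.0, 1.0 - yi * dot(w, xi)) for xi, yi in zip(X, y))
    return 0.5 * lam * dot(w, w) + hinge / len(X)


def dual(alpha, X, y, lam):
    """Dual objective of the same problem, a lower bound on the primal."""
    w = w_of_alpha(alpha, X, y, lam)
    return sum(alpha) / len(X) - 0.5 * lam * dot(w, w)


def duality_gap(alpha, X, y, lam):
    """Primal value minus dual value; non-negative, zero at the optimum."""
    w = w_of_alpha(alpha, X, y, lam)
    return primal(w, X, y, lam) - dual(alpha, X, y, lam)


# Toy data: four points, two classes.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [1, 1, -1, -1]
gap_start = duality_gap([0.0, 0.0, 0.0, 0.0], X, y, lam=0.5)
gap_end = duality_gap([1.0, 1.0, 1.0, 1.0], X, y, lam=0.5)
```

On this toy data the gap is 1.0 for the all-zero starting point and 0 for the second choice of dual variables, which is why a gap of this kind can serve as an accuracy certificate and stopping criterion.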

As per Claim 4, the combination of Smith, Zhang, and Dai teaches the method according to Claim 3.  Smith teaches duality-gap.  However, Smith does not teach wherein the change in the temporal evolution is detected by comparing a short-term evolution of the duality-gap to a long-term evolution thereof, the long-term evolution extending over a longer period of time than the short-term evolution.
Zhang teaches wherein the change in the temporal evolution is detected by comparing a short-term evolution of the [duality-gap] convergence rate to a long-term evolution thereof, the long-term evolution extending over a longer period of time than the short-term evolution (Zhang Page 390 Para 3:  “It generates a low-quality model at the beginning and improves the model’s quality through a sequence of training iterations until it converges. In general, the quality improvement diminishes as more iterations are completed.”  Zhang, Page 391 Para 2, states:  “During a burst of job submissions, equal resources will be allocated to jobs that are in their early stages and could benefit significantly from extra resources as those that have nearly converged and cannot improve much further.”  In these statements, Zhang discloses that the convergence rate over early stages (long-term evolution) is compared to the convergence rate in the current stage (short-term evolution), which reveals that this short-term evolution shows diminishing convergence compared to the previous long term).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang and Smith for at least the reasons recited in Claim 1.

As per Claim 5, the combination of Smith, Zhang, and Dai teaches the method according to Claim 4.  Smith teaches duality-gap.  However, Smith does not teach wherein the short-term evolution is compared to the long-term evolution so as to detect a knee of the temporal evolution of the duality-gap, wherein the knee corresponds to the change and determines a given moment in time, whereby the training of the generalized linear model is scaled-in at the given moment in time.
Zhang teaches wherein the short-term evolution is compared to the long-term evolution so as to detect a knee of the temporal evolution of the [duality-gap] convergence rate, wherein the knee corresponds to the change and determines a given moment in time, whereby the training of the generalized linear model is scaled-in at the given moment in time.  (Examiner notes Applicant describes a “knee” in Specification [0037]:  “a ‘knee’ of this temporal evolution, i.e., a substantial change that translates into a pronounced modification in the temporal slope.”  Zhang Page 390 Para 3:  “It generates a low-quality model at the beginning and improves the model’s quality through a sequence of training iterations until it converges. In general, the quality improvement diminishes as more iterations are completed.”  Zhang, Page 391 Para 2, states:  “During a burst of job submissions, equal resources will be allocated to jobs that are in their early stages and could benefit significantly from extra resources as those that have nearly converged and cannot improve much further.”  In these statements, Zhang discloses a decrease in the rate of the convergence.  One of ordinary skill in the art with knowledge of calculus will appreciate that a rate of change is also known as a “slope”, and thus Zhang also discloses finding a “knee” that corresponds to a change in the convergence.  Zhang discloses that this causes the training to be scaled-in in Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang and Smith for at least the reasons recited in Claim 1.

As per Claim 6, the combination of Smith, Zhang, and Dai teaches the method according to Claim 3.  Smith teaches duality-gap.  However, Smith does not teach wherein the training of the generalized linear model is scaled-in upon detecting a change in a slope of the temporal evolution of the duality-gap. 
Zhang teaches wherein the training of the generalized linear model is scaled-in upon detecting a change in a slope of the temporal evolution of the [duality-gap] convergence rate (Zhang Page 390 Para 3:  “It generates a low-quality model at the beginning and improves the model’s quality through a sequence of training iterations until it converges. In general, the quality improvement diminishes as more iterations are completed.”  Zhang, Page 391 Para 2, states:  “During a burst of job submissions, equal resources will be allocated to jobs that are in their early stages and could benefit significantly from extra resources as those that have nearly converged and cannot improve much further.”  In these statements, Zhang discloses a decrease in the rate of the convergence.  One of ordinary skill in the art with knowledge of calculus will appreciate that a rate of change is also known as a “slope”, and thus Zhang also discloses finding a “knee” that corresponds to a change in the convergence.  Zhang discloses that this causes the training to be scaled-in in Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang and Smith for at least the reasons recited in Claim 1.

As per Claim 16, the combination of Smith, Zhang, and Dai teaches the method according to Claim 1.  Zhang teaches wherein: the first number K of workers form a first set of workers; the second number K' of workers form a second set of workers; and scaling-in the parallel training comprises reallocating at least part of the training data as initially used by workers of the first set to workers of the second set.  (Zhang, Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.  Here, Zhang discloses reallocating jobs between workers.  The workers in the previous iteration are a first set of workers, and the smaller set of workers remaining after the reallocation is a second set of workers.  Zhang, Page 394 Section 3, discloses:  “A centralized SLAQ scheduler coordinates the resource allocation of multiple ML training jobs. As shown in Figure 4(a), each job is composed of a set of tasks. Each task processes data based on the ML algorithm on a small partition of the dataset, and can be scheduled to run on any node.”  Here, Zhang discloses that the allocation to workers comprises splitting up the training data (“small partition of the dataset”).  Thus the “small partition of the dataset” recited by Zhang must be reallocated from a “reclaimed” worker to one of the remaining workers.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang and Smith for at least the reasons recited in Claim 1.

As per Claim 20, this is a computerized system claim corresponding to method Claim 1.  The difference is that it recites a computerized system.  Smith discloses “machines” on Page 7 Section 3.1:  “The goal of the CoCoA framework is to find a global minimizer of the objective (A), while distributing computation based on the partitioning of the dataset A across machines”.  Claim 20 is rejected for the same reasons as Claim 1.

As per Claim 21, this is a computerized system claim corresponding to method Claim 3.  The difference is that it recites a computerized system and whereby the system is configured to scale-in the parallel training upon detecting the change in the duality-gap.  Smith discloses “machines” on Page 7 Section 3.1:  “The goal of the CoCoA framework is to find a global minimizer of the objective (A), while distributing computation based on the partitioning of the dataset A across machines”.  Smith teaches duality-gap in Page 3 Top:  “Using primal-dual information in this manner not only allows for efficient methods (achieving, e.g., up to 50x speedups compared to state-of-the-art), but also allows for strong primal-dual convergence guarantees and practical benefits such as computation of the duality gap for use as an accuracy certificate and stopping criterion.”  However, Smith does not teach whereby the system is configured to scale-in the parallel training upon detecting the change in the duality-gap.
Zhang teaches whereby the system is configured to scale-in the parallel training upon detecting the change in the [duality-gap] convergence rate.  Zhang discloses that this causes the training to be scaled-in in Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.
Besides these reasons, Claim 21 is rejected for similar reasons as Claim 3.

As per Claim 22, this is a computerized system claim corresponding to method Claim 4.  The difference is that it recites a computerized system and whereby the system is configured to scale-in the parallel training upon detecting the change in the duality-gap.  Smith discloses “machines” on Page 7 Section 3.1:  “The goal of the CoCoA framework is to find a global minimizer of the objective (A), while distributing computation based on the partitioning of the dataset A across machines”.  Smith teaches duality-gap in Page 3 Top:  “Using primal-dual information in this manner not only allows for efficient methods (achieving, e.g., up to 50x speedups compared to state-of-the-art), but also allows for strong primal-dual convergence guarantees and practical benefits such as computation of the duality gap for use as an accuracy certificate and stopping criterion.”  However, Smith does not teach whereby the system is configured to scale-in the parallel training upon detecting the change in the duality-gap.
Zhang teaches whereby the system is configured to scale-in the parallel training upon detecting the change in the [duality-gap] convergence rate.  Zhang discloses that this causes the training to be scaled-in in Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.
Besides these reasons, Claim 22 is rejected for similar reasons as Claim 4.

As per Claim 23, this is a computer program product claim corresponding to method Claim 1.  The difference is that it recites a computer program product and a computerized system.  Smith discloses a computer program product at the bottom of Page 3:  “Our code is available at: gingsmith.github.io/cocoa/.”  Smith discloses “machines” on Page 7 Section 3.1:  “The goal of the CoCoA framework is to find a global minimizer of the objective (A), while distributing computation based on the partitioning of the dataset A across machines”.  Claim 23 is rejected for the same reasons as Claim 1.

As per Claim 24, this is a computer program product claim corresponding to method Claim 3.  The difference is that it recites a computer program product and a computerized system.  Smith discloses a computer program product at the bottom of Page 3:  “Our code is available at: gingsmith.github.io/cocoa/.”  Smith discloses “machines” on Page 7 Section 3.1:  “The goal of the CoCoA framework is to find a global minimizer of the objective (A), while distributing computation based on the partitioning of the dataset A across machines”.  Claim 24 is rejected for the same reasons as Claim 3.

As per Claim 25, this is a computer program product claim corresponding to method Claim 4.  The difference is that it recites a computer program product and a computerized system.  Smith discloses a computer program product at the bottom of Page 3:  “Our code is available at: gingsmith.github.io/cocoa/.”  Smith discloses “machines” on Page 7 Section 3.1:  “The goal of the CoCoA framework is to find a global minimizer of the objective (A), while distributing computation based on the partitioning of the dataset A across machines”.  Claim 25 is rejected for the same reasons as Claim 4.

Claims 7-15 are rejected under 35 U.S.C. 103 as being unpatentable over Smith in view of Zhang and Dai, further in view of Dal (KR 101691305 B1).
As per Claim 7, the combination of Smith, Zhang, and Dai teaches the method according to Claim 6.  Smith teaches duality-gap (see Rejection to Claim 3).  However, Smith does not teach temporal evolution of the duality-gap.
Zhang teaches slope and temporal evolution [of the duality-gap] (Zhang Page 390 Para 3:  “It generates a low-quality model at the beginning and improves the model’s quality through a sequence of training iterations until it converges. In general, the quality improvement diminishes as more iterations are completed.”  Zhang, Page 391 Para 2, states:  “During a burst of job submissions, equal resources will be allocated to jobs that are in their early stages and could benefit significantly from extra resources as those that have nearly converged and cannot improve much further.”  In these statements, Zhang discloses a decrease in the rate of the convergence.  One of ordinary skill in the art with knowledge of calculus will appreciate that a rate of change is also known as a “slope”, and thus Zhang also discloses finding a “knee” that corresponds to a change in the convergence.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang and Smith for at least the reasons recited in Claim 1.
However, the combination of Smith, Zhang, and Dai does not teach wherein the change is detected by comparing at least two slopes of the temporal evolution of the duality-gap, the at least two slopes including a short-term slope and a long-term slope.
Dal teaches wherein the change is detected by comparing at least two slopes of the temporal evolution [of the duality-gap], the at least two slopes including a short-term slope and a long-term slope. (Recall that Smith teaches duality gap.  Dal, Machine Translation of [0026-0029] and Table 1, discloses:  “In addition, when an update situation occurs in the ticker information, the real estate market information management unit 120 can notify the user who set the real estate as interest information to change the ticker information by using e-mail, text, SNS or the like. For example, the real estate market information management unit 120 analyzes patterns of real estate market fluctuations.
Table 1
Long-term trend \ Short-term trend | Rise (↗) | (→) | Fall (↘)
Rise (↗) | Long-term slope < short-term slope: exponential rise; Long-term slope = short-term slope: linear rise; Long-term slope > short-term slope: logarithmic rise | Splendor | Fall reversal
(→) | Start rising | Splendor | Start to fall
Fall (↘) | Rise reversal | Check floor | Long-term slope < short-term slope: decrease in drop; Long-term slope = short-term slope: (blank); Long-term slope > short-term slope: increase in slope


Table 1 presents a table that analyzes patterns of current real estate market fluctuations with short-term and long-term trends. Here, the long-term trend is the first-order differential, which is the rate of change of the median filter value or the average of the market price over a short period of time from the past to the present. The short-term trend is the average of the market price or the median filter value change over a short period of time from the past to the present. Based on the criteria of this price classification, the user is informed of the change in the price of the object of interest or the value of the possessed object, and when the corresponding item is inputted, the same object is changed in the same pattern as the long-When changing, they can inform the user through mail, telephone, text, SNS, etc.”
Above, Dal discloses comparing the short-term slope and long-term slope to detect a temporal evolution (“exponential rise”, “logarithmic rise”, “decrease in drop”, “increase in slope”))
Dal and the combination of Smith, Zhang, and Dai are analogous art because Dal is reasonably pertinent to the problem faced by the inventor (see MPEP 2141.01(a):  “This does not require that the reference be from the same field of endeavor as the claimed invention, in light of the Supreme Court's instruction that "[w]hen a work is available in one field of endeavor, design incentives and other market forces can prompt variations of it, either in the same field or a different one." Id. at 417, 82 USPQ2d 1396. Rather, a reference is analogous art to the claimed invention if: (1) the reference is from the same field of endeavor as the claimed invention (even if it addresses a different problem); or (2) the reference is reasonably pertinent to the problem faced by the inventor (even if it is not in the same field of endeavor as the claimed invention).”).  Examiner points out that one of ordinary skill in the art, merely having knowledge of undergraduate calculus, will appreciate the study of rates-of-change, first and second-order derivatives, and inflection points, and that analyzing rates of change is a skill that is not restricted to machine learning, but is applied across a wide variety of disciplines.
It would have been obvious before the effective filing date of the claimed invention to combine the slope comparison of Dal with the elastic training of the combination of Smith, Zhang, and Dai.  One of ordinary skill in the art would be motivated to do so in order to efficiently allocate computing resources based on the pattern indicating the effectiveness of the current resource allocation (Dal, [0026]:  “For example, the real estate market information management unit 120 analyzes patterns of real estate market fluctuations.”)

As per Claim 8, the combination of Smith, Zhang, Dai, and Dal teaches the method according to Claim 7.  Smith teaches duality-gap (see Rejection to Claim 3).  However, Smith does not teach wherein the short-term slope is compared to the long-term slope so as to detect a knee of the temporal evolution of the duality-gap, wherein the knee corresponds to the change and determines a given moment in time, whereby the parallel training of the generalized linear model is scaled-in at the given moment in time.
Zhang teaches whereby the parallel training of the generalized linear model is scaled-in at the given moment in time.  (Zhang Page 390 Para 3:  “It generates a low-quality model at the beginning and improves the model’s quality through a sequence of training iterations until it converges. In general, the quality improvement diminishes as more iterations are completed.”  Zhang, Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.  Here, Zhang discloses training and scaling in at a given point in time (when quality improvement diminishes)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang and Smith for at least the reasons recited in Claim 1.
However, the combination of Smith and Zhang does not teach wherein the short-term slope is compared to the long-term slope so as to detect a knee of the temporal evolution of the duality-gap, wherein the knee corresponds to the change and determines a given moment in time.
Dal teaches wherein the short-term slope is compared to the long-term slope so as to detect a knee of the temporal evolution [of the duality-gap], wherein the knee corresponds to the change and determines a given moment in time. (Dal, as shown above in the Rejection to Claim 7, discloses comparing a short-term slope to a long-term slope to analyze temporal evolution of the rate of change.  Dal suggests a “knee” in areas of Table 1, such as:  “Long-term slope < short-term slope: exponential rise”, “Long-term slope > short-term slope: logarithmic rise”, “Long-term slope < short-term slope: decrease in drop”, “Long-term slope > Short-term slope: increase in slope”.  Dal, Machine Translation of [0041-0042] and Table 2, illustrates the “knee”, and Examiner has highlighted several examples:  “In addition, the yield prediction unit 140 can obtain the price change pattern of Table 2 based on the first-order differential value and the second-order differential value of the price, and based on this, the future price can be predicted using the pattern shown in Table 3.”)

    [Image: media_image1.png, greyscale]

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Dal with the combination of Smith and Zhang, for at least the reasons recited in Claim 7.

As per Claim 9, the combination of Smith, Zhang, Dai, and Dal teaches the method according to Claim 8.  Dal teaches wherein the short-term slope is compared to the long-term slope so as for the given moment in time to be determined by a time at which SS x d becomes smaller than S, wherein SS and S are values characterizing the short-term slope and the long-term slope, whereas d is a factor such that 1 < d.  (Examiner notes that Ss * d < S is an equivalent expression to (S / Ss) > d.
Dal, Machine Translation of [0026-0029] and Table 1, discloses “Long-term slope > short-term slope: logarithmic rise” and “Long-term slope < short-term slope: decrease in drop”.  Note that the latter scenario refers to a “drop”, in which the absolute value of the long-term slope is greater than the absolute value of the short-term slope.  Here, Dal teaches that the magnitude of the long-term slope is greater than that of the short-term slope, and thus S > Ss, which is equivalent to stating that (S / Ss) > 1.  Dal, Machine Translation of [0019], discloses a “server”, and thus a computer:  “The real estate transaction value management apparatus 100 is a server for providing a service for calculating or providing an investment amount or a return rate of the real estate by using state information of the real estate and forecasting the rate of the real estate”.  Examiner notes that a computer can only store a finite number of decimal places, and thus Dal’s method effectively checks that (S / Ss) >= d0, where d0 is the smallest possible number greater than 1 that the computer can store in memory.  Therefore, Dal teaches (S / Ss) > d, where 1 < d < d0.)
However, Dal does not explicitly teach that d < 2, as recited in “whereas d is a factor such that 1 < d < 2”.  Nevertheless, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to establish that d < 2, since it has been held that discovering an optimum value of a result effective variable involves only routine skill in the art. In re Boesch, 617 F.2d 272, 205 USPQ 215 (CCPA 1980).  Examiner points out that the Instant Specification confirms that this is not a calculated value with theoretical significance, but rather empirically determined.  See [0040]: “where d is a factor that is larger than or equal to 1, e.g., 1 < d < 2, while SS and S are values characterizing the short-term slope and the long-term slope, respectively. The factor d, which defines a safe margin for deciding whether to scale-in or not, may for example be set to d = 1.25, which value proves to be suitable in practice” and [0055]:  “It was empirically determined that N = 2 and d = 1.25 works well across all evaluated datasets.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Dal with the combination of Smith, Zhang, and Dai, for at least the reasons recited in Claim 7.
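For illustration only, the equivalence noted in the Rejection to Claim 9 between Ss * d < S and (S / Ss) > d can be sketched as follows.  The function names and sample values are hypothetical and are not taken from Dal or from the Instant Specification; treating Ss and S as slope magnitudes is an assumption, and d = 1.25 is the example value quoted from [0040].

```python
def knee_reached(short_slope: float, long_slope: float, d: float = 1.25) -> bool:
    """Return True when Ss * d < S (hypothetical helper).

    Assumption: Ss and S are taken as slope magnitudes, and d is the
    margin factor restricted to 1 < d < 2.
    """
    if not 1.0 < d < 2.0:
        raise ValueError("d is expected to satisfy 1 < d < 2")
    ss = abs(short_slope)
    s = abs(long_slope)
    return ss * d < s


def knee_reached_ratio(short_slope: float, long_slope: float, d: float = 1.25) -> bool:
    """Ratio form (S / Ss) > d; equivalent to the product form only when Ss > 0."""
    ss = abs(short_slope)
    s = abs(long_slope)
    if ss == 0.0:
        # With Ss = 0 the product form Ss * d < S reduces to 0 < S.
        return s > 0.0
    return s / ss > d


# Hypothetical slopes: the short-term decrease is much flatter than the long-term one.
print(knee_reached(-0.03, -0.2025), knee_reached_ratio(-0.03, -0.2025))
# Hypothetical slopes: the two are nearly equal, so the margin d is not exceeded.
print(knee_reached(-0.18, -0.2), knee_reached_ratio(-0.18, -0.2))
```

The two forms agree whenever Ss is strictly positive; the zero case is handled separately in the ratio form to avoid a division by zero.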

As per Claim 10, the combination of Smith, Zhang, Dai, and Dal teaches the method according to Claim 9.  Dal teaches the factor d > 1 (see Rejection to Claim 9). However, Dal does not explicitly teach wherein the factor d is set to d = 1.25.  Nevertheless, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to establish that d = 1.25, since it has been held that discovering an optimum value of a result effective variable involves only routine skill in the art. In re Boesch, 617 F.2d 272, 205 USPQ 215 (CCPA 1980).  Examiner points out that the Instant Specification confirms that this is not a calculated value with theoretical significance, but rather empirically determined.  See [0040]: “The factor d, which defines a safe margin for deciding whether to scale-in or not, may for example be set to d = 1.25, which value proves to be suitable in practice” and [0055]:  “It was empirically determined that N = 2 and d = 1.25 works well across all evaluated datasets.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Dal with the combination of Smith, Zhang, and Dai, for at least the reasons recited in Claim 7.

As per Claim 11, the combination of Smith, Zhang, Dai, and Dal teaches the method according to Claim 7.  Smith teaches convergence of the duality-gap (Smith, Page 14 top:  “In this section, we provide convergence rates for the proposed framework and introduce a key theoretical technique in analyzing non-strongly convex terms in the primal-dual setting.”  Smith, Page 3 Top, discloses:  “Using primal-dual information in this manner not only allows for efficient methods (achieving, e.g., up to 50x speedups compared to state-of-the-art), but also allows for strong primal-dual convergence guarantees and practical benefits such as computation of the duality gap for use as an accuracy certificate and stopping criterion.”)
operated at the computerized system during the parallel training (Smith, Page 7 Section 3.1:  “The goal of the CoCoA framework is to find a global minimizer of the objective (A), while distributing computation based on the partitioning of the dataset A across machines”.)
However, Smith does not teach a period of time extending since a last scale-in event.
Zhang teaches convergence [of the duality-gap] over a period of time extending since a last scale-in event. (Recall above Smith teaches duality gap.  Zhang Page 390 Para 3:  “It generates a low-quality model at the beginning and improves the model’s quality through a sequence of training iterations until it converges. In general, the quality improvement diminishes as more iterations are completed.”  Zhang, Page 391 Para 2, states:  “During a burst of job submissions, equal resources will be allocated to jobs that are in their early stages and could benefit significantly from extra resources as those that have nearly converged and cannot improve much further.”  Zhang discloses scaling in at Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.  Zhang, Page 397 Right Column Para 4, discloses that this is a repetitive process, and thus this can be done “since” a “last” scaling event:  “The latter scenario is likely because SLAQ makes a scheduling decision once every epoch, which typically spans multiple iterations.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang with Smith, for at least the reasons recited in Claim 1.
However, the combination of Smith and Zhang does not explicitly teach long-term slope.
Dal teaches long-term slope. (Dal, Machine Translation of [0026-0029] and Table 1, discloses comparing short term with long term slope to analyze rate of change over time.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Dal with the combination of Smith, Zhang, and Dai, for at least the reasons recited in Claim 7.  The combination would allow one to use comparison of short term and long term slope, as recited by Dal, in order to perform the analysis of convergence rate over time in order to decide the scaling down of workers, as recited by Zhang.

As per Claim 12, the combination of Smith, Zhang, Dai, and Dal teaches the method according to Claim 7.  Smith teaches convergence of the duality-gap (Smith, Page 14 top:  “In this section, we provide convergence rates for the proposed framework and introduce a key theoretical technique in analyzing non-strongly convex terms in the primal-dual setting.”  Smith, Page 3 Top, discloses:  “Using primal-dual information in this manner not only allows for efficient methods (achieving, e.g., up to 50x speedups compared to state-of-the-art), but also allows for strong primal-dual convergence guarantees and practical benefits such as computation of the duality gap for use as an accuracy certificate and stopping criterion.”)
operated at the computerized system during the parallel training (Smith, Page 7 Section 3.1:  “The goal of the CoCoA framework is to find a global minimizer of the objective (A), while distributing computation based on the partitioning of the dataset A across machines”.)
However, Smith does not teach a period of time extending over a finite number N of one or more most recent training epochs of the parallel training, N > 1.
Zhang teaches convergence [of the duality-gap] over a period of time extending over a finite number N of one or more most recent training epochs of the parallel training, N > 1. (Recall above Smith teaches duality gap.  Zhang Page 390 Para 3:  “It generates a low-quality model at the beginning and improves the model’s quality through a sequence of training iterations until it converges. In general, the quality improvement diminishes as more iterations are completed.”  Zhang, Page 391 Para 2, states:  “During a burst of job submissions, equal resources will be allocated to jobs that are in their early stages and could benefit significantly from extra resources as those that have nearly converged and cannot improve much further.”  Zhang discloses scaling in at Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.  Zhang, Page 397 Right Column Para 4, discloses making a scheduling decision “once every epoch”:  “The latter scenario is likely because SLAQ makes a scheduling decision once every epoch, which typically spans multiple iterations.”  Zhang discloses multiple epochs (N > 1), and also multiple iterations.  Note that although Zhang states “once every epoch”, Instant Specification [0041] states that an iteration may also be considered an “epoch”:  “Note, an iteration is mostly equivalent to an epoch, but it does not have to be.”  Thus, Zhang makes a decision on the convergence rate that “typically spans multiple iterations”, and thus Zhang discloses an evolution over N > 1.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang with Smith, for at least the reasons recited in Claim 1.
However, the combination of Smith and Zhang does not explicitly teach short-term slope.
Dal teaches short-term slope. (Dal, Machine Translation of [0026-0029] and Table 1, discloses comparing short term with long term slope to analyze rate of change over time.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Dal with the combination of Smith and Zhang, for at least the reasons recited in Claim 7.  The combination would allow one to use comparison of short term and long term slope, as recited by Dal, in order to perform the analysis of convergence rate over time in order to decide the scaling down of workers, as recited by Zhang.
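For illustration only, the two observation windows discussed in the Rejections to Claims 11 and 12 (a period extending since a last scale-in event, and a period extending over the N most recent epochs) can be sketched as follows.  The function names, the sample values, and the use of a simple finite difference per epoch are assumptions made for this sketch; they are not taken from Smith, Zhang, Dal, or the Instant Specification.

```python
from typing import Sequence, Tuple


def average_slope(values: Sequence[float], start: int, end: int) -> float:
    """Average change per epoch between two epoch indices (end > start)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return (values[end] - values[start]) / (end - start)


def short_and_long_slopes(
    gap_history: Sequence[float], last_scale_in_epoch: int, n: int = 2
) -> Tuple[float, float]:
    """Return (short-term slope, long-term slope) of a per-epoch metric.

    Long-term window: from the last scale-in epoch to the latest epoch.
    Short-term window: the n most recent epochs, clipped so that it never
    reaches back before the last scale-in epoch.
    """
    end = len(gap_history) - 1
    long_term = average_slope(gap_history, last_scale_in_epoch, end)
    short_start = max(end - n, last_scale_in_epoch)
    short_term = average_slope(gap_history, short_start, end)
    return short_term, long_term


# Hypothetical per-epoch values of a convergence metric such as a duality gap.
history = [1.0, 0.5, 0.25, 0.2, 0.19]
short_term, long_term = short_and_long_slopes(history, last_scale_in_epoch=0, n=2)
print(short_term, long_term)
```

In this hypothetical series the short-term slope is much flatter than the long-term slope, which is the situation in which diminishing improvement would be observed.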
As per Claim 13, the combination of Smith, Zhang, Dai, and Dal teaches the method according to Claim 12.  As shown above, Zhang teaches N > 1 (see Rejection to Claim 12). However, Zhang does not explicitly teach wherein the finite number N is set to N = 2.  Nevertheless, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to establish that N = 2, since it has been held that discovering an optimum value of a result effective variable involves only routine skill in the art. In re Boesch, 617 F.2d 272, 205 USPQ 215 (CCPA 1980).  Examiner points out that the Instant Specification confirms that this is not a calculated value with theoretical significance, but rather empirically determined.  See [0055]:  “It was empirically determined that N = 2 and d = 1.25 works well across all evaluated datasets.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang with Smith, for at least the reasons recited in Claim 1.

As per Claim 14, the combination of Smith, Zhang, Dai, and Dal teaches the method according to Claim 1.  Zhang teaches wherein the second number K' is determined according to a fraction K/m, where m is a constant factor, m > 1. (Zhang, Page 394 Right Column Para 3, discloses:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”. Here, Zhang discloses “reclaiming” workers from jobs, which means those jobs will have fewer workers.  Examiner notes that the claim language does not limit m to an integer value (Spec [0042] merely states that it “typically” is).  Examiner notes that when Zhang “reclaims” any number x (>= 1) of workers, the second number of workers K’ = K - x = K / (K/(K-x)), and thus the “constant factor” m = K/(K-x), which must be > 1 when x >= 1.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang, Smith, and Dai for at least the reasons recited in Claim 1.
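For illustration only, the arithmetic in the Rejection to Claim 14 (reclaiming x workers from K workers implies a factor m = K/(K-x) such that K' = K/m) can be checked with the following sketch.  The function names and the sample worker counts are hypothetical and are not taken from Zhang.

```python
from fractions import Fraction


def implied_factor(k: int, x: int) -> Fraction:
    """Factor m = K / (K - x) implied by reclaiming x of K workers."""
    if not 1 <= x < k:
        raise ValueError("expected 1 <= x < K")
    return Fraction(k, k - x)


def workers_after_scale_in(k: int, m: Fraction) -> Fraction:
    """Second number of workers K' = K / m."""
    return Fraction(k) / m


# Hypothetical counts: reclaiming 12 of 16 workers gives m = 4 and K' = 4.
m = implied_factor(16, 12)
print(m, workers_after_scale_in(16, m))

# Hypothetical counts: reclaiming 1 of 8 workers gives a non-integer m = 8/7 and K' = 7.
m = implied_factor(8, 1)
print(m, workers_after_scale_in(8, m))
```

Exact fractions are used so that a non-integer m is represented without rounding.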

As per Claim 15, the combination of Smith, Zhang, Dai, and Dal teaches the method according to Claim 14.  Zhang teaches a constant factor m, m > 1 (see Rejection to Claim 14).  However, Zhang does not explicitly teach wherein the constant factor m is set to m = 4. Nevertheless, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to establish that m = 4, since it has been held that discovering an optimum value of a result effective variable involves only routine skill in the art. In re Boesch, 617 F.2d 272, 205 USPQ 215 (CCPA 1980).  Examiner points out that the Instant Specification confirms that this is not a calculated value with theoretical significance, but rather empirically determined.  See [0042]:  “The constant factor m may for example be set to m = 4, which turned out to work well in practice” and [0055]:  “Rather, use is made of a fixed m = 4, as tests have shown that the convergence rate difference for smaller m is often very small”.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang with Smith, for at least the reasons recited in Claim 1.

Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Smith in view of Zhang and Dai, further in view of Ahn et al. (“Soft memory box: A virtual shared memory framework for fast deep neural network training in distributed high performance computing”; hereinafter “Ahn”).
As per Claim 17, the combination of Smith, Zhang, and Dai teaches the method according to Claim 16.  Zhang teaches wherein reallocating at least part of the training data comprises transferring such data [in parallel] between multiple pairs of workers between, on the one hand, workers of the first set and, on the other hand, workers of the second set.  (Zhang, Page 396 Section 4.2.1, also discloses in parallel:  “The underlying frameworks help achieve data parallelization for training ML models: the training dataset is large and gets partitioned on multiple worker nodes, and the size of models (i.e., set of parameters) is comparably much smaller. The model parameters are updated by the workers, aggregated in the job driver, and disseminated back to the workers in the next iteration.”  Zhang, Page 394 Right Column Para 3:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”.  Here, Zhang discloses reallocating jobs between workers.  The workers in the previous iteration are a first set of workers, and the smaller remaining workers after the reallocation are a second set of workers.  Zhang, Page 394 Section 3, discloses:  “A centralized SLAQ scheduler coordinates the resource allocation of multiple ML training jobs. As shown in Figure 4(a), each job is composed of a set of tasks. Each task processes data based on the ML algorithm on a small partition of the dataset, and can be scheduled to run on any node.”  Here, Zhang discloses that the allocation to workers comprises splitting up the training data (“small partition of the dataset”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang and Smith for at least the reasons recited in Claim 1.
However, the combination of Smith and Zhang does not explicitly teach transferring such data in parallel between multiple pairs of workers.
Ahn teaches transferring such data in parallel between multiple pairs of workers.  (Ahn, Page 26494 Para 2, discloses Remote Direct Memory Access (RDMA):  “Moreover, Infiniband supports Remote Direct Memory Access (RDMA). With RDMA, CPU does not need to control the transmission between local memory and remote memory, and thus, RDMA can reduce the number of memory copies between user space and kernel space. Eventually, RDMA not only reduces the access latency but also dramatically improves the communication bandwidth.”  Ahn states that this increases “communication bandwidth”.  Ahn gives further detail in Page 26499 Section B:  “The size of the shared memory allocated to each process is 1 GB. Each process performs read, write, and read/write (mix each 50%) simultaneously after the shared memory is allocated.”  Here, Ahn discloses multiple processes performing data transfer in parallel (“simultaneously”)).
Ahn and the combination of Smith, Zhang, and Dai are analogous art because they are both in the field of endeavor of distributed training of machine learning models.
It would have been obvious before the effective filing date of the claimed invention to combine the RDMA of Ahn with the distributed training of Smith, Zhang, and Dai.  One of ordinary skill in the art would be motivated to do so in order to reduce latency and improve bandwidth (Ahn, Page 26494 Para 2:  “With RDMA, CPU does not need to control the transmission between local memory and remote memory, and thus, RDMA can reduce the number of memory copies between user space and kernel space. Eventually, RDMA not only reduces the access latency but also dramatically improves the communication bandwidth.”)

As per Claim 18, the combination of Smith, Zhang, Dai, and Ahn teaches the method according to Claim 17.
Ahn teaches wherein the training data are transferred according to a foreground data copy mechanism based on a remote direct memory access. (Ahn, Page 26493 Intro Para 2, discloses distributed training:  “In distributed deep learning platforms, workers (or trainers) conduct distributed training of deep neural networks.”  Ahn, Page 26494 Para 4, discloses:  “The proposed shared memory framework for distributed deep learning can be operated in the high-performance clusters that are connected via high-speed networking techniques (e.g., Infiniband). In addition, the proposed architecture utilizes RDMA technique and thus eliminates copy operations of communication data between application-level buffers and kernel-level memory buffers. Eventually, the proposed framework provides a method which can share deep learning parameters among distributed workers using RDMA for reading/writing data in the memory of remote nodes.”  Here, Ahn discloses transferring training data (“share deep learning parameters”) according to a copy (“share”) mechanism based on remote direct memory access (“RDMA”).  One of ordinary skill in the art will appreciate that a foreground process is a process that can be interacted with by a user, and Ahn Page 26494 Para 2 discloses an interaction of RDMA with user space:  “RDMA can reduce the number of memory copies between user space and kernel space.”  Ahn also discloses a user level process on Page 26498 Section F:  “The SMB Server is implemented as a Linux user level process.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ahn with the combination of Zhang, Smith, and Dai for at least the reasons recited in Claim 17.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Smith in view of Zhang, Dai, and Ahn, further in view of Kwon et al. (“Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning”; hereinafter “Kwon”).
As per Claim 19, the combination of Smith, Zhang, Dai, and Ahn teaches the method according to Claim 18.  Zhang teaches wherein the second number K' is determined according to a fraction K/m, where m is a constant factor, m > 1 (Zhang, Page 394 Right Column Para 3, discloses:  “The scheduler reclaims workers back from some job drivers, and reallocates them to other jobs for better system-wide performance goals”. Here, Zhang discloses “reclaiming” workers from jobs, which means those jobs will have fewer workers.  Examiner notes that the claim language does not limit m to an integer value (Spec [0042] merely states that it “typically” is).  Examiner notes that when Zhang “reclaims” any number x (>= 1) of workers, the second number of workers K’ = K - x = K / (K/(K-x)), and thus the “constant factor” m = K/(K-x), which must be > 1 when x >= 1.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhang and Smith for at least the reasons recited in Claim 1.
Ahn teaches transferring data in parallel (see Rejection to Claim 17), in which multiple workers transfer data simultaneously.  One of ordinary skill in the art will appreciate that a number of threads can be reduced by a fraction m, and one of ordinary skill in the art will appreciate that given m links, that the total parallel bandwidth will be m * the bandwidth in each link. Nevertheless, the combination of Smith, Zhang, and Ahn does not explicitly teach wherein the second number K' is determined according to a fraction K/m, where m is a constant factor, m > 1, and the data copy mechanism is implemented so as to achieve a transfer rate of m x r, where r denotes a single-link bandwidth of the system, and thus for the sake of clarity Examiner references another piece of art for this explicit matter.
Kwon teaches wherein the second number K' is determined according to a fraction K/m, where m is a constant factor, m > 1, and the data copy mechanism is implemented so as to achieve a transfer rate of m x r, where r denotes a single-link bandwidth of the system (Kwon, Page 151 Left Column, discloses:  “This means that a singular high-bandwidth link of  25 GB/sec per device would amount to a total of 100 GB/sec of worst-case host-side memory bandwidth consumption when accounting for the four PCIe-attached devices connected to a single CPU socket.”  Here, Kwon gives an example of 4 links, and achieving a bandwidth of 25 * 4, where 25 is the single-link bandwidth.)
Kwon and the combination of Smith, Zhang, Dai, and Ahn are analogous art because they are both in the field of endeavor of distributed training of machine learning models.
It would have been obvious before the effective filing date of the claimed invention to combine the multi-link bandwidth aggregation of Kwon with the distributed training of Smith, Zhang, Dai, and Ahn.  One of ordinary skill in the art would be motivated to do so in order to improve the speed with which complex machine learning models can be trained (Kwon, Page 150 Section C:  “As the DNN algorithm gets more complex and deeper [20]–[22], the need for distributed multi-node systems, each with multiple accelerator devices, have significantly increased to provide high computing horsepower for DL practitioners”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Peng et al. (“Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters”) discloses allocating workers in order of marginal gains, until no more gains can be made, on Page 7 Para 1:  “Our resource allocation algorithm in each scheduling interval works as follows. We first allocate one worker and one parameter server to each active job to avoid starvation, and then sort all jobs in order of their marginal gains computed using (9). Then we iteratively select the job with the largest marginal gain and add one worker or parameter server to the job, according to which of the two terms in (9) is larger (i.e., whether adding a worker or adding a parameter server brings larger marginal gain). Marginal gains of the jobs are updated when their resource allocation changes. The procedure repeats until some resource in the cluster is used up, or marginal gains of all jobs become non-positive.”
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126   
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126